One of the best ways to learn Lisp-Stat is to see examples of actual work. This section contains examples of statistical analysis, derived from the book Introduction to the Practice of Statistics (2017) by Moore, McCabe and Craig, and examples of plotting, derived from the Vega-Lite example gallery.
Examples
- 1: Plotting
- 2: Statistics
1 - Plotting
The plots here show equivalents to the Vega-Lite example gallery. Before you begin working with these examples, be certain to read the plotting tutorial, where you will learn the basics of working with plot specifications and data.
Preliminaries
Load Vega-Lite
Load Vega-Lite and network libraries:
and change to the Lisp-Stat user package:
Load example data
The examples in this section use the vega-lite data sets. Load them all now:
Bar charts
Bar charts are used to display information about categorical variables.
Simple bar chart
In this simple bar chart example we’ll demonstrate using literal embedded data in the form of a plist. Later you’ll see how to use a data-frame directly.
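For readers new to the term, a plist (property list) is a flat list of alternating keys and values. The sketch below uses only standard Common Lisp, not the Lisp-Stat API. The name `*bar-data*`, the column names `:a` and `:b`, and the values are made up for illustration and are not taken from the gallery example.

```lisp
;; A plist alternates keys and values. Here each key names a column
;; and each value is a vector holding that column's data.
;; *BAR-DATA*, :A, :B and the numbers are illustrative assumptions.
(defparameter *bar-data*
  '(:a #("A" "B" "C")
    :b #(28 55 43)))

;; GETF looks up a key in a plist.
(getf *bar-data* :b)            ; => #(28 55 43)
(length (getf *bar-data* :a))   ; => 3
```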
Grouped bar chart
Stacked bar chart
This example uses Seattle weather from the Vega website. Load it into a data frame like so:
We’ll use a data-frame as the data source via the Common Lisp backquote mechanism. The spec list begins with a backquote (`) and then the data frame is inserted as a literal value with a comma (,). We’ll use this pattern frequently.
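The backquote and comma pattern can be tried on its own in plain Common Lisp. In this minimal sketch an ordinary list stands in for the data frame. The names `*weather*` and `*spec*` and the exact key layout are assumptions for illustration, not the spec used by the gallery example.

```lisp
;; Stand-in for a data frame; any Lisp object can be inserted this way.
(defparameter *weather* '((:date "2012-01-01" :precipitation 0.0)))

;; Backquote quotes the list. The comma evaluates *WEATHER* and inserts
;; its value, while everything else stays literal.
(defparameter *spec*
  `(:mark :bar
    :data (:values ,*weather*)))

(getf *spec* :mark)                                ; => :BAR
(eq (getf (getf *spec* :data) :values) *weather*)  ; => T
```

Without the comma, the symbol `*weather*` itself would end up in the list rather than the data it names.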
Population pyramid
Vega calls this a diverging stacked bar chart. It is a population pyramid for the US in 2000, created using the stack feature of vega-lite. You could also create one using concat.
First, load the population data if you haven’t done so:
Note the use of read-vega in this case. This is because the data in the Vega example is in an application-specific JSON format (Vega, of course).
Histograms & density
Basic
For this simple histogram example we’ll use the IMDB film rating data set.
Relative frequency
Use a relative frequency histogram to compare data sets with different numbers of observations.
The first transform bins the data. The second and third transforms calculate the number of values per bin and the total number of values. The last transform divides the bin count by the total to give the relative frequency.
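The arithmetic behind those steps can be sketched in plain Common Lisp. This is only an illustration of the calculation, not how Vega-Lite or Lisp-Stat implement it; the function name `relative-frequencies` and the sample numbers are hypothetical.

```lisp
(defun relative-frequencies (observations bin-width)
  "Return an alist of (bin-start . relative-frequency), sorted by bin start."
  (let ((counts (make-hash-table))
        (total (length observations)))
    ;; Bin each observation and count the observations per bin.
    (dolist (v observations)
      (incf (gethash (* bin-width (floor v bin-width)) counts 0)))
    ;; Divide each bin count by the total number of observations.
    (sort (loop for bin being the hash-keys of counts
                  using (hash-value n)
                collect (cons bin (/ n total)))
          #'< :key #'car)))

(relative-frequencies '(1 2 2 3 5 6 6 6) 2)
;; => ((0 . 1/8) (2 . 3/8) (4 . 1/8) (6 . 3/8))
```

Because each count is divided by its own total, two data sets of different sizes end up on the same 0 to 1 scale.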
2D histogram scatterplot
If you haven’t already loaded the imdb data set, do so now:
Stacked density
Note the use of the multiple escape characters (|) surrounding the field BODY-MASS-(G). This is required because the JSON data set has parentheses in the variable names, and these are reserved characters in Common Lisp. The JSON importer wrapped these in the escape character.
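The effect of the vertical bars can be checked at the REPL with standard Common Lisp alone. The sketch below is independent of the penguins data set and only shows how the reader treats such names.

```lisp
;; Vertical bars let a symbol name contain characters the reader
;; would otherwise treat as syntax, such as parentheses.
(symbol-name '|BODY-MASS-(G)|)       ; => "BODY-MASS-(G)"

;; Without the bars, the reader stops at the open parenthesis.
(read-from-string "body-mass-(g)")   ; => BODY-MASS-, 10

;; Bars also preserve case, so these are two different symbols.
(eq '|body-mass| 'body-mass)         ; => NIL
```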
Scatter plots
Basic
A basic Vega-Lite scatterplot showing horsepower and miles per gallon for various cars.
Colored
In this example we’ll show how to add information to the cars scatter plot to show each car’s origin. The Vega-Lite example shows that we have to add two new directives to the encoding of the plot:
With this change we can see that the higher-horsepower, lower-efficiency cars are from the USA, and the higher-efficiency cars are from Japan and Europe.
Text marks
The same information, but further indicated with a text marker. This Vega-Lite example uses a data transformation.
Notice here we use a string for the field value and not a symbol. This is because Vega is case sensitive, whereas the Lisp reader, by default, is not. We could have also used a lower-case :as value, but did not, in order to highlight this requirement for certain Vega specifications.
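The difference between symbols and strings is easy to see in standard Common Lisp. The field name "OriginInitial" below is only an illustrative mixed-case string and is not taken from the text above.

```lisp
;; The reader upcases symbol names by default, so the original case is lost.
(symbol-name :as)                           ; => "AS"
(eq 'horsepower 'HorsePower)                ; => T

;; Strings keep their case exactly, which is what a case-sensitive
;; consumer such as Vega needs.
(string= "OriginInitial" "origininitial")   ; => NIL
(string-downcase (symbol-name :as))         ; => "as"
```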
Mean & SD overlay
This Vega-Lite scatterplot with mean and standard deviation overlay demonstrates the use of layers in a plot.
Lisp-Stat equivalent
Linear regression
Loess regression
Residuals
A dot plot showing each film in the database, and the difference from the average movie rating. The display is sorted by year to visualize everything in sequential order. The graph is for all films before 2019. Note the use of the filter-rows function.
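The calculation behind this plot can be sketched in plain Common Lisp. This is an analogue for illustration only: it does not use Lisp-Stat's filter-rows, the records are made up rather than drawn from the IMDB data, and taking the mean over the filtered films only is an assumption.

```lisp
;; Hypothetical records of the form (title year rating).
(defparameter *films*
  '(("A" 2015 6.0) ("B" 2017 8.0) ("C" 2021 9.0)))

(defun rating-residuals (films cutoff-year)
  "Keep films released before CUTOFF-YEAR and pair each title with its
rating minus the mean rating of the kept films. Assumes at least one
film is kept."
  (let* ((kept (remove-if-not (lambda (film) (< (second film) cutoff-year))
                              films))
         (mean (/ (reduce #'+ kept :key #'third) (length kept))))
    (mapcar (lambda (film) (cons (first film) (- (third film) mean)))
            kept)))

(rating-residuals *films* 2019)   ; => (("A" . -1.0) ("B" . 1.0))
```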
Query
The cars scatterplot allows you to see miles per gallon vs. horsepower. By adding sliders, you can select points by the number of cylinders and year as well, effectively examining 4 dimensions of data. Drag the sliders to highlight different points.
External links
You can add external links to plots.
Strip plot
The Vega-Lite strip plot example shows the relationship between horsepower and the number of cylinders using tick marks.
1D strip plot
Bubble plot
This Vega-Lite example is a visualization of global deaths from natural disasters. A copy of the chart from Our World in Data.
Note how we modified the example by using a lower-case entity in the filter to match our default lower-case variable names. Also note how we are explicit with parsing the year field as a temporal column. This is because, when creating a chart with inline data, Vega-Lite will parse the field as an integer instead of a date.
Line plots
Simple
Point markers
By setting the point property of the line mark definition to an object defining a property of the overlaying point marks, we can overlay point markers on top of the line.
Multi-series
This example uses the custom symbol encoding for variables to generate the proper types and labels for x, y and color channels.
Step
Stroke-dash
Confidence interval
Line chart with a confidence interval band.
Area charts
Simple
Stacked
Stacked area plots
Horizon graph
A horizon graph is a technique for visualising time series data in a manner that makes comparisons easier. It is based on work done at the UW Interactive Data Lab. See Sizing the Horizon: The Effects of Chart Size and Layering on the Graphical Perception of Time Series Visualizations for more details on Horizon Graphs.
With overlay
Area chart with overlaying lines and point markers.
Note the use of the variable symbols, e.g. stocks:price, to fill in the variable’s information instead of :type :quantitative :title ...
Stream graph
Tabular plots
Table heatmap
Heatmap with labels
Layering text over a table heatmap
Histogram heatmap
Circular plots
Pie chart
Donut chart
Radial plot
This radial plot uses both angular and radial extent to convey multiple dimensions of data. However, this approach is not perceptually effective, as viewers will most likely be drawn to the total area of the shape, conflating the two dimensions. This example also demonstrates a way to add labels to circular plots.
Transformations
Normally data transformations should be done in Lisp-Stat with a data frame. These examples illustrate how to accomplish transformations using Vega-Lite. This might be useful if, for example, you’re serving up a lot of plots and want to move the processing to the user’s browser.
Difference from avg
Frequency distribution
Cumulative frequency distribution of films in the IMDB database.
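As a rough illustration of what "cumulative" means here, the following plain Common Lisp sketch turns per-bin counts into running totals. The function name `cumulative-counts` and the counts are hypothetical, and this is not how the Vega-Lite transform is written.

```lisp
(defun cumulative-counts (counts)
  "Return the running totals of COUNTS, a list of per-bin counts."
  (loop for n in counts
        sum n into running
        collect running))

(cumulative-counts '(2 5 3 1))   ; => (2 7 10 11)
```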
Layered & cumulative histogram
Layering averages
Layering averages over raw values.
Error bars
Confidence interval
Error bars showing confidence intervals.
Standard deviation
Error bars showing standard deviation.
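For reference, the extent of such an error bar is the mean plus and minus one standard deviation. The plain Common Lisp sketch below uses the sample (n - 1) form, which is an assumption about the variant wanted; the function name `mean-and-sd` and the data are hypothetical.

```lisp
(defun mean-and-sd (xs)
  "Return the mean and the sample standard deviation of the list XS.
Assumes XS has at least two elements."
  (let* ((n (length xs))
         (mean (/ (reduce #'+ xs) n))
         (ss (reduce #'+ xs :key (lambda (x) (expt (- x mean) 2)))))
    (values mean (sqrt (/ ss (1- n))))))

;; Lower end, centre and upper end of the error bar.
;; For this data the mean is 4 and the standard deviation is 2
;; (returned as a float by most implementations).
(multiple-value-bind (mean sd) (mean-and-sd '(2 4 6))
  (list (- mean sd) mean (+ mean sd)))
```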
Box plots
Min/max whiskers
A vertical box plot showing median, min, and max body mass of penguins.
Tukey
A vertical box plot showing median and lower and upper quartiles of the distribution of body mass of penguins.
Summaries
Box plot with pre-computed summaries. Use this pattern to plot summaries done in a data-frame.
Layered
Rolling average
Plot showing a 30 day rolling average with raw values in the background.
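The idea of a rolling average can be sketched in plain Common Lisp. The function name `rolling-mean` and its `before` and `after` parameters are hypothetical; they describe how many neighbouring values on each side enter the average, and the window is simply truncated at the ends of the series. The exact window used by the plot is not shown here.

```lisp
(defun rolling-mean (xs before after)
  "For each position in the sequence XS, average the elements from BEFORE
positions earlier to AFTER positions later, truncating at the ends."
  (let ((n (length xs)))
    (loop for i from 0 below n
          collect (let ((start (max 0 (- i before)))
                        (end (min n (+ i after 1))))
                    (/ (reduce #'+ xs :start start :end end)
                       (- end start))))))

(rolling-mean #(1 2 3 4 5) 1 1)   ; => (3/2 2 3 4 9/2)
```

A wider window smooths more but reacts more slowly to changes in the raw values.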
Histogram w/mean
Interactive
This section demonstrates interactive plots.
Scatter plot matrix
This Vega-Lite interactive scatter plot matrix includes interactive elements and demonstrates creating a SPLOM (scatter plot matrix).
This example is one of those mentioned in the plotting tutorial that uses a non-standard location for the data property.
Weather exploration
This graph shows an interactive view of Seattle’s weather, including maximum temperature, amount of precipitation, and type of weather. By clicking and dragging on the scatter plot, you can see the proportion of days in that range that have sun, rain, fog, snow, etc.
Interactive scatterplot
Crossfilter
Cross-filtering makes it easier and more intuitive for viewers of a plot to interact with the data and understand how one metric affects another. With cross-filtering, you can click a data point in one dashboard view to have all dashboard views automatically filter on that value.
Click and drag across one of the charts to see the other variables filtered.
2 - Statistics
These notebooks describe how to undertake statistical analyses introduced as examples in the Ninth Edition of Introduction to the Practice of Statistics (2017) by Moore, McCabe and Craig. The notebooks are organised in the same manner as the chapters of the book. The data comes from the site IPS9 in R by Nicholas Horton.
To run the notebooks you will have to install a third-party library, common-lisp-jupyter. See the cl-jupyter installation page for how to perform the installation.
After installing cl-jupyter, clone the IPS repository into your ~/common-lisp/ directory.
Note
Be careful when upgrading common-lisp-jupyter. Breaking changes are often introduced without warning. If you experience problems, use cl-jupyter revision b1021ab by using the git checkout command.
Looking at data
- Chapter 1 – Distributions : Exploratory data analysis using plots and numbers
- Chapter 2 – Data Relationships : Examining relationships between variables