One of the best ways to learn Lisp-Stat is to see examples of actual work. This section contains example notebooks illustrating statistical analysis. These notebooks describe how to undertake statistical analyses introduced as examples in the Ninth Edition of Introduction to the Practice of Statistics (2017) by Moore, McCabe and Craig. The notebooks are organised in the same manner as the chapters of the book. The data comes from the site IPS9 in R by Nicholas Horton.
To run the notebooks yourself, you can use a ready-made online notebook.
The plots here show equivalents to the Vega-Lite example gallery. Before you begin working with these examples, be certain to read the plotting tutorial, where you will learn the basics of working with plot specifications and data.
Preliminaries
Load Vega-Lite
Load Vega-Lite and network libraries:
(asdf:load-system :plot/vega)
and change to the Lisp-Stat user package:
(in-package :ls-user)
Load example data
The examples in this section use the vega-lite data sets. Load them all now:
(vega:load-vega-examples)
Bar charts
Bar charts are used to display information about categorical variables.
Simple bar chart
In this simple bar chart example we’ll demonstrate using literal
embedded data in the form of a plist. Later you’ll see how to use a data-frame directly.
This example uses Seattle weather from the Vega website. Load it into
a data frame like so:
(defdf seattle-weather (read-csv vega:seattle-weather))
;=> #<DATA-FRAME (1461 observations of 6 variables)>
We’ll use a data-frame as the data source via the Common Lisp
backquote mechanism. The spec list begins with a backquote (`) and
then the data frame is inserted as a literal value with a comma (,).
We’ll use this pattern frequently.
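The backquote and comma pattern can be tried on its own in plain Common Lisp. The sketch below is only an illustration: `*rows*` is a hypothetical stand-in for a data frame, and no Lisp-Stat functions are involved.

```lisp
;; Hypothetical stand-in for a data frame; any Lisp value can be inserted.
(defparameter *rows* #((:a 1) (:a 2)))

;; Backquote builds the list literally; the comma evaluates *rows*
;; and inserts its value into the spec.
(defparameter *spec* `(:mark :bar :data (:values ,*rows*)))

;; With a plain quote nothing is evaluated, so the symbol itself ends up
;; in the list rather than the data.
(defparameter *quoted-spec* '(:mark :bar :data (:values *rows*)))

(getf (getf *spec* :data) :values)        ; the vector held by *rows*
(getf (getf *quoted-spec* :data) :values) ; the symbol *ROWS*
```

This is why the plot specifications here start with a backquote rather than a quote: only the backquoted form lets the data frame be inserted as a value.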
(plot:plot
 (vega:defplot stacked-bar-chart
   `(:mark :bar
     :data (:values ,seattle-weather)
     :encoding (:x (:time-unit :month :field :date :type :ordinal :title "Month of the year")
                :y (:aggregate :count :type :quantitative)
                :color (:field :weather :type :nominal :title "Weather type"
                        :scale (:domain #("sun" "fog" "drizzle" "rain" "snow")
                                :range #("#e7ba52" "#c7c7c7" "#aec7e8" "#1f77b4" "#9467bd")))))))
Population pyramid
Vega calls this a diverging stacked bar chart. It is a population
pyramid for the US in 2000, created using the stack feature of
vega-lite. You could also create one using concat.
First, load the population data if you haven’t done so:
(defdf population (vega:read-vega vega:population))
;=> #<DATA-FRAME (570 observations of 4 variables)>
Note the use of read-vega in this case. This is because the data in
the Vega example is in an application-specific JSON format (Vega, of
course).
Use a relative frequency histogram to compare data sets with different
numbers of observations.
The data is binned by the first transform. The second and third
transforms calculate the number of values per bin and the total number
of values, and the last transform uses these to calculate the relative
frequency.
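The same four steps can be sketched in plain Common Lisp, which may help when deciding whether to do the work in Lisp or in Vega-Lite. The function name `relative-frequencies` and its `bin-width` parameter are hypothetical and are not part of Lisp-Stat or Vega-Lite.

```lisp
(defun relative-frequencies (values bin-width)
  "Return an alist of (bin-start . relative-frequency), sorted by bin-start."
  (let ((counts (make-hash-table))
        (total (length values)))
    ;; Steps 1 and 2: assign each value to a bin and count the values per bin.
    (dolist (v values)
      (incf (gethash (* bin-width (floor v bin-width)) counts 0)))
    ;; Steps 3 and 4: divide each bin count by the total number of values.
    (sort (loop for bin being the hash-keys of counts using (hash-value n)
                collect (cons bin (/ n total)))
          #'< :key #'car)))

(relative-frequencies '(1 2 3 11 12 25) 10)
;; => ((0 . 1/2) (10 . 1/3) (20 . 1/6))
```

Because each bin is divided by its own data set's total, two data sets of different sizes end up on the same 0 to 1 scale.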
(plot:plot
 (vega:defplot stacked-density
   `(:title "Distribution of Body Mass of Penguins"
     :width 400 :height 80
     :data (:values ,penguins)
     :mark :bar
     :transform #((:density |BODY-MASS-(G)|
                   :groupby #(:species)
                   :extent #(2500 6500)))
     :encoding (:x (:field :value :type :quantitative :title "Body Mass (g)")
                :y (:field :density :type :quantitative :stack :zero)
                :color (:field :species :type :nominal)))))
Note the use of the multiple escape characters (|) surrounding the
field BODY-MASS-(G). This is required because the JSON data set has
parentheses in the variable names, and these are reserved characters
in Common Lisp. The JSON importer wrapped these in the escape
character.
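The effect of the multiple escape characters can be checked at the REPL. This sketch uses only standard Common Lisp and assumes the default readtable case; the symbol names are just illustrations.

```lisp
;; Between vertical bars, parentheses are ordinary characters of the name.
(symbol-name '|BODY-MASS-(G)|)   ; => "BODY-MASS-(G)"

;; The bars also preserve case exactly as written.
(symbol-name '|body-mass|)       ; => "body-mass"

;; Without the bars, the default reader upcases the name.
(symbol-name 'body-mass)         ; => "BODY-MASS"
```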
(plot:plot (vega:defplot hp-mpg
  `(:title "Horsepower vs. MPG"
    :data (:values ,vgcars)
    :mark :point
    :encoding (:x (:field :horsepower :type "quantitative")
               :y (:field :miles-per-gallon :type "quantitative")))))
Colored
In this example we’ll show how to add additional information to
the cars scatter plot to show each car’s origin. The Vega-Lite
example shows that we have to add two new directives to the encoding
of the plot:
Notice here we use a string for the field value and not a symbol.
This is because Vega is case-sensitive, whereas Lisp, by default, is
not. We could have also used a lower-case :as value, but did not, to
highlight this requirement for certain Vega specifications.
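A small sketch in plain Common Lisp, assuming the default readtable, shows why a mixed-case name has to be written as a string. The name `averageRating` is used here only as an illustration.

```lisp
;; The default reader upcases symbol names, so the mixed case is lost.
(symbol-name 'averageRating)                                ; => "AVERAGERATING"

;; A string keeps its case, so it no longer matches the symbol's name.
(string= "averageRating" (symbol-name 'averageRating))      ; => NIL

;; Only a case-insensitive comparison treats the two as equal.
(string-equal "averageRating" (symbol-name 'averageRating)) ; => true
```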
A dot plot showing each film in the database, and the difference from
the average movie rating. The display is sorted by year to visualize
everything in sequential order. The graph is for all films before
2019. Note the use of the filter-rows function.
The cars scatterplot allows you to see miles per gallon
vs. horsepower. By adding sliders, you can select points by the
number of cylinders and year as well, effectively examining 4
dimensions of data. Drag the sliders to highlight different points.
(plot:plot
 (vega:defplot natural-disaster-deaths
   `(:title "Deaths from global natural disasters"
     :width 600 :height 400
     :data (:values ,(filter-rows disasters '(not (string= entity "All natural disasters"))))
     :mark (:type :circle :opacity 0.8 :stroke :black :stroke-width 1)
     :encoding (:x (:field :year :type :temporal :axis (:grid :false))
                :y (:field :entity :type :nominal :axis (:title ""))
                :size (:field :deaths :type :quantitative :title "Annual Global Deaths"
                       :legend (:clip-height 30)
                       :scale (:range-max 5000))
                :color (:field :entity :type :nominal :legend nil)))))
Note how we modified the example by using a lower case entity in the
filter to match our default lower case variable names. Also note how
we are explicit with parsing the year field as a temporal column.
This is because, when creating a chart with inline data, Vega-Lite
will parse the field as an integer instead of a date.
Line plots
Simple
(plot:plot (vega:defplot simple-line-plot
  `(:title "Google's stock price from 2004 to early 2010"
    :data (:values ,(filter-rows stocks '(string= symbol "GOOG")))
    :mark :line
    :encoding (:x (:field :date :type :temporal)
               :y (:field :price :type :quantitative)))))
Point markers
By setting the point property of the line mark definition to an object
defining a property of the overlaying point marks, we can overlay
point markers on top of the line.
(plot:plot
 (vega:defplot point-mark-line-plot
   `(:title "Stock prices of 5 Tech Companies over Time"
     :data (:values ,stocks)
     :mark (:type :line :point t)
     :encoding (:x (:field :date :time-unit :year)
                :y (:field :price :type :quantitative :aggregate :mean)
                :color (:field :symbol :type :nominal)))))
Multi-series
This example uses the custom symbol encoding for variables to
generate the proper types and labels for x, y and color channels.
(plot:plot
 (vega:defplot multi-series-line-chart
   `(:title "Stock prices of 5 Tech Companies over Time"
     :data (:values ,stocks)
     :mark :line
     :encoding (:x (:field stocks:date)
                :y (:field stocks:price)
                :color (:field stocks:symbol)))))
Step
(plot:plot
 (vega:defplot step-chart
   `(:title "Google's stock price from 2004 to early 2010"
     :data (:values ,(filter-rows stocks '(string= symbol "GOOG")))
     :mark (:type :line :interpolate "step-after")
     :encoding (:x (:field stocks:date)
                :y (:field stocks:price)))))
Stroke-dash
(plot:plot
 (vega:defplot stroke-dash
   `(:title "Stock prices of 5 Tech Companies over Time"
     :data (:values ,stocks)
     :mark :line
     :encoding (:x (:field stocks:date)
                :y (:field stocks:price)
                :stroke-dash (:field stocks:symbol)))))
Confidence interval
Line chart with a confidence interval band.
(plot:plot
 (vega:defplot line-chart-ci
   `(:data (:values ,vgcars)
     :encoding (:x (:field :year :time-unit :year))
     :layer #((:mark (:type :errorband :extent :ci)
               :encoding (:y (:field :miles-per-gallon :type :quantitative
                              :title "Mean of Miles per Gallon (95% CIs)")))
              (:mark :line
               :encoding (:y (:field :miles-per-gallon :aggregate :mean)))))))
This radial plot uses both angular and radial extent to convey
multiple dimensions of data. However, this approach is not
perceptually effective, as viewers will most likely be drawn to the
total area of the shape, conflating the two dimensions. This example
also demonstrates a way to add labels to circular plots.
Normally data transformations should be done in Lisp-Stat with a data
frame. These examples illustrate how to accomplish transformations
using Vega-Lite. This might be useful if, for example, you’re serving
up a lot of plots and want to move the processing to the user’s
browser.
Difference from avg
(plot:plot
 (vega:defplot difference-from-average
   `(:data (:values ,(filter-rows imdb '(not (eql imdb-rating :na))))
     :transform #((:joinaggregate #((:op :mean ; we could do this above using alexandria:thread-first
                                     :field :imdb-rating
                                     :as :average-rating)))
                  (:filter "(datum['imdbRating'] - datum.averageRating) > 2.5"))
     :layer #((:mark :bar
               :encoding (:x (:field :imdb-rating :type :quantitative :title "IMDB Rating")
                          :y (:field :title :type :ordinal :title "Title")))
              (:mark (:type :rule :color "red")
               :encoding (:x (:aggregate :average :field :average-rating :type :quantitative)))))))
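The two transforms in this plot, a mean attached to every row followed by a filter on the difference from that mean, can be sketched in plain Common Lisp. The function `above-average-by` and its arguments are hypothetical and stand in for work that would normally be done on a data frame.

```lisp
(defun above-average-by (ratings threshold)
  "Return the ratings that exceed the mean of RATINGS by more than THRESHOLD."
  (if (null ratings)
      '()
      ;; Equivalent of the joinaggregate step: one mean shared by all rows.
      (let ((mean (/ (reduce #'+ ratings) (length ratings))))
        ;; Equivalent of the filter step: keep rows well above the mean.
        (remove-if-not (lambda (r) (> (- r mean) threshold)) ratings))))

(above-average-by '(2 4 6 8 10) 5/2)
;; => (10)
```

Here the mean is 6, so only 10 is more than 2.5 above it.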
Frequency distribution
Cumulative frequency distribution of films in the IMDB database.
Plot showing a 30-day rolling average with raw values in the background.
(plot:plot
 (vega:defplot moving-average
   `(:width 400
     :height 300
     :data (:values ,seattle-weather)
     :transform #((:window #((:field :temp-max :op :mean :as :rolling-mean))
                   :frame #(-15 15)))
     :encoding (:x (:field :date :type :temporal :title "Date")
                :y (:type :quantitative :axis (:title "Max Temperature and Rolling Mean")))
     :layer #((:mark (:type :point :opacity 0.3)
               :encoding (:y (:field :temp-max :title "Max Temperature")))
              (:mark (:type :line :color "red" :size 3)
               :encoding (:y (:field :rolling-mean :title "Rolling Mean of Max Temperature")))))))
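To see what the window transform computes, here is a sketch of a centered rolling mean in plain Common Lisp. The function `rolling-mean` is hypothetical, and it assumes the window is simply clipped at the start and end of the series; the plot above corresponds to 15 values before and 15 after each point.

```lisp
(defun rolling-mean (values before after)
  "For each element, average the window from BEFORE items back to AFTER items
ahead, clipping the window at both ends of VALUES."
  (let* ((v (coerce values 'vector))
         (n (length v)))
    (loop for i below n
          for start = (max 0 (- i before))
          for end = (min n (+ i after 1))   ; END is exclusive
          collect (/ (reduce #'+ v :start start :end end)
                     (- end start)))))

(rolling-mean '(1 2 3 4 5) 1 1)
;; => (3/2 2 3 4 9/2)
```

With a data frame column in place of the list, the same calculation could be done in Lisp before plotting rather than in the browser.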
This example is one of those mentioned in the plotting tutorial that
uses a non-standard location for the data property.
Weather exploration
This graph shows an interactive view of Seattle’s weather, including
maximum temperature, amount of precipitation, and type of weather. By
clicking and dragging on the scatter plot, you can see the proportion
of days in that range that have sun, rain, fog, snow, etc.
Cross-filtering makes it easier and more intuitive for viewers of a
plot to interact with the data and understand how one metric affects
another. With cross-filtering, you can click a data point in one
dashboard view to have all dashboard views automatically filter on
that value.
Click and drag across one of the charts to see the other variables
filtered.