Examples

Using Lisp-Stat in the real world

1: Plotting
2: Statistics

One of the best ways to learn Lisp-Stat is to see examples of actual work. This section contains examples of statistical analysis, derived from the book Introduction to the Practice of Statistics (2017) by Moore, McCabe and Craig, and of plotting, derived from the Vega-Lite example gallery.

1 - Plotting

Example plots

The plots here show equivalents to the Vega-Lite example gallery. Before you begin working with these examples, be certain to read the plotting tutorial, where you will learn the basics of working with plot specifications and data.

Preliminaries

Load Vega-Lite

Load Vega-Lite and network libraries:

(asdf:load-system :plot/vega)

and change to the Lisp-Stat user package:

(in-package :ls-user)

Load example data

The examples in this section use the Vega-Lite data sets. Load them all now:

(vega:load-vega-examples)

Bar charts

Bar charts are used to display information about categorical variables.

Simple bar chart

In this simple bar chart example we’ll demonstrate using literal embedded data in the form of a plist. Later you’ll see how to use a data-frame directly.

(plot:plot
 (vega:defplot simple-bar-chart
   `(:mark :bar
     :data (:values ,(plist-df '(:a #(A B C D E F G H I)
	                             :b #(28 55 43 91 81 53 19 87 52))))
     :encoding (:x (:field :a :type :nominal :axis ("labelAngle" 0))
                :y (:field :b :type :quantitative)))))

Grouped bar chart

(plot:plot
 (vega:defplot grouped-bar-chart
   `(:mark :bar
     :data (:values ,(plist-df '(:category #(A A A B B B C C C)
	                             :group    #(x y z x y z x y z)
	                             :value    #(0.1 0.6 0.9 0.7 0.2 1.1 0.6 0.1 0.2))))
     :encoding (:x (:field :category)
                :y (:field :value :type :quantitative)
                :x-offset (:field :group)
                :color    (:field :group)))))

Stacked bar chart

This example uses Seattle weather from the Vega website. Load it into a data frame like so:

(defdf seattle-weather (read-csv vega:seattle-weather))
;=> #<DATA-FRAME (1461 observations of 6 variables)>

We’ll use a data-frame as the data source via the Common Lisp backquote mechanism. The spec list begins with a backquote (`) and then the data frame is inserted as a literal value with a comma (,). We’ll use this pattern frequently.

(plot:plot
 (vega:defplot stacked-bar-chart
   `(:mark :bar
     :data (:values ,seattle-weather)
     :encoding (:x (:time-unit :month
		            :field     :date
		            :type      :ordinal
		            :title     "Month of the year")
                :y (:aggregate :count
		            :type      :quantitative)
                :color (:field :weather
                        :type  :nominal
                        :title "Weather type"
                        :scale (:domain #("sun" "fog" "drizzle" "rain" "snow")
                                :range  #("#e7ba52" "#c7c7c7" "#aec7e8" "#1f77b4" "#9467bd")))))))
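The backquote and comma used in the spec above are ordinary Common Lisp and can be tried without any plotting library. The following is a minimal sketch using only the standard language; the names *rows*, *spec* and *quoted* are made up for illustration, and a vector stands in for the data frame.

```lisp
;; Stand-in for a data frame; any Lisp object is inserted the same way.
(defparameter *rows* #(1 2 3))

;; Backquote builds the list as written; the comma evaluates *rows*
;; and inserts its value in place of the symbol.
(defparameter *spec* `(:mark :bar :data (:values ,*rows*)))

;; With a plain quote nothing is evaluated, so the symbol itself is kept.
(defparameter *quoted* '(:mark :bar :data (:values *rows*)))

(getf (getf *spec* :data) :values)   ; => #(1 2 3)
(getf (getf *quoted* :data) :values) ; => *ROWS*
```

The second result is why a plain quoted spec cannot carry a data frame: the plot would receive a symbol rather than the data.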

Population pyramid

Vega calls this a diverging stacked bar chart. It is a population pyramid for the US in 2000, created using the stack feature of Vega-Lite. You could also create one using concat.

First, load the population data if you haven’t done so:

(defdf population (vega:read-vega vega:population))
;=> #<DATA-FRAME (570 observations of 4 variables)>

Note the use of read-vega in this case. It is needed because the data in the Vega example is in an application-specific JSON format (Vega's own, of course).

(plot:plot
 (vega:defplot pyramid-bar-chart
   `(:mark :bar
     :data (:values ,population)
     :width 300
     :height 200
     :transform #((:filter "datum.year == 2000")
		  (:calculate "datum.sex == 2 ? 'Female' : 'Male'" :as :gender)
		  (:calculate "datum.sex == 2 ? -datum.people : datum.people" :as :signed-people))
     :encoding (:x (:aggregate :sum
		            :field :signed-people
		            :title "population")
                :y (:field :age
		            :axis nil
		            :sort :descending)
                :color (:field :gender
                        :scale (:range #("#675193" "#ca8861"))))
     :config (:view (:stroke nil)
	          :axis (:grid :false)))))

Histograms & density

Basic

For this simple histogram example we’ll use the IMDB film rating data set.

(plot:plot
  (vega:defplot imdb-plot
    `(:mark :bar
      :data (:values ,imdb)
      :encoding (:x (:bin (:maxbins 8) :field :imdb-rating)
                 :y (:aggregate :count)))))

Relative frequency

Use a relative frequency histogram to compare data sets with different numbers of observations.

The data is binned in the first transform. The second transform counts the values in each bin and the third computes the total number of values. The last transform divides the two to obtain the relative frequency.

(plot:plot
 (vega:defplot relative-frequency-histogram
   `(:title "Relative Frequency"
     :data (:values ,vgcars)
     :transform #((:bin t
	               :field :horsepower
		           :as #(:bin-horsepower :bin-horsepower-end))
                  (:aggregate #((:op :count
				                 :as "Count"))
                   :groupby   #(:bin-horsepower :bin-horsepower-end))
		          (:joinaggregate #((:op :sum
				                     :field "Count"
				                     :as "TotalCount")))
		          (:calculate "datum.Count/datum.TotalCount"
		                      :as :percent-of-total))
     :mark (:type :bar :tooltip t)
     :encoding (:x (:field :bin-horsepower
	                :title "Horsepower"
		            :bin (:binned t))
		        :x2 (:field :bin-horsepower-end)
	            :y (:field :percent-of-total
		            :type "quantitative"
		            :title "Relative Frequency"
		            :axis (:format ".1~%"))))))
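The arithmetic behind the aggregate, joinaggregate and calculate steps can be sketched in plain Common Lisp. This only illustrates the calculation and says nothing about how Vega-Lite implements it; relative-frequencies is a hypothetical helper and the bin counts are made up.

```lisp
;; Hypothetical helper: divide each bin count by the total number of observations.
(defun relative-frequencies (counts)
  (let ((total (reduce #'+ counts)))   ; plays the role of "TotalCount"
    (mapcar (lambda (n) (/ n total)) counts)))

;; Made-up counts for three bins; integer division yields exact rationals.
(relative-frequencies '(2 3 5)) ; => (1/5 3/10 1/2)
```

Because every count is divided by the same total, the results sum to 1 regardless of how many observations the data set has, which is what makes histograms of different-sized data sets comparable.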

2D histogram scatterplot

If you haven’t already loaded the imdb data set, do so now:

(defparameter imdb
  (vega:read-vega vega:movies))

(plot:plot
  (vega:defplot histogram-scatterplot
    `(:mark :circle
      :data (:values ,imdb)
      :encoding (:x (:bin (:maxbins 10) :field :imdb-rating)
                 :y (:bin (:maxbins 10) :field :rotten-tomatoes-rating)
	             :size (:aggregate :count)))))

Stacked density

(plot:plot
 (vega:defplot stacked-density
   `(:title "Distribution of Body Mass of Penguins"
     :width 400
     :height 80
     :data (:values ,penguins)
     :mark :bar
     :transform #((:density |BODY-MASS-(G)|
		           :groupby #(:species)
		           :extent #(2500 6500)))
     :encoding (:x (:field :value
		            :type :quantitative
		            :title "Body Mass (g)")
		        :y (:field :density
		            :type :quantitative
		            :stack :zero)
		        :color (:field :species
			            :type :nominal)))))

Note the use of the multiple escape characters (|) surrounding the field BODY-MASS-(G). They are required because the JSON data set has parentheses in its variable names, and parentheses are reserved characters in Common Lisp. The JSON importer wrapped these names in the escape character.
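The effect of the vertical bars can be checked at the REPL with standard Common Lisp alone. This is a small sketch that does not depend on Lisp-Stat or on the penguins data.

```lisp
;; The vertical bars make the parentheses part of the symbol's name.
(symbol-name '|BODY-MASS-(G)|)                      ; => "BODY-MASS-(G)"

;; The bars are reader syntax only; they are not stored in the name.
(length (symbol-name '|BODY-MASS-(G)|))             ; => 13

;; Without the bars, the reader ends the symbol at the opening parenthesis.
(symbol-name (read-from-string "BODY-MASS-(G)"))    ; => "BODY-MASS-"
```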

Scatter plots

Basic

A basic Vega-Lite scatterplot showing horsepower and miles per gallon for various cars.

(plot:plot
  (vega:defplot hp-mpg
  `(:title "Horsepower vs. MPG"
    :data (:values ,vgcars)
    :mark :point
	:encoding (:x (:field :horsepower :type "quantitative")
	           :y (:field :miles-per-gallon :type "quantitative")))))

Colored

In this example we’ll add information to the cars scatter plot to show each car’s origin. The Vega-Lite example shows that we have to add two new directives to the encoding of the plot:

(plot:plot
  (vega:defplot hp-mpg-plot
  `(:title "Vega Cars"
    :data (:values ,vgcars)
    :mark :point
	:encoding (:x     (:field :horsepower :type "quantitative")
	           :y     (:field :miles-per-gallon :type "quantitative")
			   :color (:field :origin :type "nominal")
			   :shape (:field :origin :type "nominal")))))

With this change we can see that the higher-horsepower, lower-efficiency cars are from the USA, and the higher-efficiency cars are from Japan and Europe.

Text marks

This plot shows the same information, but each point is drawn as a text mark giving the first letter of the car’s origin. This Vega-Lite example uses a data transformation.

(plot:plot
  (vega:defplot colored-text-hp-mpg-plot
  `(:title "Vega Cars"
    :data (:values ,vgcars)
	:transform #((:calculate "datum.origin[0]" :as "OriginInitial"))
    :mark :text
	:encoding (:x     (:field :horsepower :type "quantitative")
	           :y     (:field :miles-per-gallon :type "quantitative")
	           :color (:field :origin :type "nominal")
			   :text  (:field "OriginInitial" :type "nominal")))))

Notice that here we use a string for the field value and not a symbol. This is because Vega is case sensitive, whereas Lisp symbols, as read by default, are not. We could also have used a lower-case :as value, but chose not to in order to highlight this requirement for certain Vega specifications.
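The difference between symbols and strings can be seen with standard Common Lisp alone. The sketch below assumes the default readtable case and shows only what the reader does; it makes no claim about how the plotting library converts symbols to Vega field names.

```lisp
;; Symbol names are folded to upper case as they are read,
;; so the capitalisation typed in the source is lost.
(symbol-name :OriginInitial)               ; => "ORIGININITIAL"
(eq :OriginInitial :origininitial)         ; => T

;; Strings are read verbatim, so they can carry a mixed-case name.
(string= "OriginInitial" "origininitial")  ; => NIL
```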

Mean & SD overlay

This Vega-Lite scatterplot with mean and standard deviation overlay demonstrates the use of layers in a plot.

Lisp-Stat equivalent

(plot:plot
  (vega:defplot mean-hp-mpg-plot
  `(:title "Vega Cars"
    :data (:values ,vgcars)
    :layer #((:mark :point
	          :encoding (:x (:field :horsepower :type "quantitative")
			             :y (:field :miles-per-gallon
						            :type "quantitative")))
	         (:mark (:type :errorband :extent :stdev :opacity 0.2)
	          :encoding (:y (:field :miles-per-gallon
			                 :type "quantitative"
							 :title "Miles per Gallon")))
	         (:mark :rule
	          :encoding (:y (:field :miles-per-gallon
			                 :type "quantitative"
							 :aggregate :mean)))))))

Linear regression

(plot:plot
 (vega:defplot linear-regression
   `(:data (:values ,imdb)
     :layer #((:mark (:type :point :filled t)
	           :encoding (:x (:field :rotten-tomatoes-rating
			                  :type :quantitative
			                  :title "Rotten Tomatoes Rating")
			              :y (:field :imdb-rating
			                  :type :quantitative
			                  :title "IMDB Rating")))

	          (:mark (:type :line :color "firebrick")
	           :transform #((:regression :imdb-rating
			                 :on :rotten-tomatoes-rating))
		       :encoding (:x (:field :rotten-tomatoes-rating
			                  :type :quantitative
			                  :title "Rotten Tomatoes Rating")
			              :y (:field :imdb-rating
			                  :type :quantitative
			                  :title "IMDB Rating")))

	          (:transform #((:regression :imdb-rating
			                 :on :rotten-tomatoes-rating
			                 :params t)
		                    (:calculate "'R²: '+format(datum.rSquared, '.2f')"
			                 :as :r2))
	           :mark (:type :text
		              :color "firebrick"
		              :x :width
		              :align :right
		              :y -5)
	           :encoding (:text (:type :nominal :field :r2)))))))

Loess regression

(plot:plot
 (vega:defplot loess-regression
   `(:data (:values ,imdb)
     :layer #((:mark (:type :point :filled t)
	           :encoding (:x (:field :rotten-tomatoes-rating
			                  :type :quantitative
			                  :title "Rotten Tomatoes Rating")
			              :y (:field :imdb-rating
			                  :type :quantitative
			                  :title "IMDB Rating")))

	          (:mark (:type :line
		              :color "firebrick")
		      :transform #((:loess :imdb-rating
			                :on :rotten-tomatoes-rating))
		      :encoding (:x (:field :rotten-tomatoes-rating
			                 :type :quantitative
			                 :title "Rotten Tomatoes Rating")
			             :y (:field :imdb-rating
			                 :type :quantitative
			                 :title "IMDB Rating")))))))

Residuals

A dot plot showing each film in the database, and the difference from the average movie rating. The display is sorted by year to visualize everything in sequential order. The graph is for all films before 2019. Note the use of the filter-rows function.

(plot:plot
 (vega:defplot residuals
   `(:data (:values
              ,(filter-rows imdb
                            '(and (not (eql imdb-rating :na))
				                  (local-time:timestamp< release-date
							      (local-time:parse-timestring "2019-01-01")))))
     :transform #((:joinaggregate #((:op    :mean
				                     :field :imdb-rating
				                     :as    :average-rating)))
		           (:calculate "datum['imdbRating'] - datum.averageRating"
		            :as :rating-delta))
     :mark :point
     :encoding (:x (:field :release-date
		            :type :temporal
		            :title "Release Date")
		        :y (:field :rating-delta
		            :type :quantitative
		            :title "Rating Delta")
		        :color (:field :rating-delta
			            :type :quantitative
			            :scale (:domain-mid 0)
			            :title "Rating Delta")))))
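The joinaggregate and calculate steps above amount to subtracting the overall mean from every rating. A minimal sketch of that arithmetic in plain Common Lisp follows; deltas-from-mean is a hypothetical helper and the ratings are made up, not taken from the IMDB data.

```lisp
;; Hypothetical helper: the difference of each rating from the mean of all ratings.
(defun deltas-from-mean (ratings)
  (let ((mean (/ (reduce #'+ ratings) (length ratings))))
    (mapcar (lambda (rating) (- rating mean)) ratings)))

;; Made-up ratings whose mean is 7.0.
(deltas-from-mean '(6.0 8.0 7.0)) ; => (-1.0 1.0 0.0)
```

Negative deltas are films rated below average and positive deltas are films rated above it, which is what the diverging colour scale centred on 0 encodes.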

Query

The cars scatterplot allows you to see miles per gallon vs. horsepower. By adding sliders, you can select points by the number of cylinders and year as well, effectively examining 4 dimensions of data. Drag the sliders to highlight different points.

(plot:plot
 (vega:defplot scatter-queries
   `(:data (:values ,vgcars)
     :transform #((:calculate "year(datum.year)" :as :year))
     :layer #((:params #((:name :cyl-year
			   :value #((:cylinders 4
				         :year 1977))
			   :select (:type :point
				        :fields #(:cylinders :year))
			   :bind (:cylinders (:input :range
				                  :min 3
					              :max 8
					              :step 1)
				      :year (:input :range
				             :min 1969
				             :max 1981
				             :step 1))))
	           :mark :circle
	           :encoding (:x (:field :horsepower
			                  :type :quantitative)
			              :y (:field :miles-per-gallon
			                  :type :quantitative)
			              :color (:condition (:param :cyl-year
					              :field :origin
					              :type :nominal)
				                  :value "grey")))

	      (:transform #((:filter (:param :cyl-year)))
	       :mark :circle
	       :encoding (:x (:field :horsepower
			              :type :quantitative)
			          :y (:field :miles-per-gallon
			              :type :quantitative)
			          :color (:field :origin
				              :type :nominal)
			          :size (:value 100)))))))

External links

You can add external links to plots.

(plot:plot
 (vega:defplot scatter-external-links
   `(:data (:values ,vgcars)
     :mark :point
     :transform #((:calculate "'https://www.google.com/search?q=' + datum.name" :as :url))
     :encoding (:x (:field :horsepower
		            :type :quantitative)
		        :y (:field :miles-per-gallon
		            :type :quantitative)
		        :color (:field :origin
			            :type :nominal)
		        :tooltip (:field :name
			              :type :nominal)
		        :href (:field :url
		               :type :nominal)))))

Strip plot

The Vega-Lite strip plot example shows the relationship between horsepower and the number of cylinders using tick marks.

(plot:plot
  (vega:defplot strip-plot
  `(:title "Vega Cars"
    :data (:values ,vgcars)
	:mark :tick
	:encoding (:x (:field :horsepower :type :quantitative)
	           :y (:field :cylinders  :type :ordinal)))))

1D strip plot

(plot:plot
  (vega:defplot 1d-strip-plot
  `(:title "Seattle Precipitation"
    :data (:values ,seattle-weather)
	:mark :tick
	:encoding (:x (:field :precipitation :type :quantitative)))))

Bubble plot

This Vega-Lite example is a visualization of global deaths from natural disasters. It is a copy of a chart from Our World in Data.

(plot:plot
 (vega:defplot natural-disaster-deaths
   `(:title "Deaths from global natural disasters"
     :width 600
     :height 400
     :data (:values ,(filter-rows disasters '(not (string= entity "All natural disasters"))))
     :mark (:type :circle
	    :opacity 0.8
	    :stroke :black
	    :stroke-width 1)
     :encoding (:x (:field :year
		    :type :temporal
		    :axis (:grid :false))
		:y (:field :entity
		    :type :nominal
		    :axis (:title ""))
		:size (:field :deaths
		       :type :quantitative
		       :title "Annual Global Deaths"
		       :legend (:clip-height 30)
		       :scale (:range-max 5000))
		:color (:field :entity
			:type :nominal
			:legend nil)))))

Note how we modified the example by using a lower-case entity in the filter to match our default lower-case variable names. Also note that we explicitly declare the year field as temporal. This is because, when creating a chart with inline data, Vega-Lite will parse the field as an integer instead of a date.

Line plots

Simple

(plot:plot
 (vega:defplot simple-line-plot
   `(:title "Google's stock price from 2004 to early 2010"
     :data (:values ,(filter-rows stocks '(string= symbol "GOOG")))
     :mark :line
     :encoding (:x (:field :date
		            :type  :temporal)
		        :y (:field :price
		            :type  :quantitative)))))

Point markers

By setting the point property of the line mark definition to an object defining a property of the overlaying point marks, we can overlay point markers on top of the line.

(plot:plot
 (vega:defplot point-mark-line-plot
   `(:title "Stock prices of 5 Tech Companies over Time"
     :data (:values ,stocks)
     :mark (:type :line :point t)
     :encoding (:x (:field :date
		            :time-unit :year)
		        :y (:field :price
		            :type :quantitative
		            :aggregate :mean)
		        :color (:field :symbol
			            :type :nominal)))))

Multi-series

This example uses the custom symbol encoding for variables to generate the proper types and labels for x, y and color channels.

(plot:plot
 (vega:defplot multi-series-line-chart
   `(:title "Stock prices of 5 Tech Companies over Time"
     :data (:values ,stocks)
     :mark :line
     :encoding (:x (:field stocks:date)
                :y (:field stocks:price)
		        :color (:field stocks:symbol)))))

Step

(plot:plot
 (vega:defplot step-chart
   `(:title "Google's stock price from 2004 to early 2010"
     :data (:values ,(filter-rows stocks '(string= symbol "GOOG")))
     :mark (:type :line
	        :interpolate "step-after")
     :encoding (:x (:field stocks:date)
		        :y (:field stocks:price)))))

Stroke-dash

(plot:plot
 (vega:defplot stroke-dash
   `(:title "Stock prices of 5 Tech Companies over Time"
     :data (:values ,stocks)
     :mark :line
     :encoding (:x (:field stocks:date)
		        :y (:field stocks:price)
		        :stroke-dash (:field stocks:symbol)))))

Confidence interval

Line chart with a confidence interval band.

(plot:plot
 (vega:defplot line-chart-ci
   `(:data (:values ,vgcars)
     :encoding (:x (:field :year
		            :time-unit :year))
     :layer #((:mark (:type :errorband
		              :extent :ci)
	           :encoding (:y (:field :miles-per-gallon
			                  :type :quantitative
			                  :title "Mean of Miles per Gallon (95% CIs)")))

	          (:mark :line
	           :encoding (:y (:field :miles-per-gallon
			                  :aggregate :mean)))))))

Area charts

Simple

(plot:plot
 (vega:defplot area-chart
   `(:title "Unemployment across industries"
     :width 300
     :height 200
     :data (:values ,unemployment-ind)
     :mark :area
     :encoding (:x (:field :date
		            :time-unit :yearmonth
		            :axis (:format "%Y"))
		        :y (:field :count
		            :aggregate :sum
		            :title "count")))))

Stacked

Stacked area plots

(plot:plot
 (vega:defplot stacked-area-chart
   `(:title "Unemployment across industries"
     :width 300
     :height 200
     :data (:values ,unemployment-ind)
     :mark :area
     :encoding (:x (:field :date
		            :time-unit :yearmonth
		            :axis (:format "%Y"))
		        :y (:field :count
		            :aggregate :sum
		            :title "count")
		        :color (:field :series
			            :scale (:scheme "category20b"))))))

Horizon graph

A horizon graph is a technique for visualising time series data in a manner that makes comparisons easier. It is based on work done at the UW Interactive Data Lab. See Sizing the Horizon: The Effects of Chart Size and Layering on the Graphical Perception of Time Series Visualizations for more details on Horizon Graphs.

(plot:plot
 (vega:defplot horizon-graph
   `(:title "Horizon graph with 2 layers"
     :width 300
     :height 50
     :data (:values ,(plist-df `(:x ,(aops:linspace 1 20 20)
	                             :y #(28 55 43 91 81 53 19 87 52 48 24 49 87 66 17 27 68 16 49 15))))
     :encoding (:x (:field :x
		            :scale (:zero :false
			        :nice :false))
		        :y (:field :y
		            :type :quantitative
		            :scale (:domain #(0 50))
		            :axis (:title "y")))
     :layer #((:mark (:type :area
		              :clip t
		              :orient :vertical
		              :opacity 0.6))
	          (:transform #((:calculate "datum.y - 50"
			                 :as :ny))
	           :mark (:type :area
		              :clip t
		              :orient :vertical)
	           :encoding (:y (:field "ny"
			                  :type :quantitative
			                  :scale (:domain #(0 50)))
			              :opacity (:value 0.3))))
     :config (:area (:interpolate :monotone)))))

With overlay

Area chart with overlaying lines and point markers.

(plot:plot
 (vega:defplot area-with-overlay
   `(:title "Google's stock price"
     :data (:values ,(filter-rows stocks '(string= symbol "GOOG")))
     :mark (:type :area
	        :line t
	        :point t)
     :encoding (:x (:field stocks:date)
		        :y (:field stocks:price)))))

Note the use of the variable symbols, e.g. stocks:price to fill in the variable’s information instead of :type :quantitative :title ...

Stream graph

(plot:plot
 (vega:defplot stream-graph
   `(:title "Unemployment Stream Graph"
     :width 300
     :height 200
     :data (:values ,unemployment-ind)
     :mark :area
     :encoding (:x (:field :date
		            :time-unit "yearmonth"
		            :axis (:domain :false
			        :format "%Y"
			        :tick-size 0))
		        :y (:field :count
		            :aggregate :sum
		            :axis nil
		            :stack :center)
		        :color (:field :series
			            :scale (:scheme "category20b"))))))

Tabular plots

Table heatmap

(plot:plot
 (vega:defplot table-heatmap
   `(:data (:values ,vgcars)
     :mark :rect
     :encoding (:x (:field vgcars:cylinders)
		        :y (:field vgcars:origin)
		        :color (:field :horsepower
			            :aggregate :mean))
     :config (:axis (:grid t :tick-band :extent)))))

Heatmap with labels

Layering text over a table heatmap

(plot:plot
 (vega:defplot heatmap-labels
   `(:data (:values ,vgcars)
     :transform #((:aggregate #((:op :count :as :num-cars))
		           :groupby #(:origin :cylinders)))
     :encoding (:x (:field :cylinders
		            :type :ordinal)
		        :y (:field :origin
		            :type :ordinal))
     :layer #((:mark :rect
	           :encoding (:color (:field :num-cars
				                  :type :quantitative
				                  :title "Count of Records"
				                  :legend (:direction :horizontal
					              :gradient-length 120))))
              (:mark :text
               :encoding (:text (:field :num-cars
                                 :type :quantitative)
                          :color (:condition (:test "datum['numCars'] < 40"
                                              :value :black)
                                  :value :white))))
     :config (:axis (:grid t
		     :tick-band :extent)))))

Histogram heatmap

(plot:plot
 (vega:defplot heatmap-histogram
   `(:data (:values ,imdb)
     :transform #((:filter (:and #((:field :imdb-rating :valid t)
                                   (:field :rotten-tomatoes-rating :valid t)))))
     :mark :rect
     :width 300
     :height 200
     :encoding (:x (:bin (:maxbins 60)
		            :field :imdb-rating
		            :type :quantitative
			        :title "IMDB Rating")
		        :y (:bin (:maxbins 40)
		            :field :rotten-tomatoes-rating
		            :type :quantitative
			        :title "Rotten Tomatoes Rating")
		        :color (:aggregate :count
			            :type :quantitative))
     :config (:view (:stroke :transparent)))))

Circular plots

Pie chart

(plot:plot
 (vega:defplot pie-chart
   `(:data (:values ,(plist-df `(:category ,(aops:linspace 1 6 6)
	                             :value #(4 6 10 3 7 8))))
     :mark :arc
     :encoding (:theta (:field :value
			            :type :quantitative)
		        :color (:field :category
			            :type :nominal)))))

Donut chart

(plot:plot
 (vega:defplot donut-chart
   `(:data (:values ,(plist-df `(:category ,(aops:linspace 1 6 6)
	                             :value #(4 6 10 3 7 8))))
     :mark (:type :arc :inner-radius 50)
     :encoding (:theta (:field :value
			            :type :quantitative)
		        :color (:field :category
			            :type :nominal)))))

Radial plot

This radial plot uses both angular and radial extent to convey multiple dimensions of data. However, this approach is not perceptually effective, as viewers will most likely be drawn to the total area of the shape, conflating the two dimensions. This example also demonstrates a way to add labels to circular plots.

(plot:plot
 (vega:defplot radial-plot
   `(:data (:values ,(plist-df '(:value #(12 23 47 6 52 19))))
     :layer #((:mark (:type :arc
		              :inner-radius 20
		              :stroke "#fff"))
	          (:mark (:type :text
		              :radius-offset 10)
	           :encoding (:text (:field :value
				          :type :quantitative))))
     :encoding (:theta (:field :value
			            :type :quantitative
			            :stack t)
	            :radius (:field :value
			             :scale (:type :sqrt
				         :zero t
				         :range-min 20))
		        :color (:field :value
			            :type :nominal
			            :legend nil)))))

Transformations

Normally data transformations should be done in Lisp-Stat with a data frame. These examples illustrate how to accomplish transformations using Vega-Lite. This might be useful if, for example, you’re serving up a lot of plots and want to move the processing to the user’s browser.

Difference from avg

(plot:plot
 (vega:defplot difference-from-average
   `(:data (:values ,(filter-rows imdb '(not (eql imdb-rating :na))))
     :transform #((:joinaggregate #((:op :mean ;we could do this above using alexandria:thread-first
				     :field :imdb-rating
				     :as :average-rating)))
		          (:filter "(datum['imdbRating'] - datum.averageRating) > 2.5"))
     :layer #((:mark :bar
	           :encoding (:x (:field :imdb-rating
			                  :type :quantitative
			                  :title "IMDB Rating")
			              :y (:field :title
			                  :type :ordinal
			                  :title "Title")))
	          (:mark (:type :rule :color "red")
	           :encoding (:x (:aggregate :average
			                  :field :average-rating
			                  :type :quantitative)))))))

Frequency distribution

Cumulative frequency distribution of films in the IMDB database.

(plot:plot
 (vega:defplot cumulative-frequency-distribution
   `(:data (:values ,imdb)
     :transform #((:sort #((:field :imdb-rating))
                   :window #((:op :count
                              :field :count :as :cumulative-count))
                   :frame #(nil 0)))
     :mark :area
     :encoding (:x (:field :imdb-rating
		            :type :quantitative)
		        :y (:field :cumulative-count
		            :type :quantitative)))))

Layered & cumulative histogram

(plot:plot
 (vega:defplot layered-histogram
   `(:data (:values ,(filter-rows imdb '(not (eql imdb-rating :na))))
     :transform #((:bin t
		           :field :imdb-rating
		           :as #(:bin-imdb-rating :bin-imdb-rating-end))
		          (:aggregate #((:op :count :as :count))
		           :groupby #(:bin-imdb-rating :bin-imdb-rating-end))
                  (:sort #((:field :bin-imdb-rating))
                   :window #((:op :sum
                              :field :count :as :cumulative-count))
                   :frame #(nil 0)))
     :encoding (:x (:field :bin-imdb-rating
		            :type :quantitative
		            :scale (:zero :false)
		            :title "IMDB Rating")
		        :x2 (:field :bin-imdb-rating-end))
     :layer #((:mark :bar
	           :encoding (:y (:field :cumulative-count
			                  :type :quantitative
			                  :title "Cumulative Count")))
	          (:mark (:type :bar
		              :color "yellow"
		              :opacity 0.5)
	           :encoding (:y (:field :count
			                  :type :quantitative
			                  :title "Count")))))))
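The :window transform with :op :sum and :frame #(nil 0) accumulates the bin counts from the first row up to and including the current one. The running total it produces can be sketched in plain Common Lisp; cumulative-counts is a hypothetical helper and the counts are made up.

```lisp
;; Hypothetical helper: running total of the bin counts, in bin order.
(defun cumulative-counts (counts)
  (loop for n in counts
        sum n into running     ; total of all counts seen so far
        collect running))

;; Made-up counts for four consecutive bins.
(cumulative-counts '(1 4 2 3)) ; => (1 5 7 10)
```

The last value always equals the total number of observations, so the tallest cumulative bar in the plot shows the size of the filtered data set.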

Layering averages

Layering averages over raw values.

(plot:plot
 (vega:defplot layered-averages
   `(:data (:values ,(filter-rows stocks '(string= symbol "GOOG")))
     :layer #((:mark (:type :point
		              :opacity 0.3)
	          :encoding (:x (:field :date
			                 :time-unit :year)
			             :y (:field :price
			                 :type :quantitative)))

	          (:mark :line
	           :encoding (:x (:field :date
			                  :time-unit :year)
			              :y (:field :price
			                  :aggregate :mean)))))))

Error bars

Confidence interval

Error bars showing confidence intervals.

(plot:plot
 (vega:defplot error-bar-ci
   `(:data (:values ,barley)
     :encoding (:y (:field :variety
		            :type  :ordinal
		            :title "Variety"))
     :layer #((:mark (:type :point
		              :filled t)
	           :encoding (:x (:field :yield
			                  :aggregate :mean
			                  :type :quantitative
			                  :scale (:zero :false)
			                  :title "Barley Yield")
			  :color (:value "black")))

	      (:mark (:type :errorbar :extent :ci)
	       :encoding (:x (:field :yield
			              :type :quantitative
			              :title "Barley Yield")))))))

Standard deviation

Error bars showing standard deviation.

(plot:plot
 (vega:defplot error-bar-sd
   `(:data (:values ,barley)
     :encoding (:y (:field :variety
		            :type :ordinal
		            :title "Variety"))
     :layer #((:mark (:type :point
		              :filled t)
	           :encoding (:x (:field :yield
			                  :aggregate :mean
			                  :type :quantitative
			                  :scale (:zero :false)
			                  :title "Barley Yield")
			              :color (:value "black")))

	          (:mark (:type :errorbar :extent :stdev)
	           :encoding (:x (:field :yield
			                  :type :quantitative
			                  :title "Barley Yield")))))))

Box plots

Min/max whiskers

A vertical box plot showing median, min, and max body mass of penguins.

(plot:plot
 (vega:defplot box-plot-min-max
   `(:data (:values ,penguins)
     :mark (:type :boxplot
	        :extent "min-max")
     :encoding (:x (:field :species
		            :type :nominal
		            :title "Species")
	            :y (:field |BODY-MASS-(G)|
		            :type :quantitative
		            :scale (:zero :false)
		            :title "Body Mass (g)")
		        :color (:field :species
			            :type :nominal
			            :legend nil)))))

Tukey

A vertical box plot showing median and lower and upper quartiles of the distribution of body mass of penguins.

(plot:plot
 (vega:defplot box-plot-tukey
   `(:data (:values ,penguins)
     :mark :boxplot
     :encoding (:x (:field :species
		            :type :nominal
		            :title "Species")
	            :y (:field |BODY-MASS-(G)|
		            :type :quantitative
		            :scale (:zero :false)
		            :title "Body Mass (g)")
		        :color (:field :species
			            :type :nominal
			            :legend nil)))))
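The whisker rule can be sketched in plain Common Lisp. This is an illustration of the Tukey convention only, assuming the quartiles are already known. `tukey-whiskers` is a hypothetical helper, not a Lisp-Stat or Vega-Lite function, and the numbers are made up.

```lisp
;; Hypothetical helper for illustration; not part of Lisp-Stat.
(defun tukey-whiskers (xs q1 q3 &optional (k 1.5))
  "Return the lower whisker, the upper whisker and the list of outliers.
Whiskers reach the most extreme values of XS inside [Q1 - K*IQR, Q3 + K*IQR]."
  (let* ((iqr (- q3 q1))
         (lo  (- q1 (* k iqr)))
         (hi  (+ q3 (* k iqr)))
         (inside   (remove-if-not (lambda (x) (<= lo x hi)) xs))
         (outliers (remove-if     (lambda (x) (<= lo x hi)) xs)))
    (values (reduce #'min inside) (reduce #'max inside) outliers)))

;; Made-up body masses with quartiles 3500 and 3900, so the fences are
;; 2900 and 4500 and the value 5200 falls outside them.
(tukey-whiskers '(3300 3500 3700 3900 4100 5200) 3500 3900)
;; returns 3300, 4100 and (5200)
```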

Summaries

Box plot with pre-computed summaries. Use this pattern to plot summary statistics that you have already computed, for example in a data frame.

(plot:plot
 (vega:defplot box-plot-summaries
   `(:title "Body Mass of Penguin Species (g)"
     :data (:values ,(plist-df '(:species #("Adelie" "Chinstrap" "Gentoo")
	                             :lower #(2850 2700 3950)
	                             :q1 #(3350 3487.5 4700)
	                             :median #(3700 3700 5000)
	                             :q3 #(4000 3950 5500)
	                             :upper #(4775 4800 6300)
	                             :outliers #(#() #(2700 4800) #()))))
     :encoding (:y (:field :species
		            :type :nominal
		            :title null))
     :layer #((:mark (:type :rule)
	           :encoding (:x (:field :lower
			                  :type :quantitative
			                  :scale (:zero :false)
			                  :title null)
			              :x2 (:field :upper)))

	          (:mark (:type :bar :size 14)
	           :encoding (:x (:field :q1
			                  :type :quantitative)
			              :x2 (:field :q3)
			              :color (:field :species
				                  :type :nominal
				                  :legend null)))

	          (:mark (:type :tick
		              :color :white
		              :size 14)
	           :encoding (:x (:field :median
			                  :type :quantitative)))

	          (:transform #((:flatten #(:outliers)))
	           :mark (:type :point :style "boxplot-outliers")
	           :encoding (:x (:field :outliers
			                  :type :quantitative)))))))
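The summary columns have to come from somewhere. As a rough sketch, the five values per group could be computed in plain Common Lisp as below. The helpers `quantile` and `five-number-summary` are hypothetical and not part of Lisp-Stat. The sketch uses linear interpolation between order statistics, which is only one of several quantile conventions, and it uses the minimum and maximum as whisker ends, so its results will not necessarily match the numbers in the plot above.

```lisp
;; Hypothetical helpers for illustration; not part of Lisp-Stat.
(defun quantile (sorted p)
  "Quantile P (between 0 and 1) of the sorted vector SORTED,
using linear interpolation between neighbouring order statistics."
  (let* ((h    (* p (1- (length sorted))))
         (i    (floor h))
         (frac (- h i)))
    (if (zerop frac)
        (aref sorted i)
        (+ (aref sorted i)
           (* frac (- (aref sorted (1+ i)) (aref sorted i)))))))

(defun five-number-summary (xs)
  "Return a plist with the minimum, quartiles and maximum of XS."
  (let ((sorted (sort (copy-seq (coerce xs 'vector)) #'<)))
    (list :lower  (aref sorted 0)
          :q1     (quantile sorted 1/4)
          :median (quantile sorted 1/2)
          :q3     (quantile sorted 3/4)
          :upper  (aref sorted (1- (length sorted))))))

(five-number-summary '(9 1 8 2 7 3 6 4 5))
;; => (:LOWER 1 :Q1 3 :MEDIAN 5 :Q3 7 :UPPER 9)
```

Collecting one such plist per group gives the values for the :lower, :q1, :median, :q3 and :upper columns used in the specification.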

Layered

Rolling average

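Plot showing a rolling mean of the maximum temperature over a window of roughly 30 days (the :frame of -15 to 15 takes 15 observations either side of each date), with the raw values in the background.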

(plot:plot
 (vega:defplot moving-average
   `(:width 400
     :height 300
     :data (:values ,seattle-weather)
     :transform #((:window #((:field :temp-max
			                  :op :mean
			                  :as :rolling-mean))
	               :frame #(-15 15)))
     :encoding (:x (:field :date
		            :type :temporal
		            :title "Date")
		        :y (:type :quantitative
		            :axis (:title "Max Temperature and Rolling Mean")))
     :layer #((:mark (:type :point :opacity 0.3)
	            :encoding (:y (:field :temp-max
			                   :title "Max Temperature")))

	          (:mark (:type :line :color "red" :size 3)
	           :encoding (:y (:field :rolling-mean
			                  :title "Rolling Mean of Max Temperature")))))))
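To make the effect of :frame concrete, here is a minimal sketch of a centred rolling mean in plain Common Lisp. `rolling-mean` is a hypothetical helper, not part of Lisp-Stat, and it assumes the window is simply truncated where it runs past either end of the data.

```lisp
;; Hypothetical helper for illustration; not part of Lisp-Stat.
(defun rolling-mean (xs &key (before 15) (after 15))
  "Return a vector whose element I is the mean of XS over the positions
I - BEFORE to I + AFTER, truncated at the ends of XS."
  (let* ((v   (coerce xs 'vector))
         (n   (length v))
         (out (make-array n)))
    (dotimes (i n out)
      (let* ((start (max 0 (- i before)))
             (end   (min n (+ i after 1))) ; exclusive upper bound
             (sum   (reduce #'+ v :start start :end end)))
        (setf (aref out i) (/ sum (- end start)))))))

;; A small window makes the edge behaviour easy to see.
(rolling-mean '(1 2 3 4 5) :before 1 :after 1)
;; => #(3/2 2 3 4 9/2)
```

With integer input the means are exact rationals. Pass floats, or wrap the result in `float`, if decimal values are wanted.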

Histogram w/mean

(plot:plot
 (vega:defplot histogram-with-mean
   `(:data (:values ,imdb)
     :layer #((:mark :bar
	            :encoding (:x (:field :imdb-rating
			                   :bin t
			                   :title "IMDB Rating")
			               :y (:aggregate :count)))

	          (:mark :rule
	           :encoding (:x (:field :imdb-rating
			                  :aggregate :mean
			                  :title "Mean of IMDB Rating")
			              :color (:value "red")
			              :size (:value 5)))))))

Interactive

This section demonstrates interactive plots.

Scatter plot matrix

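This interactive scatter plot matrix (SPLOM) repeats a scatter plot over every pair of the horsepower, acceleration and miles-per-gallon fields. Hold Shift and drag to brush a selection of points; drag or scroll without Shift to pan and zoom the scales.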

(defparameter vgcars-splom
 (vega::make-plot "vgcars-splom"
                  vgcars
                  `("$schema" "https://vega.github.io/schema/vega-lite/v5.json"
                    :title "Scatterplot Matrix for Vega Cars"
                    :repeat (:row    #(:horsepower :acceleration :miles-per-gallon)
                             :column #(:miles-per-gallon :acceleration :horsepower))
                    :spec (:data (:url "/data/vgcars-splom-data.json")
                           :mark :point
                           :params #((:name "brush"
                                      :select (:type "interval"
                                               :resolve "union"
                                               :on "[mousedown[event.shiftKey], window:mouseup] > window:mousemove!"
                                               :translate "[mousedown[event.shiftKey], window:mouseup] > window:mousemove!"
                                               :zoom "wheel![event.shiftKey]"))
                                     (:name "grid"
                                      :select (:type "interval"
                                               :resolve "global"
                                               :translate "[mousedown[!event.shiftKey], window:mouseup] > window:mousemove!"
                                               :zoom "wheel![!event.shiftKey]")
                                      :bind :scales))
                           :encoding (:x (:field (:repeat "column") :type "quantitative")
                                      :y (:field (:repeat "row") :type "quantitative" :axis ("minExtent" 30))
                                      :color (:condition (:param "brush" :field :origin :type "nominal")
                                              :value "grey"))))))
(plot:plot vgcars-splom)

This example is one of those mentioned in the plotting tutorial that uses a non-standard location for the data property: here :data sits inside :spec rather than at the top level of the plot specification.

Weather exploration

This graph shows an interactive view of Seattle’s weather, including maximum temperature, amount of precipitation, and type of weather. By clicking and dragging on the scatter plot, you can see the proportion of days in that range that have sun, rain, fog, snow, etc.

(plot:plot
 (vega:defplot weather-exploration
   `(:title "Seattle Weather, 2012-2015"
     :data (:values ,seattle-weather)
     :vconcat #(;; upper graph
		(:encoding (:color (:condition (:param :brush
						               :title "Weather"
						               :field :weather
						               :type :nominal
						               :scale (:domain #("sun" "fog" "drizzle" "rain" "snow")
						                       :range #("#e7ba52" "#a7a7a7" "#aec7e8" "#1f77b4" "#9467bd")))
			                :value "lightgray")
			    :size (:field :precipitation
				       :type  :quantitative
				       :title "Precipitation"
				       :scale (:domain #(-1 50)))
			    :x (:field :date
				    :time-unit :monthdate
				    :title "Date"
				    :axis (:format "%b"))
			    :y (:field :temp-max
				    :type :quantitative
				    :scale (:domain #(-5 40))
				    :title "Maximum Daily Temperature (C)"))
		 :width 600
		 :height 300
		 :mark :point
		 :params #((:name :brush
			        :select (:type :interval
				             :encodings #(:x))))
		 :transform #((:filter (:param :click))))

		;; lower graph
		(:encoding (:color (:condition (:param :click
						                :field :weather
						                :scale (:domain #("sun" "fog" "drizzle" "rain" "snow")
						                        :range #("#e7ba52" "#a7a7a7" "#aec7e8" "#1f77b4" "#9467bd")))
				            :value "lightgray")
			    :x (:aggregate :count)
			    :y (:field :weather
				:title "Weather"))
		 :width 600
		 :mark :bar
		 :params #((:name :click
			        :select (:type :point
				             :encodings #(:color))))
		 :transform #((:filter (:param :brush))))))))

Interactive scatterplot

(plot:plot
 (vega:defplot global-health
   `(:title "Global Health Statistics by Country and Year"
     :data (:values ,gapminder)
     :width 800
     :height 500
     :layer #((:transform #((:filter (:field :country
				             :equal "afghanistan"))
			                (:filter (:param :year)))
	           :mark (:type :text
		              :font-size 100
		              :x 420
		              :y 250
		              :opacity 0.06)
	           :encoding (:text (:field :year)))

	          (:transform #((:lookup :cluster
			                 :from (:key :id
				                    :fields #(:name)
				                    :data (:values #(("id" 0 "name" "South Asia")
						                  ("id" 1 "name" "Europe & Central Asia")
						                  ("id" 2 "name" "Sub-Saharan Africa")
						                  ("id" 3 "name" "America")
						                  ("id" 4 "name" "East Asia & Pacific")
						                  ("id" 5 "name" "Middle East & North Africa"))))))
	           :encoding (:x (:field :fertility
			                  :type :quantitative
			                  :scale (:domain #(0 9))
			                  :axis (:tick-count 5
			                         :title "Fertility"))
			              :y (:field :life-expect
			                  :type :quantitative
			                  :scale (:domain #(20 85))
			                  :axis (:tick-count 5
			                         :title "Life Expectancy")))
	       :layer #((:mark (:type :line
				            :size 4
				            :color "lightgray"
				            :stroke-cap "round")
			         :encoding (:detail (:field :country)
				                :order (:field :year)
				                :opacity (:condition (:test (:or #((:param :hovered :empty :false)
								                                   (:param :clicked :empty :false)))
							              :value 0.8)
					                      :value 0)))

			        (:params #((:name :year
				                :value #((:year 1955))
				                :select (:type :point
					                     :fields #(:year))
				                :bind (:name :year
					                   :input :range
					                   :min 1955
					                   :max 2005
					                   :step 5))
				               (:name :hovered
				                :select (:type :point
					                     :fields #(:country)
					                     :toggle :false
					                     :on :mouseover))
				               (:name :clicked
				                :select (:type :point
					                     :fields #(:country))))
                     :transform #((:filter (:param :year)))
			         :mark (:type :circle
				            :size 100
				            :opacity 0.9)
			         :encoding (:color (:field :name
					                    :title "Region")))

			        (:transform #((:filter (:and #((:param :year)
						                           (:or #((:param :clicked :empty :false)
							                              (:param :hovered :empty :false)))))))
			 :mark (:type :text
				    :y-offset -12
				    :font-size 12
				    :font-weight :bold)
			 :encoding (:text (:field :country)
				        :color (:field :name
					    :title "Region")))

			(:transform #((:filter (:param :hovered :empty :false))
				          (:filter (:not (:param :year))))
			 :layer #((:mark (:type :text
					          :y-offset -12
					          :font-size 12
					          :color "gray")
				       :encoding (:text (:field :year)))
				      (:mark (:type :circle
					          :color "gray"))))))))))

Crossfilter

Cross-filtering makes it easier and more intuitive for viewers of a plot to interact with the data and understand how one metric affects another. With cross-filtering, you can click a data point in one dashboard view to have all dashboard views automatically filter on that value.

Click and drag across one of the charts to see the other variables filtered.

(plot:plot
 (vega:defplot cross-filter
   `(:title "Cross filtering of flights"
     :data (:values ,flights-2k)
     :transform #((:calculate "hours(datum.date)" :as "time")) ; hours() is a Vega expression function returning the hour of day of a datetime
     :repeat (:column #(:distance :delay :time))
     :spec (:layer #((:params #((:name :brush
				                 :select (:type :interval
					                      :encodings #(:x))))
		      :mark :bar
		      :encoding (:x (:field (:repeat :column)
				             :bin (:maxbins 20))
				         :y (:aggregate :count)
				         :color (:value "#ddd")))

		     (:transform #((:filter (:param :brush)))
		      :mark :bar
		      :encoding (:x (:field (:repeat :column)
				             :bin (:maxbins 20))
				         :y (:aggregate :count))))))))
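What the brush does to the other charts can be sketched without Vega-Lite: keep the rows that fall inside the brushed interval on one field, then recount the bins of another field. The code below is a plain Common Lisp illustration under that assumption. `brush-filter` and `bin-counts` are hypothetical helpers, rows are represented as plists, and the flight records are made up rather than taken from flights-2k.

```lisp
;; Hypothetical helpers for illustration; not part of Lisp-Stat.
(defun brush-filter (rows field lo hi)
  "Keep the rows (plists) whose FIELD value lies between LO and HI inclusive."
  (remove-if-not (lambda (row) (<= lo (getf row field) hi)) rows))

(defun bin-counts (rows field width)
  "Count ROWS per bin of WIDTH on FIELD.
Returns an alist of (bin-start . count) sorted by bin start."
  (let ((counts '()))
    (dolist (row rows)
      (let* ((start (* width (floor (getf row field) width)))
             (cell  (assoc start counts)))
        (if cell
            (incf (cdr cell))
            (push (cons start 1) counts))))
    (sort counts #'< :key #'car)))

;; Made-up flights: distance, delay and departure hour.
(defparameter *flights*
  '((:distance 100 :delay  5 :time  9)
    (:distance 250 :delay 30 :time 14)
    (:distance 300 :delay -3 :time  9)
    (:distance 900 :delay 45 :time 20)))

;; Brush the time chart from hour 8 to hour 15, then recount distance bins.
(bin-counts (brush-filter *flights* :time 8 15) :distance 200)
;; => ((0 . 1) (200 . 2))
```

In the plot, the grey bars correspond to `bin-counts` over all rows and the coloured bars to `bin-counts` over the brushed rows.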

2 - Statistics

Examples of statistical analysis

These notebooks describe how to undertake statistical analyses introduced as examples in the Ninth Edition of Introduction to the Practice of Statistics (2017) by Moore, McCabe and Craig. The notebooks are organised in the same manner as the chapters of the book. The data comes from the site IPS9 in R by Nicholas Horton.

To run the notebooks you will have to install a third-party library, common-lisp-jupyter. See the cl-jupyter installation page for how to perform the installation.

After installing cl-jupyter, clone the IPS repository into your ~/common-lisp/ directory.

Note

Be careful when upgrading common-lisp-jupyter. Breaking changes are often introduced without warning. If you experience problems, check out cl-jupyter revision b1021ab with git checkout b1021ab.

Looking at data

Chapter 1 – Distributions : Exploratory data analysis using plots and numbers
Chapter 2 – Data Relationships : Examining relationships between variables