Getting Started

Install to plotting in five minutes

Prerequisites

  • SBCL or CCL Common Lisp
  • macOS or Windows 10
  • Quicklisp
  • Chrome

Load & Configure

First, load Lisp-Stat, the plotting libraries and the data, and configure the environment. We assume you have already obtained the libraries via a package manager such as clpm or Quicklisp. See the installation instructions on GitHub.

Lisp-Stat

(asdf:load-system :lisp-stat)
(in-package :ls-user)

Vega-Lite

(asdf:load-system :plot/vglt)
(asdf:load-system :dfio/json)

Data

(defdf cars
  (dfio:vl-to-df
    (dex:get
	  "https://raw.githubusercontent.com/vega/vega-datasets/master/data/cars.json"
	  :want-stream t)))

View

Print the data frame (showing the first 25 rows by default):

(pprint cars)
;; ORIGIN YEAR       ACCELERATION WEIGHT_IN_LBS HORSEPOWER DISPLACEMENT CYLINDERS MILES_PER_GALLON NAME
;; USA    1970-01-01         12.0          3504        130        307.0         8             18.0 chevrolet chevelle malibu
;; USA    1970-01-01         11.5          3693        165        350.0         8             15.0 buick skylark 320
;; USA    1970-01-01         11.0          3436        150        318.0         8             18.0 plymouth satellite
;; USA    1970-01-01         12.0          3433        150        304.0         8             16.0 amc rebel sst
;; USA    1970-01-01         10.5          3449        140        302.0         8             17.0 ford torino
;; USA    1970-01-01         10.0          4341        198        429.0         8             15.0 ford galaxie 500
;; USA    1970-01-01          9.0          4354        220        454.0         8             14.0 chevrolet impala
;; USA    1970-01-01          8.5          4312        215        440.0         8             14.0 plymouth fury iii
;; USA    1970-01-01         10.0          4425        225        455.0         8             14.0 pontiac catalina
;; USA    1970-01-01          8.5          3850        190        390.0         8             15.0 amc ambassador dpl
;; Europe 1970-01-01         17.5          3090        115        133.0         4 NIL              citroen ds-21 pallas
;; USA    1970-01-01         11.5          4142        165        350.0         8 NIL              chevrolet chevelle concours (sw)
;; USA    1970-01-01         11.0          4034        153        351.0         8 NIL              ford torino (sw)
;; USA    1970-01-01         10.5          4166        175        383.0         8 NIL              plymouth satellite (sw)
;; USA    1970-01-01         11.0          3850        175        360.0         8 NIL              amc rebel sst (sw)
;; USA    1970-01-01         10.0          3563        170        383.0         8             15.0 dodge challenger se
;; USA    1970-01-01          8.0          3609        160        340.0         8             14.0 plymouth 'cuda 340
;; USA    1970-01-01          8.0          3353        140        302.0         8 NIL              ford mustang boss 302
;; USA    1970-01-01          9.5          3761        150        400.0         8             15.0 chevrolet monte carlo
;; USA    1970-01-01         10.0          3086        225        455.0         8             14.0 buick estate wagon (sw)
;; Japan  1970-01-01         15.0          2372         95        113.0         4             24.0 toyota corona mark ii
;; USA    1970-01-01         15.5          2833         95        198.0         6             22.0 plymouth duster
;; USA    1970-01-01         15.5          2774         97        199.0         6             18.0 amc hornet
;; USA    1970-01-01         16.0          2587         85        200.0         6             21.0 ford maverick                 ..

Show the last few rows:

(tail cars)
;; ORIGIN YEAR       ACCELERATION WEIGHT_IN_LBS HORSEPOWER DISPLACEMENT CYLINDERS MILES_PER_GALLON NAME
;; USA    1982-01-01         17.3          2950         90          151         4               27 chevrolet camaro
;; USA    1982-01-01         15.6          2790         86          140         4               27 ford mustang gl
;; Europe 1982-01-01         24.6          2130         52           97         4               44 vw pickup
;; USA    1982-01-01         11.6          2295         84          135         4               32 dodge rampage
;; USA    1982-01-01         18.6          2625         79          120         4               28 ford ranger
;; USA    1982-01-01         19.4          2720         82          119         4               31 chevy s-10

Statistics

Look at a few statistics on the data set.

(mean cars:acceleration) ; => 15.5197
LS-USER> (summary cars)
ORIGIN: 254 (63%) x USA, 79 (19%) x Japan, 73 (18%) x Europe,
YEAR: 61 (15%) x 1982-01-01, 40 (10%) x 1973-01-01, 36 (9%) x 1978-01-01, 35 (9%) x 1970-01-01, 34 (8%) x 1976-01-01, 30 (7%) x 1975-01-01, 29 (7%) x 1971-01-01, 29 (7%) x 1979-01-01, 29 (7%) x 1980-01-01, 28 (7%) x 1972-01-01, 28 (7%) x 1977-01-01, 27 (7%) x 1974-01-01,
ACCELERATION: 406 reals, min=8, q25=13.674999999999999d0, q50=15.45d0, q75=17.16666632692019d0, max=24.8d0
WEIGHT-IN-LBS: 406 reals, min=1613, q25=2226, q50=2822.5, q75=3620, max=5140
HORSEPOWER: 400 reals, min=46, q25=75.77778, q50=94.33333, q75=129.57143, max=230, 6 (1%) x NIL,
DISPLACEMENT: 406 reals, min=68, q25=104.25, q50=147.92307, q75=277.76923, max=455
CYLINDERS: 207 (51%) x 4, 108 (27%) x 8, 84 (21%) x 6, 4 (1%) x 3, 3 (1%) x 5,
MILES-PER-GALLON: 398 reals, min=9, q25=17.33333317438761d0, q50=22.727271751923993d0, q75=29.14999923706055d0, max=46.6d0, 8 (2%) x NIL,

Note: The car models, essentially the row names, have been removed from the summary.

Plot

Create a scatter plot specification with default values:

(defparameter cars-plot (vglt:scatter-plot cars "HORSEPOWER" "MILES-PER-GALLON"))

Render the plot:

(plot:plot-from-file (vglt:save-plot 'cars-plot))

Horsepower vs. MPG scatter plot

1 - Installation

Automated and manual installation

New to Lisp

If you are a Lisp newbie and want to get started as fast as possible, then Portacle is probably your best option. Portacle is a multi-platform IDE for Common Lisp that includes Emacs, SBCL, Git and Quicklisp, all configured and ready to use.

If you are an existing Emacs user, you can configure Emacs for Common Lisp.

Users new to Lisp should also consider going through the basic tutorial, which guides you step-by-step through the basics of working with Lisp as a statistics practitioner.

Experienced with Lisp

We assume an experienced user will have their own Emacs and Lisp implementation and will want to install according to their own tastes and setup. The repo links you need are below, or you can install with clpm or Quicklisp.

Prerequisites

All that is needed is an ANSI Common Lisp implementation. Development is done with Genera, CCL and SBCL. Other platforms should work, but have not been tested.

Installation

ASDF

If you want to modify Lisp-Stat, you’ll need to retrieve the files from GitHub and place them in a directory that is known to ASDF. This long shell command will check out all the required systems:

cd ~/quicklisp/local-projects && \
git clone https://github.com/Lisp-Stat/data-frame.git && \
git clone https://github.com/Lisp-Stat/dfio.git && \
git clone https://github.com/Lisp-Stat/special-functions.git && \
git clone https://github.com/Lisp-Stat/numerical-utilities.git && \
git clone https://github.com/Lisp-Stat/documentation.git && \
git clone https://github.com/Lisp-Stat/plot.git && \
git clone https://github.com/Lisp-Stat/select.git && \
git clone https://github.com/Lisp-Stat/cephes.cl.git && \
git clone https://github.com/Symbolics/alexandria-plus && \
git clone https://github.com/Lisp-Stat/lisp-stat.git

The above assumes the default Quicklisp installation directory; adjust the path if yours differs. If this is the first time you are running Lisp-Stat, use Quicklisp to get the dependencies:

(ql:quickload :lisp-stat)

From now on you can load it with:

(asdf:load-system :lisp-stat)

If ASDF claims it can’t find the required systems (this might happen the first time around), reset the system configuration with:

(asdf:clear-source-registry)

and try again.
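To confirm whether ASDF can see a system before and after resetting the registry, a check along the following lines may help. This is a minimal sketch using only standard ASDF calls; the helper names system-visible-p and rescan-and-check are made up for this example and are not part of Lisp-Stat.

```lisp
(require :asdf)

(defun system-visible-p (name)
  "Return true if ASDF can locate a system definition for NAME.
SYSTEM-VISIBLE-P is a hypothetical helper, not part of Lisp-Stat."
  ;; Passing NIL as the second argument makes FIND-SYSTEM return NIL
  ;; instead of signalling an error when the system is missing.
  (not (null (asdf:find-system name nil))))

(defun rescan-and-check (name)
  "Reset ASDF's source registry, then report whether NAME is visible."
  (asdf:clear-source-registry)
  (system-visible-p name))
```

For example, (rescan-and-check :lisp-stat) should return T once the cloned directories are somewhere ASDF searches.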

Quicklisp

If you have Quicklisp installed, you can use:

(ql:quickload :lisp-stat)

If Quicklisp claims it cannot find the systems, try this at the REPL:

(ql:register-local-projects)

Quicklisp is good at managing project dependency retrieval, but most of the time we use ASDF because of its REPL integration. You only have to use Quicklisp once to get the dependencies, then use ASDF for day-to-day work.
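The "Quicklisp once, ASDF afterwards" workflow could be wrapped in a small helper like the one below. It is only a sketch: load-or-fetch is a hypothetical name, and it assumes Quicklisp, when present, lives in the usual QL package.

```lisp
(require :asdf)

(defun load-or-fetch (name)
  "Load system NAME with ASDF when it is already on disk; otherwise
fall back to Quicklisp if it is present in this image.
LOAD-OR-FETCH is a hypothetical helper, not part of Lisp-Stat."
  (cond ((asdf:find-system name nil)
         (asdf:load-system name)
         :asdf)
        ;; Look QL:QUICKLOAD up at run time so this code still loads
        ;; in an image where Quicklisp is not installed.
        ((find-package :ql)
         (uiop:symbol-call :ql :quickload name)
         :quicklisp)
        (t :not-found)))
```

The return value says which route was taken, which can be handy when diagnosing a fresh installation.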

Documentation

Lisp-Stat reference manuals are generated with the declt system. This produces high quality PDF, markdown, HTML and Info output. The API reference manuals are available in HTML in the reference section of this website; the PDF and Info files can be downloaded from each system’s docs/ directory.

You can install the Info manuals into the Emacs help system, which allows searching and browsing from within the editing environment. To do this, use the install-info command. As an example, on my MS Windows 10 machine, with an MSYS2/Emacs installation:

install-info --add-once select.info /c/msys64/mingw64/share/info/dir

installs the select manual into a Lisp-Stat node at the top level of the info tree.

Initialization file

You can put customisations to your environment either in your implementation’s init file, or in a separate file that you load from the implementation’s init file. For example, I keep my customisations in #P"~/ls-init.lisp" and load it from SBCL’s init file ~/.sbclrc in a Lisp-Stat initialisation section like this:

;;; Lisp-Stat
(asdf:load-system :lisp-stat)
(load #P"~/ls-init.lisp")

Settings in your personal lisp-stat init file override the system defaults.
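If the personal init file may be absent on some machines, the load can be made optional. The sketch below relies only on standard Common Lisp; load-personal-init is a hypothetical helper name, and the default file name simply mirrors the example above.

```lisp
(defun load-personal-init (&optional (name "ls-init.lisp"))
  "Load NAME from the user's home directory if it exists.
Returns NIL, without signalling an error, when the file is missing.
LOAD-PERSONAL-INIT is a hypothetical helper, not part of Lisp-Stat."
  (load (merge-pathnames name (user-homedir-pathname))
        :if-does-not-exist nil))

;; In the implementation's init file, after loading Lisp-Stat:
;; (load-personal-init)
```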

Here’s an example ls-init.lisp file that loads some common R data sets.

(require 'dexador)
;;; Load default data sets
(defparameter *default-datasets*
  '(rdata:iris rdata:toothgrowth rdata:plantgrowth rdata:usarrests)
  "Data sets loaded as part of personal Lisp-Stat initialisation. Available in every session.")

(progn				  ;do all initialisation here
  (map nil #'(lambda (x)
	       (format t "Loading ~A" (make-symbol (symbol-name x)))
	       (eval `(defdf ,(intern (symbol-name x))
			  (read-csv ,(symbol-value x)))))
       *default-datasets*))

With this init file, you can immediately access the data sets in the *default-datasets* list defined above, e.g.:

(head iris)
;;   X2 SEPAL-LENGTH SEPAL-WIDTH PETAL-LENGTH PETAL-WIDTH SPECIES
;; 0  1          5.1         3.5          1.4         0.2 setosa
;; 1  2          4.9         3.0          1.4         0.2 setosa
;; 2  3          4.7         3.2          1.3         0.2 setosa
;; 3  4          4.6         3.1          1.5         0.2 setosa
;; 4  5          5.0         3.6          1.4         0.2 setosa
;; 5  6          5.4         3.9          1.7         0.4 setosa

Try it out

Load Lisp-Stat:

(asdf:load-system :lisp-stat)

Change to the Lisp-Stat user package:

(in-package :ls-user)

Load some data:

(load #P"LS:DATASETS;CAR-PRICES")

Find the sample mean and median:

(mean car-prices)   ; => 2.810199998617172d0
(median car-prices) ; => 2.55
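For readers new to Lisp, the median can be sketched in plain Common Lisp as follows. This is for illustration only and is not how Lisp-Stat implements median; sample-median and the sample values are made up for the example.

```lisp
(defun sample-median (values)
  "Return the median of the non-empty sequence VALUES without modifying it."
  ;; SORT is destructive, so work on a copy.
  (let* ((sorted (sort (copy-seq values) #'<))
         (n (length sorted))
         (mid (floor n 2)))
    (if (oddp n)
        (elt sorted mid)
        ;; Even length: average the two middle elements.
        (/ (+ (elt sorted (1- mid)) (elt sorted mid)) 2))))

(sample-median #(2.5 1.0 4.0))  ; => 2.5
(sample-median #(1 2 3 4))      ; => 5/2
```

Note that the second result is an exact rational, because Common Lisp does not convert integers to floats unless asked to.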

Next steps

  • Get Started
  • Examples
  • R Users

2 - Data Frame

Getting started with data frames

Load data

We will use one of the example data sets from R, mtcars, for these examples. First, load Lisp-Stat and switch into the Lisp-Stat package:

(asdf:load-system :lisp-stat)
(in-package :ls-user)

Now define the data frame, naming it mtcars:

(defdf mtcars (read-csv rdata:mtcars))
;;WARNING: Missing column name was filled in
;;#<DATA-FRAME (32 observations of 12 variables)>

This macro defines a global variable named mtcars and sets up some convenience functions.

Examine data

Lisp-Stat’s printing system is integrated with the Common Lisp Pretty Printing facility. By default Lisp-Stat sets *print-pretty* to nil.

Basic information

Type the name of the data frame at the REPL to get a simple one-line summary.

mtcars ;; => #<DATA-FRAME (32 observations of 12 variables)>

Printing data

By default, head returns the first 6 rows:

(head mtcars)
;;   X1                 MPG CYL DISP  HP DRAT    WT  QSEC VS AM GEAR CARB
;; 0 Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
;; 1 Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
;; 2 Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
;; 3 Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
;; 4 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
;; 5 Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

and tail returns the last 6 rows:

(tail mtcars)
;;   X1              MPG CYL  DISP  HP DRAT    WT QSEC VS AM GEAR CARB
;; 0 Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
;; 1 Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
;; 2 Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
;; 3 Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
;; 4 Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
;; 5 Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2

pprint can be used to print the whole data frame:

(pprint mtcars)

;;    X1                   MPG CYL  DISP  HP DRAT    WT  QSEC VS AM GEAR CARB
;;  0 Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
;;  1 Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
;;  2 Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
;;  3 Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
;;  4 Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
;;  5 Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
;;  6 Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
;;  7 Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
;;  8 Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
;;  9 Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
;; 10 Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
;; 11 Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
;; 12 Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
;; 13 Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
;; 14 Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
;; 15 Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
;; 16 Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
;; 17 Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
;; 18 Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
;; 19 Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
;; 20 Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
;; 21 Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
;; 22 AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
;; 23 Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4 ..

The two dots “..” at the end indicate that the output has been truncated. Lisp-Stat sets the pretty printer variable *print-lines* to 25 by default, and any output beyond that is truncated. To print all rows, set *print-lines* to nil.
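The truncation marker comes from the standard Common Lisp pretty printer, so the effect can be reproduced without Lisp-Stat. The sketch below uses a plain list as a stand-in for a data frame; print-truncated is a hypothetical helper, and the narrow right margin is there only to force line breaks.

```lisp
(defun print-truncated (object lines)
  "Pretty-print OBJECT to a string, allowing at most LINES lines.
A LINES value of NIL means no limit."
  (with-output-to-string (out)
    (let ((*print-pretty* t)
          (*print-length* nil)
          (*print-right-margin* 20) ; narrow margin to force line breaks
          (*print-lines* lines))
      (pprint object out))))

;; A 40-element list needs many 20-column lines, so a limit of 3
;; truncates the output, which then contains "..".
(print-truncated (loop for i below 40 collect i) 3)

;; With NIL there is no limit and nothing is truncated.
(print-truncated (loop for i below 40 collect i) nil)
```

Binding *print-lines* with let, as here, limits the change to one call; setting it globally affects all later printing in the session.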

Notice the column named X1. This is the name the import function gave to the column; recall the warning that was issued during the import. Columns with missing names are named X1, X2, …, Xn in increasing order for the duration of the Lisp-Stat session.

This column is actually the row name, so we’ll rename it:

(replace-key! mtcars row-name x1)

Note that your row may be named something other than X1, depending on whether or not you have loaded any other data frames with variable name replacement. Also note: the ‘!’ at the end of the function name is a convention indicating a destructive operation.

Now let’s view the results:

(head mtcars)
;;   ROW-NAME           MPG CYL DISP  HP DRAT    WT  QSEC VS AM GEAR CARB
;; 0 Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
;; 1 Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
;; 2 Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
;; 3 Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
;; 4 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
;; 5 Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Column names

To see the names of the columns, use the column-names function:

(column-names mtcars)
;; => ("ROW-NAME" "MPG" "CYL" "DISP" "HP" "DRAT" "WT" "QSEC" "VS" "AM" "GEAR" "CARB")

Dimensions

We saw the dimensions above in basic information. That was printed for human consumption. To get the values in a form suitable for passing to other functions, use the dims function:

(aops:dims mtcars) ;; => (32 12)

Common Lisp specifies dimensions in row-column order, so mtcars has 32 rows and 12 columns.
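The same row-column order applies to ordinary Common Lisp arrays. The following sketch uses the standard array-dimensions function on a plain 32 x 12 array; the variable *grid* is a made-up stand-in for the data frame.

```lisp
;; A stand-in for a data set with 32 rows and 12 columns.
(defparameter *grid* (make-array '(32 12) :initial-element 0))

(array-dimensions *grid*)  ; => (32 12)

;; Destructure the list to name the two values.
(destructuring-bind (rows columns) (array-dimensions *grid*)
  (format nil "~D rows, ~D columns" rows columns))
;; => "32 rows, 12 columns"
```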

Basic Statistics

Minimum & Maximum

To get the minimum or maximum of a column, say mpg, you can use several Common Lisp methods. Let’s see what mpg looks like by typing the name of the column into the REPL:

 mtcars:mpg
;; => #(21 21 22.8d0 21.4d0 18.7d0 18.1d0 14.3d0 24.4d0 22.8d0 19.2d0 17.8d0 16.4d0 17.3d0 15.2d0 10.4d0 10.4d0 14.7d0 32.4d0 30.4d0 33.9d0 21.5d0 15.5d0 15.2d0 13.3d0 19.2d0 27.3d0 26 30.4d0 15.8d0 19.7d0 15 21.4d0)

You could, for example, use something like this to find the minimum:

(reduce #'min mtcars:mpg) ;; => 10.4d0

or the Lisp-Stat function sequence-maximum to find the maximum:

(sequence-maximum mtcars:mpg) ;; => 33.9d0

or perhaps you’d prefer alexandria:extremum, a general-purpose tool to find the minimum in a different way:

(extremum mtcars:mpg #'<) ;; => 10.4d0

The important thing to note is that mtcars:mpg is a standard Common Lisp vector and you can manipulate it like one.
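A few more standard sequence functions illustrate the point. The sketch below uses only the first six MPG values from the listing above, stored in a made-up variable *mpg-sample*, so that it runs without Lisp-Stat loaded.

```lisp
(defparameter *mpg-sample* #(21 21 22.8d0 21.4d0 18.7d0 18.1d0)
  "First six MPG values from the listing above.")

(length *mpg-sample*)                          ; => 6
(reduce #'max *mpg-sample*)                    ; => 22.8d0
(count-if (lambda (x) (> x 20)) *mpg-sample*)  ; => 4
(subseq *mpg-sample* 0 3)                      ; => #(21 21 22.8d0)

;; SORT is destructive, so sort a copy and leave the original intact.
(sort (copy-seq *mpg-sample*) #'<)
;; => #(18.1d0 18.7d0 21 21 21.4d0 22.8d0)
```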

Mean & standard deviation

(mean mtcars:mpg) ;; => 20.090625000000003d0
(standard-deviation mtcars:mpg) ;; => 5.932029552301219d0
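To make the two statistics concrete, here is a plain Common Lisp sketch that computes them from the 32 MPG values printed earlier. The names simple-mean and simple-sd and the divisor-offset keyword are made up for this example; it is not Lisp-Stat's implementation. Judging only from the numbers, the value shown above appears consistent with dividing by n rather than n - 1.

```lisp
(defparameter *mpg*
  #(21 21 22.8d0 21.4d0 18.7d0 18.1d0 14.3d0 24.4d0 22.8d0 19.2d0 17.8d0
    16.4d0 17.3d0 15.2d0 10.4d0 10.4d0 14.7d0 32.4d0 30.4d0 33.9d0 21.5d0
    15.5d0 15.2d0 13.3d0 19.2d0 27.3d0 26 30.4d0 15.8d0 19.7d0 15 21.4d0)
  "The 32 MPG values printed above.")

(defun simple-mean (xs)
  "Arithmetic mean of the non-empty sequence XS."
  (/ (reduce #'+ xs) (length xs)))

(defun simple-sd (xs &key (divisor-offset 0))
  "Standard deviation of XS. The sum of squared deviations is divided
by (- n DIVISOR-OFFSET): 0 gives the population form, 1 the sample form."
  (let* ((n (length xs))
         (m (simple-mean xs))
         (ss (reduce #'+ xs :key (lambda (x) (expt (- x m) 2)))))
    (sqrt (/ ss (- n divisor-offset)))))

(simple-mean *mpg*)                  ; approximately 20.0906
(simple-sd *mpg*)                    ; approximately 5.932 (divide by n)
(simple-sd *mpg* :divisor-offset 1)  ; approximately 6.027 (divide by n - 1)
```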

Summarise

You can summarise a column with the summarize-column function:

LS-USER> (summarize-column 'mtcars:mpg)

MPG (Miles/(US) gallon)
 n: 32
 missing: 0
 min=10.40
 q25=15.40
 q50=19.20
 mean=20.09
 q75=22.80
 max=33.90

or the entire data frame:

LS-USER> (summary mtcars)
(

MPG (Miles/(US) gallon)
 n: 32
 missing: 0
 min=10.40
 q25=15.40
 q50=19.20
 mean=20.09
 q75=22.80
 max=33.90

CYL (Number of cylinders)
14 (44%) x 8, 11 (34%) x 4, 7 (22%) x 6,

DISP (Displacement (cu.in.))
 n: 32
 missing: 0
 min=71.10
 q25=120.65
 q50=205.87
 mean=230.72
 q75=334.00
 max=472.00

HP (Gross horsepower)
 n: 32
 missing: 0
 min=52
 q25=96.00
 q50=123
 mean=146.69
 q75=186.25
 max=335

DRAT (Rear axle ratio)
 n: 32
 missing: 0
 min=2.76
 q25=3.08
 q50=3.70
 mean=3.60
 q75=3.95
 max=4.93

WT (Weight (1000 lbs))
 n: 32
 missing: 0
 min=1.51
 q25=2.54
 q50=3.33
 mean=3.22
 q75=3.68
 max=5.42

QSEC (1/4 mile time)
 n: 32
 missing: 0
 min=14.50
 q25=16.88
 q50=17.71
 mean=17.85
 q75=18.90
 max=22.90

VS (Engine (0=v-shaped, 1=straight))
ones: 14 (44%)

AM (Transmission (0=automatic, 1=manual))
ones: 13 (41%)

GEAR (Number of forward gears)
15 (47%) x 3, 12 (38%) x 4, 5 (16%) x 5,

CARB (Number of carburetors)
10 (31%) x 4, 10 (31%) x 2, 7 (22%) x 1, 3 (9%) x 3, 1 (3%) x 6, 1 (3%) x 8, )

Recall that the column named model is treated specially; notice that it is not included in the summary. You can see why it’s excluded by examining the column’s summary:

LS-USER> (pprint (summarize-column 'mtcars:model))
1 (3%) x "Mazda RX4", 1 (3%) x "Mazda RX4 Wag", 1 (3%) x "Datsun 710", 1 (3%) x "Hornet 4 Drive", 1 (3%) x "Hornet Sportabout", 1 (3%) x "Valiant", 1 (3%) x "Duster 360", 1 (3%) x "Merc 240D", 1 (3%) x "Merc 230", 1 (3%) x "Merc 280", 1 (3%) x "Merc 280C", 1 (3%) x "Merc 450SE", 1 (3%) x "Merc 450SL", 1 (3%) x "Merc 450SLC", 1 (3%) x "Cadillac Fleetwood", 1 (3%) x "Lincoln Continental", 1 (3%) x "Chrysler Imperial", 1 (3%) x "Fiat 128", 1 (3%) x "Honda Civic", 1 (3%) x "Toyota Corolla", 1 (3%) x "Toyota Corona", 1 (3%) x "Dodge Challenger", 1 (3%) x "AMC Javelin", 1 (3%) x "Camaro Z28", 1 (3%) x "Pontiac Firebird", 1 (3%) x "Fiat X1-9", 1 (3%) x "Porsche 914-2", 1 (3%) x "Lotus Europa", 1 (3%) x "Ford Pantera L", 1 (3%) x "Ferrari Dino", 1 (3%) x "Maserati Bora", 1 (3%) x "Volvo 142E",

Columns with unique values in each row aren’t very interesting.

“Use” a data frame

By use-ing a data frame package you can avoid the use of the package qualifier symbol : and directly refer to the variable name. This is similar to R’s attach function.

(use-package 'mtcars)
(mean mpg) ;; => 20.090625000000003d0

The unuse-package function stops using the symbols from the data frame:

(unuse-package 'mtcars)
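The mechanism behind this is the ordinary Common Lisp package system. The sketch below reproduces it with two made-up packages, DEMO-FRAME standing in for a data frame package and DEMO-USER for the user package; it assumes only that a data frame package behaves like any other package with exported symbols.

```lisp
;; DEMO-FRAME stands in for a data frame package such as MTCARS.
(defpackage :demo-frame (:use) (:export #:mpg))
(defpackage :demo-user (:use :common-lisp))

;; Store a value in the exported symbol, as a column variable would hold data.
(setf (symbol-value (find-symbol "MPG" :demo-frame)) #(21 21 22.8d0))

(defun mpg-status ()
  "Report how the name MPG is accessible from DEMO-USER."
  (nth-value 1 (find-symbol "MPG" :demo-user)))

(use-package :demo-frame :demo-user)
(mpg-status)  ; => :INHERITED, so MPG needs no package qualifier

(unuse-package :demo-frame :demo-user)
(mpg-status)  ; => NIL, the symbol is no longer visible unqualified
```

Passing the second argument keeps the example from touching the current package; at the REPL, as in the calls above, it defaults to the package you are working in.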

Saving data

To save a data frame to a CSV file, use the write-csv method. Here we save mtcars into the Lisp-Stat datasets directory, including the column names:

(write-csv
	mtcars #P"LS:DATASETS;mtcars.csv"
	:add-first-row t)