Getting Started

From installation to plotting in five minutes

Prerequisites

  • SBCL or CCL Common Lisp
  • macOS or Windows 10
  • Quicklisp
  • Chrome

Load & Configure

First, load Lisp-Stat, the plotting libraries, and the data, then configure the environment.

Lisp-Stat

(ql:quickload :lisp-stat)
(in-package :ls-user)

Vega-Lite

(ql:quickload :plot/vglt)

Data

(define-data-frame cars
  (vglt:vl-to-df
    (dex:get
      "https://raw.githubusercontent.com/vega/vega-datasets/master/data/cars.json"
      :want-stream t)))

View

Print the data frame (the first 25 rows are shown by default):

(pprint cars)
;; ORIGIN YEAR       ACCELERATION WEIGHT_IN_LBS HORSEPOWER DISPLACEMENT CYLINDERS MILES_PER_GALLON NAME
;; USA    1970-01-01         12.0          3504        130        307.0         8             18.0 chevrolet chevelle malibu
;; USA    1970-01-01         11.5          3693        165        350.0         8             15.0 buick skylark 320
;; USA    1970-01-01         11.0          3436        150        318.0         8             18.0 plymouth satellite
;; USA    1970-01-01         12.0          3433        150        304.0         8             16.0 amc rebel sst
;; USA    1970-01-01         10.5          3449        140        302.0         8             17.0 ford torino
;; USA    1970-01-01         10.0          4341        198        429.0         8             15.0 ford galaxie 500
;; USA    1970-01-01          9.0          4354        220        454.0         8             14.0 chevrolet impala
;; USA    1970-01-01          8.5          4312        215        440.0         8             14.0 plymouth fury iii
;; USA    1970-01-01         10.0          4425        225        455.0         8             14.0 pontiac catalina
;; USA    1970-01-01          8.5          3850        190        390.0         8             15.0 amc ambassador dpl
;; Europe 1970-01-01         17.5          3090        115        133.0         4 NIL              citroen ds-21 pallas
;; USA    1970-01-01         11.5          4142        165        350.0         8 NIL              chevrolet chevelle concours (sw)
;; USA    1970-01-01         11.0          4034        153        351.0         8 NIL              ford torino (sw)
;; USA    1970-01-01         10.5          4166        175        383.0         8 NIL              plymouth satellite (sw)
;; USA    1970-01-01         11.0          3850        175        360.0         8 NIL              amc rebel sst (sw)
;; USA    1970-01-01         10.0          3563        170        383.0         8             15.0 dodge challenger se
;; USA    1970-01-01          8.0          3609        160        340.0         8             14.0 plymouth 'cuda 340
;; USA    1970-01-01          8.0          3353        140        302.0         8 NIL              ford mustang boss 302
;; USA    1970-01-01          9.5          3761        150        400.0         8             15.0 chevrolet monte carlo
;; USA    1970-01-01         10.0          3086        225        455.0         8             14.0 buick estate wagon (sw)
;; Japan  1970-01-01         15.0          2372         95        113.0         4             24.0 toyota corona mark ii
;; USA    1970-01-01         15.5          2833         95        198.0         6             22.0 plymouth duster
;; USA    1970-01-01         15.5          2774         97        199.0         6             18.0 amc hornet
;; USA    1970-01-01         16.0          2587         85        200.0         6             21.0 ford maverick                 ..

Show the last few rows:

(tail cars)
;; ORIGIN YEAR       ACCELERATION WEIGHT_IN_LBS HORSEPOWER DISPLACEMENT CYLINDERS MILES_PER_GALLON NAME
;; USA    1982-01-01         17.3          2950         90          151         4               27 chevrolet camaro
;; USA    1982-01-01         15.6          2790         86          140         4               27 ford mustang gl
;; Europe 1982-01-01         24.6          2130         52           97         4               44 vw pickup
;; USA    1982-01-01         11.6          2295         84          135         4               32 dodge rampage
;; USA    1982-01-01         18.6          2625         79          120         4               28 ford ranger
;; USA    1982-01-01         19.4          2720         82          119         4               31 chevy s-10

Statistics

Look at a few statistics on the data set.

(mean cars:acceleration) ; => 15.5197
(summary cars)
  CARS:MILES_PER_GALLON
                        398 reals, min=9, q25=17.33333317438761d0,
                        q50=22.727271751923993d0, q75=29.14999923706055d0,
                        max=46.6d0;
                        8 (2%) x "NIL"
  CARS:CYLINDERS
                 207 (51%) x 4,
                 108 (27%) x 8,
                 84 (21%) x 6,
                 4 (1%) x 3,
                 3 (1%) x 5
  CARS:DISPLACEMENT
                    406 reals, min=68, q25=104.25, q50=147.92307,
                    q75=277.76923, max=455
  CARS:HORSEPOWER
                  400 reals, min=46, q25=75.77778, q50=94.33333, q75=129.57143,
                  max=230;
                  6 (1%) x "NIL"
  CARS:WEIGHT_IN_LBS
                     406 reals, min=1613, q25=2226, q50=2822.5, q75=3620,
                     max=5140
  CARS:ACCELERATION
                    406 reals, min=8, q25=13.674999999999999d0, q50=15.45d0,
                    q75=17.16666632692019d0, max=24.8d0
  CARS:YEAR
            61 (15%) x "1982-01-01",
            40 (10%) x "1973-01-01",
            36 (9%) x "1978-01-01",
            35 (9%) x "1970-01-01",
            34 (8%) x "1976-01-01",
            30 (7%) x "1975-01-01",
            29 (7%) x "1971-01-01",
            29 (7%) x "1979-01-01",
            29 (7%) x "1980-01-01",
            28 (7%) x "1972-01-01",
            28 (7%) x "1977-01-01",
            27 (7%) x "1974-01-01"
  CARS:ORIGIN
              254 (63%) x "USA", 79 (19%) x "Japan", 73 (18%) x "Europe">

Note: The car models, essentially the row names, have been removed from the summary.

Plot

Create a scatter plot specification with default values:

(defparameter cars-plot (vglt:scatter-plot cars "HORSEPOWER" "MILES_PER_GALLON"))

Render the plot:

(plot:plot-from-file (vglt:save-plot 'cars-plot))

1 - Installation

Automated and manual installation

New to Lisp

If you are new to Lisp and want to get started as quickly as possible, Portacle is probably your best option. Portacle is a multi-platform IDE for Common Lisp that includes Emacs, SBCL, Git, and Quicklisp, all configured and ready to use.

If you are an existing Emacs user, you can configure Emacs for Common Lisp.

Users new to Lisp should also consider going through the basic tutorial, which guides you step by step through the basics of working with Lisp as a statistics practitioner.

Experienced with Lisp

We assume an experienced user will have their own Emacs and Lisp implementation and will want to install according to their own tastes and setup. The repo links you need are below, or you can install with Quicklisp.

Prerequisites

All that is needed is an ANSI Common Lisp implementation. Development is done with CCL and SBCL. Other implementations should work, but have not been tested.

Installation

Automated install

The easiest way to install Lisp-Stat is with Quicklisp:

(ql:quickload :lisp-stat)

Manual install

If you want to modify Lisp-Stat, you’ll need to retrieve the files from GitHub and place them in a directory that is known to Quicklisp. This long shell command will check out all the required systems:

cd ~/quicklisp/local-projects && \
git clone https://github.com/Lisp-Stat/data-frame.git && \
git clone https://github.com/Lisp-Stat/dfio.git && \
git clone https://github.com/Lisp-Stat/special-functions.git && \
git clone https://github.com/Lisp-Stat/numerical-utilities.git && \
git clone https://github.com/Lisp-Stat/documentation.git && \
git clone https://github.com/Lisp-Stat/plot.git && \
git clone https://github.com/Lisp-Stat/select.git && \
git clone https://github.com/Lisp-Stat/lisp-stat.git

The command above assumes the default Quicklisp installation directory; adjust the path if yours differs. If Quicklisp reports that it cannot find the systems, try this at the REPL:

(ql:register-local-projects)
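
If the systems are still not found, it can help to check what Quicklisp actually sees. The following is a minimal sketch, assuming Quicklisp is already loaded in the running image; it only uses the Quicklisp client functions named in the comments.

```lisp
;; Assumes Quicklisp is loaded; evaluate these forms one at a time at the REPL.

;; Rescan the local-projects directories for new or moved systems.
(ql:register-local-projects)

;; The directories Quicklisp scans for local projects.
ql:*local-project-directories*

;; The source directory of the system, or NIL if it cannot be found.
(ql:where-is-system :lisp-stat)
```

If the last form returns a directory under local-projects, the manual checkout is the one being loaded.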

Documentation

Lisp-Stat reference manuals are generated with the declt system. This produces high quality PDF, markdown, HTML and Info output. The API reference manuals are available in HTML in the reference section of this website; PDF and Info files can be downloaded from each system's docs/ directory.

You can install the Info manuals into the Emacs help system, which allows searching and browsing from within the editing environment. To do this, use the install-info command. As an example, on my MS Windows 10 machine, with an MSYS2/Emacs installation:

install-info --add-once select.info /c/msys64/mingw64/share/info/dir

installs the select manual into a Lisp-Stat node at the top level of the info tree.

Try it out

Load Lisp-Stat:

(ql:quickload :lisp-stat)

Change to the Lisp-Stat user package:

(in-package :ls-user)

Load some data:

(load #P"LS:DATASETS;CAR-PRICES")

Find the sample mean and median:

(mean car-prices)
(median car-prices)

Next steps

Get Started
Examples
R Users

2 - Data Frame

Getting started with data frames

Load data

We will use one of the example data sets from R, mtcars, for these examples. First, load Lisp-Stat and the R data libraries, and switch into the Lisp-Stat package:

(ql:quickload :lisp-stat)
(ql:quickload :lisp-stat/rdata)
(in-package   :ls-user)

Now define the data frame, naming it mtcars:

(define-data-frame mtcars
  (read-csv (rdata:rdata 'rdata:datasets 'rdata:mtcars)))
;;WARNING: Missing column name was filled in
;;#<DATA-FRAME (32 observations of 12 variables)>

This macro defines a global variable named mtcars and sets up some convenience functions.

Examine data

Lisp-Stat’s printing system is integrated with the Common Lisp Pretty Printing facility. By default Lisp-Stat sets *print-pretty* to nil.

Basic information

Type the name of the data frame at the REPL to get a simple one-line summary.

mtcars ;; => #<DATA-FRAME (32 observations of 12 variables)>

Printing data

By default, head returns the first 6 rows:

(head mtcars)
;;   X1                 MPG CYL DISP  HP DRAT    WT  QSEC VS AM GEAR CARB
;; 0 Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
;; 1 Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
;; 2 Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
;; 3 Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
;; 4 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
;; 5 Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

and tail returns the last 6 rows:

(tail mtcars)
;;   X1              MPG CYL  DISP  HP DRAT    WT QSEC VS AM GEAR CARB
;; 0 Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
;; 1 Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
;; 2 Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
;; 3 Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
;; 4 Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
;; 5 Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2

pprint can be used to print the whole data frame:

(pprint mtcars)

;;    X1                   MPG CYL  DISP  HP DRAT    WT  QSEC VS AM GEAR CARB
;;  0 Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
;;  1 Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
;;  2 Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
;;  3 Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
;;  4 Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
;;  5 Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
;;  6 Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
;;  7 Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
;;  8 Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
;;  9 Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
;; 10 Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
;; 11 Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
;; 12 Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
;; 13 Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
;; 14 Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
;; 15 Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
;; 16 Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
;; 17 Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
;; 18 Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
;; 19 Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
;; 20 Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
;; 21 Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
;; 22 AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
;; 23 Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4 ..

The two dots “..” at the end indicate that the output has been truncated. Lisp-Stat sets the pretty printer variable *print-lines* to 25 by default, and output longer than this is truncated. To print all rows, set *print-lines* to nil.
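
The truncation is standard pretty printer behaviour and can be seen without a data frame. The following is a minimal sketch in plain Common Lisp; the helper pprint-to-string and the variable *numbers* are hypothetical names used only for illustration.

```lisp
;; Standard Common Lisp only; no Lisp-Stat required.
(defun pprint-to-string (object &key lines)
  "Pretty print OBJECT to a string, keeping at most LINES lines (NIL = no limit)."
  (let ((*print-pretty* t)
        (*print-right-margin* 20)   ; narrow margin to force line breaks
        (*print-lines* lines))
    (with-output-to-string (out)
      (pprint object out))))

(defparameter *numbers* (loop for i from 1 to 40 collect (* i 1000)))

(pprint-to-string *numbers* :lines 3)    ; truncated; the cut is marked with ".."
(pprint-to-string *numbers* :lines nil)  ; complete output

;; With the data frame from this page, the same idea would be:
;; (let ((*print-lines* nil)) (pprint mtcars))
```

Binding *print-lines* with let, rather than setting it globally, restores the default as soon as the form returns.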

Notice the column named X1. This is the name the import function gave to a column that had no name in the source file, which is what the warning issued during the import refers to. Columns with missing names are named X1, X2, …, Xn in increasing order for the duration of the Lisp-Stat session.

This column is actually the row name, so we’ll rename it:

(replace-key mtcars row-name x1)

and view the results:

(head mtcars)
;;   ROW-NAME           MPG CYL DISP  HP DRAT    WT  QSEC VS AM GEAR CARB
;; 0 Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
;; 1 Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
;; 2 Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
;; 3 Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
;; 4 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
;; 5 Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Column names

To see the names of the columns, use the column-names function:

(column-names mtcars)
;; => ("ROW-NAME" "MPG" "CYL" "DISP" "HP" "DRAT" "WT" "QSEC" "VS" "AM" "GEAR" "CARB")

Dimensions

We saw the dimensions above in basic information. That was printed for human consumption. To get the values in a form suitable for passing to other functions, use the dims function:

(aops:dims mtcars) ;; => (32 12)

Common Lisp specifies dimensions in row-column order, so mtcars has 32 rows and 12 columns.
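
The same row-column convention applies to ordinary Common Lisp arrays. As a small illustration in plain Common Lisp, using a hypothetical 32 x 12 array named *grid* in place of the data frame:

```lisp
;; Standard Common Lisp only; *grid* is a stand-in with the same shape as mtcars.
(defparameter *grid* (make-array '(32 12) :initial-element 0))

(array-dimensions *grid*)  ; => (32 12), rows first, then columns

;; Because the result is a plain list, it can be destructured directly.
(destructuring-bind (rows columns) (array-dimensions *grid*)
  (format nil "~D rows, ~D columns" rows columns))
;; => "32 rows, 12 columns"
```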

Basic Statistics

Minimum & Maximum

To get the minimum or maximum of a column, say mpg, you can use several Common Lisp approaches. Let’s see what mpg looks like by typing the name of the column into the REPL:

mtcars:mpg
;; => #(21 21 22.8d0 21.4d0 18.7d0 18.1d0 14.3d0 24.4d0 22.8d0 19.2d0 17.8d0 16.4d0 17.3d0 15.2d0 10.4d0 10.4d0 14.7d0 32.4d0 30.4d0 33.9d0 21.5d0 15.5d0 15.2d0 13.3d0 19.2d0 27.3d0 26 30.4d0 15.8d0 19.7d0 15 21.4d0)

You could, for example, use something like this to find the minimum:

(reduce #'min mtcars:mpg) ;; => 10.4d0

or the Lisp-Stat function sequence-maximum to find the maximum:

(sequence-maximum mtcars:mpg) ;; => 33.9d0

or perhaps you’d prefer alexandria:extremum, a general-purpose tool to find the minimum in a different way:

(extremum mtcars:mpg #'<) ;; => 10.4d0

The important thing to note is that mtcars:mpg is a standard Common Lisp vector and you can manipulate it like one.
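
To illustrate, here are a few standard sequence functions applied to the same values. This sketch copies the vector printed above into a hypothetical variable *mpg* so that it runs without Lisp-Stat; with the data frame loaded, mtcars:mpg could be used in its place.

```lisp
;; Standard Common Lisp only; *mpg* holds the values printed for mtcars:mpg.
(defparameter *mpg*
  #(21 21 22.8d0 21.4d0 18.7d0 18.1d0 14.3d0 24.4d0 22.8d0 19.2d0 17.8d0
    16.4d0 17.3d0 15.2d0 10.4d0 10.4d0 14.7d0 32.4d0 30.4d0 33.9d0 21.5d0
    15.5d0 15.2d0 13.3d0 19.2d0 27.3d0 26 30.4d0 15.8d0 19.7d0 15 21.4d0))

(length *mpg*)                            ; => 32
(reduce #'max *mpg*)                      ; => 33.9d0
(count-if (lambda (x) (> x 25)) *mpg*)    ; => 6 cars above 25 mpg
(position (reduce #'min *mpg*) *mpg*)     ; => 14, index of the first minimum

;; SORT is destructive, so sort a copy and leave the column untouched.
(subseq (sort (copy-seq *mpg*) #'<) 0 3)  ; => #(10.4d0 10.4d0 13.3d0)
```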

Mean & standard deviation

(mean mtcars:mpg) ;; => 20.090625000000003d0
(standard-deviation mtcars:mpg) ;; => 5.932029552301219d0
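
As a cross-check, both figures can be reproduced by hand in plain Common Lisp. The helpers simple-mean and simple-sd below are hypothetical names, not part of Lisp-Stat, and *mpg* is a copy of the column values shown earlier. The arithmetic suggests that the standard deviation reported above corresponds to dividing by n; dividing by n-1 gives a slightly larger value. This is an observation from the numbers, not a statement about Lisp-Stat's internals.

```lisp
;; Standard Common Lisp only; *mpg* holds the values printed for mtcars:mpg.
(defparameter *mpg*
  #(21 21 22.8d0 21.4d0 18.7d0 18.1d0 14.3d0 24.4d0 22.8d0 19.2d0 17.8d0
    16.4d0 17.3d0 15.2d0 10.4d0 10.4d0 14.7d0 32.4d0 30.4d0 33.9d0 21.5d0
    15.5d0 15.2d0 13.3d0 19.2d0 27.3d0 26 30.4d0 15.8d0 19.7d0 15 21.4d0))

(defun simple-mean (xs)
  "Arithmetic mean of the numbers in the sequence XS."
  (/ (reduce #'+ xs) (length xs)))

(defun simple-sd (xs &key (divisor :n))
  "Standard deviation of XS. DIVISOR is :N (population) or :N-1 (sample)."
  (let* ((n (length xs))
         (m (simple-mean xs))
         ;; Sum of squared deviations from the mean.
         (ss (reduce #'+ xs :key (lambda (x) (expt (- x m) 2)))))
    (sqrt (/ ss (ecase divisor
                  (:n n)
                  (:n-1 (1- n)))))))

(simple-mean *mpg*)              ; about 20.09
(simple-sd *mpg* :divisor :n)    ; about 5.932
(simple-sd *mpg* :divisor :n-1)  ; about 6.027
```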

Summarise

You can summarise a column with the column-summary function:

(column-summary mtcars:mpg)
;; => 32 reals, min=10.4d0, q25=15.399999698003132d0, q50=19.2d0, q75=22.8d0, max=33.9d0

or the entire data frame:

(summary mtcars)
#<DATA-FRAME (12 x 32)
  MTCARS:CARB
              10 (31%) x 4,
              10 (31%) x 2,
              7 (22%) x 1,
              3 (9%) x 3,
              1 (3%) x 6,
              1 (3%) x 8
  MTCARS:GEAR
              15 (47%) x 3, 12 (38%) x 4, 5 (16%) x 5
  MTCARS:AM bits, ones: 13 (41%)
  MTCARS:VS bits, ones: 14 (44%)
  MTCARS:QSEC
              32 reals, min=14.5d0, q25=16.884999999999998d0, q50=17.71d0,
              q75=18.9d0, max=22.9d0
  MTCARS:WT
            32 reals, min=1.513d0, q25=2.5425d0, q50=3.325d0,
            q75=3.6766665957371387d0, max=5.424d0
  MTCARS:DRAT
              32 reals, min=2.76d0, q25=3.08d0, q50=3.6950000000000003d0,
              q75=3.952000046730041d0, max=4.93d0
  MTCARS:HP
            32 reals, min=52, q25=96.0, q50=123, q75=186.25, max=335
  MTCARS:DISP
              32 reals, min=71.1d0, q25=120.65d0, q50=205.86666333675385d0,
              q75=334.0, max=472
  MTCARS:CYL
             14 (44%) x 8, 11 (34%) x 4, 7 (22%) x 6
  MTCARS:MPG
             32 reals, min=10.4d0, q25=15.399999698003132d0, q50=19.2d0,
             q75=22.8d0, max=33.9d0

A column named row-name is treated specially; notice that it is not included in the summary. You can see why it’s excluded by examining the column’s summary:

(pprint (column-summary mtcars:row-name))
1 (3%) x "Mazda RX4",
1 (3%) x "Mazda RX4 Wag",
1 (3%) x "Datsun 710",
1 (3%) x "Hornet 4 Drive",
1 (3%) x "Hornet Sportabout",
1 (3%) x "Valiant",
1 (3%) x "Duster 360",
1 (3%) x "Merc 240D",
1 (3%) x "Merc 230",
1 (3%) x "Merc 280",
1 (3%) x "Merc 280C",
1 (3%) x "Merc 450SE",
1 (3%) x "Merc 450SL",
1 (3%) x "Merc 450SLC",
1 (3%) x "Cadillac Fleetwood",
1 (3%) x "Lincoln Continental",
1 (3%) x "Chrysler Imperial",
1 (3%) x "Fiat 128",
1 (3%) x "Honda Civic",
1 (3%) x "Toyota Corolla",
1 (3%) x "Toyota Corona",
1 (3%) x "Dodge Challenger",
1 (3%) x "AMC Javelin",
1 (3%) x "Camaro Z28", ..

Columns with unique values in each row aren’t very interesting.

“Use” a data frame

By use-ing a data frame package you can avoid the use of the package qualifier symbol : and directly refer to the variable name. This is similar to R’s attach function.

(use-package 'mtcars)
(mean mpg) ;; => 20.090625000000003d0

The unuse-package function stops using the symbols from the data frame:

(unuse-package 'mtcars)
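
The mechanism behind this is the standard Common Lisp package system. The following sketch mimics it with two hand-made packages, demo-cars and demo-session; both names are hypothetical and stand in for the data frame package and your working package. It uses find-symbol to show when the column symbol is visible.

```lisp
;; Standard Common Lisp only; package names are hypothetical stand-ins.
(defpackage :demo-cars
  (:use)                 ; inherit nothing
  (:export #:mpg))       ; the "column" is an exported symbol

(defparameter demo-cars:mpg #(21 22.8d0 18.7d0))

(defpackage :demo-session
  (:use :cl))

;; Before use-package, MPG is not visible in DEMO-SESSION.
(find-symbol "MPG" :demo-session)        ; => NIL, NIL

;; After use-package, the exported symbol is inherited.
(use-package :demo-cars :demo-session)
(find-symbol "MPG" :demo-session)        ; => DEMO-CARS:MPG, :INHERITED
(reduce #'max (symbol-value (find-symbol "MPG" :demo-session)))  ; => 22.8d0

;; unuse-package removes the inheritance again.
(unuse-package :demo-cars :demo-session)
(find-symbol "MPG" :demo-session)        ; => NIL, NIL
```

Qualified access such as demo-cars:mpg works throughout, whether or not the package is in use.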

Saving data

To save a data frame to a CSV file, use the data-frame-to-csv method. Here we save mtcars into the Lisp-Stat datasets directory, including the column names:

(data-frame-to-csv mtcars
                   :stream #P"LS:DATASETS;mtcars.csv"
                   :add-first-row t)