Documentation

This section contains user documentation for Lisp-Stat. It is designed for technical users who wish to understand how to use Lisp-Stat to perform statistical analysis.

Other content, such as marketing material, case studies, and community updates, is in the About and Community pages.

1 - What is Lisp-Stat?

A statistical computing environment written in Common Lisp

Lisp-Stat is a domain-specific language (DSL) for statistical analysis and machine learning. It is targeted at statistics practitioners with little or no programming experience.

Lisp has a history of being deployed for domain experts to use, and it’s a great language for beginners; the Symbolics Graphics Division wrote the software used by graphic artists to develop scenes in several films prior to the rise of Pixar. One of the first statistical systems developed, XLisp-Stat, was a contemporary of R until its primary author joined the ‘R Core’ group.

Raisons d’être

There are several reasons to prefer Lisp-Stat over R or Python. The first is that it is fast. Lisp compilers produce native executable code that is nearly as fast as C. The Common Lisp numerical tower has support for rational numbers, which is a natural way to work with samples. For example, an experiment may produce 11598 positives out of a sample of 25000. With exact rational arithmetic there is no need to force everything to a float; the value is just what the experiment said: 11598/25000.
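For example, at the REPL:

```lisp
(/ 11598 25000)         ; => 5799/12500, the exact sample proportion
(float (/ 11598 25000)) ; => 0.46392, only when you explicitly ask for a float
```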

Not only does Common Lisp provide a compiler that produces machine code, it has native threading, a rich ecosystem of code libraries, and a history of industrial deployments, including:

  • Credit card authorisation at Amex (Authorizer’s Assistant)
  • US DoD logistics (and more that we do not know of)
  • The CIA and NSA are big users, judging by Lisp sales
  • D-Wave, HSL and Rigetti use Lisp for programming their quantum computers
  • Apple’s Siri was originally written in Lisp
  • Amazon got started with Lisp & C; so did Y Combinator
  • Google’s flight search engine is written in Common Lisp
  • AT&T used a stripped down version of Symbolics Lisp to process CDRs in the first IP telephony switches

Core Systems

Lisp-Stat is composed of several systems (projects), each independently useful and brought together under the Lisp-Stat umbrella. Dependencies between systems have been minimised to the extent possible so you can use them individually without importing all of Lisp-Stat.

Data-Frame

A data frame is a data structure conceptually similar to an R data frame. It provides column-centric storage for data sets: each named column contains the values for one variable, and each row contains one set of observations. For data frames, we use the ‘tibble’ from the tidyverse as inspiration for functionality.

Data frames can contain values of any type. If desired, additional attributes, such as the numerical type, unit and other information, may be attached to the variable for convenience or efficiency. For example, you could specify a unit, say metres, to ensure that mathematical operations on that variable always produce lengths (though the unit may change).

DFIO

The Data Frame I/O system provides input and output operations for data frames. A data frame may be written to and read from files, strings or streams, including network streams or relational databases.

Select

Select is a facility for selecting portions of sequences or arrays. It provides:

  • An API for making selections (elements selected by the Cartesian product of vectors of subscripts for each axis) of array-like objects. The most important function is select. Unless you want to define additional methods for select, this is pretty much all you need from this library.
  • An extensible DSL for selecting a subset of valid subscripts. This is useful if, for example, you want to resolve column names in a data frame in your implementation of select, or to implement filtering based on row values.
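As a brief sketch of the select API (the values shown assume the standard behaviour of the select library):

```lisp
(select #(a b c d e) 1)         ; single subscript => B
(select #(a b c d e) #(0 2 4))  ; vector of subscripts => #(A C E)

;; For arrays, supply one selection per axis; T selects everything on that axis.
(defparameter *m* #2A((1 2 3)
                      (4 5 6)))
(select *m* t 1)                ; all rows, column 1 => #(2 5)
(select *m* 0 t)                ; row 0, all columns => #(1 2 3)
```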

Array Operations

This library is a collection of functions and macros for manipulating Common Lisp arrays and performing numerical calculations with them. The library provides shorthand codes for frequently used operations, displaced array functions, indexing, transformations, generation, permutation and reduction of columns. Array operations may also be applied to data frames, and data frames may be converted to/from arrays.
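Two hedged examples using the array-operations (aops) package; the calls below are drawn from that library’s documented API:

```lisp
;; Flatten an array into a vector (row-major order)
(aops:flatten #2A((1 2) (3 4)))         ; => #(1 2 3 4)

;; Permute axes; the permutation '(1 0) transposes a matrix
(aops:permute '(1 0) #2A((1 2) (3 4)))  ; => #2A((1 3) (2 4))
```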

Special Functions

This library implements numerical special functions in Common Lisp, with a focus on high-accuracy double-float calculations. These functions are the basis for the statistical distribution functions, e.g. gamma, beta, etc.

Cephes

Cephes.cl is a CFFI wrapper over the Cephes Math Library, a high-quality C implementation of statistical functions. We use this both as an accuracy check (Boost uses these functions to check its accuracy too), and to fill in the gaps where we don’t yet have Common Lisp implementations of these functions.

Numerical Utilities

Numerical Utilities is the base system that most others depend on. It is a collection of packages providing:

  • num= and related comparison operators for floats
  • simple arithmetic functions, like sum and l2norm
  • element-wise operations for arrays and vectors
  • intervals
  • special matrices and shorthand for their input
  • sample statistics
  • Chebyshev polynomials
  • quadratures
  • univariate root finding
  • Horner’s, Simpson’s and other functions for numerical analysis
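For example, the comparison and arithmetic helpers can be used like this (a sketch assuming the num-utils package is available under the nu nickname):

```lisp
(nu:num= 1.0 1.0000001 1e-4) ; => T, equal within tolerance 1e-4
(nu:sum #(1 2 3 4))          ; => 10
```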

Lisp-Stat

This is the top-level system that uses the other packages to create a statistical computing environment. It is also the location for the ‘unified’ interface, where the holes are plugged with third-party packages. For example, cl-mathstats contains functionality not yet in Lisp-Stat; however, its architecture does not lend itself well to incorporation via an ASDF depends-on, so as we consolidate the libraries, missing functionality will be placed in the Lisp-Stat system. Eventually parts of numerical-utilities, especially the statistics functions, will be relocated here.

Acknowledgements

Tamas Papp was the original author of many of these libraries. Starting with relatively clean, working, code that solves real-world problems was a great start to the development of Lisp-Stat.

What next?

2 - Getting Started

From install to plotting in five minutes

The Easy Way

You can use an OCI image or pre-built notebook in the cloud for an instant start. See installation for how.

The CLI Way

If you have a working installation of SBCL, Google Chrome and Quicklisp you can be up and running in 5 minutes.

Prerequisites

  • Steel Bank Common Lisp (SBCL) or CCL
  • MacOS, Linux or Windows 10+
  • Quicklisp
  • Chrome, Firefox or Edge

Loading

First, load Lisp-Stat, Plot and the sample data. We will use Quicklisp for this; it will download the system if it isn’t already available, then compile and load it.

Lisp-Stat

(ql:quickload :lisp-stat)
(in-package :ls-user) ; access to Lisp-Stat functions

Plotting

(ql:quickload :plot/vega)

Data

(data :vgcars)

View

Print the vgcars data frame (showing the first 25 rows by default):

(print-data vgcars)

;; ORIGIN YEAR ACCELERATION WEIGHT_IN_LBS HORSEPOWER DISPLACEMENT CYLINDERS MILES_PER_GALLON NAME
;; USA    1970-01-01 12.0 3504 130 307.0 8 18.0 chevrolet chevelle malibu
;; USA    1970-01-01 11.5 3693 165 350.0 8 15.0 buick skylark 320
;; USA    1970-01-01 11.0 3436 150 318.0 8 18.0 plymouth satellite
;; USA    1970-01-01 12.0 3433 150 304.0 8 16.0 amc rebel sst
;; USA    1970-01-01 10.5 3449 140 302.0 8 17.0 ford torino
;; USA    1970-01-01 10.0 4341 198 429.0 8 15.0 ford galaxie 500
;; USA    1970-01-01  9.0 4354 220 454.0 8 14.0 chevrolet impala
;; USA    1970-01-01  8.5 4312 215 440.0 8 14.0 plymouth fury iii
;; USA    1970-01-01 10.0 4425 225 455.0 8 14.0 pontiac catalina
;; USA    1970-01-01  8.5 3850 190 390.0 8 15.0 amc ambassador dpl
;; Europe 1970-01-01 17.5 3090 115 133.0 4 NIL  citroen ds-21 pallas
;; USA    1970-01-01 11.5 4142 165 350.0 8 NIL  chevrolet chevelle concours (sw)
;; USA    1970-01-01 11.0 4034 153 351.0 8 NIL  ford torino (sw)
;; USA    1970-01-01 10.5 4166 175 383.0 8 NIL  plymouth satellite (sw)
;; USA    1970-01-01 11.0 3850 175 360.0 8 NIL  amc rebel sst (sw)
;; USA    1970-01-01 10.0 3563 170 383.0 8 15.0 dodge challenger se
;; USA    1970-01-01  8.0 3609 160 340.0 8 14.0 plymouth 'cuda 340
;; USA    1970-01-01  8.0 3353 140 302.0 8 NIL  ford mustang boss 302
;; USA    1970-01-01  9.5 3761 150 400.0 8 15.0 chevrolet monte carlo
;; USA    1970-01-01 10.0 3086 225 455.0 8 14.0 buick estate wagon (sw)
;; Japan  1970-01-01 15.0 2372  95 113.0 4 24.0 toyota corona mark ii
;; USA    1970-01-01 15.5 2833  95 198.0 6 22.0 plymouth duster
;; USA    1970-01-01 15.5 2774  97 199.0 6 18.0 amc hornet
;; USA    1970-01-01 16.0 2587  85 200.0 6 21.0 ford maverick ..

Show the last few rows:

(tail vgcars)

;; ORIGIN YEAR ACCELERATION WEIGHT_IN_LBS HORSEPOWER DISPLACEMENT CYLINDERS MILES_PER_GALLON NAME
;; USA    1982-01-01 17.3 2950 90 151 4 27 chevrolet camaro
;; USA    1982-01-01 15.6 2790 86 140 4 27 ford mustang gl
;; Europe 1982-01-01 24.6 2130 52  97 4 44 vw pickup
;; USA    1982-01-01 11.6 2295 84 135 4 32 dodge rampage
;; USA    1982-01-01 18.6 2625 79 120 4 28 ford ranger
;; USA    1982-01-01 19.4 2720 82 119 4 31 chevy s-10

Statistics

Look at a few statistics on the data set.

(mean vgcars:acceleration) ; => 15.5197

The summary command, which works on data frames or individual variables, summarises the variable. Below is a summary with some variables elided.

LS-USER> (summary vgcars)

"ORIGIN": 254 (63%) x "USA", 79 (19%) x "Japan", 73 (18%) x "Europe"

"YEAR": 61 (15%) x "1982-01-01", 40 (10%) x "1973-01-01", 36 (9%) x "1978-01-01",
35 (9%) x "1970-01-01", 34 (8%) x "1976-01-01", 30 (7%) x "1975-01-01",
29 (7%) x "1971-01-01", 29 (7%) x "1979-01-01", 29 (7%) x "1980-01-01",
28 (7%) x "1972-01-01", 28 (7%) x "1977-01-01", 27 (7%) x "1974-01-01"

ACCELERATION (1/4 mile time)
 n: 406 missing: 0 min=8 q25=13.67 q50=15.45 mean=15.52 q75=17.17 max=24.80

WEIGHT-IN-LBS (Weight in lbs)
 n: 406 missing: 0 min=1613 q25=2226 q50=2822.50 mean=2979.41 q75=3620 max=5140
...

Plot

Create a scatter plot specification comparing horsepower and miles per gallon:

(plot:plot
 (vega:defplot hp-mpg
   `(:title "Horsepower vs. MPG"
     :description "Horsepower vs miles per gallon for various cars"
     :data (:values ,vgcars)
     :mark :point
     :encoding (:x (:field :horsepower :type :quantitative)
                :y (:field :miles-per-gallon :type :quantitative)))))

2.1 - Installation

Installing and configuring Lisp-Stat

Notebook

Binder

The easiest way to get started is with the link above which will open a preconfigured notebook on mybinder.org.

Users new to lisp should also consider going through the Lisp-Stat basic tutorial, which guides you step-by-step through the basics of working with Lisp as a statistics practitioner.

OCI/Docker

You can also run a pre-built OCI image. This is a minimal Docker file:

FROM ghcr.io/lisp-stat/cl-jupyter:latest

Our images are based on Jupyter Docker Stacks and all of their documentation is applicable to the cl-jupyter image.

For a quickstart:

docker run -it -p 8888:8888 ghcr.io/lisp-stat/cl-jupyter:latest
# Entered start.sh with args: jupyter lab
# ...
# To access the server, open this file in a browser:
#     file:///home/jovyan/.local/share/jupyter/runtime/jpserver-7-open.html
# Or copy and paste one of these URLs:
#     http://eca4aa01751c:8888/lab?token=d4ac9278f5f5388e88097a3a8ebbe9401be206cfa0b83099
#     http://127.0.0.1:8888/lab?token=d4ac9278f5f5388e88097a3a8ebbe9401be206cfa0b83099

This command pulls the latest cl-jupyter image from ghcr.io if it is not already present on the local host. It then starts a container running a Jupyter Server with the JupyterLab frontend and exposes the server on host port 8888. The server logs appear in the terminal and include a URL to the server.

Initialization file

You can put customisations to your environment in either your implementation’s init file, or in a personal init file and load it from the implementation’s init file. For example, I keep my customisations in #P"~/ls-init.lisp" and load it from SBCL’s init file ~/.sbclrc in a Lisp-Stat initialisation section like this:

;;; Lisp-Stat
(asdf:load-system :lisp-stat)
(load #P"~/ls-init.lisp")

Settings in your personal lisp-stat init file override the system defaults.

Here’s an example ls-init.lisp file that loads some common R data sets:

(defparameter *default-datasets*
  '("tooth-growth" "plant-growth" "usarrests" "iris" "mtcars")
  "Data sets loaded as part of personal Lisp-Stat initialisation.
Available in every session.")

(map nil #'(lambda (x)
             (format t "Loading ~A~%" x)
             (data x))
     *default-datasets*)

With this init file, you can immediately access the data sets in the *default-datasets* list defined above, e.g.:

(head iris)

;;   X2 SEPAL-LENGTH SEPAL-WIDTH PETAL-LENGTH PETAL-WIDTH SPECIES
;; 0  1          5.1         3.5          1.4         0.2 setosa
;; 1  2          4.9         3.0          1.4         0.2 setosa
;; 2  3          4.7         3.2          1.3         0.2 setosa
;; 3  4          4.6         3.1          1.5         0.2 setosa
;; 4  5          5.0         3.6          1.4         0.2 setosa
;; 5  6          5.4         3.9          1.7         0.4 setosa

Emacs / Hemlock

We assume an experienced user will have their own Emacs and lisp implementation and will want to install according to their own tastes and setup. The repo links you need are below, or you can install with quicklisp.

Prerequisites

All that is needed is an ANSI Common Lisp implementation. Development is done with SBCL. Other platforms should work, but have not been tested, nor can we offer support (maintaining and testing on multiple implementations requires more resources than the project has available). Note that CCL is not in good health, and a few numerical bugs remain unfixed. A shame, as we really liked CCL.

You may want to consider emacs-vega-view for viewing plots from within emacs.

Installation

The easiest way to install Lisp-Stat is via Quicklisp, a library manager for Common Lisp. It works with your existing Common Lisp implementation to download, install, and load any of over 1,500 libraries with a few simple commands.

Quicklisp is like a package manager in Linux. It can load packages from the local file system, or download them if required. If you have quicklisp installed, you can use:

(ql:quickload :lisp-stat)

Quicklisp is good at managing the project dependency retrieval, but most of the time we use ASDF because of its REPL integration. You only have to use Quicklisp once to get the dependencies, then use ASDF for day-to-day work.

You can install additional Lisp-Stat modules in the same way. For example to install the CEPHES module:

(ql:quickload :cephes)

Loading

Once you have obtained Lisp-Stat via Quicklisp, you can load in one of two ways:

  • ASDF
  • Quicklisp

Loading with ASDF

(asdf:load-system :lisp-stat)

If you are using Emacs, you can use the Slime shortcuts to load systems by typing , and then load-system in the mini-buffer. This is what the Lisp-Stat developers use most often; the shortcuts are a helpful part of the workflow.

Loading with Quicklisp

To load with Quicklisp:

(ql:quickload :lisp-stat)

Quicklisp uses the same ASDF command as above to load Lisp-Stat.

Updating Lisp-Stat

When a new release is announced, you can update via Quicklisp like so:

(ql:update-dist "lisp-stat")

Documentation

You can install the Info manuals into the Emacs help system, which allows searching and browsing from within the editing environment. To do this, use the install-info command. As an example, on my MS Windows 10 machine with an MSYS2/Emacs installation:

install-info --add-once select.info /c/msys64/mingw64/share/info/dir

installs the select manual at the top level of the Info tree. You can also install the Common Lisp HyperSpec and browse documentation for the base Common Lisp system. This really is the best way to use documentation whilst programming Common Lisp and Lisp-Stat. See the Emacs external documentation and “How do I install a piece of Texinfo documentation?” for more information on installing help files in Emacs.

See getting help for information on how to access Info documentation as you code. This is the mechanism used by Lisp-Stat developers because you don’t have to leave the emacs editor to look up function documentation in a browser.

Try it out

Load Lisp-Stat:

(asdf:load-system :lisp-stat)

Change to the Lisp-Stat user package:

(in-package :ls-user)

Load some data:

(data :sg-weather)

Find the sample mean and median:

(mean sg-weather:precipitation)  ;=> .0714
(median sg-weather:max-temps)    ;=> 31.55

Next steps

2.2 - Site Organisation

How this manual is organised

This manual is organised by audience. The overview and getting started sections are applicable to all users. Other sections are focused on statistical practitioners, developers or users new to Common Lisp.

Examples

This part of the documentation contains worked examples of statistical analysis and plotting. It has less explanatory material, and more worked examples of code than other sections. If you have a common use-case and want to know how to solve it, look here.

Tutorials

This section contains tutorials, primers and ‘vignettes’. Typically tutorials contain more explanatory material, whilst primers are short-form tutorials on a particular system.

System manuals

The manuals are written at a level somewhere between an API reference and a tutorial; think of them as an ‘annotated reference’. They document, with text and examples, the core APIs of each system. These are useful references for power users and developers, and for when you need to go a bit beyond the core tasks.

Reference

The reference manuals document the API for each system. These are typically used by developers building extensions to Lisp-Stat.

Resources

Common Lisp and statistical resources, such as books, tutorials and websites. Not specific to Lisp-Stat, but useful for statistical practitioners learning Lisp.

Contributing

This section describes how to contribute to Lisp-Stat. There are both ideas on what to contribute, as well as instructions on how to contribute. Also note the section on the top right of all the documentation pages, just below the search box:

If you see a mistake in the documentation, please use the Create documentation issue link to go directly to github and report the error.

2.3 - Getting Help

Ways to get help with Lisp-Stat

There are several ways to get help with Lisp-Stat and your statistical analysis. This section describes ways to get help with your data objects, with the Lisp-Stat commands to process them, and with Common Lisp.

We use the Algolia search engine to index the site. This search engine is specialised to work well with documentation websites like this one. If you’re looking for something and can’t find it in the navigation panes, use the search box:

Apropos

If you’re not quite sure what you’re looking for, you can use the apropos command. You can do this either from the REPL or hemlock/emacs. Here are two examples:

LS-USER> (apropos "remove-if")
SB-SEQUENCE:REMOVE-IF (fbound)
SB-SEQUENCE:REMOVE-IF-NOT (fbound)
REMOVE-IF (fbound)
REMOVE-IF-NOT (fbound)

If you use the Emacs/Slime command sequence C-c C-d a (all the Slime documentation commands start with C-c C-d), Emacs will ask you for a string. Let’s say you typed in remove-if. Emacs will open a buffer like the one below with all the doc strings for similar functions or variables:

Emacs apropos

Restart from errors

Common Lisp has what is called a condition system, which is somewhat unique among programming languages. One of the features of the condition system is something called restarts. Basically, one part of the system can signal a condition, and another part of it can handle the condition. One of the ways a signal can be handled is by providing various restarts. Restarts are presented by the debugger, and many users new to Common Lisp tend to shy away from the debugger (this is common in other languages too). In Common Lisp, the debugger is for both developers and users.

Well written Lisp programs will provide a good set of restarts for commonly encountered situations. As an example, suppose we are plotting a data set that has a large number of data points. Experience has shown that greater than 50,000 data points can cause browser performance issues, so we’ve added a restart to warn you, seen below:

Here you can see we have options to take all the data, take n (a number the user will provide), or take up to the maximum recommended number. Always look at the options offered to you by the debugger; one of them may fix the problem for you.
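The mechanism can be sketched in portable Common Lisp; the function and restart names below are illustrative, not Lisp-Stat’s actual code:

```lisp
(defun check-point-count (n &key (max-recommended 50000))
  "Signal a correctable error when N exceeds MAX-RECOMMENDED,
offering restarts similar in spirit to those Lisp-Stat's plotting code provides."
  (if (<= n max-recommended)
      n
      (restart-case
          (error "~D data points may cause browser performance issues." n)
        (take-all ()
          :report "Plot all the data points."
          n)
        (take-max ()
          :report "Plot the maximum recommended number of points."
          max-recommended)
        (take-n (k)
          :report "Plot a number of points that you specify."
          :interactive (lambda ()
                         (format *query-io* "Number of points: ")
                         (list (parse-integer (read-line *query-io*))))
          k))))
```

When the error is signalled, the debugger lists these restarts and invoking one of them resumes the computation with the chosen number of points.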

Describe data

You can use the describe command to print a description of just about anything in the Lisp environment. Lisp-Stat extends this functionality to describe data. For example:

LS-USER> (describe 'mtcars)
LS-USER::MTCARS
  [symbol]

MTCARS names a special variable:
  Value: #<DATA-FRAME (32 observations of 12 variables) Motor Trend Car Road Tests>
  Documentation:
    Motor Trend Car Road Tests

Description
  The data was extracted from the 1974 Motor Trend US magazine, and comprises
  fuel consumption and 10 aspects of automobile design and performance for 32
  automobiles (1973-74 models).

Note
  Henderson and Velleman (1981) comment in a footnote to Table 1: Hocking
  [original transcriber]'s noncrucial coding of the Mazda's rotary engine as a
  straight six-cylinder engine and the Porsche's flat engine as a V engine, as
  well as the inclusion of the diesel Mercedes 240D, have been retained to
  enable direct comparisons to be made with previous analyses.

Source
  Henderson and Velleman (1981), Building multiple regression models
  interactively. Biometrics, 37, 391-411.

Variables:

Variable | Type         | Unit | Label
-------- | ----         | ---- | -----------
MODEL    | STRING       | NIL  | NIL
MPG      | DOUBLE-FLOAT | M/G  | Miles/(US) gallon
CYL      | INTEGER      | NA   | Number of cylinders
DISP     | DOUBLE-FLOAT | IN3  | Displacement (cu.in.)
HP       | INTEGER      | HP   | Gross horsepower
DRAT     | DOUBLE-FLOAT | NA   | Rear axle ratio
WT       | DOUBLE-FLOAT | LB   | Weight (1000 lbs)
QSEC     | DOUBLE-FLOAT | S    | 1/4 mile time
VS       | CATEGORICAL  | NA   | Engine (0=v-shaped, 1=straight)
AM       | CATEGORICAL  | NA   | Transmission (0=automatic, 1=manual)
GEAR     | CATEGORICAL  | NA   | Number of forward gears
CARB     | CATEGORICAL  | NA   | Number of carburetors

Documentation

The documentation command can be used to read the documentation of a function or variable. Here’s how to read the documentation for the Lisp-Stat mean function:

LS-USER> (documentation 'mean 'function)
"The mean of elements in OBJECT."

You can also view the documentation for variables or data objects:

LS-USER> (documentation '*ask-on-redefine* 'variable)
"If non-nil the system will ask the user for confirmation before redefining a data frame"

Emacs inspector

When Lisp prints an interesting object to emacs/slime, it will be displayed in orange text. This indicates that it is a presentation, a special kind of object that we can manipulate. For example if you type the name of a data frame, it will return a presentation object:

Now if you right click on this object you’ll get the presentation menu:

From this menu you can go to the source code of the object, inspect & change values, describe it (as seen above, but within an emacs window), and copy it.

Slime inspector

The slime inspector is an alternative inspector for emacs, with some additional functionality.

Slime documentation

Slime documentation provides ways to browse documentation from the editor. We saw one example above with apropos. You can also browse variable and function documentation. For example if you have the cursor positioned over a function:

(show-data-frames)

and you type C-c C-d f (describe function at point), you’ll see this in an emacs window:

#<FUNCTION SHOW-DATA-FRAMES>
  [compiled function]


Lambda-list: (&KEY (HEAD NIL) (STREAM *STANDARD-OUTPUT*))
Derived type: (FUNCTION (&KEY (:HEAD T) (:STREAM T)) *)
Documentation:
  Print all data frames in the current environment in
  reverse order of creation, i.e. most recently created first.
  If HEAD is not NIL, print the first six rows, similar to the
  HEAD function.
Source file: s:/src/data-frame/src/defdf.lisp

Other help

You can also get help from the Lisp-Stat community via the user mailing list, GitHub, or StackOverflow.

2.4 - Your First Project

How to start your first project

Lisp-Stat includes a project template that you can use as a guide for your own projects.

Use the template

To get started, go to the project template

  1. Click Use this template
  2. Select a name for your new project and click Create repository from template
  3. Make your own local working copy of your new repo using git clone, replacing https://github.com/me/example.git with your repo’s URL: git clone --depth 1 https://github.com/me/example.git
  4. You can now edit your own versions of the project’s source files.

This will clone the project template into your own github repository so you can begin adding your own files to it.

Directory Structure

By convention, we use a directory structure that looks like this:

...
├── project
|   ├── data
|   |   ├── foo.csv
|   |   ├── bar.json
|   |   └── baz.tsv
|   ├── src
|   |   ├── load.lisp
|   |   └── analyse.lisp
|   ├── tests
|   |   └── test.lisp
|   └── docs
|       └── project.html
...

data

Often your project will have sample data used for examples illustrating how to use the system. Such example data goes here, as would static data files that your system includes, for example post codes (zip codes). For some projects, we keep the project data here too. If the data is obtained over the network or from a database, login credentials and related code are kept here. Basically, anything necessary to obtain the data should be kept in this directory.

src

The lisp source code for loading, cleaning and analysing your data. If you are using the template for a Lisp-Stat add-on package, the source code for the functionality goes here.

tests

Tests for your code. We recommend CL-UNIT2 as a test framework.

docs

Generated documentation goes here. This could be both API documentation and user guides and manuals. If an index.html file appears here, github will automatically display its contents at project.github.io, if you have configured the repository to display documentation that way.

Load your project

If you’ve cloned the project template into your local Common Lisp directory, ~/common-lisp/, then you can load it with (ql:quickload :project). Lisp will download and compile the necessary dependencies and your project will be loaded. The first thing you’ll want to do is to configure your project.

Configure your project

First, change the directory and repository name to suit your environment and make sure git remotes are working properly. Save yourself some time and get git working before configuring the project further.

ASDF

The project.asd file is the Common Lisp system definition file. Rename this to match your project directory and edit its contents to reflect the state of your project. To start with, don’t change any of the file names; just edit the metadata. As you add or rename source code files in the project, you’ll update the file names here so Common Lisp will know what to compile. This file is analogous to a makefile in C – it tells Lisp how to build your project.

Initialisation

If you need project-wide initialisation settings, you can do this in the file src/init.lisp. The template sets up a logical path name for the project:

(defun setup-project-translations ()
  (setf (logical-pathname-translations "PROJECT")
        `(("DATA;**;*.*.*" ,(merge-pathnames "data/**/*.*"
                                             (asdf:system-source-directory 'project))))))

(setup-project-translations)

To use it, modify the directories and project name for your own project, and then call (setup-project-translations) in one of your Lisp initialisation files (either ls-init.lisp or .sbclrc). By default, the project data directory will be set to a subdirectory below the main project directory, and you can access files there with, for example, PROJECT:DATA;mtcars.csv. When you configure your logical pathnames, you’ll replace “PROJECT” with your project’s name.
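With the translation in place, a logical pathname resolves to a physical file under the project’s data directory; for example (the mtcars.csv file name and the resulting physical path are illustrative):

```lisp
(translate-logical-pathname #P"PROJECT:DATA;mtcars.csv")
;; => #P"~/common-lisp/project/data/mtcars.csv" on a typical setup

;; so data can be read without hard-coding the physical location:
(read-csv #P"PROJECT:DATA;mtcars.csv")
```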

We use logical style pathnames throughout the Lisp-Stat documentation, even if a code level translation isn’t in place.

Basic workflow

The project template illustrates the basic steps of a simple analysis.

Load data

The first step is to load data. The PROJECT:SRC;load file shows creating three data frames, from three different sources: CSV, TSV and JSON. Use this as a template for loading your own data.

Cleanse data

load.lisp also shows some simple cleansing, adding labels, types and attributes, and transforming (recoding) a variable. You can follow these examples for your own data sets, with the goal of creating a data frame from your data.

Analyse

PROJECT:SRC;analyse shows taking the mean and standard deviation of the mpg variable of the loaded data set. Your own analysis will, of course, be different. The examples here are meant to indicate the purpose. You may have one or more files for your analysis, including supporting functions, joining data sets, etc.

Plot

Plotting can be useful at any stage of the process. Its inclusion as the third step isn’t intended to imply a particular importance or order. The file PROJECT:SRC;plot shows how to plot the information in the disasters data frame.

Save

Finally, you’ll want to save your data frame once you’ve got it where you want it to be. You can save your project in a ’native’ format, a Lisp file that preserves all your metadata and is editable, or as a CSV file. You should only use a CSV file if you need to use the data in another system. PROJECT:SRC;save contains an example that shows how to save your work.

3 - Examples

Using Lisp-Stat in the real world

One of the best ways to learn Lisp-Stat is to see examples of actual work. This section contains example notebooks illustrating statistical analysis. These notebooks describe how to undertake the statistical analyses introduced as examples in the ninth edition of Introduction to the Practice of Statistics (2017) by Moore, McCabe and Craig. The notebooks are organised in the same manner as the chapters of the book. The data comes from the site IPS9 in R by Nicholas Horton.

To run the notebooks yourself you can use a ready made online notebook:

Binder

Or check out the IPS9 repo and all of the examples.

Part I: Looking at Data

We also include plotting examples from the Vega-Lite example gallery.

3.1 - Distributions

Examining data - Distributions

3.2 - Data Relationships

Examining Data - Relationships

3.3 - Plotting

Example plots

The plots here show equivalents to the Vega-Lite example gallery. Before you begin working with these examples, be certain to read the plotting tutorial, where you will learn the basics of working with plot specifications and data.

Preliminaries

Load Vega-Lite

Load Vega-Lite and network libraries:

(asdf:load-system :plot/vega)

and change to the Lisp-Stat user package:

(in-package :ls-user)

Load example data

The examples in this section use the vega-lite data sets. Load them all now:

(vega:load-vega-examples)

Bar charts

Bar charts are used to display information about categorical variables.

Simple bar chart

In this simple bar chart example we’ll demonstrate using literal embedded data in the form of a plist. Later you’ll see how to use a data-frame directly.

(plot:plot
 (vega:defplot simple-bar-chart
   `(:mark :bar
     :data (:values ,(plist-df '(:a #(A B C D E F G H I)
                                 :b #(28 55 43 91 81 53 19 87 52))))
     :encoding (:x (:field :a :type :nominal :axis ("labelAngle" 0))
                :y (:field :b :type :quantitative)))))

Grouped bar chart

(plot:plot
 (vega:defplot grouped-bar-chart
   `(:mark :bar
     :data (:values ,(plist-df '(:category #(A A A B B B C C C)
                                 :group #(x y z x y z x y z)
                                 :value #(0.1 0.6 0.9 0.7 0.2 1.1 0.6 0.1 0.2))))
     :encoding (:x (:field :category)
                :y (:field :value :type :quantitative)
                :x-offset (:field :group)
                :color (:field :group)))))

Stacked bar chart

This example uses Seattle weather from the Vega website. Load it into a data frame like so:

(defdf seattle-weather (read-csv vega:seattle-weather)) ;=> #<DATA-FRAME (1461 observations of 6 variables)>

We’ll use a data-frame as the data source via the Common Lisp backquote mechanism. The spec list begins with a backquote (`) and then the data frame is inserted as a literal value with a comma (,). We’ll use this pattern frequently.
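The backquote/comma pattern is easiest to see in isolation, away from any plot specification. This is a minimal sketch with an illustrative literal value; backquote quotes the template and comma evaluates and inserts:

```lisp
;; Backquote (`) builds the template; comma (,) substitutes the
;; evaluated value of DATA into it.
(let ((data 42))
  `(:mark :bar :data (:values ,data)))
;; => (:MARK :BAR :DATA (:VALUES 42))
```

In the plot specifications below, the value substituted with comma is a data frame rather than a number, but the mechanism is identical.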

(plot:plot (vega:defplot stacked-bar-chart `(:mark :bar :data (:values ,seattle-weather) :encoding (:x (:time-unit :month :field :date :type :ordinal :title "Month of the year") :y (:aggregate :count :type :quantitative) :color (:field :weather :type :nominal :title "Weather type" :scale (:domain #("sun" "fog" "drizzle" "rain" "snow") :range #("#e7ba52" "#c7c7c7" "#aec7e8" "#1f77b4" "#9467bd")))))))

Population pyramid

Vega calls this a diverging stacked bar chart. It is a population pyramid for the US in 2000, created using the stack feature of Vega-Lite. You could also create one using concat.

First, load the population data if you haven’t done so:

(defdf population (vega:read-vega vega:population)) ;=> #<DATA-FRAME (570 observations of 4 variables)>

Note the use of read-vega in this case. This is because the data in the Vega example is in an application specific JSON format (Vega, of course).

(plot:plot (vega:defplot pyramid-bar-chart `(:mark :bar :data (:values ,population) :width 300 :height 200 :transform #((:filter "datum.year == 2000") (:calculate "datum.sex == 2 ? 'Female' : 'Male'" :as :gender) (:calculate "datum.sex == 2 ? -datum.people : datum.people" :as :signed-people)) :encoding (:x (:aggregate :sum :field :signed-people :title "population") :y (:field :age :axis nil :sort :descending) :color (:field :gender :scale (:range #("#675193" "#ca8861")))) :config (:view (:stroke nil) :axis (:grid :false)))))

Histograms & density

Basic

For this simple histogram example we’ll use the IMDB film rating data set.

(plot:plot (vega:defplot imdb-plot `(:mark :bar :data (:values ,imdb) :encoding (:x (:bin (:maxbins 8) :field :imdb-rating) :y (:aggregate :count)))))

Relative frequency

Use a relative frequency histogram to compare data sets with different numbers of observations.

The data is binned in the first transform. The second and third transforms compute the count per bin and the total count, and the last transform step uses these to calculate the relative frequency.

(plot:plot (vega:defplot relative-frequency-histogram `(:title "Relative Frequency" :data (:values ,vgcars) :transform #((:bin t :field :horsepower :as #(:bin-horsepower :bin-horsepower-end)) (:aggregate #((:op :count :as "Count")) :groupby #(:bin-horsepower :bin-horsepower-end)) (:joinaggregate #((:op :sum :field "Count" :as "TotalCount"))) (:calculate "datum.Count/datum.TotalCount" :as :percent-of-total)) :mark (:type :bar :tooltip t) :encoding (:x (:field :bin-horsepower :title "Horsepower" :bin (:binned t)) :x2 (:field :bin-horsepower-end) :y (:field :percent-of-total :type "quantitative" :title "Relative Frequency" :axis (:format ".1~%"))))))
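Although this example pushes the binning to Vega-Lite, the same computation can be sketched in plain Common Lisp using only standard functions (the horsepower values and bin width here are illustrative, not the vgcars data):

```lisp
;; Bin values by flooring to a bin width, count per bin, then divide
;; by the total to get relative frequencies.
(let* ((horsepower #(130 165 150 140 198 220 215 225 190 170))
       (bin-width 50)
       (bins (make-hash-table)))
  (loop for hp across horsepower
        do (incf (gethash (* bin-width (floor hp bin-width)) bins 0)))
  (let ((total (length horsepower)))
    (loop for bin being the hash-keys of bins using (hash-value count)
          collect (list bin (/ count total)))))
;; Each COUNT/TOTAL is an exact rational, e.g. 2/10 prints as 1/5 --
;; no floating-point rounding is introduced.
```

This illustrates the rational arithmetic mentioned in the introduction: the relative frequencies stay exact until you choose to coerce them to floats.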

2D histogram scatterplot

If you haven’t already loaded the imdb data set, do so now:

(defparameter imdb (vega:read-vega vega:movies))
(plot:plot (vega:defplot histogram-scatterplot `(:mark :circle :data (:values ,imdb) :encoding (:x (:bin (:maxbins 10) :field :imdb-rating) :y (:bin (:maxbins 10) :field :rotten-tomatoes-rating) :size (:aggregate :count)))))

Stacked density

(plot:plot (vega:defplot stacked-density `(:title "Distribution of Body Mass of Penguins" :width 400 :height 80 :data (:values ,penguins) :mark :bar :transform #((:density |BODY-MASS-(G)| :groupby #(:species) :extent #(2500 6500))) :encoding (:x (:field :value :type :quantitative :title "Body Mass (g)") :y (:field :density :type :quantitative :stack :zero) :color (:field :species :type :nominal)))))

Note the use of the multiple escape characters (|) surrounding the field BODY-MASS-(G). This is required because the JSON data set has parentheses in the variable names, and these are reserved characters in Common Lisp. The JSON importer wrapped these in the escape character.
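You can see what the vertical bars do by asking for the symbol's name; the reserved characters survive intact inside the bars:

```lisp
;; The bars protect characters the Lisp reader would otherwise treat
;; as syntax (here, the parentheses) and prevent case folding.
(symbol-name '|BODY-MASS-(G)|)  ; => "BODY-MASS-(G)"
```

Without the bars, the reader would interpret the open parenthesis as the start of a list and signal an error.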

Scatter plots

Basic

A basic Vega-Lite scatterplot showing horsepower and miles per gallon for various cars.

(plot:plot (vega:defplot hp-mpg `(:title "Horsepower vs. MPG" :data (:values ,vgcars) :mark :point :encoding (:x (:field :horsepower :type "quantitative") :y (:field :miles-per-gallon :type "quantitative")))))

Colored

In this example we’ll show how to add additional information to the cars scatter plot to show the cars origin. The Vega-Lite example shows that we have to add two new directives to the encoding of the plot:

(plot:plot (vega:defplot hp-mpg-plot `(:title "Vega Cars" :data (:values ,vgcars) :mark :point :encoding (:x (:field :horsepower :type "quantitative") :y (:field :miles-per-gallon :type "quantitative") :color (:field :origin :type "nominal") :shape (:field :origin :type "nominal")))))

With this change we can see that the higher-horsepower, lower-efficiency cars are from the USA, and the higher-efficiency cars are from Japan and Europe.

Text marks

The same information, but further indicated with a text marker. This Vega-Lite example uses a data transformation.

(plot:plot (vega:defplot colored-text-hp-mpg-plot `(:title "Vega Cars" :data (:values ,vgcars) :transform #((:calculate "datum.origin[0]" :as "OriginInitial")) :mark :text :encoding (:x (:field :horsepower :type "quantitative") :y (:field :miles-per-gallon :type "quantitative") :color (:field :origin :type "nominal") :text (:field "OriginInitial" :type "nominal")))))

Notice here we use a string for the field value and not a symbol. This is because Vega is case sensitive, whereas Lisp is not. We could have also used a lower-case :as value, but did not, in order to highlight this requirement for certain Vega specifications.
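The difference is easy to demonstrate at the REPL: the standard Lisp reader upcases symbol names, so a mixed-case keyword loses the capitalization that Vega expects, while a string keeps it:

```lisp
;; Keywords are read with the default readtable-case of :UPCASE,
;; so the mixed case is gone by the time the spec is serialised.
(symbol-name :OriginInitial)  ; => "ORIGININITIAL"
;; A string literal preserves case exactly as written.
"OriginInitial"               ; => "OriginInitial"
```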

Mean & SD overlay

This Vega-Lite scatterplot with mean and standard deviation overlay demonstrates the use of layers in a plot.

Lisp-Stat equivalent

(plot:plot (vega:defplot mean-hp-mpg-plot `(:title "Vega Cars" :data (:values ,vgcars) :layer #((:mark :point :encoding (:x (:field :horsepower :type "quantitative") :y (:field :miles-per-gallon :type "quantitative"))) (:mark (:type :errorband :extent :stdev :opacity 0.2) :encoding (:y (:field :miles-per-gallon :type "quantitative" :title "Miles per Gallon"))) (:mark :rule :encoding (:y (:field :miles-per-gallon :type "quantitative" :aggregate :mean)))))))

Linear regression

(plot:plot (vega:defplot linear-regression `(:data (:values ,imdb) :layer #((:mark (:type :point :filled t) :encoding (:x (:field :rotten-tomatoes-rating :type :quantitative :title "Rotten Tomatoes Rating") :y (:field :imdb-rating :type :quantitative :title "IMDB Rating"))) (:mark (:type :line :color "firebrick") :transform #((:regression :imdb-rating :on :rotten-tomatoes-rating)) :encoding (:x (:field :rotten-tomatoes-rating :type :quantitative :title "Rotten Tomatoes Rating") :y (:field :imdb-rating :type :quantitative :title "IMDB Rating"))) (:transform #((:regression :imdb-rating :on :rotten-tomatoes-rating :params t) (:calculate "'R²: '+format(datum.rSquared, '.2f')" :as :r2)) :mark (:type :text :color "firebrick" :x :width :align :right :y -5) :encoding (:text (:type :nominal :field :r2)))))))

Loess regression

(plot:plot (vega:defplot loess-regression `(:data (:values ,imdb) :layer #((:mark (:type :point :filled t) :encoding (:x (:field :rotten-tomatoes-rating :type :quantitative :title "Rotten Tomatoes Rating") :y (:field :imdb-rating :type :quantitative :title "IMDB Rating"))) (:mark (:type :line :color "firebrick") :transform #((:loess :imdb-rating :on :rotten-tomatoes-rating)) :encoding (:x (:field :rotten-tomatoes-rating :type :quantitative :title "Rotten Tomatoes Rating") :y (:field :imdb-rating :type :quantitative :title "IMDB Rating")))))))

Residuals

A dot plot showing each film in the database, and the difference from the average movie rating. The display is sorted by year to visualize everything in sequential order. The graph is for all films before 2019. Note the use of the filter-rows function.

(plot:plot (vega:defplot residuals `(:data (:values ,(filter-rows imdb '(and (not (eql imdb-rating :na)) (local-time:timestamp< release-date (local-time:parse-timestring "2019-01-01"))))) :transform #((:joinaggregate #((:op :mean :field :imdb-rating :as :average-rating))) (:calculate "datum['imdbRating'] - datum.averageRating" :as :rating-delta)) :mark :point :encoding (:x (:field :release-date :type :temporal :title "Release Date") :y (:field :rating-delta :type :quantitative :title "Rating Delta") :color (:field :rating-delta :type :quantitative :scale (:domain-mid 0) :title "Rating Delta")))))

Query

The cars scatterplot allows you to see miles per gallon vs. horsepower. By adding sliders, you can select points by the number of cylinders and year as well, effectively examining 4 dimensions of data. Drag the sliders to highlight different points.

(plot:plot (vega:defplot scatter-queries `(:data (:values ,vgcars) :transform #((:calculate "year(datum.year)" :as :year)) :layer #((:params #((:name :cyl-year :value #((:cylinders 4 :year 1977)) :select (:type :point :fields #(:cylinders :year)) :bind (:cylinders (:input :range :min 3 :max 8 :step 1) :year (:input :range :min 1969 :max 1981 :step 1)))) :mark :circle :encoding (:x (:field :horsepower :type :quantitative) :y (:field :miles-per-gallon :type :quantitative) :color (:condition (:param :cyl-year :field :origin :type :nominal) :value "grey"))) (:transform #((:filter (:param :cyl-year))) :mark :circle :encoding (:x (:field :horsepower :type :quantitative) :y (:field :miles-per-gallon :type :quantitative) :color (:field :origin :type :nominal) :size (:value 100)))))))

You can add external links to plots.

(plot:plot (vega:defplot scatter-external-links `(:data (:values ,vgcars) :mark :point :transform #((:calculate "'https://www.google.com/search?q=' + datum.name" :as :url)) :encoding (:x (:field :horsepower :type :quantitative) :y (:field :miles-per-gallon :type :quantitative) :color (:field :origin :type :nominal) :tooltip (:field :name :type :nominal) :href (:field :url :type :nominal)))))

Strip plot

The Vega-Lite strip plot example shows the relationship between horsepower and the number of cylinders using tick marks.

(plot:plot (vega:defplot strip-plot `(:title "Vega Cars" :data (:values ,vgcars) :mark :tick :encoding (:x (:field :horsepower :type :quantitative) :y (:field :cylinders :type :ordinal)))))

1D strip plot

(plot:plot (vega:defplot 1d-strip-plot `(:title "Seattle Precipitation" :data (:values ,seattle-weather) :mark :tick :encoding (:x (:field :precipitation :type :quantitative)))))

Bubble plot

This Vega-Lite example is a visualization of global deaths from natural disasters. It is a copy of the chart from Our World in Data.

(plot:plot (vega:defplot natural-disaster-deaths `(:title "Deaths from global natural disasters" :width 600 :height 400 :data (:values ,(filter-rows disasters '(not (string= entity "All natural disasters")))) :mark (:type :circle :opacity 0.8 :stroke :black :stroke-width 1) :encoding (:x (:field :year :type :temporal :axis (:grid :false)) :y (:field :entity :type :nominal :axis (:title "")) :size (:field :deaths :type :quantitative :title "Annual Global Deaths" :legend (:clip-height 30) :scale (:range-max 5000)) :color (:field :entity :type :nominal :legend nil)))))

Note how we modified the example by using a lower case entity in the filter to match our default lower case variable names. Also note how we are explicit with parsing the year field as a temporal column. This is because, when creating a chart with inline data, Vega-Lite will parse the field as an integer instead of a date.

Line plots

Simple

(plot:plot (vega:defplot simple-line-plot `(:title "Google's stock price from 2004 to early 2010" :data (:values ,(filter-rows stocks '(string= symbol "GOOG"))) :mark :line :encoding (:x (:field :date :type :temporal) :y (:field :price :type :quantitative)))))

Point markers

By setting the point property of the line mark definition to an object defining a property of the overlaying point marks, we can overlay point markers on top of the line.

(plot:plot (vega:defplot point-mark-line-plot `(:title "Stock prices of 5 Tech Companies over Time" :data (:values ,stocks) :mark (:type :line :point t) :encoding (:x (:field :date :time-unit :year) :y (:field :price :type :quantitative :aggregate :mean) :color (:field :symbol :type :nominal)))))

Multi-series

This example uses the custom symbol encoding for variables to generate the proper types and labels for x, y and color channels.

(plot:plot (vega:defplot multi-series-line-chart `(:title "Stock prices of 5 Tech Companies over Time" :data (:values ,stocks) :mark :line :encoding (:x (:field stocks:date) :y (:field stocks:price) :color (:field stocks:symbol)))))

Step

(plot:plot (vega:defplot step-chart `(:title "Google's stock price from 2004 to early 2010" :data (:values ,(filter-rows stocks '(string= symbol "GOOG"))) :mark (:type :line :interpolate "step-after") :encoding (:x (:field stocks:date) :y (:field stocks:price)))))

Stroke-dash

(plot:plot (vega:defplot stroke-dash `(:title "Stock prices of 5 Tech Companies over Time" :data (:values ,stocks) :mark :line :encoding (:x (:field stocks:date) :y (:field stocks:price) :stroke-dash (:field stocks:symbol)))))

Confidence interval

Line chart with a confidence interval band.

(plot:plot (vega:defplot line-chart-ci `(:data (:values ,vgcars) :encoding (:x (:field :year :time-unit :year)) :layer #((:mark (:type :errorband :extent :ci) :encoding (:y (:field :miles-per-gallon :type :quantitative :title "Mean of Miles per Gallon (95% CIs)"))) (:mark :line :encoding (:y (:field :miles-per-gallon :aggregate :mean)))))))

Area charts

Simple

(plot:plot (vega:defplot area-chart `(:title "Unemployment across industries" :width 300 :height 200 :data (:values ,unemployment-ind) :mark :area :encoding (:x (:field :date :time-unit :yearmonth :axis (:format "%Y")) :y (:field :count :aggregate :sum :title "count")))))

Stacked

Stacked area plots

(plot:plot (vega:defplot stacked-area-chart `(:title "Unemployment across industries" :width 300 :height 200 :data (:values ,unemployment-ind) :mark :area :encoding (:x (:field :date :time-unit :yearmonth :axis (:format "%Y")) :y (:field :count :aggregate :sum :title "count") :color (:field :series :scale (:scheme "category20b"))))))

Horizon graph

A horizon graph is a technique for visualising time series data in a manner that makes comparisons easier. It is based on work done at the UW Interactive Data Lab. See Sizing the Horizon: The Effects of Chart Size and Layering on the Graphical Perception of Time Series Visualizations for more details on Horizon Graphs.

(plot:plot (vega:defplot horizon-graph `(:title "Horizon graph with 2 layers" :width 300 :height 50 :data (:values ,(plist-df `(:x ,(aops:linspace 1 20 20) :y #(28 55 43 91 81 53 19 87 52 48 24 49 87 66 17 27 68 16 49 15)))) :encoding (:x (:field :x :scale (:zero :false :nice :false)) :y (:field :y :type :quantitative :scale (:domain #(0 50)) :axis (:title "y"))) :layer #((:mark (:type :area :clip t :orient :vertical :opacity 0.6)) (:transform #((:calculate "datum.y - 50" :as :ny)) :mark (:type :area :clip t :orient :vertical) :encoding (:y (:field "ny" :type :quantitative :scale (:domain #(0 50))) :opacity (:value 0.3)))) :config (:area (:interpolate :monotone)))))

With overlay

Area chart with overlaying lines and point markers.

(plot:plot (vega:defplot area-with-overlay `(:title "Google's stock price" :data (:values ,(filter-rows stocks '(string= symbol "GOOG"))) :mark (:type :area :line t :point t) :encoding (:x (:field stocks:date) :y (:field stocks:price)))))

Note the use of the variable symbols, e.g. stocks:price to fill in the variable’s information instead of :type :quantitative :title ...

Stream graph

(plot:plot (vega:defplot stream-graph `(:title "Unemployment Stream Graph" :width 300 :height 200 :data (:values ,unemployment-ind) :mark :area :encoding (:x (:field :date :time-unit "yearmonth" :axis (:domain :false :format "%Y" :tick-size 0)) :y (:field :count :aggregate :sum :axis null :stack :center) :color (:field :series :scale (:scheme "category20b"))))))

Tabular plots

Table heatmap

(plot:plot (vega:defplot table-heatmap `(:data (:values ,vgcars) :mark :rect :encoding (:x (:field vgcars:cylinders) :y (:field vgcars:origin) :color (:field :horsepower :aggregate :mean)) :config (:axis (:grid t :tick-band :extent)))))

Heatmap with labels

Layering text over a table heatmap

(plot:plot (vega:defplot heatmap-labels `(:data (:values ,vgcars) :transform #((:aggregate #((:op :count :as :num-cars)) :groupby #(:origin :cylinders))) :encoding (:x (:field :cylinders :type :ordinal) :y (:field :origin :type :ordinal)) :layer #((:mark :rect :encoding (:color (:field :num-cars :type :quantitative :title "Count of Records" :legend (:direction :horizontal :gradient-length 120)))) (:mark :text :encoding (:text (:field :num-cars :type :quantitative) :color (:condition (:test "datum['numCars'] < 40" :value :black) :value :white)))) :config (:axis (:grid t :tick-band :extent)))))

Histogram heatmap

(plot:plot (vega:defplot heatmap-histogram `(:data (:values ,imdb) :transform #((:filter (:and #((:field :imdb-rating :valid t) (:field :rotten-tomatoes-rating :valid t))))) :mark :rect :width 300 :height 200 :encoding (:x (:bin (:maxbins 60) :field :imdb-rating :type :quantitative :title "IMDB Rating") :y (:bin (:maxbins 40) :field :rotten-tomatoes-rating :type :quantitative :title "Rotten Tomatoes Rating") :color (:aggregate :count :type :quantitative)) :config (:view (:stroke :transparent)))))

Circular plots

Pie chart

(plot:plot (vega:defplot pie-chart `(:data (:values ,(plist-df `(:category ,(aops:linspace 1 6 6) :value #(4 6 10 3 7 8)))) :mark :arc :encoding (:theta (:field :value :type :quantitative) :color (:field :category :type :nominal)))))

Donut chart

(plot:plot (vega:defplot donut-chart `(:data (:values ,(plist-df `(:category ,(aops:linspace 1 6 6) :value #(4 6 10 3 7 8)))) :mark (:type :arc :inner-radius 50) :encoding (:theta (:field :value :type :quantitative) :color (:field :category :type :nominal)))))

Radial plot

This radial plot uses both angular and radial extent to convey multiple dimensions of data. However, this approach is not perceptually effective, as viewers will most likely be drawn to the total area of the shape, conflating the two dimensions. This example also demonstrates a way to add labels to circular plots.

(plot:plot (vega:defplot radial-plot `(:data (:values ,(plist-df '(:value #(12 23 47 6 52 19)))) :layer #((:mark (:type :arc :inner-radius 20 :stroke "#fff")) (:mark (:type :text :radius-offset 10) :encoding (:text (:field :value :type :quantitative)))) :encoding (:theta (:field :value :type :quantitative :stack t) :radius (:field :value :scale (:type :sqrt :zero t :range-min 20)) :color (:field :value :type :nominal :legend nil)))))

Transformations

Normally, data transformations should be done in Lisp-Stat with a data frame. These examples illustrate how to accomplish transformations using Vega-Lite. This might be useful if, for example, you're serving up a lot of plots and want to move the processing to the user's browser.

Difference from avg

(plot:plot
 (vega:defplot difference-from-average
   `(:data (:values ,(filter-rows imdb '(not (eql imdb-rating :na))))
     :transform #((:joinaggregate #((:op :mean ; we could do this above using alexandria:thread-first
                                     :field :imdb-rating
                                     :as :average-rating)))
                  (:filter "(datum['imdbRating'] - datum.averageRating) > 2.5"))
     :layer #((:mark :bar
               :encoding (:x (:field :imdb-rating :type :quantitative :title "IMDB Rating")
                          :y (:field :title :type :ordinal :title "Title")))
              (:mark (:type :rule :color "red")
               :encoding (:x (:aggregate :average
                              :field :average-rating
                              :type :quantitative)))))))

Frequency distribution

Cumulative frequency distribution of films in the IMDB database.

(plot:plot (vega:defplot cumulative-frequency-distribution `(:data (:values ,imdb) :transform #((:sort #((:field :imdb-rating)) :window #((:op :count :field :count :as :cumulative-count)) :frame #(nil 0))) :mark :area :encoding (:x (:field :imdb-rating :type :quantitative) :y (:field :cumulative-count :type :quantitative)))))

Layered & cumulative histogram

(plot:plot (vega:defplot layered-histogram `(:data (:values ,(filter-rows imdb '(not (eql imdb-rating :na)))) :transform #((:bin t :field :imdb-rating :as #(:bin-imdb-rating :bin-imdb-rating-end)) (:aggregate #((:op :count :as :count)) :groupby #(:bin-imdb-rating :bin-imdb-rating-end)) (:sort #((:field :bin-imdb-rating)) :window #((:op :sum :field :count :as :cumulative-count)) :frame #(nil 0))) :encoding (:x (:field :bin-imdb-rating :type :quantitative :scale (:zero :false) :title "IMDB Rating") :x2 (:field :bin-imdb-rating-end)) :layer #((:mark :bar :encoding (:y (:field :cumulative-count :type :quantitative :title "Cumulative Count"))) (:mark (:type :bar :color "yellow" :opacity 0.5) :encoding (:y (:field :count :type :quantitative :title "Count")))))))

Layering averages

Layering averages over raw values.

(plot:plot (vega:defplot layered-averages `(:data (:values ,(filter-rows stocks '(string= symbol "GOOG"))) :layer #((:mark (:type :point :opacity 0.3) :encoding (:x (:field :date :time-unit :year) :y (:field :price :type :quantitative))) (:mark :line :encoding (:x (:field :date :time-unit :year) :y (:field :price :aggregate :mean)))))))

Error bars

Confidence interval

Error bars showing confidence intervals.

(plot:plot (vega:defplot error-bar-ci `(:data (:values ,barley) :encoding (:y (:field :variety :type :ordinal :title "Variety")) :layer #((:mark (:type :point :filled t) :encoding (:x (:field :yield :aggregate :mean :type :quantitative :scale (:zero :false) :title "Barley Yield") :color (:value "black"))) (:mark (:type :errorbar :extent :ci) :encoding (:x (:field :yield :type :quantitative :title "Barley Yield")))))))

Standard deviation

Error bars showing standard deviation.

(plot:plot (vega:defplot error-bar-sd `(:data (:values ,barley) :encoding (:y (:field :variety :type :ordinal :title "Variety")) :layer #((:mark (:type :point :filled t) :encoding (:x (:field :yield :aggregate :mean :type :quantitative :scale (:zero :false) :title "Barley Yield") :color (:value "black"))) (:mark (:type :errorbar :extent :stdev) :encoding (:x (:field :yield :type :quantitative :title "Barley Yield")))))))

Box plots

Min/max whiskers

A vertical box plot showing median, min, and max body mass of penguins.

(plot:plot (vega:defplot box-plot-min-max `(:data (:values ,penguins) :mark (:type :boxplot :extent "min-max") :encoding (:x (:field :species :type :nominal :title "Species") :y (:field |BODY-MASS-(G)| :type :quantitative :scale (:zero :false) :title "Body Mass (g)") :color (:field :species :type :nominal :legend nil)))))

Tukey

A vertical box plot showing median and lower and upper quartiles of the distribution of body mass of penguins.

(plot:plot (vega:defplot box-plot-tukey `(:data (:values ,penguins) :mark :boxplot :encoding (:x (:field :species :type :nominal :title "Species") :y (:field |BODY-MASS-(G)| :type :quantitative :scale (:zero :false) :title "Body Mass (g)") :color (:field :species :type :nominal :legend nil)))))

Summaries

Box plot with pre-computed summaries. Use this pattern to plot summaries done in a data-frame.

(plot:plot (vega:defplot box-plot-summaries `(:title "Body Mass of Penguin Species (g)" :data (:values ,(plist-df '(:species #("Adelie" "Chinstrap" "Gentoo") :lower #(2850 2700 3950) :q1 #(3350 3487.5 4700) :median #(3700 3700 5000) :q3 #(4000 3950 5500) :upper #(4775 4800 6300) :outliers #(#() #(2700 4800) #())))) :encoding (:y (:field :species :type :nominal :title null)) :layer #((:mark (:type :rule) :encoding (:x (:field :lower :type :quantitative :scale (:zero :false) :title null) :x2 (:field :upper))) (:mark (:type :bar :size 14) :encoding (:x (:field :q1 :type :quantitative) :x2 (:field :q3) :color (:field :species :type :nominal :legend null))) (:mark (:type :tick :color :white :size 14) :encoding (:x (:field :median :type :quantitative))) (:transform #((:flatten #(:outliers))) :mark (:type :point :style "boxplot-outliers") :encoding (:x (:field :outliers :type :quantitative)))))))

Layered

Rolling average

Plot showing a 30 day rolling average with raw values in the background.

(plot:plot (vega:defplot moving-average `(:width 400 :height 300 :data (:values ,seattle-weather) :transform #((:window #((:field :temp-max :op :mean :as :rolling-mean)) :frame #(-15 15))) :encoding (:x (:field :date :type :temporal :title "Date") :y (:type :quantitative :axis (:title "Max Temperature and Rolling Mean"))) :layer #((:mark (:type :point :opacity 0.3) :encoding (:y (:field :temp-max :title "Max Temperature"))) (:mark (:type :line :color "red" :size 3) :encoding (:y (:field :rolling-mean :title "Rolling Mean of Max Temperature")))))))

Histogram w/mean

(plot:plot (vega:defplot histogram-with-mean `(:data (:values ,imdb) :layer #((:mark :bar :encoding (:x (:field :imdb-rating :bin t :title "IMDB Rating") :y (:aggregate :count))) (:mark :rule :encoding (:x (:field :imdb-rating :aggregate :mean :title "Mean of IMDB Rating") :color (:value "red") :size (:value 5)))))))

Interactive

This section demonstrates interactive plots.

Scatter plot matrix

This Vega-Lite example demonstrates creating a SPLOM (scatter plot matrix) with interactive selection and brushing.

(defparameter vgcars-splom (vega::make-plot "vgcars-splom" vgcars `("$schema" "https://vega.github.io/schema/vega-lite/v5.json" :title "Scatterplot Matrix for Vega Cars" :repeat (:row #(:horsepower :acceleration :miles-per-gallon) :column #(:miles-per-gallon :acceleration :horsepower)) :spec (:data (:url "/data/vgcars-splom-data.json") :mark :point :params #((:name "brush" :select (:type "interval" :resolve "union" :on "[mousedown[event.shiftKey], window:mouseup] > window:mousemove!" :translate "[mousedown[event.shiftKey], window:mouseup] > window:mousemove!" :zoom "wheel![event.shiftKey]")) (:name "grid" :select (:type "interval" :resolve "global" :translate "[mousedown[!event.shiftKey], window:mouseup] > window:mousemove!" :zoom "wheel![!event.shiftKey]") :bind :scales)) :encoding (:x (:field (:repeat "column") :type "quantitative") :y (:field (:repeat "row") :type "quantitative" :axis ("minExtent" 30)) :color (:condition (:param "brush" :field :origin :type "nominal") :value "grey")))))) (plot:plot vgcars-splom)

This example is one of those mentioned in the plotting tutorial that uses a non-standard location for the data property.

Weather exploration

This graph shows an interactive view of Seattle’s weather, including maximum temperature, amount of precipitation, and type of weather. By clicking and dragging on the scatter plot, you can see the proportion of days in that range that have sun, rain, fog, snow, etc.

(plot:plot
 (vega:defplot weather-exploration
   `(:title "Seattle Weather, 2012-2015"
     :data (:values ,seattle-weather)
     :vconcat #(;; upper graph
                (:encoding (:color (:condition (:param :brush
                                                :title "Weather"
                                                :field :weather
                                                :type :nominal
                                                :scale (:domain #("sun" "fog" "drizzle" "rain" "snow")
                                                        :range #("#e7ba52" "#a7a7a7" "#aec7e8" "#1f77b4" "#9467bd")))
                                    :value "lightgray")
                            :size (:field :precipitation
                                   :type :quantitative
                                   :title "Precipitation"
                                   :scale (:domain #(-1 50)))
                            :x (:field :date
                                :time-unit :monthdate
                                :title "Date"
                                :axis (:format "%b"))
                            :y (:field :temp-max
                                :type :quantitative
                                :scale (:domain #(-5 40))
                                :title "Maximum Daily Temperature (C)"))
                 :width 600
                 :height 300
                 :mark :point
                 :params #((:name :brush :select (:type :interval :encodings #(:x))))
                 :transform #((:filter (:param :click))))
                ;; lower graph
                (:encoding (:color (:condition (:param :click
                                                :field :weather
                                                :scale (:domain #("sun" "fog" "drizzle" "rain" "snow")
                                                        :range #("#e7ba52" "#a7a7a7" "#aec7e8" "#1f77b4" "#9467bd")))
                                    :value "lightgray")
                            :x (:aggregate :count)
                            :y (:field :weather :title "Weather"))
                 :width 600
                 :mark :bar
                 :params #((:name :click :select (:type :point :encodings #(:color))))
                 :transform #((:filter (:param :brush))))))))

Interactive scatterplot

(plot:plot (vega:defplot global-health `(:title "Global Health Statistics by Country and Year" :data (:values ,gapminder) :width 800 :height 500 :layer #((:transform #((:filter (:field :country :equal "afghanistan")) (:filter (:param :year))) :mark (:type :text :font-size 100 :x 420 :y 250 :opacity 0.06) :encoding (:text (:field :year))) (:transform #((:lookup :cluster :from (:key :id :fields #(:name) :data (:values #(("id" 0 "name" "South Asia") ("id" 1 "name" "Europe & Central Asia") ("id" 2 "name" "Sub-Saharan Africa") ("id" 3 "name" "America") ("id" 4 "name" "East Asia & Pacific") ("id" 5 "name" "Middle East & North Africa")))))) :encoding (:x (:field :fertility :type :quantitative :scale (:domain #(0 9)) :axis (:tick-count 5 :title "Fertility")) :y (:field :life-expect :type :quantitative :scale (:domain #(20 85)) :axis (:tick-count 5 :title "Life Expectancy"))) :layer #((:mark (:type :line :size 4 :color "lightgray" :stroke-cap "round") :encoding (:detail (:field :country) :order (:field :year) :opacity (:condition (:test (:or #((:param :hovered :empty :false) (:param :clicked :empty :false))) :value 0.8) :value 0))) (:params #((:name :year :value #((:year 1955)) :select (:type :point :fields #(:year)) :bind (:name :year :input :range :min 1955 :max 2005 :step 5)) (:name :hovered :select (:type :point :fields #(:country) :toggle :false :on :mouseover)) (:name :clicked :select (:type :point :fields #(:country)))) :transform #((:filter (:param :year))) :mark (:type :circle :size 100 :opacity 0.9) :encoding (:color (:field :name :title "Region"))) (:transform #((:filter (:and #((:param :year) (:or #((:param :clicked :empty :false) (:param :hovered :empty :false))))))) :mark (:type :text :y-offset -12 :font-size 12 :font-weight :bold) :encoding (:text (:field :country) :color (:field :name :title "Region"))) (:transform #((:filter (:param :hovered :empty :false)) (:filter (:not (:param :year)))) :layer #((:mark (:type :text :y-offset -12 :font-size 12 :color "gray") :encoding (:text (:field :year))) (:mark (:type :circle :color "gray"))))))))))

Crossfilter

Cross-filtering makes it easier and more intuitive for viewers of a plot to interact with the data and understand how one metric affects another. With cross-filtering, you can click a data point in one dashboard view to have all dashboard views automatically filter on that value.

Click and drag across one of the charts to see the other variables filtered.

(plot:plot
 (vega:defplot cross-filter
   `(:title "Cross filtering of flights"
     :data (:values ,flights-2k)
     ;; hours() is a Vega expression function that extracts the hour
     ;; of day from the timestamp
     :transform #((:calculate "hours(datum.date)" :as "time"))
     :repeat (:column #(:distance :delay :time))
     :spec (:layer #((:params #((:name :brush :select (:type :interval :encodings #(:x))))
                     :mark :bar
                     :encoding (:x (:field (:repeat :column) :bin (:maxbins 20))
                                :y (:aggregate :count)
                                :color (:value "#ddd")))
                    (:transform #((:filter (:param :brush)))
                     :mark :bar
                     :encoding (:x (:field (:repeat :column) :bin (:maxbins 20))
                                :y (:aggregate :count))))))))

4 - Tutorials

End to end demonstrations of statistical analysis

These learning tutorials demonstrate how to perform end-to-end statistical analysis of sample data using Lisp-Stat. Sample data is provided for both the examples and the optional exercises. By completing these tutorials you will understand the tasks required for a typical statistical workflow.

4.1 - Basics

An introduction to the basics of LISP-STAT

Preface

This document is intended to be a tutorial introduction to the basics of LISP-STAT and is based on the original tutorial for XLISP-STAT written by Luke Tierney, updated for Common Lisp and the 2021 implementation of LISP-STAT.

LISP-STAT is a statistical environment built on top of the Common Lisp general purpose programming language. The first three sections contain the information you will need to do elementary statistical calculations and plotting. The fourth section introduces some additional methods for generating and modifying data. The fifth section describes some features of the user interface that may be helpful. The remaining sections deal with more advanced topics, such as interactive plots, regression models, and writing your own functions. All sections are organized around examples, and most contain some suggested exercises for the reader.

This document is not intended to be a complete manual. However, documentation for many of the commands that are available is given in the appendix. Brief help messages for these and other commands are also available through the interactive help facility described in Section 5.1 below.

Common Lisp (CL) is a dialect of the Lisp programming language, published in ANSI standard document ANSI INCITS 226-1994 (S2018) (formerly X3.226-1994 (R1999)). The Common Lisp language was developed as a standardized and improved successor of MacLisp. By the early 1980s several groups were already at work on diverse successors to MacLisp: Lisp Machine Lisp (aka ZetaLisp), Spice Lisp, NIL and S-1 Lisp. Common Lisp sought to unify, standardize, and extend the features of these MacLisp dialects. Common Lisp is not an implementation, but rather a language specification. Several implementations of the Common Lisp standard are available, including free and open-source software and proprietary products. Common Lisp is a general-purpose, multi-paradigm programming language. It supports a combination of procedural, functional, and object-oriented programming paradigms. As a dynamic programming language, it facilitates evolutionary and incremental software development, with iterative compilation into efficient run-time programs. This incremental development is often done interactively without interrupting the running application.

Using this Tutorial

The best way to learn about a new computer programming language is usually to use it. You will get the most out of this tutorial if you read it at your computer and work through the examples yourself. To make this easier, the named data sets used in this tutorial have been stored in the file basic.lisp in the LS:DATASETS;TUTORIALS folder of the system. To load this file, execute:

(load #P"LS:DATASETS;TUTORIALS;basic")

at the command prompt (REPL). The file will be loaded and some variables will be defined for you.

Why LISP-STAT Exists

There are three primary reasons behind the decision to produce the LISP-STAT environment. The first is speed. The other major languages used for statistics and numerical analysis (R, Python and Julia) are all fine languages, but with the rise of 'big data' and large data sets they require workarounds for processing data that does not fit in memory. Furthermore, R and Python are interpreted and thus relatively slow when compared to Common Lisp, which has compilers that produce native machine code.

Not only does Common Lisp provide a compiler that produces machine code, it has native threading, a rich ecosystem of code libraries, and a history of industrial deployments, including:

  • Credit card authorization at AMEX (Authorizers Assistant)
  • US DoD logistics (and more, that we don’t know of)
  • CIA and NSA are big users based on Lisp sales
  • DWave and Rigetti use lisp for programming their quantum computers
  • Apple’s Siri was originally written in Lisp
  • Amazon got started with Lisp & C; so did Y-combinator
  • Google’s flight search engine is written in Common Lisp
  • AT&T used a stripped down version of Symbolics Lisp to process CDRs in the first IP switches

Python and R are never (to my knowledge) deployed as front-line systems; instead they are used in the back office to produce models that are executed by other applications in enterprise environments. Common Lisp eliminates that friction.

Availability

Source code for LISP-STAT is available in the Lisp-Stat github repository. The Getting Started section of the documentation contains instructions for downloading and installing the system.

Disclaimer

LISP-STAT is an experimental program. Although it is in daily use on several projects, the corporate sponsor, Symbolics Pte Ltd, takes no responsibility for losses or damages resulting directly or indirectly from the use of this program.

LISP-STAT is an evolving system. Over time new features will be introduced, and existing features that do not work may be changed. Every effort will be made to keep LISP-STAT consistent with the information in this tutorial, but if this is not possible the reference documentation should give accurate information about the current use of a command.

Starting and Finishing

Once you have obtained the source code or pre-built image, you can load Lisp-Stat using Quicklisp. If you do not have Quicklisp, stop here and get it. It is the de facto package manager for Common Lisp and you will need it. This is what you will see if loading using the Slime IDE:

CL-USER> (asdf:load-system :lisp-stat)
To load "lisp-stat":
  Load 1 ASDF system:
    lisp-stat
; Loading "lisp-stat"
..................................................
..................................................
[package num-utils]...............................
[package num-utils]...............................
[package dfio.decimal]............................
[package dfio.string-table].......................
.....
(:LISP-STAT)
CL-USER>

You may see more or less output, depending on whether dependent packages have been compiled before. If this is your first time running anything in this implementation of Common Lisp, you will probably see output related to the compilation of every module in the system. This could take a while, but only has to be done once.

Once completed, to use the functions provided, you need to make the LISP-STAT package the current package, like this:

(in-package :ls-user)
#<PACKAGE "LS-USER">
LS-USER>

The final LS-USER> in the window is the Slime prompt. Notice how it changed when you executed (in-package). In Slime, the prompt always indicates the current package, *package*. Any characters you type while the prompt is active will be added to the line after the final prompt. When you press return, LISP-STAT will try to interpret what you have typed and will print a response. For example, if you type a 1 and press return then LISP-STAT will respond by simply printing a 1 on the following line and then give you a new prompt:

LS-USER> 1
1
LS-USER>

If you type an expression like (+ 1 2), then LISP-STAT will print the result of evaluating the expression and give you a new prompt:

LS-USER> (+ 1 2)
3
LS-USER>

As you have probably guessed, this expression means that the numbers 1 and 2 are to be added together. The next section will give more details on how LISP-STAT expressions work. In this tutorial I will sometimes show interactions with the program as I have done here: The LS-USER> prompt will appear before lines you should type. LISP-STAT will supply this prompt when it is ready; you should not type it yourself. In later sections I will omit the new prompt following the result in order to save space.

Now that you have seen how to start up LISP-STAT it is a good idea to make sure you know how to get out. The exact command to exit depends on the Common Lisp implementation you use. For SBCL, you can type the expression

LS-USER> (exit)

In other implementations, the command is quit. One of these methods should cause the program to exit and return you to the IDE. In Slime, you can also use the comma (,) shortcut and then type sayoonara.

The Basics

Before we can start to use LISP-STAT for statistical work we need to learn a little about the kind of data LISP-STAT uses and about how the LISP-STAT listener and evaluator work.

Data

LISP-STAT works with two kinds of data: simple data and compound data. Simple data are numbers

1                   ; an integer
-3.14               ; a floating point number
#C(0 1)             ; a complex number (the imaginary unit)

logical values

T                   ; true
nil                 ; false

strings (always enclosed in double quotes)

"This is a string 1 2 3 4"

and symbols (used for naming things; see the following section)

x
x12
12x
this-is-a-symbol

Compound data are lists

(this is a list with 7 elements)
(+ 1 2 3)
(sqrt 2)

or vectors

#(this is a vector with 7 elements)
#(1 2 3)

Higher dimensional arrays are another form of compound data; they will be discussed below in Section 9, “Arrays”.

All the examples given above can be typed directly into the command window as they are shown here. The next subsection describes what LISP-STAT will do with these expressions.

The Listener and the Evaluator

A session with LISP-STAT basically consists of a conversation between you and the listener. The listener is the window into which you type your commands. When it is ready to receive a command it gives you a prompt. At the prompt you can type in an expression. You can use the mouse or the backspace key to correct any mistakes you make while typing in your expression. When the expression is complete and you type a return the listener passes the expression on to the evaluator. The evaluator evaluates the expression and returns the result to the listener for printing.1 The evaluator is the heart of the system.

The basic rule to remember in trying to understand how the evaluator works is that everything is evaluated. Numbers and strings evaluate to themselves:

LS-USER> 1
1
LS-USER> "Hello"
"Hello"
LS-USER>

Lists are more complicated. Suppose you type the list (+ 1 2 3) at the listener. This list has four elements: the symbol + followed by the numbers 1, 2 and 3. Here is what happens:

> (+ 1 2 3)
6
>

This list is evaluated as a function application. The first element is a symbol representing a function, in this case the symbol + representing the addition function. The remaining elements are the arguments. Thus the list in the example above is interpreted to mean “Apply the function + to the numbers 1, 2 and 3”.

Actually, the arguments to a function are always evaluated before the function is applied. In the previous example the arguments are all numbers and thus evaluate to themselves. On the other hand, consider

LS-USER> (+ (* 2 3) 4)
10
LS-USER>

The evaluator has to evaluate the first argument to the function + before it can apply the function.

Occasionally you may want to tell the evaluator not to evaluate something. For example, suppose we wanted to get the evaluator to simply return the list (+ 1 2) back to us, instead of evaluating it. To do this we need to quote our list:

LS-USER> (quote (+ 1 2))
(+ 1 2)
LS-USER>

quote is not a function. It does not obey the rules of function evaluation described above: Its argument is not evaluated. quote is called a special form – special because it has special rules for the treatment of its arguments. There are a few other special forms that we will need; I will introduce them as they are needed. Together with the basic evaluation rules described here these special forms make up the basics of the Lisp language. The special form quote is used so often that a shorthand notation has been developed, a single quote before the expression you want to quote:

LS-USER> '(+ 1 2)      ; single quote shorthand

This is equivalent to (quote (+ 1 2)). Note that there is no matching quote following the expression.

By the way, the semicolon ; is the Lisp comment character. Anything you type after a semicolon up to the next time you press return is ignored by the evaluator.

Exercises

For each of the following expressions try to predict what the evaluator will return. Then type them in, see what happens and try to explain any differences.

  1. (+ 3 5 6)

  2. (+ (- 1 2) 3)

  3. '(+ 3 5 6)

  4. '( + (- 1 2) 3)

  5. (+ (- (* 2 3) (/ 6 2)) 7)

  6. 'x

Remember, to quit from LISP-STAT type (exit), quit or use the IDE’s exit mechanism.

Elementary Statistical Operations

This section introduces some of the basic graphical and numerical statistical operations that are available in LISP-STAT.

First Steps

Statistical data usually consists of groups of numbers. Devore and Peck [@DevorePeck Exercise 2.11] describe an experiment in which 22 consumers reported the number of times they had purchased a product during the previous 48 week period. The results are given as a table:


0   2   5   0   3   1   8   0   3   1   1
9   2   4   0   2   9   3   0   1   9   8

To examine this data in LISP-STAT we represent it as a list of numbers using the list function:

(list 0 2 5 0 3 1 8 0 3 1 1 9 2 4 0 2 9 3 0 1 9 8)

Note that the numbers are separated by white space (spaces, tabs or even returns), not commas.

The mean function can be used to compute the average of a list of numbers. We can combine it with the list function to find the average number of purchases for our sample:

(mean '(0 2 5 0 3 1 8 0 3 1 1 9 2 4 0 2 9 3 0 1 9 8)) ; => 3.227273

The median of these numbers can be computed as

(median '(0 2 5 0 3 1 8 0 3 1 1 9 2 4 0 2 9 3 0 1 9 8)) ; => 2

It is of course a nuisance to have to type in the list of 22 numbers every time we want to compute a statistic for the sample. To avoid having to do this I will give this list a name using the def special form 2:

(def purchases (list 0 2 5 0 3 1 8 0 3 1 1 9 2 4 0 2 9 3 0 1 9 8)) ; PURCHASES

Now the symbol purchases has a value associated with it: Its value is our list of 22 numbers. If you give the symbol purchases to the evaluator then it will find the value of this symbol and return that value:

LS-USER> purchases
(0 2 5 0 3 1 8 0 3 1 1 9 2 4 0 2 9 3 0 1 9 8)

We can now easily compute various numerical descriptive statistics for this data set:

LS-USER> (mean purchases)
3.227273
LS-USER> (median purchases)
2
LS-USER> (sd purchases)
3.2795
LS-USER> (interquartile-range purchases)
4

LISP-STAT also supports elementwise arithmetic operations on vectors of numbers. Technically, overriding, or 'shadowing', any of the built-in Common Lisp functions is undefined behaviour, which is usually a euphemism for 'something really bad will happen'. The vector functions are therefore located in the package elmt and prefixed by e to distinguish them from the Common Lisp variants, e.g. e+ for addition, e* for multiplication, etc. Presently these functions work only on vectors, so we'll define a new purchases variable as a vector type:

(def purchases-2 #(0 2 5 0 3 1 8 0 3 1 1 9 2 4 0 2 9 3 0 1 9 8))

The #( prefix tells the Lisp reader to interpret the literal as a vector, much as the ' prefix signals a quoted list.

Now we can add 1 to each of the purchases:

LS-USER> (e+ 1 purchases-2)
#(1 3 6 1 4 2 9 1 4 2 2 10 3 5 1 3 10 4 1 2 10 9)

and after adding 1 we can compute the natural logarithms of the results:

LS-USER> (elog (e+ 1 purchases-2))
#(0 1.098612 1.791759 0 1.386294 0.6931472 2.197225 0 1.386294 0.6931472
  0.6931472 2.302585 1.098612 1.609438 0 1.098612 2.302585 1.386294 0
  0.6931472 2.302585 2.197225)

Exercises

For each of the following expressions try to predict what the evaluator will return. Then type them in, see what happens and try to explain any differences.

  1. (mean (list 1 2 3))

  2. (e+ #(1 2 3) 4)

  3. (e* #(1 2 3) #(4 5 6))

  4. (e+ #(1 2 3) #(4 5 7))

Summary Statistics

Devore and Peck [@DevorePeck page 54, Table 10] give precipitation levels recorded during the month of March in the Minneapolis - St. Paul area over a 30 year period. Let’s enter these data into LISP-STAT with the name precipitation:

(def precipitation
    #(.77 1.74 .81 1.20 1.95 1.20 .47 1.43 3.37 2.20
      3.30 3.09 1.51 2.10 .52 1.62 1.31 .32 .59 .81
      2.81 1.87 1.18 1.35 4.75 2.48 .96 1.89 .90 2.05))

In typing the expression above I have inserted returns and tabs a few times in order to make the expression easier to read; the tab key indents the next line to a reasonable point.

Here are some numerical summaries:

LS-USER> (mean precipitation)
1.685
LS-USER> (median precipitation)
1.47
LS-USER> (standard-deviation precipitation)
1.0157
LS-USER> (interquartile-range precipitation)
1.145

The distribution of this data set is somewhat skewed to the right. Notice the separation between the mean and the median. You might want to try a few simple transformations to see if you can symmetrize the data. Square root and log transformations can be computed using the expressions

(esqrt precipitation)

and

(elog precipitation)

You should look at plots of the data to see if these transformations do indeed lead to a more symmetric shape. The means and medians of the transformed data are:

LS-USER> (mean (esqrt precipitation))
1.243006
LS-USER> (median (esqrt precipitation))
1.212323
LS-USER> (mean (elog precipitation))
0.3405517
LS-USER> (median (elog precipitation))
0.384892

Generating and Modifying Data

This section briefly summarizes some techniques for generating random and systematic data.

Generating Random Data

The state of the internal random number generator can be “randomly” reseeded, and the current value of the generator state can be saved. The mechanism used is the standard Common Lisp mechanism. The current random state is held in the variable *random-state*. The function make-random-state can be used to set and save the state. It takes an optional argument. If the argument is NIL or omitted make-random-state returns a copy of the current value of *random-state*. If the argument is a state object, a copy of it is returned. If the argument is t a new, “randomly” initialized state object is produced and returned. 3
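As a sketch of how this mechanism is used (the functions shown are standard Common Lisp; the particular values drawn are illustrative, not fixed):

```lisp
;; Save a copy of the current generator state.
(defvar *saved-state* (make-random-state nil))

;; Draw a couple of uniform random numbers with the standard RANDOM function.
(random 1.0)   ; some value in [0.0, 1.0)
(random 1.0)   ; a different value

;; Rebinding *random-state* to a copy of the saved state
;; reproduces exactly the same sequence of draws.
(let ((*random-state* (make-random-state *saved-state*)))
  (random 1.0))   ; the same value as the first draw above
```

Saving a state object before a simulation and restoring it later is the standard way to make a run reproducible.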

Forming Subsets and Deleting Cases

The select function allows you to select a single element or a group of elements from a list or vector. For example, if we define x by

(def x (list 3 7 5 9 12 3 14 2))

then (select x i) will return the ith element of x. Common Lisp, like the language C, but in contrast to FORTRAN, numbers the elements of lists and vectors starting at zero. Thus the indices for the elements of x are 0, 1, 2, 3, 4, 5, 6, 7. So

LS-USER> (select x 0)
3
LS-USER> (select x 2)
5

To get a group of elements at once we can use a list of indices instead of a single index:

LS-USER> (select x (list 0 2))
(3 5)

If you want to select all elements of x except element 2 you can use the expression

(remove 2 (iota 8))

as the second argument to the function select:

LS-USER> (remove 2 (iota 8))
(0 1 3 4 5 6 7)
LS-USER> (select x (remove 2 (iota 8)))
(3 7 9 12 3 14 2)

Combining Lists & Vectors

At times you may want to combine several short lists or vectors into a single longer one. This can be done using the append function. For example, if you have three variables x, y and z constructed by the expressions

(def x (list 1 2 3))
(def y (list 4))
(def z (list 5 6 7 8))

then the expression

(append x y z)

will return the list

(1 2 3 4 5 6 7 8).

For vectors, we use the more general function concatenate, which operates on sequences, that is objects of either list or vector:

LS-USER> (concatenate 'vector #(1 2) #(3 4))
#(1 2 3 4)

Notice that we had to indicate the return type, using the 'vector argument to concatenate. We could also have said 'list to have it return a list, and it would have coerced the arguments to the correct type.

Modifying Data

So far when I have asked you to type in a list of numbers I have been assuming that you will type the list correctly. If you made an error you had to retype the entire def expression. Since you can use cut & paste this is really not too serious. However, it would be nice to be able to replace the values in a list after you have typed it in. The setf special form is used for this. Suppose you would like to change the 12 in the list x used in Section 4.3 to 11. The expression

(setf (select x 4) 11)

will make this replacement:

LS-USER> (setf (select x 4) 11)
11
LS-USER> x
(3 7 5 9 11 3 14 2)

The general form of setf is

(setf form value)

where form is the expression you would use to select a single element or a group of elements from x and value is the value you would like that element to have, or the list of the values for the elements in the group. Thus the expression

(setf (select x (list 0 2)) (list 15 16))

changes the values of elements 0 and 2 to 15 and 16:

LS-USER> (setf (select x (list 0 2)) (list 15 16))
(15 16)
LS-USER> x
(15 7 16 9 11 3 14 2)

One point to be aware of: the def special form does not copy its argument. If you define y with (def y x), then both x and y refer to the same list. As a result, if we change an element of (the item referred to by) x with setf then we are also changing the element of (the item referred to by) y, since both x and y refer to the same item. If you want to make a copy of x and store it in y before you make changes to x then you must do so explicitly using, say, the copy-list function. The expression

(defparameter y (copy-list x))

will make a copy of x and set the value of y to that copy. Now x and y refer to different items and changes to x will not affect y.
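To see the difference, here is a small sketch using the def, select and setf forms introduced earlier (the values are illustrative):

```lisp
(def x (list 1 2 3))
(defparameter y (copy-list x))   ; y is an independent copy of x

(setf (select x 0) 99)           ; modify x only

x   ; => (99 2 3)
y   ; => (1 2 3), unchanged, because y is a copy
```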

Useful Shortcuts

This section describes some additional features of LISP-STAT that you may find useful.

Getting Help

On-line help is available for many of the functions in LISP-STAT 4. As an example, here is how you would get help for the function iota:

LS-USER> (documentation 'iota 'function)
"Return a list of n numbers, starting from START (with numeric contagion
from STEP applied), each consecutive number being the sum of the previous one
and STEP. START defaults to 0 and STEP to 1.

Examples:

  (iota 4)                      => (0 1 2 3)
  (iota 3 :start 1 :step 1.0)   => (1.0 2.0 3.0)
  (iota 3 :start -1 :step -1/2) => (-1 -3/2 -2)
"

Note the quote in front of iota. documentation is itself a function, and its argument is the symbol representing the function iota. To make sure documentation receives the symbol, not the value of the symbol, you need to quote the symbol.

Another useful function is describe that, depending on the Lisp implementation, will return documentation and additional information about the object:

LS-USER> (describe 'iota)
ALEXANDRIA:IOTA
  [symbol]

IOTA names a compiled function:
  Lambda-list: (ALEXANDRIA::N &KEY (ALEXANDRIA::START 0) (STEP 1))
  Derived type: (FUNCTION
                 (UNSIGNED-BYTE &KEY (:START NUMBER) (:STEP NUMBER))
                 (VALUES T &OPTIONAL))
  Documentation:
    Return a list of n numbers, starting from START (with numeric contagion
    from STEP applied), each consecutive number being the sum of the previous one
    and STEP. START defaults to 0 and STEP to 1.

    Examples:

      (iota 4)                      => (0 1 2 3)
      (iota 3 :start 1 :step 1.0)   => (1.0 2.0 3.0)
      (iota 3 :start -1 :step -1/2) => (-1 -3/2 -2)

  Inline proclamation: INLINE (inline expansion available)
  Source file: s:/src/third-party/alexandria/alexandria-1/numbers.lisp

If you are not sure about the name of a function you may still be able to get some help. Suppose you want to find out about functions related to the normal distribution. Most such functions will have “norm” as part of their name. The expression

(apropos 'norm)

will print all symbols whose names contain the string “norm”:

ALEXANDRIA::NORMALIZE
ALEXANDRIA::NORMALIZE-AUXILARY
ALEXANDRIA::NORMALIZE-KEYWORD
ALEXANDRIA::NORMALIZE-OPTIONAL
ASDF/PARSE-DEFSYSTEM::NORMALIZE-VERSION (fbound)
ASDF/FORCING:NORMALIZE-FORCED-NOT-SYSTEMS (fbound)
ASDF/FORCING:NORMALIZE-FORCED-SYSTEMS (fbound)
ASDF/SESSION::NORMALIZED-NAMESTRING
ASDF/SESSION:NORMALIZE-NAMESTRING (fbound)
CL-INTERPOL::NORMAL-NAME-CHAR-P (fbound)
CL-PPCRE::NORMALIZE-VAR-LIST (fbound)
DISTRIBUTIONS::+NORMAL-LOG-PDF-CONSTANT+ (bound, DOUBLE-FLOAT)
DISTRIBUTIONS::CDF-NORMAL% (fbound)
DISTRIBUTIONS::COPY-LEFT-TRUNCATED-NORMAL (fbound)
DISTRIBUTIONS::COPY-R-LOG-NORMAL (fbound)
DISTRIBUTIONS::COPY-R-NORMAL (fbound)
DISTRIBUTIONS::DRAW-LEFT-TRUNCATED-STANDARD-NORMAL (fbound)
DISTRIBUTIONS::LEFT-TRUNCATED-NORMAL (fbound)
DISTRIBUTIONS::LEFT-TRUNCATED-NORMAL-ALPHA (fbound)
DISTRIBUTIONS::LEFT-TRUNCATED-NORMAL-LEFT (fbound)
DISTRIBUTIONS::LEFT-TRUNCATED-NORMAL-LEFT-STANDARDIZED (fbound)
DISTRIBUTIONS::LEFT-TRUNCATED-NORMAL-M0 (fbound)
DISTRIBUTIONS::LEFT-TRUNCATED-NORMAL-MU (fbound)
DISTRIBUTIONS::LEFT-TRUNCATED-NORMAL-P (fbound)
DISTRIBUTIONS::LEFT-TRUNCATED-NORMAL-SIGMA (fbound)
DISTRIBUTIONS::MAKE-LEFT-TRUNCATED-NORMAL (fbound)
DISTRIBUTIONS::MAKE-R-LOG-NORMAL (fbound)
DISTRIBUTIONS::MAKE-R-NORMAL (fbound)
DISTRIBUTIONS::QUANTILE-NORMAL% (fbound)
DISTRIBUTIONS::R-LOG-NORMAL-LOG-MEAN (fbound)
...

Let me briefly explain the notation used in the information printed by describe regarding the arguments a function expects 5. This is called the lambda-list. Most functions expect a fixed set of arguments, described in the help message by a line like Args: (x y z) or Lambda-list: (x y z).

Some functions can take one or more optional arguments. The arguments for such a function might be listed as

Args: (x &optional y (z t))

or

Lambda-list: (x &optional y (z t))

This means that x is required and y and z are optional. If the function is named f, it can be called as (f x-val), (f x-val y-val) or (f x-val y-val z-val). The list (z t) means that if z is not supplied its default value is T. No explicit default value is specified for y; its default value is therefore NIL. The arguments must be supplied in the order in which they are listed. Thus if you want to give the argument z you must also give a value for y.
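The behaviour described above can be seen with a small illustrative function (the name f and its body are hypothetical, not part of LISP-STAT):

```lisp
(defun f (x &optional y (z t))
  "Return the three argument values as a list."
  (list x y z))

(f 1)       ; => (1 NIL T)  y and z take their defaults
(f 1 2)     ; => (1 2 T)    z still defaults to T
(f 1 2 3)   ; => (1 2 3)    all arguments supplied
```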

Another form of optional argument is the keyword argument. The iota function for example takes arguments

Args: (N &key (START 0) (STEP 1))

The n argument is required; START and STEP are optional keyword arguments, defaulting to 0 and 1 respectively. If you want to create a sequence of eight numbers with a step of two, use the expression

(iota 8 :step 2)

Thus to give a value for a keyword argument you give the keyword 6 for the argument, a symbol consisting of a colon followed by the argument name, and then the value for the argument. If a function can take several keyword arguments then these may be specified in any order, following the required and optional arguments.
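For example, with the iota function documented above, the keyword arguments can be given in either order with the same result:

```lisp
(iota 5 :start 10 :step 2)   ; => (10 12 14 16 18)
(iota 5 :step 2 :start 10)   ; => (10 12 14 16 18), keyword order does not matter
```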

Finally, some functions can take an arbitrary number of arguments. This is denoted by a line like

Args: (x &rest args)

The argument x is required, and zero or more additional arguments can be supplied.
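A small hypothetical example of such a lambda-list (the function sum-all is for illustration only):

```lisp
(defun sum-all (x &rest args)
  "Add X and any number of additional arguments."
  (apply #'+ x args))

(sum-all 1)         ; => 1, no extra arguments
(sum-all 1 2 3 4)   ; => 10
```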

In addition to providing information about functions describe also gives information about data types and certain variables. For example,

LS-USER> (describe 'complex)
COMMON-LISP:COMPLEX
  [symbol]

COMPLEX names a compiled function:
  Lambda-list: (REALPART &OPTIONAL (IMAGPART 0))
  Declared type: (FUNCTION (REAL &OPTIONAL REAL)
                  (VALUES NUMBER &OPTIONAL))
  Derived type: (FUNCTION (T &OPTIONAL T)
                 (VALUES
                  (OR RATIONAL (COMPLEX SINGLE-FLOAT)
                      (COMPLEX DOUBLE-FLOAT) (COMPLEX RATIONAL))
                  &OPTIONAL))
  Documentation:
    Return a complex number with the specified real and imaginary components.
  Known attributes: foldable, flushable, unsafely-flushable, movable
  Source file: SYS:SRC;CODE;NUMBERS.LISP

COMPLEX names the built-in-class #<BUILT-IN-CLASS COMMON-LISP:COMPLEX>:
  Class precedence-list: COMPLEX, NUMBER, T
  Direct superclasses: NUMBER
  Direct subclasses: SB-KERNEL:COMPLEX-SINGLE-FLOAT,
                     SB-KERNEL:COMPLEX-DOUBLE-FLOAT
  Sealed.
  No direct slots.

COMPLEX names a primitive type-specifier:
  Lambda-list: (&OPTIONAL (SB-KERNEL::TYPESPEC '*))

shows the function, type and class documentation for complex, and

LS-USER> (documentation 'pi 'variable)
PI                                                              [variable-doc]
The floating-point number that is approximately equal to the ratio of the
circumference of a circle to its diameter.

shows the variable documentation for pi 7.

Listing and Undefining Variables

After you have been working for a while you may want to find out what variables you have defined (using def). The function variables will produce a listing:

LS-USER> (variables)
CO
HC
RURAL
URBAN
PRECIPITATION
PURCHASES
NIL
LS-USER>

If you are working with very large variables you may occasionally want to free up some space by getting rid of some variables you no longer need. You can do this using the undef-var function:

LS-USER> (undef-var 'co)
CO
LS-USER> (variables)
HC
RURAL
URBAN
PRECIPITATION
PURCHASES
NIL
LS-USER>

More on the Listener

Common Lisp provides a simple command history mechanism. The symbols `-`, `+`, `++`, `+++`, `*`, `**`, and `***` are used for this purpose. The top level reader binds these symbols as follows:


    `-`  the current input expression
    `+`  the last expression read
   `++`  the previous value of `+`
  `+++`  the previous value of `++`
    `*`  the result of the last evaluation
   `**`  the previous value of `*`
  `***`  the previous value of `**`

The variables `*`, `**` and `***` are probably most useful.

For example, if you read a data-frame but forget to assign the resulting object to a variable:

LS-USER> (read-csv rdata:mtcars)
WARNING: Missing column name was filled in
#<DATA-FRAME (32 observations of 12 variables)>

you can recover it using one of the history variables:

(defparameter mtcars *) ; MTCARS

The symbol MTCARS now has the data-frame object as its value.

Like most interactive systems, Common Lisp needs a system for dynamically managing memory. The system used depends on the implementation. The most common way (SBCL, CCL) is to grab memory out of a fixed bin until the bin is exhausted. At that point the system pauses to reclaim memory that is no longer being used. This process, called garbage collection, will occasionally cause the system to pause if you are using large amounts of memory.

Loading Files

The data for the examples and exercises in this tutorial, when not loaded from the network, have been stored on files with names ending in .lisp. In the LISP-STAT system directory they can be found in the folder Datasets. Any variables you save (see the next subsection for details) will also be saved in files of this form. The data in these files can be read into LISP-STAT with the load function. To load a file named randu.lisp type the expression

(load #P"LS:DATASETS;RANDU.LISP")

or just

(load #P"LS:DATASETS;randu")

If you give load a name that does not end in .lisp then load will add this suffix.

Saving Your Work

Save a Session

If you want to record a session with LISP-STAT you can do so using the dribble function. The expression

(dribble "myfile")

starts a recording. All expressions typed by you and all results printed by LISP-STAT will be entered into the file named myfile. The expression

(dribble)

stops the recording. Note that (dribble "myfile") starts a new file by the name myfile. If you already have a file by that name its contents will be lost. Thus you can’t use dribble to toggle on and off recording to a single file.

dribble only records text that is typed, not plots. However, you can use the buttons displayed on a plot to save in SVG or PNG format. The original HTML plots are saved in your operating system’s TEMP directory and can be viewed again until the directory is cleared during a system reboot.

Saving Variables

Variables you define in LISP-STAT only exist for the duration of the current session. If you quit from LISP-STAT your data will be lost. To preserve your data you can use the savevar function. This function allows you to save a variable into a file. Again, a new file is created and any existing file by the same name is destroyed. To save the variable precipitation in a file called precipitation type

(savevar 'precipitation "precipitation")

Do not add the .lisp suffix yourself; savevar will supply it. To save the two variables precipitation and purchases in the file examples.lisp type 8.

(savevar '(purchases precipitation) "examples")

The files precipitation.lisp and examples.lisp now contain a set of expressions that, when read in with the load command, will recreate the variables precipitation and purchases. You can look at these files with an editor such as Emacs, and you can prepare files with your own data by following these examples.

Reading Data Files

The data files we have used so far in this tutorial have contained Common Lisp expressions. LISP-STAT also provides functions for reading raw data files. The most commonly used is read-csv.

(read-csv stream)

where stream is a Common Lisp stream containing data in comma separated value (CSV) format. Streams can be obtained from files, strings or the network. The parser also supports delimiters other than comma.

The character delimited reader should be adequate for most purposes. If you have to read a file that is not in a character delimited format you can use the raw file handling functions of Common Lisp.
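For example, reading from an in-memory string stream might look like the following sketch (the column names and values here are illustrative):

```lisp
;; Build a small CSV in a string and parse it into a data frame;
;; with-input-from-string provides the stream that read-csv expects
(with-input-from-string (s (format nil "name,age~%alice,30~%bob,25"))
  (read-csv s))
```

The same call works with a file stream obtained from with-open-file.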

User Initialization File

Each Common Lisp implementation provides a way to execute initialization code upon start-up. You can use this file to load any data sets you would like to have available or to define functions of your own.

LISP-STAT also has an initialization file, ls-init.lisp, in your home directory. Typically you will use the lisp implementation initialization file for global level initialization, and ls-init.lisp for data related customizations. See the section Initialization file in the manual for more information.

Defining Functions & Methods

This section gives a brief introduction to programming LISP-STAT. The most basic programming operation is to define a new function. Closely related is the idea of defining a new method for an object.[9]

Defining Functions

You can use the Common Lisp language to define functions of your own. Many of the functions you have been using so far are written in this language. The special form used for defining functions is called defun. The simplest form of the defun syntax is

(defun fun args expression)

where fun is the symbol you want to use as the function name, args is the list of the symbols you want to use as arguments, and expression is the body of the function. Suppose for example that you want to define a function to delete a case from a list. This function should take as its arguments the list and the index of the case you want to delete. The body of the function can be based on either of the two approaches described in Section 4.3 above. Here is one approach:

(defun delete-case (x i)
  (select x (remove i (iota (length x)))))

I have used the function length in this definition to determine the length of the argument x. Note that none of the arguments to defun are quoted: defun is a special form that does not evaluate its arguments.
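Assuming delete-case is defined as above, a hypothetical call might look like this (the data values are illustrative):

```lisp
;; Remove the case at index 2 (the third case) from a five-element list;
;; the remaining four elements are returned in order
(delete-case '(10 20 30 40 50) 2)
```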

Unless the functions you define are very simple you will probably want to define them in a file and load the file into LISP-STAT with the load command. You can put the functions in the implementation’s initialization file or include in the initialization file a load command that will load another file. The version of Common Lisp for the Macintosh, CCL, includes a simple editor that can be used from within LISP-STAT.

Matrices and Arrays

LISP-STAT includes support for multidimensional arrays. In addition to the standard Common Lisp array functions, LISP-STAT also includes a system called array-operations.

An array is printed using the standard Common Lisp format. For example, a 2 by 3 matrix with rows (1 2 3) and (4 5 6) is printed as

#2A((1 2 3)(4 5 6))

The prefix #2A indicates that this is a two-dimensional array. This form is not particularly readable, but it has the advantage that it can be pasted into expressions and will be read as an array by the Lisp reader.[10] For matrices you can use the function print-matrix to get a slightly more readable representation:

LS-USER> (print-matrix '#2a((1 2 3)(4 5 6)) *standard-output*)
    1 2 3
    4 5 6
NIL

The select function can be used to extract elements or sub-arrays from an array. If A is a two dimensional array then the expression

(select a 0 1)

will return element 1 of row 0 of A. The expression

(select a (list 0 1) (list 0 1))

returns the upper left hand corner of A.
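Selections can also be used as places. Assuming your version of the select library supports setf on array selections, a minimal sketch:

```lisp
(defparameter *a* #2A((1 2 3) (4 5 6)))

;; Replace the element in row 0, column 1;
;; *a* now holds 99 at that position
(setf (select *a* 0 1) 99)
```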

References

Bates, D. M. and Watts, D. G., (1988), Nonlinear Regression Analysis and its Applications, New York: Wiley.

Becker, Richard A., and Chambers, John M., (1984), S: An Interactive Environment for Data Analysis and Graphics, Belmont, Ca: Wadsworth.

Becker, Richard A., Chambers, John M., and Wilks, Allan R., (1988), The New S Language: A Programming Environment for Data Analysis and Graphics, Pacific Grove, Ca: Wadsworth.

Becker, Richard A., and William S. Cleveland, (1987), “Brushing scatterplots,” Technometrics, vol. 29, pp. 127-142.

Betz, David, (1985) “An XLISP Tutorial,” BYTE, pp 221.

Betz, David, (1988), “XLISP: An experimental object-oriented programming language,” Reference manual for XLISP Version 2.0.

Chaloner, Kathryn, and Brant, Rollin, (1988) “A Bayesian approach to outlier detection and residual analysis,” Biometrika, vol. 75, pp. 651-660.

Cleveland, W. S. and McGill, M. E., (1988) Dynamic Graphics for Statistics, Belmont, Ca.: Wadsworth.

Cox, D. R. and Snell, E. J., (1981) Applied Statistics: Principles and Examples, London: Chapman and Hall.

Dennis, J. E. and Schnabel, R. B., (1983), Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Englewood Cliffs, N.J.: Prentice-Hall.

Devore, J. and Peck, R., (1986), Statistics, the Exploration and Analysis of Data, St. Paul, Mn: West Publishing Co.

McDonald, J. A., (1982), “Interactive Graphics for Data Analysis,” unpublished Ph. D. thesis, Department of Statistics, Stanford University.

Oehlert, Gary W., (1987), “MacAnova User’s Guide,” Technical Report 493, School of Statistics, University of Minnesota.

Press, Flannery, Teukolsky and Vetterling, (1988), Numerical Recipes in C, Cambridge: Cambridge University Press.

Steele, Guy L., (1984), Common Lisp: The Language, Bedford, MA: Digital Press.

Stuetzle, W., (1987), “Plot windows,” J. Amer. Statist. Assoc., vol. 82, pp. 466 - 475.

Tierney, Luke, (1990) LISP-STAT: Statistical Computing and Dynamic Graphics in Lisp. Forthcoming.

Tierney, L. and J. B. Kadane, (1986), “Accurate approximations for posterior moments and marginal densities,” J. Amer. Statist. Assoc., vol. 81, pp. 82-86.

Tierney, Luke, Robert E. Kass, and Joseph B. Kadane, (1989), “Fully exponential Laplace approximations to expectations and variances of nonpositive functions,” J. Amer. Statist. Assoc., to appear.

Tierney, L., Kass, R. E., and Kadane, J. B., (1989), “Approximate marginal densities for nonlinear functions,” Biometrika, to appear.

Weisberg, Sanford, (1982), “MULTREG Users Manual,” Technical Report 298, School of Statistics, University of Minnesota.

Winston, Patrick H. and Berthold K. P. Horn, (1988), LISP, 3rd Ed., New York: Addison-Wesley.

Appendix A: LISP-STAT Interface to the Operating System

A.1 Running System Commands from LISP-STAT

The uiop:run-program function can be used to run system commands from within LISP-STAT. It takes a command string as its argument and returns the shell exit code for the command. For example, you can print the date using the UNIX date command:

LS-USER> (uiop:run-program "date" :output *standard-output*)
Wed Jul 19 11:06:53 CDT 1989
0

The return value is 0, indicating successful completion of the UNIX command.
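You can also capture the command's output as a Lisp string rather than printing it, using UIOP's :output option:

```lisp
;; Return the command's standard output as a string,
;; with the trailing newline stripped
(uiop:run-program "date" :output '(:string :stripped t))
```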


  1. It is possible to make a finer distinction. The reader takes a string of characters from the listener and converts it into an expression. The evaluator evaluates the expression and the printer converts the result into another string of characters for the listener to print. For simplicity I will use evaluator to describe the combination of these functions. ↩︎

  2. def acts like a special form, rather than a function, since its first argument is not evaluated (otherwise you would have to quote the symbol). Technically def is a macro, not a special form, but I will not worry about the distinction in this tutorial. def is closely related to the standard Lisp special forms setf and setq. The advantage of using def is that it adds your variable name to a list of def‘ed variables that you can retrieve using the function variables. If you use setf or setq there is no easy way to find variables you have defined, as opposed to ones that are predefined. def always affects top level symbol bindings, not local bindings. It cannot be used in function definitions to change local bindings. ↩︎

  3. The generator used is Marsaglia’s portable generator from the Core Math Libraries distributed by the National Bureau of Standards. A state object is a vector containing the state information of the generator. “Random” reseeding occurs off the system clock. ↩︎

  4. Help is available both in the REPL, and online at https://lisp-stat.dev/ ↩︎

  5. The notation used corresponds to the specification of the argument lists in Lisp function definitions. See Section 8{reference-type=“ref” reference=“Fundefs”} for more information on defining functions. ↩︎

  6. Note that the keyword :title has not been quoted. Keyword symbols, symbols starting with a colon, are somewhat special. When a keyword symbol is created its value is set to itself. Thus a keyword symbol effectively evaluates to itself and does not need to be quoted. ↩︎

  7. Actually pi represents a constant, produced with defconst. Its value cannot be changed by simple assignment. ↩︎

  8. I have used a quoted list ’(purchases precipitation) in this expression to pass the list of symbols to the savevar function. A longer alternative would be the expression (list ’purchases ’precipitation). ↩︎

  9. The discussion in this section only scratches the surface of what you can do with functions in the XLISP language. To see more examples you can look at the files that are loaded when XLISP-STAT starts up. For more information on options of function definition, macros, etc. see the XLISP documentation and the books on Lisp mentioned in the references. ↩︎

  10. You should quote an array if you type it in using this form, as the value of an array is not defined. ↩︎

4.2 - Data Frame

A Data Frame Primer

Load data frame

(ql:quickload :data-frame)

Load data

We will use one of the example data sets from R, mtcars, for these examples. First, switch into the Lisp-Stat package:

(in-package :ls-user)

Now load the data:

(data :mtcars-example)
;; WARNING: Missing column name was filled in
;; T

Examine data

Lisp-Stat’s printing system is integrated with the Common Lisp Pretty Printing facility. To control aspects of printing, you can use the built in lisp pretty printing configuration system. By default Lisp-Stat sets *print-pretty* to nil.

Basic information

Type the name of the data frame at the REPL to get a simple one-line summary.

mtcars
;; #<DATA-FRAME MTCARS (32 observations of 12 variables)
;; Motor Trend Car Road Tests>

Printing data

By default, the head function will print the first 6 rows:

(head mtcars)
;;   X1                 MPG  CYL DISP HP  DRAT WT    QSEC  VS AM GEAR CARB
;; 0 Mazda RX4          21.0 6   160  110 3.90 2.620 16.46 0  1  4    4
;; 1 Mazda RX4 Wag      21.0 6   160  110 3.90 2.875 17.02 0  1  4    4
;; 2 Datsun 710         22.8 4   108  93  3.85 2.320 18.61 1  1  4    1
;; 3 Hornet 4 Drive     21.4 6   258  110 3.08 3.215 19.44 1  0  3    1
;; 4 Hornet Sportabout  18.7 8   360  175 3.15 3.440 17.02 0  0  3    2
;; 5 Valiant            18.1 6   225  105 2.76 3.460 20.22 1  0  3    1

and tail the last 6 rows:

(tail mtcars)
;;   X1              MPG  CYL DISP  HP  DRAT WT    QSEC VS AM GEAR CARB
;; 0 Porsche 914-2   26.0 4   120.3 91  4.43 2.140 16.7 0  1  5    2
;; 1 Lotus Europa    30.4 4   95.1  113 3.77 1.513 16.9 1  1  5    2
;; 2 Ford Pantera L  15.8 8   351.0 264 4.22 3.170 14.5 0  1  5    4
;; 3 Ferrari Dino    19.7 6   145.0 175 3.62 2.770 15.5 0  1  5    6
;; 4 Maserati Bora   15.0 8   301.0 335 3.54 3.570 14.6 0  1  5    8
;; 5 Volvo 142E      21.4 4   121.0 109 4.11 2.780 18.6 1  1  4    2

print-data can be used to print the whole data frame:

(print-data mtcars)
;;    X1                  MPG  CYL DISP  HP  DRAT WT    QSEC  VS AM GEAR CARB
;; 0  Mazda RX4           21.0 6   160.0 110 3.90 2.620 16.46 0  1  4    4
;; 1  Mazda RX4 Wag       21.0 6   160.0 110 3.90 2.875 17.02 0  1  4    4
;; 2  Datsun 710          22.8 4   108.0 93  3.85 2.320 18.61 1  1  4    1
;; 3  Hornet 4 Drive      21.4 6   258.0 110 3.08 3.215 19.44 1  0  3    1
;; 4  Hornet Sportabout   18.7 8   360.0 175 3.15 3.440 17.02 0  0  3    2
;; 5  Valiant             18.1 6   225.0 105 2.76 3.460 20.22 1  0  3    1
;; 6  Duster 360          14.3 8   360.0 245 3.21 3.570 15.84 0  0  3    4
;; 7  Merc 240D           24.4 4   146.7 62  3.69 3.190 20.00 1  0  4    2
;; 8  Merc 230            22.8 4   140.8 95  3.92 3.150 22.90 1  0  4    2
;; 9  Merc 280            19.2 6   167.6 123 3.92 3.440 18.30 1  0  4    4
;; 10 Merc 280C           17.8 6   167.6 123 3.92 3.440 18.90 1  0  4    4
;; 11 Merc 450SE          16.4 8   275.8 180 3.07 4.070 17.40 0  0  3    3
;; 12 Merc 450SL          17.3 8   275.8 180 3.07 3.730 17.60 0  0  3    3
;; 13 Merc 450SLC         15.2 8   275.8 180 3.07 3.780 18.00 0  0  3    3
;; 14 Cadillac Fleetwood  10.4 8   472.0 205 2.93 5.250 17.98 0  0  3    4
;; 15 Lincoln Continental 10.4 8   460.0 215 3.00 5.424 17.82 0  0  3    4
;; 16 Chrysler Imperial   14.7 8   440.0 230 3.23 5.345 17.42 0  0  3    4
;; 17 Fiat 128            32.4 4   78.7  66  4.08 2.200 19.47 1  1  4    1
;; 18 Honda Civic         30.4 4   75.7  52  4.93 1.615 18.52 1  1  4    2
;; 19 Toyota Corolla      33.9 4   71.1  65  4.22 1.835 19.90 1  1  4    1
;; 20 Toyota Corona       21.5 4   120.1 97  3.70 2.465 20.01 1  0  3    1
;; 21 Dodge Challenger    15.5 8   318.0 150 2.76 3.520 16.87 0  0  3    2
;; 22 AMC Javelin         15.2 8   304.0 150 3.15 3.435 17.30 0  0  3    2
;; 23 Camaro Z28          13.3 8   350.0 245 3.73 3.840 15.41 0  0  3    4 ..

The two dots “..” at the end indicate that the output has been truncated. Lisp-Stat sets the pretty printer variable *print-lines* to 25 rows by default, and output longer than this is truncated. If you’d like to print all rows, set this value to nil: (setf *print-lines* nil)

Notice the column named X1. This is the name given to the column by the data reading function. Note the warning that was issued during the import. Missing column names are filled in as X1, X2, …, Xn, in increasing order, for the duration of the Lisp-Stat session.

This column is actually the row name, so we’ll rename it:

(rename! mtcars 'model 'x1)

The keys of a data frame are symbols, so you need to quote them to prevent the reader from trying to evaluate them to a value.

Note that your row may be named something other than X1, depending on whether or not you have loaded any other data frames with variable name replacement. Also note: the ! at the end of the function name. This is a convention indicating a destructive operation; a copy will not be returned, it’s the actual data that will be modified.

Now let’s view the results:

(head mtcars)
;;   MODEL              MPG  CYL DISP HP  DRAT WT    QSEC  VS AM GEAR CARB
;; 0 Mazda RX4          21.0 6   160  110 3.90 2.620 16.46 0  1  4    4
;; 1 Mazda RX4 Wag      21.0 6   160  110 3.90 2.875 17.02 0  1  4    4
;; 2 Datsun 710         22.8 4   108  93  3.85 2.320 18.61 1  1  4    1
;; 3 Hornet 4 Drive     21.4 6   258  110 3.08 3.215 19.44 1  0  3    1
;; 4 Hornet Sportabout  18.7 8   360  175 3.15 3.440 17.02 0  0  3    2
;; 5 Valiant            18.1 6   225  105 2.76 3.460 20.22 1  0  3    1

Column names

To see the names of the columns, use the column-names function:

(column-names mtcars)
;; => ("MODEL" "MPG" "CYL" "DISP" "HP" "DRAT" "WT" "QSEC" "VS" "AM" "GEAR" "CARB")

Remember we mentioned that the keys (column names) are symbols? Compare the above to the keys of the data frame:

(keys mtcars)
;; => #(MODEL MPG CYL DISP HP DRAT WT QSEC VS AM GEAR CARB)

These symbols are printed without double quotes. If a function takes a key, it must be quoted, e.g. 'mpg, and not mpg or "mpg".
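For example, retrieving a single column by its key requires the quote (this sketch assumes the column accessor exported by the data-frame package):

```lisp
;; The quote prevents the reader from evaluating MPG as a variable
;; before the accessor sees it
(column mtcars 'mpg)
```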

Dimensions

We saw the dimensions above in basic information. That was printed for human consumption. To get the values in a form suitable for passing to other functions, use the dims command:

(aops:dims mtcars) ;; => (32 12)

Common Lisp specifies dimensions in row-column order, so mtcars has 32 rows and 12 columns.
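Since dims returns an ordinary list, you can destructure it for further computation:

```lisp
;; Bind the row and column counts returned by aops:dims
(destructuring-bind (rows cols) (aops:dims mtcars)
  (format nil "~D rows, ~D columns" rows cols))
;; => "32 rows, 12 columns"
```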

Basic Statistics

Minimum & Maximum

To get the minimum or maximum of a column, say mpg, you can use several Common Lisp methods. Let’s see what mpg looks like by typing the name of the column into the REPL:

mtcars:mpg
;; => #(21 21 22.8d0 21.4d0 18.7d0 18.1d0 14.3d0 24.4d0 22.8d0 19.2d0 17.8d0
;;      16.4d0 17.3d0 15.2d0 10.4d0 10.4d0 14.7d0 32.4d0 30.4d0 33.9d0 21.5d0
;;      15.5d0 15.2d0 13.3d0 19.2d0 27.3d0 26 30.4d0 15.8d0 19.7d0 15 21.4d0)

You could, for example, use something like this to find the minimum:

(reduce #'min mtcars:mpg) ;; => 10.4d0

or the Lisp-Stat function seq-max to find the maximum

(seq-max mtcars:mpg) ;; => 33.9d0

or perhaps you’d prefer alexandria:extremum, a general-purpose tool to find the minimum in a different way:

(extremum mtcars:mpg #'<) ;; => 10.4d0

The important thing to note is that mtcars:mpg is a standard Common Lisp vector and you can manipulate it like one.
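Because the column is a plain vector, any Common Lisp sequence function applies. A couple of sketches:

```lisp
;; Count the cars that achieve more than 30 mpg
(count-if (lambda (x) (> x 30)) mtcars:mpg)
;; => 4

;; Compute the range of the column with standard REDUCE calls
(- (reduce #'max mtcars:mpg) (reduce #'min mtcars:mpg))
```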

Mean & standard deviation

(mean mtcars:mpg) ;; => 20.090625000000003d0
(sd mtcars:mpg) ;; => 5.932029552301219d0

Summarise

You can summarise a column with the summarize-column function:

(summarize-column 'mtcars:mpg)

MPG (Miles/(US) gallon)
 n: 32 missing: 0
 min=10.40 q25=15.40 q50=19.20 mean=20.09 q75=22.80 max=33.90

or the entire data frame:

LS-USER> (summary mtcars)
(MPG (Miles/(US) gallon)
 n: 32 missing: 0
 min=10.40 q25=15.40 q50=19.20 mean=20.09 q75=22.80 max=33.90

 CYL (Number of cylinders)
 14 (44%) x 8, 11 (34%) x 4, 7 (22%) x 6,

 DISP (Displacement (cu.in.))
 n: 32 missing: 0
 min=71.10 q25=120.65 q50=205.87 mean=230.72 q75=334.00 max=472.00

 HP (Gross horsepower)
 n: 32 missing: 0
 min=52 q25=96.00 q50=123 mean=146.69 q75=186.25 max=335

 DRAT (Rear axle ratio)
 n: 32 missing: 0
 min=2.76 q25=3.08 q50=3.70 mean=3.60 q75=3.95 max=4.93

 WT (Weight (1000 lbs))
 n: 32 missing: 0
 min=1.51 q25=2.54 q50=3.33 mean=3.22 q75=3.68 max=5.42

 QSEC (1/4 mile time)
 n: 32 missing: 0
 min=14.50 q25=16.88 q50=17.71 mean=17.85 q75=18.90 max=22.90

 VS (Engine (0=v-shaped, 1=straight))
 ones: 14 (44%)

 AM (Transmission (0=automatic, 1=manual))
 ones: 13 (41%)

 GEAR (Number of forward gears)
 15 (47%) x 3, 12 (38%) x 4, 5 (16%) x 5,

 CARB (Number of carburetors)
 10 (31%) x 4, 10 (31%) x 2, 7 (22%) x 1, 3 (9%) x 3, 1 (3%) x 6, 1 (3%) x 8, )

Recall that the column named model holds the row names and is treated specially; notice that it is not included in the summary. You can see why it’s excluded by examining the column’s summary:

LS-USER> (pprint (summarize-column 'mtcars:model))

1 (3%) x "Mazda RX4", 1 (3%) x "Mazda RX4 Wag", 1 (3%) x "Datsun 710",
1 (3%) x "Hornet 4 Drive", 1 (3%) x "Hornet Sportabout", 1 (3%) x "Valiant",
1 (3%) x "Duster 360", 1 (3%) x "Merc 240D", 1 (3%) x "Merc 230",
1 (3%) x "Merc 280", 1 (3%) x "Merc 280C", 1 (3%) x "Merc 450SE",
1 (3%) x "Merc 450SL", 1 (3%) x "Merc 450SLC", 1 (3%) x "Cadillac Fleetwood",
1 (3%) x "Lincoln Continental", 1 (3%) x "Chrysler Imperial", 1 (3%) x "Fiat 128",
1 (3%) x "Honda Civic", 1 (3%) x "Toyota Corolla", 1 (3%) x "Toyota Corona",
1 (3%) x "Dodge Challenger", 1 (3%) x "AMC Javelin", 1 (3%) x "Camaro Z28",
1 (3%) x "Pontiac Firebird", 1 (3%) x "Fiat X1-9", 1 (3%) x "Porsche 914-2",
1 (3%) x "Lotus Europa", 1 (3%) x "Ford Pantera L", 1 (3%) x "Ferrari Dino",
1 (3%) x "Maserati Bora", 1 (3%) x "Volvo 142E",

Columns with unique values in each row aren’t very interesting.

Saving data

To save a data frame to a CSV file, use the write-csv method. Here we save mtcars into the Lisp-Stat datasets directory, including the column names:

(write-csv mtcars #P"LS:DATA;mtcars.csv" :add-first-row t)

4.3 - Plotting

The basics of plotting

Overview

The plot system provides a way to generate specifications for plotting applications. Examples of plotting packages include gnuplot, plotly and vega/vega-lite.

Plot includes a back end for Vega-Lite; this tutorial will teach you how to encode Vega-Lite plot specifications using Common Lisp. For help on Vega-Lite, see the Vega-Lite tutorials.

For the most part, you can transcribe a Vega-Lite specification directly into Common Lisp and adapt it for your own plots.

Preliminaries

Load Vega-Lite

Load Vega-Lite and network libraries:

(asdf:load-system :plot/vega)

and change to the Lisp-Stat user package:

(in-package :ls-user)

Load example data

The examples in this section use the vega-lite data sets. Load them all now:

(vega:load-vega-examples)

Anatomy of a spec

Plot takes advantage of the fact that Vega-Lite’s JSON specification is very close to that of a plist. If you are familiar with Common Lisp’s ASDF system, then you will be familiar with plot’s way of specifying graphics (plot was modeled on ASDF).

Let’s look at a Vega-Lite scatterplot example:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "A scatterplot showing horsepower and miles per gallons for various cars.",
  "data": {"url": "data/cars.json"},
  "mark": "point",
  "encoding": {
    "x": {"field": "Horsepower", "type": "quantitative"},
    "y": {"field": "Miles_per_Gallon", "type": "quantitative"}
  }
}

and compare it with the equivalent Lisp-Stat version:

(plot:plot
 (vega:defplot hp-mpg
   `(:title "Vega Cars Horsepower vs. MPG"
     :description "Horsepower vs miles per gallon for various cars"
     :data (:values ,vgcars)
     :mark :point
     :encoding (:x (:field :horsepower :type :quantitative)
                :y (:field :miles-per-gallon :type :quantitative)))))

Note that in the Lisp-Stat version we are embedding the data using the :values keyword, as opposed to obtaining it from a server with :url. You can try plotting this now: click on the copy button in the upper right corner of the code box and paste it into the REPL. You should see a window open with the plot displayed:

Your first plot

Data sources

The data property tells Vega where the data for the plot is. Most, but not all, specifications have a single, top level data property, e.g.

"data": {"url": "data/cars.json"}

Lisp-Stat allows you to use a data-frame, or data-frame transformation (filter, selection, etc) as the value for the data property. For example, since a data-frame transformation returns a data-frame, we can insert the results as the data value, as in this plot of residuals:

(:data (:values ,(filter-rows imdb
                              '(and (not (eql imdb-rating :na))
                                    (lt:timestamp< release-date
                                                   (lt:parse-timestring "2019-01-01")))))
 :transform #((:joinaggregate #((:op :mean
                                 :field :imdb-rating
                                 :as :average-rating)))))

where we drop rows with missing (:na) ratings and those with a release-date after 2018.

Vega has transformations as well, but they are a bit clumsy compared to those in Lisp-Stat. Sometimes, though, you’ll need them because a particular transformation is not something you want to do to your data-frame. You can mix transformations in a single plot, as we saw above in the residuals plot, where the filtering was done in the data-frame and the aggregation was done in Vega-Lite.

Below are several examples of the hp-mpg plot, using various data sources:

Embedded

Most of the examples in this documentation use embedded data, where the data is a part of the plot specification. For completeness’ sake, we repeat an example here:

(plot:plot
 (vega:defplot hp-mpg
   `(:title "Vega Cars Horsepower vs. MPG"
     :description "Horsepower vs miles per gallon for various cars"
     :data (:values ,vgcars)
     :mark :point
     :encoding (:x (:field :horsepower :type :quantitative)
                :y (:field :miles-per-gallon :type :quantitative)))))

URL

Note in this example we do not use a data frame as a source, therefore we have to specify field encodings as strings, since variable names will not have been converted to idiomatic lisp. E.g. Miles_per_Gallon vs miles-per-gallon.

(plot:plot
 (vega:defplot hp-mpg
   `(:title "Horsepower vs. MPG"
     :description "Horsepower vs miles per gallon for various cars"
     :data (:url "https://raw.githubusercontent.com/vega/vega-datasets/next/data/cars.json")
     :mark :point
     :encoding (:x (:field "Horsepower" :type :quantitative)
                :y (:field "Miles_per_Gallon" :type :quantitative)))))

In a production environment, you may have several quri data sources in your image. To load from one of these:

(plot:plot
 (vega:defplot hp-mpg
   `(:title "Horsepower vs. MPG"
     :description "Horsepower vs miles per gallon for various cars"
     :data (:url ,(quri:uri "https://raw.githubusercontent.com/vega/vega-datasets/next/data/cars.json"))
     :mark :point
     :encoding (:x (:field "Horsepower" :type :quantitative)
                :y (:field "Miles_per_Gallon" :type :quantitative)))))

Here we create the quri object at the same time, since it’s a stand-alone example. It would probably already be created in an actual use case.

Named data

Vega has named data sources that are useful if you have to refer to the same data in several places. We can create one like this:

(plot:plot
 (vega:defplot hp-mpg
   `(:title "Horsepower vs. MPG"
     :description "Horsepower vs miles per gallon for various cars"
     :datasets (:my-data ,vgcars)
     :data (:name :my-data)
     :mark :point
     :encoding (:x (:field :horsepower :type :quantitative)
                :y (:field :miles-per-gallon :type :quantitative)))))

Plot specifications

Lisp in a spec

A plot specification is a plist. A nested plist to be exact (or, perhaps more correctly, a tree). This means that we can use Common Lisp tree/list functions to manipulate it.

If you look carefully at the examples, you’ll note they use a backquote (`) instead of a normal list quote ('). This is the mechanism that Common Lisp macros use to rewrite code before compilation, and we can use the same mechanism to rewrite our Vega-Lite specifications before encoding them.

The simplest, and most common, feature is insertion, like we did above. By placing a comma (,) before the name of the data frame, we told the backquote system to insert the value of the data frame instead of the symbol (vgcars) in the example.

There’s a lot more you can do with the backquote mechanism. We won’t say any more here, as it’s mostly a topic for advanced users. It’s important for you to know it’s there though.
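The insertion mechanism above can be illustrated in plain Common Lisp, independent of Vega: backquote builds the list, and the comma splices in an evaluated value. The spec fragment below is a made-up example, not a complete plot:

```lisp
;; The comma inserts the current value of MARK into the plist
(let ((mark :point))
  `(:mark ,mark
    :encoding (:x (:field :horsepower :type :quantitative))))
;; => (:MARK :POINT :ENCODING (:X (:FIELD :HORSEPOWER :TYPE :QUANTITATIVE)))

;; Because the result is an ordinary plist, list functions apply
(getf '(:mark :point :data nil) :mark)
;; => :POINT
```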

Properties

Properties are the keys in key/value pairs. This is true whether discussing a plist or a JSON specification. Vega-Lite is case sensitive and Common Lisp is not, so there are a few rules you need to be aware of when constructing plot specifications.

Keys vs. values

Plot uses yason to transform a plist plot specification to JSON. When yason encodes a spec there are two functions of importance:

  • *symbol-encoder*
  • *symbol-key-encoder*

The former encodes values, and the latter encodes keys. In PLOT, both of these are bound to a custom function, encode-symbol-as-metadata. This function does more than just encode metadata; it also handles naming conventions.

This won’t mean much in your day-to-day use of the system, but you do need to be aware of the difference between encoding a key and a value. There are some values that the encoder can’t work with, and in those cases you’ll need to use text.

Finally, remember that the symbol encoders are just a convenience to make things more lisp-like. You can build a plot specification, both keys and values, entirely from text if you wish.

Encoding symbols

JavaScript identifiers are incompatible with Common Lisp identifiers, so we need a way to translate between them. plot uses Parenscript symbol conversion for this. This is one of the reasons for specialised symbol encoders. Let’s look at the difference between the standard yason encoder and the one provided by plot (Parenscript):

LS-USER> (ps:symbol-to-js-string :x-offset)
"xOffset"
LS-USER> (yason:encode-symbol-as-lowercase :x-offset)
"x-offset"

That difference is significant to Vega-Lite, where identifiers with a - are not allowed. Vega is also case sensitive, so if a key is xOffset, xoffset will not work. Fortunately Parenscript’s symbol conversion is just what we need. It will automatically capitalise the words following a dash, so x-offset becomes xOffset.

Symbols can also be used for value fields, and these are more forgiving. As long as you are consistent, and keep in mind that a behind the scenes conversion is happening, you can use lisp-like identifiers. Where this mostly comes into play is when you are using Vega transforms, as in the residuals example:

(:data ,(filter-rows imdb
                     '(and (not (eql imdb-rating :na))
                           (lt:timestamp< release-date
                                          (lt:parse-timestring "2019-01-01"))))
 :transform #((:joinaggregate #((:op :mean
                                 :field :imdb-rating
                                 :as :average-rating)))
              (:calculate "datum['imdbRating'] - datum.averageRating"
               :as :rating-delta)))

Notice that we used :imdb-rating as the field name for the joinaggregate, however in the calculate part of the transform we used the converted name imdbRating; that’s because by the time the transform is run, the conversion will have already happened. When we use :as we are assigning a name, when we use datum, we are telling Vega to find a name, and since this is done in a text field, plot won’t convert the names it finds inside text strings.

Finally, remember that the Parenscript transformation is also run on variable/column names. You can see that we referred to imdb-rating in the filter. If you get confused, run (keys <data-frame>) and think about how ps:symbol-to-js-string would return the keys. That’s what Vega will use as the column names.

This is more complicated to explain than to use. See the examples for best practice patterns. You’ll probably only need to be aware of this when doing transforms in Vega.

Variable symbols

When you define a data frame using the defdf macro, Lisp-Stat sets up an environment for that data set. Part of that environment includes configuring a package with a symbol for each variable in the data set. These symbols have properties that describe the variable, such as unit, label, type, etc. plot can make use of this information when creating plots. Here’s a previous example, where we do not use variable symbols:

(plot:plot
 (vega:defplot hp-mpg-plot
   `(:title "Vega Cars"
     :data (:values ,vgcars)
     :mark :point
     :encoding (:x (:field :horsepower :type :quantitative)
                :y (:field :miles-per-gallon :type :quantitative)))))

and one where we do:

(plot:plot
 (vega:defplot hp-mpg-plot
   `(:title "Vega Cars"
     :data ,vgcars
     :mark :point
     :encoding (:x (:field vgcars:horsepower)
                :y (:field vgcars:miles-per-gallon)))))

The difference is subtle, but this can save some typing if you are always adding titles and field types. We don’t use this in the examples because we want to demonstrate the lowest common denominator, but in all plots we create professionally we use variable symbols.

Special characters

There are occasions when neither the Parenscript encoder nor Yason will correctly encode a key or value. In those situations, you’ll need to use text strings. This can happen when Vega wants an encoding that includes a reader-macro character such as #, often used in color specifications, or in format properties, like this one: (:format ".1~%")

Finally, there may be times when you need to use multiple escape characters instead of quoted strings. Occasionally an imported data set will include parentheses (). The data-frame reader will enclose these in multiple escape characters, so for example a variable named body mass (g) will be loaded as |BODY-MASS-(G)|. In these cases you can either change the name to a valid Common Lisp identifier using rename-column!, or refer to the variable using the multiple escape characters.

nil, null, false, true

Strictly speaking, false in JavaScript is the Boolean negative. In practice, "false", a string, is often accepted. This seems to vary within Vega-Lite. Some parts accept "false", others do not. The plot symbol encoder will correctly output false for the symbol :false, and you should use that anywhere you encounter a Boolean negative.

true is encoded for the lisp symbol T.

nil and null may be entered directly as they are and will be correctly transcribed.

Embedded data

By default, plot embeds data within the Vega-Lite JSON spec, then uses vega-embed to display it within an HTML page. The alternative is to use data from a url. Both are mostly equivalent, however there can be differences in parsing, especially with dates. When data is embedded, values are parsed by the JavaScript parser in your browser. When it’s loaded via a url, it’s run through the Vega-Lite parser. Sometimes Vega-Lite needs a bit of help by way of format for embedded data. For this reason plot always outputs dates & times in ISO-8601 format, which works everywhere.

Large data sets can be problematic if you have a number of plots open and limited memory.

Saving plots

You can save plot specifications like any other Common Lisp object, for example using with-open-file. data-frames also have read/write functions. This section describes some convenience functions for plot I/O.

Devices

A ‘device’ is a loose abstraction for the various locations that data and specifications can be written to. For example in developing this website, data is written to a directory for static files /static/data/, and the plot specification to /static/plots/. We can model this with a plist like so:

(defparameter hugo-url '(:spec-loc #P"s:/src/documentation/static/plots/" :data-loc #P"s:/src/documentation/static/data/" :data-url "/data/"))

With this ‘device’, you can save a plot like so:

(vega:plot-to-device hugo-url <plot-name>)

and all the bits will be saved to their proper locations. See the examples at the bottom of the file PLOT:SRC;VEGA;device.lisp for various ways to use devices and the heuristics for determining where/when/what to write. These devices have worked in practice in generating more than 300 plots, but if you encounter a use case that’s not covered, please open an issue.

Vega quirks

Vega and Vega-Lite have more than their fair share of quirks and inconsistencies. For the most part you’ll only notice this in the ‘grammar’ of the graphics specification, however occasionally they may look like bugs.

When using the bin transformation, Vega-Lite assumes that if you don’t provide the variable identifier to store the end of the bin, it will use the name of the start of the bin, suffixed with _end. Many of the Vega-Lite examples make this assumption. For example, this is the snippet from a Vega-Lite example:

"data": {"url": "data/cars.json"},
"transform": [
  {"bin": true, "field": "Horsepower", "as": "bin_Horsepower"},
  {"aggregate": [{"op": "count", "as": "Count"}],
   "groupby": ["bin_Horsepower", "bin_Horsepower_end"]},
  {"joinaggregate": [{"op": "sum", "field": "Count", "as": "TotalCount"}]},
  {"calculate": "datum.Count/datum.TotalCount", "as": "PercentOfTotal"}
]

Notice that the bin transform uses as: bin_Horsepower and then later, in the groupby transformation, refers to bin_Horsepower_end. To work around this ‘feature’, we need to specify both the start and end for the bin operation:

:transform #((:bin t :field :horsepower :as #(:bin-horsepower :bin-horsepower-end))
             (:aggregate #((:op :count :as :count))
              :groupby #(:bin-horsepower :bin-horsepower-end)))

This kind of behaviour may occur elsewhere, and it’s not well documented, so just be careful when you see any kind of beginning or end encoding in a Vega-Lite example.

Workflow

There are many possible workflows when plotting. This section describes a few that I’ve found useful when developing plots.

By default, plot will embed data in an HTML file and then call the systems browser to open it. This is a perfectly fine way to develop plots, especially if you’re on a machine with a good amount of RAM.

Vega-Desktop

Vega-Desktop, sadly now unmaintained, still works fine for Vega-Lite up to version 5. With this desktop application, you can drag a plot specification to the application and ‘watch’ it. Once watched, any changes you make are instantly updated in the application window. Here’s a demonstration:

First, set up a ‘device’ to use a directory on the desktop for plotting:

(defparameter vdsk1 '(:spec-loc #P"~/Desktop/plots/" :data-loc #P"~/Desktop/plots/data/") "Put data into a data/ subdirectory")

Now send a scatterplot to this device:

(vega:plot-to-device vdsk1
  (vega:defplot hp-mpg
    `(:data (:values ,vgcars)
      :mark :point
      :encoding (:x (:field :horsepower :type :quantitative)
                 :y (:field :miles-per-gallon :type :quantitative)))))

Now drag the file ~/Desktop/plots/hp-mpg.vl.json to the Vega-Desktop application:

and click on the ‘watch’ button:

now go back to the buffer with the spec and add a title:

(vega:plot-to-device vdsk1
  (vega:defplot hp-mpg
    `(:title "Horsepower vs. Miles per Gallon"
      :data (:values ,vgcars)
      :mark :point
      :encoding (:x (:field :horsepower :type "quantitative")
                 :y (:field :miles-per-gallon :type "quantitative")))))

and reevaluate the form. If you’re in emacs, this is the C-x C-e command. Observe how the plot is instantly updated:

I tend to use this method when I’m tweaking a plot for final publication.

Vega edit

You can publish a plot specification to a Github gist and then invoke the Vega editor. This isn’t quite as real-time as Vega Desktop in that changes in the Lisp image aren’t automatically reflected and you’ll have to re-publish. It is a good way to debug plots and download them in various formats, or for sharing.

To use this mechanism, you’ll need to configure two environment variables so the gist wrapper will be able to use your credentials to authenticate to the Github API. Set the following environment variables to hold your github credentials:

  • GITHUB_USERNAME
  • GITHUB_OAUTH_TOKEN

Github no longer works with a password, so don’t bother setting that. If you want a custom scheme for authentication, you can create one by following the examples in examples/1.credentials.lisp.

Now, you can edit the hp-mpg plot online with:

(vega:edit hp-mpg)

Debugging

There are a couple of commonly encountered scenarios when plots don’t display correctly:

  • it’s so broken the browser displays nothing
  • the ... button appears, but the plot is broken

Nothing is displayed

In this case, your best option is to print to a device where you can examine the output. I use the Vega-Desktop device (vdsk1) so often that it’s part of my Lisp-Stat initialisation, and I also use it for these cases. Once you’ve got the spec written out as JSON, see if Vega-Desktop can render it, paying attention to the warnings. Vega-Desktop also has a debug function:

If Vega-Desktop doesn’t help, open the file in Visual Studio code, which has a schema validator. Generally these kinds of syntax errors are easy to spot once they’re pointed out by Visual Studio.

Something is displayed

If you see the three ellipses, then you can open the plot in the online Vega editor. This is very similar to Vega-Desktop, but with one important difference: you can only debug plots with embedded data sets or remotely available URLs. Because the online editor is a web application hosted on Github, it can’t access local data sets. This is one reason I typically use the Vega-Desktop / Visual Studio combination.

Getting plot information

There are two ways to get information about the plots in your environment.

show-plots

The show-plots command will display the plots you have defined, along with a description (if one was provided in the spec). Here are the plots currently in my environment:

LS-USER> (vega:show-plots)
0: #<PLOT GROUPED-BAR-CHART: Bar chart NIL>
1: #<PLOT HP-MPG-PLOT: Scatter plot NIL>
2: #<PLOT HP-MPG: Scatter plot Horsepower vs miles per gallon for various cars>

Only the last, from the example above, has a description.

describe

You can also use the describe command to view plot information:

LS-USER> (describe hp-mpg)
HP-MPG
  Scatter plot of VGCARS
  Horsepower vs miles per gallon for various cars

inspect

By typing the plot’s name in the emacs REPL, a ‘handle’ of sorts is returned, printed in orange:

Right click on the orange text to get a context menu allowing various operations on the object, one of which is to ‘inspect’ the object.

Included datasets

The vega package includes all the data sets from the Vega datasets collection. They have the same names, in the vega package, e.g. vega:penguins.

5 - System Manuals

Manuals for Lisp-Stat systems

This section describes the core APIs and systems that comprise Lisp-Stat. These APIs include both the high level functionality described elsewhere, as well as lower level APIs that they are built on. This section will be of interest to ‘power users’ and developers who wish to extend Lisp-Stat, or build modules of their own.

5.1 - Array Operations

Manipulating sample data as arrays

Overview

The array-operations system contains a collection of functions and macros for manipulating Common Lisp arrays and performing numerical calculations with them.

Array-operations is a ‘generic’ way of operating on array-like data structures. Several aops functions have been implemented for data-frame. For those that haven’t, you can transform arrays to data frames using the df:matrix-df function, and a data-frame to an array using df:as-array. This makes it convenient to work with data sets using either system.

Quick look

Arrays can be created with numbers from a statistical distribution:

(rand '(2 2)) ; => #2A((0.62944734 0.2709539) (0.81158376 0.6700171))

in linear ranges:

(linspace 1 10 7) ; => #(1 5/2 4 11/2 7 17/2 10)

or generated using a function, optionally given index position

(generate #'identity '(2 3) :position) ; => #2A((0 1 2) (3 4 5))

They can also be transformed and manipulated:

(defparameter A #2A((1 2) (3 4)))
(defparameter B #2A((2 3) (4 5)))

;; split along any dimension
(split A 1)      ; => #(#(1 2) #(3 4))

;; stack along any dimension
(stack 1 A B)    ; => #2A((1 2 2 3)
                 ;        (3 4 4 5))

;; element-wise function map
(each #'+ #(0 1 2) #(2 3 5))   ; => #(2 4 7)

;; element-wise expressions
(vectorize (A B) (* A (sqrt B)))
; => #2A((1.4142135 3.4641016)
;        (6.0 8.944272))

;; index operations, e.g. matrix-matrix multiply:
(each-index (i j)
  (sum-index k
    (* (aref A i k) (aref B k j))))
; => #2A((10 13)
;        (22 29))

Array shorthand

The library defines the following short function names that are synonyms for Common Lisp operations:

array-operations Common Lisp
size array-total-size
rank array-rank
dim array-dimension
dims array-dimensions
nrow number of rows in matrix
ncol number of columns in matrix

The array-operations package has the nickname aops, so you can use, for example, (aops:size my-array) without use-ing the package.

Displaced arrays

According to the Common Lisp specification, a displaced array is:

An array which has no storage of its own, but which is instead indirected to the storage of another array, called its target, at a specified offset, in such a way that any attempt to access the displaced array implicitly references the target array.

Displaced arrays are one of the niftiest features of Common Lisp. When an array is displaced to another array, it shares structure with (part of) that array. The two arrays do not need to have the same dimensions; in fact, the dimensions need not be related at all, as long as the displaced array fits inside the original one. The row-major index of the former in the latter is called the offset of the displacement.

displace

Displaced arrays are usually constructed using make-array, but this library also provides displace for that purpose:

(defparameter *a* #2A((1 2 3) (4 5 6)))
(aops:displace *a* 2 1) ; => #(2 3)

Here’s an example of using displace to implement a sliding window over some set of values, say perhaps a time-series of stock prices:

(defparameter stocks (aops:linspace 1 100 100))
(loop for i from 0 to (- (length stocks) 20)
      do (format t "~A~%" (aops:displace stocks 20 i)))
;#(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20)
;#(2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21)
;#(3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22)

flatten

flatten displaces to a row-major array:

(aops:flatten *a*) ; => #(1 2 3 4 5 6)

split

The real fun starts with split, which splits off sub-arrays nested within a given axis:

(aops:split *a* 1)
; => #(#(1 2 3) #(4 5 6))
(defparameter *b* #3A(((0 1) (2 3)) ((4 5) (6 7))))
(aops:split *b* 0)
; => #3A(((0 1) (2 3)) ((4 5) (6 7)))
(aops:split *b* 1)
; => #(#2A((0 1) (2 3)) #2A((4 5) (6 7)))
(aops:split *b* 2)
; => #2A((#(0 1) #(2 3)) (#(4 5) #(6 7)))
(aops:split *b* 3)
; => #3A(((0 1) (2 3)) ((4 5) (6 7)))

Note how splitting at 0 and the rank of the array returns the array itself.

sub

Now consider sub, which returns a specific array, composed of the elements that would start with given subscripts:

(aops:sub *b* 0)
; => #2A((0 1)
;        (2 3))
(aops:sub *b* 0 1)
; => #(2 3)
(aops:sub *b* 0 1 0)
; => 2

In the case of vectors, sub works like aref:

(aops:sub #(1 2 3 4 5) 1) ; => 2

There is also a (setf sub) function.
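As a sketch (assuming the setf form mirrors the semantics of sub), you can replace a sub-array in place:

```lisp
(defparameter *b* #3A(((0 1) (2 3)) ((4 5) (6 7))))
;; Replace the sub-array selected by subscript 0
(setf (aops:sub *b* 0) #2A((8 9) (10 11)))
*b* ; => #3A(((8 9) (10 11)) ((4 5) (6 7)))
```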

partition

partition returns a consecutive chunk of an array separated along its first subscript:

(aops:partition #2A((0 1)
                    (2 3)
                    (4 5)
                    (6 7)
                    (8 9))
                1 3)
; => #2A((2 3)
;        (4 5))

and also has a (setf partition) pair.

combine

combine is the opposite of split:

(aops:combine #(#(0 1) #(2 3)))
; => #2A((0 1)
;        (2 3))

subvec

subvec returns a displaced subvector:

(aops:subvec #(0 1 2 3 4) 2 4) ; => #(2 3)

There is also a (setf subvec) function, which is like (setf subseq) except for demanding matching lengths.
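For example (a sketch of the assumed setf behaviour, where the replacement must match the subvector length):

```lisp
(defparameter *v* #(0 1 2 3 4))
;; The replacement must have the same length as the subvector (here, 2)
(setf (aops:subvec *v* 2 4) #(9 9))
*v* ; => #(0 1 9 9 4)
```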

reshape

Finally, reshape can be used to displace arrays into a different shape:

(aops:reshape #2A((1 2 3) (4 5 6)) '(3 2))
; => #2A((1 2)
;        (3 4)
;        (5 6))

You can use t for one of the dimensions, to be filled in automatically:

(aops:reshape *b* '(1 t)) ; => #2A((0 1 2 3 4 5 6 7))

reshape-col and reshape-row reshape your array into a column or row matrix, respectively:

(defparameter *a* #2A((0 1) (2 3) (4 5)))
(aops:reshape-row *a*)
;=> #2A((0 1 2 3 4 5))
(aops:reshape-col *a*)
;=> #2A((0) (1) (2) (3) (4) (5))

Specifying dimensions

Functions in the library accept the following in place of dimensions:

  • a list of dimensions (as for make-array),
  • a positive integer, which is used as a single-element list,
  • another array, the dimensions of which are used.

The last one allows you to specify dimensions with other arrays. For example, to reshape an array a1 to look like a2, you can use

(aops:reshape a1 a2)

instead of the longer form

(aops:reshape a1 (aops:dims a2))

Creation & transformation

Use the functions in this section to create commonly used array types. When the resulting element type cannot be inferred from an existing array or vector, you can pass the element type as an optional argument. The default is elements of type T.

Element traversal order of these functions is unspecified. The reason for this is that the library may use parallel code in the future, so it is unsafe to rely on a particular element traversal order.

The following functions all make a new array, taking the dimensions as input. There are also versions ending in ! which do not make a new array, but take an array as first argument, which is modified and returned.

Function Description
zeros Filled with zeros
ones Filled with ones
rand Filled with uniformly distributed random numbers between 0 and 1
randn Normally distributed with mean 0 and standard deviation 1
linspace Evenly spaced numbers in given range

For example:

(aops:zeros 3)
; => #(0 0 0)
(aops:zeros 3 'double-float)
; => #(0.0d0 0.0d0 0.0d0)
(aops:rand '(2 2))
; => #2A((0.6686077 0.59425664)
;        (0.7987722 0.6930506))
(aops:rand '(2 2) 'single-float)
; => #2A((0.39332366 0.5557821)
;        (0.48831415 0.10924244))
(let ((a (make-array '(2 2) :element-type 'double-float)))
  ;; Modify array A, filling with random numbers;
  ;; element type is taken from the existing array
  (aops:rand! a))
; => #2A((0.6324615478515625d0 0.4636608362197876d0)
;        (0.4145939350128174d0 0.5124958753585815d0))

(linspace 0 4 5)               ;=> #(0 1 2 3 4)
(linspace 1 3 5)               ;=> #(1 3/2 2 5/2 3)
(linspace 1 3 5 'double-float) ;=> #(1.0d0 1.5d0 2.0d0 2.5d0 3.0d0)
(linspace 0 4d0 3)             ;=> #(0.0d0 2.0d0 4.0d0)

generate

generate (and generate*) allow you to generate arrays using functions. The function signatures are:

generate* (element-type function dimensions &optional arguments)
generate (function dimensions &optional arguments)

Where arguments are passed to function. Possible arguments are:

  • no arguments, when ARGUMENTS is nil
  • the position (= row major index), when ARGUMENTS is :POSITION
  • a list of subscripts, when ARGUMENTS is :SUBSCRIPTS
  • both when ARGUMENTS is :POSITION-AND-SUBSCRIPTS
(aops:generate (lambda () (random 10)) 3)
; => #(6 9 5)
(aops:generate #'identity '(2 3) :position)
; => #2A((0 1 2)
;        (3 4 5))
(aops:generate #'identity '(2 2) :subscripts)
; => #2A(((0 0) (0 1))
;        ((1 0) (1 1)))
(aops:generate #'cons '(2 2) :position-and-subscripts)
; => #2A(((0 0 0) (1 0 1))
;        ((2 1 0) (3 1 1)))

permute

permute can permute subscripts (you can also invert, complement, and complete permutations, look at the docstring and the unit tests). Transposing is a special case of permute:

(defparameter *a* #2A((1 2 3) (4 5 6)))
(aops:permute '(0 1) *a*)
; => #2A((1 2 3)
;        (4 5 6))
(aops:permute '(1 0) *a*)
; => #2A((1 4)
;        (2 5)
;        (3 6))

each

each applies a function to its one dimensional array arguments elementwise. It essentially is an element-wise function map on each of the vectors:

(aops:each #'+ #(0 1 2) #(2 3 5) #(1 1 1))
; => #(3 5 8)

vectorize

vectorize is a macro which performs elementwise operations

(defparameter a #(1 2 3 4))
(aops:vectorize (a) (* 2 a))
; => #(2 4 6 8)
(defparameter b #(2 3 4 5))
(aops:vectorize (a b) (* a (sin b)))
; => #(0.9092974 0.28224 -2.2704074 -3.8356972)

There is also a version vectorize* which takes a type argument for the resulting array, and a version vectorize! which sets elements in a given array.

margin

The semantics of margin are more difficult to explain, so perhaps an example will be more useful. Suppose that you want to calculate column sums in a matrix. You could permute (transpose) the matrix, split its sub-arrays at rank one (so you get a vector for each row), and apply the function that calculates the sum. margin automates that for you:

(aops:margin (lambda (column) (reduce #'+ column)) #2A((0 1) (2 3) (5 7)) 0) ; => #(7 11)

But the function is more general than this: the arguments inner and outer allow arbitrary permutations before splitting.

recycle

Finally, recycle allows you to reuse the elements of the first argument, object, to create new arrays by extending the dimensions. The :outer keyword repeats the original object and the :inner keyword argument repeats the elements of object. When both :inner and :outer are nil, object is returned as is. Non-array objects are interpreted as rank 0 arrays, following the usual semantics.

(aops:recycle #(2 3) :inner 2 :outer 4)
; => #3A(((2 2) (3 3)) ((2 2) (3 3)) ((2 2) (3 3)) ((2 2) (3 3)))

Three dimensional arrays can be tough to get your head around. In the example above, :outer asks for 4 2-element vectors, composed of repeating the elements of object twice, i.e. repeat ‘2’ twice and repeat ‘3’ twice. Compare this with :inner as 3:

(aops:recycle #(2 3) :inner 3 :outer 4)
; #3A(((2 2 2) (3 3 3)) ((2 2 2) (3 3 3)) ((2 2 2) (3 3 3)) ((2 2 2) (3 3 3)))

The most common use case for recycle is to ‘stretch’ a vector so that it can be an operand for an array of compatible dimensions. In Python, this would be known as ‘broadcasting’. See the Numpy broadcasting basics for other use cases.

For example, suppose we wish to multiply array a, a size 4x3 with vector b of size 3, as in the figure below:

We can do that by recycling array b like this:

(recycle #(1 2 3) :outer 4)
;#2A((1 2 3)
;    (1 2 3)
;    (1 2 3)
;    (1 2 3))

In a similar manner, the figure below (also from the Numpy page) shows how we might stretch a vector horizontally to create an array compatible with the one created above.

To create that array from a vector, use the :inner keyword:

(recycle #(0 10 20 30) :inner 3)
;#2A((0 0 0)
;    (10 10 10)
;    (20 20 20)
;    (30 30 30))

turn

turn rotates an array by a specified number of clockwise 90° rotations. The axis of rotation is specified by RANK-1 (defaulting to 0) and RANK-2 (defaulting to 1). In the first example, we’ll rotate by 90°:

(defparameter array-1 #2A((1 0 0) (2 0 0) (3 0 0)))
(aops:turn array-1 1)
;; #2A((3 2 1)
;;     (0 0 0)
;;     (0 0 0))

and if we rotate it twice (180°):

(aops:turn array-1 2)
;; #2A((0 0 3)
;;     (0 0 2)
;;     (0 0 1))

finally, rotate it three times (270°):

(aops:turn array-1 3)
;; #2A((0 0 0)
;;     (0 0 0)
;;     (1 2 3))

map-array

map-array maps a function over the elements of an array.

(aops:map-array #2A((1.7 2.1 4.3 5.4)
                    (0.3 0.4 0.5 0.6))
                #'log)
; #2A(( 0.53062826  0.7419373  1.4586151  1.686399)
;     (-1.2039728  -0.9162907 -0.6931472 -0.5108256))

outer

outer is a generalized outer product of arrays using a provided function.

Lambda list: (function &rest arrays)

The resulting array has the concatenated dimensions of arrays

The examples below return the outer product of vectors or arrays. This is the outer product you get in most linear algebra packages.

(defparameter a #(2 3 5))
(defparameter b #(7 11))
(defparameter c #2A((7 11) (13 17)))
(outer #'* a b)
;#2A((14 22)
;    (21 33)
;    (35 55))
(outer #'* c a)
;#3A(((14 21 35) (22 33 55))
;    ((26 39 65) (34 51 85)))

Indexing operations

nested-loop

nested-loop is a simple macro which iterates over a set of indices with a given range

(defparameter A #2A((1 2) (3 4)))
(aops:nested-loop (i j) (array-dimensions A)
  (setf (aref A i j) (* 2 (aref A i j))))
A ; => #2A((2 4) (6 8))

(aops:nested-loop (i j) '(2 3)
  (format t "(~a ~a) " i j))
; => (0 0) (0 1) (0 2) (1 0) (1 1) (1 2)

sum-index

sum-index is a macro which uses a code walker to determine the dimension sizes, summing over the given index or indices

(defparameter A #2A((1 2) (3 4)))
;; Trace
(aops:sum-index i (aref A i i))          ; => 5
;; Sum array
(aops:sum-index (i j) (aref A i j))      ; => 10
;; Sum array
(aops:sum-index i (row-major-aref A i))  ; => 10

The main use for sum-index is in combination with each-index.

each-index

each-index is a macro which creates an array and iterates over the elements. Like sum-index it is given one or more index symbols, and uses a code walker to find array dimensions.

(defparameter A #2A((1 2) (3 4)))
(defparameter B #2A((5 6) (7 8)))
;; Transpose
(aops:each-index (i j) (aref A j i))
; => #2A((1 3)
;        (2 4))
;; Sum columns
(aops:each-index i
  (aops:sum-index j
    (aref A j i)))
; => #(4 6)
;; Matrix-matrix multiply
(aops:each-index (i j)
  (aops:sum-index k
    (* (aref A i k) (aref B k j))))
; => #2A((19 22)
;        (43 50))

reduce-index

reduce-index is a more general version of sum-index; it applies a reduction operation over one or more indices.

(defparameter A #2A((1 2) (3 4)))
;; Sum all values in an array
(aops:reduce-index #'+ i (row-major-aref A i))  ; => 10
;; Maximum value in each row
(aops:each-index i
  (aops:reduce-index #'max j (aref A i j)))
; => #(2 4)

Reducing

Some reductions over array elements can be done using the Common Lisp reduce function, together with aops:flatten, which returns a displaced vector:

(defparameter a #2A((1 2) (3 4)))
(reduce #'max (aops:flatten a)) ; => 4

argmax & argmin

argmax and argmin find the row-major-aref index where an array value is maximum or minimum. They both return two values: the first value is the index; the second is the array value at that index.

(defparameter a #(1 2 5 4 2))
(aops:argmax a) ; => 2 5
(aops:argmin a) ; => 0 1

vectorize-reduce

More complicated reductions can be done with vectorize-reduce, for example the maximum absolute difference between arrays:

(defparameter a #2A((1 2) (3 4)))
(defparameter b #2A((2 2) (1 3)))
(aops:vectorize-reduce #'max (a b) (abs (- a b))) ; => 2

best

best compares the elements of an array pairwise according to a function and returns the ‘best’ value found. The function FN must accept two inputs and return true/false; it is applied to elements of ARRAY. The row-major-aref index of the winning element is returned, along with its value.

Example: The index of the maximum is

   * (best #'> #(1 2 3 4))
    3   ; row-major index
    4   ; value

most

most finds the element of ARRAY that returns the value closest to positive infinity when FN is applied to the array value. Returns the row-major-aref index, and the winning value.

Example: The maximum of an array is:

     (most #'identity #(1 2 3))
     -> 2    (row-major index)
        3    (value)

and the minimum of an array is:

      (most #'- #(1 2 3))
        0
        -1

See also reduce-index above.

Scalar values

Library functions treat non-array objects as if they were equivalent to 0-dimensional arrays: for example, (aops:split array (rank array)) returns an array that is effectively equivalent (eq) to array. Another example is recycle:

(aops:recycle 4 :inner '(2 2)) ; => #2A((4 4) ; (4 4))

Stacking

You can stack compatible arrays by column or row. Metaphorically, you can think of these operations as stacking blocks. For example, stacking two row vectors yields a 2x2 array:

(stack-rows #(1 2) #(3 4))
;; #2A((1 2)
;;     (3 4))

Like other functions, there are two versions: generalised stacking, where rows and columns are of type T, and specialised versions where the element-type is specified. The versions that allow you to specialise the element type end in *.

The stack functions use object dimensions (as returned by dims) to determine how to use the object:

  • when the object has 0 dimensions, fill a column with the element
  • when the object has 1 dimension, use it as a column
  • when the object has 2 dimensions, use it as a matrix

copy-row-major-block is a utility function in the stacking package that does what it suggests; it copies elements from one array to another. This function should be used to implement copying of contiguous row-major blocks of elements.

rows

stack-rows-copy is the method used to implement the copying of objects in stack-rows*, by copying the elements of source to destination, starting with the row index start-row in the latter. Elements are coerced to element-type.

stack-rows and stack-rows* stack objects row-wise into an array of the given element-type, coercing if necessary. Always return a simple array of rank 2. stack-rows always returns an array with elements of type T, stack-rows* coerces elements to the specified type.

columns

stack-cols-copy is the method used to implement the copying of objects in stack-cols*, by copying the elements of source to destination, starting with the column index start-col in the latter. Elements are coerced to element-type.

stack-cols and stack-cols* stack objects column-wise into an array of the given element-type, coercing if necessary. Always return a simple array of rank 2. stack-cols always returns an array with elements of type T, stack-cols* coerces elements to the specified type.

arbitrary

stack and stack* stack array arguments along axis. element-type determines the element-type of the result.

(defparameter *a1* #(0 1 2))
(defparameter *a2* #(3 5 7))
(aops:stack 0 *a1* *a2*)
; => #(0 1 2 3 5 7)
(aops:stack 1
            (aops:reshape-col *a1*)
            (aops:reshape-col *a2*))
; => #2A((0 3)
;        (1 5)
;        (2 7))

5.2 - Data Frame

Manipulating data using a data frame

Overview

A common lisp data frame is a collection of observations of sample variables that shares many of the properties of arrays and lists. By design it can be manipulated using the same mechanisms used to manipulate lisp arrays. This allows you to, for example, transform a data frame into an array, use array-operations to manipulate it, and then turn it back into a data frame for use in modeling or plotting.

Data frame is implemented as a two-dimensional common lisp data structure: a vector of vectors for data, and a hash table mapping variable names to column vectors. All columns are of equal length. This structure provides the flexibility required for column oriented manipulation, as well as speed for large data sets.

Load/install

Data-frame is part of the Lisp-Stat package. It can be used independently if desired. Since the examples in this manual use Lisp-Stat functionality, we’ll use it from there rather than load independently.

(ql:quickload :lisp-stat)

Within the Lisp-Stat system, the LS-USER package is the package for you to do statistics work. Type the following to change to that package:

(in-package :ls-user)

Naming conventions

Lisp-Stat has a few naming conventions you should be aware of. If you see a punctuation mark or the letter ‘p’ as the last letter of a function name, it indicates something about the function:

  • ‘!’ indicates that the function is destructive. It will modify the data that you pass to it. Otherwise, it will return a copy that you will need to save in a variable.
  • ‘p’, ‘-p’ or ‘?’ means the function is a predicate, that returns a Boolean truth value.

Data frame environment

Although you can work with data frames bound to symbols (as would happen if you used (defparameter ...)), it is more convenient to define them as part of an environment. When you do this, the system defines a package of the same name as the data frame, and provides a symbol for each variable. Let’s see how things work without an environment:

First, we define a data frame as a parameter:

(defparameter mtcars
  (read-csv rdata:mtcars)
  "Motor Trend Car Road Tests")
;; WARNING: Missing column name was filled in
;; MTCARS2

Now if we want a column, we can say:

(column mtcars 'mpg)

Now let’s define an environment using defdf:

(defdf mtcars
  (read-csv rdata:mtcars)
  "Motor Trend Car Road Tests")
;; WARNING: Missing column name was filled in
;; #<DATA-FRAME (32 observations of 12 variables)
;;   Motor Trend Car Road Tests>

Now we can access the same variable with:

mtcars:mpg

defdf does a lot more than this, and you should probably use defdf to set up an environment instead of defparameter. We mention it here because there’s an important bit about maintaining the environment to be aware of:

defdf

The defdf macro is conceptually equivalent to the Common Lisp defparameter, but with some additional functionality that makes working with data frames easier. You use it the same way you’d use defparameter, for example:

(defdf foo <any-function returning a data frame> )

We’ll use both ways of defining data frames in this manual. The access methods that are defined by defdf are described in the access data section.

Data types

It is important to note that there are two ’types’ in Lisp-Stat: the implementation type and the ‘statistical’ type. Sometimes these are the same, such as in the case of reals; in other situations they are not. A good example of this can be seen in the mtcars data set. The hp (horsepower), gear and carb are all of type integer from an implementation perspective. However only horsepower is a continuous variable. You can have an additional 0.5 horsepower, but you cannot add an additional 0.5 gears or carburetors.

Data types are one kind of property that can be set on a variable.

As part of the recoding and data cleansing process, you will want to add properties to your variables. In Common Lisp, these are plists that reside on the variable symbols, e.g. mtcars:mpg. In R they are known as attributes. By default, there are three properties for each variable: type, unit and label (documentation). When you load from external formats, like CSV, these properties are all nil; when you load from a lisp file, they will have been saved along with the data (if you set them).

There are seven data types in Lisp-Stat:

  • string
  • integer
  • double-float
  • single-float
  • categorical (factor in R)
  • temporal
  • bit (Boolean)

Numeric

Numeric types, double-float, single-float and integer are all essentially similar. The vector versions have type definitions (from the numeric-utilities package) of:

  • simple-double-float-vector
  • simple-single-float-vector
  • simple-fixnum-vector

As an example, let’s look at mtcars:mpg, where we have a variable of type float, but a few integer values mixed in.

The values may be equivalent, but the types are not. The CSV loader has no way of knowing, so loads the column as a mixture of integers and floats. Let’s start by reloading mtcars from the CSV file:

(undef 'mtcars)
(defdf mtcars (read-csv rdata:mtcars))

and look at the mpg variable:

LS-USER> mtcars:mpg
#(21 21 22.8d0 21.4d0 18.7d0 18.1d0 14.3d0 24.4d0 22.8d0 19.2d0 17.8d0 16.4d0
  17.3d0 15.2d0 10.4d0 10.4d0 14.7d0 32.4d0 30.4d0 33.9d0 21.5d0 15.5d0 15.2d0
  13.3d0 19.2d0 27.3d0 26 30.4d0 15.8d0 19.7d0 15 21.4d0)
LS-USER> (type-of *)
(SIMPLE-VECTOR 32)

Notice that the first two entries in the vector are integers, and the remainder floats. To fix this manually, you will need to coerce each element of the column to type double-float (you could use single-float in this case; as a matter of habit we usually use double-float) and then change the type of the vector to a specialised float vector.
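One way to do this coercion manually is sketched below, using map with a result-type argument (the simple-double-float-vector type is from numeric-utilities, as noted above):

```lisp
;; Coerce each element to double-float and return a specialised vector
(map 'simple-double-float-vector
     (lambda (x) (coerce x 'double-float))
     mtcars:mpg)
```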

You can use the heuristicate-types function to guess the statistical types for you. For reals and strings, heuristicate-types works fine; however, because integers and bits can be used to encode either categorical or numeric values, you will have to indicate the type using set-properties. We see this below with gear and carb: although implemented as integer, they are actually type categorical. The next sections describe how to set them.

Using describe, we can view the types of all the variables that heuristicate-types set:

LS-USER> (heuristicate-types mtcars)
LS-USER> (describe mtcars)
MTCARS
  A data-frame with 32 observations of 12 variables

Variable | Type         | Unit | Label
-------- | ----         | ---- | -----------
X8       | STRING       | NIL  | NIL
MPG      | DOUBLE-FLOAT | NIL  | NIL
CYL      | INTEGER      | NIL  | NIL
DISP     | DOUBLE-FLOAT | NIL  | NIL
HP       | INTEGER      | NIL  | NIL
DRAT     | DOUBLE-FLOAT | NIL  | NIL
WT       | DOUBLE-FLOAT | NIL  | NIL
QSEC     | DOUBLE-FLOAT | NIL  | NIL
VS       | BIT          | NIL  | NIL
AM       | BIT          | NIL  | NIL
GEAR     | INTEGER      | NIL  | NIL
CARB     | INTEGER      | NIL  | NIL

Notice that the system typed vs and am as Boolean (bit), which is correct in a mathematical sense.

Strings

Unlike in R, strings are not considered categorical variables by default. The ordering of strings varies according to locale, so it’s not a good idea to rely on string ordering. Nevertheless, strings do work well if you are working in a single locale.
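For example, Common Lisp’s standard string comparison is character-code based, which is consistent within a single ASCII-compatible encoding:

```lisp
;; STRING< compares by char-code, so upper-case letters sort before
;; lower-case ones in ASCII-compatible encodings
(sort (vector "banana" "Apple" "apple") #'string<)
;; => #("Apple" "apple" "banana")
```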

Categorical

Categorical variables have a fixed and known set of possible values. In mtcars, gear, carb, vs and am are categorical variables, but heuristicate-types can’t distinguish categorical types, so we’ll set them:

(set-properties mtcars :type '(:vs :categorical
                               :am :categorical
                               :gear :categorical
                               :carb :categorical))

Temporal

Dates and times can be surprisingly complicated. To make working with them simpler, Lisp-Stat uses vectors of localtime objects to represent dates & times. You can set a temporal type with set-properties as well using the keyword :temporal.

Units & labels

To add units or labels to the data frame, use the set-properties function. This function takes a plist of variable/value pairs, so to set the units and labels:

(set-properties mtcars :unit '(:mpg m/g
                               :cyl :NA
                               :disp in³
                               :hp hp
                               :drat :NA
                               :wt lb
                               :qsec s
                               :vs :NA
                               :am :NA
                               :gear :NA
                               :carb :NA))

(set-properties mtcars :label '(:mpg "Miles/(US) gallon"
                                :cyl "Number of cylinders"
                                :disp "Displacement (cu.in.)"
                                :hp "Gross horsepower"
                                :drat "Rear axle ratio"
                                :wt "Weight (1000 lbs)"
                                :qsec "1/4 mile time"
                                :vs "Engine (0=v-shaped, 1=straight)"
                                :am "Transmission (0=automatic, 1=manual)"
                                :gear "Number of forward gears"
                                :carb "Number of carburetors"))

Now look at the description again:

LS-USER> (describe mtcars)
MTCARS
  A data-frame with 32 observations of 12 variables

Variable | Type         | Unit | Label
-------- | ----         | ---- | -----------
X8       | STRING       | NIL  | NIL
MPG      | DOUBLE-FLOAT | M/G  | Miles/(US) gallon
CYL      | INTEGER      | NA   | Number of cylinders
DISP     | DOUBLE-FLOAT | IN3  | Displacement (cu.in.)
HP       | INTEGER      | HP   | Gross horsepower
DRAT     | DOUBLE-FLOAT | NA   | Rear axle ratio
WT       | DOUBLE-FLOAT | LB   | Weight (1000 lbs)
QSEC     | DOUBLE-FLOAT | S    | 1/4 mile time
VS       | BIT          | NA   | Engine (0=v-shaped, 1=straight)
AM       | BIT          | NA   | Transmission (0=automatic, 1=manual)
GEAR     | INTEGER      | NA   | Number of forward gears
CARB     | INTEGER      | NA   | Number of carburetors

You can set your own properties with this command too. To make your custom properties appear in the describe command and be saved automatically, override the describe and write-df methods, or use :after methods.

Create data-frames

A data frame can be created from a Common Lisp array, alist, plist, individual data vectors, another data frame or a vector-of vectors. In this section we’ll describe creating a data frame from each of these.

Data frame columns represent sample set variables, and rows are observations (or cases).

(defmethod print-object ((df data-frame) stream)
  "Print the first six rows of DATA-FRAME"
  (let ((*print-lines* 6))
    (df:print-data df stream nil)))

(set-pprint-dispatch 'df:data-frame
                     #'(lambda (s df) (df:print-data df s nil)))

You can ignore the warning that you’ll receive after executing the code above.

Let’s create a simple data frame. First we’ll setup some variables (columns) to represent our sample domain:

(defparameter v #(1 2 3 4))                    ; vector
(defparameter b #*0110)                        ; bits
(defparameter s #(a b c d))                    ; symbols
(defparameter plist `(:vector ,v :symbols ,s)) ; only v & s

Let’s print plist. Just type the name in at the REPL prompt.

plist
(:VECTOR #(1 2 3 4) :SYMBOLS #(A B C D))

From p/a-lists

Now suppose we want to create a data frame from a plist

(apply #'df plist)
;; VECTOR SYMBOLS
;;      1 A
;;      2 B
;;      3 C
;;      4 D

We could also have used the plist-df function:

(plist-df plist)
;; VECTOR SYMBOLS
;;      1 A
;;      2 B
;;      3 C
;;      4 D

and to demonstrate the same thing using an alist, we’ll use the alexandria:plist-alist function to convert the plist into an alist:

(alist-df (plist-alist plist))
;; VECTOR SYMBOLS
;;      1 A
;;      2 B
;;      3 C
;;      4 D

From vectors

You can use make-df to create a data frame from keys and a list of vectors. Each vector becomes a column in the data-frame.

(make-df '(:a :b)                 ; the keys
         '(#(1 2 3) #(10 20 30))) ; the columns
;; A  B
;; 1 10
;; 2 20
;; 3 30

This is useful if you’ve started working with variables defined with defparameter or defvar and want to combine them into a data frame.

From arrays

matrix-df converts a matrix (array) to a data-frame with the given keys.

(matrix-df #(:a :b) #2A((1 2) (3 4)))
;#<DATA-FRAME (2 observations of 2 variables)>

This is useful if you need to do a lot of number-crunching on a data set as an array, perhaps with BLAS or array-operations, and then want to add categorical variables and continue processing as a data-frame.
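As a sketch of that workflow in plain Common Lisp: operate on the array with ordinary array functions, then hand the result back to matrix-df with the saved keys (array-operations offers higher-level versions of this loop):

```lisp
(defparameter *m* #2A((1 2) (3 4)))

;; Scale every element; the result could then be passed to
;; (matrix-df #(:a :b) *scaled*) to resume data-frame processing
(defparameter *scaled*
  (let ((out (make-array (array-dimensions *m*))))
    (dotimes (i (array-total-size *m*) out)
      (setf (row-major-aref out i)
            (* 10 (row-major-aref *m* i))))))

*scaled* ; => #2A((10 20) (30 40))
```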

Example datasets

Vincent Arel-Bundock maintains a library of over 1700 R datasets that is a consolidation of example data from various R packages. You can load one of these by passing the URL of the raw data to the read-csv function. For example, to load the iris data set, use:

(defdf iris
  (read-csv "https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/iris.csv")
  "Edgar Anderson's Iris Data")

Default datasets

To make the examples and tutorials easier, Lisp-Stat includes the URLs for the built-in R data sets. You can see these by viewing the rdata:*r-default-datasets* variable:

LS-USER> rdata:*r-default-datasets*
(RDATA:AIRPASSENGERS RDATA:ABILITY.COV RDATA:AIRMILES RDATA:AIRQUALITY
 RDATA:ANSCOMBE RDATA:ATTENU RDATA:ATTITUDE RDATA:AUSTRES RDATA:BJSALES
 RDATA:BOD RDATA:CARS RDATA:CHICKWEIGHT RDATA:CHICKWTS RDATA:CO2-1 RDATA:CO2-2
 RDATA:CRIMTAB RDATA:DISCOVERIES RDATA:DNASE RDATA:ESOPH RDATA:EURO
 RDATA:EUSTOCKMARKETS RDATA:FAITHFUL RDATA:FORMALDEHYDE RDATA:FREENY
 RDATA:HAIREYECOLOR RDATA:HARMAN23.COR RDATA:HARMAN74.COR RDATA:INDOMETH
 RDATA:INFERT RDATA:INSECTSPRAYS RDATA:IRIS RDATA:IRIS3 RDATA:ISLANDS
 RDATA:JOHNSONJOHNSON RDATA:LAKEHURON RDATA:LH RDATA:LIFECYCLESAVINGS
 RDATA:LOBLOLLY RDATA:LONGLEY RDATA:LYNX RDATA:MORLEY RDATA:MTCARS
 RDATA:NHTEMP RDATA:NILE RDATA:NOTTEM RDATA:NPK RDATA:OCCUPATIONALSTATUS
 RDATA:ORANGE RDATA:ORCHARDSPRAYS RDATA:PLANTGROWTH RDATA:PRECIP
 RDATA:PRESIDENTS RDATA:PRESSURE RDATA:PUROMYCIN RDATA:QUAKES RDATA:RANDU
 RDATA:RIVERS RDATA:ROCK RDATA:SEATBELTS RDATA::STUDENT-SLEEP RDATA:STACKLOSS
 RDATA:SUNSPOT.MONTH RDATA:SUNSPOT.YEAR RDATA:SUNSPOTS RDATA:SWISS
 RDATA:THEOPH RDATA:TITANIC RDATA:TOOTHGROWTH RDATA:TREERING RDATA:TREES
 RDATA:UCBADMISSIONS RDATA:UKDRIVERDEATHS RDATA:UKGAS RDATA:USACCDEATHS
 RDATA:USARRESTS RDATA:USJUDGERATINGS RDATA:USPERSONALEXPENDITURE RDATA:USPOP
 RDATA:VADEATHS RDATA:VOLCANO RDATA:WARPBREAKS RDATA:WOMEN RDATA:WORLDPHONES
 RDATA:WWWUSAGE)

To load one of these, you can use the name of the data set. For example to load mtcars:

(defdf mtcars (read-csv rdata:mtcars))

If you want to load all of the default R data sets, use the rdata:load-r-default-datasets command. All the data sets included in base R will then be loaded into your environment. This is useful if you are following an R tutorial, but using Lisp-Stat as the analysis software.

You may also want to save the default R data sets in order to augment the data with labels, units, types, etc. To save all of the default R data sets to the LS:DATA;R directory, use the (rdata:save-r-default-datasets) command if the default data sets have already been loaded, or save-r-data if they have not. This saves the data in lisp format.

Install R datasets

To work with all of the R data sets, we recommend you use git to download the repository to your hard drive. For example I downloaded the example data to the s: drive like this:

cd s:
git clone https://github.com/vincentarelbundock/Rdatasets.git

and setup a logical host in my ls-init.lisp file like so:

;;; Define logical hosts for external data sets
(setf (logical-pathname-translations "RDATA")
      `(("**;*.*.*" ,(merge-pathnames "csv/**/*.*" "s:/Rdatasets/"))))

Now you can access any of the datasets using the logical pathname. Here’s an example of creating a data frame using the ggplot mpg data set:

(defdf mpg (read-csv #P"RDATA:ggplot2;mpg.csv"))

Searching the examples

With so many data sets, it’s helpful to load the index into a data frame so you can search for specific examples. You can do this by loading the rdata:index into a data frame:

(defdf rindex (read-csv rdata:index))

I find it easiest to use the SQL-DF system to query this data. For example if you wanted to find the data sets with the largest number of observations:

(ql:quickload :sqldf)
(print-data
 (sqldf:sqldf "select item, title, rows, cols from rindex order by rows desc limit 10"))

;;   ITEM            TITLE                                                              ROWS    COLS
;; 0 military        US Military Demographics                                           1414593 6
;; 1 Birthdays       US Births in 1969 - 1988                                            372864 7
;; 2 wvs_justifbribe Attitudes about the Justifiability of Bribe-Taking in the ...       348532 6
;; 3 flights         Flights data                                                        336776 19
;; 4 wvs_immig       Attitudes about Immigration in the World Values Survey              310388 6
;; 5 Fertility       Fertility and Women's Labor Supply                                  254654 8
;; 6 avandia         Cardiovascular problems for two types of Diabetes medicines         227571 2
;; 7 AthleteGrad     Athletic Participation, Race, and Graduation                        214555 3
;; 8 mortgages       Data from "How do Mortgage Subsidies Affect Home Ownership? ..."    214144 6
;; 9 mammogram       Experiment with Mammogram Randomized

Export data frames

These next few functions are the reverse of the creation functions above. They are useful when you want to use foreign libraries or Common Lisp functions to process the data.

For this section of the manual, we are going to work with a subset of the mtcars data set from above. We’ll use the select package to take the first 5 rows so that the data transformations are easier to see.

(defparameter mtcars-small (select mtcars (range 0 5) t))

The next three functions convert a data-frame to and from standard Common Lisp data structures. This is useful if you’ve got data in Common Lisp format and want to work with it in a data frame, or if you’ve got a data frame and want to apply Common Lisp operators to it that don’t exist in df.

as-alist

Just like it says on the tin, as-alist takes a data frame and returns an alist version of it (formatted here for clearer output – a pretty printer that outputs an alist in this format would be a welcome addition to Lisp-Stat)

(as-alist mtcars-small)
;; ((MTCARS:X1 . #("Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" "Hornet Sportabout"))
;;  (MTCARS:MPG . #(21 21 22.8d0 21.4d0 18.7d0))
;;  (MTCARS:CYL . #(6 6 4 6 8))
;;  (MTCARS:DISP . #(160 160 108 258 360))
;;  (MTCARS:HP . #(110 110 93 110 175))
;;  (MTCARS:DRAT . #(3.9d0 3.9d0 3.85d0 3.08d0 3.15d0))
;;  (MTCARS:WT . #(2.62d0 2.875d0 2.32d0 3.215d0 3.44d0))
;;  (MTCARS:QSEC . #(16.46d0 17.02d0 18.61d0 19.44d0 17.02d0))
;;  (MTCARS:VS . #*00110)
;;  (MTCARS:AM . #*11100)
;;  (MTCARS:GEAR . #(4 4 4 3 3))
;;  (MTCARS:CARB . #(4 4 1 1 2)))

as-plist

Similarly, as-plist will return a plist:

(as-plist mtcars-small)
;; (MTCARS:X1 #("Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" "Hornet Sportabout")
;;  MTCARS:MPG #(21 21 22.8d0 21.4d0 18.7d0)
;;  MTCARS:CYL #(6 6 4 6 8)
;;  MTCARS:DISP #(160 160 108 258 360)
;;  MTCARS:HP #(110 110 93 110 175)
;;  MTCARS:DRAT #(3.9d0 3.9d0 3.85d0 3.08d0 3.15d0)
;;  MTCARS:WT #(2.62d0 2.875d0 2.32d0 3.215d0 3.44d0)
;;  MTCARS:QSEC #(16.46d0 17.02d0 18.61d0 19.44d0 17.02d0)
;;  MTCARS:VS #*00110
;;  MTCARS:AM #*11100
;;  MTCARS:GEAR #(4 4 4 3 3)
;;  MTCARS:CARB #(4 4 1 1 2))

as-array

as-array returns the data frame as a row-major two dimensional lisp array. You’ll want to save the variable names using the keys function to make it easy to convert back (see matrix-df). One of the reasons you might want to use this function is to manipulate the data-frame using array-operations. This is particularly useful when you have data frames of all numeric values.

(defparameter mtcars-keys (keys mtcars)) ; we'll use this later
(defparameter mtcars-small-array (as-array mtcars-small))
mtcars-small-array
;; 0 Mazda RX4          21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
;; 1 Mazda RX4 Wag      21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
;; 2 Datsun 710         22.8 4 108  93 3.85 2.320 18.61 1 1 4 1
;; 3 Hornet 4 Drive     21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
;; 4 Hornet Sportabout  18.7 8 360 175 3.15 3.440 17.02 0 0 3 2

Our abbreviated mtcars data frame is now a two dimensional Common Lisp array. It may not look like one because Lisp-Stat will ‘print pretty’ arrays. You can inspect it with the describe command to make sure:

LS-USER> (describe mtcars-small-array)
...
Type: (SIMPLE-ARRAY T (5 12))
Class: #<BUILT-IN-CLASS SIMPLE-ARRAY>
Element type: T
Rank: 2
Physical size: 60

vectors

The columns function returns the variables of the data frame as a vector of vectors:

(columns mtcars-small)
; #(#("Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" "Hornet Sportabout")
;   #(21 21 22.8d0 21.4d0 18.7d0)
;   #(6 6 4 6 8)
;   #(160 160 108 258 360)
;   #(110 110 93 110 175)
;   #(3.9d0 3.9d0 3.85d0 3.08d0 3.15d0)
;   #(2.62d0 2.875d0 2.32d0 3.215d0 3.44d0)
;   #(16.46d0 17.02d0 18.61d0 19.44d0 17.02d0)
;   #*00110
;   #*11100
;   #(4 4 4 3 3)
;   #(4 4 1 1 2))

This is, in effect, a column-major representation of the data frame.

You can also pass a selection to the columns function to return specific columns:

(columns mtcars-small 'mpg) ; #(21 21 22.8d0 21.4d0 18.7d0)

The functions in array-operations are helpful in further dealing with data frames as vectors and arrays. For example you could convert a data frame to a transposed array by using aops:combine with the columns function:

(combine (columns mtcars-small))
;;  0 Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout
;;  1     21.00        21.000      22.80         21.400             18.70
;;  2      6.00         6.000       4.00          6.000              8.00
;;  3    160.00       160.000     108.00        258.000            360.00
;;  4    110.00       110.000      93.00        110.000            175.00
;;  5      3.90         3.900       3.85          3.080              3.15
;;  6      2.62         2.875       2.32          3.215              3.44
;;  7     16.46        17.020      18.61         19.440             17.02
;;  8      0.00         0.000       1.00          1.000              0.00
;;  9      1.00         1.000       1.00          0.000              0.00
;; 10      4.00         4.000       4.00          3.000              3.00
;; 11      4.00         4.000       1.00          1.000              2.00

Load data

There are two functions for loading data. The first, data, makes loading from logical pathnames convenient. The other, read-csv, works with the file system or URLs. Although the name read-csv implies only CSV (comma separated values), it can read files with other delimiters as well, such as the tab character. See the DFIO API reference for more information.

The data command

For built in Lisp-Stat data sets, you can load with just the data set name. For example to load mtcars:

(data :mtcars)

If you’ve installed the R data sets, and want to load the antigua data set from the daag package, you could do it like this:

(data :antigua :system :rdata :directory :daag :type :csv)

If the file type is not lisp (say it’s TSV or CSV), you need to specify the type parameter.

From strings

Here is a short demonstration of reading from strings:

(defparameter *d*
  (read-csv (format nil "Gender,Age,Height~@
                         \"Male\",30,180.~@
                         \"Male\",31,182.7~@
                         \"Female\",32,1.65e2")))

dfio tries hard to decipher the various number formats sometimes encountered in CSV files:

(select (dfio:read-csv
         (format nil "\"All kinds of wacky number formats\"~%.7~%19.~%.7f2"))
        t 'all-kinds-of-wacky-number-formats)
; => #(0.7d0 19.0d0 70.0)

From delimited files

We saw above that dfio can read from strings, so one easy way to read from a file is to use the uiop system function read-file-string. We can read one of the example data files included with Lisp-Stat like this:

(read-csv (uiop:read-file-string #P"LS:DATA;absorbtion.csv"))
;;   IRON ALUMINUM ABSORPTION
;; 0   61       13          4
;; 1  175       21         18
;; 2  111       24         14
;; 3  124       23         18
;; 4  130       64         26
;; 5  173       38         26 ..

That example just illustrates reading from a file into a string. In practice you’re better off reading the file directly, avoiding the intermediate string:

(read-csv #P"LS:DATA;absorbtion.csv")
;;   IRON ALUMINUM ABSORPTION
;; 0   61       13          4
;; 1  175       21         18
;; 2  111       24         14
;; 3  124       23         18
;; 4  130       64         26
;; 5  173       38         26 ..

From parquet files

You can use the duckdb system to load data from parquet files:

(ql:quickload :duckdb) ; see duckdb repo for installation instructions
(ddb:query "INSTALL httpfs;" nil) ; loading via http
(ddb:initialize-default-connection)
(defdf yellow-taxis
    (let ((q (ddb:query "SELECT * FROM read_parquet('https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet') LIMIT 10" nil)))
      (make-df (mapcar #'dfio:string-to-symbol (alist-keys q))
	       (alist-values q))))

Now we can find the average fare:

(mean yellow-taxis:fare-amount)
11.120000000000001d0

From URLs

dfio can also read from Common Lisp streams. Stream operations can be network or file based. Here is an example of how to read the classic Iris data set over the network:

(read-csv "https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/iris.csv")
;;   X27 SEPAL-LENGTH SEPAL-WIDTH PETAL-LENGTH PETAL-WIDTH SPECIES
;; 0   1          5.1         3.5          1.4         0.2 setosa
;; 1   2          4.9         3.0          1.4         0.2 setosa
;; 2   3          4.7         3.2          1.3         0.2 setosa
;; 3   4          4.6         3.1          1.5         0.2 setosa
;; 4   5          5.0         3.6          1.4         0.2 setosa
;; 5   6          5.4         3.9          1.7         0.4 setosa ..

From a database

You can load data from a SQLite table using the read-table command. Here’s an example of reading the iris data frame from a SQLite table:

(asdf:load-system :sqldf)
(defdf iris
  (sqldf:read-table
   (sqlite:connect #P"S:\\src\\lisp-stat\\data\\iris.db3")
   "iris"))

Note that sqlite:connect does not take a logical pathname; use a system path appropriate for your computer. One reason you might want to do this is for speed in loading CSV. The CSV loader for SQLite is 10-15 times faster than the fastest Common Lisp CSV parser, and it is often quicker to load to SQLite first, then load into Lisp.

Save data

Data frames can be saved into any delimited text format supported by fare-csv, or several flavors of JSON, such as Vega-Lite.

As CSV

To save the mtcars data frame to disk, you could use:

(write-csv mtcars #P"LS:DATA;mtcars.csv" :add-first-row t) ; add column headers

to save it as CSV, or to save it to tab-separated values:

(write-csv mtcars #P"LS:DATA;mtcars.tsv" :separator #\tab :add-first-row t) ; add column headers

As Lisp

For the most part, you will want to save your data frames as lisp. Doing so is faster to load and, more importantly, preserves any variable attributes that may have been set.

To save a data frame, use the save command:

(save 'mtcars #P"LS:DATA;mtcars-example")

Note that in this case you are passing the symbol to the function, not the value (thus the quote (’) before the name of the data frame). Also note that the system will add the ’lisp’ suffix for you.

To a database

The write-table function can be used to save a data frame to a SQLite database. It takes a connection to a database, which may be file or memory based, a table name and a data frame. Multiple data frames, with different table names, may be written to a single SQLite file this way.
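A sketch of what that might look like, assuming (as with read-table above) that write-table lives in the sqldf package and the sqlite system is loaded; adjust the path for your machine:

```lisp
;; Write two data frames to one file-based SQLite database
(let ((conn (sqlite:connect "/tmp/cars.db3")))
  (unwind-protect
       (progn
         (sqldf:write-table conn "mtcars" mtcars)             ; first table
         (sqldf:write-table conn "mtcars_small" mtcars-small)) ; second table, same file
    (sqlite:disconnect conn)))  ; always release the connection
```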

Access data

This section describes various way to access data variables.

Define a data-frame

Let’s use defdf to define the iris data frame. We’ll use both of these data frames in the examples below.

(defdf iris (read-csv rdata:iris))
;WARNING: Missing column name was filled in

We now have a global variable named iris that represents the data frame. Let’s look at the first part of this data:

(head iris)
;;   X29 SEPAL-LENGTH SEPAL-WIDTH PETAL-LENGTH PETAL-WIDTH SPECIES
;; 0   1          5.1         3.5          1.4         0.2 setosa
;; 1   2          4.9         3.0          1.4         0.2 setosa
;; 2   3          4.7         3.2          1.3         0.2 setosa
;; 3   4          4.6         3.1          1.5         0.2 setosa
;; 4   5          5.0         3.6          1.4         0.2 setosa
;; 5   6          5.4         3.9          1.7         0.4 setosa

Notice a couple of things. First, there is a column X29. In fact, if you look back at previous data frame output in this tutorial, you will notice various columns named X followed by some number. This is because the column was not given a name in the data set, so a name was generated for it. X starts at 1 and increases by 1 each time an unnamed variable is encountered during your Lisp-Stat session. The next time you start Lisp-Stat, numbering will begin from 1 again. We will see how to clean up this data frame in the next sections.

The second thing to note is the row numbers on the far left side. When Lisp-Stat prints a data frame it automatically adds row numbers. Row and column numbering in Lisp-Stat start at 0. In R they start with 1. Row numbers make it convenient to select data sections from a data frame, but they are not part of the data and cannot be selected or manipulated themselves. They only appear when a data frame is printed.
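Zero-based row and column numbering matches Common Lisp’s own sequence indexing:

```lisp
;; AREF, like all Common Lisp sequence accessors, indexes from 0
(aref #(21 22.8d0 19.2d0) 0) ; => 21, the first observation
```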

Access a variable

The defdf macro also defines symbol macros that allow you to refer to a variable by name. For example, to refer to the mpg column of mtcars, use the data-frame:variable naming convention:

mtcars:mpg
; #(21 21 22.8D0 21.4D0 18.7D0 18.1D0 14.3D0 24.4D0 22.8D0 19.2D0 17.8D0 16.4D0
;   17.3D0 15.2D0 10.4D0 10.4D0 14.7D0 32.4D0 30.4D0 33.9D0 21.5D0 15.5D0 15.2D0
;   13.3D0 19.2D0 27.3D0 26 30.4D0 15.8D0 19.7D0 15 21.4D0)

There is a point of distinction to be made here: the values of mpg and the column mpg. For example to obtain the same vector using the selection/sub-setting package select we must refer to the column:

(select mtcars t 'mpg)
; #(21 21 22.8D0 21.4D0 18.7D0 18.1D0 14.3D0 24.4D0 22.8D0 19.2D0 17.8D0 16.4D0
;   17.3D0 15.2D0 10.4D0 10.4D0 14.7D0 32.4D0 30.4D0 33.9D0 21.5D0 15.5D0 15.2D0
;   13.3D0 19.2D0 27.3D0 26 30.4D0 15.8D0 19.7D0 15 21.4D0)

Note that with select we passed the symbol 'mpg (you can tell it’s a symbol because of the quote in front of it).

So the rule here is: if you want the value, refer to it directly, e.g. mtcars:mpg. If you are referring to the column, use the symbol. Data frame operations sometimes require the symbol, whereas Common Lisp and other packages that take vectors use the direct access form.
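The direct-access form works because defdf defines a symbol macro per column. A minimal plain-Lisp sketch of the mechanism (the names here are illustrative, not what defdf actually generates):

```lisp
;; A stand-in for the data frame's column storage
(defparameter *columns* (list :mpg #(21 22.8d0 19.2d0)))

;; defdf arranges something like this for every variable
(define-symbol-macro mpg (getf *columns* :mpg))

mpg ; => #(21 22.8d0 19.2d0), the reference expands to the GETF lookup
```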

Data-frame operations

These functions operate on data-frames as a whole.

copy

copy returns a newly allocated data-frame with the same values as the original:

(copy mtcars-small)
;;   X1                 MPG  CYL DISP HP  DRAT WT    QSEC  VS AM GEAR CARB
;; 0 Mazda RX4          21.0 6   160  110 3.90 2.620 16.46 0  1  4    4
;; 1 Mazda RX4 Wag      21.0 6   160  110 3.90 2.875 17.02 0  1  4    4
;; 2 Datsun 710         22.8 4   108  93  3.85 2.320 18.61 1  1  4    1
;; 3 Hornet 4 Drive     21.4 6   258  110 3.08 3.215 19.44 1  0  3    1
;; 4 Hornet Sportabout  18.7 8   360  175 3.15 3.440 17.02 0  0  3    2

By default only the keys are copied and the original data remains the same, i.e. a shallow copy. For a deep copy, use the copy-array function as the key:

(copy mtcars-small :key #'copy-array)
;;   X1                 MPG  CYL DISP HP  DRAT WT    QSEC  VS AM GEAR CARB
;; 0 Mazda RX4          21.0 6   160  110 3.90 2.620 16.46 0  1  4    4
;; 1 Mazda RX4 Wag      21.0 6   160  110 3.90 2.875 17.02 0  1  4    4
;; 2 Datsun 710         22.8 4   108  93  3.85 2.320 18.61 1  1  4    1
;; 3 Hornet 4 Drive     21.4 6   258  110 3.08 3.215 19.44 1  0  3    1
;; 4 Hornet Sportabout  18.7 8   360  175 3.15 3.440 17.02 0  0  3    2

Useful when applying destructive operations to the data-frame.
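The shallow/deep distinction is the usual Lisp one; a plain-vector sketch shows why the deep copy matters before a destructive operation:

```lisp
(defparameter *col* (vector 1 2 3))
(defparameter *shallow* *col*)         ; shares the same storage
(defparameter *deep* (copy-seq *col*)) ; fresh storage

(setf (aref *col* 0) 99)
(aref *shallow* 0) ; => 99, mutation is visible through the shared vector
(aref *deep* 0)    ; => 1, the deep copy is unaffected
```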

keys

Returns a vector of the variables in the data frame. The keys are symbols. Symbol properties describe the variable, for example units.

(keys mtcars) ; #(X45 MPG CYL DISP HP DRAT WT QSEC VS AM GEAR CARB)

Recall the earlier discussion of X1 for the column name.

map-df

map-df transforms one data-frame into another, row-by-row. Its function signature is:

(map-df data-frame keys function result-keys) ...

It applies function to each row, and returns a data frame with the result-keys as the column (variable) names. keys is a list. You can also specify the type of the new variables in the result-keys list.

The goal for this example is to transform df1:

(defparameter df1 (make-df '(:a :b) '(#(2 3 5) #(7 11 13))))

into a data-frame that consists of the product of :a and :b, and a bit mask indicating where that product is >= 30. First we’ll need a helper for the bit mask:

(defun predicate-bit (a b)
  "Return 1 if A*B >= 30, 0 otherwise"
  (if (<= 30 (* a b)) 1 0))

Now we can transform df1 into our new data-frame, df2, with:

(defparameter df2
  (map-df df1 '(:a :b)
          (lambda (a b)
            (vector (* a b) (predicate-bit a b)))
          '((:p fixnum) (:m bit))))

Since the result was assigned to a parameter, we have to view it manually:

(print-df df2)
;;    P M
;; 0 14 0
;; 1 33 1
;; 2 65 1

Note how we specified both the new key names and their type. Here’s an example that transforms the units of mtcars from imperial to metric:

(map-df mtcars '(x1 mpg disp hp wt)
        (lambda (model mpg disp hp wt)
          (vector model ; no transformation for model (X1), return as-is
                  (/ 235.214583 mpg)
                  (/ disp 61.024)
                  (* hp 1.01387)
                  (/ (* wt 1000) 2.2046)))
        '(:model (:100km/l float) (:disp float) (:hp float) (:kg float)))
;;   MODEL              100KM/L DISP   HP       KG
;; 0 Mazda RX4          11.2007 2.6219 111.5257 1188.4242
;; 1 Mazda RX4 Wag      11.2007 2.6219 111.5257 1304.0914
;; 2 Datsun 710         10.3164 1.7698  94.2899 1052.3451
;; 3 Hornet 4 Drive     10.9913 4.2278 111.5257 1458.3144
;; 4 Hornet Sportabout  12.5783 5.8993 177.4272 1560.3737
;; 5 Valiant            12.9953 3.6871 106.4564 1569.4456 ..

Note that you may have to adjust the X column name to suit your current environment.

You might be wondering how we were able to refer to the columns without the ’ (quote) on each one; in fact we did quote them, once, at the beginning of the list. The Lisp reader then reads the contents of the list as symbols.
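A quick REPL check makes this concrete:

```lisp
;; One quote in front of the list quotes every element inside it
'(x1 mpg disp hp wt)                   ; => (X1 MPG DISP HP WT)
(every #'symbolp '(x1 mpg disp hp wt)) ; => T
```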

print

The print-data command will print a data frame in a nicely formatted way, respecting the pretty printing row/column length variables:

(print-data mtcars)
;; MODEL              MPG  CYL DISP  HP  DRAT WT    QSEC  VS AM GEAR CARB
;; Mazda RX4          21.0 6   160.0 110 3.90 2.620 16.46 0  1  4    4
;; Mazda RX4 Wag      21.0 6   160.0 110 3.90 2.875 17.02 0  1  4    4
;; Datsun 710         22.8 4   108.0 93  3.85 2.320 18.61 1  1  4    1
;; Hornet 4 Drive     21.4 6   258.0 110 3.08 3.215 19.44 1  0  3    1
;; ...
;; Output elided for brevity

rows

rows returns the rows of a data frame as a vector of vectors:

(rows mtcars-small)
;#(#("Mazda RX4" 21 6 160 110 3.9d0 2.62d0 16.46d0 0 1 4 4)
;  #("Mazda RX4 Wag" 21 6 160 110 3.9d0 2.875d0 17.02d0 0 1 4 4)
;  #("Datsun 710" 22.8d0 4 108 93 3.85d0 2.32d0 18.61d0 1 1 4 1)
;  #("Hornet 4 Drive" 21.4d0 6 258 110 3.08d0 3.215d0 19.44d0 1 0 3 1)
;  #("Hornet Sportabout" 18.7d0 8 360 175 3.15d0 3.44d0 17.02d0 0 0 3 2))

remove duplicates

The df-remove-duplicates function will remove duplicate rows. Let’s create a data-frame with duplicates:

(defparameter dup
  (make-df '(a b c)
           '(#(a1 a1 a3) #(a1 a1 b3) #(a1 a1 c3))))
;DUP
;;   A  B  C
;; 0 A1 A1 A1
;; 1 A1 A1 A1
;; 2 A3 B3 C3

Now remove duplicate rows 0 and 1:

(df-remove-duplicates dup)
;; A  B  C
;; A1 A1 A1
;; A3 B3 C3

remove data-frame

If you are working with large data sets, you may wish to remove a data frame from your environment to save memory. The undef command does this:

LS-USER> (undef 'tooth-growth)
(TOOTH-GROWTH)

You can check that it was removed with the show-data-frames function, or by viewing the list df::*data-frames*.

list data-frames

To list the data frames in your environment, use the show-data-frames function. Here is an example of what is currently loaded into the author’s environment. The data frames listed may be different for you, depending on what you have loaded.

To see this output, you’ll have to change to the standard print-object method, using this code:

(defmethod print-object ((df data-frame) stream)
  "Print DATA-FRAME dimensions and type.
After defining this method it is permanently associated with data-frame objects"
  (print-unreadable-object (df stream :type t)
    (let ((description (and (slot-boundp df 'name)
                            (documentation (find-symbol (name df)) 'variable))))
      (format stream "(~d observations of ~d variables)"
              (aops:nrow df)
              (aops:ncol df))
      (when description
        (format stream "~&~A" (short-string description))))))

Now, to see all the data frames in your environment:

LS-USER> (show-data-frames)
#<DATA-FRAME AQ (153 observations of 7 variables)>
#<DATA-FRAME MTCARS (32 observations of 12 variables)
Motor Trend Car Road Tests>
#<DATA-FRAME USARRESTS (50 observations of 5 variables)
Violent Crime Rates by US State>
#<DATA-FRAME PLANTGROWTH (30 observations of 3 variables)
Results from an Experiment on Plant Growth>
#<DATA-FRAME TOOTHGROWTH (60 observations of 4 variables)
The Effect of Vitamin C on Tooth Growth in Guinea Pigs>

With the :head t option, show-data-frames will print the first few rows of each data frame, similar to the head command:

LS-USER> (show-data-frames :head t)
AQ
;; X5 OZONE   SOLAR-R WIND TEMP MONTH DAY
;; 1  41.0000 190      7.4   67     5   1
;; 2  36.0000 118      8.0   72     5   2
;; 3  12.0000 149     12.6   74     5   3
;; 4  18.0000 313     11.5   62     5   4
;; 5  42.1293 NA      14.3   56     5   5
;; 6  28.0000 NA      14.9   66     5   6 ..

MTCARS
;; MODEL              MPG  CYL DISP  HP  DRAT WT    QSEC  VS AM GEAR CARB
;; Mazda RX4          21.0 6   160.0 110 3.90 2.620 16.46 0  1  4    4
;; Mazda RX4 Wag      21.0 6   160.0 110 3.90 2.875 17.02 0  1  4    4
;; Datsun 710         22.8 4   108.0 93  3.85 2.320 18.61 1  1  4    1
;; Hornet 4 Drive     21.4 6   258.0 110 3.08 3.215 19.44 1  0  3    1
;; Hornet Sportabout  18.7 8   360.0 175 3.15 3.440 17.02 0  0  3    2
;; Valiant            18.1 6   225.0 105 2.76 3.460 20.22 1  0  3    1 ..
;; Output elided for brevity

You, of course, may see different output depending on what data frames you currently have loaded.

Let’s change the print-object back to our convenience method.

(defmethod print-object ((df data-frame) stream)
  "Print the first six rows of DATA-FRAME"
  (let ((*print-lines* 6))
    (df:print-data df stream nil)))

Column operations

You have seen some of these functions before, and for completeness we repeat them here.

To obtain a variable (column) from a data frame, use the column function. Using the mtcars-small data frame, defined in export data frames above:

(column mtcars-small 'mpg) ;; #(21 21 22.8d0 21.4d0 18.7d0)

To get all the columns as a vector, use the columns function:

(columns mtcars-small)
; #(#("Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" "Hornet Sportabout")
;   #(21 21 22.8d0 21.4d0 18.7d0)
;   #(6 6 4 6 8)
;   #(160 160 108 258 360)
;   #(110 110 93 110 175)
;   #(3.9d0 3.9d0 3.85d0 3.08d0 3.15d0)
;   #(2.62d0 2.875d0 2.32d0 3.215d0 3.44d0)
;   #(16.46d0 17.02d0 18.61d0 19.44d0 17.02d0)
;   #*00110
;   #*11100
;   #(4 4 4 3 3)
;   #(4 4 1 1 2))

You can also return a subset of the columns by passing in a selection:

(columns mtcars-small '(mpg wt))
;; #(#(21 21 22.8d0 21.4d0 18.7d0) #(2.62d0 2.875d0 2.32d0 3.215d0 3.44d0))

add columns

There are two ‘flavors’ of add functions: destructive and non-destructive. The non-destructive versions return a new data frame as the result, and the destructive versions modify the data frame passed as a parameter. The destructive versions are denoted with a ‘!’ at the end of the function name.
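This mirrors Common Lisp’s own convention of paired destructive/non-destructive sequence operators (e.g. reverse and nreverse):

```lisp
(defparameter *xs* (list 1 2 3))

(reverse *xs*) ; => (3 2 1), non-destructive: a fresh list
*xs*           ; => (1 2 3), the original is untouched
```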

The columns to be added can be in several formats:

  • plist
  • alist
  • (plist)
  • (alist)
  • (data-frame)

To add a single column to a data frame, use the add-column! function. We’ll use a data frame similar to the one used in our reading data-frames from a string example to illustrate column operations.

Create the data frame:

(defparameter *d*
  (read-csv (format nil "Gender,Age,Height
\"Male\",30,180
\"Male\",31,182
\"Female\",32,165
\"Male\",22,167
\"Female\",45,170")))

and print it:

(head *d*)
;;   GENDER AGE HEIGHT
;; 0 Male    30    180
;; 1 Male    31    182
;; 2 Female  32    165
;; 3 Male    22    167
;; 4 Female  45    170

and add a ‘weight’ column to it:

(add-column! *d* 'weight #(75.2 88.5 49.4 78.1 79.4))
;;   GENDER AGE HEIGHT WEIGHT
;; 0 Male    30    180   75.2
;; 1 Male    31    182   88.5
;; 2 Female  32    165   49.4
;; 3 Male    22    167   78.1
;; 4 Female  45    170   79.4

Now that we have weight, let’s add a BMI column to demonstrate using a function to compute the new column values:

(add-column! *d* 'bmi
             (map-rows *d* '(height weight)
                       #'(lambda (h w) (/ w (square (/ h 100))))))
;;   GENDER AGE HEIGHT WEIGHT       BMI
;; 0 Male    30    180   75.2 23.209875
;; 1 Male    31    182   88.5 26.717787
;; 2 Female  32    165   49.4 18.145086
;; 3 Male    22    167   78.1 28.003874
;; 4 Female  45    170   79.4 27.474049

Now let’s add multiple columns destructively using add-columns!

(add-columns! *d* 'a #(1 2 3 4 5) 'b #(foo bar baz qux quux))
;;   GENDER AGE HEIGHT WEIGHT     BMI A B
;; 0 Male    30    180   75.2 23.2099 1 FOO
;; 1 Male    31    182   88.5 26.7178 2 BAR
;; 2 Female  32    165   49.4 18.1451 3 BAZ
;; 3 Male    22    167   78.1 28.0039 4 QUX
;; 4 Female  45    170   79.4 27.4740 5 QUUX

remove columns

Let’s remove the columns a, b and bmi that we just added above with the remove-columns function. Since it returns a new data frame, we’ll need to assign the return value to *d*:

(setf *d* (remove-columns *d* '(a b bmi)))
;;   GENDER AGE HEIGHT WEIGHT
;; 0 Male    30    180   75.2
;; 1 Male    31    182   88.5
;; 2 Female  32    165   49.4
;; 3 Male    22    167   78.1
;; 4 Female  45    170   79.4

To remove columns destructively, meaning modifying the original data, use the remove-column! or remove-columns! functions.
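For example, a destructive removal can be sketched as follows (shown against a hypothetical data frame DF rather than our running *d* example, which we want to keep intact):

```lisp
;; Hypothetical data frame DF: the column is removed in place,
;; no new data frame is returned
(remove-column! df 'unwanted-column)

;; Remove several columns at once, also in place
(remove-columns! df '(col-a col-b))
```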

rename columns

Sometimes data sources can have variable names that we want to change. To do this, use the rename-column! function. This example will rename the ‘gender’ variable to ‘sex’:

(rename-column! *d* 'sex 'gender)
;;   SEX    AGE HEIGHT WEIGHT
;; 0 Male    30    180   75.2
;; 1 Male    31    182   88.5
;; 2 Female  32    165   49.4
;; 3 Male    22    167   78.1
;; 4 Female  45    170   79.4

If you used defdf to create your data frame, and this is the recommended way to define data frames, the variable references within the data package will have been updated. This is true for all destructive data frame operations. Let’s use this now to rename the mtcars X1 variable to model. First a quick look at the first 2 rows as they are now:

(head mtcars 2)
;;   X1             MPG CYL DISP  HP DRAT    WT  QSEC VS AM GEAR CARB
;; 0 Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
;; 1 Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4

Replace X1 with model:

(rename-column! mtcars 'model 'x1)

Note: check to see what value your version of mtcars has. In this case, with a fresh start of Lisp-Stat, it has X1. It could have X2, X3, etc.

Now check that it worked:

(head mtcars 2)
;;   MODEL          MPG CYL DISP  HP DRAT    WT  QSEC VS AM GEAR CARB
;; 0 Mazda RX4       21   6  160 110  3.9 2.620 16.46  0  1    4    4
;; 1 Mazda RX4 Wag   21   6  160 110  3.9 2.875 17.02  0  1    4    4

We can now refer to the column as mtcars:model:

mtcars:model
; #("Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" "Hornet Sportabout"
;   "Valiant" "Duster 360" "Merc 240D" "Merc 230" "Merc 280" "Merc 280C"
;   "Merc 450SE" "Merc 450SL" "Merc 450SLC" "Cadillac Fleetwood" "Lincoln Continental"
;   "Chrysler Imperial" "Fiat 128" "Honda Civic" "Toyota Corolla" "Toyota Corona"
;   "Dodge Challenger" "AMC Javelin" "Camaro Z28" "Pontiac Firebird" "Fiat X1-9"
;   "Porsche 914-2" "Lotus Europa" "Ford Pantera L" "Ferrari Dino" "Maserati Bora"
;   "Volvo 142E")

replace columns

Columns are “setf-able” places, and the simplest way to replace a column is to set the field to a new value. We’ll complement the sex field of *d*:

(df::setf (df:column *d* 'sex) #("Female" "Female" "Male" "Female" "Male"))
; #("Female" "Female" "Male" "Female" "Male")

Note that df::setf is not exported. Use this with caution.

You can also replace a column using two functions specifically for this purpose. Here we’ll replace the ‘age’ column with new values:

(replace-column *d* 'age #(10 15 20 25 30))
;;   SEX    AGE HEIGHT WEIGHT
;; 0 Female  10    180   75.2
;; 1 Female  15    182   88.5
;; 2 Male    20    165   49.4
;; 3 Female  25    167   78.1
;; 4 Male    30    170   79.4

That was a non-destructive replacement, and since we didn’t reassign the value of *d*, it is unchanged:

LS-USER> (print-data *d*)
;;   SEX    AGE HEIGHT WEIGHT
;; 0 Female  30    180   75.2
;; 1 Female  31    182   88.5
;; 2 Male    32    165   49.4
;; 3 Female  22    167   78.1
;; 4 Male    45    170   79.4

We can also use the destructive version to make a permanent change instead of setf-ing *d*:

(replace-column! *d* 'age #(10 15 20 25 30))
;;   SEX    AGE HEIGHT WEIGHT
;; 0 Female  10    180   75.2
;; 1 Female  15    182   88.5
;; 2 Male    20    165   49.4
;; 3 Female  25    167   78.1
;; 4 Male    30    170   79.4

transform columns

There are two functions for column transformations, replace-column and map-columns.

replace-column

replace-column can be used to transform a column by applying a function to each value. This example will add 20 to each row of the age column:

(replace-column *d* 'age #'(lambda (x) (+ 20 x)))
;;   SEX    AGE HEIGHT WEIGHT
;; 0 Female  30    180   75.2
;; 1 Female  35    182   88.5
;; 2 Male    40    165   49.4
;; 3 Female  45    167   78.1
;; 4 Male    50    170   79.4

replace-column! can also apply functions to a column, destructively modifying the column.
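For instance, a destructive transformation might be sketched like this (illustration only against a hypothetical data frame DF, so the running *d* example is unchanged):

```lisp
;; Hypothetical data frame DF: shift every value of its AGE
;; column in place by applying the function to each value
(replace-column! df 'age #'(lambda (x) (- x 5)))
```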

map-columns

The map-columns function can be thought of as applying a function on all the values of each variable/column as a vector, rather than the individual rows as replace-column does. To see this, we’ll use functions that operate on vectors, in this case nu:e+, which is the vector addition function for Lisp-Stat. Let’s see this working first:

(nu:e+ #(1 1 1) #(2 3 4)) ; => #(3 4 5)

Observe how the vectors were added element-wise. We’ll demonstrate map-columns by adding one to each of the numeric columns in the example data frame:

(map-columns (select *d* t '(weight age height))
             #'(lambda (x) (nu:e+ 1 x)))
;;   WEIGHT AGE HEIGHT
;; 0   76.2  11    181
;; 1   89.5  16    183
;; 2   50.4  21    166
;; 3   79.1  26    168
;; 4   80.4  31    171

Recall that we used the non-destructive version of replace-column above, so *d* has the original values. Also note the use of select to get the numeric variables from the data frame; e+ can’t add categorical values like gender/sex.

Row operations

As the name suggests, row operations operate on each row, or observation, of a data set.

add rows

Adding rows is done with the array-operations stacking functions. Since these functions operate on both arrays and data frames, we can use them to stack data frames, arrays, or a mixture of both, provided they have a rank of 2. Here’s an example of adding a row to the mtcars data frame:

(defparameter boss-mustang #("Boss Mustang" 12.7d0 8 302 405 4.11d0 2.77d0 12.5d0 0 1 4 4))

and now stack it onto the mtcars data set (load it with (data :mtcars) if you haven’t already done so):

(matrix-df (keys mtcars) (stack-rows mtcars boss-mustang))

This is the functional equivalent of R’s rbind function. You can also add columns with the stack-cols function.
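By analogy with the row example, a column-wise stack could be sketched like this (a sketch only, assuming df1 and df2 are data frames with the same number of rows):

```lisp
;; Concatenate the column keys and stack the columns side by side
(matrix-df (concatenate 'vector (keys df1) (keys df2))
           (aops:stack-cols df1 df2))
```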

An often asked question is: why don’t you have a dedicated stack-rows function? Well, if you want one it might look like this:

(defun stack-rows (df &rest objects)
  "Stack rows that works on matrices and/or data frames."
  (matrix-df (keys df)
             (apply #'aops:stack-rows (cons df objects))))

But now the data frame must be the first parameter passed to the function. Or perhaps you want to rename the columns? Or you have matrices as your starting point? For all those reasons, it makes more sense to pass in the column keys than a data frame:

(defun stack-rows (col-names &rest objects)
  "Stack rows that works on matrices and/or data frames."
  (matrix-df col-names
             (apply #'aops:stack-rows objects)))

However this means we have two stack-rows functions, and you don’t really gain anything except an extra function call. So use the above definition if you like; we use the first example and call matrix-df and stack-rows to stack data frames.

count-rows

This function is used to determine how many rows meet a certain condition. For example, if you want to know how many cars have an MPG (miles-per-gallon) rating greater than 20, you could use:

(count-rows mtcars 'mpg #'(lambda (x) (< 20 x))) ; => 14

do-rows

do-rows applies a function on selected variables. The function must take the same number of arguments as variables supplied. It is analogous to dotimes, but iterating over data frame rows. No values are returned; it is purely for side effects. Let’s create a new data-frame to illustrate row operations:

LS-USER> (defparameter *d2* (make-df '(a b) '(#(1 2 3) #(10 20 30))))
*D2*
LS-USER> *d2*
;;   A  B
;; 0 1 10
;; 1 2 20
;; 2 3 30

This example uses format to illustrate iterating using do-rows for side effect:

(do-rows *d2* '(a b) #'(lambda (a b) (format t "~A " (+ a b))))
11 22 33
; No value

map-rows

Where map-columns can be thought of as working through the data frame column-by-column, map-rows goes through row-by-row. Here we add the values in each row of two columns:

(map-rows *d2* '(a b) #'+)
; #(11 22 33)

Since the length of this vector will always be equal to the data-frame column length, we can add the results to the data frame as a new column. Let’s see this in a real-world pattern, subtracting the mean from a column:

(add-column! *d2* 'c
             (map-rows *d2* 'b
                       #'(lambda (x) (- x (mean (select *d2* t 'b))))))
;;   A  B     C
;; 0 1 10 -10.0
;; 1 2 20   0.0
;; 2 3 30  10.0

You could also have used replace-column! in a similar manner to replace a column with normalized values.

mask-rows

mask-rows is similar to count-rows, except it returns a bit-vector for rows matching the predicate. This is useful when you want to pass the bit vector to another function, like select to retrieve only the rows matching the predicate.

(mask-rows mtcars 'mpg #'(lambda (x) (< 20 x))) ; => #*11110001100000000111100001110001
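Because select accepts bit vectors as selections, the mask can be passed straight through to pull out the matching observations; a minimal sketch:

```lisp
;; All rows of mtcars whose MPG exceeds 20, as a new data frame
(select mtcars
        (mask-rows mtcars 'mpg #'(lambda (x) (< 20 x)))
        t)
```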

filter-rows

The filter-rows function will return a data-frame whose rows match the predicate. The function signature is:

(defun filter-rows (data body) ...

As an example, let’s filter mtcars to find all the cars whose fuel consumption is greater than 20 mpg:

(filter-rows mtcars '(< 20 mpg)) ;=> #<DATA-FRAME (14 observations of 12 variables)>

To view them we’ll need to call the print-data function directly instead of using the print-object function we installed earlier. Otherwise, we’ll only see the first 6.

(print-data *)
;;    MODEL           MPG CYL  DISP  HP DRAT    WT  QSEC VS AM GEAR CARB
;; 0  Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
;; 1  Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
;; 2  Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
;; 3  Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
;; 4  Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
;; 5  Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
;; 6  Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
;; 7  Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
;; 8  Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
;; 9  Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
;; 10 Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
;; 11 Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
;; 12 Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
;; 13 Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Filter predicates can be more complex than this, here’s an example filtering the Vega movies data set (which we call imdb):

(filter-rows imdb
             '(and (not (eql imdb-rating :na))
                   (local-time:timestamp< release-date
                                          (local-time:parse-timestring "2019-01-01"))))

You can refer to any of the column/variable names in the data-frame directly when constructing the filter predicate. The predicate is turned into a lambda function, so let, etc., are also possible.
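As a sketch of this, a let binding can factor out a constant used by the predicate (the threshold name here is hypothetical):

```lisp
(filter-rows mtcars
             '(let ((threshold 20))
                (and (< threshold mpg) (= gear 4))))
```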

Summarising data

Often the first thing you’ll want to do with a data frame is get a quick summary. You can do that with these functions, and we’ve seen most of them used in this manual. For more information about these functions, see the data-frame api reference.

nrow data-frame
return the number of rows in data-frame
ncol data-frame
return the number of columns in data-frame
dims data-frame
return the dimensions of data-frame as a list in (rows columns) format
keys data-frame
return a vector of symbols representing column names
column-names data-frame
returns a list of strings of the column names in data-frames
head data-frame &optional n
displays the first n rows of data-frame. n defaults to 6.
tail data-frame &optional n
displays the last n rows of data-frame. n defaults to 6.
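Since mtcars has 32 observations of 12 variables, a quick check with these functions looks like:

```lisp
(nrow mtcars) ; => 32
(ncol mtcars) ; => 12
(dims mtcars) ; => (32 12)
```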

describe

describe data-frame
returns the meta-data for the variables in data-frame

describe is a Common Lisp function that describes an object. In Lisp-Stat, describe prints a description of the data frame and the three ‘standard’ properties of the variables: type, unit and description. It is similar to the str command in R. To see an example, use the augmented mtcars data set included in Lisp-Stat. In this data set, we have added properties describing the variables. This is a good illustration of why you should always save data frames in lisp format; properties such as these are lost in CSV format.

(data :mtcars)
LS-USER> (describe mtcars)
MTCARS
  Motor Trend Car Road Tests
  A data-frame with 32 observations of 12 variables

Variable | Type         | Unit | Label
-------- | ----         | ---- | -----------
MODEL    | STRING       | NIL  | NIL
MPG      | DOUBLE-FLOAT | M/G  | Miles/(US) gallon
CYL      | INTEGER      | NA   | Number of cylinders
DISP     | DOUBLE-FLOAT | IN3  | Displacement (cu.in.)
HP       | INTEGER      | HP   | Gross horsepower
DRAT     | DOUBLE-FLOAT | NA   | Rear axle ratio
WT       | DOUBLE-FLOAT | LB   | Weight (1000 lbs)
QSEC     | DOUBLE-FLOAT | S    | 1/4 mile time
VS       | BIT          | NA   | Engine (0=v-shaped, 1=straight)
AM       | BIT          | NA   | Transmission (0=automatic, 1=manual)
GEAR     | INTEGER      | NA   | Number of forward gears
CARB     | INTEGER      | NA   | Number of carburetors

summary

summary data-frame
returns a summary of the variables in data-frame

Summary functions are one of those things that tend to be use-case or application specific. Witness the number of R summary packages; there are at least half a dozen, including Hmisc, stat.desc, psych’s describe, skimr and summarytools. In short, there is no one-size-fits-all way to provide summaries, so Lisp-Stat provides the data structures upon which users can customise the summary output. The output you see below is a simple :print-function for each of the summary structure types (numeric, factor, bit and generic).

LS-USER> (summary mtcars)
(MPG (Miles/(US) gallon)
 n: 32 missing: 0 min=10.40 q25=15.40 q50=19.20 mean=20.09 q75=22.80 max=33.90
 CYL (Number of cylinders)
 14 (44%) x 8, 11 (34%) x 4, 7 (22%) x 6,
 DISP (Displacement (cu.in.))
 n: 32 missing: 0 min=71.10 q25=120.65 q50=205.87 mean=230.72 q75=334.00 max=472.00
 HP (Gross horsepower)
 n: 32 missing: 0 min=52 q25=96.00 q50=123 mean=146.69 q75=186.25 max=335
 DRAT (Rear axle ratio)
 n: 32 missing: 0 min=2.76 q25=3.08 q50=3.70 mean=3.60 q75=3.95 max=4.93
 WT (Weight (1000 lbs))
 n: 32 missing: 0 min=1.51 q25=2.54 q50=3.33 mean=3.22 q75=3.68 max=5.42
 QSEC (1/4 mile time)
 n: 32 missing: 0 min=14.50 q25=16.88 q50=17.71 mean=17.85 q75=18.90 max=22.90
 VS (Engine (0=v-shaped, 1=straight))
 ones: 14 (44%)
 AM (Transmission (0=automatic, 1=manual))
 ones: 13 (41%)
 GEAR (Number of forward gears)
 15 (47%) x 3, 12 (38%) x 4, 5 (16%) x 5,
 CARB (Number of carburetors)
 10 (31%) x 4, 10 (31%) x 2, 7 (22%) x 1, 3 (9%) x 3, 1 (3%) x 6, 1 (3%) x 8, )

Note that the model column, essentially the row name, was removed from the output. The summary function, designed for human-readable output, removes variables with all unique values, and those with monotonically increasing numbers (usually row numbers).

To build your own summary function, use the get-summaries function to get a list of summary structures for the variables in the data frame, and then print them as you wish.
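A custom summary might be sketched like this (a sketch only; it assumes get-summaries returns a list of summary structures, one per variable):

```lisp
(defun my-summary (df)
  "Print each variable's summary structure on its own line."
  (dolist (s (get-summaries df))
    (print s)))
```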

columns

You can also describe or summarize individual columns:

LS-USER> (describe 'mtcars:mpg)
MTCARS:MPG
  [symbol]

MPG names a symbol macro:
  Expansion: (AREF (COLUMNS MTCARS) 1)

Symbol-plist:
  :TYPE -> DOUBLE-FLOAT
  :UNIT -> M/G
  :LABEL -> "Miles/(US) gallon"
LS-USER> (summarize-column 'mtcars:mpg)
MPG (Miles/(US) gallon)
 n: 32 missing: 0 min=10.40 q25=15.40 q50=19.20 mean=20.09 q75=22.80 max=33.90

Missing values

Data sets often contain missing values and we need to both understand where and how many are missing, and how to transform or remove them for downstream operations. In Lisp-Stat, missing values are represented by the keyword symbol :na. You can control this encoding during delimited-text import by passing an a-list containing the mapping as the map-alist keyword parameter, whose default value is:

(map-alist '(("" . :na) ("NA" . :na)))

The default maps blank cells ("") and ones containing “NA” (not available) to the keyword :na, which stands for missing. Some systems encode missing values as numeric, e.g. 99; in this case you can pass in a map-alist that includes this mapping:

(map-alist '(("" . :na) ("NA" . :na) (99 . :na)))

We will use the R air-quality dataset to illustrate working with missing values. Let’s load it now:

(defdf aq (read-csv rdata:airquality))

Examine

To see missing values we use the predicate missingp. This works on sequences, arrays and data-frames. It returns a logical sequence, array or data-frame indicating which values are missing. T indicates a missing value, NIL means the value is present. Here’s an example of using missingp on a vector:

(missingp #(1 2 3 4 5 6 :na 8 9 10)) ;#(NIL NIL NIL NIL NIL NIL T NIL NIL NIL)

and on a data-frame:

(print-data (missingp aq))
;;    X3  OZONE SOLAR-R WIND TEMP MONTH DAY
;; 0  NIL NIL   NIL     NIL  NIL  NIL   NIL
;; 1  NIL NIL   NIL     NIL  NIL  NIL   NIL
;; 2  NIL NIL   NIL     NIL  NIL  NIL   NIL
;; 3  NIL NIL   NIL     NIL  NIL  NIL   NIL
;; 4  NIL T     T       NIL  NIL  NIL   NIL
;; 5  NIL NIL   T       NIL  NIL  NIL   NIL
;; 6  NIL NIL   NIL     NIL  NIL  NIL   NIL
;; 7  NIL NIL   NIL     NIL  NIL  NIL   NIL
;; 8  NIL NIL   NIL     NIL  NIL  NIL   NIL
;; 9  NIL T     NIL     NIL  NIL  NIL   NIL
;; 10 NIL NIL   T       NIL  NIL  NIL   NIL
;; 11 NIL NIL   NIL     NIL  NIL  NIL   NIL
;; 12 NIL NIL   NIL     NIL  NIL  NIL   NIL
;; 13 NIL NIL   NIL     NIL  NIL  NIL   NIL
;; 14 NIL NIL   NIL     NIL  NIL  NIL   NIL
;; 15 NIL NIL   NIL     NIL  NIL  NIL   NIL
;; 16 NIL NIL   NIL     NIL  NIL  NIL   NIL
;; 17 NIL NIL   NIL     NIL  NIL  NIL   NIL
;; 18 NIL NIL   NIL     NIL  NIL  NIL   NIL
;; 19 NIL NIL   NIL     NIL  NIL  NIL   NIL
;; 20 NIL NIL   NIL     NIL  NIL  NIL   NIL
;; 21 NIL NIL   NIL     NIL  NIL  NIL   NIL
;; 22 NIL NIL   NIL     NIL  NIL  NIL   NIL
;; 23 NIL NIL   NIL     NIL  NIL  NIL   NIL ..

We can see that the ozone variable contains some missing values. To see which rows of ozone are missing, we can use the which function:

(which aq:ozone :predicate #'missingp) ;#(4 9 24 25 26 31 32 33 34 35 36 38 41 42 44 45 51 52 53 54 55 56 57 58 59 60 64 71 74 82 83 101 102 106 114 118 149)

and to get a count, use the length function on this vector:

(length *) ; => 37

It’s often convenient to use the summary function to get an overview of missing values. We can do this because the missingp function is a transformation of a data-frame that yields another data-frame of boolean values:

LS-USER> (summary (missingp aq))
X3: 153 (100%) x NIL,
OZONE: 116 (76%) x NIL, 37 (24%) x T,
SOLAR-R: 146 (95%) x NIL, 7 (5%) x T,
WIND: 153 (100%) x NIL,
TEMP: 153 (100%) x NIL,
MONTH: 153 (100%) x NIL,
DAY: 153 (100%) x NIL,

we can see that ozone is missing 37 values, 24% of the total, and solar-r is missing 7 values.

Exclude

To exclude missing values from a single column, use the Common Lisp remove function:

(remove :na aq:ozone) ;#(41 36 12 18 28 23 19 8 7 16 11 14 18 14 34 6 30 11 1 11 4 32 ...

To ensure that our data-frame includes only complete observations, we exclude any row with a missing value. To do this use the drop-missing function:

(head (drop-missing aq))
;;   X3 OZONE SOLAR-R WIND TEMP MONTH DAY
;; 0  1    41     190  7.4   67     5   1
;; 1  2    36     118  8.0   72     5   2
;; 2  3    12     149 12.6   74     5   3
;; 3  4    18     313 11.5   62     5   4
;; 4  7    23     299  8.6   65     5   7
;; 5  8    19      99 13.8   59     5   8

Replace

To replace missing values we can use the transformation functions. For example we can recode the missing values in ozone by the mean. Let’s look at the first six rows of the air quality data-frame:

(head aq)
;;   X3 OZONE SOLAR-R WIND TEMP MONTH DAY
;; 0  1    41     190  7.4   67     5   1
;; 1  2    36     118  8.0   72     5   2
;; 2  3    12     149 12.6   74     5   3
;; 3  4    18     313 11.5   62     5   4
;; 4  5    NA      NA 14.3   56     5   5
;; 5  6    28      NA 14.9   66     5   6

Now replace the missing ozone values with the mean using the Common Lisp function nsubstitute:

(nsubstitute (mean (remove :na aq:ozone)) :na aq:ozone)

and look at head again:

(head aq)
;;   X3   OZONE SOLAR-R WIND TEMP MONTH DAY
;; 0  1 41.0000     190  7.4   67     5   1
;; 1  2 36.0000     118  8.0   72     5   2
;; 2  3 12.0000     149 12.6   74     5   3
;; 3  4 18.0000     313 11.5   62     5   4
;; 4  5 42.1293      NA 14.3   56     5   5
;; 5  6 28.0000      NA 14.9   66     5   6

You could have used the non-destructive substitute if you wanted a new sequence of values and to leave the original aq untouched.

Normally we’d round the mean to keep the column type consistent, but we did not do so here so that you can see which values were replaced.

Sampling

You can take a random sample of the rows of a data-frame with the select:sample function:

LS-USER> mtcars
#<DATA-FRAME (32 observations of 12 variables) Motor Trend Car Road Tests>
LS-USER> (sample mtcars 3 :skip-unselected t)
#<DATA-FRAME (3 observations of 12 variables)>
LS-USER> (print-data *)
;;   MODEL              MPG CYL  DISP  HP DRAT   WT  QSEC VS AM GEAR CARB
;; 0 Hornet Sportabout 18.7   8 360.0 175 3.15 3.44 17.02  0  0    3    2
;; 1 Duster 360        14.3   8 360.0 245 3.21 3.57 15.84  0  0    3    4
;; 2 Merc 230          22.8   4 140.8  95 3.92 3.15 22.90  1  0    4    2

You can also take random samples from CL sequences and arrays, with or without replacement and in various proportions. For further information see sampling in the select system manual.

sample uses Vitter’s Algorithm D to efficiently select the rows. Sometimes you may want to use the algorithm at a lower level. If you don’t want the sample itself, say you only want the indices, you can directly use map-random-below, which simply calls a provided function on each selected index.

This is an enhancement and port to standard Common Lisp of ruricolist’s random-sample. It also removes the dependency on Trivia, which has a restrictive license (LLGPL).

Dates & Times

Lisp-Stat uses local-time to represent dates. This works well, but the system is a bit strict on input formats, and real-world data can be quite messy at times. For these cases chronicity and cl-date-time-parser can be helpful. Chronicity returns local-time timestamp objects, and is particularly easy to work with.

For example, if you have a variable with dates encoded like: ‘Jan 7 1995’, you can recode the column like we did for the vega movies data set:

(replace-column! imdb 'release-date
                 #'(lambda (x)
                     (local-time:universal-to-timestamp
                      (date-time-parser:parse-date-time x))))

5.3 - Distributions

Working with statistical distributions

Overview

The Distributions system provides a collection of probability distributions and related functions such as:

  • Sampling from distributions
  • Moments (e.g. mean, variance, skewness, and kurtosis), entropy, and other properties
  • Probability density/mass functions (pdf) and their logarithm (logpdf)
  • Moment-generating functions and characteristic functions
  • Maximum likelihood estimation
  • Distribution composition and derived distributions

Getting Started

Load the distributions system with (asdf:load-system :distributions) and the plot system with (asdf:load-system :plot/vega). Now generate a sequence of 1000 samples drawn from the standard normal distribution:

(defparameter *rn-samples*
  (nu:generate-sequence '(vector double-float) 1000
                        #'distributions:draw-standard-normal))

and plot a histogram of the counts:

(plot:plot
 (vega:defplot normal
   `(:mark :bar
     :data (:values ,(plist-df `(:x ,*rn-samples*)))
     :encoding (:x (:bin (:step 0.5) :field x)
                :y (:aggregate :count)))))

It looks like there’s an outlier at 5, but basically you can see it’s centered around 0.

To create a parameterised distribution, pass the parameters when you create the distribution object. In the following example we create a distribution with a mean of 2 and variance of 1 and plot it:

(defparameter rn2 (distributions:r-normal 2 1))

(let ((seq (nu:generate-sequence '(vector double-float) 10000
                                 (lambda () (distributions:draw rn2)))))
  (plot:plot
   (vega:defplot normal-2-1
     `(:mark :bar
       :data (:values ,(plist-df `(:x ,seq)))
       :encoding (:x (:bin (:step 0.5) :field x)
                  :y (:aggregate :count))))))

Now that we have the distribution as an object, we can obtain pdf, cdf, mean and other parameters for it:

LS-USER> (mean rn2)
2.0d0
LS-USER> (pdf rn2 1.75)
0.38666811680284924d0
LS-USER> (cdf rn2 1.75)
0.4012936743170763d0
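The quantile function inverts the cdf; for example, the median of our mean-2 distribution (assuming the distributions system exports quantile, as with mean, pdf and cdf above):

```lisp
(distributions:quantile rn2 0.5) ; => 2.0d0
```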

Gamma

In probability theory and statistics, the gamma distribution is a two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-square distribution are special cases of the gamma distribution. There are two different parameterisations in common use:

  • With a shape parameter k and a scale parameter θ.
  • With a shape parameter α = k and an inverse scale parameter β = 1/θ, called a rate parameter.

In each of these forms, both parameters are positive real numbers.

The parameterisation with k and θ appears to be more common in econometrics and certain other applied fields, where for example the gamma distribution is frequently used to model waiting times.

The parameterisation with α and β is more common in Bayesian statistics, where the gamma distribution is used as a conjugate prior distribution for various types of inverse scale (rate) parameters, such as the λ of an exponential distribution or a Poisson distribution.

When the shape parameter has an integer value, the distribution is the Erlang distribution. Since this can be produced by ensuring that the shape parameter has an integer value > 0, the Erlang distribution is not separately implemented.

PDF

The probability density function parameterized by shape-scale is:

$f(x;k,\theta )={\frac {x^{k-1}e^{-x/\theta }}{\theta ^{k}\Gamma (k)}}\quad {\text{ for }}x>0{\text{ and }}k,\theta >0$,

and by shape-rate:

$f(x;\alpha ,\beta )={\frac {x^{\alpha -1}e^{-\beta x}\beta ^{\alpha }}{\Gamma (\alpha )}}\quad {\text{ for }}x>0\quad \alpha ,\beta >0$

CDF

The cumulative distribution function characterized by shape and scale (k and θ) is:

$F(x;k,\theta )=\int _{0}^{x}f(u;k,\theta )\,du={\frac {\gamma \left(k,{\frac {x}{\theta }}\right)}{\Gamma (k)}}$

where $\gamma \left(k,{\frac {x}{\theta }}\right)$ is the lower incomplete gamma function.

Characterized by α and β (shape and rate):

$F(x;\alpha ,\beta )=\int _{0}^{x}f(u;\alpha ,\beta )\,du={\frac {\gamma (\alpha ,\beta x)}{\Gamma (\alpha )}}$

where $\gamma (\alpha ,\beta x)$ is the lower incomplete gamma function.

Usage

Python and Boost use shape & scale for parameterisation. Lisp-Stat and R use shape and rate for the default parameterisation. Both forms of parameterisation are common. However, since Lisp-Stat’s implementation is based on Boost (because of the restrictive license of R), we perform the conversion $\theta=\frac{1}{\beta}$ internally.

Implementation notes

In the following table k is the shape parameter of the distribution, θ is its scale parameter, x is the random variate, p is the probability and q is (- 1 p). The implementation functions are in the special-functions system.

Function            | Implementation
------------------- | --------------
PDF                 | (/ (gamma-p-derivative k (/ x θ)) θ)
CDF                 | (incomplete-gamma k (/ x θ))
CDF complement      | (upper-incomplete-gamma k (/ x θ))
quantile            | (* θ (inverse-incomplete-gamma k p))
quantile complement | (* θ (upper-inverse-incomplete-gamma k p))
mean                | (* k θ)
variance            | (* k θ θ)
mode                | (* (1- k) θ), k > 1
skewness            | (/ 2 (sqrt k))
kurtosis            | (+ 3 (/ 6 k))
kurtosis excess     | (/ 6 k)

Example

On average, a train arrives at a station once every 15 minutes (θ = 15/60 hours). What is the probability that there are 10 train arrivals (occurrences of the event) within three hours?

In this example we have:

alpha = 10
theta = 15/60
x = 3

To compute the exact answer:

(distributions:cdf-gamma 3d0 10d0 :scale 15/60) ;=> 0.7576078383294877d0

As an alternative, we can run a simulation, where we draw from the parameterised distribution and then calculate the percentage of values that fall below our threshold, x = 3:

(let* ((rv (distributions:r-gamma 10 60/15))
       (seq (aops:generate (distributions:generator rv) 10000)))
  (statistics-1:mean (e2<= seq 3))) ; e2<= is the vectorised <= operator
;=> 0.753199999999998d0

Finally, if we want to plot the probability:

(let* ((x (aops:linspace 0.01d0 10 1000))
       (prob (map 'simple-double-float-vector
                  #'(lambda (x) (distributions:cdf-gamma x 10d0 :scale 15/60))
                  x))
       (interval (map 'vector
                      #'(lambda (x) (if (<= x 3) "0 to 3" "other"))
                      x)))
  (plot:plot
   (vega:defplot gamma-example
     `(:mark :area
       :data (:values ,(plist-df `(:x ,x :prob ,prob :interval ,interval)))
       :encoding (:x (:field :x :type :quantitative :title "Interval (x)")
                  :y (:field :prob :type :quantitative :title "Cum Probability")
                  :color (:field :interval))))))

References

Boost implementation of Gamma
Gamma distribution (Wikipedia)

5.4 - Numeric Utilities

Various utilities for numerical computation

Arithmetic

The arithmetic package provides a collection of mathematical and arithmetic functions for Common Lisp. These utilities are designed to simplify common numerical computations while maintaining efficiency through inline declarations where appropriate.

Basic Operations

same-sign-p

(same-sign-p &rest numbers)

Tests whether all arguments have the same sign (i.e., all are positive, all are negative, or all are zero).

(same-sign-p 1 2 3)    ; ⇒ T
(same-sign-p 1 -2 3)   ; ⇒ NIL
(same-sign-p -1 -2 -3) ; ⇒ T

square

(square x)

Returns the square of a number. This function is inlined for performance.

(square 2)   ; ⇒ 4
(square -3)  ; ⇒ 9
(square 1.5) ; ⇒ 2.25

cube

(cube x)

Returns the cube of a number. This function is inlined for performance.

(cube 2)   ; ⇒ 8
(cube -3)  ; ⇒ -27
(cube 1.5) ; ⇒ 3.375

absolute-square

(absolute-square x)

Returns a number multiplied by its complex conjugate. For real numbers, this is equivalent to squaring. For complex numbers, it returns the squared magnitude.

(absolute-square 2.0)     ; ⇒ 4.0
(absolute-square #C(3 4)) ; ⇒ 25 (since |3+4i|² = 3² + 4² = 25)

abs-diff

(abs-diff x y)

Returns the absolute difference between two numbers. This function is inlined for performance.

(abs-diff 3 5)   ; ⇒ 2
(abs-diff -3 -5) ; ⇒ 2
(abs-diff 5 3)   ; ⇒ 2

Logarithmic Functions

log10

(log10 x)

Computes the decimal (base 10) logarithm. It always returns a double-float to avoid the Common Lisp behavior where log might return a single-float for integer arguments.

(log10 100)  ; ⇒ 2.0d0
(log10 1000) ; ⇒ 3.0d0

log2

(log2 x)

Computes the binary (base 2) logarithm. Like log10, it ensures double-float precision.

(log2 256) ; ⇒ 8.0d0
(log2 64)  ; ⇒ 6.0d0

Utility Functions

1c

(1c x)

Returns 1 minus the argument. The mnemonic is “1 complement” (since 1- is already a CL function).

(1c 4/5) ; ⇒ 1/5
(1c 0.3) ; ⇒ 0.7
(1c 2)   ; ⇒ -1

divides?

(divides? number divisor)

Tests if divisor divides number without remainder. If so, returns the quotient; otherwise returns NIL.

(divides? 8 2)  ; ⇒ 4
(divides? 8 3)  ; ⇒ NIL
(divides? 15 5) ; ⇒ 3

as-integer

(as-integer x)

Converts a number to an integer if it represents an integer value, otherwise signals an error. Works with integers, rationals, floats, and complex numbers (if imaginary part is zero).

(as-integer 2.0)     ; ⇒ 2
(as-integer 5/1)     ; ⇒ 5
(as-integer #C(3 0)) ; ⇒ 3
(as-integer 2.5)     ; signals error: "2.5 has non-zero fractional part."

Sequence Generation

numseq

(numseq from &optional to by)

Generates a sequence of numbers. With one argument, generates from 0 to from-1. With two arguments, generates from first to second (exclusive). With three arguments, uses by as the step size.

(numseq 5)      ; ⇒ #(0 1 2 3 4)
(numseq 2 7)    ; ⇒ #(2 3 4 5 6)
(numseq 0 10 2) ; ⇒ #(0 2 4 6 8)

The sign of by is automatically adjusted to match the direction from from to to. The type parameter can be:

  • 'list to return a list instead of an array
  • A numeric type specifier for the array element type
  • nil for automatic type detection

ivec

(ivec from &optional to by)

Generates a vector of fixnums (integers).

(ivec 4)     ; ⇒ #(0 1 2 3)
(ivec 1 4)   ; ⇒ #(1 2 3)
(ivec 1 4 2) ; ⇒ #(1 3)
(ivec -3)    ; ⇒ #(0 -1 -2)

When called with one argument, generates integers from 0 up to (but not including) the argument. With two arguments, generates from start to end. The by parameter controls the increment.

Aggregate Operations

sum

(sum sequence &key key)

Returns the sum of elements in a sequence or array. An optional key function can be applied to each element before summing.

(sum #(2 3 4))               ; ⇒ 9
(sum '(1 2 3 4))             ; ⇒ 10
(sum #())                    ; ⇒ 0
(sum #(1 2 3) :key #'square) ; ⇒ 14 (1² + 2² + 3²)

product

(product sequence)

Returns the product of elements in a sequence or array.

(product #(2 3 4)) ; ⇒ 24
(product '(1 2 3)) ; ⇒ 6
(product #())      ; ⇒ 1

cumulative-sum

(cumulative-sum sequence)

Returns a sequence where each element is the sum of all preceding elements plus itself. Also returns the total sum as a second value.

(cumulative-sum #(2 3 4))   ; ⇒ #(2 5 9), 9
(cumulative-sum '(1 2 3 4)) ; ⇒ (1 3 6 10), 10
(cumulative-sum #())        ; ⇒ #(), 0

cumulative-product

(cumulative-product sequence)

Returns a sequence where each element is the product of all preceding elements times itself. Also returns the total product as a second value.

(cumulative-product #(2 3 4)) ; ⇒ #(2 6 24), 24
(cumulative-product '(1 2 3)) ; ⇒ (1 2 6), 6
(cumulative-product #())      ; ⇒ #(), 1

Probability Operations

normalize-probabilities

(normalize-probabilities sequence &key element-type result)

Verifies that each element is non-negative and returns a vector scaled so elements sum to 1.

(normalize-probabilities #(1 2 7)) ; ⇒ #(1/10 1/5 7/10)
(normalize-probabilities #(1 2 7) :element-type 'double-float)
; ⇒ #(0.1d0 0.2d0 0.7d0)

Parameters:

  • :element-type - type of result elements (default t)
  • :result - if provided, results are placed here; if NIL, modifies input vector in-place

(let ((v (vector 1 2 7)))   ; a fresh vector, since literals must not be mutated
  (normalize-probabilities v :result nil)
  v)
; ⇒ #(1/10 1/5 7/10) (vector modified in place)

Rounding with Offset

These functions find values of the form A = I × DIVISOR + OFFSET that satisfy various rounding criteria.

floor*

(floor* number divisor &optional offset)

Finds the highest A = I × DIVISOR + OFFSET that is ≤ number. Returns (values A remainder).

(floor* 27 5)   ; ⇒ 25, 2 (25 = 5×5 + 0, remainder = 2)
(floor* 27 5 1) ; ⇒ 26, 1 (26 = 5×5 + 1, remainder = 1)

ceiling*

(ceiling* number divisor &optional offset)

Finds the lowest A = I × DIVISOR + OFFSET that is ≥ number. Returns (values A remainder).

(ceiling* 27 5)   ; ⇒ 30, -3 (30 = 6×5 + 0, remainder = -3)
(ceiling* 27 5 1) ; ⇒ 31, -4 (31 = 6×5 + 1, remainder = -4)

round*

(round* number divisor &optional offset)

Finds A = I × DIVISOR + OFFSET that minimizes |A - number|. Returns (values A remainder).

(round* 27 5)    ; ⇒ 25, 2 (25 is closer to 27 than 30)
(round* 27 5 -1) ; ⇒ 29, -2 (29 = 6×5 + (-1))

truncate*

(truncate* number divisor &optional offset)

Finds the A = I × DIVISOR + OFFSET with the largest |A| such that |A| ≤ |number| and A has the same sign as number. Returns (values A remainder).

(truncate* -27 5)   ; ⇒ -25, -2
(truncate* -27 5 1) ; ⇒ -24, -3

Sequence Min/Max

seq-max

(seq-max sequence)

Returns the maximum value in a sequence (list or vector).

(seq-max #(0 1 2 3 4 5)) ; ⇒ 5
(seq-max '(0 1 2 3 4 5)) ; ⇒ 5

seq-min

(seq-min sequence)

Returns the minimum value in a sequence (list or vector).

(seq-min #(0 1 2 3 4 5)) ; ⇒ 0
(seq-min '(-2 5 3 1))    ; ⇒ -2

Chebyshev

The Chebyshev package provides efficient polynomial approximation for functions on finite and semi-infinite intervals. It includes computation of Chebyshev polynomial roots, regression coefficients, and evaluation. This module is particularly useful for approximating expensive functions with smooth, polynomial-like behavior.

chebyshev-root

(chebyshev-root m i)

Returns the i-th root of the m-th Chebyshev polynomial as a double-float. The roots are the zeros of the Chebyshev polynomial Tₘ(x).

(chebyshev-root 4 0) ; ⇒ -0.9238795325112867d0
(chebyshev-root 4 1) ; ⇒ -0.38268343236508984d0
(chebyshev-root 4 2) ; ⇒ 0.38268343236508984d0
(chebyshev-root 4 3) ; ⇒ 0.9238795325112867d0

The i parameter must satisfy 0 ≤ i < m. These roots are commonly used as interpolation points for polynomial approximation.

chebyshev-roots

(chebyshev-roots m)

Returns all roots of the m-th Chebyshev polynomial as a vector of double-floats.

(chebyshev-roots 3) ; ⇒ #(-0.8660254037844387d0 0.0d0 0.8660254037844387d0)
(chebyshev-roots 5)
; ⇒ #(-0.9510565162951535d0 -0.5877852522924731d0 0.0d0
;     0.5877852522924731d0 0.9510565162951535d0)

chebyshev-regression

(chebyshev-regression f n-polynomials &optional n-points)

Computes Chebyshev polynomial regression coefficients for function f using the specified number of polynomials. The optional n-points parameter (defaults to n-polynomials) specifies the number of interpolation points.

;; Approximate sin(x) on [-1, 1] with 5 Chebyshev polynomials
(chebyshev-regression #'sin 5)
; ⇒ #(8.808207830203602d-17 0.8801099265688267d0 -7.851872826654999d-17
;     -0.03912505871870944d0 2.477076195177506d-17)

;; Using more interpolation points than polynomials
(chebyshev-regression (lambda (x) (exp x)) 4 8)
; ⇒ #(1.2660658777520082d0 1.1303182079849703d0 0.2714953395340767d0
;     0.04433684984866382d0)

The function f is evaluated at the Chebyshev nodes, and coefficients are computed using the discrete cosine transform. Note that n-points must be ≥ n-polynomials.

evaluate-chebyshev

(evaluate-chebyshev coefficients x)

Evaluates a Chebyshev polynomial series at point x, given the coefficient vector from chebyshev-regression.

;; Evaluate approximation of exp(x) at x = 0.5
(let ((coeffs (chebyshev-regression #'exp 5)))
  (evaluate-chebyshev coeffs 0.5))
; ⇒ 1.6487208279284558d0 (compare to (exp 0.5) = 1.6487212707001282d0)

;; Evaluate at multiple points
(let ((coeffs (chebyshev-regression #'sin 4)))
  (mapcar (lambda (x) (evaluate-chebyshev coeffs x))
          '(-0.5 0.0 0.5)))
; ⇒ (-0.47942553860420295d0 0.0d0 0.47942553860420295d0)

Uses the efficient Clenshaw algorithm for polynomial evaluation.
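To show the idea behind the evaluation, here is a minimal, self-contained sketch of the Clenshaw recurrence. The function name is illustrative, and the library's coefficient convention (e.g. any scaling of the order-0 term) may differ; this is not Lisp-Stat's implementation.

```lisp
(defun clenshaw-eval (coefficients x)
  "Evaluate the Chebyshev series sum_k c_k T_k(x) by Clenshaw's recurrence:
b_k = c_k + 2x b_{k+1} - b_{k+2}, with result c_0 + x b_1 - b_2."
  (let ((b1 0d0)
        (b2 0d0))
    ;; Iterate coefficients from the highest order down to order 1.
    (loop for k from (1- (length coefficients)) downto 1
          do (let ((b0 (+ (aref coefficients k) (* 2d0 x b1) (- b2))))
               (setf b2 b1
                     b1 b0)))
    ;; The order-0 term enters with a plain (not doubled) x factor.
    (+ (aref coefficients 0) (* x b1) (- b2))))
```

For example, with coefficients #(1d0 2d0 3d0), i.e. 1·T₀ + 2·T₁ + 3·T₂, evaluation at x = 0.5 gives 1 + 2·0.5 + 3·(2·0.25 − 1) = 0.5.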

chebyshev-approximate

(chebyshev-approximate f interval n-polynomials &key n-points)

Returns a closure that approximates function f on the given interval using Chebyshev polynomial interpolation. The interval can be finite or semi-infinite.

;; Example from tests: approximate x/(4+x) on [2, ∞)
(let ((f-approx (chebyshev-approximate (lambda (x) (/ x (+ 4 x)))
                                       (interval 2 :plusinf) 15)))
  (funcall f-approx 10.0))
; ⇒ 0.7142857142857143d0 (compare to exact: 10/14 = 0.714...)

;; Exponential decay on [0, ∞) with more sampling points
(let ((exp-approx (chebyshev-approximate (lambda (x) (exp (- x)))
                                         (interval 0 :plusinf) 15
                                         :n-points 30)))
  (funcall exp-approx 2.0))
; ⇒ 0.1353352832366127d0 (compare to (exp -2) = 0.1353...)

;; Finite interval example: 1/(1+x²) on [-3, 2]
(let ((rational-approx (chebyshev-approximate (lambda (x) (/ (1+ (expt x 2))))
                                              (interval -3d0 2d0) 20)))
  (funcall rational-approx 0.0))
; ⇒ 1.0d0 (exact at x=0)

For semi-infinite intervals, the function automatically applies an appropriate transformation to map the interval to [-1, 1]. The :n-points keyword (defaults to n-polynomials) controls the number of interpolation points.

Usage Examples

Example: Accuracy Testing (from test suite)

;; Helper function from tests to measure approximation error
(defun approximation-error (f f-approx interval &optional (n-grid 1000))
  "Approximation error, using maximum on grid."
  (loop for index below n-grid
        maximizing
        (abs (- (funcall f (+ (interval-left interval)
                              (* index (/ (interval-width interval)
                                          (1- n-grid)))))
                (funcall f-approx (+ (interval-left interval)
                                     (* index (/ (interval-width interval)
                                                 (1- n-grid)))))))))

;; Test semi-infinite interval approximation
(let ((f (lambda (x) (/ x (+ 4 x))))
      (f-approx (chebyshev-approximate (lambda (x) (/ x (+ 4 x)))
                                       (interval 2 :plusinf) 15)))
  ;; Error should be <= 1e-5 on interval [2, 102]
  (approximation-error f f-approx (interval 2 102)))
; ⇒ small value <= 1e-5

Example: Exponential Decay with Higher Sampling

;; From test: exponential decay on [0, ∞) with 30 sampling points
(let ((decay-approx (chebyshev-approximate (lambda (x) (exp (- x)))
                                           (interval 0 :plusinf) 15
                                           :n-points 30)))
  ;; Test at a few points
  (list (funcall decay-approx 0.0)    ; ⇒ ≈ 1.0
        (funcall decay-approx 1.0)    ; ⇒ ≈ 0.368
        (funcall decay-approx 3.0)))  ; ⇒ ≈ 0.050
; Error should be <= 1e-4 on [0, 10]

Example: Finite Interval Rational Function

;; From test: 1/(1+x²) on finite interval [-3, 2]
(let ((rational-fn (lambda (x) (/ (1+ (expt x 2)))))
      (rational-approx (chebyshev-approximate (lambda (x) (/ (1+ (expt x 2))))
                                              (interval -3d0 2d0) 20)))
  ;; Test accuracy at center and edges
  (list (funcall rational-approx -1.5)   ; ⇒ ≈ 0.308
        (funcall rational-approx 0.0)    ; ⇒ 1.0
        (funcall rational-approx 1.0)))  ; ⇒ 0.5
; Error should be <= 1e-3 on [-1.5, 1.0]

Notes on Usage

  1. Interval Selection: Chebyshev approximation works best when the function is smooth over the interval. Avoid intervals containing singularities or discontinuities.

  2. Number of Polynomials: More polynomials generally give better approximation, but beware of numerical instability with very high orders (typically > 50).

  3. Semi-infinite Intervals: For intervals like [a, ∞), the function applies a transformation x ↦ (x-a)/(1+x-a) to map to a finite interval.

  4. Performance: The returned closure is very fast to evaluate, making this ideal for replacing expensive function calls in performance-critical code.

Extended Real

The Extended Real package extends the real number line with positive and negative infinity (:plusinf, :minusinf). It provides comparison operators that work seamlessly with both real numbers and infinities, type definitions for extended reals, and template macros for pattern matching.

Type Definitions

extended-real

(extended-real &optional (base 'real))

Type definition for extended real numbers, which includes real numbers plus positive and negative infinity.

;; Type checking
(typep 1 'extended-real)         ; ⇒ T
(typep 1.5 'extended-real)       ; ⇒ T
(typep :plusinf 'extended-real)  ; ⇒ T
(typep :minusinf 'extended-real) ; ⇒ T
(typep "string" 'extended-real)  ; ⇒ NIL
(typep #C(1 2) 'extended-real)   ; ⇒ NIL

infinite?

(infinite? object)

Tests if an object represents positive or negative infinity.

;; Infinity testing
(infinite? :plusinf)  ; ⇒ T
(infinite? :minusinf) ; ⇒ T
(infinite? 1)         ; ⇒ NIL
(infinite? 1.0)       ; ⇒ NIL
(infinite? "string")  ; ⇒ NIL

Comparison Operators

All comparison operators accept one or more arguments and work with both real numbers and infinities.

xreal:=

(xreal:= number &rest more-numbers)

Tests equality across extended real numbers.

;; Equality cases that return T
(xreal:= 1 1)                           ; ⇒ T
(xreal:= :plusinf :plusinf)             ; ⇒ T
(xreal:= :minusinf :minusinf)           ; ⇒ T
(xreal:= 2 2 2)                         ; ⇒ T
(xreal:= :plusinf :plusinf :plusinf)    ; ⇒ T
(xreal:= :minusinf :minusinf :minusinf) ; ⇒ T

;; Equality cases that return NIL
(xreal:= 1 2)                 ; ⇒ NIL
(xreal:= 1 :plusinf)          ; ⇒ NIL
(xreal:= :plusinf 1)          ; ⇒ NIL
(xreal:= 1 :minusinf)         ; ⇒ NIL
(xreal:= :minusinf 1)         ; ⇒ NIL
(xreal:= 1 2 2)               ; ⇒ NIL
(xreal:= 2 2 1)               ; ⇒ NIL
(xreal:= :plusinf :plusinf 9) ; ⇒ NIL
(xreal:= :plusinf :minusinf)  ; ⇒ NIL

xreal:<

(xreal:< number &rest more-numbers)

Tests strict less-than ordering across extended real numbers.

;; Less-than cases that return T
(xreal:< 1 2)                     ; ⇒ T
(xreal:< 1 :plusinf)              ; ⇒ T
(xreal:< :minusinf :plusinf)      ; ⇒ T
(xreal:< :minusinf 1)             ; ⇒ T
(xreal:< 1 2 3)                   ; ⇒ T
(xreal:< 1 2 :plusinf)            ; ⇒ T
(xreal:< :minusinf 1 4 :plusinf)  ; ⇒ T

;; Less-than cases that return NIL
(xreal:< 1 1)                 ; ⇒ NIL
(xreal:< 2 1)                 ; ⇒ NIL
(xreal:< :plusinf :plusinf)   ; ⇒ NIL
(xreal:< :plusinf 1)          ; ⇒ NIL
(xreal:< :minusinf :minusinf) ; ⇒ NIL
(xreal:< :plusinf :minusinf)  ; ⇒ NIL
(xreal:< 1 :minusinf)         ; ⇒ NIL
(xreal:< 1 2 2)               ; ⇒ NIL
(xreal:< 1 3 2)               ; ⇒ NIL
(xreal:< 1 :plusinf 2)        ; ⇒ NIL
(xreal:< 1 :plusinf :plusinf) ; ⇒ NIL

xreal:>

(xreal:> number &rest more-numbers)

Tests strict greater-than ordering across extended real numbers.

;; Greater-than cases that return T
(xreal:> 2 1)                     ; ⇒ T
(xreal:> :plusinf 1)              ; ⇒ T
(xreal:> :plusinf :minusinf)      ; ⇒ T
(xreal:> 1 :minusinf)             ; ⇒ T
(xreal:> 3 2 1)                   ; ⇒ T
(xreal:> :plusinf 2 1)            ; ⇒ T
(xreal:> :plusinf 4 1 :minusinf)  ; ⇒ T

;; Greater-than cases that return NIL
(xreal:> 1 1)                 ; ⇒ NIL
(xreal:> 1 2)                 ; ⇒ NIL
(xreal:> :plusinf :plusinf)   ; ⇒ NIL
(xreal:> 1 :plusinf)          ; ⇒ NIL
(xreal:> :minusinf :minusinf) ; ⇒ NIL
(xreal:> :minusinf :plusinf)  ; ⇒ NIL
(xreal:> :minusinf 1)         ; ⇒ NIL
(xreal:> 2 2 1)               ; ⇒ NIL
(xreal:> 2 3 1)               ; ⇒ NIL
(xreal:> 2 :plusinf 1)        ; ⇒ NIL
(xreal:> :plusinf :plusinf 1) ; ⇒ NIL

xreal:<=

(xreal:<= number &rest more-numbers)

Tests less-than-or-equal ordering across extended real numbers.

;; Less-than-or-equal cases that return T
(xreal:<= 1 1)                     ; ⇒ T
(xreal:<= 1 2)                     ; ⇒ T
(xreal:<= 1 :plusinf)              ; ⇒ T
(xreal:<= :plusinf :plusinf)       ; ⇒ T
(xreal:<= :minusinf :plusinf)      ; ⇒ T
(xreal:<= :minusinf :minusinf)     ; ⇒ T
(xreal:<= :minusinf 1)             ; ⇒ T
(xreal:<= 1 2 2)                   ; ⇒ T
(xreal:<= 1 2 3)                   ; ⇒ T
(xreal:<= 1 2 :plusinf)            ; ⇒ T
(xreal:<= 1 :plusinf :plusinf)     ; ⇒ T
(xreal:<= :minusinf 1 4 :plusinf)  ; ⇒ T

;; Less-than-or-equal cases that return NIL
(xreal:<= 2 1)                ; ⇒ NIL
(xreal:<= :plusinf 1)         ; ⇒ NIL
(xreal:<= :plusinf :minusinf) ; ⇒ NIL
(xreal:<= 1 :minusinf)        ; ⇒ NIL
(xreal:<= 1 3 2)              ; ⇒ NIL
(xreal:<= 1 :plusinf 2)       ; ⇒ NIL

xreal:>=

(xreal:>= number &rest more-numbers)

Tests greater-than-or-equal ordering across extended real numbers.

;; Greater-than-or-equal cases that return T
(xreal:>= 1 1)                     ; ⇒ T
(xreal:>= 2 1)                     ; ⇒ T
(xreal:>= :plusinf 1)              ; ⇒ T
(xreal:>= :plusinf :plusinf)       ; ⇒ T
(xreal:>= :plusinf :minusinf)      ; ⇒ T
(xreal:>= :minusinf :minusinf)     ; ⇒ T
(xreal:>= 1 :minusinf)             ; ⇒ T
(xreal:>= 2 2 1)                   ; ⇒ T
(xreal:>= 3 2 1)                   ; ⇒ T
(xreal:>= :plusinf 2 1)            ; ⇒ T
(xreal:>= :plusinf :plusinf 1)     ; ⇒ T
(xreal:>= :plusinf 4 1 :minusinf)  ; ⇒ T

;; Greater-than-or-equal cases that return NIL
(xreal:>= 1 2)                ; ⇒ NIL
(xreal:>= 1 :plusinf)         ; ⇒ NIL
(xreal:>= :minusinf :plusinf) ; ⇒ NIL
(xreal:>= :minusinf 1)        ; ⇒ NIL
(xreal:>= 2 3 1)              ; ⇒ NIL
(xreal:>= 2 :plusinf 1)       ; ⇒ NIL

Template Macros

with-template

(with-template (prefix &rest variables) &body body)

Defines a local macro for pattern matching on extended real values. The macro can match against :plusinf, :minusinf, real, or t.

;; Pattern matching with template macro
(with-template (? x y)
  (if (? real real)
      (+ x y)                       ; both are real numbers
      (if (? :plusinf :plusinf)
          :plusinf                  ; both are +∞
          (if (? :minusinf :minusinf)
              :minusinf             ; both are -∞
              (error "Mixed infinity types")))))

;; Example usage with different patterns
(let ((x 5) (y 10))
  (with-template (? x y)
    (? real real)))           ; ⇒ T

(let ((x :plusinf) (y :plusinf))
  (with-template (? x y)
    (? :plusinf :plusinf)))   ; ⇒ T

(let ((x 5) (y :minusinf))
  (with-template (? x y)
    (? real :minusinf)))      ; ⇒ T

lambda-template

(lambda-template (prefix &rest variables) &body body)

Convenience macro that combines lambda with with-template.

;; Creating a function with template matching
(let ((compare-fn (lambda-template (? a b)
                    (if (? real real)
                        (< a b)
                        (if (? :minusinf t)
                            t
                            (if (? t :plusinf)
                                t
                                nil))))))
  (list (funcall compare-fn 1 2)          ; ⇒ T (real numbers)
        (funcall compare-fn :minusinf 5)  ; ⇒ T (-∞ < real)
        (funcall compare-fn 5 :plusinf)   ; ⇒ T (real < +∞)
        (funcall compare-fn :plusinf 5))) ; ⇒ NIL (+∞ not < real)

Corner Cases

All comparison functions handle corner cases consistently:

;; Single argument always returns T
(xreal:= 1)          ; ⇒ T
(xreal:< :plusinf)   ; ⇒ T
(xreal:> 1)          ; ⇒ T
(xreal:>= :minusinf) ; ⇒ T
(xreal:<= 1)         ; ⇒ T

;; No arguments signals an error
(xreal:=)  ; ⇒ ERROR
(xreal:<)  ; ⇒ ERROR
(xreal:>)  ; ⇒ ERROR
(xreal:>=) ; ⇒ ERROR
(xreal:<=) ; ⇒ ERROR

Notes on Usage

  1. Type Safety: All functions check that arguments are extended reals (real numbers or infinities)

  2. Operator Shadowing: The package shadows CL operators =, <, >, <=, >=. Use package prefixes or be careful with use-package

  3. Mathematical Consistency: The ordering follows mathematical conventions:

    • :minusinf < any real number < :plusinf
    • :minusinf < :plusinf
    • Infinities are equal to themselves but not to each other
  4. Template Patterns: When using with-template or lambda-template:

    • :plusinf matches only positive infinity
    • :minusinf matches only negative infinity
    • real matches any real number
    • t matches any extended real value

Interval

The interval package provides interval arithmetic on the extended real line, supporting finite, semi-infinite, and infinite intervals with open/closed endpoints. Features include interval creation, length/midpoint calculations, membership testing, hull operations, interval extension, splitting, shrinking, and grid generation.

Creation

interval

(interval left right &optional open-left? open-right?)

Creates an interval with specified endpoints and open/closed status. Supports finite intervals as well as semi-infinite and infinite intervals using :minusinf and :plusinf.

;; Basic interval creation
(interval 1 2) ; ⇒ [1,2]

;; Invalid intervals signal errors
(interval 2 1) ; ⇒ ERROR

finite-interval

(finite-interval left right &optional open-left? open-right?)

Creates a finite interval, ensuring both endpoints are real numbers.

plusinf-interval

(plusinf-interval left &optional open-left?)

Creates a semi-infinite interval extending to positive infinity.

minusinf-interval

(minusinf-interval right &optional open-right?)

Creates a semi-infinite interval extending from negative infinity.

real-line

(real-line)

Returns the entire real line as an interval.
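A hedged sketch of how these constructors relate (the printed representations shown in the comments are illustrative, and default endpoint openness is assumed):

```lisp
;; Illustrative only: comments show assumed printed forms.
(finite-interval 1 2)  ; a finite interval, [1,2]
(plusinf-interval 0)   ; [0,+∞), right endpoint is :plusinf
(minusinf-interval 0)  ; (-∞,0], left endpoint is :minusinf
(real-line)            ; (-∞,+∞), both endpoints infinite
```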

plusminus-interval

(plusminus-interval center radius)

Creates a symmetric interval around a center point with the given radius.

;; Create interval around center point
(plusminus-interval 1 0.5) ; ⇒ [0.5,1.5] (equivalent to (interval 0.5 1.5))

Properties

left

(left interval)

Returns the left endpoint of an interval.

right

(right interval)

Returns the right endpoint of an interval.

open-left?

(open-left? interval)

Tests if the left endpoint is open.

open-right?

(open-right? interval)

Tests if the right endpoint is open.
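A hedged example of the four accessors together, assuming endpoints are closed by default (as the membership examples later in this section suggest):

```lisp
(let ((a (interval 1 2)))
  (list (left a)           ; ⇒ 1
        (right a)          ; ⇒ 2
        (open-left? a)     ; ⇒ NIL (closed by default)
        (open-right? a)))  ; ⇒ NIL
```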

interval-length

(interval-length interval)

Returns the length of a finite interval.

;; Length of unit interval
(let ((a (interval 1 2)))
  (interval-length a)) ; ⇒ 1

interval-midpoint

(interval-midpoint interval &optional fraction)

Returns a point within the interval. With no fraction, returns the midpoint. With fraction, returns a point at that relative position.

;; Midpoint with fraction
(let ((a (interval 1 2)))
  (interval-midpoint a 0.25)) ; ⇒ 1.25

Testing

in-interval?

(in-interval? interval number)

Tests if a number is contained in an interval, respecting open/closed endpoints.

;; Membership testing
(let ((a (interval 1 2)))
  (list (in-interval? a 1.5)   ; ⇒ T
        (in-interval? a 1)     ; ⇒ T
        (in-interval? a 2)     ; ⇒ T
        (in-interval? a 0.9)   ; ⇒ NIL
        (in-interval? a 2.1))) ; ⇒ NIL

Operations

extend-interval

(extend-interval interval object)

Returns the smallest interval containing both the original interval and object. When interval is NIL, returns an interval containing just object.
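A hedged example, inferred from the extendf-interval example below (which extends NIL and [1,2] to contain the point 3):

```lisp
;; Extending an interval to contain a point outside it
(extend-interval (interval 1 2) 3) ; ⇒ [1,3]
(extend-interval nil 3)            ; ⇒ [3,3]
```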

extendf-interval

(extendf-interval interval object)

Modify-macro version of extend-interval: destructively updates the place interval so that it also contains object.

;; Destructive extension with counter and array
(let+ ((counter -1)
       (a (make-array 2 :initial-contents (list nil (interval 1 2)))))
  (extendf-interval (aref a (incf counter)) 3)
  (extendf-interval (aref a (incf counter)) 3)
  (values a counter))
; ⇒ #([3,3] [1,3]), 1

interval-hull

(interval-hull object)

Returns the smallest interval containing all elements of object, which may be an interval, a number, or a (possibly nested) sequence or array of these.

;; Hull operations
(let ((a (interval 1 2)))
  (list (interval-hull nil)                  ; ⇒ NIL
        (interval-hull a)                    ; ⇒ [1,2]
        (interval-hull '(1 1.5 2))           ; ⇒ [1,2]
        (interval-hull #(1 1.5 2))           ; ⇒ [1,2]
        (interval-hull #2A((1) (1.5) (2))))) ; ⇒ [1,2]

;; Complex hull with mixed types
(interval-hull (list (interval 0 2) -1 #(3) '(2.5))) ; ⇒ [-1,3]

;; Invalid input signals error
(interval-hull #C(1 2)) ; ⇒ ERROR

shift-interval

(shift-interval interval shift)

Returns a new interval shifted by the given amount.
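A hedged example, assuming a positive shift moves both endpoints to the right by the same amount:

```lisp
(shift-interval (interval 1 2) 3) ; ⇒ [4,5]
```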

Division

split-interval

(split-interval interval splits)

Splits an interval at specified points, returning a vector of subintervals.

;; Complex splitting with spacers and relative positions
(let ((a (interval 10 20)))
  (split-interval a (list (spacer 1) (relative 0.1) (spacer 2))))
; ⇒ #([10,13] [13,14] [14,20])

;; Simpler split with spacer and absolute value
(let ((a (interval 10 20)))
  (split-interval a (list (spacer) 4)))
; ⇒ #([10,16] [16,20])

;; Invalid splits signal errors
(let ((a (interval 10 20)))
  (split-interval a (list 9)))            ; ⇒ ERROR (outside interval)
(let ((a (interval 10 20)))
  (split-interval a (list 6 7 (spacer)))) ; ⇒ ERROR (invalid sequence)

shrink-interval

(shrink-interval interval left &optional right)

Returns a sub-interval by shrinking from both ends. Parameters are relative positions (0 to 1).

;; Shrinking with relative positions
(let ((a (interval 1 2)))
  (shrink-interval a 0.25 0.2)) ; ⇒ [1.25,1.8]

Grid Generation

grid-in

(grid-in interval n &key endpoints)

Generates a vector of n evenly spaced points within an interval.

;; Grid generation examples
(grid-in (interval 0.0 1.0) 3) ; ⇒ #(0.0 0.5 1.0)
(grid-in (interval 0 4) 3)     ; ⇒ #(0 2 4)

subintervals-in

(subintervals-in interval n)

Divides an interval into n equal subintervals, returning a vector.

;; Creating subintervals
(subintervals-in (interval 0 3) 3) ; ⇒ #([0,1) [1,2) [2,3])

Relative

relative

(relative &optional fraction offset)

Creates a relative specification for use with other interval functions. Represents a position as fraction × width + offset.

;; Used in split-interval examples above
(relative 0.1) ; represents 10% of interval width from left

spacer

(spacer &optional n)

Creates a relative specification that divides an interval into equal parts.

;; Used in split-interval examples above
(spacer 1) ; divides interval into 1+1=2 parts, split at middle
(spacer 2) ; divides interval into 2+1=3 parts
(spacer)   ; default spacing

Notes on Usage

  1. Extended Real Support: Intervals can use :plusinf and :minusinf for semi-infinite and infinite intervals

  2. Open/Closed Endpoints: The third and fourth arguments to interval control whether endpoints are open (excluded) or closed (included)

  3. Type Safety: Functions check that intervals are properly ordered (left ≤ right) and that operations make sense

  4. Relative Specifications: The relative and spacer functions provide flexible ways to specify positions within intervals without hardcoding absolute values

  5. Destructive Operations: Functions ending in f (like extendf-interval) modify their arguments in place for efficiency

  6. Error Handling: Invalid operations (like creating intervals with left > right, or using complex numbers in hulls) signal appropriate errors

Usage Examples

Example: Statistical Confidence Intervals

;; Create confidence intervals for a mean estimate
(defun confidence-interval (mean std-error confidence-level)
  "Create a confidence interval for a mean with given standard error.
Confidence level should be between 0 and 1 (e.g., 0.95 for 95% CI)."
  (let* ((z-score (cond ((= confidence-level 0.90) 1.645)  ; 90% CI
                        ((= confidence-level 0.95) 1.96)   ; 95% CI
                        ((= confidence-level 0.99) 2.576)  ; 99% CI
                        (t (error "Unsupported confidence level"))))
         (margin (* z-score std-error)))
    (interval (- mean margin) (+ mean margin))))

;; Example: Sample mean = 100, standard error = 5
(confidence-interval 100 5 0.95) ; ⇒ [90.2,109.8] (95% CI)
(confidence-interval 100 5 0.99) ; ⇒ [87.12,112.88] (99% CI)

;; Check if a value falls within the confidence interval
(let ((ci-95 (confidence-interval 100 5 0.95)))
  (list (in-interval? ci-95 95)    ; ⇒ T (within CI)
        (in-interval? ci-95 105)   ; ⇒ T (within CI)
        (in-interval? ci-95 110))) ; ⇒ NIL (outside CI)

;; Overlapping confidence intervals
(defun intervals-overlap? (int1 int2)
  "Check if two intervals overlap."
  (not (or (< (right int1) (left int2))
           (> (left int1) (right int2)))))

(let ((group1-ci (confidence-interval 100 3 0.95))  ; [94.12,105.88]
      (group2-ci (confidence-interval 108 4 0.95))) ; [100.16,115.84]
  (intervals-overlap? group1-ci group2-ci))
; ⇒ T (overlap suggests no significant difference)

;; Compute confidence interval width for sample size planning
(defun ci-width (std-error confidence-level)
  "Calculate the width of a confidence interval."
  (let ((ci (confidence-interval 0 std-error confidence-level)))
    (interval-length ci)))

(ci-width 5 0.95) ; ⇒ 19.6 (width of 95% CI with SE=5)
(ci-width 2 0.95) ; ⇒ 7.84 (smaller SE gives narrower CI)

Log-Exp

The Log-Exp package provides numerically stable implementations of logarithmic and exponential functions that require special handling near zero. These implementations avoid floating-point overflow, underflow, and loss of precision in critical numerical computations involving values close to mathematical singularities.

Basic Log Functions

log1+

(log1+ x)

Computes log(1+x) stably even when x is near 0. Uses specialized algorithm to avoid loss of precision.

(log1+ 0.0)   ; ⇒ 0.0d0
(log1+ 1e-15) ; ⇒ 1.0d-15 (exact for small x)
(log1+ 1.0)   ; ⇒ 0.6931471805599453d0 (log(2))
(log1+ -0.5)  ; ⇒ -0.6931471805599453d0 (log(0.5))

For small x, returns x directly to maintain precision. For larger x, uses the stable formula x·log(1+x)/((1+x)−1), which cancels the rounding error incurred in forming 1+x.

log1-

(log1- x)

Computes log(1−x) stably even when x is near 0.

(log1- 0.0)   ; ⇒ 0.0d0
(log1- 1e-15) ; ⇒ -1.0d-15 (exact for small x)
(log1- 0.5)   ; ⇒ -0.6931471805599453d0 (log(0.5))
(log1- -1.0)  ; ⇒ 0.6931471805599453d0 (log(2))

log1+/x

(log1+/x x)

Computes log(1+x)/x stably even when x is near 0.

(log1+/x 0.0)   ; ⇒ 1.0d0 (limit as x→0)
(log1+/x 1e-10) ; ⇒ 0.99999999995d0 (very close to 1)
(log1+/x 1.0)   ; ⇒ 0.6931471805599453d0 (log(2)/1)
(log1+/x 2.0)   ; ⇒ 0.5493061443340549d0 (log(3)/2)

Exponential Functions

exp-1

(exp-1 x)

Computes exp(x)−1 stably even when x is near 0.

(exp-1 0.0)   ; ⇒ 0.0d0
(exp-1 1e-15) ; ⇒ 1.0d-15 (exact for small x)
(exp-1 1.0)   ; ⇒ 1.718281828459045d0 (e−1)
(exp-1 -1.0)  ; ⇒ -0.6321205588285577d0 (1/e−1)

exp-1/x

(exp-1/x x)

Computes (exp(x)−1)/x stably even when x is near 0.

(exp-1/x 0.0)   ; ⇒ 1.0d0 (limit as x→0)
(exp-1/x 1e-10) ; ⇒ 1.0000000000500000d0 (very close to 1)
(exp-1/x 1.0)   ; ⇒ 1.718281828459045d0 (e−1)
(exp-1/x -1.0)  ; ⇒ 0.6321205588285577d0 ((1/e−1)/(−1))

expt-1

(expt-1 a z)

Computes aᶻ − 1 stably when a is close to 1 or z is close to 0.

(expt-1 1.0 0.0)   ; ⇒ 0.0d0 (1⁰ − 1)
(expt-1 1.001 0.1) ; ⇒ ≈ 9.99550d-5 (small perturbation: 1.001^0.1 − 1)
(expt-1 2.0 3.0)   ; ⇒ 7.0d0 (2³ − 1 = 8 − 1)
(expt-1 0.5 2.0)   ; ⇒ -0.75d0 (0.25 − 1)

Logarithmic Exponential Combinations

log1+exp

(log1+exp a)

Accurately computes log(1+exp(x)) even when a is near zero or large.

(log1+exp 0.0)   ; ⇒ 0.6931471805599453d0 (log(2))
(log1+exp -40.0) ; ⇒ 4.248354255291589d-18 (≈ exp(−40), very small)
(log1+exp 1.0)   ; ⇒ 1.3132616875182228d0 (log(1+e))
(log1+exp 20.0)  ; ⇒ ≈ 20.000000002061154 (20 + exp(−20))
(log1+exp 50.0)  ; ⇒ 50.0d0 (exactly 50 for very large x)

Uses different algorithms based on the magnitude of the input to maintain numerical stability.

log1-exp

(log1-exp a)

Computes log(1−exp(x)) stably. This is the third Einstein function E₃. The result is real-valued only for x ≤ 0, since 1−exp(x) is negative for positive x.

(log1-exp 0.0)  ; ⇒ −∞ (log(0))
(log1-exp -1.0) ; ⇒ ≈ -0.4586751 (log(1−1/e))
(log1-exp -0.1) ; ⇒ ≈ -2.3521684 (log(1−exp(−0.1)))

log2-exp

(log2-exp x)

Computes log(2−exp(x)) stably even when x is near zero.

(log2-exp 0.0)  ; ⇒ 0.0d0 (log(2−exp(0)) = log(1))
(log2-exp -1.0) ; ⇒ ≈ 0.4898800 (log(2−1/e))
(log2-exp 0.5)  ; ⇒ ≈ -1.0461751 (log(2−exp(0.5)))

logexp-1

(logexp-1 a)

Computes log(exp(a)−1) stably even when a is small.

(logexp-1 0.0)   ; ⇒ −∞ (log(0))
(logexp-1 1.0)   ; ⇒ 0.5413248546129181d0 (log(e−1))
(logexp-1 -40.0) ; ⇒ -40.0d0 (≈ −40 for very negative values)
(logexp-1 20.0)  ; ⇒ ≈ 19.999999997938846 (20 − exp(−20), ≈ 20 for large positive)

Utility Functions

hypot

(hypot x y)

Computes the hypotenuse √(x²+y²) without danger of floating-point overflow or underflow.

(hypot 3.0 4.0)    ; ⇒ 5.0d0 (classic 3-4-5 triangle)
(hypot 1.0 1.0)    ; ⇒ 1.4142135623730951d0 (√2)
(hypot 0.0 5.0)    ; ⇒ 5.0d0
(hypot -3.0 4.0)   ; ⇒ 5.0d0 (uses absolute values)
(hypot 1e200 1e200) ; ⇒ 1.4142135623730951d200 (avoids overflow)

Always returns a positive result by taking absolute values of inputs.

log1pmx

(log1pmx x)

Computes log(1+x) − x accurately. Most accurate for −0.227 < x < 0.315.

(log1pmx 0.0)  ; ⇒ 0.0d0 (log(1) − 0)
(log1pmx 0.1)  ; ⇒ ≈ -0.0046898202 (log(1.1) − 0.1)
(log1pmx 0.5)  ; ⇒ ≈ -0.0945348919 (log(1.5) − 0.5)
(log1pmx -0.1) ; ⇒ ≈ -0.0053605157 (log(0.9) + 0.1)
(log1pmx 1.0)  ; ⇒ -0.30685281944005443d0 (log(2) − 1)

Uses polynomial approximations in different ranges for optimal accuracy.

Usage Examples

Example: Numerical Stability Comparison

;; Standard approach vs stable approach for small values
(let ((x 1e-15))
  (list
   ;; Naive: forming 1+x first discards most of x's digits
   (- (log (+ 1 x)) x)
   ;; Stable version computes log(1+x) − x directly
   (log1pmx x)))
; the true value is ≈ −x²/2 ≈ −5d-31; the naive form returns rounding noise instead

Example: Exponential Probability Calculations

;; Computing log probabilities stably
(defun log-sigmoid (x)
  "Stable computation of log(1/(1+exp(−x)))"
  (- (log1+exp (- x))))

(log-sigmoid 0.0)   ; ⇒ -0.6931471805599453d0
(log-sigmoid 10.0)  ; ⇒ -0.000045398899216859934d0 (very small)
(log-sigmoid -10.0) ; ⇒ -10.000045399929762d0

;; Computing softmax denominators stably
(defun log-sum-exp (values)
  "Stable log-sum-exp: max + log(sum of exp(x − max))"
  (let ((max-val (reduce #'max values)))
    (+ max-val
       (log (reduce #'+ values
                    :key (lambda (x) (exp (- x max-val))))))))

Example: Statistical Distributions

;; Log-normal distribution helpers
(defun log-normal-log-pdf (x mu sigma)
  "Log probability density for log-normal distribution"
  (let ((log-x (log x))
        (sigma-sq (* sigma sigma)))
    (- (- (* 0.5 (/ (* (- log-x mu) (- log-x mu)) sigma-sq)))
       (log (* x sigma (sqrt (* 2 pi)))))))

;; Using stable exponential functions
(defun stable-gaussian-tail (x)
  "Illustrative use of LOG1-EXP: stably computes log(1−exp(−x²/2))"
  (log1-exp (- (* 0.5 x x))))

Example: Financial Mathematics

;; Continuous compounding calculations
(defun continuously-compounded-return (initial final time)
  "Calculate annualized continuously compounded return"
  (/ (log (/ final initial)) time))

;; Small return calculations
(defun log-return-approx (return-rate)
  "Compute log(1+r) accurately for small returns r"
  (log1+ return-rate))

(log-return-approx 0.05)  ; ⇒ 0.04879016416943205d0 (vs naive 0.05)
(log-return-approx 0.001) ; ⇒ 0.0009995003330835334d0 (very accurate)

Notes on Usage

  1. Numerical Stability: These functions are designed to maintain accuracy when standard implementations would lose precision due to cancellation or overflow.

  2. Domain Considerations: Functions like log1-exp and logexp-1 may return −∞ for certain inputs where the mathematical result is undefined.

  3. Performance: While more stable, these functions may be slightly slower than naive implementations. Use when precision near singularities is critical.

  4. Range Optimization: Functions like log1+exp and logexp-1 use different algorithms based on input ranges to optimize both accuracy and performance.

  5. Complex Numbers: Most functions are designed primarily for real inputs, though some will handle complex numbers by falling back to standard implementations.

  6. Integration with Statistical Computing: These functions are particularly useful in machine learning, statistics, and financial mathematics where log-probabilities and exponential transformations are common.

Num=

The Num= package provides approximate equality comparison for numeric values and structures containing numbers. It offers tolerance-based floating-point comparison using a relative error metric, with support for numbers, arrays, lists, and custom structures. The package includes the num= generic function, num-delta for relative differences, a configurable default tolerance, and macros for defining comparison methods on user-defined types.

Configuration

*num=-tolerance*

*num=-tolerance*

Dynamic variable that sets the default tolerance for num= comparisons. Default value is 1d-5.

*num=-tolerance* ; ⇒ 1.0d-5 ;; Temporarily change tolerance (let ((*num=-tolerance* 1e-3)) (num= 1 1.001)) ; ⇒ T (within 0.1% tolerance) (num= 1 1.001) ; ⇒ NIL (outside default 0.001% tolerance)

Core Functions

num-delta

(num-delta a b)

Computes the relative difference |a−b|/max(1,|a|,|b|). This metric is used internally by num= for comparing numbers.

(num-delta 1 1) ; ⇒ 0.0d0 (num-delta 1 1.001) ; ⇒ 0.0009990009990009974d0 (num-delta 100 101) ; ⇒ 0.009900990099009901d0 (num-delta 0 0.001) ; ⇒ 0.001d0 (uses 1 as minimum divisor)

num=

(num= a b &optional tolerance)

Generic function that compares two objects for approximate equality. Specializations exist for numbers, arrays, lists, and custom structures.

Numbers

;; Number comparisons with custom tolerance
(let ((*num=-tolerance* 1e-3))
  (num= 1 1)       ; ⇒ T
  (num= 1 1.0)     ; ⇒ T
  (num= 1 1.001)   ; ⇒ T (within tolerance)
  (num= 1 2)       ; ⇒ NIL
  (num= 1 1.01))   ; ⇒ NIL (outside tolerance)

;; Using explicit tolerance parameter
(num= 1 1.01 0.01)  ; ⇒ T (within 1% tolerance)
(num= 1 1.01 0.001) ; ⇒ NIL (outside 0.1% tolerance)

Lists

(let ((*num=-tolerance* 1e-3))
  (num= nil nil)                ; ⇒ T
  (num= '(1) '(1.001))          ; ⇒ T
  (num= '(1 2) '(1.001 1.999))  ; ⇒ T
  (num= '(0 1) '(0 1.02))       ; ⇒ NIL (1.02 outside tolerance)
  (num= nil '(1)))              ; ⇒ NIL (different structures)

Arrays

(let* ((*num=-tolerance* 1e-3)
       (a #(0 1 2))
       (b #2A((0 1) (2 3))))
  ;; Vector comparisons
  (num= a a)                 ; ⇒ T (same object)
  (num= a #(0 1.001 2))      ; ⇒ T
  (num= a #(0 1.001 2.001))  ; ⇒ T
  (num= a #(0 1.01 2))       ; ⇒ NIL (1.01 outside tolerance)
  (num= a #(0 1))            ; ⇒ NIL (different dimensions)
  ;; 2D array comparisons
  (num= b b)                     ; ⇒ T
  (num= b #2A((0 1) (2.001 3)))  ; ⇒ T
  (num= b #2A((0 1.01) (2 3)))   ; ⇒ NIL
  (num= b #2A((0 1))))           ; ⇒ NIL (different dimensions)

;; Arrays of different ranks
(num= #(0 1 2) #2A((0 1 2)))     ; ⇒ NIL (different ranks)

num=-function

(num=-function tolerance)

Returns a curried version of num= with the given tolerance fixed.

(let ((approx= (num=-function 0.01)))
  (funcall approx= 1 1.005)   ; ⇒ T (within 1%)
  (funcall approx= 1 1.02))   ; ⇒ NIL (outside 1%)

;; Use with higher-order functions
(remove-if-not (lambda (x) (funcall (num=-function 0.1) x 1))
               '(1.05 2.0 0.95 1.08))
; ⇒ (1.05 0.95 1.08) (all within 10% of 1)

Structure Comparison

define-num=-with-accessors

(define-num=-with-accessors class accessors)

Macro that defines a num= method for a class, comparing values obtained through the specified accessor functions.

;; Define a class with accessors (defclass point () ((x :accessor point-x :initarg :x) (y :accessor point-y :initarg :y))) ;; Define num= method using accessors (define-num=-with-accessors point (point-x point-y)) ;; Use the defined method (let ((p1 (make-instance 'point :x 1.0 :y 2.0)) (p2 (make-instance 'point :x 1.001 :y 1.999))) (num= p1 p2 0.01)) ; ⇒ T

define-structure-num=

(define-structure-num= structure &rest slots)

Convenience macro for structures that automatically generates accessor names from the structure name and slot names.

;; Define a structure
(defstruct num=-test-struct
  "Structure for testing DEFINE-STRUCTURE-num=."
  a b)

;; Generate num= method
(define-structure-num= num=-test-struct a b)

;; Examples
(let ((*num=-tolerance* 1e-3)
      (a (make-num=-test-struct :a 0 :b 1))
      (b (make-num=-test-struct :a "string" :b nil)))
  (num= a a)                                      ; ⇒ T
  (num= a (make-num=-test-struct :a 0 :b 1))      ; ⇒ T
  (num= a (make-num=-test-struct :a 0 :b 1.001))  ; ⇒ T
  (num= a (make-num=-test-struct :a 0 :b 1.01))   ; ⇒ NIL
  (num= b b)                                      ; ⇒ T
  (num= a b))                                     ; ⇒ NIL

The macro expands to use accessors named structure-slot, so for num=-test-struct with slots a and b, it uses num=-test-struct-a and num=-test-struct-b.

Usage Examples

Example: Floating-Point Computation Verification

;; Verify numerical algorithm results (defun verify-computation (computed expected &optional (tolerance 1e-10)) "Verify that computed result matches expected value within tolerance." (if (num= computed expected tolerance) (format t "✓ Result ~A matches expected ~A~%" computed expected) (format t "✗ Result ~A differs from expected ~A by ~A~%" computed expected (num-delta computed expected)))) ;; Example: Verify matrix computation (let* ((result #2A((1.0 0.0) (0.0 1.0))) (expected #2A((1.0 0.0) (0.0 0.99999999)))) (verify-computation result expected 1e-8)) ; prints: ✓ Result #2A((1.0 0.0) (0.0 1.0)) matches expected #2A((1.0 0.0) (0.0 0.99999999))

Example: Testing Numerical Algorithms

;; Compare different implementations (defun compare-algorithms (algorithm1 algorithm2 test-inputs &optional (tol 1e-6)) "Compare outputs of two algorithms on test inputs." (loop for input in test-inputs for result1 = (funcall algorithm1 input) for result2 = (funcall algorithm2 input) for equal-p = (num= result1 result2 tol) unless equal-p collect (list :input input :alg1 result1 :alg2 result2 :delta (num-delta result1 result2)))) ;; Example: Compare two sqrt implementations (compare-algorithms #'sqrt (lambda (x) (expt x 0.5)) '(1.0 2.0 3.0 4.0)) ; ⇒ NIL (all results match within tolerance)

Example: Custom Structure Comparison

;; Define a complex number structure (defstruct (complex-num (:constructor make-complex-num (real imag))) real imag) ;; Enable approximate comparison (define-structure-num= complex-num real imag) ;; Use in computations (let* ((z1 (make-complex-num 1.0 2.0)) (z2 (make-complex-num 1.0001 1.9999)) (*num=-tolerance* 1e-3)) (num= z1 z2)) ; ⇒ T ;; Array of structures (let ((arr1 (vector (make-complex-num 1 0) (make-complex-num 0 1))) (arr2 (vector (make-complex-num 1.0001 0) (make-complex-num 0 0.9999)))) (num= arr1 arr2 1e-3)) ; ⇒ T

Notes on Usage

  1. Relative Error Metric: The comparison uses relative error |a−b|/max(1,|a|,|b|), which handles both small and large numbers appropriately.

  2. Default Fallback: Objects without specialized methods are compared using equalp, ensuring the function works with any Lisp objects.

  3. Recursive Comparison: Methods for arrays and lists recursively apply num= to elements, propagating the tolerance parameter.

  4. Structure Macros: The define-structure-num= macro assumes standard naming convention for accessors (structure-name-slot-name).

  5. Performance: For large arrays or deeply nested structures, consider the overhead of element-wise comparison.

  6. Mixed Types: num= can compare different numeric types (integer vs float) as it uses numeric operations that handle type coercion.
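The relative-error metric in note 1 is simple enough to sketch in portable Common Lisp. This mirrors what num-delta computes; the names relative-delta and approx= are illustrative, not the package's internals:

```lisp
(defun relative-delta (a b)
  "Relative difference |a-b| / max(1, |a|, |b|), as used by num=."
  (/ (abs (- a b))
     (max 1 (abs a) (abs b))))

(defun approx= (a b tolerance)
  "Approximate equality of two numbers under TOLERANCE."
  (<= (relative-delta a b) tolerance))

(approx= 1 1.001 1e-3)  ; ⇒ T   (delta ≈ 0.000999)
(approx= 100 101 1e-3)  ; ⇒ NIL (delta ≈ 0.0099)
(approx= 0 1e-4 1e-3)   ; ⇒ T   (divisor clamped to 1 near zero)
```

The max(1, …) clamp is what makes comparisons near zero behave sensibly: without it, any nonzero value compared against 0 would have infinite relative error.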

Polynomial

The Polynomial package provides efficient evaluation of polynomial functions using Horner’s method. It supports optimized implementations for different numeric types including fixnum, single-float, double-float, and arbitrary precision numbers.

evaluate-polynomial

(evaluate-polynomial coefficients x)

Evaluates a polynomial at point x using Horner’s method. Coefficients are ordered from highest degree down to the constant term.

;; Basic polynomial evaluation with fixnum coefficients (evaluate-polynomial #(2 -6 2 -1) 3) ; ⇒ 5 ;; Evaluates: 2x³ - 6x² + 2x - 1 at x=3 ;; = 2(27) - 6(9) + 2(3) - 1 = 54 - 54 + 6 - 1 = 5 (evaluate-polynomial #(2 0 3 1) 2) ; ⇒ 23 ;; Evaluates: 2x³ + 0x² + 3x + 1 at x=2 ;; = 2(8) + 0 + 3(2) + 1 = 16 + 6 + 1 = 23 (evaluate-polynomial #(1 3 5 7 9) 2) ; ⇒ 83 ;; Evaluates: x⁴ + 3x³ + 5x² + 7x + 9 at x=2 ;; = 16 + 3(8) + 5(4) + 7(2) + 9 = 16 + 24 + 20 + 14 + 9 = 83 ;; Single coefficient (constant polynomial) (evaluate-polynomial #(5) 2) ; ⇒ 5

Type-Specific Optimizations

The function provides optimized implementations for different numeric types:

;; Single-float coefficients (evaluate-polynomial #(2.0 -6.0 2.0 -1.0) 3.0) ; ⇒ 5.0 (evaluate-polynomial #(2.0 0.0 3.0 1.0) 2.0) ; ⇒ 23.0 (evaluate-polynomial #(1.0 3.0 5.0 7.0 9.0) 2.0) ; ⇒ 83.0 ;; Double-float coefficients (evaluate-polynomial #(2.0d0 -6.0d0 2.0d0 -1.0d0) 3.0d0) ; ⇒ 5.0d0 (evaluate-polynomial #(2.0d0 0.0d0 3.0d0 1.0d0) 2.0d0) ; ⇒ 23.0d0 (evaluate-polynomial #(1.0d0 3.0d0 5.0d0 7.0d0 9.0d0) 2.0d0) ; ⇒ 83.0d0 ;; Arbitrary precision (bignum) - evaluating at large values (evaluate-polynomial #(2 0 1) (1+ most-positive-fixnum)) ; ⇒ 42535295865117307932921825928971026433 ;; Evaluates: 2x² + 1 at x = 2^62 (on 64-bit systems)

evaluate-rational

(evaluate-rational numerator denominator z)

Evaluates a rational function (ratio of two polynomials) using Horner’s method with special handling to prevent overflow for large z values.

Important: Unlike evaluate-polynomial, coefficients are ordered from lowest degree (constant term) to highest degree.

;; Simple rational function: (1 + 2z) / (1 + z) (evaluate-rational #(1 2) #(1 1) 3.0) ; ⇒ 1.75d0 ;; = (1 + 2×3) / (1 + 3) = 7/4 = 1.75 ;; More complex example: (1 + 2z + z²) / (1 + z + z²) (evaluate-rational #(1 2 1) #(1 1 1) 2.0) ; ⇒ 1.2857142857142858d0 ;; = (1 + 2×2 + 2²) / (1 + 2 + 2²) = 9/7 ≈ 1.286 ;; Handling large values - uses z⁻¹ internally to prevent overflow (evaluate-rational #(1 0 1) #(1 1 0) 1e10) ; Works without overflow

Usage Examples

Polynomial Fitting and Evaluation

;; Fit a quadratic polynomial to data points and evaluate (defun fit-and-evaluate-quadratic (x1 y1 x2 y2 x3 y3 x-eval) "Fit ax² + bx + c through three points and evaluate at x-eval" ;; For demonstration - actual fitting would use linear algebra ;; Here we use a pre-computed polynomial (let ((coefficients #(1.0d0 -2.0d0 3.0d0))) ; x² - 2x + 3 (evaluate-polynomial coefficients x-eval))) (fit-and-evaluate-quadratic 0 3 1 2 2 3 1.5) ; ⇒ 2.25d0

Chebyshev Polynomial Evaluation

;; Chebyshev polynomials of the first kind (defun chebyshev-t3 (x) "T₃(x) = 4x³ - 3x" (evaluate-polynomial #(4 0 -3 0) x)) (defun chebyshev-t4 (x) "T₄(x) = 8x⁴ - 8x² + 1" (evaluate-polynomial #(8 0 -8 0 1) x)) (chebyshev-t3 0.5d0) ; ⇒ -1.0d0 (chebyshev-t4 0.5d0) ; ⇒ -0.5d0

Numerical Stability Comparison

;; Compare naive evaluation vs Horner's method (defun naive-polynomial-eval (coeffs x) "Naive polynomial evaluation - less numerically stable" (loop for i from 0 for coeff across coeffs sum (* coeff (expt x (- (length coeffs) i 1))))) ;; For well-conditioned polynomials, results are similar (let ((coeffs #(1.0d0 -2.0d0 1.0d0))) ; (x-1)² (list (evaluate-polynomial coeffs 1.0001d0) ; ⇒ 1.0000000100000003d-8 (naive-polynomial-eval coeffs 1.0001d0))) ; ⇒ 1.0000000099999842d-8 ;; Horner's method is more efficient (fewer operations) ;; and generally more numerically stable

Rational Function Applications

;; Padé approximation of exp(x) around x=0 ;; exp(x) ≈ (1 + x/2 + x²/12) / (1 - x/2 + x²/12) (defun pade-exp (x) (evaluate-rational #(1.0d0 0.5d0 0.08333333333333333d0) ; 1, 1/2, 1/12 #(1.0d0 -0.5d0 0.08333333333333333d0) ; 1, -1/2, 1/12 x)) (pade-exp 0.1d0) ; ⇒ 1.1051709180756474d0 (compare to (exp 0.1) = 1.1051709180756477d0) (pade-exp 0.5d0) ; ⇒ 1.6487212707001282d0 (compare to (exp 0.5) = 1.6487212707001282d0) ;; Transfer function evaluation in control theory ;; H(s) = (s + 1) / (s² + 2s + 1) (defun transfer-function (s) (evaluate-rational #(1 1) ; 1 + s #(1 2 1) ; 1 + 2s + s² s)) (transfer-function 1.0d0) ; ⇒ 0.5d0

Performance and Usage Notes

  1. Type-Specific Paths: The implementation provides optimized paths for double-float, single-float, fixnum, and generic arbitrary precision arithmetic.

  2. Horner’s Method Efficiency: Evaluating an n-degree polynomial requires only n multiplications and n additions, compared to naive evaluation which requires O(n²) operations.

  3. Numerical Stability: Horner’s method generally provides better numerical stability than naive evaluation, especially for polynomials with terms of vastly different magnitudes.

  4. Coefficient Ordering:

    • evaluate-polynomial: Highest degree → constant term
    • evaluate-rational: Constant term → highest degree
  5. Type Consistency: For optimal performance, ensure x and all coefficients are of the same numeric type.

  6. Overflow Prevention: evaluate-rational automatically switches to reciprocal evaluation for |z| > 1 to prevent overflow.

  7. Optimization: The functions are declared with (optimize (speed 3) (safety 1)) for maximum performance in production use.
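The operation count in note 2 follows directly from Horner's recurrence b ← b·x + aᵢ: one multiply and one add per coefficient. A minimal sketch of the technique (illustrative only; the package's optimized implementation adds the type-specific declarations described above):

```lisp
(defun horner-sketch (coefficients x)
  "Evaluate a polynomial by Horner's method.
COEFFICIENTS are ordered highest degree first, as in EVALUATE-POLYNOMIAL."
  (let ((acc 0))
    (loop for a across coefficients
          do (setf acc (+ (* acc x) a)))  ; one multiply, one add per term
    acc))

(horner-sketch #(2 -6 2 -1) 3) ; ⇒ 5, matching (evaluate-polynomial #(2 -6 2 -1) 3)
(horner-sketch #(5) 2)         ; ⇒ 5 (constant polynomial)
```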

Print-Matrix

The Print-Matrix package provides formatted printing of 2D matrices with configurable precision, alignment, and truncation. Features include column alignment, custom element formatting, masking specific elements, respecting *print-length* for large matrices, and special handling for complex numbers. Supports customizable padding, indentation, and precision control through *print-matrix-precision* for human-readable matrix display.

*print-matrix-precision*

Dynamic variable that controls the number of digits after the decimal point when printing numeric matrices. Default value is 5.

*print-matrix-precision* ; ⇒ 5 ;; Temporarily change precision (let ((*print-matrix-precision* 2)) (print-matrix #2A((1.234567 2.345678) (3.456789 4.567890)) t)) ; Prints: ; 1.23 2.35 ; 3.46 4.57 ;; Default precision (print-matrix #2A((1.234567 2.345678) (3.456789 4.567890)) t) ; Prints: ; 1.23457 2.34568 ; 3.45679 4.56789

(print-length-truncate dimension)

Returns the effective dimension to use based on *print-length* and whether truncation occurred. Returns two values: the effective dimension and a boolean indicating if truncation happened.

(print-length-truncate 10) ; ⇒ 10, NIL (no truncation) (let ((*print-length* 5)) (print-length-truncate 10)) ; ⇒ 5, T (truncated) (let ((*print-length* nil)) (print-length-truncate 10)) ; ⇒ 10, NIL (no limit)

(print-matrix matrix stream &key formatter masked-fn aligned? padding indent)

Prints a 2D matrix with configurable formatting options.

Parameters:

  • matrix - a 2D array to print
  • stream - output stream (T for standard output)
  • :formatter - function to format individual elements (default: print-matrix-formatter)
  • :masked-fn - predicate function; elements where this returns true are replaced with “…”
  • :aligned? - whether to align columns (default: T)
  • :padding - string between columns (default: " ")
  • :indent - string at start of each line (default: " ")

Basic Usage Examples

;; Integer matrix (print-matrix #2A((1 2 3) (4 5 6) (7 8 9)) t) ; Prints: ; 1 2 3 ; 4 5 6 ; 7 8 9 ;; Floating-point matrix with default precision (print-matrix #2A((1.0 2.5 3.33333) (4.1 5.0 6.66667)) t) ; Prints: ; 1.00000 2.50000 3.33333 ; 4.10000 5.00000 6.66667 ;; Mixed numeric types (print-matrix #2A((1 2.5 3) (4.0 5 6.7)) t) ; Prints: ; 1 2.50000 3 ; 4.00000 5 6.70000

Complex Number Formatting

;; Complex matrix (print-matrix #2A((#C(1 2) #C(3 -4)) (#C(0 1) #C(-2 0))) t) ; Prints: ; 1.00000+2.00000i 3.00000+-4.00000i ; 0.00000+1.00000i -2.00000+0.00000i ;; Mixed real and complex (print-matrix #2A((1.0 #C(2 3)) (#C(4 0) 5.0)) t) ; Prints: ; 1.00000 2.00000+3.00000i ; 4.00000+0.00000i 5.00000

Truncation with *print-length*

;; Large matrix with print-length restriction (let ((*print-length* 3)) (print-matrix #2A((1 2 3 4 5) (6 7 8 9 10) (11 12 13 14 15) (16 17 18 19 20) (21 22 23 24 25)) t)) ; Prints: ; 1 2 3 ... ; 6 7 8 ... ; 11 12 13 ... ; ... ;; Different print-length values (let ((*print-length* 2)) (print-matrix #2A((1.1 2.2 3.3) (4.4 5.5 6.6) (7.7 8.8 9.9)) t)) ; Prints: ; 1.10000 2.20000 ... ; 4.40000 5.50000 ... ; ...

Advanced Features

Custom Formatting

;; Custom formatter for percentages (defun percentage-formatter (x) (format nil "~,1f%" (* x 100))) (print-matrix #2A((0.15 0.30 0.55) (0.20 0.45 0.35)) t :formatter #'percentage-formatter) ; Prints: ; 15.0% 30.0% 55.0% ; 20.0% 45.0% 35.0% ;; Scientific notation formatter (defun scientific-formatter (x) (format nil "~,2e" x)) (print-matrix #2A((1e-5 2.5e6) (3.14159 0.001)) t :formatter #'scientific-formatter) ; Prints: ; 1.00e-5 2.50e+6 ; 3.14e+0 1.00e-3

Element Masking

;; Mask diagonal elements (print-matrix #2A((1 2 3) (4 5 6) (7 8 9)) t :masked-fn (lambda (row col) (= row col))) ; Prints: ; ... 2 3 ; 4 ... 6 ; 7 8 ... ;; Mask values below threshold (print-matrix #2A((0.1 0.5 0.9) (0.3 0.7 0.2) (0.8 0.4 0.6)) t :masked-fn (lambda (row col) (< (aref #2A((0.1 0.5 0.9) (0.3 0.7 0.2) (0.8 0.4 0.6)) row col) 0.5))) ; Prints: ; ... 0.50000 0.90000 ; ... 0.70000 ... ; 0.80000 ... 0.60000

Alignment and Padding Options

;; No alignment (print-matrix #2A((1 22 333) (4444 5 66)) t :aligned? nil) ; Prints: ; 1 22 333 ; 4444 5 66 ;; With alignment (default) (print-matrix #2A((1 22 333) (4444 5 66)) t) ; Prints: ; 1 22 333 ; 4444 5 66 ;; Custom padding (print-matrix #2A((1 2 3) (4 5 6)) t :padding " | ") ; Prints: ; 1 | 2 | 3 ; 4 | 5 | 6 ;; Custom indentation (print-matrix #2A((1 2) (3 4)) t :indent ">>>") ; Prints: ; >>>1 2 ; >>>3 4

Practical Applications

Correlation Matrix Display

;; Display correlation matrix with custom precision (let ((*print-matrix-precision* 3) (corr-matrix #2A((1.000 0.856 -0.234) (0.856 1.000 0.142) (-0.234 0.142 1.000)))) (print-matrix corr-matrix t)) ; Prints: ; 1.000 0.856 -0.234 ; 0.856 1.000 0.142 ; -0.234 0.142 1.000 ;; With significance masking (mask values < 0.3) (print-matrix corr-matrix t :masked-fn (lambda (row col) (and (/= row col) (< (abs (aref corr-matrix row col)) 0.3)))) ; Prints: ; 1.000 0.856 ... ; 0.856 1.000 ... ; ... ... 1.000

Sparse Matrix Visualization

;; Visualize sparse matrix by masking zeros (let ((sparse #2A((1.0 0.0 0.0 2.0) (0.0 3.0 0.0 0.0) (0.0 0.0 0.0 4.0) (5.0 0.0 6.0 0.0)))) (print-matrix sparse t :masked-fn (lambda (row col) (zerop (aref sparse row col))))) ; Prints: ; 1.00000 ... ... 2.00000 ; ... 3.00000 ... ... ; ... ... ... 4.00000 ; 5.00000 ... 6.00000 ...

Usage Notes

  1. Stream Output: The stream parameter follows Common Lisp conventions - use t for standard output, nil for a string, or any stream object.

  2. Precision Control: *print-matrix-precision* affects all real and complex numbers. For integers, no decimal places are shown.

  3. Performance: For very large matrices, consider using *print-length* to limit output, or write custom formatters that summarize data.

  4. Alignment: Column alignment adds overhead for large matrices as it requires pre-scanning all elements. Disable with :aligned? nil for better performance.

  5. Complex Number Format: Complex numbers are always printed as a+bi format, with the imaginary unit shown as ‘i’.

  6. Thread Safety: The *print-matrix-precision* variable is dynamically bound, so it’s thread-safe when using let bindings.

Quadrature

The Quadrature package provides adaptive numerical integration using Romberg quadrature with Richardson extrapolation. Supports finite and semi-infinite intervals with automatic coordinate transformations. Features trapezoidal and midpoint rule refinements, configurable convergence criteria (epsilon tolerance), and handles open/closed interval endpoints. Efficiently computes definite integrals with controlled accuracy through iterative refinement and extrapolation techniques.

romberg-quadrature

(romberg-quadrature function interval &key open epsilon max-iterations)

Computes the definite integral of a function over an interval using Romberg’s method with Richardson extrapolation.

Parameters:

  • function - the integrand function of one argument
  • interval - an interval object (finite or semi-infinite)
  • :open - if true, uses midpoint rule for open intervals (default: nil uses trapezoidal rule)
  • :epsilon - relative error tolerance for convergence (default: machine epsilon)
  • :max-iterations - maximum number of refinement iterations (default: 20)

Returns two values:

  • The estimated integral value
  • The estimated relative error

Basic Integration Examples

;; Integrate x² from 0 to 1 (exact: 1/3) (romberg-quadrature (lambda (x) (* x x)) (interval 0d0 1d0)) ; ⇒ 0.3333333333333333d0, 5.551115123125783d-17 ;; Integrate exp(x) from 0 to 1 (exact: e-1 ≈ 1.71828...) (romberg-quadrature #'exp (interval 0d0 1d0)) ; ⇒ 1.7182818284590453d0, 5.551115123125783d-17 ;; Integrate 1/x from 1 to e (exact: 1) (romberg-quadrature (lambda (x) (/ x)) (interval 1d0 (exp 1d0))) ; ⇒ 1.0000000000000002d0, 2.220446049250313d-16 ;; Integrate sin(x) from 0 to π (exact: 2) (romberg-quadrature #'sin (interval 0d0 pi)) ; ⇒ 2.0000000000000004d0, 1.7763568394002506d-15

Open Interval Integration

When the integrand has singularities at the endpoints, use the :open t option to employ the midpoint rule:

;; Integrate 1/√x from 0 to 1 using open interval (exact: 2) (romberg-quadrature (lambda (x) (/ (sqrt x))) (interval 0d0 1d0) :open t) ; ⇒ 1.9999999999999998d0, 1.1102230246251566d-16 ;; Integrate log(x) from 0 to 1 using open interval (exact: -1) (romberg-quadrature #'log (interval 0d0 1d0) :open t) ; ⇒ -0.9999999999999998d0, 2.220446049250313d-16

Semi-Infinite Intervals

The function automatically applies appropriate transformations for semi-infinite intervals:

;; Integrate exp(-x) from 0 to ∞ (exact: 1) (romberg-quadrature (lambda (x) (exp (- x))) (interval 0d0 :plusinf)) ; ⇒ 1.0000000000000002d0, 2.220446049250313d-16 ;; Integrate x*exp(-x²) from 0 to ∞ (exact: 1/2) (romberg-quadrature (lambda (x) (* x (exp (- (* x x))))) (interval 0d0 :plusinf)) ; ⇒ 0.5000000000000001d0, 2.220446049250313d-16 ;; Integrate 1/(1+x²) from -∞ to ∞ (exact: π) (romberg-quadrature (lambda (x) (/ (1+ (* x x)))) (interval :minusinf :plusinf)) ; ⇒ 3.141592653589793d0, 2.220446049250313d-16

Custom Tolerance

Specify a custom error tolerance for faster computation or higher accuracy:

;; Lower precision for faster computation (romberg-quadrature #'exp (interval 0d0 1d0) :epsilon 1d-6) ; ⇒ 1.7182818284590429d0, 9.325873406851296d-7 ;; Higher precision (will use more iterations) (romberg-quadrature #'exp (interval 0d0 1d0) :epsilon 1d-12) ; ⇒ 1.7182818284590453d0, 5.551115123125783d-17 ;; Complex integrands with custom tolerance (romberg-quadrature (lambda (x) (sin (/ x))) (interval 0.1d0 1d0) :epsilon 1d-8) ; ⇒ 0.8639703768373046d0, 6.453172330487389d-9

Practical Applications

Example: Probability Distributions

;; Compute cumulative distribution function values
(defun normal-cdf (x &key (mean 0d0) (stddev 1d0))
  "Cumulative distribution function of the normal distribution"
  (let* ((z (/ (- x mean) stddev))
         (tail (romberg-quadrature
                (lambda (u)
                  (* (/ (sqrt (* 2 pi)))
                     (exp (- (* 0.5 u u)))))
                (interval 0d0 (abs z)))))
    (if (minusp z)
        (- 0.5d0 tail)
        (+ 0.5d0 tail))))

;; Standard normal CDF values
(normal-cdf 0d0)  ; ⇒ 0.5d0
(normal-cdf 1d0)  ; ⇒ 0.8413447460685429d0
(normal-cdf -1d0) ; ⇒ 0.15865525393145707d0
(normal-cdf 2d0)  ; ⇒ 0.9772498680518208d0

;; Compute probability between two values
(defun normal-probability (a b &key (mean 0d0) (stddev 1d0))
  "Probability that a normal random variable lies between a and b"
  (- (normal-cdf b :mean mean :stddev stddev)
     (normal-cdf a :mean mean :stddev stddev)))

(normal-probability -1d0 1d0) ; ⇒ 0.6826894921370859d0 (≈ 68.3%)
(normal-probability -2d0 2d0) ; ⇒ 0.9544997361036416d0 (≈ 95.4%)

Example: Arc Length Calculation

;; Compute arc length of a curve y = f(x) (defun arc-length (f df a b) "Arc length of curve y=f(x) from x=a to x=b, where df is f'(x)" (romberg-quadrature (lambda (x) (sqrt (1+ (expt (funcall df x) 2)))) (interval a b))) ;; Arc length of parabola y = x² from 0 to 1 (arc-length (lambda (x) (* x x)) ; f(x) = x² (lambda (x) (* 2 x)) ; f'(x) = 2x 0d0 1d0) ; ⇒ 1.4789428575445975d0, 5.551115123125783d-17 ;; Arc length of sine curve from 0 to π (arc-length #'sin ; f(x) = sin(x) #'cos ; f'(x) = cos(x) 0d0 pi) ; ⇒ 3.8201977382081133d0, 1.887379141862766d-15

Example: Expected Values

;; Compute expected value of a function under a probability distribution (defun expected-value (g pdf a b) "E[g(X)] where X has probability density function pdf on [a,b]" (romberg-quadrature (lambda (x) (* (funcall g x) (funcall pdf x))) (interval a b))) ;; Expected value of X² under uniform distribution on [0,1] (expected-value (lambda (x) (* x x)) ; g(x) = x² (lambda (x) 1d0) ; uniform pdf = 1 0d0 1d0) ; ⇒ 0.3333333333333333d0 (exact: 1/3) ;; Expected value of X under exponential distribution (expected-value (lambda (x) x) ; g(x) = x (lambda (x) (exp (- x))) ; exponential pdf 0d0 :plusinf) ; ⇒ 1.0000000000000002d0 (exact: 1)

Example: Fourier Coefficients

;; Compute Fourier coefficients of a periodic function
(defun fourier-coefficient (f n period &key (cosine t))
  "Compute the nth Fourier coefficient (cosine or sine) of function f"
  (let ((omega (* 2 pi (/ n period))))
    (* (/ 2d0 period)
       (romberg-quadrature
        (lambda (x)
          (* (funcall f x)
             (if cosine (cos (* omega x)) (sin (* omega x)))))
        (interval 0d0 period)))))

;; Fourier coefficients of a square wave
(defun square-wave (x)
  (if (< (mod x 2d0) 1d0) 1d0 -1d0))

(fourier-coefficient #'square-wave 1 2d0 :cosine nil) ; b₁
; ⇒ 1.2732395447351628d0 (exact: 4/π)
(fourier-coefficient #'square-wave 3 2d0 :cosine nil) ; b₃
; ⇒ 0.4244131815783876d0 (exact: 4/(3π))
(fourier-coefficient #'square-wave 2 2d0 :cosine nil) ; b₂ = 0
; ⇒ 4.440892098500626d-16 (≈ 0)

Advanced Usage

Example: Improper Integrals with Singularities

;; Handle integrals with removable singularities (defun integrate-with-singularity (f interval singularity-points &key (epsilon 1d-10)) "Integrate function with known singularities by splitting interval" (let* ((splits (sort (copy-list singularity-points) #'<)) (subintervals (split-interval interval splits)) (total 0d0) (total-error 0d0)) (loop for subinterval across subintervals do (multiple-value-bind (value error) (romberg-quadrature f subinterval :open t :epsilon epsilon) (incf total value) (incf total-error error))) (values total total-error))) ;; Example: ∫|sin(x)/x| dx from -π to π (singularity at x=0) (integrate-with-singularity (lambda (x) (if (zerop x) 1d0 (abs (/ (sin x) x)))) (interval (- pi) pi) '(0d0)) ; ⇒ 5.876481158479012d0, 3.552713678800501d-15

Example: Parameter-Dependent Integrals

;; Compute integrals that depend on a parameter
(defun gamma-incomplete (s x)
  "Lower incomplete gamma function γ(s,x)"
  (romberg-quadrature
   (lambda (u) (* (expt u (1- s)) (exp (- u))))
   (interval 0d0 x)
   :open (< s 1)))  ; use an open interval if s < 1

(gamma-incomplete 2d0 1d0)   ; ⇒ 0.2642411176571154d0
(gamma-incomplete 0.5d0 1d0) ; ⇒ 1.4936482656248541d0

;; Beta function B(a,b)
(defun beta-function (a b)
  "Beta function B(a,b) = ∫₀¹ t^(a-1)(1-t)^(b-1) dt"
  (romberg-quadrature
   (lambda (u) (* (expt u (1- a)) (expt (- 1 u) (1- b))))
   (interval 0d0 1d0)
   :open t))

(beta-function 2d0 3d0)     ; ⇒ 0.08333333333333334d0 (exact: 1/12)
(beta-function 0.5d0 0.5d0) ; ⇒ 3.1415926535897927d0 (exact: π)

Performance Notes

  1. Convergence Rate: Romberg quadrature has very fast convergence for smooth functions, typically achieving machine precision in 10-15 iterations.

  2. Coordinate Transformations: For semi-infinite intervals, the function applies transformations that may affect convergence for certain integrands.

  3. Singularities: Use :open t for integrands with endpoint singularities. For interior singularities, split the interval.

  4. Oscillatory Integrands: For highly oscillatory functions, consider using specialized methods or increasing max-iterations.

  5. Error Estimation: The returned error is an estimate based on Richardson extrapolation convergence. The actual error may differ.

  6. Numerical Stability: The implementation uses double-float arithmetic throughout for consistency and stability.

Notes on Usage

  1. Function Smoothness: Romberg quadrature works best for smooth (infinitely differentiable) functions. For functions with discontinuities or kinks, consider splitting the interval at problematic points.

  2. Interval Types: The method supports:

    • Finite intervals: (interval a b)
    • Semi-infinite: (interval a :plusinf) or (interval :minusinf b)
    • Infinite: (interval :minusinf :plusinf)
  3. Open vs Closed:

    • Closed (default): Uses trapezoidal rule, evaluates at endpoints
    • Open: Uses midpoint rule, avoids endpoint evaluation
  4. Convergence Criteria: The algorithm stops when the relative change between successive Richardson extrapolation steps is less than epsilon.

  5. Maximum Iterations: If convergence isn’t achieved within max-iterations, the function returns the best estimate with a larger error bound.

  6. Thread Safety: The function is thread-safe as it doesn’t use global state beyond the input parameters.
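The refinement-plus-extrapolation scheme described above can be sketched in a few lines. This is a simplified illustration of Romberg's method on a closed, finite interval; the package's romberg-quadrature additionally handles open intervals, infinite-interval transformations, and the epsilon-based convergence test (the name romberg-sketch is hypothetical):

```lisp
(defun romberg-sketch (f a b &optional (iterations 10))
  "Romberg integration of F over [a,b]: trapezoid refinements
plus Richardson extrapolation.  Returns the highest-order estimate."
  (let ((r (make-array (list iterations iterations) :initial-element 0d0))
        (h (- b a)))
    ;; R[0][0]: a single trapezoid over the whole interval
    (setf (aref r 0 0) (* 0.5d0 h (+ (funcall f a) (funcall f b))))
    (loop for i from 1 below iterations do
      (setf h (/ h 2))
      ;; Trapezoid refinement: reuse the previous estimate,
      ;; evaluating F only at the new (odd-index) midpoints
      (setf (aref r i 0)
            (+ (* 0.5d0 (aref r (1- i) 0))
               (* h (loop for k from 1 below (expt 2 i) by 2
                          sum (funcall f (+ a (* k h)))))))
      ;; Richardson extrapolation across the row
      (loop for j from 1 to i
            for factor = (expt 4 j)
            do (setf (aref r i j)
                     (/ (- (* factor (aref r i (1- j)))
                           (aref r (1- i) (1- j)))
                        (1- factor)))))
    (aref r (1- iterations) (1- iterations))))

(romberg-sketch (lambda (x) (* x x)) 0d0 1d0) ; ⇒ ≈ 1/3
```

Each extrapolation column cancels the next even-order error term of the trapezoid rule, which is why convergence is so fast for smooth integrands.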

Rootfinding

The Rootfinding package provides numerical root-finding algorithms for univariate functions with configurable convergence criteria. It currently implements the bisection method with automatic bracket validation. Features adjustable tolerance (interval width) and epsilon (function value) parameters, supports double-float precision, and returns detailed convergence information including the final bracket bounds and whether the root satisfies the epsilon criterion.

*rootfinding-epsilon*

*rootfinding-epsilon*

Dynamic variable that sets the default maximum absolute value of the function at the root. Default value is (expt double-float-epsilon 0.25).

*rootfinding-epsilon* ; ⇒ 1.1920928955078125d-4 ;; Temporarily change epsilon for higher accuracy (let ((*rootfinding-epsilon* 1e-10)) (root-bisection #'sin (interval 3d0 4d0))) ; ⇒ 3.141592653589793d0, -1.2246467991473532d-16, T, 3.141592653589793d0, 3.1415926535897936d0

*rootfinding-delta-relative*

*rootfinding-delta-relative*

Dynamic variable that sets the default relative interval width for rootfinding. Default value is (expt double-float-epsilon 0.25).

*rootfinding-delta-relative* ; ⇒ 1.1920928955078125d-4 ;; Use different relative tolerance (let ((*rootfinding-delta-relative* 1e-6)) (root-bisection #'identity (interval -1 2))) ; ⇒ 0.0, 0.0d0, T, -9.5367431640625d-7, 9.5367431640625d-7

root-bisection

(root-bisection f bracket &key delta epsilon)

Finds the root of function f within the given bracket using the bisection method.

Parameters:

  • f - a univariate function
  • bracket - an interval object containing the root
  • :delta - absolute tolerance for bracket width (defaults to relative tolerance × initial bracket width)
  • :epsilon - tolerance for function value at root (defaults to *rootfinding-epsilon*)

Returns five values:

  1. The root location
  2. The function value at the root
  3. Boolean indicating if |f(root)| ≤ epsilon
  4. Left endpoint of final bracket
  5. Right endpoint of final bracket
;; Test examples from rootfinding test file (let ((*rootfinding-delta-relative* 1e-6) (*num=-tolerance* 1d-2)) ;; Find root of identity function (root at 0) (root-bisection #'identity (interval -1 2))) ; ⇒ 0.0, 0.0d0, T, -9.5367431640625d-7, 9.5367431640625d-7 (let ((*rootfinding-delta-relative* 1e-6) (*num=-tolerance* 1d-2)) ;; Find root of (x-5)³ = 0 at x = 5 (root-bisection (lambda (x) (expt (- x 5) 3)) (interval -1 10))) ; ⇒ 5.000000476837158d0, 5.445199250513759d-14, T, 4.999999523162842d0, 5.000000476837158d0

Helper Functions

opposite-sign?

(opposite-sign? a b)

Tests whether two numbers have opposite signs (one positive, one negative).

(opposite-sign? -1 2) ; ⇒ T (opposite-sign? 1 2) ; ⇒ NIL (opposite-sign? -1 -2) ; ⇒ NIL (opposite-sign? 0 1) ; ⇒ NIL (zero is neither positive nor negative)

narrow-bracket?

(narrow-bracket? a b delta)

Tests whether the interval [a,b] is narrower than delta.

(narrow-bracket? 1.0 1.001 0.01) ; ⇒ T (narrow-bracket? 1.0 2.0 0.5) ; ⇒ NIL (narrow-bracket? -0.5 0.5 1.1) ; ⇒ T

near-root?

(near-root? f epsilon)

Tests whether |f| < epsilon, indicating a value close to a root.

(near-root? 0.0001 0.001) ; ⇒ T (near-root? 0.01 0.001) ; ⇒ NIL (near-root? -0.0001 0.001) ; ⇒ T (uses absolute value)

rootfinding-delta

(rootfinding-delta interval &optional delta-relative)

Computes the absolute tolerance from a relative tolerance and interval width.

(rootfinding-delta (interval 0d0 10d0)) ; ⇒ 0.0011920928955078125d0 (10 × default relative tolerance) (rootfinding-delta (interval -5d0 5d0) 1e-6) ; ⇒ 1.0d-5 (10 × 1e-6)
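The helper predicates above compose into the bisection loop itself. Below is a minimal sketch of the algorithm (illustrative only; the package's root-bisection additionally accepts an interval object and returns the five values described earlier — the name bisect-sketch is hypothetical):

```lisp
(defun bisect-sketch (f a b &key (delta 1d-6) (epsilon 1d-6))
  "Bisection: repeatedly halve [a,b], keeping the half where F changes sign."
  (let ((fa (funcall f a)))
    (assert (minusp (* fa (funcall f b))) () "Boundaries don't bracket 0.")
    (loop
      (let* ((m (/ (+ a b) 2))
             (fm (funcall f m)))
        (when (or (< (abs fm) epsilon)   ; value close enough to a root
                  (< (- b a) delta))     ; bracket narrow enough
          (return (values m fm)))
        (if (minusp (* fa fm))           ; sign change in [a, m]?
            (setf b m)
            (setf a m fa fm))))))

(bisect-sketch #'sin 3d0 4d0) ; ⇒ ≈ π
```

Each iteration halves the bracket, so reaching an absolute width delta from an initial width w takes about log₂(w/delta) function evaluations.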

Usage Examples

Example: Finding Roots of Polynomials

;; Find root of x² - 4 = 0 in [0, 3] (exact root: x = 2) (defun f1 (x) (- (* x x) 4)) (root-bisection #'f1 (interval 0d0 3d0)) ; ⇒ 2.0000000298023224d0, 1.1920928955078125d-7, T, 1.9999999701976776d0, 2.0000000298023224d0 ;; Find root of x³ - 2x - 5 = 0 in [2, 3] (exact root ≈ 2.094551) (defun f2 (x) (- (* x x x) (* 2 x) 5)) (root-bisection #'f2 (interval 2d0 3d0)) ; ⇒ 2.0945514519214863d0, -5.906386491214556d-8, T, 2.094551417231559d0, 2.0945514866109134d0

Example: Transcendental Equations

;; Find where cos(x) = x (fixed point, exact ≈ 0.739085)
(defun f3 (x) (- (cos x) x))
(root-bisection #'f3 (interval 0d0 1d0))
; ⇒ 0.7390851378440857d0, -5.896262174291357d-8, T, 0.7390850857645273d0, 0.7390851899236441d0

;; Find root of e^x = 3x (has two roots)
(defun f4 (x) (- (exp x) (* 3 x)))

;; First root in [0, 1]
(root-bisection #'f4 (interval 0d0 1d0))
; ⇒ 0.6190612792968751d0, 5.872900560954019d-8, T, 0.6190612167119979d0, 0.6190613418817521d0

;; Second root in [1, 2]
(root-bisection #'f4 (interval 1d0 2d0))
; ⇒ 1.512134553491497d0, -5.823208439077178d-8, T, 1.512134492397309d0, 1.5121346145856858d0

Example: Custom Tolerances

;; High precision root finding for x - sin(x) = 0
(root-bisection (lambda (x) (- x (sin x)))
                (interval 0.1d0 1d0)
                :epsilon 1d-12
                :delta 1d-12)
; ⇒ 0.5110276571540832d0, -5.551115123125783d-17, T, 0.5110276571540831d0, 0.5110276571540832d0

;; Lower precision for faster computation
(root-bisection (lambda (x) (- x (sin x)))
                (interval 0.1d0 1d0)
                :epsilon 1d-3
                :delta 1d-3)
; ⇒ 0.5107421875d0, -0.0002839813765924419d0, T, 0.509765625d0, 0.51171875d0

Example: Error Handling

;; Function without roots in bracket signals error
(handler-case
    (root-bisection (lambda (x) (+ 1 (* x x))) ; always positive
                    (interval -1d0 1d0))
  (error (e) (format nil "Error: ~A" e)))
; ⇒ "Error: Boundaries don't bracket 0."

;; Bracket must contain sign change
(let ((f (lambda (x) (- x 5))))
  (handler-case
      (root-bisection f (interval 6d0 10d0)) ; f > 0 throughout
    (error () "No sign change in bracket")))
; ⇒ "No sign change in bracket"

Practical Applications

Example: Finding Interest Rates

;; Find interest rate r such that present value = 1000
;; for payments of 100/year for 15 years: PV = 100 × [(1-(1+r)^-15)/r] = 1000
(defun pv-annuity (r)
  (if (zerop r)
      500.0d0 ; limiting case: PV → 100 × 15 = 1500 as r → 0, minus the 1000 target
      (- (* 100 (/ (- 1 (expt (+ 1 r) -15)) r)) 1000)))
(root-bisection #'pv-annuity (interval 0.01d0 0.15d0))
; ⇒ 0.0579444758594036d0, -1.875277668157615d-5, T, 0.057944431900978086d0, 0.057944519817829134d0
; Interest rate ≈ 5.79%

Example: Solving Optimization Conditions

;; Find critical points by solving f'(x) = 0
;; For f(x) = x³ - 3x² - 9x + 5, f'(x) = 3x² - 6x - 9
(defun derivative (x) (- (* 3 x x) (* 6 x) 9))

;; Find critical point in [-2, 0] (exact: x = -1)
(root-bisection #'derivative (interval -2d0 0d0))
; ⇒ -0.9999999701976776d0, -8.940696716308594d-8, T, -1.0000000298023224d0, -0.9999999701976776d0

;; Find critical point in [2, 4] (exact: x = 3)
(root-bisection #'derivative (interval 2d0 4d0))
; ⇒ 3.0000000596046448d0, 1.7881393432617188d-7, T, 2.9999999403953552d0, 3.0000000596046448d0

Example: Inverse Function Evaluation

;; Find x such that sinh(x) = 2
(defun sinh-eqn (x) (- (sinh x) 2))
(root-bisection #'sinh-eqn (interval 1d0 2d0))
; ⇒ 1.4436354846954346d0, -5.92959405947104d-8, T, 1.4436354227364063d0, 1.443635546654463d0
;; Verify: (sinh 1.4436354846954346d0) ⇒ 1.9999999407040594d0

;; Find x such that log(x) + x = 2
(defun log-eqn (x) (- (+ (log x) x) 2))
(root-bisection #'log-eqn (interval 0.1d0 2d0))
; ⇒ 1.5571455955505562d0, 6.041633141658837d-9, T, 1.5571455433964732d0, 1.5571456477046013d0

Performance and Convergence

Example: Convergence Analysis

;; Track iterations by wrapping function
(let ((iterations 0))
  (flet ((counting-f (x)
           (incf iterations)
           (- (* x x) 2)))
    (multiple-value-bind (root froot within-epsilon a b)
        (root-bisection #'counting-f (interval 1d0 2d0) :delta 1d-10)
      (declare (ignore froot within-epsilon))
      (format t "Root: ~A~%Function calls: ~D~%Final bracket width: ~A~%"
              root iterations (- b a)))))
; Prints:
; Root: 1.4142135623730951d0
; Function calls: 36
; Final bracket width: 7.105427357601002d-13

;; Bisection converges linearly - each iteration halves the bracket

Notes on Usage

  1. Bracket Requirement: The function must have opposite signs at the bracket endpoints. The algorithm will signal an error otherwise.

  2. Convergence Criteria: The algorithm stops when either:

    • The bracket width is less than delta, OR
    • |f(root)| < epsilon
  3. Return Values: The third return value indicates which stopping criterion was met:

    • T means |f(root)| < epsilon (found accurate root)
    • NIL means bracket is narrow but root may not be accurate
  4. Numerical Precision: All computations use double-float arithmetic for consistency.

  5. Multiple Roots: If multiple roots exist in the bracket, bisection will find one of them (not necessarily any particular one).

  6. Performance: Bisection has guaranteed convergence but is slower than methods like Newton-Raphson. It requires approximately log₂(initial_bracket/tolerance) iterations.

  7. Robustness: Bisection is very robust - it will always converge if the initial bracket contains a root and the function is continuous.
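
The iteration estimate in point 6 can be checked with a one-liner in standard Common Lisp (cl:log accepts an optional base argument). A minimal sketch for a width-1 bracket reduced to a tolerance of 1d-10:

```lisp
;; Bisection halves the bracket each step, so a width-1 bracket
;; reaches a tolerance of 1d-10 after ceil(log2(1/1d-10)) steps
(ceiling (log (/ 1d0 1d-10) 2)) ; ⇒ 34
```

This is consistent with the convergence-analysis example above, where 36 function calls were observed for the same bracket and tolerance (the two extra evaluations are the initial endpoints).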

Test-Utilities

The Test-Utilities package provides utilities for testing accuracy of mathematical functions against reference values. Features functions to compare implementations, measure relative errors, and generate statistical reports including min/max/mean errors, variance, and RMS. Supports testing against known values, reference implementations, or pre-computed vectors. Returns detailed test-results structure with error statistics and worst-case identification.

test-results

test-results

Structure containing statistical differences between reference values and computed values.

Fields:

  • worst-case - integer row index where the worst error occurred
  • min-error - smallest relative error found (double-float)
  • max-error - largest relative error found (double-float)
  • mean-error - mean of all errors (double-float)
  • test-count - number of test cases (integer)
  • variance0 - population variance of the errors (double-float)
  • variance1 - sample (unbiased) variance of the errors (double-float)
  • rms - Root Mean Square of the errors (double-float)
;; Create and access test results
(let ((results (make-test-results :worst-case 5
                                  :min-error 1d-10
                                  :max-error 1d-6
                                  :mean-error 1d-8
                                  :test-count 100
                                  :variance0 1d-12
                                  :variance1 1.01d-12
                                  :rms 1d-7)))
  (list (worst-case results)  ; ⇒ 5
        (min-error results)   ; ⇒ 1d-10
        (max-error results)   ; ⇒ 1d-6
        (mean-error results)  ; ⇒ 1d-8
        (test-count results)  ; ⇒ 100
        (variance0 results)   ; ⇒ 1d-12
        (variance1 results)   ; ⇒ 1.01d-12
        (rms results)))       ; ⇒ 1d-7

test-fn

(test-fn test-name function data)

Compares a function against reference data containing input and expected output values. Returns a test-results structure with error statistics.

Parameters:

  • test-name - string or symbol naming the test (for error messages)
  • function - function to test (should accept arguments from data)
  • data - 2D array where each row contains [input₁ … inputₙ expected-output]
;; Example: Testing a square root implementation
(defparameter *sqrt-test-data*
  #2A((1.0 1.0)
      (4.0 2.0)
      (9.0 3.0)
      (16.0 4.0)
      (25.0 5.0)))

(test-fn "sqrt" #'sqrt *sqrt-test-data*)
; ⇒ #S(TEST-RESULTS
;      :WORST-CASE 0
;      :MIN-ERROR 0.0d0
;      :MAX-ERROR 0.0d0
;      :MEAN-ERROR 0.0d0
;      :TEST-COUNT 5
;      :VARIANCE0 0.0d0
;      :VARIANCE1 0.0d0
;      :RMS 0.0d0)

;; Testing with small errors
(defun approx-sqrt (x)
  (* (sqrt x) (+ 1 (* 1d-6 (random 2.0) (- (random 2.0))))))

(let ((results (test-fn "approx-sqrt" #'approx-sqrt *sqrt-test-data*)))
  (format t "Max error: ~,2e~%" (max-error results))
  (format t "RMS error: ~,2e~%" (rms results)))
; Max error: 1.00e-6
; RMS error: 5.77e-7

compare-fns

(compare-fns test-name function reference-function data)

Compares two function implementations by evaluating both on the same inputs and measuring relative differences.

Parameters:

  • test-name - string or symbol naming the comparison
  • function - function under test
  • reference-function - reference implementation to compare against
  • data - 2D array where each row contains input arguments
;; Example: Compare two exponential implementations
(defparameter *exp-test-inputs*
  #2A((0.0) (1.0) (-1.0) (10.0) (-10.0)))

;; Compare built-in exp with Taylor series approximation
(defun exp-taylor (x)
  "Simple Taylor series approximation of exp(x)"
  (let ((sum 1.0d0)
        (term 1.0d0))
    (loop for n from 1 to 20
          do (setf term (* term (/ x n)))
             (incf sum term))
    sum))

(compare-fns "exp-taylor" #'exp-taylor #'exp *exp-test-inputs*)
; ⇒ #S(TEST-RESULTS
;      :WORST-CASE 3
;      :MIN-ERROR 0.0d0
;      :MAX-ERROR 2.688117141816135d-10
;      :MEAN-ERROR 5.376234283632267d-11
;      :TEST-COUNT 5
;      :VARIANCE0 1.1515628172868944d-20
;      :VARIANCE1 1.4394535216086182d-20
;      :RMS 1.2030913128542057d-10)

compare-vectors

(compare-vectors test-name vector reference-vector)

Compares two pre-computed vectors of values element by element.

Parameters:

  • test-name - string or symbol naming the comparison
  • vector - computed values to test
  • reference-vector - reference values to compare against
;; Example: Compare precomputed function values
(let* ((x-values (vector 0.0 0.1 0.2 0.3 0.4 0.5))
       (computed (map 'vector (lambda (x) (sin x)) x-values))
       (reference (vector 0.0d0
                          0.09983341664682815d0
                          0.19866933079506122d0
                          0.29552020666133955d0
                          0.38941834230865045d0
                          0.47942553860420306d0)))
  (compare-vectors "sin-values" computed reference))
; ⇒ #S(TEST-RESULTS
;      :WORST-CASE 0
;      :MIN-ERROR 0.0d0
;      :MAX-ERROR 2.220446049250313d-16
;      :MEAN-ERROR 3.7007434154172195d-17
;      :TEST-COUNT 6
;      :VARIANCE0 8.630170314869144d-33
;      :VARIANCE1 1.0356204377842972d-32
;      :RMS 9.291498646471065d-17)

;; Example with larger errors
(let ((computed (vector 1.0 2.001 2.999 4.002))
      (reference (vector 1.0 2.0 3.0 4.0)))
  (compare-vectors "small-errors" computed reference))
; Shows relative errors around 0.0005

Practical Examples

Example: Testing Special Function Implementations

;; Test a Bessel function implementation
(defparameter *bessel-j0-data*
  ;; x, J₀(x) reference values
  #2A((0.0 1.0)
      (0.5 0.93846980724081290423)
      (1.0 0.76519768655796655145)
      (2.0 0.22389077914123566805)
      (5.0 -0.17759677131433830435)
      (10.0 -0.24593576445134833520)))

(defun my-bessel-j0 (x)
  "Placeholder for user's Bessel J0 implementation"
  ;; In reality, this would be the function being tested
  (cos x)) ; Wrong implementation for demonstration

(let ((results (test-fn "bessel-j0" #'my-bessel-j0 *bessel-j0-data*)))
  (format t "Testing Bessel J0 implementation:~%")
  (format t "  Test count: ~D~%" (test-count results))
  (format t "  Max error: ~,2e~%" (max-error results))
  (format t "  RMS error: ~,2e~%" (rms results))
  (format t "  Worst case at row: ~D~%" (worst-case results)))
; Testing Bessel J0 implementation:
;   Test count: 6
;   Max error: 1.24e+0
;   RMS error: 5.91e-1
;   Worst case at row: 1

Example: Comparing Optimization Algorithms

;; Compare different matrix multiplication algorithms
(defun naive-matmul (a b)
  "Simple matrix multiplication"
  (let* ((m (array-dimension a 0))
         (n (array-dimension b 1))
         (k (array-dimension a 1))
         (result (make-array (list m n) :element-type 'double-float)))
    (dotimes (i m)
      (dotimes (j n)
        (setf (aref result i j)
              (loop for p below k
                    sum (* (aref a i p) (aref b p j))))))
    result))

;; Generate test cases for 2x2 matrices
(defparameter *matmul-test-data*
  (let ((data (make-array '(10 8) :element-type 'double-float)))
    (loop for i below 10
          do ;; Random 2x2 matrices A and B (flattened)
             (loop for j below 8
                   do (setf (aref data i j) (random 10.0d0))))
    data))

;; Assume we have a reference implementation
(defun reference-matmul (a b)
  ;; Same as naive but serves as reference
  (naive-matmul a b))

;; Test wrapper that reconstructs matrices
;; (let* is required: result depends on the bindings of a and b)
(defun matmul-wrapper (a11 a12 a21 a22 b11 b12 b21 b22)
  (let* ((a (make-array '(2 2) :initial-contents `((,a11 ,a12) (,a21 ,a22))))
         (b (make-array '(2 2) :initial-contents `((,b11 ,b12) (,b21 ,b22))))
         (result (naive-matmul a b)))
    ;; Return flattened result for comparison
    (vector (aref result 0 0) (aref result 0 1)
            (aref result 1 0) (aref result 1 1))))

(compare-fns "matmul" #'matmul-wrapper #'matmul-wrapper *matmul-test-data*)
; Shows near-zero errors for identical implementations

Example: Validating Numerical Integration

;; Test numerical integration against known integrals
(defparameter *integration-test-data*
  ;; Function parameters and expected integral value
  ;; ∫₀¹ xⁿ dx = 1/(n+1)
  #2A((0.0 1.0)                  ; ∫₀¹ x⁰ dx = 1
      (1.0 0.5)                  ; ∫₀¹ x¹ dx = 1/2
      (2.0 0.333333333333333d0)  ; ∫₀¹ x² dx = 1/3
      (3.0 0.25)                 ; ∫₀¹ x³ dx = 1/4
      (4.0 0.2)))                ; ∫₀¹ x⁴ dx = 1/5

(defun integrate-power (n)
  "Integrate x^n from 0 to 1 using simple trapezoid rule"
  (let ((steps 1000)
        (sum 0.0d0))
    (loop for i from 1 below steps
          for x = (/ i (float steps 1d0))
          do (incf sum (expt x n)))
    (/ sum steps)))

(let ((results (test-fn "power-integration" #'integrate-power *integration-test-data*)))
  (format t "Integration test results:~%")
  (format t "  Mean relative error: ~,2e~%" (mean-error results))
  (format t "  Maximum error: ~,2e~%" (max-error results))
  (when (< (max-error results) 1d-3)
    (format t "  ✓ All tests passed with < 0.1% error~%")))

Example: Cross-Platform Consistency

;; Compare results across different numeric types
(defun test-numeric-consistency ()
  "Test that algorithms give consistent results across numeric types"
  (let ((test-values #(0.1 0.5 1.0 2.0 10.0)))
    ;; Use local functions rather than defun at non-top-level
    (flet ((log-single (x) (log (coerce x 'single-float)))
           (log-double (x) (log (coerce x 'double-float))))
      ;; Compare implementations
      (let ((single-results (map 'vector #'log-single test-values))
            (double-results (map 'vector #'log-double test-values)))
        (compare-vectors "single-vs-double-log" single-results double-results)))))

(test-numeric-consistency)
; Shows relative errors around single-float precision (1e-7)

Notes on Usage

  1. Relative Error Metric: The package uses num-delta for computing relative errors, which handles both small and large values appropriately.

  2. Array Format: Test data arrays should have inputs in the first columns and expected output in the last column for test-fn.

  3. Statistical Measures:

    • variance0 is the population variance (divides by n)
    • variance1 is the sample variance (divides by n-1)
    • rms provides a single measure of typical error magnitude
  4. Error Identification: The worst-case field helps identify which test case needs the most attention.

  5. Function Signatures: When testing multi-argument functions, ensure the data array has the correct number of columns.

  6. Performance: For large test suites, consider breaking tests into smaller batches to get intermediate results.

  7. Integration: This package was designed to support the special-functions library but works well for testing any numerical computations.
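
The three spread measures in point 3 can be reproduced in plain Common Lisp. A minimal sketch, assuming a small vector of relative errors (the names below are illustrative, not accessors from this package):

```lisp
;; For errors e₁…eₙ with mean m:
;;   variance0 = Σ(eᵢ-m)²/n,  variance1 = Σ(eᵢ-m)²/(n-1),  rms = √(Σeᵢ²/n)
(let* ((errors #(1d0 2d0 3d0))
       (n (length errors))
       (mean (/ (reduce #'+ errors) n))
       (ss (reduce #'+ (map 'vector (lambda (e) (expt (- e mean) 2)) errors))))
  (list (/ ss n)        ; variance0: divide by n   ⇒ 2/3
        (/ ss (1- n))   ; variance1: divide by n-1 ⇒ 1
        (sqrt (/ (reduce #'+ (map 'vector (lambda (e) (* e e)) errors)) n))))
; variance0 = 2/3, variance1 = 1, rms = √(14/3) ≈ 2.16
```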

Utilities

A collection of utilities to work with floating point values. Optimised for double-float. Provides type conversion functions, vector creation utilities, sequence generation, binary search, and utility macros including currying, multiple bindings, and conditional splicing. Features specialized array types for fixnum, boolean, and floating-point vectors with conversion functions.

Hash Table Utilities

gethash*

(gethash* key hash-table &optional (datum "Key not found.") &rest arguments)

Like gethash, but checks that key is present and raises an error if not found.

(let ((ht (make-hash-table :test 'equal)))
  (setf (gethash "key" ht) "value")
  (gethash* "key" ht))
; ⇒ "value"

(let ((ht (make-hash-table :test 'equal)))
  (gethash* "missing" ht "Key ~A not found" "missing"))
; ⇒ ERROR: Key missing not found

Conditional Splicing

splice-when

(splice-when test &body forms)

Similar to when, but wraps the result in a list for use with splicing operators.

(let ((add-middle t))
  `(start ,@(splice-when add-middle 'middle) end))
; ⇒ (START MIDDLE END)

(let ((add-middle nil))
  `(start ,@(splice-when add-middle 'middle) end))
; ⇒ (START END)

splice-awhen

(splice-awhen test &body forms)

Anaphoric version of splice-when that binds the test result to it.

(let ((value 42))
  `(result ,@(splice-awhen value `(found ,it))))
; ⇒ (RESULT FOUND 42)

(let ((value nil))
  `(result ,@(splice-awhen value `(found ,it))))
; ⇒ (RESULT)

Functional Utilities

curry*

(curry* function &rest arguments)

Currying macro that accepts * as placeholders for arguments to be supplied later.

(funcall (curry* + 5 *) 3)         ; ⇒ 8
(funcall (curry* list 'a * 'c) 'b) ; ⇒ (A B C)
(funcall (curry* - * 3) 10)        ; ⇒ 7

;; Multiple placeholders
(funcall (curry* + * * 5) 2 3)     ; ⇒ 10

Type Checking

check-types

(check-types (&rest arguments) type)

Applies check-type to multiple places of the same type.

(let ((a 1.0d0) (b 2.0d0) (c 3.0d0))
  (check-types (a b c) double-float)
  (+ a b c))
; ⇒ 6.0d0

(let ((x 1) (y 2.0d0))
  (check-types (x y) double-float))
; ⇒ ERROR: The value of X is 1, which is not of type DOUBLE-FLOAT

Multiple Bindings

define-with-multiple-bindings

(define-with-multiple-bindings macro &key (plural) (docstring))

Defines a version of a macro that accepts multiple bindings as a list.

;; Example usage (typically used to create macros like let+s from let+)
(define-with-multiple-bindings let+ :plural let+s)

;; This creates a let+s macro that can be used like:
(let+s ((x 1)
        ((&plist y z) '(:y 2 :z 3)))
  (+ x y z))
; ⇒ 6

Numeric Predicates

within?

(within? left value right)

Returns non-nil if value is in the interval [left, right).

(within? 0 0.5 1) ; ⇒ T
(within? 0 1 1)   ; ⇒ NIL (right boundary exclusive)
(within? -1 0 1)  ; ⇒ T
(within? 5 3 10)  ; ⇒ NIL

fixnum?

(fixnum? object)

Checks if object is of type fixnum.

(fixnum? 42)   ; ⇒ T
(fixnum? 3.14) ; ⇒ NIL
(fixnum? most-positive-fixnum)      ; ⇒ T
(fixnum? (1+ most-positive-fixnum)) ; ⇒ NIL

Type Definitions

simple-fixnum-vector

simple-fixnum-vector

Type definition for simple one-dimensional arrays of fixnums.

(typep #(1 2 3) 'simple-fixnum-vector) ; ⇒ T (implementation-dependent)
(make-array 5 :element-type 'fixnum)   ; Creates a simple-fixnum-vector

simple-boolean-vector

simple-boolean-vector

Type definition for simple one-dimensional arrays of booleans.

(let ((vec (make-array 3 :initial-contents '(t nil t))))
  (typep vec 'simple-boolean-vector))
; ⇒ T (if all elements are boolean)

simple-single-float-vector

simple-single-float-vector

Type definition for simple one-dimensional arrays of single-floats.

(make-array 3 :element-type 'single-float
              :initial-contents '(1.0 2.0 3.0))
; Creates a simple-single-float-vector

simple-double-float-vector

simple-double-float-vector

Type definition for simple one-dimensional arrays of double-floats.

(make-array 3 :element-type 'double-float
              :initial-contents '(1.0d0 2.0d0 3.0d0))
; Creates a simple-double-float-vector

Type Conversion Functions

as-simple-fixnum-vector

(as-simple-fixnum-vector sequence &optional copy?)

Converts sequence to a simple-fixnum-vector.

(as-simple-fixnum-vector '(1 2 3)) ; ⇒ #(1 2 3)
(as-simple-fixnum-vector #(4 5 6)) ; ⇒ #(4 5 6)

;; With copy flag
(let ((original #(1 2 3)))
  (eq original (as-simple-fixnum-vector original)))
; ⇒ T

(let ((original #(1 2 3)))
  (eq original (as-simple-fixnum-vector original t)))
; ⇒ NIL

as-bit-vector

(as-bit-vector vector)

Converts a vector to a bit vector, mapping non-nil to 1 and nil to 0.

(as-bit-vector #(t nil t nil t))       ; ⇒ #*10101
(as-bit-vector '(1 nil 0 nil "hello")) ; ⇒ #*10101
(as-bit-vector #(nil nil nil))         ; ⇒ #*000

as-double-float

(as-double-float number)

Converts a number to double-float.

(as-double-float 5)    ; ⇒ 5.0d0
(as-double-float 1/2)  ; ⇒ 0.5d0
(as-double-float 3.14) ; ⇒ 3.14d0 (converted to double)

with-double-floats

(with-double-floats bindings &body body)

Macro that coerces values to double-float and binds them to variables.

(with-double-floats ((a 1)
                     (b 1/2)
                     (c 3.14))
  (list a b c))
; ⇒ (1.0d0 0.5d0 3.14d0)

;; Variable name can be inferred
(let ((x 5) (y 2))
  (with-double-floats (x y)
    (/ x y)))
; ⇒ 2.5d0

as-simple-double-float-vector

(as-simple-double-float-vector sequence &optional copy?)

Converts sequence to a simple-double-float-vector.

(as-simple-double-float-vector '(1 2 3))   ; ⇒ #(1.0d0 2.0d0 3.0d0)
(as-simple-double-float-vector #(1.5 2.5)) ; ⇒ #(1.5d0 2.5d0)
(as-simple-double-float-vector '(1/2 1/3)) ; ⇒ #(0.5d0 0.3333333333333333d0)

Vector Creation

make-vector

(make-vector element-type &rest initial-contents)

Creates a vector with specified element type and initial contents.

(make-vector 'fixnum 1 2 3 4)        ; ⇒ #(1 2 3 4)
(make-vector 'double-float 1.0 2.0)  ; ⇒ #(1.0d0 2.0d0)
(make-vector 'character #\a #\b #\c) ; ⇒ #(#\a #\b #\c)

generate-sequence

(generate-sequence result-type size function)

Creates a sequence by repeatedly calling function.

(generate-sequence 'vector 5 (lambda () (random 10)))
; ⇒ #(3 7 1 9 2) ; Random values

(generate-sequence '(vector double-float) 3 (lambda () (random 1.0d0)))
; ⇒ #(0.23d0 0.87d0 0.45d0) ; Random double-floats

(let ((counter 0))
  (generate-sequence 'list 4 (lambda () (incf counter))))
; ⇒ (1 2 3 4)

Utility Functions

expanding

(expanding &body body)

Expands body at macro-expansion time. Useful for code generation.

;; Typically used in macro definitions for programmatic code generation
(defmacro make-accessors (slots)
  (expanding
    `(progn
       ,@(loop for slot in slots
               collect `(defun ,(intern (format nil "GET-~A" slot)) (obj)
                          (slot-value obj ',slot))))))

bic

(bic a b)

Biconditional function. Returns true if both arguments have the same truth value.

(bic t t)     ; ⇒ T
(bic nil nil) ; ⇒ T
(bic t nil)   ; ⇒ NIL
(bic nil t)   ; ⇒ NIL

;; Useful for logical equivalence testing
(bic (> 5 3) (< 2 4)) ; ⇒ T (both true)
(bic (> 5 3) (< 4 2)) ; ⇒ NIL (different truth values)

binary-search

(binary-search sorted-reals value)

Performs binary search on a sorted vector of real numbers.

(let ((sorted-vec #(1.0 3.0 5.0 7.0 9.0)))
  (binary-search sorted-vec 5.0))
; ⇒ 2 (index where 5.0 would go)

(let ((sorted-vec #(1.0 3.0 5.0 7.0 9.0)))
  (binary-search sorted-vec 4.0))
; ⇒ 1 (between indices 1 and 2)

(let ((sorted-vec #(1.0 3.0 5.0 7.0 9.0)))
  (binary-search sorted-vec 0.0))
; ⇒ NIL (below minimum)

(let ((sorted-vec #(1.0 3.0 5.0 7.0 9.0)))
  (binary-search sorted-vec 10.0))
; ⇒ T (above maximum)

Generic Conversion

as-alist

(as-alist object)

Generic function to convert objects to association lists. Methods defined for various types.

;; Default behavior depends on object type
;; Hash tables convert key-value pairs to an alist
(let ((ht (make-hash-table :test 'equal)))
  (setf (gethash "a" ht) 1
        (gethash "b" ht) 2)
  (as-alist ht))
; ⇒ (("a" . 1) ("b" . 2)) ; Order may vary

as-plist

(as-plist object)

Generic function to convert objects to property lists. Default method uses as-alist.

;; Default implementation converts through alist
(let ((ht (make-hash-table :test 'equal)))
  (setf (gethash "a" ht) 1
        (gethash "b" ht) 2)
  (as-plist ht))
; ⇒ ("a" 1 "b" 2) ; Order may vary

Practical Examples

Example: Type-Safe Vector Operations

;; Create and manipulate typed vectors efficiently
(let* ((indices (as-simple-fixnum-vector '(0 1 2 3 4)))
       (values (as-simple-double-float-vector '(0.0 1.0 1.4 1.7 2.0))))
  (with-double-floats ((threshold 1.5))
    (loop for i across indices
          for v across values
          when (>= v threshold)
            collect (cons i v))))
; ⇒ ((3 . 1.7d0) (4 . 2.0d0))

Example: Functional Programming with Currying

;; Create specialized functions using curry*
;; (the curried closures are values, so they are called with funcall)
(let* ((add-tax (curry* * 1.08 *)) ; 8% tax
       (format-currency (curry* format nil "$~,2F" *))
       (prices '(10.00 25.50 99.99)))
  (mapcar (lambda (price)
            (funcall format-currency (funcall add-tax price)))
          prices))
; ⇒ ("$10.80" "$27.54" "$107.99")

Example: Conditional List Building

;; Build lists conditionally using splice-when
(defun make-command (base-cmd &key verbose debug output-file)
  `(,base-cmd
    ,@(splice-when verbose "--verbose")
    ,@(splice-when debug "--debug")
    ,@(splice-awhen output-file `("--output" ,it))))

(make-command "process" :verbose t :output-file "result.txt")
; ⇒ ("process" "--verbose" "--output" "result.txt")

(make-command "process" :debug t)
; ⇒ ("process" "--debug")

Example: Binary Search for Interpolation

;; Use binary search for table lookup with interpolation
(defun interpolate-table (x-values y-values x)
  (let ((index (binary-search x-values x)))
    (cond ((null index) (aref y-values 0))                         ; Below range
          ((eq index t) (aref y-values (1- (length y-values))))    ; Above range
          (t ; Interpolate between points
           (let* ((i index)
                  (x1 (aref x-values i))
                  (x2 (aref x-values (1+ i)))
                  (y1 (aref y-values i))
                  (y2 (aref y-values (1+ i)))
                  (alpha (/ (- x x1) (- x2 x1))))
             (+ y1 (* alpha (- y2 y1))))))))

(let ((x-vals #(0.0 1.0 2.0 3.0 4.0))
      (y-vals #(0.0 1.0 4.0 9.0 16.0))) ; y = x²
  (list (interpolate-table x-vals y-vals 1.5)   ; Between 1 and 2
        (interpolate-table x-vals y-vals 2.5))) ; Between 2 and 3
; ⇒ (2.5 6.5)

Example: Sequence Generation Patterns

;; Generate sequences with different patterns
;; (psetf assigns a and b in parallel, as Fibonacci requires)
(let* ((fibonacci
         (let ((a 1) (b 1))
           (generate-sequence 'vector 10
                              (lambda ()
                                (let ((result a))
                                  (psetf a b
                                         b (+ a b))
                                  result)))))
       (powers-of-2
         (generate-sequence 'vector 8
                            (let ((power 0))
                              (lambda ()
                                (prog1 (expt 2 power)
                                  (incf power))))))
       (random-bools
         (generate-sequence 'vector 5
                            (lambda () (< (random 1.0) 0.5)))))
  (list fibonacci powers-of-2 (as-bit-vector random-bools)))
; ⇒ (#(1 1 2 3 5 8 13 21 34 55)
;    #(1 2 4 8 16 32 64 128)
;    #*10110) ; Random bit pattern

Notes on Usage

  1. Type Optimization: Use specific vector types like simple-double-float-vector for better performance in numeric computations.

  2. Memory Efficiency: The copy? parameter in conversion functions controls whether data is copied or shared.

  3. Currying with Placeholders: curry* uses * as placeholders, making it more flexible than traditional currying.

  4. Binary Search Semantics: Returns the insertion point for values not found, nil for values below range, t for values above range.

  5. Conditional Splicing: Use splice-when and splice-awhen with backquote for building lists conditionally.

  6. Type Checking: check-types provides a convenient way to validate multiple variables of the same type.

  7. Sequence Generation: generate-sequence is more flexible than make-sequence when you need computed initial values.

  8. Double-Float Preference: The package emphasizes double-float precision for numerical stability in scientific computing.

5.5 - Linear Algebra

Linear Algebra for Common Lisp

Overview

LLA (Lisp Linear Algebra) is a high-level Common Lisp library for numerical linear algebra operations. It provides a Lisp-friendly interface to BLAS and LAPACK libraries, allowing you to work with matrices and vectors using Lisp’s native array types while leveraging the performance of optimized numerical libraries.

The library is designed to work with dense matrices (rank-2 arrays) containing numerical values. While categorical variables can be integer-coded if needed, LLA is primarily intended for continuous numerical data.

Setup

lla requires a BLAS and LAPACK shared library. These may be available via your operating system's package manager, or you can download OpenBLAS, which includes precompiled binaries for MS Windows.

If you’re working on UNIX or Linux and have the BLAS library installed, LLA should ‘just work’. If you’ve installed to a custom location, or are on MS Windows, you’ll need to tell LLA where your libraries are.

configuration

LLA can be configured before loading by setting the cl-user::*lla-configuration* variable. This allows you to specify custom library paths and enable various optimizations:

(defvar cl-user::*lla-configuration*
  '(:libraries ("s:/src/lla/lib/libopenblas.dll")))

The configuration accepts the following options:

  • :libraries - List of paths to BLAS/LAPACK libraries
  • :int64 - Use 64-bit integers (default: nil)
  • :efficiency-warnings - Enable efficiency warnings (default: nil)
    • :array-type - Warn when array types need elementwise checking
    • :array-conversion - Warn when arrays need copying for foreign calls

Use the location specific to your system.
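
For instance, on a Debian-style Linux system with OpenBLAS installed from the package manager, the configuration might look like the following. The path is a typical install location, not a guarantee; check where your distribution puts the shared library.

```lisp
;; Hypothetical Linux path -- verify with your package manager
(defvar cl-user::*lla-configuration*
  '(:libraries ("/usr/lib/x86_64-linux-gnu/libopenblas.so")))
```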

loading LLA

To load lla:

(asdf:load-system :lla)
(use-package 'lla) ; access to the symbols

getting started

To make working with matrices easier, we’re going to use the matrix-shorthand library. Load it like so:

(use-package :num-utils.matrix-shorthand)

Here’s a simple example demonstrating matrix multiplication:

(let ((a (mx 'lla-double
           (1 2)
           (3 4)
           (5 6)))
      (b2 (vec 'lla-double 1 2)))
  (mm a b2))
; => #(5.0d0 11.0d0 17.0d0)

The mx macro creates a matrix, vec creates a vector, and mm performs matrix multiplication.

Numeric types

LLA provides type synonyms for commonly used numeric types. These serve as optimization hints, similar to element-type declarations in MAKE-ARRAY:

  • lla-integer - Signed integers (32 or 64-bit depending on configuration)
  • lla-single - Single-precision floating point
  • lla-double - Double-precision floating point
  • lla-complex-single - Single-precision complex numbers
  • lla-complex-double - Double-precision complex numbers

These types help LLA avoid runtime type detection and enable more efficient operations.
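
As a sketch of how the synonyms are used, mirroring the getting-started example above (this assumes the matrix-shorthand constructors accept the synonyms like any other element type):

```lisp
(vec 'lla-double 1 2 3) ; double-precision vector, elements coerced

(mx 'lla-single
  (1 2)
  (3 4))                ; single-precision 2x2 matrix
```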

Matrix Types

LLA supports specialized matrix types that take advantage of mathematical properties for more efficient storage and computation. These types are provided by the num-utils.matrix package and can be used interchangeably with regular arrays thanks to the aops protocol.

diagonal

Diagonal matrices store only the diagonal elements in a vector, saving memory and computation time. Off-diagonal elements are implicitly zero.

(diagonal-matrix #(1 2 3))
; => #<DIAGONAL-MATRIX 3x3
;    1 . .
;    . 2 .
;    . . 3>

;; Converting to a regular array shows the full structure
(aops:as-array (diagonal-matrix #(1 2 3)))
; => #2A((1 0 0)
;        (0 2 0)
;        (0 0 3))

You can extract and modify the diagonal elements:

(diagonal-vector (diagonal-matrix #(4 5 6)))
; => #(4 5 6)

;; Set new diagonal values
(let ((d (diagonal-matrix #(1 2 3))))
  (setf (diagonal-vector d) #(7 8 9))
  d)
; => #<DIAGONAL-MATRIX 3x3
;    7 . .
;    . 8 .
;    . . 9>

triangular

Triangular matrices come in two varieties: lower and upper triangular. Elements outside the triangular region are treated as zero, though they may contain arbitrary values in the internal storage.

Lower Triangular

Lower triangular matrices have all elements above the diagonal as zero:

(lower-triangular-matrix #2A((1 999)
                             (2 3)))
; => #<LOWER-TRIANGULAR-MATRIX 2x2
;    1 .
;    2 3>

;; Converting to array shows the zero structure
(aops:as-array (lower-triangular-matrix #2A((1 999) (2 3))))
; => #2A((1 0)
;        (2 3))

Upper Triangular

Upper triangular matrices have all elements below the diagonal as zero:

(upper-triangular-matrix #2A((1 2)
                             (999 3)))
; => #<UPPER-TRIANGULAR-MATRIX 2x2
;    1 2
;    . 3>

;; Converting to array shows the zero structure
(aops:as-array (upper-triangular-matrix #2A((1 2) (999 3))))
; => #2A((1 2)
;        (0 3))

Transposing switches between upper and lower triangular:

(transpose (lower-triangular-matrix #2A((1 0) (2 3))))
; => #<UPPER-TRIANGULAR-MATRIX 2x2
;    1 2
;    . 3>

hermitian

Hermitian matrices are equal to their conjugate transpose. For real-valued matrices, this means symmetric matrices. Only the lower triangle needs to be stored, with the upper triangle automatically filled by conjugation.

;; Real symmetric matrix
(hermitian-matrix #2A((1 2) (2 3)))
; => #<HERMITIAN-MATRIX 2x2
;    1 .
;    2 3>

;; Converting to array shows the symmetric structure
(aops:as-array (hermitian-matrix #2A((1 2) (2 3))))
; => #2A((1 2)
;        (2 3))

For complex matrices, the upper triangle is the conjugate of the lower:

(hermitian-matrix #2A((#C(1 0) #C(2 1))
                      (#C(2 1) #C(3 0))))
; => #<HERMITIAN-MATRIX 2x2
;    #C(1 0) .
;    #C(2 1) #C(3 0)>

;; Converting shows the conjugate symmetry
(aops:as-array (hermitian-matrix #2A((#C(1 0) #C(2 1))
                                     (#C(2 1) #C(3 0)))))
; => #2A((#C(1 0) #C(2 -1))
;        (#C(2 1) #C(3 0)))

as-arrays

All matrix types support the aops protocol, making them interchangeable with regular arrays:

;; Dimensions
(aops:dims (diagonal-matrix #(1 2 3)))
; => (3 3)

;; Element type
(aops:element-type (hermitian-matrix #2A((1.0 2.0) (2.0 3.0))))
; => DOUBLE-FLOAT

;; Size
(aops:size (upper-triangular-matrix #2A((1 2 3) (0 4 5) (0 0 6))))
; => 9

;; Rank
(aops:rank (lower-triangular-matrix #2A((1 0) (2 3))))
; => 2

You can also use array displacement and slicing:

;; Flatten to a vector
(aops:flatten (diagonal-matrix #(1 2 3)))
; => #(1 0 0 0 2 0 0 0 3)

;; Slice operations with select
(select (upper-triangular-matrix #2A((1 2 3) (0 4 5) (0 0 6)))
        0 t) ; First row, all columns
; => #(1 2 3)

The specialized matrix types automatically maintain their structure during operations, providing both memory efficiency and computational advantages while remaining compatible with generic array operations.

Shorthand

The num-utils.matrix-shorthand package provides convenient macros and functions for creating matrices and vectors with specific element types. These constructors simplify the creation of typed arrays and specialized matrix structures.

vec

vec creates a vector with elements coerced to the specified element type:

(vec t 1 2 3)
; => #(1 2 3)

(vec 'double-float 1 2 3)
; => #(1.0d0 2.0d0 3.0d0)

(vec 'single-float 1/2 3/4 5/6)
; => #(0.5 0.75 0.8333333)

The first argument specifies the element type, followed by the vector elements. Each element is coerced to the specified type.

mx

mx is a macro for creating dense matrices (rank 2 arrays) from row specifications:

(mx t
  (1 2 3)
  (4 5 6))
; => #2A((1 2 3)
;        (4 5 6))

(mx 'double-float
  (1 2)
  (3 4))
; => #2A((1.0d0 2.0d0)
;        (3.0d0 4.0d0))

Each row is specified as a list. Elements are coerced to the specified type.

diagonal-mx

diagonal-mx creates a diagonal matrix from the given diagonal elements:

(diagonal-mx t 1 2 3)
; => #<DIAGONAL-MATRIX 3x3
;      1 . .
;      . 2 .
;      . . 3>

(diagonal-mx 'double-float 4 5 6)
; => #<DIAGONAL-MATRIX 3x3
;      4.0d0 .     .
;      .     5.0d0 .
;      .     .     6.0d0>

The resulting diagonal matrix stores only the diagonal elements, with off-diagonal elements implicitly zero.

lower-triangular-mx

lower-triangular-mx creates a lower triangular matrix. Elements above the diagonal are ignored, and rows are padded with zeros as needed:

(lower-triangular-mx t
  (1)
  (3 4))
; => #<LOWER-TRIANGULAR-MATRIX 2x2
;      1 .
;      3 4>

;; Elements above the diagonal are ignored
(lower-triangular-mx t
  (1 9)   ; 9 is ignored
  (3 4))
; => #<LOWER-TRIANGULAR-MATRIX 2x2
;      1 .
;      3 4>

upper-triangular-mx

upper-triangular-mx creates an upper triangular matrix. Elements below the diagonal are replaced with zeros:

(upper-triangular-mx t
  (1 2)
  (3 4))
; => #<UPPER-TRIANGULAR-MATRIX 2x2
;      1 2
;      . 4>

;; Elements below the diagonal become zero
(upper-triangular-mx t
  (1 2)
  (9 4))  ; 9 becomes 0
; => #<UPPER-TRIANGULAR-MATRIX 2x2
;      1 2
;      . 4>

(upper-triangular-mx 'double-float
  (1 2 3)
  (4 5 6)
  (7 8 9))
; => #<UPPER-TRIANGULAR-MATRIX 3x3
;      1.0d0 2.0d0 3.0d0
;      .     5.0d0 6.0d0
;      .     .     9.0d0>

hermitian-mx

hermitian-mx creates a Hermitian matrix (symmetric for real values). Only the lower triangle needs to be specified, and elements above the diagonal are ignored:

(hermitian-mx t
  (1)
  (3 4))
; => #<HERMITIAN-MATRIX 2x2
;      1 .
;      3 4>

;; Elements above the diagonal are ignored
(hermitian-mx t
  (1 9)   ; 9 is ignored
  (3 4))
; => #<HERMITIAN-MATRIX 2x2
;      1 .
;      3 4>

For Hermitian matrices, attempting to create non-square matrices will signal an error:

(hermitian-mx t (1 2 3) (3 4 5)) ; Error: rows too long for matrix

All shorthand constructors coerce elements to the specified type and return specialized matrix objects that are memory-efficient and compatible with the aops protocol for array operations.

Matrix Arithmetic

The num-utils.elementwise package provides a comprehensive set of elementwise operations for arrays and numerical objects. These operations extend Common Lisp’s standard arithmetic to work seamlessly with arrays, matrices, and vectors while preserving numerical type precision through automatic type contagion.

intuition

Elementwise arithmetic operates on corresponding elements of arrays, applying the specified operation to each pair of elements independently. This differs fundamentally from algebraic matrix operations like matrix multiplication:

  • Algebraic operations follow mathematical rules (e.g., matrix multiplication requires compatible dimensions: m×n · n×p = m×p)
  • Elementwise operations require identical dimensions and operate position-by-position

For example:

;; Elementwise multiplication (e*)
(e* (mx 'double-float
      (1 2)
      (3 4))
    (mx 'double-float
      (5 6)
      (7 8)))
; => #2A((5.0d0 12.0d0)    ; 1×5=5,  2×6=12
;        (21.0d0 32.0d0))  ; 3×7=21, 4×8=32

;; Algebraic matrix multiplication (mm)
(mm (mx 'double-float
      (1 2)
      (3 4))
    (mx 'double-float
      (5 6)
      (7 8)))
; => #2A((19.0d0 22.0d0)   ; 1×5+2×7=19, 1×6+2×8=22
;        (43.0d0 50.0d0))  ; 3×5+4×7=43, 3×6+4×8=50

type contagion

The elementwise-float-contagion function automatically determines the appropriate result type when combining different numeric types, following Common Lisp’s numeric contagion rules but extended for arrays:

(e+ (vec 'single-float 1.0 2.0)   ; single-float array
    (vec 'double-float 3.0 4.0))  ; double-float array
; => #(4.0d0 6.0d0)               ; result is double-float

(e* 2                             ; integer scalar
    (vec 'double-float 1.5 2.5))  ; double-float array
; => #(3.0d0 5.0d0)               ; result is double-float

unary operations

Unary operations apply a function to each element of an array independently. They work with both scalars and arrays:

;; Unary operations on vectors
(e1- (vec 'double-float 1 2 3 4 5))
; => #(-1.0d0 -2.0d0 -3.0d0 -4.0d0 -5.0d0)

(esqrt (vec 'double-float 4 9 16 25))
; => #(2.0d0 3.0d0 4.0d0 5.0d0)

;; Unary operations on matrices
(eabs (mx 'double-float
        (-1 2.5)
        (-3.7 4)))
; => #2A((1.0d0 2.5d0)
;        (3.7d0 4.0d0))

(elog (mx 'double-float
        ((exp 1) (exp 2))
        ((exp 3) (exp 4))))
; => #2A((1.0d0 2.0d0)
;        (3.0d0 4.0d0))

binary operations

Binary operations combine two arrays element-by-element or broadcast a scalar across an array. Arrays must have matching dimensions:

;; Binary operations with equal-length vectors
(e2+ (vec 'double-float 1 2 3)
     (vec 'double-float 4 5 6))
; => #(5.0d0 7.0d0 9.0d0)

;; Binary operations with scalar broadcasting
(e2* (vec 'double-float 2 3 4) 2.5d0)
; => #(5.0d0 7.5d0 10.0d0)

;; Binary operations on matrices
(e2- (mx 'double-float
       (10 20)
       (30 40))
     (mx 'double-float
       (1 2)
       (3 4)))
; => #2A((9.0d0 18.0d0)
;        (27.0d0 36.0d0))

;; Dimension mismatch signals an error
(handler-case
    (e2+ (vec 'double-float 1 2 3)  ; length 3
         (vec 'double-float 4 5))   ; length 2
  (error (e) (format nil "Error: ~A" e)))
; => "Error: Assertion failed: (EQUAL (ARRAY-DIMENSIONS A) (ARRAY-DIMENSIONS B))"

unary operators reference

Function    Description                       Example
e1-         Univariate elementwise -          (e1- x) ≡ -x
e1/         Univariate elementwise /          (e1/ x) ≡ 1/x
e1log       Univariate elementwise LOG        (e1log x) ≡ log(x)
e1exp       Univariate elementwise EXP        (e1exp x) ≡ exp(x)
eabs        Univariate elementwise ABS        (eabs x) ≡ |x|
efloor      Univariate elementwise FLOOR      (efloor x) ≡ ⌊x⌋
eceiling    Univariate elementwise CEILING    (eceiling x) ≡ ⌈x⌉
eexp        Univariate elementwise EXP        (eexp x) ≡ e^x
esqrt       Univariate elementwise SQRT       (esqrt x) ≡ √x
econjugate  Univariate elementwise CONJUGATE  (econjugate x) ≡ x*
esquare     Univariate elementwise SQUARE     (esquare x) ≡ x²
esin        Univariate elementwise SIN        (esin x) ≡ sin(x)
ecos        Univariate elementwise COS        (ecos x) ≡ cos(x)
emod        Univariate elementwise MOD        (emod x) ≡ mod(x)

binary operators reference

Function  Description                 Example
e2+       Bivariate elementwise +     (e2+ a b) ≡ a + b
e2-       Bivariate elementwise -     (e2- a b) ≡ a - b
e2*       Bivariate elementwise *     (e2* a b) ≡ a × b
e2/       Bivariate elementwise /     (e2/ a b) ≡ a ÷ b
e2log     Bivariate elementwise LOG   (e2log a b) ≡ log_b(a)
e2exp     Bivariate elementwise EXPT  (e2exp a b) ≡ a^b
eexpt     Bivariate elementwise EXPT  (eexpt a b) ≡ a^b
e2mod     Bivariate elementwise MOD   (e2mod a b) ≡ a mod b
e2<       Bivariate elementwise <     (e2< a b) ≡ a < b
e2<=      Bivariate elementwise <=    (e2<= a b) ≡ a ≤ b
e2>       Bivariate elementwise >     (e2> a b) ≡ a > b
e2>=      Bivariate elementwise >=    (e2>= a b) ≡ a ≥ b
e2=       Bivariate elementwise =     (e2= a b) ≡ a = b

variadic operations

The variadic operators (e+, e-, e*, e/) accept multiple arguments, applying the operation from left to right:

;; Multiple arguments with e+
(e+ (vec 'double-float 1 2 3)
    (vec 'double-float 4 5 6)
    (vec 'double-float 7 8 9))
; => #(12.0d0 15.0d0 18.0d0)

;; Multiple arguments with e*
(e* (vec 'double-float 2 3 4)
    (vec 'double-float 5 6 7)
    (vec 'double-float 8 9 10))
; => #(80.0d0 162.0d0 280.0d0)

;; A single argument returns the identity for e+ and e*
(e+ (vec 'double-float 1 2 3))
; => #(1.0d0 2.0d0 3.0d0)
(e* (vec 'double-float 1 2 3))
; => #(1.0d0 2.0d0 3.0d0)

;; Unary negation with e-
(e- (vec 'double-float 1 2 3))
; => #(-1.0d0 -2.0d0 -3.0d0)

;; Reciprocal with e/
(e/ (vec 'double-float 2 4 8))
; => #(0.5d0 0.25d0 0.125d0)

special operations

  • elog: Provides both natural logarithm and logarithm with arbitrary base

    (elog (vec 'double-float 10 100))     ; Natural log
    ; => #(2.302... 4.605...)

    (elog (vec 'double-float 10 100) 10)  ; Log base 10
    ; => #(1.0d0 2.0d0)
  • ereduce: Applies a reduction function across all elements in row-major order

    (ereduce #'+ (mx 'double-float
                   (1 2 3)
                   (4 5 6)))
    ; => 21.0d0
  • emin/emax: Find the minimum or maximum element

    (emin (mx 'double-float
            (5 2 8)
            (1 9 3)))
    ; => 1.0d0

    (emax (mx 'double-float
            (5 2 8)
            (1 9 3)))
    ; => 9.0d0

These elementwise operations provide a powerful and consistent interface for numerical computations, automatically handling type promotion and supporting both scalar and array arguments.

Factorizations

Matrix factorizations are fundamental tools in numerical linear algebra that decompose a matrix into a product of simpler, structured matrices. These decompositions reveal important properties of the original matrix and enable efficient algorithms for solving systems of equations, computing eigenvalues, finding least-squares solutions, and performing other numerical operations.

LLA provides several key factorizations:

  • LU decomposition - Factors a matrix into lower and upper triangular matrices with row pivoting, used for solving linear systems and computing determinants
  • QR decomposition - Factors a matrix into an orthogonal matrix and an upper triangular matrix, essential for least-squares problems and eigenvalue algorithms
  • Cholesky decomposition - Factors a positive definite symmetric matrix into the product of a lower triangular matrix and its transpose, providing the most efficient method for solving positive definite systems
  • Spectral decomposition - Decomposes a symmetric matrix into its eigenvalues and eigenvectors, revealing the matrix’s fundamental structure
  • Singular Value Decomposition (SVD) - The most general factorization, decomposing any matrix into orthogonal matrices and a diagonal matrix of singular values, used for rank determination, pseudo-inverses, and data compression

Each factorization exploits specific mathematical properties to provide computational advantages. The factored forms often require less storage than the original matrix and enable specialized algorithms that are more numerically stable and computationally efficient than working with the original matrix directly.

lu

lu computes the LU factorization of a matrix with pivoting. The factorization represents $PA = LU$, where $P$ is a permutation matrix, $L$ is lower triangular with unit diagonal, and $U$ is upper triangular.

(let ((a (mx 'lla-double
           (1 2)
           (3 4))))
  (lu a))
; => #<LU
;    L=#<LOWER-TRIANGULAR-MATRIX element-type DOUBLE-FLOAT
;        1.00000 .
;        0.33333 1.00000>
;    U=#<UPPER-TRIANGULAR-MATRIX element-type DOUBLE-FLOAT
;        3.00000 4.00000
;        .       0.66667>
;    pivot indices=#(2 2)>
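The recurrence behind this factorization can be sketched in pure Python (a Doolittle-style sketch with partial pivoting, for illustration only; LLA itself calls LAPACK, and the function name below is hypothetical):

```python
def lu_decompose(a):
    """Doolittle LU with partial pivoting: returns (perm, l, u) such that
    the rows of `a` reordered by `perm` equal L @ U, with L unit lower
    triangular and U upper triangular."""
    n = len(a)
    u = [row[:] for row in a]                                   # working copy
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    perm = list(range(n))
    for k in range(n):
        # pivot: bring the largest |entry| in column k to the diagonal
        p = max(range(k, n), key=lambda i: abs(u[i][k]))
        if p != k:
            u[k], u[p] = u[p], u[k]
            perm[k], perm[p] = perm[p], perm[k]
            for j in range(k):                                  # swap computed L rows too
                l[k][j], l[p][j] = l[p][j], l[k][j]
        for i in range(k + 1, n):
            m = u[i][k] / u[k][k]                               # elimination multiplier
            l[i][k] = m
            for j in range(k, n):
                u[i][j] -= m * u[k][j]
    return perm, l, u

perm, l, u = lu_decompose([[1.0, 2.0], [3.0, 4.0]])
# perm → [1, 0]; l[1][0] → 0.333...; u → [[3.0, 4.0], [0.0, 0.666...]]
```

Running it on the matrix from the example above reproduces the printed factors: row 2 is pivoted to the top, the multiplier 1/3 appears in $L$, and $U$ holds 3, 4 and 2/3.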

lu-u

lu-u returns the upper triangular $U$ matrix from an LU factorization:

(let* ((a (mx 'lla-double
            (1 2)
            (3 4)))
       (lu-fact (lu a)))
  (lu-u lu-fact))
; => #<UPPER-TRIANGULAR-MATRIX 2x2
;      3.0d0 4.0d0
;      .     0.6666...>

lu-l

lu-l returns the lower triangular $L$ matrix from an LU factorization, with ones on the diagonal:

(let* ((a (mx 'lla-double
            (1 2)
            (3 4)))
       (lu-fact (lu a)))
  (lu-l lu-fact))
; => #<LOWER-TRIANGULAR-MATRIX 2x2
;      1.0d0     .
;      0.3333... 1.0d0>

ipiv

ipiv returns pivot indices in a format understood by SELECT, counting from 0. These indices show how rows were permuted during factorization:

(let* ((a (mx 'lla-double
            (1 2)
            (3 4)))
       (lu-fact (lu a)))
  (ipiv lu-fact))
; => #(1 0)

The identity (select a ipiv t) ≡ (mm (lu-l lu-fact) (lu-u lu-fact)) holds.

ipiv-inverse

ipiv-inverse returns the inverted permutation indices. This allows you to reconstruct the original matrix from the factorization:

(let* ((a (mx 'lla-double
            (1 2)
            (3 4)))
       (lu-fact (lu a)))
  (ipiv-inverse lu-fact))
; => #(1 0)

The identity a ≡ (select (mm (lu-l lu-fact) (lu-u lu-fact)) ipiv-inverse t) holds.

qr

qr computes the QR factorization of a matrix, where $Q$ is orthogonal and $R$ is upper triangular such that $A = QR$:

(let ((a (mx 'lla-double
           (1 2)
           (3 4))))
  (qr a))
; => #<QR 2x2>

qr-r

qr-r returns the upper triangular $R$ matrix from a QR factorization:

(let* ((a (mx 'lla-double
            (1 2)
            (3 4)))
       (qr-fact (qr a)))
  (qr-r qr-fact))
; => #<UPPER-TRIANGULAR-MATRIX 2x2>

matrix-square-root

matrix-square-root is a general structure for representing $XX^T$ decompositions of matrices. The convention is to store $X$, the left square root:

(let ((x (mx 'lla-double
           (1 0)
           (2 1))))
  (make-matrix-square-root x))
; => #S(MATRIX-SQUARE-ROOT :LEFT #2A((1.0d0 0.0d0) (2.0d0 1.0d0)))

xx

xx is a convenience function to create a matrix-square-root from a left square root:

(xx (mx 'lla-double
      (1 0)
      (2 1)))
; => #S(MATRIX-SQUARE-ROOT :LEFT #2A((1.0d0 0.0d0) (2.0d0 1.0d0)))

left-square-root

left-square-root returns $X$ such that $XX^T = A$. For general matrix-square-root objects, it returns the stored left factor:

(let* ((a (hermitian-mx 'lla-double
            (2)
            (-1 2)
            (0 -1 2)))
       (chol (cholesky a)))
  (left-square-root chol))
; => #<LOWER-TRIANGULAR-MATRIX 3x3
;      1.414...  .         .
;      -0.707... 1.224...  .
;      0.0       -0.816... 1.154...>

right-square-root

right-square-root returns $Y$ such that $Y^T Y = A$. This is computed as the transpose of the left square root:

(let* ((a (hermitian-mx 'lla-double
            (2)
            (-1 2)
            (0 -1 2)))
       (chol (cholesky a)))
  (right-square-root chol))
; => #<UPPER-TRIANGULAR-MATRIX 3x3
;      1.414... -0.707... 0.0
;      .        1.224...  -0.816...
;      .        .         1.154...>

cholesky

cholesky computes the Cholesky factorization of a positive definite Hermitian matrix. It returns a lower triangular matrix $L$ such that $A = LL^T$:

(let ((a (hermitian-mx 'lla-double
           (2)
           (-1 2)
           (0 -1 2))))
  (cholesky a))
; => #S(CHOLESKY
;      :LEFT #<LOWER-TRIANGULAR-MATRIX 3x3
;              1.414...  .         .
;              -0.707... 1.224...  .
;              0.0       -0.816... 1.154...>)
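The factor printed above follows from the classical Cholesky–Banachiewicz recurrence. The following pure-Python sketch (illustrative only; LLA's implementation is LAPACK-backed, and the function name is hypothetical) shows the computation:

```python
import math

def cholesky_lower(a):
    """Return L (nested lists, lower triangular) with a = L @ L.T.
    `a` must be symmetric positive definite."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)    # diagonal entry
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]   # below-diagonal entry
    return l

# The tridiagonal positive definite matrix from the example above
a = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
l = cholesky_lower(a)
# l[0][0] ≈ 1.414, l[1][0] ≈ -0.707, l[2][2] ≈ 1.154, matching the printed factor
```

Each entry uses only previously computed entries of $L$, which is why the factorization runs in roughly half the operations of LU.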

hermitian-factorization

hermitian-factorization computes a factorization for indefinite Hermitian matrices with pivoting. This is used internally for solving systems with Hermitian matrices that may not be positive definite.

spectral-factorization

spectral-factorization computes the eigenvalue decomposition of a Hermitian matrix, returning a structure containing eigenvectors ($Z$) and eigenvalues ($W$) such that $A = ZWZ^T$:

(let* ((a (mm t (mx 'lla-double
                  (1 2)
                  (3 4))))
       (sf (spectral-factorization a)))
  sf)
; => #S(SPECTRAL-FACTORIZATION
;      :Z #2A((-0.8174... 0.5760...)
;             (0.5760... 0.8174...))
;      :W #<DIAGONAL-MATRIX 2x2
;           0.1339... .
;           .         29.866...>)

spectral-factorization-w

spectral-factorization-w returns the diagonal matrix $W$ of eigenvalues from a spectral factorization:

(let* ((a (mm t (mx 'lla-double
                  (1 2)
                  (3 4))))
       (sf (spectral-factorization a)))
  (spectral-factorization-w sf))
; => #<DIAGONAL-MATRIX 2x2
;      0.1339... .
;      .         29.866...>

spectral-factorization-z

spectral-factorization-z returns the matrix $Z$ of eigenvectors from a spectral factorization:

(let* ((a (mm t (mx 'lla-double
                  (1 2)
                  (3 4))))
       (sf (spectral-factorization a)))
  (spectral-factorization-z sf))
; => #2A((-0.8174... 0.5760...)
;        (0.5760... 0.8174...))

svd

svd computes the singular value decomposition of a matrix $A = U\Sigma V^T$, where $U$ and $V$ are orthogonal matrices and $\Sigma$ is diagonal with non-negative entries:

(let ((a (mx 'lla-double
           (0 1)
           (2 3)
           (4 5))))
  (svd a))
; => #S(SVD
;      :U #2A((-0.1081... 0.9064...)
;             (-0.4873... 0.3095...)
;             (-0.8664... -0.2872...))
;      :D #<DIAGONAL-MATRIX 2x2
;           7.386... .
;           .        0.6632...>
;      :VT #2A((-0.6011... -0.7991...)
;              (-0.7991... 0.6011...)))

svd-u

svd-u returns the left singular vectors ($U$ matrix) from an SVD:

(let* ((a (mx 'lla-double
            (0 1)
            (2 3)
            (4 5)))
       (svd-result (svd a)))
  (svd-u svd-result))
; => #2A((-0.1081... 0.9064...)
;        (-0.4873... 0.3095...)
;        (-0.8664... -0.2872...))

svd-d

svd-d returns the diagonal matrix $\Sigma$ of singular values from an SVD, in descending order:

(let* ((a (mx 'lla-double
            (0 1)
            (2 3)
            (4 5)))
       (svd-result (svd a)))
  (svd-d svd-result))
; => #<DIAGONAL-MATRIX 2x2
;      7.386... .
;      .        0.6632...>

svd-vt

svd-vt returns the right singular vectors ($V^T$ matrix) from an SVD:

(let* ((a (mx 'lla-double
            (0 1)
            (2 3)
            (4 5)))
       (svd-result (svd a)))
  (svd-vt svd-result))
; => #2A((-0.6011... -0.7991...)
;        (-0.7991... 0.6011...))

Linear Algebra

This section covers the core linear algebra operations provided by LLA. These functions leverage BLAS and LAPACK for efficient numerical computations while providing a convenient Lisp interface.

multiplication

mm performs matrix multiplication of two arrays. It handles matrices, vectors, and special matrix types, automatically selecting the appropriate operation based on the arguments’ dimensions.

When given two matrices, it computes their product:

(let ((a (mx 'lla-double
           (1 2)
           (3 4)
           (5 6)))
      (i2 (mx 'lla-double
            (1 0)
            (0 1))))
  (mm a i2))
; => #2A((1.0d0 2.0d0)
;        (3.0d0 4.0d0)
;        (5.0d0 6.0d0))

Matrix-vector multiplication produces a vector:

(let ((a (mx 'lla-double
           (1 2)
           (3 4)
           (5 6)))
      (b2 (vec 'lla-double 1 2)))
  (mm a b2))
; => #(5.0d0 11.0d0 17.0d0)

Vector-matrix multiplication (row vector times matrix):

(let ((b3 (vec 'lla-double 1 2 3))
      (a (mx 'lla-double
           (1 2)
           (3 4)
           (5 6))))
  (mm b3 a))
; => #(22.0d0 28.0d0)

The dot product of two vectors returns a scalar:

(let ((a (vec 'lla-double 2 3 5))
      (b (vec 'lla-complex-double 1 #C(2 1) 3)))
  (mm a b))
; => #C(23.0d0 -3.0d0)

Special handling for transpose operations using the symbol t:

;; A * A^T
(let ((a (mx 'lla-double
           (1 2)
           (3 4))))
  (mm a t))
; => #<HERMITIAN-MATRIX 2x2
;      5.0d0  .
;      11.0d0 25.0d0>

;; A^T * A
(let ((a (mx 'lla-double
           (1 2)
           (3 4))))
  (mm t a))
; => #<HERMITIAN-MATRIX 2x2
;      10.0d0 .
;      14.0d0 20.0d0>

multiple matrix multiply

mmm multiplies multiple matrices from left to right. This is more efficient than repeated calls to mm:

(mmm a b c ... z) ; equivalent to (mm (mm (mm a b) c) ... z)

outer

outer computes the outer product of two vectors, returning $column(a) \times row(b)^H$:

(let ((a (vec 'lla-double 2 3))
      (b (vec 'lla-complex-double 1 #C(2 1) 9)))
  (outer a b))
; => #2A((#C(2.0d0 0.0d0) #C(4.0d0 -2.0d0) #C(18.0d0 0.0d0))
;        (#C(3.0d0 0.0d0) #C(6.0d0 -3.0d0) #C(27.0d0 0.0d0)))

When the second argument is t, it computes the outer product with itself:

(outer (vec 'lla-double 2 3) t)
; => #<HERMITIAN-MATRIX 2x2
;      4.0d0 .
;      6.0d0 9.0d0>

solve

solve is LLA’s general-purpose linear system solver that finds the solution vector $\mathbf{x}$ to the matrix equation:

$$A\mathbf{x} = \mathbf{b}$$

where $A$ is an $n \times n$ coefficient matrix, $\mathbf{b}$ is the right-hand side vector (or matrix for multiple systems), and $\mathbf{x}$ is the unknown solution vector. The function intelligently dispatches to the most appropriate algorithm based on the matrix type and structure, ensuring both numerical stability and computational efficiency.

Algorithm Selection Strategy

LLA automatically selects the optimal solving strategy:

  • General matrices: LU decomposition with partial pivoting ($PA = LU$)
  • Symmetric positive definite: Cholesky decomposition ($A = LL^T$)
  • Triangular matrices: Forward/backward substitution ($O(n^2)$ complexity)
  • Diagonal matrices: Element-wise division ($O(n)$ complexity)
  • Pre-factored matrices: Direct substitution using stored decomposition

Basic Linear System Solving

For general square matrices, solve uses LU decomposition with partial pivoting to handle the system $A\mathbf{x} = \mathbf{b}$:

(let* ((a (mx 'lla-double
            (1 2)
            (3 4)))
       (x (vec 'lla-double 5 6))
       (b (mm a x)))  ; Create b = Ax for verification
  (solve a b))
; => #(5.0d0 6.0d0)

The mathematical process involves:

  1. Factor $PA = LU$, where $P$ is a permutation matrix
  2. Solve $L\mathbf{y} = P\mathbf{b}$ by forward substitution
  3. Solve $U\mathbf{x} = \mathbf{y}$ by backward substitution

Efficient Solving with Pre-computed Factorizations

When solving multiple systems with the same coefficient matrix, pre-computing the factorization is much more efficient. The LU factorization $PA = LU$ needs to be computed only once:

(let* ((a (mx 'lla-double
            (1 2)
            (3 4)))
       (lu-fact (lu a))  ; Factor once: PA = LU
       (b (vec 'lla-double 17 39)))
  (solve lu-fact b))     ; Reuse factorization
; => #(5.0d0 6.0d0)

This approach reduces the complexity from $O(n^3)$ for each solve to $O(n^2)$, making it ideal for applications requiring multiple solves with the same matrix.

Triangular System Solvers

For triangular matrices, solve uses specialized algorithms that exploit the matrix structure. Upper triangular systems $U\mathbf{x} = \mathbf{b}$ are solved by backward substitution:

$$x_i = \frac{b_i - \sum_{j=i+1}^{n} u_{ij}x_j}{u_{ii}}$$

(let ((u (upper-triangular-mx 'lla-double
           (1 2)
           (0 3)))
      (b (vec 'lla-double 5 7)))
  (solve u b))
; => #(0.333...d0 2.333...d0)

Similarly, lower triangular systems use forward substitution with complexity $O(n^2)$.
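Both substitutions are short enough to sketch directly. Here is an illustrative pure-Python version (hypothetical helper names, not LLA's API; LLA dispatches to optimized LAPACK routines):

```python
def solve_upper(u, b):
    """Backward substitution for U x = b, with U upper triangular."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(u[i][j] * x[j] for j in range(i + 1, n))  # already-solved terms
        x[i] = (b[i] - s) / u[i][i]
    return x

def solve_lower(l, b):
    """Forward substitution for L x = b, with L lower triangular."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = sum(l[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / l[i][i]
    return x

print(solve_upper([[2.0, 1.0], [0.0, 4.0]], [5.0, 8.0]))  # → [1.5, 2.0]
print(solve_lower([[2.0, 0.0], [1.0, 4.0]], [4.0, 10.0]))  # → [2.0, 2.0]
```

Each loop touches every stored entry once, which is where the $O(n^2)$ complexity comes from.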

Cholesky Solvers for Positive Definite Systems

For symmetric positive definite matrices, the Cholesky decomposition $A = LL^T$ provides the most efficient and numerically stable solution method:

(let* ((a (hermitian-mx 'lla-double
            (2)
            (-1 2)
            (0 -1 2)))  ; Tridiagonal positive definite matrix
       (b (vec 'lla-double 5 7 13))
       (chol (cholesky a)))
  (solve chol b))
; => #(10.5d0 16.0d0 14.5d0)

The solution process involves:

  1. Decompose $A = LL^T$ (Cholesky factorization)
  2. Solve $L\mathbf{y} = \mathbf{b}$ by forward substitution
  3. Solve $L^T\mathbf{x} = \mathbf{y}$ by backward substitution

This method is approximately twice as fast as LU decomposition and uses half the storage, making it the preferred approach for positive definite systems arising in optimization, statistics, and numerical PDEs.

Multiple Right-Hand Sides

The solve function handles matrix right-hand sides $AX = B$ where $B$ contains multiple column vectors:

(let ((a (mx 'lla-double
           (2 1)
           (1 2)))
      (b (mx 'lla-double
           (3 7)
           (2 8))))
  (solve a b))
; => #2A((1.333... 2.0d0)
;        (0.333... 3.0d0))

Each column of the result matrix contains the solution to $A\mathbf{x}_i = \mathbf{b}_i$, solved simultaneously using optimized LAPACK routines.

invert

invert computes the matrix inverse $A^{-1}$ such that $A A^{-1} = A^{-1} A = I$, where $I$ is the identity matrix. While mathematically fundamental, explicit matrix inversion should generally be avoided in favor of solve for numerical linear algebra applications.

Why Avoid Explicit Inversion?

Computing $A^{-1}$ and then multiplying $A^{-1}\mathbf{b}$ to solve $A\mathbf{x} = \mathbf{b}$ is both computationally expensive and numerically unstable compared to direct solving methods:

Computational Cost:

  • Matrix inversion: roughly $2n^3$ operations (one factorization plus $n$ pairs of triangular solves for the columns of $I$)
  • Matrix-vector multiplication: $O(n^2)$ operations
  • Total for the inversion approach: roughly $2n^3 + n^2$ operations
  • Direct solve: roughly $\frac{2}{3}n^3$ operations (one factorization and two triangular substitutions)

Numerical Stability: Direct solving typically achieves machine precision, while the inversion approach can amplify rounding errors, especially for ill-conditioned matrices.

Comparison: Explicit Inversion vs Direct Solving

Let’s compare both approaches using the same system $A\mathbf{x} = \mathbf{b}$:

(let* ((a (mx 'lla-double
            (1 2)
            (3 4)))
       (b (vec 'lla-double 5 11)))
  ;; Method 1: Explicit inversion (NOT recommended)
  (let ((a-inv (invert a)))
    (format t "Method 1 - Explicit inversion:~%")
    (format t "  Inverse matrix: ~A~%" a-inv)
    (format t "  Solution via inverse: ~A~%~%" (mm a-inv b)))
  ;; Method 2: Direct solving (RECOMMENDED)
  (format t "Method 2 - Direct solving:~%")
  (format t "  Solution: ~A~%" (solve a b)))
; Output:
; Method 1 - Explicit inversion:
;   Inverse matrix: #2A((-2.0d0 1.0d0) (1.5d0 -0.5d0))
;   Solution via inverse: #(1.0d0 2.0d0)
;
; Method 2 - Direct solving:
;   Solution: #(1.0d0 2.0d0)

Both methods produce identical results, but solve is more efficient and numerically stable. For a more programmatic comparison:

;; Compare the results programmatically
(let* ((a (mx 'lla-double
            (1 2)
            (3 4)))
       (b (vec 'lla-double 5 11))
       (x-inverse (mm (invert a) b))
       (x-solve (solve a b)))
  (equalp x-inverse x-solve))
; => T ; Both methods give the same result

Basic Matrix Inversion

For educational purposes or when the inverse itself is needed (rare in practice):

(let ((m (mx 'lla-double
           (1 2)
           (3 4))))
  (invert m))
; => #2A((-2.0d0 1.0d0)
;        (1.5d0 -0.5d0))

;; Verification: A * A^(-1) = I
(let* ((a (mx 'lla-double
            (1 2)
            (3 4)))
       (a-inv (invert a)))
  (mm a a-inv))
; => #2A((1.0d0 0.0d0)
;        (0.0d0 1.0d0))

Structured Matrix Inversions

Inverting triangular matrices preserves their structure and is computationally efficient:

(invert (upper-triangular-mx 'lla-double
          (1 2)
          (0 4)))
; => #<UPPER-TRIANGULAR-MATRIX 2x2
;      1.0d0 -0.5d0
;      .     0.25d0>

For upper triangular matrices $U$, the inverse $(U^{-1})_{ij}$ can be computed by back-substitution, maintaining the triangular structure.
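That column-by-column back-substitution can be sketched as follows (pure Python with illustrative names, not LLA's implementation): solve $U\mathbf{x} = \mathbf{e}_k$ for each standard basis vector and collect the solutions as columns of the inverse.

```python
def invert_upper(u):
    """Invert an upper triangular matrix by solving U x = e_k
    for each standard basis vector e_k via back-substitution."""
    n = len(u)
    inv_cols = []
    for k in range(n):
        e = [1.0 if i == k else 0.0 for i in range(n)]
        x = [0.0] * n
        for i in range(n - 1, -1, -1):
            s = sum(u[i][j] * x[j] for j in range(i + 1, n))
            x[i] = (e[i] - s) / u[i][i]
        inv_cols.append(x)
    # assemble columns back into rows
    return [[inv_cols[j][i] for j in range(n)] for i in range(n)]

print(invert_upper([[1.0, 2.0], [0.0, 4.0]]))
# → [[1.0, -0.5], [0.0, 0.25]]
```

Because each solve only ever writes entries at or above the diagonal, the result is itself upper triangular, mirroring what LLA's structured inversion preserves.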

Pseudoinverse for Singular Matrices

When a matrix is singular (determinant = 0) or nearly singular, the standard inverse doesn’t exist. The Moore-Penrose pseudoinverse $A^+$ generalizes the concept of inversion to non-invertible matrices.

Mathematical Definition: For any matrix $A \in \mathbb{R}^{m \times n}$, the pseudoinverse $A^+ \in \mathbb{R}^{n \times m}$ satisfies:

  1. $A A^+ A = A$ (generalized inverse property)
  2. $A^+ A A^+ = A^+$ (reflexive property)
  3. $(A A^+)^T = A A^+$ (symmetry of $AA^+$)
  4. $(A^+ A)^T = A^+ A$ (symmetry of $A^+A$)

SVD Construction: If $A = U\Sigma V^T$ is the singular value decomposition, then: $$A^+ = V\Sigma^+ U^T$$ where $\Sigma^+$ is formed by taking the reciprocal of non-zero singular values and transposing.

Diagonal Matrix Pseudoinverse

For diagonal matrices, the pseudoinverse is particularly intuitive. Elements below a specified tolerance are treated as zero:

;; Singular diagonal matrix (contains zero)
(let ((d (diagonal-mx 'lla-double 2 0 3)))
  (invert d :tolerance 0.1))
; => #<DIAGONAL-MATRIX 3x3
;      0.5d0 .     .
;      .     0.0d0 .
;      .     .     0.333...d0>

Mathematical interpretation:

  • $d_{11} = 2 > 0.1 \Rightarrow (d^+)_{11} = 1/2 = 0.5$
  • $d_{22} = 0 < 0.1 \Rightarrow (d^+)_{22} = 0$ (treated as zero)
  • $d_{33} = 3 > 0.1 \Rightarrow (d^+)_{33} = 1/3 \approx 0.333$

This tolerance-based approach prevents division by near-zero values, which would produce numerically unstable results.
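The diagonal rule fits in a single line of code; a pure-Python sketch (hypothetical helper name, not LLA's API):

```python
def pinv_diagonal(d, tolerance=0.0):
    """Pseudoinverse of a diagonal matrix, given as the list of its
    diagonal elements: reciprocal of entries above the tolerance,
    zero otherwise."""
    return [1.0 / x if abs(x) > tolerance else 0.0 for x in d]

print(pinv_diagonal([2.0, 0.0, 3.0], tolerance=0.1))
# → [0.5, 0.0, 0.3333333333333333]
```

This reproduces the example above: entries above the tolerance are inverted, and the zero entry stays zero instead of blowing up.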

Nearly Singular Systems Example

The pseudoinverse is particularly useful for rank-deficient systems arising in least-squares problems:

;; Create a rank-deficient matrix (columns are linearly dependent)
(let* ((a (mx 'lla-double
            (1 2 3)   ; Third column = first + second
            (4 5 9)
            (7 8 15)))
       (b (vec 'lla-double 6 18 30)))
  ;; For rank-deficient systems, use least-squares,
  ;; which handles the pseudoinverse internally
  (least-squares b a))
; => #(2.0d0 2.0d0 0.0d0) ; coefficients (non-unique solution)
;    0.0d0                ; sum of squares (perfect fit)
;    0                    ; degrees of freedom
;    #<QR 3x3>            ; QR decomposition

For rank-deficient systems, the least-squares solution using the pseudoinverse provides the minimum-norm solution among all possible solutions that minimize $||A\mathbf{x} - \mathbf{b}||_2$. This is essential for handling over-parameterized systems in machine learning and statistics.

Key properties of the pseudoinverse:

  • For full-rank square matrices: $A^+ = A^{-1}$
  • For full column rank matrices: $A^+ = (A^T A)^{-1} A^T$ (left inverse)
  • For full row rank matrices: $A^+ = A^T (A A^T)^{-1}$ (right inverse)
  • Always exists and is unique for any matrix

least-squares

least-squares solves overdetermined systems in the least squares sense, finding the best-fit solution that minimizes the sum of squared residuals. This is fundamental for regression analysis, curve fitting, and many practical applications where you have more observations than unknown parameters.

Mathematical Foundation

Given a system $X\mathbf{b} = \mathbf{y}$ where:

  • $X$ is an $m \times n$ design matrix with $m > n$ (more rows than columns)
  • $\mathbf{y}$ is an $m \times 1$ response vector
  • $\mathbf{b}$ is the $n \times 1$ vector of unknown coefficients

The least squares solution minimizes: $$||\mathbf{y} - X\mathbf{b}||_2^2 = \sum_{i=1}^{m} (y_i - \mathbf{x}_i^T\mathbf{b})^2$$

The solution is given by the normal equations: $$\mathbf{b} = (X^T X)^{-1} X^T \mathbf{y}$$

However, LLA uses the more numerically stable QR decomposition approach, factoring $X = QR$ where $Q$ is orthogonal and $R$ is upper triangular.

Return Values

The function returns four values:

  1. Coefficient vector $\mathbf{b}$ - the best-fit parameters
  2. Sum of squared residuals - $||\mathbf{y} - X\mathbf{b}||_2^2$
  3. Degrees of freedom - $m - n$ (observations minus parameters)
  4. QR decomposition - can be reused for further computations

Simple Example

(let ((x (mx 'lla-double
           (1 23)    ; observation 1: intercept, x-value
           (1 22)    ; observation 2
           (1 25)    ; observation 3
           (1 29)    ; observation 4
           (1 24)))  ; observation 5
      (y (vec 'lla-double 67 63 65 94 84)))
  (least-squares y x))
; => #(-24.9795... 4.0479...) ; coefficients [intercept, slope]
;    270.7329...              ; sum of squared residuals
;    3                        ; degrees of freedom
;    #<QR {1006A69913}>       ; QR decomposition

This fits the linear model $y = -24.98 + 4.05x$.
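For a two-parameter model, the same estimates follow from the closed-form simple-regression formulas. This pure-Python sketch (illustrative only; LLA computes the fit via QR, not these formulas) reproduces the coefficients above:

```python
def simple_least_squares(xs, ys):
    """Closed-form intercept and slope minimizing sum((y - b0 - b1*x)^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)                      # spread of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) # co-variation
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Data from the example above
intercept, slope = simple_least_squares([23, 22, 25, 29, 24],
                                        [67, 63, 65, 94, 84])
# intercept ≈ -24.9795, slope ≈ 4.0479, matching the least-squares output
```

Agreement with the QR-based result is expected: for full-rank problems both approaches minimize the same residual sum of squares.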

Real-Life Example: Predicting House Prices

Let’s model house prices based on size (square feet) and number of bedrooms:

;; House data: [size_sqft, bedrooms] -> price_thousands
;; Design matrix includes intercept term
(let* ((x (mx 'lla-double
            ;; intercept size(1000s) bedrooms
            (1 1.2 2)    ; 1200 sqft, 2 bed -> €280k
            (1 1.5 3)    ; 1500 sqft, 3 bed -> €320k
            (1 1.8 3)    ; 1800 sqft, 3 bed -> €360k
            (1 2.0 4)    ; 2000 sqft, 4 bed -> €400k
            (1 2.2 4)    ; 2200 sqft, 4 bed -> €430k
            (1 1.0 1)    ; 1000 sqft, 1 bed -> €220k
            (1 2.5 5)    ; 2500 sqft, 5 bed -> €480k
            (1 1.6 3))   ; 1600 sqft, 3 bed -> €340k
       (y (vec 'lla-double 280 320 360 400 430 220 480 340))
       (coefficients (least-squares y x)))
  ;; Display coefficients
  (format t "Model: price = ~,2f + ~,2f*size + ~,2f*bedrooms~%"
          (aref coefficients 0)
          (aref coefficients 1)
          (aref coefficients 2))
  ;; Make a prediction
  (let* ((new-house (vec 'lla-double 1 1.7 3)) ; 1700 sqft, 3 bedrooms
         (prediction (mm new-house coefficients)))
    (format t "Predicted price for 1700 sqft, 3 bedroom: €~,0f~%"
            (* prediction 1000))))
; Output:
; Model: price = 91.75 + 113.14*size + 21.39*bedrooms
; Predicted price for 1700 sqft, 3 bedroom: €348248

The model shows that each additional 1000 square feet adds approximately €113,140 to the price, and each bedroom adds about €21,390.

Handling Rank-Deficient Matrices

When predictors are linearly dependent, the design matrix becomes rank-deficient. The QR decomposition can detect and handle this:

;; Example with perfect multicollinearity
(let* ((x (mx 'lla-double
            (1 2 3)   ; x3 = x1 + x2 (perfect collinearity)
            (1 3 4)
            (1 4 5)
            (1 5 6)))
       (y (vec 'lla-double 10 12 14 16)))
  (multiple-value-bind (coeffs ssr df qr)
      (least-squares y x)
    (declare (ignore ssr qr)) ; Suppress unused variable warnings
    (format t "Coefficients: ~A~%" coeffs)
    (format t "Degrees of freedom: ~A~%" df)))
; Output:
; Coefficients: #(6.0d0 2.0d0 -0.0d0)
; Degrees of freedom: 1

Note that one coefficient is effectively zero due to the linear dependency, and the degrees of freedom reflect the rank deficiency.

Computing Standard Errors

Standard errors measure the precision of estimated regression coefficients. They quantify the uncertainty in our parameter estimates due to sampling variability. Smaller standard errors indicate more precise estimates, while larger ones suggest greater uncertainty.

What are Standard Errors?

When we fit a linear model $y = \beta_0 + \beta_1 x + \epsilon$, the coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$ are estimates based on our sample data. If we collected a different sample, we’d get slightly different estimates. The standard error measures this variability - it’s the standard deviation of the sampling distribution of the coefficient.

Why Standard Errors Matter:

  • Confidence intervals: We use them to construct confidence intervals (e.g., $\hat{\beta} \pm 1.96 \times SE$ for 95% CI)
  • Hypothesis testing: The t-statistic $t = \hat{\beta}/SE$ tests if a coefficient is significantly different from zero
  • Model assessment: Large standard errors relative to coefficients suggest unstable estimates

Example: Analyzing a Linear Trend

Let’s fit a line to some data and compute standard errors to understand the precision of our estimates:

(let* ((x (mx 'lla-double
            (1 1)    ; Year 1
            (1 2)    ; Year 2
            (1 3)    ; Year 3
            (1 4)    ; Year 4
            (1 5)))  ; Year 5
       (y (vec 'lla-double 2.1 3.9 6.1 8.0 9.9)) ; Sales in millions
       ;; Fit the model
       (results (multiple-value-list (least-squares y x)))
       (coefficients (first results))
       (ssr (second results))
       (df (third results)))
  ;; Calculate standard errors using classical formulas
  (let* ((mse (/ ssr df))        ; Mean squared error (residual variance)
         (n (length y))          ; Number of observations
         (x-values '(1 2 3 4 5)) ; Extract x values
         (x-bar (/ (reduce #'+ x-values) n)) ; Mean of x
         (sxx (reduce #'+        ; Sum of squared deviations
                      (mapcar (lambda (xi) (expt (- xi x-bar) 2))
                              x-values)))
         ;; Standard error formulas for simple linear regression
         (se-slope (sqrt (/ mse sxx)))
         (se-intercept (sqrt (* mse (+ (/ 1 n)
                                       (/ (expt x-bar 2) sxx))))))
    (format t "Linear Regression Analysis~%")
    (format t "=========================~%~%")
    ;; Report coefficients with standard errors
    (format t "Parameter Estimates:~%")
    (format t "  Intercept: ~,3f (SE = ~,3f)~%"
            (aref coefficients 0) se-intercept)
    (format t "  Slope:     ~,3f (SE = ~,3f)~%~%"
            (aref coefficients 1) se-slope)
    ;; Compute t-statistics
    (let ((t-intercept (/ (aref coefficients 0) se-intercept))
          (t-slope (/ (aref coefficients 1) se-slope)))
      (format t "Significance Tests (H0: parameter = 0):~%")
      (format t "  Intercept: t = ~,2f~%" t-intercept)
      (format t "  Slope:     t = ~,2f (highly significant)~%~%" t-slope))
    ;; Construct confidence intervals
    (format t "95% Confidence Intervals:~%")
    (format t "  Intercept: [~,3f, ~,3f]~%"
            (- (aref coefficients 0) (* 1.96 se-intercept))
            (+ (aref coefficients 0) (* 1.96 se-intercept)))
    (format t "  Slope:     [~,3f, ~,3f]~%~%"
            (- (aref coefficients 1) (* 1.96 se-slope))
            (+ (aref coefficients 1) (* 1.96 se-slope)))
    ;; Interpretation
    (format t "Interpretation:~%")
    (format t "  Sales increase by ~,3f ± ~,3f million per year~%"
            (aref coefficients 1) (* 1.96 se-slope))
    (format t "  We are 95% confident the true annual increase~%")
    (format t "  is between ~,3f and ~,3f million~%"
            (- (aref coefficients 1) (* 1.96 se-slope))
            (+ (aref coefficients 1) (* 1.96 se-slope)))))
; Output:
; Linear Regression Analysis
; =========================
;
; Parameter Estimates:
;   Intercept: 0.090 (SE = 0.107)
;   Slope:     1.970 (SE = 0.032)
;
; Significance Tests (H0: parameter = 0):
;   Intercept: t = 0.84
;   Slope:     t = 61.14 (highly significant)
;
; 95% Confidence Intervals:
;   Intercept: [-0.119, 0.299]
;   Slope:     [1.907, 2.033]
;
; Interpretation:
;   Sales increase by 1.970 ± 0.063 million per year
;   We are 95% confident the true annual increase
;   is between 1.907 and 2.033 million

Key Insights from Standard Errors:

  1. Slope precision: The small standard error (0.032) relative to the slope (1.970) indicates a very precise estimate of the yearly trend
  2. Intercept uncertainty: The larger relative standard error for the intercept shows more uncertainty about the starting value
  3. Statistical significance: The t-statistic of 61.14 for the slope (much larger than 2) confirms a highly significant upward trend
  4. Practical interpretation: We can confidently state that sales are increasing by approximately 2 million per year

This analysis demonstrates how standard errors transform point estimates into interval estimates, enabling us to quantify and communicate the uncertainty inherent in statistical estimation.

Performance Considerations

The QR decomposition approach used by least-squares:

  • Is more numerically stable than the normal equations
  • Handles rank-deficient matrices better
  • Has complexity $O(mn^2)$ for an $m \times n$ matrix
  • Can be reused for multiple right-hand sides

For very large systems, consider:

  • Centering and scaling predictors to improve numerical stability
  • Using incremental/online algorithms for streaming data
  • Regularization methods (ridge, lasso) for high-dimensional problems
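As a sketch of the first suggestion, a predictor column can be standardized before the design matrix is built. This example assumes the mean and sd functions from the Lisp-Stat statistics system and the element-wise e- and e/ operators; treat the names as illustrative and adapt them to your setup.

;; Sketch: standardize a predictor before regression
;; (assumed helpers: mean, sd, e-, e/)
(let* ((size (vec 'lla-double 1.2 1.5 1.8 2.0 2.2 1.0 2.5 1.6))
       (z    (e/ (e- size (mean size)) (sd size)))) ; centered, unit variance
  ;; Build the design matrix from Z instead of the raw sizes,
  ;; then call least-squares as before. Coefficients are then in
  ;; standard-deviation units, which improves conditioning.
  z)

Centering also makes the intercept interpretable as the predicted response at the mean of the predictor.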

The least squares method forms the foundation for linear regression, polynomial fitting, and many machine learning algorithms, making it one of the most important tools in numerical computing.

invert-xx

invert-xx calculates $(X^T X)^{-1}$ from a QR decomposition, which is essential for computing variance estimates and standard errors in regression analysis. This function is particularly useful because it leverages the numerical stability of the QR factorization to compute the inverse of the normal matrix $(X^T X)$ without explicitly forming it.

Mathematical Background:

For a matrix $X$ with QR decomposition $X = QR$, we have: $$X^T X = (QR)^T (QR) = R^T Q^T Q R = R^T R$$

Therefore: $$(X^T X)^{-1} = (R^T R)^{-1} = R^{-1} (R^T)^{-1}$$

The invert-xx function computes this efficiently using the upper triangular structure of $R$.

Usage Example:

(let* ((x (mx 'lla-double
            (1 1)
            (1 2)
            (1 3)))
       (y (vec 'lla-double 2 4 5))
       (qr (fourth (multiple-value-list (least-squares y x)))))
  (invert-xx qr))
; => #S(MATRIX-SQUARE-ROOT :LEFT
;      #<UPPER-TRIANGULAR-MATRIX element-type DOUBLE-FLOAT
;        -0.57735  1.41421
;         .       -0.70711>)

Understanding the Output:

The result is a MATRIX-SQUARE-ROOT structure, which represents the Cholesky-like factorization of $(X^T X)^{-1}$. The structure contains:

  • :LEFT field: An upper triangular matrix $U$ such that $U^T U = (X^T X)^{-1}$
  • Element values: The numerical entries showing the factorized form
  • Structure definition: This comes from the LLA source code’s internal representation for matrix square roots

This factored form is computationally efficient for subsequent operations like computing standard errors, where you need the diagonal elements of $(X^T X)^{-1}$ multiplied by the residual variance.

eigenvalues

eigenvalues returns the eigenvalues of a Hermitian matrix. Eigenvalues are scalar values $\lambda$ that satisfy the equation $A\mathbf{v} = \lambda\mathbf{v}$, where $\mathbf{v}$ is the corresponding eigenvector.

For Hermitian matrices (symmetric for real matrices), eigenvalues have special properties:

  • All eigenvalues are real numbers
  • Eigenvectors corresponding to different eigenvalues are orthogonal
  • The matrix can be diagonalized as $A = Q\Lambda Q^T$ where $Q$ contains orthonormal eigenvectors and $\Lambda$ is diagonal with eigenvalues

These properties make Hermitian matrices particularly important in applications like principal component analysis, quantum mechanics, and optimization.

(eigenvalues (hermitian-mx 'lla-double (1) (2 4))) ; => #(0.0d0 5.0d0)

The optional abstol parameter controls the absolute error tolerance for eigenvalue computation.

logdet

logdet computes the logarithm of the determinant efficiently, returning the log absolute value and sign separately:

(logdet (mx 'lla-double
          (1 0)
          (-1 (exp 1d0))))
; => 1.0d0 ; log|det|
;    1     ; sign

det

det computes the determinant of a matrix:

(det (mx 'lla-double (1 2) (3 4))) ; => -2.0d0

For numerical stability with large determinants, use logdet instead.

tr

tr computes the trace (sum of diagonal elements) of a square matrix:

(tr (mx 'lla-double
      (1 2)
      (3 4)))
; => 5.0d0

(tr (diagonal-mx 'lla-double 2 15))
; => 17.0d0

These operations form the foundation for most numerical linear algebra computations. They are optimized to work with LLA’s specialized matrix types and automatically dispatch to the most efficient BLAS/LAPACK routines based on the matrix properties.

BLAS

The Basic Linear Algebra Subprograms (BLAS) are a collection of low-level routines that provide standard building blocks for performing basic vector and matrix operations.

These functions operate directly on arrays and modify them in-place (functions ending with !). They require careful attention to array dimensions, strides, and memory layout. Use these functions when you need maximum performance and control over the computation, or when implementing custom linear algebra algorithms.

gemm!

gemm! performs the general matrix-matrix multiplication: C = α·A’·B’ + β·C, where A’ and B’ can be either the original matrices or their transposes. This is one of the most important BLAS operations, forming the computational kernel for many linear algebra algorithms.

The function modifies C in-place and returns it. The operation performed is:

  • If transpose-a? is NIL and transpose-b? is NIL: C = α·A·B + β·C
  • If transpose-a? is T and transpose-b? is NIL: C = α·A^T·B + β·C
  • If transpose-a? is NIL and transpose-b? is T: C = α·A·B^T + β·C
  • If transpose-a? is T and transpose-b? is T: C = α·A^T·B^T + β·C

Where:

  • A’ is an M×K matrix (A or A^T)
  • B’ is a K×N matrix (B or B^T)
  • C is an M×N matrix
;; Basic matrix multiplication: C = 2·A·B + 3·C (let ((a (mx 'lla-double (1 2 3) (4 5 6))) (b (mx 'lla-double (7 8) (9 10) (11 12))) (c (mx 'lla-double (0.5 1.5) (2.5 3.5)))) (gemm! 2d0 a b 3d0 c)) ; => #2A((118.5d0 140.5d0) ; (298.5d0 356.5d0)) ;; With transposed A: C = A^T·B (let ((a (mx 'lla-double (1 4) (2 5) (3 6))) (b (mx 'lla-double (7 8) (9 10))) (c (mx 'lla-double (0 0) (0 0) (0 0)))) (gemm! 1d0 a b 0d0 c :transpose-a? t)) ; => #2A((43.0d0 48.0d0) ; (56.0d0 63.0d0) ; (69.0d0 78.0d0))

The optional parameters allow working with submatrices:

  • m, n, k - dimensions of the operation (default: inferred from C and A/B)
  • lda, ldb, ldc - leading dimensions for matrices stored in larger arrays

scal!

scal! computes the product of a vector by a scalar: x ← α·x, modifying the vector in-place.

(let ((x (vec 'lla-double 1 2 3 4 5)))
  (scal! 2.5d0 x))
; => #(2.5d0 5.0d0 7.5d0 10.0d0 12.5d0)

;; Scale every other element using incx
(let ((x (vec 'lla-double 1 2 3 4 5 6)))
  (scal! 0.5d0 x :n 3 :incx 2))
; => #(0.5d0 2.0d0 1.5d0 4.0d0 2.5d0 6.0d0)

Parameters:

  • alpha - the scalar multiplier
  • x - the vector to scale (modified in-place)
  • n - number of elements to process (default: all elements considering incx)
  • incx - stride between elements (default: 1)

axpy!

axpy! computes a vector-scalar product and adds the result to a vector: y ← α·x + y, modifying y in-place.

(let ((x (vec 'lla-double 1 2 3))
      (y (vec 'lla-double 4 5 6)))
  (axpy! 2d0 x y))
; => #(6.0d0 9.0d0 12.0d0) ; y = 2·x + y

;; Using strides
(let ((x (vec 'lla-double 1 2 3 4 5 6))
      (y (vec 'lla-double 10 20 30 40 50 60)))
  (axpy! 3d0 x y :n 3 :incx 2 :incy 2))
; => #(13.0d0 20.0d0 39.0d0 40.0d0 65.0d0 60.0d0)

Parameters:

  • alpha - the scalar multiplier
  • x - the source vector
  • y - the destination vector (modified in-place)
  • n - number of elements to process
  • incx, incy - strides for x and y (default: 1)

copy!

copy! copies elements from vector x to vector y, modifying y in-place.

(let ((x (vec 'lla-double 1 2 3 4 5))
      (y (vec 'lla-double 0 0 0 0 0)))
  (copy! x y))
; => #(1.0d0 2.0d0 3.0d0 4.0d0 5.0d0)

;; Copy with strides
(let ((x (vec 'lla-double 1 2 3 4 5 6))
      (y (vec 'lla-double 0 0 0 0 0 0)))
  (copy! x y :n 3 :incx 2 :incy 1))
; => #(1.0d0 3.0d0 5.0d0 0.0d0 0.0d0 0.0d0)

Parameters:

  • x - the source vector
  • y - the destination vector (modified in-place)
  • n - number of elements to copy
  • incx, incy - strides for x and y (default: 1)

dot

dot computes the dot product of two vectors, returning a scalar value.

(let ((x (vec 'lla-double 1 2 3))
      (y (vec 'lla-double 4 5 6)))
  (dot x y))
; => 32.0d0 ; 1·4 + 2·5 + 3·6

;; Dot product with strides
(let ((x (vec 'lla-double 1 2 3 4 5))
      (y (vec 'lla-double 10 20 30 40 50)))
  (dot x y :n 3 :incx 2 :incy 1))
; => 140.0d0 ; 1·10 + 3·20 + 5·30

Parameters:

  • x, y - the vectors
  • n - number of elements to process
  • incx, incy - strides for x and y (default: 1)

Note: Complex dot products are not available in all BLAS implementations.

asum

asum returns the sum of absolute values of vector elements (L1 norm).

(let ((x (vec 'lla-double 1 -2 3 -4 5)))
  (asum x))
; => 15.0d0 ; |1| + |-2| + |3| + |-4| + |5|

;; L1 norm with stride
(let ((x (vec 'lla-double 1 -2 3 -4 5 -6)))
  (asum x :n 3 :incx 2))
; => 9.0d0 ; |1| + |3| + |5|

;; Complex vectors: sum of |real| + |imag|
(let ((x (vec 'lla-complex-double #C(3 4) #C(-5 12) #C(0 -8))))
  (asum x))
; => 37.0d0 ; (3+4) + (5+12) + (0+8)

Parameters:

  • x - the vector
  • n - number of elements to process
  • incx - stride between elements (default: 1)

nrm2

nrm2 returns the Euclidean norm (L2 norm) of a vector: √(Σ|xi|²).

(let ((x (vec 'lla-double 3 4)))
  (nrm2 x))
; => 5.0d0 ; √(3² + 4²)

(let ((x (vec 'lla-double 1 2 3 4 5)))
  (nrm2 x))
; => 7.416...d0 ; √(1² + 2² + 3² + 4² + 5²)

;; L2 norm with stride
(let ((x (vec 'lla-double 3 0 4 0 0 0)))
  (nrm2 x :n 2 :incx 2))
; => 5.0d0 ; √(3² + 4²)

;; Complex vectors
(let ((x (vec 'lla-complex-double #C(3 4) #C(0 0) #C(0 0))))
  (nrm2 x))
; => 5.0d0 ; √(|3+4i|²) = √25

Parameters:

  • x - the vector
  • n - number of elements to process
  • incx - stride between elements (default: 1)

These BLAS functions provide the fundamental building blocks for linear algebra operations. They are highly optimized and form the computational core of higher-level operations like matrix multiplication (mm), solving linear systems (solve), and computing factorizations. The in-place operations (marked with !) modify their arguments directly, providing memory-efficient computation for large-scale numerical applications.

5.6 - Select

Selecting Cartesian subsets of data

Overview

Select provides:

  1. An API for taking slices (elements selected by the Cartesian product of vectors of subscripts for each axis) of array-like objects. The most important function is select. Unless you want to define additional methods for select, this is pretty much all you need from this library. See the API reference for additional details.
  2. An extensible DSL for selecting a subset of valid subscripts. This is useful if, for example, you want to resolve column names in a data frame in your implementation of select.
  3. A set of utility functions for traversing selections in array-like objects.

It combines the functionality of dplyr’s slice, select and sample methods.

Basic Usage

The most frequently used form is:

(select object selection1 selection2 ...)

where each selection specifies a set of subscripts along the corresponding axis. The selection specifications are found below.

To select a column, pass in t for the rows selection1, and the column names (for a data frame) or column number (for an array) for selection2. For example, to select the second column of this array (remember Common Lisp has zero-based arrays, so the second column is at index 1):

(select #2A((C0 C1 C2) (v10 v11 v12) (v20 v21 v22) (v30 v31 v32)) t 1) ; #(C1 V11 V21 V31)

and to select a column from the mtcars data frame:

(ql:quickload :data-frame)
(data :mtcars)
(select mtcars t 'mpg)

if you’re selecting from a data frame, you can also use the column or columns command:

(column mtcars 'mpg)

to select an entire row, pass t for the column selector, and the row(s) you want for selection1. This example selects the first row (second row in purely array terms, which are 0 based):

(select #2A((C0 C1 C2) (v10 v11 v12) (v20 v21 v22) (v30 v31 v32)) 1 t) ;#(V10 V11 V12)

Selection Specifiers

Selecting Single Values

A non-negative integer selects the corresponding index, while a negative integer selects an index counting backwards from the last index. For example:

(select #(0 1 2 3) 1)  ; => 1
(select #(0 1 2 3) -2) ; => 2

These are called singleton slices. Each singleton slice drops the dimension: vectors become atoms, matrices become vectors, etc.
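To make the dimension-dropping concrete, here is a short sketch with a matrix, following the same semantics as the array examples in this section:

(select #2A((0 1 2) (3 4 5)) 1 t) ; one singleton: matrix -> vector #(3 4 5)
(select #2A((0 1 2) (3 4 5)) 1 1) ; two singletons: matrix -> atom 4

Each singleton subscript removes one axis from the result.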

Selecting Ranges

(range start end) selects subscripts i where start <= i < end. When end is nil, the last index is included (cf. subseq). Each boundary is resolved according to the other rules, if applicable, so you can use negative integers:

(select #(0 1 2 3) (range 1 3))  ; => #(1 2)
(select #(0 1 2 3) (range 1 -1)) ; => #(1 2)

Selecting All Subscripts

t selects all subscripts:

(select #2A((0 1 2) (3 4 5)) t 1) ; => #(1 4)

Selecting w/ Sequences

Sequences can be used to make specific selections from the object. For example:

(select #(0 1 2 3 4 5 6 7 8 9)
        (vector (range 1 3) 6 (range -2 -1)))
; => #(1 2 3 6 8 9)

(select #(0 1 2) '(2 2 1 0 0))
; => #(2 2 1 0 0)

Masks

Bit Vectors

Bit vectors can be used to select elements of arrays and sequences as well:

(select #(0 1 2 3 4) #*00110) ; => #(2 3)

Which

which returns an index of the positions in SEQUENCE which satisfy PREDICATE.

(defparameter data
  #(12 127 28 42 39 113 42 18 44 118 44 37 113 124 37 48 127 36 29 31
    125 139 131 115 105 132 104 123 35 113 122 42 117 119 58 109 23 105
    63 27 44 105 99 41 128 121 116 125 32 61 37 127 29 113 121 58 114
    126 53 114 96 25 109 7 31 141 46 13 27 43 117 116 27 7 68 40 31 115
    124 42 128 146 52 71 118 117 38 27 106 33 117 116 111 40 119 47 105
    57 122 109 124 115 43 120 43 27 27 18 28 48 125 107 114 34 133 45
    120 30 127 31 116))

(which data :predicate #'evenp)
; #(0 2 3 6 7 8 9 10 13 15 17 25 26 30 31 34 40 44 46 48 55 56 57 59
;   60 66 71 74 75 78 79 80 81 82 84 86 88 91 93 98 100 103 107 108 109
;   112 113 116 117 120)

Sampling

You may sample sequences, arrays and data frames with the sample generic function, and extend it for your own objects. The function signature is:

(defgeneric sample (data n &key with-replacement skip-unselected))

In Common Lisp, keyword arguments that are not provided default to nil, so you need to turn them on if you want them.

:skip-unselected t means to not return the values of the object that were not part of the sample. This is turned off by default because a common use case is splitting a data set into training and test groups, and the second value is ignored by default in Common Lisp. The let-plus package, imported by default in select, makes it easy to destructure into test and training. This example is from the tests included with select:

(let+ ((*random-state* state) ((&values train test) (sample arr35 2)) ...

Note the setting of *random-state*. Use this pattern of binding *random-state* to a saved seed whenever you need reproducible results (as in a testing scenario).
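A complete version of that pattern, using only standard Common Lisp random-state functions, might look like the following sketch (data is the variable defined in the which example above):

;; Capture a seed once ...
(defparameter *seed* (make-random-state t))

;; ... then rebind *random-state* to a fresh copy of the seed
;; for each run; every run draws the same sample.
(let ((*random-state* (make-random-state *seed*)))
  (sample data 10 :skip-unselected t))

Because make-random-state copies its argument, the saved seed itself is never advanced.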

The size of the sample is determined by the value of n, which must be between 0 and the number of rows (for an array) or the length (for a sequence). If (< n 1), then n indicates a proportion of the sample, e.g. 2/3 (values of n less than one may be rational or float). For example, let's take a training sample of 2/3 of the rows in the mtcars dataset:

LS-USER> (sample mtcars 2/3)
#<DATA-FRAME (21 observations of 12 variables)>
#<DATA-FRAME (11 observations of 12 variables)>
LS-USER> (dims mtcars)
(32 12)

You can see that mtcars has 32 rows, and has been divided into 2/3 and 1/3 proportional samples for training / test.

You can also take samples of sequences (lists and vectors), for example using the DATA variable defined above:

LS-USER> (length data)
121
LS-USER> (sample data 10 :skip-unselected t)
#(43 117 42 29 41 105 116 27 133 58)
LS-USER> (sample data 1/10 :skip-unselected t)
#(119 116 7 53 27 114 31 23 121 109 42 125)

list objects can also be sampled:

(sample '(a b c d e f g) 0.5)
; (A E G B)
; (F D C)

Note that n is rounded up when the number of elements is odd and a proportional number is requested.
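For instance, a proportional sample of half of a five-element list selects three elements, since 2.5 rounds up to 3. This is a sketch; the particular elements chosen are random:

(sample '(a b c d e) 1/2)
; first value:  3 randomly chosen elements
; second value: the remaining 2 elements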

Extensions

The previous section describes the core functionality. The semantics can be extended. The extensions in this section are provided by the library and prove useful in practice. Their implementation provide good examples of extending the library.

including is convenient if you want the selection to include the end of the range:

(select #(0 1 2 3) (including 1 2)) ; => #(1 2), cf. (select ... (range 1 3))

nodrop is useful if you do not want to drop dimensions:

(select #(0 1 2 3) (nodrop 2)) ; => #(2), cf. (select ... (range 2 3))

All of these are trivial to implement. If there is something you are missing, you can easily extend select. Pull requests are welcome.

(ref) is a version of (select) that always returns a single element, so it can only be used with singleton slices.

Select Semantics

Arguments of select, except the first one, are meant to be resolved using canonical-representation, in the select-dev package. If you want to extend select, you should define methods for canonical-representation. See the source code for the best examples. Below is a simple example that extends the semantics with ordinal numbers.

(defmacro define-ordinal-selection (number)
  (check-type number (integer 0))
  `(defmethod select-dev:canonical-representation
       ((axis integer) (select (eql ',(intern (format nil "~:@(~:r~)" number)))))
     (assert (< ,number axis))
     (select-dev:canonical-singleton ,number)))

(define-ordinal-selection 1)
(define-ordinal-selection 2)
(define-ordinal-selection 3)

(select #(0 1 2 3 4 5) (range 'first 'third)) ; => #(1 2)

Note the following:

  • The value returned by canonical-representation needs to be constructed using canonical-singleton, canonical-range, or canonical-sequence. You should not use the internal representation directly as it is subject to change.
  • You can assume that axis is an integer; this is the default. An object may define a more complex mapping (such as, for example, named rows & columns), but unless a method specialized to that is found, canonical-representation will just query its dimension (with axis-dimension) and try to find a method that works on integers.
  • You need to make sure that the subscript is valid, hence the assertion.

5.7 - SQLDF

Selecting subsets of data using SQL

Overview

sqldf is a library for querying data in a data-frame using SQL, optimised for memory consumption. Any query that can be done in SQL can also be done in the API, but since SQL is widely known, many developers find it more convenient to use.

To use SQL to query a data frame, the developer uses the sqldf function, using the data frame name (converted to SQL identifier format) in place of the table name. sqldf will automatically create an in-memory SQLite database, copy the contents of the data frame to it, perform the query, return the results as a new data frame and delete the database. We have tested this with data frames of 350K rows and there is no noticeable difference in performance compared to API based queries.

See the cl-sqlite documentation for additional functionality provided by the SQLite library. You can create databases, employ multiple persistent connections, use prepared statements, etc. with the underlying library. sqldf is a thin layer for moving data to/from data-frames.

Basic Usage

sqldf requires the sqlite shared library from the SQLite project. It may also be available via your operating systems package manager.

To load sqldf:

(asdf:load-system :sqldf)
(use-package 'sqldf) ; access to the symbols

Examples

These examples use the R data sets that are loaded using the example ls-init file. If your init file doesn’t do this, go now and load the example datasets in the REPL. Mostly these examples are intended to demonstrate commonly used queries for users who are new to SQL. If you already know SQL, you can skip this section.

Ordering & Limiting

This example shows how to limit the number of rows output by the query. It also illustrates changing the column name to meet SQL identifier requirements. In particular, the R CSV file has sepal.length for a column name, which is converted to sepal-length for the data frame, and we query it as sepal_length in SQL because '-' is not a valid character in SQL identifiers.

First, let’s see how big the iris data set is:

LS-USER> iris
#<DATA-FRAME (150 observations of 6 variables)>

and look at the first few rows:

(head iris)
;;   X7 SEPAL-LENGTH SEPAL-WIDTH PETAL-LENGTH PETAL-WIDTH SPECIES
;; 0  1          5.1         3.5          1.4         0.2 setosa
;; 1  2          4.9         3.0          1.4         0.2 setosa
;; 2  3          4.7         3.2          1.3         0.2 setosa
;; 3  4          4.6         3.1          1.5         0.2 setosa
;; 4  5          5.0         3.6          1.4         0.2 setosa
;; 5  6          5.4         3.9          1.7         0.4 setosa

X7 is the row name/number from the data set. Since it was not assigned a column name in the data set, lisp-stat gives it a random name upon import (X1, X2, X3, …).

Now use sqldf for a query:

(pprint (sqldf "select * from iris order by sepal_length desc limit 3"))
;;    X7 SEPAL-LENGTH SEPAL-WIDTH PETAL-LENGTH PETAL-WIDTH SPECIES
;; 0 132          7.9         3.8          6.4         2.0 virginica
;; 1 118          7.7         3.8          6.7         2.2 virginica
;; 2 119          7.7         2.6          6.9         2.3 virginica

Averaging & Grouping

Grouping is often useful during the exploratory phase of data analysis. Here’s how to do it with sqldf:

(pprint (sqldf "select species, avg(sepal_length) from iris group by species"))
;;   SPECIES    AVG(SEPAL-LENGTH)
;; 0 setosa                5.0060
;; 1 versicolor            5.9360
;; 2 virginica             6.5880

Nested Select

For each species, show the two rows with the largest sepal lengths:

(pprint (sqldf "select * from iris i
                where x7 in (select x7 from iris
                             where species = i.species
                             order by sepal_length desc
                             limit 2)
                order by i.species, i.sepal_length desc"))
;;    X7 SEPAL-LENGTH SEPAL-WIDTH PETAL-LENGTH PETAL-WIDTH SPECIES
;; 0  15          5.8         4.0          1.2         0.2 setosa
;; 1  16          5.7         4.4          1.5         0.4 setosa
;; 2  51          7.0         3.2          4.7         1.4 versicolor
;; 3  53          6.9         3.1          4.9         1.5 versicolor
;; 4 132          7.9         3.8          6.4         2.0 virginica
;; 5 118          7.7         3.8          6.7         2.2 virginica

Recall the note above about X7 being the row id. This may be different depending on how many other data frames with an unnamed column have been imported in your Lisp-Stat session.

SQLite access

sqldf needs to read and write data frames to the data base, and these functions are exported for general use.

Write a data frame

create-df-table and write-table can be used to write a data frame to a database. Each take a connection to a database, which may be file or memory based, a table name and a data frame. Multiple data frames, with different table names, may be written to a single SQLite file this way. For example, to write iris to disk:

LS-USER> (defparameter *conn*
           (sqlite:connect #P"c:/Users/lisp-stat/data/iris.db3")) ; file to save to
*CONN*
LS-USER> (sqldf::create-df-table *conn* 'iris iris) ; create the table and schema
NIL
LS-USER> (sqldf:write-table *conn* 'iris iris) ; write the data
NIL

Read a data frame

read-table will read a database table into a data frame and update the column names to be lisp-like by converting "." and "_" to "-". Note that the CSV reading tools of SQLite (for example, DB Browser for SQLite) are much faster than the Lisp libraries, sometimes 15x faster. This means that often the quickest way to load a data frame from CSV data is to first read it into a SQLite database, and then load the database table into a data frame. In practice, SQLite also turns out to be a convenient file format for storing data frames.
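Continuing the write example above, reading the table back might look like the following sketch. The exact signature of read-table is an assumption here; consult the sqldf API reference for the definitive form.

LS-USER> (defparameter *iris-copy* (sqldf:read-table *conn* 'iris)) ; assumed signature: (read-table connection table-name)
*IRIS-COPY*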

Roadmap

SQLDF is currently written using an apparently abandoned library, cl-sqlite. Pull requests from 2012 have been made with no response from the author, and the SQLite C API has improved considerably in the 12 years since the cl-sqlite FFI was last updated.

We chose CL-SQLite because, at the time of writing, it was the only SQLite library with a commercially acceptable license. Since then CLSQL has migrated to a BSD license and is a better option for new development. Not only does it support CommonSQL, the de-facto SQL query syntax for Common Lisp, it also supports several additional databases.

Version 2 of SQLDF will use CLSQL, possibly including some of the CSV and other extensions available in SQLite. Benchmarks show that SQLite’s CSV import is about 15x faster than cl-csv, and a FFI wrapper of SQLite’s CSV importer would be a good addition to Lisp-Stat.

Joins

Joins on tables are not implemented in SQLDF, though there is no technical reason they could not be. This will be done as part of the CLSQL conversion and involves more advanced SQL parsing. SXQL is worth investigating as a SQL parser.

5.8 - Statistics

Statistical functions

Overview

statistics is a library that consolidates three well-known statistical libraries:

  • The statistics library from numerical-utilities
  • Larry Hunter’s cl-statistics
  • Gary Warren King’s cl-mathstats

There are a few challenges in using these as independent systems on projects though:

  • There is a good amount of overlap. Everyone implements, for example mean (as does alexandria, cephes, and almost every other library out there).
  • In the case of mean, variance, etc., the functions deal only with samples, not distributions

This library brings these three systems under a single 'umbrella', and adds a few missing functions. To do this we use Tim Bradshaw's conduit-packages. For the few functions that require dispatch on type (sample data vs. a distribution), we use typecase because of its simplicity and because it needs no additional system. There is a slight performance hit from run-time type determination, but until it becomes a problem we prefer the simpler approach. An alternative considered for dispatch was filtered-functions (https://github.com/pcostanza/filtered-functions).

nu-statistics

These functions cover sample moments in detail, and are accurate. They include moments up to the fourth, and are well suited to the work of an econometrician (and were written by one).

lh-statistics

These were written by Larry Hunter, based on the methods described in Bernard Rosner’s book, Fundamentals of Biostatistics 5th Edition, along with some from the CLASP system. They cover a wide range of statistical applications. Note that lh-statistics uses lists and not vectors, so you’ll need to convert. To see what’s available see the statistics github repo.

gwk-statistics

These are from Gary Warren King, and also partially based on CLASP. The library is well written, and the functions have excellent documentation. The major reason we do not include it by default is that it uses an older ecosystem of libraries that duplicate more widely used systems (for example, numerical-utilities, alexandria). If you want to use these, you'll need to uncomment the appropriate code in the ASDF and pkgdcl.lisp files.

ls-statistics

These are considered the most complete, and they account for various types and dispatch properly.

Accuracy

LH and GWK statistics compute quantiles, CDF, PDF, etc. using routines from CLASP, that in turn are based on algorithms from Numerical Recipes. These are known to be accurate to only about four decimal places. This is probably accurate enough for many statistical problems, however should you need greater accuracy look at the distributions system. The computations there are based on special-functions, which has accuracy around 15 digits. Unfortunately documentation of distributions and the ‘wrapping’ of them here are incomplete, so you’ll need to know the pattern, e.g. pdf-gamma, cdf-gamma, etc., which is described in the link above.

Versions

Because this system is likely to change rapidly, we have adopted a system of versioning proposed in defpackage+. This is also the system alexandria uses where a version number is appended to the API. So, statistics-1 is our current package name. statistics-2 will be the next and so on. If you don’t like these names, you can always change it locally using a package local nickname.

Dictionary

scale

scale is a generic function whose default method centers and/or scales the columns of a numeric matrix. This is necessary when the units of measurement of your variables differ.

(defun standard-scale (x &key center scale))

Returns

The function returns three values:

  1. (x - x̄) / s where X̄ is the mean and S is the standard deviation
  2. the center value used
  3. the scale value used

Parameters

  • CENTER value to center on; (mean x) by default
  • SCALE value to scale by; (sd x) by default

If center or scale is nil, do not scale or center respectively.

Example: Scale the values in a vector

(defparameter x #(11 12 13 24 25 16 17 18 19))
(scale x)
; => #(-1.2585064099313854d0 -1.0562464511924128d0 -0.85398649245344d0
;      1.3708730536752591d0 1.5731330124142318d0 -0.24720661623652215d0
;      -0.044946657497549475d0 0.15731330124142318d0 0.3595732599803958d0)

Note that the scaled vector contains negative values; this is expected when scaling a vector. Let’s try the same thing, but without scaling:

(scale x :scale nil)
; => #(-56/9 -47/9 -38/9 61/9 70/9 -11/9 -2/9 7/9 16/9)
;    155/9
;    1

Note that the scale value returned was 1, meaning no scaling was performed; only centering was done (dividing by 1 leaves the values unchanged).

Example: Scale the columns of an array

(defparameter y #2A((1 2 3 4 5 6 7 8 9)
                    (10 20 30 40 50 60 70 80 90)))
(margin #'scale y 1) ; 1 splits along columns, 0 splits along rows
; => #(#(-1.4605934866804429d0 -1.0954451150103321d0 -0.7302967433402214d0
;        -0.3651483716701107d0 0.0d0 0.3651483716701107d0
;        0.7302967433402214d0 1.0954451150103321d0 1.4605934866804429d0)
;      #(-1.4605934866804429d0 -1.0954451150103321d0 -0.7302967433402214d0
;        -0.3651483716701107d0 0.0d0 0.3651483716701107d0
;        0.7302967433402214d0 1.0954451150103321d0 1.4605934866804429d0))

Example: Scale the variables of a data frame

LS-USER> (remove-column! iris 'species) ; species is a categorical variable
#<DATA-FRAME (150 observations of 4 variables) Edgar Anderson's Iris Data>
LS-USER> (head iris)
;;   SEPAL-LENGTH SEPAL-WIDTH PETAL-LENGTH PETAL-WIDTH
;; 0          5.1         3.5          1.4         0.2
;; 1          4.9         3.0          1.4         0.2
;; 2          4.7         3.2          1.3         0.2
;; 3          4.6         3.1          1.5         0.2
;; 4          5.0         3.6          1.4         0.2
;; 5          5.4         3.9          1.7         0.4
NIL
LS-USER> (map-columns iris #'scale)
#<DATA-FRAME (150 observations of 4 variables)>
LS-USER> (head *)
;;   SEPAL-LENGTH SEPAL-WIDTH PETAL-LENGTH PETAL-WIDTH
;; 0      -0.8977      1.0156      -1.3358     -1.3111
;; 1      -1.1392     -0.1315      -1.3358     -1.3111
;; 2      -1.3807      0.3273      -1.3924     -1.3111
;; 3      -1.5015      0.0979      -1.2791     -1.3111
;; 4      -1.0184      1.2450      -1.3358     -1.3111
;; 5      -0.5354      1.9333      -1.1658     -1.0487

6 - Reference

API documentation for Lisp-Stat systems


7.2 - Special Functions

Implemented in Common Lisp

The library assumes you are working with 64-bit double-floats; it will probably work with single-floats as well. Whilst we would prefer to support the complex domain, the majority of the sources do not. The special function implementations and their sources are tabled below. This library focuses on high-accuracy double-float calculations using the latest algorithms.
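As a quick smoke test, the functions can be called directly with double-float arguments. The SPFN package nickname here is taken from the log-gamma example later on this page; the values in the comments are approximate mathematical reference values, not captured library output:

```lisp
(spfn:erf 1.0d0)        ; erf(1)  ≈ 0.8427007929497149
(spfn:erfc 1.0d0)       ; erfc(1) ≈ 0.15729920705028513
(spfn:log-gamma 4.0d0)  ; log Γ(4) = log 6 ≈ 1.791759469228055
```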

function source
erf libm
erfc libm
inverse-erf Boost
inverse-erfc Boost
log-gamma libm
gamma Cephes
incomplete-gamma Boost

Error rates

The following table shows the peak and mean errors using Boost test data. Tests run on MS Windows 10 with SBCL 2.0.10. Boost results taken from the Boost error function, inverse error function and log-gamma pages.

erf

Data Set Boost (MS C++) Special-Functions
erf small values Max = 0.841ε (Mean = 0.0687ε) Max = 6.10e-5ε (Mean = 4.58e-7ε)
erf medium values Max = 1ε (Mean = 0.119ε) Max = 1ε (Mean = 0.003ε)
erf large values Max = 0ε (Mean = 0ε) N/A (the erf range is 0 < x < 6)

erfc

Data Set Boost (MS C++) Special-Functions
erfc small values Max = 0ε (Mean = 0) Max = 1ε (Mean = 0.00667ε)
erfc medium values Max = 1.65ε (Mean = 0.373ε) Max = 1.71ε (Mean = 0.182ε)
erfc large values Max = 1.14ε (Mean = 0.248ε) Max = 2.31e-15ε (Mean = 8.86e-18ε)

inverse-erf/c

Data Set Boost (MS C++) Special-Functions
inverse-erf Max = 1.09ε (Mean = 0.502ε) Max = 2ε (Mean = 0.434ε)
inverse-erfc Max = 1ε (Mean = 0.491ε) Max = 2ε (Mean = 0.425ε)

log-gamma

Data Set Boost (MS C++) Special-Functions
factorials Max = 0.914ε (Mean = 0.175ε) Max = 2.10ε (Mean = 0.569ε)
near 0 Max = 0.964ε (Mean = 0.462ε) Max = 1.93ε (Mean = 0.662ε)
near 1 Max = 0.867ε (Mean = 0.468ε) Max = 0.50ε (Mean = 0.0183ε)
near 2 Max = 0.591ε (Mean = 0.159ε) Max = 0.0156ε (Mean = 3.83d-4ε)
near -10 Max = 4.22ε (Mean = 1.33ε) Max = 4.83d+5ε (Mean = 3.06d+4ε)
near -55 Max = 0.821ε (Mean = 0.419ε) Max = 8.16d+4ε (Mean = 4.53d+3ε)

The results for log-gamma are good near 1 and 2, bettering those of Boost; however they are worse (relatively speaking) at magnitudes of x greater than about 8. We don’t have an explanation for this, since the libm values match Boost more closely. For example:

(spfn:log-gamma -9.99999237060546875d0) ; => -3.3208925610275326d0
(libm:lgamma    -9.99999237060546875d0) ; => -3.3208925610151265d0
;; Boost test answer: -3.320892561015125097640948165422843317137

libm:lgamma provides an additional 4 digits of accuracy over spfn:log-gamma when compared to the Boost test answer, despite using identical computations. log-gamma still agrees to about 12 digits, though, which is likely good enough for most uses.

gamma

Data Set Boost (MS C++) Special-Functions
factorials Max = 1.85ε (Mean = 0.491ε) Max = 3.79ε (Mean = 0.949ε)
near 0 Max = 1.96ε (Mean = 0.684ε) Max = 2.26ε (Mean = 0.56ε)
near 1 Max = 2ε (Mean = 0.865ε) Max = 2.26ε (Mean = 0.858ε)
near 2 Max = 2ε (Mean = 0.995ε) Max = 2ε (Mean = 0.559ε)
near -10 Max = 1.73ε (Mean = 0.729ε) Max = 0.125ε (Mean = 0.0043ε)
near -55 Max = 1.8ε (Mean = 0.817ε) Max = 0ε (Mean = 0ε)

incomplete-gamma

See boost incomplete gamma documentation for notes and error rates.

lower

Data Set Boost (MS C++) Special-Functions
small values Max = 1.54ε (Mean = 0.439ε) Max = 3.00ε (Mean = 0.516ε)
medium values Max = 35.1ε (Mean = 6.98ε) Max = 10.00ε (Mean = 0.294ε)
large values Max = 243ε (Mean = 20.2ε) Max = 20ε (Mean = 0.613ε)
integer and half-integer Max = 13ε (Mean = 2.97ε) Max = 3ε (Mean = 0.189ε)

upper

Data Set Boost (MS C++) Special-Functions
small values Max = 2.26ε (Mean = 0.74ε) Max = 2.23ε (Mean = 0.511ε)
medium values Max = 23.7ε (Mean = 4ε) Max = 9.00ε (Mean = 0.266ε)
large values Max = 469ε (Mean = 31.5ε) Max = 20.5ε (Mean = 0.621ε)
integer and half-integer Max = 8.72ε (Mean = 1.48ε) Max = 4.00ε (Mean = 0.174ε)

NaN and Infinity

The Common Lisp specification mentions neither NaN nor infinity, so any proper treatment of these is going to be either implementation-specific or provided by a third-party library.

We are using the float-features library. There is also some support for infinity in the extended-reals package of numerical-utilities, but it is not comprehensive. Openlibm and Cephes have definitions, but we don’t want to introduce a large dependency just to get these definitions.
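As a hedged sketch, the IEEE special values and predicates exported by float-features can be used like this (the symbol names below should be checked against that library's documentation):

```lisp
;; Constants and predicates from the FLOAT-FEATURES package.
(float-features:float-infinity-p
 float-features:double-float-positive-infinity)   ; expect T

(float-features:float-nan-p
 float-features:double-float-nan)                 ; expect T
```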

Test data

The test data is based on Boost test data. You can run all the tests using the ASDF test op:

(asdf:test-system :special-functions)

By default the test summary values (the same as in Boost) are printed after each test, along with the key epsilon values.

7.3 - Code Repository

Collection of XLisp and Common Lisp statistical routines

Below is a partial list of the consolidated XLispStat packages from UCLA and CMU repositories. There is a great deal more XLispStat code available that was not submitted to these archives, and a search for an algorithm or technique that includes the term “xlispstat” will often turn up interesting results.

Artificial Intelligence

Genetic Programming

Cerebrum
A Framework for the Genetic Programming of Neural Networks. Peter Dudey. No license specified.
[Docs]
GAL
Functions useful for experimentation in Genetic Algorithms. It is hopefully compatible with Lucid Common Lisp (also known as Sun Common Lisp). The implementation is a “standard” GA, similar to Grefenstette’s work. Baker’s SUS selection algorithm is employed, 2 point crossover is maintained at 60%, and mutation is very low. Selection is based on proportional fitness. This GA uses generations. It is also important to note that this GA maximizes. William M. Spears. “Permission is hereby granted to copy all or any part of this program for free distribution, however this header is required on all copies.”
mGA
A Common Lisp Implementation of a Messy Genetic Algorithm. No license specified.
[Docs, errata]

Machine Learning

Machine Learning
Common Lisp files for various standard inductive learning algorithms that all use the same basic data format and same interface. It also includes automatic testing software for running learning curves that compare multiple systems and utilities for plotting and statistically evaluating the results. Included are:
  • AQ: Early DNF learner.
  • Backprop: The standard multi-layer neural-net learning method.
  • Bayes Indp: Simple naive or “idiot’s” Bayesian classifier.
  • Cobweb: A probabilistic clustering system.
  • Foil: A first-order Horn-clause learner (Prolog and Lisp versions).
  • ID3: Decision tree learner with a number of features.
  • KNN: K nearest neighbor (instance-based) algorithm.
  • Perceptron: Early one-layer neural-net algorithm.
  • PFOIL: Propositional version of FOIL for learning DNF.
  • PFOIL-CNF: Propositional version of FOIL for learning CNF.

Raymond J. Mooney. “This program may be freely copied, used, or modified provided that this copyright notice is included in each copy of this code and parts thereof.”

Neural Networks

QuickProp
Common Lisp implementation of “Quickprop”, a variation on back-propagation. For a description of the Quickprop algorithm, see Faster-Learning Variations on Back-Propagation: An Empirical Study by Scott E. Fahlman in Proceedings of the 1988 Connectionist Models Summer School, Morgan-Kaufmann, 1988. Scott E. Fahlman. Public domain.
[README]

Fun & Games

Towers of Hanoi
Tower of Hanoi plus the Queens program explained in Winston and Horn. No license specified.

Mathematics

Combinatorial
Various combinatorial functions for XLispStat. There are other Common Lisp libraries for this, for example cl-permutation. It’s worth searching for something in Quicklisp too. No license specified.
functions
Bessel, beta, erf, gamma and horner implementations. Gerald Roylance. License restricted to non-commercial use only.
integrate
gauss-hermite.lsp is by Jan de Leeuw.

runge.lsp and integr.lsp are from Gerald Roylance’s 1982 CLMATH package. integr.lsp has Simpson’s rule and the trapezoid rule; runge.lsp integrates ordinary differential equations by Runge-Kutta and other methods.

Roylance code is non-commercial use only. Jan de Leeuw’s code has no license specified.

lsqpack
This directory contains the code from the Lawson and Hanson book, Solving Least Squares Problems, translated with f2cl, tweaked for Xlisp-Stat by Jan de Leeuw. No license specified.
nswc
This is an f2cl translation, very incomplete, of the NSWC mathematics library. The FORTRAN, plus a great manual, is available on github. The report is NSWCDD/TR-92/425, by Alfred H. Morris, Jr. dated January 1993. No license specified, but this code is commonly considered public domain.
Numerical Recipes
Code from Numerical Recipes in FORTRAN, first edition, translated with Waikato’s f2cl and tweaked for XLisp-Stat by Jan de Leeuw. No license specified.
optimization
Code for annealing, simplex and other optimization problems. Various licenses. These days, better implementations are available, for example the linear-programming library.

Statistics

Algorithms

  • AS 190 Probabilities and Upper Quantiles for the Studentized Range.
  • AS 226 Computing Noncentral Beta Probabilities
  • AS 241 The Percentage Points of the Normal Distribution
  • AS 243 Cumulative Distribution Function of the Non-Central T Distribution
  • TOMS 744 A stochastic algorithm for global optimization with constraints

AS algorithms: B. Narasimhan (naras@euler.bd.psu.edu) “You can freely use and distribute this code provided you don’t remove this notice. NO WARRANTIES, EXPLICIT or IMPLIED”

TOMS: F. Michael Rabinowitz. No license specified.

Categorical

glim
Glim extension for log-linear models. Jan de Leeuw. No license specified.
IPF
Fits Goodman’s RC model to the array X. Also included is a set of functions for APL-like array operations. The four basic APL operators (see, for example, Garry Helzel, An Encyclopedia of APL, 2nd edition, 1989, I-APL, 6611 Linville Drive, Weed, CA) are inner-product, outer-product, reduce, and scan. They can be used to produce new binary and unary functions from existing ones. Unknown author. No license specified.
latent-class
One file with the function latent-class. Unknown author. No license specified.
max
Functions to do quantization and cluster analysis in the empirical case. Jan de Leeuw. No license specified.
write-profiles
A function. The argument is a list of lists of strings. Each element of the list corresponds with a variable, the elements of the list corresponding with a variable are the labels of that variable, which are either strings or characters or numbers or symbols. The program returns a matrix of strings coding all the profiles. Unknown author. License not specified.

Distributions

The distributions repository contains single file implementations of:

density demo
Demonstrations of plots of density and probability functions. Requires XLispStat graphics. Jan de Leeuw. No license specified.
noncentral t-distribution
noncentral-t distribution by Russ Lenth, based on Applied Statistics Algorithm AS 243. No license specified.
probability-functions
A compilation of probability densities, cumulative distribution functions, and their inverses (quantile functions), by Jan de Leeuw. No license specified.
power
This appears to test the powers of various distribution functions. Unknown author. No license specified.
weibull-mle
Maximum likelihood estimation of Weibull parameters. M. Ennis. No license specified.

Classroom Statistics

The systems in the introstat directory are meant to be used in teaching situations. For the most part they use XLispStat’s graphical system to introduce students to statistical concepts. They are generally simple from the perspective of a statistical practitioner.

ElToY
ElToY is a collection of three programs written in XLISP-STAT. Dist-toy displays a univariate distribution dynamically linked to its parameters. CLT-toy provides an illustration of the central limit theorem for univariate distributions. ElToY provides a mechanism for displaying the prior and posterior distributions for a conjugate family, dynamically linked so that changes to the prior affect the posterior and vice versa. Russell Almond almond@stat.washington.edu. GPL v2.

Multivariate

Dendro
Dendro is for producing dendrograms for agglomerative clustering in XLISP-STAT.

Plotting

Boxplot Matrix
Graphical Display of Analysis of Variance with the Boxplot Matrix. Extension of the standard one-way box plot to cross-classified data with multiple observations per cell. Richard M. Heiberger rmh@astro.ocis.temple.edu No license specified.
[Docs]
Dynamic Graphics and Regression Diagnostics
Contains methods for regression diagnostics using dynamic graphics, including all the methods discussed in Cook and Weisberg (1989) Technometrics, 277-312. Includes documentation written in LaTeX. sandy@umnstat.stat.umn.edu No license specified.
[Docs]
FEDF
Flipped Empirical Distribution Function. Parallel-FEDF, FEDF-ScatterPlot, FEDF-StarPlot written in XLISP-STAT. These plots are suggested for exploring multidimensional data suggested in “Journal of Computational and Graphical Statistics”, Vol. 4, No. 4, pp.335-343. 97/07/18. Lee, Kyungmi & Huh, Moon Yul myhuh@yurim.skku.ac.kr No license specified.
PDFPlot
PDF graphics output from XlispStat PDFPlot is a XlispStat class to generate PDF files from LispStat plot objects. Steven D. Majewski sdm7g@virginia.edu. No license specified.
RXridge
RXridge.LSP adds shrinkage regression calculation and graphical ridge “trace” display functionality to the XLisp-Stat, ver2.1 release 3+ implementation of LISP-STAT. Bob Obenchain. No license specified.

Regression

Bayes-Linear
BAYES-LIN is an extension of the XLISP-STAT object-oriented statistical computing environment, which adds to XLISP-STAT some object prototypes appropriate for carrying out local computation via message-passing between clique-tree nodes of Bayes linear belief networks. Darren J. Wilkinson. No license specified. [Docs]
Bayesian Poisson Regression
Bayesian Poisson Regression using the Gibbs Sampler Sensitivity Analysis through Dynamic Graphics. A set of programs that allow you to do Bayesian sensitivity analysis dynamically for a variety of models. B. Narasimhan (naras@stat.fsu.edu) License restricted to non-commercial use only.
[Docs]
Binary regression
Smooth and parametric binary regression code. Unknown author. License not specified.
Cost of Data Analysis
A regression analysis usually consists of several stages such as variable selection, transformation and residual diagnosis. Inference is often made from the selected model without regard to the model selection methods that proceeded it. This can result in overoptimistic and biased inferences. We first characterize data analytic actions as functions acting on regression models. We investigate the extent of the problem and test bootstrap, jackknife and sample splitting methods for ameliorating it. We also demonstrate an interactive XLISP-STAT system for assessing the cost of the data analysis while it is taking place. Julian J. Faraway. BSD license.
[Docs]
Gee
Lisp-Stat code for generalised estimating equation models. Thomas Lumley thomas@biostat.washington.edu. GPL v2.
[Docs]
GLIM
Functions and prototypes for fitting generalized linear models. Contributed by Luke Tierney luke@umnstat.stat.umn.edu. No license specified.
[Docs]
GLMER
A function to estimate coefficients and dispersions in a generalized linear model with random effects. Guanghan Liu gliu@math.ucla.edu. No license specified.
Hasse
Implements Taylor & Hilton’s rules for balanced ANOVA designs and draws the Hasse diagram of nesting relationships. Philip Iversen piversen@iastate.edu. License restricted to non-commercial use only.
monotone
Implementation of an algorithm to project on the intersection of r closed convex sets. Further details and references are in Mathar, Cyclic Projections in Data Analysis, Operations Research Proceedings 1988, Springer, 1989. Jan de Leeuw. No license specified.
OIRS
Order and Influence in Regression Strategy. The methods (tactics) of regression data analysis such as variable selection, transformation and outlier detection are characterised as functions acting on regression models and returning regression models. The ordering of the tactics, that is the strategy, is studied. A method for the generation of acceptable models supported by the choice of regression data analysis methods is described with a view to determining if two capable statisticians may reasonably hold differing views on the same data. Optimal strategies are considered. The idea of influential points is extended from estimation to the model building process itself both quantitatively and qualitatively. The methods described are not intended for the entirely automatic analysis of data, rather to assist the statistician in examining regression data at a strategic level. Julian J. Faraway julian@stat.lsa.umich.edu. BSD license.
oneway
Additions to Tierney’s one way ANOVA. B. Narasimhan naras@euler.bd.psu.edu. No license specified.
Regstrat
A XLispStat tool to investigate order in Regression Strategy particularly for finding and examining the models found by changing the ordering of the actions in a regression analysis. Julian Faraway julian@stat.lsa.umich.edu. License restricted to non-commercial use only.
Simsel
XLISP-STAT software to perform Bayesian Predictive Simultaneous Variable and Transformation Selection for regression. A criterion-based model selection algorithm. Jennifer A. Hoeting jah@stat.colostate.edu. License restricted to non-commercial use only.

Robust

There are three robust systems in the robust directory:

robust regression
This is the Xlisp-Stat version of ROSEPACK, the robust regression package developed by Holland, Welsch, and Klema around 1975. See Holland and Welsch, Commun. Statist. A6, 1977, 813-827. See also the Xlisp-Stat book, pages 173-177, for an alternative approach. Jan de Leeuw. No license specified.

There is also robust statistical code for location and scale.

Simulation

The simulation directory contains bootstrapping methods, variable imputation, jackknife resampling, Monte Carlo simulations and a general purpose simulator. There are also discrete finite-state Markov chains in the temporal directory.

Smoothers

kernel density estimators
KDEs based on Wand, CFFI based KDEs by B. Narasimhan, and graphical univariate density estimation.
spline
Regularized bi-variate splines with smoothing and tension according to Mitasova and Mitas. Cubic splines according to Green and Silverman. Jan de Leeuw. No license specified.
super-smoother
The super smoothing algorithm, originally implemented in FORTRAN by Jerome Friedman of Stanford University, is a method by which a smooth curve may be fitted to a two-dimensional array of points. Its implementation is presented here in the XLISP-STAT language. Jason Bond. No license specified.
[DOCS]
Variable Bandwidth
XLispStat code to facilitate interactive bandwidth choice for estimator (3.14), page 44 in Bagkavos (2003), “BIAS REDUCTION IN NONPARAMETRIC HAZARD RATE ESTIMATION”. No license specified.

Spatial

livemap
LiveMap is a tool for exploratory spatial data analysis. Dr. Chris Brunsdon. No license specified.
[DOCS]
variograms
Produces variograms using algorithms from C.V. Deutsch and A.G. Journel, “GSLIB: Geostatistical Software Library and User’s Guide, Oxford University Press, New York, 1992. Stanley S. Bentow. No license specified.
[DOCS]

Temporal

Exploratory survival analysis
A set of XLISP-STAT routines for the interactive, dynamic, exploratory analysis of survival data. E. Neely Atkinson (neely@odin.mda.uth.tmc.edu) “This software may be freely redistributed.”
[Docs]
Markov
Simulate some Markov chains in Xlisp-Stat. Complete documentation and examples are included. B. Narasimhan (naras@sci234e.mrs.umn.edu). GPL.
[Docs]
SAPA
Sapaclisp is a collection of Common Lisp functions that can be used to carry out many of the computations described in the SAPA book:

Donald B. Percival and Andrew T. Walden, “Spectral Analysis for Physical Applications: Multitaper and Conventional Univariate Techniques”, Cambridge University Press, Cambridge, England, 1993.

The SAPA book uses a number of time series as examples of various spectral analysis techniques.

From the description:

Sapaclisp features functions for converting to/from decibels, the FORTRAN sign function, log of the gamma function, manipulating polynomials, root finding, simple numerical integration, matrix functions, Cholesky and modified Gram-Schmidt (i.e., Q-R) matrix decompositions, sample means and variances, sample medians, computation of quantiles from various distributions, linear least squares, discrete Fourier transform, fast Fourier transform, chirp transform, low-pass filters, high-pass filters, band-pass filters, sample auto-covariance sequence, auto-regressive spectral estimates, least squares, forward/backward least squares, Burg’s algorithm, the Yule-Walker method, periodogram, direct spectral estimates, lag window spectral estimates, WOSA spectral estimates, sample cepstrum, time series bandwidth, cumulative periodogram test statistic for white noise, and Fisher’s g statistic.

License: “Use and copying of this software and preparation of derivative works based upon this software are permitted. Any distribution of this software or derivative works must comply with all applicable United States export control laws.”

Times
XLispStat functions for time series analysis, data editing, data selection, and other statistical operations. W. Hatch (bts!bill@uunet.uu.net). Public Domain.

Tests

The tests directory contains code for one-sample and two-sample Kolmogorov-Smirnov tests (with no estimated parameters), and code for Mann-Whitney and Wilcoxon signed-rank tests.

Training & Documentation

ENAR Short Course
This directory contains slides and examples used in a short course on Lisp-Stat presented at the 1992 ENAR meetings in Cincinnati, 22 March 1992.
ASA Course
Material from an ASA course given in 1992.
Tech Report
A 106-page mini manual on XLispStat.

Utilities

The majority of the files in the utilities directory are specific to XLISP-STAT and unlikely to be useful. In most cases better alternatives now exist for Common Lisp. A few that may be worth investigating have been noted below.

Filters

XLisp-S
A series of routines to allow users of Xlisp or LispStat to interactively transfer data to and access functions in New S. Steve McKinney kilroy@biostat.washington.edu. License restricted to non-commercial use only.

I/O

formatted-input
A set of XLISP functions that can be used to read ASCII files into lists of lists, using formatted input. The main function is read-file, which has as arguments a filename and a FORTRAN type format string (with f, i, x, t, and a formats) Jan Deleeuw deleeuw@laplace.sscnet.ucla.edu “THIS SOFTWARE CAN BE FREELY DISTRIBUTED, USED, AND MODIFIED.”

Memoization

automatic memoization
As the name suggests. Marty Hall hall@aplcenmp.apl.jhu.edu. “Permission is granted for any use or modification of this code provided this notice is retained."
[OVERVIEW]

8 - Contribution Guidelines

How to contribute to Lisp-Stat

This section describes the mechanics of contributing code to Lisp-Stat, along with legal requirements, community guidelines and the code of conduct. For details on how to contribute code and documentation, see the links in the navigation sidebar to the left, under Contributing.

For ideas about what you might contribute, please see open issues on github and the ideas page. The organisation repository contains the individual sub-projects. Contributions to documentation are especially welcome.

Get source code

First you need the Lisp-Stat source code. The core systems are found on the Lisp-Stat github page. For the individual systems, just check out the one you are interested in. For the entire Lisp-Stat system, at a minimum you will need:

Other dependencies will be pulled in by Quicklisp.

Development occurs on the “master” branch. To get all the repos, you can use the following command in the directory you want to be your top level dev space:

cd ~/quicklisp/local-projects && \
git clone https://github.com/Lisp-Stat/data-frame.git && \
git clone https://github.com/Lisp-Stat/dfio.git && \
git clone https://github.com/Lisp-Stat/special-functions.git && \
git clone https://github.com/Lisp-Stat/numerical-utilities.git && \
git clone https://github.com/Lisp-Stat/array-operations.git && \
git clone https://github.com/Lisp-Stat/documentation.git && \
git clone https://github.com/Lisp-Stat/distributions.git && \
git clone https://github.com/Lisp-Stat/plot.git && \
git clone https://github.com/Lisp-Stat/select.git && \
git clone https://github.com/Lisp-Stat/cephes.cl.git && \
git clone https://github.com/Symbolics/alexandria-plus && \
git clone https://github.com/Lisp-Stat/statistics.git && \
git clone https://github.com/Lisp-Stat/lisp-stat.git && \
git clone https://github.com/Lisp-Stat/lla.git

Modify the source

Before you start, send a message to the Lisp-Stat mailing list or file an issue on Github describing your proposed changes. Doing this helps to verify that your changes will work with what others are doing and have planned for the project. Importantly, there may be some existing code or design work for you to leverage that is not yet published, and we’d hate to see work duplicated unnecessarily.

Be patient, it may take folks a while to understand your requirements. For large systems or design changes, a design document is preferred. For small changes, issues and the mailing list are fine.

Once your suggested changes are agreed, you can modify the source code and add some features using your favorite IDE.

The following sections provide tips for working on the project:

Coding Convention

Please consider the following before submitting a pull request:

  • Code should be formatted according to the Google Common Lisp Style Guide
  • All code should include unit tests based on CLUNIT.
  • Contributions should pass existing unit tests
  • New unit tests should be provided to demonstrate bugs and fixes
  • Indentation in Common Lisp is important for readability. Contributions should adhere to these guidelines. For the most part, a properly configured editor will do this automatically.

Suggested editor settings for code contributions

Avoid line breaks in (doc)strings; otherwise try to keep lines within 80 columns. Remove trailing whitespace. Use the ‘modern’ coding style. Suggested Emacs snippet:

(set-fill-column 9999)
(font-lock-add-keywords nil
                        '(("\\<\\(FIXME\\|TODO\\|QUESTION\\|NOTE\\)"
                           1 font-lock-warning-face t)))
(setq show-trailing-whitespace t)
(add-hook 'write-file-hooks
          '(lambda ()
             (save-excursion (delete-trailing-whitespace))
             nil))
(visual-line-mode 1)
(setq slime-net-coding-system 'utf-8-unix)
(setq lisp-lambda-list-keyword-parameter-alignment t)
(setq lisp-lambda-list-keyword-alignment t)
(setq common-lisp-style-default 'modern)

Code review

Github includes code review tools that can be used as part of a pull request. We recommend using a triangular workflow and feature/bug branches in your own repository to work from. Once you submit a pull request, one of the committers will review it and possibly request modifications.

As a contributor you should organise (squash) your git commits to make them understandable to reviewers:

  • Combine WIP and other small commits together.
  • Address multiple issues, for smaller bug fixes or enhancements, with a single commit.
  • Use separate commits to allow efficient review, separating out formatting changes or simple refactoring from core changes or additions.
  • Rebase this chain of commits on top of the current master
  • Write a good git commit message

Once all the comments in the review have been addressed, a Lisp-Stat committer completes the following steps to commit the patch:

  • If the master branch has moved forward since the review, rebase the branch from the pull request on the latest master and re-run tests.
  • If all tests pass, the committer amends the last commit message in the series to include “this closes #1234”. This can be done with interactive rebase. When on the branch issue: git rebase -i HEAD^
    • Change where it says “pick” on the line with the last commit, replacing it with “r” or “reword”. This replays the commit, giving you the opportunity to change the commit message.
    • The committer pushes the commit(s) to the github repo
    • The committer resolves the issue with a message like "Fixed in <Git commit SHA>".

Additional Info

Where to start?

If you are new to statistics or Lisp, documentation updates are always a good place to start. You will become familiar with the workflow, learn how the code functions and generally become better acquainted with how Lisp-Stat operates. Besides, any contribution will require documentation updates, so it’s good to learn this system first.

If you are coming from an existing statistical environment, consider porting a XLispStat package that you find useful to Lisp-Stat. Use the XLS compatibility layer to help. If there is a function missing in XLS, raise an issue and we’ll create it. Some XLispStat code to browse:

Keep in mind that some of these rely on the XLispStat graphics system, which was native to the platform. Lisp-Stat uses Vega for visualizations, so there isn’t a direct mapping; non-graphical code should be a straightforward port.

You could also look at CRAN, which contains thousands of high-quality packages.

For specific ideas that would help, see the ideas page.

Issue Guidelines

Please comment on issues in github, making your concerns known. Please also vote for issues that are a high priority for you.

Please refrain from editing descriptions and comments if possible, as edits spam the mailing list and clutter the audit trails, which is otherwise very useful. Instead, preview descriptions and comments using the preview button (on the right) before posting them. Keep descriptions brief and save more elaborate proposals for comments, since descriptions are included in GitHub automatically sent messages. If you change your mind, note this in a new comment, rather than editing an older comment. The issue should preserve this history of the discussion.

Code of Conduct

The following code of conduct is not meant as a means for punishment, action or censorship for the mailing list or project. Instead, it is meant to set the tone, expectations and comfort level for contributors and those wishing to participate in the community.

  • We ask everyone to be welcoming, friendly, and patient.
  • Flame wars and insults are unacceptable in any fashion, by any party.
  • Anything can be asked, and “RTFM” is not an acceptable answer.
  • Neither is “it’s in the archives, go read them”.
  • Statements made by core developers can be quoted outside of the list.
  • Statements made by others cannot be quoted outside the list without explicit permission. Anonymised, paraphrased statements (“someone asked about…”) are OK; direct quotes, with or without names, are not appropriate.
  • The community administrators reserve the right to revoke the subscription of members (including mentors) that persistently fail to abide by this Code of Conduct.

8.1 - Contributor License Agreement

Contributor License Agreement

First, if you are contributing on behalf of your employer, ensure you have signed a contributor license agreement. Then follow these steps for contributing to Lisp-Stat:

You may also be interested in the additional information at the end of this document.

Contributor License Agreement

Contributor License Agreements (CLAs) are common and accepted in open source projects. We all wish for Lisp-Stat to be used and distributed as widely as possible, and for its users to be confident about the origins and continuing existence of the code. The CLA helps us achieve that goal. Although CLAs are common, many in the Lisp community are unaware of them or their importance.

Some frequently asked questions include:

Why do you need a CLA?

We need a CLA because, by law, all rights reside with the originator of a work unless otherwise agreed. The CLA allows the project to accept and distribute your contributions. Without your consent via a CLA, the project has no rights to use the code. Here’s what Google has to say in their CLA policy page:

Standard inbound license

Using one standard inbound license that grants the receiving company broad permission to use contributed code in products is beneficial to the company and downstream users alike.

Technology companies will naturally want to make productive use of any code made available to them. However, if all of the code being received by a company was subject to various inbound licenses with conflicting terms, the process for authorizing the use of the code would be cumbersome because of the need for constant checks for compliance with the various licenses. Whenever contributed code were to be used, the particular license terms for every single file would need to be reviewed to ascertain whether the application would be permitted under the terms of that code’s specific license. This would require considerable human resources and would slow down the engineers trying to utilize the code.

The benefits that a company receives under a standard inbound license pass to downstream users as well. Explicit patent permissions and disclaimers of obligations and warranties clarify the recipients’ rights and duties. The broad grant of rights provides code recipients opportunities to make productive use of the software. Adherence to a single standard license promotes consistency and common understanding for all parties involved.

How do I sign?

For an agreement to be legally binding, a certain amount of legal ceremony must take place, and this varies by jurisdiction. For individuals, ‘clickwrap’ or ‘browser-wrap’ agreements are used. For corporations, a ‘wet signature’ is required because it is valid everywhere and avoids any ambiguity of assent.

If you are an individual contributor, making a pull request from a personal account, the cla-assistant will automatically prompt you to digitally sign as part of the PR.

What does it do?

The CLA essentially does three things. It ensures that the contributor agrees:

  1. To allow the project to use the source code and redistribute it
  2. That the contribution is theirs to give, i.e. it does not belong to their employer or someone else
  3. That it does not contain any patented ‘stuff’

Mechanics of the CLA

The Lisp-Stat project uses CLAs to accept regular contributions from individuals and corporations, and to accept larger grants of existing software products, for example if you wished to contribute a large XLISP-STAT library.

Contributions to this project must be accompanied by a Contributor License Agreement. You (or your employer) retain the copyright to your contribution; this simply gives us permission to use and redistribute your contributions as part of the project.

You generally only need to submit a CLA once, so if you have already submitted one (even if it was for a different project), you do not need to do it again.

8.3 - Contributing to Documentation

You can help make Lisp-Stat documentation better

Creating and updating documentation is a great way to learn. You will not only become more familiar with Common Lisp, you will also have a chance to investigate the internals of all parts of a statistical system.

We use Hugo to format and generate the website, the Docsy theme for styling and site structure, and Netlify to manage the deployment of the documentation site (what you are reading now). Hugo is an open-source static site generator that provides us with templates, content organisation in a standard directory structure, and a website generation engine. You write the pages in Markdown (or HTML if you want), and Hugo wraps them up into a website.

All submissions, including submissions by project members, require review. We use GitHub pull requests for this purpose. Consult GitHub Help for more information on using pull requests.

Repository Organisation

Declt generates documentation for individual systems in Markdown format. These are kept with the project, e.g. select/docs/select.md.

Conventions

Please follow the Microsoft Style Guide for technical documentation.

Quick Start

Here’s a quick guide to updating the docs. It assumes you are familiar with the GitHub workflow and you are happy to use the automated preview of your doc updates:

  1. Fork the Lisp-Stat documentation repo on GitHub.
  2. Make your changes and send a pull request (PR).
  3. If you are not yet ready for a review, add “WIP” to the PR name to indicate it’s a work in progress. (Don’t add the Hugo property “draft = true” to the page front matter, because that prevents the auto-deployment of the content preview described in the next point.)
  4. Wait for the automated PR workflow to do some checks. When it’s ready, you should see a comment like this: deploy/netlify — Deploy preview ready!
  5. Click Details to the right of “Deploy preview ready” to see a preview of your updates.
  6. Continue updating your doc and pushing your changes until you’re happy with the content.
  7. When you’re ready for a review, add a comment to the PR, and remove any “WIP” markers.

Updating a single page

If you’ve just spotted something you’d like to change while using the docs, Docsy has a shortcut for you (do not use this for reference docs):

  1. Click Edit this page in the top right hand corner of the page.
  2. If you don’t already have an up-to-date fork of the project repo, you are prompted to get one - click Fork this repository and propose changes or Update your Fork to get an up-to-date version of the project to edit. The appropriate page in your fork is displayed in edit mode.
  3. Follow the rest of the Quick Start process above to make, preview, and propose your changes.

Previewing locally

If you want to run your own local Hugo server to preview your changes as you work:

  1. Follow the instructions in Getting started to install Hugo and any other tools you need. You’ll need at least Hugo version 0.45 (we recommend using the most recent available version), and it must be the extended version, which supports SCSS.

  2. Fork the Lisp-Stat documentation repo into your own repository project, then create a local copy using git clone. Don’t forget to use --recurse-submodules or you won’t pull down some of the code you need to generate a working site.

    git clone --recurse-submodules --depth 1 https://github.com/lisp-stat/documentation.git
    
  3. Run hugo server in the site root directory. By default your site will be available at http://localhost:1313/. Now that you’re serving your site locally, Hugo will watch for changes to the content and automatically refresh your site.

  4. Continue with the usual GitHub workflow to edit files, commit them, push the changes up to your fork, and create a pull request.
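Step 4 above can be sketched as a shell session. To keep the example self-contained, a throwaway bare repository stands in for your GitHub fork, and the branch name, file name, and commit message are all made up for illustration:

```shell
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/fork.git"          # stand-in for your GitHub fork
git init -q "$tmp/site"
cd "$tmp/site"
git config user.email you@example.com
git config user.name "Your Name"
git remote add origin "$tmp/fork.git"

git checkout -q -b docs/fix-typos           # work on a topic branch
echo "A small wording fix." > examples.md   # edit a page
git add examples.md
git commit -q -m "docs: fix wording in examples page"
git push -q -u origin docs/fix-typos        # push the branch to your fork...
# ...then open a pull request on GitHub from the docs/fix-typos branch.
```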

Creating an issue

If you’ve found a problem in the docs, but are not sure how to fix it yourself, please create an issue in the Lisp-Stat documentation repo. You can also create an issue about a specific page by clicking the Create Issue button in the top right hand corner of the page.

Useful resources

8.4 - Contribution Ideas

Some ideas on how contribute to Lisp-Stat

Special Functions

The functions underlying the statistical distributions require skills in numerical programming. If you like being ‘close to the metal’, this is a good area for contributions. Suitable for medium to advanced level programmers. In particular, we need implementations of:

  • gamma
  • incomplete gamma (upper & lower)
  • inverse incomplete gamma

This work is partially complete and makes a good starting point for someone who wants to make a substantial contribution.

Documentation

Better and more documentation is always welcome, and a great way to learn. Suitable for beginners to Common Lisp or statistics.

Jupyter-Lab Integrations

Jupyter Lab has two nice integrations with Pandas, the Python version of Data-Frame, that would make great contributions: Qgrid, which allows editing a data frame in Jupyter Lab, and Jupyter DataTables. There are many more Pandas/Jupyter integrations, and any of them would be welcome additions to the Lisp-Stat ecosystem.

Plotting

LISP-STAT has a basic plotting system, but there is always room for improvement. An interactive, REPL-based plotting system should be possible with a medium amount of effort. Remote-js provides a working example of running JavaScript in a browser from a REPL, and could be combined with something like Electron and a DSL for Vega-Lite specifications. This may be a 4-6 week project for someone with JavaScript and HTML skills. There are other Plotly/Vega options, so if this interests you, open an issue and we can discuss. I have working examples of much of this, but they are all fragmented. Skills: good web/JavaScript, beginner Lisp.

Regression

We have some code for ‘quick & dirty’ regressions and need a more robust DSL (Domain Specific Language). As a prototype, the -proto regression objects from XLISP-STAT would be both useful and a good experiment to see what the final form should take. This is a relatively straightforward port, e.g. defproto -> defclass and defmeth -> defmethod. Skill level: medium in both Lisp and statistics, or willing to learn.

Vector Mathematics

We have code for vectorized versions of all Common Lisp functions, living in the elmt package. It currently works only on vectors. Shadowing the Common Lisp mathematical operators is possible, and more natural. This task is to make the elmt vectorized math functions work on lists as well as vectors, and to implement the shadowing of the Common Lisp operators. It requires at least medium-high level Lisp skills, since you will be working with both packages and shadowing. We also need to run the ANSI Common Lisp conformance tests on the results to ensure nothing gets broken in the process.

Continuous Integration

If you have experience with GitHub’s CI tools, a CI setup for Lisp-Stat would be a great help. It allows people making pull requests to know immediately whether their patches break anything. Beginner level Lisp.