This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Resources

For Common Lisp

1: Common Lisp References
2: Special Functions
3: Code Repository

Tutorials

Books

Websites, Forums, Lists

Algorithms

1 - Common Lisp References

2 - Special Functions

Implemented in Common Lisp

The library assumes working with 64 bit double-floats. It will probably work with single-float as well. Whilst we would prefer to implement the complex domain, the majority of the sources do not. Tabled below are the special function implementations and their source. This library has a focus on high accuracy double-float calculations using the latest algorithms.

function	source
erf	libm
erfc	libm
inverse-erf	Boost
inverse-erfc	Boost
log-gamma	libm
gamma	Cephes
incomplete-gamma	Boost

Error rates

The following table shows the peak and mean errors using Boost test data. Tests run on MS Windows 10 with SBCL 2.0.10. Boost results taken from the Boost error function, inverse error function and log-gamma pages.

erf

Data Set	Boost (MS C++)	Special-Functions
erf small values	Max = 0.841ε (Mean = 0.0687ε)	Max = 6.10e-5ε (Mean = 4.58e-7ε)
erf medium values	Max = 1ε (Mean = 0.119ε)	Max = 1ε (Mean = 0.003ε)
erf large values	Max = 0ε (Mean = 0ε)	N/A erf range 0 < x < 6

erfc

Data Set	Boost (MS C++)	Special-Functions
erfc small values	Max = 0ε (Mean = 0)	Max = 1ε (Mean = 0.00667ε)
erfc medium values	Max = 1.65ε (Mean = 0.373ε)	Max = 1.71ε (Mean = 0.182ε)
erfc large values	Max = 1.14ε (Mean = 0.248ε)	Max = 2.31e-15ε (Mean = 8.86e-18ε)

inverse-erf/c

Data Set	Boost (MS C++)	Special-Functions
inverse-erf	Max = 1.09ε (Mean = 0.502ε)	Max = 2ε (Mean = 0.434ε)
inverse-erfc	Max = 1ε (Mean = 0.491ε)	Max = 2ε (Mean = 0.425ε)

log-gamma

Data Set	Boost (MS C++)	Special-Functions
factorials	Max = 0.914ε (Mean = 0.175ε)	Max = 2.10ε (Mean = 0.569ε)
near 0	Max = 0.964ε (Mean = 0.462ε)	Max = 1.93ε (Mean = 0.662ε)
near 1	Max = 0.867ε (Mean = 0.468ε)	Max = 0.50ε (Mean = 0.0183ε)
near 2	Max = 0.591ε (Mean = 0.159ε)	Max = 0.0156ε (Mean = 3.83d-4ε)
near -10	Max = 4.22ε (Mean = 1.33ε)	Max = 4.83d+5ε (Mean = 3.06d+4ε)
near -55	Max = 0.821ε (Mean = 0.419ε)	Max = 8.16d+4ε (Mean = 4.53d+3ε)

The results for log gamma are good near 1 and 2, bettering those of Boost, however are worse (relatively speaking) at values of x > 8. I don’t have an explanation for this, since the libm values match Boost more closely. For example:

(spfn:log-gamma -9.99999237060546875d0) = -3.3208925610275326d0
(libm:lgamma    -9.99999237060546875d0) = -3.3208925610151265d0
Boost test answer                         -3.320892561015125097640948165422843317137

libm:lgamma provides an additional 4 digits of accuracy over spfn:log-gamma when compared to the Boost test answer, despite using identical computations. log-gamma is still within 12 digits of agreement though, and likely good enough for most uses.

gamma

Data Set	Boost (MS C++)	Special-Functions
factorials	Max = 1.85ε (Mean = 0.491ε)	Max = 3.79ε (Mean = 0.949ε)
near 0	Max = 1.96ε (Mean = 0.684ε)	Max = 2.26ε (Mean = 0.56ε)
near 1	Max = 2ε (Mean = 0.865ε)	Max = 2.26ε (Mean = 0.858ε)
near 2	Max = 2ε (Mean = 0.995ε)	Max = 2ε (Mean = 0.559ε)
near -10	Max = 1.73ε (Mean = 0.729ε)	Max = 0.125ε (Mean = 0.0043ε)
near -55	Max = 1.8ε (Mean = 0.817ε)	Max = 0ε (Mean = 0ε)

incomplete-gamma

See boost incomplete gamma documentation for notes and error rates.

lower

Data Set	Boost (MS C++)	Special-Functions
small values	Max = 1.54ε (Mean = 0.439ε)	Max = 3.00ε (Mean = 0.516ε)
medium values	Max = 35.1ε (Mean = 6.98ε)	Max = 10.00ε (Mean = 0.294ε)
large values	Max = 243ε (Mean = 20.2ε)	Max = 20ε (Mean = 0.613ε)
integer and half-integer	Max = 13ε (Mean = 2.97ε)	Max = 3ε (Mean = 0.189ε)

upper

Data Set	Boost (MS C++)	Special-Functions
small values	Max = 2.26ε (Mean = 0.74ε)	Max = 2.23ε (Mean = 0.511ε)
medium values	Max = 23.7ε (Mean = 4ε)	Max = 9.00ε (Mean = 0.266ε)
large values	Max = 469ε (Mean = 31.5ε)	Max = 20.5ε (Mean = 0.621ε)
integer and half-integer	Max = 8.72ε (Mean = 1.48ε)	Max = 4.00ε (Mean = 0.174ε)

NaN and Infinity

The lisp specification mentions neither NaN nor infinity, so any proper treatment of these is going to be either implementation specific or using a third party library.

We are using the float-features library. There is also some support for infinity in the extended-reals package of numerical-utilities, but it is not comprehensive. Openlibm and Cephes have definitions, but we don’t want to introduce a large dependency just to get these definitions.

Test data

The test data is based on Boost test data. You can run all the tests using the ASDF test op:

(asdf:test-system :special-functions)

By default the test summary values (the same as in Boost) are printed after each test, along with the key epsilon values.

3 - Code Repository

Collection of XLisp and Common Lisp statistical routines

Below is a partial list of the consolidated XLispStat packages from UCLA and CMU repositories. There is a great deal more XLispStat code available that was not submitted to these archives, and a search for an algorithm or technique that includes the term “xlispstat” will often turn up interesting results.

Artificial Intelligence

Genetic Programming

Cerebrum: A Framework for the Genetic Programming of Neural Networks. Peter Dudey. No license specified.
[Docs]
GAL: Functions useful for experimentation in Genetic Algorithms. It is hopefully compatible with Lucid Common Lisp (also known as Sun Common Lisp). The implementation is a “standard” GA, similar to Grefenstette’s work. Baker’s SUS selection algorithm is employed, 2 point crossover is maintained at 60%, and mutation is very low. Selection is based on proportional fitness. This GA uses generations. It is also important to note that this GA maximizes. William M. Spears. “Permission is hereby granted to copy all or any part of this program for free distribution, however this header is required on all copies.”
mGA: A Common Lisp Implementation of a Messy Genetic Algorithm. No license specified.
[Docs, errata]

Machine Learning

Machine Learning: Common Lisp files for various standard inductive learning algorithms that all use the same basic data format and same interface. It also includes automatic testing software for running learning curves that compare multiple systems and utilities for plotting and statistically evaluating the results. Included are:

AQ: Early DNF learner.
Backprop: The standard multi-layer neural-net learning method.
Bayes Indp: Simple naive or “idiot’s” Bayesian classifier.
Cobweb: A probabilistic clustering system.
Foil: A first-order Horn-clause learner (Prolog and Lisp versions).
ID3: Decision tree learner with a number of features.
KNN: K nearest neighbor (instance-based) algorithm.
Perceptron: Early one-layer neural-net algorithm.
PFOIL: Propositional version of FOIL for learning DNF.
PFOIL-CNF: Propositional version of FOIL for learning CNF.

Raymond J. Mooney. “This program may be freely copied, used, or modified provided that this copyright notice is included in each copy of this code and parts thereof.”

Neural Networks

QuickProp: Common Lisp implementation of “Quickprop”, a variation on back-propagation. For a description of the Quickprop algorithm, see Faster-Learning Variations on Back-Propagation: An Empirical Study by Scott E. Fahlman in Proceedings of the 1988 Connectionist Models Summer School, Morgan-Kaufmann, 1988. Scott E. Fahlman. Public domain.
[README]

Fun & Games

Towers of Hanoi: Tower of Hanoi plus the Queens program explained in Winston and Horn. No license specified.

Mathematics

Combinatorial

Various combinatorial functions for XLispStat. There are other Common Lisp libraries for this, for example cl-permutation. It’s worth searching for something in Quicklisp too. No license specified.

functions

Bessel, beta, erf, gamma and horner implementations. Gerald Roylance. License restricted to non-commercial use only.

integrate

gauss-hermite.lsp is by Jan de Leeuw.

runge.lsp and integr.lsp are from Gerald Roylance 1982 CLMATH package. integr.lsp has Simpson’s rule and the trapezoid rule. runge.lsp integrates runge-kutta differential equations by various methods.

Roylance code is non-commercial use only. Jan de Leeuw’s code has no license specified.

lsqpack

This directory contains the code from the Lawson and Hanson book, Solving Least Squares Problems, translated with f2cl, tweaked for Xlisp-Stat by Jan de Leeuw. No license specified.

nswc

This is an f2cl translation, very incomplete, of the NSWC mathematics library. The FORTRAN, plus a great manual, is available on github. The report is NSWCDD/TR-92/425, by Alfred H. Morris, Jr. dated January 1993. No license specified, but this code is commonly considered public domain.

Numerical Recipes

Code from Numerical Recipes in FORTRAN, first edition, translated with Waikato’s f2cl and tweaked for XLisp-Stat by Jan de Leeuw. No license specified.

optimization

Code for annealing, simplex and other optimization problems. Various licenses. These days, better implementations are available, for example the linear-programming library.

Statistics

Algorithms

AS 190 Probabilities and Upper Quantiles for the Studentized Range.
AS 226 Computing Noncentral Beta Probabilities
AS 241 The Percentage Points of the Normal Distribution
AS 243 Cumulative Distribution Function of the Non-Central T Distribution
TOMS 744 A stochastic algorithm for global optimization with constraints

AS algorithms: B. Narasimhan (naras@euler.bd.psu.edu) “You can freely use and distribute this code provided you don’t remove this notice. NO WARRANTIES, EXPLICIT or IMPLIED”

TOMS: F. Michael Rabinowitz. No license specified.

Categorical

glim: Glim extension for log-linear models. Jan de Leeuw. No license specified.
IPF: Fits Goodman’s RC model to the array X. Also included is a set of functions for APL like array operations. The four basic APL operators (see, for example, Garry Helzel, An Encyclopedia of APL, 2e edition, 1989, I-APL, 6611 Linville Drive, Weed., CA) are inner-product, outer-product, reduce, and scan. They can be used to produce new binary and unary functions from existing ones. Unknown author. No license specified.
latent-class: One file with the function latent-class. Unknown author. No license specified.
max: Functions to do quantization and cluster analysis in the empirical case. Jan de Leeuw. No license specified.
write-profiles: A function. The argument is a list of lists of strings. Each element of the list corresponds with a variable, the elements of the list corresponding with a variable are the labels of that variable, which are either strings or characters or numbers or symbols. The program returns a matrix of strings coding all the profiles. Unknown author. License not specified.

Distributions

The distributions repository contains single file implementations of:

density demo: Demonstrations of plots of density and probability functions. Requires XLispStat graphics. Jan de Leeuw. No license specified.
noncentral t-distribution: noncentral-t distribution by Russ Lenth, based on Applied Statistics Algorithm AS 243. No license specified.
probability-functions: A compilation of probability densities, cumulative distribution functions, and their inverses (quantile functions), by Jan de Leeuw. No license specified.
power: This appears to test the powers of various distribution functions. Unknown author. No license specified.
weibull-mle: Maximum likelihood estimation of Weibull parameters. M. Ennis. No license specified.

Classroom Statistics

The systems in the introstat directory are meant to be used in teaching situations. For the most part they use XLispStat’s graphical system to introduce students to statistical concepts. They are generally simple in nature from a the perspective of a statistical practitioner.

ElToY: ElToY is a collection of three programs written in XLISP-STAT. Dist-toy displays a univariate distribution dynamically linked to its parameters. CLT-toy provides an illustration of the central limit theorem for univariate distributions. ElToY provides a mechanism for displaying the prior and posterior distributions for a conjugate family dynamically linked so that changes to the prior affect the posterior and visa versa. Russell Almond almond@stat.washington.edu. GPL v2.

Multivariate

Dendro: Dendro is for producing dendrograms for agglomerative cluster in XLISP-STAT.

Plotting

Boxplot Matrix: Graphical Display of Analysis of Variance with the Boxplot Matrix. Extension of the standard one-way box plot to cross-classified data with multiple observations per cell. Richard M. Heiberger rmh@astro.ocis.temple.edu No license specified.
[Docs]
Dynamic Graphics and Regression Diagnostics: Contains methods for regression diagnostics using dynamic graphics, including all the methods discussed in Cook and Weisberg (1989) Technometrics, 277-312. Includes documentation written in LaTeX. sandy@umnstat.stat.umn.edu No license specified.
[Docs}
FEDF: Flipped Empirical Distribution Function. Parallel-FEDF, FEDF-ScatterPlot, FEDF-StarPlot written in XLISP-STAT. These plots are suggested for exploring multidimensional data suggested in “Journal of Computational and Graphical Statistics”, Vol. 4, No. 4, pp.335-343. 97/07/18. Lee, Kyungmi & Huh, Moon Yul myhuh@yurim.skku.ac.kr No license specified.
PDFPlot: PDF graphics output from XlispStat PDFPlot is a XlispStat class to generate PDF files from LispStat plot objects. Steven D. Majewski sdm7g@virginia.edu. No license specified.
RXridge: RXridge.LSP adds shrinkage regression calculation and graphical ridge “trace” display functionality to the XLisp-Stat, ver2.1 release 3+ implementation of LISP-STAT. Bob Obenchain. No license specified.

Regression

Bayes-Linear: BAYES-LIN is an extension of the XLISP-STAT object-oriented statistical computing environment, which adds to XLISP-STAT some object prototypes appropriate for carrying out local computation via message-passing between clique-tree nodes of Bayes linear belief networks. Darren J. Wilkinson. No license specified. [Docs]
Bayesian Poisson Regression: Bayesian Poisson Regression using the Gibbs Sampler Sensitivity Analysis through Dynamic Graphics. A set of programs that allow you to do Bayesian sensitivity analysis dynamically for a variety of models. B. Narasimhan (naras@stat.fsu.edu) License restricted to non-commercial use only.
[Docs]
Binary regression: Smooth and parametric binary regression code. Unknown author. License not specified.
Cost of Data Analysis: A regression analysis usually consists of several stages such as variable selection, transformation and residual diagnosis. Inference is often made from the selected model without regard to the model selection methods that proceeded it. This can result in overoptimistic and biased inferences. We first characterize data analytic actions as functions acting on regression models. We investigate the extent of the problem and test bootstrap, jackknife and sample splitting methods for ameliorating it. We also demonstrate an interactive XLISP-STAT system for assessing the cost of the data analysis while it is taking place. Julian J. Faraway. BSD license.
[Docs]
Gee: Lisp-Stat code for generalised estimating equation models. Thomas Lumley thomas@biostat.washington.edu. GPL v2.
[Docs]
GLIM: Functions and prototypes for fitting generalized linear models. Contributed by Luke Tierney luke@umnstat.stat.umn.edu. No license specified.
[Docs]
GLMER: A function to estimate coefficients and dispersions in a generalized linear model with random effects. Guanghan Liu gliu@math.ucla.edu. No license specified.
Hasse: Implements Taylor & Hilton’s rules for balanced ANOVA designs and draws the Hasse diagram of nesting relationships. Philip Iversen piversen@iastate.edu. License restricted to non-commercial use only.
monotone: Implementation of an algorithm to project on the intersection of r closed convex sets. Further details and references are in Mathar, Cyclic Projections in Data Analysis, Operations Research Proceedings 1988, Springer, 1989. Jan de Leeuw. No license specified.
OIRS: Order and Influence in Regression Strategy. The methods (tactics) of regression data analysis such as variable selection, transformation and outlier detection are characterised as functions acting on regression models and returning regression models. The ordering of the tactics, that is the strategy, is studied. A method for the generation of acceptable models supported by the choice of regression data analysis methods is described with a view to determining if two capable statisticians may reasonably hold differing views on the same data. Optimal strategies are considered. The idea of influential points is extended from estimation to the model building process itself both quantitatively and qualitatively. The methods described are not intended for the entirely automatic analysis of data, rather to assist the statistician in examining regression data at a strategic level. Julian J. Faraway julian@stat.lsa.umich.edu. BSD license.
oneway: Additions to Tierney’s one way ANOVA. B. Narasimhan naras@euler.bd.psu.edu. No license specified.
Regstrat: A XLispStat tool to investigate order in Regression Strategy particularly for finding and examining the models found by changing the ordering of the actions in a regression analysis. Julian Faraway julian@stat.lsa.umich.edu. License restricted to non-commercial use only.
Simsel: XLISP-STAT software to perform Bayesian Predictive Simultaneous Variable and Transformation Selection for regression. A criterion-based model selection algorithm. Jennifer A. Hoeting jah@stat.colostate.edu. License restricted to non-commercial use only.

Robust

There are three robust systems in the robust directory:

robust regression: This is the Xlisp-Stat version of ROSEPACK, the robust regression package developed by Holland, Welsch, and Klema around 1975. See Holland and Welsch, Commun. Statist. A6, 1977, 813-827. See also the Xlisp-Stat book, pages 173-177, for an alternative approach. Jan de Leeuw. No license specified.

There is also robust statistical code for location and scale.

Simulation

The simulation directory contains bootstrapping methods, variable imputation, jackknife resampling, monte-carlo simulations and a general purpose simulator. There is also the discrete finite state markov chains in the temporal directory.

Smoothers

kernel density estimators: KDEs based on Wand, CFFI based KDEs by B. Narasimhan, and graphical univariate density estimation.
spline: Regularized bi-variate splines with smoothing and tension according to Mitasova and Mitas. Cubic splines according to Green and Silverman. Jan de Leeuw. No license specified.
super-smoother: The super smoothing algorithm, originally implemented in FORTRAN by Jerome Friedman of Stanford University, is a method by which a smooth curve may be fitted to a two-dimensional array of points. Its implementation is presented here in the XLISP-STAT language. Jason Bond. No license specified.
[DOCS]
Variable Bandwidth: XLispStat code to facilitate interactive bandwidth choice for estimator (3.14), page 44 in Bagkavos (2003), “BIAS REDUCTION IN NONPARAMETRIC HAZARD RATE ESTIMATION”. No license specified.

Spatial

livemap: LiveMap is a tool for exploratory spatial data analysis. Dr. Chris Brunsdon. No license specified.
[DOCS]
variograms: Produces variograms using algorithms from C.V. Deutsch and A.G. Journel, “GSLIB: Geostatistical Software Library and User’s Guide, Oxford University Press, New York, 1992. Stanley S. Bentow. No license specified.
[DOCS]

Temporal

Exploratory survival analysis: A set of XLISP-STAT routines for the interactive, dynamic, exploratory analysis of survival data. E. Neely Atkinson (neely@odin.mda.uth.tmc.edu) “This software may be freely redistributed.”
[Docs]
Markov: Simulate some Markov chains in Xlisp-Stat. Complete documentation and examples are included. B. Narasimhan (naras@sci234e.mrs.umn.edu). GPL.
[Docs]
SAPA: Sapaclisp is a collection of Common Lisp functions that can be used to carry out many of the computations described in the SAPA book:
Donald B. Percival and Andrew T. Walden, “Spectral Analysis for Physical Applications: Multitaper and Conventional Univariate Techniques”, Cambridge University Press, Cambridge, England, 1993.

The SAPA book uses a number of time series as examples of various spectral analysis techniques.

Note

This archive contains SAPA converted to XLispStat. A Common Lisp version can be obtained from the CMU archive.

From the description:

Sapaclisp features functions for converting to/from decibels, the FORTRAN sign function, log of the gamma function, manipulating polynomials, root finding, simple numerical integration, matrix functions, Cholesky and modified Gram-Schmidt (i.e., Q-R) matrix decompositions, sample means and variances, sample medians, computation of quantiles from various distributions, linear least squares, discrete Fourier transform, fast Fourier transform, chirp transform, low-pass filters, high-pass filters, band-pass filters, sample auto-covariance sequence, auto-regressive spectral estimates, least squares, forward/backward least squares, Burg’s algorithm, the Yule-Walker method, periodogram, direct spectral estimates, lag window spectral estimates, WOSA spectral estimates, sample cepstrum, time series bandwidth, cumulative periodogram test statistic for white noise, and Fisher’s g statistic.

License: “Use and copying of this software and preparation of derivative works based upon this software are permitted. Any distribution of this software or derivative works must comply with all applicable United States export control laws.”

Times: XLispStat functions for time series analysis, data editing, data selection, and other statistical operations. W. Hatch (bts!bill@uunet.uu.net). Public Domain.

Tests

The tests directory contains code to do one-sample and two-sample Kolmogorov-Smirnov test (with no estimated parameters) and code to do Mann-Whitney and Wilcoxon rank signed rank tests.

Training & Documentation

ENAR Short Course: This directory contains slides and examples used in a shortcourse on Lisp-Stat presented at the 1992 ENAR meetings in Cincinnati, 22 March 1992.
ASA Course: Material from an ASA course given in 1992.
Tech Report: A 106 page mini manual on XLispStat.

Utilities

The majority of the files in the utilities directory are specific to XLISP-STAT and unlikely to be useful. In most cases better alternatives now exist for Common Lisp. A few that may be worth investigating have been noted below.

Filters

XLisp-S: A series of routines to allow users of Xlisp or LispStat to interactively transfer data to and access functions in New S. Steve McKinney kilroy@biostat.washington.edu. License restricted to non-commercial use only.

I/O

formatted-input: A set of XLISP functions that can be used to read ASCII files into lists of lists, using formatted input. The main function is read-file, which has as arguments a filename and a FORTRAN type format string (with f, i, x, t, and a formats) Jan Deleeuw deleeuw@laplace.sscnet.ucla.edu “THIS SOFTWARE CAN BE FREELY DISTRIBUTED, USED, AND MODIFIED.”

Memoization

automatic memoization: As the name suggests. Marty Hall hall@aplcenmp.apl.jhu.edu. “Permission is granted for any use or modification of this code provided this notice is retained."
[OVERVIEW]