2 - Special Functions

Implemented in Common Lisp

The library assumes working with 64 bit double-floats. It will probably work with single-float as well. Whilst we would prefer to implement the complex domain, the majority of the sources do not. Tabled below are the special function implementations and their source. This library has a focus on high accuracy double-float calculations using the latest algorithms.

function source
erf libm
erfc libm
inverse-erf Boost
inverse-erfc Boost
log-gamma libm
gamma Cephes
incomplete-gamma Boost

Error rates

The following table shows the peak and mean errors using Boost test data. Tests run on MS Windows 10 with SBCL 2.0.10. Boost results taken from the Boost error function, inverse error function and log-gamma pages.

erf

Data Set Boost (MS C++) Special-Functions
erf small values Max = 0.841ε (Mean = 0.0687ε) Max = 6.10e-5ε (Mean = 4.58e-7ε)
erf medium values Max = 1ε (Mean = 0.119ε) Max = 1ε (Mean = 0.003ε)
erf large values Max = 0ε (Mean = 0ε) N/A erf range 0 < x < 6

erfc

Data Set Boost (MS C++) Special-Functions
erfc small values Max = 0ε (Mean = 0) Max = 1ε (Mean = 0.00667ε)
erfc medium values Max = 1.65ε (Mean = 0.373ε) Max = 1.71ε (Mean = 0.182ε)
erfc large values Max = 1.14ε (Mean = 0.248ε) Max = 2.31e-15ε (Mean = 8.86e-18ε)

inverse-erf/c

Data Set Boost (MS C++) Special-Functions
inverse-erf Max = 1.09ε (Mean = 0.502ε) Max = 2ε (Mean = 0.434ε)
inverse-erfc Max = 1ε (Mean = 0.491ε) Max = 2ε (Mean = 0.425ε)

log-gamma

Data Set Boost (MS C++) Special-Functions
factorials Max = 0.914ε (Mean = 0.175ε) Max = 2.10ε (Mean = 0.569ε)
near 0 Max = 0.964ε (Mean = 0.462ε) Max = 1.93ε (Mean = 0.662ε)
near 1 Max = 0.867ε (Mean = 0.468ε) Max = 0.50ε (Mean = 0.0183ε)
near 2 Max = 0.591ε (Mean = 0.159ε) Max = 0.0156ε (Mean = 3.83d-4ε)
near -10 Max = 4.22ε (Mean = 1.33ε) Max = 4.83d+5ε (Mean = 3.06d+4ε)
near -55 Max = 0.821ε (Mean = 0.419ε) Max = 8.16d+4ε (Mean = 4.53d+3ε)

The results for log gamma are good near 1 and 2, bettering those of Boost, however are worse (relatively speaking) at values of x > 8. I don’t have an explanation for this, since the libm values match Boost more closely. For example:

(spfn:log-gamma -9.99999237060546875d0) = -3.3208925610275326d0
(libm:lgamma    -9.99999237060546875d0) = -3.3208925610151265d0
Boost test answer                         -3.320892561015125097640948165422843317137

libm:lgamma provides an additional 4 digits of accuracy over spfn:log-gamma when compared to the Boost test answer, despite using identical computations. log-gamma is still within 12 digits of agreement though, and likely good enough for most uses.

gamma

Data Set Boost (MS C++) Special-Functions
factorials Max = 1.85ε (Mean = 0.491ε) Max = 3.79ε (Mean = 0.949ε)
near 0 Max = 1.96ε (Mean = 0.684ε) Max = 2.26ε (Mean = 0.56ε)
near 1 Max = 2ε (Mean = 0.865ε) Max = 2.26ε (Mean = 0.858ε)
near 2 Max = 2ε (Mean = 0.995ε) Max = 2ε (Mean = 0.559ε)
near -10 Max = 1.73ε (Mean = 0.729ε) Max = 0.125ε (Mean = 0.0043ε)
near -55 Max = 1.8ε (Mean = 0.817ε) Max = 0ε (Mean = 0ε)

incomplete-gamma

See boost incomplete gamma documentation for notes and error rates.

lower

Data Set Boost (MS C++) Special-Functions
small values Max = 1.54ε (Mean = 0.439ε) Max = 3.00ε (Mean = 0.516ε)
medium values Max = 35.1ε (Mean = 6.98ε) Max = 10.00ε (Mean = 0.294ε)
large values Max = 243ε (Mean = 20.2ε) Max = 20ε (Mean = 0.613ε)
integer and half-integer Max = 13ε (Mean = 2.97ε) Max = 3ε (Mean = 0.189ε)

upper

Data Set Boost (MS C++) Special-Functions
small values Max = 2.26ε (Mean = 0.74ε) Max = 2.23ε (Mean = 0.511ε)
medium values Max = 23.7ε (Mean = 4ε) Max = 9.00ε (Mean = 0.266ε)
large values Max = 469ε (Mean = 31.5ε) Max = 20.5ε (Mean = 0.621ε)
integer and half-integer Max = 8.72ε (Mean = 1.48ε) Max = 4.00ε (Mean = 0.174ε)

NaN and Infinity

The lisp specification mentions neither NaN nor infinity, so any proper treatment of these is going to be either implementation specific or using a third party library.

We are using the float-features library. There is also some support for infinity in the extended-reals package of numerical-utilities, but it is not comprehensive. Openlibm and Cephes have definitions, but we don’t want to introduce a large dependency just to get these definitions.

Test data

The test data is based on Boost test data. You can run all the tests using the ASDF test op:

(asdf:test-system :special-functions)

By default the test summary values (the same as in Boost) are printed after each test, along with the key epsilon values.

3 - Code Repository

Collection of XLisp and Common Lisp statistical routines

Below is a partial list of the consolidated XLispStat packages from UCLA and CMU repositories. There is a great deal more XLispStat code available that was not submitted to these archives, and a search for an algorithm or technique that includes the term “xlispstat” will often turn up interesting results.

Artificial Intelligence

Genetic Programming

Cerebrum
A Framework for the Genetic Programming of Neural Networks. Peter Dudey. No license specified.
[Docs]
GAL
Functions useful for experimentation in Genetic Algorithms. It is hopefully compatible with Lucid Common Lisp (also known as Sun Common Lisp). The implementation is a “standard” GA, similar to Grefenstette’s work. Baker’s SUS selection algorithm is employed, 2 point crossover is maintained at 60%, and mutation is very low. Selection is based on proportional fitness. This GA uses generations. It is also important to note that this GA maximizes. William M. Spears. “Permission is hereby granted to copy all or any part of this program for free distribution, however this header is required on all copies.”
mGA
A Common Lisp Implementation of a Messy Genetic Algorithm. No license specified.
[Docs, errata]

Machine Learning

Machine Learning
Common Lisp files for various standard inductive learning algorithms that all use the same basic data format and same interface. It also includes automatic testing software for running learning curves that compare multiple systems and utilities for plotting and statistically evaluating the results. Included are:
  • AQ: Early DNF learner.
  • Backprop: The standard multi-layer neural-net learning method.
  • Bayes Indp: Simple naive or “idiot’s” Bayesian classifier.
  • Cobweb: A probabilistic clustering system.
  • Foil: A first-order Horn-clause learner (Prolog and Lisp versions).
  • ID3: Decision tree learner with a number of features.
  • KNN: K nearest neighbor (instance-based) algorithm.
  • Perceptron: Early one-layer neural-net algorithm.
  • PFOIL: Propositional version of FOIL for learning DNF.
  • PFOIL-CNF: Propositional version of FOIL for learning CNF.

Raymond J. Mooney. “This program may be freely copied, used, or modified provided that this copyright notice is included in each copy of this code and parts thereof.”

Neural Networks

QuickProp
Common Lisp implementation of “Quickprop”, a variation on back-propagation. For a description of the Quickprop algorithm, see Faster-Learning Variations on Back-Propagation: An Empirical Study by Scott E. Fahlman in Proceedings of the 1988 Connectionist Models Summer School, Morgan-Kaufmann, 1988. Scott E. Fahlman. Public domain.
[README]

Fun & Games

Towers of Hanoi
Tower of Hanoi plus the Queens program explained in Winston and Horn. No license specified.

Mathematics

Combinatorial
Various combinatorial functions for XLispStat. There are other Common Lisp libraries for this, for example cl-permutation. It’s worth searching for something in Quicklisp too. No license specified.
functions
Bessel, beta, erf, gamma and horner implementations. Gerald Roylance. License restricted to non-commercial use only.
integrate
gauss-hermite.lsp is by Jan de Leeuw.

runge.lsp and integr.lsp are from Gerald Roylance 1982 CLMATH package. integr.lsp has Simpson’s rule and the trapezoid rule. runge.lsp integrates runge-kutta differential equations by various methods.

Roylance code is non-commercial use only. Jan de Leeuw’s code has no license specified.

lsqpack
This directory contains the code from the Lawson and Hanson book, Solving Least Squares Problems, translated with f2cl, tweaked for Xlisp-Stat by Jan de Leeuw. No license specified.
nswc
This is an f2cl translation, very incomplete, of the NSWC mathematics library. The FORTRAN, plus a great manual, is available on github. The report is NSWCDD/TR-92/425, by Alfred H. Morris, Jr. dated January 1993. No license specified, but this code is commonly considered public domain.
Numerical Recipes
Code from Numerical Recipes in FORTRAN, first edition, translated with Waikato’s f2cl and tweaked for XLisp-Stat by Jan de Leeuw. No license specified.
optimization
Code for annealing, simplex and other optimization problems. Various licenses. These days, better implementations are available, for example the linear-programming library.

Statistics

Algorithms

  • AS 190 Probabilities and Upper Quantiles for the Studentized Range.
  • AS 226 Computing Noncentral Beta Probabilities
  • AS 241 The Percentage Points of the Normal Distribution
  • AS 243 Cumulative Distribution Function of the Non-Central T Distribution
  • TOMS 744 A stochastic algorithm for global optimization with constraints

AS algorithms: B. Narasimhan (naras@euler.bd.psu.edu) “You can freely use and distribute this code provided you don’t remove this notice. NO WARRANTIES, EXPLICIT or IMPLIED”

TOMS: F. Michael Rabinowitz. No license specified.

Categorical

glim
Glim extension for log-linear models. Jan de Leeuw. No license specified.
IPF
Fits Goodman’s RC model to the array X. Also included is a set of functions for APL like array operations. The four basic APL operators (see, for example, Garry Helzel, An Encyclopedia of APL, 2e edition, 1989, I-APL, 6611 Linville Drive, Weed., CA) are inner-product, outer-product, reduce, and scan. They can be used to produce new binary and unary functions from existing ones. Unknown author. No license specified.
latent-class
One file with the function latent-class. Unknown author. No license specified.
max
Functions to do quantization and cluster analysis in the empirical case. Jan de Leeuw. No license specified.
write-profiles
A function. The argument is a list of lists of strings. Each element of the list corresponds with a variable, the elements of the list corresponding with a variable are the labels of that variable, which are either strings or characters or numbers or symbols. The program returns a matrix of strings coding all the profiles. Unknown author. License not specified.

Distributions

The distributions repository contains single file implementations of:

density demo
Demonstrations of plots of density and probability functions. Requires XLispStat graphics. Jan de Leeuw. No license specified.
noncentral t-distribution
noncentral-t distribution by Russ Lenth, based on Applied Statistics Algorithm AS 243. No license specified.
probability-functions
A compilation of probability densities, cumulative distribution functions, and their inverses (quantile functions), by Jan de Leeuw. No license specified.
power
This appears to test the powers of various distribution functions. Unknown author. No license specified.
weibull-mle
Maximum likelihood estimation of Weibull parameters. M. Ennis. No license specified.

Classroom Statistics

The systems in the introstat directory are meant to be used in teaching situations. For the most part they use XLispStat’s graphical system to introduce students to statistical concepts. They are generally simple in nature from a the perspective of a statistical practitioner.

ElToY
ElToY is a collection of three programs written in XLISP-STAT. Dist-toy displays a univariate distribution dynamically linked to its parameters. CLT-toy provides an illustration of the central limit theorem for univariate distributions. ElToY provides a mechanism for displaying the prior and posterior distributions for a conjugate family dynamically linked so that changes to the prior affect the posterior and visa versa. Russell Almond almond@stat.washington.edu. GPL v2.

Multivariate

Dendro
Dendro is for producing dendrograms for agglomerative cluster in XLISP-STAT.

Plotting

Boxplot Matrix
Graphical Display of Analysis of Variance with the Boxplot Matrix. Extension of the standard one-way box plot to cross-classified data with multiple observations per cell. Richard M. Heiberger rmh@astro.ocis.temple.edu No license specified.
[Docs]
Dynamic Graphics and Regression Diagnostics
Contains methods for regression diagnostics using dynamic graphics, including all the methods discussed in Cook and Weisberg (1989) Technometrics, 277-312. Includes documentation written in LaTeX. sandy@umnstat.stat.umn.edu No license specified.
[Docs}
FEDF
Flipped Empirical Distribution Function. Parallel-FEDF, FEDF-ScatterPlot, FEDF-StarPlot written in XLISP-STAT. These plots are suggested for exploring multidimensional data suggested in “Journal of Computational and Graphical Statistics”, Vol. 4, No. 4, pp.335-343. 97/07/18. Lee, Kyungmi & Huh, Moon Yul myhuh@yurim.skku.ac.kr No license specified.
PDFPlot
PDF graphics output from XlispStat PDFPlot is a XlispStat class to generate PDF files from LispStat plot objects. Steven D. Majewski sdm7g@virginia.edu. No license specified.
RXridge
RXridge.LSP adds shrinkage regression calculation and graphical ridge “trace” display functionality to the XLisp-Stat, ver2.1 release 3+ implementation of LISP-STAT. Bob Obenchain. No license specified.

Regression

Bayes-Linear
BAYES-LIN is an extension of the XLISP-STAT object-oriented statistical computing environment, which adds to XLISP-STAT some object prototypes appropriate for carrying out local computation via message-passing between clique-tree nodes of Bayes linear belief networks. Darren J. Wilkinson. No license specified. [Docs]
Bayesian Poisson Regression
Bayesian Poisson Regression using the Gibbs Sampler Sensitivity Analysis through Dynamic Graphics. A set of programs that allow you to do Bayesian sensitivity analysis dynamically for a variety of models. B. Narasimhan (naras@stat.fsu.edu) License restricted to non-commercial use only.
[Docs]
Binary regression
Smooth and parametric binary regression code. Unknown author. License not specified.
Cost of Data Analysis
A regression analysis usually consists of several stages such as variable selection, transformation and residual diagnosis. Inference is often made from the selected model without regard to the model selection methods that proceeded it. This can result in overoptimistic and biased inferences. We first characterize data analytic actions as functions acting on regression models. We investigate the extent of the problem and test bootstrap, jackknife and sample splitting methods for ameliorating it. We also demonstrate an interactive XLISP-STAT system for assessing the cost of the data analysis while it is taking place. Julian J. Faraway. BSD license.
[Docs]
Gee
Lisp-Stat code for generalised estimating equation models. Thomas Lumley thomas@biostat.washington.edu. GPL v2.
[Docs]
GLIM
Functions and prototypes for fitting generalized linear models. Contributed by Luke Tierney luke@umnstat.stat.umn.edu. No license specified.
[Docs]
GLMER
A function to estimate coefficients and dispersions in a generalized linear model with random effects. Guanghan Liu gliu@math.ucla.edu. No license specified.
Hasse
Implements Taylor & Hilton’s rules for balanced ANOVA designs and draws the Hasse diagram of nesting relationships. Philip Iversen piversen@iastate.edu. License restricted to non-commercial use only.
monotone
Implementation of an algorithm to project on the intersection of r closed convex sets. Further details and references are in Mathar, Cyclic Projections in Data Analysis, Operations Research Proceedings 1988, Springer, 1989. Jan de Leeuw. No license specified.
OIRS
Order and Influence in Regression Strategy. The methods (tactics) of regression data analysis such as variable selection, transformation and outlier detection are characterised as functions acting on regression models and returning regression models. The ordering of the tactics, that is the strategy, is studied. A method for the generation of acceptable models supported by the choice of regression data analysis methods is described with a view to determining if two capable statisticians may reasonably hold differing views on the same data. Optimal strategies are considered. The idea of influential points is extended from estimation to the model building process itself both quantitatively and qualitatively. The methods described are not intended for the entirely automatic analysis of data, rather to assist the statistician in examining regression data at a strategic level. Julian J. Faraway julian@stat.lsa.umich.edu. BSD license.
oneway
Additions to Tierney’s one way ANOVA. B. Narasimhan naras@euler.bd.psu.edu. No license specified.
Regstrat
A XLispStat tool to investigate order in Regression Strategy particularly for finding and examining the models found by changing the ordering of the actions in a regression analysis. Julian Faraway julian@stat.lsa.umich.edu. License restricted to non-commercial use only.
Simsel
XLISP-STAT software to perform Bayesian Predictive Simultaneous Variable and Transformation Selection for regression. A criterion-based model selection algorithm. Jennifer A. Hoeting jah@stat.colostate.edu. License restricted to non-commercial use only.

Robust

There are three robust systems in the robust directory:

robust regression
This is the Xlisp-Stat version of ROSEPACK, the robust regression package developed by Holland, Welsch, and Klema around 1975. See Holland and Welsch, Commun. Statist. A6, 1977, 813-827. See also the Xlisp-Stat book, pages 173-177, for an alternative approach. Jan de Leeuw. No license specified.

There is also robust statistical code for location and scale.

Simulation

The simulation directory contains bootstrapping methods, variable imputation, jackknife resampling, monte-carlo simulations and a general purpose simulator. There is also the discrete finite state markov chains in the temporal directory.

Smoothers

kernel density estimators
KDEs based on Wand, CFFI based KDEs by B. Narasimhan, and graphical univariate density estimation.
spline
Regularized bi-variate splines with smoothing and tension according to Mitasova and Mitas. Cubic splines according to Green and Silverman. Jan de Leeuw. No license specified.
super-smoother
The super smoothing algorithm, originally implemented in FORTRAN by Jerome Friedman of Stanford University, is a method by which a smooth curve may be fitted to a two-dimensional array of points. Its implementation is presented here in the XLISP-STAT language. Jason Bond. No license specified.
[DOCS]
Variable Bandwidth
XLispStat code to facilitate interactive bandwidth choice for estimator (3.14), page 44 in Bagkavos (2003), “BIAS REDUCTION IN NONPARAMETRIC HAZARD RATE ESTIMATION”. No license specified.

Spatial

livemap
LiveMap is a tool for exploratory spatial data analysis. Dr. Chris Brunsdon. No license specified.
[DOCS]
variograms
Produces variograms using algorithms from C.V. Deutsch and A.G. Journel, “GSLIB: Geostatistical Software Library and User’s Guide, Oxford University Press, New York, 1992. Stanley S. Bentow. No license specified.
[DOCS]

Temporal

Exploratory survival analysis
A set of XLISP-STAT routines for the interactive, dynamic, exploratory analysis of survival data. E. Neely Atkinson (neely@odin.mda.uth.tmc.edu) “This software may be freely redistributed.”
[Docs]
Markov
Simulate some Markov chains in Xlisp-Stat. Complete documentation and examples are included. B. Narasimhan (naras@sci234e.mrs.umn.edu). GPL.
[Docs]
SAPA
Sapaclisp is a collection of Common Lisp functions that can be used to carry out many of the computations described in the SAPA book:

Donald B. Percival and Andrew T. Walden, “Spectral Analysis for Physical Applications: Multitaper and Conventional Univariate Techniques”, Cambridge University Press, Cambridge, England, 1993.

The SAPA book uses a number of time series as examples of various spectral analysis techniques.

From the description:

Sapaclisp features functions for converting to/from decibels, the FORTRAN sign function, log of the gamma function, manipulating polynomials, root finding, simple numerical integration, matrix functions, Cholesky and modified Gram-Schmidt (i.e., Q-R) matrix decompositions, sample means and variances, sample medians, computation of quantiles from various distributions, linear least squares, discrete Fourier transform, fast Fourier transform, chirp transform, low-pass filters, high-pass filters, band-pass filters, sample auto-covariance sequence, auto-regressive spectral estimates, least squares, forward/backward least squares, Burg’s algorithm, the Yule-Walker method, periodogram, direct spectral estimates, lag window spectral estimates, WOSA spectral estimates, sample cepstrum, time series bandwidth, cumulative periodogram test statistic for white noise, and Fisher’s g statistic.

License: “Use and copying of this software and preparation of derivative works based upon this software are permitted. Any distribution of this software or derivative works must comply with all applicable United States export control laws.”

Times
XLispStat functions for time series analysis, data editing, data selection, and other statistical operations. W. Hatch (bts!bill@uunet.uu.net). Public Domain.

Tests

The tests directory contains code to do one-sample and two-sample Kolmogorov-Smirnov test (with no estimated parameters) and code to do Mann-Whitney and Wilcoxon rank signed rank tests.

Training & Documentation

ENAR Short Course
This directory contains slides and examples used in a shortcourse on Lisp-Stat presented at the 1992 ENAR meetings in Cincinnati, 22 March 1992.
ASA Course
Material from an ASA course given in 1992.
Tech Report
A 106 page mini manual on XLispStat.

Utilities

The majority of the files in the utilities directory are specific to XLISP-STAT and unlikely to be useful. In most cases better alternatives now exist for Common Lisp. A few that may be worth investigating have been noted below.

Filters

XLisp-S
A series of routines to allow users of Xlisp or LispStat to interactively transfer data to and access functions in New S. Steve McKinney kilroy@biostat.washington.edu. License restricted to non-commercial use only.

I/O

formatted-input
A set of XLISP functions that can be used to read ASCII files into lists of lists, using formatted input. The main function is read-file, which has as arguments a filename and a FORTRAN type format string (with f, i, x, t, and a formats) Jan Deleeuw deleeuw@laplace.sscnet.ucla.edu “THIS SOFTWARE CAN BE FREELY DISTRIBUTED, USED, AND MODIFIED.”

Memoization

automatic memoization
As the name suggests. Marty Hall hall@aplcenmp.apl.jhu.edu. “Permission is granted for any use or modification of this code provided this notice is retained."
[OVERVIEW]