2 - Special Functions
Implemented in Common Lisp
The library assumes working with 64 bit double-floats. It will
probably work with single-float as well. Whilst we would prefer to
implement the complex domain, the majority of the sources do
not. Tabled below are the special function implementations and their
source. This library has a focus on high accuracy double-float
calculations using the latest algorithms.
function |
source |
erf |
libm |
erfc |
libm |
inverse-erf |
Boost |
inverse-erfc |
Boost |
log-gamma |
libm |
gamma |
Cephes |
incomplete-gamma |
Boost |
Error rates
The following table shows the peak and mean errors using Boost test
data. Tests run on MS Windows 10 with SBCL 2.0.10. Boost results taken
from the Boost error function,
inverse
error
function
and
log-gamma
pages.
erf
Data Set |
Boost (MS C++) |
Special-Functions |
erf small values |
Max = 0.841ε (Mean = 0.0687ε) |
Max = 6.10e-5ε (Mean = 4.58e-7ε) |
erf medium values |
Max = 1ε (Mean = 0.119ε) |
Max = 1ε (Mean = 0.003ε) |
erf large values |
Max = 0ε (Mean = 0ε) |
N/A erf range 0 < x < 6 |
erfc
Data Set |
Boost (MS C++) |
Special-Functions |
erfc small values |
Max = 0ε (Mean = 0) |
Max = 1ε (Mean = 0.00667ε) |
erfc medium values |
Max = 1.65ε (Mean = 0.373ε) |
Max = 1.71ε (Mean = 0.182ε) |
erfc large values |
Max = 1.14ε (Mean = 0.248ε) |
Max = 2.31e-15ε (Mean = 8.86e-18ε) |
inverse-erf/c
Data Set |
Boost (MS C++) |
Special-Functions |
inverse-erf |
Max = 1.09ε (Mean = 0.502ε) |
Max = 2ε (Mean = 0.434ε) |
inverse-erfc |
Max = 1ε (Mean = 0.491ε) |
Max = 2ε (Mean = 0.425ε) |
log-gamma
Data Set |
Boost (MS C++) |
Special-Functions |
factorials |
Max = 0.914ε (Mean = 0.175ε) |
Max = 2.10ε (Mean = 0.569ε) |
near 0 |
Max = 0.964ε (Mean = 0.462ε) |
Max = 1.93ε (Mean = 0.662ε) |
near 1 |
Max = 0.867ε (Mean = 0.468ε) |
Max = 0.50ε (Mean = 0.0183ε) |
near 2 |
Max = 0.591ε (Mean = 0.159ε) |
Max = 0.0156ε (Mean = 3.83d-4ε) |
near -10 |
Max = 4.22ε (Mean = 1.33ε) |
Max = 4.83d+5ε (Mean = 3.06d+4ε) |
near -55 |
Max = 0.821ε (Mean = 0.419ε) |
Max = 8.16d+4ε (Mean = 4.53d+3ε) |
The results for log gamma are good near 1 and 2, bettering those of
Boost, however are worse (relatively speaking) at values of x > 8
. I
don’t have an explanation for this, since the libm values match Boost
more closely. For example:
(spfn:log-gamma -9.99999237060546875d0) = -3.3208925610275326d0
(libm:lgamma -9.99999237060546875d0) = -3.3208925610151265d0
Boost test answer -3.320892561015125097640948165422843317137
libm:lgamma provides an additional 4 digits of accuracy over
spfn:log-gamma when compared to the Boost test answer, despite using
identical computations. log-gamma is still within 12 digits of agreement
though, and likely good enough for most uses.
gamma
Data Set |
Boost (MS C++) |
Special-Functions |
factorials |
Max = 1.85ε (Mean = 0.491ε) |
Max = 3.79ε (Mean = 0.949ε) |
near 0 |
Max = 1.96ε (Mean = 0.684ε) |
Max = 2.26ε (Mean = 0.56ε) |
near 1 |
Max = 2ε (Mean = 0.865ε) |
Max = 2.26ε (Mean = 0.858ε) |
near 2 |
Max = 2ε (Mean = 0.995ε) |
Max = 2ε (Mean = 0.559ε) |
near -10 |
Max = 1.73ε (Mean = 0.729ε) |
Max = 0.125ε (Mean = 0.0043ε) |
near -55 |
Max = 1.8ε (Mean = 0.817ε) |
Max = 0ε (Mean = 0ε) |
incomplete-gamma
See boost incomplete gamma
documentation
for notes and error rates.
lower
Data Set |
Boost (MS C++) |
Special-Functions |
small values |
Max = 1.54ε (Mean = 0.439ε) |
Max = 3.00ε (Mean = 0.516ε) |
medium values |
Max = 35.1ε (Mean = 6.98ε) |
Max = 10.00ε (Mean = 0.294ε) |
large values |
Max = 243ε (Mean = 20.2ε) |
Max = 20ε (Mean = 0.613ε) |
integer and half-integer |
Max = 13ε (Mean = 2.97ε) |
Max = 3ε (Mean = 0.189ε) |
upper
Data Set |
Boost (MS C++) |
Special-Functions |
small values |
Max = 2.26ε (Mean = 0.74ε) |
Max = 2.23ε (Mean = 0.511ε) |
medium values |
Max = 23.7ε (Mean = 4ε) |
Max = 9.00ε (Mean = 0.266ε) |
large values |
Max = 469ε (Mean = 31.5ε) |
Max = 20.5ε (Mean = 0.621ε) |
integer and half-integer |
Max = 8.72ε (Mean = 1.48ε) |
Max = 4.00ε (Mean = 0.174ε) |
NaN and Infinity
The lisp specification mentions neither NaN nor infinity, so any proper
treatment of these is going to be either implementation specific or
using a third party library.
We are using the
float-features
library. There is also some support for infinity in the extended-reals
package of
numerical-utilities,
but it is not comprehensive. Openlibm and Cephes have definitions, but
we don’t want to introduce a large dependency just to get these
definitions.
Test data
The test data is based on Boost test data. You can run all the tests
using the ASDF test op:
(asdf:test-system :special-functions)
By default the test summary values (the same as in Boost) are printed
after each test, along with the key epsilon values.
3 - Code Repository
Collection of XLisp and Common Lisp statistical routines
Below is a partial list of the consolidated XLispStat packages from
UCLA and CMU repositories. There is a great deal more XLispStat code
available that was not submitted to these archives, and a search for
an algorithm or technique that includes the term “xlispstat” will
often turn up interesting results.
Artificial Intelligence
Genetic Programming
- Cerebrum
- A Framework for the Genetic Programming of Neural Networks. Peter
Dudey. No license specified.
[Docs]
- GAL
- Functions useful for experimentation in Genetic Algorithms. It is
hopefully compatible with Lucid Common Lisp (also known as Sun
Common Lisp). The implementation is a “standard” GA, similar to
Grefenstette’s work. Baker’s SUS selection algorithm is employed, 2
point crossover is maintained at 60%, and mutation is very low.
Selection is based on proportional fitness. This GA uses
generations. It is also important to note that this GA maximizes.
William M. Spears. “Permission is hereby granted to copy all or any
part of this program for free distribution, however this header is
required on all copies.”
- mGA
- A Common Lisp Implementation of a Messy Genetic Algorithm. No
license specified.
[Docs,
errata]
Machine Learning
- Machine Learning
- Common Lisp files for various standard inductive learning algorithms
that all use the same basic data format and same interface. It also
includes automatic testing software for running learning curves that
compare multiple systems and utilities for plotting and
statistically evaluating the results. Included are:
- AQ: Early DNF learner.
- Backprop: The standard multi-layer neural-net learning method.
- Bayes Indp: Simple naive or “idiot’s” Bayesian classifier.
- Cobweb: A probabilistic clustering system.
- Foil: A first-order Horn-clause learner (Prolog and Lisp versions).
- ID3: Decision tree learner with a number of features.
- KNN: K nearest neighbor (instance-based) algorithm.
- Perceptron: Early one-layer neural-net algorithm.
- PFOIL: Propositional version of FOIL for learning DNF.
- PFOIL-CNF: Propositional version of FOIL for learning CNF.
Raymond J. Mooney. “This program may be freely copied, used, or
modified provided that this copyright notice is included in each copy
of this code and parts thereof.”
Neural Networks
- QuickProp
- Common Lisp implementation of “Quickprop”, a variation on
back-propagation. For a description of the Quickprop algorithm, see
Faster-Learning Variations on Back-Propagation: An Empirical
Study
by Scott E. Fahlman in Proceedings of the 1988 Connectionist Models
Summer School, Morgan-Kaufmann, 1988. Scott E. Fahlman. Public
domain.
[README]
Fun & Games
- Towers of Hanoi
- Tower of Hanoi plus the Queens program explained in Winston and
Horn. No license specified.
Mathematics
- Combinatorial
- Various combinatorial functions for XLispStat. There are other
Common Lisp libraries for this, for example
cl-permutation. It’s
worth searching for something in Quicklisp too. No license
specified.
- functions
- Bessel, beta, erf, gamma and horner implementations. Gerald
Roylance. License restricted to non-commercial use only.
- integrate
- gauss-hermite.lsp is by Jan de Leeuw.
runge.lsp and integr.lsp are from Gerald Roylance 1982 CLMATH
package. integr.lsp has Simpson’s rule and the trapezoid
rule. runge.lsp integrates runge-kutta differential equations by
various methods.
Roylance code is non-commercial use only. Jan de Leeuw’s code has
no license specified.
- lsqpack
- This directory contains the code from the Lawson and Hanson book,
Solving Least Squares
Problems,
translated with f2cl, tweaked for Xlisp-Stat by Jan de Leeuw. No
license specified.
- nswc
- This is an f2cl translation, very incomplete, of the NSWC
mathematics library. The FORTRAN, plus a great manual, is available
on github. The report is
NSWCDD/TR-92/425, by Alfred H. Morris, Jr. dated January 1993. No
license specified, but this code is commonly considered public
domain.
- Numerical Recipes
- Code from Numerical Recipes in FORTRAN, first edition, translated
with Waikato’s f2cl and tweaked for XLisp-Stat by Jan de Leeuw. No
license specified.
- optimization
- Code for annealing, simplex and other optimization problems. Various
licenses. These days, better implementations are available, for
example the
linear-programming
library.
Statistics
Algorithms
- AS 190 Probabilities and Upper Quantiles for the Studentized Range.
- AS 226 Computing Noncentral Beta Probabilities
- AS 241 The Percentage Points of the Normal Distribution
- AS 243 Cumulative Distribution Function of the Non-Central T Distribution
- TOMS 744 A stochastic algorithm for global optimization with constraints
AS algorithms: B. Narasimhan (naras@euler.bd.psu.edu) “You can freely use and
distribute this code provided you don’t remove this notice. NO
WARRANTIES, EXPLICIT or IMPLIED”
TOMS: F. Michael Rabinowitz. No license specified.
Categorical
- glim
- Glim extension for log-linear models. Jan de Leeuw. No license specified.
- IPF
- Fits Goodman’s RC model to the array X. Also included is a set of
functions for APL like array operations. The four basic APL
operators (see, for example, Garry Helzel, An Encyclopedia of APL,
2e edition, 1989, I-APL, 6611 Linville Drive, Weed., CA) are
inner-product, outer-product, reduce, and scan. They can be used to
produce new binary and unary functions from existing ones.
Unknown author. No license specified.
- latent-class
- One file with the function latent-class. Unknown author. No license specified.
- max
- Functions to do quantization and cluster analysis in the empirical
case. Jan de Leeuw. No license specified.
- write-profiles
- A function. The argument is a list of lists of strings. Each element
of the list corresponds with a variable, the elements of the list
corresponding with a variable are the labels of that variable, which
are either strings or characters or numbers or symbols. The program
returns a matrix of strings coding all the profiles. Unknown
author. License not specified.
Distributions
The distributions
repository
contains single file implementations of:
- density demo
- Demonstrations of plots of density and probability
functions. Requires XLispStat graphics. Jan de Leeuw. No license specified.
- noncentral t-distribution
- noncentral-t distribution by Russ Lenth, based on Applied Statistics Algorithm AS 243. No license specified.
- probability-functions
- A compilation of probability densities, cumulative distribution
functions, and their inverses (quantile functions), by Jan de
Leeuw. No license specified.
- power
- This appears to test the powers of various distribution
functions. Unknown author. No license specified.
- weibull-mle
- Maximum likelihood estimation of Weibull parameters. M. Ennis. No license specified.
Classroom Statistics
The systems in the
introstat
directory are meant to be used in teaching situations. For the most
part they use XLispStat’s graphical system to introduce students to
statistical concepts. They are generally simple in nature from a the
perspective of a statistical practitioner.
- ElToY
- ElToY is a collection of three programs written in
XLISP-STAT. Dist-toy displays a univariate distribution dynamically
linked to its parameters. CLT-toy provides an illustration of the
central limit theorem for univariate distributions. ElToY provides a
mechanism for displaying the prior and posterior distributions for a
conjugate family dynamically linked so that changes to the prior
affect the posterior and visa versa. Russell Almond
almond@stat.washington.edu. GPL v2.
Multivariate
- Dendro
- Dendro is for producing dendrograms for agglomerative cluster in
XLISP-STAT.
Plotting
- Boxplot Matrix
- Graphical Display of Analysis of Variance with the Boxplot Matrix.
Extension of the standard one-way box plot to cross-classified data
with multiple observations per cell. Richard M. Heiberger
rmh@astro.ocis.temple.edu No license specified.
[Docs]
- Dynamic Graphics and Regression Diagnostics
- Contains methods for regression diagnostics using dynamic graphics,
including all the methods discussed in Cook and Weisberg (1989)
Technometrics, 277-312. Includes documentation written in
LaTeX. sandy@umnstat.stat.umn.edu No license specified.
[Docs}
- FEDF
- Flipped Empirical Distribution Function. Parallel-FEDF,
FEDF-ScatterPlot, FEDF-StarPlot written in XLISP-STAT. These plots are
suggested for exploring multidimensional data suggested in “Journal of
Computational and Graphical Statistics”, Vol. 4, No. 4, pp.335-343.
97/07/18. Lee, Kyungmi & Huh, Moon Yul myhuh@yurim.skku.ac.kr No
license specified.
- PDFPlot
- PDF graphics output from XlispStat PDFPlot is a XlispStat class to
generate PDF files from LispStat plot objects. Steven D. Majewski
sdm7g@virginia.edu. No license specified.
- RXridge
- RXridge.LSP adds shrinkage regression calculation and graphical
ridge “trace” display functionality to the XLisp-Stat, ver2.1
release 3+ implementation of LISP-STAT. Bob Obenchain. No license
specified.
Regression
- Bayes-Linear
- BAYES-LIN is an extension of the XLISP-STAT object-oriented
statistical computing environment, which adds to XLISP-STAT some
object prototypes appropriate for carrying out local computation via
message-passing between clique-tree nodes of Bayes linear belief
networks. Darren J. Wilkinson. No license specified.
[Docs]
- Bayesian Poisson Regression
- Bayesian Poisson Regression using the Gibbs Sampler Sensitivity
Analysis through Dynamic Graphics. A set of programs that allow you
to do Bayesian sensitivity analysis dynamically for a variety of
models. B. Narasimhan (naras@stat.fsu.edu) License restricted to
non-commercial use only.
[Docs]
- Binary regression
- Smooth and parametric binary regression code. Unknown author. License not specified.
- Cost of Data Analysis
- A regression analysis usually consists of several stages such as
variable selection, transformation and residual diagnosis.
Inference is often made from the selected model without regard to
the model selection methods that proceeded it. This can result in
overoptimistic and biased inferences. We first characterize data
analytic actions as functions acting on regression models. We
investigate the extent of the problem and test bootstrap, jackknife
and sample splitting methods for ameliorating it. We also
demonstrate an interactive XLISP-STAT system for assessing the cost
of the data analysis while it is taking place. Julian
J. Faraway. BSD license.
[Docs]
- Gee
- Lisp-Stat code for generalised estimating equation models. Thomas
Lumley thomas@biostat.washington.edu. GPL v2.
[Docs]
- GLIM
- Functions and prototypes for fitting generalized linear
models. Contributed by Luke Tierney luke@umnstat.stat.umn.edu. No
license specified.
[Docs]
- GLMER
- A function to estimate coefficients and dispersions in a generalized
linear model with random effects. Guanghan Liu
gliu@math.ucla.edu. No license specified.
- Hasse
- Implements Taylor & Hilton’s rules for balanced ANOVA designs and
draws the Hasse diagram of nesting
relationships. Philip Iversen piversen@iastate.edu. License restricted to
non-commercial use only.
- monotone
- Implementation of an algorithm to project on the intersection of r
closed convex sets. Further details and references are in Mathar,
Cyclic Projections in Data Analysis, Operations Research
Proceedings 1988, Springer, 1989. Jan de Leeuw. No license
specified.
- OIRS
- Order and Influence in Regression Strategy. The methods (tactics)
of regression data analysis such as variable selection,
transformation and outlier detection are characterised as functions
acting on regression models and returning regression models. The
ordering of the tactics, that is the strategy, is studied. A method
for the generation of acceptable models supported by the choice of
regression data analysis methods is described with a view to
determining if two capable statisticians may reasonably hold
differing views on the same data. Optimal strategies are
considered. The idea of influential points is extended from
estimation to the model building process itself both quantitatively
and qualitatively. The methods described are not intended for the
entirely automatic analysis of data, rather to assist the
statistician in examining regression data at a strategic level.
Julian J. Faraway julian@stat.lsa.umich.edu. BSD license.
- oneway
- Additions to Tierney’s one way ANOVA. B. Narasimhan
naras@euler.bd.psu.edu. No license specified.
- Regstrat
- A XLispStat tool to investigate order in Regression Strategy
particularly for finding and examining the models found by changing
the ordering of the actions in a regression analysis. Julian Faraway
julian@stat.lsa.umich.edu. License restricted to non-commercial use
only.
- Simsel
- XLISP-STAT software to perform Bayesian Predictive Simultaneous
Variable and Transformation Selection for regression. A
criterion-based model selection algorithm. Jennifer A. Hoeting
jah@stat.colostate.edu. License restricted to non-commercial use
only.
Robust
There are three robust systems in the
robust
directory:
- robust regression
- This is the Xlisp-Stat version of
ROSEPACK,
the robust regression package developed by Holland, Welsch, and
Klema around 1975. See Holland and Welsch, Commun. Statist. A6,
1977, 813-827. See also the Xlisp-Stat book, pages 173-177, for an
alternative approach. Jan de Leeuw. No license specified.
There is also robust statistical code for
location
and
scale.
Simulation
The simulation
directory contains bootstrapping
methods,
variable
imputation,
jackknife
resampling,
monte-carlo
simulations
and a general purpose
simulator. There
is also the discrete finite state markov
chains in
the temporal
directory.
Smoothers
- kernel density estimators
- KDEs based on Wand, CFFI based KDEs by B. Narasimhan, and graphical univariate density estimation.
- spline
- Regularized bi-variate splines with smoothing and tension according
to Mitasova and Mitas. Cubic splines according to Green and
Silverman. Jan de Leeuw. No license specified.
- super-smoother
- The super smoothing algorithm, originally implemented in FORTRAN by
Jerome Friedman of Stanford University, is a method by which a
smooth curve may be fitted to a two-dimensional array of points. Its
implementation is presented here in the XLISP-STAT language. Jason
Bond. No license specified.
[DOCS]
- Variable Bandwidth
- XLispStat code to facilitate interactive bandwidth choice for
estimator (3.14), page 44 in Bagkavos (2003), “BIAS REDUCTION IN
NONPARAMETRIC HAZARD RATE ESTIMATION”. No license specified.
Spatial
- livemap
- LiveMap is a tool for exploratory spatial data analysis. Dr. Chris Brunsdon. No license specified.
[DOCS]
- variograms
- Produces variograms using algorithms from C.V. Deutsch and
A.G. Journel, “GSLIB: Geostatistical Software Library and User’s
Guide, Oxford University Press, New York, 1992. Stanley
S. Bentow. No license specified.
[DOCS]
Temporal
- Exploratory survival analysis
- A set of XLISP-STAT routines for the interactive, dynamic,
exploratory analysis of survival data. E. Neely Atkinson
(neely@odin.mda.uth.tmc.edu) “This software may be freely redistributed.”
[Docs]
- Markov
- Simulate some Markov chains in Xlisp-Stat. Complete documentation and examples are included. B. Narasimhan (naras@sci234e.mrs.umn.edu). GPL.
[Docs]
- SAPA
- Sapaclisp is a collection of Common Lisp functions that can be used to
carry out many of the computations described in the SAPA book:
Donald B. Percival and Andrew T. Walden, “Spectral Analysis for
Physical Applications: Multitaper and Conventional Univariate
Techniques”, Cambridge University Press, Cambridge, England, 1993.
The SAPA book uses a number of time series as examples of various
spectral analysis techniques.
Note
This archive contains SAPA converted to XLispStat. A
Common
Lisp version can be
obtained from the CMU archive.
From the description:
Sapaclisp features functions for converting to/from decibels, the
FORTRAN sign function, log of the gamma function, manipulating
polynomials, root finding, simple numerical integration, matrix
functions, Cholesky and modified Gram-Schmidt (i.e., Q-R) matrix
decompositions, sample means and variances, sample medians,
computation of quantiles from various distributions, linear least
squares, discrete Fourier transform, fast Fourier transform, chirp
transform, low-pass filters, high-pass filters, band-pass filters,
sample auto-covariance sequence, auto-regressive spectral estimates,
least squares, forward/backward least squares, Burg’s algorithm, the
Yule-Walker method, periodogram, direct spectral estimates, lag window
spectral estimates, WOSA spectral estimates, sample cepstrum, time
series bandwidth, cumulative periodogram test statistic for white
noise, and Fisher’s g statistic.
License: “Use and copying of this software and preparation of
derivative works based upon this software are permitted. Any
distribution of this software or derivative works must comply with all
applicable United States export control laws.”
- Times
- XLispStat functions for time series analysis, data editing, data
selection, and other statistical operations. W. Hatch
(bts!bill@uunet.uu.net). Public Domain.
Tests
The tests
directory
contains code to do one-sample and two-sample Kolmogorov-Smirnov test
(with no estimated parameters) and code to do Mann-Whitney and
Wilcoxon rank signed rank tests.
Training & Documentation
- ENAR Short Course
- This directory contains slides and examples used in a shortcourse on
Lisp-Stat presented at the 1992 ENAR meetings in Cincinnati, 22
March 1992.
- ASA Course
- Material from an ASA course given in 1992.
- Tech Report
- A 106 page mini manual on XLispStat.
Utilities
The majority of the files in the utilities
directory
are specific to XLISP-STAT and unlikely to be useful. In most cases
better alternatives now exist for Common Lisp. A few that may be worth
investigating have been noted below.
Filters
- XLisp-S
- A series of routines to allow users of Xlisp or LispStat to
interactively transfer data to and access functions in New S. Steve
McKinney kilroy@biostat.washington.edu. License restricted to
non-commercial use only.
I/O
- formatted-input
- A set of XLISP functions that can be used to read ASCII files into
lists of lists, using formatted input. The main function is
read-file
, which has as arguments a filename and a FORTRAN type
format string (with f, i, x, t, and a formats) Jan Deleeuw
deleeuw@laplace.sscnet.ucla.edu “THIS SOFTWARE CAN BE FREELY
DISTRIBUTED, USED, AND MODIFIED.”
Memoization
- automatic memoization
- As the name suggests. Marty Hall
hall@aplcenmp.apl.jhu.edu. “Permission is granted for any use or
modification of this code provided this notice is retained."
[OVERVIEW]