Distributions
Overview
The Distributions system provides a collection of probability distributions and related functions such as:
- Sampling from distributions
- Moments (e.g. mean, variance, skewness, and kurtosis), entropy, and other properties
- Probability density/mass functions (pdf) and their logarithm (logpdf)
- Moment-generating functions and characteristic functions
- Maximum likelihood estimation
- Distribution composition and derived distributions
Getting Started
Load the distributions system with (asdf:load-system :distributions) and the plot system with (asdf:load-system :plot/vega). Now generate a sequence of 1000 samples drawn from the standard normal distribution:
and plot a histogram of the counts:
It looks like there’s an outlier at 5, but basically you can see it’s centered around 0.
To create a parameterised distribution, pass the parameters when you create the distribution object. In the following example we create a distribution with a mean of 2 and variance of 1 and plot it:
Now that we have the distribution as an object, we can obtain pdf, cdf, mean and other properties for it:
LS-USER> (mean rn2)
2.0d0
LS-USER> (pdf rn2 1.75)
0.38666811680284924d0
LS-USER> (cdf rn2 1.75)
0.4012936743170763d0
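The pdf value can be cross-checked against the closed-form normal density. The sketch below is plain Common Lisp and does not use the distributions system; normal-pdf is a hypothetical helper introduced only for this check.

```lisp
;; Closed-form normal density in plain Common Lisp.
;; NORMAL-PDF is a hypothetical helper, not part of the distributions system.
(defun normal-pdf (x mean variance)
  "Density at X of a normal distribution with the given MEAN and VARIANCE."
  (let ((z (- x mean)))
    (/ (exp (- (/ (* z z) (* 2 variance))))
       (sqrt (* 2 pi variance)))))

;; Same arguments as the (pdf rn2 1.75) call: mean 2, variance 1.
(normal-pdf 1.75d0 2d0 1d0)
```

The result should agree with the value returned by pdf to within floating-point rounding.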
Gamma
In probability theory and statistics, the gamma distribution is a two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-square distribution are special cases of the gamma distribution. There are two different parameterisations in common use:
- With a shape parameter k and a scale parameter θ.
- With a shape parameter α = k and an inverse scale parameter β = 1/θ, called a rate parameter.
In each of these forms, both parameters are positive real numbers.
The parameterisation with k and θ appears to be more common in econometrics and certain other applied fields, where for example the gamma distribution is frequently used to model waiting times.
The parameterisation with α and β is more common in Bayesian statistics, where the gamma distribution is used as a conjugate prior distribution for various types of inverse scale (rate) parameters, such as the λ of an exponential distribution or a Poisson distribution.
When the shape parameter has an integer value, the distribution is the Erlang distribution. Since this can be produced by ensuring that the shape parameter has an integer value > 0, the Erlang distribution is not separately implemented.
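The probability density function parameterized by shape-scale is:
$$f(x; k, \theta) = \frac{x^{k-1} e^{-x/\theta}}{\Gamma(k)\,\theta^{k}} \quad \text{for } x > 0 \text{ and } k, \theta > 0,$$
and by shape-rate:
$$f(x; \alpha, \beta) = \frac{\beta^{\alpha} x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)} \quad \text{for } x > 0 \text{ and } \alpha, \beta > 0.$$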
CDF
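The cumulative distribution function characterized by shape and scale (k and θ) is:
$$F(x; k, \theta) = \frac{\gamma(k, x/\theta)}{\Gamma(k)},$$
where $\gamma(k, x/\theta)$ is the lower incomplete gamma function.
Characterized by α and β (shape and rate):
$$F(x; \alpha, \beta) = \frac{\gamma(\alpha, \beta x)}{\Gamma(\alpha)},$$
where $\gamma(\alpha, \beta x)$ is the lower incomplete gamma function.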
Usage
Python and Boost parameterise the gamma distribution by shape and scale, while Lisp-Stat and R use shape and rate by default. Both forms are in common use. Because Lisp-Stat's implementation is based on Boost (owing to the restrictive license of R), the conversion between rate and scale is performed internally.
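The relationship between the two parameterisations is β = 1/θ. As a minimal sketch of that conversion, the plain Common Lisp below evaluates the density in both forms and shows that they agree. It is not the Lisp-Stat implementation: factorial, gamma-pdf-scale and gamma-pdf-rate are hypothetical helpers, and the shape is assumed to be a positive integer so that Γ(k) = (k-1)! can be used in place of a general gamma function.

```lisp
;; Hypothetical helpers in plain Common Lisp; not the Lisp-Stat API.
;; Assumes a positive integer shape, so that Γ(k) = (k-1)!.
(defun factorial (n)
  (if (< n 2) 1 (* n (factorial (1- n)))))

(defun gamma-pdf-scale (x k theta)
  "Shape-scale density: x^(k-1) e^(-x/θ) / (Γ(k) θ^k)."
  (/ (* (expt x (1- k)) (exp (- (/ x theta))))
     (* (factorial (1- k)) (expt theta k))))

(defun gamma-pdf-rate (x alpha beta)
  "Shape-rate density: β^α x^(α-1) e^(-βx) / Γ(α)."
  (/ (* (expt beta alpha) (expt x (1- alpha)) (exp (- (* beta x))))
     (factorial (1- alpha))))

;; A rate of 4 corresponds to a scale of 1/4, so the two calls
;; agree up to floating-point rounding.
(gamma-pdf-scale 3d0 10 0.25d0)
(gamma-pdf-rate 3d0 10 4d0)
```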
Implementation notes
In the following table k is the shape parameter of the distribution, θ is its scale parameter, x is the random variate, p is the probability and q is (- 1 p). The implementation functions are in the special-functions system.
Function | Implementation |
---|---|
PDF | (/ (gamma-p-derivative k (/ x θ)) θ) |
CDF | (incomplete-gamma k (/ x θ)) |
CDF complement | (upper-incomplete-gamma k (/ x θ)) |
quantile | (* θ (inverse-incomplete-gamma k p)) |
quantile complement | (* θ (upper-inverse-incomplete-gamma k p)) |
mean | kθ |
variance | kθ² |
mode | (* (1- k) θ), k>1 |
skewness | (/ 2 (sqrt k)) |
kurtosis | (+ 3 (/ 6 k)) |
kurtosis excess | (/ 6 k) |
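The closed-form rows of the table can be collected into one function. This is a sketch in plain Common Lisp under the table's own definitions; gamma-summary is a hypothetical name and is not part of the distributions or special-functions systems.

```lisp
;; Hypothetical helper: summary statistics from the closed forms in the table.
(defun gamma-summary (k theta)
  "Return a property list of summary statistics for shape K and scale THETA."
  (list :mean (* k theta)
        :variance (* k theta theta)
        ;; The table gives the mode only for k > 1; return NIL otherwise.
        :mode (if (> k 1) (* (1- k) theta) nil)
        :skewness (/ 2 (sqrt k))
        :kurtosis (+ 3 (/ 6 k))
        :kurtosis-excess (/ 6 k)))

;; Shape 10 and scale 1/4; rational inputs keep the mean, variance
;; and mode exact.
(gamma-summary 10 1/4)
```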
Example
On average, a train arrives at a station once every 15 minutes (θ = 15/60 hours). What is the probability that the tenth train (the tenth occurrence of the event) arrives within three hours, that is, that at least 10 trains arrive in that time?
In this example we have:
alpha = 10
theta = 15/60
x = 3
To compute the exact answer:
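Because the shape is an integer here, the gamma CDF reduces to the Erlang closed form F(x) = 1 - Σ_{n=0}^{k-1} e^{-x/θ} (x/θ)^n / n!, so the figure can be cross-checked without any library. The sketch below is plain Common Lisp rather than the distributions API, and erlang-cdf is a hypothetical helper.

```lisp
;; Hypothetical helper in plain Common Lisp; not the Lisp-Stat API.
(defun erlang-cdf (x k theta)
  "P(X <= x) for a gamma distribution with positive integer shape K and scale THETA."
  (let ((y (/ x theta))
        (term 1d0)   ; holds y^n / n!, starting at n = 0
        (sum 0d0))
    (dotimes (n k)
      (setf sum (+ sum term)
            term (/ (* term y) (1+ n))))
    (- 1d0 (* (exp (- y)) sum))))

;; Shape 10, scale 15/60 hours, threshold 3 hours.
(erlang-cdf 3d0 10 0.25d0) ; about 0.7576
```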
As an alternative, we can run a simulation, where we draw from the parameterised distribution and then calculate the percentage of values that fall below our threshold, x = 3:
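The simulation idea can be sketched in plain Common Lisp. This is not the Lisp-Stat sampler: it assumes an integer shape and builds each draw as the sum of k independent exponential waiting times, and draw-erlang and fraction-below are hypothetical helpers.

```lisp
;; Hypothetical helpers in plain Common Lisp; not the Lisp-Stat API.
(defun draw-erlang (k theta)
  "One draw from a gamma distribution with positive integer shape K and scale THETA,
built as the sum of K independent exponential waiting times."
  (let ((total 0d0))
    (dotimes (i k total)
      (declare (ignorable i))
      ;; (- 1d0 (random 1d0)) lies in (0, 1], so LOG never sees zero.
      (decf total (* theta (log (- 1d0 (random 1d0))))))))

(defun fraction-below (threshold k theta n)
  "Fraction of N simulated draws that fall at or below THRESHOLD."
  (let ((hits 0))
    (dotimes (i n)
      (declare (ignorable i))
      (when (<= (draw-erlang k theta) threshold)
        (incf hits)))
    (/ hits (float n 1d0))))

;; 100,000 draws with shape 10, scale 15/60 and threshold 3.
(fraction-below 3d0 10 0.25d0 100000)
```

The estimate varies from run to run but should settle close to the exact CDF value as the number of draws grows.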
Finally, if we want to plot the probability:
References
Boost implementation of Gamma
Gamma distribution (Wikipedia)