Probability tools (ryprob)¶
Overview¶
The functions in this module are used in the staring array model. The module was originally used in Konnik’s staring array model. Credits for the original source are included in each function’s docstring.
Code Overview¶
This module provides a few probability utility functions, most of which are used in the high level CCD and CMOS staring array model pyradi.rystare.
One of the utility functions provides a packaged version of scikit-learn’s kernel density estimation tool to provide a better estimate than does a simple histogram.
Module functions¶
- KDEbounded(x_d, x, bandwidth=nan, lowbnd=nan, uppbnd=nan, kernel='gaussian')[source]¶
Estimate the probability by Kernel Density Estimation.
If bandwidth is np.nan, calculate the optimal kernel width (bandwidth) automatically using cross-validation. Be careful — this can take a while.
Mirrors the data at either or both edges when the domain is bounded, which corrects the boundary bias that a plain KDE would otherwise exhibit.
- Parameters:
x_d (|) – domain over which values must be returned
x (|) – input sample data set
bandwidth (|) – bandwidth to use; np.nan means calculate automatically
lowbnd (|) – lower mirror-fold boundary; np.nan means no lower bound
uppbnd (|) – upper mirror-fold boundary; np.nan means no upper bound
kernel (|) – kernel type — one of ‘gaussian’, ‘tophat’, ‘epanechnikov’, ‘exponential’, ‘linear’, ‘cosine’
- Returns:
input vector used as the domain for calculations | x (np.array[N,]): probability density evaluated over x_d | bandwidth (float): bandwidth used in the KDE | kernel (str): kernel used
- Return type:
- x_d (np.array[N,])
- Raises:
| No exception is raised. –
See here for more detail and examples: https://github.com/NelisW/PythonNotesToSelf/tree/master/KernelDensityEstimation
Other references: https://jakevdp.github.io/PythonDataScienceHandbook/05.13-kernel-density-estimation.html#Motivating-KDE:-Histograms https://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/ https://scikit-learn.org/stable/auto_examples/neighbors/plot_kde_1d.html https://stats.stackexchange.com/questions/405357/how-to-choose-the-bandwidth-of-a-kde-in-python https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KernelDensity.html https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html https://towardsdatascience.com/how-to-find-probability-from-probability-density-plots-7c392b218bab
- distribution_exp(distribParams, out, funcName)[source]¶
Exponential Distribution.
This function is meant to be called via the distributions_generator function.


Mean = 1/lambda
Variance = 1/lambda^2
Mode = lambda
Median = log(2)/lambda
Skewness = 2
Kurtosis = 6
GENERATING FUNCTION:

PARAMETERS: distribParams[0] is lambda — inverse scale or rate (lambda > 0)
SUPPORT: y, y >= 0
CLASS: Continuous skewed distributions
NOTES: The discrete version of the Exponential distribution is the Geometric distribution.
- Parameters:
distribParams (|) – [lambda]; defaults to [1.] when None or empty
out (|) – pre-allocated output array; shape determines sample size
funcName (|) – caller name, used in validation error messages
- Returns:
array of Exponential random variates (same shape as input out)
- Return type:
- out (np.array)
- Raises:
| No exception is raised. –
- distribution_lognormal(distribParams, out, funcName)[source]¶
The Log-normal Distribution (sometimes: Cobb-Douglas or antilognormal distribution).
This function is meant to be called via the distributions_generator function.
pdf = 1/(y*sigma*sqrt(2*pi)) * exp(-1/2*((log(y)-mu)/sigma)^2) cdf = 1/2*(1 + erf((log(y)-mu)/(sigma*sqrt(2))));
Mean = exp(mu + sigma^2/2)
Variance = exp(2*mu+sigma^2)*(exp(sigma^2)-1)
Skewness = (exp(1)+2)*sqrt(exp(1)-1), for mu=0 and sigma=1
Kurtosis = exp(4) + 2*exp(3) + 3*exp(2) - 6, for mu=0 and sigma=1
Mode = exp(mu-sigma^2)
PARAMETERS: mu — location, sigma — scale (sigma > 0)
SUPPORT: y, y > 0
CLASS: Continuous skewed distribution
NOTES:
The LogNormal distribution is always right-skewed.
Parameters mu and sigma are the mean and standard deviation of y in (natural) log space.
mu = log(mean(y)) - 1/2*log(1 + var(y)/(mean(y))^2)
sigma = sqrt(log(1 + var(y)/(mean(y))^2))
- Parameters:
distribParams (|) – [mu, sigma]; defaults to [0., 1.] when None or empty
out (|) – pre-allocated output array; shape determines sample size
funcName (|) – caller name, used in validation error messages
- Returns:
array of Log-normal random variates (same shape as input out)
- Return type:
- out (np.array)
- Raises:
| No exception is raised. –
- distribution_inversegauss(distribParams, out, funcName)[source]¶
The Inverse Gaussian Distribution.
This function is meant to be called via the distributions_generator function.
The Inverse Gaussian distribution is left skewed distribution whose location is set by the mean with the profile determined by the scale factor. The random variable can take a value between zero and infinity. The skewness increases rapidly with decreasing values of the scale parameter.
pdf(y) = sqrt(_lambda/(2*pi*y^3)) * exp(-_lambda./(2*y).*(y/mu-1).^2)
cdf(y) = normcdf(sqrt(_lambda./y).*(y/mu-1)) + exp(2*_lambda/mu)*normcdf(sqrt(_lambda./y).*(-y/mu-1))
where normcdf(x) = 0.5*(1+erf(y/sqrt(2))); is the standard normal CDF
Mean = mu
Variance = mu^3/_lambda
Skewness = sqrt(9*mu/_lambda)
Kurtosis = 15*mean/scale
Mode = mu/(2*_lambda)*(sqrt(9*mu^2+4*_lambda^2)-3*mu)
PARAMETERS: mu — location (mu > 0), _lambda — scale (_lambda > 0)
SUPPORT: y, y > 0
CLASS: Continuous skewed distribution
NOTES:
There are several alternate forms for the PDF, some of which have more than two parameters.
The Inverse Gaussian distribution is often called the Inverse Normal.
Wald distribution is a special case where the mean is a constant with value one.
The Inverse Gaussian distribution is a special case of the Generalised Hyperbolic Distribution.
Method:
There is an efficient procedure that utilises a transformation yielding two roots. If Y is an Inverse Gauss random variable, then following [1] we can write: V = _lambda*(Y-mu)^2/(Y*mu^2) ~ Chi-Square(1)
i.e. V is distributed as a chi-square random variable with one degree of freedom. Solving this equation for Y yields two roots:
y1 = mu + 0.5*mu/_lambda * (mu*V - sqrt(4*mu*_lambda*V + mu^2*V.^2)) y2 = mu^2/y1
In [2] it is shown that Y can be simulated by choosing y1 with probability mu/(mu+y1) and y2 with probability 1 - mu/(mu+y1).
References: [1] Shuster, J. (1968). On the Inverse Gaussian Distribution Function, Journal of the American Statistical Association 63: 1514-1516. [2] Michael, J.R., Schucany, W.R. and Haas, R.W. (1976). Generating Random Variates Using Transformations with Multiple Roots, The American Statistician 30: 88-90.
http://en.wikipedia.org/wiki/Inverse_Gaussian_distribution
- Parameters:
distribParams (|) – [mu, _lambda]; defaults to [0., 1.] when None or empty
out (|) – pre-allocated output array; shape determines sample size
funcName (|) – caller name, used in validation error messages
- Returns:
array of Inverse Gaussian random variates (same shape as input out)
- Return type:
- out (np.array)
- Raises:
| No exception is raised. –
- distribution_logistic(distribParams, out, funcName)[source]¶
The Logistic Distribution.
This function is meant to be called via the distributions_generator function.
The logistic distribution is a symmetrical bell-shaped distribution. One of its applications is as an alternative to the Normal distribution when a higher proportion of the population being modelled is distributed in the tails.
pdf(y) = exp((y-a)/k) / (k*(1+exp((y-a)/k))^2)
cdf(y) = 1 / (1+exp(-(y-a)/k))
Mean = a
Variance = k^2*pi^2/3
Skewness = 0
Kurtosis = 1.2
PARAMETERS: a — location, k — scale (k > 0)
SUPPORT: y, -Inf < y < Inf
CLASS: Continuous symmetric distribution
Method: Inverse CDF transformation.
http://en.wikipedia.org/wiki/Logistic_distribution
- Parameters:
distribParams (|) – [a, k]; defaults to [0., 1.] when None or empty
out (|) – pre-allocated output array; shape determines sample size
funcName (|) – caller name, used in validation error messages
- Returns:
array of Logistic random variates (same shape as input out)
- Return type:
- out (np.array)
- Raises:
| No exception is raised. –
- distribution_wald(distribParams, out, funcName)[source]¶
The Wald Distribution.
This function is meant to be called via the distributions_generator function.
The Wald distribution is a special case of the Inverse Gaussian Distribution. This formulation is for a constant mean value of one, which is the most common use case. The random variable can take a value between zero and infinity. The skewness increases rapidly with decreasing values of the shape parameter, chi.
pdf = sqrt(chi/(2*pi*y^3)) * exp(-chi./(2*y).*(y-1).^2)
Mean = 1
Variance = 1/chi
Skewness = sqrt(9/chi)
Kurtosis = 15/chi
PARAMETERS: chi — shape parameter (chi > 0)
SUPPORT: y, y > 0
CLASS: Continuous skewed distributions
- Parameters:
distribParams (|) – [chi]; defaults to [0.] when None or empty
out (|) – pre-allocated output array; shape determines sample size
funcName (|) – caller name, used in validation error messages
- Returns:
array of Wald random variates (same shape as input out)
- Return type:
- out (np.array)
- Raises:
| No exception is raised. –
- distributions_generator(distribName=None, distribParams=None, sampleSize=None)[source]¶
Dispatcher for simulation of various probability distributions.
Allows the user to select a distribution by name and pass the required parameters as a list. The size of the returned sample is controlled by
sampleSize.sampleSizefollows Matlab conventions:None→ return a single scalar valuescalar int N → return an N×N array
tuple or list → return an array of that shape
Possible values for
distribName:'exp','exponential''lognorm','lognormal','cobbdouglas','antilognormal''ig','inversegauss','invgauss''logistic''wald'
Note
Due to a pre-existing implementation detail, the
distribParamsargument is not forwarded to the individual distribution functions; each uses its own built-in default parameters.distribParamsaccepted here for API compatibility only.- Parameters:
distribName (|) – required distribution name (case-insensitive, spaces stripped)
distribParams (|) – list of distribution parameters (currently unused by the dispatcher — each sub-function applies its own defaults)
sampleSize (|) – size of the returned random sample
- Returns:
set of random variables for the selected distribution
- Return type:
- out (float or np.array)
- Raises:
| No exception is raised. –
- validateParam(funcName=None, distribName=None, runDistribName=None, distribParamsName=None, paramName=None, param=None, conditionStr=None)[source]¶
Validate the range and number of parameters.
Evaluates each condition string in
conditionStragainstparamand returnsTrueonly if all conditions hold. Supported operators are'<','<=','>','>=','!=', and'=='.Note
The
'=='branch contains pre-existing code referencingmath.floor_andmath.floatwhich do not exist in the standard library. That branch is not reachable through any current caller and is left unchanged.- Parameters:
funcName (|) – calling function name, used in error messages
distribName (|) – human-readable distribution name
runDistribName (|) – short distribution name used in the help hint
distribParamsName (|) – parameter group name (e.g.
'[mu, sigma]')paramName (|) – individual parameter name being validated
param (|) – parameter value to validate
conditionStr (|) – list of condition strings, e.g.
['> 0', '<= 1']
- Returns:
True if all conditions are satisfied, False otherwise
- Return type:
- (bool)
- Raises:
| No exception is raised. –
- checkParamsNum(funcName, distribName, runDistribName, distribParams, correctNum)[source]¶
Check that the correct number of parameters was supplied.
More than one acceptable count may be specified via
correctNum.- Parameters:
funcName (|) – calling function name, used in error messages
distribName (|) – human-readable distribution name
runDistribName (|) – short distribution name used in the help hint
distribParams (|) – list of distribution parameters supplied by the caller
correctNum (|) – list of acceptable parameter counts (e.g.
[0, 2])
- Returns:
True if
len(distribParams)is incorrectNum, False otherwise- Return type:
- (bool)
- Raises:
| No exception is raised. –
- unifomOnSphere(number, dimension=3)[source]¶
Create a set of n-dimensional vectors uniformly distributed on a unit sphere.
Generates points by drawing from a standard normal distribution and normalising each vector to unit length, which produces a uniform distribution on the sphere.
https://math.stackexchange.com/questions/444700/uniform-distribution-on-the-surface-of-unit-sphere https://stackoverflow.com/questions/59954810/generate-random-points-on-10-dimensional-unit-sphere
- Parameters:
number (|) – number of points to generate
dimension (|) – dimensionality of each normalised vector (default 3)
- Returns:
array of unit-norm vectors on the sphere
- Return type:
- vec (np.array[number, dimension])
- Raises:
| No exception is raised. –