Probability tools (ryprob)

Overview

The functions in this module are used in the staring array model. The module was originally used in Konnik’s staring array model. Credits for the original source are included in each function’s docstring.

Code Overview

This module provides a few probability utility functions, most of which are used in the high level CCD and CMOS staring array model pyradi.rystare.

One of the utility functions provides a packaged version of scikit-learn’s kernel density estimation tool to provide a better estimate than does a simple histogram.

Module functions

KDEbounded(x_d, x, bandwidth=nan, lowbnd=nan, uppbnd=nan, kernel='gaussian')[source]

Estimate the probability by Kernel Density Estimation.

If bandwidth is np.nan, calculate the optimal kernel width (bandwidth) automatically using cross-validation. Be careful — this can take a while.

Mirrors the data at either or both edges when the domain is bounded, which corrects the boundary bias that a plain KDE would otherwise exhibit.

Parameters:
  • x_d (|) – domain over which values must be returned

  • x (|) – input sample data set

  • bandwidth (|) – bandwidth to use; np.nan means calculate automatically

  • lowbnd (|) – lower mirror-fold boundary; np.nan means no lower bound

  • uppbnd (|) – upper mirror-fold boundary; np.nan means no upper bound

  • kernel (|) – kernel type — one of ‘gaussian’, ‘tophat’, ‘epanechnikov’, ‘exponential’, ‘linear’, ‘cosine’

Returns:

input vector used as the domain for calculations | x (np.array[N,]): probability density evaluated over x_d | bandwidth (float): bandwidth used in the KDE | kernel (str): kernel used

Return type:

x_d (np.array[N,])

Raises:

| No exception is raised.

See here for more detail and examples: https://github.com/NelisW/PythonNotesToSelf/tree/master/KernelDensityEstimation

Other references: https://jakevdp.github.io/PythonDataScienceHandbook/05.13-kernel-density-estimation.html#Motivating-KDE:-Histograms https://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/ https://scikit-learn.org/stable/auto_examples/neighbors/plot_kde_1d.html https://stats.stackexchange.com/questions/405357/how-to-choose-the-bandwidth-of-a-kde-in-python https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KernelDensity.html https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html https://towardsdatascience.com/how-to-find-probability-from-probability-density-plots-7c392b218bab

distribution_exp(distribParams, out, funcName)[source]

Exponential Distribution.

This function is meant to be called via the distributions_generator function.

\textrm{pdf} = \lambda \cdot \exp(-\lambda \cdot y)

\textrm{cdf} = 1 - \exp(-\lambda \cdot y)

  • Mean = 1/lambda

  • Variance = 1/lambda^2

  • Mode = lambda

  • Median = log(2)/lambda

  • Skewness = 2

  • Kurtosis = 6

GENERATING FUNCTION: T = -\log_e(U) / \lambda

PARAMETERS: distribParams[0] is lambda — inverse scale or rate (lambda > 0)

SUPPORT: y, y >= 0

CLASS: Continuous skewed distributions

NOTES: The discrete version of the Exponential distribution is the Geometric distribution.

Parameters:
  • distribParams (|) – [lambda]; defaults to [1.] when None or empty

  • out (|) – pre-allocated output array; shape determines sample size

  • funcName (|) – caller name, used in validation error messages

Returns:

array of Exponential random variates (same shape as input out)

Return type:

out (np.array)

Raises:

| No exception is raised.

distribution_lognormal(distribParams, out, funcName)[source]

The Log-normal Distribution (sometimes: Cobb-Douglas or antilognormal distribution).

This function is meant to be called via the distributions_generator function.

pdf = 1/(y*sigma*sqrt(2*pi)) * exp(-1/2*((log(y)-mu)/sigma)^2) cdf = 1/2*(1 + erf((log(y)-mu)/(sigma*sqrt(2))));

  • Mean = exp(mu + sigma^2/2)

  • Variance = exp(2*mu+sigma^2)*(exp(sigma^2)-1)

  • Skewness = (exp(1)+2)*sqrt(exp(1)-1), for mu=0 and sigma=1

  • Kurtosis = exp(4) + 2*exp(3) + 3*exp(2) - 6, for mu=0 and sigma=1

  • Mode = exp(mu-sigma^2)

PARAMETERS: mu — location, sigma — scale (sigma > 0)

SUPPORT: y, y > 0

CLASS: Continuous skewed distribution

NOTES:

  1. The LogNormal distribution is always right-skewed.

  2. Parameters mu and sigma are the mean and standard deviation of y in (natural) log space.

  3. mu = log(mean(y)) - 1/2*log(1 + var(y)/(mean(y))^2)

  4. sigma = sqrt(log(1 + var(y)/(mean(y))^2))

Parameters:
  • distribParams (|) – [mu, sigma]; defaults to [0., 1.] when None or empty

  • out (|) – pre-allocated output array; shape determines sample size

  • funcName (|) – caller name, used in validation error messages

Returns:

array of Log-normal random variates (same shape as input out)

Return type:

out (np.array)

Raises:

| No exception is raised.

distribution_inversegauss(distribParams, out, funcName)[source]

The Inverse Gaussian Distribution.

This function is meant to be called via the distributions_generator function.

The Inverse Gaussian distribution is left skewed distribution whose location is set by the mean with the profile determined by the scale factor. The random variable can take a value between zero and infinity. The skewness increases rapidly with decreasing values of the scale parameter.

pdf(y) = sqrt(_lambda/(2*pi*y^3)) * exp(-_lambda./(2*y).*(y/mu-1).^2)

cdf(y) = normcdf(sqrt(_lambda./y).*(y/mu-1)) + exp(2*_lambda/mu)*normcdf(sqrt(_lambda./y).*(-y/mu-1))

where normcdf(x) = 0.5*(1+erf(y/sqrt(2))); is the standard normal CDF

  • Mean = mu

  • Variance = mu^3/_lambda

  • Skewness = sqrt(9*mu/_lambda)

  • Kurtosis = 15*mean/scale

  • Mode = mu/(2*_lambda)*(sqrt(9*mu^2+4*_lambda^2)-3*mu)

PARAMETERS: mu — location (mu > 0), _lambda — scale (_lambda > 0)

SUPPORT: y, y > 0

CLASS: Continuous skewed distribution

NOTES:

  1. There are several alternate forms for the PDF, some of which have more than two parameters.

  2. The Inverse Gaussian distribution is often called the Inverse Normal.

  3. Wald distribution is a special case where the mean is a constant with value one.

  4. The Inverse Gaussian distribution is a special case of the Generalised Hyperbolic Distribution.

Method:

There is an efficient procedure that utilises a transformation yielding two roots. If Y is an Inverse Gauss random variable, then following [1] we can write: V = _lambda*(Y-mu)^2/(Y*mu^2) ~ Chi-Square(1)

i.e. V is distributed as a chi-square random variable with one degree of freedom. Solving this equation for Y yields two roots:

y1 = mu + 0.5*mu/_lambda * (mu*V - sqrt(4*mu*_lambda*V + mu^2*V.^2)) y2 = mu^2/y1

In [2] it is shown that Y can be simulated by choosing y1 with probability mu/(mu+y1) and y2 with probability 1 - mu/(mu+y1).

References: [1] Shuster, J. (1968). On the Inverse Gaussian Distribution Function, Journal of the American Statistical Association 63: 1514-1516. [2] Michael, J.R., Schucany, W.R. and Haas, R.W. (1976). Generating Random Variates Using Transformations with Multiple Roots, The American Statistician 30: 88-90.

http://en.wikipedia.org/wiki/Inverse_Gaussian_distribution

Parameters:
  • distribParams (|) – [mu, _lambda]; defaults to [0., 1.] when None or empty

  • out (|) – pre-allocated output array; shape determines sample size

  • funcName (|) – caller name, used in validation error messages

Returns:

array of Inverse Gaussian random variates (same shape as input out)

Return type:

out (np.array)

Raises:

| No exception is raised.

distribution_logistic(distribParams, out, funcName)[source]

The Logistic Distribution.

This function is meant to be called via the distributions_generator function.

The logistic distribution is a symmetrical bell-shaped distribution. One of its applications is as an alternative to the Normal distribution when a higher proportion of the population being modelled is distributed in the tails.

pdf(y) = exp((y-a)/k) / (k*(1+exp((y-a)/k))^2)

cdf(y) = 1 / (1+exp(-(y-a)/k))

  • Mean = a

  • Variance = k^2*pi^2/3

  • Skewness = 0

  • Kurtosis = 1.2

PARAMETERS: a — location, k — scale (k > 0)

SUPPORT: y, -Inf < y < Inf

CLASS: Continuous symmetric distribution

Method: Inverse CDF transformation.

http://en.wikipedia.org/wiki/Logistic_distribution

Parameters:
  • distribParams (|) – [a, k]; defaults to [0., 1.] when None or empty

  • out (|) – pre-allocated output array; shape determines sample size

  • funcName (|) – caller name, used in validation error messages

Returns:

array of Logistic random variates (same shape as input out)

Return type:

out (np.array)

Raises:

| No exception is raised.

distribution_wald(distribParams, out, funcName)[source]

The Wald Distribution.

This function is meant to be called via the distributions_generator function.

The Wald distribution is a special case of the Inverse Gaussian Distribution. This formulation is for a constant mean value of one, which is the most common use case. The random variable can take a value between zero and infinity. The skewness increases rapidly with decreasing values of the shape parameter, chi.

pdf = sqrt(chi/(2*pi*y^3)) * exp(-chi./(2*y).*(y-1).^2)

  • Mean = 1

  • Variance = 1/chi

  • Skewness = sqrt(9/chi)

  • Kurtosis = 15/chi

PARAMETERS: chi — shape parameter (chi > 0)

SUPPORT: y, y > 0

CLASS: Continuous skewed distributions

Parameters:
  • distribParams (|) – [chi]; defaults to [0.] when None or empty

  • out (|) – pre-allocated output array; shape determines sample size

  • funcName (|) – caller name, used in validation error messages

Returns:

array of Wald random variates (same shape as input out)

Return type:

out (np.array)

Raises:

| No exception is raised.

distributions_generator(distribName=None, distribParams=None, sampleSize=None)[source]

Dispatcher for simulation of various probability distributions.

Allows the user to select a distribution by name and pass the required parameters as a list. The size of the returned sample is controlled by sampleSize.

sampleSize follows Matlab conventions:

  • None → return a single scalar value

  • scalar int N → return an N×N array

  • tuple or list → return an array of that shape

Possible values for distribName:

  • 'exp', 'exponential'

  • 'lognorm', 'lognormal', 'cobbdouglas', 'antilognormal'

  • 'ig', 'inversegauss', 'invgauss'

  • 'logistic'

  • 'wald'

Note

Due to a pre-existing implementation detail, the distribParams argument is not forwarded to the individual distribution functions; each uses its own built-in default parameters. distribParams accepted here for API compatibility only.

Parameters:
  • distribName (|) – required distribution name (case-insensitive, spaces stripped)

  • distribParams (|) – list of distribution parameters (currently unused by the dispatcher — each sub-function applies its own defaults)

  • sampleSize (|) – size of the returned random sample

Returns:

set of random variables for the selected distribution

Return type:

out (float or np.array)

Raises:

| No exception is raised.

validateParam(funcName=None, distribName=None, runDistribName=None, distribParamsName=None, paramName=None, param=None, conditionStr=None)[source]

Validate the range and number of parameters.

Evaluates each condition string in conditionStr against param and returns True only if all conditions hold. Supported operators are '<', '<=', '>', '>=', '!=', and '=='.

Note

The '==' branch contains pre-existing code referencing math.floor_ and math.float which do not exist in the standard library. That branch is not reachable through any current caller and is left unchanged.

Parameters:
  • funcName (|) – calling function name, used in error messages

  • distribName (|) – human-readable distribution name

  • runDistribName (|) – short distribution name used in the help hint

  • distribParamsName (|) – parameter group name (e.g. '[mu, sigma]')

  • paramName (|) – individual parameter name being validated

  • param (|) – parameter value to validate

  • conditionStr (|) – list of condition strings, e.g. ['> 0', '<= 1']

Returns:

True if all conditions are satisfied, False otherwise

Return type:

(bool)

Raises:

| No exception is raised.

checkParamsNum(funcName, distribName, runDistribName, distribParams, correctNum)[source]

Check that the correct number of parameters was supplied.

More than one acceptable count may be specified via correctNum.

Parameters:
  • funcName (|) – calling function name, used in error messages

  • distribName (|) – human-readable distribution name

  • runDistribName (|) – short distribution name used in the help hint

  • distribParams (|) – list of distribution parameters supplied by the caller

  • correctNum (|) – list of acceptable parameter counts (e.g. [0, 2])

Returns:

True if len(distribParams) is in correctNum, False otherwise

Return type:

(bool)

Raises:

| No exception is raised.

unifomOnSphere(number, dimension=3)[source]

Create a set of n-dimensional vectors uniformly distributed on a unit sphere.

Generates points by drawing from a standard normal distribution and normalising each vector to unit length, which produces a uniform distribution on the sphere.

https://math.stackexchange.com/questions/444700/uniform-distribution-on-the-surface-of-unit-sphere https://stackoverflow.com/questions/59954810/generate-random-points-on-10-dimensional-unit-sphere

Parameters:
  • number (|) – number of points to generate

  • dimension (|) – dimensionality of each normalised vector (default 3)

Returns:

array of unit-norm vectors on the sphere

Return type:

vec (np.array[number, dimension])

Raises:

| No exception is raised.