NOTES ON STATISTICS, PROBABILITY and MATHEMATICS


Distribution Stories:


Geometric:

The geometric distribution gives the probability of observing \(x\) Bernoulli(\(p\)) failures before the first success (the number of trials is then \(x + 1\)). “How many trials are needed until your first success?”
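As a minimal sketch of the story above, the probability of \(x\) failures before the first success is \((1-p)^x\,p.\) The function name `geometric_pmf` is just an illustrative choice, and the "failures before the first success" convention is assumed (the "number of trials" convention shifts \(x\) by one).

```python
def geometric_pmf(x: int, p: float) -> float:
    """P(X = x): x failures before the first success, success probability p."""
    if x < 0:
        return 0.0
    # x independent failures, each with probability (1 - p), then one success.
    return (1.0 - p) ** x * p


# Example: fair coin, exactly 2 tails before the first head: 0.5**2 * 0.5.
prob_two_failures = geometric_pmf(2, 0.5)
```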


Negative binomial:

Models the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of failures \(r\) occurs.
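A small sketch of the corresponding probability mass function, assuming the convention used above (count \(k\) successes before the \(r\)-th failure, with success probability \(p\)). The name `neg_binomial_pmf` is hypothetical.

```python
from math import comb


def neg_binomial_pmf(k: int, r: int, p: float) -> float:
    """P(X = k): k successes before the r-th failure; p is the success probability."""
    if k < 0:
        return 0.0
    # The last trial is the r-th failure; the k successes can fall anywhere
    # among the first k + r - 1 trials.
    return comb(k + r - 1, k) * p ** k * (1.0 - p) ** r
```

With \(r = 1\) this reduces to a geometric distribution on the number of successes before the first failure.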


Hypergeometric distribution:

In probability theory and statistics, the hypergeometric distribution is a discrete probability distribution that describes the probability of \(k\) successes (random draws for which the object drawn has a specified feature) in \(n\) draws, without replacement, from a finite population of size \(N\) that contains exactly \(K\) objects with that feature.
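A hedged sketch of the probability mass function. The symbols \(N\) (population size) and \(K\) (number of objects with the feature) are the usual textbook names, and `hypergeom_pmf` is an illustrative name.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(k successes in n draws without replacement) from a population of
    size N containing K objects with the feature of interest."""
    if k < 0 or k > n:
        return 0.0
    # Choose k of the K featured objects and n - k of the N - K others.
    # math.comb returns 0 when asked for more items than are available.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Example: probability of exactly one ace in a 5-card hand from a 52-card deck.
p_one_ace = hypergeom_pmf(1, N=52, K=4, n=5)
```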


Poisson:


In probability theory, a Poisson process is a stochastic process that counts the number of events and the time points at which these events occur in a given time interval. The time between each pair of consecutive events has an exponential distribution with parameter \(\lambda\) and each of these inter-arrival times is assumed to be independent of other inter-arrival times. The process is named after the French mathematician Siméon Denis Poisson and is a good model of radioactive decay, telephone calls and requests for a particular document on a web server among many other phenomena.

The Poisson process is a continuous-time process; the sum of a Bernoulli process can be thought of as its discrete-time counterpart. A Poisson process is a pure-birth process, the simplest example of a birth-death process. It is also a point process on the real half-line.

Example: the probability of getting \(5\) e-mails (\(n\) Poisson events) in \(1\) day, given an average rate. Mnemonic: the probability of catching \(n\) fish (“poissons” in French).
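The e-mail example can be sketched as follows. The average rate of \(3\) e-mails per day is an assumed value chosen only for illustration, and `poisson_pmf` is a hypothetical name.

```python
from math import exp, factorial


def poisson_pmf(n: int, rate: float, t: float = 1.0) -> float:
    """P(N(t) = n): exactly n events in an interval of length t,
    for a Poisson process with the given average rate per unit time."""
    mu = rate * t  # expected number of events in the interval
    return exp(-mu) * mu ** n / factorial(n)


# Assumed rate: 3 e-mails per day. Probability of exactly 5 e-mails in 1 day.
p_five_emails = poisson_pmf(5, rate=3.0, t=1.0)
```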


Exponential:


Probability of having to wait \(1\) hour before the next e-mail arrives (Poisson event). Parameter: the rate \(\lambda.\)

Describes the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate. It is the continuous analogue of the geometric distribution (number of Bernoulli trials before getting the first success, e.g. Heads), and it has the key property of being memoryless.
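A short sketch of the memoryless property, \(P(T > s + t \mid T > s) = P(T > t).\) The rate of \(0.5\) e-mails per hour is an assumed value, and the function name is illustrative.

```python
from math import exp


def exponential_survival(t: float, rate: float) -> float:
    """P(T > t): probability of waiting longer than t for the next event."""
    return exp(-rate * t) if t > 0 else 1.0


rate = 0.5  # assumed: 0.5 e-mails per hour

# Probability of waiting more than 1 hour for the next e-mail.
p_wait_over_1h = exponential_survival(1.0, rate)

# Having already waited 2 hours, the chance of waiting 1 more hour is the same.
p_conditional = exponential_survival(3.0, rate) / exponential_survival(2.0, rate)
```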


Erlang:


The difference between Erlang and Gamma is that in a Gamma distribution \(k\) can be a non-integer (any positive real number), whereas in Erlang \(k\) must be a positive integer.

The Erlang distribution is the distribution of a sum of \(k\) independent exponential variables with mean \(1/\lambda\) each. Equivalently, it is the distribution of the time until the \(k\)-th event of a Poisson process with a rate of \(\lambda.\)
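The link to the Poisson process gives a closed form for the Erlang CDF: the \(k\)-th event has happened by time \(t\) exactly when at least \(k\) events fall in \([0, t].\) A minimal sketch, with `erlang_cdf` as an illustrative name:

```python
from math import exp, factorial


def erlang_cdf(t: float, k: int, rate: float) -> float:
    """P(T_k <= t): the k-th event of a Poisson process has occurred by time t."""
    if t <= 0:
        return 0.0
    mu = rate * t
    # Fewer than k events in [0, t] means the k-th event has not happened yet.
    p_fewer_than_k = sum(exp(-mu) * mu ** n / factorial(n) for n in range(k))
    return 1.0 - p_fewer_than_k
```

For \(k = 1\) this reduces to the exponential CDF \(1 - e^{-\lambda t}.\)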


Gamma:


From here:

The exponential distribution predicts the wait time until the very first event. The gamma distribution, on the other hand, predicts the wait time until the \(k\)-th event occurs. If arrivals of events follow a Poisson process with a rate \(λ,\) the wait time until \(k\) arrivals follows \(Γ(k, λ).\) For \((α, β)\) parameterization: Using our notation \(k\) (the # of events) & \(λ\) (the rate of events), simply substitute \(α\) with \(k,\) \(β\) with \(λ.\)

The gamma distribution is frequently used to model waiting times. For instance, in life testing, the waiting time until death is a random variable that is frequently modeled with a gamma distribution.

Let \(X_{1},X_{2},\dots ,X_{n}\) be \(n\) independent and identically distributed random variables following an exponential distribution with rate parameter \(λ,\) then \(\sum_{i}X_{i} \sim \operatorname{Gamma}(n, λ)\) where \(n\) is the shape parameter and \(λ\) is the rate, and \(\bar {X}={\frac {1}{n}}\sum_{i}X_{i}\sim \operatorname{Gamma} (n,n\lambda )\) where the rate becomes \(nλ.\)
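A quick simulation sketch of this statement, using only the standard library. The values \(n = 5,\) \(\lambda = 2,\) the sample size and the seed are arbitrary assumptions. A \(\operatorname{Gamma}(n, \lambda)\) variable has mean \(n/\lambda\) and variance \(n/\lambda^2,\) so the simulated sums should land close to those values.

```python
import random


def simulate_exponential_sums(n, rate, n_samples=20000, seed=0):
    """Draw n_samples sums of n i.i.d. exponential(rate) variables."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(rate) for _ in range(n)) for _ in range(n_samples)]


n, rate = 5, 2.0  # assumed shape (number of arrivals) and rate
sums = simulate_exponential_sums(n, rate)

mean_of_sums = sum(sums) / len(sums)                                # theory: n / rate
var_of_sums = sum((s - mean_of_sums) ** 2 for s in sums) / len(sums)  # theory: n / rate**2
```

Dividing each sum by \(n\) gives the sample mean \(\bar{X},\) whose mean \(1/\lambda\) and variance \(1/(n\lambda^2)\) match \(\operatorname{Gamma}(n, n\lambda).\)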

Probability of having to wait \(1\) day to get \(5\) e-mails (time to the \(n\)-th Poisson event): we add the inter-arrival times, each exponential with rate \(\lambda.\) The fact that the gamma distribution represents the sum (convolution) of exponential distributions is proved by deriving the mgf of the gamma.
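A sketch of that mgf argument, assuming the \(X_i\) are independent exponentials with rate \(\lambda\) and writing \(M\) for the moment generating function:

```latex
\begin{align*}
M_{X_i}(t) &= \mathbb{E}\left[e^{tX_i}\right]
            = \int_0^\infty e^{tx}\,\lambda e^{-\lambda x}\,dx
            = \frac{\lambda}{\lambda - t}, \qquad t < \lambda, \\
M_{\sum_i X_i}(t) &= \prod_{i=1}^{n} M_{X_i}(t)
            = \left(\frac{\lambda}{\lambda - t}\right)^{n}.
\end{align*}
```

The last expression is the mgf of a gamma distribution with shape \(n\) and rate \(\lambda,\) and an mgf that exists in a neighbourhood of zero determines the distribution.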

Parameters: \(n\) (or \(a\) or \(K\) or \(\alpha\)) (shape) and \(\lambda\) (or \(\beta\) or \(1/\theta\)) (rate; \(\theta\) is the scale). It’s the continuous analogue of the negative binomial (i.e. a sum of geometrics) = number of Bernoulli trials before reaching \(n\) successes (e.g. \(4\) heads).


Log Normal:

From here:

Statisticians use this distribution to model growth rates that are independent of size, which frequently occurs in biology and financial areas. It also models time to failure in reliability studies, rainfall amounts, species abundance, and the number of moves in chess games. It models global income distributions.

From here:

A log-normal process is the statistical realization of the multiplicative product of many independent random variables, each of which is positive.
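A simulation sketch of this idea: the log of a product is a sum of logs, so by the central limit theorem it is approximately normal. The factor distribution (uniform on \([0.5, 1.5]\)), the number of factors and the seed are all assumptions made for illustration.

```python
import math
import random


def skewness(values):
    """Sample skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def simulate_products(n_samples=5000, n_factors=50, seed=0):
    """Each sample is a product of n_factors independent positive factors."""
    rng = random.Random(seed)
    return [math.prod(rng.uniform(0.5, 1.5) for _ in range(n_factors))
            for _ in range(n_samples)]


products = simulate_products()
log_products = [math.log(x) for x in products]

skew_raw = skewness(products)      # strongly right-skewed
skew_log = skewness(log_products)  # close to zero, as for a normal sample
```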

Heath pointed out that for “certain types of data the assumption that the data are drawn from a normal population is usually wrong, and that the alternative assumption of a log-normal distribution is better”. As further explained below, this statement appears to be of a much broader importance: it is in line with the fact that, in general, laws and processes in science and life are rather of multiplicative than additive nature.


Chi square:


The chi square distribution is a special case of the gamma distribution: \(\chi^2_k\) is a gamma with shape \(k/2\) and rate \(1/2\) (scale \(2\)).

A squared (transformed) standard normal variable is distributed as a chi square with \(1\) degree of freedom. If \(Z_1, Z_2, \dots, Z_k\) are independent standard normal variables, \(Z_1^2 + Z_2^2 + \cdots + Z_k^2 \sim \chi^2_k.\)
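A simulation sketch of the sum-of-squares construction. A \(\chi^2_k\) variable has mean \(k\) and variance \(2k,\) so the simulated values should come out near those. The choice \(k = 4,\) the sample size and the seed are assumptions.

```python
import random


def simulate_chi_square(k, n_samples=20000, seed=0):
    """Draw n_samples sums of k squared independent standard normals."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
            for _ in range(n_samples)]


samples = simulate_chi_square(k=4)

sample_mean = sum(samples) / len(samples)                               # theory: k = 4
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)  # theory: 2k = 8
```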

The chi square comes into play in the distribution of squared differences between observed and estimated values. It plays a role in several inferential tests, including tests of variance.


Beta:


In Bayesian inference, the beta distribution is the conjugate prior probability distribution for the Bernoulli, binomial, negative binomial and geometric distributions. The beta distribution is a suitable model for the random behavior of percentages and proportions.
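Conjugacy means the posterior stays in the beta family: a \(\operatorname{Beta}(\alpha, \beta)\) prior on a success probability, updated with \(s\) successes and \(f\) failures, becomes \(\operatorname{Beta}(\alpha + s, \beta + f).\) A minimal sketch with hypothetical function names and made-up data:

```python
def beta_update(alpha: float, beta: float, successes: int, failures: int):
    """Posterior Beta parameters after observing Bernoulli/binomial data."""
    return alpha + successes, beta + failures


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Uniform prior Beta(1, 1); assumed data: 7 successes and 3 failures.
post_alpha, post_beta = beta_update(1.0, 1.0, successes=7, failures=3)
posterior_mean = beta_mean(post_alpha, post_beta)  # 8 / 12
```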


Weibull:


From here: the Weibull becomes an exponential when the shape parameter is \(1.\) Using the parameterization in Wikipedia, the CDF of the Weibull is:

\[1 - \exp\left(-(x/\lambda)^k\right)\]

with \(k\) being the shape parameter and \(\lambda\) the scale. And the CDF for the exponential with rate \(\lambda\) is

\[1 - \exp\left(-\lambda x \right)\]

so a Weibull with \(k = 1\) and scale \(\lambda\) is an exponential with rate \(1/\lambda.\)

The exponential is also related to the gamma distribution (shape parameter of \(1\)). The advantage of the Weibull is the simple closed form of the CDF.
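A small numerical check of the reduction to the exponential, assuming the shape/scale parameterization in which the Weibull CDF is \(1 - \exp\left(-(x/\lambda)^k\right).\) With \(k = 1\) and scale \(\lambda\) it should match an exponential with rate \(1/\lambda.\) Function names are illustrative.

```python
from math import exp


def weibull_cdf(x: float, k: float, lam: float) -> float:
    """Weibull CDF with shape k and scale lam."""
    return 1.0 - exp(-((x / lam) ** k)) if x > 0 else 0.0


def exponential_cdf(x: float, rate: float) -> float:
    """Exponential CDF with the given rate."""
    return 1.0 - exp(-rate * x) if x > 0 else 0.0


lam = 2.0  # assumed scale
# Largest gap between the two CDFs on a few test points; expected to be ~0.
max_gap = max(abs(weibull_cdf(x, 1.0, lam) - exponential_cdf(x, 1.0 / lam))
              for x in (0.1, 0.5, 1.0, 2.0, 5.0))
```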

From here:

Survival rates - mortality rates in biology, and breakdown/failures in machine reliability analysis

Extreme events - big spikes in daily rainfall/discharges from a natural water source

Insurance - Reinsurance claims.

From here:

The Weibull distribution is a result of a random fragmentation process where the probability of splitting a particle into fragments depends on the particle size. If that probability is independent of the particle size, the log-normal size distribution results. Recently, Brown and Wohletz (1995) demonstrated that the Weibull distribution arises naturally as a consequence of the fragmentation process being fractal. Fragmentation of a solid particle is initiated by generation of a fractal crack tree.

From here:

The quantity and distribution of a tree’s foliage, and the shape of its crown, are important factors for determining a tree’s potential to utilize solar energy and assimilate carbon through photosynthesis. The Weibull distribution was selected to model the vertical distributions of foliage weight and surface area.

From here and here

Particle fragmentation results in a fractal distribution of progeny fragments. Based on the fractal distribution, a statistical approach is proposed to interpret the scaling characteristics in tensile strength of brittle particles. It is found that the relationship between the cumulative survival probability of the particles and the tensile strength follows the Weibull distribution reasonably well.

From here

The Weibull distribution derived its popularity from its use to model the strength of materials, and has since been used to model just about everything. In particular, the Weibull distribution is used to represent wear-out lifetimes in reliability, wind speed, rainfall intensity, health related issues, germination, duration of industrial stoppages, migratory systems, and thunderstorm data.

From here:

The Weibull distribution is also widely used in reliability as a model for time to failure. It generalizes the exponential model to include nonconstant failure rate functions. In particular, it encompasses both increasing and decreasing failure rate functions and has been used successfully to describe both initial burn-in failures as well as failures attributed to wearout.

From here:

Weibull distributions are often used to model the time until a given technical device fails. If the failure rate of the device decreases over time, one chooses \(k<1\) (resulting in a decreasing density \(f\)). If the failure rate of the device is constant over time, one chooses \(k=1\), again resulting in a decreasing function \(f\). If the failure rate of the device increases over time, one chooses \(k>1\) and obtains a density \(f\) which increases towards a maximum and then decreases forever. Manufacturers will often supply the shape and scale parameters for the lifetime distribution of a particular device. The Weibull distribution can also be used to model the distribution of wind speeds at a given location on Earth.
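The three regimes can be seen directly in the failure rate (hazard) function, which for a Weibull with shape \(k\) and scale \(\lambda\) is \(h(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}.\) A minimal sketch, with the scale \(\lambda = 1\) and the evaluation points chosen arbitrarily:

```python
def weibull_hazard(x: float, k: float, lam: float) -> float:
    """Failure rate h(x) = f(x) / (1 - F(x)) for a Weibull(shape k, scale lam), x > 0."""
    return (k / lam) * (x / lam) ** (k - 1)


times = (0.5, 1.0, 2.0)
decreasing = [weibull_hazard(t, 0.5, 1.0) for t in times]  # k < 1: early failures
constant = [weibull_hazard(t, 1.0, 1.0) for t in times]    # k = 1: exponential case
increasing = [weibull_hazard(t, 2.0, 1.0) for t in times]  # k > 1: wear-out
```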


Inverse or reversed Weibull (Fréchet distribution):


From here:

The Fréchet \([α,β,μ]\) represents a continuous statistical distribution defined over the interval \((\mu,\infty)\) and parametrized by a real number \(μ\) (called a “location parameter”) and two positive real numbers \(α\) and \(β\) (called a “shape parameter” and a “scale parameter”, respectively).

It is one of four distributions (along with Gumbel Distribution, Extreme Value Distribution, and Weibull Distribution) classified under the general heading “extreme value distributions” and is therefore used as a tool for quantifying “extreme” or “rare” events.

The PDF of the Fréchet distribution \([α,β]\) is precisely the same as that of the transformed variable \(Y = \beta^2/X,\) where \(X\sim \operatorname{Weibull}[\alpha, \beta].\)
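A numerical sketch of this transformation at the level of CDFs, assuming location \(\mu = 0,\) a Weibull CDF of \(1 - \exp\left(-(x/\beta)^\alpha\right)\) and a Fréchet CDF of \(\exp\left(-(y/\beta)^{-\alpha}\right).\) The parameter values are arbitrary.

```python
from math import exp


def weibull_cdf(x: float, alpha: float, beta: float) -> float:
    """Weibull CDF with shape alpha and scale beta."""
    return 1.0 - exp(-((x / beta) ** alpha)) if x > 0 else 0.0


def frechet_cdf(y: float, alpha: float, beta: float) -> float:
    """Frechet CDF with shape alpha, scale beta and location 0."""
    return exp(-((y / beta) ** (-alpha))) if y > 0 else 0.0


def transformed_cdf(y: float, alpha: float, beta: float) -> float:
    """P(beta**2 / X <= y) = P(X >= beta**2 / y) for X ~ Weibull(alpha, beta), y > 0."""
    return 1.0 - weibull_cdf(beta ** 2 / y, alpha, beta)


alpha, beta = 2.5, 1.5  # assumed shape and scale
# Largest gap between the two CDFs on a few test points; expected to be ~0.
max_gap = max(abs(transformed_cdf(y, alpha, beta) - frechet_cdf(y, alpha, beta))
              for y in (0.5, 1.0, 2.0, 5.0))
```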


Multinomial & Categorical:


The multinomial distribution is a generalization of the binomial. In the binomial, there are \(n\) independent trials or experiments, and we add the number of successes. In each trial there are only two possibilities: success or failure - each trial or experiment is a Bernoulli trial.

In a multinomial distribution there are also \(n\) experiments, but the outcome of each experiment is not S or F, but rather one of \(K\) possible categories. For instance, we survey \(n=15\) people, asking them whether they intend to vote Democrat, Republican or Independent, i.e. \(K = 3.\) Knowing the percentage of supporters for each option in the general population we can calculate the probability of the event \((D=7, R=5, I=3).\)

If \(K=2\) we are back to \((0,1)\) binary outcomes with \(n\) experiments, which is the definition of the binomial distribution.

If \(n=1\) and \(K = 2,\) we have a Bernoulli experiment.

If \(n=1\) but we have more than \(2\) categories, we are dealing with a categorical distribution. The categorical distribution is the generalization of the Bernoulli distribution for a categorical random variable, i.e. for a discrete variable with more than two possible outcomes, such as the roll of a die, \(K=6\). On the other hand, the categorical distribution is a special case of the multinomial distribution, in that it gives the probabilities of potential outcomes of a single drawing, \((n=1),\) rather than multiple drawings.

The parameters specifying the probabilities of each possible outcome are constrained only by the fact that each must be in the range 0 to 1, and all must sum to 1.
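The survey example and the special cases above can be sketched with one function. The support percentages \((0.45, 0.35, 0.20)\) are invented for illustration, and `multinomial_pmf` is a hypothetical name.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs) -> float:
    """P(counts) for n = sum(counts) independent trials over len(probs) categories."""
    n = sum(counts)
    # Multinomial coefficient n! / (c_1! * c_2! * ... * c_K!), kept exact with integers.
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)
    return coefficient * prod(p ** c for p, c in zip(probs, counts))


probs = (0.45, 0.35, 0.20)  # assumed population shares for D, R, I
p_survey = multinomial_pmf((7, 5, 3), probs)

# K = 2 gives the binomial; n = 1 gives the categorical distribution.
p_binomial = multinomial_pmf((3, 7), (0.3, 0.7))
p_categorical = multinomial_pmf((0, 1, 0), probs)
```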



NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.