1.
Beta distribution
–
The beta distribution has been applied to model the behavior of random variables limited to intervals of finite length in a wide variety of disciplines. In Bayesian inference, it is the conjugate prior probability distribution for the Bernoulli, binomial, and negative binomial distributions. The beta distribution is a model for the random behavior of percentages. Its probability density function on 0 ≤ x ≤ 1, with shape parameters α, β > 0, is $f(x;\alpha,\beta) = x^{\alpha-1}(1-x)^{\beta-1}/B(\alpha,\beta)$, where the beta function $B(\alpha,\beta)$ is a normalization constant that ensures the total probability integrates to 1. Here x is a realization, that is, an observed value that actually occurred, of a random process X. Several authors, including N. L. Johnson and S. Kotz, use the symbols p and q instead of α and β for the shape parameters. The probability density function satisfies the differential equation $f'(x) = f(x)\,\frac{(\alpha+\beta-2)x-(\alpha-1)}{(x-1)x}$. The cumulative distribution function is $F(x;\alpha,\beta) = B(x;\alpha,\beta)/B(\alpha,\beta) = I_x(\alpha,\beta)$, where $B(x;\alpha,\beta)$ is the incomplete beta function and $I_x(\alpha,\beta)$ is the regularized incomplete beta function. The mode of a beta-distributed random variable X with α, β > 1 is the most likely value of the distribution, $(\alpha-1)/(\alpha+\beta-2)$. When both parameters are less than one (α, β < 1), the same expression gives the anti-mode, the lowest point of the probability density curve. Letting α = β, the expression simplifies to 1/2, showing that for α = β > 1 the mode is at the center of the distribution, which is symmetric in those cases. For several other parameter choices, the maximum value of the density function occurs at one or both ends. In some cases the value of the density function at the end is finite. For example, in the case α = 2, β = 1, the density function becomes a right-triangle distribution, which is finite at both ends. In several other cases there is a singularity at one or both ends, where the density approaches infinity. For example, in the case α = β = 1/2, the beta distribution simplifies to the arcsine distribution, whose density is unbounded at both ends. There is debate among mathematicians about some of these cases and whether the ends can be called modes or not. There is no general closed-form expression for the median of the beta distribution for arbitrary values of α and β.
Closed-form expressions for particular values of the parameters α and β follow. For symmetric cases α = β, median = 1/2. For α = 1 and β > 0, median = $1 - 2^{-1/\beta}$. For α > 0 and β = 1, median = $2^{-1/\alpha}$. For α = 3 and β = 2, median = 0.6142724318676105..., the real solution in [0, 1] of the quartic equation $1 - 8x^3 + 6x^4 = 0$. For α = 2 and β = 3, median = 0.38572756813238945.... For general α, β ≥ 1 the median can be approximated by $(\alpha - 1/3)/(\alpha + \beta - 2/3)$; the relative error in this approximation is less than 4%.
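The closed-form medians listed above can be checked numerically. The following is a minimal sketch in plain Python: the helper names (`bisect_root`, `median_alpha_one`, `median_beta_one`) are illustrative and not taken from any library, and the Beta(2, 3) value uses the standard mirror symmetry (if X is Beta(α, β) then 1 − X is Beta(β, α)), which the text does not state explicitly.

```python
def bisect_root(f, lo, hi, iterations=200):
    """Locate a root of f in [lo, hi] by bisection; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0:
            return mid
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return 0.5 * (lo + hi)


def median_alpha_one(beta):
    """Median of Beta(1, beta): 1 - 2^(-1/beta)."""
    return 1.0 - 2.0 ** (-1.0 / beta)


def median_beta_one(alpha):
    """Median of Beta(alpha, 1): 2^(-1/alpha)."""
    return 2.0 ** (-1.0 / alpha)


def quartic(x):
    """Left-hand side of 1 - 8x^3 + 6x^4 = 0, whose root in [0, 1] is the Beta(3, 2) median."""
    return 1.0 - 8.0 * x**3 + 6.0 * x**4


median_3_2 = bisect_root(quartic, 0.0, 1.0)
median_2_3 = 1.0 - median_3_2  # mirror symmetry of the beta distribution (assumption noted above)

print(median_3_2, median_2_3)
```

The quartic is strictly decreasing on (0, 1), so bisection on that interval finds its only root there.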

2.
Hypergeometric distribution
–
The hypergeometric distribution describes the probability of k successes in n draws, without replacement, from a finite population of size N that contains exactly K successes. In contrast, the binomial distribution describes the probability of k successes in n draws with replacement. In statistics, the hypergeometric test uses the hypergeometric distribution to calculate the significance of having drawn a specific k successes from the aforementioned population. The test is used to identify which sub-populations are over- or under-represented in a sample. This test has a range of applications. For example, a group could use the test to understand its customer base by testing a set of known customers for over-representation of various demographic subgroups. The following conditions characterize the distribution: the result of each draw can be classified into one of two mutually exclusive categories, and the probability of a success changes on each draw, as each draw decreases the population. The probability mass function (pmf) is $P(X=k) = \binom{K}{k}\binom{N-K}{n-k}/\binom{N}{n}$. The pmf is positive when $\max(0, n+K-N) \le k \le \min(K, n)$. The pmf satisfies the recurrence relation $(k+1)(N-K-(n-k-1))\,P(X=k+1) = (K-k)(n-k)\,P(X=k)$ with $P(X=0) = \binom{N-K}{n}/\binom{N}{n}$. As one would expect, the probabilities sum to 1: $\sum_{0\le k\le n} \binom{K}{k}\binom{N-K}{n-k}/\binom{N}{n} = 1$. This is essentially Vandermonde's identity from combinatorics. The identity $\binom{K}{k}\binom{N-K}{n-k}/\binom{N}{n} = \binom{n}{k}\binom{N-n}{K-k}/\binom{N}{K}$ also holds. It follows from the symmetry of the problem, but it can also be shown by expressing the binomial coefficients in terms of factorials and rearranging the latter. The classical application of the distribution is sampling without replacement. Think of an urn with two types of marbles, red ones and green ones. Define drawing a green marble as a success and drawing a red marble as a failure. If the variable N describes the number of all marbles in the urn and K describes the number of green marbles, then N − K is the number of red marbles. In this example, X is the random variable whose outcome is k, the number of green marbles actually drawn in the experiment. Of the n marbles drawn, k are green and n − k are red, while K − k green marbles remain in the urn. Now assume that there are 5 green and 45 red marbles in the urn. Standing next to the urn, you close your eyes and draw 10 marbles without replacement. What is the probability that exactly 4 of the 10 are green?
In this problem N = 50, K = 5, n = 10 and k = 4. The probability of drawing exactly k green marbles can be calculated by the formula $P(X=k) = f(k;N,K,n) = \binom{K}{k}\binom{N-K}{n-k}/\binom{N}{n}$. Hence, in this example, $P(X=4) = f(4;50,5,10) = \binom{5}{4}\binom{45}{6}/\binom{50}{10} = 5 \cdot 8145060/10272278170 = 0.003964583\ldots$ Intuitively we would expect it to be even more unlikely for all 5 green marbles to be among the 10 drawn: $P(X=5) = f(5;50,5,10) = \binom{5}{5}\binom{45}{5}/\binom{50}{10} = 1 \cdot 1221759/10272278170 = 0.0001189375\ldots$, as expected. In Hold'em poker, players make the best hand they can by combining the two cards in their hand with the 5 cards eventually turned up on the table. The deck has 52 cards and there are 13 of each suit. For this example assume a player has 2 clubs in the hand and there are 3 cards showing on the table, 2 of which are also clubs
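The urn calculation above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the urn holds 5 green and 45 red marbles (N = 50, K = 5), which is what the coefficients 8145060 = C(45, 6) and 10272278170 = C(50, 10) in the worked example imply; the function name `hypergeom_pmf` is illustrative.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k): k successes in n draws without replacement from N items, K of them successes."""
    if k < max(0, n + K - N) or k > min(K, n):
        return 0.0  # outside the support of the distribution
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Assumed urn: 5 green (successes) and 45 red marbles, 10 draws.
p_four_green = hypergeom_pmf(4, N=50, K=5, n=10)
p_five_green = hypergeom_pmf(5, N=50, K=5, n=10)

# The probabilities over all k sum to 1 (Vandermonde's identity).
total = sum(hypergeom_pmf(k, 50, 5, 10) for k in range(0, 11))

print(p_four_green, p_five_green, total)
```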

3.
Probability theory
–
Probability theory is the branch of mathematics concerned with probability, the analysis of random phenomena. It is not possible to predict precisely the results of random events, but sequences of such events exhibit statistical patterns. Two representative mathematical results describing such patterns are the law of large numbers and the central limit theorem. As a mathematical foundation for statistics, probability theory is essential to human activities that involve quantitative analysis of large sets of data. Methods of probability theory also apply to descriptions of complex systems given only partial knowledge of their state. A great discovery of twentieth century physics was the probabilistic nature of physical phenomena at atomic scales, described in quantum mechanics. Christiaan Huygens published a book on the subject in 1657. Initially, probability theory mainly considered discrete events, and its methods were mainly combinatorial. Eventually, analytical considerations compelled the incorporation of continuous variables into the theory. This culminated in modern probability theory, on foundations laid by Andrey Nikolaevich Kolmogorov. Kolmogorov combined the notion of sample space, introduced by Richard von Mises, with measure theory to form an axiom system for probability. This became the mostly undisputed axiomatic basis for modern probability theory. Most introductions to probability theory treat discrete probability distributions and continuous probability distributions separately. The more mathematically advanced measure theory-based treatment of probability covers both the discrete and the continuous cases. Consider an experiment that can produce a number of outcomes. The set of all outcomes is called the sample space of the experiment. The power set of the sample space is formed by considering all different collections of possible results. For example, rolling an honest die produces one of six possible results. One collection of possible results corresponds to getting an odd number. Thus, the subset {1, 3, 5} is an element of the power set of the sample space of die rolls.
In this case, {1, 3, 5} is the event that the die falls on some odd number. If the results that actually occur fall in a given event, that event is said to have occurred. Probability is a way of assigning every event a value between zero and one, with the requirement that the event made up of all possible results (in this example, {1, 2, 3, 4, 5, 6}) be assigned a value of one. The assignment must also be additive over mutually exclusive events, that is, events with no results in common, such as {1, 6}, {3} and {2, 4}. The probability that any one of the events {1, 6}, {3}, or {2, 4} will occur is 5/6. This is the same as saying that the probability of the event {1, 2, 3, 4, 6} is 5/6. This event encompasses the possibility of any number except five being rolled. The mutually exclusive event {5} has a probability of 1/6, and the event {1, 2, 3, 4, 5, 6} has a probability of 1. Discrete probability theory deals with events that occur in countable sample spaces. Modern definition: the modern definition starts with a finite or countable set called the sample space, which relates to the set of all possible outcomes in the classical sense, denoted by Ω.
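The die example can be made concrete with exact fractions. The sketch below assumes a fair six-sided die, so every outcome in Ω = {1, ..., 6} has probability 1/6; the function name `prob` and the particular mutually exclusive events {1, 6}, {3}, {2, 4} are illustrative choices.

```python
from fractions import Fraction

# Sample space of a single roll of a fair six-sided die.
SAMPLE_SPACE = frozenset(range(1, 7))


def prob(event):
    """Probability of an event (a subset of the sample space) under equally likely outcomes."""
    event = frozenset(event)
    if not event <= SAMPLE_SPACE:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(SAMPLE_SPACE))


odd = {1, 3, 5}
disjoint_events = [{1, 6}, {3}, {2, 4}]          # mutually exclusive events
union = set().union(*disjoint_events)            # {1, 2, 3, 4, 6}: anything but a five

print(prob(odd))                                 # probability of an odd number
print(sum(prob(e) for e in disjoint_events))     # additivity over disjoint events
print(prob(union), prob({5}), prob(SAMPLE_SPACE))
```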

4.
Gumbel distribution
–
In probability theory and statistics, the Gumbel distribution is used to model the distribution of the maximum of a number of samples of various distributions. This distribution might be used to represent the distribution of the maximum level of a river in a particular year if there was a list of maximum values for the past ten years. It is useful in predicting the chance that an extreme earthquake will occur. The rest of this article uses the Gumbel distribution to model the distribution of the maximum value. To model the minimum value, use the negative of the original values. The Gumbel distribution is a particular case of the generalized extreme value distribution. It is also known as the log-Weibull distribution and the double exponential distribution. It is related to the Gompertz distribution: when its density is first reflected about the origin and then restricted to the positive half line, a Gompertz density is obtained. In the latent variable formulation of the logit model, common in discrete choice theory, the errors of the latent variables follow a Gumbel distribution. This is useful because the difference of two Gumbel-distributed random variables has a logistic distribution. The Gumbel distribution is named after Emil Julius Gumbel, based on his original papers describing the distribution. The cumulative distribution function of the Gumbel distribution is $F(x;\mu,\beta) = e^{-e^{-(x-\mu)/\beta}}$. The mode is μ, while the median is $\mu - \beta\ln(\ln 2)$, and the mean is given by $E(X) = \mu + \gamma\beta$, where γ is the Euler–Mascheroni constant. The standard deviation is $\beta\pi/\sqrt{6}$. The standard Gumbel distribution is the case where μ = 0 and β = 1, with cumulative distribution function $F(x) = e^{-e^{-x}}$ and probability density function $f(x) = e^{-(x+e^{-x})}$. In this case the mode is 0, the median is $-\ln(\ln 2) \approx 0.3665$, and the mean is γ. The cumulants, for n > 1, are given by $\kappa_n = (n-1)!\,\zeta(n)$. If X has a Gumbel distribution, then the conditional distribution of Y = −X given that Y is positive has a Gompertz distribution. The cdf G of Y is related to F, the cdf of X, by $G(y) = (F(0) - F(-y))/F(0)$ for y > 0. Consequently the densities are related by $g(y) = f(-y)/F(0)$: the Gompertz density is proportional to a reflected Gumbel density, restricted to the positive half-line.
If X is an exponentially distributed variable with mean 1, then −log(X) has a standard Gumbel distribution. Theory related to the generalized multivariate log-gamma distribution provides a multivariate version of the Gumbel distribution. In pre-software times probability paper was used to picture the Gumbel distribution. The paper is based on linearization of the cumulative distribution function F: $-\ln[-\ln(F)] = (x-\mu)/\beta$. In the paper the horizontal axis is constructed at a double log scale. By plotting F on the horizontal axis of the paper and the x-variable on the vertical axis, the distribution is represented by a straight line.
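The Gumbel formulas above translate directly into code. The following minimal sketch uses only the Python standard library; the function names are illustrative, and the numerical value of the Euler–Mascheroni constant γ is hard-coded as an assumption, since the `math` module does not provide it. The last function checks the exponential relation: for X exponential with mean 1, P(−ln X ≤ y) = P(X ≥ e^(−y)) = exp(−e^(−y)), which is the standard Gumbel cdf.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant (hard-coded assumption)


def gumbel_cdf(x, mu=0.0, beta=1.0):
    """F(x; mu, beta) = exp(-exp(-(x - mu) / beta))."""
    return math.exp(-math.exp(-(x - mu) / beta))


def gumbel_pdf(x, mu=0.0, beta=1.0):
    """Density: (1 / beta) * exp(-(z + exp(-z))) with z = (x - mu) / beta."""
    z = (x - mu) / beta
    return math.exp(-(z + math.exp(-z))) / beta


def gumbel_median(mu=0.0, beta=1.0):
    return mu - beta * math.log(math.log(2.0))


def gumbel_mean(mu=0.0, beta=1.0):
    return mu + EULER_GAMMA * beta


def gumbel_std(beta=1.0):
    return beta * math.pi / math.sqrt(6.0)


def neg_log_exponential_cdf(y):
    """cdf of -ln(X) for X ~ Exp(1): the exponential survival function evaluated at e^(-y)."""
    return math.exp(-math.exp(-y))


print(gumbel_median(), gumbel_mean(), gumbel_std())
print(gumbel_cdf(gumbel_median(2.0, 3.0), 2.0, 3.0))  # should be 0.5 up to rounding
```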

5.
Extreme value theory
–
Extreme value theory or extreme value analysis (EVA) is a branch of statistics dealing with the extreme deviations from the median of probability distributions. It seeks to assess, from a given ordered sample of a random variable, the probability of events that are more extreme than any previously observed. Extreme value analysis is used in many disciplines, such as structural engineering, finance, earth sciences, and traffic prediction. For example, EVA might be used in the field of hydrology to estimate the probability of an unusually large flooding event. Similarly, for the design of a breakwater, a coastal engineer would seek to estimate the 50-year wave and design the structure accordingly. Two approaches exist for practical extreme value analysis. The first method relies on deriving block maxima series as a preliminary step. In many situations it is customary and convenient to extract the annual maxima, giving an annual maxima series (AMS). The second method relies on extracting, from a continuous record, the peak values reached for any period during which values exceed a certain threshold. This method is referred to as the Peak Over Threshold (POT) method. For AMS data, the analysis may partly rely on the results of the Fisher–Tippett–Gnedenko theorem. However, in practice, various procedures are applied to select between a wider range of distributions. The theorem here relates to the limiting distributions for the minimum or the maximum of a very large collection of independent random variables from the same distribution. For POT data, the analysis may involve fitting two distributions: one for the number of events in a period considered and a second for the size of the exceedances. A common assumption for the first is the Poisson distribution, with the generalized Pareto distribution being used for the exceedances. A tail-fitting can be based on the Pickands–Balkema–de Haan theorem. Novak reserves the term “POT method” for the case where the threshold is non-random. Applications of extreme value theory include predicting pipeline failures due to pitting corrosion.
Another application is detecting anomalous IT network traffic, to prevent attackers from reaching important data. The field of extreme value theory was pioneered by Leonard Tippett. Tippett was employed by the British Cotton Industry Research Association, where he worked to make cotton thread stronger. In his studies, he realized that the strength of a thread was controlled by the strength of its weakest fibres. With the help of R. A. Fisher, Tippett obtained three asymptotic limits describing the distributions of extremes. Emil Julius Gumbel codified this theory in his 1958 book Statistics of Extremes, including the Gumbel distributions that bear his name. A summary of important publications relating to extreme value theory can be found in the article List of publications in statistics. Let $X_1, \ldots, X_n$ be a sequence of independent and identically distributed variables with cumulative distribution function F, and let $M_n = \max(X_1, \ldots, X_n)$ denote the maximum. In theory, the exact distribution of the maximum can be derived: $\Pr(M_n \le z) = \Pr(X_1 \le z, \ldots, X_n \le z) = \Pr(X_1 \le z) \cdots \Pr(X_n \le z) = (F(z))^n$. The associated indicator function $I_n = I(M_n > z)$ is a Bernoulli process with a success probability $p(z) = 1 - (F(z))^n$ that depends on the magnitude z of the extreme event.
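The exact distribution of the maximum, Pr(M_n ≤ z) = F(z)^n, is easy to evaluate once a cdf F is chosen. The sketch below is illustrative only: the function names and the two example distributions (uniform on [0, 1] and exponential with rate 1) are assumptions for demonstration, not part of the text above.

```python
import math


def max_cdf(F, n, z):
    """Pr(max(X_1, ..., X_n) <= z) for i.i.d. variables with cdf F."""
    return F(z) ** n


def exceedance_prob(F, n, z):
    """Pr(max(X_1, ..., X_n) > z) = 1 - F(z)^n, the success probability of the indicator I_n."""
    return 1.0 - F(z) ** n


def uniform_cdf(z):
    """cdf of the uniform distribution on [0, 1] (example choice)."""
    return min(max(z, 0.0), 1.0)


def exponential_cdf(z):
    """cdf of the exponential distribution with rate 1 (example choice)."""
    return 1.0 - math.exp(-z) if z > 0 else 0.0


# Maximum of 3 uniform draws stays below 0.5 with probability 0.5^3.
print(max_cdf(uniform_cdf, 3, 0.5))

# For a fixed level z, the chance that the maximum exceeds z grows with the sample size n.
for n in (1, 10, 100):
    print(n, exceedance_prob(exponential_cdf, n, 5.0))
```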

6.
Zipf's law
–
Zipf's law is named after the American linguist George Kingsley Zipf, who popularized it and sought to explain it, though he did not claim to have originated it. The French stenographer Jean-Baptiste Estoup appears to have noticed the regularity before Zipf. It was also noted in 1913 by German physicist Felix Auerbach. Zipf's law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. For example, in the Brown Corpus of American English text, the word “the” is the most frequently occurring word, and by itself accounts for nearly 7% of all word occurrences. True to Zipf's law, the second-place word “of” accounts for slightly over 3.5% of words, followed by “and”. Only 135 vocabulary items are needed to account for half the Brown Corpus. The appearance of the distribution in rankings of cities by population was first noticed by Felix Auerbach in 1913. When Zipf's law is checked for cities, a better fit has been found with exponent s = 1.07. While Zipf's law holds for the upper tail of the distribution, the entire distribution of cities is log-normal and follows Gibrat's law. Both laws are consistent because a log-normal tail cannot be distinguished from a Pareto tail. Zipf's law is most easily observed by plotting the data on a log-log graph, with the axes being log(rank order) and log(frequency). For example, the word “the” would appear at x = log(1), y = log(f), where f is its number of occurrences. It is also possible to plot reciprocal rank against frequency, or reciprocal frequency or interword interval against rank. The data conform to Zipf's law to the extent that the plot is linear. Formally, let N be the number of elements, k be their rank, and s be the value of the exponent characterizing the distribution. Zipf's law then predicts that the normalized frequency of the element of rank k is $f(k;s,N) = \frac{1/k^s}{\sum_{n=1}^{N} 1/n^s}$. It has been claimed that a representation of Zipf's law in terms of the number of elements with a given frequency is more suitable for statistical testing, and in this way it has been analyzed in more than 30,000 English texts. The goodness-of-fit tests yield that only about 15% of the texts are statistically compatible with this form of Zipf's law. Slight variations in the definition of Zipf's law can increase this percentage up to close to 50%.
In the example of the frequency of words in the English language, N is the number of words in the English language and, if we use the classic version of Zipf's law, the exponent s is 1. $f(k;s,N)$ will then be the fraction of the time the kth most common word occurs. The law may also be written $f(k;s,N) = \frac{1}{k^s H_{N,s}}$, where $H_{N,s}$ is the Nth generalized harmonic number. The simplest case of Zipf's law is a 1⁄f function. Given a set of Zipfian distributed frequencies, sorted from most common to least common, the second most common frequency will occur ½ as often as the first. The third most common frequency will occur ⅓ as often as the first, and the fourth most common frequency will occur ¼ as often as the first. The nth most common frequency will occur 1⁄n as often as the first. However, this cannot hold exactly: because items must occur an integer number of times, there cannot be 2.5 occurrences of a word. Nevertheless, over fairly wide ranges, and to a good approximation, many natural phenomena obey Zipf's law. Mathematically, the sum of all frequencies in a Zipf distribution is equal to the harmonic series
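The normalized form f(k; s, N) = 1 / (k^s H_{N,s}) can be evaluated directly. The following is a minimal sketch with illustrative function names; the values N = 200 and s = 1 are arbitrary example choices, not figures from the text.

```python
def generalized_harmonic(N, s):
    """H_{N,s}: sum of 1 / n^s for n = 1, ..., N."""
    return sum(1.0 / n**s for n in range(1, N + 1))


def zipf_frequency(k, s, N):
    """Predicted relative frequency of the element of rank k among N elements."""
    if not 1 <= k <= N:
        raise ValueError("rank k must satisfy 1 <= k <= N")
    return 1.0 / (k**s * generalized_harmonic(N, s))


N, s = 200, 1.0  # example values (assumptions for illustration)
freqs = [zipf_frequency(k, s, N) for k in range(1, N + 1)]

# With s = 1 the rank-2 item occurs half as often as the rank-1 item, the rank-3 item a third, ...
print(freqs[0] / freqs[1], freqs[0] / freqs[2])
print(sum(freqs))  # relative frequencies sum to 1
```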

7.
Wigner semicircle distribution
–
The Wigner semicircle distribution is supported on the interval [−R, R], with density $f(x) = \frac{2}{\pi R^2}\sqrt{R^2 - x^2}$. This distribution arises as the limiting distribution of eigenvalues of many random symmetric matrices as the size of the matrix approaches infinity. It is a scaled beta distribution: more precisely, if Y is beta distributed with parameters α = β = 3/2, then X = 2RY − R has the Wigner semicircle distribution. The Chebyshev polynomials of the second kind are orthogonal polynomials with respect to the Wigner semicircle distribution. The characteristic function is given by $\varphi(t) = 2\,\frac{J_1(Rt)}{Rt}$, where $J_1$ is the Bessel function. (See Milton Abramowitz and Irene A. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, noting that the corresponding integral involving sin is zero.) In the limit of R approaching zero, the Wigner semicircle distribution becomes a Dirac delta function. In free probability theory, the role of Wigner's semicircle distribution is analogous to that of the normal distribution in classical probability theory. The Wigner semicircle distribution is the limit of the Kesten–McKay distributions. In number-theoretic literature, the Wigner distribution is sometimes called the Sato–Tate distribution.
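The link to the beta distribution with α = β = 3/2 can be checked numerically. The sketch below assumes the standard semicircle density on [−R, R], f(x) = 2·sqrt(R² − x²)/(πR²), and the change of variables X = 2RY − R for a Beta(3/2, 3/2) variable Y; both the function names and the value R = 2 are illustrative.

```python
import math


def semicircle_pdf(x, R):
    """Wigner semicircle density on [-R, R] (standard form, assumed here)."""
    if abs(x) >= R:
        return 0.0
    return 2.0 * math.sqrt(R * R - x * x) / (math.pi * R * R)


def beta_pdf(y, a, b):
    """Beta(a, b) density on (0, 1), computed through log-gamma for stability."""
    if not 0.0 < y < 1.0:
        return 0.0
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y) - log_B)


def scaled_beta_pdf(x, R):
    """Density of X = 2RY - R with Y ~ Beta(3/2, 3/2): change of variables y = (x + R) / (2R)."""
    y = (x + R) / (2.0 * R)
    return beta_pdf(y, 1.5, 1.5) / (2.0 * R)


R = 2.0  # example radius
for x in (-1.5, 0.0, 0.3, 1.9):
    print(x, semicircle_pdf(x, R), scaled_beta_pdf(x, R))  # the two columns should agree

# Midpoint-rule check that the density integrates to (approximately) 1.
steps = 100_000
h = 2.0 * R / steps
integral = sum(semicircle_pdf(-R + (i + 0.5) * h, R) for i in range(steps)) * h
print(integral)
```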

8.
Gamma distribution
–
In probability theory and statistics, the gamma distribution is a two-parameter family of continuous probability distributions. The common exponential distribution and chi-squared distribution are special cases of the gamma distribution. There are three different parametrizations in common use: with a shape parameter k and a scale parameter θ; with a shape parameter α = k and an inverse scale parameter β = 1/θ, called a rate parameter; and with a shape parameter k and a mean parameter μ = k/β. In each of these three forms, both parameters are positive real numbers. The gamma distribution is the maximum entropy probability distribution for a random variable X for which $E[X] = k\theta = \alpha/\beta$ is fixed and greater than zero, and $E[\ln X] = \psi(k) + \ln\theta = \psi(\alpha) - \ln\beta$ is fixed, where ψ is the digamma function. The parameterization with k and θ appears to be common in econometrics and certain other applied fields. For instance, in life testing, the waiting time until death is a random variable that is frequently modeled with a gamma distribution. If k is a positive integer, then the distribution represents an Erlang distribution, i.e. the sum of k independent exponentially distributed random variables. The gamma distribution can also be parameterized in terms of a shape parameter α = k and a rate parameter β = 1/θ. Both parametrizations are common because either can be more convenient depending on the situation. In the α, β form, the cumulative distribution function is the regularized gamma function, $F(x;\alpha,\beta) = \int_0^x f(u;\alpha,\beta)\,du = \frac{\gamma(\alpha,\beta x)}{\Gamma(\alpha)}$, where $\gamma(\alpha,\beta x)$ is the lower incomplete gamma function. If α is a positive integer, the cumulative distribution function has the following series expansion: $F(x;\alpha,\beta) = 1 - \sum_{i=0}^{\alpha-1}\frac{(\beta x)^i}{i!}e^{-\beta x} = e^{-\beta x}\sum_{i=\alpha}^{\infty}\frac{(\beta x)^i}{i!}$. In the k, θ form, the probability density function is $f(x;k,\theta) = \frac{x^{k-1}e^{-x/\theta}}{\theta^k\,\Gamma(k)}$ for x > 0; here Γ(k) is the gamma function evaluated at k. The cumulative distribution function is again the regularized gamma function, $F(x;k,\theta) = \int_0^x f(u;k,\theta)\,du = \frac{\gamma(k, x/\theta)}{\Gamma(k)}$, where $\gamma(k, x/\theta)$ is the lower incomplete gamma function. It can also be expressed as follows, if k is a positive integer: $F(x;k,\theta) = 1 - \sum_{i=0}^{k-1}\frac{1}{i!}\left(\frac{x}{\theta}\right)^i e^{-x/\theta} = e^{-x/\theta}\sum_{i=k}^{\infty}\frac{1}{i!}\left(\frac{x}{\theta}\right)^i$. The skewness is equal to $2/\sqrt{k}$; it depends only on the shape parameter.
Unlike the mode and the mean, which have readily calculable formulas based on the parameters, the median does not have a simple closed form. The median for this distribution is defined as the value ν such that $\frac{1}{\Gamma(k)\,\theta^k}\int_0^{\nu} x^{k-1}e^{-x/\theta}\,dx = \frac{1}{2}$. A formula for approximating the median for any gamma distribution, when the mean is known, has been derived based on the fact that the ratio μ/(μ − ν) is approximately a linear function of k when k ≥ 1. The approximation formula is $\nu \approx \mu\,\frac{3k - 0.8}{3k + 0.2}$, where μ = kθ is the mean. Later, it was shown that the median, viewed as a function λ(m) of the shape parameter m with unit scale, is a convex function of m.
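The median approximation ν ≈ μ(3k − 0.8)/(3k + 0.2) can be compared with a numerically computed median. The sketch below restricts itself to positive integer shape k, so that the finite-sum (Erlang) form of the cumulative distribution function applies, and finds the median by bisection; this restriction and the function names are assumptions made to keep the example within the standard library.

```python
import math


def erlang_cdf(x, k, theta):
    """cdf of the gamma distribution with positive integer shape k and scale theta."""
    if x <= 0:
        return 0.0
    z = x / theta
    term, total = 1.0, 1.0          # i = 0 term of sum_{i<k} z^i / i!
    for i in range(1, k):
        term *= z / i
        total += term
    return 1.0 - math.exp(-z) * total


def erlang_median(k, theta):
    """Median found by bisection on the cdf."""
    lo, hi = 0.0, k * theta
    while erlang_cdf(hi, k, theta) < 0.5:   # widen the bracket if needed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, k, theta) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def median_approx(k, theta):
    """Approximation nu ~ mu * (3k - 0.8) / (3k + 0.2) with mean mu = k * theta."""
    return k * theta * (3.0 * k - 0.8) / (3.0 * k + 0.2)


for k in (1, 2, 3, 5, 10):
    exact = erlang_median(k, 1.0)
    approx = median_approx(k, 1.0)
    print(k, exact, approx, abs(approx - exact) / exact)
```

For k = 1 the distribution is exponential, so the bisection result can be compared with the known median θ ln 2.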

9.
Benford's law
–
Benford's law, also called the first-digit law, is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. The law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small. For example, in sets which obey the law, the number 1 appears as the most significant digit about 30% of the time. By contrast, if the digits were distributed uniformly, they would each occur about 11.1% of the time. Benford's law also makes predictions about the distribution of second digits, third digits, and digit combinations. It tends to be most accurate when values are distributed across multiple orders of magnitude. The law is usually stated for base 10. There is a generalization of the law to numbers expressed in other bases, and also a generalization from the leading 1 digit to the leading n digits. It is named after physicist Frank Benford, who stated it in 1938. Benford's law is a special case of Zipf's law. A set of numbers is said to satisfy Benford's law if the leading digit d (d ∈ {1, ..., 9}) occurs with probability $P(d) = \log_{10}(d+1) - \log_{10}(d) = \log_{10}\left(\frac{d+1}{d}\right) = \log_{10}\left(1 + \frac{1}{d}\right)$. Therefore, this is the distribution expected if the mantissae of the logarithms of the numbers are uniformly and randomly distributed. For example, a number x, constrained to lie between 1 and 10, starts with the digit 1 if 1 ≤ x < 2. Therefore, x starts with the digit 1 if log 1 ≤ log x < log 2. The probabilities are proportional to the interval widths, and this gives the equation above. An extension of Benford's law predicts the distribution of first digits in other bases besides decimal. The general form, for base b, is $P(d) = \log_b(d+1) - \log_b(d) = \log_b\left(1 + \frac{1}{d}\right)$. For b = 2, Benford's law is true but trivial: every nonzero binary number starts with the digit 1. The discovery of Benford's law goes back to 1881, when the American astronomer Simon Newcomb noticed that in logarithm tables the earlier pages were much more worn than the other pages.
Newcomb's published result is the first known instance of this observation and includes a distribution on the second digit. Newcomb proposed a law that the probability of a single number N being the first digit of a number was equal to log(N + 1) − log(N). The phenomenon was noted again in 1938 by the physicist Frank Benford. The total number of observations used in the paper was 20,229. This discovery was later named after Benford. In 1995, Ted Hill proved a result about mixed distributions. Arno Berger and Ted Hill have stated that “The widely known phenomenon called Benford’s law continues to defy attempts at an easy derivation.”
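The first-digit probabilities P(d) = log_b(1 + 1/d) are straightforward to tabulate. The following minimal sketch uses only the standard library; the function name `benford_prob` is illustrative.

```python
import math


def benford_prob(d, base=10):
    """Probability that the leading digit is d in the given base: log_base(1 + 1/d)."""
    if not 1 <= d < base:
        raise ValueError("leading digit must satisfy 1 <= d < base")
    return math.log(1.0 + 1.0 / d, base)


# Base-10 table: digit 1 leads about 30% of the time, far above the uniform 1/9 (about 11.1%).
for d in range(1, 10):
    print(d, round(benford_prob(d), 4))

total = sum(benford_prob(d) for d in range(1, 10))
print(total)                      # the nine probabilities sum to 1
print(benford_prob(1, base=2))    # base 2: the leading digit is always 1
```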

10.
Beta-binomial distribution
–
The beta-binomial distribution is the binomial distribution in which the probability of success at each trial is not fixed but random and follows the beta distribution. It is frequently used in Bayesian statistics and empirical Bayes methods. It reduces to the Bernoulli distribution as a special case when n = 1. For α = β = 1, it is the discrete uniform distribution from 0 to n. It also approximates the binomial distribution arbitrarily well for large α and β. The beta distribution is a conjugate distribution of the binomial distribution. This fact leads to an analytically tractable compound distribution where one can think of the p parameter in the binomial distribution as being randomly drawn from a beta distribution. Namely, if X ∼ Bin(n, p) then $P(X = k \mid p, n) = L(p \mid k) = \binom{n}{k}p^k(1-p)^{n-k}$, where Bin(n, p) stands for the binomial distribution. Averaging over a beta-distributed p gives the compound distribution $f(k \mid n,\alpha,\beta) = \binom{n}{k}\frac{B(k+\alpha,\,n-k+\beta)}{B(\alpha,\beta)}$. Using the properties of the beta function, this can alternatively be written $f(k \mid n,\alpha,\beta) = \frac{\Gamma(n+1)}{\Gamma(k+1)\,\Gamma(n-k+1)}\,\frac{\Gamma(k+\alpha)\,\Gamma(n-k+\beta)}{\Gamma(n+\alpha+\beta)}\,\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}$. The beta-binomial distribution can also be motivated via an urn model for positive integer values of α and β. Specifically, imagine an urn containing α red balls and β black balls, from which random draws are made. If a red ball is observed, then two red balls are returned to the urn. Likewise, if a black ball is drawn, then two black balls are returned to the urn. If this is repeated n times, then the probability of observing k red balls follows a beta-binomial distribution with parameters n, α and β. The first three raw moments are $\mu_1 = \frac{n\alpha}{\alpha+\beta}$, $\mu_2 = \frac{n\alpha[n(1+\alpha)+\beta]}{(\alpha+\beta)(1+\alpha+\beta)}$ and $\mu_3 = \frac{n\alpha[n^2(1+\alpha)(2+\alpha)+3n(1+\alpha)\beta+\beta(\beta-\alpha)]}{(\alpha+\beta)(1+\alpha+\beta)(2+\alpha+\beta)}$. Writing π = α/(α + β) and ρ = 1/(α + β + 1), the variance is $n\pi(1-\pi)[1+(n-1)\rho]$. The parameter ρ is known as the intra-class or intra-cluster correlation. It is this positive correlation which gives rise to overdispersion. Note that method-of-moments estimates of α and β can be nonsensically negative, which is evidence that the data is either undispersed or underdispersed relative to the binomial distribution. In this case, the binomial distribution and the hypergeometric distribution are alternative candidates respectively.
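The probability mass function f(k | n, α, β) = C(n, k) B(k + α, n − k + β) / B(α, β) can be evaluated with log-gamma functions from the standard library. The sketch below is illustrative: the function names and the example parameters (n = 12, α = 2.5, β = 4) are assumptions chosen only to exercise the formula.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(X = k) for the beta-binomial distribution with parameters n, alpha, beta."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * exp(log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


n, alpha, beta = 12, 2.5, 4.0   # example parameters (assumptions)
pmf = [beta_binomial_pmf(k, n, alpha, beta) for k in range(n + 1)]
mean = sum(k * p for k, p in enumerate(pmf))

print(sum(pmf))                            # probabilities sum to 1
print(mean, n * alpha / (alpha + beta))    # matches the first raw moment n*alpha/(alpha+beta)

# alpha = beta = 1 gives the discrete uniform distribution on 0..n.
print([round(beta_binomial_pmf(k, n, 1.0, 1.0), 6) for k in range(n + 1)])
```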
While closed-form maximum likelihood estimates are impractical, the pdf consists of common functions, so maximum likelihood estimates from empirical data can be computed using general methods for fitting multinomial Pólya distributions. The R package VGAM, through the function vglm, fits such models via maximum likelihood. Note also that there is no requirement that n is fixed throughout the observations. A classic data set gives the number of male children among the first 12 children of family size 13 in 6115 families taken from hospital records in 19th century Saxony. The 13th child is ignored to assuage the effect of families non-randomly stopping when a desired gender is reached.