1.
Probability theory
–
Probability theory is the branch of mathematics concerned with probability, the analysis of random phenomena. Although it is not possible to predict precisely the results of random events, patterns emerge when many such events are considered together; two representative mathematical results describing such patterns are the law of large numbers and the central limit theorem. As the mathematical foundation for statistics, probability theory is essential to many human activities that involve quantitative analysis of large sets of data. Methods of probability theory also apply to descriptions of complex systems given only partial knowledge of their state; a great discovery of twentieth-century physics was the probabilistic nature of physical phenomena at atomic scales, described in quantum mechanics.

Christiaan Huygens published a book on the subject in 1657. Initially, probability theory mainly considered discrete events, and its methods were mainly combinatorial. Eventually, analytical considerations compelled the incorporation of continuous variables into the theory. This culminated in modern probability theory, on foundations laid by Andrey Nikolaevich Kolmogorov, who combined the notion of sample space, introduced by Richard von Mises, with measure theory. This became the mostly undisputed axiomatic basis for modern probability theory.

Most introductions to probability theory treat discrete probability distributions and continuous probability distributions separately; the more mathematically advanced, measure theory-based treatment covers the discrete, the continuous, and any mixture of the two. Consider an experiment that can produce a number of outcomes. The set of all outcomes is called the sample space of the experiment. The power set of the sample space is formed by considering all the different collections of possible results. For example, rolling an honest die produces one of six possible results; one collection of possible results corresponds to getting an odd number. Thus, the subset {1, 3, 5} is an element of the power set of the sample space of die rolls.
In this case, {1, 3, 5} is the event that the die falls on some odd number. If the results that actually occur fall in a given event, that event is said to have occurred. A probability is a way of assigning every event a value between zero and one, with the requirement that the event made up of all possible results, in our example {1, 2, 3, 4, 5, 6}, be assigned a value of one. For instance, the probability that any number except five is rolled, that is, the event {1, 2, 3, 4, 6}, is 5/6. The mutually exclusive event {5} has a probability of 1/6, and the event {1, 2, 3, 4, 5, 6} has a probability of 1, that is, absolute certainty. Discrete probability theory deals with events that occur in countable sample spaces.

Modern definition: the modern definition starts with a finite or countable set called the sample space, which relates to the set of all possible outcomes in the classical sense, denoted by Ω.
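The die example above can be made concrete in a few lines of Python. This is a sketch of our own (the helper name prob is not from the source): events are subsets of the sample space, and under the fair-die measure the probability of an event is the fraction of outcomes it contains.

```python
from fractions import Fraction

# Sample space of a fair die roll and some events (subsets of the space).
sample_space = {1, 2, 3, 4, 5, 6}
odd = {1, 3, 5}                    # the event "an odd number is rolled"
not_five = sample_space - {5}      # any number except five

def prob(event, space):
    """Probability of an event under the uniform (fair-die) measure."""
    return Fraction(len(event & space), len(space))

assert prob(odd, sample_space) == Fraction(1, 2)
assert prob(not_five, sample_space) == Fraction(5, 6)
assert prob({5}, sample_space) == Fraction(1, 6)   # the mutually exclusive event
assert prob(sample_space, sample_space) == 1       # the certain event
```

Using exact fractions rather than floats keeps the probabilities free of rounding error, which makes the axioms (complements summing to one, the certain event having probability one) visible exactly.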

2.
Extreme value theory
–
Extreme value theory or extreme value analysis (EVA) is a branch of statistics dealing with the extreme deviations from the median of probability distributions. It seeks to assess, from a given ordered sample of a given random variable, the probability of events that are more extreme than any previously observed. Extreme value analysis is used in many disciplines, such as structural engineering, finance, earth sciences, and traffic prediction. For example, EVA might be used in the field of hydrology to estimate the probability of an unusually large flooding event; similarly, for the design of a breakwater, a coastal engineer would seek to estimate the 50-year wave and design the structure accordingly.

Two approaches exist for practical extreme value analysis. The first method relies on deriving block maxima series as a preliminary step; in many situations it is customary and convenient to extract the annual maxima (AMS). The second method relies on extracting, from a continuous record, the peak values reached for any period during which values exceed a certain threshold. This method is referred to as the Peak Over Threshold (POT) method.

For AMS data, the analysis may partly rely on the results of the Fisher–Tippett–Gnedenko theorem; however, in practice, various procedures are applied to select between a wider range of distributions. The theorem here relates to the limiting distributions for the minimum or the maximum of a very large collection of independent random variables from the same distribution. For POT data, the analysis may involve fitting two distributions: one for the number of events in a period considered and a second for the size of the exceedances. A common assumption for the first is the Poisson distribution, with the generalized Pareto distribution being used for the exceedances; alternatively, a tail-fitting can be based on the Pickands–Balkema–de Haan theorem. Novak reserves the term “POT method” for the case where the threshold is non-random. Applications include pipeline failures due to pitting corrosion.
Other applications include anomalous IT network traffic, analysed to prevent attackers from reaching important data.

The field of extreme value theory was pioneered by Leonard Tippett. Tippett was employed by the British Cotton Industry Research Association, where he worked to make cotton thread stronger. In his studies, he realized that the strength of a thread was controlled by the strength of its weakest fibres. With the help of R. A. Fisher, Tippett obtained three asymptotic limits describing the distributions of extremes. Emil Julius Gumbel codified this theory in his 1958 book Statistics of Extremes, including the Gumbel distributions that bear his name. A summary of important publications relating to extreme value theory can be found in the article List of publications in statistics.

Let X1, …, Xn be a sequence of independent and identically distributed random variables with cumulative distribution function F. In theory, the exact distribution of the maximum can be derived:

Pr(max(X1, …, Xn) ≤ z) = Pr(X1 ≤ z) Pr(X2 ≤ z) ⋯ Pr(Xn ≤ z) = (F(z))^n.

The associated indicator function In = I(Xn > z) is a Bernoulli process with a success probability p(z) = 1 − F(z) that depends on the magnitude z of the extreme event.
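The identity Pr(max ≤ z) = F(z)^n can be checked by simulation. The sketch below (our own, not from the source) uses the standard exponential distribution, whose cdf F(z) = 1 − e^(−z) is available in closed form, and compares the empirical frequency with the theoretical value:

```python
import math
import random

# Empirical check of Pr(max(X1, ..., Xn) <= z) = F(z)**n for i.i.d. samples,
# using the standard exponential distribution, F(z) = 1 - exp(-z).
n, z, trials = 5, 2.0, 100_000
random.seed(0)
hits = sum(
    max(random.expovariate(1.0) for _ in range(n)) <= z
    for _ in range(trials)
)
empirical = hits / trials
theoretical = (1 - math.exp(-z)) ** n
assert abs(empirical - theoretical) < 0.01
```

The same check works for any distribution with a known cdf; only the line computing `theoretical` changes.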

3.
Gumbel distribution
–
In probability theory and statistics, the Gumbel distribution is used to model the distribution of the maximum of a number of samples of various distributions. This distribution might be used to represent the distribution of the maximum level of a river in a particular year if there was a list of maximum values for the past ten years. It is useful in predicting the chance that an extreme earthquake, flood or other natural disaster will occur. The rest of this article refers to the Gumbel distribution used to model the distribution of the maximum value. To model the minimum value, use the negative of the original values.

The Gumbel distribution is a particular case of the generalized extreme value distribution. It is also known as the log-Weibull distribution and the double exponential distribution. It is related to the Gompertz distribution: when its density is first reflected about the origin and then restricted to the positive half line, a Gompertz density is obtained. In the latent variable formulation of the logit model, common in discrete choice theory, the errors of the latent variables follow a Gumbel distribution. This is useful because the difference of two Gumbel-distributed random variables has a logistic distribution. The Gumbel distribution is named after Emil Julius Gumbel, based on his original papers describing the distribution.

The cumulative distribution function of the Gumbel distribution is F(x; μ, β) = e^(−e^(−(x − μ)/β)). The mode is μ, the median is μ − β ln(ln 2), the mean is given by E[X] = μ + γβ, where γ is the Euler–Mascheroni constant, and the standard deviation is βπ/√6. The standard Gumbel distribution is the case where μ = 0 and β = 1, with cumulative distribution function F(x) = e^(−e^(−x)) and probability density function f(x) = e^(−(x + e^(−x))). In this case the mode is 0, the median is −ln(ln 2) ≈ 0.3665, and the mean is γ. The cumulants, for n > 1, are given by κn = (n − 1)! ζ(n).

If X has a Gumbel distribution, then the conditional distribution of Y = −X given that Y is positive has a Gompertz distribution. The cdf G of Y is related to F, the cdf of X, by the formula G(y) = P(Y ≤ y) = 1 − F(−y)/F(0) for y > 0; consequently the densities are related by g(y) = f(−y)/F(0): the Gompertz density is proportional to a reflected Gumbel density, restricted to the positive half-line.
If X is an exponentially distributed variable with mean 1, then −log(X) has a standard Gumbel distribution. Theory related to the generalized multivariate log-gamma distribution provides a multivariate version of the Gumbel distribution. In pre-software times, probability paper was used to picture the Gumbel distribution. The paper is based on linearization of the cumulative distribution function F: −ln(−ln F) = (x − μ)/β. In the paper the horizontal axis is constructed at a double log scale. By plotting F on the horizontal axis of the paper and the x-variable on the vertical axis, the distribution is represented by a straight line.
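The formulas above are easy to verify numerically. This sketch (our own; the function names gumbel_cdf and gumbel_pdf are not from the source) checks the closed-form median and the relation between the standard exponential and the standard Gumbel:

```python
import math
import random

def gumbel_cdf(x, mu=0.0, beta=1.0):
    """Gumbel CDF: F(x) = exp(-exp(-(x - mu)/beta))."""
    return math.exp(-math.exp(-(x - mu) / beta))

def gumbel_pdf(x, mu=0.0, beta=1.0):
    """Gumbel density, the derivative of the CDF above."""
    z = (x - mu) / beta
    return math.exp(-(z + math.exp(-z))) / beta

# Standard Gumbel (mu = 0, beta = 1): closed-form median and mean.
median = -math.log(math.log(2))          # ~0.3665
gamma = 0.5772156649015329               # Euler-Mascheroni constant (the mean)
assert abs(gumbel_cdf(median) - 0.5) < 1e-12
assert abs(median - 0.3665) < 1e-4

# -log of a standard exponential sample is standard Gumbel distributed,
# so the sample mean of -log(X) should approach gamma.
random.seed(1)
sample = [-math.log(random.expovariate(1.0)) for _ in range(200_000)]
assert abs(sum(sample) / len(sample) - gamma) < 0.02
```

The last check is the −log(X) relation stated in the text, confirmed by comparing the empirical mean against γ.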

4.
Probability distribution
–
For instance, if the random variable X is used to denote the outcome of a coin toss, then the probability distribution of X would take the value 0.5 for X = heads and 0.5 for X = tails. In more technical terms, the probability distribution is a description of a random phenomenon in terms of the probabilities of events. Examples of random phenomena can include the results of an experiment or survey. A probability distribution is defined in terms of an underlying sample space, which is the set of all possible outcomes of the random phenomenon being observed. The sample space may be the set of real numbers or a higher-dimensional vector space, or it may be a list of non-numerical values; for example, the sample space of a coin flip would be {heads, tails}.

Probability distributions are divided into two classes. A discrete probability distribution can be encoded by a discrete list of the probabilities of the outcomes, known as a probability mass function. On the other hand, a continuous probability distribution is typically described by probability density functions. The normal distribution represents a commonly encountered continuous probability distribution. More complex experiments, such as those involving stochastic processes defined in continuous time, may demand the use of more general probability measures.

A probability distribution whose sample space is the set of real numbers is called univariate. Important and commonly encountered univariate probability distributions include the binomial distribution and the hypergeometric distribution. The multivariate normal distribution is a commonly encountered multivariate distribution. To define probability distributions for the simplest cases, one needs to distinguish between discrete and continuous random variables. For a continuous random variable, the probability of any single exact value is zero; for example, the probability that an object weighs exactly 500 g is zero. Continuous probability distributions can therefore be described in other ways: the cumulative distribution function is the antiderivative of the probability density function, provided that the latter function exists.
As probability theory is used in diverse applications, terminology is not uniform. The following terms are used for probability distribution functions. Probability distribution: a table that displays the probabilities of the outcomes in a sample; it could also be called a frequency distribution table, where all occurrences of outcomes sum to 1. Distribution function: a functional form of a frequency distribution table. Probability distribution function: a functional form of a probability distribution table.
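The discrete/continuous split described above can be illustrated in a short sketch (our own, in plain Python): a coin toss is encoded as a probability mass function over two outcomes, while for a continuous variable probabilities come from integrating a density over an interval, since any single point has probability zero.

```python
import math
from fractions import Fraction

# Discrete distribution: a fair coin toss as a probability mass function.
coin_pmf = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
assert sum(coin_pmf.values()) == 1

# Continuous distribution: probabilities come from integrating the density.
# Here we integrate the standard normal pdf over [-1, 1] with the midpoint rule.
def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

steps = 10_000
width = 2.0 / steps
area = sum(normal_pdf(-1 + (i + 0.5) * width) * width for i in range(steps))
assert abs(area - 0.6827) < 1e-3   # ~68% of the mass lies within one std. dev.
```

The numerical integral recovers the familiar fact that about 68% of a normal distribution's mass lies within one standard deviation of the mean, while the probability of any exact value, such as weighing exactly 500 g, is zero.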

5.
Benford's law
–
Benford's law, also called the first-digit law, is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. The law states that in naturally occurring collections of numbers, the leading digit is likely to be small. For example, in sets which obey the law, the number 1 appears as the most significant digit about 30% of the time; by contrast, if the digits were distributed uniformly, they would each occur about 11.1% of the time. Benford's law also makes predictions about the distribution of second digits, third digits, digit combinations, and so on. It tends to be most accurate when values are distributed across multiple orders of magnitude. The graph here shows Benford's law for base 10. There is a generalization of the law to numbers expressed in other bases, and also a generalization from leading 1 digit to leading n digits. It is named after physicist Frank Benford, who stated it in 1938. Benford's law is a special case of Zipf's law.

A set of numbers is said to satisfy Benford's law if the leading digit d (d ∈ {1, …, 9}) occurs with probability P(d) = log10(d + 1) − log10(d) = log10((d + 1)/d) = log10(1 + 1/d). Therefore, this is the distribution expected if the mantissae of the logarithms of the numbers are uniformly and randomly distributed. For example, a number x, constrained to lie between 1 and 10, starts with the digit 1 if 1 ≤ x < 2. Therefore, x starts with the digit 1 if log 1 ≤ log x < log 2; if log x is uniformly distributed, the probabilities are proportional to the interval widths, and this gives the equation above.

An extension of Benford's law predicts the distribution of first digits in other bases besides decimal; in fact, the general form is P(d) = log_b(d + 1) − log_b(d) = log_b(1 + 1/d). For b = 2, Benford's law is true but trivial: all binary numbers except zero start with the digit 1. The discovery of Benford's law goes back to 1881, when the American astronomer Simon Newcomb noticed that in logarithm tables the earlier pages were much more worn than the other pages.
Newcomb's published result is the first known instance of this observation and also includes a distribution on the second digit. Newcomb proposed a law that the probability of a single number N being the first digit of a number was equal to log(N + 1) − log(N). The phenomenon was again noted in 1938 by the physicist Frank Benford, who tested it on a wide range of data sets; the total number of observations used in the paper was 20,229. This discovery was later named after Benford. In 1995, Ted Hill proved the result about mixed distributions mentioned below. Arno Berger and Ted Hill have stated that "The widely known phenomenon called Benford's law continues to defy attempts at an easy derivation."
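The first-digit probabilities P(d) = log10(1 + 1/d) are simple to compute, and the powers of 2 are a classic sequence that conforms to them. A short sketch (our own, plain Python):

```python
import math
from collections import Counter

# First-digit probabilities under Benford's law: P(d) = log10(1 + 1/d).
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
assert abs(sum(benford.values()) - 1.0) < 1e-12   # a valid distribution
assert abs(benford[1] - 0.301) < 1e-3             # digit 1: ~30.1% of the time
assert abs(benford[9] - 0.0458) < 1e-3            # digit 9: ~4.6% of the time

# Leading digits of the first 1000 powers of 2 track Benford's law closely,
# since log10(2) is irrational and the mantissae equidistribute.
leading = Counter(int(str(2 ** k)[0]) for k in range(1, 1001))
for d in range(1, 10):
    assert abs(leading[d] / 1000 - benford[d]) < 0.02
```

The powers-of-2 check is exactly the "uniform mantissae" argument from the text: because log10(2) is irrational, the fractional parts of k·log10(2) spread uniformly over [0, 1).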

6.
Bernoulli distribution
–
The Bernoulli distribution is the probability distribution of a random variable which takes the value 1 with probability p and the value 0 with probability q = 1 − p. It can be used to represent a coin toss where 1 and 0 would represent heads and tails, respectively. In particular, unfair coins would have p ≠ 0.5. The Bernoulli distribution is a special case of the binomial distribution where a single experiment/trial is conducted. It is also a special case of the two-point distribution, for which the outcome need not be a bit. If X is a random variable with this distribution, we have Pr(X = 1) = p = 1 − Pr(X = 0). The probability mass function f of this distribution, over possible outcomes k, is f(k; p) = p if k = 1 and 1 − p if k = 0. This can also be expressed as f(k; p) = p^k (1 − p)^(1 − k) for k ∈ {0, 1}.

The Bernoulli distribution is a special case of the binomial distribution with n = 1; in the binomial notation, the Bernoulli distribution is simply B(1, p). The Bernoulli distributions for 0 ≤ p ≤ 1 form an exponential family. The maximum likelihood estimator of p based on a random sample is the sample mean. When we take the standardized Bernoulli distributed random variable (X − E[X])/√Var[X], we find that this random variable attains √(q/p) with probability p and −√(p/q) with probability q. The categorical distribution is the generalization of the Bernoulli distribution for variables with any constant number of discrete values. The Beta distribution is the conjugate prior of the Bernoulli distribution. The geometric distribution models the number of independent and identical Bernoulli trials needed to get one success. If Y ~ Bernoulli(1/2), then 2Y − 1 has a Rademacher distribution.

See also: Bernoulli process, Bernoulli sampling, Bernoulli trial, binary entropy function, binomial distribution.
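The pmf f(k; p) = p^k (1 − p)^(1 − k) and the fact that the sample mean is the maximum likelihood estimate of p can be illustrated directly. A sketch of our own (the names bernoulli_pmf and p_hat are not from the source):

```python
import random

def bernoulli_pmf(k, p):
    """f(k; p) = p**k * (1 - p)**(1 - k) for k in {0, 1}."""
    return p ** k * (1 - p) ** (1 - k)

p = 0.3  # an unfair coin (p != 0.5)
assert bernoulli_pmf(1, p) == 0.3
assert bernoulli_pmf(0, p) == 0.7

# The sample mean is the maximum likelihood estimate of p.
random.seed(0)
sample = [1 if random.random() < p else 0 for _ in range(100_000)]
p_hat = sum(sample) / len(sample)
assert abs(p_hat - p) < 0.01
```

The closed-form pmf and the estimator agree with the text: the single expression p^k (1 − p)^(1 − k) collapses to p at k = 1 and to 1 − p at k = 0.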

7.
Beta-binomial distribution
–
The beta-binomial distribution is the binomial distribution in which the probability of success at each trial is not fixed but random and follows the beta distribution. It is frequently used in Bayesian statistics and empirical Bayes methods, and it reduces to the Bernoulli distribution as a special case when n = 1. For α = β = 1, it is the discrete uniform distribution from 0 to n. It also approximates the binomial distribution arbitrarily well for large α and β.

The beta distribution is a conjugate distribution of the binomial distribution. This fact leads to an analytically tractable compound distribution where one can think of the p parameter in the binomial distribution as being randomly drawn from a beta distribution. Namely, if X ∼ Bin(n, p), then the likelihood of p given an observed count k is L(p | k) ∝ p^k (1 − p)^(n − k); compounding with p ∼ Beta(α, β) and integrating p out gives the beta-binomial probability mass function f(k | n, α, β) = C(n, k) B(k + α, n − k + β) / B(α, β), where B is the beta function and C(n, k) the binomial coefficient. Using the properties of the beta function, this can alternatively be written f(k | n, α, β) = [Γ(n + 1) / (Γ(k + 1) Γ(n − k + 1))] · [Γ(k + α) Γ(n − k + β) / Γ(n + α + β)] · [Γ(α + β) / (Γ(α) Γ(β))].

The beta-binomial distribution can also be motivated via an urn model for positive integer values of α and β. Specifically, imagine an urn containing α red balls and β black balls, from which balls are drawn at random. If a red ball is observed, then two red balls are returned to the urn. Likewise, if a black ball is drawn, then two black balls are returned to the urn. If this is repeated n times, then the probability of observing k red balls follows a beta-binomial distribution with parameters n, α and β.

The first three raw moments are μ1 = nα/(α + β), μ2 = nα[n(1 + α) + β]/[(α + β)(1 + α + β)], and μ3 = nα[n²(1 + α)(2 + α) + 3n(1 + α)β + β(β − α)]/[(α + β)(1 + α + β)(2 + α + β)]. The parameter ρ is known as the intra-class or intra-cluster correlation; it is this positive correlation which gives rise to overdispersion. Note that these estimates can be nonsensically negative, which is evidence that the data is either undispersed or underdispersed relative to the binomial distribution. In this case, the binomial distribution and the hypergeometric distribution are alternative candidates, respectively.
While closed-form maximum likelihood estimates are impractical, given that the pdf consists of common functions, maximum likelihood estimates from empirical data can be computed using general methods for fitting multinomial Pólya distributions, methods for which have been described in the literature. The R package VGAM, through the function vglm, allows fitting via maximum likelihood; note also that there is no requirement that n be fixed throughout the observations. The following data gives the number of male children among the first 12 children of families of size 13 in 6115 families taken from hospital records in 19th century Saxony. The 13th child is ignored to assuage the effect of families non-randomly stopping when a desired gender is reached.
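The gamma-function form of the pmf given above can be evaluated stably with log-gamma functions. A sketch of our own (the function name betabinom_pmf is not from the source), checking two properties stated in the text: the α = β = 1 case is uniform on {0, …, n}, and the mean equals the first raw moment nα/(α + β).

```python
import math

def betabinom_pmf(k, n, a, b):
    """Beta-binomial pmf, computed via log-gamma functions for stability."""
    log_p = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
        + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    )
    return math.exp(log_p)

# For alpha = beta = 1 the distribution is uniform on {0, ..., n}.
n = 10
for k in range(n + 1):
    assert abs(betabinom_pmf(k, n, 1, 1) - 1 / (n + 1)) < 1e-12

# The mean matches the first raw moment n*alpha/(alpha + beta).
a, b = 2.0, 3.0
mean = sum(k * betabinom_pmf(k, 12, a, b) for k in range(13))
assert abs(mean - 12 * a / (a + b)) < 1e-9
```

Working in log space avoids overflow for large n, α, β, which is why fitting routines typically parameterize the likelihood this way.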

8.
Binomial distribution
–
The binomial distribution is the basis for the popular binomial test of statistical significance. The binomial distribution is used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution. However, for N much larger than n, the binomial distribution remains a good approximation.

In general, if the random variable X follows the binomial distribution with parameters n ∈ ℕ and p ∈ [0, 1], we write X ~ B(n, p). The probability of getting exactly k successes in n trials is given by the probability mass function f(k; n, p) = Pr(X = k) = C(n, k) p^k (1 − p)^(n − k) for k = 0, 1, 2, …, n, where C(n, k) = n!/(k!(n − k)!) is the binomial coefficient, hence the name of the distribution. The formula can be understood as follows: k successes occur with probability p^k and n − k failures occur with probability (1 − p)^(n − k), and the k successes can occur anywhere among the n trials, in C(n, k) different ways.

In creating reference tables for binomial distribution probability, usually the table is filled in up to n/2 values. This is because for k > n/2, the probability can be calculated by its complement as f(k; n, p) = f(n − k; n, 1 − p). The probability mass function satisfies the recurrence relation p(n − k) f(k; n, p) = (k + 1)(1 − p) f(k + 1; n, p) for every n and p.

Looking at the expression f(k; n, p) as a function of k, there is a k value that maximizes it. This k value can be found by calculating the ratio f(k + 1; n, p)/f(k; n, p) = (n − k)p/((k + 1)(1 − p)) and comparing it to 1. There is always an integer M that satisfies (n + 1)p − 1 ≤ M < (n + 1)p. f(k; n, p) is monotone increasing for k < M and monotone decreasing for k > M, except when (n + 1)p is an integer; in that case, there are two values for which f is maximal, (n + 1)p and (n + 1)p − 1. M is the most probable outcome of the n Bernoulli trials and is called the mode; note that the probability of it occurring can be fairly small. The cumulative distribution function can also be represented in terms of the regularized incomplete beta function, as F(k; n, p) = Pr(X ≤ k) = I_{1−p}(n − k, k + 1). Some closed-form bounds for the cumulative distribution function are also known.

Suppose a biased coin comes up heads with probability 0.3 when tossed. What is the probability of achieving 0, 1, or 2 heads after six tosses?
The mean of the distribution is E[X] = np. For example, if n = 100 and p = 1/4, the expected number of successes is 25. This can be shown directly from the probability mass function: E[X] = ∑_{k=0}^{n} k C(n, k) p^k (1 − p)^(n − k) = np ∑_{k=1}^{n} C(n − 1, k − 1) p^(k−1) (1 − p)^(n − k) = np. The variance is Var(X) = np(1 − p): writing X = X1 + ⋯ + Xn as a sum of n independent Bernoulli(p) variables and using Var(Xi) = p(1 − p), we get Var(X) = Var(X1) + ⋯ + Var(Xn) = n Var(X1) = np(1 − p).
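The biased-coin question and the mean/variance formulas can be worked out numerically. A sketch of our own (the function name binom_pmf is not from the source):

```python
from math import comb

def binom_pmf(k, n, p):
    """Pr(X = k) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Biased coin, p = 0.3, six tosses: probability of 0, 1, or 2 heads.
n, p = 6, 0.3
prob_at_most_2 = sum(binom_pmf(k, n, p) for k in range(3))
assert abs(prob_at_most_2 - 0.74431) < 1e-4

# Mean and variance match np and np(1 - p) exactly.
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
var = sum((k - mean) ** 2 * binom_pmf(k, n, p) for k in range(n + 1))
assert abs(mean - n * p) < 1e-12
assert abs(var - n * p * (1 - p)) < 1e-12
```

So the probability of at most two heads in six tosses of this coin is about 0.744, and summing k·Pr(X = k) over all outcomes reproduces np = 1.8, as derived above.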