1.
Probability theory
–
Probability theory is the branch of mathematics concerned with probability, the analysis of random phenomena. Although it is not possible to predict precisely the result of a random event, patterns emerge when many such events are studied; two representative mathematical results describing such patterns are the law of large numbers and the central limit theorem. As the mathematical foundation for statistics, probability theory is essential to many human activities that involve quantitative analysis of large sets of data. Methods of probability theory also apply to descriptions of complex systems given only partial knowledge of their state; a great discovery of twentieth-century physics was the probabilistic nature of physical phenomena at atomic scales, described in quantum mechanics. Christiaan Huygens published a book on the subject in 1657. Initially, probability theory mainly considered discrete events, and its methods were mainly combinatorial. Eventually, analytical considerations compelled the incorporation of continuous variables into the theory; this culminated in modern probability theory, on foundations laid by Andrey Nikolaevich Kolmogorov. Kolmogorov combined the notion of sample space, introduced by Richard von Mises, with measure theory and presented his axiom system for probability theory in 1933. This became the mostly undisputed axiomatic basis for modern probability theory. Most introductions to probability theory treat discrete probability distributions and continuous probability distributions separately; the more mathematically advanced measure-theoretic treatment of probability covers the discrete, the continuous, and any mixture of the two. Consider an experiment that can produce a number of outcomes. The set of all outcomes is called the sample space of the experiment. The power set of the sample space is formed by considering all different collections of possible results. For example, rolling a fair die produces one of six possible results; one collection of possible results corresponds to getting an odd number. Thus, the subset {1, 3, 5} is an element of the power set of the sample space of die rolls. In this case, {1, 3, 5} is the event that the die falls on some odd number. If the results that actually occur fall in a given event, that event is said to have occurred. Probability is a way of assigning every event a value between zero and one, with the requirement that the event made up of all possible results be assigned a value of one. The probability that any one of the events {1, 6}, {3}, or {2, 4} will occur is 5/6. This is the same as saying that the probability of the event {1, 2, 3, 4, 6} is 5/6; this event encompasses the possibility of any number except five being rolled. The mutually exclusive event {5} has a probability of 1/6, and the event {1, 2, 3, 4, 5, 6} has a probability of 1. Discrete probability theory deals with events that occur in countable sample spaces. Modern definition: the modern definition starts with a finite or countable set called the sample space, which relates to the set of all possible outcomes in the classical sense, denoted by Ω.
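A minimal sketch of the die example above, treating events as subsets of a finite sample space with equally weighted outcomes (the helper name prob is mine, not from the source):

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}            # outcomes of one roll of a fair die

def prob(event, space=sample_space):
    """Probability of an event (a subset of the sample space) under equal weighting."""
    return Fraction(len(event & space), len(space))

odd = {1, 3, 5}                              # the event "the die falls on an odd number"
not_five = sample_space - {5}                # union of the events {1, 6}, {3}, {2, 4}

print(prob(odd))            # 1/2
print(prob(not_five))       # 5/6
print(prob({5}))            # 1/6
print(prob(sample_space))   # 1
```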
2.
Random variable
–
In probability and statistics, a random variable, random quantity, aleatory variable, or stochastic variable is a variable whose possible values depend on the outcomes of a random phenomenon. It is common that these outcomes depend on physical variables that are not well understood. For example, when a coin is tossed, the outcome of heads or tails depends on uncertain physics, and which outcome will be observed is not certain. The coin could, of course, get caught in a crack in the floor, but such a possibility is excluded from consideration. The domain of a random variable is the set of possible outcomes. In the case of the coin, there are two possible outcomes, namely heads or tails. Since one of these outcomes must occur, either the event that the coin lands heads or the event that the coin lands tails must have non-zero probability. A random variable is defined as a function that maps outcomes to numerical quantities, typically real numbers. In this sense, it is a procedure for assigning a numerical quantity to each outcome and, contrary to its name, this procedure is itself neither random nor variable; what is random is the physics that describes how the coin lands. A random variable's possible values might represent the possible outcomes of a yet-to-be-performed experiment, and they may also conceptually represent either the results of an objectively random process or the subjective randomness that results from incomplete knowledge of a quantity. The mathematics works the same regardless of the interpretation in use. A random variable has a probability distribution, which specifies the probability that its value falls in any given interval. Two random variables with the same probability distribution can still differ in terms of their associations with, or independence from, other random variables. The realizations of a random variable, that is, the results of randomly choosing values according to the variable's probability distribution function, are called random variates. The formal mathematical treatment of random variables is a topic in probability theory; in that context, a random variable is understood as a function defined on a sample space whose outputs are numerical values. A random variable X : Ω → E is a function from a set of possible outcomes Ω to a measurable space E. The technical axiomatic definition requires Ω to be a probability space. A random variable does not return a probability; the probability of a set of outcomes is given by the probability measure P with which Ω is equipped. Rather, X returns a numerical quantity of outcomes in Ω, e.g. the number of heads in a collection of coin flips.
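A small sketch of a random variable as a plain function on a sample space, here counting heads in three coin flips (the names omega and X are illustrative):

```python
from itertools import product
from fractions import Fraction

omega = list(product("HT", repeat=3))        # sample space of three coin flips

def X(outcome):
    """Random variable: number of heads in the outcome."""
    return outcome.count("H")

# Distribution of X under the uniform probability measure on omega.
dist = {}
for w in omega:
    dist[X(w)] = dist.get(X(w), 0) + Fraction(1, len(omega))

print(dist)  # number of heads -> probability: 3: 1/8, 2: 3/8, 1: 3/8, 0: 1/8
```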
3.
Probability mass function
–
In probability theory and statistics, a probability mass function (pmf) is a function that gives the probability that a discrete random variable is exactly equal to some value. Suppose that X : S → A is a discrete random variable defined on a sample space S. Then the probability mass function f_X : A → [0, 1] for X is defined as

f_X(x) = Pr(X = x) = Pr({s ∈ S : X(s) = x}).

Thus f_X may be defined for all real numbers, with f_X(x) = 0 for all x outside the image of X. Since the image of X is countable, the probability mass function f_X is zero for all but a countable number of values of x. The discontinuity of probability mass functions is related to the fact that the cumulative distribution function of a discrete random variable is also discontinuous. Where it is differentiable, the derivative is zero, just as the probability mass function is zero at all such points. We make this more precise below. Suppose that (S, 𝒜, P) is a probability space and that (B, ℬ) is a measurable space whose underlying σ-algebra is discrete, so that in particular it contains the singleton sets of B. In this setting, a random variable X : S → B is discrete provided its image is countable. Now suppose that (B, ℬ, μ) is a measure space equipped with the counting measure μ, and let f be the density of X with respect to μ. As a consequence, for any b in B we have

P(X = b) = P(X⁻¹({b})) = ∫_{X⁻¹({b})} dP = ∫_{{b}} f dμ = f(b),

demonstrating that f is in fact a probability mass function. Suppose that S is the sample space of all outcomes of a single toss of a fair coin and X gives the number of heads. Since the coin is fair, the probability mass function is

f_X(x) = 1/2 for x ∈ {0, 1}, and f_X(x) = 0 for x ∉ {0, 1}.

This is a special case of the binomial distribution: the Bernoulli distribution. An example of a multivariate discrete distribution, and of its probability mass function, is provided by the multinomial distribution. Johnson, N. L.; Kotz, S.; Kemp, A. Univariate Discrete Distributions.
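A short sketch of a pmf as an ordinary function, checking that it is zero off the support and sums to one (the function name is illustrative):

```python
from fractions import Fraction

def pmf_fair_coin(x):
    """Probability mass function of the number of heads in one toss of a fair coin."""
    return Fraction(1, 2) if x in (0, 1) else Fraction(0)

support = [0, 1]
assert sum(pmf_fair_coin(x) for x in support) == 1
print([pmf_fair_coin(x) for x in (-1, 0, 1, 2)])  # zero outside the support, 1/2 on {0, 1}
```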
4.
Poisson distribution
–
The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume. For instance, an individual keeping track of the amount of mail they receive each day may notice that they receive an average of 4 letters per day. Other examples that may follow a Poisson distribution include the number of calls received by a call center per hour or the number of decay events per second from a radioactive source. The Poisson distribution is popular for modelling the number of times an event occurs in an interval of time or space. Here k is the number of times an event occurs in an interval, and the rate at which events occur is constant: the rate cannot be higher in some intervals and lower in others. Two events cannot occur at exactly the same instant, and the probability of an event in a small interval is proportional to the length of the interval. If these conditions are true, then k is a Poisson random variable; an event can occur 0, 1, 2, … times in an interval. The average number of events in an interval is designated λ (lambda), the event rate, also called the rate parameter. The probability of observing k events in an interval is given by the equation

P(k events in interval) = λ^k e^{−λ} / k!,

where λ is the average number of events per interval, e is the number 2.71828… (the base of the natural logarithms), k takes values 0, 1, 2, …, and k! = k × (k − 1) × (k − 2) × … × 2 × 1 is the factorial of k. This equation is the probability mass function for a Poisson distribution. On a particular river, overflow floods occur once every 100 years on average. Calculate the probability of k = 0, 1, 2, 3, 4, 5, or 6 overflow floods in a 100-year interval, assuming the Poisson model is appropriate. Because the average event rate is one overflow flood per 100 years, λ = 1:

P(k = 0) = 1^0 e^{−1} / 0! = 0.368,
P(k = 1) = 1^1 e^{−1} / 1! = 0.368,
P(k = 2) = 1^2 e^{−1} / 2! = 0.184.

The probabilities for 0 to 6 overflow floods in a 100-year period can be computed in the same way. Ugarte and colleagues report that the average number of goals in a World Cup soccer match is approximately 2.5. Because the average event rate is 2.5 goals per match, λ = 2.5:

P(k = 0) = 2.5^0 e^{−2.5} / 0! = 0.082,
P(k = 1) = 2.5^1 e^{−2.5} / 1! = 0.205,
P(k = 2) = 2.5^2 e^{−2.5} / 2! = 0.257.
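A minimal sketch computing the flood and goal probabilities from the pmf above:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(k events) for a Poisson distribution with rate lam."""
    return lam ** k * exp(-lam) / factorial(k)

# Overflow floods: one per 100 years on average (lam = 1).
print([round(poisson_pmf(k, 1.0), 3) for k in range(7)])
# [0.368, 0.368, 0.184, 0.061, 0.015, 0.003, 0.001]

# World Cup goals: about 2.5 per match on average.
print([round(poisson_pmf(k, 2.5), 3) for k in range(3)])
# [0.082, 0.205, 0.257]
```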
5.
Binomial distribution
–
The binomial distribution is the basis for the popular binomial test of statistical significance. The binomial distribution is used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution; however, for N much larger than n, the binomial distribution remains a good approximation. In general, if the random variable X follows the binomial distribution with parameters n ∈ ℕ and p ∈ [0, 1], we write X ~ B(n, p). The probability of getting exactly k successes in n trials is given by the probability mass function

f(k; n, p) = Pr(X = k) = C(n, k) p^k (1 − p)^{n − k},

where C(n, k) = n! / (k! (n − k)!) is the binomial coefficient, hence the name of the distribution. The formula can be understood as follows: k successes occur with probability p^k and n − k failures occur with probability (1 − p)^{n − k}; however, the k successes can occur anywhere among the n trials, and there are C(n, k) ways of distributing them. In creating reference tables for binomial distribution probabilities, usually the table is filled in up to n/2 values. This is because for k > n/2, the probability can be calculated by its complement as f(k; n, p) = f(n − k; n, 1 − p). The probability mass function satisfies the recurrence relation

f(k + 1; n, p) = ((n − k) p) / ((k + 1)(1 − p)) · f(k; n, p)

for every n and p. Looking at the expression f(k; n, p) as a function of k, there is a value of k that maximizes it. This k value can be found by calculating the ratio f(k + 1; n, p) / f(k; n, p) = (n − k) p / ((k + 1)(1 − p)) and comparing it to 1. There is always an integer M that satisfies (n + 1)p − 1 ≤ M < (n + 1)p; f is monotone increasing for k < M and monotone decreasing for k > M, except when (n + 1)p is an integer, in which case there are two values for which f is maximal, (n + 1)p and (n + 1)p − 1. M is the most probable outcome of the Bernoulli trials and is called the mode; note that the probability of it occurring can be fairly small. The cumulative distribution function can also be represented in terms of the regularized incomplete beta function. Some closed-form bounds for the cumulative distribution function are given below. Suppose a biased coin comes up heads with probability 0.3 when tossed; the probability of achieving 0, 1, or 2 heads in a given number of tosses follows directly from the probability mass function. The expected value of X is E[X] = np; for example, if n = 100 and p = 1/4, the expected number of successes is 25. Since Var(X_i) = p(1 − p) for each individual trial, we get Var(X) = Var(X_1) + ⋯ + Var(X_n) = n Var(X_1) = n p (1 − p).
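A small sketch of the pmf and the complement identity mentioned above, with the biased coin p = 0.3; the number of tosses is an assumption chosen for illustration, not from the source:

```python
from math import comb

def binom_pmf(k, n, p):
    """Pr(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 6, 0.3                       # assumed number of tosses, for illustration only
print([round(binom_pmf(k, n, p), 4) for k in range(3)])   # probabilities of 0, 1, 2 heads

# Complement identity: f(k; n, p) == f(n - k; n, 1 - p)
assert abs(binom_pmf(2, n, p) - binom_pmf(n - 2, n, 1 - p)) < 1e-12
```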
6.
Logarithmic distribution
–
In probability and statistics, the logarithmic distribution is a discrete probability distribution derived from the Maclaurin series expansion

−ln(1 − p) = p + p²/2 + p³/3 + ⋯.

From this we obtain the identity

∑_{k=1}^∞ (−1 / ln(1 − p)) · p^k / k = 1.

This leads directly to the probability mass function of a Log(p)-distributed random variable:

f(k) = (−1 / ln(1 − p)) · p^k / k for k ≥ 1.

Because of the identity above, the distribution is properly normalized. The cumulative distribution function is F(k) = 1 + B(p; k + 1, 0) / ln(1 − p), where B is the incomplete beta function. A Poisson compounded with Log(p)-distributed random variables has a negative binomial distribution; in this way, the negative binomial distribution is seen to be a compound Poisson distribution. R. A. Fisher described the logarithmic distribution in a paper that used it to model relative species abundance. The probability mass function f of this distribution satisfies the recurrence relation

f(k + 1) = (k p / (k + 1)) f(k).

See also: Poisson distribution. Johnson, Norman Lloyd; Kemp, Adrienne W.; Kotz, Samuel. Univariate Discrete Distributions, chapter 7: Logarithmic and Lagrangian distributions.
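A quick sketch verifying the normalization of the logarithmic pmf numerically (the truncation point of the sum is an arbitrary choice):

```python
from math import log

def log_pmf(k, p):
    """Logarithmic distribution pmf, k = 1, 2, 3, ..."""
    return -1.0 / log(1 - p) * p ** k / k

p = 0.5
total = sum(log_pmf(k, p) for k in range(1, 200))   # truncated sum; converges quickly
print(round(total, 10))                              # close to 1.0
print(round(log_pmf(1, p), 4), round(log_pmf(2, p), 4))  # 0.7213 0.1803
```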
7.
Discrete uniform distribution
–
Another way of saying discrete uniform distribution would be: a known, finite number of outcomes equally likely to happen. A simple example of the discrete uniform distribution is throwing a fair die: the possible values are 1, 2, 3, 4, 5, 6, each with probability 1/6. If two dice are thrown and their values added, the resulting distribution is no longer uniform, since not all sums have equal probability. The discrete uniform distribution itself is inherently non-parametric; it is convenient, however, to represent its values generally by an integer interval [a, b], so that a and b become the main parameters of the distribution. Estimating the maximum of the distribution from a sample is known as the German tank problem, following the application of maximum estimation to estimates of German tank production during World War II. The UMVU estimator for the maximum is given by

N̂ = ((k + 1)/k) m − 1 = m + m/k − 1,

where m is the sample maximum and k is the sample size, sampling without replacement. This can be seen as a very simple case of maximum spacing estimation. The estimator has a variance of approximately N²/k² for small samples k ≪ N, so a standard deviation of approximately N/k. The sample maximum is the maximum likelihood estimator for the population maximum, but, as discussed above, it is biased. If samples are not numbered but are recognizable or markable, one can instead estimate population size via the capture-recapture method. See rencontres numbers for an account of the probability distribution of the number of fixed points of a uniformly distributed random permutation.
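A tiny simulation of the German tank estimator under an assumed true maximum N (the values of N and k are illustrative only):

```python
import random

def umvu_max_estimate(sample):
    """UMVU estimator of the population maximum from a sample drawn without replacement."""
    m, k = max(sample), len(sample)
    return m + m / k - 1

random.seed(0)
N, k = 1000, 15                                   # assumed population maximum and sample size
estimates = [umvu_max_estimate(random.sample(range(1, N + 1), k)) for _ in range(5000)]
print(round(sum(estimates) / len(estimates), 1))  # close to 1000 on average (unbiased)
```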
8.
Actuarial science
–
Actuarial science is the discipline that applies mathematical and statistical methods to assess risk in insurance, finance and other industries and professions. Actuaries are professionals who are qualified in this field through intense education; in many countries, actuaries must demonstrate their competence by passing a series of thorough professional examinations. Actuarial science includes a number of interrelated subjects, including mathematics, probability theory, statistics, finance and economics. Historically, actuarial science used deterministic models in the construction of tables and premiums. The science has gone through significant changes during the last 30 years due to the proliferation of high-speed computers. Many universities have undergraduate and graduate degree programs in actuarial science. In 2010, a study published by the job search website CareerCast ranked actuary as the #1 job in the United States; the study used five key criteria to rank jobs: environment, income, employment outlook, physical demands, and stress. A similar study by U.S. News & World Report in 2006 included actuaries among the 25 Best Professions that it expects will be in demand in the future. Actuarial science became a formal mathematical discipline in the late 17th century with the increased demand for long-term insurance coverage such as burial and life insurance. These long-term coverages required that money be set aside to pay future benefits, such as annuities, and this led to the development of an important actuarial concept, referred to as the present value of a future sum. Certain aspects of the actuarial methods for discounting pension funds have come under criticism from modern financial economics. Contemporary life insurance programs have extended to include credit and mortgage insurance, key man insurance for small businesses, and long-term care insurance. In health insurance, the effects of consumer choice and the distribution of the utilization of medical services and procedures are also important; these factors underlay the development of the Resource-Based Relative Value Scale at Harvard in a multi-disciplined study. Actuarial science also aids in the design of benefit structures, reimbursement standards, and the assessment of the effects of proposed government standards on the cost of healthcare. It is common with mergers and acquisitions that several pension plans have to be combined or at least administered on an equitable basis; benefit plan liabilities have to be properly valued, reflecting both earned benefits for past service and the benefits for future service. Actuarial science is also applied to property, casualty and liability insurance. In these forms of insurance, coverage is generally provided on a renewable period, and coverage can be cancelled at the end of the period by either party. Property and casualty insurance companies tend to specialize because of the complexity and diversity of risks. One division is to organize around personal and commercial lines of insurance; personal lines of insurance are for individuals and include fire, auto, homeowners, theft and umbrella coverages.
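A minimal illustration of the present value of a future sum mentioned above, using an assumed discount rate and amount (both values are illustrative only, not from the source):

```python
def present_value(future_amount, annual_rate, years):
    """Discount a single future payment back to today at a constant annual rate."""
    return future_amount / (1 + annual_rate) ** years

# Value today of a benefit of 1000 payable in 10 years, assuming a 4% discount rate.
print(round(present_value(1000, 0.04, 10), 2))  # 675.56
```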
9.
Probability generating function
–
In probability theory, the probability generating function of a discrete random variable is a power series representation of the probability mass function of the random variable. Note that the subscripted notations G_X and p_X are often used to emphasize that these pertain to a particular random variable X. For a single variable, the power series converges absolutely at least for all complex numbers z with |z| ≤ 1; in the multivariate case, the power series converges absolutely at least for all complex vectors z = (z_1, …, z_d) ∈ ℂ^d with max |z_i| ≤ 1. Probability generating functions obey all the rules of power series with non-negative coefficients. In particular, G(1⁻) = 1, where G(1⁻) = lim_{z→1} G(z) from below, so the radius of convergence of any probability generating function must be at least 1, by Abel's theorem for power series with non-negative coefficients. The following properties allow the derivation of various basic quantities related to X. 1. The probability mass function of X is recovered by taking derivatives of G:

p(k) = Pr(X = k) = G^{(k)}(0) / k!.

2. It follows from property 1 that if random variables X and Y have probability generating functions that are equal, G_X = G_Y, then they have identical distributions. The normalization of the probability mass function can be expressed in terms of the generating function by

E[1] = G(1⁻) = ∑_{i=0}^∞ f(i) = 1.

The expectation of X is given by E[X] = G′(1⁻); more generally, the kth factorial moment E[X(X − 1)⋯(X − k + 1)] of X is given by G^{(k)}(1⁻), k ≥ 0. So the variance of X is given by

Var(X) = G″(1⁻) + G′(1⁻) − (G′(1⁻))².

Finally, G_X(e^t) = M_X(t), where X is a discrete random variable, G_X is its probability generating function and M_X is its moment-generating function. Probability generating functions are particularly useful for dealing with functions of independent random variables. For example, if S_N = ∑_{i=1}^N X_i for a fixed number N of independent random variables, then the probability generating function G_{S_N} is given by

G_{S_N}(z) = G_{X_1}(z) G_{X_2}(z) ⋯ G_{X_N}(z).

It also follows that the probability generating function of the difference of two independent random variables S = X_1 − X_2 is

G_S(z) = G_{X_1}(z) G_{X_2}(1/z).

Suppose now that N is itself an independent, discrete random variable taking values on the non-negative integers, and the X_i are independent and identically distributed with common probability generating function G_X. Then, using the law of total expectation,

G_{S_N}(z) = E[z^{S_N}] = E[E[z^{S_N} | N]] = E[(G_X(z))^N] = G_N(G_X(z)).

This last fact is useful in the study of Galton-Watson processes. Suppose again that N is an independent, discrete random variable taking values on the non-negative integers, with probability generating function G_N and probability mass function f_i = Pr(N = i). If the X_i are independent, but not identically distributed, random variables, then G_{S_N}(z) = ∑_i f_i ∏_{k=1}^i G_{X_k}(z), where G_{X_i} denotes the probability generating function of X_i; for identically distributed X_i this simplifies to the identity stated before. The general case is sometimes useful to obtain a decomposition of S_N by means of generating functions.
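A small numerical sketch: recovering the mean and variance of a binomial random variable from finite-difference approximations to the pgf derivatives (the parameters and step size are arbitrary illustrative choices):

```python
from math import comb

def pgf_binomial(z, n=10, p=0.3):
    """Probability generating function of B(n, p): sum over k of Pr(X=k) * z**k."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) * z**k for k in range(n + 1))

h = 1e-5
g1 = (pgf_binomial(1 + h) - pgf_binomial(1 - h)) / (2 * h)                     # ~ G'(1)
g2 = (pgf_binomial(1 + h) - 2 * pgf_binomial(1) + pgf_binomial(1 - h)) / h**2  # ~ G''(1)
print(round(g1, 3))                # ~ 3.0  (= n*p, the mean)
print(round(g2 + g1 - g1**2, 3))   # ~ 2.1  (= n*p*(1-p), the variance)
```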
10.
Bias of an estimator
–
In statistics, the bias of an estimator is the difference between this estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased; otherwise the estimator is said to be biased. In statistics, bias is an objective property of an estimator. Bias can also be measured with respect to the median, rather than the mean. All else being equal, an unbiased estimator is preferable to a biased estimator, but in practice all else is not equal, and biased estimators are frequently used, generally with small bias. When a biased estimator is used, bounds on the bias are calculated. That is, we assume that our data follow some unknown distribution P(x | θ), and then we construct some estimator θ̂ that maps observed data to values that we hope are close to θ. The bias of θ̂ relative to θ is defined as

Bias_θ(θ̂) = E_θ[θ̂] − θ = E_θ[θ̂ − θ],

where the second equation follows since θ is measurable with respect to the conditional distribution P(x | θ). An estimator is said to be unbiased if its bias is equal to zero for all values of the parameter θ. In a simulation experiment concerning the properties of an estimator, the bias of the estimator may be assessed using the mean signed difference. Concretely, the naive estimator of the variance sums the squared deviations from the sample mean and divides by n; dividing instead by n − 1 yields an unbiased estimator. Conversely, the mean squared error can be minimized by dividing by a different number, but this results in a biased estimator; this number is larger than n − 1, so the result is known as a shrinkage estimator, as it shrinks the unbiased estimator towards zero. Suppose X_1, …, X_n are independent and identically distributed random variables with expectation μ and variance σ², let X̄ = (1/n) ∑_{i=1}^n X_i be the sample mean, and let S² = (1/n) ∑_{i=1}^n (X_i − X̄)² be the uncorrected sample variance. Then

E[S²] = σ² − E[(X̄ − μ)²] = σ² − σ²/n = ((n − 1)/n) σ² < σ².

In other words, the expected value of the uncorrected sample variance does not equal the population variance σ². The sample mean, on the other hand, is an unbiased estimator of the population mean μ. Note that the usual definition of sample variance is S² = (1/(n − 1)) ∑_{i=1}^n (X_i − X̄)². The ratio between the biased and unbiased estimates of the variance is known as Bessel's correction. The sample mean minimizes the sum of squared deviations: when any other number is plugged into this sum, the sum can only increase. In particular, the choice μ ≠ X̄ gives

(1/n) ∑_{i=1}^n (X_i − X̄)² < (1/n) ∑_{i=1}^n (X_i − μ)².

Geometrically, this corresponds to decomposing the deviation vector into a component along the direction of the all-ones vector u⃗ and a complementary orthogonal component. This is in fact true in general, as explained above. A far more extreme case of a biased estimator being better than any unbiased estimator arises from the Poisson distribution.
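A quick simulation sketch comparing the biased (divide by n) and unbiased (divide by n − 1) variance estimators; the distribution, sample size and number of trials are arbitrary illustrative choices:

```python
import random

random.seed(1)
mu, sigma, n, trials = 0.0, 2.0, 5, 20000

biased, unbiased = 0.0, 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    biased += ss / n
    unbiased += ss / (n - 1)

print(round(biased / trials, 2))    # ~ 3.2  (= (n-1)/n * sigma^2)
print(round(unbiased / trials, 2))  # ~ 4.0  (= sigma^2)
```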
11.
Slope
–
In mathematics, the slope or gradient of a line is a number that describes both the direction and the steepness of the line. The direction of a line is either increasing, decreasing, horizontal or vertical. A line is increasing if it goes up from left to right; the slope is positive, i.e. m > 0. A line is decreasing if it goes down from left to right; the slope is negative, i.e. m < 0. If a line is horizontal the slope is zero; if a line is vertical the slope is undefined. The steepness, incline, or grade of a line is measured by the absolute value of the slope: a slope with a larger absolute value indicates a steeper line. Slope is calculated by finding the ratio of the vertical change to the horizontal change between any two distinct points on a line. Sometimes the ratio is expressed as a quotient ("rise over run"), giving the same number for every two distinct points on the same line. A line that is decreasing has a negative rise. The line may be practical, as set out by a road surveyor, or it may appear in a diagram that models a road or a roof, either as a description or as a plan. The rise of a road between two points is the difference between the altitude of the road at those two points, say y_1 and y_2; in other words, the rise is (y_2 − y_1) = Δy. The run is the corresponding horizontal distance, and the slope of the road between the two points is described as the ratio of the altitude change to the horizontal distance between any two points on the line. In mathematical language, the slope m of the line is

m = (y_2 − y_1) / (x_2 − x_1).

The concept of slope applies directly to grades or gradients in geography. As a generalization of this practical description, the mathematics of differential calculus defines the slope of a curve at a point as the slope of the tangent line at that point; when the curve is given by a series of points in a diagram or in a list of the coordinates of points, the slope may be calculated between pairs of points rather than at a single point. Thereby, the simple idea of slope becomes one of the main bases of the modern world, in terms of both technology and the built environment. The slope is described by the equation

m = Δy / Δx = vertical change / horizontal change = rise / run.

Given two points (x_1, y_1) and (x_2, y_2), the change in x from one to the other is x_2 − x_1 (the run), while the change in y is y_2 − y_1 (the rise). Substituting both quantities into the above equation generates the formula m = (y_2 − y_1) / (x_2 − x_1). The formula fails for a vertical line, parallel to the y axis, for which the slope is undefined. If a line runs through two points P and Q and the computed slope is positive, the direction of the line is increasing.
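A tiny sketch of the rise-over-run formula (the sample points are illustrative):

```python
def slope(p, q):
    """Slope of the line through points p = (x1, y1) and q = (x2, y2)."""
    (x1, y1), (x2, y2) = p, q
    if x2 == x1:
        raise ValueError("vertical line: slope is undefined")
    return (y2 - y1) / (x2 - x1)

print(slope((0, 1), (4, 3)))   # 0.5  -> positive slope, increasing line
print(slope((0, 3), (3, 0)))   # -1.0 -> negative slope, decreasing line
```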
12.
Panjer recursion
–
In more general cases the distribution of S is a compound distribution. The recursion for the special cases considered was introduced in a paper by Harry Panjer, and it is heavily used in actuarial science. We are interested in the compound random variable S = ∑_{i=1}^N X_i, where N and the X_i fulfil the following preconditions. We assume the X_i to be i.i.d. and independent of N; furthermore the X_i have to be distributed on a lattice h ℕ_0 with lattice width h > 0, with f_j = P[X_i = h j]. In actuarial practice, X_i is obtained by discretisation of the claim density function. The number of claims N is a random variable, which is said to have a claim number distribution. For the Panjer recursion, the probability distribution of N has to be a member of the Panjer class, otherwise known as the (a, b, 0) class of distributions. This class consists of all counting random variables whose probabilities fulfil the relation

p_k = (a + b/k) p_{k−1} for k ≥ 1,

where the initial value p_0 is determined such that ∑_{k=0}^∞ p_k = 1. The Panjer recursion makes use of this relationship to specify a recursive way of constructing the probability distribution of S. In the following, W_N(x) denotes the probability generating function of N. (When the individual claim number is known rather than random, note also the De Pril algorithm, which is suitable to compute the sum distribution of n random variables.) The algorithm gives a recursion to compute g_k = P[S = h k]:

g_0 = W_N(f_0), and g_k = (1 / (1 − f_0 a)) ∑_{j=1}^k (a + b j / k) f_j g_{k−j} for k ≥ 1.

As an example, the recursion can be used to approximate the density of S = ∑_{i=1}^N X_i where N is negative-binomially distributed and X is Fréchet distributed, with lattice width h = 0.04. As observed, an issue may arise at the initialization of the recursion; Guégan and Hassani have proposed a solution to deal with that issue.
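A compact sketch of the recursion for a Poisson claim-number distribution: for Poisson(λ) one has a = 0 and b = λ, so the 1/(1 − f_0 a) factor equals 1. The claim-size pmf used here is an arbitrary illustrative choice, not from the source:

```python
from math import exp

def panjer_poisson(lam, f, n_terms):
    """Compound Poisson pmf g[k] = P[S = h*k] via Panjer's recursion.

    f[j] = P[X = h*j] is the discretised claim-size pmf; a = 0 and b = lam for Poisson(lam).
    """
    g = [exp(lam * (f[0] - 1.0))]               # g_0 = W_N(f_0)
    for k in range(1, n_terms):
        g.append(sum(lam * j / k * f[j] * g[k - j]
                     for j in range(1, min(k, len(f) - 1) + 1)))
    return g

f = [0.0, 0.5, 0.3, 0.2]                        # illustrative claim-size pmf on the lattice
g = panjer_poisson(lam=2.0, f=f, n_terms=40)
print(round(sum(g), 4))                         # close to 1 once enough terms are included
```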
13.
International Standard Book Number
–
The International Standard Book Number (ISBN) is a unique numeric commercial book identifier. An ISBN is assigned to each edition and variation of a book; for example, an e-book, a paperback and a hardcover edition of the same book would each have a different ISBN. The ISBN is 13 digits long if assigned on or after 1 January 2007, and 10 digits long if assigned before 2007. The method of assigning an ISBN is nation-based and varies from country to country, often depending on how large the publishing industry is within a country. The initial ISBN configuration of recognition was generated in 1967, based upon the 9-digit Standard Book Numbering (SBN) created in 1966; the 10-digit ISBN format was developed by the International Organization for Standardization and was published in 1970 as international standard ISO 2108. Occasionally, a book may appear without a printed ISBN if it is printed privately or the author does not follow the usual ISBN procedure; however, this can be rectified later. Another identifier, the International Standard Serial Number (ISSN), identifies periodical publications such as magazines. The ISBN configuration of recognition was generated in 1967 in the United Kingdom by David Whitaker and in 1968 in the US by Emery Koltay. The United Kingdom continued to use the 9-digit SBN code until 1974, and the ISO on-line facility only refers back to 1978. An SBN may be converted to an ISBN by prefixing the digit 0. For example, the edition of Mr. J. G. Reeder Returns, published by Hodder in 1965, has SBN 340013818, where 340 indicates the publisher, 01381 is the serial number assigned by the publisher, and 8 is the check digit. This can be converted to ISBN 0-340-01381-8; the check digit does not need to be re-calculated. Since 1 January 2007, ISBNs have contained 13 digits, a format that is compatible with Bookland European Article Numbers (EAN-13). A 13-digit ISBN can be separated into its parts, and when this is done it is customary to separate the parts with hyphens or spaces; separating the parts of a 10-digit ISBN is also done with either hyphens or spaces. Figuring out how to correctly separate a given ISBN is complicated, because most of the parts do not use a fixed number of digits. ISBN issuance is country-specific, in that ISBNs are issued by the ISBN registration agency that is responsible for that country or territory, regardless of the publication language. Some ISBN registration agencies are based in national libraries or within ministries of culture; in other cases, the ISBN registration service is provided by organisations such as bibliographic data providers that are not government funded. In Canada, ISBNs are issued at no cost with the purpose of encouraging Canadian culture. In the United Kingdom, the United States, and some other countries, the service is provided by non-government-funded organisations. In Australia, ISBNs are issued by the library services agency Thorpe-Bowker.
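A short sketch of the standard ISBN-10 check-digit rule, used here to confirm that prefixing the SBN above with 0 needs no recalculation (the function name is mine):

```python
def isbn10_check_digit(first9):
    """Check digit for a 10-digit ISBN given its first nine digits as a string."""
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    r = (11 - total % 11) % 11
    return "X" if r == 10 else str(r)

# SBN 340013818 -> ISBN 0-340-01381-8: prefixing with 0 leaves the check digit at 8.
print(isbn10_check_digit("034001381"))  # 8
```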
14.
Probability distribution
–
For instance, if the random variable X is used to denote the outcome of a (fair) coin toss, then the probability distribution of X would take the value 0.5 for X = heads, and 0.5 for X = tails. In more technical terms, the probability distribution is a description of a random phenomenon in terms of the probabilities of events. Examples of random phenomena can include the results of an experiment or survey. A probability distribution is defined in terms of an underlying sample space, which is the set of all possible outcomes of the random phenomenon being observed. The sample space may be the set of real numbers or a higher-dimensional vector space, or it may be a list of non-numerical values, for example the set {heads, tails} for a coin toss. Probability distributions are generally divided into two classes. A discrete probability distribution can be encoded by a discrete list of the probabilities of the outcomes. On the other hand, a continuous probability distribution is typically described by probability density functions; the normal distribution is a commonly encountered continuous probability distribution. More complex experiments, such as those involving stochastic processes defined in continuous time, may demand the use of more general probability measures. A probability distribution whose sample space is the set of real numbers is called univariate, while a distribution whose sample space is a vector space is called multivariate. Important and commonly encountered univariate probability distributions include the binomial distribution, the hypergeometric distribution, and the normal distribution; the multivariate normal distribution is a commonly encountered multivariate distribution. To define probability distributions for the simplest cases, one needs to distinguish between discrete and continuous random variables. In the continuous case, probabilities are assigned to intervals rather than to individual values; for example, the probability that an object weighs exactly 500 g is zero. Continuous probability distributions can be described in several ways: the probability density function describes the infinitesimal probability of any given value, and the cumulative distribution function is the antiderivative of the probability density function, provided that the latter function exists. As probability theory is used in quite diverse applications, terminology is not uniform. The following terms are used for probability distribution functions: Distribution: sometimes used as a synonym of probability distribution. Probability distribution: a table that displays the probabilities of the various outcomes in a sample; could be called a normalized frequency distribution table, where all occurrences of outcomes sum to 1. Distribution function: a form of frequency distribution table. Probability distribution function: a form of probability distribution table.
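A small numeric check of the density/CDF relationship stated above, using the standard normal distribution as an example (a crude Riemann sum, chosen only for illustration):

```python
from math import exp, pi, sqrt, erf

def normal_pdf(x):
    return exp(-x * x / 2) / sqrt(2 * pi)

def normal_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

# The CDF is an antiderivative of the pdf: integrating the pdf from -8 to 1
# numerically should be close to CDF(1).
dx = 0.001
riemann = sum(normal_pdf(-8 + i * dx) * dx for i in range(int(9 / dx)))
print(round(riemann, 3), round(normal_cdf(1.0), 3))  # both ~ 0.841
```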
15.
Benford's law
–
Benford's law, also called the first-digit law, is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. The law states that in many naturally occurring collections of numbers, the leading digit is likely to be small. For example, in sets which obey the law, the number 1 appears as the most significant digit about 30% of the time; by contrast, if the digits were distributed uniformly, they would each occur about 11.1% of the time. Benford's law also makes predictions about the distribution of second digits, third digits, digit combinations, and so on. It tends to be most accurate when values are distributed across multiple orders of magnitude. There is a generalization of the law to numbers expressed in other bases, and also a generalization from leading 1 digit to leading n digits. It is named after physicist Frank Benford, who stated it in 1938. Benford's law is a special case of Zipf's law. A set of numbers is said to satisfy Benford's law if the leading digit d (d ∈ {1, …, 9}) occurs with probability

P(d) = log₁₀(d + 1) − log₁₀(d) = log₁₀((d + 1)/d) = log₁₀(1 + 1/d).

This is the distribution expected if the mantissae of the logarithms of the numbers are uniformly and randomly distributed. For example, a number x, constrained to lie between 1 and 10, starts with the digit 1 if 1 ≤ x < 2. Therefore, x starts with the digit 1 if log 1 ≤ log x < log 2; the probabilities are proportional to the interval widths, and this gives the equation above. An extension of Benford's law predicts the distribution of first digits in other bases besides decimal; in fact, the general form is

P(d) = log_b(d + 1) − log_b(d) = log_b(1 + 1/d).

For b = 2, Benford's law is true but trivial: all binary numbers (except 0) start with the digit 1. The discovery of Benford's law goes back to 1881, when the American astronomer Simon Newcomb noticed that in logarithm tables the earlier pages were much more worn than the other pages. Newcomb's published result is the first known instance of this observation and includes a distribution on the second digit as well. Newcomb proposed a law that the probability of a single number N being the first digit of a number was equal to log(N + 1) − log(N). The phenomenon was again noted in 1938 by the physicist Frank Benford; the total number of observations used in his paper was 20,229, and the discovery was later named after him. In 1995, Ted Hill proved the result about mixed distributions mentioned below. Arno Berger and Ted Hill have stated that "the widely known phenomenon called Benford's law continues to defy attempts at an easy derivation."
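A quick sketch computing the Benford first-digit probabilities from the formula above:

```python
from math import log10

benford = {d: log10(1 + 1 / d) for d in range(1, 10)}
for d, p in benford.items():
    print(d, f"{100 * p:.1f}%")
# 1 -> 30.1%, 2 -> 17.6%, 3 -> 12.5%, ..., 9 -> 4.6%
print(round(sum(benford.values()), 10))  # 1.0
```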
16.
Bernoulli distribution
–
It can be used to represent a coin toss where 1 and 0 would represent heads and tails, respectively; in particular, unfair coins would have p ≠ 0.5. The Bernoulli distribution is a special case of the binomial distribution where a single experiment/trial is conducted. It is also a special case of the two-point distribution, for which the possible outcomes need not be 0 and 1. If X is a random variable with this distribution, we have

Pr(X = 1) = p = 1 − Pr(X = 0) = 1 − q.

The probability mass function f of this distribution, over possible outcomes k, is

f(k; p) = p if k = 1, and 1 − p if k = 0.

This can also be expressed as f(k; p) = p^k (1 − p)^{1 − k} for k ∈ {0, 1}. The Bernoulli distribution is the special case of the binomial distribution with n = 1; that is, it is simply B(1, p). The Bernoulli distributions for 0 ≤ p ≤ 1 form an exponential family. The maximum likelihood estimator of p based on a random sample is the sample mean. When we take the standardized Bernoulli distributed random variable (X − E[X]) / √Var[X], we find that this random variable attains q/√(pq) with probability p and −p/√(pq) with probability q. The categorical distribution is the generalization of the Bernoulli distribution for variables with any constant number of discrete values. The Beta distribution is the conjugate prior of the Bernoulli distribution. The geometric distribution models the number of independent and identical Bernoulli trials needed to get one success. If Y ~ Bernoulli(1/2), then 2Y − 1 has a Rademacher distribution. See also: Bernoulli process, Bernoulli sampling, Bernoulli trial, binary entropy function, binomial distribution. McCullagh, Peter; Nelder, John. Johnson, N. L.; Kotz, S.; Kemp, A. Univariate Discrete Distributions. "Binomial distribution", Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4. Weisstein, Eric W. "Bernoulli Distribution".
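A minimal pmf and sampling sketch for a Bernoulli variable (p = 0.3 is an illustrative value):

```python
import random

def bernoulli_pmf(k, p):
    """f(k; p) = p^k (1-p)^(1-k) for k in {0, 1}."""
    return p ** k * (1 - p) ** (1 - k)

p = 0.3
print(bernoulli_pmf(1, p), bernoulli_pmf(0, p))  # 0.3 0.7

random.seed(0)
sample = [1 if random.random() < p else 0 for _ in range(10000)]
print(round(sum(sample) / len(sample), 2))       # sample mean ~ 0.3 (the MLE of p)
```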
17.
Beta-binomial distribution
–
The beta-binomial distribution is the binomial distribution in which the probability of success at each trial is not fixed but random and follows the beta distribution. It is frequently used in Bayesian statistics and empirical Bayes methods, and it reduces to the Bernoulli distribution as a special case when n = 1. For α = β = 1, it is the discrete uniform distribution from 0 to n. It also approximates the binomial distribution arbitrarily well for large α and β. The beta distribution is a conjugate distribution of the binomial distribution. This fact leads to an analytically tractable compound distribution, where one can think of the p parameter in the binomial distribution as being randomly drawn from a beta distribution. Namely, if X ∼ Bin(n, p), then

P(X = k | p, n) = L(p | k) = C(n, k) p^k (1 − p)^{n − k},

where Bin(n, p) stands for the binomial distribution; marginalizing over the beta prior on p gives the beta-binomial probability mass function. Using the properties of the beta function, this can alternatively be written

f(k | n, α, β) = (Γ(n + 1) / (Γ(k + 1) Γ(n − k + 1))) · (Γ(k + α) Γ(n − k + β) / Γ(n + α + β)) · (Γ(α + β) / (Γ(α) Γ(β))).

The beta-binomial distribution can also be motivated via an urn model for positive integer values of α and β. Specifically, imagine an urn containing α red balls and β black balls, from which balls are drawn at random. If a red ball is observed, then two red balls are returned to the urn; likewise, if a black ball is drawn, then two black balls are returned to the urn. If this is repeated n times, then the probability of observing k red balls follows a beta-binomial distribution with parameters n, α and β. The first raw moment (the mean) is μ₁ = nα/(α + β); the higher raw moments have similar but longer closed forms. The parameter ρ is known as the intra-class or intra-cluster correlation, and it is this positive correlation which gives rise to overdispersion. Note that method-of-moments estimates can be nonsensically negative, which is evidence that the data are either undispersed or underdispersed relative to the binomial distribution; in this case, the binomial distribution and the hypergeometric distribution are alternative candidates respectively. While closed-form maximum likelihood estimates are impractical, given that the pdf consists of common functions, maximum likelihood estimates from empirical data can be computed using general methods for fitting multinomial Pólya distributions. The R package VGAM, through the function vglm, fits the distribution via maximum likelihood; note also that there is no requirement that n be fixed throughout the observations. The following data set gives the number of male children among the first 12 children of families of size 13 in 6115 families taken from hospital records in 19th-century Saxony. The 13th child is ignored to assuage the effect of families non-randomly stopping when a desired gender is reached.
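A compact sketch of the beta-binomial pmf via log-gamma functions (the parameter values are illustrative only):

```python
from math import lgamma, exp

def betabinom_pmf(k, n, a, b):
    """Beta-binomial pmf: C(n,k) * B(k+a, n-k+b) / B(a, b), evaluated in log space."""
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    log_beta_num = lgamma(k + a) + lgamma(n - k + b) - lgamma(n + a + b)
    log_beta_den = lgamma(a) + lgamma(b) - lgamma(a + b)
    return exp(log_comb + log_beta_num - log_beta_den)

n, a, b = 12, 2.0, 2.0                                   # illustrative parameters
pmf = [betabinom_pmf(k, n, a, b) for k in range(n + 1)]
print(round(sum(pmf), 6))                                # 1.0
print(round(sum(k * p for k, p in enumerate(pmf)), 3))   # mean = n*a/(a+b) = 6.0
```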
18.
Categorical distribution
–
There is not necessarily an underlying ordering of these outcomes, but numerical labels are often attached for convenience in describing the distribution (e.g. 1 to K). The K-dimensional categorical distribution is the most general distribution over a K-way event; the parameters specifying the probabilities of each possible outcome are constrained only by the fact that each must be in the range 0 to 1, and all must sum to 1. On the other hand, the categorical distribution is a special case of the multinomial distribution. Occasionally, the categorical distribution is termed the "discrete distribution"; however, this properly refers not to one particular family of distributions but to a general class of distributions. Conflating the categorical and multinomial distributions can lead to problems. Both forms have very similar-looking probability mass functions, which both make reference to multinomial-style counts of outcomes in each category; however, the multinomial-style PMF has an extra factor, a multinomial coefficient. Confusing the two can easily lead to incorrect results in settings where this extra factor is not constant with respect to the distributions of interest; the factor is constant in the complete conditionals used in Gibbs sampling. A categorical distribution is a discrete probability distribution whose sample space is the set of k individually identified items. It is the generalization of the Bernoulli distribution for a categorical random variable. In one formulation of the distribution, the sample space is taken to be a finite sequence of integers. The exact integers used as labels are unimportant; they might be 0 to k − 1, 1 to k, or any other arbitrary set of values. In the following descriptions we use the labels 1 to k for convenience, although this disagrees with the convention for the Bernoulli distribution, which uses 0 and 1. In this case, the probability mass function f is

f(x = i | p) = p_i.

Another formulation that appears more complex but facilitates mathematical manipulations is as follows, using the Iverson bracket:

f(x | p) = ∏_{i=1}^k p_i^{[x = i]}.

There are various advantages of this formulation: it is easier to write out the likelihood function of a set of independent identically distributed categorical variables, it connects the categorical distribution with the related multinomial distribution, and it shows why the Dirichlet distribution is the conjugate prior of the categorical distribution, allowing the posterior distribution of the parameters to be calculated.
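A small sketch of sampling from a categorical distribution and checking the empirical frequencies (the probability vector is illustrative):

```python
import random
from collections import Counter

p = {1: 0.2, 2: 0.5, 3: 0.3}         # categorical parameters; must sum to 1

random.seed(0)
draws = random.choices(list(p.keys()), weights=list(p.values()), k=100000)
freq = Counter(draws)
print({i: round(freq[i] / len(draws), 3) for i in p})  # close to {1: 0.2, 2: 0.5, 3: 0.3}
```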
19.
Hypergeometric distribution
–
In contrast, the binomial distribution describes the probability of k successes in n draws with replacement. In statistics, the hypergeometric test uses the hypergeometric distribution to calculate the statistical significance of having drawn a specific k successes from the aforementioned population. The test is used to identify which sub-populations are over- or under-represented in a sample, and it has a wide range of applications. For example, a group could use the test to understand their customer base by testing a set of known customers for over-representation of various demographic subgroups. The following conditions characterize the hypergeometric distribution: the result of each draw can be classified into one of two mutually exclusive categories, and the probability of a success changes on each draw, as each draw decreases the population (sampling without replacement). The pmf is positive when max(0, n + K − N) ≤ k ≤ min(K, n). The pmf satisfies the recurrence relation

(k + 1)(N − K − n + k + 1) P(X = k + 1) = (K − k)(n − k) P(X = k).

As one would expect, the probabilities sum up to 1:

∑_{0 ≤ k ≤ n} C(K, k) C(N − K, n − k) / C(N, n) = 1.

This is essentially Vandermonde's identity from combinatorics. Also note that the following symmetry identity holds:

C(K, k) C(N − K, n − k) / C(N, n) = C(n, k) C(N − n, K − k) / C(N, K).

This follows from the symmetry of the problem, but it can also be shown by expressing the binomial coefficients in terms of factorials and rearranging the latter. The classical application of the hypergeometric distribution is sampling without replacement. Think of an urn with two types of marbles, red ones and green ones; define drawing a green marble as a success and drawing a red marble as a failure. Let N describe the number of all marbles in the urn and K describe the number of green marbles. In this example, X is the random variable whose outcome is k, the number of green marbles actually drawn in the experiment. Now suppose the urn contains 5 green and 45 red marbles. Standing next to the urn, you close your eyes and draw 10 marbles without replacement. What is the probability that exactly 4 of the 10 are green? The probability of drawing exactly k green marbles can be calculated by the formula

P(X = k) = f(k; N, K, n) = C(K, k) C(N − K, n − k) / C(N, n).

Hence, in this example,

P(X = 4) = f(4; 50, 5, 10) = C(5, 4) C(45, 6) / C(50, 10) = 5 · 8145060 / 10272278170 = 0.003964583….

Intuitively we would expect it to be even more unlikely for all 5 green marbles to be among the 10 drawn:

P(X = 5) = f(5; 50, 5, 10) = C(5, 5) C(45, 5) / C(50, 10) = 1 · 1221759 / 10272278170 = 0.0001189375…,

as expected. In Hold'em poker players make the best hand they can by combining the two cards in their hand with the 5 cards eventually turned up on the table. The deck has 52 cards and there are 13 of each suit. For this example assume a player has 2 clubs in the hand and there are 3 cards showing on the table, 2 of which are also clubs.
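A short check of the marble example using the hypergeometric pmf:

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k successes in n draws without replacement from N items, K of them successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, K, n = 50, 5, 10               # 50 marbles, 5 green, draw 10
print(hypergeom_pmf(4, N, K, n))  # 0.003964583...
print(hypergeom_pmf(5, N, K, n))  # 0.0001189375...
print(round(sum(hypergeom_pmf(k, N, K, n) for k in range(0, K + 1)), 10))  # 1.0
```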
20.
Zipf's law
–
The law is named after the American linguist George Kingsley Zipf, who popularized it and sought to explain it, though he did not claim to have originated it. The French stenographer Jean-Baptiste Estoup appears to have noticed the regularity before Zipf, and it was also noted in 1913 by the German physicist Felix Auerbach. Zipf's law states that, given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. For example, in the Brown Corpus of American English text, the word "the" is the most frequently occurring word, and by itself accounts for nearly 7% of all word occurrences. True to Zipf's law, the second-place word "of" accounts for slightly over 3.5% of words, followed by "and". Only 135 vocabulary items are needed to account for half the Brown Corpus. The appearance of the distribution in rankings of cities by population was first noticed by Felix Auerbach in 1913. When Zipf's law is checked for cities, a better fit has been found with exponent s = 1.07; while Zipf's law holds for the upper tail of the distribution, the entire distribution of cities is log-normal and follows Gibrat's law. Both laws are consistent because a log-normal tail can typically not be distinguished from a Pareto tail. Zipf's law is most easily observed by plotting the data on a log-log graph; for example, the word "the" would appear at the point given by the logarithm of its rank and the logarithm of its frequency. It is also possible to plot reciprocal rank against frequency, or reciprocal frequency or interword interval against rank. The data conform to Zipf's law to the extent that the plot is linear. Formally, let N be the number of elements, k be their rank, and s be the value of the exponent characterizing the distribution; Zipf's law then predicts that the normalized frequency of the element of rank k is

f(k; s, N) = (1/k^s) / ∑_{n=1}^N (1/n^s).

It has been claimed that this representation of Zipf's law is more suitable for statistical testing, and in this way it has been analyzed in more than 30,000 English texts. The goodness-of-fit tests yield that only about 15% of the texts are statistically compatible with this form of Zipf's law; slight variations in the definition of Zipf's law can increase this percentage up to close to 50%. In the example of the frequency of words in the English language, N is the number of words in the English language and, if we use the classic version of Zipf's law, the exponent s is 1; f(k; s, N) will then be the fraction of the time the kth most common word occurs. The law may also be written

f(k; s, N) = 1 / (k^s H_{N,s}),

where H_{N,s} is the Nth generalized harmonic number. The simplest case of Zipf's law is a 1⁄f function. Given a set of Zipfian distributed frequencies, sorted from most common to least common, the second most common frequency will occur half as often as the first, the third most common frequency will occur one third as often as the first, the fourth one quarter as often as the first, and the nth most common frequency will occur 1⁄n as often as the first. However, this cannot hold exactly, because items must occur an integer number of times: there cannot be 2.5 occurrences of a word. Nevertheless, over fairly wide ranges, and to a fairly good approximation, many natural phenomena obey Zipf's law. Mathematically, the sum of all relative frequencies in a Zipf distribution is equal to the harmonic series, which diverges, so the simple 1⁄k form can only hold for a finite number of elements N.
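A brief sketch of the normalized Zipf frequencies for a small N (chosen only for illustration):

```python
def zipf_pmf(N, s=1.0):
    """Normalized Zipf frequencies f(k; s, N) = (1/k^s) / H_{N,s}."""
    weights = [1.0 / k ** s for k in range(1, N + 1)]
    H = sum(weights)
    return [w / H for w in weights]

f = zipf_pmf(10)
print([round(x, 3) for x in f[:4]])                  # [0.341, 0.171, 0.114, 0.085]
print(round(f[0] / f[1], 2), round(f[0] / f[2], 2))  # 2.0 3.0  (the 1/k pattern)
```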
21.
Delaporte distribution
–
The Delaporte distribution is a discrete probability distribution that has received attention in actuarial science. It can be defined as the convolution of a negative binomial distribution with a Poisson distribution; the Poisson component has rate parameter λ and the negative binomial component has parameters α and β. The skewness of the Delaporte distribution is

(λ + α β (1 + β)(1 + 2β)) / (λ + α β (1 + β))^{3/2},

and the excess kurtosis likewise has a closed-form expression in λ, α and β. Murat, M.; Szynal, D. On moments of counting distributions satisfying the kth-order recursion and their compound distributions.
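A small sketch building the Delaporte pmf numerically as the convolution of Poisson and negative binomial pmfs; the negative binomial is parameterized so that its mean is αβ and its variance αβ(1 + β), and the parameter values are illustrative:

```python
from math import exp, log, lgamma

def poisson_pmf(k, lam):
    return exp(-lam + k * log(lam) - lgamma(k + 1))

def negbin_pmf(k, alpha, beta):
    """Negative binomial with mean alpha*beta and variance alpha*beta*(1+beta)."""
    p = 1.0 / (1.0 + beta)
    return exp(lgamma(k + alpha) - lgamma(alpha) - lgamma(k + 1)
               + alpha * log(p) + k * log(1 - p))

def delaporte_pmf(k, lam, alpha, beta):
    """Convolution of the Poisson and negative binomial components."""
    return sum(poisson_pmf(j, lam) * negbin_pmf(k - j, alpha, beta) for j in range(k + 1))

lam, alpha, beta = 1.0, 2.0, 0.5                        # illustrative parameters
pmf = [delaporte_pmf(k, lam, alpha, beta) for k in range(60)]
print(round(sum(pmf), 6))                               # ~ 1.0
print(round(sum(k * p for k, p in enumerate(pmf)), 3))  # mean = lam + alpha*beta = 2.0
```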
22.
Geometric distribution
–
These two different geometric distributions should not be confused with each other. Often, the name shifted geometric distribution is adopted for one of them; to avoid ambiguity, it is considered wise to indicate which is intended by mentioning the support explicitly. The first form gives the probability that the first occurrence of success requires k independent trials, each with success probability p. If the probability of success on each trial is p, then the probability that the kth trial is the first success is

Pr(X = k) = (1 − p)^{k − 1} p for k = 1, 2, 3, ….

This form of the geometric distribution is used for modeling the number of trials up to and including the first success. By contrast, the second form of the geometric distribution is used for modeling the number of failures before the first success. In either case, the sequence of probabilities is a geometric sequence. For example, suppose an ordinary die is thrown repeatedly until the first time a 1 appears. The probability distribution of the number of times it is thrown is supported on the set {1, 2, 3, …} and is a geometric distribution with p = 1/6. Consider a sequence of trials where each trial has only two possible outcomes (designated success and failure), and the probability of success is assumed to be the same for each trial. In such a sequence of trials, the geometric distribution is useful to model the number of failures before the first success. The distribution gives the probability that there are zero failures before the first success, one failure before the first success, two failures before the first success, and so on. Examples: a newly-wed couple plans to have children and will continue until the first girl; what is the probability that there are zero boys before the first girl, one boy before the first girl, two boys before the first girl, and so on? A doctor is seeking an anti-depressant for a newly diagnosed patient. Suppose that, of the available anti-depressant drugs, the probability that any particular drug will be effective for this patient is p = 0.6. What is the probability that the first drug found to be effective for this patient is the first drug tried, the second drug tried, and so on? What is the expected number of drugs that will be tried to find one that is effective? A patient is waiting for a suitable matching kidney donor for a transplant. If the probability that a randomly selected donor is a suitable match is p = 0.1, what is the expected number of donors who will be tested before a matching donor is found? The geometric distribution is an appropriate model if the following assumptions are true: the phenomenon being modelled is a sequence of independent trials; there are only two possible outcomes for each trial, often designated success or failure; and the probability of success, p, is the same for every trial. If these conditions are true, then the geometric random variable is the count of the number of failures before the first success.
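A quick sketch of the two forms of the pmf and the expected values asked for in the drug and donor examples:

```python
def geom_trials_pmf(k, p):
    """P(first success on trial k), k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

def geom_failures_pmf(k, p):
    """P(k failures before the first success), k = 0, 1, 2, ..."""
    return (1 - p) ** k * p

p_drug, p_donor = 0.6, 0.1
print([round(geom_trials_pmf(k, p_drug), 3) for k in (1, 2, 3)])  # [0.6, 0.24, 0.096]
print(round(1 / p_drug, 2))               # expected number of drugs tried: ~1.67
print(round((1 - p_donor) / p_donor, 1))  # expected donors tested before a match: 9.0
```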
23.
Skellam distribution
–
Since k is an integer we have that I_k = I_{|k|}. Note that the probability mass function of a Poisson-distributed random variable with mean μ is given by

p(k; μ) = (μ^k / k!) e^{−μ}.

The Skellam probability mass function for the difference K = N_1 − N_2 of two independent Poisson counts is a convolution of two Poisson pmfs; since the Poisson distribution is zero for negative values of the count, the sum is only taken over those terms where n ≥ 0 and n + k ≥ 0. It can be shown that the above sum implies that

p(k; μ_1, μ_2) / p(−k; μ_1, μ_2) = (μ_1 / μ_2)^k.

The special case for μ_1 = μ_2 (= μ) is given by Irwin:

p(k; μ, μ) = e^{−2μ} I_{|k|}(2μ).

Note also that, using the limiting values of the modified Bessel function for small arguments, the Poisson distribution is recovered as a special case. As it is a probability mass function, the Skellam probability mass function is normalized so that it sums to one. We know that the probability generating function for a Poisson distribution is G(t; μ) = e^{μ(t − 1)}. It follows that the pgf for a Skellam probability mass function will be

G(t; μ_1, μ_2) = e^{−(μ_1 + μ_2)} e^{μ_1 t + μ_2 / t}.

The moment-generating function is given by

M(t; μ_1, μ_2) = G(e^t; μ_1, μ_2) = ∑_{k=0}^∞ (t^k / k!) m_k,

which yields the raw moments m_k. Define Δ := μ_1 − μ_2 and μ := (μ_1 + μ_2)/2. Then the mean, variance, skewness, and excess kurtosis are respectively Δ, 2μ, Δ/(2μ)^{3/2}, and 1/(2μ). The cumulant-generating function is given by

K(t; μ_1, μ_2) := ln M(t; μ_1, μ_2) = ∑_{k=0}^∞ (t^k / k!) κ_k,

which yields the cumulants κ_{2k} = 2μ and κ_{2k+1} = Δ. For the special case when μ_1 = μ_2, an asymptotic expansion of the modified Bessel function of the first kind yields, for large μ,

p(k; μ, μ) ∼ 1 / √(4πμ),

when k is also large, and of the order of the square root of 2μ. These special results can easily be extended to the more general case of different means. Let P(k; μ_1, μ_2) be the cumulative distribution function for a Skellam-distributed random variable with parameters μ_1 and μ_2; it can be expressed in terms of the regularized hypergeometric function ₀F̃₁. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Irwin, J. O. The frequency distribution of the difference between two independent variates following the same Poisson distribution. Journal of the Royal Statistical Society, Series A, 100. Analysis of sports data using bivariate Poisson models.
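A small sketch computing the Skellam pmf directly from the defining convolution of two Poisson pmfs (no Bessel functions needed; the truncation length and parameter values are illustrative):

```python
from math import exp, log, lgamma

def poisson_pmf(n, mu):
    """Poisson pmf evaluated in log space to avoid overflow for large n."""
    return exp(n * log(mu) - mu - lgamma(n + 1))

def skellam_pmf(k, mu1, mu2, terms=120):
    """P(N1 - N2 = k) for independent Poisson counts N1, N2 (truncated convolution)."""
    return sum(poisson_pmf(n + k, mu1) * poisson_pmf(n, mu2)
               for n in range(max(0, -k), terms))

mu1, mu2 = 3.0, 2.0
pmf = {k: skellam_pmf(k, mu1, mu2) for k in range(-20, 21)}
print(round(sum(pmf.values()), 6))                             # ~ 1.0
print(round(sum(k * p for k, p in pmf.items()), 3))            # mean = mu1 - mu2 = 1.0
print(round(pmf[2] / pmf[-2], 3), round((mu1 / mu2) ** 2, 3))  # both 2.25 (ratio identity)
```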
24.
Zeta distribution
–
In probability theory and statistics, the zeta distribution is a discrete probability distribution. If X is a zeta-distributed random variable with parameter s > 1, then the probability that X takes the integer value k is given by the probability mass function

f(k; s) = k^{−s} / ζ(s),

where ζ(s) is the Riemann zeta function. The multiplicities of distinct prime factors of X are independent random variables. The Riemann zeta function being the sum of all terms k^{−s} for positive integer k, it appears as the normalization of the Zipf distribution, and indeed the terms Zipf distribution and zeta distribution are often used interchangeably; note, however, that the zeta distribution is a probability distribution on the positive integers by itself, whereas the Zipf distribution has finite support. The nth raw moment is

E[X^n] = (1/ζ(s)) ∑_{k=1}^∞ k^{n − s},

and the series on the right converges only for n < s − 1; this does not change the fact that the moments are specified by the series itself, and they are therefore undefined for large n. The moment generating function is defined as

M(t; s) = E[e^{tX}] = (1/ζ(s)) ∑_{k=1}^∞ e^{tk} / k^s.

The series is just the definition of the polylogarithm, valid for e^t < 1, so that M(t; s) = Li_s(e^t)/ζ(s) for t < 0. The Taylor series expansion of this function will not necessarily yield the moments of the distribution; using the analytically continued terms instead of the moments themselves gives a related expansion valid for |t| < 2π. ζ(1) is infinite, being the harmonic series, and so the case s = 1 is not meaningful. The latter limit can also exist in some cases in which A does not have an asymptotic density. The zeta distribution can be constructed with a sequence of independent random variables with a geometric distribution. Gut, Allan. Some remarks on the Riemann zeta distribution. What Gut calls the Riemann zeta distribution is actually the probability distribution of −log X, where X is a random variable with what this article calls the zeta distribution.
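A small sketch of the zeta pmf, approximating ζ(s) by a truncated series (the truncation point is an arbitrary choice):

```python
def zeta(s, terms=10000):
    """Truncated series approximation of the Riemann zeta function for s > 1."""
    return sum(k ** -s for k in range(1, terms + 1))

def zeta_pmf(k, s):
    return k ** -s / zeta(s)

s = 3.0
print(round(zeta_pmf(1, s), 4), round(zeta_pmf(2, s), 4))    # ~ 0.8319, 0.1040
print(round(sum(zeta_pmf(k, s) for k in range(1, 200)), 4))  # ~ 1.0
```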
25.
Beta distribution
–
The beta distribution has been applied to model the behavior of random variables limited to intervals of finite length in a wide variety of disciplines. In Bayesian inference, the beta distribution is the conjugate prior probability distribution for the Bernoulli, binomial, negative binomial and geometric distributions. The beta distribution is a suitable model for the random behavior of percentages and proportions. The beta function, B, is a normalization constant that ensures the total probability integrates to 1. In the above equations x is an observed value that actually occurred of a random process X. Several authors, including N. L. Johnson and S. Kotz, use the symbols p and q (instead of α and β) for the shape parameters of the beta distribution. The probability density function satisfies the differential equation

f′(x) = f(x) ((α − 1)/x − (β − 1)/(1 − x)).

The cumulative distribution function is

F(x; α, β) = B(x; α, β) / B(α, β) = I_x(α, β),

where B(x; α, β) is the incomplete beta function and I_x(α, β) is the regularized incomplete beta function. The mode of a beta distributed random variable X with α, β > 1 is the most likely value of the distribution and is given by (α − 1)/(α + β − 2). When both parameters are less than one, this expression instead gives the anti-mode, the lowest point of the probability density curve. Letting α = β, the expression for the mode simplifies to 1/2, showing that for α = β > 1 the mode is at the center of the distribution; it is symmetric in those cases. See the Shapes section in this article for a full list of mode cases; for several of these cases, the maximum value of the density function occurs at one or both ends. In some cases the value of the density function occurring at the end is finite; for example, in the case of α = 2, β = 1, the density function becomes a right-triangle distribution which is finite at both ends. In several other cases there is a singularity at one end; for example, in the case α = β = 1/2, the beta distribution simplifies to become the arcsine distribution. There is debate among mathematicians about some of these cases and whether the ends can be called modes or not. There is no general closed-form expression for the median of the beta distribution for arbitrary values of α and β. Closed-form expressions for particular values of the parameters α and β follow. For symmetric cases α = β, median = 1/2. For α = 1 and β > 0, median = 1 − 2^{−1/β}. For α > 0 and β = 1, median = 2^{−1/α}. For α = 3 and β = 2, median = 0.6142724318676105…, the real solution in [0, 1] of the quartic equation 1 − 8x³ + 6x⁴ = 0. For α = 2 and β = 3, median = 0.38572756813238945…. A reasonable approximation of the median is (α − 1/3)/(α + β − 2/3); when α, β ≥ 1, the error in this approximation is less than 4%.
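A brief sketch of the beta pdf, its mode, and the median approximation quoted above (α = 3, β = 2 as in the example):

```python
from math import gamma

def beta_pdf(x, a, b):
    """Beta density on (0, 1): x^(a-1) * (1-x)^(b-1) / B(a, b)."""
    B = gamma(a) * gamma(b) / gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / B

a, b = 3.0, 2.0
mode = (a - 1) / (a + b - 2)
median_approx = (a - 1 / 3) / (a + b - 2 / 3)
print(mode)                            # 0.666...
print(round(median_approx, 4))         # 0.6154, close to the exact 0.61427...
print(round(beta_pdf(mode, a, b), 4))  # density at the mode
```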
26.
Beta rectangular distribution
–
The support of the distribution is indicated by the parameters a and b, which are its minimum and maximum values respectively. The distribution provides an alternative to the beta distribution such that it allows more density to be placed at the extremes of the bounded interval of support; thus it is a bounded distribution that allows outliers to have a greater chance of occurring than does the beta distribution. The cumulative distribution function is

F(x) = θ I_z(α, β) + (1 − θ)(x − a)/(b − a) for a ≤ x ≤ b, where z = (x − a)/(b − a),

a θ-weighted mixture of the regularized incomplete beta function and the uniform CDF. The beta distribution is frequently used in PERT, the critical path method and other project management methodologies to characterize the distribution of an activity's time to completion. Under the standard PERT assumptions the mean completion time is (a + 4m + b)/6, where m is the most likely value; however, the variance is then seen to be a constant conditional on the range. As a result, there is no scope for expressing differing levels of uncertainty that the project manager might have about the activity time. Eliciting the beta rectangular's certainty parameter θ allows the project manager to incorporate the rectangular distribution and increase uncertainty. The above expectation formula then becomes

E(T) = (θ(a + 4m + b) + 3(1 − θ)(a + b)) / 6.

If the project manager assumes the beta distribution is symmetric under the standard PERT conditions, then the variance is

Var(T) = (b − a)²(3 − 2θ) / 36,

while for the asymmetric case it has a similar, somewhat longer closed form. The variance can now be increased when uncertainty is larger; however, the beta distribution may still apply, depending on the project manager's judgment. The beta rectangular has been compared to the uniform-two-sided power distribution; the beta rectangular exhibited larger variance and smaller kurtosis by comparison. The beta rectangular distribution has also been compared to the elevated two-sided power distribution in fitting U.S. income data.
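A minimal sketch of the beta rectangular density as a θ-weighted mixture of a rescaled beta density and a uniform density (all parameter values are illustrative):

```python
from math import gamma

def beta_pdf(z, alpha, beta):
    B = gamma(alpha) * gamma(beta) / gamma(alpha + beta)
    return z ** (alpha - 1) * (1 - z) ** (beta - 1) / B

def beta_rectangular_pdf(x, alpha, beta, theta, a, b):
    """Mixture: theta * rescaled Beta(alpha, beta) + (1 - theta) * Uniform(a, b)."""
    if not a <= x <= b:
        return 0.0
    z = (x - a) / (b - a)
    return theta * beta_pdf(z, alpha, beta) / (b - a) + (1 - theta) / (b - a)

# Crude check that the density integrates to ~1 over [a, b].
a, b, alpha, beta, theta = 0.0, 10.0, 2.0, 3.0, 0.7
dx = 0.001
print(round(sum(beta_rectangular_pdf(a + i * dx, alpha, beta, theta, a, b) * dx
                for i in range(int((b - a) / dx))), 3))  # ~ 1.0
```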
27.
Kumaraswamy distribution
–
In probability and statistics, Kumaraswamy's double bounded distribution is a family of continuous probability distributions defined on the interval (0, 1). This distribution was proposed by Poondi Kumaraswamy for variables that are lower and upper bounded. The probability density function of the Kumaraswamy distribution is

f(x; a, b) = a b x^{a−1} (1 − x^a)^{b−1}, where x ∈ (0, 1),

and the cumulative distribution function is

F(x; a, b) = ∫₀ˣ f(ξ; a, b) dξ = 1 − (1 − x^a)^b.

In its simplest form, the distribution has a support of (0, 1). In a more general form, the normalized variable x is replaced with the unshifted and unscaled variable z, where x = (z − z_min)/(z_max − z_min). The raw moments of the Kumaraswamy distribution are given by

m_n = (b Γ(1 + n/a) Γ(b)) / Γ(1 + b + n/a) = b B(1 + n/a, b),

where B is the beta function. The variance, skewness, and excess kurtosis can be calculated from these raw moments; for example, the variance is σ² = m₂ − m₁². The Shannon entropy of the distribution can be expressed in terms of the harmonic number function H_b and ln(ab). The Kumaraswamy distribution is closely related to the beta distribution. Assume that X_{a,b} is a Kumaraswamy distributed random variable with parameters a and b; then X_{a,b} is the a-th root of a suitably defined beta distributed random variable. More formally, let Y_{1,b} denote a beta distributed random variable with parameters α = 1 and β = b. One has the following relation between X_{a,b} and Y_{1,b}: X_{a,b} = Y_{1,b}^{1/a}, since

P(X_{a,b} ≤ x) = ∫₀ˣ a b t^{a−1} (1 − t^a)^{b−1} dt = ∫₀^{x^a} b (1 − u)^{b−1} du = P(Y_{1,b} ≤ x^a) = P(Y_{1,b}^{1/a} ≤ x).

The raw moments of a generalized Kumaraswamy distribution obtained in this way from a general beta variable can likewise be written in terms of beta functions; note that we can re-obtain the original moments by setting α = 1, β = b and γ = a. However, in general the cumulative distribution function does not have a closed-form solution. A good example of the use of the Kumaraswamy distribution is the storage volume of a reservoir of capacity z_max, whose upper bound is z_max and lower bound is 0. A generalized probability density function for double-bounded random processes. Estimation of reservoir yield and storage distribution using moments analysis. Kumaraswamy's distribution: A beta-type distribution with some tractability advantages. Improved point estimation for the Kumaraswamy distribution. Journal of Statistical Computation and Simulation.
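A short sketch of the pdf/CDF pair and inverse-CDF sampling, which uses the closed-form quantile function implied by F above (parameter values are illustrative):

```python
import random

def kumaraswamy_pdf(x, a, b):
    return a * b * x ** (a - 1) * (1 - x ** a) ** (b - 1)

def kumaraswamy_cdf(x, a, b):
    return 1 - (1 - x ** a) ** b

def kumaraswamy_sample(a, b):
    """Inverse-CDF sampling: solve F(x) = u for x."""
    u = random.random()
    return (1 - (1 - u) ** (1 / b)) ** (1 / a)

a, b = 2.0, 5.0
random.seed(0)
xs = [kumaraswamy_sample(a, b) for _ in range(50000)]
# Empirical CDF at 0.3 should be close to the analytic CDF.
print(round(sum(x <= 0.3 for x in xs) / len(xs), 3), round(kumaraswamy_cdf(0.3, a, b), 3))
```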
28.
Logit-normal distribution
–
In probability theory, a logit-normal distribution is a probability distribution of a random variable whose logit has a normal distribution. It is also known as the logistic normal distribution, a name which often refers to its multinomial logit version. A variable might be modeled as logit-normal if it is a proportion, which is bounded by zero and one, and where values of zero and one never occur. The density obtained by changing the sign of μ is symmetrical, in that it equals the original density reflected about x = 1/2, i.e. f(x; −μ, σ) = f(1 − x; μ, σ). The moments of the logit-normal distribution have no analytic solution; however, they can be estimated by numerical integration. When the derivative of the density equals 0, the location of the mode x satisfies the following equation:

logit(x) = σ²(2x − 1) + μ.

The logistic normal distribution is a generalization of the logit-normal distribution to D-dimensional probability vectors, obtained by taking a logistic transformation of a multivariate normal distribution. This follows from applying the additive logistic transformation to map a multivariate normal random variable y ∼ N(μ, Σ), y ∈ ℝ^{D−1}, to the simplex:

x = [e^{y_1}/(1 + ∑_{i=1}^{D−1} e^{y_i}), …, e^{y_{D−1}}/(1 + ∑_{i=1}^{D−1} e^{y_i}), 1/(1 + ∑_{i=1}^{D−1} e^{y_i})]^⊤.

The unique inverse mapping is given by y_i = ln(x_i / x_D) for i = 1, …, D − 1. This is the case of a vector x whose components sum up to one, and the form of the density follows because the Jacobian matrix of the transformation is diagonal with elements 1/x_i. The logistic normal distribution is a more flexible alternative to the Dirichlet distribution in that it can capture correlations between components of probability vectors. It also has the potential to simplify statistical analyses of compositional data by allowing one to answer questions about log-ratios of the components of the data vectors; one is often interested in ratios rather than absolute component values. The probability simplex is a bounded space, making standard techniques that are typically applied to vectors in ℝⁿ less meaningful. Aitchison described the problem of spurious negative correlations when applying such methods directly to simplicial vectors. However, mapping compositional data in S^D through the inverse of the additive logistic transformation yields real-valued data in ℝ^{D−1}; standard techniques can then be applied to this representation of the data. This approach justifies the use of the logistic normal distribution, which can thus be regarded as the "Gaussian of the simplex". The Dirichlet and logistic normal distributions are never exactly equal for any choice of parameters; in fact, one can show that for α_i → ∞, i = 1, ⋯, D, the Dirichlet density can be approximated arbitrarily well by a logistic normal density. See also: beta distribution and Kumaraswamy distribution, other two-parameter distributions on a bounded interval with similar shapes. Frederic, P.; Lad, F. Two Moments of the Logitnormal Distribution.
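A small sketch of the logit-normal density and a numerical estimate of its mean, which has no closed form (μ, σ and the integration grid are illustrative choices):

```python
from math import exp, log, pi, sqrt

def logitnormal_pdf(x, mu, sigma):
    """Density of X where logit(X) ~ N(mu, sigma^2), for 0 < x < 1."""
    logit = log(x / (1 - x))
    return exp(-(logit - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi) * x * (1 - x))

mu, sigma, dx = 0.5, 1.0, 1e-4
xs = [dx * i for i in range(1, int(1 / dx))]
print(round(sum(logitnormal_pdf(x, mu, sigma) * dx for x in xs), 4))      # ~ 1.0
print(round(sum(x * logitnormal_pdf(x, mu, sigma) * dx for x in xs), 4))  # numerical mean
```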
29.
Raised cosine distribution
–
In probability theory and statistics, the raised cosine distribution is a continuous probability distribution supported on the interval [μ − s, μ + s]. The probability density function is

f(x; μ, s) = (1/(2s)) [1 + cos((x − μ)/s · π)] = (1/s) hvc((x − μ)/s · π)

for μ − s ≤ x ≤ μ + s and zero otherwise, where hvc is the havercosine function. The cumulative distribution function is

F(x; μ, s) = (1/2) [1 + (x − μ)/s + (1/π) sin((x − μ)/s · π)]

for μ − s ≤ x ≤ μ + s, and zero for x < μ − s. The moments of the raised cosine distribution are somewhat complicated in the general case, but are considerably simplified for the standard raised cosine distribution, i.e. the raised cosine distribution with μ = 0 and s = 1. Because the standard raised cosine distribution is an even function, the odd moments are zero. The pdf of the raised cosine distribution is a solution to an ordinary differential equation with suitable boundary conditions. Location-Scale Distributions: Linear Estimation and Probability Plotting Using MATLAB.
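A final quick sketch of the raised cosine pdf/CDF pair and a consistency check (the evaluation point is arbitrary):

```python
from math import cos, sin, pi

def raised_cosine_pdf(x, mu, s):
    if not mu - s <= x <= mu + s:
        return 0.0
    return (1 + cos((x - mu) / s * pi)) / (2 * s)

def raised_cosine_cdf(x, mu, s):
    if x < mu - s:
        return 0.0
    if x > mu + s:
        return 1.0
    z = (x - mu) / s
    return 0.5 * (1 + z + sin(z * pi) / pi)

mu, s = 0.0, 1.0
# Numerically integrating the pdf up to 0.5 should match the CDF there.
dx = 1e-4
approx = sum(raised_cosine_pdf(-1 + i * dx, mu, s) * dx for i in range(int(1.5 / dx)))
print(round(approx, 3), round(raised_cosine_cdf(0.5, mu, s), 3))  # both ~ 0.909
```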