1.
Natural number
–
In mathematics, the natural numbers are those used for counting and ordering. In common language, words used for counting are cardinal numbers. Texts that exclude zero from the natural numbers sometimes refer to the natural numbers together with zero as the whole numbers, but in other writings that term is used instead for the integers. Other number systems, such as the integers, the rational numbers, and the real numbers, are built from the natural numbers by successive extensions. These chains of extensions make the natural numbers canonically embedded in the other number systems. Properties of the natural numbers, such as divisibility and the distribution of prime numbers, are studied in number theory. Problems concerning counting and ordering, such as partitioning and enumerations, are studied in combinatorics. The most primitive method of representing a natural number is to put down a mark for each object. Later, a set of objects could be tested for equality, excess or shortage by striking out a mark. The first major advance in abstraction was the use of numerals to represent numbers. This allowed systems to be developed for recording large numbers. The ancient Egyptians developed a powerful system of numerals with distinct hieroglyphs for 1, 10, and all the powers of 10 up to over 1 million. A stone carving from Karnak, dating from around 1500 BC and now at the Louvre in Paris, depicts 276 as 2 hundreds, 7 tens, and 6 ones, and similarly for the number 4,622. A much later advance was the development of the idea that 0 can be considered as a number, with its own numeral. The use of a 0 digit in place-value notation dates back as early as 700 BC by the Babylonians. The Olmec and Maya civilizations used 0 as a separate number as early as the 1st century BC, but this usage did not spread beyond Mesoamerica. The use of a numeral 0 in modern times originated with the Indian mathematician Brahmagupta in 628. The first systematic study of numbers as abstractions is usually credited to the Greek philosophers Pythagoras and Archimedes.
Some Greek mathematicians treated the number 1 differently than larger numbers. Independent studies also occurred at around the same time in India, China, and Mesoamerica. In 19th century Europe, there was mathematical and philosophical discussion about the nature of the natural numbers. A school of Naturalism stated that the natural numbers were a direct consequence of the human psyche. Henri Poincaré was one of its advocates, as was Leopold Kronecker, who summarized the view as "God made the integers, all else is the work of man." In opposition to the Naturalists, the constructivists saw a need to improve the logical rigor in the foundations of mathematics. In the 1860s, Hermann Grassmann suggested a recursive definition for natural numbers, thus stating they were not really natural but a consequence of definitions. Later, two classes of such formal definitions were constructed; later still, they were shown to be equivalent in most practical applications. The second class of definitions was introduced by Giuseppe Peano and is now called Peano arithmetic. It is based on an axiomatization of the properties of ordinal numbers: each natural number has a successor and every non-zero natural number has a unique predecessor. Peano arithmetic is equiconsistent with several systems of set theory.
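The successor-based description and the idea of a recursive definition can be made concrete with a small sketch. The class name `Nat` and the helpers `succ`, `add`, `to_int`, and `from_int` are hypothetical names chosen for illustration; this is one possible encoding, not Peano's or Grassmann's own formulation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Nat:
    """A Peano-style natural number: zero, or the successor of another Nat."""
    pred: Optional["Nat"] = None  # None marks zero; otherwise the unique predecessor


ZERO = Nat()


def succ(n: Nat) -> Nat:
    """Every natural number has a successor."""
    return Nat(pred=n)


def add(m: Nat, n: Nat) -> Nat:
    """Recursive addition: m + 0 = m, and m + succ(k) = succ(m + k)."""
    if n.pred is None:
        return m
    return succ(add(m, n.pred))


def from_int(k: int) -> Nat:
    """Apply succ k times to zero (k is assumed to be non-negative)."""
    n = ZERO
    for _ in range(k):
        n = succ(n)
    return n


def to_int(n: Nat) -> int:
    """Count how many times succ was applied to reach n from zero."""
    count = 0
    while n.pred is not None:
        count += 1
        n = n.pred
    return count


print(to_int(add(from_int(2), from_int(3))))
```

Here every non-zero value stores exactly one predecessor, which mirrors the statement that each non-zero natural number has a unique predecessor.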
2.
Real number
–
In mathematics, a real number is a value that represents a quantity along a line. The adjective real in this context was introduced in the 17th century by René Descartes. The real numbers include all the rational numbers, such as the integer −5 and the fraction 4/3, and all the irrational numbers, such as √2. Included within the irrationals are the transcendental numbers, such as π. Real numbers can be thought of as points on an infinitely long line called the number line or real line. Any real number can be determined by a possibly infinite decimal representation, such as that of 8.632. The real line can be thought of as a part of the complex plane, and complex numbers include real numbers. These descriptions of the real numbers are not sufficiently rigorous by the modern standards of pure mathematics. The various rigorous constructions of the real numbers all satisfy the same axiomatic definition and are thus equivalent. The statement that there is no subset of the reals with cardinality strictly greater than ℵ0 and strictly less than the cardinality of the reals is known as the continuum hypothesis. Simple fractions were used by the Egyptians around 1000 BC. The Vedic Sulba Sutras, c. 600 BC, include what may be the first use of irrational numbers. Around 500 BC, the Greek mathematicians led by Pythagoras realized the need for irrational numbers, in particular the irrationality of the square root of 2. Arabic mathematicians merged the concepts of number and magnitude into a more general idea of real numbers. In the 16th century, Simon Stevin created the basis for modern decimal notation. In the 17th century, Descartes introduced the term real to describe roots of a polynomial, distinguishing them from imaginary ones. In the 18th and 19th centuries, there was much work on irrational and transcendental numbers. Johann Heinrich Lambert gave the first flawed proof that π cannot be rational, and Adrien-Marie Legendre completed the proof. Évariste Galois developed techniques for determining whether a given equation could be solved by radicals, which gave rise to the field of Galois theory.
Charles Hermite first proved that e is transcendental, and Ferdinand von Lindemann showed that π is transcendental. Lindemann's proof was much simplified by Weierstrass, still further by David Hilbert, and has finally been made elementary by Adolf Hurwitz and Paul Gordan. The development of calculus in the 18th century used the set of real numbers without having defined them cleanly. The first rigorous definition was given by Georg Cantor in 1871. In 1874, he showed that the set of all real numbers is uncountably infinite but the set of all algebraic numbers is countably infinite. Contrary to widely held beliefs, his first method was not his famous diagonal argument. The real number system can be defined axiomatically up to an isomorphism. Another possibility is to start from some rigorous axiomatization of Euclidean geometry. From the structuralist point of view all these constructions are on equal footing.
3.
Probability mass function
–
In probability theory and statistics, a probability mass function is a function that gives the probability that a discrete random variable is exactly equal to some value. Suppose that $X\colon S \to A$, with $A \subseteq \mathbb{R}$, is a discrete random variable defined on a sample space $S$. Then the probability mass function $f_X\colon A \to [0, 1]$ for $X$ is defined as $f_X(x) = \Pr(X = x) = \Pr(\{s \in S : X(s) = x\})$. The function $f_X$ may be defined for all real numbers by setting $f_X(x) = 0$ for all $x \notin X(S)$. Since the image of $X$ is countable, the probability mass function $f_X(x)$ is zero for all but a countable number of values of $x$. The discontinuity of probability mass functions is related to the fact that the cumulative distribution function of a discrete random variable is also discontinuous. Where it is differentiable, the derivative is zero, just as the probability mass function is zero at all such points.
We make this more precise as follows. Suppose that $(A, \mathcal{A}, P)$ is a probability space and that $(B, \mathcal{B})$ is a measurable space whose underlying σ-algebra is discrete, so in particular contains singleton sets of $B$. In this setting, a random variable $X\colon A \to B$ is discrete provided its image is countable. Now suppose that $(B, \mathcal{B}, \mu)$ is a measure space equipped with the counting measure $\mu$, and let $f$ be the density of the distribution of $X$ with respect to $\mu$ (its Radon–Nikodym derivative), if it exists. As a consequence, for any $b$ in $B$ we have $P(X = b) = P(X^{-1}(b)) = \int_{X^{-1}(b)} dP = \int_{\{b\}} f \, d\mu = f(b)$, demonstrating that $f$ is in fact a probability mass function.
Suppose that $S$ is the sample space of all outcomes of a single toss of a fair coin, and that $X$ assigns 0 to tails and 1 to heads. Since the coin is fair, the probability mass function is $f_X(x) = \frac{1}{2}$ for $x \in \{0, 1\}$ and $f_X(x) = 0$ for $x \notin \{0, 1\}$. This is a special case of the binomial distribution, the Bernoulli distribution. An example of a multivariate discrete distribution, and of its probability mass function, is provided by the multinomial distribution.
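As a minimal sketch of the fair-coin example, the probability mass function can be written directly as a Python function. The encoding 0 for tails and 1 for heads, and the name `coin_pmf`, are assumptions made for illustration.

```python
from fractions import Fraction


def coin_pmf(x):
    """f_X(x) = 1/2 for x in {0, 1} and 0 for every other value.

    Assumed encoding: 0 = tails, 1 = heads.
    """
    return Fraction(1, 2) if x in (0, 1) else Fraction(0)


support = [0, 1]

# The probabilities over the image of X must add up to one.
total = sum(coin_pmf(x) for x in support)
print(total)

# Any value outside the image of X gets probability zero.
print(coin_pmf(0.3))
```

Exact fractions are used so that the total can be compared with 1 without floating-point rounding.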
4.
Cumulative distribution function
–
In probability theory and statistics, the cumulative distribution function (CDF) of a real-valued random variable $X$, evaluated at $x$, is the probability that $X$ takes a value less than or equal to $x$: $F_X(x) = \Pr(X \le x)$. In the case of a continuous distribution, it gives the area under the probability density function from minus infinity to $x$. Cumulative distribution functions are also used to specify the distribution of multivariate random variables. The probability that $X$ lies in the semi-closed interval $(a, b]$, where $a < b$, is therefore $\Pr(a < X \le b) = F_X(b) - F_X(a)$. In the definition above, the "less than or equal to" sign, ≤, is a convention, not a universally used one, but it is important for discrete distributions. The proper use of tables of the binomial and Poisson distributions depends upon this convention. Moreover, important formulas like Paul Lévy's inversion formula for the characteristic function also rely on the "less than or equal" formulation. If treating several random variables $X$, $Y$, etc., the corresponding letters are used as subscripts while, if treating only one, the subscript is usually omitted. It is conventional to use a capital $F$ for a distribution function, in contrast to the lower-case $f$ used for probability density functions. This applies when discussing general distributions; some specific distributions have their own conventional notation. The CDF of a continuous random variable $X$ can be expressed as the integral of its probability density function $f_X$ as follows: $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$. In the case of a random variable $X$ whose distribution has a discrete component at a value $b$, $\Pr(X = b) = F_X(b) - \lim_{x \to b^-} F_X(x)$. If $F_X$ is continuous at $b$, this equals zero and there is no discrete component at $b$. Every cumulative distribution function $F$ is non-decreasing and right-continuous, which makes it a càdlàg function. Furthermore, $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$. If $F$ is absolutely continuous, there is a function $f$ with $F(b) - F(a) = \int_a^b f(x)\,dx$; this function $f$ is equal to the derivative of $F$ almost everywhere, and it is called the probability density function of the distribution of $X$. As an example, suppose $X$ is uniformly distributed on the unit interval $[0, 1]$. Then the CDF of $X$ is given by $F(x) = \begin{cases} 0, & x < 0 \\ x, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases}$.
Suppose instead that $X$ takes only the discrete values 0 and 1, with equal probability. Then the CDF of $X$ is given by $F(x) = \begin{cases} 0, & x < 0 \\ 1/2, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases}$. Sometimes, it is useful to study the opposite question and ask how often the random variable is above a particular level. This is called the complementary cumulative distribution function, or simply the tail distribution or exceedance, and is defined as $\bar{F}(x) = \Pr(X > x) = 1 - F(x)$. This has applications in statistical hypothesis testing, for example, because the one-sided p-value is the probability of observing a test statistic at least as extreme as the one observed. Thus, provided that the test statistic, $T$, has a continuous distribution, the one-sided p-value for an observed value $t$ is simply $\Pr(T \ge t) = \Pr(T > t) = 1 - F_T(t)$. In survival analysis, $\bar{F}(x)$ is called the survival function and denoted $S(x)$, while the term reliability function is common in engineering. For a non-negative continuous random variable having an expectation, Markov's inequality states that $\bar{F}(x) \le \operatorname{E}[X]/x$. As $x \to \infty$, $\bar{F}(x) \to 0$, and in fact $\bar{F}(x) = o(1/x)$ provided that $\operatorname{E}[X]$ is finite. An alternative illustration of a distribution is the folded cumulative distribution, or mountain plot, which folds the top half of the CDF graph over. This form of illustration emphasises the median and dispersion of the distribution or of the empirical results. If the CDF $F$ is strictly increasing and continuous then $F^{-1}(p)$, $p \in [0, 1]$, is the unique real number $x$ such that $F(x) = p$.
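The two example CDFs and the quantities derived from a CDF (interval probability, tail probability, and the size of a jump) can be sketched as follows. All function names are hypothetical, and `jump_at` only approximates the left limit numerically with a small step `eps`, which is an assumption of this sketch rather than an exact limit.

```python
def uniform_cdf(x):
    """CDF of a random variable uniformly distributed on the unit interval."""
    if x < 0:
        return 0.0
    if x < 1:
        return float(x)
    return 1.0


def two_point_cdf(x):
    """CDF of a random variable taking the values 0 and 1 with probability 1/2 each."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.5
    return 1.0


def interval_probability(cdf, a, b):
    """Pr(a < X <= b) = F(b) - F(a)."""
    return cdf(b) - cdf(a)


def tail(cdf, x):
    """Complementary CDF: Pr(X > x) = 1 - F(x)."""
    return 1.0 - cdf(x)


def jump_at(cdf, b, eps=1e-9):
    """Approximate Pr(X = b) = F(b) - lim_{x -> b-} F(x) using a small left step."""
    return cdf(b) - cdf(b - eps)


print(interval_probability(uniform_cdf, 0.25, 0.75))
print(tail(two_point_cdf, 0.5))
print(jump_at(two_point_cdf, 0), jump_at(uniform_cdf, 0.5))
```

The jump of the two-point CDF at 0 reflects the discrete mass there, while the uniform CDF is continuous and has no discrete component.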
5.
Expected value
–
In probability theory, the expected value of a random variable, intuitively, is the long-run average value of repetitions of the experiment it represents. For example, the expected value in rolling a six-sided die is 3.5. Less roughly, the law of large numbers states that the arithmetic mean of the values almost surely converges to the expected value as the number of repetitions approaches infinity. The expected value is also known as the expectation, mathematical expectation, EV, average, mean value, or mean. More practically, the expected value of a discrete random variable is the probability-weighted average of all possible values. In other words, each value the random variable can assume is multiplied by its probability of occurring, and the resulting products are summed. The same principle applies to a continuous random variable, except that an integral of the variable with respect to its probability density replaces the sum. The expected value does not exist for random variables having some distributions with large tails. For random variables such as these, the long tails of the distribution prevent the sum or integral from converging. The expected value is a key aspect of how one characterizes a probability distribution. By contrast, the variance is a measure of dispersion of the possible values of the random variable around the expected value. The variance itself is defined in terms of two expectations: it is the expected value of the squared deviation of the variable's value from the variable's expected value. The expected value plays important roles in a variety of contexts. In regression analysis, one desires a formula in terms of observed data that will give a good estimate of the parameter giving the effect of some explanatory variable upon a dependent variable. The formula will give different estimates using different samples of data. A formula is typically considered good in this context if it is an unbiased estimator, that is, if the expected value of the estimate can be shown to equal the true value of the desired parameter.
In decision theory, and in particular in choice under uncertainty, one example of using expected value in reaching optimal decisions is the Gordon–Loeb model of information security investment. According to the model, one can conclude that the amount a firm spends to protect information should generally be only a fraction of the expected loss. Suppose random variable $X$ can take value $x_1$ with probability $p_1$, value $x_2$ with probability $p_2$, and so on, up to value $x_k$ with probability $p_k$. Then the expectation of this random variable $X$ is defined as $\operatorname{E}[X] = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k$. If all outcomes $x_i$ are equally likely, then the weighted average turns into the simple average. This is intuitive: the expected value of a random variable is the average of all values it can take, thus the expected value is what one expects to happen on average. If the outcomes $x_i$ are not equally probable, then the simple average must be replaced with the weighted average. The intuition however remains the same: the expected value of $X$ is what one expects to happen on average. Let $X$ represent the outcome of a roll of a fair six-sided die. More specifically, $X$ will be the number of pips showing on the top face of the die after the toss.
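The probability-weighted average for the fair die can be computed directly, and a simulation gives a rough illustration of the long-run-average interpretation. This is a sketch: the function name `expected_value`, the seed, and the number of simulated rolls are arbitrary choices made for illustration.

```python
import random
from fractions import Fraction


def expected_value(outcomes, probabilities):
    """E[X] = x_1 p_1 + x_2 p_2 + ... + x_k p_k."""
    return sum(x * p for x, p in zip(outcomes, probabilities))


faces = [1, 2, 3, 4, 5, 6]
fair = [Fraction(1, 6)] * 6  # all six outcomes equally likely

ev = expected_value(faces, fair)
print(ev)  # exact value as a fraction

# Long-run average of simulated rolls (seeded so the run is repeatable).
rng = random.Random(0)
n_rolls = 100_000
sample_mean = sum(rng.choice(faces) for _ in range(n_rolls)) / n_rolls
print(sample_mean)
```

With equal probabilities the weighted average reduces to the simple average of the faces, and the simulated mean should land close to the exact value without matching it exactly.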
6.
Variance
–
In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its mean. The variance has a central role in statistics. It is used in descriptive statistics, statistical inference, hypothesis testing, and goodness of fit. This makes it a central quantity in numerous fields such as physics, biology, chemistry, cryptography, and economics. The variance of a random variable $X$ is the expected value of the squared deviation from the mean of $X$, $\mu = \operatorname{E}[X]$: $\operatorname{Var}(X) = \operatorname{E}[(X - \mu)^2]$. This definition encompasses random variables that are generated by processes that are discrete, continuous, neither, or mixed. The variance can also be thought of as the covariance of a random variable with itself: $\operatorname{Var}(X) = \operatorname{Cov}(X, X)$. The variance is also equivalent to the second cumulant of a probability distribution that generates $X$. The variance is typically designated as $\operatorname{Var}(X)$, $\sigma_X^2$, or simply $\sigma^2$. Expanding the definition gives $\operatorname{Var}(X) = \operatorname{E}[X^2] - (\operatorname{E}[X])^2$. In floating point arithmetic, this equation should not be used, because it suffers from cancellation when the two terms are similar in magnitude. If a continuous distribution does not have an expected value, as is the case for the Cauchy distribution, it does not have a variance either. Many other distributions for which the expected value does exist also do not have a finite variance because the integral in the variance definition diverges. An example is a Pareto distribution whose index $k$ satisfies $1 < k \le 2$. The normal distribution with parameters $\mu$ and $\sigma$ is a continuous distribution whose probability density function is given by $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$. In this distribution, $\operatorname{E}[X] = \mu$ and the variance $\operatorname{Var}(X)$ is related to $\sigma$ via $\operatorname{Var}(X) = \int_{-\infty}^{\infty} \frac{(x-\mu)^2}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \sigma^2$. The role of the normal distribution in the central limit theorem is in part responsible for the prevalence of the variance in probability and statistics. The exponential distribution with parameter $\lambda$ is a continuous distribution whose support is the semi-infinite interval $[0, \infty)$. Its probability density function is given by $f(x) = \lambda e^{-\lambda x}$, and it has expected value $\mu = \lambda^{-1}$. The variance is equal to $\operatorname{Var}(X) = \int_0^{\infty} (x - \lambda^{-1})^2 \lambda e^{-\lambda x}\,dx = \lambda^{-2}$. So for an exponentially distributed random variable, $\sigma^2 = \mu^2$. The Poisson distribution with parameter $\lambda$ is a discrete distribution for $k = 0, 1, 2, \ldots$.
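Its probability mass function is given by $p(k) = \frac{\lambda^k}{k!} e^{-\lambda}$, and it has expected value $\mu = \lambda$. The variance is equal to $\operatorname{Var}(X) = \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} e^{-\lambda} (k - \lambda)^2 = \lambda$. So for a Poisson-distributed random variable, $\sigma^2 = \mu$. The binomial distribution with parameters $n$ and $p$ is a discrete distribution for $k = 0, 1, 2, \ldots, n$. Its probability mass function is given by $p(k) = \binom{n}{k} p^k (1-p)^{n-k}$, and it has expected value $\mu = np$. The variance is equal to $\operatorname{Var}(X) = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} (k - np)^2 = np(1-p)$.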
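The binomial variance sum can be checked exactly for particular parameters, and the floating-point caveat about computing variance as a difference of two large terms can be illustrated on shifted data. This is a sketch under stated assumptions: the parameters `n = 10`, `p = 3/10`, the data values, and all function names are illustrative choices.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    """p(k) = C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def variance_from_pmf(pmf, support):
    """Var(X) = sum over k of (k - mu)^2 p(k), with mu = sum over k of k p(k)."""
    mean = sum(k * pmf(k) for k in support)
    return sum((k - mean) ** 2 * pmf(k) for k in support)


n, p = 10, Fraction(3, 10)
binomial_variance = variance_from_pmf(lambda k: binomial_pmf(k, n, p), range(n + 1))
print(binomial_variance, n * p * (1 - p))  # exact fractions, no rounding


def two_pass_variance(xs):
    """Population variance computed from deviations about the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def naive_variance(xs):
    """E[X^2] - (E[X])^2, which subtracts two nearly equal large numbers."""
    mean = sum(xs) / len(xs)
    return sum(x * x for x in xs) / len(xs) - mean * mean


# Small spread on top of a large offset: the squares are about 1e18.
data = [1e9 + d for d in (4.0, 7.0, 13.0, 16.0)]
print(two_pass_variance(data), naive_variance(data))
```

The deviations-based computation works with small numbers after subtracting the mean, while the naive form can lose most of its significant digits on data like this.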
7.
Skewness
–
In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive or negative, or even undefined. The qualitative interpretation of the skew is complicated and unintuitive. Skew must not be thought to refer to the direction the curve appears to be leaning; in fact, the opposite is true. Negative skew indicates that the tail on the left side of the probability density function is longer or fatter than the right side. Conversely, positive skew indicates that the tail on the right side is longer or fatter than the left side. In cases where one tail is long but the other tail is fat, skewness does not obey a simple rule. Further, in multimodal distributions and discrete distributions, skewness is also difficult to interpret. Importantly, the skewness does not determine the relationship of mean and median. In cases where it is necessary, data might be transformed to have a normal distribution. Consider two distributions, one negatively skewed and one positively skewed. Within each, the values on the right side of the distribution taper differently from the values on the left side. With negative skew, the left tail is longer and the mass of the distribution is concentrated on the right. A left-skewed distribution usually appears as a right-leaning curve. With positive skew, the right tail is longer and the mass of the distribution is concentrated on the left. A right-skewed distribution usually appears as a left-leaning curve. Skewness in a data series may sometimes be observed not only graphically but by simple inspection of the values. For instance, consider a numeric sequence such as (49, 50, 51), whose values are evenly distributed around a central value of 50. If the distribution is symmetric, then the mean is equal to the median, and the distribution has zero skewness. If, in addition, the distribution is unimodal, then the mean = median = mode. This is the case of a coin toss or the series 1, 2, 3, 4, and so on. Note, however, that the converse is not true in general, i.e. zero skewness does not imply that the mean is equal to the median. Paul T. von Hippel points out that many textbooks teach a rule of thumb stating that the mean is right of the median under right skew and left of the median under left skew, and that this rule fails with surprising frequency.
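It can fail in multimodal distributions, or in distributions where one tail is long but the other is heavy. Most commonly, though, the rule fails in discrete distributions where the areas to the left and right of the median are not equal. Such distributions not only contradict the textbook relationship between mean, median, and skew, they contradict the textbook interpretation of the median. The skewness of a random variable $X$ is the third standardized moment, $\gamma_1 = \operatorname{E}\left[\left(\frac{X - \mu}{\sigma}\right)^3\right] = \frac{\mu_3}{\sigma^3} = \frac{\kappa_3}{\kappa_2^{3/2}}$, where $\mu$ is the mean, $\sigma$ is the standard deviation, $\mu_3$ is the third central moment, and $\kappa_n$ is the $n$th cumulant. It is sometimes referred to as Pearson's moment coefficient of skewness, or simply the moment coefficient of skewness. The last equality expresses skewness in terms of the ratio of the third cumulant $\kappa_3$ to the 1.5th power of the second cumulant $\kappa_2$. This is analogous to the definition of kurtosis as the fourth cumulant normalized by the square of the second cumulant. The skewness is also sometimes denoted $\operatorname{Skew}[X]$. Starting from a standard cumulant expansion around a normal distribution, one can show that, to leading order, skewness ≈ 6 (mean − median) / standard deviation, with higher-order correction terms.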
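The moment coefficient of skewness can be sketched for a finite data set by replacing expectations with averages over the data. The function name `moment_skewness` and the three small data sets are illustrative assumptions; the population (divide-by-n) form is used, which is one of several conventions.

```python
def moment_skewness(xs):
    """Third central moment divided by the 1.5th power of the second central moment."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2**1.5


symmetric = [49, 50, 51]          # evenly spread around 50
right_tailed = [49, 50, 51, 60]   # one large value stretches the right tail
left_tailed = [40, 49, 50, 51]    # one small value stretches the left tail

print(moment_skewness(symmetric))
print(moment_skewness(right_tailed))
print(moment_skewness(left_tailed))
```

The sign of the result follows the longer tail: zero for the symmetric data, positive when the right tail is stretched, and negative when the left tail is stretched.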
8.
Kurtosis
–
In probability theory and statistics, kurtosis is a measure of the tailedness of the probability distribution of a real-valued random variable. Depending on the measure of kurtosis that is used, there are various interpretations of kurtosis. The standard measure of kurtosis, originating with Karl Pearson, is based on a scaled version of the fourth moment of the data or population. This number is related to the tails of the distribution, not its peak. Hence, for this measure, higher kurtosis is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations. The kurtosis of any normal distribution is 3. It is common to compare the kurtosis of a distribution to this value. Distributions with kurtosis less than 3 are said to be platykurtic, although this does not imply the distribution is flat-topped as sometimes reported. Rather, it means the distribution produces fewer and less extreme outliers than does the normal distribution. An example of a platykurtic distribution is the uniform distribution, which does not produce outliers. Distributions with kurtosis greater than 3 are said to be leptokurtic. It is also common practice to use an adjusted version of Pearson's kurtosis, the excess kurtosis, which is the kurtosis minus 3, to provide the comparison to the normal distribution. Some authors use kurtosis by itself to refer to the excess kurtosis. For reasons of clarity and generality, however, this article follows the non-excess convention and explicitly indicates where excess kurtosis is meant. Alternative measures of kurtosis are the L-kurtosis, which is a scaled version of the fourth L-moment, and measures based on four population or sample quantiles. These are analogous to the alternative measures of skewness that are not based on ordinary moments. The kurtosis is the fourth standardized moment, defined as $\operatorname{Kurt}[X] = \frac{\mu_4}{\sigma^4} = \frac{\operatorname{E}[(X - \mu)^4]}{\left(\operatorname{E}[(X - \mu)^2]\right)^2}$, where $\mu_4$ is the fourth central moment and $\sigma$ is the standard deviation. Several letters are used in the literature to denote the kurtosis.
A very common choice is $\kappa$, which is fine as long as it is clear that it does not refer to a cumulant. Other choices include $\gamma_2$, to be similar to the notation for skewness, although sometimes this is instead reserved for the excess kurtosis. The kurtosis is bounded below by the squared skewness plus 1: $\frac{\mu_4}{\sigma^4} \ge \left(\frac{\mu_3}{\sigma^3}\right)^2 + 1$, where $\mu_3$ is the third central moment. The lower bound is realized by the Bernoulli distribution. There is no upper limit to the excess kurtosis of a general probability distribution. A reason why some authors favor the excess kurtosis is that cumulants are extensive. Formulas related to the extensive property are more naturally expressed in terms of the excess kurtosis. For example, let $X_1, \ldots, X_n$ be independent random variables for which the fourth moment exists, and let $Y$ be their sum. The excess kurtosis of $Y$ is $\operatorname{Kurt}[Y] - 3 = \frac{1}{\left(\sum_{j=1}^{n} \sigma_j^2\right)^2} \sum_{i=1}^{n} \sigma_i^4 \cdot \left(\operatorname{Kurt}[X_i] - 3\right)$, where $\sigma_i$ is the standard deviation of $X_i$. In particular, if all of the $X_i$ have the same variance, this simplifies to $\operatorname{Kurt}[Y] - 3 = \frac{1}{n^2} \sum_{i=1}^{n} \left(\operatorname{Kurt}[X_i] - 3\right)$. The reason not to subtract off 3 is that the bare fourth moment better generalizes to multivariate distributions, especially when independence is not assumed.
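The lower bound on kurtosis and its attainment by the Bernoulli distribution can be checked exactly on small discrete distributions. This is a sketch: the dictionary representation `{value: probability}`, the parameter 3/10, and the function names are assumptions made for illustration.

```python
from fractions import Fraction


def central_moment(pmf, order):
    """E[(X - mu)^order] for a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** order * p for x, p in pmf.items())


def kurtosis(pmf):
    """Fourth central moment divided by the squared variance (non-excess convention)."""
    return central_moment(pmf, 4) / central_moment(pmf, 2) ** 2


def squared_skewness(pmf):
    """(mu_3 / sigma^3)^2, kept rational by avoiding the square root."""
    return central_moment(pmf, 3) ** 2 / central_moment(pmf, 2) ** 3


bernoulli = {0: Fraction(7, 10), 1: Fraction(3, 10)}
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

# The Bernoulli distribution sits exactly on the bound kurtosis = skewness^2 + 1.
print(kurtosis(bernoulli), squared_skewness(bernoulli) + 1)

# A discrete uniform distribution is platykurtic (kurtosis below 3).
print(kurtosis(fair_die))
```

Working with fractions keeps both sides of the bound exact, so equality for the Bernoulli case is not blurred by rounding.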
9.
Moment-generating function
–
In probability theory and statistics, the moment-generating function of a real-valued random variable is an alternative specification of its probability distribution. Thus, it provides the basis of an alternative route to analytical results compared with working directly with probability density functions or cumulative distribution functions. There are particularly simple results for the moment-generating functions of distributions defined by the weighted sums of random variables. Note, however, that not all random variables have moment-generating functions. In addition to real-valued distributions, moment-generating functions can be defined for vector- or matrix-valued random variables. The moment-generating function of a real-valued distribution does not always exist, unlike the characteristic function. There are relations between the behavior of the moment-generating function of a distribution and properties of the distribution, such as the existence of moments. The moment-generating function of a random variable $X$ is $M_X(t) = \operatorname{E}[e^{tX}]$, $t \in \mathbb{R}$, wherever this expectation exists. In other terms, the moment-generating function can be interpreted as the expectation of the random variable $e^{tX}$. $M_X(0)$ always exists and is equal to 1. A key problem with moment-generating functions is that moments and the moment-generating function may not exist, as the integrals need not converge absolutely. By contrast, the characteristic function or Fourier transform always exists. More generally, where $\mathbf{X} = (X_1, \ldots, X_n)^{\mathsf{T}}$ is an $n$-dimensional random vector and $\mathbf{t}$ is a fixed vector, one uses $\mathbf{t}^{\mathsf{T}}\mathbf{X}$ instead of $tX$: $M_{\mathbf{X}}(\mathbf{t}) = \operatorname{E}[e^{\mathbf{t}^{\mathsf{T}}\mathbf{X}}]$. The reason for defining this function is that it can be used to find all the moments of the distribution. The series expansion of $e^{tX}$ is $e^{tX} = 1 + tX + \frac{t^2 X^2}{2!} + \frac{t^3 X^3}{3!} + \cdots + \frac{t^n X^n}{n!} + \cdots$. Hence, $M_X(t) = \operatorname{E}[e^{tX}] = 1 + t\operatorname{E}[X] + \frac{t^2 \operatorname{E}[X^2]}{2!} + \cdots + \frac{t^n \operatorname{E}[X^n]}{n!} + \cdots = 1 + t m_1 + \frac{t^2 m_2}{2!} + \cdots + \frac{t^n m_n}{n!} + \cdots$, where $m_n$ is the $n$th moment. Differentiating $M_X(t)$ $i$ times with respect to $t$ and setting $t = 0$, we hence obtain the $i$th moment about the origin, $m_i$. Comparing the moment-generating function with the characteristic function of the same distribution, it can be seen that the characteristic function is a Wick rotation of the moment-generating function $M_X(t)$ when the latter exists.
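Note that for the case where $X$ has a probability density function $f(x)$, $M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx = \int_{-\infty}^{\infty} \left(1 + tx + \frac{t^2 x^2}{2!} + \cdots\right) f(x)\,dx = 1 + t m_1 + \frac{t^2 m_2}{2!} + \cdots$, where $m_n$ is the $n$th moment.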
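The claim that derivatives of the moment-generating function at zero give the moments can be illustrated numerically. The sketch below assumes a fair six-sided die as the distribution and approximates the derivatives with central finite differences (step `h`), so the results are approximations rather than exact derivatives; all names are hypothetical.

```python
from math import exp

# Illustrative distribution: a fair six-sided die, each face with probability 1/6.
DIE_PMF = {face: 1 / 6 for face in range(1, 7)}


def mgf(t, pmf=DIE_PMF):
    """M_X(t) = E[exp(t X)] for a discrete distribution given as {value: probability}."""
    return sum(p * exp(t * x) for x, p in pmf.items())


def first_moment(h=1e-3):
    """Central-difference approximation of M'(0), which equals E[X]."""
    return (mgf(h) - mgf(-h)) / (2 * h)


def second_moment(h=1e-3):
    """Central-difference approximation of M''(0), which equals E[X^2]."""
    return (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h**2


print(mgf(0.0))
print(first_moment())   # compare with E[X] = 7/2
print(second_moment())  # compare with E[X^2] = 91/6
```

For a finite distribution like this one the moment-generating function exists for every real `t`; for heavy-tailed distributions the same sum or integral may diverge.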
10.
Characteristic function (probability theory)
–
In probability theory and statistics, the characteristic function of any real-valued random variable completely defines its probability distribution. If a random variable admits a probability density function, then the characteristic function is the Fourier transform of the probability density function. Thus it provides the basis of an alternative route to analytical results compared with working directly with probability density functions or cumulative distribution functions. There are particularly simple results for the characteristic functions of distributions defined by the weighted sums of random variables. In addition to univariate distributions, characteristic functions can be defined for vector- or matrix-valued random variables. The characteristic function always exists when treated as a function of a real-valued argument, unlike the moment-generating function. There are relations between the behavior of the characteristic function of a distribution and properties of the distribution, such as the existence of moments. The characteristic function provides an alternative way of describing a random variable, equivalent to the cumulative distribution function in that each determines the other. However, in particular cases, there can be differences in whether these functions can be represented as expressions involving simple standard functions. If a random variable admits a density function, then the characteristic function is its dual, in the sense that each of them is a Fourier transform of the other. If a random variable has a moment-generating function $M_X(t)$, then the domain of the characteristic function can be extended to the complex plane, and $\varphi_X(-it) = M_X(t)$. Note however that the characteristic function of a distribution always exists, even when the moment-generating function does not. Another important application is to the theory of the decomposability of random variables. For a scalar random variable $X$, the characteristic function is defined as the expected value of $e^{itX}$: $\varphi_X(t) = \operatorname{E}[e^{itX}] = \int_{\mathbb{R}} e^{itx}\,dF_X(x) = \int_0^1 e^{itQ_X(p)}\,dp$, where $i$ is the imaginary unit, $t \in \mathbb{R}$ is the argument of the characteristic function, $F_X$ is the cumulative distribution function of $X$, and $Q_X$ is the inverse cumulative distribution function of $X$, also called the quantile function of $X$. It should be noted, though, that this convention for the constants appearing in the definition of the characteristic function differs from the usual convention for the Fourier transform.
For example, some authors define $\varphi_X(t) = \operatorname{E}[e^{-2\pi i t X}]$, which is essentially a change of parameter. Other notation may be encountered in the literature: $\hat{p}$ as the characteristic function for a probability measure $p$, or $\hat{f}$ as the characteristic function corresponding to a density $f$. The notion of characteristic functions generalizes to multivariate random variables and more complicated random elements. The argument of the characteristic function will always belong to the continuous dual of the space where the random variable $X$ takes values. In the formulas for these cases, $^{\mathsf{T}}$ denotes the matrix transpose, tr the matrix trace operator, Re the real part of a complex number, and $\bar{z}$ the complex conjugate of $z$. Oberhettinger provides extensive tables of characteristic functions. The characteristic function of a real-valued random variable always exists, since it is an integral of a bounded continuous function over a space whose measure is finite. A characteristic function is uniformly continuous on the entire space. It is non-vanishing in a region around zero, since $\varphi(0) = 1$.
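A few of these properties can be checked numerically for a simple discrete distribution. The sketch below assumes a fair coin encoded as 0 and 1 and uses hypothetical function names; it evaluates the expectation of $e^{itX}$ as a finite sum and compares it with the moment-generating function at an imaginary argument.

```python
import cmath
from math import exp

# Illustrative distribution: a fair coin encoded as 0 (tails) and 1 (heads).
COIN_PMF = {0: 0.5, 1: 0.5}


def characteristic_function(t, pmf=COIN_PMF):
    """phi_X(t) = E[exp(i t X)]; t may be real or complex."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())


def mgf(t, pmf=COIN_PMF):
    """M_X(t) = E[exp(t X)] for real t."""
    return sum(p * exp(t * x) for x, p in pmf.items())


# phi(0) = 1, and |phi(t)| never exceeds 1 for real t.
print(characteristic_function(0.0))
print([abs(characteristic_function(t)) for t in (-5.0, 0.7, 3.0, 40.0)])

# Extending the argument to the complex plane: phi(-i t) matches M(t).
print(characteristic_function(-0.3j), mgf(0.3))
```

For this coin the sum has the closed form $(1 + e^{it})/2$, which gives a quick way to sanity-check the function by hand.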
11.
Probability theory
–
Probability theory is the branch of mathematics concerned with probability, the analysis of random phenomena. It is not possible to predict precisely the results of random events, but sequences of random events exhibit certain patterns that can be studied and predicted. Two representative mathematical results describing such patterns are the law of large numbers and the central limit theorem. As a mathematical foundation for statistics, probability theory is essential to many human activities that involve quantitative analysis of large sets of data. Methods of probability theory also apply to descriptions of complex systems given only partial knowledge of their state. A great discovery of twentieth century physics was the probabilistic nature of physical phenomena at atomic scales, described in quantum mechanics. Christiaan Huygens published a book on the subject in 1657, and in the 19th century Pierre Laplace completed what is today considered the classic interpretation. Initially, probability theory mainly considered discrete events, and its methods were mainly combinatorial. Eventually, analytical considerations compelled the incorporation of continuous variables into the theory. This culminated in modern probability theory, on foundations laid by Andrey Nikolaevich Kolmogorov. Kolmogorov combined the notion of sample space, introduced by Richard von Mises, with measure theory and presented his axiom system for probability theory in 1933. This became the mostly undisputed axiomatic basis for modern probability theory. Most introductions to probability theory treat discrete probability distributions and continuous probability distributions separately. The more mathematically advanced measure theory-based treatment of probability covers the discrete, the continuous, a mix of the two, and more. Consider an experiment that can produce a number of outcomes. The set of all outcomes is called the sample space of the experiment. The power set of the sample space is formed by considering all different collections of possible results. For example, rolling an honest die produces one of six possible results. One collection of possible results corresponds to getting an odd number. Thus, the subset {1, 3, 5} is an element of the power set of the sample space of die rolls.
These collections are called events. In this case, {1, 3, 5} is the event that the die falls on some odd number. If the results that actually occur fall in a given event, that event is said to have occurred. Probability is a way of assigning every event a value between zero and one, with the requirement that the event made up of all possible results (in this example, the event {1, 2, 3, 4, 5, 6}) be assigned a value of one. For a collection of mutually exclusive events, the probability that any one of them occurs is the sum of their individual probabilities. For instance, the probability that any one of the events {1, 6}, {3}, or {2, 4} will occur is 5/6. This is the same as saying that the probability of event {1, 2, 3, 4, 6} is 5/6. This event encompasses the possibility of any number except five being rolled. The mutually exclusive event {5} has a probability of 1/6, and the event {1, 2, 3, 4, 5, 6} has a probability of 1. Discrete probability theory deals with events that occur in countable sample spaces. The modern definition starts with a finite or countable set called the sample space, which relates to the set of all possible outcomes in the classical sense, denoted by $\Omega$.
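The die example can be written out as sets, with events as subsets of the sample space and probabilities assigned by counting. This is a sketch for an honest die only (all six results equally likely); the function names and the particular three-event partition of {1, 2, 3, 4, 6} are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

sample_space = frozenset(range(1, 7))  # results of rolling one die


def probability(event):
    """Probability of an event for an honest die: favourable results over all results."""
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


def power_set(s):
    """All subsets of s, i.e. every possible event."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


odd = frozenset({1, 3, 5})
not_five = frozenset({1, 2, 3, 4, 6})
pieces = [frozenset({1, 6}), frozenset({3}), frozenset({2, 4})]  # mutually exclusive

print(probability(odd))
print(probability(not_five), sum(probability(e) for e in pieces))
print(probability(frozenset({5})), probability(sample_space))
print(len(power_set(sample_space)))
```

Because the three pieces are disjoint and their union is the event "any number except five", their probabilities add up to the probability of that event.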
12.
Statistics
–
Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data. In applying statistics to, e.g., a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model process to be studied. Populations can be diverse topics such as "all people living in a country" or "every atom composing a crystal". Statistics deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments. Statistician Sir Arthur Lyon Bowley defines statistics as "numerical statements of facts in any department of inquiry placed in relation to each other". When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements to determine whether the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation. Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena. A standard statistical procedure involves the test of the relationship between two data sets, or between a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between the two data sets. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false. Working from a null hypothesis, two basic forms of error are recognized: Type I errors, in which the null hypothesis is falsely rejected, and Type II errors, in which the null hypothesis fails to be rejected although it is false. Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis. Measurement processes that generate statistical data are also subject to error.
Many of these errors are classified as random or systematic. The presence of missing data or censoring may result in biased estimates, and specific techniques have been developed to address these problems. Statistics continues to be an area of active research, for example on the problem of how to analyze big data. Statistics is a mathematical body of science that pertains to the collection, analysis, interpretation or explanation, and presentation of data. Some consider statistics to be a distinct mathematical science rather than a branch of mathematics. While many scientific investigations make use of data, statistics is concerned with the use of data in the context of uncertainty. Mathematical techniques used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure-theoretic probability theory. In applying statistics to a problem, it is common practice to start with a population or process to be studied. Ideally, statisticians compile data about the entire population, an operation called a census. This may be organized by governmental statistical institutes.
13.
Probability distribution
–
For instance, if the random variable X is used to denote the outcome of a coin toss, then the probability distribution of X would take the value 0.5 for X = heads, and 0.5 for X = tails (assuming the coin is fair). In more technical terms, the probability distribution is a description of a random phenomenon in terms of the probabilities of events. Examples of random phenomena can include the results of an experiment or survey. A probability distribution is defined in terms of an underlying sample space, which is the set of all possible outcomes of the random phenomenon being observed. The sample space may be the set of real numbers or a higher-dimensional vector space, or it may be a list of non-numerical values; for example, the sample space of a coin flip is {heads, tails}. Probability distributions are generally divided into two classes. A discrete probability distribution can be encoded by a discrete list of the probabilities of the outcomes. On the other hand, a continuous probability distribution is typically described by a probability density function. The normal distribution is a commonly encountered continuous probability distribution. More complex experiments, such as those involving stochastic processes defined in continuous time, may demand the use of more general probability measures. A probability distribution whose sample space is the set of real numbers is called univariate. Important and commonly encountered univariate probability distributions include the binomial distribution and the hypergeometric distribution. The multivariate normal distribution is a commonly encountered multivariate distribution. To define probability distributions for the simplest cases, one needs to distinguish between discrete and continuous random variables. In the continuous case, the probability of any single exact value is zero: for example, the probability that an object weighs exactly 500 g is zero. Continuous probability distributions can be described in several ways. The cumulative distribution function is the antiderivative of the probability density function, provided that the latter function exists.
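The distinction between the two classes can be made concrete with a short sketch. The example below is illustrative only: it encodes the fair coin toss as a list of probabilities, and for the continuous case it uses the standard normal distribution (an assumed choice) to check numerically that integrating the density reproduces the cumulative distribution function. The helper names normal_pdf, normal_cdf and integrate are hypothetical.

```python
import math

# Discrete case: the distribution is a list of outcome probabilities.
coin = {"heads": 0.5, "tails": 0.5}
total_probability = sum(coin.values())


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density function of the normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def integrate(f, a, b, steps=100_000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


# Continuous case: the area under the density up to x approximates the CDF.
# The lower limit -10 stands in for minus infinity (negligible tail mass).
area_up_to_one = integrate(normal_pdf, -10.0, 1.0)

# A single exact value carries zero probability: the interval has zero width.
point_probability = integrate(normal_pdf, 1.0, 1.0)

print(total_probability, area_up_to_one, normal_cdf(1.0), point_probability)
```

The zero-width integral mirrors the statement that the probability of an object weighing exactly 500 g is zero.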
As probability theory is used in diverse applications, terminology is not uniform. The following terms are used for probability distribution functions:
Distribution.
Probability distribution: a table that displays the probabilities of outcomes in a sample. It could be called a frequency distribution table in which all occurrences of outcomes sum to 1.
Distribution function: a form of frequency distribution table.
Probability distribution function: a form of probability distribution table.
14.
Bernoulli trial
–
The Bernoulli trial is named after Jacob Bernoulli, a 17th-century Swiss mathematician. The mathematical formalisation of the Bernoulli trial is known as the Bernoulli process. This article offers an elementary introduction to the concept, whereas the article on the Bernoulli process offers a more advanced treatment. Since a Bernoulli trial has only two outcomes, it can be framed as some yes or no question. For example: "Is the top card of a shuffled deck an ace?" or "Was the newborn child a girl?" Therefore, success and failure are merely labels for the two outcomes, and should not be construed literally. The term success in this sense consists in the result meeting specified conditions, not in any moral judgement. More generally, given any probability space, for any event one can define a Bernoulli trial, corresponding to whether the event occurred or not. Examples of Bernoulli trials include:
Flipping a coin. In this context, obverse conventionally denotes success and reverse denotes failure. A fair coin has the probability of success 0.5 by definition. In this case there are exactly two outcomes.
Rolling a die, where a six is success and everything else a failure. In this case there are six outcomes, and the event is a six.
In conducting a political opinion poll, choosing a voter at random to ascertain whether that voter will vote yes in an upcoming referendum.
Independent repeated trials of an experiment with two possible outcomes are called Bernoulli trials. Call one of the outcomes success and the other outcome failure. Let p be the probability of success in a Bernoulli trial and q the probability of failure. Then the probability of success and the probability of failure sum to unity (p + q = 1), since these are complementary events: success and failure are mutually exclusive and exhaustive. Random variables describing Bernoulli trials are often encoded using the convention that 1 = success, 0 = failure. A random variable counting the successes in n independent Bernoulli trials, each with success probability p, is denoted by B(n, p) and is said to have a binomial distribution.
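The probability of exactly k successes in n independent Bernoulli trials, each with success probability p and failure probability q = 1 − p, is given by $P(k) = \binom{n}{k} p^k q^{n-k}$, where $\binom{n}{k}$ is a binomial coefficient. When multiple Bernoulli trials are performed, each with its own probability of success, these are sometimes referred to as Poisson trials. As an example, consider the simple experiment where a fair coin is tossed four times. Find the probability that exactly two of the tosses result in heads. For this experiment, let a heads be defined as a success. Because the coin is assumed to be fair, the probability of success is p = 1/2. Thus the probability of failure, q, is given by q = 1 − p = 1 − 1/2 = 1/2. Using the equation above, the probability of exactly two out of four total tosses resulting in heads is given by $P(2) = \binom{4}{2} p^2 q^2 = 6 \times (1/2)^2 \times (1/2)^2 = 3/8$.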
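The coin-toss calculation can be checked with exact rational arithmetic. This is a minimal sketch; the function name prob_k_successes is an illustrative choice, and the formula it uses is the standard binomial expression, comb(n, k) * p**k * q**(n - k), with q = 1 - p.

```python
from fractions import Fraction
from math import comb


def prob_k_successes(n, k, p):
    """Probability of exactly k successes in n independent Bernoulli trials,
    each succeeding with probability p."""
    q = 1 - p  # probability of failure, the complementary event
    return comb(n, k) * p**k * q ** (n - k)


# Fair coin tossed four times, exactly two heads.
two_heads = prob_k_successes(4, 2, Fraction(1, 2))
print(two_heads)  # 3/8

# The probabilities of 0, 1, ..., 4 heads exhaust all cases.
all_cases = sum(prob_k_successes(4, k, Fraction(1, 2)) for k in range(5))
```

Using Fraction rather than floating point keeps the result as 3/8 instead of a decimal approximation.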
15.
Binomial distribution
–
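The binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which succeeds with probability p. The binomial distribution is the basis for the popular binomial test of statistical significance. The binomial distribution is used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for N much larger than n, the binomial distribution remains a good approximation. In general, if the random variable X follows the binomial distribution with parameters n ∈ ℕ and p ∈ [0, 1], we write X ~ B(n, p). The probability of getting exactly k successes in n trials is given by the probability mass function $f(k; n, p) = \Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$ for k = 0, 1, 2, ..., n, where $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ is the binomial coefficient, hence the name of the distribution. The formula can be understood as follows: k successes occur with probability $p^k$ and n − k failures occur with probability $(1-p)^{n-k}$. However, the k successes can occur anywhere among the n trials, and there are $\binom{n}{k}$ different ways of distributing k successes in a sequence of n trials. In creating reference tables for binomial distribution probability, usually the table is filled in up to n/2 values. This is because for k > n/2, the probability can be calculated by its complement as $f(k; n, p) = f(n-k; n, 1-p)$. The probability mass function satisfies the recurrence relation $f(k+1; n, p) = \frac{(n-k)\,p}{(k+1)(1-p)}\, f(k; n, p)$ for every n and p, with $f(0; n, p) = (1-p)^n$. Looking at the expression f(k; n, p) as a function of k, there is a k value that maximizes it. This k value can be found by calculating $\frac{f(k+1; n, p)}{f(k; n, p)} = \frac{(n-k)\,p}{(k+1)(1-p)}$ and comparing it to 1. There is always an integer M that satisfies $(n+1)p - 1 \le M < (n+1)p$. f(k; n, p) is monotone increasing for k < M and monotone decreasing for k > M, with the exception of the case where (n + 1)p is an integer. In this case, there are two values for which f is maximal: (n + 1)p and (n + 1)p − 1. M is the most probable outcome of the Bernoulli trials and is called the mode. Note that the probability of it occurring can be fairly small. The cumulative distribution function, $F(k; n, p) = \Pr(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} \binom{n}{i} p^i (1-p)^{n-i}$, can also be represented in terms of the regularized incomplete beta function as $F(k; n, p) = I_{1-p}(n-k, k+1)$. Closed-form bounds for the distribution function are also available; for example, Hoeffding's inequality gives $F(k; n, p) \le \exp\left(-2\,\frac{(np-k)^2}{n}\right)$ for k ≤ np. As an example, suppose a biased coin comes up heads with probability 0.3 when tossed. The probability of achieving 0, 1, ... heads in a fixed number of tosses then follows from the probability mass function with p = 0.3.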
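If X ~ B(n, p), the expected value of X is $\operatorname{E}[X] = np$. For example, if n = 100 and p = 1/4, the average number of successful results is 25. This follows from the definition of the mean: $\mu = \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k} = np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1-p)^{(n-1)-(k-1)} = np$. The variance is $\operatorname{Var}(X) = np(1-p)$. To see this, write $X = X_1 + \cdots + X_n$, where the $X_i$ are independent Bernoulli distributed random variables with parameter p. Since $\operatorname{Var}(X_i) = p(1-p)$, we get $\operatorname{Var}(X) = \operatorname{Var}(X_1 + \cdots + X_n) = \operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n) = n \operatorname{Var}(X_1) = np(1-p)$.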
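The mean, variance and mode for the case n = 100, p = 1/4 can be verified by brute force from the probability mass function. The sketch below assumes the standard binomial formula comb(n, k) * p**k * (1 - p)**(n - k); the name binomial_pmf is illustrative, and exact fractions are used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 100, Fraction(1, 4)
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from their definitions.
mean = sum(k * f for k, f in enumerate(pmf))
variance = sum((k - mean) ** 2 * f for k, f in enumerate(pmf))

# Mode: the value of k with the largest probability.
mode = max(range(n + 1), key=lambda k: pmf[k])

print(mean, variance, mode, float(pmf[mode]))
```

The computed mean equals n * p = 25 and the variance equals n * p * (1 - p) = 75/4. The mode is also 25, yet its probability is below 0.1, which illustrates that the most probable outcome can still be fairly unlikely.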
16.
Beta distribution
–
The beta distribution has been applied to model the behavior of random variables limited to intervals of finite length in a wide variety of disciplines. In Bayesian inference, the beta distribution is the conjugate prior probability distribution for the Bernoulli, binomial, negative binomial and geometric distributions. The beta distribution is a suitable model for the random behavior of percentages and proportions. The probability density function of the beta distribution, for 0 ≤ x ≤ 1 and shape parameters α, β > 0, is $f(x; \alpha, \beta) = \frac{x^{\alpha-1} (1-x)^{\beta-1}}{B(\alpha, \beta)}$. The beta function, B, is a normalization constant to ensure that the total probability integrates to 1. In the above equation x is a realization (an observed value that actually occurred) of a random process X. Several authors, including N. L. Johnson and S. Kotz, use the symbols p and q instead of α and β for the shape parameters. The probability density function satisfies the differential equation $f'(x) = f(x)\, \frac{(\alpha+\beta-2)\,x - (\alpha-1)}{(x-1)\,x}$. The cumulative distribution function is $F(x; \alpha, \beta) = \frac{B(x; \alpha, \beta)}{B(\alpha, \beta)} = I_x(\alpha, \beta)$, where $B(x; \alpha, \beta)$ is the incomplete beta function and $I_x(\alpha, \beta)$ is the regularized incomplete beta function. The mode of a beta distributed random variable X with α, β > 1 is the most likely value of the distribution and is given by $\frac{\alpha-1}{\alpha+\beta-2}$. When both parameters are less than one, this is the anti-mode, the lowest point of the probability density curve. Letting α = β, the expression for the mode simplifies to 1/2, showing that for α = β > 1 the mode is at the center of the distribution: it is symmetric in those cases. For several parameter combinations, the maximum value of the density function occurs at one or both ends. In some cases the value of the density function occurring at the end is finite. For example, in the case of α = 2, β = 1, the density function becomes a right-triangle distribution which is finite at both ends. In several other cases there is a singularity at an end, where the value of the density function approaches infinity. For example, in the case α = β = 1/2, the beta distribution simplifies to become the arcsine distribution. There is debate among mathematicians about some of these cases and whether the ends can be called modes or not. There is no general closed-form expression for the median of the beta distribution for arbitrary values of α and β.
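Closed-form expressions for particular values of the parameters α and β follow:
For symmetric cases α = β, median = 1/2.
For α = 1 and β > 0, median = $1 - 2^{-1/\beta}$.
For α > 0 and β = 1, median = $2^{-1/\alpha}$.
For α = 3 and β = 2, median = 0.6142724318676105..., the real solution to the quartic equation $1 - 8x^3 + 6x^4 = 0$ that lies in [0, 1].
For α = 2 and β = 3, median = 0.38572756813238945... (one minus the previous value, by symmetry).
A reasonable approximation of the median when both α and β are greater than or equal to one is $\text{median} \approx \frac{\alpha - 1/3}{\alpha + \beta - 2/3}$. When α, β ≥ 1, the relative error in this approximation is less than 4%.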
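The quoted median values can be checked numerically. The sketch below makes two assumptions that should be read as illustrative: for α = 3, β = 2 the cumulative distribution function is taken in its polynomial form 4x^3 - 3x^4 (obtained by integrating the density 12 x^2 (1 - x)), and the approximation tested is the commonly quoted (α - 1/3) / (α + β - 2/3). The helper names are hypothetical.

```python
def bisect(f, lo, hi, iterations=100):
    """Root of an increasing function f on [lo, hi], assuming f(lo) < 0 < f(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def beta_3_2_cdf(x):
    """CDF of the beta distribution with alpha = 3, beta = 2 on [0, 1]."""
    return 4 * x**3 - 3 * x**4


# Median: the point where the CDF equals 1/2, i.e. 1 - 8x^3 + 6x^4 = 0.
median_3_2 = bisect(lambda x: beta_3_2_cdf(x) - 0.5, 0.0, 1.0)

# Beta(2, 3) is the mirror image of Beta(3, 2) about x = 1/2.
median_2_3 = 1 - median_3_2

# Approximation for alpha, beta >= 1 and its relative error.
approx_3_2 = (3 - 1 / 3) / (3 + 2 - 2 / 3)
relative_error = abs(approx_3_2 - median_3_2) / median_3_2


def median_alpha_one(beta):
    """Closed-form median for alpha = 1, beta > 0."""
    return 1 - 2 ** (-1 / beta)


def median_beta_one(alpha):
    """Closed-form median for alpha > 0, beta = 1."""
    return 2 ** (-1 / alpha)


print(median_3_2, median_2_3, approx_3_2, relative_error)
```

For these parameters the approximation differs from the bisection result by well under 1%, comfortably inside the 4% bound.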
17.
Bernoulli distribution
–
The Bernoulli distribution is the probability distribution of a random variable that takes the value 1 with probability p and the value 0 with probability q = 1 − p. It can be used to represent a coin toss where 1 and 0 would represent head and tail, respectively. In particular, unfair coins would have p ≠ 0.5. The Bernoulli distribution is a special case of the binomial distribution where a single experiment/trial is conducted (n = 1). It is also a special case of the two-point distribution, for which the two possible outcomes need not be 0 and 1. If X is a random variable with this distribution, we have $\Pr(X = 1) = p = 1 - \Pr(X = 0) = 1 - q$. The probability mass function f of this distribution, over possible outcomes k, is $f(k; p) = p$ if k = 1 and $f(k; p) = 1 - p$ if k = 0. This can also be expressed as $f(k; p) = p^k (1-p)^{1-k}$ for $k \in \{0, 1\}$. The Bernoulli distributions for 0 ≤ p ≤ 1 form an exponential family. The maximum likelihood estimator of p based on a random sample is the sample mean. When we take the standardized Bernoulli distributed random variable $\frac{X - \operatorname{E}[X]}{\sqrt{\operatorname{Var}[X]}}$, we find that this random variable attains $\frac{q}{\sqrt{pq}}$ with probability p and attains $-\frac{p}{\sqrt{pq}}$ with probability q. If $X_1, \ldots, X_n$ are independent, identically distributed Bernoulli random variables with success probability p, then their sum is distributed according to a binomial distribution B(n, p); the Bernoulli distribution is simply B(1, p). The categorical distribution is the generalization of the Bernoulli distribution for variables with any constant number of discrete values. The beta distribution is the conjugate prior of the Bernoulli distribution. The geometric distribution models the number of independent and identical Bernoulli trials needed to get one success. If Y ~ Bernoulli(1/2), then 2Y − 1 has a Rademacher distribution.
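A short sketch can make the probability mass function and the maximum likelihood estimator concrete. It assumes the standard form p**k * (1 - p)**(1 - k) for k in {0, 1}; the function names and the sample of outcomes are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def bernoulli_pmf(k, p):
    """Probability that a Bernoulli(p) variable takes the value k (0 or 1)."""
    if k not in (0, 1):
        raise ValueError("a Bernoulli variable only takes the values 0 and 1")
    return p**k * (1 - p) ** (1 - k)


def estimate_p(sample):
    """Maximum likelihood estimate of p: the sample mean of the 0/1 outcomes."""
    return Fraction(sum(sample), len(sample))


# An unfair coin with p = 3/10 (illustrative value).
p = Fraction(3, 10)
prob_one = bernoulli_pmf(1, p)
prob_zero = bernoulli_pmf(0, p)

# Mean and variance from the definition: E[X] = p, Var[X] = p * (1 - p).
mean = 0 * prob_zero + 1 * prob_one
variance = (0 - mean) ** 2 * prob_zero + (1 - mean) ** 2 * prob_one

# Hypothetical observed outcomes (1 = success, 0 = failure).
p_hat = estimate_p([1, 0, 1, 1])

print(prob_one, prob_zero, mean, variance, p_hat)
```

With three successes in four trials the estimate is 3/4, which is simply the observed proportion of successes.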
18.
Discrete uniform distribution
–
Another way of saying "discrete uniform distribution" would be "a known, finite number of outcomes equally likely to happen". A simple example of the discrete uniform distribution is throwing a fair die. The possible values are 1, 2, 3, 4, 5, 6, and each has probability 1/6. If two dice are thrown and their values added, the resulting distribution is no longer uniform, since not all sums have equal probability. The discrete uniform distribution itself is inherently non-parametric. It is convenient, however, to represent its values generally by an integer interval [a, b], so that a and b become the main parameters of the distribution. A related estimation problem is to estimate the unknown maximum N of a discrete uniform distribution on the integer interval [1, N] from a sample of k observations drawn without replacement. This problem is known as the German tank problem, following the application of maximum estimation to estimates of German tank production during World War II. The UMVU (uniformly minimum-variance unbiased) estimator for the maximum is given by $\hat{N} = \frac{k+1}{k}\, m - 1 = m + \frac{m}{k} - 1$, where m is the sample maximum and k is the sample size. This can be seen as a simple case of maximum spacing estimation. This has a variance of $\frac{1}{k} \frac{(N-k)(N+1)}{(k+2)} \approx \frac{N^2}{k^2}$ for small samples k ≪ N, so a standard deviation of approximately N/k. The sample maximum is the maximum likelihood estimator for the population maximum, but it is biased. If samples are not numbered but are recognizable or markable, one can instead estimate population size via the capture-recapture method. See rencontres numbers for an account of the probability distribution of the number of fixed points of a uniformly distributed random permutation.
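Two points from this entry lend themselves to a quick check: the sum of two dice is not uniform, and the maximum can be estimated as m + m/k - 1. The sketch below is illustrative; the function names are hypothetical, the serial numbers in the sample are made-up data, and the estimator assumes the sample was drawn without replacement from the integers 1 to N.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_sum_distribution():
    """Exact distribution of the sum of two fair six-sided dice."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in sorted(counts.items())}


def estimate_maximum(sample):
    """Estimate N as m + m/k - 1, where m is the sample maximum and k the
    sample size (sample assumed drawn without replacement from 1..N)."""
    k = len(sample)
    m = max(sample)
    return m + Fraction(m, k) - 1


sums = two_dice_sum_distribution()

# Hypothetical observed serial numbers.
estimate = estimate_maximum([19, 40, 42, 60])

print(sums[2], sums[7], estimate)
```

A sum of 7 is six times as likely as a sum of 2, so the distribution of the sum is clearly not uniform. For the made-up sample, the estimate is 60 + 60/4 - 1 = 74, somewhat above the largest value actually observed.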