1.
Dice
–
Dice are small throwable objects with multiple resting positions, used for generating random numbers. Dice are suitable as gambling devices for games like craps and are used in non-gambling tabletop games. A traditional die is a cube, with each of its six faces showing a different number of dots from 1 to 6. When thrown or rolled, the die comes to rest showing on its upper surface a random integer from one to six. A variety of other devices are also described as dice; such specialized dice may have polyhedral or irregular shapes, and may be used to produce results other than one through six. Loaded and crooked dice are designed to favor some results over others for purposes of cheating or amusement. A dice tray, a tray used to contain thrown dice, is sometimes used for gambling or board games.

Dice have been used since before recorded history, and it is uncertain where they originated. The oldest known dice were excavated as part of a backgammon-like game set at the Burnt City, an archeological site in south-eastern Iran, estimated to be from between 2800 and 2500 BCE. Other excavations from ancient tombs in the Indus Valley civilization indicate a South Asian origin. The Egyptian game of Senet was played with dice; Senet was played before 3000 BC and up to the 2nd century AD, and was likely a racing game, but there is no scholarly consensus on its rules. Dicing is mentioned as an Indian game in the Rigveda and Atharvaveda, and there are several biblical references to casting lots, as in Psalm 22, indicating that dicing was commonplace when the psalm was composed. Knucklebones was a game of skill played by women and children. Although gambling was illegal, many Romans were passionate gamblers who enjoyed dicing, which was known as aleam ludere. Dicing was even a popular pastime of emperors; letters by Augustus to Tiberius and his daughter recount his hobby of dicing. There were two sizes of Roman dice: tali were large dice inscribed with one, three, four, and six on four sides.
Tesserae were smaller dice with sides numbered one to six. Twenty-sided dice date back to the 2nd century AD, and to Ptolemaic Egypt as early as the 2nd century BC. Dominoes and playing cards originated in China as developments from dice; the transition from dice to playing cards occurred there around the Tang dynasty. In Japan, dice were used to play a popular game called sugoroku.
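The basic function of a die, producing a uniformly random integer, is easy to sketch in code. The following is an illustrative Python snippet (the helper name roll_die and the trial count are our choices, not from the text):

```python
import random
from collections import Counter

def roll_die(sides=6):
    """Simulate one throw of a fair die with the given number of sides."""
    return random.randint(1, sides)

# Roll a standard cubic die many times; with a fair die each face
# should come up roughly 1/6 of the time.
random.seed(0)
counts = Counter(roll_die() for _ in range(60_000))
assert set(counts) == {1, 2, 3, 4, 5, 6}
assert all(9_000 < counts[face] < 11_000 for face in counts)
```

A loaded die would be modeled by drawing from a non-uniform distribution instead.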
2.
Statistics
–
Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data. In applying statistics to, e.g., a scientific, industrial, or social problem, it is conventional to begin with a statistical population or process to be studied; populations can be diverse topics such as all people living in a country or every atom composing a crystal. Statistics deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments. The statistician Sir Arthur Lyon Bowley defined statistics as "numerical statements of facts in any department of inquiry placed in relation to each other." When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples; representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. In contrast, an observational study does not involve experimental manipulation. Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena. A standard statistical procedure involves testing the relationship between two data sets, or between a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between the two data sets. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (rejecting a true null hypothesis) and Type II errors (failing to reject a false null hypothesis). Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis. Measurement processes that generate statistical data are also subject to error.
Many of these errors are classified as random (noise) or systematic (bias); the presence of missing data or censoring may result in biased estimates, and specific techniques have been developed to address these problems. Statistics continues to be an active area of research, for example on the problem of how to analyze big data. Statistics is a body of science that pertains to the collection, analysis, interpretation or explanation, and presentation of data. Some consider statistics to be a distinct mathematical science rather than a branch of mathematics. While many scientific investigations make use of data, statistics is concerned with the use of data in the context of uncertainty; mathematical techniques used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure-theoretic probability theory. In applying statistics to a problem, it is conventional practice to start with a population or process to be studied. Populations can be diverse topics such as all people living in a country or every atom composing a crystal. Ideally, statisticians compile data about the entire population; this may be organized by governmental statistical institutes.
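The Type I / Type II error distinction above can be illustrated with a small simulation. This is a hedged sketch under assumed parameters (a 100-flip coin experiment and an arbitrary illustrative rejection region of fewer than 40 or more than 60 heads), not a prescribed testing procedure:

```python
import random

def experiment(n_flips, p_heads):
    """Count heads in n_flips tosses of a coin with bias p_heads."""
    return sum(random.random() < p_heads for _ in range(n_flips))

random.seed(1)
n, trials = 100, 10_000

# Null hypothesis: the coin is fair (p = 0.5).  Rejecting it when it is
# actually true is a Type I error (a false positive).
rejections = sum(not (40 <= experiment(n, 0.5) <= 60) for _ in range(trials))
type_i_rate = rejections / trials

# When the coin is truly biased (here p = 0.7), failing to reject the
# null is a Type II error (a false negative).
misses = sum(40 <= experiment(n, 0.7) <= 60 for _ in range(trials))
type_ii_rate = misses / trials
```

With these parameters both estimated error rates come out small, but tightening the rejection region lowers one rate at the expense of the other.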
3.
Sample space
–
In probability theory, the sample space of an experiment or random trial is the set of all possible outcomes or results of that experiment. A sample space is denoted using set notation, and the possible outcomes are listed as elements in the set. It is common to refer to a sample space by the labels S or Ω. For example, if the experiment is tossing a coin, the sample space is typically the set {heads, tails}. For tossing two coins, the sample space would be {(H, H), (H, T), (T, H), (T, T)}. For tossing a single six-sided die, the sample space is {1, 2, 3, 4, 5, 6}. A well-defined sample space is one of three elements in a probabilistic model; the other two are a well-defined set of possible events and a probability assigned to each event. For many experiments, there may be more than one plausible sample space available. For example, when drawing a card from a standard deck of fifty-two playing cards, one possibility for the sample space could be the various ranks, while another could be the suits. Still other sample spaces are possible, such as one that accounts for some cards having been flipped when shuffling. Some treatments of probability assume that the various outcomes of an experiment are always defined so as to be equally likely; the result of this is that every possible combination of individuals who could be chosen for a sample is also equally likely. In an elementary approach to probability, any subset of the sample space is usually called an event. However, this gives rise to problems when the sample space is infinite, so that a more precise definition of an event is necessary. Under this definition only measurable subsets of the sample space, constituting a σ-algebra over the sample space itself, are considered events. See also: Probability space, Space, Set, Event, σ-algebra.
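The coin and die sample spaces above can be written out directly as Python sets; the event shown ("at least one head") is our own illustration:

```python
from itertools import product

# Sample spaces: one coin, two coins, and a single six-sided die.
coin = {"H", "T"}
two_coins = set(product(coin, repeat=2))  # {('H','H'), ('H','T'), ...}
die = set(range(1, 7))

# An event is a subset of the sample space, e.g. "at least one head".
at_least_one_head = {outcome for outcome in two_coins if "H" in outcome}

# With equally likely outcomes, P(event) = |event| / |sample space|.
p = len(at_least_one_head) / len(two_coins)
assert p == 0.75
```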
4.
Kurtosis
–
In probability theory and statistics, kurtosis is a measure of the "tailedness" of the probability distribution of a real-valued random variable. Depending on the particular measure of kurtosis that is used, there are various interpretations of kurtosis. The standard measure of kurtosis, originating with Karl Pearson, is based on a scaled version of the fourth moment of the data or population. This number is related to the tails of the distribution, not its peak; hence, for this measure, higher kurtosis is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations. The kurtosis of any normal distribution is 3, and it is common to compare the kurtosis of a distribution to this value. Distributions with kurtosis less than 3 are said to be platykurtic, although this does not imply the distribution is "flat-topped" as sometimes reported. Rather, it means the distribution produces fewer and less extreme outliers than does the normal distribution; an example of a platykurtic distribution is the uniform distribution, which does not produce outliers. Distributions with kurtosis greater than 3 are said to be leptokurtic. It is also common practice to use an adjusted version of Pearson's kurtosis, the excess kurtosis, which is the kurtosis minus 3, to provide the comparison to the normal distribution. Some authors use "kurtosis" by itself to refer to the excess kurtosis; for clarity and generality, however, this article follows the non-excess convention and explicitly indicates where excess kurtosis is meant. An alternative measure of kurtosis is the L-kurtosis, which is a scaled version of the fourth L-moment; such measures are analogous to the measures of skewness that are not based on ordinary moments. The kurtosis is the fourth standardized moment, defined as Kurt[X] = μ4/σ⁴ = E[(X − μ)⁴] / (E[(X − μ)²])², where μ4 is the fourth central moment and σ is the standard deviation. Several letters are used in the literature to denote the kurtosis.
A very common choice is κ, which is fine as long as it is clear that it does not refer to a cumulant. Other choices include γ2, to be similar to the notation for skewness, although sometimes this is instead reserved for the excess kurtosis. The kurtosis is bounded below by the squared skewness plus 1: μ4/σ⁴ ≥ (μ3/σ³)² + 1, where μ3 is the third central moment; the lower bound is realized by the Bernoulli distribution. There is no upper limit to the excess kurtosis of a general probability distribution. A reason why some authors favor the excess kurtosis is that cumulants are extensive, so formulas related to the extensive property are more naturally expressed in terms of the excess kurtosis. For example, let X1, …, Xn be independent random variables for which the fourth moment exists, and let Y be their sum. The excess kurtosis of Y is Kurt[Y] − 3 = (1 / (∑_{i=1}^n σ_i²)²) · ∑_{i=1}^n σ_i⁴ · (Kurt[X_i] − 3), where σ_i is the standard deviation of X_i. In particular, if all of the Xi have the same variance, this simplifies to Kurt[Y] − 3 = (1/n²) ∑_{i=1}^n (Kurt[X_i] − 3). The reason not to subtract off 3 is that the bare fourth moment better generalizes to multivariate distributions, especially when independence is not assumed.
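A minimal sketch of the (non-excess) population kurtosis, using only Python's standard library; the function name kurtosis is ours:

```python
import statistics

def kurtosis(xs, excess=False):
    """Fourth standardized moment, population convention (divide by n)."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    k = sum(((x - mu) / sigma) ** 4 for x in xs) / len(xs)
    return k - 3 if excess else k

# A discrete uniform sample is platykurtic: its kurtosis (about 1.73
# here) is below the normal distribution's value of 3.
uniform = [1, 2, 3, 4, 5, 6]
assert kurtosis(uniform) < 3
assert kurtosis(uniform, excess=True) == kurtosis(uniform) - 3
```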
5.
Probability theory
–
Probability theory is the branch of mathematics concerned with probability, the analysis of random phenomena. Although it is not possible to predict precisely the results of random events, large collections of such events exhibit regular patterns; two representative mathematical results describing such patterns are the law of large numbers and the central limit theorem. As a mathematical foundation for statistics, probability theory is essential to many human activities that involve quantitative analysis of large sets of data. Methods of probability theory also apply to descriptions of complex systems given only partial knowledge of their state; a great discovery of twentieth-century physics was the probabilistic nature of physical phenomena at atomic scales, described in quantum mechanics. Christiaan Huygens published a book on the subject in 1657. Initially, probability theory mainly considered discrete events, and its methods were mainly combinatorial. Eventually, analytical considerations compelled the incorporation of continuous variables into the theory; this culminated in modern probability theory, on foundations laid by Andrey Nikolaevich Kolmogorov. Kolmogorov combined the notion of sample space, introduced by Richard von Mises, with measure theory. This became the mostly undisputed axiomatic basis for modern probability theory. Most introductions to probability theory treat discrete probability distributions and continuous probability distributions separately; the more mathematically advanced, measure theory-based treatment of probability covers both. Consider an experiment that can produce a number of outcomes. The set of all outcomes is called the sample space of the experiment. The power set of the sample space is formed by considering all different collections of possible results. For example, rolling an honest die produces one of six possible results. One collection of possible results corresponds to getting an odd number. Thus, the subset {1, 3, 5} is an element of the power set of the sample space of die rolls. These collections are called events.
In this case, {1, 3, 5} is the event that the die falls on some odd number. If the results that actually occur fall in a given event, that event is said to have occurred. Probability is a way of assigning every event a value between zero and one, with the requirement that the event made up of all possible results be assigned a value of one. For example, the probability that any one of the events {1, 6}, {3}, or {2, 4} will occur is 5/6. This is the same as saying that the probability of the event {1, 2, 3, 4, 6} is 5/6; this event encompasses the possibility of any number except five being rolled. The mutually exclusive event {5} has a probability of 1/6, and the event {1, 2, 3, 4, 5, 6} has a probability of 1, that is, absolute certainty. Discrete probability theory deals with events that occur in countable sample spaces. Modern definition: the modern definition starts with a finite or countable set called the sample space, which relates to the set of all possible outcomes in the classical sense, denoted by Ω.
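The die-roll probability assignments discussed above can be checked mechanically. This sketch assumes equally likely outcomes and uses exact fractions; the helper name prob is ours:

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of an honest die

def prob(event):
    """Probability of an event (a subset of the sample space)."""
    return Fraction(len(event & sample_space), len(sample_space))

# The union of the mutually exclusive events {1, 6}, {3} and {2, 4}
# is {1, 2, 3, 4, 6}: every result except five.
union = {1, 6} | {3} | {2, 4}
assert prob(union) == Fraction(5, 6)
assert prob({1, 6}) + prob({3}) + prob({2, 4}) == Fraction(5, 6)
assert prob({5}) == Fraction(1, 6)
assert prob(sample_space) == 1      # certainty
```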
6.
Standard deviation
–
In statistics, the standard deviation is a measure that is used to quantify the amount of variation or dispersion of a set of data values. The standard deviation of a random variable, statistical population, data set, or probability distribution is the square root of its variance. It is algebraically simpler, though in practice less robust, than the average absolute deviation. A useful property of the standard deviation is that, unlike the variance, it is expressed in the same units as the data. There are also other measures of deviation from the norm, including mean absolute deviation. In addition to expressing the variability of a population, the standard deviation is commonly used to measure confidence in statistical conclusions. For example, the margin of error in polling data is determined by calculating the expected standard deviation in the results if the same poll were to be conducted multiple times. This derivation of a standard deviation is often called the standard error of the estimate or standard error of the mean when referring to a mean. It is computed as the standard deviation of all the means that would be computed from that population if an infinite number of samples were drawn. It is very important to note that the standard deviation of a population and the standard error of a statistic derived from that population (such as the mean) are quite different but related. The reported margin of error of a poll is computed from the standard error of the mean and is typically about twice that value (the half-width of a 95 percent confidence interval). The standard deviation is also important in finance, where the standard deviation on the rate of return on an investment is a measure of the volatility of the investment. For a finite set of numbers, the standard deviation is found by taking the square root of the average of the squared deviations of the values from their average value. For example, suppose the marks of a class of eight students are the eight values 2, 4, 4, 4, 5, 5, 7, 9. These eight data points have a mean of 5, since (2 + 4 + 4 + 4 + 5 + 5 + 7 + 9)/8 = 5. This formula is valid only if the eight values with which we began form the complete population. Suppose instead that the eight values were a sample drawn from some large parent population.
In that case the result would be called the sample standard deviation. Dividing by n − 1 rather than by n gives an unbiased estimate of the variance of the larger parent population; this is known as Bessel's correction. As a slightly more complicated real-life example, the average height for adult men in the United States is about 70 inches, with a standard deviation of around 3 inches.
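The eight-mark example works out as follows in Python, where the standard library already distinguishes the population formula (divide by n) from the Bessel-corrected sample formula (divide by n − 1):

```python
import statistics

marks = [2, 4, 4, 4, 5, 5, 7, 9]
assert statistics.fmean(marks) == 5.0

# Population standard deviation: average the squared deviations over n = 8.
pop_sd = statistics.pstdev(marks)    # sqrt(32 / 8) = 2.0
assert pop_sd == 2.0

# Sample standard deviation: Bessel's correction divides by n - 1 = 7.
sample_sd = statistics.stdev(marks)  # sqrt(32 / 7), about 2.138
assert pop_sd < sample_sd < 2.14
```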
7.
Skewness
–
In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive or negative, or even undefined. The qualitative interpretation of the skew is complicated and unintuitive. Skew must not be thought to refer to the direction the curve appears to be "leaning"; in fact, conversely, positive skew indicates that the tail on the right side is longer or fatter than that on the left side. In cases where one tail is long but the other tail is fat, skewness does not obey a simple rule. Further, in multimodal distributions and discrete distributions, skewness is also difficult to interpret. Importantly, the skewness does not determine the relationship of mean and median. In cases where it is necessary, data might be transformed to have a normal distribution. Consider the two distributions in the figure just below. Within each graph, the values on the right side of the distribution taper differently from the values on the left side. Negative skew: the left tail is longer and the mass of the distribution is concentrated on the right of the figure; a left-skewed distribution usually appears as a right-leaning curve. Positive skew: the right tail is longer and the mass of the distribution is concentrated on the left of the figure; a right-skewed distribution usually appears as a left-leaning curve. Skewness in a data series may sometimes be observed not only graphically but by simple inspection of the values. For instance, consider a sequence whose values are evenly distributed around a central value of 50. If the distribution is symmetric, then the mean is equal to the median; if, in addition, the distribution is unimodal, then the mean = median = mode. This is the case of a coin toss or the series 1, 2, 3, 4. Note, however, that the converse is not true in general, i.e. zero skewness does not imply that the mean is equal to the median. Paul T. von Hippel points out that many textbooks teach a rule of thumb stating that the mean is right of the median under right skew, and that this rule fails with surprising frequency.
It can fail in multimodal distributions, or in distributions where one tail is long but the other is heavy. Most commonly, though, the rule fails in discrete distributions where the areas to the left and right of the median are not equal. Such distributions not only contradict the textbook relationship between mean, median, and skew; they also contradict the textbook interpretation of the median. The skewness of a random variable X is the third standardized moment, γ1 = E[((X − μ)/σ)³] = μ3/σ³ = κ3/κ2^{3/2}. It is sometimes referred to as Pearson's moment coefficient of skewness, or simply the moment coefficient of skewness. The last equality expresses skewness in terms of the ratio of the third cumulant κ3 to the 1.5th power of the second cumulant κ2. This is analogous to the definition of kurtosis as the fourth cumulant normalized by the square of the second cumulant. The skewness is also sometimes denoted Skew[X]. Starting from a standard cumulant expansion around a normal distribution, one can show that skewness ≈ 6(mean − median)/standard deviation, with higher-order correction terms.
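A minimal population-convention skewness in standard-library Python (the function name is ours); it confirms that a symmetric sample has zero skewness while a long right tail gives a positive value:

```python
import statistics

def skewness(xs):
    """Third standardized moment, population convention (divide by n)."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return sum(((x - mu) / sigma) ** 3 for x in xs) / len(xs)

# A symmetric sample such as 1, 2, 3, 4 has zero skewness ...
assert abs(skewness([1, 2, 3, 4])) < 1e-12
# ... while a long right tail makes the skewness positive.
assert skewness([1, 1, 2, 2, 3, 10]) > 0
```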
8.
Measure (mathematics)
–
In mathematical analysis, a measure on a set is a systematic way to assign a number to each suitable subset of that set, intuitively interpreted as its size. In this sense, a measure is a generalization of the concepts of length, area, and volume. For instance, the Lebesgue measure of the interval [0, 1] in the real numbers is its length in the everyday sense of the word, specifically 1. Technically, a measure is a function that assigns a non-negative real number or +∞ to subsets of a set X. It must further be countably additive: the measure of a subset that can be decomposed into a finite (or countably infinite) number of smaller disjoint subsets is the sum of the measures of the smaller subsets. In general, if one wants to associate a consistent size to each subset of a set while satisfying the other axioms of a measure, one only finds trivial examples like the counting measure. This problem was resolved by defining measure only on a sub-collection of all subsets, the so-called measurable subsets, which are required to form a σ-algebra; this means that countable unions, countable intersections, and complements of measurable subsets are measurable. Non-measurable sets in a Euclidean space, on which the Lebesgue measure cannot be defined consistently, are necessarily complicated in the sense of being badly mixed up with their complement. Indeed, their existence is a non-trivial consequence of the axiom of choice. Measure theory was developed in successive stages during the late 19th and early 20th centuries by Émile Borel, Henri Lebesgue, and Johann Radon, among others. The main applications of measures are in the foundations of the Lebesgue integral and in Andrey Kolmogorov's axiomatisation of probability theory; probability theory considers measures that assign to the whole set a size of 1, and considers measurable subsets to be events whose probability is given by the measure. Ergodic theory considers measures that are invariant under, or arise naturally from, a dynamical system. Let X be a set and Σ a σ-algebra over X. A function μ from Σ to the extended real number line is called a measure if it satisfies the following properties. Non-negativity: for all E in Σ, μ(E) ≥ 0.
Countable additivity: for all countable collections {E_k}_{k=1}^∞ of pairwise disjoint sets in Σ, μ(⋃_{k=1}^∞ E_k) = ∑_{k=1}^∞ μ(E_k). One may also require that at least one set E has finite measure. Then the empty set automatically has measure zero because of countable additivity: μ(E) = μ(E ∪ ∅ ∪ ∅ ∪ …) = μ(E) + μ(∅) + μ(∅) + …, which implies that μ(∅) = 0. The pair (X, Σ) is called a measurable space, and the members of Σ are called measurable sets. If (X, Σ_X) and (Y, Σ_Y) are two measurable spaces, then a function f : X → Y is called measurable if for every Y-measurable set B ∈ Σ_Y, the inverse image f⁻¹(B) is in Σ_X. See also Measurable function#Caveat about another setup. A triple (X, Σ, μ) is called a measure space. A probability measure is a measure with total measure one, i.e. μ(X) = 1; a probability space is a measure space with a probability measure.
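As a toy illustration of the definition, the counting measure on a finite set satisfies these axioms. The sketch below checks only non-negativity, the null empty set, and (finite) additivity on two disjoint sets, which is all that can be verified computationally:

```python
def counting_measure(subset):
    """mu(E) = number of elements of E, defined on all subsets of X."""
    return len(subset)

X = {1, 2, 3, 4, 5, 6}
E1, E2 = {1, 2}, {3, 4, 5}   # pairwise disjoint measurable sets

assert counting_measure(set()) == 0            # null empty set
assert counting_measure(E1) >= 0               # non-negativity
# Additivity on disjoint sets (the finite case of countable additivity).
assert counting_measure(E1 | E2) == counting_measure(E1) + counting_measure(E2)
```

Normalizing by counting_measure(X) would turn this into a probability measure with total mass 1.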
9.
Joint probability distribution
–
Given two or more random variables defined on the same probability space, their joint probability distribution gives the probability that each variable falls in any particular range or discrete set of values. In the case of only two random variables, this is called a bivariate distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution. The joint probability distribution can be expressed either in terms of a joint cumulative distribution function or in terms of a joint probability density function (for continuous variables) or joint probability mass function (for discrete variables). Consider the flip of two fair coins, and let A and B be discrete random variables associated with the outcomes of the first and second coin flips respectively. If a coin displays heads then the associated random variable is 1, and 0 otherwise. The joint probability mass function of A and B defines probabilities for each pair of outcomes. All possible outcomes are (A = 0, B = 0), (A = 0, B = 1), (A = 1, B = 0), and (A = 1, B = 1). Since each outcome is equally likely, the joint probability mass function becomes P(A, B) = 1/4 when A, B ∈ {0, 1}. Since the coin flips are independent, the joint probability mass function is the product of the marginals: P(A, B) = P(A)·P(B). Each coin flip is a Bernoulli trial, so the sequence of flips forms a Bernoulli process. Now consider the roll of a fair die, and let A = 1 if the number is even (i.e. 2, 4, or 6) and A = 0 otherwise. Furthermore, let B = 1 if the number is prime (i.e. 2, 3, or 5) and B = 0 otherwise. Then the joint distribution of A and B, expressed as a probability mass function, is P(A = 0, B = 0) = P({1}) = 1/6, P(A = 0, B = 1) = P({3, 5}) = 2/6, P(A = 1, B = 0) = P({4, 6}) = 2/6, P(A = 1, B = 1) = P({2}) = 1/6. These probabilities necessarily sum to 1, since the probability of some combination of A and B occurring is 1. The joint probability mass function of two discrete random variables X, Y is P(X = x and Y = y) = P(X = x | Y = y)·P(Y = y) = P(Y = y | X = x)·P(X = x). In the continuous case, since these are probability distributions, one has ∫_x ∫_y f_{X,Y}(x, y) dy dx = 1; formally, f_{X,Y} is the probability density function of (X, Y) with respect to the product measure on the respective supports of X and Y. Two discrete random variables X and Y are independent if the joint probability mass function satisfies P(X = x and Y = y) = P(X = x)·P(Y = y) for all x and y.
Similarly, two absolutely continuous random variables are independent if f_{X,Y}(x, y) = f_X(x)·f_Y(y) for all x and y. Such independence and conditional independence relations can be represented with a Bayesian network or with copula functions.
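The even/prime die example above can be computed exhaustively. This sketch also shows that A and B fail the independence criterion P(A, B) = P(A)·P(B):

```python
from fractions import Fraction
from itertools import product

# One roll of a fair die: A = 1 if the number is even, B = 1 if prime.
outcomes = range(1, 7)
primes = {2, 3, 5}

joint = {}
for a, b in product((0, 1), repeat=2):
    matching = [n for n in outcomes
                if (n % 2 == 0) == a and (n in primes) == b]
    joint[(a, b)] = Fraction(len(matching), 6)

assert joint[(0, 0)] == Fraction(1, 6)   # {1}
assert joint[(0, 1)] == Fraction(2, 6)   # {3, 5}
assert joint[(1, 0)] == Fraction(2, 6)   # {4, 6}
assert joint[(1, 1)] == Fraction(1, 6)   # {2}
assert sum(joint.values()) == 1

# A and B are not independent: P(A=1, B=1) != P(A=1) * P(B=1).
p_a1 = joint[(1, 0)] + joint[(1, 1)]     # 1/2
p_b1 = joint[(0, 1)] + joint[(1, 1)]     # 1/2
assert joint[(1, 1)] != p_a1 * p_b1
```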
10.
Quantile function
–
In probability and statistics, the quantile function of a probability distribution specifies, for a given probability p, the value below which random draws from the given cumulative distribution function (c.d.f.) would fall p percent of the time. It is also called the percent-point function or inverse cumulative distribution function. In terms of the distribution function F, the quantile function Q returns the value x such that F(x) = Pr(X ≤ x) = p. Another way to express the quantile function is Q(p) = inf {x ∈ ℝ : p ≤ F(x)} for a probability 0 < p < 1. Here we capture the fact that the quantile function returns the minimum value of x from amongst all those values whose c.d.f. value equals or exceeds p; when the c.d.f. is not strictly increasing, we need to use this more general formula. For example, the cumulative distribution function of the Exponential(λ) distribution is F(x; λ) = 1 − e^{−λx} for x ≥ 0, and 0 for x < 0. The quantile function for Exponential(λ) is derived by finding the value of Q for which 1 − e^{−λQ} = p, giving Q(p; λ) = −ln(1 − p)/λ for 0 ≤ p < 1. The quartiles are therefore: first quartile ln(4/3)/λ, median ln(2)/λ, third quartile ln(4)/λ. Quantile functions are used in both statistical applications and Monte Carlo methods. The quantile function, Q, of a probability distribution is the inverse of its cumulative distribution function F. The derivative of the quantile function, namely the quantile density function, is yet another way of prescribing a probability distribution; it is the reciprocal of the pdf composed with the quantile function. For statistical applications, users need to know key percentage points of a given distribution. Before the popularization of computers, it was not uncommon for books to have appendices with statistical tables sampling the quantile function; statistical applications of quantile functions are discussed extensively by Gilchrist. Monte Carlo simulations employ quantile functions to produce non-uniform random or pseudorandom numbers for use in diverse types of simulation calculations: a sample from a given distribution may be obtained in principle by applying its quantile function to a sample from a uniform distribution. When the cdf itself has a closed-form expression, one can always use a numerical root-finding algorithm such as the bisection method to invert the cdf.
Other algorithms to evaluate quantile functions are given in the Numerical Recipes series of books, and algorithms for common distributions are built into many statistical software packages. Quantile functions may also be characterized as solutions of non-linear ordinary differential equations; the ordinary differential equations for the cases of the normal, Student, beta and gamma distributions have been given and solved. The normal distribution is perhaps the most important case. Unfortunately, its quantile function has no closed-form representation using basic algebraic functions; as a result, approximate representations are usually used. Thorough composite rational and polynomial approximations have been given by Wichura, and non-composite rational approximations have been developed by Shaw. A non-linear ordinary differential equation for the normal quantile w(p) is d²w/dp² = w·(dw/dp)², with the centre conditions w(1/2) = 0 and w′(1/2) = √(2π). This equation may be solved by several methods, including the classical power series approach.
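Both routes described above, the closed-form inverse for the exponential distribution and generic bisection on the cdf, can be sketched as follows (the function names and the bracketing interval are our illustrative choices):

```python
import math

def exponential_cdf(x, lam=1.0):
    """F(x) = 1 - exp(-lam * x) for x >= 0, else 0."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

def exponential_quantile(p, lam=1.0):
    """Closed-form inverse of the exponential cdf: -ln(1 - p) / lam."""
    return -math.log(1 - p) / lam

def quantile_by_bisection(cdf, p, lo=0.0, hi=1e6, tol=1e-12):
    """Invert a continuous, increasing cdf by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2

median = exponential_quantile(0.5)            # ln(2) / lam
assert abs(median - math.log(2)) < 1e-12
assert abs(quantile_by_bisection(exponential_cdf, 0.5) - median) < 1e-6
```

Applying exponential_quantile to uniform random draws yields exponentially distributed samples, the inverse-transform method used in Monte Carlo simulation.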
11.
Binomial distribution
–
The binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent yes/no trials, each with success probability p. The binomial distribution is the basis for the popular binomial test of statistical significance. The binomial distribution is used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution. However, for N much larger than n, the binomial distribution remains a good approximation. In general, if the random variable X follows the binomial distribution with parameters n ∈ ℕ and p ∈ [0, 1], the probability of getting exactly k successes in n trials is given by the probability mass function f(k; n, p) = Pr(X = k) = C(n, k)·p^k·(1 − p)^{n−k} for k = 0, 1, …, n, where C(n, k) = n!/(k!(n − k)!) is the binomial coefficient, hence the name of the distribution. The formula can be understood as follows: k successes occur with probability p^k and n − k failures occur with probability (1 − p)^{n−k}. However, the k successes can occur anywhere among the n trials, and there are C(n, k) different ways of distributing k successes in a sequence of n trials. In creating reference tables for binomial distribution probability, usually the table is filled in up to n/2 values. This is because for k > n/2, the probability can be calculated by its complement as f(k; n, p) = f(n − k; n, 1 − p). The probability mass function satisfies the recurrence relation f(k + 1; n, p) = [(n − k)/(k + 1)]·[p/(1 − p)]·f(k; n, p) for every n and p. Looking at the expression f(k; n, p) as a function of k, there is a value of k that maximizes it. This k value can be found by calculating the ratio f(k + 1; n, p)/f(k; n, p) = (n − k)p/((k + 1)(1 − p)) and comparing it to 1. There is always an integer M that satisfies (n + 1)p − 1 ≤ M < (n + 1)p. When (n + 1)p is not an integer, f is monotone increasing for k < M and monotone decreasing for k > M; when (n + 1)p is an integer, there are two values for which f is maximal, (n + 1)p and (n + 1)p − 1. M is the most probable outcome of the Bernoulli trials and is called the mode; note that the probability of it occurring can be fairly small. The cumulative distribution function can also be represented in terms of the regularized incomplete beta function. Some closed-form bounds for the cumulative distribution function are given below. Suppose a biased coin comes up heads with probability 0.3 when tossed: what is the probability of achieving 0, 1, or 2 heads in a given number of tosses?
If X ~ B(n, p), the expected value of X is E[X] = np. For example, if n = 100 and p = 1/4, then the average number of successes is np = 25; this follows from E[X] = ∑_{k=0}^{n} k·f(k; n, p) = np. For the variance, since Var(Xi) = p(1 − p) for each of the n independent Bernoulli trials Xi making up X, we get Var(X) = Var(X1) + ⋯ + Var(Xn) = n·Var(X1) = np(1 − p).
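A direct implementation of the probability mass function lets one verify the complement symmetry and the mean np numerically. This sketch assumes the p = 0.3 biased coin mentioned above and an illustrative choice of n = 6 tosses:

```python
from math import comb

def binom_pmf(k, n, p):
    """f(k; n, p) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 6, 0.3
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
assert abs(sum(pmf) - 1) < 1e-12            # probabilities sum to 1

# Complement symmetry used when tabulating: f(k; n, p) = f(n-k; n, 1-p).
assert abs(binom_pmf(2, n, p) - binom_pmf(4, n, 1 - p)) < 1e-12

# The mean of the distribution is n * p.
mean = sum(k * q for k, q in enumerate(pmf))
assert abs(mean - n * p) < 1e-12
```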
12.
Probability distribution
–
In probability and statistics, a probability distribution is a mathematical description that provides the probabilities of occurrence of the different possible outcomes of an experiment. For instance, if the random variable X is used to denote the outcome of a coin toss, then the probability distribution of X would take the value 0.5 for X = heads, and 0.5 for X = tails (assuming the coin is fair). In more technical terms, the probability distribution is a description of a random phenomenon in terms of the probabilities of events. Examples of random phenomena can include the results of an experiment or survey. A probability distribution is defined in terms of an underlying sample space, which is the set of all possible outcomes of the random phenomenon being observed. The sample space may be the set of real numbers or a higher-dimensional vector space, or it may be a list of non-numerical values; for example, the sample space of a coin flip would be {heads, tails}. Probability distributions are divided into two classes. A discrete probability distribution can be encoded by a discrete list of the probabilities of the outcomes; on the other hand, a continuous probability distribution is typically described by probability density functions. The normal distribution represents a commonly encountered continuous probability distribution. More complex experiments, such as those involving stochastic processes defined in continuous time, may demand the use of more general probability measures. A probability distribution whose sample space is the set of real numbers is called univariate, while a distribution whose sample space is a vector space is called multivariate. Important and commonly encountered univariate probability distributions include the binomial distribution and the hypergeometric distribution. The multivariate normal distribution is a commonly encountered multivariate distribution. To define probability distributions for the simplest cases, one needs to distinguish between discrete and continuous random variables. For a continuous random variable, the probability of any single exact value is zero; for example, the probability that an object weighs exactly 500 g is zero, because the probability mass is spread over a continuum of possible weights. Continuous probability distributions can be described in several ways; the cumulative distribution function is the antiderivative of the probability density function, provided that the latter function exists.
As probability theory is used in diverse applications, terminology is not uniform, and the following terms are variously used for probability distribution functions. A probability distribution can be given as a table that displays the probabilities of the outcomes in a sample; it could be called a normalized frequency distribution table, in which all occurrences of outcomes sum to 1. A distribution function is then a functional form of such a frequency distribution table, and a probability distribution function is a functional form of a probability distribution table.
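The discrete/continuous distinction above can be made concrete in a few lines; the fair-coin table and the uniform density on [0, 1] are our illustrative choices:

```python
# Discrete: a fair coin toss is fully described by a finite table
# of probabilities (a probability mass function) summing to 1.
coin_pmf = {"heads": 0.5, "tails": 0.5}
assert sum(coin_pmf.values()) == 1.0

def uniform_pdf(x):
    """Density of the continuous uniform distribution on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def uniform_cdf(x):
    """Its antiderivative, the cumulative distribution function."""
    return min(max(x, 0.0), 1.0)

# Continuous: probabilities come from the cdf, P(a < X <= b) = F(b) - F(a);
# any single exact value has probability zero.
assert abs((uniform_cdf(0.7) - uniform_cdf(0.2)) - 0.5) < 1e-12
assert uniform_cdf(0.5) - uniform_cdf(0.5) == 0.0
```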