Beta distribution
In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval [0, 1], parametrized by two positive shape parameters, denoted by α and β, that appear as exponents of the random variable and control the shape of the distribution. It is a special case of the Dirichlet distribution. The beta distribution has been applied to model the behavior of random variables limited to intervals of finite length in a wide variety of disciplines. In Bayesian inference, the beta distribution is the conjugate prior probability distribution for the Bernoulli, binomial, negative binomial and geometric distributions. For example, the beta distribution can be used in Bayesian analysis to describe initial knowledge concerning probability of success, such as the probability that a space vehicle will complete a specified mission; the beta distribution is a suitable model for the random behavior of proportions. The usual formulation of the beta distribution is known as the beta distribution of the first kind, whereas beta distribution of the second kind is an alternative name for the beta prime distribution.
The probability density function of the beta distribution, for 0 ≤ x ≤ 1 and shape parameters α, β > 0, is a power function of the variable x and of its reflection (1 − x), as follows:

f(x; α, β) = constant · x^(α−1) (1 − x)^(β−1) = x^(α−1) (1 − x)^(β−1) / ∫₀¹ u^(α−1) (1 − u)^(β−1) du = [Γ(α + β) / (Γ(α) Γ(β))] · x^(α−1) (1 − x)^(β−1) = x^(α−1) (1 − x)^(β−1) / B(α, β)

where Γ is the gamma function. The beta function, B, is a normalization constant to ensure that the total probability is 1. In the above equations x is a realization—an observed value that occurred—of a random process X. This definition includes both ends x = 0 and x = 1, consistent with definitions for other continuous distributions supported on a bounded interval which are special cases of the beta distribution, for example the arcsine distribution, and consistent with several authors, like N. L. Johnson and S. Kotz. However, the inclusion of x = 0 and x = 1 does not work for α, β < 1, since the density is then unbounded at the corresponding endpoint. Several authors, including N. L. Johnson and S. Kotz, use the symbols p and q for the shape parameters of the beta distribution, reminiscent of the symbols traditionally used for the parameters of the Bernoulli distribution, because the beta distribution approaches the Bernoulli distribution in the limit when both shape parameters α and β approach the value of zero.
In the following, a random variable X beta-distributed with parameters α and β will be denoted by X ∼ Beta(α, β). Another notation for beta-distributed random variables used in the statistical literature is X ∼ Be(α, β).
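The density above can be evaluated directly from its gamma-function form. A minimal Python sketch (the function name beta_pdf is illustrative, not standard):

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in [0, 1], using the form
    f(x) = Gamma(a+b) / (Gamma(a) Gamma(b)) * x**(a-1) * (1-x)**(b-1)."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * x ** (a - 1) * (1 - x) ** (b - 1)

# Beta(1, 1) is the uniform distribution on [0, 1]: density 1 everywhere.
print(beta_pdf(0.3, 1, 1))  # 1.0
# For alpha = beta = 2 the density is symmetric and peaks at x = 0.5.
print(beta_pdf(0.5, 2, 2))  # 1.5
```

For numerical stability with large shape parameters, the constant is usually computed on the log scale with math.lgamma instead.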
Skewness
In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, negative, or undefined. For a unimodal distribution, negative skew indicates that the tail is on the left side of the distribution, and positive skew indicates that the tail is on the right. In cases where one tail is long but the other tail is fat, skewness does not obey a simple rule. For example, a zero value means that the tails on both sides of the mean balance out overall; this is the case for a symmetric distribution, but can also be true for an asymmetric distribution where one tail is long and thin and the other is short but fat. Consider the two distributions in the figure just below. Within each graph, the values on the right side of the distribution taper differently from the values on the left side; these tapering sides are called tails, and they provide a visual means to determine which of the two kinds of skewness a distribution has: negative skew: the left tail is longer. The distribution is said to be left-skewed, left-tailed, or skewed to the left, despite the fact that the curve itself appears to be skewed or leaning to the right; a left-skewed distribution appears as a right-leaning curve.
Positive skew: the right tail is longer. The distribution is said to be right-skewed, right-tailed, or skewed to the right, despite the fact that the curve itself appears to be skewed or leaning to the left; a right-skewed distribution appears as a left-leaning curve. Skewness in a data series may sometimes be observed not only graphically but by simple inspection of the values. For instance, consider a numeric sequence whose values are evenly distributed around a central value of 50. We can transform this sequence into a negatively skewed distribution by adding a value far below the mean, and make it positively skewed by adding a value far above the mean. The skewness is not directly related to the relationship between the mean and median: a distribution with negative skew can have its mean greater than or less than the median, and likewise for positive skew. In the older notion of nonparametric skew, defined as (μ − ν) / σ, where μ is the mean, ν is the median, and σ is the standard deviation, the skewness is defined in terms of this relationship: positive/right nonparametric skew means the mean is greater than the median, while negative/left nonparametric skew means the mean is less than the median.
However, the modern definition of skewness and the traditional nonparametric definition do not in general have the same sign: while they agree for some families of distributions, they differ in general, and conflating them is misleading. If the distribution is symmetric, the mean is equal to the median and the distribution has zero skewness. If the distribution is both symmetric and unimodal, then mean = median = mode; this is the case for a coin toss or the series 1, 2, 3, 4, ... Note that the converse is not true in general, i.e. zero skewness does not imply that the mean is equal to the median. A 2005 journal article points out: Many textbooks teach a rule of thumb stating that the mean is right of the median under right skew, and left of the median under left skew; this rule fails with surprising frequency. It can fail in multimodal distributions, or in distributions where one tail is long but the other is heavy. Most commonly, though, the rule fails in discrete distributions where the areas to the left and right of the median are not equal.
Such distributions not only contradict the textbook relationship between mean and skew, they contradict the textbook interpretation of the median. The skewness of a random variable X is the third standardized moment γ₁, defined as: γ₁ = E[((X − μ)/σ)³] = μ₃/σ³ = E[(X − μ)³] / (E[(X − μ)²])^(3/2) = κ₃ / κ₂^(3/2), where μ is the mean, σ is the standard deviation, E is the expectation operator, μ₃ is the third central moment, and κₜ are the t-th cumulants.
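The moment definition of γ₁ can be computed for a finite sample by replacing expectations with sample averages. A rough Python sketch (population-style estimator, with no small-sample bias correction):

```python
def skewness(xs):
    """Third standardized moment gamma_1 = mu_3 / sigma**3 (population form)."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n   # second central moment (variance)
    m3 = sum((x - mu) ** 3 for x in xs) / n   # third central moment
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))        # 0.0 -- symmetric data
print(skewness([1, 2, 3, 4, 100]) > 0)  # True -- long right tail
```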
Statistics
Statistics is a branch of mathematics dealing with data collection, analysis and presentation. In applying statistics to, for example, a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse topics such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments (see glossary of probability and statistics). When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can reasonably extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements.
In contrast, an observational study does not involve experimental manipulation. Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation. Descriptive statistics are most concerned with two sets of properties of a distribution: central tendency seeks to characterize the distribution's central or typical value, while dispersion characterizes the extent to which members of the distribution depart from its center and from each other. Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena. A standard statistical procedure involves the test of the relationship between two statistical data sets, or between a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between the two data sets.
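The two sets of descriptive properties just mentioned can be illustrated with the two indexes named above, the mean (central tendency) and the standard deviation (dispersion). A small Python sketch using a made-up sample:

```python
import math

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data, not from the source

# Central tendency: the arithmetic mean.
mean = sum(sample) / len(sample)

# Dispersion: the population standard deviation, i.e. the root of the
# average squared deviation from the mean.
var = sum((x - mean) ** 2 for x in sample) / len(sample)
std = math.sqrt(var)

print(mean)  # 5.0
print(std)   # 2.0
```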
Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null hypothesis can be proven false, given the data that are used in the test. Working from a null hypothesis, two basic forms of error are recognized: Type I errors and Type II errors. Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis. Measurement processes that generate statistical data are subject to error. Many of these errors are classified as random or systematic, but other types of errors can be important; the presence of missing data or censoring may result in biased estimates, and specific techniques have been developed to address these problems. Statistics can be said to have begun in ancient civilization, going back at least to the 5th century BC, but it was not until the 18th century that it started to draw more from calculus and probability theory. In more recent years statistics has relied more on statistical software to produce analyses such as descriptive statistics.
Some definitions are: the Merriam-Webster dictionary defines statistics as "a branch of mathematics dealing with the collection, analysis and presentation of masses of numerical data." Statistician Arthur Lyon Bowley defines statistics as "Numerical statements of facts in any department of inquiry placed in relation to each other." Statistics is a mathematical body of science that pertains to the collection, analysis, interpretation or explanation, and presentation of data, or as a branch of mathematics. Some consider statistics to be a distinct mathematical science rather than a branch of mathematics. While many scientific investigations make use of data, statistics is concerned with the use of data in the context of uncertainty and decision making in the face of uncertainty. Mathematical statistics is the application of mathematics to statistics. Mathematical techniques used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure-theoretic probability theory.
In applying statistics to a problem, it is common practice to start with a population or process to be studied. Populations can be diverse topics such as "all people living in a country" or "every atom composing a crystal". Ideally, statisticians compile data about the entire population; this may be organized by governmental statistical institutes. Descriptive statistics can be used to summarize the population data. Numerical descriptors include mean and standard deviation for continuous data types, while frequency and percentage are more useful in terms of describing categorical data. When a census is not feasible, a chosen subset of the population, called a sample, is studied. Once a sample representative of the population is determined, data is collected for the sample members in an observational or experimental setting. Again, descriptive statistics can be used to summarize the sample data. However, the drawing of the sample has been subject to an element of randomness, hence the established numerical descriptors from the sample are subject to uncertainty.
To still draw meaningful conclusions about the entire population, inferential statistics is needed.
Maxima and minima
In mathematical analysis, the maxima and minima of a function, known collectively as extrema, are the largest and smallest values of the function, either within a given range or on the entire domain of the function. Pierre de Fermat was one of the first mathematicians to propose a general technique for finding the maxima and minima of functions. As defined in set theory, the maximum and minimum of a set are the greatest and least elements in the set, respectively. Unbounded infinite sets, such as the set of real numbers, have no maximum. A real-valued function f defined on a domain X has a global maximum point at x∗ if f(x∗) ≥ f(x) for all x in X. Similarly, the function has a global minimum point at x∗ if f(x∗) ≤ f(x) for all x in X. The value of the function at a maximum point is called the maximum value of the function, and the value of the function at a minimum point is called the minimum value of the function. Symbolically, this can be written as follows: x₀ ∈ X is a global maximum point of function f: X → ℝ if f(x₀) ≥ f(x) for all x in X.
The definition of a global minimum point is analogous. If the domain X is a metric space, f is said to have a local maximum point at the point x∗ if there exists some ε > 0 such that f(x∗) ≥ f(x) for all x in X within distance ε of x∗. The function has a local minimum point at x∗ if f(x∗) ≤ f(x) for all x in X within distance ε of x∗. A similar definition can be used when X is a topological space, since the definition just given can be rephrased in terms of neighbourhoods. Mathematically, the given definition is written as follows: let (X, d_X) be a metric space and f: X → ℝ a function. Then x₀ ∈ X is a local maximum point of f if there exists ε > 0 such that d_X(x, x₀) < ε ⟹ f(x₀) ≥ f(x). The definition of a local minimum point is analogous. In both the global and local cases, the concept of a strict extremum can be defined. For example, x∗ is a strict global maximum point if, for all x in X with x ≠ x∗, we have f(x∗) > f(x), and x∗ is a strict local maximum point if there exists some ε > 0 such that, for all x in X within distance ε of x∗ with x ≠ x∗, we have f(x∗) > f(x). Note that a point is a strict global maximum point if and only if it is the unique global maximum point, and similarly for minimum points.
A continuous real-valued function with a compact domain always has a maximum point and a minimum point. An important example is a function whose domain is a closed and bounded interval of real numbers. Finding global maxima and minima is the goal of mathematical optimization. If a function is continuous on a closed interval, then by the extreme value theorem global maxima and minima exist. Furthermore, a global maximum either must be a local maximum in the interior of the domain, or must lie on the boundary of the domain. So a method of finding a global maximum is to look at all the local maxima in the interior, look at the maxima of the points on the boundary, and take the largest one. The most important, yet quite obvious, feature of continuous real-valued functions of a real variable is that they decrease before local minima and increase afterwards, and likewise for maxima. A direct consequence of this is Fermat's theorem, which states that local extrema must occur at critical points. One can distinguish whether a critical point is a local maximum or local minimum by using the first derivative test, second derivative test, or higher-order derivative test, given sufficient differentiability.
For any function defined piecewise, one finds a maximum by finding the maximum of each piece separately, and then seeing which one is largest. The function x² has a unique global minimum at x = 0. The function x³ has no global maxima or minima: although its first derivative (3x²) is 0 at x = 0, this is an inflection point. The function x^(−x) has a unique global maximum over the positive real numbers at x = 1/e. The function x³/3 − x has first derivative x² − 1 and second derivative 2x. Setting the first derivative to 0 and solving for x gives stationary points at −1 and +1. From the sign of the second derivative, we can see that −1 is a local maximum and +1 is a local minimum.
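The x³/3 − x example can be checked mechanically with the second derivative test. A short Python sketch:

```python
# f(x) = x**3/3 - x has f'(x) = x**2 - 1 and f''(x) = 2x.
def f(x):
    return x ** 3 / 3 - x

def d2f(x):
    return 2 * x

# Stationary points solve x**2 - 1 = 0, i.e. x = -1 and x = +1.
# Second derivative test: f'' < 0 means local maximum, f'' > 0 local minimum.
for x in (-1.0, 1.0):
    kind = "local maximum" if d2f(x) < 0 else "local minimum"
    print(x, kind, f(x))  # -1 is a local maximum (f = 2/3), +1 a local minimum (f = -2/3)
```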
Median
The median is the value separating the higher half from the lower half of a data sample. For a data set, it may be thought of as the "middle" value. For example, in the data set 1, 3, 3, 6, 7, 8, 9, the median is 6, the fourth largest and also the fourth smallest number in the sample. For a continuous probability distribution, the median is the value such that a number is equally likely to fall above or below it. The median is a commonly used measure of the properties of a data set in statistics and probability theory. The basic advantage of the median in describing data compared to the mean is that it is not skewed so much by large or small values, so it may give a better idea of a "typical" value. For example, in understanding statistics like household income or assets, which vary greatly, a mean may be skewed by a small number of high or low values. Median income, for example, may be a better way to suggest what a "typical" income is. Because of this, the median is of central importance in robust statistics, as it is the most resistant statistic, having a breakdown point of 50%: so long as no more than half the data are contaminated, the median will not give an arbitrarily large or small result.
The median of a finite list of numbers can be found by arranging all the numbers from smallest to greatest. If there is an odd number of numbers, the middle one is picked. For example, consider the list of numbers 1, 3, 3, 6, 7, 8, 9. This list contains seven numbers; the median is the fourth of them, which is 6. If there is an even number of observations, there is no single middle value. For example, in the data set 1, 2, 3, 4, 5, 6, 8, 9, the median is the mean of the middle two numbers: (4 + 5) / 2 = 4.5. The formula used to find the index of the middle number of a data set of n numerically ordered numbers is (n + 1) / 2; this gives either the middle number (for an odd number of values) or the halfway point between the two middle values (for an even number). For example, with 14 values, the formula gives an index of 7.5, and the median is taken by averaging the seventh and eighth values. So the median can be represented by the following formula: median = (a_⌈#x÷2⌉ + a_⌈#x÷2+1⌉) / 2. One can also find the median using a stem-and-leaf plot. There is no accepted standard notation for the median, but some authors represent the median of a variable x as x͂, as μ₁/₂, or sometimes as M.
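The odd/even rule above can be sketched in Python (the helper name median is illustrative; Python's standard statistics module provides an equivalent function):

```python
def median(xs):
    """Median via sorting: the middle element for odd n, the mean of the
    two middle elements for even n (following the (n + 1) / 2 index rule)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([1, 3, 3, 6, 7, 8, 9]))     # 6   (odd count: middle value)
print(median([1, 2, 3, 4, 5, 6, 8, 9]))  # 4.5 (even count: mean of 4 and 5)
```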
In any of these cases, the use of these or other symbols for the median needs to be explicitly defined when they are introduced. The median is used for skewed distributions, which it summarizes differently from the arithmetic mean. Consider the multiset {1, 2, 2, 2, 3, 14}; the median is 2 in this case, and it might be seen as a better indication of central tendency than the arithmetic mean of 4. The median is a popular summary statistic used in descriptive statistics, since it is simple to understand and easy to calculate, while giving a measure that is more robust in the presence of outlier values than is the mean. The oft-cited empirical relationship between the relative locations of the mean and the median for skewed distributions is, however, not generally true. There are, however, various relationships for the absolute difference between them. With an even number of observations, no value need be exactly at the value of the median. Nonetheless, the value of the median is uniquely determined with the usual definition. A related concept, in which the outcome is forced to correspond to a member of the sample, is the medoid.
In a population, at most half have values strictly less than the median and at most half have values strictly greater than it. If each group contains less than half the population, then some of the population is exactly equal to the median. For example, if a < b < c, the median of the list {a, b, c} is b, and if a < b < c < d, the median of the list {a, b, c, d} is the mean of b and c. Indeed, as the median is based on the middle data in a group, it is not necessary to know the value of extreme results in order to calculate it. For example, in a psychology test investigating the time needed to solve a problem, if a small number of people failed to solve the problem at all in the given time, a median can still be calculated. The median can be used as a measure of location when a distribution is skewed, when end-values are not known, or when one requires reduced importance to be attached to outliers, e.g. because they may be measurement errors. A median is only defined on ordered one-dimensional data, and is independent of any distance metric. A geometric median, on the other hand, is defined in any number of dimensions.
The median is one of a number of ways of summarizing the typical values associated with members of a statistical population.
Kurtosis
In probability theory and statistics, kurtosis is a measure of the "tailedness" of the probability distribution of a real-valued random variable. In a similar way to the concept of skewness, kurtosis is a descriptor of the shape of a probability distribution and, just as for skewness, there are different ways of quantifying it for a theoretical distribution and corresponding ways of estimating it from a sample from a population. Depending on the particular measure of kurtosis used, there are various interpretations of kurtosis, and of how particular measures should be interpreted. The standard measure of kurtosis, originating with Karl Pearson, is based on a scaled version of the fourth moment of the data or population. This number is related to the tails of the distribution, not its peak. For this measure, higher kurtosis is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations. The kurtosis of any univariate normal distribution is 3. It is common to compare the kurtosis of a distribution to this value.
Distributions with kurtosis less than 3 are said to be platykurtic, although this does not imply the distribution is "flat-topped" as sometimes reported. Rather, it means the distribution produces fewer and less extreme outliers than does the normal distribution. An example of a platykurtic distribution is the uniform distribution, which does not produce outliers. Distributions with kurtosis greater than 3 are said to be leptokurtic. An example of a leptokurtic distribution is the Laplace distribution, which has tails that asymptotically approach zero more slowly than a Gaussian, and therefore produces more outliers than the normal distribution. It is common practice to use an adjusted version of Pearson's kurtosis, the excess kurtosis, which is the kurtosis minus 3, to provide the comparison to the normal distribution. Some authors use "kurtosis" by itself to refer to the excess kurtosis. For the reason of clarity and generality, this article follows the non-excess convention and explicitly indicates where excess kurtosis is meant.
Alternative measures of kurtosis are: the L-kurtosis, which is a scaled version of the fourth L-moment. These are analogous to the alternative measures of skewness. The kurtosis is the fourth standardized moment, defined as Kurt[X] = E[((X − μ)/σ)⁴] = μ₄/σ⁴ = E[(X − μ)⁴] / (E[(X − μ)²])², where μ₄ is the fourth central moment and σ is the standard deviation. Several letters are used in the literature to denote the kurtosis. A common choice is κ, which is fine as long as it is clear that it does not refer to a cumulant. Other choices include γ₂, to be similar to the notation for skewness, although sometimes this is instead reserved for the excess kurtosis. The kurtosis is bounded below by the squared skewness plus 1: μ₄/σ⁴ ≥ (μ₃/σ³)² + 1, where μ₃ is the third central moment. The lower bound is realized by the Bernoulli distribution. There is no upper limit to the excess kurtosis of a general probability distribution, and it may be infinite. A reason why some authors favor the excess kurtosis is that cumulants are extensive. Formulas related to the extensive property are more naturally expressed in terms of the excess kurtosis.
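The fourth-standardized-moment definition, and the Bernoulli lower bound, can be illustrated with a small sample computation. A Python sketch (population-style estimator, no bias correction):

```python
def kurtosis(xs):
    """Pearson kurtosis: fourth standardized moment mu_4 / sigma**4."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n   # second central moment
    m4 = sum((x - mu) ** 4 for x in xs) / n   # fourth central moment
    return m4 / m2 ** 2

# A balanced 0/1 sample (Bernoulli(1/2)-like) has zero skewness, so the
# lower bound (skewness**2 + 1) is realized exactly: kurtosis = 1.
print(kurtosis([0, 1, 0, 1]))  # 1.0
```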
For example, let X₁, ..., Xₙ be independent random variables for which the fourth moment exists, and let Y be the random variable defined by the sum of the Xᵢ. The excess kurtosis of Y is Kurt[Y] − 3 = (1 / (∑ᵢ σᵢ²)²) · ∑ᵢ σᵢ⁴ · (Kurt[Xᵢ] − 3), where σᵢ is the standard deviation of Xᵢ. In particular, if all of the Xᵢ have the same variance, this simplifies to Kurt[Y] − 3 = (1/n²) ∑ᵢ (Kurt[Xᵢ] − 3).
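The additivity formula above can be cross-checked against known closed forms: a Binomial(n, p) variable is a sum of n independent, identically distributed Bernoulli(p) variables, and the standard excess-kurtosis expressions for the two distributions, (1 − 6pq)/(pq) and (1 − 6pq)/(npq), differ by exactly a factor of n, as the equal-variance case of the formula predicts. A Python sketch:

```python
# Excess kurtosis of Bernoulli(p) is (1 - 6pq)/(pq); for Binomial(n, p),
# a sum of n i.i.d. Bernoulli(p) variables, it is (1 - 6pq)/(npq).
# The additivity formula with equal variances and equal kurtoses predicts
# exactly this division by n.
p, n = 0.3, 10
q = 1 - p
bern_excess = (1 - 6 * p * q) / (p * q)
binom_excess = (1 - 6 * p * q) / (n * p * q)
print(abs(binom_excess - bern_excess / n) < 1e-12)  # True
```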
Dice
Dice are small throwable objects that can rest in multiple positions, used for generating random numbers. Dice are suitable as gambling devices for games like craps and are used in non-gambling tabletop games. A traditional die is a cube, with each of its six faces showing a different number of dots from one to six. When thrown or rolled, the die comes to rest showing on its upper surface a random integer from one to six, each value being equally likely. A variety of similar devices are described as dice; they may be used to produce results other than one through six. Loaded and crooked dice are designed to favor some results over others for purposes of cheating or amusement. A dice tray, a tray used to contain thrown dice, is sometimes used for gambling or board games, in particular to allow dice throws which do not interfere with other game pieces. Dice have been used since before recorded history, and it is uncertain where they originated. The oldest known dice were excavated as part of a backgammon-like game set at the Burnt City, an archeological site in south-eastern Iran, estimated to be from between 2800–2500 BC.
Other excavations from ancient tombs in the Indus Valley civilization indicate a South Asian origin. The Egyptian game of Senet was played with dice. Senet was played before 3000 BC and up to the 2nd century AD; it was a racing game, but there is no scholarly consensus on the rules of Senet. Dicing is mentioned as an Indian game in the Rigveda and the early Buddhist games list. There are several biblical references to "casting lots", as in Psalm 22, indicating that dicing was commonplace when the psalm was composed. It is theorized that dice developed from the practice of fortunetelling with the talus of hoofed animals, colloquially known as "knucklebones", but knucklebones is not the oldest divination technique that incorporates randomness. Knucklebones was a game of skill played by children. Although gambling was illegal, many Romans were passionate gamblers who enjoyed dicing, known as aleam ludere. Dicing was a popular pastime of emperors. Letters by Augustus to Tacitus and his daughter recount his hobby of dicing.
There were two sizes of Roman dice. Tali were large dice inscribed with one, three, four and six on four sides. Tesserae were smaller dice with sides numbered from one to six. Twenty-sided dice date back to the 2nd century AD, and from Ptolemaic Egypt as early as the 2nd century BC. Dominoes and playing cards originated in China as developments from dice. The transition from dice to playing cards occurred in China around the Tang dynasty, and coincides with the technological transition from rolls of manuscripts to block printed books. In Japan, dice were used to play a popular game called sugoroku. There are two types of sugoroku: ban-sugoroku is similar to backgammon and dates to the Heian period, while e-sugoroku is a racing game. Dice are thrown onto a surface either from the hand or from a container designed for this purpose; the face of the die that is uppermost when it comes to rest provides the value of the throw. One typical dice game today is craps, where two dice are thrown and wagers are made on the total value of the two dice.
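The wagers in craps rest on the fact that the 36 ordered outcomes of two fair dice are equally likely, so totals near 7 occur most often. A quick Python enumeration:

```python
from collections import Counter
from itertools import product

# Enumerate all 36 equally likely ordered outcomes of two fair dice
# and count how many ways each total can occur.
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))

print(totals[7])              # 6 -- the most common total (6/36 of throws)
print(totals[2], totals[12])  # 1 1 -- the rarest totals (1/36 each)
```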
Dice are used to randomize moves in board games, usually by deciding the distance through which a piece will move along the board. The result of a die roll is determined by the way it is thrown, according to the laws of classical mechanics. A die roll is made random by uncertainty in minor factors such as tiny movements in the thrower's hand. To mitigate concerns that the pips on the faces of certain styles of dice cause a small bias, casinos use precision dice with flush markings. Common dice are small cubes, most commonly 1.6 cm across, whose faces are numbered from one to six by patterns of round dots called pips. Opposite sides of a modern die traditionally add up to seven, implying that the 1, 2 and 3 faces share a vertex. The faces of a die may be placed clockwise or counterclockwise about this vertex. If the 1, 2 and 3 faces run counterclockwise, the die is called "right-handed"; if those faces run clockwise, the die is called "left-handed". Western dice are normally right-handed, and Chinese dice are normally left-handed. The pips on dice are arranged in specific patterns.
Asian-style dice bear similar patterns to Western ones, but the pips are closer to the center of the face. In some older sets, the "one" pip is a colorless depression. Non-precision dice are manufactured via the plastic injection molding process; the pips or numbers on the die are a part of the mold. The coloring for the numbering is achieved by submerging the die entirely in paint, which is allowed to dry. The die is then polished via a tumble finishing process similar to rock polishing; the abrasive agent scrapes off all of the paint except for that in the indents of the numbering. A finer abrasive is then used to polish the die. This process creates the smoother, rounded edges on the dice. Precision casino dice may have a polished or a sand finish, making them transparent or translucent respectively.