1.
Probability density function
–
In a more precise sense, the PDF is used to specify the probability of the random variable falling within a particular range of values, as opposed to taking on any one value. The probability density function is everywhere, and its integral over the entire space is equal to one. The terms probability distribution function and probability function have also sometimes used to denote the probability density function. However, this use is not standard among probabilists and statisticians, further confusion of terminology exists because density function has also been used for what is here called the probability mass function. In general though, the PMF is used in the context of random variables. Suppose a species of bacteria typically lives 4 to 6 hours, what is the probability that a bacterium lives exactly 5 hours. A lot of bacteria live for approximately 5 hours, but there is no chance that any given bacterium dies at exactly 5.0000000000, instead we might ask, What is the probability that the bacterium dies between 5 hours and 5.01 hours. Lets say the answer is 0.02, next, What is the probability that the bacterium dies between 5 hours and 5.001 hours. The answer is probably around 0.002, since this is 1/10th of the previous interval, the probability that the bacterium dies between 5 hours and 5.0001 hours is probably about 0.0002, and so on. In these three examples, the ratio / is approximately constant, and equal to 2 per hour, for example, there is 0.02 probability of dying in the 0. 01-hour interval between 5 and 5.01 hours, and =2 hour−1. This quantity 2 hour−1 is called the probability density for dying at around 5 hours, therefore, in response to the question What is the probability that the bacterium dies at 5 hours. A literally correct but unhelpful answer is 0, but an answer can be written as dt. This is the probability that the bacterium dies within a window of time around 5 hours. For example, the probability that it lives longer than 5 hours, there is a probability density function f with f =2 hour−1. The integral of f over any window of time is the probability that the dies in that window. A probability density function is most commonly associated with absolutely continuous univariate distributions, a random variable X has density fX, where fX is a non-negative Lebesgue-integrable function, if, Pr = ∫ a b f X d x. That is, f is any function with the property that. In the continuous univariate case above, the measure is the Lebesgue measure
2.
Expected value
–
In probability theory, the expected value of a random variable, intuitively, is the long-run average value of repetitions of the experiment it represents. For example, the value in rolling a six-sided die is 3.5. Less roughly, the law of large states that the arithmetic mean of the values almost surely converges to the expected value as the number of repetitions approaches infinity. The expected value is known as the expectation, mathematical expectation, EV, average, mean value, mean. More practically, the value of a discrete random variable is the probability-weighted average of all possible values. In other words, each value the random variable can assume is multiplied by its probability of occurring. The same principle applies to a random variable, except that an integral of the variable with respect to its probability density replaces the sum. The expected value does not exist for random variables having some distributions with large tails, for random variables such as these, the long-tails of the distribution prevent the sum/integral from converging. The expected value is a key aspect of how one characterizes a probability distribution, by contrast, the variance is a measure of dispersion of the possible values of the random variable around the expected value. The variance itself is defined in terms of two expectations, it is the value of the squared deviation of the variables value from the variables expected value. The expected value plays important roles in a variety of contexts, in regression analysis, one desires a formula in terms of observed data that will give a good estimate of the parameter giving the effect of some explanatory variable upon a dependent variable. The formula will give different estimates using different samples of data, a formula is typically considered good in this context if it is an unbiased estimator—that is, if the expected value of the estimate can be shown to equal the true value of the desired parameter. In decision theory, and in particular in choice under uncertainty, one example of using expected value in reaching optimal decisions is the Gordon–Loeb model of information security investment. According to the model, one can conclude that the amount a firm spends to protect information should generally be only a fraction of the expected loss. Suppose random variable X can take value x1 with probability p1, value x2 with probability p2, then the expectation of this random variable X is defined as E = x 1 p 1 + x 2 p 2 + ⋯ + x k p k. If all outcomes xi are equally likely, then the weighted average turns into the simple average and this is intuitive, the expected value of a random variable is the average of all values it can take, thus the expected value is what one expects to happen on average. If the outcomes xi are not equally probable, then the simple average must be replaced with the weighted average, the intuition however remains the same, the expected value of X is what one expects to happen on average. Let X represent the outcome of a roll of a fair six-sided die, more specifically, X will be the number of pips showing on the top face of the die after the toss
3.
Mode (statistics)
–
The mode is the value that appears most often in a set of data. The mode of a probability distribution is the value x at which its probability mass function takes its maximum value. In other words, it is the value that is most likely to be sampled, the mode of a continuous probability distribution is the value x at which its probability density function has its maximum value, so the mode is at the peak. Like the statistical mean and median, the mode is a way of expressing, in a single number, the numerical value of the mode is the same as that of the mean and median in a normal distribution, and it may be very different in highly skewed distributions. The mode is not necessarily unique to a distribution, since the probability mass function or probability density function may take the same maximum value at several points x1, x2. The most extreme case occurs in uniform distributions, where all values occur equally frequently, when a probability density function has multiple local maxima it is common to refer to all of the local maxima as modes of the distribution. Such a continuous distribution is called multimodal, in symmetric unimodal distributions, such as the normal distribution, the mean, median and mode all coincide. For samples, if it is known that they are drawn from a symmetric distribution, the mode of a sample is the element that occurs most often in the collection. For example, the mode of the sample is 6, given the list of data the mode is not unique - the dataset may be said to be bimodal, while a set with more than two modes may be described as multimodal. For a sample from a distribution, such as, the concept is unusable in its raw form. The mode is then the value where the histogram reaches its peak, the following MATLAB code example computes the mode of a sample, The algorithm requires as a first step to sort the sample in ascending order. It then computes the derivative of the sorted list. Unlike mean and median, the concept of mode makes sense for nominal data. For example, taking a sample of Korean family names, one might find that Kim occurs more often than any other name, then Kim would be the mode of the sample. In any voting system where a plurality victory, a single modal value determines the victor. Unlike median, the concept of mode makes sense for any random variable assuming values from a space, including the real numbers. For example, a distribution of points in the plane will typically have a mean and a mode, the median makes sense when there is a linear order on the possible values. Generalizations of the concept of median to higher-dimensional spaces are the geometric median, for the remainder, the assumption is that we have a real-valued random variable
4.
Variance
–
The variance has a central role in statistics. It is used in statistics, statistical inference, hypothesis testing, goodness of fit. This makes it a central quantity in numerous such as physics, biology, chemistry, cryptography, economics. The variance of a random variable X is the value of the squared deviation from the mean of X, μ = E . This definition encompasses random variables that are generated by processes that are discrete, continuous, neither, the variance can also be thought of as the covariance of a random variable with itself, Var = Cov . The variance is also equivalent to the second cumulant of a probability distribution that generates X, the variance is typically designated as Var , σ X2, or simply σ2. On computational floating point arithmetic, this equation should not be used, if a continuous distribution does not have an expected value, as is the case for the Cauchy distribution, it does not have a variance either. Many other distributions for which the value does exist also do not have a finite variance because the integral in the variance definition diverges. An example is a Pareto distribution whose index k satisfies 1 < k ≤2. e, the normal distribution with parameters μ and σ is a continuous distribution whose probability density function is given by f =12 π σ2 e −22 σ2. In this distribution, E = μ and the variance Var is related with σ via Var = ∫ − ∞ ∞22 π σ2 e −22 σ2 d x = σ2. The role of the distribution in the central limit theorem is in part responsible for the prevalence of the variance in probability. The exponential distribution with parameter λ is a distribution whose support is the semi-infinite interval. Its probability density function is given by f = λ e − λ x, the variance is equal to Var = ∫0 ∞2 λ e − λ x d x = λ −2. So for an exponentially distributed random variable, σ2 = μ2, the Poisson distribution with parameter λ is a discrete distribution for k =0,1,2, …. Its probability mass function is given by p = λ k k, E − λ, and it has expected value μ = λ. The variance is equal to Var = ∑ k =0 ∞ λ k k, E − λ2 = λ, So for a Poisson-distributed random variable, σ2 = μ. The binomial distribution with n and p is a discrete distribution for k =0,1,2, …, n. Its probability mass function is given by p = p k n − k, the variance is equal to Var = ∑ k =0 n p k n − k 2 = n p
5.
Moment-generating function
–
In probability theory and statistics, the moment-generating function of a real-valued random variable is an alternative specification of its probability distribution. Thus, it provides the basis of a route to analytical results compared with working directly with probability density functions or cumulative distribution functions. There are particularly simple results for the functions of distributions defined by the weighted sums of random variables. Note, however, that not all variables have moment-generating functions. In addition to real-valued distributions, moment-generating functions can be defined for vector- or matrix-valued random variables, the moment-generating function of a real-valued distribution does not always exist, unlike the characteristic function. There are relations between the behavior of the function of a distribution and properties of the distribution, such as the existence of moments. In probability theory and statistics, the function of a random variable X is M X, = E, t ∈ R. In other terms, the function can be interpreted as the expectation of the random variable e t X. M X always exists and is equal to 1. A key problem with moment-generating functions is that moments and the function may not exist. By contrast, the function or Fourier transform always exists. More generally, where X = T, a random vector. The reason for defining this function is that it can be used to all the moments of the distribution. The series expansion of etX is, e t X =1 + t X + t 2 X22, hence, M X = E =1 + t E + t 2 E2. + ⋯ + t n E n. + ⋯ =1 + t m 1 + t 2 m 22, + ⋯ + t n m n n. + ⋯, where mn is the nth moment. Differentiating MX i times with respect to t and setting t =0 we hence obtain the ith moment about the origin, mi, here are some examples of the moment generating function and the characteristic function for comparison. It can be seen that the function is a Wick rotation of the moment generating function MX when the latter exists. Note that for the case where X has a probability density function ƒ. M X = ∫ − ∞ ∞ e t x f d x = ∫ − ∞ ∞ f d x =1 + t m 1 + t 2 m 22
6.
Characteristic function (probability theory)
–
In probability theory and statistics, the characteristic function of any real-valued random variable completely defines its probability distribution. If a random variable admits a probability density function, then the function is the Fourier transform of the probability density function. Thus it provides the basis of a route to analytical results compared with working directly with probability density functions or cumulative distribution functions. There are particularly simple results for the functions of distributions defined by the weighted sums of random variables. In addition to univariate distributions, characteristic functions can be defined for vector or matrix-valued random variables, the characteristic function always exists when treated as a function of a real-valued argument, unlike the moment-generating function. There are relations between the behavior of the function of a distribution and properties of the distribution, such as the existence of moments. The characteristic function provides a way for describing a random variable. However, in cases, there can be differences in whether these functions can be represented as expressions involving simple standard functions. If a random variable admits a density function, then the function is its dual. If a random variable has a function, then the domain of the characteristic function can be extended to the complex plane. Note however that the function of a distribution always exists. Another important application is to the theory of the decomposability of random variables, QX is the inverse cumulative distribution function of X also called the quantile function of X. It should be noted though, that this convention for the constants appearing in the definition of the function differs from the usual convention for the Fourier transform. For example, some authors define φX = Ee−2πitX, which is essentially a change of parameter, other notation may be encountered in the literature, p ^ as the characteristic function for a probability measure p, or f ^ as the characteristic function corresponding to a density f. The notion of characteristic functions generalizes to multivariate random variables and more complicated random elements, the argument of the characteristic function will always belong to the continuous dual of the space where random variable X takes values. Here T denotes matrix transpose, tr — the matrix trace operator, Re is the part of a complex number, z denotes complex conjugate. Oberhettinger provides extensive tables of characteristic functions, the characteristic function of a real-valued random variable always exists, since it is an integral of a bounded continuous function over a space whose measure is finite. A characteristic function is continuous on the entire space It is non-vanishing in a region around zero
7.
Probability theory
–
Probability theory is the branch of mathematics concerned with probability, the analysis of random phenomena. It is not possible to predict precisely results of random events, two representative mathematical results describing such patterns are the law of large numbers and the central limit theorem. As a mathematical foundation for statistics, probability theory is essential to human activities that involve quantitative analysis of large sets of data. Methods of probability theory also apply to descriptions of complex systems given only partial knowledge of their state, a great discovery of twentieth century physics was the probabilistic nature of physical phenomena at atomic scales, described in quantum mechanics. Christiaan Huygens published a book on the subject in 1657 and in the 19th century, initially, probability theory mainly considered discrete events, and its methods were mainly combinatorial. Eventually, analytical considerations compelled the incorporation of continuous variables into the theory and this culminated in modern probability theory, on foundations laid by Andrey Nikolaevich Kolmogorov. Kolmogorov combined the notion of space, introduced by Richard von Mises. This became the mostly undisputed axiomatic basis for modern probability theory, most introductions to probability theory treat discrete probability distributions and continuous probability distributions separately. The more mathematically advanced measure theory-based treatment of probability covers the discrete, continuous, consider an experiment that can produce a number of outcomes. The set of all outcomes is called the space of the experiment. The power set of the space is formed by considering all different collections of possible results. For example, rolling an honest die produces one of six possible results, one collection of possible results corresponds to getting an odd number. Thus, the subset is an element of the set of the sample space of die rolls. In this case, is the event that the die falls on some odd number, If the results that actually occur fall in a given event, that event is said to have occurred. Probability is a way of assigning every event a value between zero and one, with the requirement that the event made up of all possible results be assigned a value of one, the probability that any one of the events, or will occur is 5/6. This is the same as saying that the probability of event is 5/6 and this event encompasses the possibility of any number except five being rolled. The mutually exclusive event has a probability of 1/6, and the event has a probability of 1, discrete probability theory deals with events that occur in countable sample spaces. Modern definition, The modern definition starts with a finite or countable set called the sample space, which relates to the set of all possible outcomes in classical sense, denoted by Ω
8.
Statistics
–
Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data. In applying statistics to, e. g. a scientific, industrial, or social problem, populations can be diverse topics such as all people living in a country or every atom composing a crystal. Statistics deals with all aspects of data including the planning of data collection in terms of the design of surveys, statistician Sir Arthur Lyon Bowley defines statistics as Numerical statements of facts in any department of inquiry placed in relation to each other. When census data cannot be collected, statisticians collect data by developing specific experiment designs, representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. In contrast, an observational study does not involve experimental manipulation, inferences on mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena. A standard statistical procedure involves the test of the relationship between two data sets, or a data set and a synthetic data drawn from idealized model. A hypothesis is proposed for the relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between two data sets. Rejecting or disproving the hypothesis is done using statistical tests that quantify the sense in which the null can be proven false. Working from a hypothesis, two basic forms of error are recognized, Type I errors and Type II errors. Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis, measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random or systematic, the presence of missing data or censoring may result in biased estimates and specific techniques have been developed to address these problems. Statistics continues to be an area of research, for example on the problem of how to analyze Big data. Statistics is a body of science that pertains to the collection, analysis, interpretation or explanation. Some consider statistics to be a mathematical science rather than a branch of mathematics. While many scientific investigations make use of data, statistics is concerned with the use of data in the context of uncertainty, mathematical techniques used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure-theoretic probability theory. In applying statistics to a problem, it is practice to start with a population or process to be studied. Populations can be diverse topics such as all living in a country or every atom composing a crystal. Ideally, statisticians compile data about the entire population and this may be organized by governmental statistical institutes
9.
Probability distribution
–
For instance, if the random variable X is used to denote the outcome of a coin toss, then the probability distribution of X would take the value 0.5 for X = heads, and 0.5 for X = tails. In more technical terms, the probability distribution is a description of a phenomenon in terms of the probabilities of events. Examples of random phenomena can include the results of an experiment or survey, a probability distribution is defined in terms of an underlying sample space, which is the set of all possible outcomes of the random phenomenon being observed. The sample space may be the set of numbers or a higher-dimensional vector space, or it may be a list of non-numerical values, for example. Probability distributions are divided into two classes. A discrete probability distribution can be encoded by a discrete list of the probabilities of the outcomes, on the other hand, a continuous probability distribution is typically described by probability density functions. The normal distribution represents a commonly encountered continuous probability distribution, more complex experiments, such as those involving stochastic processes defined in continuous time, may demand the use of more general probability measures. A probability distribution whose sample space is the set of numbers is called univariate. Important and commonly encountered univariate probability distributions include the distribution, the hypergeometric distribution. The multivariate normal distribution is a commonly encountered multivariate distribution, to define probability distributions for the simplest cases, one needs to distinguish between discrete and continuous random variables. For example, the probability that an object weighs exactly 500 g is zero. Continuous probability distributions can be described in several ways, the cumulative distribution function is the antiderivative of the probability density function provided that the latter function exists. As probability theory is used in diverse applications, terminology is not uniform. The following terms are used for probability distribution functions, Distribution. Probability distribution, is a table that displays the probabilities of outcomes in a sample. Could be called a frequency distribution table, where all occurrences of outcomes sum to 1. Distribution function, is a form of frequency distribution table. Probability distribution function, is a form of probability distribution table