1.
Real number
–
In mathematics, a real number is a value that represents a quantity along a continuous line. The adjective real in this context was introduced in the 17th century by René Descartes. The real numbers include all the rational numbers, such as the integer −5 and the fraction 4/3, and all the irrational numbers, such as √2. Included within the irrationals are the transcendental numbers, such as π. Real numbers can be thought of as points on an infinitely long line called the number line or real line. Any real number can be determined by a possibly infinite decimal representation, such as that of 8.632. The real line can be thought of as a part of the complex plane, and complex numbers include real numbers. These descriptions of the real numbers are not sufficiently rigorous by the modern standards of pure mathematics; rigorous definitions exist via Dedekind cuts, Cauchy sequences, or an axiomatic characterization, and all these definitions satisfy the axiomatic characterization and are thus equivalent. The statement that there is no subset of the reals with cardinality strictly between that of the integers (ℵ0) and that of the reals is known as the continuum hypothesis. Simple fractions were used by the Egyptians around 1000 BC; the Vedic Sulba Sutras (c. 600 BC) include what may be the first use of irrational numbers. Around 500 BC, the Greek mathematicians led by Pythagoras realized the need for irrational numbers, in particular the irrationality of the square root of 2. Arabic mathematicians merged the concepts of number and magnitude into a more general idea of real numbers. In the 16th century, Simon Stevin created the basis for modern decimal notation, and in the 17th century Descartes introduced the term real to describe roots of a polynomial, distinguishing them from imaginary ones. In the 18th and 19th centuries, there was much work on irrational and transcendental numbers. Johann Heinrich Lambert gave the first flawed proof that π cannot be rational; Adrien-Marie Legendre completed the proof. Évariste Galois developed techniques for determining whether a given equation could be solved by radicals, which gave rise to the field of Galois theory.
Charles Hermite first proved that e is transcendental, and Ferdinand von Lindemann showed that π is transcendental. Lindemann's proof was much simplified by Weierstrass, still further by David Hilbert, and was finally made elementary by Adolf Hurwitz and Paul Gordan. The development of calculus in the 18th century used the set of real numbers without having defined them rigorously. The first rigorous definition was given by Georg Cantor in 1871. In 1874, he showed that the set of all real numbers is uncountably infinite, but the set of all algebraic numbers is countably infinite. Contrary to widely held beliefs, his first method was not his famous diagonal argument, which he published in 1891. The real number system can be defined axiomatically up to an isomorphism, which is described hereafter. Another possibility is to start from some rigorous axiomatization of Euclidean geometry. From the structuralist point of view, all these constructions are on equal footing.

2.
Probability density function
–
In a more precise sense, the PDF is used to specify the probability of the random variable falling within a particular range of values, as opposed to taking on any one value. The probability density function is non-negative everywhere, and its integral over the entire space is equal to one. The terms probability distribution function and probability function have also sometimes been used to denote the probability density function. However, this use is not standard among probabilists and statisticians. Further confusion of terminology exists because density function has also been used for what is here called the probability mass function; in general, though, the PMF is used in the context of discrete random variables. Suppose a species of bacteria typically lives 4 to 6 hours. What is the probability that a bacterium lives exactly 5 hours? A lot of bacteria live for approximately 5 hours, but there is no chance that any given bacterium dies at exactly 5.0000000000… hours. Instead we might ask: what is the probability that the bacterium dies between 5 hours and 5.01 hours? Let's say the answer is 0.02. Next: what is the probability that the bacterium dies between 5 hours and 5.001 hours? The answer is probably around 0.002, since this is 1/10th of the previous interval. The probability that the bacterium dies between 5 hours and 5.0001 hours is probably about 0.0002, and so on. In these three examples, the ratio (probability of dying during an interval) / (duration of the interval) is approximately constant, and equal to 2 per hour. For example, there is 0.02 probability of dying in the 0.01-hour interval between 5 and 5.01 hours, and (0.02 probability / 0.01 hours) = 2 hour−1. This quantity 2 hour−1 is called the probability density for dying at around 5 hours. Therefore, in response to the question "What is the probability that the bacterium dies at 5 hours?", a literally correct but unhelpful answer is 0, but a better answer can be written as (2 hour−1) dt. This is the probability that the bacterium dies within an infinitesimal window of time around 5 hours, where dt is the duration of this window.
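The interval-shrinking argument above can be checked numerically. As a minimal sketch (the bacterial lifetimes in the passage are hypothetical, so a normal distribution with mean 5 hours and standard deviation 0.2 hours is assumed here, chosen because its density at 5 hours is about 2 hour−1):

```python
import math

def lifetime_cdf(x, mu=5.0, sigma=0.2):
    """CDF of the assumed normal lifetime distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def lifetime_pdf(x, mu=5.0, sigma=0.2):
    """Density of the same distribution; f(5) is roughly 2 per hour."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

# The ratio P(5 <= T <= 5 + dt) / dt stabilises at the density f(5) as dt shrinks.
for dt in (0.01, 0.001, 0.0001):
    p = lifetime_cdf(5.0 + dt) - lifetime_cdf(5.0)
    print(f"dt={dt}: P={p:.6f}, ratio={p / dt:.4f} per hour")
```

Each ratio is close to lifetime_pdf(5.0), mirroring how the text's 0.02/0.01, 0.002/0.001, and 0.0002/0.0001 all land near 2 hour−1.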
For example, the probability that it lives longer than 5 hours, but shorter than 5 hours plus one nanosecond, is approximately (2 hour−1) × (1 nanosecond). There is a probability density function f with f(5 hours) = 2 hour−1. The integral of f over any window of time (not only infinitesimal windows but also large windows) is the probability that the bacterium dies in that window. A probability density function is most commonly associated with absolutely continuous univariate distributions. A random variable X has density f_X, where f_X is a non-negative Lebesgue-integrable function, if Pr[a ≤ X ≤ b] = ∫_a^b f_X(x) dx. That is, f is any function with the property that the probability of X falling in a set equals the integral of f over that set. In the continuous univariate case above, the reference measure is the Lebesgue measure.

3.
Cumulative distribution function
–
In the case of a continuous distribution, it gives the area under the probability density function from minus infinity to x. Cumulative distribution functions are also used to specify the distribution of multivariate random variables. The CDF gives the probability that X lies in the semi-closed interval (a, b]: P(a < X ≤ b) = F(b) − F(a). In the definition above, the "less than or equal to" sign, ≤, is a convention, not a universally used one, but it is important for discrete distributions. The proper use of tables of the binomial and Poisson distributions depends upon this convention; moreover, important formulas like Paul Lévy's inversion formula for the characteristic function also rely on the "less than or equal" formulation. If treating several random variables X, Y, etc., the corresponding letters are used as subscripts, while, if treating only one, the subscript is usually omitted. It is conventional to use a capital F for a cumulative distribution function, in contrast to the lower-case f used for probability density functions. This applies when discussing general distributions; some specific distributions have their own conventional notation. The CDF of a continuous random variable X can be expressed as the integral of its probability density function f_X as follows: F_X(x) = ∫_{−∞}^{x} f_X(t) dt. In the case of a random variable X whose distribution has a discrete component at a value b, P(X = b) = F_X(b) − lim_{x→b−} F_X(x). If F_X is continuous at b, this equals zero and there is no discrete component at b. Every cumulative distribution function F is non-decreasing and right-continuous, which makes it a càdlàg function. Furthermore, lim_{x→−∞} F(x) = 0 and lim_{x→+∞} F(x) = 1. The function f is equal to the derivative of F almost everywhere, and it is called the probability density function of the distribution of X. As an example, suppose X is uniformly distributed on the unit interval [0, 1]. Then the CDF of X is given by F(x) = 0 for x < 0; F(x) = x for 0 ≤ x < 1; F(x) = 1 for x ≥ 1.
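The piecewise uniform CDF above translates directly into code; a minimal sketch:

```python
def uniform_cdf(x):
    """CDF of a uniform distribution on the unit interval [0, 1]."""
    if x < 0:
        return 0.0
    if x < 1:
        return float(x)
    return 1.0

# F is non-decreasing, right-continuous, and runs from 0 to 1.
print(uniform_cdf(-0.5), uniform_cdf(0.25), uniform_cdf(2.0))  # 0.0 0.25 1.0
```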
Suppose instead that X takes only the discrete values 0 and 1, with equal probability. Then the CDF of X is given by F(x) = 0 for x < 0; F(x) = 1/2 for 0 ≤ x < 1; F(x) = 1 for x ≥ 1. Sometimes, it is useful to study the opposite question and ask how often the random variable is above a particular level. This is called the complementary cumulative distribution function (ccdf) or simply the tail distribution or exceedance, defined as F̄(x) = P(X > x) = 1 − F(x). This has applications in statistical hypothesis testing, for example, because the one-sided p-value is the probability of observing a test statistic at least as extreme as the one observed; thus, provided that the test statistic, T, has a continuous distribution, the one-sided p-value is simply given by the ccdf. In survival analysis, F̄(x) is called the survival function and denoted S(x), while the term reliability function is common in engineering. Properties: for a non-negative continuous random variable having an expectation, Markov's inequality states that F̄(x) ≤ E(X)/x. As x → ∞, F̄(x) → 0, and in fact F̄(x) = o(1/x) provided that E(X) is finite. This form of illustration emphasises the median and dispersion of the distribution or of the empirical results. If the CDF F is strictly increasing and continuous, then F^{−1}(p), p ∈ [0, 1], is the unique real number x such that F(x) = p.
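The tail-distribution properties above can be illustrated with an exponential distribution; the rate λ = 2 below is an arbitrary choice for the sketch:

```python
import math

LAM = 2.0  # assumed rate; the mean is E(X) = 1/LAM

def cdf(x):
    return 1.0 - math.exp(-LAM * x) if x >= 0 else 0.0

def survival(x):
    """Tail distribution / survival function S(x) = 1 - F(x)."""
    return 1.0 - cdf(x)

def quantile(p):
    """Inverse CDF: the unique x with F(x) = p, for 0 <= p < 1."""
    return -math.log(1.0 - p) / LAM

x = 3.0
print(survival(x), (1.0 / LAM) / x)  # Markov's inequality: S(x) <= E(X)/x
print(cdf(quantile(0.9)))            # the inverse CDF recovers 0.9
```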

4.
Expected value
–
In probability theory, the expected value of a random variable is, intuitively, the long-run average value of repetitions of the experiment it represents. For example, the expected value in rolling a six-sided die is 3.5. Less roughly, the law of large numbers states that the arithmetic mean of the values almost surely converges to the expected value as the number of repetitions approaches infinity. The expected value is also known as the expectation, mathematical expectation, EV, average, mean value, or mean. More practically, the expected value of a discrete random variable is the probability-weighted average of all possible values. In other words, each value the random variable can assume is multiplied by its probability of occurring, and the resulting products are summed. The same principle applies to an absolutely continuous random variable, except that an integral of the variable with respect to its probability density replaces the sum. The expected value does not exist for random variables having some distributions with large tails; for random variables such as these, the long tails of the distribution prevent the sum or integral from converging. The expected value is a key aspect of how one characterizes a probability distribution. By contrast, the variance is a measure of dispersion of the possible values of the random variable around the expected value. The variance itself is defined in terms of two expectations: it is the expected value of the squared deviation of the variable's value from the variable's expected value. The expected value plays important roles in a variety of contexts. In regression analysis, one desires a formula in terms of observed data that will give a good estimate of the parameter giving the effect of some explanatory variable upon a dependent variable. The formula will give different estimates using different samples of data; a formula is typically considered good in this context if it is an unbiased estimator, that is, if the expected value of the estimate can be shown to equal the true value of the desired parameter.
In decision theory, and in particular in choice under uncertainty, one example of using expected value in reaching optimal decisions is the Gordon–Loeb model of information security investment. According to the model, one can conclude that the amount a firm spends to protect information should generally be only a fraction of the expected loss. Suppose random variable X can take value x1 with probability p1, value x2 with probability p2, and so on, up to value xk with probability pk. Then the expectation of this random variable X is defined as E[X] = x1 p1 + x2 p2 + ⋯ + xk pk. If all outcomes xi are equally likely, then the weighted average turns into the simple average. This is intuitive: the expected value of a random variable is the average of all values it can take; thus the expected value is what one expects to happen on average. If the outcomes xi are not equally probable, then the simple average must be replaced with the weighted average; the intuition, however, remains the same: the expected value of X is what one expects to happen on average. Let X represent the outcome of a roll of a fair six-sided die. More specifically, X will be the number of pips showing on the top face of the die after the toss.
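As a quick check of the weighted-sum definition, a minimal sketch computing the expectation for the fair die:

```python
from fractions import Fraction

# E(X) = x1*p1 + ... + xk*pk; for a fair die every face has probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)

expected = sum(x * p for x in faces)
print(expected)         # 7/2
print(float(expected))  # 3.5
```

Using Fraction keeps the arithmetic exact, so the result is the 7/2 = 3.5 quoted in the text rather than a floating-point approximation.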

5.
Variance
–
The variance has a central role in statistics. It is used in descriptive statistics, statistical inference, hypothesis testing, and goodness of fit. This makes it a central quantity in numerous fields such as physics, biology, chemistry, cryptography, and economics. The variance of a random variable X is the expected value of the squared deviation from the mean of X, μ = E[X]: Var(X) = E[(X − μ)²]. This definition encompasses random variables that are generated by processes that are discrete, continuous, neither, or mixed. The variance can also be thought of as the covariance of a random variable with itself: Var(X) = Cov(X, X). The variance is also equivalent to the second cumulant of the probability distribution that generates X. The variance is typically designated as Var(X), σ_X², or simply σ². The definition can be expanded as Var(X) = E[X²] − (E[X])²; in computational floating-point arithmetic, this equation should not be used, because it is prone to catastrophic cancellation. If a continuous distribution does not have an expected value, as is the case for the Cauchy distribution, it does not have a variance either. Many other distributions for which the expected value does exist also do not have a finite variance because the integral in the variance definition diverges. An example is a Pareto distribution whose index k satisfies 1 < k ≤ 2. The normal distribution with parameters μ and σ is a continuous distribution whose probability density function is given by f(x) = (1/√(2πσ²)) e^{−(x−μ)²/(2σ²)}. In this distribution, E[X] = μ, and the variance Var(X) is related with σ via Var(X) = ∫_{−∞}^{∞} (x − μ)² (1/√(2πσ²)) e^{−(x−μ)²/(2σ²)} dx = σ². The role of the normal distribution in the central limit theorem is in part responsible for the prevalence of the variance in probability and statistics. The exponential distribution with parameter λ is a continuous distribution whose support is the semi-infinite interval [0, ∞). Its probability density function is given by f(x) = λ e^{−λx}, and the variance is equal to Var(X) = ∫_0^∞ (x − 1/λ)² λ e^{−λx} dx = λ^{−2}. So for an exponentially distributed random variable, σ² = μ². The Poisson distribution with parameter λ is a discrete distribution for k = 0, 1, 2, ….
Its probability mass function is given by p(k) = (λ^k / k!) e^{−λ}, and it has expected value μ = λ. The variance is equal to Var(X) = Σ_{k=0}^{∞} (k − λ)² (λ^k / k!) e^{−λ} = λ, so for a Poisson-distributed random variable, σ² = μ. The binomial distribution with parameters n and p is a discrete distribution for k = 0, 1, 2, …, n. Its probability mass function is given by p(k) = C(n, k) p^k (1 − p)^{n−k}, and the variance is equal to Var(X) = Σ_{k=0}^{n} (k − np)² C(n, k) p^k (1 − p)^{n−k} = np(1 − p).
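The summation formulas for the Poisson and binomial variances can be verified directly from the probability mass functions; the parameter values below are arbitrary illustrations:

```python
import math

def binomial_pmf(k, n, p):
    """C(n, k) * p^k * (1-p)^(n-k)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def poisson_pmf(k, lam):
    """(lam^k / k!) * e^(-lam)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

n, p = 10, 0.3
mean_b = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
var_b = sum((k - mean_b) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
print(var_b, n * p * (1.0 - p))  # both 2.1: Var = np(1 - p)

lam = 4.0
# Truncating the infinite Poisson sum at k = 60 leaves negligible tail mass.
mean_p = sum(k * poisson_pmf(k, lam) for k in range(60))
var_p = sum((k - mean_p) ** 2 * poisson_pmf(k, lam) for k in range(60))
print(var_p)  # 4.0: for a Poisson variable, variance = mean = lambda
```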

6.
Probability theory
–
Probability theory is the branch of mathematics concerned with probability, the analysis of random phenomena. Although it is not possible to predict precisely the results of random events, patterns emerge over many repetitions; two representative mathematical results describing such patterns are the law of large numbers and the central limit theorem. As a mathematical foundation for statistics, probability theory is essential to human activities that involve quantitative analysis of large sets of data. Methods of probability theory also apply to descriptions of complex systems given only partial knowledge of their state, as in statistical mechanics. A great discovery of twentieth-century physics was the probabilistic nature of physical phenomena at atomic scales, described in quantum mechanics. The mathematical theory of probability has its roots in attempts to analyze games of chance; Christiaan Huygens published a book on the subject in 1657, and in the 19th century Pierre Laplace completed what is today considered the classic interpretation. Initially, probability theory mainly considered discrete events, and its methods were mainly combinatorial. Eventually, analytical considerations compelled the incorporation of continuous variables into the theory, and this culminated in modern probability theory, on foundations laid by Andrey Nikolaevich Kolmogorov. Kolmogorov combined the notion of sample space, introduced by Richard von Mises, with measure theory, and presented his axiom system for probability theory in 1933. This became the mostly undisputed axiomatic basis for modern probability theory. Most introductions to probability theory treat discrete probability distributions and continuous probability distributions separately; the more mathematically advanced measure-theory-based treatment of probability covers the discrete, the continuous, any mix of these two, and more. Consider an experiment that can produce a number of outcomes. The set of all outcomes is called the sample space of the experiment. The power set of the sample space is formed by considering all different collections of possible results. For example, rolling an honest die produces one of six possible results. One collection of possible results corresponds to getting an odd number. Thus, the subset {1, 3, 5} is an element of the power set of the sample space of die rolls.
In this case, {1, 3, 5} is the event that the die falls on some odd number. If the results that actually occur fall in a given event, that event is said to have occurred. Probability is a way of assigning every event a value between zero and one, with the requirement that the event made up of all possible results (in our example, {1, 2, 3, 4, 5, 6}) be assigned a value of one. The probability that any one of the events {1, 6}, {3}, or {2, 4} will occur is 5/6. This is the same as saying that the probability of the event {1, 2, 3, 4, 6} is 5/6; this event encompasses the possibility of any number except five being rolled. The mutually exclusive event {5} has a probability of 1/6, and the event {1, 2, 3, 4, 5, 6} has a probability of 1, that is, absolute certainty. Discrete probability theory deals with events that occur in countable sample spaces. Modern definition: the modern definition starts with a finite or countable set called the sample space, which relates to the set of all possible outcomes in the classical sense, denoted by Ω.
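The die example can be made concrete; a small sketch representing events as subsets of the sample space:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # sample space of one roll of a fair die

def prob(event):
    """Probability of an event (a subset of omega) under the uniform measure."""
    return Fraction(len(event & omega), len(omega))

odd = {1, 3, 5}              # the event "the die falls on an odd number"
print(prob(odd))             # 1/2
print(prob(omega - {5}))     # 5/6: any number except five being rolled
print(prob({5}))             # 1/6
print(prob(omega))           # 1: absolute certainty
```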

7.
Weibull distribution
–
In probability theory and statistics, the Weibull distribution /ˈveɪbʊl/ is a continuous probability distribution. Its complementary cumulative distribution function is a stretched exponential function. The Weibull distribution is related to a number of other probability distributions; in particular, it interpolates between the exponential distribution (k = 1) and the Rayleigh distribution (k = 2). If the quantity X is a time-to-failure, the Weibull distribution gives a distribution for which the failure rate is proportional to a power of time. The shape parameter, k, is that power plus one. A value of k < 1 indicates that the failure rate decreases over time; this happens if there is significant infant mortality, or defective items failing early, with the failure rate decreasing over time as the defective items are weeded out of the population. A value of k = 1 indicates that the failure rate is constant over time; this might suggest random external events are causing mortality or failure, and in this case the Weibull distribution reduces to an exponential distribution. A value of k > 1 indicates that the failure rate increases with time; this happens if there is an aging process, or parts that are more likely to fail as time goes on. In the context of the diffusion of innovations, this means positive word of mouth; the function is first concave, then convex, with an inflexion point at (e^{1/k} − 1)/e^{1/k}, k > 1. In the field of materials science, the shape parameter k of a distribution of strengths is known as the Weibull modulus. In the context of diffusion of innovations, the Weibull distribution is a pure imitation/rejection model. In medical statistics a different parameterization is used: the shape parameter k is the same as above and the scale parameter is b = λ^{−k}. For x ≥ 0, the hazard function is h(x) = b k x^{k−1}. A third parameterization is sometimes used, in which the shape parameter k is the same as above. The form of the density function of the Weibull distribution changes drastically with the value of k. For 0 < k < 1, the density function tends to ∞ as x approaches zero from above and is strictly decreasing. For k = 1, the density function tends to 1/λ as x approaches zero from above and is strictly decreasing.
For k > 1, the density function tends to zero as x approaches zero from above, increases until its mode, and decreases after it. For k = 2 the density has a finite positive slope at x = 0. As k goes to infinity, the Weibull distribution converges to a Dirac delta distribution centered at x = λ. Moreover, the skewness and coefficient of variation depend only on the shape parameter. The cumulative distribution function for the Weibull distribution is F(x) = 1 − e^{−(x/λ)^k} for x ≥ 0 (and F(x) = 0 for x < 0). The quantile function for the Weibull distribution is Q(p) = λ(−ln(1 − p))^{1/k} for 0 ≤ p < 1. The failure rate h is given by h(x) = (k/λ)(x/λ)^{k−1}. The moment generating function of the logarithm of a Weibull distributed random variable is given by E[e^{t log X}] = λ^t Γ(1 + t/k), where Γ is the gamma function.
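The closed forms above translate directly into code; a minimal sketch with arbitrary parameter values:

```python
import math

def weibull_cdf(x, k, lam):
    """F(x) = 1 - exp(-(x/lam)^k) for x >= 0."""
    return 1.0 - math.exp(-((x / lam) ** k)) if x >= 0 else 0.0

def weibull_quantile(p, k, lam):
    """Q(p) = lam * (-ln(1 - p))^(1/k) for 0 <= p < 1."""
    return lam * (-math.log(1.0 - p)) ** (1.0 / k)

def weibull_hazard(x, k, lam):
    """Failure rate h(x) = (k/lam) * (x/lam)^(k-1)."""
    return (k / lam) * (x / lam) ** (k - 1)

k, lam = 1.5, 2.0
print(weibull_cdf(weibull_quantile(0.75, k, lam), k, lam))  # 0.75: Q inverts F
# With k = 1 the hazard is constant at 1/lam: the exponential special case.
print(weibull_hazard(0.5, 1, lam), weibull_hazard(5.0, 1, lam))
```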

8.
Mean
–
In mathematics, mean has several different definitions depending on the context. In probability and statistics, the mean of a random variable is its expected value; an analogous formula applies to the case of a probability distribution. Not every probability distribution has a defined mean; see the Cauchy distribution for an example. Moreover, for some distributions the mean is infinite. The arithmetic mean of a set of numbers x1, x2, …, xn is typically denoted by x̄, pronounced "x bar". If the data set were based on a series of observations obtained by sampling from a statistical population, the arithmetic mean is termed the sample mean to distinguish it from the population mean. For a finite population, the population mean of a property is equal to the arithmetic mean of the given property while considering every member of the population. For example, the population mean height is equal to the sum of the heights of every individual divided by the total number of individuals. The sample mean may differ from the population mean, especially for small samples; the law of large numbers dictates that the larger the size of the sample, the more likely it is that the sample mean will be close to the population mean. Outside of probability and statistics, a wide range of other notions of mean are often used in geometry and analysis; examples are given below. The geometric mean is an average that is useful for sets of positive numbers that are interpreted according to their product: x̄ = (x1 · x2 · … · xn)^{1/n}. For example, the geometric mean of the five values 4, 36, 45, 50, 75 is (4 · 36 · 45 · 50 · 75)^{1/5} = 24300000^{1/5} = 30. The harmonic mean is an average which is useful for sets of numbers which are defined in relation to some unit, for example speed. AM, GM, and HM satisfy the inequalities AM ≥ GM ≥ HM, with equality holding if and only if all the elements of the given sample are equal. In descriptive statistics, the mean may be confused with the median, mode or mid-range, as any of these may be called an "average". The mean of a set of observations is the arithmetic average of the values; however, for skewed distributions, the mean is not necessarily the same as the median or mode. For example, mean income is typically skewed upwards by a small number of people with very large incomes. By contrast, the median income is the level at which half the population is below and half is above.
The mode income is the most likely income, and favors the larger number of people with lower incomes, the mean of a probability distribution is the long-run arithmetic average value of a random variable having that distribution. In this context, it is known as the expected value
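The worked geometric-mean example above can be reproduced, along with the AM ≥ GM ≥ HM inequality:

```python
import math

values = [4, 36, 45, 50, 75]  # the five values from the example above
n = len(values)

am = sum(values) / n                   # arithmetic mean
gm = math.prod(values) ** (1.0 / n)    # geometric mean: (product of values)^(1/n)
hm = n / sum(1.0 / x for x in values)  # harmonic mean

print(am, gm, hm)  # 42.0, then approximately 30.0 and 15.0
assert am >= gm >= hm  # AM >= GM >= HM; equality only when all values are equal
```

For this data set the three means happen to come out to the round numbers 42, 30, and 15, which makes the strictness of the inequality easy to see.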

9.
Extreme value theory
–
Extreme value theory or extreme value analysis (EVA) is a branch of statistics dealing with the extreme deviations from the median of probability distributions. It seeks to assess, from a given ordered sample of a given random variable, the probability of events that are more extreme than any previously observed. Extreme value analysis is used in many disciplines, such as structural engineering, finance, earth sciences, and traffic prediction. For example, EVA might be used in the field of hydrology to estimate the probability of an unusually large flooding event; similarly, for the design of a breakwater, a coastal engineer would seek to estimate the 50-year wave and design the structure accordingly. Two approaches exist for practical extreme value analysis. The first method relies on deriving block maxima series as a preliminary step; in many situations it is customary and convenient to extract the annual maxima, generating an annual maxima series (AMS). The second method relies on extracting, from a continuous record, the peak values reached for any period during which values exceed a certain threshold. This method is generally referred to as the Peak Over Threshold (POT) method. For AMS data, the analysis may partly rely on the results of the Fisher–Tippett–Gnedenko theorem; however, in practice, various procedures are applied to select between a wider range of distributions. The theorem here relates to the limiting distributions for the minimum or the maximum of a very large collection of independent random variables from the same distribution. For POT data, the analysis may involve fitting two distributions: one for the number of events in a period considered and a second for the size of the exceedances. A common assumption for the first is the Poisson distribution, with the generalized Pareto distribution being used for the exceedances; a tail-fitting can be based on the Pickands–Balkema–de Haan theorem. Novak reserves the term "POT method" for the case where the threshold is non-random. One application is to pipeline failures due to pitting corrosion.
Another application is detecting anomalous IT network traffic, to prevent attackers from reaching important data. The field of extreme value theory was pioneered by Leonard Tippett. Tippett was employed by the British Cotton Industry Research Association, where he worked to make cotton thread stronger; in his studies, he realized that the strength of a thread was controlled by the strength of its weakest fibres. With the help of R. A. Fisher, Tippett obtained three asymptotic limits describing the distributions of extremes. Emil Julius Gumbel codified this theory in his 1958 book Statistics of Extremes, including the Gumbel distributions that bear his name. A summary of important publications relating to extreme value theory can be found in the article List of publications in statistics. Let X1, …, Xn be a sequence of independent and identically distributed random variables with cumulative distribution function F, and let Mn = max(X1, …, Xn). In theory, the exact distribution of the maximum can be derived: Pr(Mn ≤ z) = Pr(X1 ≤ z) Pr(X2 ≤ z) ⋯ Pr(Xn ≤ z) = F(z)^n. The associated indicator function In = I(Mn > z) is a Bernoulli process with a success probability p(z) = 1 − F(z)^n that depends on the magnitude z of the extreme event.
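The exact distribution of the block maximum, Pr(Mn ≤ z) = F(z)^n, is easy to evaluate numerically. As a sketch, assume the Xi are standard normal (an illustrative choice, not from the text) and that one observation is recorded per day over a year:

```python
import math

def std_normal_cdf(z):
    """CDF F(z) of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n = 365  # block size: one i.i.d. observation per day, annual maximum
for z in (2.0, 3.0, 4.0):
    # Pr(max of n draws <= z) = F(z)^n
    print(f"z={z}: Pr(M_n <= z) = {std_normal_cdf(z) ** n:.4f}")
```

Even though each individual draw exceeds z = 3 with probability under 0.2%, the annual maximum of 365 draws exceeds it with substantial probability, which is why block-maxima methods model these maxima with their own limiting distribution.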