1.
Statistics
–
Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data. In applying statistics to, e.g., a scientific, industrial, or social problem, populations can be diverse topics such as all people living in a country or every atom composing a crystal. Statistics deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments. The statistician Sir Arthur Lyon Bowley defined statistics as "numerical statements of facts in any department of inquiry placed in relation to each other". When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples; representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. In contrast, an observational study does not involve experimental manipulation. Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena. A standard statistical procedure involves the test of the relationship between two data sets, or between a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between the two data sets. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (rejecting a true null hypothesis, a "false positive") and Type II errors (failing to reject a false null hypothesis, a "false negative"). Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis. Measurement processes that generate statistical data are also subject to error.
Many of these errors are classified as random (noise) or systematic (bias); the presence of missing data or censoring may result in biased estimates, and specific techniques have been developed to address these problems. Statistics continues to be an area of active research, for example on the problem of how to analyze big data. Statistics is a body of science that pertains to the collection, analysis, interpretation or explanation, and presentation of data. Some consider statistics to be a distinct mathematical science rather than a branch of mathematics. While many scientific investigations make use of data, statistics is concerned with the use of data in the context of uncertainty. Mathematical techniques used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure-theoretic probability theory. In applying statistics to a problem, it is common practice to start with a population or process to be studied. Populations can be diverse topics such as all people living in a country or every atom composing a crystal. Ideally, statisticians compile data about the entire population; this may be organized by governmental statistical institutes.
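As an illustrative sketch (an addition, not part of the original text; the helper names are ours), the Type I error rate of a simple one-sample z-test can be checked by simulation: when the null hypothesis is true, a test at the 5% significance level should reject in roughly 5% of repeated experiments.

```python
import math
import random

def z_test_rejects(sample, mu0, sigma):
    """Two-sided one-sample z-test with known sigma: reject H0: mean == mu0?"""
    n = len(sample)
    z = (sum(sample) / n - mu0) * math.sqrt(n) / sigma
    return abs(z) > 1.96  # two-sided 5% critical value of the standard normal

random.seed(42)
trials = 2000
rejections = 0
for _ in range(trials):
    # Draw data for which the null hypothesis (mean 0, sigma 1) is TRUE.
    sample = [random.gauss(0.0, 1.0) for _ in range(25)]
    if z_test_rejects(sample, mu0=0.0, sigma=1.0):
        rejections += 1  # every rejection here is a Type I error

type_i_rate = rejections / trials  # should be close to the nominal 0.05
```

Running the analogous experiment with data drawn under a false null would instead count Type II errors (failures to reject).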

2.
Pie chart
–
A pie chart is a circular statistical graphic which is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice (and consequently its central angle and area) is proportional to the quantity it represents. While it is named for its resemblance to a pie which has been sliced, there are variations on the way it can be presented. Pie charts are very widely used in the business world and the mass media, but they can be replaced in most cases by other plots such as the bar chart. The earliest known pie chart is generally credited to William Playfair's Statistical Breviary of 1801, in which Playfair presented an illustration that contained a series of pie charts. One of those charts depicted the proportions of the Turkish Empire located in Asia, Europe and Africa. This invention was not widely used at first. The French engineer Charles Joseph Minard was one of the first to use pie charts, in 1858, in particular in maps; Minard's map of 1858 used pie charts to represent the cattle sent from all around France for consumption in Paris. Playfair thought that pie charts were in need of a third dimension to add additional information. It has been said that Florence Nightingale invented the pie chart, though in fact she just popularised it. A 3D pie chart, or perspective pie chart, is used to give the chart a 3D look; the use of superfluous dimensions not needed to display the data of interest is discouraged for charts in general. A doughnut chart is a variant of the pie chart, with a blank center allowing for additional information about the data as a whole to be included. A chart with one or more sectors separated from the rest of the disk is known as an exploded pie chart; this effect is used to highlight a sector, or to highlight smaller segments of the chart with small proportions. The polar area diagram is similar to a pie chart, except the sectors have equal angles and differ rather in how far each sector extends from the center of the circle.
The polar area diagram is used to plot cyclic phenomena. For example, if the counts of deaths in each month of a year are to be plotted, there will be 12 sectors, all with the same angle of 30 degrees each, and the radius of each sector would be proportional to the square root of the death count for that month, so that the area of the sector represents the count. Léon Lalanne later used a polar diagram to show the frequency of wind directions around compass points in 1843; the wind rose is still used by meteorologists. Nightingale published her rose diagram in 1858. The name "coxcomb" is sometimes used erroneously for this type of diagram: it was the name Nightingale used to refer to a book containing the diagrams, rather than to the diagrams themselves. A ring chart, also known as a sunburst chart or a multilevel pie chart, is used to visualize hierarchical data.
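As a small sketch (added here for illustration; the function names are ours), the geometry of both chart types can be computed directly: a pie slice's central angle is proportional to its count, while a polar area diagram keeps the angles equal and scales each radius with the square root of the count so that area, not radius, carries the quantity.

```python
import math

def pie_angles(counts):
    """Central angle (degrees) of each pie slice, proportional to its count."""
    total = sum(counts)
    return [360.0 * c / total for c in counts]

def polar_area_radii(counts):
    """Radii for a polar area diagram: area ∝ count, so radius ∝ sqrt(count)."""
    return [math.sqrt(c) for c in counts]

monthly_deaths = [120, 90, 60, 30]        # hypothetical four-month example
angles = pie_angles(monthly_deaths)        # pie chart: unequal angles
radii = polar_area_radii(monthly_deaths)   # polar diagram: equal angles, varying radius
```

Note that quadrupling a count doubles the radius in the polar diagram (120 = 4 × 30), exactly because area rather than radius encodes the value.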

3.
Life insurance
–
Depending on the contract, other events such as terminal illness or critical illness can also trigger payment. The policy holder typically pays a premium, either regularly or as one lump sum; other expenses, such as funeral expenses, can also be included in the benefits. Life policies are legal contracts, and the terms of the contract describe the limitations of the insured events. Specific exclusions are often written into the contract to limit the liability of the insurer; common examples are claims relating to suicide, fraud, war, riot, and civil commotion. Life-based contracts tend to fall into two major categories. Protection policies are designed to provide a benefit, typically a lump sum payment, in the event of a specified occurrence; a common form of a protection policy design is term insurance. Investment policies are those where the main objective is to facilitate the growth of capital by regular or single premiums; common forms are whole life, universal life, and variable life policies. An early form of life insurance dates to Ancient Rome: burial clubs covered the cost of members' funeral expenses and assisted survivors financially. The first company to offer life insurance in modern times was the Amicable Society for a Perpetual Assurance Office, founded in London in 1706 by William Talbot. Each member made a payment per share on one to three shares, with consideration to the age of the members, being twelve to fifty-five. At the end of the year a portion of the contribution was divided among the wives and children of deceased members. The Amicable Society started with 2000 members. The mathematician James Dodson was unsuccessful in his attempts at procuring a charter from the government for a society based on age-related premiums; his disciple, Edward Rowe Mores, was able to establish the Society for Equitable Assurances on Lives. Mores also gave the name actuary to the chief official - the earliest known reference to the position as a business concern.
The first modern actuary was William Morgan, who served from 1775 to 1830. In 1776 the Society carried out the first actuarial valuation of liabilities and subsequently distributed the first reversionary bonus and interim bonus among its members. It also used regular valuations to balance competing interests: the Society sought to treat its members equitably, and the Directors tried to ensure that policyholders received a fair return on their investments. Premiums were regulated according to age, and anybody could be admitted regardless of their state of health. The sale of life insurance in the U.S. began in the 1760s. Between 1787 and 1837 more than two dozen life insurance companies were started, but fewer than half a dozen survived. The person responsible for making payments for a policy is the policy owner, while the insured is the person whose death will trigger payment of the benefit; the owner and insured may or may not be the same person. For example, if Joe buys a policy on his own life, he is both the owner and the insured; but if Jane, his wife, buys a policy on Joe's life, she is the owner and he is the insured.

4.
Biplot
–
Biplots are a type of exploratory graph used in statistics, a generalization of the simple two-variable scatterplot. A biplot allows information on both samples and variables of a data matrix to be displayed graphically. Samples are displayed as points, while variables are displayed as vectors; in the case of categorical variables, category level points may be used to represent the levels of a categorical variable. A generalised biplot displays information on both continuous and categorical variables. The biplot was introduced by K. Ruben Gabriel. Gower and Hand wrote a monograph on biplots, and Yan and Kang described various methods which can be used in order to visualize and interpret a biplot. The transformed data matrix Y is obtained from the original matrix X by centering and optionally standardizing the columns. Using the singular value decomposition (SVD), we can write Y = ∑_{k=1}^{p} d_k u_k v_kᵀ, where the u_k are n-dimensional column vectors, the v_k are p-dimensional column vectors, and the d_k are the singular values in decreasing order. The biplot is formed from two scatterplots that share a common set of axes and have a between-set scalar product interpretation. The first scatterplot is formed from the points (d₁^α u₁ᵢ, d₂^α u₂ᵢ), for i = 1, …, n; the second plot is formed from the points (d₁^{1−α} v₁ⱼ, d₂^{1−α} v₂ⱼ), for j = 1, …, p. This is the biplot formed by the dominant two terms of the SVD, which can then be represented in a two-dimensional display. Typical choices of α are 1 and 0, and in some rare cases α = 1/2 is used to obtain a symmetrically scaled biplot.
References:
Gabriel, K. R. The biplot graphic display of matrices with application to principal component analysis.
Greenacre, M. Biplots in Practice. ISBN 978-84-923846-8-6 (available for free download, with supporting materials).
Gower, J. C., Lubbe, S. and le Roux, N. Understanding Biplots. ISBN 978-0-470-01255-0.
Gower, J. C. and Hand, D. J. Biplots. ISBN 0-412-71630-5.
Yan, W. and Kang, M. S. GGE Biplot Analysis. ISBN 0-8493-1338-4.
Demey, J. R., Vicente-Villardón, J. L., Galindo-Villardón, M. P. and Zambrano, A. Y. Identifying molecular markers associated with classification of genotypes by External Logistic Biplots.
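To make the construction concrete, here is a small pure-Python sketch (our own code, restricted to p = 2 columns, where the 2×2 eigenproblem of YᵀY can be solved in closed form). It builds the sample points G = U Dᵅ and variable points H = V D¹⁻ᵅ from the SVD of the centered matrix; by construction the scalar products G Hᵀ reproduce Y for any α.

```python
import math

def centered(X):
    """Center the columns of an n-by-2 data matrix X."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(2)]
    return [[row[j] - means[j] for j in range(2)] for row in X]

def svd_two_columns(Y):
    """SVD of an n-by-2 matrix via the 2-by-2 eigenproblem of Y^T Y."""
    a = sum(r[0] * r[0] for r in Y)
    b = sum(r[0] * r[1] for r in Y)
    c = sum(r[1] * r[1] for r in Y)
    disc = math.sqrt((a - c) ** 2 + 4 * b * b)
    eigvals = [(a + c + disc) / 2, (a + c - disc) / 2]
    d = [math.sqrt(max(l, 0.0)) for l in eigvals]  # singular values, decreasing
    V = []
    for l in eigvals:  # right singular vectors v_k (assumes b != 0 or distinct a, c)
        v = [l - c, b] if abs(b) > 1e-12 else ([1.0, 0.0] if abs(l - a) <= abs(l - c) else [0.0, 1.0])
        s = math.hypot(v[0], v[1])
        V.append([v[0] / s, v[1] / s])
    U = []  # left singular vectors u_k = Y v_k / d_k
    for k in range(2):
        if d[k] > 1e-12:
            U.append([(r[0] * V[k][0] + r[1] * V[k][1]) / d[k] for r in Y])
        else:
            U.append([0.0] * len(Y))
    return U, d, V

def biplot_points(Y, alpha=1.0):
    """Sample points G = U D^alpha and variable points H = V D^(1-alpha)."""
    U, d, V = svd_two_columns(Y)
    G = [[U[k][i] * d[k] ** alpha for k in range(2)] for i in range(len(Y))]
    H = [[V[k][j] * d[k] ** (1 - alpha) for k in range(2)] for j in range(2)]
    return G, H

X = [[1.0, 2.0], [2.0, 3.0], [3.0, 5.0], [4.0, 4.0], [5.0, 7.0]]  # toy data
Y = centered(X)
G, H = biplot_points(Y, alpha=0.5)  # symmetrically scaled biplot coordinates
```

For real data with more than two columns one would keep only the two dominant SVD terms, which is exactly the rank-2 display described above.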

5.
Central limit theorem
–
If this procedure is performed many times, the central limit theorem says that the computed values of the average will be distributed according to a normal distribution. The central limit theorem has a number of variants. In its common form, the random variables must be independent and identically distributed (i.i.d.). In variants, convergence of the mean to the normal distribution also occurs for non-identical distributions or for non-independent observations, given certain conditions. In more general usage, a central limit theorem is any of a set of weak-convergence theorems in probability theory. When the variance of the i.i.d. variables is finite, the attractor distribution is the normal distribution. In contrast, the sum of a number of i.i.d. random variables with power law tail distributions decreasing as |x|^(−α−1), where 0 < α < 2, will tend to an alpha-stable distribution with stability parameter α as the number of variables grows. Suppose we are interested in the sample average Sₙ = (X₁ + ⋯ + Xₙ)/n of these random variables. By the law of large numbers, the sample averages converge in probability and almost surely to the expected value µ as n → ∞. The classical central limit theorem describes the size and the distributional form of the stochastic fluctuations around the deterministic number µ during this convergence. For large enough n, the distribution of Sₙ is close to the normal distribution with mean µ and variance σ²/n. The usefulness of the theorem is that the distribution of √n(Sₙ − µ) approaches normality regardless of the shape of the distribution of the individual Xᵢ. Formally, the theorem can be stated as follows. Lindeberg–Lévy CLT: Suppose {X₁, X₂, …} is a sequence of i.i.d. random variables with E[Xᵢ] = µ and Var[Xᵢ] = σ² < ∞. Then as n approaches infinity, the random variables √n(Sₙ − µ) converge in distribution to a normal N(0, σ²): √n(Sₙ − µ) →d N(0, σ²). Note that the convergence is uniform in z in the sense that limₙ→∞ sup_{z∈ℝ} |Pr[√n(Sₙ − µ) ≤ z] − Φ(z/σ)| = 0, where Φ is the standard normal cumulative distribution function. The following variant of the theorem is named after Russian mathematician Aleksandr Lyapunov.
In this variant of the central limit theorem the random variables Xᵢ have to be independent, but not necessarily identically distributed. The theorem also requires that the random variables |Xᵢ| have moments of some order (2 + δ). Suppose {X₁, X₂, …} is a sequence of independent random variables, each with finite expected value µᵢ and variance σᵢ², and define sₙ² = ∑_{i=1}^{n} σᵢ². In practice it is usually easiest to check Lyapunov's condition for δ = 1. If a sequence of random variables satisfies Lyapunov's condition, then it also satisfies Lindeberg's condition; the converse implication, however, does not hold. In the same setting and with the same notation as above, the Lindeberg condition is the following: suppose that for every ε > 0, limₙ→∞ (1/sₙ²) ∑_{i=1}^{n} E[(Xᵢ − µᵢ)² · 1{|Xᵢ − µᵢ| > ε sₙ}] = 0, where 1{…} is the indicator function.
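A quick simulation sketch (added for illustration; the function name is ours) shows the standardized sample mean of uniform variables approaching a standard normal: across many repetitions, its empirical mean approaches 0 and its empirical standard deviation approaches 1.

```python
import math
import random
import statistics

def standardized_mean(sample, mu, sigma):
    """Compute sqrt(n) * (S_n - mu) / sigma for one sample."""
    n = len(sample)
    s_n = sum(sample) / n
    return math.sqrt(n) * (s_n - mu) / sigma

random.seed(7)
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)  # mean and sd of Uniform(0, 1)
n, trials = 30, 2000

# Each entry is sqrt(n)(S_n - mu)/sigma for a fresh sample of n uniforms.
zs = [
    standardized_mean([random.random() for _ in range(n)], mu, sigma)
    for _ in range(trials)
]

# By the CLT, zs should behave like draws from N(0, 1).
z_mean = statistics.fmean(zs)
z_sd = statistics.stdev(zs)
```

The uniform distribution is far from normal, yet even n = 30 already gives a close-to-normal standardized mean, which is the practical content of the theorem.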

6.
Kurtosis
–
In probability theory and statistics, kurtosis is a measure of the "tailedness" of the probability distribution of a real-valued random variable. Depending on the particular measure of kurtosis that is used, there are various interpretations of kurtosis. The standard measure of kurtosis, originating with Karl Pearson, is based on a scaled version of the fourth moment of the data or population. This number is related to the tails of the distribution, not its peak; hence, for this measure, higher kurtosis is the result of infrequent extreme deviations (outliers), as opposed to frequent modestly sized deviations. The kurtosis of any normal distribution is 3, and it is common to compare the kurtosis of a distribution to this value. Distributions with kurtosis less than 3 are said to be platykurtic, although this does not imply the distribution is "flat-topped" as is sometimes reported. Rather, it means the distribution produces fewer and less extreme outliers than does the normal distribution; an example of a platykurtic distribution is the uniform distribution, which does not produce outliers. Distributions with kurtosis greater than 3 are said to be leptokurtic. It is also common practice to use an adjusted version of Pearson's kurtosis, the excess kurtosis, which is the kurtosis minus 3, to provide the comparison to the normal distribution. Some authors use "kurtosis" by itself to refer to the excess kurtosis. For the reason of clarity and generality, however, this article follows the non-excess convention and explicitly indicates where excess kurtosis is meant. Alternative measures of kurtosis include the L-kurtosis, which is a scaled version of the fourth L-moment; these are analogous to the alternative measures of skewness that are not based on ordinary moments. The kurtosis is the fourth standardized moment, defined as Kurt[X] = µ₄/σ⁴ = E[(X − µ)⁴] / (E[(X − µ)²])², where µ₄ is the fourth central moment and σ is the standard deviation. Several letters are used in the literature to denote the kurtosis.
A very common choice is κ, which is fine as long as it is clear that it does not refer to a cumulant. Other choices include γ₂, to be similar to the notation for skewness, although sometimes this is instead reserved for the excess kurtosis. The kurtosis is bounded below by the squared skewness plus 1: µ₄/σ⁴ ≥ (µ₃/σ³)² + 1, where µ₃ is the third central moment. The lower bound is realized by the Bernoulli distribution. There is no upper limit to the excess kurtosis of a general probability distribution. A reason why some authors favor the excess kurtosis is that cumulants are extensive; formulas related to the extensive property are more naturally expressed in terms of the excess kurtosis. Let X₁, …, Xₙ be independent random variables for which the fourth moment exists, and let Y = X₁ + ⋯ + Xₙ be their sum. The excess kurtosis of Y is Kurt[Y] − 3 = (1 / (∑_{i=1}^{n} σᵢ²)²) ∑_{i=1}^{n} σᵢ⁴ · (Kurt[Xᵢ] − 3), where σᵢ is the standard deviation of Xᵢ. In particular, if all of the Xᵢ have the same variance, then this simplifies to Kurt[Y] − 3 = (1/n²) ∑_{i=1}^{n} (Kurt[Xᵢ] − 3). The reason not to subtract off 3 is that the bare fourth moment better generalizes to multivariate distributions, especially when independence is not assumed.
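The moment definitions above translate directly into code. The following sketch (our own helper names; an addition to the text) computes the population kurtosis and skewness of a data set and illustrates Pearson's lower bound, kurtosis ≥ skewness² + 1, which is attained by a symmetric two-point (shifted Bernoulli) distribution.

```python
def central_moment(data, k):
    """k-th central moment of a data set (population convention)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

def kurtosis(data):
    """Pearson (non-excess) kurtosis: mu_4 / sigma^4."""
    m2 = central_moment(data, 2)
    return central_moment(data, 4) / m2 ** 2

def skewness(data):
    """Standardized third moment: mu_3 / sigma^3."""
    m2 = central_moment(data, 2)
    return central_moment(data, 3) / m2 ** 1.5

# A symmetric two-point sample attains the lower bound kurtosis = skewness^2 + 1:
data = [-1.0, 1.0, -1.0, 1.0]
k = kurtosis(data)   # 1.0, the minimum possible for zero skewness
excess = k - 3       # -2.0, the minimum possible excess kurtosis
```

The uniform distribution, mentioned above as platykurtic, has kurtosis 9/5, comfortably between this lower bound and the normal distribution's 3.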

7.
Scatter plot
–
A scatter plot is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data. If the points are color-coded, one additional variable can be displayed. A scatter plot can be used either when one continuous variable is under the control of the experimenter and the other depends on it, or when both continuous variables are independent. The measured or dependent variable is customarily plotted along the vertical axis. If no dependent variable exists, either type of variable can be plotted on either axis. A scatter plot can suggest various kinds of correlations between variables with a certain confidence interval. For example, for weight and height, weight would be on the y axis and height on the x axis. Correlations may be positive, negative, or null. If the pattern of dots slopes from lower left to upper right, it indicates a positive correlation; if the pattern of dots slopes from upper left to lower right, it indicates a negative correlation. A line of best fit can be drawn in order to study the relationship between the variables, and an equation for the correlation between the variables can be determined by established best-fit procedures. For a linear correlation, the best-fit procedure is known as linear regression and is guaranteed to generate a correct solution in a finite time. No universal best-fit procedure is guaranteed to generate a correct solution for arbitrary relationships. A scatter plot is also very useful when we wish to see how two comparable data sets agree with each other; in this case, an identity line, i.e. a y = x line, is often drawn as a reference. One of the most powerful aspects of a scatter plot, however, is its ability to show nonlinear relationships between variables. The ability to do this can be enhanced by adding a smooth line such as LOESS. Furthermore, if the data are represented by a mixture model of simple relationships, these relationships will be visually evident as superimposed patterns. The scatter diagram is one of the seven basic tools of quality control. Scatter charts can be built in the form of bubble, marker, or/and line charts. For example, to study the relationship between lung capacity and how long a person can hold their breath, a researcher would plot the data in a scatter plot, assigning lung capacity to the horizontal axis and breath-holding time to the vertical axis.
A person with a lung capacity of 400 cl who held his/her breath for 21.7 seconds would be represented by a single dot on the scatter plot at the point (400, 21.7) in Cartesian coordinates. For a set of data variables X₁, X₂, …, Xₖ, the scatter plot matrix shows all the pairwise scatter plots of the variables on a single view with multiple scatterplots in a matrix format.
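As a sketch of the linear best-fit procedure mentioned above (an illustration we added; the data are hypothetical), ordinary least squares for a line y = a + bx has a simple closed-form solution:

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx       # slope
    a = my - b * mx     # intercept
    return a, b

# Hypothetical lung-capacity (cl) vs. breath-holding time (s) data:
capacity = [300.0, 350.0, 400.0, 450.0, 500.0]
seconds = [15.0, 18.2, 21.7, 24.9, 28.1]
intercept, slope = linear_fit(capacity, seconds)  # slope > 0: positive correlation
```

The closed-form solution is why linear regression is "guaranteed to generate a correct solution in a finite time", unlike iterative fitting for arbitrary nonlinear relationships.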

8.
Reliability engineering
–
Reliability engineering is engineering that emphasizes dependability in the lifecycle management of a product. Dependability, or reliability, describes the ability of a system or component to function under stated conditions for a specified period of time. Reliability may also describe the ability to function at a specified moment or interval of time. Reliability engineering represents a sub-discipline within systems engineering; testability, maintainability and maintenance are often defined as a part of reliability engineering in reliability programs. Reliability plays a key role in the cost-effectiveness of systems. Reliability engineering deals with the estimation, prevention and management of high levels of lifetime engineering uncertainty and risks of failure. Although stochastic parameters define and affect reliability, reliability is not achieved by mathematics and statistics alone; according to some authors on reliability engineering, you cannot really find a root cause by only looking at statistics. Reliability engineering relates closely to safety engineering and to system safety, in that they use common methods for their analysis. Reliability engineering focuses on costs of failure caused by system downtime, cost of spares, repair equipment, and personnel; safety engineering normally emphasizes not cost, but preserving life and nature, and therefore deals only with particular dangerous system-failure modes. High reliability levels also result from good engineering and from attention to detail. The word reliability can be traced back to 1816, in the writing of the poet Coleridge. Before World War II the term was linked mostly to repeatability: a test was considered reliable if the same results would be obtained repeatedly. The development of reliability engineering was here on a parallel path with quality. The modern use of the word reliability was defined by the U.S. military in the 1940s, characterizing a product that would operate when expected. In World War II, many reliability issues were due to the inherent unreliability of electronics. In 1945, M. A.
Miner published the seminal paper titled "Cumulative Damage in Fatigue" in an ASME journal. The IEEE formed the Reliability Society in 1948. In 1950, on the military side, a group called the Advisory Group on the Reliability of Electronic Equipment (AGREE) was born. The famous military standard 781 was created at that time. Around this period the much-used military handbook 217 was also published by RCA and was used for the prediction of failure rates of components. The emphasis on component reliability and empirical research alone has since slowly decreased, and more pragmatic approaches, as used in the consumer industries, are being used. In the 1980s, televisions were increasingly made up of solid-state semiconductors.

9.
Poisson point process
–
In probability, statistics and related fields, a Poisson point process or Poisson process is a type of random mathematical object that consists of points randomly located on a mathematical space. The Poisson point process is often defined on the real line. In this setting, it is used, for example, in queueing theory to model random events, such as the arrival of customers at a store or phone calls at an exchange. In higher dimensions, the process is used in mathematical models and in the related fields of spatial point processes, stochastic geometry, and spatial statistics. On more abstract spaces, the Poisson point process serves as an object of study in its own right. It has inspired the proposal of other point processes, some of which are constructed with the Poisson point process. The process is named after the French mathematician Siméon Denis Poisson, despite Poisson never having studied the process. The process was discovered independently and repeatedly in different settings, including experiments on radioactive decay, telephone call arrivals and insurance mathematics. The point process depends on a single mathematical object, which, depending on the context, may be a constant, a locally integrable function or, in more general settings, a Radon measure. In the first case, the constant, known as the rate or intensity, is the average density of the points in the Poisson process located in some region of space. The resulting point process is called a homogeneous or stationary Poisson point process. Depending on the setting, the process has several equivalent definitions as well as definitions of varying generality owing to its many applications and characterizations. Consequently, the notation, terminology and level of mathematical rigour used to define and study the Poisson point process vary according to the context. Despite its different forms and varying generality, the Poisson point process has two key properties.
First, the Poisson point process is related to the Poisson distribution, which implies that the probability of a Poisson random variable N being equal to n is given by P{N = n} = (Λⁿ / n!) e^(−Λ), where n! denotes n factorial and Λ is the single Poisson parameter that is used to define the Poisson distribution. Second, if a Poisson point process is defined on some underlying space, then the numbers of points in disjoint regions are independent random variables. This property is known under several names such as complete randomness, complete independence, or independent scattering and is common to all Poisson point processes. In other words, there is a lack of interaction between different regions and the points in general, which motivates the Poisson process being sometimes called a purely or completely random process. For all the instances of the Poisson point process, the two key properties of the Poisson distribution and complete independence play an important role. If a Poisson point process has a constant parameter, say λ, then it is called a homogeneous or stationary Poisson point process. The parameter, called the rate or intensity, is related to the expected number of Poisson points existing in some bounded region. The homogeneous Poisson point process, when considered on the positive half-line, can be defined as a counting process, a type of stochastic process.
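A homogeneous Poisson process on an interval [0, T] can be simulated by accumulating independent exponential inter-arrival times with rate λ, since those are exactly the gaps between successive points of the process on the half-line. This sketch is an illustration we added (the names are ours), not part of the original text:

```python
import random

def poisson_process(rate, t_max, rng):
    """Arrival times of a homogeneous Poisson process with intensity `rate` on [0, t_max].

    Inter-arrival times are i.i.d. Exponential(rate), so we sum
    exponential gaps until the horizon t_max is passed.
    """
    times = []
    t = rng.expovariate(rate)
    while t <= t_max:
        times.append(t)
        t += rng.expovariate(rate)
    return times

rng = random.Random(123)
arrivals = poisson_process(rate=2.0, t_max=100.0, rng=rng)
# The number of points in [0, 100] is Poisson with mean rate * t_max = 200.
```

Counting the arrivals in disjoint sub-intervals of [0, T] would give independent Poisson counts, which is the complete-independence property described above.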

10.
Statistical graphics
–
Statistical graphics, also known as graphical techniques, are graphics in the field of statistics used to visualize quantitative data. Whereas statistics and data analysis procedures generally yield their output in numeric or tabular form, graphical techniques allow such results to be displayed in pictorial form. They include plots such as scatter plots, histograms, probability plots, spaghetti plots, residual plots, box plots, block plots and biplots. Exploratory data analysis relies heavily on such techniques. In addition, the choice of appropriate statistical graphics can provide a convincing means of communicating the underlying message that is present in the data to others. If one is not using statistical graphics, then one is forfeiting insight into one or more aspects of the underlying structure of the data. Statistical graphics have been central to the development of science and date to the earliest attempts to analyse data. Many familiar forms, including bivariate plots, statistical maps, bar charts, and coordinate paper were used in the 18th century. Since the 1970s statistical graphics have been re-emerging as an important analytic tool with the revitalisation of computer graphics. Famous graphics were designed by William Playfair, who produced what could be called the first line, bar, pie, and area charts.

11.
Partial correlation
–
In probability theory and statistics, partial correlation measures the degree of association between two random variables, with the effect of a set of controlling random variables removed. The coefficient of alienation, and its relation with joint variance through correlation, are available in Guilford. A simple way to compute the sample partial correlation for some data is to solve the two associated linear regression problems, get the residuals, and calculate the correlation between the residuals. Let X and Y be, as above, random variables taking real values, and let Z be the set of controlling variables. If we write xᵢ, yᵢ and zᵢ to denote the ith of N i.i.d. observations, the sample partial correlation is the correlation between the two sets of regression residuals. Note that in some formulations the regression includes a constant term. It can be computationally expensive to solve the linear regression problems. Actually, the nth-order partial correlation (i.e., with |Z| = n) can be easily computed from three (n−1)th-order partial correlations. The zeroth-order partial correlation ρ_XY·Ø is defined to be the regular correlation coefficient ρ_XY. Naïvely implementing this computation as a recursive algorithm yields an exponential time complexity. However, this computation has the overlapping-subproblems property, such that using dynamic programming or simply caching the results of the recursive calls yields a complexity of O(n³). Note that in the case where Z is a single variable, this reduces to ρ_XY·Z = (ρ_XY − ρ_XZ ρ_YZ) / √((1 − ρ_XZ²)(1 − ρ_YZ²)). If we define the precision matrix P = Ω⁻¹, we have ρ_{XᵢXⱼ·V∖{Xᵢ,Xⱼ}} = −p_ij / √(p_ii p_jj). Let three variables X, Y, Z be chosen from a joint probability distribution over n variables V. Further let vᵢ, 1 ≤ i ≤ N, be N n-dimensional i.i.d. samples taken from the joint probability distribution over V. We then consider the N-dimensional vectors x, y and z formed from the successive values of each variable over the samples. It can be shown that the residuals R_X coming from the linear regression of X on Z, if also considered as an N-dimensional vector r_X, have a zero scalar product with the vector z generated by Z. This means that the residual vector lies on an (N−1)-dimensional hyperplane S_z that is perpendicular to z.
The same also applies to the residuals R_Y, generating a vector r_Y. The desired partial correlation is then the cosine of the angle φ between the projections r_X and r_Y of x and y, respectively, onto the hyperplane perpendicular to z. With the assumption that all involved variables are multivariate Gaussian, the partial correlation ρ_XY·Z is zero if and only if X is conditionally independent of Y given Z; this property does not hold in the general case. To test if a sample partial correlation ρ̂_XY·Z vanishes, Fisher's z-transform of the partial correlation can be used. The null hypothesis is H₀: ρ_XY·Z = 0. Note that this z-transform is approximate and that the actual distribution of the sample correlation coefficient is not straightforward. However, an exact t-test based on a combination of the partial regression coefficient and the partial correlation coefficient is available. The distribution of the sample partial correlation was described by Fisher.
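The residual-based recipe above is easy to implement directly. The sketch below (our own helper names, added for illustration; it assumes a single scalar control variable Z) computes the partial correlation both via regression residuals and via the first-order recursive formula, and the two routes agree:

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def residuals(ys, zs):
    """Residuals of the least-squares regression of y on z (with intercept)."""
    n = len(ys)
    mz, my = sum(zs) / n, sum(ys) / n
    beta = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
            / sum((z - mz) ** 2 for z in zs))
    return [y - (my + beta * (z - mz)) for y, z in zip(ys, zs)]

def partial_corr(xs, ys, zs):
    """Partial correlation of X and Y controlling for a single variable Z."""
    return pearson(residuals(xs, zs), residuals(ys, zs))

def partial_corr_recursive(xs, ys, zs):
    """Same quantity via the first-order formula from pairwise correlations."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical toy data:
x = [2.0, 4.0, 15.0, 20.0]
y = [1.0, 2.0, 3.0, 4.0]
z = [0.0, 0.0, 1.0, 1.0]
```

For larger control sets, the residual route generalizes by regressing on all of Z, while the recursive route applies the same formula one controlling variable at a time.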