1.
Statistics
–
Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data. In applying statistics to, e.g., a scientific, industrial, or social problem, populations can be diverse topics such as all people living in a country or every atom composing a crystal. Statistics deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments. The statistician Sir Arthur Lyon Bowley defined statistics as "numerical statements of facts in any department of inquiry placed in relation to each other". When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples; representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. In contrast, an observational study does not involve experimental manipulation. Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena. A standard statistical procedure involves the test of the relationship between two data sets, or between a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between the two data sets. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (rejecting a true null hypothesis) and Type II errors (failing to reject a false null hypothesis). Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis. Measurement processes that generate statistical data are also subject to error. 
Many of these errors are classified as random or systematic, and the presence of missing data or censoring may result in biased estimates; specific techniques have been developed to address these problems. Statistics continues to be an area of active research, for example on the problem of how to analyze big data. Statistics is a body of science that pertains to the collection, analysis, interpretation or explanation, and presentation of data. Some consider statistics to be a distinct mathematical science rather than a branch of mathematics. While many scientific investigations make use of data, statistics is concerned with the use of data in the context of uncertainty. Mathematical techniques used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure-theoretic probability theory. In applying statistics to a problem, it is common practice to start with a population or process to be studied. Populations can be diverse topics such as all people living in a country or every atom composing a crystal. Ideally, statisticians compile data about the entire population; this may be organized by governmental statistical institutes
2.
Pie chart
–
A pie chart is a circular statistical graphic which is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice (and consequently its central angle and area) is proportional to the quantity it represents. While it is named for its resemblance to a pie which has been sliced, there are variations on the way it can be presented. The earliest known pie chart is generally credited to William Playfair's Statistical Breviary of 1801. Pie charts are very widely used in the business world and the mass media, but they can be replaced in most cases by other plots such as the bar chart. In the Statistical Breviary, Playfair presented an illustration which contained a series of pie charts; one of those charts depicted the proportions of the Turkish Empire located in Asia, Europe and Africa. This invention was not widely used at first. The French engineer Charles Joseph Minard was one of the first to use pie charts, in 1858, in particular in maps; Minard's map of 1858 used pie charts to represent the cattle sent from all around France for consumption in Paris. Playfair thought that pie charts were in need of a third dimension to add additional information. It has been said that Florence Nightingale invented the pie chart, though in fact she just popularised it. A 3D pie chart, or perspective pie chart, is used to give the chart a 3D look; the use of superfluous dimensions not used to display the data of interest is discouraged for charts in general. A doughnut chart is a variant of the pie chart, with a blank center allowing for additional information about the data as a whole to be included. A chart with one or more sectors separated from the rest of the disk is known as an exploded pie chart; this effect is used to highlight a sector, or to highlight smaller segments of the chart with small proportions. The polar area diagram is similar to a pie chart, except the sectors have equal angles and differ instead in how far each sector extends from the center of the circle. 
The polar area diagram is used to plot cyclic phenomena. For example, if the counts of deaths in each month of a year are to be plotted, then there will be 12 sectors, all with the same angle of 30 degrees each. The radius of each sector is proportional to the square root of the death count for that month, so that the area of each sector is proportional to the count. Léon Lalanne later used a polar diagram to show the frequency of wind directions around compass points in 1843; the wind rose is still used by meteorologists. Nightingale published her rose diagram in 1858. The name "coxcomb" is sometimes used erroneously: this was the name Nightingale used to refer to a book containing the diagrams rather than to the diagrams themselves. A ring chart, also known as a sunburst chart or a multilevel pie chart, is used to visualize hierarchical data
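The square-root scaling described above can be sketched as follows. This is a minimal illustration with made-up monthly counts (not Nightingale's data); it only computes the sector geometry, leaving the actual drawing to a plotting library.

```python
import math

# Hypothetical monthly death counts (illustrative only, not Nightingale's data)
counts = [120, 90, 60, 30, 15, 10, 12, 20, 45, 80, 100, 110]

sector_angle = 2 * math.pi / len(counts)   # 12 sectors of 30 degrees each

# The area of a circular sector of radius r and angle theta is (theta/2) * r**2,
# so choosing r proportional to sqrt(count) makes each sector's AREA
# proportional to its count, as in the polar area diagram.
radii = [math.sqrt(c) for c in counts]

def sector_area(r):
    return 0.5 * sector_angle * r ** 2

# Check: the ratio of two sector areas equals the ratio of the counts
assert abs(sector_area(radii[0]) / sector_area(radii[1]) - counts[0] / counts[1]) < 1e-9
```

If the radius were instead made proportional to the count itself, the visual impression (area) would grow quadratically and exaggerate large values, which is why the square root is used.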
3.
Biplot
–
Biplots are a type of exploratory graph used in statistics, a generalization of the simple two-variable scatterplot. A biplot allows information on both samples and variables of a data matrix to be displayed graphically. Samples are displayed as points, while variables are displayed as vectors; in the case of categorical variables, category level points may be used to represent the levels of a categorical variable. A generalised biplot displays information on both continuous and categorical variables. The biplot was introduced by K. Ruben Gabriel. Gower and Hand wrote a monograph on biplots, and Yan and Kang described various methods which can be used in order to visualize and interpret a biplot. The transformed data matrix Y is obtained from the original matrix X by centering and optionally standardizing the columns. Using the singular value decomposition (SVD), we can write Y = ∑_{k=1}^{p} d_k u_k v_k^T, where the u_k are n-dimensional column vectors, the v_k are p-dimensional column vectors, and the d_k are the singular values in decreasing order. The biplot is formed from two scatterplots that share a common set of axes and have a between-set scalar product interpretation. The first scatterplot is formed from the points (d_1^α u_{1i}, d_2^α u_{2i}), for i = 1, ..., n, and the second plot is formed from the points (d_1^{1−α} v_{1j}, d_2^{1−α} v_{2j}), for j = 1, ..., p. This is the biplot formed by the dominant two terms of the SVD, which can then be represented in a two-dimensional display. Typical choices of α are 1 and 0, and in some rare cases α = 1/2 to obtain a symmetrically scaled biplot.
References:
- Available for free download, ISBN 978-84-923846-8-6, with materials.
- Gabriel, K. R. The biplot graphic display of matrices with application to principal component analysis.
- Gower, J. C., Lubbe, S. and le Roux. ISBN 978-0-470-01255-0.
- Gower, J. C. and Hand, D. J. ISBN 0-412-71630-5.
- Yan, W. and Kang, M. S. ISBN 0-8493-1338-4.
- Demey, J. R., Vicente-Villardón, J. L., Galindo-Villardón, M. P. and Zambrano, A. Y. Identifying molecular markers associated with classification of genotypes by External Logistic Biplots.
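The SVD construction described in this section can be sketched in a few lines of NumPy. The data matrix below is made up for illustration; the point is only how the sample coordinates G and variable coordinates H are built from the two dominant SVD terms.

```python
import numpy as np

# Small illustrative data matrix X (rows = samples, columns = variables);
# the values are invented for demonstration.
X = np.array([[2.0, 4.0, 1.0],
              [3.0, 5.0, 0.0],
              [4.0, 4.5, 2.0],
              [5.0, 7.0, 1.5]])

# Transformed matrix Y: center (and here also standardize) the columns
Y = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# SVD: Y = U diag(d) V^T, singular values d in decreasing order
U, d, Vt = np.linalg.svd(Y, full_matrices=False)

alpha = 1.0  # alpha = 1 gives a principal-component (form) biplot; alpha = 0 a covariance biplot
G = U[:, :2] * d[:2] ** alpha            # sample (row) coordinates to plot as points
H = Vt[:2, :].T * d[:2] ** (1 - alpha)   # variable (column) coordinates to plot as vectors

# Between-set scalar products: G @ H.T reproduces the two dominant SVD terms of Y
rank2 = G @ H.T
```

Plotting the rows of G as points and the rows of H as arrows from the origin (with any 2D plotting library) yields the biplot; the inner product of a sample point and a variable vector approximates the corresponding entry of Y.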
4.
Central limit theorem
–
If this procedure is performed many times, the central limit theorem says that the computed values of the average will be distributed according to the normal distribution. The central limit theorem has a number of variants. In its common form, the random variables must be independent and identically distributed (i.i.d.). In variants, convergence of the mean to the normal distribution also occurs for non-identical distributions or for non-independent observations, provided certain conditions are met. In more general usage, a central limit theorem is any of a set of weak-convergence theorems in probability theory. When the variance of the i.i.d. variables is finite, the attractor distribution is the normal distribution. In contrast, the sum of a number of i.i.d. random variables with power law tail distributions decreasing as |x|^(−α−1), where 0 < α < 2, will tend to an alpha-stable distribution with stability parameter α as the number of variables grows. Suppose we are interested in the sample average S_n = (X_1 + ⋯ + X_n)/n of these random variables. By the law of large numbers, the sample averages converge in probability and almost surely to the expected value µ as n → ∞. The classical central limit theorem describes the size and the distributional form of the stochastic fluctuations around the deterministic number µ during this convergence. For large enough n, the distribution of S_n is close to the normal distribution with mean µ and variance σ²/n. The usefulness of the theorem is that the distribution of √n(S_n − µ) approaches normality regardless of the shape of the distribution of the individual X_i. Formally, the theorem can be stated as follows. Lindeberg–Lévy CLT: suppose {X_1, X_2, ...} is a sequence of i.i.d. random variables with E[X_i] = µ and Var[X_i] = σ² < ∞. Then as n approaches infinity, the random variables √n(S_n − µ) converge in distribution to a normal N(0, σ²): √n(S_n − µ) →d N(0, σ²). The convergence is uniform in z in the sense that lim_{n→∞} sup_{z∈R} |Pr[√n(S_n − µ) ≤ z] − Φ(z/σ)| = 0, where Φ is the standard normal cumulative distribution function. Lyapunov CLT: this variant of the theorem is named after Russian mathematician Aleksandr Lyapunov. 
In this variant of the central limit theorem the random variables X_i have to be independent, but not necessarily identically distributed. The theorem also requires that the random variables |X_i| have moments of some order 2 + δ. Suppose {X_1, X_2, ...} is a sequence of independent random variables, each with finite expected value μ_i and variance σ_i². In practice it is usually easiest to check Lyapunov's condition for δ = 1. If a sequence of random variables satisfies Lyapunov's condition, then it also satisfies Lindeberg's condition; the converse implication, however, does not hold. Lindeberg CLT: in the same setting and with the same notation as above, suppose that for every ε > 0, lim_{n→∞} (1/s_n²) ∑_{i=1}^{n} E[(X_i − μ_i)² · 1{|X_i − μ_i| > ε s_n}] = 0, where 1{...} is the indicator function and s_n² = ∑_{i=1}^{n} σ_i²
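The convergence asserted by the classical (Lindeberg–Lévy) form can be checked by simulation. The sketch below draws many samples from a markedly non-normal distribution (exponential, with µ = σ = 1) and standardizes the sample means; the distribution choice and sample sizes are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

# reps independent samples, each of size n, from an exponential distribution
# (mean mu = 1, standard deviation sigma = 1) -- decidedly non-normal.
n, reps = 1000, 20000
mu, sigma = 1.0, 1.0
samples = rng.exponential(scale=1.0, size=(reps, n))

# Standardized fluctuations sqrt(n) * (S_n - mu) / sigma, which the CLT
# says should be approximately N(0, 1) for large n.
z = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma

# Empirical mean ~ 0, standard deviation ~ 1, and roughly 68% of the values
# fall within one standard deviation, matching the standard normal.
within_1sd = np.mean(np.abs(z) < 1.0)
print(round(z.mean(), 2), round(z.std(), 2), round(within_1sd, 2))
```

Repeating the experiment with a uniform or Bernoulli distribution gives the same limiting behavior, which is the point of the theorem: only the mean and variance of the summands matter.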
5.
Kurtosis
–
In probability theory and statistics, kurtosis is a measure of the "tailedness" of the probability distribution of a real-valued random variable. Depending on the particular measure of kurtosis that is used, there are various interpretations of kurtosis. The standard measure of kurtosis, originating with Karl Pearson, is based on a scaled version of the fourth moment of the data or population. This number is related to the tails of the distribution, not its peak; hence, for this measure, higher kurtosis is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations. The kurtosis of any normal distribution is 3, and it is common to compare the kurtosis of a distribution to this value. Distributions with kurtosis less than 3 are said to be platykurtic, although this does not imply the distribution is flat-topped as sometimes reported. Rather, it means the distribution produces fewer and less extreme outliers than does the normal distribution; an example of a platykurtic distribution is the uniform distribution, which does not produce outliers. Distributions with kurtosis greater than 3 are said to be leptokurtic. It is also common practice to use an adjusted version of Pearson's kurtosis, the excess kurtosis, which is the kurtosis minus 3, to provide the comparison to the normal distribution. Some authors use "kurtosis" by itself to refer to the excess kurtosis. For the reason of clarity and generality, however, this article follows the non-excess convention and explicitly indicates where excess kurtosis is meant. Alternative measures of kurtosis include the L-kurtosis, which is a scaled version of the fourth L-moment. These are analogous to the alternative measures of skewness that are not based on ordinary moments. The kurtosis is the fourth standardized moment, defined as Kurt[X] = μ_4/σ⁴ = E[(X − μ)⁴] / (E[(X − μ)²])². Several letters are used in the literature to denote the kurtosis. 
A very common choice is κ, which is fine as long as it is clear that it does not refer to a cumulant. Other choices include γ_2, to be similar to the notation for skewness, although sometimes this is instead reserved for the excess kurtosis. The kurtosis is bounded below by the squared skewness plus 1: μ_4/σ⁴ ≥ (μ_3/σ³)² + 1, where μ_3 is the third central moment. The lower bound is realized by the Bernoulli distribution. There is no upper limit to the excess kurtosis of a general probability distribution. A reason why some authors favor the excess kurtosis is that cumulants are extensive; formulas related to the extensive property are more naturally expressed in terms of the excess kurtosis. Let X_1, ..., X_n be independent random variables for which the fourth moment exists, and let Y = X_1 + ⋯ + X_n be their sum. The excess kurtosis of Y is Kurt[Y] − 3 = (1 / (∑_{i=1}^{n} σ_i²)²) ∑_{i=1}^{n} σ_i⁴ · (Kurt[X_i] − 3), where σ_i is the standard deviation of X_i. In particular, if all of the X_i are identically distributed with the same variance, this simplifies to Kurt[Y] − 3 = (1/n)(Kurt[X_1] − 3). The reason not to subtract off 3 is that the bare fourth moment better generalizes to multivariate distributions, especially when independence is not assumed
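The fourth-standardized-moment definition above is easy to compute directly. The sketch below implements the non-excess (Pearson) convention used in this article and checks it against two reference values: 3 for the normal distribution and 9/5 = 1.8 for the (platykurtic) uniform distribution.

```python
import numpy as np

def kurtosis(x):
    """Pearson kurtosis: the fourth standardized moment E[(X-mu)^4] / sigma^4
    (non-excess convention, so the normal distribution gives about 3)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma2 = ((x - mu) ** 2).mean()       # population variance
    return ((x - mu) ** 4).mean() / sigma2 ** 2

rng = np.random.default_rng(1)
normal_sample = rng.normal(size=200_000)
uniform_sample = rng.uniform(size=200_000)

# Normal: kurtosis ~ 3; uniform: ~ 1.8 (platykurtic, fewer extreme deviations)
print(round(kurtosis(normal_sample), 1), round(kurtosis(uniform_sample), 1))
```

Subtracting 3 from either result gives the excess kurtosis discussed above (approximately 0 for the normal sample, −1.2 for the uniform one).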
6.
Scatter plot
–
A scatter plot is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data. If the points are color-coded, one additional variable can be displayed. A scatter plot can be used when one continuous variable is under the control of the experimenter and the other depends upon it, or when both continuous variables are independent. The measured or dependent variable is customarily plotted along the vertical axis; if no dependent variable exists, either type of variable can be plotted on either axis. A scatter plot can suggest various kinds of correlations between variables with a certain confidence interval. For example, for weight and height, weight would be on the y axis and height on the x axis. Correlations may be positive (rising), negative (falling), or null (uncorrelated). If the pattern of dots slopes from lower left to upper right, it indicates a positive correlation; if the pattern of dots slopes from upper left to lower right, it indicates a negative correlation. A line of best fit can be drawn in order to study the relationship between the variables, and an equation for the correlation between the variables can be determined by established best-fit procedures. For a linear correlation, the best-fit procedure is known as linear regression and is guaranteed to generate a correct solution in a finite time. No universal best-fit procedure is guaranteed to generate a correct solution for arbitrary relationships. A scatter plot is also very useful when we wish to see how two comparable data sets agree with each other; in this case, an identity line, i.e. a y = x line, is often drawn as a reference. One of the most powerful aspects of a scatter plot, however, is its ability to show nonlinear relationships between variables. The ability to do this can be enhanced by adding a smooth line such as LOESS. Furthermore, if the data are represented by a mixture model of simple relationships, these relationships will be visually evident as superimposed patterns. The scatter diagram is one of the seven basic tools of quality control. Scatter charts can be built in the form of bubble, marker, or/and line charts. For example, to study the relationship between lung capacity and how long a person can hold his or her breath, a researcher would plot the data in a scatter plot, assigning lung capacity to the horizontal axis and breath-holding time to the vertical axis. 
A person with a lung capacity of 400 cl who held his or her breath for 21.7 seconds would be represented by a single dot on the scatter plot at the point (400, 21.7) in Cartesian coordinates. For a set of data variables X1, X2, ..., Xk, the scatter plot matrix shows all the pairwise scatter plots of the variables on a single view with multiple scatterplots in a matrix format
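The line of best fit mentioned above can be computed in closed form by least squares, which is why linear regression terminates in finite time. The data below are invented to resemble the lung-capacity example (they are not measurements from the text).

```python
import numpy as np

# Hypothetical (lung capacity in cl, breath-hold time in s) pairs,
# made up for illustration.
capacity = np.array([350.0, 400.0, 420.0, 480.0, 500.0, 560.0])
hold_time = np.array([18.0, 21.7, 22.5, 26.0, 27.1, 31.0])

# Closed-form least-squares line of best fit: hold_time ~ slope * capacity + intercept
slope, intercept = np.polyfit(capacity, hold_time, deg=1)
predicted = slope * capacity + intercept

# Pearson correlation coefficient summarizes how tightly the dots follow the line
r = np.corrcoef(capacity, hold_time)[0, 1]
```

A positive slope and r near +1 correspond to the "lower left to upper right" pattern of dots described above; plotting `capacity` against `hold_time` with any charting library and overlaying `predicted` gives the scatter plot with its best-fit line.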
7.
Statistical graphics
–
Statistical graphics, also known as graphical techniques, are graphics in the field of statistics used to visualize quantitative data. Whereas statistics and data analysis procedures generally yield their output in numeric or tabular form, graphical techniques allow such results to be displayed in some sort of pictorial form. They include plots such as scatter plots, histograms, probability plots, spaghetti plots, residual plots, box plots, block plots and biplots. Exploratory data analysis relies heavily on such techniques. In addition, the choice of appropriate statistical graphics can provide a convincing means of communicating the underlying message that is present in the data to others. If one is not using statistical graphics, then one is forfeiting insight into one or more aspects of the underlying structure of the data. Statistical graphics have been central to the development of science and date to the earliest attempts to analyse data. Many familiar forms, including bivariate plots, statistical maps, bar charts, and coordinate paper were used in the 18th century. Since the 1970s statistical graphics have been re-emerging as an important analytic tool with the revitalisation of computer graphics. Famous graphics were designed by William Playfair, who produced what could be called the first line, bar, pie, and area charts
8.
Run chart
–
A run chart, also known as a run-sequence plot, is a graph that displays observed data in a time sequence. Often, the data displayed represent some aspect of the output or performance of a manufacturing or other business process; it is therefore a form of line chart. Run sequence plots are an easy way to graphically summarize a univariate data set. A common assumption of univariate data sets is that they behave like random drawings from a fixed distribution with a common location and a common scale. With run sequence plots, shifts in location and scale are typically quite evident; also, outliers can easily be detected. Examples could include measurements of the fill level of bottles filled at a bottling plant or the water temperature of a dish-washing machine each time it is run. Time is generally represented on the horizontal axis and the property under observation on the vertical axis. Often, some measure of central tendency of the data is indicated by a horizontal reference line. Run charts are analyzed to find anomalies in data that suggest shifts in a process over time or special factors that may be influencing the variability of a process. Run charts are similar in some regards to the control charts used in statistical process control, but do not show the control limits of the process. They are therefore simpler to produce, but do not allow for the full range of analytic techniques supported by control charts. This article incorporates public domain material from the National Institute of Standards and Technology website http://www.nist.gov
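One simple way to analyze a run chart, in the spirit of the anomaly-hunting described above, is to draw the median as the reference line and count "runs" of consecutive points on the same side of it; unusually few runs suggest a shift or trend. The data and the runs-counting convention below are illustrative assumptions, not taken from the text.

```python
import statistics

# Hypothetical fill levels (cl) of successive bottles from a bottling line
levels = [500, 502, 499, 501, 498, 503, 500, 497, 502, 501]

center = statistics.median(levels)   # median reference line for the run chart

# A "run" here is a maximal stretch of consecutive points on the same side of
# the reference line; points exactly on the line are skipped.
sides = [v > center for v in levels if v != center]
runs = 1 + sum(1 for a, b in zip(sides, sides[1:]) if a != b)
print(center, runs)
```

For a process that really is random drawings from a fixed distribution, the points should alternate sides fairly often; a long single run on one side of the median is the kind of location shift a run chart makes visible.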
9.
Partition of a set
–
In mathematics, a partition of a set is a grouping of the set's elements into non-empty subsets, in such a way that every element is included in one and only one of the subsets. A partition of a set X is a set of nonempty subsets of X such that every element x in X is in exactly one of these subsets. Equivalently, a family of sets P is a partition of X if and only if all of the following conditions hold: the family P does not contain the empty set; the union of the sets in P is equal to X (the sets in P are said to cover X); and the intersection of any two distinct sets in P is empty (the elements of P are said to be pairwise disjoint). The sets in P are called the blocks, parts or cells of the partition. The rank of P is |X| − |P|, if X is finite. Every singleton set {x} has exactly one partition, namely {{x}}. The empty set ∅ has exactly one partition, namely the empty partition ∅. For any nonempty set X, P = {X} is a partition of X, called the trivial partition. For any non-empty proper subset A of a set U, the set A together with its complement form a partition of U. The set {1, 2, 3} has these five partitions: {{1}, {2}, {3}}, sometimes written 1|2|3; {{1, 2}, {3}}, or 12|3; {{1, 3}, {2}}, or 13|2; {{1}, {2, 3}}, or 1|23; and {{1, 2, 3}}, or 123. The following are not partitions of {1, 2, 3}: {∅, {1, 3}, {2}} is not a partition because one of its elements is the empty set; {{1, 2}, {2, 3}} is not a partition because the element 2 is contained in more than one block; and {{1}, {2}} is not a partition of {1, 2, 3} because none of its blocks contains 3 (it is, however, a partition of {1, 2}). Every partition of X determines an equivalence relation on X, in which two elements are equivalent exactly when they lie in the same block, and conversely every equivalence relation determines a partition into equivalence classes; thus the notions of equivalence relation and partition are essentially equivalent. The axiom of choice guarantees, for any partition of a set X, the existence of a subset of X containing exactly one element from each part of the partition; this implies that given an equivalence relation on a set one can select a canonical representative element from every equivalence class. A partition α of X is a refinement of a partition ρ of X if every block of α is a subset of some block of ρ. Informally, this means that α is a further fragmentation of ρ. In that case, it is written that α ≤ ρ. This finer-than relation on the set of partitions of X is a partial order. 
This set of partitions, ordered by refinement, has a least upper bound and a greatest lower bound for each pair of its elements, so that it forms a lattice; the partition lattice of a 4-element set has 15 elements and is depicted in the Hasse diagram on the left. The atomic partitions, those with exactly one two-element block and all other blocks singletons, correspond one-for-one with the edges of a complete graph; in this way, the lattice of partitions corresponds to the lattice of flats of the graphic matroid of the complete graph. Another example illustrates the refining of partitions from the perspective of equivalence relations: if D is the set of cards in a standard 52-card deck, the same-color-as relation on D, which can be denoted ~C, has two equivalence classes, the set of red cards and the set of black cards. The 2-part partition corresponding to ~C has a refinement that yields the same-suit-as relation ~S, which has four equivalence classes: the hearts, the diamonds, the clubs and the spades. In other words, given distinct numbers a, b, c in N, with a < b < c, if a ~ c, it follows that also a ~ b and b ~ c; that is, b is also in C
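The definition above can be turned into a short recursive generator: every partition of {x1, ..., xn} is obtained by partitioning the last n − 1 elements and then placing x1 either into one of the existing blocks or into a new singleton block. This is a generic sketch, not an algorithm from the text.

```python
def partitions(elements):
    """Yield all partitions of a list of distinct elements, each partition
    represented as a list of blocks (lists)."""
    if not elements:
        yield []          # the empty set has exactly one partition: the empty one
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ...or make `first` a new singleton block.
        yield [[first]] + smaller

all_parts = list(partitions([1, 2, 3]))
print(len(all_parts))   # the 5 partitions of a 3-element set
```

The counts produced this way are the Bell numbers: 5 partitions for a 3-element set, and 15 for a 4-element set, matching the 15-element partition lattice mentioned above.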
10.
Logistic regression
–
In statistics, logistic regression, or logit regression, or logit model is a regression model where the dependent variable is categorical. This article covers the case of a binary dependent variable, that is, one that can take only two values. Cases where the dependent variable has more than two outcome categories may be analysed in multinomial logistic regression, or, if the multiple categories are ordered, in ordinal logistic regression. In the terminology of economics, logistic regression is an example of a qualitative response/discrete choice model. Logistic regression was developed by statistician David Cox in 1958. The binary logistic model is used to estimate the probability of a binary response based on one or more predictor variables. It allows one to say that the presence of a risk factor increases the probability of a given outcome by a specific percentage. Logistic regression is used in various fields, including machine learning and most medical fields. For example, the Trauma and Injury Severity Score (TRISS), which is used to predict mortality in injured patients, was originally developed by Boyd et al. using logistic regression. Many other medical scales used to assess severity of a patient have been developed using logistic regression. Logistic regression may be used to predict whether a patient has a given disease, based on observed characteristics of the patient. Another example might be to predict whether an American voter will vote Democratic or Republican, based on age, income, sex, race, state of residence, votes in previous elections, etc. The technique can also be used in engineering, especially for predicting the probability of failure of a given process. It is also used in marketing applications such as prediction of a customer's propensity to purchase a product or halt a subscription, etc. Conditional random fields, an extension of logistic regression to sequential data, are used in natural language processing. 
Suppose we wish to answer the following question: a group of 20 students spends between 0 and 6 hours studying for an exam; how does the number of hours spent studying affect the probability that a student will pass the exam? The reason for using logistic regression for this problem is that the values of the dependent variable, pass and fail, represented by 1 and 0, are not cardinal numbers. If the problem were changed so that pass/fail was replaced with a grade of 0–100, then simple regression analysis could be used. The table shows the number of hours each student spent studying, and whether they passed (1) or failed (0). The graph shows the probability of passing the exam versus the number of hours studying, with the fitted logistic regression curve. The logistic regression analysis gives the following output. The output indicates that hours studying is significantly associated with the probability of passing the exam. The output from the logistic regression analysis gives a p-value of p = 0.0167, which is based on the Wald z-score. Rather than the Wald method, the recommended method to calculate the p-value for logistic regression is the Likelihood Ratio Test. Logistic regression can be binomial, ordinal or multinomial. Binomial or binary logistic regression deals with situations in which the observed outcome for a dependent variable can have only two possible types, 0 and 1
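A minimal sketch of fitting such a binary logistic model follows, using Newton–Raphson iterations on the log-likelihood (the standard maximum-likelihood approach, though the text does not specify the fitting algorithm). The hours/pass data below are made up in the spirit of the 20-student example; the table itself is not reproduced here.

```python
import numpy as np

# Hypothetical hours-studied vs pass(1)/fail(0) data for 20 students
hours = np.array([0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 1.75, 2.0, 2.25, 2.5,
                  2.75, 3.0, 3.25, 3.5, 4.0, 4.25, 4.5, 4.75, 5.0, 5.5])
passed = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
                   1, 0, 1, 0, 1, 1, 1, 1, 1, 1])

X = np.column_stack([np.ones_like(hours), hours])  # design matrix: intercept + hours
beta = np.zeros(2)

# Newton-Raphson: beta <- beta + H^{-1} g, where g is the gradient and H the
# (negative) Hessian of the Bernoulli log-likelihood.
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))   # model probability of passing
    grad = X.T @ (passed - p)
    hess = X.T @ (X * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hess, grad)

prob_at_5h = 1.0 / (1.0 + np.exp(-(beta[0] + 5.0 * beta[1])))
print(beta, round(prob_at_5h, 2))
```

A positive fitted slope means each extra hour of study raises the log-odds of passing by that amount, which is how the "hours studying is significantly associated with passing" conclusion would be read off the output.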
11.
Partial correlation
–
In probability theory and statistics, partial correlation measures the degree of association between two random variables, with the effect of a set of controlling random variables removed. The coefficient of alienation, and its relation with joint variance through correlation, are available in Guilford. A simple way to compute the sample partial correlation for some data is to solve the two associated linear regression problems, get the residuals, and calculate the correlation between the residuals. Let X and Y be, as above, random variables taking real values, and let Z be the set of controlling variables; if we write x_i, y_i and z_i to denote the ith of N i.i.d. observations, the sample partial correlation is the correlation between the residuals of the regressions of X on Z and of Y on Z. Note that in some formulations the regression includes a constant term. It can be computationally expensive to solve the linear regression problems. Actually, the nth-order partial correlation (i.e., with |Z| = n) can be easily computed from three (n−1)th-order partial correlations. The zeroth-order partial correlation ρ_{XY·∅} is defined to be the regular correlation coefficient ρ_{XY}. Naively implementing this computation as a recursive algorithm yields an exponential time complexity. However, this computation has the overlapping-subproblems property, such that using dynamic programming, or simply caching the results of the recursive calls, yields a polynomial complexity. Note that in the case where Z is a single variable, this reduces to ρ_{XY·Z} = (ρ_{XY} − ρ_{XZ} ρ_{ZY}) / √((1 − ρ²_{XZ})(1 − ρ²_{ZY})). If we define the precision matrix P = Ω⁻¹, we have ρ_{X_i X_j · V∖{X_i, X_j}} = −p_{ij} / √(p_{ii} p_{jj}). Let three variables X, Y, Z be chosen from a joint probability distribution over n variables V. Further let v_i, 1 ≤ i ≤ N, be N n-dimensional i.i.d. samples taken from the joint probability distribution over V. We then consider the N-dimensional vectors x, y and z formed from the corresponding components of these samples. It can be shown that the residuals R_X coming from the linear regression of X on Z, if also considered as an N-dimensional vector r_X, have a zero scalar product with the vector z generated by Z. This means that the residual vector lies on an (N−1)-dimensional hyperplane S_z that is perpendicular to z. 
The same also applies to the residuals R_Y, generating a vector r_Y. The desired partial correlation is then the cosine of the angle φ between the projections r_X and r_Y of x and y, respectively, onto the hyperplane perpendicular to z. With the assumption that all involved variables are multivariate Gaussian, the partial correlation ρ_{XY·Z} is zero if and only if X is conditionally independent of Y given Z; this property does not hold in the general case. To test whether a sample partial correlation ρ̂_{XY·Z} vanishes, Fisher's z-transform of the partial correlation can be used. The null hypothesis is H0: ρ_{XY·Z} = 0. Note that this z-transform is approximate and that the actual distribution of the sample (partial) correlation coefficient is not straightforward. However, an exact t-test based on a combination of the partial regression coefficient, the partial correlation coefficient and the partial variances is available. The distribution of the sample partial correlation was described by Fisher
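The residual-based recipe described at the start of this section (regress X on Z, regress Y on Z, correlate the residuals) can be sketched directly. The simulated data below are an illustrative assumption: X and Y are both driven by a common variable Z, so their raw correlation is strong while the partial correlation given Z is near zero.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data: X and Y share the common cause Z plus independent noise
n = 5000
Z = rng.normal(size=n)
X = 2.0 * Z + rng.normal(size=n)
Y = -1.5 * Z + rng.normal(size=n)

def residuals(a, b):
    """Residuals of the least-squares regression of a on b (with a constant term)."""
    B = np.column_stack([np.ones_like(b), b])
    coef, *_ = np.linalg.lstsq(B, a, rcond=None)
    return a - B @ coef

r_xy = np.corrcoef(X, Y)[0, 1]                                 # strong raw correlation
r_xy_given_z = np.corrcoef(residuals(X, Z), residuals(Y, Z))[0, 1]  # ~ 0 after removing Z
print(round(r_xy, 2), round(r_xy_given_z, 2))
```

Because the simulated variables are jointly Gaussian, the near-zero partial correlation here also reflects conditional independence of X and Y given Z, matching the Gaussian-case property stated above.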