1.
Statistics
–
Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data. In applying statistics to, for example, a scientific, industrial, or social problem, one starts with a population to be studied; populations can be as diverse as all people living in a country or every atom composing a crystal. Statistics deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments. The statistician Sir Arthur Lyon Bowley defined statistics as "numerical statements of facts in any department of inquiry placed in relation to each other". When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. In contrast to an experimental study, an observational study does not involve experimental manipulation. Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena. A standard statistical procedure involves the test of the relationship between two data sets, or between a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between the two data sets. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (the null hypothesis is falsely rejected) and Type II errors (the null hypothesis fails to be rejected although it is false). Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis. Measurement processes that generate statistical data are also subject to error.
Many of these errors are classified as random or systematic. The presence of missing data or censoring may result in biased estimates, and specific techniques have been developed to address these problems. Statistics continues to be an area of active research, for example on the problem of how to analyze big data. Statistics is a body of science that pertains to the collection, analysis, interpretation or explanation, and presentation of data. Some consider statistics to be a distinct mathematical science rather than a branch of mathematics. While many scientific investigations make use of data, statistics is concerned with the use of data in the context of uncertainty. Mathematical techniques used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure-theoretic probability theory. In applying statistics to a problem, it is common practice to start with a population or process to be studied. Populations can be diverse topics such as all people living in a country or every atom composing a crystal. Ideally, statisticians compile data about the entire population (a census). This may be organized by governmental statistical institutes.
2.
Analysis of variance
–
In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are equal. ANOVAs are useful for comparing three or more means for statistical significance. The procedure is conceptually similar to multiple two-sample t-tests, but is more conservative (it results in fewer Type I errors) and is therefore suited to a wide range of practical problems. While the analysis of variance reached fruition in the 20th century, its antecedents extend further into the past. These include hypothesis testing, the partitioning of sums of squares, experimental techniques, and the additive model. Laplace was performing hypothesis testing in the 1770s. The development of least-squares methods by Laplace and Gauss circa 1800 provided an improved method of combining observations. It also initiated much study of the contributions to sums of squares. Laplace soon knew how to estimate a variance from a residual sum of squares. By 1827 Laplace was using least-squares methods to address ANOVA problems regarding measurements of atmospheric tides. Before 1800, astronomers had isolated observational errors resulting from reaction times and had developed methods of reducing the errors. An eloquent non-mathematical explanation of the additive effects model was available in 1885. Ronald Fisher introduced the term variance and proposed its formal analysis in a 1918 article, The Correlation Between Relatives on the Supposition of Mendelian Inheritance. His first application of the analysis of variance was published in 1921. Analysis of variance became widely known after being included in Fisher's 1925 book Statistical Methods for Research Workers. Randomization models were developed by several researchers. The first was published in Polish by Neyman in 1923. One of the attributes of ANOVA which ensured its early popularity was computational elegance.
The structure of the additive model allows solution for the additive coefficients by simple algebra rather than by matrix calculations. In the era of mechanical calculators this simplicity was critical. The determination of statistical significance also required access to tables of the F function, which were supplied by early statistics texts. The analysis of variance can be used as an exploratory tool to explain observations. A dog show provides an example. A dog show is not a random sampling of the breed: it is typically limited to dogs that are adult, pure-bred, and exemplary. A histogram of dog weights from a show might plausibly be rather complex. Suppose we wanted to predict the weight of a dog based on a certain set of characteristics of each dog. Before we could do that, we would need to explain the distribution of weights by dividing the dog population into groups based on those characteristics. A successful grouping will split dogs such that each group has a low variance of dog weights. In the accompanying illustrations, each group is identified as X1, X2, etc.
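The partitioning of variance described above can be sketched numerically. The following is a minimal illustration, assuming made-up dog weights and a hypothetical three-way grouping; the function name and the data are not from the source. It shows that the total sum of squares splits exactly into a between-group part and a within-group part, and that a grouping is successful when the within-group part is small.

```python
from statistics import fmean

def partition_sums_of_squares(groups):
    """Split the total sum of squares into between- and within-group parts."""
    all_values = [w for g in groups for w in g]
    grand_mean = fmean(all_values)
    ss_total = sum((w - grand_mean) ** 2 for w in all_values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((w - fmean(g)) ** 2 for g in groups for w in g)
    return ss_total, ss_between, ss_within

# Hypothetical dog weights in kg, grouped by an invented characteristic.
groups = [
    [4.0, 5.0, 6.0],      # group X1
    [20.0, 22.0, 24.0],   # group X2
    [40.0, 45.0, 50.0],   # group X3
]
ss_total, ss_between, ss_within = partition_sums_of_squares(groups)
print(ss_total, ss_between, ss_within)
```

With this grouping almost all of the variation sits between the groups, which is what a useful set of characteristics should achieve.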
3.
F distribution
–
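If a random variable X has an F-distribution with parameters d1 and d2, we write X ~ F(d1, d2). Its probability density function for real x > 0 is $f(x; d_1, d_2) = \frac{1}{\mathrm{B}(d_1/2,\, d_2/2)} \left(\frac{d_1}{d_2}\right)^{d_1/2} x^{d_1/2 - 1} \left(1 + \frac{d_1}{d_2}\,x\right)^{-(d_1 + d_2)/2}$. Here B is the beta function. In many applications, the parameters d1 and d2 are positive integers, but the distribution is well-defined for positive real values of these parameters. The cumulative distribution function is $F(x; d_1, d_2) = I_{d_1 x/(d_1 x + d_2)}\left(d_1/2,\, d_2/2\right)$, where I is the regularized incomplete beta function. The expectation is d2/(d2 − 2) for d2 > 2; the variance is finite for d2 > 4, and the excess kurtosis for d2 > 8. The characteristic function is listed incorrectly in many standard references. The correct expression is $\varphi^{F}_{d_1, d_2}(s) = \frac{\Gamma((d_1 + d_2)/2)}{\Gamma(d_2/2)}\, U\!\left(\frac{d_1}{2},\, 1 - \frac{d_2}{2},\, -\frac{d_2}{d_1}\, i s\right)$, where U is the confluent hypergeometric function of the second kind. A random variate of the F-distribution with parameters d1 and d2 arises as the ratio of two appropriately scaled chi-squared variates, X = (U1/d1)/(U2/d2), where U1 and U2 have chi-squared distributions with d1 and d2 degrees of freedom respectively and are independent. In instances where the F-distribution is used, for example in the analysis of variance, independence of U1 and U2 might be demonstrated by applying Cochran's theorem. Equivalently, X may be written as (s1²/σ1²)/(s2²/σ2²), where s1² and s2² are mean squares (sums of squares divided by their degrees of freedom d1 and d2) of normal samples with variances σ1² and σ2². In a frequentist context, a scaled F-distribution therefore gives the probability p(s1²/s2² | σ1², σ2²), with the F-distribution itself, without any scaling, applying where σ1² is taken equal to σ2². The quantity X has the same distribution in Bayesian statistics, if an uninformative rescaling-invariant Jeffreys prior is taken for the prior probabilities of σ1² and σ2². In this context, a scaled F-distribution thus gives the posterior probability p(σ2²/σ1² | s1², s2²), where the observed s1² and s2² are now taken as known. If X ~ F(d1, d2), then $Y = \lim_{d_2 \to \infty} d_1 X$ has the chi-squared distribution $\chi^2_{d_1}$. F(d1, d2) is equivalent to the scaled Hotelling's T-squared distribution $\frac{d_2}{d_1 (d_1 + d_2 - 1)}\, T^2(d_1,\, d_1 + d_2 - 1)$.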
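The construction of an F variate as a ratio of two scaled, independent chi-squared variates can be checked by simulation. The sketch below is illustrative only: the function name, the seed, and the choice d1 = 5, d2 = 10 are assumptions. It relies on the fact that a chi-squared variate with d degrees of freedom is a gamma variate with shape d/2 and scale 2, and compares the simulated mean with d2/(d2 − 2).

```python
import random

def sample_f(d1, d2, rng):
    """Draw one F(d1, d2) variate as a ratio of scaled chi-squared variates."""
    # Chi-squared with d degrees of freedom equals Gamma(shape=d/2, scale=2).
    u1 = rng.gammavariate(d1 / 2, 2.0)
    u2 = rng.gammavariate(d2 / 2, 2.0)
    return (u1 / d1) / (u2 / d2)

rng = random.Random(0)          # fixed seed so the run is repeatable
d1, d2 = 5, 10                  # illustrative degrees of freedom
draws = [sample_f(d1, d2, rng) for _ in range(50_000)]
sample_mean = sum(draws) / len(draws)
theoretical_mean = d2 / (d2 - 2)  # valid only for d2 > 2
print(sample_mean, theoretical_mean)
```

The two printed numbers should be close but not identical, since the first is a Monte Carlo estimate.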
4.
Null hypothesis
–
In inferential statistics, the null hypothesis is a general statement or default position that there is no relationship between two measured phenomena, or no association among groups. The null hypothesis is generally assumed to be true until evidence indicates otherwise. In statistics, it is often denoted H0. The concept of a null hypothesis is used differently in two approaches to statistical inference. In the significance testing approach of Ronald Fisher, a null hypothesis is rejected if the observed data are significantly unlikely to have occurred if the null hypothesis were true. In this case the null hypothesis is rejected and an alternative hypothesis is accepted in its place. If the data are consistent with the null hypothesis, then the null hypothesis is not rejected. In neither case is the null hypothesis or its alternative proven; the null hypothesis is tested with data. This is analogous to a criminal trial, in which the defendant is assumed to be innocent until proven guilty beyond a reasonable doubt. In the hypothesis testing approach of Jerzy Neyman and Egon Pearson, a null hypothesis is contrasted with an alternative hypothesis, and the two are distinguished on the basis of data, with certain error rates. Proponents of each approach criticize the other approach. Nowadays, though, a hybrid approach is widely practiced and presented in textbooks. The hybrid is in turn criticized as incorrect and incoherent. Hypothesis testing requires constructing a statistical model of what the data would look like given that chance or random processes alone were responsible for the results. The hypothesis that chance alone is responsible for the results is called the null hypothesis. The model of the result of the random process is called the distribution under the null hypothesis. The obtained results are compared with the distribution under the null hypothesis. The null hypothesis assumes no relationship between variables in the population from which the sample is selected. If the data set of a randomly selected representative sample is very unlikely relative to the null hypothesis, the experimenter rejects the null hypothesis, concluding that it is false.
The class of data sets that lead to rejection is usually specified via a test statistic, which is designed to measure the extent of apparent departure from the null hypothesis. If the data do not contradict the null hypothesis, then only a weak conclusion can be made: namely, that the observed data set provides no strong evidence against the null hypothesis. For instance, a certain drug may reduce the chance of having a heart attack. Possible null hypotheses are "this drug does not reduce the chances of having a heart attack" or "this drug has no effect on the chances of having a heart attack". The test of the hypothesis consists of administering the drug to half of the people in a study group as a controlled experiment.
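One way to build the distribution under the null hypothesis for the drug example is a permutation test, sketched below. This is an illustration and not a method named in the source: the outcome counts are invented, and the function name, seed, and number of permutations are assumptions. If the drug has no effect, the treatment labels are arbitrary, so shuffling them shows how large a difference in heart-attack rates chance alone would produce.

```python
import random

def permutation_p_value(treated, control, n_permutations=2_000, seed=0):
    """One-sided p-value for 'treated has a lower event rate than control'."""
    rng = random.Random(seed)
    observed = sum(control) / len(control) - sum(treated) / len(treated)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel subjects as if the drug had no effect
        fake_treated = pooled[:n_treated]
        fake_control = pooled[n_treated:]
        diff = (sum(fake_control) / len(fake_control)
                - sum(fake_treated) / len(fake_treated))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations

# Hypothetical outcomes: 1 = heart attack, 0 = no heart attack.
treated = [1] * 5 + [0] * 95
control = [1] * 20 + [0] * 80
p = permutation_p_value(treated, control)
print(p)
```

A small value of p means the observed difference would rarely arise from relabelling alone, which is evidence against the null hypothesis of no effect.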
5.
Central limit theorem
–
Suppose that a sample is obtained containing a large number of observations, each generated randomly and independently of the others, and that the arithmetic average of the observed values is computed. If this procedure is performed many times, the central limit theorem says that the computed values of the average will be approximately distributed according to the normal distribution. The central limit theorem has a number of variants. In its common form, the random variables must be independent and identically distributed (i.i.d.). In variants, convergence of the mean to the normal distribution also occurs for non-identical distributions or for non-independent observations, provided that they satisfy certain conditions. In more general usage, a central limit theorem is any of a set of weak-convergence theorems in probability theory. They all express the fact that a sum of many i.i.d. random variables tends to be distributed according to one of a small set of attractor distributions. When the variance of the i.i.d. variables is finite, the attractor distribution is the normal distribution. In contrast, the sum of a number of i.i.d. random variables with power-law tail distributions decreasing as $|x|^{-\alpha - 1}$, where 0 < α < 2, will tend to an alpha-stable distribution with stability parameter α as the number of variables grows. Let X1, …, Xn be a sequence of i.i.d. random variables drawn from a distribution with expected value µ and finite variance σ². Suppose we are interested in the sample average $S_n := (X_1 + \cdots + X_n)/n$ of these random variables. By the law of large numbers, the sample averages converge in probability and almost surely to the expected value µ as n → ∞. The classical central limit theorem describes the size and the distributional form of the stochastic fluctuations around the deterministic number µ during this convergence. For large enough n, the distribution of Sn is close to the normal distribution with mean µ and variance σ²/n. The usefulness of the theorem is that the distribution of √n(Sn − µ) approaches normality regardless of the shape of the distribution of the individual Xi. Formally, the theorem can be stated as follows. Lindeberg–Lévy CLT: suppose {X1, X2, …} is a sequence of i.i.d. random variables with E[Xi] = µ and Var[Xi] = σ² < ∞. Then as n approaches infinity, the random variables √n(Sn − µ) converge in distribution to a normal N(0, σ²): $\sqrt{n}\,(S_n - \mu) \xrightarrow{d} N(0, \sigma^2)$. Note that the convergence is uniform in z in the sense that $\lim_{n \to \infty} \sup_{z \in \mathbb{R}} \left| \Pr[\sqrt{n}\,(S_n - \mu) \le z] - \Phi(z/\sigma) \right| = 0$, where Φ is the standard normal cumulative distribution function. Lyapunov CLT: the theorem is named after Russian mathematician Aleksandr Lyapunov.
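In this variant of the central limit theorem the random variables Xi have to be independent, but not necessarily identically distributed. The theorem also requires that the random variables |Xi| have moments of some order (2 + δ), and that the rate of growth of these moments is limited by the Lyapunov condition given below. Suppose {X1, X2, …} is a sequence of independent random variables, each with finite expected value μi and variance σi², and define $s_n^2 = \sum_{i=1}^{n} \sigma_i^2$. If for some δ > 0 Lyapunov's condition $\lim_{n \to \infty} \frac{1}{s_n^{2+\delta}} \sum_{i=1}^{n} \mathrm{E}\left[|X_i - \mu_i|^{2+\delta}\right] = 0$ is satisfied, then the sum of (Xi − μi)/sn converges in distribution to a standard normal random variable as n goes to infinity. In practice it is usually easiest to check Lyapunov's condition for δ = 1. If a sequence of random variables satisfies Lyapunov's condition, then it also satisfies Lindeberg's condition. The converse implication, however, does not hold. Lindeberg CLT: in the same setting and with the same notation as above, the Lyapunov condition can be replaced with a weaker one. Suppose that for every ε > 0, $\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{i=1}^{n} \mathrm{E}\left[(X_i - \mu_i)^2 \cdot \mathbf{1}_{\{|X_i - \mu_i| > \varepsilon s_n\}}\right] = 0$, where 1{…} is the indicator function. Then the distribution of the standardized sums converges towards the standard normal distribution.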
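The Lindeberg–Lévy statement can be illustrated with a small simulation. The sketch below is a numerical illustration rather than a proof; the choice of the exponential distribution, the sample size n = 200, the number of trials, and the seed are all assumptions. It standardizes sample means of a strongly skewed distribution and checks how often they fall inside ±1.96, which a standard normal variable does about 95% of the time.

```python
import math
import random

def standardized_means(n, trials, rng):
    """Return sqrt(n) * (S_n - mu) / sigma for samples from Exp(rate=1)."""
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exp(rate=1)
    results = []
    for _ in range(trials):
        s_n = sum(rng.expovariate(1.0) for _ in range(n)) / n
        results.append(math.sqrt(n) * (s_n - mu) / sigma)
    return results

rng = random.Random(42)  # fixed seed so the run is repeatable
z = standardized_means(n=200, trials=5_000, rng=rng)
coverage = sum(abs(v) <= 1.96 for v in z) / len(z)
print(coverage)
```

Although each individual observation is far from normal, the printed coverage should be near 0.95.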
6.
F-test
–
An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fitted to a data set. Exact F-tests mainly arise when the models have been fitted to the data using least squares. The name was coined by George W. Snedecor, in honour of Sir Ronald A. Fisher. Fisher initially developed the statistic as the variance ratio in the 1920s. Common uses of F-tests include the following cases. The hypothesis that the means of a given set of normally distributed populations, all having the same standard deviation, are equal: this is perhaps the best-known F-test, and plays an important role in the analysis of variance. The hypothesis that a proposed regression model fits the data well. The hypothesis that a data set in a regression analysis follows the simpler of two proposed linear models that are nested within each other. In addition, some statistical procedures, such as Scheffé's method for multiple comparisons adjustment in linear models, also use F-tests. The F-test is sensitive to non-normality. In the analysis of variance, alternative tests include Levene's test, Bartlett's test, and the Brown–Forsythe test. Most F-tests arise by considering a decomposition of the variability in a collection of data in terms of sums of squares. The test statistic in an F-test is the ratio of two scaled sums of squares reflecting different sources of variability. These sums of squares are constructed so that the statistic tends to be greater when the null hypothesis is not true. In order for the statistic to follow the F-distribution under the null hypothesis, the sums of squares should be statistically independent, and each should follow a scaled chi-squared distribution. The latter condition is guaranteed if the data values are independent and normally distributed with a common variance. The F-test in one-way analysis of variance is used to assess whether the expected values of a quantitative variable within several pre-defined groups differ from each other. For example, suppose that a medical trial compares four treatments. The ANOVA F-test can be used to assess whether any of the treatments is on average superior, or inferior, to the others, against the null hypothesis that all four treatments yield the same mean response. This is an example of an omnibus test, meaning that a single test is performed to detect any of several possible differences.
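Alternatively, we could carry out pairwise tests among the treatments. The advantage of the ANOVA F-test is that we do not need to pre-specify which treatments are to be compared, and we do not need to adjust for making multiple comparisons. The formula for the one-way ANOVA F-test statistic is F = (explained variance)/(unexplained variance), or F = (between-group variability)/(within-group variability). The explained variance, or between-group variability, is $\sum_{i=1}^{K} n_i (\bar{Y}_{i\cdot} - \bar{Y})^2 / (K - 1)$, where $\bar{Y}_{i\cdot}$ denotes the sample mean in the i-th group, $n_i$ is the number of observations in the i-th group, $\bar{Y}$ denotes the overall mean of the data, and K denotes the number of groups. The unexplained variance, or within-group variability, is $\sum_{i=1}^{K} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 / (N - K)$, where $Y_{ij}$ is the j-th observation in the i-th of K groups and N is the overall sample size. This F-statistic follows the F-distribution with K − 1, N − K degrees of freedom under the null hypothesis. Note that when there are only two groups for the one-way ANOVA F-test, F = t², where t is Student's t statistic. Consider two models, 1 and 2, where model 1 is nested within model 2. Model 1 is the restricted model, and model 2 is the unrestricted one.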
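The one-way ANOVA formula above translates directly into code. The sketch below is a minimal illustration; the two small samples and the function names are invented. It computes the F statistic as between-group over within-group variability and checks the stated identity F = t² for two groups, using the pooled-variance form of Student's t statistic.

```python
import math
from statistics import fmean

def one_way_f(groups):
    """One-way ANOVA F statistic: between-group over within-group variability."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([v for g in groups for v in g])
    between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    within = sum((v - fmean(g)) ** 2 for g in groups for v in g) / (n_total - k)
    return between / within

def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    ss = sum((v - fmean(a)) ** 2 for v in a) + sum((v - fmean(b)) ** 2 for v in b)
    pooled_var = ss / (na + nb - 2)
    return (fmean(a) - fmean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))

# Hypothetical responses under two treatments.
a = [5.1, 4.9, 6.2, 5.8, 5.5]
b = [6.9, 7.4, 6.1, 7.0, 6.6]
f_stat = one_way_f([a, b])
t_stat = pooled_t(a, b)
print(f_stat, t_stat ** 2)
```

The function accepts any number of groups; the two-group case is used here only because it allows the comparison with t².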
7.
Normally distributed
–
In probability theory, the normal (or Gaussian) distribution is a very common continuous probability distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known. The normal distribution is useful because of the central limit theorem. Physical quantities that are expected to be the sum of many independent processes often have distributions that are nearly normal. Moreover, many results and methods can be derived analytically in explicit form when the relevant variables are normally distributed. The normal distribution is sometimes informally called the bell curve. However, many other distributions are bell-shaped. The probability density of the normal distribution is $f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, where μ is the mean or expectation of the distribution, σ is the standard deviation, and σ² is the variance. A random variable with a Gaussian distribution is said to be normally distributed and is called a normal deviate. The simplest case of a normal distribution is known as the standard normal distribution. This is the special case μ = 0 and σ = 1, with density $\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$. The factor 1/2 in the exponent ensures that the distribution has unit variance. This function is symmetric around x = 0, where it attains its maximum value $1/\sqrt{2\pi}$, and has inflection points at x = +1 and x = −1. Authors may differ on which normal distribution should be called the standard one. Every normal distribution is a version of the standard normal distribution whose domain has been stretched by a factor σ and then translated by μ: $f(x \mid \mu, \sigma^2) = \frac{1}{\sigma}\, \phi\!\left(\frac{x - \mu}{\sigma}\right)$. The probability density must be scaled by 1/σ so that the integral is still 1. If Z is a standard normal deviate, then X = Zσ + μ will have a normal distribution with expected value μ and standard deviation σ. Conversely, if X is a normal deviate with parameters μ and σ², then Z = (X − μ)/σ will have a standard normal distribution. Every normal distribution is the exponential of a quadratic function, $f(x) = e^{a x^2 + b x + c}$, where a is negative. In this form, the mean value μ is −b/(2a) and the variance σ² is −1/(2a). For the standard normal distribution, a is −1/2, b is zero, and c is −ln(2π)/2. The standard Gaussian distribution is often denoted with the Greek letter ϕ.
The alternative form of the Greek letter phi, φ, is also used quite often. The normal distribution is often denoted by N(μ, σ²). Thus when a random variable X is distributed normally with mean μ and variance σ², we write X ~ N(μ, σ²). Some authors advocate using the precision τ as the parameter defining the width of the distribution, instead of the deviation σ or the variance σ². The precision is normally defined as the reciprocal of the variance, 1/σ².
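The density and the standardization Z = (X − μ)/σ can be written out in a few lines. The sketch below is illustrative; the function names and the example parameters μ = 2, σ = 1.5 are assumptions. It implements the density formula directly and compares it with `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def standardize(x, mu, sigma):
    """Map a normal deviate X to the standard normal deviate Z."""
    return (x - mu) / sigma

mu, sigma = 2.0, 1.5            # illustrative parameters
x = 3.0
z = standardize(x, mu, sigma)

peak = normal_pdf(0.0)                          # maximum of the standard density
direct = normal_pdf(x, mu, sigma)               # density formula
scaled = normal_pdf(z) / sigma                  # (1/sigma) * phi((x - mu)/sigma)
library = NormalDist(mu, sigma).pdf(x)          # standard-library reference
print(peak, direct, scaled, library)
```

The last three printed values agree, which shows the role of the 1/σ scaling factor.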
8.
Robust statistics
–
Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. Robust statistical methods have been developed for many common problems, such as estimating location, scale, and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from parametric distributions. For example, robust methods work well for mixtures of two normal distributions with different standard deviations; under this model, non-robust methods like a t-test work poorly. Robust statistics seek to provide methods that emulate popular statistical methods, but which are not unduly affected by outliers or other small departures from model assumptions. In statistics, classical estimation methods rely heavily on assumptions which are often not met in practice. In particular, it is often assumed that the data errors are normally distributed, at least approximately. Unfortunately, when there are outliers in the data, classical estimators often have very poor performance, when judged using the breakdown point. For instance, one may use a mixture of 95% a normal distribution and 5% a normal distribution with the same mean but a significantly higher standard deviation, representing outliers. The median is a robust measure of central tendency, while the mean is not. The median has a breakdown point of 50%, while the mean has a breakdown point of 0%: a single large observation can throw it off. The median absolute deviation and interquartile range are robust measures of statistical dispersion, while the standard deviation and range are not. Trimmed estimators and Winsorised estimators are general methods to make statistics more robust. There are various definitions of a robust statistic. Strictly speaking, a robust statistic is resistant to errors in the results, produced by deviations from assumptions. One of the most important cases is distributional robustness. Classical statistical procedures are typically sensitive to "longtailedness", that is, to data whose distribution has longer tails than the assumed normal distribution.
Thus, in the context of robust statistics, distributionally robust and outlier-resistant are effectively synonymous. For one perspective on research in robust statistics up to 2000, see Portnoy & He. A related topic is that of resistant statistics, which are resistant to the effect of extreme scores. Gelman et al. in Bayesian Data Analysis consider a data set relating to speed-of-light measurements made by Simon Newcomb. The data sets for that book can be found via the Classic data sets page. Although the bulk of the data look to be more or less normally distributed, there are two obvious outliers. These outliers have a large effect on the mean, dragging it towards them and away from the center of the bulk of the data. Thus, if the mean is intended as a measure of the location of the center of the data, it is, in a sense, biased when outliers are present. Also, the distribution of the mean is known to be asymptotically normal due to the central limit theorem. However, outliers can make the distribution of the mean non-normal even for fairly large data sets.
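The contrast between robust and non-robust summaries can be seen on a toy data set. The values below are invented for illustration and are not Newcomb's measurements; the helper name `mad` is also an assumption. Two low outliers are appended to an otherwise well-behaved sample, and the mean and standard deviation are compared with the median and the median absolute deviation.

```python
from statistics import mean, median, stdev

def mad(values):
    """Median absolute deviation from the median."""
    m = median(values)
    return median([abs(v - m) for v in values])

# Hypothetical measurements; the last two entries of `contaminated` are outliers.
clean = [24, 25, 26, 27, 28, 29, 30, 31, 32]
contaminated = clean + [-44, -2]

for name, data in (("clean", clean), ("contaminated", contaminated)):
    print(name, mean(data), median(data), stdev(data), mad(data))
```

The outliers drag the mean down by about nine units and inflate the standard deviation several times over, while the median and the median absolute deviation each move by one unit.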
10.
Homoscedasticity
–
In statistics, a sequence or a vector of random variables is homoscedastic /ˌhoʊmoʊskəˈdæstɪk/ if all random variables in the sequence or vector have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity. The spellings homoskedasticity and heteroskedasticity are also frequently used. The assumption of homoscedasticity simplifies mathematical and computational treatment. Serious violations in homoscedasticity (assuming a distribution of data is homoscedastic when in reality it is heteroscedastic) may result in overestimating the goodness of fit as measured by the Pearson coefficient. As used in describing simple linear regression analysis, one assumption of the fitted model is that the standard deviations of the error terms are constant and do not depend on the x-value. Consequently, each probability distribution for y (the response variable) has the same standard deviation regardless of the x-value (the predictor). In short, this assumption is homoscedasticity. Homoscedasticity is not required for the estimates to be unbiased, consistent, and asymptotically normal. Residuals can be tested for homoscedasticity using the Breusch–Pagan test, which performs an auxiliary regression of the squared residuals on the independent variables. The null hypothesis of this chi-squared test is homoscedasticity, and the alternative hypothesis would indicate heteroscedasticity. Since the Breusch–Pagan test is sensitive to departures from normality or small sample sizes, the Koenker–Bassett or "generalized Breusch–Pagan" test is commonly used instead. From the auxiliary regression, it retains the R-squared value, which is then multiplied by the sample size and becomes the test statistic for a chi-squared distribution. Although it is not necessary for the Koenker–Bassett test, the Breusch–Pagan test requires that the squared residuals also be divided by the residual sum of squares divided by the sample size. Testing for groupwise heteroscedasticity requires the Goldfeld–Quandt test. Two or more normal distributions, N(μi, Σi), are homoscedastic if they share a common covariance matrix, Σi = Σj for all i, j.
Homoscedastic distributions are especially useful for deriving statistical pattern recognition and machine learning algorithms. One popular example is Fisher's linear discriminant analysis. The concept of homoscedasticity can also be applied to distributions on spheres.
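The sample-size-times-R-squared statistic described above can be sketched for simple linear regression with a single predictor. This is a simplified illustration under stated assumptions, not a full implementation of either named test: the function names, the deterministic alternating error pattern, and the two synthetic data sets are all invented. With one predictor, the statistic is compared against a chi-squared distribution with one degree of freedom, whose 5% critical value is about 3.84.

```python
def ols_fit(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def r_squared(x, y):
    """Coefficient of determination of the regression of y on x."""
    a, b = ols_fit(x, y)
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def n_r_squared_statistic(x, y):
    """Sample size times R-squared of squared residuals regressed on x."""
    a, b = ols_fit(x, y)
    squared_residuals = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, squared_residuals)

# Synthetic data: errors alternate in sign so the example is deterministic.
x = [float(i) for i in range(1, 41)]
signs = [1.0 if i % 2 else -1.0 for i in range(1, 41)]
y_constant_spread = [2 + 3 * xi + s for xi, s in zip(x, signs)]
y_growing_spread = [2 + 3 * xi + s * 0.5 * xi for xi, s in zip(x, signs)]

lm_constant = n_r_squared_statistic(x, y_constant_spread)
lm_growing = n_r_squared_statistic(x, y_growing_spread)
print(lm_constant, lm_growing)
```

The first statistic is close to zero, consistent with homoscedasticity, while the second is far above 3.84 because the error spread grows with x.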
11.
Statistical significance
–
In statistical hypothesis testing, statistical significance is attained whenever the observed p-value of a test statistic is less than the significance level defined for the study. The p-value is the probability of obtaining results at least as extreme as those observed, given that the null hypothesis is true. The significance level, α, is the probability of rejecting the null hypothesis, given that it is true. In any experiment or observation that involves drawing a sample from a population, there is always the possibility that an observed effect would have occurred due to sampling error alone. A significance level is chosen before data collection, and is typically set to 5% or much lower, depending on the field of study. This technique for testing the significance of results was developed in the early 20th century. The term significance does not imply importance here, and the term statistical significance is not the same as research, theoretical, or practical significance. For example, the term clinical significance refers to the practical importance of a treatment effect. In 1925, Ronald Fisher advanced the idea of statistical hypothesis testing. Fisher suggested a probability of one in twenty (0.05) as a convenient cutoff level to reject the null hypothesis. In a 1933 paper, Jerzy Neyman and Egon Pearson called this cutoff the significance level, which they named α. They recommended that α be set ahead of time, prior to any data collection. Despite his initial suggestion of 0.05 as a significance level, Fisher did not intend this cutoff value to be fixed. In his 1956 publication Statistical Methods and Scientific Inference, he recommended that significance levels be set according to specific circumstances. The significance level α is the threshold for p below which the experimenter assumes the null hypothesis is false, and something else is going on. This means α is also the probability of mistakenly rejecting the null hypothesis, if the null hypothesis is true. Sometimes researchers talk about the confidence level γ = (1 − α) instead. This is the probability of not rejecting the null hypothesis given that it is true.
Confidence levels and confidence intervals were introduced by Neyman in 1937. Statistical significance plays a pivotal role in statistical hypothesis testing. It is used to determine whether the null hypothesis should be rejected or retained. The null hypothesis is the default assumption that nothing happened or changed. For the null hypothesis to be rejected, an observed result has to be statistically significant. To determine whether a result is statistically significant, a researcher calculates a p-value. The null hypothesis is rejected if the p-value is less than a predetermined level, α. This α is called the significance level, and is the probability of rejecting the null hypothesis given that it is true (a Type I error). It is usually set at or below 5%. When drawing data from a sample, setting α to 5% means that the rejection region comprises 5% of the sampling distribution. These 5% can be allocated to one side of the sampling distribution, as in a one-tailed test, or partitioned between both sides, as in a two-tailed test, with each tail containing 2.5% of the distribution. As a result, the null hypothesis can be rejected with a less extreme result if a one-tailed test was used.
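The difference between one-tailed and two-tailed rejection regions can be made concrete. The sketch below assumes, purely for illustration, a test statistic that is standard normal under the null hypothesis; the function name and the example value z = 1.8 are invented. It computes the cutoffs for α = 0.05 and shows a result that is significant one-tailed but not two-tailed.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # assumed null distribution of the test statistic

# All of alpha in the upper tail versus alpha/2 in each tail.
one_tailed_cutoff = std_normal.inv_cdf(1 - alpha)
two_tailed_cutoff = std_normal.inv_cdf(1 - alpha / 2)

def reject_null(z, alpha=0.05, two_tailed=True):
    """Decide whether a z statistic falls in the rejection region."""
    if two_tailed:
        return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)
    return z > NormalDist().inv_cdf(1 - alpha)

z = 1.8  # hypothetical observed statistic
print(round(one_tailed_cutoff, 3), round(two_tailed_cutoff, 3))
print(reject_null(z, two_tailed=False), reject_null(z, two_tailed=True))
```

The one-tailed cutoff is the smaller of the two, so the same observed value can cross it without reaching the two-tailed cutoff.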
12.
P-value
–
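The misuse of p-values has been a matter of considerable controversy. The p-value is defined informally as the probability of obtaining a result equal to or more extreme than what was actually observed, when the null hypothesis is true. This ignores the distinction between two-tailed and one-tailed tests, which is discussed below. In frequentist inference, the p-value is widely used in statistical hypothesis testing, specifically in null hypothesis significance testing. In this method, before performing the experiment, one first chooses a model (the null hypothesis) and a threshold value for p, called the significance level of the test and denoted α. If the p-value is less than or equal to the chosen significance level, the test suggests that the observed data are inconsistent with the null hypothesis, so the null hypothesis is rejected. However, that does not prove that the tested hypothesis is true. When the p-value is calculated correctly, this test guarantees that the Type I error rate is at most α. For typical analysis, using the standard α = 0.05 cutoff, the null hypothesis is rejected when p < 0.05 and not rejected when p > 0.05. The p-value does not, in itself, support reasoning about the probabilities of hypotheses but is only a tool for deciding whether to reject the null hypothesis. In statistics, a hypothesis refers to a probability distribution that is assumed to govern the observed data. If X is a random variable representing the observed data and H is the hypothesis under consideration, statistical significance could naively be quantified by the conditional probability Pr(X | H). However, if X is a continuous random variable and an instance x is observed, Pr(X = x | H) = 0. Thus, this naive definition is inadequate and needs to be changed so as to accommodate continuous random variables. The p-value is defined as the probability, under the assumption of hypothesis H, of obtaining a result equal to or more extreme than what was actually observed. Depending on how it is looked at, "more extreme than what was actually observed" can mean {X ≥ x} (right-tail event), {X ≤ x} (left-tail event), or the smaller of {X ≤ x} and {X ≥ x} (double-tailed event). Thus, the p-value is given by Pr(X ≥ x | H) for a right-tail event, Pr(X ≤ x | H) for a left-tail event, and 2 min{Pr(X ≤ x | H), Pr(X ≥ x | H)} for a double-tail event. The smaller the p-value, the larger the significance, because it tells the investigator that the hypothesis under consideration may not adequately explain the observation. The hypothesis H is rejected if any of these probabilities is less than or equal to a small, fixed but arbitrarily pre-defined threshold value α, which is referred to as the level of significance. Unlike the p-value, the α level is not derived from any observational data and does not depend on the underlying hypothesis. Since the value of x that defines the tail event is itself a realization of a random variable, the p-value is a function of x and a random variable in its own right. Thus, the p-value is not fixed.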
This implies that the p-value cannot be given a frequency-counting interpretation, since the probability has to be fixed for the frequency-counting interpretation to hold. In other words, if the same test is repeated independently bearing upon the same null hypothesis, it will yield different p-values at every repetition. Nevertheless, these different p-values can be combined using Fisher's combined probability test. The fixed pre-defined α level can be interpreted as the rate of falsely rejecting the null hypothesis, since Pr(Reject H | H) = Pr(p ≤ α | H) = α. Usually, instead of the actual observations, X is a test statistic.
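The three tail definitions of the p-value can be sketched as a single function. The example below is a minimal illustration that assumes a continuous test statistic with a known null distribution, here taken to be standard normal; the function name `p_value` and the observed value 1.96 are assumptions.

```python
from statistics import NormalDist

def p_value(x, null_dist, tail="two-sided"):
    """p-value of an observed statistic x under a continuous null distribution."""
    lower = null_dist.cdf(x)   # Pr(X <= x | H)
    upper = 1.0 - lower        # Pr(X >= x | H), valid because X is continuous
    if tail == "right":
        return upper
    if tail == "left":
        return lower
    if tail == "two-sided":
        return min(1.0, 2.0 * min(lower, upper))
    raise ValueError("tail must be 'right', 'left', or 'two-sided'")

null = NormalDist()   # assumed distribution of the statistic under H
observed = 1.96       # hypothetical observed value of the test statistic
alpha = 0.05

for tail in ("right", "left", "two-sided"):
    p = p_value(observed, null, tail)
    print(tail, round(p, 4), "reject H" if p <= alpha else "do not reject H")
```

The same observation gives a different p-value under each tail definition, so the choice of tail has to be made before looking at the data.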
13.
Standard error
–
The standard error (SE) is the standard deviation of the sampling distribution of a statistic, most commonly of the mean. Different samples drawn from the same population would in general have different values of the sample mean, so there is a distribution of sample means. The relationship with the standard deviation is defined such that, for a given sample size, the standard error equals the standard deviation divided by the square root of the sample size. As the sample size increases, the sample means cluster more closely around the population mean. The term may also be used to refer to an estimate of that standard deviation, derived from a particular sample. The standard error of the mean (SEM) is the standard deviation of the sample mean's estimate of a population mean. It is usually estimated by the sample standard deviation divided by the square root of the sample size, SE = s/√n, where s is the sample standard deviation and n is the size of the sample. This estimate may be compared with the formula for the true standard deviation of the sample mean, σ/√n, where σ is the standard deviation of the population. This formula may be derived from what we know about the variance of a sum of independent random variables. If X1, X2, …, Xn are n independent observations from a population that has a mean μ and standard deviation σ, then the variance of the total T = X1 + X2 + ⋯ + Xn is nσ². The variance of T/n must be $\frac{1}{n^2}\, n \sigma^2 = \frac{\sigma^2}{n}$, and the standard deviation of T/n must be σ/√n. Of course, T/n is the sample mean x̄. For small samples, the sample standard deviation tends to systematically underestimate the population standard deviation, and therefore the standard error as well. With n = 2 the underestimate is about 25%, but for n = 6 the underestimate is only 5%. Gurland and Tripathi provide a correction and equation for this effect. Sokal and Rohlf give an equation of the correction factor for small samples of n < 20. See unbiased estimation of standard deviation for further discussion. A practical result: decreasing the uncertainty in a mean value estimate by a factor of two requires acquiring four times as many observations in the sample. Likewise, decreasing the standard error by a factor of ten requires a hundred times as many observations. In many practical applications, the true value of σ is unknown. As a result, we need to use a distribution that takes into account the spread of possible values of σ. When the true underlying distribution is known to be Gaussian, although with unknown σ, the resulting estimated distribution follows the Student t-distribution. The standard error is the standard deviation of the Student t-distribution.
T-distributions are slightly different from the Gaussian and vary depending on the size of the sample. To estimate the standard error of a Student t-distribution it is sufficient to use the sample standard deviation s instead of σ, and this value can be used to calculate confidence intervals. Note: the Student's probability distribution is approximated well by the Gaussian distribution when the sample size is over 100. For such samples one can use the Gaussian distribution, which is much simpler.
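The relationship between the standard deviation and the standard error of the mean can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the sample values are assumptions made for the example and are not taken from the text.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Estimate the standard error of the mean as s / sqrt(n).

    s is the sample standard deviation (n - 1 in the denominator).
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s = statistics.stdev(sample)
    return s / math.sqrt(n)


def standard_error_known_sigma(sigma, n):
    """Standard deviation of the sample mean when sigma is known."""
    return sigma / math.sqrt(n)


# Hypothetical sample, chosen only for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
sem = standard_error_of_mean(sample)

# Quadrupling the number of observations halves the standard error.
se_100 = standard_error_known_sigma(10.0, 100)
se_400 = standard_error_known_sigma(10.0, 400)
```

With a known σ, going from 100 to 400 observations takes the standard error from 1.0 to 0.5, which is the "four times as many observations" rule stated above.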
14.
Expected value
–
In probability theory, the expected value of a random variable is, intuitively, the long-run average value of repetitions of the experiment it represents. For example, the expected value in rolling a six-sided die is 3.5. Less roughly, the law of large numbers states that the arithmetic mean of the values almost surely converges to the expected value as the number of repetitions approaches infinity. The expected value is also known as the expectation, mathematical expectation, EV, average, mean value, or mean. More practically, the expected value of a discrete random variable is the probability-weighted average of all possible values. In other words, each value the random variable can assume is multiplied by its probability of occurring, and the resulting products are summed. The same principle applies to a continuous random variable, except that an integral of the variable with respect to its probability density replaces the sum. The expected value does not exist for random variables having some distributions with large tails. For random variables such as these, the long tails of the distribution prevent the sum or integral from converging. The expected value is a key aspect of how one characterizes a probability distribution. By contrast, the variance is a measure of dispersion of the possible values of the random variable around the expected value. The variance itself is defined in terms of two expectations: it is the expected value of the squared deviation of the variable's value from the variable's expected value. The expected value plays important roles in a variety of contexts. In regression analysis, one desires a formula in terms of observed data that will give a good estimate of the parameter giving the effect of some explanatory variable upon a dependent variable. The formula will give different estimates using different samples of data. A formula is typically considered good in this context if it is an unbiased estimator, that is, if the expected value of the estimate can be shown to equal the true value of the desired parameter.
In decision theory, and in particular in choice under uncertainty, one example of using expected value in reaching optimal decisions is the Gordon–Loeb model of information security investment. According to the model, one can conclude that the amount a firm spends to protect information should generally be only a fraction of the expected loss. Suppose random variable X can take value x1 with probability p1, value x2 with probability p2, and so on, up to value xk with probability pk. Then the expectation of this random variable X is defined as $E[X] = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k$. If all outcomes xi are equally likely, then the weighted average turns into the simple average. This is intuitive: the expected value of a random variable is the average of all values it can take, so the expected value is what one expects to happen on average. If the outcomes xi are not equally probable, then the simple average must be replaced with the weighted average, which takes into account the fact that some outcomes are more likely than others. The intuition however remains the same: the expected value of X is what one expects to happen on average. Let X represent the outcome of a roll of a fair six-sided die. More specifically, X will be the number of pips showing on the top face of the die after the toss.
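The probability-weighted average described above can be illustrated with a short Python sketch. The helper name `expected_value` and the loaded-die probabilities are assumptions made for this example; exact fractions are used so that the results are not affected by floating-point rounding.

```python
from fractions import Fraction


def expected_value(values, probabilities):
    """Return the sum of x_i * p_i for a discrete random variable."""
    values = list(values)
    probabilities = list(probabilities)
    if len(values) != len(probabilities):
        raise ValueError("values and probabilities must have the same length")
    if sum(probabilities) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probabilities))


faces = range(1, 7)

# Fair die: every face has probability 1/6, so the weighted average
# reduces to the simple average of the faces.
fair_mean = expected_value(faces, [Fraction(1, 6)] * 6)

# Hypothetical loaded die: face 6 comes up half the time.
loaded = [Fraction(1, 10)] * 5 + [Fraction(1, 2)]
loaded_mean = expected_value(faces, loaded)
```

For the fair die the result is 7/2, matching the value 3.5 quoted for a six-sided die; the loaded die gives 9/2, showing how unequal probabilities shift the weighted average.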
15.
F-distribution
–
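If a random variable X has an F-distribution with parameters d1 and d2, we write $X \sim F(d_1, d_2)$. Its probability density function is expressed in terms of the beta function B. In many applications, the parameters d1 and d2 are positive integers, but the distribution is well-defined for positive real values of these parameters. The cumulative distribution function is $F(x; d_1, d_2) = I_{d_1 x/(d_1 x + d_2)}(d_1/2, d_2/2)$, where I is the regularized incomplete beta function. The expectation, variance, and higher moments of $F(d_1, d_2)$ exist only for sufficiently large d2; the excess kurtosis, for example, is defined for d2 > 8. The characteristic function is listed incorrectly in many standard references. The correct expression is $\varphi^F_{d_1,d_2}(s) = \frac{\Gamma((d_1+d_2)/2)}{\Gamma(d_2/2)} U(d_1/2, 1 - d_2/2, -(d_2/d_1) i s)$, where U is the confluent hypergeometric function of the second kind. A random variate with this distribution arises as the ratio $X = (U_1/d_1)/(U_2/d_2)$ of two scaled chi-squared variates, where U1 and U2 are independent and have chi-squared distributions with d1 and d2 degrees of freedom respectively. In instances where the F-distribution is used, for example in the analysis of variance, independence of U1 and U2 might be demonstrated by applying Cochran's theorem. Equivalently, X may be written as $(s_1^2/\sigma_1^2)/(s_2^2/\sigma_2^2)$, where $s_1^2$ and $s_2^2$ are variance estimates from normal samples with true variances $\sigma_1^2$ and $\sigma_2^2$. In a frequentist context, a scaled F-distribution therefore gives the probability $p(s_1^2/s_2^2 \mid \sigma_1^2, \sigma_2^2)$, with the F-distribution itself, without any scaling, applying where $\sigma_1^2$ is taken equal to $\sigma_2^2$. The quantity X has the same distribution in Bayesian statistics if an uninformative rescaling-invariant Jeffreys prior is taken for the prior probabilities of $\sigma_1^2$ and $\sigma_2^2$. In this context, a scaled F-distribution thus gives the posterior probability $p(\sigma_2^2/\sigma_1^2 \mid s_1^2, s_2^2)$. If $X \sim F(d_1, d_2)$, then $Y = \lim_{d_2 \to \infty} d_1 X$ has the chi-squared distribution $\chi^2_{d_1}$. $F(d_1, d_2)$ is equivalent to the scaled Hotelling's T-squared distribution $\frac{d_2}{d_1 (d_1 + d_2 - 1)} T^2(d_1, d_1 + d_2 - 1)$.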
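As an illustration of how the beta function enters the density, the following Python sketch evaluates the standard F-distribution density, $f(x; d_1, d_2) = \sqrt{(d_1 x)^{d_1} d_2^{d_2} / (d_1 x + d_2)^{d_1 + d_2}} \,/\, (x\, B(d_1/2, d_2/2))$. The formula is the textbook form rather than something quoted from the passage above, and the function names are assumptions made for the example. The computation is done in log space using only the standard library.

```python
import math


def log_beta(a, b):
    """Natural log of the beta function B(a, b), via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def f_pdf(x, d1, d2):
    """Density of the F-distribution with parameters d1, d2 > 0.

    For simplicity this sketch reports 0.0 for x <= 0; the true
    limit at x = 0 depends on d1.
    """
    if d1 <= 0 or d2 <= 0:
        raise ValueError("d1 and d2 must be positive")
    if x <= 0:
        return 0.0
    log_root = 0.5 * (
        d1 * math.log(d1 * x)
        + d2 * math.log(d2)
        - (d1 + d2) * math.log(d1 * x + d2)
    )
    return math.exp(log_root - math.log(x) - log_beta(d1 / 2, d2 / 2))


# With d1 = d2 = 2 the density simplifies to 1 / (1 + x)^2,
# which gives a convenient hand check.
density_at_one = f_pdf(1.0, 2, 2)
density_at_three = f_pdf(3.0, 2, 2)
```

The parameters accept non-integer positive values, in line with the remark that the distribution is well-defined for positive real d1 and d2.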
16.
F test
–
An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fitted to a data set. Exact F-tests mainly arise when the models have been fitted to the data using least squares. The name was coined by George W. Snedecor, in honour of Sir Ronald A. Fisher. Fisher initially developed the statistic as the variance ratio in the 1920s. Common uses of F-tests include testing the hypothesis that the means of a given set of normally distributed populations, all having the same standard deviation, are equal. This is perhaps the best-known F-test, and plays an important role in the analysis of variance. Other uses include testing the hypothesis that a proposed regression model fits the data well, and the hypothesis that a data set in a regression analysis follows the simpler of two proposed linear models that are nested within each other. In addition, some procedures, such as Scheffé's method for multiple comparisons adjustment in linear models, also use F-tests. The F-test is sensitive to non-normality. In the analysis of variance, alternative tests include Levene's test, Bartlett's test, and the Brown–Forsythe test. Most F-tests arise by considering a decomposition of the variability in a collection of data in terms of sums of squares. The test statistic in an F-test is the ratio of two scaled sums of squares reflecting different sources of variability. These sums of squares are constructed so that the statistic tends to be greater when the null hypothesis is not true. In order for the statistic to follow the F-distribution under the null hypothesis, the sums of squares should be statistically independent, and each should follow a scaled chi-squared distribution. The latter condition is guaranteed if the data values are independent and normally distributed with a common variance. The F-test in one-way analysis of variance is used to assess whether the expected values of a quantitative variable within several pre-defined groups differ from each other. For example, suppose that a medical trial compares four treatments. The ANOVA F-test can be used to assess whether any of the treatments is on average superior, or inferior, to the others, against the null hypothesis that all four treatments yield the same mean response. This is an example of an omnibus test, meaning that a single test is performed to detect any of several possible differences.
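Alternatively, we could carry out pairwise tests among the treatments. The advantage of the ANOVA F-test is that we do not need to pre-specify which treatments are to be compared, and we do not need to adjust for making multiple comparisons. The formula for the one-way ANOVA F-test statistic is $F = \frac{\text{explained variance}}{\text{unexplained variance}}$, or $F = \frac{\text{between-group variability}}{\text{within-group variability}}$. Let $Y_{ij}$ denote the j-th observation in the i-th of K groups, $n_i$ the number of observations in the i-th group, $\bar{Y}_{i\cdot}$ the sample mean of the i-th group, $\bar{Y}$ the overall mean of the data, and N the overall sample size. The explained variance, or between-group variability, is $\sum_{i=1}^{K} n_i (\bar{Y}_{i\cdot} - \bar{Y})^2 / (K - 1)$. The unexplained variance, or within-group variability, is $\sum_{i=1}^{K} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 / (N - K)$. This F-statistic follows the F-distribution with K − 1 and N − K degrees of freedom under the null hypothesis. Note that when there are only two groups for the one-way ANOVA F-test, $F = t^2$, where t is the Student's t statistic. Consider two models, 1 and 2, where model 1 is nested within model 2. Model 1 is the restricted model, and model 2 is the unrestricted one.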
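The one-way ANOVA statistic described above can be computed directly from its definition. The sketch below is illustrative only: the function names and the small data set are assumptions made for the example, and the pooled two-sample t statistic is included just to check the two-group identity F = t².

```python
import math


def one_way_anova_f(groups):
    """Between-group variability divided by within-group variability."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Explained variance: weighted spread of group means, K - 1 degrees of freedom.
    between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    ) / (k - 1)
    # Unexplained variance: spread inside each group, N - K degrees of freedom.
    within = sum(
        (y - m) ** 2 for g, m in zip(groups, means) for y in g
    ) / (n_total - k)
    return between / within


def pooled_t(a, b):
    """Two-sample Student t statistic with a pooled variance estimate."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss = sum((y - mean_a) ** 2 for y in a) + sum((y - mean_b) ** 2 for y in b)
    pooled_var = ss / (len(a) + len(b) - 2)
    return (mean_b - mean_a) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


# Hypothetical measurements for three treatments.
treatments = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_three = one_way_anova_f(treatments)

# Two-group case: the F statistic equals the squared t statistic.
f_two = one_way_anova_f(treatments[:2])
t_two = pooled_t(treatments[0], treatments[1])
```

Turning the statistic into a p-value would additionally require the F-distribution with K − 1 and N − K degrees of freedom, which is left out of this sketch.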
17.
International Standard Book Number
–
The International Standard Book Number (ISBN) is a unique numeric commercial book identifier. An ISBN is assigned to each edition and variation of a book. For example, an e-book, a paperback and a hardcover edition of the same book would each have a different ISBN. The ISBN is 13 digits long if assigned on or after 1 January 2007. The method of assigning an ISBN is nation-based and varies from country to country, often depending on how large the publishing industry is within a country. The initial ISBN format was devised in 1967, based upon the 9-digit Standard Book Numbering (SBN) created in 1966. Occasionally, a book may appear without a printed ISBN if it is printed privately or the author does not follow the usual ISBN procedure; however, this can be rectified later. Another identifier, the International Standard Serial Number, identifies periodical publications such as magazines. The ISBN format was devised in 1967 in the United Kingdom by David Whitaker and in 1968 in the US by Emery Koltay. The 10-digit ISBN format was developed by the International Organization for Standardization and was published in 1970 as international standard ISO 2108. The United Kingdom continued to use the 9-digit SBN code until 1974. The ISO on-line facility only refers back to 1978. An SBN may be converted to an ISBN by prefixing the digit 0. For example, the edition of Mr. J. G. Reeder Returns, published by Hodder in 1965, has SBN 340 01381 8, with 340 indicating the publisher, 01381 their serial number, and 8 being the check digit. This can be converted to ISBN 0-340-01381-8; the check digit does not need to be re-calculated. Since 1 January 2007, ISBNs have contained 13 digits, a format that is compatible with Bookland European Article Number EAN-13s.
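The SBN-to-ISBN conversion can be sketched in Python. The passage above does not spell out the check-digit rule, so the sketch assumes the standard ISBN-10 scheme, in which the ten digits are weighted 10 down to 1 and the weighted sum must be divisible by 11; the function names are hypothetical. Under that rule a leading 0 contributes nothing to the sum, which is why the check digit carries over unchanged.

```python
def isbn10_check_digit(first_nine):
    """Check character for the first nine digits of an ISBN-10.

    Assumes the standard mod-11 rule with weights 10 down to 2;
    a check value of 10 is written as "X".
    """
    if len(first_nine) != 9 or not first_nine.isdigit():
        raise ValueError("expected exactly nine digits")
    total = sum(w * int(d) for w, d in zip(range(10, 1, -1), first_nine))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def sbn_to_isbn10(sbn):
    """Convert a 9-character SBN to an ISBN-10 by prefixing the digit 0."""
    compact = sbn.replace("-", "").replace(" ", "")
    if len(compact) != 9:
        raise ValueError("an SBN has nine characters")
    return "0" + compact


# The SBN quoted in the text for Mr. J. G. Reeder Returns.
isbn = sbn_to_isbn10("340-01381-8")
recomputed = isbn10_check_digit(isbn[:9])
```

Recomputing the check digit from the first nine digits of the converted number gives 8 again, in agreement with the statement that it does not need to be re-calculated.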
A 13-digit ISBN can be separated into its parts, and when this is done it is customary to separate the parts with hyphens or spaces. Separating the parts of a 10-digit ISBN is also done with either hyphens or spaces. Working out how to correctly separate a given ISBN is complicated, because most of the parts do not use a fixed number of digits. ISBN issuance is country-specific, in that ISBNs are issued by the ISBN registration agency that is responsible for that country or territory, regardless of the publication language. Some ISBN registration agencies are based in national libraries or within ministries of culture. In other cases, the ISBN registration service is provided by organisations such as bibliographic data providers that are not government funded. In Canada, ISBNs are issued at no cost with the purpose of encouraging Canadian culture. In the United Kingdom, United States, and some other countries, where the service is provided by non-government-funded organisations, the issuing of ISBNs requires payment of a fee. In Australia, ISBNs are issued by the library services agency Thorpe-Bowker.
18.
JSTOR
–
JSTOR is a digital library founded in 1995. Originally containing digitized back issues of journals, it now also includes books and primary sources. It provides full-text searches of almost 2,000 journals. More than 8,000 institutions in more than 160 countries have access to JSTOR. Most access is by subscription, but some older public domain content is freely available to anyone. JSTOR was founded by William G. Bowen, president of Princeton University from 1972 to 1988. It originally was conceived as a solution to one of the problems faced by libraries, especially research and university libraries, due to the increasing number of academic journals in existence. Most libraries found it prohibitively expensive in terms of cost and space to maintain a comprehensive collection of journals. By digitizing many journal titles, JSTOR allowed libraries to outsource the storage of journals with the confidence that they would remain available long-term. Online access and full-text search ability improved access dramatically. Bowen initially considered using CD-ROMs for distribution. JSTOR was initiated in 1995 at seven different library sites, and originally encompassed ten economics and history journals. JSTOR access improved based on feedback from its initial sites. Special software was put in place to make pictures and graphs clear. With the success of this limited project, Bowen and Kevin Guthrie, then-president of JSTOR, wanted to expand the number of participating journals. They met with representatives of the Royal Society of London, and an agreement was made to digitize the Philosophical Transactions of the Royal Society dating from its beginning in 1665. The work of adding these volumes to JSTOR was completed by December 2000. The Andrew W. Mellon Foundation funded JSTOR initially. Until January 2009 JSTOR operated as an independent, self-sustaining nonprofit organization with offices in New York City and in Ann Arbor, Michigan.
JSTOR content is provided by more than 900 publishers. The database contains more than 1,900 journal titles, in more than 50 disciplines. Each object is identified by an integer value, starting at 1. In addition to the main site, the JSTOR labs group operates an open service that allows access to the contents of the archives for the purposes of corpus analysis at its Data for Research service. This site offers a search facility with graphical indication of the article coverage. Users may create focused sets of articles and then request a dataset containing word and n-gram frequencies. They are notified when the dataset is ready and may download it in either XML or CSV format. The service does not offer full-text, although academics may request that from JSTOR. JSTOR Plant Science is available in addition to the main site. The materials on JSTOR Plant Science are contributed through the Global Plants Initiative (GPI) and are accessible only to JSTOR and GPI members.
19.
Journal of the American Statistical Association
–
The Journal of the American Statistical Association is the primary journal published by the American Statistical Association, the main professional body for statisticians in the United States. It is published four times a year. It had an impact factor of 2.063 in 2010, the tenth highest in the Statistics and Probability category of Journal Citation Reports. In a 2003 survey of statisticians, the Journal of the American Statistical Association was ranked first, among all journals, for Applications of Statistics. The predecessor of this journal started in 1888 with the name Publications of the American Statistical Association. It became Quarterly Publications of the American Statistical Association in 1912, and later the Journal of the American Statistical Association.