1.
Normal distribution
–
In probability theory, the normal distribution is a very common continuous probability distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known. The normal distribution is useful because of the central limit theorem: physical quantities that are expected to be the sum of many independent processes often have distributions that are nearly normal. Moreover, many results and methods can be derived analytically in explicit form when the relevant variables are normally distributed. The normal distribution is sometimes informally called the bell curve; however, many other distributions are bell-shaped. The probability density of the normal distribution is f(x | μ, σ²) = (1/√(2πσ²)) e^(−(x − μ)²/(2σ²)), where μ is the mean or expectation of the distribution, σ is the standard deviation, and σ² is the variance. A random variable with a Gaussian distribution is said to be normally distributed and is called a normal deviate. The simplest case of a normal distribution, with μ = 0 and σ = 1, is known as the standard normal distribution. The factor 1/2 in the exponent ensures that the distribution has unit variance, and the density is symmetric around x = 0, where it attains its maximum value 1/√(2π) and has inflection points at x = +1 and x = −1. Authors may differ on which normal distribution should be called the standard one; whenever the density is rescaled, it must be multiplied by 1/σ so that the integral is still 1. If Z is a standard normal deviate, then X = Zσ + μ will have a normal distribution with expected value μ and standard deviation σ. Conversely, if X is a normal deviate with parameters μ and σ, then Z = (X − μ)/σ will have a standard normal distribution. Every normal density is the exponential of a quadratic function, f(x) = e^(ax² + bx + c), where a is negative. In this form, the mean value is μ = −b/(2a); for the standard normal distribution, a is −1/2, b is zero, and c is −ln(2π)/2. The density of the standard Gaussian distribution is denoted with the Greek letter ϕ.
The alternative form of the Greek phi letter, φ, is also used quite often. The normal distribution is often denoted by N(μ, σ²); thus when a random variable X is distributed normally with mean μ and variance σ², one writes X ∼ N(μ, σ²). Some authors advocate using the precision τ = 1/σ² as the parameter defining the width of the distribution, instead of the standard deviation σ or the variance σ²
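The density formula and the standardization Z = (X − μ)/σ described above can be checked numerically; a minimal sketch in Python (the sample parameters μ = 3, σ = 2 are illustrative choices, not from the text):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# The standard normal density attains its maximum 1/sqrt(2*pi) at x = 0
# and is symmetric around 0.
peak = normal_pdf(0.0)
print(peak)  # ≈ 0.3989

# Standardizing: if X ~ N(mu, sigma^2), then Z = (X - mu)/sigma ~ N(0, 1),
# and the densities relate by f_X(x) = phi((x - mu)/sigma) / sigma.
mu, sigma, x = 3.0, 2.0, 4.5
z = (x - mu) / sigma
assert abs(normal_pdf(x, mu, sigma) - normal_pdf(z) / sigma) < 1e-12
```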
2.
Probability
–
Probability is the measure of the likelihood that an event will occur. Probability is quantified as a number between 0 and 1: the higher the probability of an event, the more certain we are that the event will occur. A simple example is the tossing of a fair coin. Since the coin is unbiased, the two outcomes are equally probable; the probability of heads equals the probability of tails, and since no other outcomes are possible, each probability is 1/2. This type of probability is also called a priori probability. Probability theory is used to describe the underlying mechanics and regularities of complex systems. For example, tossing a coin twice will yield head-head, head-tail, tail-head, or tail-tail. The probability of getting an outcome of head-head is 1 out of 4 outcomes, or 1/4 or 0.25; this interpretation considers probability to be the relative frequency in the long run of outcomes. A modification of this is propensity probability, which interprets probability as the tendency of some experiment to yield a certain outcome. Subjectivists assign numbers per subjective probability, i.e. as a degree of belief. The degree of belief has been interpreted as the price at which you would buy or sell a bet that pays 1 unit of utility if E, 0 if not E. The most popular version of subjective probability is Bayesian probability, which includes expert knowledge as well as data to produce probabilities. The expert knowledge is represented by some prior probability distribution, and the data are incorporated in a likelihood function. The product of the prior and the likelihood, normalized, results in a posterior probability distribution that incorporates all the information known to date. The scientific study of probability is a modern development of mathematics. Gambling shows that there has been an interest in quantifying the ideas of probability for millennia, but there are reasons, of course, for the slow development of the mathematics of probability.
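The relative-frequency interpretation mentioned above can be illustrated with a short simulation; a sketch in Python (the trial count and the seed are arbitrary choices, not from the text):

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

# Toss a fair coin twice, many times, and count how often we see head-head.
trials = 100_000
head_head = 0
for _ in range(trials):
    first = random.random() < 0.5   # first toss is heads
    second = random.random() < 0.5  # second toss is heads
    if first and second:
        head_head += 1

freq = head_head / trials
print(freq)  # close to the a priori probability 1/4
```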
Games of chance provided the impetus for the mathematical study of probability. According to Richard Jeffrey, before the middle of the seventeenth century the term "probable" meant approvable: a probable action or opinion was one such as people would undertake or hold. However, in legal contexts especially, "probable" could also apply to propositions for which there was good evidence. The sixteenth-century Italian polymath Gerolamo Cardano demonstrated the efficacy of defining odds as the ratio of favourable to unfavourable outcomes
3.
Statistics
–
Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data. In applying statistics to, e.g., a scientific, industrial, or social problem, it is conventional to begin with a statistical population or process to be studied; populations can be diverse topics such as all people living in a country or every atom composing a crystal. Statistics deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments. The statistician Sir Arthur Lyon Bowley defined statistics as "numerical statements of facts in any department of inquiry placed in relation to each other". When census data cannot be collected, statisticians collect data by developing specific experiment designs; representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. In contrast, an observational study does not involve experimental manipulation. Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena. A standard statistical procedure involves the test of the relationship between two data sets, or between a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between the two data sets. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false, given the data used in the test. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (the null hypothesis is falsely rejected) and Type II errors (the null hypothesis fails to be rejected when it is actually false). Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis. Measurement processes that generate statistical data are also subject to error.
Many of these errors are classified as random (noise) or systematic (bias); the presence of missing data or censoring may result in biased estimates, and specific techniques have been developed to address these problems. Statistics continues to be an area of active research, for example on the problem of how to analyze big data. Statistics is a body of science that pertains to the collection, analysis, interpretation or explanation, and presentation of data. Some consider statistics to be a distinct mathematical science rather than a branch of mathematics. While many scientific investigations make use of data, statistics is concerned with the use of data in the context of uncertainty. Mathematical techniques used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure-theoretic probability theory. In applying statistics to a problem, it is common practice to start with a population or process to be studied. Populations can be diverse topics such as all people living in a country or every atom composing a crystal. Ideally, statisticians compile data about the entire population; this may be organized by governmental statistical institutes
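The null-hypothesis test described above can be sketched with a simple permutation test; a minimal illustration in Python (the two small samples are invented for demonstration, not taken from the text):

```python
import random

random.seed(0)

# Two hypothetical data sets; the null hypothesis is that they come from
# the same distribution, so any difference in means is due to chance.
a = [5.1, 4.8, 5.6, 5.3, 5.0, 5.4]
b = [4.6, 4.4, 4.9, 4.5, 4.7, 4.3]

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(a) - mean(b)

# Under the null, labels are exchangeable: shuffle the pooled data and
# recompute the difference many times to see how often chance alone
# produces a difference at least as large as the observed one.
pooled = a + b
count = 0
reps = 10_000
for _ in range(reps):
    random.shuffle(pooled)
    diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
    if abs(diff) >= abs(observed):
        count += 1

p_value = count / reps  # a small p-value is evidence against the null
print(observed, p_value)
```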
4.
Percentile
–
A percentile is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations falls. For example, the 20th percentile is the value below which 20% of the observations may be found. The term percentile and the related term percentile rank are often used in the reporting of scores from norm-referenced tests; for example, if a score is at the 86th percentile, 86% of scores fall below it. The 25th percentile is also known as the first quartile, the 50th percentile as the median or second quartile, and the 75th percentile as the third quartile. In general, percentiles and quartiles are specific types of quantiles. When ISPs bill burstable internet bandwidth, the 95th or 98th percentile usually cuts off the top 5% or 2% of bandwidth peaks in each month, and then bills at the nearest rate. In this way infrequent peaks are ignored, and the customer is charged in a fairer way; the reason this statistic is so useful in measuring data throughput is that it gives a very accurate picture of the cost of the bandwidth. The 95th percentile says that 95% of the time the usage is below this amount; the remaining 5% of the time, the usage is above that amount. Physicians will often use infant and children's weight and height to assess their growth in comparison to national averages and percentiles, which are found in growth charts. The 85th percentile speed of traffic on a road is used as a guideline in setting speed limits. The methods given in the Definitions section are approximations for use in small-sample statistics. In general terms, for very large populations following a normal distribution, percentiles may often be represented by reference to a normal curve plot. The normal distribution is plotted along an axis scaled to standard deviations; mathematically, the normal distribution extends to negative infinity on the left and positive infinity on the right.
Note, however, that only a small proportion of individuals in a population will fall outside the −3 to +3 range; for example, with human heights, very few people are above the +3 sigma height level. Percentiles represent the area under the normal curve, increasing from left to right. Each standard deviation represents a fixed percentile; this is related to the 68–95–99.7 rule, or the three-sigma rule. There is no standard definition of percentile; however, all definitions yield similar results when the number of observations is very large. Some methods for calculating the percentiles are given below. The nearest-rank percentile is obtained by first calculating the ordinal rank and then taking the value from the ordered list that corresponds to that rank. A percentile calculated using the nearest-rank method will always be a member of the ordered list. The 100th percentile is defined to be the largest value in the ordered list. Example 1: consider an ordered list that contains five data values. What are the 5th, 30th, 40th, 50th and 100th percentiles of this list using the nearest-rank method?
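The nearest-rank method can be written down directly; a minimal sketch in Python (the five-value list is an illustrative stand-in, since the example's actual values were not preserved above):

```python
import math

def percentile_nearest_rank(sorted_values, p):
    """Nearest-rank percentile: the ordinal rank is ceil(p/100 * N),
    so the result is always a member of the ordered list."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    n = math.ceil(p / 100 * len(sorted_values))
    return sorted_values[n - 1]  # ordinal ranks are 1-based

data = [15, 20, 35, 40, 50]  # hypothetical ordered list of five values
for p in (5, 30, 40, 50, 100):
    print(p, percentile_nearest_rank(data, p))
```

Note that the 100th percentile comes out as the largest list value, matching the definition in the text.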
5.
Standard deviation
–
In statistics, the standard deviation is a measure that is used to quantify the amount of variation or dispersion of a set of data values. The standard deviation of a random variable, statistical population, data set, or probability distribution is the square root of its variance. It is algebraically simpler, though in practice less robust, than the average absolute deviation. A useful property of the standard deviation is that, unlike the variance, it is expressed in the same units as the data. There are also other measures of deviation from the norm, including mean absolute deviation. In addition to expressing the variability of a population, the standard deviation is commonly used to measure confidence in statistical conclusions. For example, the margin of error in polling data is determined by calculating the expected standard deviation in the results if the same poll were to be conducted multiple times. This derivation of a standard deviation is often called the standard error of the estimate or standard error of the mean when referring to a mean. It is computed as the standard deviation of all the means that would be computed from that population if an infinite number of samples were drawn. It is very important to note that the standard deviation of a population and the standard error of a statistic derived from that population are quite different, though related. The reported margin of error of a poll is computed from the standard error of the mean and is typically about twice the standard deviation (the half-width of a 95 percent confidence interval). The standard deviation is also important in finance, where the standard deviation on the rate of return on an investment is a measure of the volatility of the investment. For a finite set of numbers, the standard deviation is found by taking the square root of the average of the squared deviations of the values from their average value. For example, suppose the marks of a class of eight students are the eight values 2, 4, 4, 4, 5, 5, 7, 9. These eight data points have a mean of 5: (2 + 4 + 4 + 4 + 5 + 5 + 7 + 9)/8 = 5. This formula is valid only if the eight values with which we began form the complete population; if the values instead were a sample drawn from some large parent population, a slightly different computation is used.
In that case the result would be called the sample standard deviation. Dividing by n − 1 rather than by n gives an unbiased estimate of the variance of the larger parent population; this is known as Bessel's correction. As a slightly more complicated real-life example, the average height for adult men in the United States is about 70 inches, with a standard deviation of around 3 inches
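The worked example above (marks 2, 4, 4, 4, 5, 5, 7, 9) and Bessel's correction can be checked in a few lines of Python:

```python
import statistics

marks = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(marks) / len(marks)
print(mean)  # 5.0

# Population standard deviation: divide the sum of squared deviations by n.
# Squared deviations sum to 32, so the variance is 32/8 = 4.
pop_sd = statistics.pstdev(marks)
print(pop_sd)  # 2.0

# Sample standard deviation: divide by n - 1 (Bessel's correction),
# appropriate when the eight values are a sample from a larger population.
sample_sd = statistics.stdev(marks)
print(sample_sd)  # sqrt(32/7), slightly larger than 2
```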
6.
Mean
–
In mathematics, mean has several different definitions depending on the context. In probability and statistics, the mean of a probability distribution is its expected value; an analogous formula applies to both discrete and continuous distributions. Not every probability distribution has a defined mean; see the Cauchy distribution for an example. Moreover, for some distributions the mean is infinite. The arithmetic mean of a set of numbers x1, x2, …, xn is typically denoted by x̄, pronounced "x bar". If the data set were based on a series of observations obtained by sampling from a statistical population, the arithmetic mean is termed the sample mean to distinguish it from the population mean. For a finite population, the population mean of a property is equal to the arithmetic mean of the given property while considering every member of the population; for example, the population mean height is equal to the sum of the heights of every individual divided by the total number of individuals. The sample mean may differ from the population mean, especially for small samples; the law of large numbers dictates that the larger the size of the sample, the more likely it is that the sample mean will be close to the population mean. Outside of probability and statistics, a wide range of other notions of mean are often used in geometry and analysis; examples are given below. The geometric mean is an average that is useful for sets of positive numbers that are interpreted according to their product: x̄ = (x1 · x2 · ⋯ · xn)^(1/n). For example, the geometric mean of the five values 4, 36, 45, 50, 75 is (4 × 36 × 45 × 50 × 75)^(1/5) = 24300000^(1/5) = 30. The harmonic mean is an average which is useful for sets of numbers which are defined in relation to some unit, for example speed: x̄ = n/(1/x1 + 1/x2 + ⋯ + 1/xn). AM, GM, and HM satisfy the inequalities AM ≥ GM ≥ HM; equality holds if and only if all the elements of the sample are equal. In descriptive statistics, the mean may be confused with the median, mode or mid-range, as any of these may be called an "average". The mean of a set of observations is the arithmetic average of the values; however, for skewed distributions, the mean is not necessarily the same as the median or the mode. For example, mean income is typically skewed upwards by a small number of people with very large incomes. By contrast, the median income is the level at which half the population is below and half is above.
The mode income is the most likely income and favors the larger number of people with lower incomes. The mean of a probability distribution is the long-run arithmetic average value of a random variable having that distribution; in this context, it is also known as the expected value
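The example values above (4, 36, 45, 50, 75) happen to give clean results for all three Pythagorean means, which makes the AM ≥ GM ≥ HM inequality easy to verify in Python:

```python
import statistics

values = [4, 36, 45, 50, 75]

am = statistics.mean(values)            # arithmetic mean: sum / n
gm = statistics.geometric_mean(values)  # (product)^(1/n)
hm = statistics.harmonic_mean(values)   # n / (sum of reciprocals)

print(am, gm, hm)  # 42, ~30, 15
assert am >= gm >= hm  # AM >= GM >= HM
```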
7.
Central limit theorem
–
In probability theory, the central limit theorem states that the properly normalized average of independent random variables tends toward a normal distribution even if the original variables themselves are not normally distributed. If the averaging procedure is performed many times, the central limit theorem says that the computed values of the average will be distributed according to the normal distribution. The central limit theorem has a number of variants. In its common form, the random variables must be independent and identically distributed (i.i.d.). In variants, convergence of the mean to the normal distribution also occurs for non-identical distributions or for non-independent observations. In more general usage, a central limit theorem is any of a set of weak-convergence theorems in probability theory. When the variance of the i.i.d. variables is finite, the attractor distribution is the normal distribution. In contrast, the sum of a number of i.i.d. random variables with power-law tail distributions decreasing as |x|^(−α−1), where 0 < α < 2, will tend to an alpha-stable distribution with stability parameter α as the number of variables grows. Suppose we are interested in the sample average Sn = (X1 + ⋯ + Xn)/n of these random variables. By the law of large numbers, the sample averages converge in probability and almost surely to the expected value µ as n → ∞. The classical central limit theorem describes the size and the distributional form of the stochastic fluctuations around the deterministic number µ during this convergence. For large enough n, the distribution of Sn is close to the normal distribution with mean µ and variance σ²/n. The usefulness of the theorem is that the distribution of √n(Sn − µ) approaches normality regardless of the shape of the distribution of the individual Xi. Formally, the theorem can be stated as follows. Lindeberg–Lévy CLT: suppose X1, X2, … is a sequence of i.i.d. random variables with E[Xi] = µ and Var[Xi] = σ² < ∞. Then as n approaches infinity, the random variables √n(Sn − µ) converge in distribution to a normal N(0, σ²): √n(Sn − µ) →d N(0, σ²). The convergence is uniform in z in the sense that lim n→∞ sup z∈R |Pr[√n(Sn − µ) ≤ z] − Φ(z/σ)| = 0, where Φ denotes the standard normal CDF. The Lyapunov variant of the theorem is named after the Russian mathematician Aleksandr Lyapunov.
In this variant of the central limit theorem the random variables Xi have to be independent, but not necessarily identically distributed. The theorem also requires that the random variables |Xi| have moments of some order (2 + δ). Suppose X1, X2, … is a sequence of independent random variables, each with finite expected value μi and variance σi², and let sn² = ∑i=1..n σi². In practice it is usually easiest to check Lyapunov's condition for δ = 1. If a sequence of random variables satisfies Lyapunov's condition, then it also satisfies Lindeberg's condition; the converse implication, however, does not hold. Lindeberg's condition, in the same setting and with the notation as above, requires that for every ε > 0, lim n→∞ (1/sn²) ∑i=1..n E[(Xi − μi)² · 1{|Xi − μi| > ε·sn}] = 0, where 1{…} is the indicator function
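A quick simulation shows the fluctuation the classical theorem describes: averages of i.i.d. uniform variables cluster around µ with spread shrinking like σ/√n. A sketch in Python (the sample size, repetition count, and seed are arbitrary choices):

```python
import random
import statistics

random.seed(1)

n = 30        # size of each sample
reps = 5_000  # number of sample averages to compute

# X ~ Uniform(0, 1): mu = 1/2, sigma^2 = 1/12.
averages = [statistics.mean(random.random() for _ in range(n)) for _ in range(reps)]

m = statistics.mean(averages)
s = statistics.pstdev(averages)
print(m, s)
# m should be near mu = 0.5, s near sigma/sqrt(n) = sqrt(1/(12*30)) ≈ 0.0527,
# and a histogram of `averages` would look approximately normal.
```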
8.
Confidence interval
–
In statistics, a confidence interval is a type of interval estimate of a population parameter. It is an observed interval, in principle different from sample to sample. How frequently the observed interval contains the true parameter if the experiment is repeated is called the confidence level. Whereas two-sided confidence limits form a confidence interval, one-sided limits are referred to as lower or upper confidence bounds. Confidence intervals consist of a range of values that act as good estimates of the population parameter; however, the interval computed from a particular sample does not necessarily include the true value of the parameter. After any particular sample is taken, the parameter is either in the interval or not. Since the observed data are random samples from the true population, the 99% confidence level means that 99% of the intervals obtained from such samples will contain the true parameter. The desired level of confidence is set by the researcher. If a corresponding hypothesis test is performed, the confidence level is the complement of the level of significance; i.e. a 95% confidence interval reflects a significance level of 0.05. The confidence interval contains the parameter values that, when tested, should not be rejected with the same sample. Confidence intervals of difference parameters not containing 0 imply that there is a statistically significant difference between the populations. In applied practice, confidence intervals are typically stated at the 95% confidence level; however, when presented graphically, confidence intervals can be shown at several confidence levels, for example 90%, 95%, and 99%. Factors affecting the width of the confidence interval include the size of the sample, the level of confidence, and the variability in the sample. A larger sample size normally will lead to a better estimate of the population parameter. Confidence intervals were introduced to statistics by Jerzy Neyman in a paper published in 1937. Interval estimates can be contrasted with point estimates.
A point estimate is a single value given as the estimate of a population parameter that is of interest, for example the mean of some quantity. An interval estimate specifies instead a range within which the parameter is estimated to lie. Confidence intervals are commonly reported in tables or graphs along with point estimates of the same parameters, to show the reliability of the estimates. For example, a confidence interval can be used to describe how reliable survey results are. In a poll of election-voting intentions, the result might be that 40% of respondents intend to vote for a certain party; a 99% confidence interval for the proportion in the whole population having the same intention on the survey date might be 30% to 50%
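The poll example above can be reproduced with the normal-approximation interval for a proportion, p̂ ± z·√(p̂(1 − p̂)/n); a sketch in Python (the sample size n = 1000 and the z multipliers are illustrative assumptions, not from the text):

```python
import math

def proportion_ci(p_hat, n, z):
    """Normal-approximation confidence interval for a proportion."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# 40% of a hypothetical n = 1000 respondents intend to vote for the party.
lo95, hi95 = proportion_ci(0.40, 1000, 1.96)   # z for 95% confidence
lo99, hi99 = proportion_ci(0.40, 1000, 2.576)  # z for 99% confidence

print(f"95% CI: {lo95:.3f} to {hi95:.3f}")
print(f"99% CI: {lo99:.3f} to {hi99:.3f}")
# The 99% interval is wider than the 95% interval, illustrating how a
# higher confidence level widens the interval for the same sample.
```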
9.
Probability density function
–
In probability theory, a probability density function (PDF) of a continuous random variable is a function whose value at any given point can be interpreted as a relative likelihood that the random variable takes a value near that point. In a more precise sense, the PDF is used to specify the probability of the random variable falling within a particular range of values, as opposed to taking on any one value. The probability density function is nonnegative everywhere, and its integral over the entire space is equal to one. The terms probability distribution function and probability function have also sometimes been used to denote the probability density function; however, this use is not standard among probabilists and statisticians. Further confusion of terminology exists because density function has also been used for what is here called the probability mass function. In general though, the PMF is used in the context of discrete random variables. Suppose a species of bacteria typically lives 4 to 6 hours. What is the probability that a bacterium lives exactly 5 hours? A lot of bacteria live for approximately 5 hours, but there is no chance that any given bacterium dies at exactly 5.0000000000… hours. Instead we might ask: what is the probability that the bacterium dies between 5 hours and 5.01 hours? Let's say the answer is 0.02. Next: what is the probability that the bacterium dies between 5 hours and 5.001 hours? The answer is probably around 0.002, since this is 1/10th of the previous interval; the probability that the bacterium dies between 5 hours and 5.0001 hours is probably about 0.0002, and so on. In these three examples, the ratio (probability of dying during an interval)/(duration of the interval) is approximately constant, and equal to 2 per hour. For example, there is 0.02 probability of dying in the 0.01-hour interval between 5 and 5.01 hours, and (0.02 probability)/(0.01 hours) = 2 hour−1. This quantity 2 hour−1 is called the probability density for dying at around 5 hours. Therefore, in response to the question "what is the probability that the bacterium dies at 5 hours?", a literally correct but unhelpful answer is 0, but a better answer can be written as (2 hour−1) dt. This is the probability that the bacterium dies within an infinitesimal window of time around 5 hours, where dt is the duration of this window.
For example, the probability that it lives longer than 5 hours but shorter than 5 hours plus 1 nanosecond is approximately (2 hour−1) × (1 nanosecond) ≈ 6 × 10−13. In this example there is a probability density function f with f(5 hours) = 2 hour−1. The integral of f over any window of time, not only infinitesimal windows but also large ones, is the probability that the bacterium dies in that window. A probability density function is most commonly associated with absolutely continuous univariate distributions. A random variable X has density fX, where fX is a non-negative Lebesgue-integrable function, if Pr[a ≤ X ≤ b] = ∫ab fX(x) dx. That is, f is any function with the property that the probability of X falling in a measurable set equals the integral of f over that set. In the continuous univariate case above, the reference measure is the Lebesgue measure
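The defining property Pr[a ≤ X ≤ b] = ∫ab fX(x) dx can be checked numerically for the standard normal density, whose CDF has a closed form through the error function; a sketch in Python:

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Integrate the density over [a, b] with a simple midpoint rule...
a, b, steps = -1.0, 1.0, 10_000
h = (b - a) / steps
integral = sum(phi(a + (i + 0.5) * h) for i in range(steps)) * h

# ...and compare with Phi(b) - Phi(a); both equal Pr[a <= X <= b].
print(integral, Phi(b) - Phi(a))  # both ≈ 0.6827
```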
10.
Ronald Fisher
–
Sir Ronald Aylmer Fisher FRS, who published as R. A. Fisher, was an English statistician and biologist who used mathematics to combine Mendelian genetics and natural selection. This helped to create the new Darwinist synthesis of evolution known as the modern evolutionary synthesis. He was also a prominent eugenicist in the early part of his life. He is known as one of the three principal founders of population genetics. He outlined Fisher's principle as well as the Fisherian runaway and sexy son hypothesis theories of sexual selection, and he also made important contributions to statistics, including the method of maximum likelihood, fiducial inference, and the derivation of various sampling distributions, among many others. Anders Hald called him "a genius who almost single-handedly created the foundations for modern statistical science". Not only was he the most original and constructive of the architects of the neo-Darwinian synthesis; Fisher also was the father of modern statistics and experimental design, and he therefore could be said to have provided researchers in biology and medicine with their most important research tools. Geoffrey Miller said of him: "To biologists, he was an architect of the modern synthesis that used mathematical models to integrate Mendelian genetics with Darwin's selection theories. To psychologists, Fisher was the inventor of various statistical tests that are still supposed to be used whenever possible in psychology journals. To farmers, Fisher was the founder of agricultural research." Fisher was born in East Finchley in London, England, one of twins, with the other twin being still-born. From 1896 until 1904 the family lived at Inverforth House in London, where English Heritage installed a blue plaque in 2002, before moving to Streatham. He entered Harrow School aged 14 and won the school's Neeld Medal in mathematics. In 1909, he won a scholarship to Gonville and Caius College, Cambridge. In 1919 he began working at Rothamsted Research; his fame grew, and he began to travel and lecture widely.
In 1937, he visited the Indian Statistical Institute in Calcutta and its founder, P. C. Mahalanobis, often returning to encourage its development and being the guest of honour at its 25th anniversary in 1957, when it had 2000 employees. His marriage disintegrated during World War II, and his oldest son George, an aviator, was killed in combat. His daughter and one of his biographers, Joan, married the noted statistician George E. P. Box. Fisher gained a scholarship to study mathematics at the University of Cambridge in 1909. In 1915 he published a paper, The evolution of sexual preference, on sexual selection and mate choice. He published The Correlation Between Relatives on the Supposition of Mendelian Inheritance in 1918, in which he introduced the term variance; Joan Box, Fisher's biographer and daughter, says that Fisher had resolved this problem in 1911. Between 1912 and 1922 Fisher recommended, analyzed and vastly popularized maximum likelihood. In 1928 Joseph Oscar Irwin began a three-year stint at Rothamsted and became one of the first people to master Fisher's innovations. Fisher's first application of the analysis of variance was published in 1921, and he pioneered the principles of the design of experiments, the statistics of small samples, and the analysis of real data
11.
Statistical Methods for Research Workers
–
Statistical Methods for Research Workers is a classic 1925 book on statistics by the statistician R. A. Fisher. It is considered by some to be one of the 20th century's most influential books on statistical methods, together with his The Design of Experiments. In the book he emphasized examples and how to design experiments systematically from a statistical point of view. The mathematical justification of the methods described was not stressed and, indeed, proofs were often barely sketched or omitted altogether, a fact which led H. B. Mann to fill the gaps with a rigorous mathematical treatment in his well-known treatise, Mann (1949). The March 1951 issue of the Journal of the American Statistical Association contains articles celebrating the 25th anniversary of the publication of the first edition. See also: Edwards, A. W. F., "R. A. Fisher, Statistical Methods for Research Workers, 1925", in I. Grattan-Guinness (ed.), Landmark Writings in Western Mathematics: Case Studies, 1640–1940, Amsterdam: Elsevier
12.
Cumulative distribution function
–
In probability theory and statistics, the cumulative distribution function (CDF) of a real-valued random variable X, evaluated at x, is the probability that X will take a value less than or equal to x. In the case of a continuous distribution, it gives the area under the probability density function from minus infinity to x. Cumulative distribution functions are also used to specify the distribution of multivariate random variables. The probability that X lies in the semi-closed interval (a, b], where a < b, is P(a < X ≤ b) = FX(b) − FX(a). In the definition above, the "less than or equal to" sign, ≤, is a convention, not a universally used one, but it is important for discrete distributions. The proper use of tables of the binomial and Poisson distributions depends upon this convention; moreover, important formulas like Paul Lévy's inversion formula for the characteristic function also rely on the "less than or equal" formulation. If treating several random variables X, Y, etc., the corresponding letters are used as subscripts, while, if treating only one, the subscript is usually omitted. It is conventional to use a capital F for a cumulative distribution function, in contrast to the lower-case f used for probability density functions. This applies when discussing general distributions; some specific distributions have their own conventional notation. The CDF of a continuous random variable X can be expressed as the integral of its probability density function fX as follows: FX(x) = ∫−∞x fX(t) dt. In the case of a random variable X whose distribution has a discrete component at a value b, P(X = b) = FX(b) − lim x→b− FX(x). If FX is continuous at b, this equals zero and there is no discrete component at b. Every cumulative distribution function F is non-decreasing and right-continuous, which makes it a càdlàg function. Furthermore, lim x→−∞ F(x) = 0 and lim x→+∞ F(x) = 1. The function f equal to the derivative of F almost everywhere is called the probability density function of the distribution of X. As an example, suppose X is uniformly distributed on the unit interval. Then the CDF of X is given by F(x) = 0 for x < 0; F(x) = x for 0 ≤ x < 1; F(x) = 1 for x ≥ 1.
Suppose instead that X takes only the discrete values 0 and 1, with equal probability. Then the CDF of X is given by F(x) = 0 for x < 0; F(x) = 1/2 for 0 ≤ x < 1; F(x) = 1 for x ≥ 1. Sometimes it is useful to study the opposite question and ask how often the random variable is above a particular level. This is called the complementary cumulative distribution function (CCDF), or simply the tail distribution or exceedance, F̄(x) = P(X > x) = 1 − F(x). This has applications in statistical hypothesis testing, for example, because the one-sided p-value is the probability of observing a test statistic at least as extreme as the one observed; thus, provided that the test statistic, T, has a continuous distribution, the one-sided p-value is given by the CCDF. In survival analysis, F̄ is called the survival function and denoted S, while the term reliability function is common in engineering. Properties: for a non-negative continuous random variable having an expectation, Markov's inequality states that F̄(x) ≤ E(X)/x. As x → ∞, F̄(x) → 0, and in fact F̄(x) = o(1/x) provided that E(X) is finite. This form of illustration emphasises the median and dispersion of the distribution or of the empirical results. If the CDF F is strictly increasing and continuous, then F−1(p), p ∈ [0, 1], is the unique real number x such that F(x) = p
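The uniform example above, F(x) = x on [0, 1], can be compared against an empirical CDF built from simulated draws; a short sketch in Python (the seed and sample size are arbitrary choices):

```python
import bisect
import random

random.seed(7)

def make_ecdf(samples):
    """Return an empirical CDF: the fraction of samples <= x."""
    data = sorted(samples)
    def ecdf(x):
        return bisect.bisect_right(data, x) / len(data)
    return ecdf

# Draws from Uniform(0, 1), whose true CDF is F(x) = x on [0, 1].
ecdf = make_ecdf(random.random() for _ in range(50_000))

for x in (0.25, 0.5, 0.9):
    print(x, ecdf(x))  # each value should be close to x itself
```

Like any CDF, the empirical CDF here is non-decreasing, right-continuous, and runs from 0 to 1.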
13.
Microsoft Excel
–
Microsoft Excel is a spreadsheet developed by Microsoft for Windows, macOS, Android and iOS. It features calculation, graphing tools, pivot tables, and a macro programming language called Visual Basic for Applications. It has been a widely applied spreadsheet for these platforms, especially since version 5 in 1993. Excel forms part of Microsoft Office. Microsoft Excel has the basic features of all spreadsheets, using a grid of cells arranged in numbered rows and letter-named columns to organize data manipulations like arithmetic operations. It has a battery of supplied functions to answer statistical, engineering, and financial needs. In addition, it can display data as line graphs, histograms and charts, and with a very limited three-dimensional graphical display. It allows sectioning of data to view its dependencies on various factors for different perspectives. Excel was not designed to be used as a database. Microsoft allows for a number of optional command-line switches to control the manner in which Excel starts. The Windows version of Excel supports programming through Microsoft's Visual Basic for Applications (VBA), which is a dialect of Visual Basic. Programming with VBA allows spreadsheet manipulation that is awkward or impossible with standard spreadsheet techniques. Programmers may write code directly using the Visual Basic Editor, which includes a window for writing code, debugging code, and a code module organization environment. A common and easy way to generate VBA code is by using the Macro Recorder: the Macro Recorder records actions of the user and generates VBA code in the form of a macro. These actions can then be repeated automatically by running the macro. The macros can also be linked to different trigger types like keyboard shortcuts, a command button or a graphic. The actions in the macro can be executed from these trigger types or from the generic toolbar options.
The VBA code of the macro can also be edited in the VBE. Advanced users can employ user prompts to create an interactive program, or react to events such as sheets being loaded or changed. Macro Recorded code may not be compatible between Excel versions; some code that is used in Excel 2010 cannot be used in Excel 2003, and a macro that changes cell colors or makes changes to other aspects of cells may not be backward compatible. User-created VBA subroutines execute these actions and operate like macros generated using the Macro Recorder. From its first version, Excel supported end-user programming of macros and user-defined functions. Beginning with version 5.0, Excel recorded macros in VBA by default, though after version 5.0 that option was discontinued. All versions of Excel, including Excel 2010, are capable of running an XLM macro. Excel supports charts, graphs, and histograms generated from specified groups of cells. The generated graphic component can either be embedded within the current sheet or added as a separate object, and these displays are dynamically updated if the content of cells changes.
14.
MATLAB
–
MATLAB is a multi-paradigm numerical computing environment and fourth-generation programming language. Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, and an additional package, Simulink, adds graphical multi-domain simulation and model-based design for dynamic and embedded systems. In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics. Cleve Moler, the chairman of the computer science department at the University of New Mexico, started developing MATLAB in the late 1970s. He designed it to give his students access to LINPACK and EISPACK without them having to learn Fortran, and it soon spread to other universities and found a strong audience within the applied mathematics community. Jack Little, an engineer, was exposed to it during a visit Moler made to Stanford University in 1983. Recognizing its commercial potential, he joined with Moler and Steve Bangert. They rewrote MATLAB in C and founded MathWorks in 1984 to continue its development; these rewritten libraries were known as JACKPAC. In 2000, MATLAB was rewritten to use a newer set of libraries for matrix manipulation. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, and it is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB scripting language. Common usage of the MATLAB application involves using the Command Window as an interactive mathematical shell or executing text files containing MATLAB code. Variables are defined using the assignment operator, =. MATLAB is a weakly typed programming language because types are implicitly converted. It is a dynamically typed language because variables can be assigned without declaring their type, except if they are to be treated as symbolic objects.
Values can come from constants, from computation involving values of other variables, or from the output of a function. A simple array is defined using the colon syntax init:increment:terminator. For instance, array = 1:2:9 defines a variable named array which is an array consisting of the values 1, 3, 5, 7 and 9; that is, the array starts at 1, increments with each step from the previous value by 2, and stops once it reaches 9. The increment value can actually be left out of this syntax; for instance, ari = 1:5 assigns to the variable named ari an array with the values 1, 2, 3, 4 and 5, since the default value of 1 is used as the incrementer. Indexing is one-based, which is the usual convention for matrices in mathematics, although not for some programming languages such as C and C++. Matrices can be defined by separating the elements of a row with blank space or comma; the list of elements should be surrounded by square brackets. Parentheses are used to access elements and subarrays; sets of indices can be specified by expressions such as 2:4, which evaluates to [2, 3, 4].
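Since the code examples in this article use Python, here is a NumPy sketch of the same range and indexing ideas; note that this is only an analogue of MATLAB's syntax, since NumPy's arange excludes its stop value and indexing is zero-based rather than one-based:

```python
import numpy as np

# MATLAB's 1:2:9 (start:increment:stop, stop inclusive) corresponds
# to np.arange with an exclusive stop value.
array = np.arange(1, 10, 2)      # [1 3 5 7 9]
ari = np.arange(1, 6)            # [1 2 3 4 5]; default increment is 1

# A 2x3 matrix; rows are given as nested lists rather than with
# MATLAB's semicolon row separators.
M = np.array([[1, 2, 3], [4, 5, 6]])

# MATLAB's one-based M(2, 2:3) becomes zero-based slicing in NumPy.
sub = M[1, 1:3]                  # [5 6]
print(array.tolist(), ari.tolist(), sub.tolist())
```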
15.
R (programming language)
–
R is an open source programming language and software environment for statistical computing and graphics that is supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software. Polls, surveys of data miners, and studies of scholarly literature databases show that R's popularity has increased substantially in recent years. While R has a command line interface, there are several graphical front-ends available. R is an implementation of the S programming language combined with lexical scoping semantics inspired by Scheme; S was created by John Chambers while at Bell Labs. There are some important differences, but much of the code written for S runs unaltered. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand; R is named partly after the first names of the first two R authors and partly as a play on the name of S. The project was conceived in 1992, with an initial version released in 1995. R is easily extensible through functions and extensions, and the R community is noted for its active contributions in terms of packages. Many of R's standard functions are written in R itself, which makes it easy for users to follow the algorithmic choices made; for computationally intensive tasks, C, C++, and Fortran code can be linked and called at run time. Advanced users can write C, C++, Java, .NET or Python code to manipulate R objects directly. R is highly extensible through the use of user-submitted packages for specific functions or specific areas of study. Due to its S heritage, R has stronger object-oriented programming facilities than most statistical computing languages; extending R is also eased by its lexical scoping rules. Another strength of R is static graphics, which can produce publication-quality graphs; dynamic and interactive graphics are available through additional packages.
R has Rd, its own LaTeX-like documentation format, which is used to supply comprehensive documentation. R is an interpreted language; users typically access it through a command-line interpreter. If a user types 2+2 at the R command prompt and presses enter, the computer replies with 4. R's data structures include vectors, matrices, arrays, data frames and lists. R's extensible object system includes objects for, among others, regression models and time-series. A scalar data type was never a data structure of R; instead, a scalar is represented as a vector with length one. R supports procedural programming with functions and, for some functions, object-oriented programming with generic functions. A generic function acts differently depending on the classes of arguments passed to it; in other words, the generic function dispatches the method specific to that class of object. For example, R has a generic print function that can print almost every class of object in R with a simple print syntax. Arrays are stored in column-major order. The capabilities of R are extended through user-created packages, which allow specialized statistical techniques, graphical devices, import/export capabilities, reporting tools, etc.
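R's generic-function dispatch can be sketched in Python (the language used for examples in this article) with functools.singledispatch; this is only an analogue of the R mechanism, and the show function below is a made-up illustration, not part of R or any library:

```python
from functools import singledispatch

# A generic "show" function: the implementation invoked depends on
# the class of the argument, much like R's generic print.
@singledispatch
def show(obj):
    return f"object: {obj!r}"        # default method

@show.register
def _(obj: list):
    return "vector of length " + str(len(obj))

@show.register
def _(obj: dict):
    return "list with names " + ", ".join(obj)

print(show(42))            # falls back to the default method
print(show([1, 2, 3]))     # dispatches on list
print(show({"a": 1}))      # dispatches on dict
```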
16.
SciPy
–
SciPy is an open source Python library used for scientific computing and technical computing. SciPy builds on the NumPy array object and is part of the NumPy stack, which includes tools like Matplotlib and pandas, along with an expanding set of scientific computing libraries. This NumPy stack has similar users to other applications such as MATLAB and GNU Octave. The NumPy stack is sometimes referred to as the SciPy stack. SciPy is also a family of conferences for users and developers of these tools; Enthought originated the SciPy conference in the United States and continues to sponsor many of the international conferences as well as host the SciPy website. The SciPy library is distributed under the BSD license, and it is also supported by NumFOCUS, which is a community foundation for supporting reproducible and accessible science. A typical Python scientific computing environment includes many dedicated software tools, among them the SciPy package of key algorithms and functions core to Python's scientific computing capabilities. NumPy provides some functions for linear algebra, Fourier transforms and random number generation, but not with the generality of the equivalent functions in SciPy. NumPy can also be used as an efficient multi-dimensional container of data with arbitrary data-types; this allows NumPy to seamlessly and speedily integrate with a wide variety of databases. Older versions of SciPy used Numeric as an array type, which is now deprecated in favor of the newer NumPy array code. In the 1990s, Python was extended to include an array type for numerical computing called Numeric. As of 2000, there were a number of extension modules and increasing interest in creating a complete environment for scientific computing. In 2001, Travis Oliphant, Eric Jones, and Pearu Peterson merged code they had written. The newly created package provided a standard collection of common numerical operations on top of the Numeric array data structure.
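A small Python sketch of the array operations described above, using NumPy's own linear algebra and Fourier transform modules (the matrix and signal values are arbitrary illustrations):

```python
import numpy as np

# NumPy's multi-dimensional array is the core data structure.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Basic linear algebra: solve the system A @ x = b.
x = np.linalg.solve(A, b)
print(x)                      # [2. 3.]

# A Fourier transform; the zeroth FFT coefficient is the sum of the input.
signal = np.array([1.0, 2.0, 3.0, 4.0])
coeffs = np.fft.fft(signal)
print(coeffs[0].real)         # 10.0
```

SciPy's scipy.linalg and scipy.fft modules provide more general versions of these same operations, which is the division of labor the text describes.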
Since then, the SciPy environment has continued to grow with more packages and tools for technical computing.
17.
SAS Institute
–
SAS Institute is an American multinational developer of analytics software based in Cary, North Carolina. SAS develops and markets a suite of analytics software, which helps access, manage, analyze and report on data. The company is the world's largest privately held software business, and its software is used by most of the Fortune 500. SAS has developed a model workplace environment and benefits program designed to retain employees, allow them to focus on their work, and reduce operating costs; it provides on-site, subsidized or free healthcare, gyms and daycare. It became an independent, private business led by current CEO James Goodnight and three other project leaders from the university in 1976. SAS grew from $10 million in revenues in 1980 to $1.1 billion by 2000. A larger proportion of these revenues is spent on research and development than at most other software companies, at one point more than double the industry average. The Statistical Analysis System began as a project at North Carolina State University's agricultural department. It was originally led by Anthony James Barr in 1966, who was joined by NCSU graduate student James Goodnight in 1967 and John Sall in 1973. In the early 1970s, the software was primarily leased to other agricultural departments in order to analyze the effect of soil, weather and seed varieties on crop yields. The project was funded by the National Institutes of Health and later by a coalition of university statistics programs called the University Statisticians of the Southern Experiment Stations. By 1976 the software had 100 customers, and 300 people attended the first SAS user conference in Kissimmee, Florida that year. Goodnight, Barr, Sall and another participant, Jane Helwig, founded SAS Institute Inc. as a private company on July 1, 1976. Barr and Helwig later sold their interest in the company. SAS's tradition of polling users for suggestions to improve the software through the SASWare Ballot was adopted during its first year of operation.
Many of the company's employee perks, such as fruit and reasonable work hours, date from this early period. In the late 1970s, the company established its first marketing department. SAS started building its current headquarters in a forested area of Cary, North Carolina in 1980. Later that year it started providing on-site daycare in order to keep an employee who was planning on being a stay-at-home mom. By 1984, SAS had begun building a fitness center, medical center, on-site cafe and other facilities, and it had also developed some of its other benefits programs. SAS became known as a good place to work and was frequently recognized by national magazines like BusinessWeek, Working Mother and Fortune for its work environment. During the 1980s, SAS was one of Inc. Magazine's fastest growing companies in America, from 1979 to 1985. It grew more than ten percent per year, from $10 million in revenues in 1980 to $1.1 billion by 2000; in 2007, SAS revenue was $2.15 billion, and in 2013 its revenue was $3.02 billion. By the late 1990s, SAS was the largest privately held software company, and the Associated Press reported that analysts attributed the growth to aggressive R&D spending. It had the highest ratio of its revenues spent on R&D in the industry for eight years, setting a record of 34 percent of its revenues in 1993. The company began its relationship with Microsoft and development for Windows operating systems in 1989.
18.
SPSS
–
SPSS Statistics is a software package used for logical batched and non-batched statistical analysis. Long produced by SPSS Inc., it was acquired by IBM in 2009, and the current versions are officially named IBM SPSS Statistics. Companion products in the same family are used for survey authoring and deployment, data mining and text analytics. SPSS is a widely used program for statistical analysis in social science. It is also used by market researchers, health researchers, survey companies, government, education researchers, marketing organizations, data miners, and others. The original SPSS manual has been described as one of sociology's most influential books for allowing ordinary researchers to do their own statistical analysis. In addition to statistical analysis, data management and data documentation are features of the base software. Command syntax programming has the benefits of reproducibility and of simplifying repetitive tasks; additionally, some complex applications can only be programmed in syntax and are not accessible through the menu structure. The pull-down menu interface also generates command syntax; this can be displayed in the output, and commands can also be pasted into a syntax file using the paste button present in each menu. Programs can be run interactively or unattended, using the supplied Production Job Facility. Additionally, a macro language can be used to write command language subroutines. A Python programmability extension can access the information in the data dictionary and data and dynamically build command syntax programs. The Python programmability extension, introduced in SPSS 14, replaced the less functional SAX Basic scripts for most purposes. In addition, the Python extension allows SPSS to run any of the statistics in the free software package R. From version 14 onwards, SPSS can be driven externally by a Python or a VB.NET program using supplied plug-ins. SPSS Statistics places constraints on internal file structure, data types, data processing, and matching files, which together considerably simplify programming. SPSS datasets have a two-dimensional table structure, where the rows typically represent cases and the columns represent measurements. Only two data types are defined: numeric and text. All data processing occurs sequentially, case-by-case, through the file. Files can be matched one-to-one and one-to-many, but not many-to-many. In addition to that cases-by-variables structure and processing, there is a separate Matrix session where one can process data as matrices using matrix and linear algebra operations. The graphical user interface has two views, which can be toggled by clicking on one of the two tabs at the lower left of the SPSS Statistics window. The Data View shows a spreadsheet view of the cases and variables. Unlike spreadsheets, the cells can only contain numbers or text. Cells in both views can be edited, defining the file structure and allowing data entry without using command syntax.
19.
Stata
–
Stata is a general-purpose statistical software package created in 1985 by StataCorp. Most of its users work in research, especially in the fields of economics, sociology, political science and biomedicine. Stata's capabilities include data management, statistical analysis, graphics, simulations, regression, and custom programming. The name Stata is an abbreviation of the words statistics and data. The FAQ for the official forum of Stata insists that the correct English pronunciation of Stata must remain a mystery. Starting with version 8.0, Stata has included a graphical user interface which uses menus. This generates code which is displayed, easing the transition to the command line interface. The dataset can be viewed or edited in spreadsheet format; from version 11 on, other commands can be executed while the data browser or editor is opened. Stata can only open a single dataset at any one time. Stata holds the entire dataset in memory, which limits its use with extremely large datasets. The dataset is always rectangular in format; that is, all variables hold the same number of observations. Stata can import data in a variety of formats, including ASCII data formats and spreadsheet formats. Stata's proprietary file formats are platform independent, so users of different operating systems can easily exchange datasets. Stata's data format has changed over time, although not every Stata release includes a new dataset format. Every version of Stata can read all older dataset formats; thus, the current Stata release can always open datasets that were created with older versions, but older versions cannot read newer-format datasets. Stata can read and write SAS XPORT format datasets natively, using the fdause and fdasave commands. Some other econometric applications, including gretl, can directly import Stata file formats. Stata allows user-written commands, distributed as so-called ado-files, to be downloaded from the internet, which are then indistinguishable to the user from the built-in commands.
Some user-written commands have later been adopted by StataCorp to become part of a subsequent official release after appropriate checking and certification. Stata had an email list from August 1994, which was turned into a web forum in March 2014 and is still called Statalist. StataCorp employees regularly contribute to Statalist; it is maintained by Marcello Pagano of the Harvard School of Public Health, and not by StataCorp itself. Articles about the use of Stata and new commands are published in the quarterly peer-reviewed Stata Journal.
20.
Wolfram Mathematica
–
Wolfram Mathematica is a mathematical symbolic computation program, sometimes termed a computer algebra system or program, used in many scientific, engineering, mathematical, and computing fields. It was conceived by Stephen Wolfram and is developed by Wolfram Research of Champaign, Illinois. The Wolfram Language is the programming language used in Mathematica. The kernel interprets expressions and returns result expressions; all content and formatting can be generated algorithmically or edited interactively. Standard word processing capabilities are supported, including real-time multi-lingual spell-checking. Documents can be structured using a hierarchy of cells, which allow for outlining and sectioning of a document and support automatic numbering index creation. Documents can be presented in a slideshow environment for presentations. Notebooks and their contents are represented as Mathematica expressions that can be created, modified or analyzed by Mathematica programs or converted to other formats. The front end includes development tools such as a debugger, input completion, and automatic syntax highlighting. Among the alternative front ends is the Wolfram Workbench, an Eclipse-based integrated development environment; it provides project-based code development tools for Mathematica, including revision management, debugging, profiling, and testing. There is a plugin for IntelliJ IDEA based IDEs to work with Wolfram Language code which, in addition to syntax highlighting, can analyse and auto-complete local variables. The Mathematica Kernel also includes a command line front end. Other interfaces include JMath, based on GNU readline, and MASH, which runs self-contained Mathematica programs from the UNIX command line. Version 5.2 added automatic multi-threading when computations are performed on multi-core computers. This release included CPU-specific optimized libraries. In addition, Mathematica is supported by third-party specialist acceleration hardware such as ClearSpeed.
Support for CUDA and OpenCL GPU hardware was added in 2010. Also, since version 8 it can generate C code, which is automatically compiled by a system C compiler, such as GCC or Microsoft Visual Studio. A free-of-charge version, Wolfram CDF Player, is provided for running Mathematica programs that have been saved in the Computable Document Format. It can also view standard Mathematica files, but not run them, and it includes plugins for common web browsers on Windows and Macintosh. webMathematica allows a web browser to act as a front end to a remote Mathematica server; it is designed to allow a user-written application to be remotely accessed via a browser on any platform, though it may not be used to gain full access to Mathematica. Due to bandwidth limitations, interactive 3D graphics are not fully supported within a web browser. Wolfram Language code can be converted to C code or to an automatically generated DLL. Wolfram Language code can also be run on a Wolfram cloud service as a web-app or as an API, either on Wolfram-hosted servers or in an installation of the Wolfram Enterprise Private Cloud. Communication with other applications occurs through a protocol called Wolfram Symbolic Transfer Protocol; it allows communication between the Wolfram Mathematica kernel and front end, and also provides a general interface between the kernel and other applications.
21.
Margin of error
–
The margin of error is a statistic expressing the amount of random sampling error in a survey's results. It asserts a likelihood that the result from a sample is close to the one that would be obtained if the whole population had been queried. The likelihood of a result being within the margin of error is itself a probability, commonly 95%, though other values are sometimes used. The larger the margin of error, the less confidence one should have that the poll's reported results are close to the true figures, that is, the figures for the whole population. Margin of error applies whenever a population is incompletely sampled. Margin of error is often used in non-survey contexts to indicate observational error in reporting measured quantities; this latter usage, with the ± notation, is commonly seen in most other sciences. The margin of error is defined as the radius of a confidence interval for a particular statistic from a survey. One example is the percent of people who prefer product A versus product B. When a single, global margin of error is reported for a survey, it refers to the maximum margin of error for all reported percentages using the full sample from the survey. If the statistic is a percentage, this maximum margin of error can be calculated as the radius of the confidence interval for a reported percentage of 50%. The margin of error can be described as an absolute quantity: for example, if the true value is 50 percentage points, and the statistic has a confidence interval radius of 5 percentage points, then we say the margin of error is 5 percentage points. As another example, if the true value is 50 people, and the statistic has a confidence interval radius of 5 people, then we might say the margin of error is 5 people. In some cases, the margin of error is not expressed as an absolute quantity but as a relative one: suppose again that the true value is 50 people, and the statistic has a confidence interval radius of 5 people. If we use the absolute definition, the margin of error would be 5 people. If we use the relative definition, then we express this absolute margin of error as a percent of the true value. So in this case, the absolute margin of error is 5 people, but the relative margin of error is 10 percent (5 people out of 50).
Often, however, the distinction is not explicitly made, yet usually is apparent from context. Like confidence intervals, the margin of error can be defined for any desired confidence level, but usually a level of 90%, 95% or 99% is chosen. This level is the percentage of polls, if repeated with the same design and procedure, whose margin of error around the reported percentage would include the true percentage. Along with the confidence level, the sample design for a survey, and in particular its sample size, determines the magnitude of the margin of error.
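A short Python sketch of the standard large-sample formula for the maximum margin of error of a reported percentage, z·√(p(1 − p)/n) with p = 0.5; the sample size n = 1000 below is a hypothetical figure chosen only for illustration:

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(n, p=0.5, confidence=0.95):
    """Radius of the confidence interval for a sample proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * sqrt(p * (1 - p) / n)

# For a poll of 1000 respondents at 95% confidence, the maximum
# margin of error (at p = 0.5) is about 3.1 percentage points.
moe = margin_of_error(1000)
print(round(100 * moe, 1))   # 3.1

# Quadrupling the sample size halves the margin of error.
print(round(100 * margin_of_error(4000), 2))
```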
22.
Probit
–
The probit function is the quantile function of the standard normal distribution, that is, the inverse of its cumulative distribution function Φ. It has applications in exploratory statistical graphics and specialized regression modeling of binary response variables. Largely because of the central limit theorem, the standard normal distribution plays a fundamental role in probability theory. If we consider the fact that the standard normal distribution places 95% of probability between −1.96 and 1.96, and is symmetric around zero, it follows that Φ(−1.96) = 0.025 = 1 − Φ(1.96). The probit function gives the inverse computation, generating a value of an N(0, 1) random variable associated with a specified cumulative probability; continuing the example, probit(0.025) = −1.96 = −probit(0.975). In general, Φ(probit(p)) = p and probit(Φ(z)) = z. The idea of the probit function was published by Chester Ittner Bliss in a 1934 article in Science on how to treat data such as the percentage of a pest killed by a pesticide. Bliss proposed transforming the percentage killed into a probability unit (or probit), which was related to the modern definition. Such a so-called probit model remains important in toxicology, as well as other fields. The method introduced by Bliss was carried forward in Probit Analysis, an important text by D. J. Finney; values tabled by Finney can be derived from probits as defined here by adding a value of 5. This distinction is summarized by Collett: the original definition of a probit was primarily to avoid having to work with negative probits. That definition is still used in some quarters, but in the major statistical software packages, probit analysis uses the definition given here, without the addition of 5. It should be observed that probit methodology, including numerical optimization for fitting of probit functions, was introduced before the widespread availability of electronic computing; when using tables, it was convenient to have probits uniformly positive. Common areas of application do not require positive probits. If a set of data is actually a sample of a normal distribution, a plot of the values against their probit scores will be approximately linear.
Specific deviations from normality, such as asymmetry, heavy tails, or bimodality, can be diagnosed based on the detection of specific deviations from linearity. While the normal distribution CDF and its inverse are not available in closed form, the functions are widely available in software for statistics and probability modeling, and in spreadsheets. In Microsoft Excel, for example, the probit function is available as norm.s.inv. The probit can also be computed in environments where a numerical implementation of the inverse error function is available, since probit(p) = √2 · erf⁻¹(2p − 1); an example is MATLAB, where an erfinv function is available. Other environments directly implement the probit function, as the R programming language does with its qnorm function.
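In Python, the language used for examples in this article, the probit is available through the standard library's NormalDist, whose inv_cdf method is the inverse normal CDF:

```python
from statistics import NormalDist

Z = NormalDist()          # standard normal distribution

def probit(p):
    """Quantile function (inverse CDF) of the standard normal."""
    return Z.inv_cdf(p)

# The example from the text: 95% of probability lies between -1.96 and 1.96.
print(round(probit(0.025), 2))   # -1.96
print(round(probit(0.975), 2))   # 1.96

# Round-trip identities: Phi(probit(p)) = p and probit(Phi(z)) = z.
print(round(Z.cdf(probit(0.3)), 6))   # 0.3
print(round(probit(Z.cdf(1.5)), 6))   # 1.5
```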
23.
Reference range
–
In health-related fields, a reference range or reference interval is the range of values for a physiologic measurement in healthy persons. It is a basis of comparison for a physician or other health professional to interpret a set of test results for a particular patient. Some important reference ranges in medicine are reference ranges for blood tests and reference ranges for urine tests. The standard definition of a reference range originates in what is most prevalent in a reference group taken from the general population; this is the general reference range. However, there are also optimal health ranges and ranges for particular conditions or statuses. Values within the reference range are those within the normal distribution and are thus often described as within normal limits. The limits of the distribution are called the upper reference limit (or upper limit of normal) and the lower reference limit (or lower limit of normal). In health care–related publishing, style sheets sometimes prefer the word reference over the word normal, to prevent the nontechnical senses of normal from being conflated with the statistical sense. Values outside a reference range are not necessarily pathologic, and they are not necessarily abnormal in any sense other than statistically. Nonetheless, they are indicators of probable pathosis; sometimes the underlying cause is obvious, while in other cases challenging differential diagnosis is required to determine what is wrong and thus how to treat it. Reference ranges that are given by this definition are sometimes referred to as standard ranges; these are likewise established using reference groups from the healthy population, and are sometimes termed normal ranges or normal values. However, using the term normal may not be appropriate, as not everyone outside the interval is abnormal. Reference ranges may also be established by taking samples from the whole population, with or without diseases and conditions.
In some cases, diseased individuals are taken as the population. In the real world, however, neither the population mean nor the population standard deviation is known; they both need to be estimated from a sample, whose size can be designated n. The population standard deviation is estimated by the sample standard deviation, and the population mean is estimated by the sample mean. When the sample size is large, t₀.₉₇₅,ₙ₋₁ ≈ 2, and this method is often acceptably accurate if the standard deviation, as compared to the mean, is not very large. Thus, the reference range in this example (fasting plasma glucose) is estimated to be 4.4 to 6.3 mmol/L. Likewise, with such calculations, the upper limit of the reference range can be written as 6.3 mmol/L. As a comparison, actual reference ranges used clinically for fasting plasma glucose are estimated to have a lower limit of approximately 3.8 to 4.0 mmol/L. In reality, biological parameters tend to have a log-normal distribution; thus, the arithmetic normal distribution may be more appropriate to use with small standard deviations, for convenience, and the log-normal distribution with large standard deviations.
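A Python sketch of the estimate described above. The sample mean 5.35 mmol/L and sample standard deviation 0.485 mmol/L are hypothetical numbers chosen only to illustrate the arithmetic (roughly matching the 4.4–6.3 mmol/L range in the text), and a large sample is assumed so that the t quantile is replaced by the standard normal quantile, about 1.96:

```python
from statistics import NormalDist

# Hypothetical sample estimates for fasting plasma glucose (mmol/L).
sample_mean = 5.35
sample_sd = 0.485

# Large-sample 95% reference interval: mean +/- 1.96 * sd.
z = NormalDist().inv_cdf(0.975)
lower = sample_mean - z * sample_sd
upper = sample_mean + z * sample_sd
print(f"reference range: {lower:.1f} to {upper:.1f} mmol/L")
```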
24.
Standard error
–
The standard error is the standard deviation of the sampling distribution of a statistic, most commonly of the mean. Different samples drawn from that same population would in general have different values of the sample mean. The relationship with the population standard deviation is such that, for a given sample size, the standard error of the mean equals the population standard deviation divided by the square root of the sample size. As the sample size increases, the dispersion of the sample means clusters more closely around the population mean. The term may also be used to refer to an estimate of that standard deviation, derived from a particular sample used to compute the estimate. The standard error of the mean is the standard deviation of the sample-means estimate of a population mean; it is usually estimated as s/√n, where s is the sample standard deviation and n is the sample size. This estimate may be compared with the formula for the true standard deviation of the sample mean. This formula may be derived from what we know about the variance of a sum of independent random variables. If X₁, X₂, …, Xₙ are n independent observations from a population that has a mean μ and standard deviation σ, then the variance of their sum T = X₁ + X₂ + ⋯ + Xₙ is nσ². The variance of T/n must therefore be (1/n²)(nσ²) = σ²/n, and the standard deviation of T/n must be σ/√n. Of course, T/n is the sample mean x̄. Using s in place of σ gives an estimate that is slightly biased downward: with n = 2 the underestimate is about 25%, but for n = 6 the underestimate is only 5%. Gurland and Tripathi provide a correction and equation for this effect, and Sokal and Rohlf give an equation of the correction factor for small samples of n < 20; see unbiased estimation of standard deviation for further discussion. A practical result: decreasing the uncertainty in a mean value estimate by a factor of two requires acquiring four times as many observations in the sample, and decreasing the standard error by a factor of ten requires a hundred times as many observations. In many practical applications, the true value of σ is unknown. As a result, we need to use a distribution that takes into account that spread of possible σ's. When the true underlying distribution is known to be Gaussian, although with unknown σ, then the resulting estimated distribution follows the Student t-distribution, and the standard error is the standard deviation of the Student t-distribution.
T-distributions are slightly different from the Gaussian, and vary depending on the size of the sample. To estimate the standard error of a Student t-distribution it is sufficient to use the sample standard deviation s instead of σ, and we could use this value to calculate confidence intervals. Note: the Student's probability distribution is approximated well by the Gaussian distribution when the sample size is over 100. For such samples one can use the latter distribution, which is much simpler.
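A Python sketch of the s/√n estimate and of the four-times-as-many-observations rule described above (the sample values are made up for illustration):

```python
from math import sqrt
from statistics import stdev

def standard_error(data):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return stdev(data) / sqrt(len(data))

# Made-up sample of 4 observations.
sample = [4.9, 5.1, 5.3, 4.7]
print(round(standard_error(sample), 4))   # 0.1291

# Halving the standard error requires four times as many observations:
# with sigma known, SE = sigma / sqrt(n).
sigma = 10.0
print(sigma / sqrt(25))    # 2.0
print(sigma / sqrt(100))   # 1.0
```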
25.
International Standard Book Number
–
The International Standard Book Number (ISBN) is a unique numeric commercial book identifier. An ISBN is assigned to each edition and variation of a book; for example, an e-book, a paperback and a hardcover edition of the same book would each have a different ISBN. The ISBN is 13 digits long if assigned on or after 1 January 2007. The method of assigning an ISBN is nation-based and varies from country to country, often depending on how large the publishing industry is within a country. The initial ISBN format was devised in 1967, based upon the 9-digit Standard Book Numbering (SBN) created in 1966; the 10-digit ISBN format was developed by the International Organization for Standardization and was published in 1970 as international standard ISO 2108. Occasionally, a book may appear without a printed ISBN if it is printed privately or the author does not follow the usual ISBN procedure; however, this can be rectified later. Another identifier, the International Standard Serial Number, identifies periodical publications such as magazines. The ISBN format was devised in 1967 in the United Kingdom by David Whitaker and in 1968 in the US by Emery Koltay; the United Kingdom continued to use the 9-digit SBN code until 1974. The ISO on-line facility only refers back to 1978. An SBN may be converted to an ISBN by prefixing the digit 0. For example, the edition of Mr. J. G. Reeder Returns, published by Hodder in 1965, has SBN 340-01381-8, with 340 indicating the publisher, 01381 their serial number, and 8 the check digit. This can be converted to ISBN 0-340-01381-8; the check digit does not need to be re-calculated. Since 1 January 2007, ISBNs have contained 13 digits, a format that is compatible with Bookland European Article Number EAN-13s.
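The conversion by prefixing 0 works because the ISBN-10 check digit is position-weighted, so a leading zero contributes nothing to the checksum. A minimal sketch of that validation rule in Python (the function name is illustrative, not from any standard library):

```python
def isbn10_is_valid(isbn):
    """Check an ISBN-10 (or zero-prefixed SBN): the weighted sum
    10*d1 + 9*d2 + ... + 1*d10 must be divisible by 11, with the
    letter 'X' standing for the value 10 in the check position."""
    digits = [c for c in isbn if c not in "- "]
    if len(digits) != 10:
        return False
    total = 0
    for i, c in enumerate(digits):
        value = 10 if c in "Xx" else int(c)
        total += (10 - i) * value
    return total % 11 == 0

# The SBN 340-01381-8 from the example above, converted by prefixing 0
print(isbn10_is_valid("0-340-01381-8"))  # True
```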
A 13-digit ISBN can be separated into its parts, and when this is done it is customary to separate the parts with hyphens or spaces; separating the parts of a 10-digit ISBN is also done with either hyphens or spaces. Figuring out how to correctly separate a given ISBN is complicated, because most of the parts do not use a fixed number of digits. ISBN issuance is country-specific, in that ISBNs are issued by the ISBN registration agency that is responsible for that country or territory, regardless of the publication language. Some ISBN registration agencies are based in national libraries or within ministries of culture; in other cases, the ISBN registration service is provided by organisations such as bibliographic data providers that are not government funded. In Canada, ISBNs are issued at no cost with the purpose of encouraging Canadian culture. In the United Kingdom, the United States, and some other countries, where the service is provided by non-government-funded organisations, issuing ISBNs requires the payment of a fee. In Australia, ISBNs are issued by the library services agency Thorpe-Bowker.
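The EAN-13 compatibility mentioned above means a 10-digit ISBN can be upgraded to 13 digits by prefixing 978, dropping the old check digit, and computing a new EAN-13 check digit. A minimal sketch, with an illustrative function name:

```python
def isbn10_to_isbn13(isbn10):
    """Convert a 10-digit ISBN to its 13-digit form: prefix '978',
    keep the first nine data digits, and append a freshly computed
    EAN-13 check digit."""
    body = "978" + "".join(c for c in isbn10 if c not in "- ")[:9]
    # EAN-13 check: weight digits alternately 1 and 3, then pick the
    # digit that brings the total up to a multiple of 10
    total = sum((1 if i % 2 == 0 else 3) * int(d) for i, d in enumerate(body))
    return body + str((10 - total % 10) % 10)

# The Hodder example from the previous section
print(isbn10_to_isbn13("0-340-01381-8"))  # 9780340013816
```

Note that the check digit changes under conversion (8 becomes 6 here), because the ISBN-10 modulus-11 rule and the EAN-13 modulus-10 rule are different algorithms.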
26.
PubMed Identifier
–
PubMed is a free search engine accessing primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics. The United States National Library of Medicine (NLM) at the National Institutes of Health maintains the database as part of the Entrez system of information retrieval. From 1971 to 1997, online access to the MEDLINE (MEDLARS Online) computerized database had primarily been through institutional facilities, such as university libraries. PubMed, first released in January 1996, ushered in the era of private, free, home- and office-based MEDLINE searching. The PubMed system was offered free to the public in June 1997, when MEDLINE searches via the Web were demonstrated, in a ceremony, by Vice President Al Gore. Information about the journals indexed in MEDLINE, and available through PubMed, is found in the NLM Catalog. As of 5 January 2017, PubMed has more than 26.8 million records going back to 1966, selectively to the year 1865, and very selectively to 1809; about 500,000 new records are added each year. As of the same date, 13.1 million of PubMed's records are listed with their abstracts. In 2016, NLM changed the system so that publishers are able to directly correct typos. Simple searches on PubMed can be carried out by entering key aspects of a subject into PubMed's search window. When a journal article is indexed, numerous article parameters are extracted and stored as structured information. Such parameters are article type, secondary identifiers, and language; the publication type parameter enables many special features, including clinical queries that can generate small sets of robust studies with considerable precision. Since July 2005, the MEDLINE article indexing process extracts important identifiers from the article abstract and puts those in a field called Secondary Identifier. The secondary identifier field is used to store accession numbers to various databases of molecular sequence data, gene expression or chemical compounds.
For clinical trials, PubMed extracts trial IDs for the two largest trial registries: ClinicalTrials.gov and the International Standard Randomized Controlled Trial Number Register. A reference which is judged particularly relevant can be marked, and related articles can be identified. If relevant, several studies can be selected and related articles to all of them can be generated using the "Find related data" option; the related articles are then listed in order of relatedness. To create these lists of related articles, PubMed compares words from the title and abstract of each citation, as well as the MeSH headings assigned, using a powerful word-weighted algorithm. The related-articles function has been judged to be so precise that some researchers suggest it can be used instead of a full search. A strong feature of PubMed is its ability to automatically link to MeSH terms and subheadings; examples would be "bad breath" linking to "halitosis", and "heart attack" to "myocardial infarction". Where appropriate, these MeSH terms are automatically expanded, that is, made to include more specific terms. Terms like "nursing" are automatically linked to the corresponding MeSH term and subheading. This important feature makes PubMed searches automatically more sensitive and avoids false-negative hits by compensating for the diversity of medical terminology. The My NCBI area can be accessed from any computer with web access; an earlier version of My NCBI was called PubMed Cubby.
27.
JSTOR
–
JSTOR is a digital library founded in 1995. Originally containing digitized back issues of journals, it now also includes books and primary sources. It provides full-text searches of almost 2,000 journals, and more than 8,000 institutions in more than 160 countries have access to JSTOR. Most access is by subscription, but some older public-domain content is freely available to anyone. JSTOR was founded by William G. Bowen, president of Princeton University from 1972 to 1988. JSTOR originally was conceived as a solution to one of the problems faced by libraries, especially research and university libraries, due to the increasing number of academic journals in existence: most libraries found it prohibitively expensive in terms of cost and space to maintain a comprehensive collection of journals. By digitizing many journal titles, JSTOR allowed libraries to outsource the storage of journals with the confidence that they would remain available long-term, while online access and full-text search ability improved access dramatically. Bowen initially considered using CD-ROMs for distribution. JSTOR was initiated in 1995 at seven different library sites, and originally encompassed ten economics and history journals. JSTOR access improved based on feedback from its sites, and special software was put in place to make pictures and graphs clear. With the success of this limited project, Bowen and Kevin Guthrie, then-president of JSTOR, wanted to expand the number of participating journals. They met with representatives of the Royal Society of London, and an agreement was made to digitize the Philosophical Transactions of the Royal Society dating from its beginning in 1665; the work of adding these volumes to JSTOR was completed by December 2000. The Andrew W. Mellon Foundation funded JSTOR initially. Until January 2009, JSTOR operated as an independent, self-sustaining nonprofit organization with offices in New York City and in Ann Arbor, Michigan.
JSTOR content is provided by more than 900 publishers, and the database contains more than 1,900 journal titles in more than 50 disciplines. Each object is identified by an integer value, starting at 1. In addition to the main site, the JSTOR Labs group operates an open service that allows access to the contents of the archives for the purposes of corpus analysis at its Data for Research service. This site offers a search facility with graphical indication of article coverage. Users may create focused sets of articles and then request a dataset containing word and n-gram frequencies; they are notified when the dataset is ready and may download it in either XML or CSV formats. The service does not offer full text, although academics may request that from JSTOR. JSTOR Plant Science is available in addition to the main site. The materials on JSTOR Plant Science are contributed through the Global Plants Initiative and are available only through JSTOR.
28.
Doug Altman
–
Douglas Altman FMedSci is an English statistician best known for his work on improving the reliability and reporting of medical research and for highly cited papers on statistical methodology. Doug Altman graduated in statistics from the University of Bath, and his first job was in the Department of Community Medicine at St Thomas's Hospital Medical School. He then spent 11 years working for the Medical Research Council's Clinical Research Centre, where he worked almost entirely as a consultant in a wide variety of medical areas. In 1998 he was made Professor of Statistics in Medicine by the University of Oxford. Altman is regarded as a leading authority on the execution and reporting of health research, and has played a leading role in establishing better standards. He is also one of the authors of the IDEAL framework for improving surgical research. His textbook Practical Statistics for Medical Research, published in 1991, has sold 50,000 copies in hardback. Altman is the author of over 450 papers in statistical methodology, with 11 being cited over 1,000 times. Among them is one Lancet paper, which has been cited over 23,000 times and is ranked 29th in the Nature/Web of Science Top 100 most-cited research papers of all time. Altman was awarded the Bradford Hill Medal by the Royal Statistical Society for his contributions to statistics in 1997. Altman is also editor in chief of Trials and a Fellow of the Academy of Medical Sciences.

Books:
Practical Statistics for Medical Research. Douglas G. Altman. Monographs on Statistics and Applied Probability. ISBN 0-412-27630-5
Systematic Reviews in Healthcare: Meta-Analysis in Context. Editors: Douglas G. Altman, Iain Chalmers, Gerd Antes, Michael Bradburn, Mike Clarke, Matthias Egger. ISBN 0-7279-1488-X
Statistics With Confidence: Confidence Intervals and Statistical Guidelines. Editors: Douglas G. Altman, David Machin, T. N. Bryant.
Editors: Douglas G. Altman, Iain Chalmers. ISBN 0-7279-0904-5
Statistics in Practice: Articles Published in the British Medical Journal. Editors: Sheila M. Gore, Douglas G. Altman. ISBN 0-7279-0085-4

A list of the over 396 articles by Doug Altman is available through PubMed. Selected papers:
Moher D, Schulz KF, Altman DG, for the CONSORT Group. Revised recommendations for improving the quality of reports of parallel group randomized trials.
Statistical methods for assessing agreement between two methods of clinical measurement.
BMJ Statistical Notes: a series of articles on the use of statistics by Doug Altman.
Measurement in medicine: the analysis of method comparison studies.
Measuring agreement in method comparison studies. Statistical Methods in Medical Research 8, 135-160.
Comparing methods of measurement: why plotting difference against standard method is misleading.