1.
Clinical trial
–
Clinical trials are experiments or observations done in clinical research. Clinical trials generate data on safety and efficacy, and they are conducted only after they have received health authority/ethics committee approval in the country where approval of the therapy is sought. These authorities are responsible for vetting the risk/benefit ratio of the trial; their approval does not mean that the therapy is safe or effective, only that the trial may be conducted. Clinical trials can vary in size and cost, and they can involve a single research center or multiple centers. Clinical study design aims to ensure the validity and reproducibility of the results. Trials can be costly, depending on a number of factors. The sponsor may be an organization or a pharmaceutical, biotechnology or medical device company. Certain functions necessary to the trial, such as monitoring and lab work, may be managed by an outsourced partner. Only 10 percent of all drugs started in human clinical trials become an approved drug. Some clinical trials involve healthy subjects with no pre-existing medical conditions. Other clinical trials pertain to patients with specific health conditions who are willing to try an experimental treatment. When participants are healthy volunteers who receive financial incentives, the goals are different than when the participants are sick. During dosing periods, study subjects typically remain under supervision for one to 40 nights. Usually pilot experiments are conducted to gain insights for design of the trial to follow. There are two goals to testing medical treatments: to learn whether they work well enough, called efficacy or effectiveness, and to learn whether they are safe enough. The benefits must outweigh the risks. In the US, the elderly constitute only 14 percent of the population, while they consume over one-third of drugs. Women, children and people with unrelated medical conditions are frequently excluded. For women, a reason for exclusion is the possibility of pregnancy.
If the sponsor cannot obtain enough test subjects at one location, investigators at other locations are recruited to join the study. During the trial, investigators recruit subjects with the predetermined characteristics, administer the treatment and collect data on the subjects' health for a defined time period. The researchers send the data to the sponsor, who then analyzes the pooled data using statistical tests. Except for small, single-location trials, the design and objectives are specified in a document called a clinical trial protocol.

2.
Confirmation bias
–
Confirmation bias, also called confirmatory bias or myside bias, is the tendency to search for, interpret, favor, and recall information in a way that confirms one's preexisting beliefs or hypotheses. It is a type of cognitive bias and a systematic error of inductive reasoning. People display this bias when they gather or remember information selectively. The effect is stronger for emotionally charged issues and for deeply entrenched beliefs. People also tend to interpret ambiguous evidence as supporting their existing position. Biased search, interpretation and memory have been invoked to explain attitude polarization, belief perseverance, the irrational primacy effect and illusory correlation. A series of experiments in the 1960s suggested that people are biased toward confirming their existing beliefs. Later work re-interpreted these results as a tendency to test ideas in a one-sided way, focusing on one possibility and ignoring alternatives. In certain situations, this tendency can bias people's conclusions. Explanations for the observed biases include wishful thinking and the limited human capacity to process information. Another explanation is that people show confirmation bias because they are weighing up the costs of being wrong, rather than investigating in a neutral way. Confirmation biases contribute to overconfidence in personal beliefs and can maintain or strengthen beliefs in the face of contrary evidence. Poor decisions due to these biases have been found in political and organizational contexts. Confirmation biases are effects in information processing. Some writers apply the term more broadly to the tendency to preserve one's existing beliefs when searching for evidence or interpreting it. Experiments have found repeatedly that people tend to test hypotheses in a one-sided way. Rather than searching through all the relevant evidence, they phrase questions to receive an affirmative answer that supports their theory.
They look for the consequences that they would expect if their hypothesis were true. For example, someone using yes/no questions to find a number he or she suspects to be the number 3 might ask, "Is it an odd number?" People prefer this type of question, called a positive test, even when a negative test such as "Is it an even number?" would yield exactly the same information. However, this does not mean that people seek tests that guarantee a positive answer. In studies where subjects could select either such pseudo-tests or genuinely diagnostic ones, they favored the genuinely diagnostic. The preference for positive tests in itself is not a bias. However, in combination with other effects, this strategy can confirm existing beliefs or assumptions, independently of whether they are true. In real-world situations, evidence is often complex and mixed. For example, various contradictory ideas about someone could each be supported by concentrating on one aspect of his or her behavior. Thus any search for evidence in favor of a hypothesis is likely to succeed. One illustration of this is the way the phrasing of a question can significantly change the answer. For example, people who are asked, "Are you happy with your social life?" report greater satisfaction than those asked, "Are you unhappy with your social life?" Even a small change in a question's wording can affect how people search through available information. This was shown using a fictional child custody case.

3.
Analysis of variance
–
In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the means of groups are equal. ANOVAs are useful for comparing three or more means for statistical significance. ANOVA is conceptually similar to multiple two-sample t-tests, but is more conservative and is therefore suited to a wide range of practical problems. While the analysis of variance reached fruition in the 20th century, its antecedents extend much further into the past. These include hypothesis testing, the partitioning of sums of squares, experimental techniques and the additive model. Laplace was performing hypothesis testing in the 1770s. The development of least-squares methods by Laplace and Gauss circa 1800 provided an improved method of combining observations. It also initiated much study of the contributions to sums of squares. Laplace soon knew how to estimate a variance from a residual sum of squares. By 1827 Laplace was using least squares methods to address ANOVA problems regarding measurements of atmospheric tides. Before 1800 astronomers had isolated observational errors resulting from reaction times and had developed methods of reducing the errors. An eloquent non-mathematical explanation of the effects model was available in 1885. Ronald Fisher introduced the term variance and proposed its formal analysis in a 1918 article, "The Correlation Between Relatives on the Supposition of Mendelian Inheritance". His first application of the analysis of variance was published in 1921. Analysis of variance became widely known after being included in Fisher's 1925 book Statistical Methods for Research Workers. Randomization models were developed by several researchers. The first was published in Polish by Neyman in 1923. One of the attributes of ANOVA which ensured its early popularity was computational elegance.
The structure of the model allows solution for the additive coefficients by simple algebra rather than by matrix calculations. In the era of mechanical calculators this simplicity was critical. The determination of statistical significance also required access to tables of the F function, which were supplied by early statistics texts. The analysis of variance can be used as a tool to explain observations. A dog show provides an example. A dog show is not a random sampling of the breed: it is typically limited to dogs that are adult, pure-bred, and exemplary. A histogram of dog weights from a show might plausibly be rather complex. Suppose we wanted to predict the weight of a dog based on a certain set of characteristics of each dog. Before we could do that, we would need to explain the distribution of weights by dividing the dog population into groups based on those characteristics. A successful grouping will split dogs such that each group has a low variance of dog weights. In the illustrations to the right, each group is identified as X1, X2, etc.
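The partitioning of variance described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name one_way_anova and the dog weights are hypothetical, and the sketch assumes a one-way layout in which each group has at least one observation and the groups together have more observations than there are groups.

```python
from statistics import mean


def one_way_anova(groups):
    """Partition the variation of the pooled values into a between-group
    and a within-group sum of squares and return
    (ss_between, ss_within, f_statistic)."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_statistic


# Hypothetical dog weights in kilograms, grouped by an assumed characteristic.
weights = [
    [4.0, 5.0, 6.0],     # group X1
    [19.0, 20.0, 21.0],  # group X2
    [39.0, 40.0, 41.0],  # group X3
]

ss_between, ss_within, f_statistic = one_way_anova(weights)
print(ss_between, ss_within, f_statistic)
```

A grouping that explains the weights well yields a small within-group sum of squares relative to the between-group sum of squares, and hence a large F statistic. Judging significance still requires comparing that statistic with the F distribution for the two degrees of freedom.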

4.
Experiment
–
An experiment is a procedure carried out to support, refute, or validate a hypothesis. Experiments provide insight into cause-and-effect by demonstrating what outcome occurs when a particular factor is manipulated. Experiments vary greatly in goal and scale, but always rely on repeatable procedure and logical analysis of the results. There also exist natural experimental studies. A child may carry out basic experiments to understand gravity, while teams of scientists may take years of systematic investigation to advance their understanding of a phenomenon. Experiments and other types of activities are very important to student learning in the science classroom. Experiments can raise test scores and help a student become more engaged and interested in the material they are learning. Experiments can vary from personal and informal natural comparisons to highly controlled ones. Uses of experiments vary considerably between the natural and human sciences. Experiments typically include controls, which are designed to minimize the effects of variables other than the single independent variable. This increases the reliability of the results, often through a comparison between control measurements and the other measurements. Scientific controls are a part of the scientific method. Ideally, all variables in an experiment are controlled and none are uncontrolled. In such an experiment, if all controls work as expected, it is possible to conclude that the experiment works as intended, and that results are due to the effect of the tested variable. In the scientific method, an experiment is a procedure that arbitrates between competing models or hypotheses. Researchers also use experimentation to test existing theories or new hypotheses to support or disprove them. An experiment usually tests a hypothesis, which is an expectation about how a particular process or phenomenon works.
However, an experiment may also aim to answer a question, without a specific expectation about what the experiment reveals. If an experiment is conducted, the results usually either support or disprove the hypothesis. According to some philosophies of science, an experiment can never prove a hypothesis. On the other hand, an experiment that provides a counterexample can disprove a theory or hypothesis. An experiment must also control the possible confounding factors, that is, any factors that would mar the accuracy or repeatability of the experiment or the ability to interpret the results. Confounding is commonly eliminated through scientific controls and/or, in randomized experiments, through random assignment. In engineering and the physical sciences, experiments are a primary component of the scientific method. They are used to test theories and hypotheses about how physical processes work under particular conditions. Typically, experiments in these fields focus on replication of identical procedures in hopes of producing identical results in each replication. In medicine and the social sciences, the prevalence of experimental research varies widely across disciplines. In contrast to norms in the physical sciences, the focus is typically on the average treatment effect or another test statistic produced by the experiment.

5.
Design of experiments
–
The design of experiments (DOE) is the design of any task that aims to describe or explain the variation of information under conditions that are hypothesized to reflect the variation. In its simplest form, an experiment aims at predicting the outcome by introducing a change of the preconditions. The change in the predictor is generally hypothesized to result in a change in the second variable, hence called the outcome variable. Main concerns in design include the establishment of validity and reliability. Related concerns include achieving appropriate levels of power and sensitivity. Correctly designed experiments advance knowledge in the natural and social sciences. Other applications include marketing and policy making. In 1747, while serving as surgeon on HMS Salisbury, James Lind carried out a clinical trial to compare remedies for scurvy. This systematic clinical trial constitutes a type of DOE. Lind selected 12 men from the ship, all suffering from scurvy. Lind limited his subjects to men who "were as similar as I could have them". He divided them into six pairs, giving each pair different supplements to their basic diet for two weeks. The treatments were all remedies that had been proposed: a quart of cider every day; twenty five gutts of vitriol three times a day upon an empty stomach; one half-pint of seawater every day; a mixture of garlic, mustard, and horseradish in a lump the size of a nutmeg; two spoonfuls of vinegar three times a day; and two oranges and one lemon every day. The citrus treatment stopped after six days when they ran out of fruit. Apart from the citrus group, only group one (cider) showed some effect of its treatment. The remainder of the crew served as a control.
Charles S. Peirce randomly assigned volunteers to a blinded, repeated-measures design to evaluate their ability to discriminate weights. Peirce's experiment inspired other researchers in psychology and education, who developed a research tradition of randomized experiments in laboratories and specialized textbooks in the 1800s. Charles S. Peirce also contributed the first English-language publication on an optimal design for regression models in 1876. A pioneering optimal design for regression was suggested by Gergonne in 1815. In 1918 Kirstine Smith published optimal designs for polynomials of degree six. Herman Chernoff wrote an overview of optimal sequential designs, while adaptive designs have been surveyed by S. Zacks.

6.
Between-group design
–
In the design of experiments, a between-group design is an experiment that has two or more groups of subjects each being tested by a different testing factor simultaneously. This design is used in place of, or in some cases in conjunction with, the within-subject design. In order to avoid bias, experimental blinds are usually applied in between-group designs. The most commonly used type is the single blind, which keeps the subjects blind without identifying them as members of the treatment group or the control group. In a single-blind experiment, a placebo is usually offered to the control group members. Occasionally, the double blind, a more secure way to avoid bias from both the subjects and the testers, is implemented. In this case, both the subjects and the testers are unaware of which group the subjects belong to. The double blind design can protect the experiment from the observer-expectancy effect. The utilization of the between-group experimental design has several advantages. First, multiple variables, or multiple levels of a variable, can be tested simultaneously, given enough testing subjects. Thus, the inquiry is broadened and extended beyond the effect of one variable. Additionally, this design saves a great deal of time, which is valuable if the results aid in a time-sensitive issue. The main disadvantage with between-group designs is that they can be complex and often require a large number of participants to generate any useful data. For example, researchers testing the effectiveness of a treatment for severe depression might need two groups of twenty patients for a control and a test group. If they wanted to add another treatment to the research, they would need another group of twenty patients. The potential scale of these experiments can make between-group designs impractical due to limited resources and subjects. Another major concern for between-group designs is bias.
Assignment bias, observer-expectancy and subject-expectancy biases are common causes for skewed data results in between-group experiments. Some other disadvantages for between-group designs are generalization, individual variability and environmental factors. While it is easy to try to select subjects of the same age, gender and background, doing so limits how far the results can be generalized to wider groups. At the same time, the lack of homogeneity within a group due to individual variability may also produce unreliable results and obscure genuine patterns. Environmental variables can also influence results and usually arise from poor research design. A practice effect is the change resulting from repeated testing. Some research has been done regarding whether it is possible to design an experiment that combines within-subject design and between-group design. A way to design psychological experiments using both designs exists and is sometimes known as mixed factorial design.

7.
Case-control study
–
A case-control study is a type of observational study in which two existing groups differing in outcome are identified and compared on the basis of some supposed causal attribute. Case-control studies require fewer resources but provide less evidence for causal inference than a randomized controlled trial. The case-control study is a type of epidemiological observational study. If a larger proportion of the cases smoke than the controls, that suggests, but does not conclusively show, that smoking contributes to the outcome. The case-control study is frequently contrasted with cohort studies, wherein exposed and unexposed subjects are observed until they develop an outcome of interest. Controls need not be in good health; inclusion of sick people is sometimes appropriate. Controls should come from the same population as the cases. Controls can carry the same disease as the experimental group, but of another grade/severity. However, because the difference between the cases and the controls will be smaller, this results in a lower power to detect an exposure effect. As with any epidemiological study, greater numbers in the study will increase the power of the study. Numbers of cases and controls do not have to be equal. In many situations, it is easier to recruit controls than to find cases. Increasing the number of controls above the number of cases, up to a ratio of about 4 to 1, may be a cost-effective way to improve the study. Case-control studies have pointed the way to a number of important discoveries and advances. The case-control study design is used in the study of rare diseases or as a preliminary study where little is known about the association between the risk factor and disease of interest. Compared to prospective cohort studies they tend to be less costly. In several situations they have greater statistical power than cohort studies, which must often wait for a sufficient number of disease events to accrue. Case-control studies are observational in nature and thus do not provide the same level of evidence as randomized controlled trials.
The results may be confounded by other factors, to the extent of giving the opposite answer to better studies. A meta-analysis of what were considered 30 high-quality studies concluded that use of a product halved a risk. The most important drawback in case-control studies relates to the difficulty of obtaining reliable information about an individual's exposure status over time. Case-control studies are placed low in the hierarchy of evidence. One of the most significant triumphs of the case-control study was the demonstration of the link between tobacco smoking and lung cancer, by Richard Doll and Bradford Hill. They showed a significant association in a large case-control study.

8.
Fisher information
–
In Bayesian statistics, the asymptotic distribution of the posterior mode depends on the Fisher information and not on the prior. The role of the Fisher information in the theory of maximum-likelihood estimation was emphasized by the statistician Ronald Fisher. The Fisher information is used in the calculation of the Jeffreys prior. The Fisher-information matrix is used to calculate the covariance matrices associated with maximum-likelihood estimates. It can also be used in the formulation of test statistics, such as the Wald test. Statistical systems of a scientific nature whose likelihood functions obey shift invariance have been shown to obey maximum Fisher information. The level of the maximum depends upon the nature of the system constraints. The Fisher information is a way of measuring the amount of information that a random variable X carries about an unknown parameter θ upon which the probability of X depends. The probability function for X, which is also the likelihood function for θ, is a function $f(X; \theta)$. The partial derivative with respect to θ of the logarithm of the likelihood function is called the score. A random variable carrying high Fisher information implies that the absolute value of the score is often high. The Fisher information is not a function of a particular observation, because the random variable X has been averaged out. Since the expectation of the score is zero, the Fisher information is also the variance of the score. Thus, the Fisher information is the negative of the expectation of the second derivative with respect to θ of the natural logarithm of f. Information may be seen to be a measure of the curvature of the log-likelihood curve near the maximum likelihood estimate of θ. Information is additive, in that the information yielded by two independent experiments is the sum of the information from each experiment separately: $\mathcal{I}_{X,Y}(\theta) = \mathcal{I}_X(\theta) + \mathcal{I}_Y(\theta)$. This result follows from the fact that the variance of a sum of independent random variables is the sum of their variances. In particular, the information in a sample of size n is n times that in a sample of size 1.
The information provided by a sufficient statistic is the same as that of the sample X. This may be seen by using Neyman's factorization criterion for a sufficient statistic. If $T(X)$ is sufficient for θ, then $f(X; \theta) = g(T(X), \theta)\, h(X)$ for some functions g and h; see sufficient statistic for a more detailed explanation. The equality of information then follows from the fact that $\frac{\partial}{\partial \theta} \log f(X; \theta) = \frac{\partial}{\partial \theta} \log g(T(X), \theta)$, which follows from the definition of Fisher information and from $h(X)$ not depending on θ.
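The properties stated above (zero mean of the score, Fisher information as the variance of the score and as the negative expected second derivative of the log-likelihood, and additivity over independent observations) can be checked numerically for a simple model. The sketch below assumes a Bernoulli variable with success probability p, a choice made here only for illustration; all function names are hypothetical, and expectations are computed by exact enumeration rather than simulation.

```python
from itertools import product


def prob(x, p):
    """Bernoulli probability f(x; p) for x in {0, 1}."""
    return p if x == 1 else 1 - p


def score(x, p):
    """First derivative with respect to p of log f(x; p)."""
    return x / p - (1 - x) / (1 - p)


def second_derivative(x, p):
    """Second derivative with respect to p of log f(x; p)."""
    return -x / p ** 2 - (1 - x) / (1 - p) ** 2


def score_moments(p, n=1):
    """Exact mean and variance of the total score of n independent draws."""
    mean_score = 0.0
    mean_square = 0.0
    for xs in product([0, 1], repeat=n):
        weight = 1.0
        for x in xs:
            weight *= prob(x, p)
        total_score = sum(score(x, p) for x in xs)
        mean_score += weight * total_score
        mean_square += weight * total_score ** 2
    return mean_score, mean_square - mean_score ** 2


def negative_expected_curvature(p):
    """Negative expectation of the second derivative for one draw."""
    return -sum(prob(x, p) * second_derivative(x, p) for x in (0, 1))


p = 0.3  # assumed parameter value, chosen only for the illustration
mean_1, info_1 = score_moments(p, n=1)
_, info_3 = score_moments(p, n=3)
print(mean_1, info_1, negative_expected_curvature(p), info_3)
```

For the Bernoulli model the two expressions for the information both equal 1 / (p (1 - p)), and the value for three independent draws is three times the value for one, up to floating-point rounding.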

9.
Interaction (statistics)
–
Most commonly, interactions are considered in the context of regression analyses. The presence of interactions can have important implications for the interpretation of statistical models. If two variables of interest interact, the relationship between each of the interacting variables and a third dependent variable depends on the value of the other interacting variable. In practice, this makes it difficult to predict the consequences of changing the value of a variable. An interaction variable is a variable constructed from a set of variables to try to represent either all of the interaction present or some part of it. Often, models are presented without the interaction term d, but this confounds the main effect with the interaction effect. A simple setting in which interactions can arise is a two-factor experiment analyzed using analysis of variance. Suppose we have two binary factors A and B. For example, these factors might indicate whether either of two treatments was administered to a patient, with the treatments applied either singly or in combination. We can then consider the average treatment response for each patient. The following table shows one possible situation. In this example, there is no interaction between the two treatments: their effects are additive. In contrast, if the following average responses are observed then there is an interaction between the treatments: their effects are not additive. Similar observations are made for this particular example in the next section. In many applications it is useful to distinguish between qualitative and quantitative interactions. A quantitative interaction between A and B refers to a situation where the magnitude of the effect of B depends on the value of A, but the direction of the effect of B is constant for all A. A qualitative interaction between A and B refers to a situation where both the magnitude and direction of each variable's effect can depend on the value of the other variable.
The table of means on the left, below, shows a quantitative interaction: treatment A is beneficial both when B is given and when B is not given, but the benefit is greater when B is not given. The table of means on the right shows a qualitative interaction. A is harmful when B is given, but it is beneficial when B is not given. Note that the same interpretation would hold if we consider the benefit of B based on whether A is given. The distinction between qualitative and quantitative interactions depends on the order in which the variables are considered. In its simplest form, the assumption of treatment unit additivity states that the observed response $y_{ij}$ from experimental unit i when receiving treatment j can be written as the sum $y_{ij} = y_i + t_j$. The assumption of treatment additivity implies that every treatment has exactly the same additive effect on each experimental unit.
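The distinction between additive effects, quantitative interactions and qualitative interactions can be sketched for a 2x2 table of mean responses. The function classify_interaction and all the numbers below are hypothetical, chosen only to mirror the situations described above. The sketch looks at the effect of A at each level of B, so, as noted, a different ordering of the variables could in principle give a different label.

```python
def classify_interaction(means):
    """Classify a 2x2 table of mean responses.

    means[(a, b)] is the average response with factor A at level a and
    factor B at level b, where 0 means "not given" and 1 means "given".
    """
    effect_of_a_without_b = means[(1, 0)] - means[(0, 0)]
    effect_of_a_with_b = means[(1, 1)] - means[(0, 1)]

    # Under additivity the effect of A is the same at both levels of B.
    if effect_of_a_with_b == effect_of_a_without_b:
        return "additive"
    # A sign change in the effect of A signals a qualitative interaction.
    if effect_of_a_with_b * effect_of_a_without_b < 0:
        return "qualitative"
    # Otherwise only the magnitude of the effect changes
    # (this branch also covers an effect that is zero at one level of B).
    return "quantitative"


# Hypothetical mean responses (higher is better), not taken from the source.
additive_table = {(0, 0): 6, (1, 0): 7, (0, 1): 4, (1, 1): 5}
quantitative_table = {(0, 0): 2, (1, 0): 5, (0, 1): 1, (1, 1): 3}
qualitative_table = {(0, 0): 2, (1, 0): 6, (0, 1): 5, (1, 1): 3}

for table in (additive_table, quantitative_table, qualitative_table):
    print(classify_interaction(table))
```

Integer means are used so that the equality test for additivity is exact. With estimated means from real data one would test the interaction contrast statistically rather than compare numbers for equality.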