1.
Statistics
–
Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data. In applying statistics to, for example, a scientific, industrial, or social problem, one begins with a population to be studied; populations can be as diverse as all people living in a country or every atom composing a crystal. Statistics deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments. The statistician Sir Arthur Lyon Bowley defined statistics as "numerical statements of facts in any department of inquiry placed in relation to each other". When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. Unlike an experimental study, an observational study does not involve experimental manipulation. Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena. A standard statistical procedure involves testing the relationship between two data sets, or between a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between the two data sets. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false, given the data used in the test. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (the null hypothesis is falsely rejected, giving a false positive) and Type II errors (the null hypothesis fails to be rejected although it is false, giving a false negative). Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis. Measurement processes that generate statistical data are also subject to error.
Many of these errors are classified as random (noise) or systematic (bias). The presence of missing data or censoring may result in biased estimates, and specific techniques have been developed to address these problems. Statistics continues to be an area of active research, for example on the problem of how to analyze big data. Statistics is a body of science that pertains to the collection, analysis, interpretation or explanation, and presentation of data. Some consider statistics to be a distinct mathematical science rather than a branch of mathematics. While many scientific investigations make use of data, statistics is concerned with the use of data in the context of uncertainty. Mathematical techniques used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure-theoretic probability theory. In applying statistics to a problem, it is common practice to start with a population or process to be studied. Populations can be as diverse as all people living in a country or every atom composing a crystal. Ideally, statisticians compile data about the entire population (an operation called a census). This may be organized by governmental statistical institutes.
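As a rough illustration of the null-hypothesis procedure described in this entry, the sketch below computes an exact two-sided p-value for a test of whether a coin is fair. The data (16 heads in 20 tosses), the significance level of 0.05, and the function name `binomial_two_sided_p` are assumptions chosen for illustration; they do not come from the text.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p0: float = 0.5) -> float:
    """Exact two-sided p-value: the total probability, under the null
    hypothesis, of all head counts that are no more likely than k."""
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so that ties survive floating-point rounding.
    return sum(q for q in pmf if q <= observed * (1 + 1e-9))


alpha = 0.05                             # assumed tolerated Type I error rate
p_value = binomial_two_sided_p(16, 20)   # assumed data: 16 heads in 20 tosses
reject_null = p_value < alpha

print(f"p-value = {p_value:.4f}, reject null: {reject_null}")
```

With these assumed numbers the p-value is about 0.012, so the null hypothesis of a fair coin would be rejected at the 5% level. If the coin were in fact fair, that rejection would be a Type I error.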

2.
Probability
–
Probability is the measure of the likelihood that an event will occur. Probability is quantified as a number between 0 and 1, where 0 indicates impossibility and 1 indicates certainty. The higher the probability of an event, the more certain it is that the event will occur. A simple example is the tossing of a fair coin. Since the coin is unbiased, the two outcomes (head and tail) are equally probable: the probability of head equals the probability of tail. Since no other outcomes are possible, the probability of each is 1/2. This type of probability is also called a priori probability. Probability theory is used to describe the underlying mechanics and regularities of complex systems. For example, tossing a coin twice will yield head-head, head-tail, tail-head, or tail-tail. The probability of getting an outcome of head-head is 1 out of 4 outcomes, that is, 1/4 or 0.25. The frequentist interpretation considers probability to be the relative frequency in the long run of outcomes. A modification of this is propensity probability, which interprets probability as the tendency of some experiment to yield a certain outcome. Subjectivists assign numbers per subjective probability, i.e. as a degree of belief. The degree of belief has been interpreted as "the price at which you would buy or sell a bet that pays 1 unit of utility if E, 0 if not E". The most popular version of subjective probability is Bayesian probability, which includes expert knowledge as well as data to produce probabilities. The expert knowledge is represented by some prior probability distribution, and the data are incorporated in a likelihood function. The product of the prior and the likelihood, normalized, results in a posterior probability distribution that incorporates all the information known to date. The scientific study of probability is a modern development of mathematics. Gambling shows that there has been an interest in quantifying the ideas of probability for millennia, but exact mathematical descriptions arose much later. There are reasons, of course, for the slow development of the mathematics of probability.
Games of chance provided the impetus for the mathematical study of probability. According to Richard Jeffrey, before the middle of the seventeenth century the term "probable" meant approvable: a probable action or opinion was one such as sensible people would undertake or hold. However, in legal contexts especially, "probable" could also apply to propositions for which there was good evidence. The sixteenth-century Italian polymath Gerolamo Cardano demonstrated the efficacy of defining odds as the ratio of favourable to unfavourable outcomes.
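The Bayesian recipe mentioned in this entry (multiply the prior by the likelihood, then normalize) can be sketched in a few lines. The two hypotheses about the coin, the equal prior weights, and the observed data (three heads in three tosses) are all assumptions made for illustration.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Return the normalized product of prior and likelihood."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}


# Assumed setup: the coin is either fair or lands heads 4 times out of 5.
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
heads_prob = {"fair": Fraction(1, 2), "biased": Fraction(4, 5)}

# Likelihood of the assumed data (three heads in three tosses) per hypothesis.
likelihood = {h: p**3 for h, p in heads_prob.items()}

posterior = bayes_update(prior, likelihood)
print(posterior)
```

Exact fractions are used so that the posterior weights sum to exactly 1; floating-point numbers would serve equally well for larger problems.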

3.
Decision tree
–
A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal, but are also a popular tool in machine learning. The paths from root to leaf represent classification rules. Another use of decision trees is as a descriptive means for calculating conditional probabilities. Drawn from left to right, a decision tree has only burst nodes (splitting paths) and no sink nodes (converging paths). Therefore, used manually, decision trees can grow very big and are then often hard to draw fully by hand. Traditionally, decision trees have been created manually, although increasingly specialized software is employed. The decision tree can be linearized into decision rules, where the outcome is the contents of the leaf node and the conditions along the path form a conjunction in the if clause. In general, the rules have the form: if condition1 and condition2 and condition3 then outcome. Decision rules can be generated by constructing association rules with the target variable on the right. They can also denote temporal or causal relations. Commonly a decision tree is drawn using flowchart symbols, as this is easier for many to read and understand. Much of the information in a tree can be represented more compactly as an influence diagram, focusing attention on the issues and the relationships between events. Decision trees can also be seen as models of induction rules from empirical data. An optimal decision tree is then defined as a tree that accounts for most of the data. Several algorithms to generate such optimal trees have been devised, such as ID3/4/5, CLS, and ASSISTANT. Among decision support tools, decision trees have several advantages. Decision trees are simple to understand and interpret: people are able to understand decision tree models after a brief explanation.
They have value even with little hard data: important insights can be generated based on experts describing a situation and their preferences for outcomes. They allow the addition of new possible scenarios and help determine worst, best, and expected values for different scenarios. They use a white box model, so a result provided by the model can be traced to the conditions that produced it. They can be combined with other decision techniques. Decision trees also have disadvantages. For data including categorical variables with different numbers of levels, information gain in decision trees is biased in favor of attributes with more levels. Calculations can get very complex, particularly if many values are uncertain or if many outcomes are linked.
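The linearization of a tree into decision rules, described earlier in this entry, can be sketched as a walk over every root-to-leaf path. The toy tree, its condition strings, and the function name `linearize` are hypothetical and serve only to show the shape of the rules.

```python
# Hypothetical toy tree: an internal node maps each condition to a subtree,
# and a leaf is simply an outcome string.
tree = {
    "outlook = sunny": {
        "humidity = high": "stay in",
        "humidity = normal": "play",
    },
    "outlook = rainy": "stay in",
}


def linearize(node, path=()):
    """Yield one 'if ... then ...' rule per root-to-leaf path."""
    if isinstance(node, str):  # leaf: the outcome is the contents of the leaf
        yield f"if {' and '.join(path)} then {node}"
        return
    for condition, child in node.items():
        yield from linearize(child, path + (condition,))


rules = list(linearize(tree))
for rule in rules:
    print(rule)
```

Each rule is the conjunction of the conditions met along one path, which is why a tree with many leaves yields as many rules as it has leaves.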

4.
Sample space
–
In probability theory, the sample space of an experiment or random trial is the set of all possible outcomes or results of that experiment. A sample space is usually denoted using set notation, and the possible outcomes are listed as elements in the set. It is common to refer to a sample space by the labels S or Ω. For example, if the experiment is tossing a coin, the sample space is typically the set {head, tail}. For tossing two coins, the sample space would be {(head, head), (head, tail), (tail, head), (tail, tail)}. For tossing a single six-sided die, the sample space is {1, 2, 3, 4, 5, 6}. A well-defined sample space is one of three basic elements in a probabilistic model; the other two are a well-defined set of possible events and a probability assigned to each event. For many experiments, there may be more than one plausible sample space available. For example, when drawing a card from a standard deck of fifty-two playing cards, one possibility for the sample space could be the various ranks, while another could be the suits. Still other sample spaces are possible, such as {right-side up, upside down} if some cards have been flipped when shuffling. Some treatments of probability assume that the various outcomes of an experiment are always defined so as to be equally likely. In a simple random sample, every individual in the population is equally likely to be included; the result is that every possible combination of individuals who could be chosen for the sample is also equally likely. In an elementary approach to probability, any subset of the sample space is usually called an event. However, this gives rise to problems when the sample space is infinite, so a more precise definition of an event is necessary. Under this definition only measurable subsets of the sample space, constituting a σ-algebra over the sample space itself, are considered events.
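For finite experiments such as the coin and die examples in this entry, the sample space and, in the elementary approach, every event can be enumerated directly. The sketch below does this with the standard library; the helper name `all_events` is an assumption made for illustration.

```python
from itertools import combinations, product

# Sample spaces for the three experiments discussed in the entry.
coin = {"head", "tail"}
two_coins = set(product(("head", "tail"), repeat=2))
die = set(range(1, 7))


def all_events(sample_space):
    """Every subset of a finite sample space (its power set)."""
    items = sorted(sample_space)
    return [
        set(subset)
        for size in range(len(items) + 1)
        for subset in combinations(items, size)
    ]


die_events = all_events(die)
print(len(two_coins), len(die_events))
```

A six-element sample space has 2^6 = 64 subsets, so enumeration is only practical for small finite spaces; for infinite spaces the events are restricted to a σ-algebra, as the entry notes.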

5.
Venn diagram
–
A Venn diagram is a diagram that shows all possible logical relations between a finite collection of different sets. These diagrams depict elements as points in the plane, and sets as regions inside closed curves. A Venn diagram consists of multiple overlapping closed curves, usually circles, each representing a set. In Venn diagrams the curves are overlapped in every possible way, showing all possible relations between the sets. They are thus a special case of Euler diagrams, which do not necessarily show all relations. Venn diagrams were conceived around 1880 by John Venn. They are used to teach elementary set theory, as well as to illustrate simple set relationships in probability, logic, statistics, linguistics and computer science. A Venn diagram in which, in addition, the area of each shape is proportional to the number of elements it contains is called an area-proportional or scaled Venn diagram. The following example involves two sets, A and B, represented as coloured circles. The orange circle, set A, represents all living creatures that are two-legged. The blue circle, set B, represents the living creatures that can fly. Each separate type of creature can be imagined as a point somewhere in the diagram. Living creatures that both can fly and have two legs (for example, parrots) are then in both sets, so they correspond to points in the region where the blue and orange circles overlap. That region contains all such, and only such, living creatures. Humans and penguins are bipedal, and so are in the orange circle, but since they cannot fly they appear in the left part of the orange circle, where it does not overlap with the blue circle. Mosquitoes have six legs, and fly, so the point for mosquitoes is in the part of the blue circle that does not overlap with the orange one. Creatures that are not two-legged and cannot fly would all be represented by points outside both circles. The combined region of sets A and B is called the union of A and B, denoted by A ∪ B.
The union in this case contains all living creatures that are either two-legged or that can fly (or both). The region in both A and B, where the two sets overlap, is called the intersection of A and B, denoted by A ∩ B. In this example, the intersection of the two sets is not empty, because there are points that represent creatures that are in both the orange and blue circles. Diagrams of this kind predate Venn; they are rightly associated with him, however, because he comprehensively surveyed and formalized their usage. Venn himself did not use the term "Venn diagram" and referred to his invention as "Eulerian Circles". He wrote, for example: "Of these schemes one only, viz. that commonly called 'Eulerian circles,' has met with any general acceptance." The first to use the term "Venn diagram" was Clarence Irving Lewis in 1918, in his book A Survey of Symbolic Logic. Venn diagrams are similar to Euler diagrams, which were invented by Leonhard Euler in the 18th century. Baron has noted that Leibniz in the 17th century produced similar diagrams before Euler, and she also observes even earlier Euler-like diagrams by Ramon Lull in the 13th century. In the 20th century, Venn diagrams were further developed. D. W. Henderson showed in 1963 that the existence of an n-Venn diagram with n-fold rotational symmetry implied that n was a prime number.
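The regions of the two-circle example in this entry correspond directly to set operations. The sketch below uses only the creatures named in the text; the variable names, and the idea of listing just four creatures, are illustrative assumptions.

```python
# Set A (orange circle): two-legged creatures named in the entry.
two_legged = {"parrot", "human", "penguin"}
# Set B (blue circle): flying creatures named in the entry.
can_fly = {"parrot", "mosquito"}

union = two_legged | can_fly           # A ∪ B: in either circle
intersection = two_legged & can_fly    # A ∩ B: in the overlap
only_two_legged = two_legged - can_fly # left part of the orange circle
only_flying = can_fly - two_legged     # part of the blue circle outside A

print(sorted(union), sorted(intersection))
```

The four disjoint regions of the diagram are the two differences, the intersection, and everything outside the union.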

6.
Probability theory
–
Probability theory is the branch of mathematics concerned with probability, the analysis of random phenomena. Although it is not possible to predict precisely the results of random events, sequences of such events exhibit certain patterns that can be studied and predicted. Two representative mathematical results describing such patterns are the law of large numbers and the central limit theorem. As a mathematical foundation for statistics, probability theory is essential to many human activities that involve quantitative analysis of large sets of data. Methods of probability theory also apply to descriptions of complex systems given only partial knowledge of their state. A great discovery of twentieth-century physics was the probabilistic nature of physical phenomena at atomic scales, described in quantum mechanics. Christiaan Huygens published a book on the subject in 1657. Initially, probability theory mainly considered discrete events, and its methods were mainly combinatorial. Eventually, analytical considerations compelled the incorporation of continuous variables into the theory. This culminated in modern probability theory, on foundations laid by Andrey Nikolaevich Kolmogorov. Kolmogorov combined the notion of sample space, introduced by Richard von Mises, with measure theory. This became the mostly undisputed axiomatic basis for modern probability theory. Most introductions to probability theory treat discrete probability distributions and continuous probability distributions separately. The more mathematically advanced measure theory-based treatment of probability covers the discrete, the continuous, a mix of the two, and more. Consider an experiment that can produce a number of outcomes. The set of all outcomes is called the sample space of the experiment. The power set of the sample space is formed by considering all different collections of possible results. For example, rolling an honest die produces one of six possible results. One collection of possible results corresponds to getting an odd number. Thus, the subset {1, 3, 5} is an element of the power set of the sample space of die rolls.
These collections are called events. In this case, {1, 3, 5} is the event that the die falls on some odd number. If the results that actually occur fall in a given event, that event is said to have occurred. Probability is a way of assigning every event a value between zero and one, with the requirement that the event made up of all possible results (in this example, the event {1, 2, 3, 4, 5, 6}) be assigned a value of one. The assignment must also be additive: for a collection of mutually exclusive events, the probability that any one of them occurs is the sum of their probabilities. For example, the probability that any one of the events {1, 6}, {3}, or {2, 4} will occur is 5/6. This is the same as saying that the probability of the event {1, 2, 3, 4, 6} is 5/6. This event encompasses the possibility of any number except five being rolled. The mutually exclusive event {5} has a probability of 1/6, and the event {1, 2, 3, 4, 5, 6} has a probability of 1. Discrete probability theory deals with events that occur in countable sample spaces. The modern definition starts with a finite or countable set called the sample space, which relates to the set of all possible outcomes in the classical sense, denoted by Ω.
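The die example in this entry can be checked mechanically. The sketch below assumes a fair die, represents events as Python sets, and takes the mutually exclusive events {1, 6}, {3} and {2, 4} as the illustration; the helper name `prob` is an assumption.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # sample space of a single die roll


def prob(event):
    """Probability of an event (a subset of omega) for a fair die."""
    return Fraction(len(omega & set(event)), len(omega))


odd = {1, 3, 5}
parts = [{1, 6}, {3}, {2, 4}]   # mutually exclusive events
union = set().union(*parts)     # the event {1, 2, 3, 4, 6}

p_union = prob(union)
p_sum = sum(prob(e) for e in parts)

print(p_union, p_sum, prob({5}), prob(omega))
```

For mutually exclusive events the probability of the union equals the sum of the individual probabilities, which is the additivity requirement in its finite form.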

7.
Bayes' theorem
–
In probability theory and statistics, Bayes' theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event. One of the applications of Bayes' theorem is Bayesian inference. When applied, the probabilities involved in Bayes' theorem may have different probability interpretations. With the Bayesian probability interpretation, the theorem expresses how a subjective degree of belief should rationally change to account for the availability of related evidence. Bayesian inference is fundamental to Bayesian statistics. Bayes' theorem is named after the Rev. Thomas Bayes, who first provided an equation that allows new evidence to update beliefs. It was further developed by Pierre-Simon Laplace, who first published the modern formulation in his 1812 "Théorie analytique des probabilités". Sir Harold Jeffreys put Bayes' algorithm and Laplace's formulation on an axiomatic basis. Jeffreys wrote that Bayes' theorem "is to the theory of probability what the Pythagorean theorem is to geometry". Bayes' theorem is stated mathematically as the equation $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$, where A and B are events and $P(B) \neq 0$. $P(A)$ and $P(B)$ are the probabilities of observing A and B without regard to each other. $P(A \mid B)$, a conditional probability, is the probability of observing event A given that B is true. $P(B \mid A)$ is the probability of observing event B given that A is true. Thomas Bayes studied how to compute a distribution for the probability parameter of a binomial distribution. Bayes' unpublished manuscript was edited by Richard Price before it was posthumously read at the Royal Society. Price edited Bayes' major work "An Essay towards solving a Problem in the Doctrine of Chances" and wrote an introduction to the paper which provides some of the philosophical basis of Bayesian statistics.
In 1765 Price was elected a Fellow of the Royal Society in recognition of his work on the legacy of Bayes. The French mathematician Pierre-Simon Laplace reproduced and extended Bayes' results in 1774, apparently quite unaware of Bayes' work. The Bayesian interpretation of probability was developed mainly by Laplace. Stephen Stigler suggested in 1983 that Bayes' theorem was discovered by Nicholas Saunderson, a blind English mathematician, some time before Bayes; that interpretation, however, has been disputed. Martyn Hooper and Sharon McGrayne have argued that Richard Price's contribution was substantial. In their view, by modern standards, Price discovered Bayes' work, recognized its importance, corrected it, contributed to the article, and found a use for it; the modern convention of employing Bayes' name alone is unfair but so entrenched that anything else makes little sense. Suppose a drug test is 99% sensitive and 99% specific. That is, the test will produce 99% true positive results for drug users and 99% true negative results for non-users. Suppose that 0.5% of people are users of the drug. If a randomly selected individual tests positive, what is the probability that he is a user? By Bayes' theorem it is only about 33.2%, so despite the apparent accuracy of the test, an individual who tests positive is more likely not to be a user than to be one. This surprising result arises because the number of non-users is very large compared to the number of users; thus the number of false positives outweighs the number of true positives. To use concrete numbers, if 1000 individuals are tested, there are expected to be 995 non-users and 5 users. From the 995 non-users, 0.01 × 995 ≃ 10 false positives are expected. From the 5 users, 0.99 × 5 ≃ 5 true positives are expected. Out of about 15 positive results, only 5, roughly 33%, are genuine.
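The drug-test calculation in this entry is a direct application of Bayes' theorem and can be reproduced with the figures given in the text (99% sensitivity, 99% specificity, 0.5% prevalence). The function name `posterior` and its parameter names are assumptions made for this sketch.

```python
def posterior(prior: float, sensitivity: float, specificity: float) -> float:
    """P(user | positive test) via Bayes' theorem."""
    true_positive = sensitivity * prior                  # P(+ | user) P(user)
    false_positive = (1 - specificity) * (1 - prior)     # P(+ | non-user) P(non-user)
    return true_positive / (true_positive + false_positive)


p_user_given_positive = posterior(prior=0.005, sensitivity=0.99, specificity=0.99)
print(f"{p_user_given_positive:.3f}")
```

The denominator is the total probability of a positive test, P(+), written out over users and non-users. Raising the prevalence in the call shows how strongly the answer depends on the base rate.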

8.
Conditional probability
–
In probability theory, conditional probability is a measure of the probability of an event given that another event has occurred. For example, the probability that any given person has a cough on any given day may be only 5%. But if we know or assume that the person has a cold, then they are much more likely to be coughing; the conditional probability of coughing given that the person has a cold might be a much higher 75%. The concept of conditional probability is one of the most fundamental in probability theory. But conditional probabilities can be slippery and require careful interpretation. For example, there need not be a causal or temporal relationship between A and B. $P(A \mid B)$ may or may not be equal to $P(A)$, the unconditional probability of A. If $P(A \mid B) = P(A)$, then events A and B are said to be independent: in such a case, knowledge about either event gives no information about the other. Also, in general, $P(A \mid B)$ is not equal to $P(B \mid A)$. For example, if you have cancer you might have a 90% chance of testing positive for cancer. In this case what is being measured is the probability of event A (the test is positive) given that event B (having cancer) has occurred: $P(A \mid B) = 90\%$. Alternatively, you can test positive for cancer but have only a 10% chance of actually having cancer, because cancer is very rare. In this case what is being measured is the probability of event B (having cancer) given that event A (the test is positive) has occurred: $P(B \mid A) = 10\%$. Falsely equating the two probabilities causes various errors of reasoning such as the base rate fallacy. Conditional probabilities can be correctly reversed using Bayes' theorem. Given two events A and B with $P(B) > 0$, the conditional probability of A given B is defined as $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$. The logic behind this equation is that if the outcomes are restricted to B, this set serves as the new sample space. Note that this is a definition and not a theoretical result. We just denote the quantity $P(A \cap B) / P(B)$ as $P(A \mid B)$ and call it the conditional probability of A given B. Some authors prefer to introduce conditional probability as an axiom of probability, $P(A \cap B) = P(A \mid B)\,P(B)$. This multiplication axiom introduces a symmetry with the summation axiom for mutually exclusive events, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, in which the last term is 0 when A and B are mutually exclusive. If $P(B) = 0$, then according to the simple definition $P(A \mid B)$ is undefined. However, it is possible to define a conditional probability with respect to a σ-algebra of such events. The case where B has zero measure is problematic; see conditional expectation for more information.
Conditioning on an event may be generalized to conditioning on a random variable. Let X be a random variable; we assume for the sake of presentation that X is discrete, that is, X takes on only finitely many values x. Let A be an event. The conditional probability of A given X is defined as the random variable, written $P(A \mid X)$, that takes on the value $P(A \mid X = x)$ whenever $X = x$.
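The quotient definition of conditional probability, and the fact that P(A | B) and P(B | A) generally differ, can be illustrated on a fair die. The choice of events (A: the roll is even; B: the roll is greater than 1) and the helper names are assumptions made for this sketch.

```python
from fractions import Fraction

omega = set(range(1, 7))  # fair six-sided die


def prob(event):
    """Probability of an event under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))


def conditional(a, b):
    """P(A | B) = P(A ∩ B) / P(B); undefined when P(B) = 0."""
    if prob(b) == 0:
        raise ValueError("P(B) = 0: conditional probability is undefined")
    return prob(a & b) / prob(b)


even = {2, 4, 6}              # event A
above_one = {2, 3, 4, 5, 6}   # event B

p_a = prob(even)
p_a_given_b = conditional(even, above_one)
p_b_given_a = conditional(above_one, even)

print(p_a, p_a_given_b, p_b_given_a)
```

Here P(A | B) = 3/5 differs from P(A) = 1/2, so the events are not independent, and P(B | A) = 1 differs from P(A | B), which is the asymmetry the entry warns about.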

9.
Probability measure
–
In mathematics, a probability measure is a real-valued function defined on a set of events in a probability space that satisfies measure properties such as countable additivity. The difference between a probability measure and the more general notion of measure is that a probability measure must assign value 1 to the entire probability space. Probability measures have applications in diverse fields, from physics to finance and biology. The requirements for a function μ to be a probability measure on a probability space are as follows. First, μ must return results in the unit interval [0, 1], returning 0 for the empty set and 1 for the entire space. Second, μ must satisfy the countable additivity property that for all countable collections $\{E_i\}$ of pairwise disjoint sets, $\mu\left(\bigcup_{i \in I} E_i\right) = \sum_{i \in I} \mu(E_i)$. For example, given three elements 1, 2 and 3 with probabilities 1/4, 1/4 and 1/2, the value assigned to {1, 3} is 1/4 + 1/2 = 3/4. The conditional probability based on the intersection of events, defined as $\mu(B \mid A) = \frac{\mu(A \cap B)}{\mu(A)}$, satisfies the probability measure requirements so long as $\mu(A)$ is not zero. If there is a unique probability measure that must be used to price assets in a market, then the market is called a complete market. Not all measures that intuitively represent chance or likelihood are probability measures. For instance, although the fundamental concept of a system in statistical mechanics is a measure space, such measures are not always probability measures. Probability measures are also used in mathematical biology. For instance, in comparative sequence analysis a probability measure may be defined for the likelihood that a variant may be permissible for an amino acid in a sequence.
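The three-element example in this entry is small enough to check the probability-measure requirements exhaustively. The sketch below uses the weights from the text (1/4, 1/4, 1/2); the helper names `mu`, `subsets` and `mu_given` are assumptions, and on a finite space countable additivity reduces to the pairwise check shown.

```python
from fractions import Fraction
from itertools import combinations

# Weights of the three elements, as given in the entry.
weights = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 2)}


def mu(event):
    """Measure of an event: the sum of the weights of its elements."""
    return sum((weights[x] for x in event), Fraction(0))


def subsets(space):
    """All subsets of a finite space, as frozensets."""
    items = sorted(space)
    return [
        frozenset(c) for size in range(len(items) + 1) for c in combinations(items, size)
    ]


def mu_given(b, a):
    """Conditional measure mu(B | A) = mu(A ∩ B) / mu(A), for mu(A) != 0."""
    return mu(set(a) & set(b)) / mu(a)


events = subsets(weights)
in_unit_interval = all(0 <= mu(e) <= 1 for e in events)
normalized = mu(frozenset()) == 0 and mu(weights) == 1
additive = all(mu(a | b) == mu(a) + mu(b) for a in events for b in events if not a & b)

print(mu({1, 3}), in_unit_interval, normalized, additive)
```

Conditioning on an event A with non-zero measure, as in `mu_given`, rescales the weights inside A so that they again sum to 1.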

10.
Joint probability distribution
–
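Given two or more random variables defined on the same probability space, the joint probability distribution gives the probability that each variable falls in any particular range or discrete set of values specified for that variable. In the case of only two random variables, this is called a bivariate distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution. The joint probability distribution can be expressed either in terms of a joint cumulative distribution function or in terms of a joint probability density function (for continuous variables) or joint probability mass function (for discrete variables). Consider the flip of two fair coins; let A and B be discrete random variables associated with the outcomes of the first and second coin flips respectively. If a coin displays heads then the associated random variable is 1, and it is 0 otherwise. The joint probability mass function of A and B defines probabilities for each pair of outcomes. All possible outcomes are (A = 0, B = 0), (A = 0, B = 1), (A = 1, B = 0), (A = 1, B = 1). Since each outcome is equally likely, the joint probability mass function becomes $P(A, B) = 1/4$ when $A, B \in \{0, 1\}$. Since the coin flips are independent, the joint probability mass function is the product of the marginals, $P(A, B) = P(A)\,P(B)$. In general, each coin flip is a Bernoulli trial and follows a Bernoulli distribution. Consider the roll of a fair die and let A = 1 if the number is even (2, 4, or 6) and A = 0 otherwise. Furthermore, let B = 1 if the number is prime (2, 3, or 5) and B = 0 otherwise. Then the joint distribution of A and B, expressed as a probability mass function, is $P(A = 0, B = 0) = P\{1\} = 1/6$, $P(A = 1, B = 0) = P\{4, 6\} = 2/6$, $P(A = 0, B = 1) = P\{3, 5\} = 2/6$, $P(A = 1, B = 1) = P\{2\} = 1/6$. These probabilities necessarily sum to 1, since the probability of some combination of A and B occurring is 1. The joint probability mass function of two discrete random variables X, Y is $P(X = x, Y = y) = P(Y = y \mid X = x) \cdot P(X = x) = P(X = x \mid Y = y) \cdot P(Y = y)$. For continuous random variables, the joint probability density function can be written in the same way, $f_{X,Y}(x, y) = f_{Y \mid X}(y \mid x)\,f_X(x) = f_{X \mid Y}(x \mid y)\,f_Y(y)$. Again, since these are probability distributions, one has $\int_x \int_y f_{X,Y}(x, y)\,dy\,dx = 1$. Formally, $f_{X,Y}(x, y)$ is the probability density function of (X, Y) with respect to the product measure on the respective supports of X and Y. Two discrete random variables X and Y are independent if the joint probability mass function satisfies $P(X = x, Y = y) = P(X = x) \cdot P(Y = y)$ for all x and y.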
Similarly, two absolutely continuous random variables are independent if $f_{X,Y}(x, y) = f_X(x) \cdot f_Y(y)$ for all x and y. Conditional independence relations among several variables can be represented with a Bayesian network or copula functions.
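The die example in this entry (A indicates an even number, B indicates a prime number) can be tabulated directly, which also makes it easy to test the product condition for independence. The variable names are assumptions made for this sketch, and the die is assumed fair.

```python
from collections import defaultdict
from fractions import Fraction

# Build the joint probability mass function of (A, B) by walking the six faces.
joint = defaultdict(Fraction)
for face in range(1, 7):
    a = 1 if face % 2 == 0 else 0        # A = 1 if the number is even
    b = 1 if face in (2, 3, 5) else 0    # B = 1 if the number is prime
    joint[(a, b)] += Fraction(1, 6)

# Marginals: sum the joint pmf over the other variable.
p_a = {a: sum(p for (x, _), p in joint.items() if x == a) for a in (0, 1)}
p_b = {b: sum(p for (_, y), p in joint.items() if y == b) for b in (0, 1)}

# A and B are independent only if every cell equals the product of marginals.
independent = all(joint[(a, b)] == p_a[a] * p_b[b] for a in (0, 1) for b in (0, 1))

print(dict(joint), independent)
```

Here P(A = 1, B = 1) = 1/6 while P(A = 1) P(B = 1) = 1/4, so A and B are not independent, in contrast to the two-coin example.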

11.
Marginal distribution
–
In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset. It gives the probabilities of various values of the variables in the subset without reference to the values of the other variables. This contrasts with a conditional distribution, which gives the probabilities contingent upon the values of the other variables. The term marginal variable is used to refer to those variables in the subset of variables being retained. These terms are dubbed "marginal" because they used to be found by summing values in a table along rows or columns, and writing the sum in the margins of the table. The distribution of the marginal variables is obtained by marginalizing over the distribution of the variables being discarded, and the discarded variables are said to have been marginalized out. Several different analyses may be done, each treating a different subset of variables as the marginal variables. Given two random variables X and Y whose joint distribution is known, the marginal distribution of X is simply the probability distribution of X averaging over information about Y. It is the probability distribution of X when the value of Y is not known. This is typically calculated by summing or integrating the joint probability distribution over Y. For discrete random variables, the marginal probability mass function can be written as $\Pr(X = x)$. This is $\Pr(X = x) = \sum_y \Pr(X = x, Y = y) = \sum_y \Pr(X = x \mid Y = y)\,\Pr(Y = y)$, where $\Pr(X = x, Y = y)$ is the joint distribution of X and Y and $\Pr(X = x \mid Y = y)$ is the conditional distribution of X given Y. In this case, the variable Y has been marginalized out. Bivariate marginal and joint probabilities for discrete variables are often displayed as two-way tables. Similarly for continuous variables, the marginal probability density function can be written as $p_X(x)$. This is $p_X(x) = \int_y p_{X,Y}(x, y)\,dy = \int_y p_{X \mid Y}(x \mid y)\,p_Y(y)\,dy$. Again, the variable Y has been marginalized out. A marginal probability can always be written as an expected value, $p_X(x) = \mathrm{E}_Y[p_{X \mid Y}(x \mid Y)]$; this follows from the definition of expected value, $\mathrm{E}_Y[f(Y)] = \int_y f(y)\,p_Y(y)\,dy$. As an example, suppose that the probability that a pedestrian will be hit by a car while crossing the road at a pedestrian crossing, without paying attention to the traffic light, is to be computed. Let H be a random variable taking one value from {Hit, Not Hit}. Let L be a random variable taking one value from {Red, Yellow, Green}.
Realistically, H will be dependent on L. That is, P(H = Hit) and P(H = Not Hit) will take different values depending on whether L is red, yellow or green. A person is, for example, far more likely to be hit by a car when trying to cross while the lights for cross traffic are green than if they are red. In general, a pedestrian can be hit if the lights are red OR if the lights are yellow OR if the lights are green. So, in this case the marginal probability can be found by summing $P(H \mid L)$ over all possible values of L, with each value of L weighted by its probability of occurring.
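The weighted sum described for the pedestrian example can be written out as follows. The numerical values for the light probabilities and for the conditional probabilities of being hit are illustrative assumptions, since the source's table is not reproduced here; only the structure of the calculation comes from the text.

```python
from fractions import Fraction

# Assumed distribution of the traffic light for cross traffic, P(L).
p_light = {
    "red": Fraction("0.2"),
    "yellow": Fraction("0.1"),
    "green": Fraction("0.7"),
}

# Assumed conditional probabilities of being hit given the light, P(Hit | L).
p_hit_given_light = {
    "red": Fraction("0.01"),
    "yellow": Fraction("0.1"),
    "green": Fraction("0.8"),
}

# Marginalize L out: P(Hit) = sum over L of P(Hit | L) * P(L).
p_hit = sum(p_hit_given_light[light] * p_light[light] for light in p_light)

print(float(p_hit))
```

With these assumed values the marginal probability of being hit is 0.572, dominated by the case in which the lights for cross traffic are green.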

12.
Independence (probability theory)
–
In probability theory, two events are independent, statistically independent, or stochastically independent if the occurrence of one does not affect the probability of occurrence of the other. Similarly, two random variables are independent if the realization of one does not affect the probability distribution of the other. Two events A and B are independent if and only if their joint probability equals the product of their probabilities, $P(A \cap B) = P(A)\,P(B)$. Why this defines independence is made clear by rewriting with conditional probabilities: when $P(B) > 0$, the condition is equivalent to $P(A \mid B) = \frac{P(A \cap B)}{P(B)} = P(A)$, and similarly, when $P(A) > 0$, to $P(B \mid A) = P(B)$. Although the derived expressions may seem more intuitive, they are not the preferred definition, as the conditional probabilities may be undefined if $P(A)$ or $P(B)$ are 0. Furthermore, the preferred definition makes clear by symmetry that when A is independent of B, B is also independent of A. A finite set of events $\{A_i\}$ is pairwise independent if every pair of events is independent, that is, if and only if $P(A_m \cap A_k) = P(A_m)\,P(A_k)$ for all distinct pairs of indices m, k. A finite set of events is mutually independent if every event is independent of any intersection of the other events, that is, if and only if $P\left(\bigcap_{i=1}^{n} A_i\right) = \prod_{i=1}^{n} P(A_i)$ for every n-element subset $\{A_1, \ldots, A_n\}$. This is called the multiplication rule for independent events. Note that it is not a single condition involving only the product of all the probabilities of all single events; it must hold for all subsets of events. For more than two events, a mutually independent set of events is pairwise independent, but the converse is not necessarily true. Two random variables X and Y are independent if and only if the elements of the π-system generated by them are independent; that is to say, for every a and b, the events {X ≤ a} and {Y ≤ b} are independent events. A set of random variables is pairwise independent if and only if every pair of random variables is independent. A set of random variables is mutually independent if and only if, for any finite subset $X_1, \ldots, X_n$ and any finite sequence of numbers $a_1, \ldots, a_n$, the events $\{X_1 \le a_1\}, \ldots, \{X_n \le a_n\}$ are mutually independent events. The measure-theoretically inclined may prefer to substitute events {X ∈ A} for events {X ≤ a} in the above definition, where A is any Borel set. That definition is exactly equivalent to the one above when the values of the random variables are real numbers.
It has the advantage of working also for complex-valued random variables or for random variables taking values in any measurable space. Intuitively, two random variables X and Y are conditionally independent given Z if, once Z is known, the value of Y does not add any additional information about X. For instance, two measurements X and Y of the same underlying quantity Z are not independent, but they are conditionally independent given Z. The formal definition of conditional independence is based on the idea of conditional distributions. If X, Y, and Z are discrete random variables and X and Y are conditionally independent given Z, then $P(X = x \mid Y = y, Z = z) = P(X = x \mid Z = z)$ for any x, y and z with $P(Z = z) > 0$. That is, the conditional distribution for X given Y and Z is the same as that given Z alone.
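The remark that pairwise independence does not imply mutual independence can be made concrete with a standard small example, sketched below under the assumption of two fair coin tosses. The events A (first toss is heads), B (second toss is heads) and C (the two tosses agree) are chosen for illustration and do not appear in the text.

```python
from fractions import Fraction
from itertools import product

omega = set(product("HT", repeat=2))  # sample space of two fair coin tosses


def prob(event):
    """Probability of an event under the uniform measure on omega."""
    return Fraction(len(event), len(omega))


A = {w for w in omega if w[0] == "H"}    # first toss is heads
B = {w for w in omega if w[1] == "H"}    # second toss is heads
C = {w for w in omega if w[0] == w[1]}   # the two tosses agree

# Pairwise independence: the product rule holds for every pair.
pairwise = all(prob(x & y) == prob(x) * prob(y) for x, y in [(A, B), (A, C), (B, C)])

# Mutual independence additionally needs the rule for the triple intersection.
mutual = pairwise and prob(A & B & C) == prob(A) * prob(B) * prob(C)

print(pairwise, mutual)
```

Every pair satisfies the product rule, yet P(A ∩ B ∩ C) = 1/4 while P(A) P(B) P(C) = 1/8, so the three events are pairwise independent but not mutually independent.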