1.
Statistics
–
Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data. In applying statistics to, e.g., a scientific, industrial, or social problem, the populations studied can be diverse topics such as all people living in a country or every atom composing a crystal. Statistics deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments. Statistician Sir Arthur Lyon Bowley defines statistics as "Numerical statements of facts in any department of inquiry placed in relation to each other." When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. In contrast, an observational study does not involve experimental manipulation. Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena. A standard statistical procedure involves testing the relationship between two data sets, or between a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between them. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false, given the data used in the test. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (the null hypothesis is falsely rejected, a "false positive") and Type II errors (the null hypothesis fails to be rejected although an actual relationship exists, a "false negative"). Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis. Measurement processes that generate statistical data are also subject to error.
Many of these errors are classified as random (noise) or systematic (bias). The presence of missing data or censoring may result in biased estimates, and specific techniques have been developed to address these problems. Statistics continues to be an area of research, for example on the problem of how to analyze big data. Statistics is a body of science that pertains to the collection, analysis, interpretation or explanation, and presentation of data. Some consider statistics to be a distinct mathematical science rather than a branch of mathematics. While many scientific investigations make use of data, statistics is concerned with the use of data in the context of uncertainty. Mathematical techniques used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure-theoretic probability theory. In applying statistics to a problem, it is common practice to start with a population or process to be studied. Ideally, statisticians compile data about the entire population (an operation called a census), and this may be organized by governmental statistical institutes.
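The distinction between Type I and Type II errors can be made concrete with a small simulation. The sketch below is an illustration only, not a procedure taken from the text: it assumes a two-sided one-sample z-test with known standard deviation, normally distributed data and a significance level α = 0.05, and the function names are hypothetical. When the null hypothesis is true, the rejection rate estimates the Type I error rate (it should be close to α); when the true mean differs from the hypothesized one, one minus the rejection rate estimates the Type II error rate.

```python
import random
from statistics import NormalDist, fmean


def z_test_rejects(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test of H0: mean == mu0, with known sigma (assumed)."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return abs(z) > critical


def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=30, trials=20_000, seed=1):
    """Fraction of simulated samples of size n for which H0: mean == mu0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if z_test_rejects(sample, mu0, sigma):
            rejections += 1
    return rejections / trials


# H0 true (true mean 0): every rejection is a Type I error.
type_i_rate = rejection_rate(true_mean=0.0)
# H0 false (true mean 0.5): every non-rejection is a Type II error.
type_ii_rate = 1 - rejection_rate(true_mean=0.5)
print(f"Type I error rate ~ {type_i_rate:.3f}, Type II error rate ~ {type_ii_rate:.3f}")
```

Increasing the sample size n (or the true effect) lowers the Type II rate while the Type I rate stays near α, which is one reason obtaining a sufficient sample size matters.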

2.
Probability theory
–
Probability theory is the branch of mathematics concerned with probability, the analysis of random phenomena. Although the results of individual random events cannot be predicted precisely, a sequence of such events repeated many times exhibits statistical patterns; two representative mathematical results describing such patterns are the law of large numbers and the central limit theorem. As a mathematical foundation for statistics, probability theory is essential to many human activities that involve quantitative analysis of large sets of data. Methods of probability theory also apply to descriptions of complex systems given only partial knowledge of their state, as in statistical mechanics. A great discovery of twentieth-century physics was the probabilistic nature of physical phenomena at atomic scales, described in quantum mechanics. Christiaan Huygens published a book on the subject in 1657, and in the 19th century Pierre-Simon Laplace completed what is today considered the classic interpretation. Initially, probability theory mainly considered discrete events, and its methods were mainly combinatorial. Eventually, analytical considerations compelled the incorporation of continuous variables into the theory. This culminated in modern probability theory, on foundations laid by Andrey Nikolaevich Kolmogorov. Kolmogorov combined the notion of sample space, introduced by Richard von Mises, with measure theory and presented his axiom system for probability theory in 1933. This became the mostly undisputed axiomatic basis for modern probability theory. Most introductions to probability theory treat discrete probability distributions and continuous probability distributions separately. The more mathematically advanced measure-theory-based treatment of probability covers the discrete case, the continuous case, mixtures of the two, and more. Consider an experiment that can produce a number of outcomes. The set of all outcomes is called the sample space of the experiment. The power set of the sample space is formed by considering all different collections of possible results. For example, rolling a fair die produces one of six possible results. One collection of possible results corresponds to getting an odd number. Thus, the subset $\{1, 3, 5\}$ is an element of the power set of the sample space of die rolls. These collections are called events.
In this case, $\{1, 3, 5\}$ is the event that the die falls on some odd number. If the results that actually occur fall in a given event, that event is said to have occurred. Probability is a way of assigning every event a value between zero and one, with the requirement that the event made up of all possible results be assigned a value of one. In addition, for any collection of mutually exclusive events (events with no results in common), the probability that any of them occurs must equal the sum of their probabilities. For example, the probability that any one of the mutually exclusive events $\{1, 6\}$, $\{3\}$, or $\{2, 4\}$ will occur is $2/6 + 1/6 + 2/6 = 5/6$. This is the same as saying that the probability of the event $\{1, 2, 3, 4, 6\}$ is 5/6; this event encompasses the possibility of any number except five being rolled. The mutually exclusive event $\{5\}$ has a probability of 1/6, and the event $\{1, 2, 3, 4, 5, 6\}$ has a probability of 1, that is, absolute certainty. Discrete probability theory deals with events that occur in countable sample spaces. Modern definition: the modern definition starts with a finite or countable set called the sample space, denoted by $\Omega$, which relates to the set of all possible outcomes in the classical sense.
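The die example can be checked mechanically. The following minimal sketch assumes a fair die, so that each face has probability 1/6 (the classical, equally-likely assignment); the helper name `prob` is hypothetical. Exact fractions are used so that the additivity requirement for mutually exclusive events can be verified without rounding error.

```python
from fractions import Fraction

SAMPLE_SPACE = frozenset(range(1, 7))  # the six faces of one die roll


def prob(event):
    """Classical probability of an event on a fair die: |event| / |sample space|."""
    event = frozenset(event)
    if not event <= SAMPLE_SPACE:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(SAMPLE_SPACE))


odd = {1, 3, 5}
disjoint_events = [{1, 6}, {3}, {2, 4}]        # pairwise mutually exclusive
union = set().union(*disjoint_events)           # {1, 2, 3, 4, 6}: "anything but five"

print(prob(odd))                                # 1/2
print(sum(prob(e) for e in disjoint_events))    # 5/6, by additivity
print(prob(union), prob({5}), prob(SAMPLE_SPACE))  # 5/6 1/6 1
```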

3.
Sample space
–
In probability theory, the sample space of an experiment or random trial is the set of all possible outcomes or results of that experiment. A sample space is usually denoted using set notation, and the possible outcomes are listed as elements in the set. It is common to refer to a sample space by the labels S or Ω. For example, if the experiment is tossing a coin, the sample space is typically the set $\{H, T\}$ (heads, tails). For tossing two coins, the sample space would be $\{(H, H), (H, T), (T, H), (T, T)\}$. For tossing a single six-sided die, the sample space is $\{1, 2, 3, 4, 5, 6\}$. A well-defined sample space is one of three basic elements in a probabilistic model; the other two are a well-defined set of possible events and a probability assigned to each event. For many experiments, there may be more than one plausible sample space available. For example, when drawing a card from a standard deck of fifty-two playing cards, one possibility for the sample space could be the various ranks (Ace through King), while another could be the suits (clubs, diamonds, hearts, or spades). Still other sample spaces are possible, such as {right-side up, upside down} if some cards have been flipped when shuffling. Some treatments of probability assume that the various outcomes of an experiment are always defined so as to be equally likely. For example, in simple random sampling every individual in the population is equally likely to be chosen; the result of this is that every possible combination of individuals who could be chosen for the sample is also equally likely. In an elementary approach to probability, any subset of the sample space is usually called an event. However, this gives rise to problems when the sample space is infinite, so that a more precise definition of an event is necessary. Under this definition, only measurable subsets of the sample space, constituting a σ-algebra over the sample space itself, are considered events.
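Enumerating a sample space is often the first practical step in a probability calculation. The sketch below uses Python's standard `itertools.product` to build the sample spaces mentioned above; the card encoding (rank strings and suit names) is an arbitrary choice made for illustration. It also shows that the rank-only and suit-only sample spaces are coarser descriptions of the same card draw.

```python
from itertools import product

one_coin = {"H", "T"}
two_coins = set(product("HT", repeat=2))   # ordered pairs (first coin, second coin)
one_die = set(range(1, 7))

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(RANKS, SUITS))          # full description: 52 (rank, suit) outcomes

# Coarser sample spaces for the same draw: record only the rank, or only the suit.
rank_space = {rank for rank, _suit in deck}
suit_space = {suit for _rank, suit in deck}

print(len(two_coins), len(one_die), len(deck), len(rank_space), len(suit_space))  # 4 6 52 13 4
```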

4.
Event (probability theory)
–
In probability theory, an event is a set of outcomes of an experiment (a subset of the sample space) to which a probability is assigned. A single outcome may be an element of many different events. Every event defines a complementary event, namely the complementary set (the event not occurring). Typically, when the sample space is finite, any subset of the sample space is an event. However, this approach does not work well in cases where the sample space is uncountably infinite. So, when defining a probability space it is possible, and often necessary, to exclude certain subsets of the sample space from being events. If we assemble a deck of 52 playing cards with no jokers and draw a single card from the deck, then the sample space is a 52-element set, as each card is a possible outcome. An event, however, is any subset of the sample space, including any singleton set (an elementary event), the empty set (an impossible event, with probability zero) and the sample space itself (a certain event, with probability one). Other events are subsets of the sample space that contain multiple elements. So, for example, potential events include: "Red and black at the same time without being a joker" (0 elements), "The 5 of Hearts" (1 element), "A King" (4 elements), "A Face card" (12 elements), "A Spade" (13 elements), and "A Face card or a red suit" (32 elements). Since all events are sets, they are usually written as sets (e.g. $\{1, 2, 3\}$). Defining all subsets of the sample space as events works well when there are only finitely many outcomes. For many standard probability distributions, such as the normal distribution, the sample space is the set of real numbers or some subset of them, and attempts to define probabilities for all subsets of the real numbers run into difficulties when one considers badly behaved sets, such as those that are nonmeasurable. Hence, it is necessary to restrict attention to a limited family of subsets. The most natural choice is the family of Borel measurable sets, derived from unions and intersections of intervals. However, the larger class of Lebesgue measurable sets proves more useful in practice. In the general description of probability spaces, an event may be defined as an element of a selected σ-algebra of subsets of the sample space. Under this definition, any subset of the sample space that is not an element of the σ-algebra is not an event.
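The card events listed above can be enumerated directly. This is a minimal sketch assuming a standard 52-card deck encoded as (rank, suit) pairs, an encoding chosen here for illustration; the predicate names are hypothetical. It also builds a complementary event and checks the size of "a face card or a red suit" against inclusion-exclusion (12 + 26 − 6 = 32).

```python
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = frozenset(product(RANKS, SUITS))  # sample space: one card drawn, no jokers


def is_face(card):
    return card[0] in {"J", "Q", "K"}


def is_red(card):
    return card[1] in {"diamonds", "hearts"}


events = {
    "red and black at the same time": {c for c in DECK if is_red(c) and not is_red(c)},
    "the 5 of hearts": {("5", "hearts")},
    "a king": {c for c in DECK if c[0] == "K"},
    "a face card": {c for c in DECK if is_face(c)},
    "a spade": {c for c in DECK if c[1] == "spades"},
    "a face card or a red suit": {c for c in DECK if is_face(c) or is_red(c)},
    "a card": set(DECK),
}

for name, event in events.items():
    print(f"{name!r}: {len(event)} elements")

# Every event has a complementary event: the outcomes not in it.
not_a_spade = DECK - events["a spade"]

# Inclusion-exclusion check for the union of two events.
faces = events["a face card"]
reds = {c for c in DECK if is_red(c)}
assert len(faces | reds) == len(faces) + len(reds) - len(faces & reds)
```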
With a reasonable specification of the probability space, however, all events of interest are elements of the σ-algebra. Even though events are subsets of some sample space Ω, they are often written as propositional formulas involving random variables. For example, if X is a real-valued random variable defined on the sample space Ω, the event $\{\omega \in \Omega \mid u < X(\omega) \le v\}$ can be written more conveniently as simply $u < X \le v$.

5.
Random variable
–
In probability and statistics, a random variable, random quantity, aleatory variable, or stochastic variable is a variable quantity whose value depends on possible outcomes. It is common that these outcomes depend on physical variables that are not well understood. For example, when you toss a coin, the outcome of heads or tails depends on the uncertain physics of the toss. Which outcome will be observed is not certain. Of course, the coin could get caught in a crack in the floor, but such a possibility is excluded from consideration. The domain of a random variable is the set of possible outcomes. In the case of the coin, there are only two possible outcomes, namely heads or tails. Since one of these outcomes must occur, either the event that the coin lands heads or the event that the coin lands tails must have non-zero probability. A random variable is defined as a function that maps outcomes to numerical quantities, typically real numbers. In this sense, it is a procedure for assigning a numerical quantity to each outcome, and, contrary to its name, this procedure itself is neither random nor variable. What is random is the physics that describes how the coin lands. A random variable's possible values might represent the possible outcomes of a yet-to-be-performed experiment. They may also conceptually represent either the results of an objectively random process or the subjective randomness that results from incomplete knowledge of a quantity. The mathematics works the same regardless of the interpretation in use. A random variable has a probability distribution, which specifies the probability that its value falls in any given interval. Two random variables with the same probability distribution can still differ in terms of their associations with, or independence from, other random variables. The realizations of a random variable, that is, the results of randomly choosing values according to the variable's probability distribution function, are called random variates.
The formal mathematical treatment of random variables is a topic in probability theory. In that context, a random variable is understood as a function defined on a sample space whose outputs are numerical values. A random variable $X\colon \Omega \to E$ is a measurable function from a set of possible outcomes Ω to a measurable space E. The technical axiomatic definition requires Ω to be the sample space of a probability triple $(\Omega, \mathcal{F}, P)$. A random variable does not return a probability; rather, X assigns a numerical quantity to each outcome in Ω, e.g. the number of heads in a collection of coin flips. The probability of a set of outcomes is given by the probability measure P with which Ω is equipped.
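To make the "function on a sample space" view concrete, the sketch below treats X = number of heads in three flips as an ordinary deterministic Python function of the outcome. The probability measure lives separately on Ω (assumed uniform here, i.e. a fair coin and independent flips), and the distribution of X is obtained by adding up the probabilities of all outcomes that X maps to each value. The names `X` and `distribution` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=3))             # all 8 outcomes of three flips
P = {w: Fraction(1, len(omega)) for w in omega}   # assumed uniform probability measure


def X(outcome):
    """Random variable: the number of heads. A deterministic function of the outcome."""
    return outcome.count("H")


def distribution(rv, prob):
    """Push the measure forward: P(rv = x) is the total probability of {w : rv(w) == x}."""
    dist = Counter()
    for w, p in prob.items():
        dist[rv(w)] += p
    return dict(dist)


print(sorted(distribution(X, P).items()))
```

Nothing random happens inside `X`; all the uncertainty is carried by `P`, which matches the point made above.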

6.
Probability measure
–
In mathematics, a probability measure is a real-valued function defined on a set of events in a probability space that satisfies measure properties such as countable additivity. The difference between a probability measure and the more general notion of measure is that a probability measure must assign value 1 to the entire probability space. Probability measures have applications in diverse fields, from physics to finance. The requirements for a function μ to be a probability measure on a probability space are that μ must return results in the unit interval [0, 1], returning 0 for the empty set and 1 for the entire space, and that μ must satisfy the countable additivity property: for all countable collections $\{E_i\}_{i \in I}$ of pairwise disjoint sets, $\mu\left(\bigcup_{i \in I} E_i\right) = \sum_{i \in I} \mu(E_i)$. For example, given three elements 1, 2 and 3 with probabilities 1/4, 1/4 and 1/2, the value assigned to $\{1, 3\}$ is $1/4 + 1/2 = 3/4$. The conditional probability based on the intersection of events, defined as $\mu(B \mid A) = \frac{\mu(A \cap B)}{\mu(A)}$, satisfies the probability measure requirements so long as $\mu(A)$ is not zero. Probability measures are also used in mathematical finance: for example, if there is a unique probability measure that must be used to price assets in a market, then the market is called a complete market. Not all measures that intuitively represent chance or likelihood are probability measures. For instance, although the fundamental concept of a system in statistical mechanics is a measure space, such measures are not always probability measures. Probability measures are also used in mathematical biology. For instance, in comparative sequence analysis a probability measure may be defined for the likelihood that a variant may be permissible for an amino acid in a sequence.
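For a finite sample space, a probability measure is fully determined by the weights of the individual points, and additivity does the rest. The following sketch (an illustration with hypothetical helper names `mu`, `conditional` and `power_set`) encodes the three-element example above, checks that μ stays in [0, 1] with μ(∅) = 0 and μ(whole space) = 1, and computes a conditional probability $\mu(B \mid A)$.

```python
from fractions import Fraction
from itertools import chain, combinations

WEIGHTS = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 2)}


def mu(event):
    """Probability of an event: the sum of the weights of its points (finite additivity)."""
    return sum((WEIGHTS[x] for x in event), Fraction(0))


def conditional(b, a):
    """mu(B | A) = mu(A ∩ B) / mu(A), defined only when mu(A) != 0."""
    if mu(a) == 0:
        raise ZeroDivisionError("conditioning event has probability zero")
    return mu(set(a) & set(b)) / mu(a)


def power_set(points):
    """All subsets of the given points, i.e. every event on this finite space."""
    points = list(points)
    subsets = chain.from_iterable(combinations(points, k) for k in range(len(points) + 1))
    return [set(s) for s in subsets]


# Sanity checks: values lie in [0, 1], the empty set gets 0, the whole space gets 1.
assert all(0 <= mu(e) <= 1 for e in power_set(WEIGHTS))
assert mu(set()) == 0 and mu(WEIGHTS.keys()) == 1

print(mu({1, 3}))                 # 3/4
print(conditional({3}, {1, 3}))   # 2/3
```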

7.
Joint probability distribution
–
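Given random variables X, Y, and so on, defined on a probability space, the joint probability distribution gives the probability that each of them falls in any particular range or discrete set of values specified for that variable. In the case of only two random variables, this is called a bivariate distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution. The joint probability distribution can be expressed either in terms of a joint cumulative distribution function or in terms of a joint probability density function (for continuous variables) or joint probability mass function (for discrete variables). Consider the flip of two fair coins; let A and B be discrete random variables associated with the outcomes of the first and second coin flips respectively. If a coin displays heads, then the associated random variable is 1, and it is 0 otherwise. The joint probability mass function of A and B defines probabilities for each pair of outcomes. All possible outcomes are $(A = 0, B = 0)$, $(A = 0, B = 1)$, $(A = 1, B = 0)$ and $(A = 1, B = 1)$. Since each outcome is equally likely, the joint probability mass function becomes $P(A, B) = 1/4$ when $A, B \in \{0, 1\}$. Since the coin flips are independent, the joint probability mass function is the product of the marginals: $P(A, B) = P(A)\,P(B)$. In general, each coin flip is a Bernoulli trial, and its outcome follows a Bernoulli distribution. Consider the roll of a fair die and let $A = 1$ if the number is even (i.e. 2, 4, or 6) and $A = 0$ otherwise. Furthermore, let $B = 1$ if the number is prime (i.e. 2, 3, or 5) and $B = 0$ otherwise. Then, the joint distribution of A and B, expressed as a probability mass function, is $P(A = 0, B = 0) = P\{1\} = 1/6$, $P(A = 1, B = 0) = P\{4, 6\} = 2/6$, $P(A = 0, B = 1) = P\{3, 5\} = 2/6$, $P(A = 1, B = 1) = P\{2\} = 1/6$. These probabilities necessarily sum to 1, since the probability of some combination of A and B occurring is 1. The joint probability mass function of two discrete random variables X, Y is $P(X = x, Y = y) = P(Y = y \mid X = x) \cdot P(X = x) = P(X = x \mid Y = y) \cdot P(Y = y)$. For two continuous random variables, the analogous factorization of the joint probability density function is $f_{X,Y}(x, y) = f_{Y \mid X}(y \mid x)\, f_X(x) = f_{X \mid Y}(x \mid y)\, f_Y(y)$. Again, since these are probability distributions, one has $\int_x \int_y f_{X,Y}(x, y)\,dy\,dx = 1$. Formally, $f_{X,Y}$ is the probability density function of $(X, Y)$ with respect to the product measure on the respective supports of X and Y. Two discrete random variables X and Y are independent if the joint probability mass function satisfies $P(X = x, Y = y) = P(X = x) \cdot P(Y = y)$ for all x and y.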
Similarly, two absolutely continuous random variables are independent if $f_{X,Y}(x, y) = f_X(x) \cdot f_Y(y)$ for all x and y. More generally, independence and conditional independence relations among several variables can be represented with a Bayesian network or copula functions.
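Both examples above (two coins, and the even/prime indicators on one die roll) can be tabulated and tested for independence in a few lines. This sketch assumes fair coins and a fair die; the function names are hypothetical. It computes each marginal by summing the joint table, then checks the product condition $P(A = a, B = b) = P(A = a)\,P(B = b)$ for every pair of values.

```python
from collections import defaultdict
from fractions import Fraction


def die_joint_pmf():
    """Joint PMF of A (1 if even) and B (1 if prime) for one roll of a fair die."""
    pmf = defaultdict(Fraction)  # Fraction() == 0, so missing keys start at zero
    for face in range(1, 7):
        a = 1 if face % 2 == 0 else 0
        b = 1 if face in {2, 3, 5} else 0
        pmf[(a, b)] += Fraction(1, 6)
    return dict(pmf)


def two_coin_joint_pmf():
    """Joint PMF of two independent fair coin flips (1 = heads)."""
    return {(a, b): Fraction(1, 4) for a in (0, 1) for b in (0, 1)}


def marginal(pmf, axis):
    """Marginal PMF of the coordinate at position `axis` (0 for A, 1 for B)."""
    out = defaultdict(Fraction)
    for pair, p in pmf.items():
        out[pair[axis]] += p
    return dict(out)


def is_independent(pmf):
    """True if P(A = a, B = b) == P(A = a) * P(B = b) for every pair of values."""
    pa, pb = marginal(pmf, 0), marginal(pmf, 1)
    return all(pmf.get((a, b), Fraction(0)) == pa[a] * pb[b] for a in pa for b in pb)


die = die_joint_pmf()
print(sorted(die.items()))
print(is_independent(two_coin_joint_pmf()), is_independent(die))  # True False
```

For the die, $P(A = 1)\,P(B = 1) = 1/4$ but $P(A = 1, B = 1) = 1/6$, so "even" and "prime" are dependent.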

8.
Marginal distribution
–
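In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset. It gives the probabilities of various values of the variables in the subset without reference to the values of the other variables. This contrasts with a conditional distribution, which gives the probabilities contingent upon the values of the other variables. The term marginal variable is used to refer to those variables in the subset of variables being retained. These terms are dubbed "marginal" because they used to be found by summing values in a table along rows or columns, and writing the sum in the margins of the table. The distribution of the marginal variables (the marginal distribution) is obtained by marginalizing over the distribution of the variables being discarded, and the discarded variables are said to have been marginalized out. Several different analyses may be done, each treating a different subset of variables as the marginal variables. Given two random variables X and Y whose joint distribution is known, the marginal distribution of X is simply the probability distribution of X averaging over information about Y. It is the probability distribution of X when the value of Y is not known, and it is typically calculated by summing or integrating the joint probability distribution over Y. For discrete random variables, the marginal probability mass function can be written as $\Pr(X = x)$. This is $\Pr(X = x) = \sum_y \Pr(X = x, Y = y) = \sum_y \Pr(X = x \mid Y = y)\,\Pr(Y = y)$, where $\Pr(X = x, Y = y)$ is the joint distribution of X and Y, while $\Pr(X = x \mid Y = y)$ is the conditional distribution of X given Y. In this case, the variable Y has been marginalized out. Bivariate marginal and joint probabilities for discrete variables are often displayed as two-way tables. Similarly, for continuous variables, the marginal probability density function can be written as $p_X(x)$. This is $p_X(x) = \int_y p_{X,Y}(x, y)\,dy = \int_y p_{X \mid Y}(x \mid y)\,p_Y(y)\,dy$, where $p_{X,Y}(x, y)$ is the joint density and $p_{X \mid Y}(x \mid y)$ the conditional density. Again, the variable Y has been marginalized out. The marginal density is therefore the expectation $p_X(x) = \mathrm{E}_Y\left[p_{X \mid Y}(x \mid Y)\right]$, which follows from the definition of expected value, $\mathrm{E}_Y[f(Y)] = \int_y f(y)\,p_Y(y)\,dy$. As an example, suppose that the probability that a pedestrian will be hit by a car while crossing the road at a pedestrian crossing, without paying attention to the traffic light, is to be computed. Let H be a discrete random variable taking one value from $\{\text{Hit}, \text{Not Hit}\}$. Let L (for traffic light) be a discrete random variable taking one value from $\{\text{Red}, \text{Yellow}, \text{Green}\}$.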
Realistically, H will be dependent on L. That is, $P(H = \text{Hit})$ and $P(H = \text{Not Hit})$ will take different values depending on whether L is red, yellow or green. A person is, for example, far more likely to be hit by a car when trying to cross while the lights for cross traffic are green than if they are red. In general, a pedestrian can be hit if the lights are red OR if the lights are yellow OR if the lights are green. So, in this case, the answer for the marginal probability can be found by summing $P(H, L)$ over all possible values of L: $P(H = \text{Hit}) = \sum_{l} P(H = \text{Hit}, L = l) = \sum_{l} P(H = \text{Hit} \mid L = l)\,P(L = l)$. The conditional probabilities $P(H = \text{Hit} \mid L = l)$ for each state of the lights can be laid out in a table.
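The pedestrian example reduces to the discrete marginalization $\Pr(H = h) = \sum_l \Pr(H = h, L = l)$. The probabilities below are hypothetical, chosen only for illustration, since the section does not give numerical values: the light is assumed to be red, yellow or green 20%, 10% and 70% of the time, with assumed conditional probabilities of being hit of 0.01, 0.1 and 0.8. The code builds the joint distribution of H and L and then marginalizes L out.

```python
from fractions import Fraction as F

# Hypothetical inputs, for illustration only.
p_light = {"red": F("0.2"), "yellow": F("0.1"), "green": F("0.7")}
p_hit_given_light = {"red": F("0.01"), "yellow": F("0.1"), "green": F("0.8")}

# Joint distribution P(H = h, L = l) = P(H = h | L = l) * P(L = l).
joint = {}
for light, pl in p_light.items():
    ph = p_hit_given_light[light]
    joint[("hit", light)] = ph * pl
    joint[("not hit", light)] = (1 - ph) * pl


def marginal_over_light(joint):
    """Sum the joint distribution over every value of L, leaving a distribution for H."""
    out = {}
    for (h, _light), p in joint.items():
        out[h] = out.get(h, F(0)) + p
    return out


p_h = marginal_over_light(joint)
print(p_h["hit"], float(p_h["hit"]))   # 143/250 0.572
```

With these assumed numbers the green-light term dominates the total, which is the dependence of H on L described above.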

9.
Conditional probability
–
In probability theory, conditional probability is a measure of the probability of an event given that another event has occurred. For example, the probability that any given person has a cough on any given day may be only 5%. But if we know or assume that the person has a cold, then they are much more likely to be coughing: the conditional probability of coughing given a cold might be a much higher 75%. The concept of conditional probability is one of the most fundamental in probability theory. But conditional probabilities can be slippery and require careful interpretation. For example, there need not be a causal or temporal relationship between A and B. $P(A \mid B)$ may or may not be equal to $P(A)$, the unconditional probability of A. If $P(A \mid B) = P(A)$, then events A and B are said to be independent; in such a case, knowledge about either event does not change the probability of the other. Also, in general, $P(A \mid B)$ is not equal to $P(B \mid A)$. For example, if you have cancer you might have a 90% chance of testing positive for cancer. In this case, what is being measured is the probability of event A (testing positive) given that event B (having cancer) has occurred. Alternatively, you can test positive for cancer but have only a 10% chance of actually having cancer, because cancer is very rare. In this case, what is being measured is the probability of event B (having cancer) given that event A (testing positive) has occurred. Falsely equating the two probabilities causes various errors of reasoning, such as the base rate fallacy. Conditional probabilities can be reversed using Bayes' theorem. Formally, given two events A and B with $P(B) > 0$, the conditional probability of A given B is defined as $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$. The logic behind this equation is that if the outcomes are restricted to B, this set serves as the new sample space. Note that this is a definition but not a theoretical result: we just denote the quantity $P(A \cap B)/P(B)$ as $P(A \mid B)$ and call it the conditional probability of A given B. Some authors instead take the multiplication rule $P(A \cap B) = P(A \mid B)\,P(B)$ as an axiom. This multiplication axiom introduces a symmetry with the summation axiom for mutually exclusive events, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, in which $P(A \cap B) = 0$. If $P(B) = 0$, the definition above leaves $P(A \mid B)$ undefined. However, it is possible to define a conditional probability with respect to a σ-algebra of such events, such as those arising from a continuous random variable. The case where B has zero measure is problematic; see conditional expectation for more information.
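The definition $P(A \mid B) = P(A \cap B)/P(B)$ can be applied mechanically on a small sample space. This sketch assumes a single roll of a fair die and uses hypothetical helper names; it shows that $P(A \mid B)$ and $P(B \mid A)$ generally differ, and gives one pair of events for which conditioning leaves the probability unchanged (independence).

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # one roll of a fair die (equally likely faces assumed)


def prob(event):
    """Classical probability of an event on the fair die."""
    return Fraction(len(OMEGA & set(event)), len(OMEGA))


def cond_prob(a, b):
    """P(A | B) = P(A ∩ B) / P(B); undefined when P(B) == 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) = 0: the elementary definition does not apply")
    return prob(set(a) & set(b)) / p_b


six, even, low = {6}, {2, 4, 6}, {1, 2}

print(cond_prob(six, even), cond_prob(even, six))  # 1/3 1
print(cond_prob(even, low), prob(even))            # 1/2 1/2 (independent events)
```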
Conditioning on an event may be generalized to conditioning on a random variable. Let X be a random variable; we assume for the sake of presentation that X is discrete, that is, X takes on only finitely many values x. The conditional probability of A given X is defined as the random variable, written $P(A \mid X)$, that takes the value $P(A \mid X = x)$ whenever $X = x$.

10.
Independence (probability theory)
–
In probability theory, two events are independent, statistically independent, or stochastically independent if the occurrence of one does not affect the probability of occurrence of the other. Similarly, two random variables are independent if the realization of one does not affect the probability distribution of the other. Two events A and B are independent if and only if their joint probability equals the product of their probabilities: $P(A \cap B) = P(A)\,P(B)$. Equivalently, when the conditional probabilities are defined, $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$. Although these derived expressions may seem more intuitive, they are not the preferred definition, as the conditional probabilities may be undefined if $P(A)$ or $P(B)$ is 0. Furthermore, the preferred definition makes clear by symmetry that when A is independent of B, B is also independent of A. A finite set of events $\{A_i\}$ is pairwise independent if every pair of events is independent, that is, if and only if for all distinct pairs of indices m, k, $P(A_m \cap A_k) = P(A_m)\,P(A_k)$. A finite set of events is mutually independent if every event is independent of any intersection of the other events, that is, if and only if for every k-element subset $\{B_1, \ldots, B_k\}$ of $\{A_1, \ldots, A_n\}$, $P\left(\bigcap_{i=1}^{k} B_i\right) = \prod_{i=1}^{k} P(B_i)$. This is called the multiplication rule for independent events. Note that it is not a single condition involving only the product of all the probabilities of all single events; it must hold for every subset of events. For more than two events, a mutually independent set of events is (by definition) pairwise independent, but the converse is not necessarily true. Two random variables X and Y are independent if and only if the elements of the π-system generated by them are independent, that is to say, for every a and b, the events $\{X \le a\}$ and $\{Y \le b\}$ are independent events. A set of random variables is pairwise independent if and only if every pair of random variables is independent. A set of random variables is mutually independent if and only if for any finite subset $X_1, \ldots, X_n$ and any finite sequence of numbers $a_1, \ldots, a_n$, the events $\{X_1 \le a_1\}, \ldots, \{X_n \le a_n\}$ are mutually independent events. The measure-theoretically inclined may prefer to substitute events $\{X \in A\}$ for events $\{X \le a\}$ in the above definition, where A is any Borel set. That definition is exactly equivalent to the one above when the values of the random variables are real numbers.
It has the advantage of working also for complex-valued random variables or for random variables taking values in any measurable space. Intuitively, two random variables X and Y are conditionally independent given Z if, once Z is known, the value of Y does not add any additional information about X. For instance, two measurements X and Y of the same underlying quantity Z are not independent, but they are conditionally independent given Z (unless the errors in the two measurements are somehow connected). The formal definition of conditional independence is based on the idea of conditional distributions. If X, Y, and Z are discrete random variables and X and Y are conditionally independent given Z, then $P(X = x \mid Y = y, Z = z) = P(X = x \mid Z = z)$ for any x, y and z with $P(Y = y, Z = z) > 0$. That is, the conditional distribution of X given Y and Z is the same as that given Z alone.
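The gap between pairwise and mutual independence is easiest to see with a standard counterexample, which is not taken from the text above: flip two fair coins and let A = "first coin is heads", B = "second coin is heads", C = "the two coins match". The sketch below checks the multiplication rule for every subset of two or more events; each pair satisfies it, but the triple does not, since $P(A \cap B \cap C) = 1/4 \ne 1/8$.

```python
from fractions import Fraction
from itertools import combinations, product

OMEGA = list(product("HT", repeat=2))  # two fair coins: four equally likely outcomes


def prob(pred):
    """Probability of the event {w in OMEGA : pred(w)} under the uniform measure."""
    return Fraction(sum(1 for w in OMEGA if pred(w)), len(OMEGA))


def first_heads(w):
    return w[0] == "H"


def second_heads(w):
    return w[1] == "H"


def coins_match(w):
    return w[0] == w[1]


def multiplication_rule_holds(preds):
    """Check P(intersection of the events) == product of their probabilities."""
    joint = prob(lambda w: all(p(w) for p in preds))
    product_of_probs = Fraction(1)
    for p in preds:
        product_of_probs *= prob(p)
    return joint == product_of_probs


events = [first_heads, second_heads, coins_match]
pairwise = all(multiplication_rule_holds(pair) for pair in combinations(events, 2))
mutual = all(
    multiplication_rule_holds(subset)
    for k in range(2, len(events) + 1)
    for subset in combinations(events, k)
)
print(pairwise, mutual)  # True False
```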

11.
Conditional independence
–
In the standard notation of probability theory, R and B are conditionally independent given Y if and only if $\Pr(R \cap B \mid Y) = \Pr(R \mid Y)\,\Pr(B \mid Y)$, or equivalently, $\Pr(R \mid B \cap Y) = \Pr(R \mid Y)$. Two discrete random variables X and Y are conditionally independent given a discrete random variable Z if $\Pr(X = x, Y = y \mid Z = z) = \Pr(X = x \mid Z = z)\,\Pr(Y = y \mid Z = z)$ for all x, y and every z with $\Pr(Z = z) > 0$. Two random variables X and Y are conditionally independent given a σ-algebra Σ if $\Pr(R \cap B \mid \Sigma) = \Pr(R \mid \Sigma)\,\Pr(B \mid \Sigma)$ holds (almost surely) for all R in σ(X) and B in σ(Y). Two random variables X and Y are conditionally independent given a random variable W if they are conditionally independent given σ(W), the σ-algebra generated by W. Conditional independence of more than two events, or of more than two random variables, is defined analogously. The following two examples show that $X \perp Y$ neither implies nor is implied by $X \perp Y \mid W$. First, suppose W is 0 with probability 0.5 and 1 otherwise. When $W = 0$, take X and Y to be independent, each having the value 0 with probability 0.99 and the value 1 otherwise. When $W = 1$, X and Y are again independent, but this time they take the value 1 with probability 0.99. Then $X \perp Y \mid W$, but X and Y are dependent, because $\Pr(X = 0) < \Pr(X = 0 \mid Y = 0)$. This is because $\Pr(X = 0) = 0.5$, but if $Y = 0$ then it is very likely that $W = 0$ and thus that $X = 0$ as well. For the second example, suppose $X \perp Y$, each taking the values 0 and 1 with probability 0.5, and let W be the product $X \times Y$. Then $\Pr(X = 0 \mid W = 0) = 2/3$, but $\Pr(X = 0 \mid Y = 0, W = 0) = 1/2$, so $X \perp Y \mid W$ is false. This is also an example of explaining away. See Kevin Murphy's tutorial, where X and Y take the values "brainy" and "sporty". The discussion on StackExchange provides a couple of useful examples. Let the two events be the probabilities of persons A and B getting home in time for dinner, and let the third event be that a snowstorm hit the city. While both A and B have a lower probability of getting home in time for dinner, the lower probabilities will still be independent of each other. That is, the knowledge that A is late does not tell you whether B will be late, since they may live in different neighborhoods, travel different distances, and use different modes of transportation. However, if you have information that they live in the same neighborhood, use the same transportation, and work at the same place, then the two events are not conditionally independent. Conditional independence depends on the nature of the third event. If you roll two dice, one may assume that the two dice behave independently of each other.
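Both examples can be checked with exact arithmetic. The sketch below builds each joint distribution of (W, X, Y) as described in the text; for the first example it assumes the standard setup in which W is 0 or 1 with probability 0.5 and, given W, each of X and Y independently equals W with probability 0.99. The query helper `pr` and the predicate names are hypothetical. The output confirms that the first example is conditionally independent given W but not independent, and the second is independent but not conditionally independent given W.

```python
from fractions import Fraction as F
from itertools import product


def example_one():
    """W fair bit; given W, X and Y are independent and each equals W with prob. 0.99."""
    joint = {}
    for w, x, y in product((0, 1), repeat=3):
        px = F(99, 100) if x == w else F(1, 100)
        py = F(99, 100) if y == w else F(1, 100)
        joint[(w, x, y)] = F(1, 2) * px * py
    return joint


def example_two():
    """X, Y independent fair bits and W = X * Y."""
    return {(x * y, x, y): F(1, 4) for x, y in product((0, 1), repeat=2)}


def pr(joint, event, given=None):
    """P(event | given) for predicates over (w, x, y); given=None means no condition."""
    def holds(k):
        return given is None or given(*k)
    num = sum(p for k, p in joint.items() if holds(k) and event(*k))
    den = sum(p for k, p in joint.items() if holds(k))
    return num / den


def x_is_0(w, x, y):
    return x == 0


def y_is_0(w, x, y):
    return y == 0


def w_is_0(w, x, y):
    return w == 0


def w_and_y_are_0(w, x, y):
    return w == 0 and y == 0


j1, j2 = example_one(), example_two()
# Example 1: dependent unconditionally, independent given W.
print(pr(j1, x_is_0), pr(j1, x_is_0, y_is_0))                 # 1/2 4901/5000
print(pr(j1, x_is_0, w_is_0), pr(j1, x_is_0, w_and_y_are_0))  # 99/100 99/100
# Example 2: independent unconditionally, dependent given W = 0.
print(pr(j2, x_is_0), pr(j2, x_is_0, y_is_0))                 # 1/2 1/2
print(pr(j2, x_is_0, w_is_0), pr(j2, x_is_0, w_and_y_are_0))  # 2/3 1/2
```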
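Looking at the result of one die will not tell you about the result of the second die. If, however, the first die shows a 3 and someone tells you about a third event, that the sum of the two results is even, then this extra information restricts the second result to an odd number. In other words, two events can be independent, but NOT conditionally independent. Conversely, height and vocabulary are not independent, but they are conditionally independent given age. As another example, let p be the proportion of voters who will vote "yes" in an upcoming referendum. In taking an opinion poll, one chooses n voters randomly from the population. For $i = 1, \ldots, n$, let $X_i = 1$ or $0$ corresponding, respectively, to whether or not the i-th chosen voter will vote "yes". In a frequentist approach to statistical inference, one would not attribute any probability distribution to p, and one would say that $X_1, \ldots, X_n$ are independent random variables.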

12.
Law of large numbers
–
In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed. The LLN is important because it guarantees stable long-term results for the averages of some random events. For example, while a casino may lose money in a single spin of the roulette wheel, its earnings will tend towards a predictable percentage over a large number of spins. Any winning streak by a player will eventually be overcome by the parameters of the game. It is important to remember that the LLN only applies (as the name indicates) when a large number of observations is considered. There is no principle that a small number of observations will coincide with the expected value or that a streak of one value will immediately be "balanced" by the others. For example, a single roll of a fair, six-sided die produces one of the numbers 1, 2, 3, 4, 5, or 6, each with equal probability, so the expected value of a roll is $(1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5$. According to the law of large numbers, if a large number of such dice are rolled, the average of their values is likely to be close to 3.5, with the precision increasing as more dice are rolled. It follows from the law of large numbers that the empirical probability of success in a series of Bernoulli trials will converge to the theoretical probability. For a Bernoulli random variable, the expected value is the theoretical probability of success. For example, a coin toss is a Bernoulli trial. When a fair coin is flipped once, the probability that the outcome will be heads is equal to 1/2. Therefore, according to the law of large numbers, the proportion of heads in a large number of coin flips should be roughly 1/2. In particular, the proportion of heads after n flips will almost surely converge to 1/2 as n approaches infinity. Although the proportion of heads (and tails) approaches 1/2, almost surely the absolute difference between the number of heads and tails will become large as the number of flips becomes large. That is, the probability that the absolute difference is a small number approaches zero as the number of flips becomes large. Also, almost surely the ratio of the absolute difference to the number of flips will approach zero.
Intuitively, the expected absolute difference grows, but at a slower rate than the number of flips. The Italian mathematician Gerolamo Cardano stated without proof that the accuracies of empirical statistics tend to improve with the number of trials. This was then formalized as a law of large numbers. A special form of the LLN (for a binary random variable) was first proved by Jacob Bernoulli. It took him over 20 years to develop a sufficiently rigorous mathematical proof, which was published in his Ars Conjectandi in 1713. He named this his "Golden Theorem", but it became generally known as "Bernoulli's theorem".
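A short simulation illustrates the statements above: the sample mean of die rolls settles near 3.5, and the proportion of heads settles near 1/2, even though the raw difference between heads and tails need not shrink. This is a minimal sketch using Python's standard `random` module with a fixed seed (an arbitrary choice for reproducibility); the exact numbers printed depend on the seed, and only the trend is meaningful.

```python
import random
from statistics import fmean

EXPECTED_DIE_MEAN = sum(range(1, 7)) / 6  # (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5


def sample_mean_of_die_rolls(n, seed=0):
    """Average of n simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)
    return fmean(rng.randint(1, 6) for _ in range(n))


def coin_flip_summary(n, seed=0):
    """Return (proportion of heads, |heads - tails|) after n fair coin flips."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n, abs(heads - (n - heads))


for n in (10, 1_000, 100_000):
    proportion, difference = coin_flip_summary(n)
    print(f"n={n:>7}: die mean={sample_mean_of_die_rolls(n):.4f}, "
          f"heads proportion={proportion:.4f}, |heads - tails|={difference}")
```

As n grows, the die mean and the heads proportion tighten around 3.5 and 0.5, while |heads - tails| typically grows on the order of the square root of n, consistent with the ratio of the difference to n going to zero.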

13.
Bayes' theorem
–
In probability theory and statistics, Bayes’ theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event. One of the many applications of Bayes’ theorem is Bayesian inference. When applied, the probabilities involved in Bayes’ theorem may have different probability interpretations. With the Bayesian probability interpretation, the theorem expresses how a subjective degree of belief should rationally change to account for the availability of related evidence. Bayesian inference is fundamental to Bayesian statistics. Bayes’ theorem is named after Rev. Thomas Bayes, who first provided an equation that allows new evidence to update beliefs. It was further developed by Pierre-Simon Laplace, who first published the modern formulation in his 1812 “Théorie analytique des probabilités.” Sir Harold Jeffreys put Bayes’ algorithm and Laplace’s formulation on an axiomatic basis. Jeffreys wrote that Bayes’ theorem “is to the theory of probability what the Pythagorean theorem is to geometry.” Bayes’ theorem is stated mathematically as the equation $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$, where A and B are events and $P(B) \neq 0$. $P(A)$ and $P(B)$ are the probabilities of observing A and B without regard to each other. $P(A \mid B)$, a conditional probability, is the probability of observing event A given that B is true. $P(B \mid A)$ is the probability of observing event B given that A is true. Bayes studied how to compute a distribution for the probability parameter of a binomial distribution. His unpublished manuscript was edited by Richard Price before it was posthumously read at the Royal Society. Price edited Bayes’ major work, “An Essay towards solving a Problem in the Doctrine of Chances”, and wrote an introduction to the paper which provides some of the philosophical basis of Bayesian statistics.
In 1765, Price was elected a Fellow of the Royal Society in recognition of his work on the legacy of Bayes. The French mathematician Pierre-Simon Laplace reproduced and extended Bayes’ results in 1774, apparently quite unaware of Bayes’ work. The Bayesian interpretation of probability was developed mainly by Laplace. Stephen Stigler suggested in 1983 that Bayes’ theorem was discovered by Nicholas Saunderson, a blind English mathematician, some time before Bayes; that interpretation, however, has been disputed. Martyn Hooper and Sharon McGrayne have argued that Richard Price’s contribution was substantial: by modern standards, Price discovered Bayes’ work, recognized its importance, corrected it, contributed to the article, and found a use for it, and the modern convention of employing Bayes’ name alone is unfair but so entrenched that anything else makes little sense. As an example of the theorem in use, suppose a drug test is 99% sensitive and 99% specific. That is, the test will produce 99% true positive results for drug users and 99% true negative results for non-users. Suppose that 0.5% of people are users of the drug. If a randomly selected individual tests positive, what is the probability that they are a user? Bayes’ theorem gives $P(\text{User} \mid +) = \frac{0.99 \times 0.005}{0.99 \times 0.005 + 0.01 \times 0.995} \approx 0.332$, so even after a positive test it is more likely that the person is not a user than that they are. This surprising result arises because the number of non-users is very large compared to the number of users, so the number of false positives outweighs the number of true positives. To use concrete numbers, if 1000 individuals are tested, there are expected to be 995 non-users and 5 users. From the 995 non-users, $0.01 \times 995 \approx 10$ false positives are expected, while from the 5 users, $0.99 \times 5 \approx 5$ true positives are expected, so only about a third of the roughly 15 positive results come from actual users.
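The drug-test calculation can be written as a small function. This sketch uses exact fractions so the result can be compared with the hand calculation; the function and parameter names are illustrative, and the denominator $P(+)$ is expanded with the law of total probability over users and non-users.

```python
from fractions import Fraction


def posterior_given_positive(prior, sensitivity, specificity):
    """P(user | positive test) by Bayes' theorem.

    P(+) is expanded over users and non-users:
    P(+) = P(+ | user) P(user) + P(+ | non-user) P(non-user).
    """
    p_pos_given_user = sensitivity
    p_pos_given_nonuser = 1 - specificity  # false-positive rate
    p_pos = p_pos_given_user * prior + p_pos_given_nonuser * (1 - prior)
    return p_pos_given_user * prior / p_pos


posterior = posterior_given_positive(
    prior=Fraction(5, 1000), sensitivity=Fraction(99, 100), specificity=Fraction(99, 100)
)
print(posterior, round(float(posterior), 4))  # 99/298 0.3322
```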

14.
Venn diagram
–
A Venn diagram is a diagram that shows all possible logical relations between a finite collection of different sets. These diagrams depict elements as points in the plane, and sets as regions inside closed curves. A Venn diagram consists of multiple overlapping closed curves, usually circles, each representing a set. In Venn diagrams the curves are overlapped in every possible way, showing all possible relations between the sets; they are thus a special case of Euler diagrams, which do not necessarily show all relations. Venn diagrams were conceived around 1880 by John Venn. They are used to teach elementary set theory, as well as to illustrate simple set relationships in probability, logic, statistics, linguistics and computer science. A Venn diagram in which, in addition, the area of each shape is proportional to the number of elements it contains is called an area-proportional or scaled Venn diagram. The following example involves two sets, A and B, represented as coloured circles. The orange circle, set A, represents all living creatures that are two-legged; the blue circle, set B, represents the living creatures that can fly. Each separate type of creature can be imagined as a point somewhere in the diagram. Living creatures that can both fly and have two legs (for example, parrots) are then in both sets, so they correspond to points in the region where the blue and orange circles overlap. That region contains all such and only such living creatures. Humans and penguins are bipedal, and so are in the orange circle, but since they cannot fly they appear in the left part of the orange circle, where it does not overlap with the blue circle. Mosquitoes have six legs and fly, so the point for mosquitoes is in the part of the blue circle that does not overlap with the orange one. Creatures that are not two-legged and cannot fly would all be represented by points outside both circles. The combined region of sets A and B is called the union of A and B, denoted by $A \cup B$.
The union in this case contains all living creatures that are either two-legged or can fly. The region in both A and B, where the two sets overlap, is called the intersection of A and B, denoted by A ∩ B. In this example the intersection of the two sets is not empty, because there are points that represent creatures that are in both the orange and blue circles. Although similar diagrams were drawn before him, these diagrams are rightly associated with Venn because he comprehensively surveyed and formalized their usage. Venn himself did not use the term Venn diagram and referred to his invention as Eulerian Circles; as he wrote, "Of these schemes one only, viz. that commonly called Eulerian circles, has met with any general acceptance." The first to use the term Venn diagram was Clarence Irving Lewis in 1918, in his book A Survey of Symbolic Logic. Venn diagrams are similar to Euler diagrams, which were invented by Leonhard Euler in the 18th century. Baron has noted that Leibniz in the 17th century produced similar diagrams before Euler, and she also observes even earlier Euler-like diagrams by Ramon Lull in the 13th century. In the 20th century, Venn diagrams were further developed. D. W. Henderson showed in 1963 that the existence of an n-Venn diagram with n-fold rotational symmetry implied that n was a prime number.
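The regions of the two-set example correspond directly to Python’s built-in set operators, which makes the union and intersection easy to check. This is a minimal sketch with hypothetical membership: the creatures come from the example, except "dog", which is added here only as an assumed creature that is neither two-legged nor able to fly.

```python
# Hypothetical sample of creatures; "dog" is an added illustration, not from the text.
two_legged = {"human", "penguin", "parrot"}   # set A (orange circle)
can_fly = {"parrot", "mosquito"}              # set B (blue circle)
all_creatures = two_legged | can_fly | {"dog"}

union = two_legged | can_fly          # A ∪ B: two-legged or able to fly
intersection = two_legged & can_fly   # A ∩ B: two-legged and able to fly
only_a = two_legged - can_fly         # part of A that does not overlap B
only_b = can_fly - two_legged         # part of B that does not overlap A
outside = all_creatures - union       # points outside both circles

if __name__ == "__main__":
    print(sorted(union))         # ['human', 'mosquito', 'parrot', 'penguin']
    print(sorted(intersection))  # ['parrot']
    print(sorted(only_a))        # ['human', 'penguin']
    print(sorted(only_b))        # ['mosquito']
    print(sorted(outside))       # ['dog']
```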

15.
Decision tree
–
A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal, but they are also a popular tool in machine learning. The paths from root to leaf represent classification rules. Another use of decision trees is as a descriptive means for calculating conditional probabilities. Drawn from left to right, a decision tree has only burst nodes (splitting paths) but no sink nodes (converging paths). Therefore, when used manually, decision trees can grow very big and are then often hard to draw fully by hand. Traditionally, decision trees have been created manually, although specialized software is increasingly used. The decision tree can be linearized into decision rules, where the outcome is the contents of the leaf node and the conditions along the path form a conjunction in the if clause. In general, the rules have the form: if condition1 and condition2 and condition3 then outcome. Decision rules can be generated by constructing association rules with the target variable on the right. They can also denote temporal or causal relations. Commonly, a decision tree is drawn using flowchart symbols, as it is easier for many to read and understand. Much of the information in a tree can be represented more compactly as an influence diagram, focusing attention on the issues and relationships between events. Decision trees can also be seen as models of induction rules from empirical data. An optimal decision tree is then defined as a tree that accounts for most of the data while minimizing the number of levels (or questions). Several algorithms to generate such optimal trees have been devised, such as ID3/4/5, CLS, and ASSISTANT.
Among decision support tools, decision trees have several advantages. Decision trees are simple to understand and interpret: people are able to understand decision tree models after a brief explanation. They have value even with little hard data, since important insights can be generated based on experts describing a situation and their preferences for outcomes. They allow the addition of new possible scenarios and help determine worst, best, and expected values for different scenarios. If a given result is provided by the model, the explanation for it can be read off the path of conditions that led to it. They can be combined with other decision techniques. Decision trees also have disadvantages. For data including categorical variables with different numbers of levels, information gain in decision trees is biased in favor of those attributes with more levels. Calculations can get very complex, particularly if many values are uncertain or if many outcomes are linked.
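Two of the points above, linearizing a tree into "if ... then" rules and the bias of information gain toward attributes with many levels, can be illustrated in a few lines. This is a sketch under stated assumptions: the `Node`/`Leaf` representation, the example tree, and the four data rows are hypothetical; information gain is computed as the usual entropy reduction used by ID3-style algorithms.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import log2
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    outcome: str


@dataclass
class Node:
    # Each branch pairs a condition (as text) with the subtree reached when it holds.
    branches: List[Tuple[str, "Tree"]] = field(default_factory=list)


Tree = Union[Node, Leaf]


def to_rules(tree: Tree, conditions: Tuple[str, ...] = ()) -> List[str]:
    """Linearize a tree: one 'if ... then outcome' rule per root-to-leaf path."""
    if isinstance(tree, Leaf):
        premise = " and ".join(conditions) or "true"
        return [f"if {premise} then {tree.outcome}"]
    rules: List[str] = []
    for condition, subtree in tree.branches:
        rules.extend(to_rules(subtree, conditions + (condition,)))
    return rules


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows: List[Dict[str, object]], attribute: str, target: str) -> float:
    """Target entropy minus the size-weighted entropy of each attribute value's subset."""
    groups: Dict[object, List[str]] = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


if __name__ == "__main__":
    # Hypothetical tree, for illustration only.
    tree = Node([
        ("outlook = sunny", Node([
            ("humidity = high", Leaf("stay in")),
            ("humidity = normal", Leaf("go out")),
        ])),
        ("outlook = rainy", Leaf("stay in")),
    ])
    for rule in to_rules(tree):
        print(rule)

    # Hypothetical data. The unique "id" has one level per row, so every subset is
    # pure and its gain is maximal, although it says nothing useful about new rows.
    rows = [
        {"id": 1, "windy": "no", "play": "yes"},
        {"id": 2, "windy": "no", "play": "yes"},
        {"id": 3, "windy": "yes", "play": "no"},
        {"id": 4, "windy": "no", "play": "no"},
    ]
    print(round(information_gain(rows, "windy", "play"), 3))  # 0.311
    print(round(information_gain(rows, "id", "play"), 3))     # 1.0
```

The "id" attribute wins only because it has more levels, which is exactly the bias described in the text.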

16.
International Standard Book Number
–
The International Standard Book Number (ISBN) is a unique numeric commercial book identifier. An ISBN is assigned to each edition and variation of a book; for example, an e-book, a paperback, and a hardcover edition of the same book would each have a different ISBN. The ISBN is 13 digits long if assigned on or after 1 January 2007, and 10 digits long if assigned before 2007. The method of assigning an ISBN is nation-based and varies from country to country, often depending on how large the publishing industry is within a country. The initial ISBN configuration of recognition was generated in 1967 in the United Kingdom by David Whitaker, and in 1968 in the US by Emery Koltay, based upon the 9-digit Standard Book Numbering (SBN) created in 1966. The 10-digit ISBN format was developed by the International Organization for Standardization (ISO) and was published in 1970 as international standard ISO 2108; the United Kingdom continued to use the 9-digit SBN code until 1974. The ISO on-line facility only refers back to 1978. Occasionally, a book may appear without a printed ISBN if it is printed privately or the author does not follow the usual ISBN procedure; however, this can be rectified later. Another identifier, the International Standard Serial Number (ISSN), identifies periodical publications such as magazines. An SBN may be converted to an ISBN by prefixing the digit 0. For example, the edition of Mr. J. G. Reeder Returns published by Hodder in 1965 has SBN 340 01381 8, where 340 indicates the publisher, 01381 is their serial number, and 8 is the check digit. This can be converted to ISBN 0-340-01381-8; the check digit does not need to be re-calculated. Since 1 January 2007, ISBNs have contained 13 digits, a format that is compatible with Bookland European Article Number EAN-13s. 
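The SBN-to-ISBN conversion, and why its check digit survives unchanged, can be sketched in Python. The check-digit rules used here are not stated in the text above and are taken as assumptions to verify against the ISBN standard: an ISBN-10 check digit uses weights 10 down to 2 taken mod 11 (with 10 written as "X"), and an ISBN-13 prefixes 978 and uses alternating weights 1 and 3 taken mod 10. The helper names are hypothetical.

```python
def _digits(code: str) -> str:
    """Strip the hyphens or spaces customarily used to separate ISBN parts."""
    return code.replace("-", "").replace(" ", "")


def isbn10_check_digit(first9: str) -> str:
    """ISBN-10 check digit: weights 10 down to 2, result taken mod 11; 10 is written 'X'."""
    first9 = _digits(first9)
    if len(first9) != 9 or not first9.isdigit():
        raise ValueError("expected nine digits")
    total = sum(w * int(d) for w, d in zip(range(10, 1, -1), first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def sbn_to_isbn10(sbn: str) -> str:
    """Prefix a 9-character SBN with 0. The new leading digit has weight 10 but value 0,
    so it adds nothing to the weighted sum and the check digit stays the same."""
    sbn = _digits(sbn)
    if len(sbn) != 9:
        raise ValueError("expected a 9-character SBN")
    return "0" + sbn


def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix 978 to the first nine digits and append an EAN-13 check digit
    (alternating weights 1 and 3, result taken mod 10)."""
    body = "978" + _digits(isbn10)[:9]
    total = sum((3 if i % 2 else 1) * int(d) for i, d in enumerate(body))
    return body + str((10 - total % 10) % 10)


if __name__ == "__main__":
    isbn10 = sbn_to_isbn10("340 01381 8")               # Mr. J. G. Reeder Returns (Hodder, 1965)
    print(isbn10)                                       # 0340013818
    print(isbn10_check_digit(isbn10[:9]) == isbn10[9])  # True
    print(isbn10_to_isbn13(isbn10))                     # 9780340013816
```

Unlike the SBN prefix, moving to 13 digits changes the weighting scheme, so the ISBN-13 check digit generally must be recomputed.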
A 13-digit ISBN can be separated into its parts, and when this is done it is customary to separate the parts with hyphens or spaces. Separating the parts of a 10-digit ISBN is also done with either hyphens or spaces. Figuring out how to correctly separate a given ISBN is complicated, because most of the parts do not use a fixed number of digits. ISBN issuance is country-specific, in that ISBNs are issued by the ISBN registration agency that is responsible for that country or territory, regardless of the publication language. Some ISBN registration agencies are based in national libraries or within ministries of culture; in other cases, the ISBN registration service is provided by organisations such as bibliographic data providers that are not government funded. In Canada, ISBNs are issued at no cost with the purpose of encouraging Canadian culture. In the United Kingdom, the United States, and some other countries, where the service is provided by non-government-funded organisations, the issuing of ISBNs requires payment of a fee. In Australia, ISBNs are issued by the library services agency Thorpe-Bowker.

17.
Probability
–
Probability is the measure of the likelihood that an event will occur. Probability is quantified as a number between 0 and 1, where 0 indicates impossibility and 1 indicates certainty; the higher the probability of an event, the more certain it is that the event will occur. A simple example is the tossing of a fair coin. Since the coin is unbiased, the two outcomes (heads and tails) are equally probable: the probability of heads equals the probability of tails. Since no other outcomes are possible, the probability of either heads or tails is 1/2. This type of probability is also called a priori probability. Probability theory is used to describe the underlying mechanics and regularities of complex systems. For example, tossing a coin twice will yield head-head, head-tail, tail-head, or tail-tail. The probability of getting an outcome of head-head is 1 out of 4 outcomes, that is, 1/4 or 0.25. The frequentist interpretation considers probability to be the relative frequency of outcomes in the long run. A modification of this is propensity probability, which interprets probability as the tendency of some experiment to yield a certain outcome. Subjectivists, by contrast, assign numbers per subjective probability, i.e., as a degree of belief. The degree of belief has been interpreted as the price at which you would buy or sell a bet that pays 1 unit of utility if E occurs and 0 if E does not occur. The most popular version of subjective probability is Bayesian probability, which includes expert knowledge as well as experimental data to produce probabilities. The expert knowledge is represented by some prior probability distribution, and the data are incorporated in a likelihood function. The product of the prior and the likelihood, normalized, results in a posterior probability distribution that incorporates all the information known to date. The scientific study of probability is a relatively modern development of mathematics. Gambling shows that there has been an interest in quantifying the ideas of probability for millennia, but exact mathematical descriptions arose much later. There are reasons, of course, for the slow development of the mathematics of probability. 
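Two ideas from the paragraph above can be made concrete: counting equally likely outcomes in the two-toss sample space, and the Bayesian update in which the prior is multiplied by the likelihood and then normalized. The sketch below uses Python’s `itertools.product` and `fractions.Fraction`; the fair-versus-biased coin question and its 3/4 bias are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing a fair coin twice: HH, HT, TH, TT, all equally likely.
outcomes = list(product("HT", repeat=2))
p_hh = Fraction(outcomes.count(("H", "H")), len(outcomes))


def posterior(prior, likelihood):
    """Bayesian update: multiply prior by likelihood for each hypothesis, then normalize."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}


if __name__ == "__main__":
    print(p_hh)  # 1/4

    # Hypothetical question: is a coin fair (P(heads) = 1/2) or biased (P(heads) = 3/4)?
    prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
    # Likelihood of the observed data, two heads in a row, under each hypothesis.
    likelihood = {"fair": Fraction(1, 2) ** 2, "biased": Fraction(3, 4) ** 2}
    post = posterior(prior, likelihood)
    print(post["fair"], post["biased"])  # 4/13 9/13
```

Normalization is what turns the products 1/8 and 9/32 into a distribution that sums to 1.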
Games of chance provided the impetus for the mathematical study of probability. According to Richard Jeffrey, before the middle of the seventeenth century the term "probable" meant approvable: a probable action or opinion was one such as sensible people would undertake or hold. However, in legal contexts especially, "probable" could also apply to propositions for which there was good evidence. The sixteenth-century Italian polymath Gerolamo Cardano demonstrated the efficacy of defining odds as the ratio of favourable to unfavourable outcomes.
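Cardano’s definition of odds as favourable to unfavourable outcomes converts directly to and from a probability: with f favourable and u unfavourable equally likely outcomes, the probability is f/(f + u). A minimal sketch with exact fractions follows; the function names and the single-die example are hypothetical illustrations, not taken from the text.

```python
from fractions import Fraction


def odds(favourable: int, unfavourable: int) -> Fraction:
    """Odds in favour: favourable outcomes to unfavourable outcomes (unfavourable > 0)."""
    return Fraction(favourable, unfavourable)


def probability_from_odds(o: Fraction) -> Fraction:
    """With odds o = f/u, the probability is f/(f+u), which equals o/(1+o)."""
    return o / (1 + o)


def odds_from_probability(p: Fraction) -> Fraction:
    """Inverse conversion, p/(1-p), defined for p < 1."""
    return p / (1 - p)


if __name__ == "__main__":
    # Hypothetical example: rolling a six with one fair die (1 favourable, 5 unfavourable faces).
    o = odds(1, 5)
    print(o)                                      # 1/5
    print(probability_from_odds(o))               # 1/6
    print(odds_from_probability(Fraction(1, 2)))  # 1 (even odds, as for a fair coin)
```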