1.
Statistics
–
Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data. In applying statistics to, e.g., a scientific, industrial, or social problem, populations can be diverse topics such as all people living in a country or every atom composing a crystal. Statistics deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments. The statistician Sir Arthur Lyon Bowley defined statistics as "numerical statements of facts in any department of inquiry placed in relation to each other". When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples; representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. In contrast, an observational study does not involve experimental manipulation. Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena. A standard statistical procedure involves the test of the relationship between two data sets, or a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between the two data sets. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (falsely rejecting a true null hypothesis) and Type II errors (failing to reject a false null hypothesis). Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis. Measurement processes that generate statistical data are also subject to error.
Many of these errors are classified as random (noise) or systematic (bias); the presence of missing data or censoring may result in biased estimates, and specific techniques have been developed to address these problems. Statistics continues to be an area of active research, for example on the problem of how to analyze big data. Statistics is a body of science that pertains to the collection, analysis, interpretation or explanation, and presentation of data. Some consider statistics to be a distinct mathematical science rather than a branch of mathematics. While many scientific investigations make use of data, statistics is concerned with the use of data in the context of uncertainty. Mathematical techniques used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure-theoretic probability theory. In applying statistics to a problem, it is common practice to start with a population or process to be studied. Populations can be diverse topics such as all people living in a country or every atom composing a crystal. Ideally, statisticians compile data about the entire population; this may be organized by governmental statistical institutes.
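The null-hypothesis testing framework described above can be illustrated with a small simulation. The sketch below is an illustrative assumption, not a procedure from the text: it flips a fair coin (so the null hypothesis "the coin is fair" is actually true), rejects whenever the head count is extreme, and estimates how often that rejection is a Type I error. The rejection threshold of 59 heads out of 100 is an arbitrary choice for illustration.

```python
import random

# Sketch: estimate the Type I error rate of a simple two-sided test of a
# fair coin. Since the null hypothesis is true here, every rejection is a
# Type I error. The threshold (59 of 100) is an illustrative assumption.
random.seed(0)

def test_rejects(n_flips=100, reject_at=59):
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    # Reject the null if the head count is extreme in either direction.
    return heads >= reject_at or heads <= n_flips - reject_at

trials = 10_000
type_i_rate = sum(test_rejects() for _ in range(trials)) / trials
print(f"Estimated Type I error rate: {type_i_rate:.3f}")
```

Tightening the threshold lowers the Type I error rate but raises the Type II error rate (failing to detect a genuinely biased coin), which is the trade-off the section alludes to.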
2.
Sample space
–
In probability theory, the sample space of an experiment or random trial is the set of all possible outcomes or results of that experiment. A sample space is usually denoted using set notation, and the possible outcomes are listed as elements in the set. It is common to refer to a sample space by the labels S or Ω. For example, if the experiment is tossing a coin, the sample space is typically the set {heads, tails}. For tossing two coins, the sample space would be {(heads, heads), (heads, tails), (tails, heads), (tails, tails)}. For tossing a single six-sided die, the sample space is {1, 2, 3, 4, 5, 6}. A well-defined sample space is one of three elements in a probabilistic model (a probability space); the other two are a well-defined set of possible events and a probability assigned to each event. For many experiments, there may be more than one plausible sample space available. For example, when drawing a card from a standard deck of fifty-two playing cards, one possibility for the sample space could be the various ranks, while another could be the suits. Still other sample spaces are possible, such as one that accounts for cards having been flipped when shuffling. Some treatments of probability assume that the various outcomes of an experiment are always defined so as to be equally likely. The result of this is that every possible combination of individuals who could be chosen for the sample is also equally likely. In an elementary approach to probability, any subset of the sample space is usually called an event. However, this gives rise to problems when the sample space is infinite, so a more precise definition of an event is necessary. Under this definition only measurable subsets of the sample space, constituting a σ-algebra over the sample space itself, are considered events. See also: probability space, space, set, event, σ-algebra.
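The coin and die sample spaces above can be written directly as Python sets; this is only an illustrative sketch, with "H" and "T" standing in for heads and tails.

```python
from itertools import product

# The sample spaces named above, as Python sets.
coin = {"H", "T"}                         # one coin toss: {H, T}
two_coins = set(product(coin, repeat=2))  # {(H,H), (H,T), (T,H), (T,T)}
die = set(range(1, 7))                    # one six-sided die: {1, ..., 6}

# An event is a subset of the sample space, e.g. "at least one head":
at_least_one_head = {o for o in two_coins if "H" in o}

print(len(two_coins), len(at_least_one_head))  # 4 3
```

With equally likely outcomes, the probability of an event is simply its size divided by the size of the sample space, here 3/4 for "at least one head".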
3.
Venn diagram
–
A Venn diagram is a diagram that shows all possible logical relations between a finite collection of different sets. These diagrams depict elements as points in the plane, and sets as regions inside closed curves. A Venn diagram consists of multiple overlapping closed curves, usually circles, each representing a set. In Venn diagrams the curves are overlapped in every possible way; they are thus a special case of Euler diagrams, which do not necessarily show all relations. Venn diagrams were conceived around 1880 by John Venn, and they are used to teach elementary set theory, as well as to illustrate simple set relationships in probability, logic, statistics, linguistics and computer science. A Venn diagram in which, in addition, the area of each shape is proportional to the number of elements it contains is called an area-proportional or scaled Venn diagram. This example involves two sets, A and B, represented here as coloured circles. The orange circle, set A, represents all living creatures that are two-legged; the blue circle, set B, represents the living creatures that can fly. Each separate type of creature can be imagined as a point somewhere in the diagram. Living creatures that both can fly and have two legs (for example, parrots) are then in both sets, so they correspond to points in the region where the blue and orange circles overlap. That region contains all such, and only such, living creatures. Humans and penguins are bipedal, and so are in the orange circle, but since they cannot fly they appear in the part of the orange circle that does not overlap with the blue one. Mosquitoes have six legs and can fly, so the point for mosquitoes is in the part of the blue circle that does not overlap with the orange one. Creatures that are not two-legged and cannot fly would all be represented by points outside both circles. The combined region of sets A and B is called the union of A and B, denoted by A ∪ B.
The union in this case contains all living creatures that are either two-legged or able to fly (or both). The region included in both A and B, where the two sets overlap, is called the intersection of A and B, denoted by A ∩ B. In this example the intersection of the two sets is not empty, because there are points that represent creatures that are in both the orange and blue circles. Diagrams of this kind predate Venn; they are rightly associated with him, however, because he comprehensively surveyed and formalized their usage. Venn himself did not use the term "Venn diagram" and referred to his invention as "Eulerian Circles": "Of these schemes one only, viz. that commonly called 'Eulerian circles,' has met with any general acceptance." The first to use the term "Venn diagram" was Clarence Irving Lewis in 1918, in his book A Survey of Symbolic Logic. Venn diagrams are similar to Euler diagrams, which were invented by Leonhard Euler in the 18th century. Baron has noted that Leibniz in the 17th century produced similar diagrams before Euler, and she also observes even earlier Euler-like diagrams by Ramon Llull in the 13th century. In the 20th century, Venn diagrams were further developed; D. W. Henderson showed in 1963 that the existence of an n-Venn diagram with n-fold rotational symmetry implied that n was a prime number.
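The union and intersection relations in the two-circle example can be checked with Python's built-in set type; the creature lists below are illustrative, following the text's example.

```python
# Set A: two-legged creatures; set B: creatures that can fly (illustrative).
two_legged = {"parrot", "human", "penguin"}
can_fly = {"parrot", "mosquito"}

union = two_legged | can_fly          # A ∪ B: two-legged OR can fly
intersection = two_legged & can_fly   # A ∩ B: two-legged AND can fly

print(sorted(union))         # all creatures in either circle
print(sorted(intersection))  # only the parrot lies in the overlap
```

Humans and penguins land in A only, the mosquito in B only, and the parrot in the overlapping region, exactly as the diagram description says.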
4.
Stochastic process
–
In probability theory and related fields, a stochastic or random process is a mathematical object usually defined as a collection of random variables. Stochastic processes are widely used as mathematical models of systems and phenomena that appear to vary in a random manner. Furthermore, seemingly random changes in financial markets have motivated the extensive use of stochastic processes in finance. Applications and the study of phenomena have in turn inspired the proposal of new stochastic processes. Examples of such stochastic processes include the Wiener process or Brownian motion process, used by Louis Bachelier to study price changes on the Paris Bourse, and the Poisson process, used by A. K. Erlang to study the number of phone calls occurring in a certain period of time. The term random function is also used to refer to a stochastic or random process. The terms stochastic process and random process are used interchangeably, often with no specific mathematical space for the set that indexes the random variables. But these two terms are often used when the random variables are indexed by the integers or an interval of the real line; if the random variables are indexed by the Cartesian plane or some higher-dimensional Euclidean space, the collection is usually called a random field instead. The values of a stochastic process are not always numbers and can be vectors or other mathematical objects. The theory of stochastic processes is considered to be an important contribution to mathematics. The set used to index the random variables is called the index set. Historically, the index set was some subset of the real line, such as the natural numbers, giving the index set the interpretation of time. Each random variable in the collection takes values from the same mathematical space known as the state space. This state space can be, for example, the integers, the real line or n-dimensional Euclidean space. An increment is the amount that a stochastic process changes between two index values, often interpreted as two points in time. A stochastic process can have many outcomes, due to its randomness, and a single outcome of a stochastic process is called, among other names, a sample function or realization.
A stochastic process can be classified in different ways, for example, by its state space or its index set. One common way of classification is by the cardinality of the index set: if the index set is the integers or natural numbers, the process is said to be in discrete time, while if the index set is some interval of the real line, then time is said to be continuous. The two types of stochastic processes are respectively referred to as discrete-time and continuous-time stochastic processes.
5.
Probability theory
–
Probability theory is the branch of mathematics concerned with probability, the analysis of random phenomena. Although it is not possible to predict precisely the results of random events, patterns emerge when sequences of such events are examined; two representative mathematical results describing such patterns are the law of large numbers and the central limit theorem. As a mathematical foundation for statistics, probability theory is essential to many human activities that involve quantitative analysis of large sets of data. Methods of probability theory also apply to descriptions of complex systems given only partial knowledge of their state, as in statistical mechanics; a great discovery of twentieth-century physics was the probabilistic nature of physical phenomena at atomic scales, described in quantum mechanics. Christiaan Huygens published a book on the subject in 1657, and in the 19th century the subject was given its classical formulation. Initially, probability theory mainly considered discrete events, and its methods were mainly combinatorial. Eventually, analytical considerations compelled the incorporation of continuous variables into the theory. This culminated in modern probability theory, on foundations laid by Andrey Nikolaevich Kolmogorov. Kolmogorov combined the notion of sample space, introduced by Richard von Mises, with measure theory. This became the mostly undisputed axiomatic basis for modern probability theory. Most introductions to probability theory treat discrete probability distributions and continuous probability distributions separately; the more mathematically advanced, measure theory-based treatment of probability covers the discrete, the continuous, and any mix of these. Consider an experiment that can produce a number of outcomes. The set of all outcomes is called the sample space of the experiment. The power set of the sample space is formed by considering all different collections of possible results. For example, rolling an honest die produces one of six possible results; one collection of possible results corresponds to getting an odd number. Thus, the subset {1, 3, 5} is an element of the power set of the sample space of die rolls.
In this case, {1, 3, 5} is the event that the die falls on some odd number. If the results that actually occur fall in a given event, that event is said to have occurred. Probability is a way of assigning every event a value between zero and one, with the requirement that the event made up of all possible results be assigned a value of one. For a fair die, the probability that any one of the mutually exclusive events {1, 6}, {3}, or {2, 4} will occur is 5/6. This is the same as saying that the probability of the event {1, 2, 3, 4, 6} is 5/6; this event encompasses the possibility of any number except five being rolled. The mutually exclusive event {5} has a probability of 1/6, and the event {1, 2, 3, 4, 5, 6} has a probability of 1, that is, absolute certainty. Discrete probability theory deals with events that occur in countable sample spaces. Modern definition: The modern definition starts with a finite or countable set called the sample space, which relates to the set of all possible outcomes in the classical sense, denoted by Ω.
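The die example above can be reproduced exactly with exact rational arithmetic; this is an illustrative sketch where events are Python sets and probabilities are counts over the six equally likely outcomes.

```python
from fractions import Fraction

# The sample space of an honest die and the uniform probability assignment.
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of the sample space), fair die."""
    return Fraction(len(event & sample_space), len(sample_space))

odd = {1, 3, 5}
not_five = sample_space - {5}

print(prob(odd))           # 1/2
print(prob(not_five))      # 5/6, as in the text
print(prob({5}))           # 1/6
print(prob(sample_space))  # 1, absolute certainty
```

Note that prob(not_five) + prob({5}) equals 1, reflecting additivity over mutually exclusive events.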
6.
Bayes' theorem
–
In probability theory and statistics, Bayes’ theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event. One of the many applications of Bayes’ theorem is Bayesian inference. When applied, the probabilities involved in Bayes’ theorem may have different probability interpretations. With the Bayesian probability interpretation, the theorem expresses how a subjective degree of belief should rationally change to account for the availability of related evidence; Bayesian inference is fundamental to Bayesian statistics. Bayes’ theorem is named after Rev. Thomas Bayes, who first provided an equation that allows new evidence to update beliefs. It was further developed by Pierre-Simon Laplace, who first published the modern formulation in his 1812 Théorie analytique des probabilités. Sir Harold Jeffreys put Bayes’ algorithm and Laplace’s formulation on an axiomatic basis. Jeffreys wrote that Bayes’ theorem “is to the theory of probability what the Pythagorean theorem is to geometry.” Bayes’ theorem is stated mathematically as the equation P(A | B) = P(B | A) P(A) / P(B), where A and B are events and P(B) ≠ 0. P(A) and P(B) are the probabilities of observing A and B without regard to each other. P(A | B), a conditional probability, is the probability of observing event A given that B is true; P(B | A) is the probability of observing event B given that A is true. Bayes’ theorem was named after the Reverend Thomas Bayes, who studied how to compute a distribution for the probability parameter of a binomial distribution. Bayes’ unpublished manuscript was edited by Richard Price before it was posthumously read at the Royal Society. Price edited Bayes’ major work “An Essay towards solving a Problem in the Doctrine of Chances”, and wrote an introduction to the paper which provides some of the philosophical basis of Bayesian statistics.
In 1765 Price was elected a Fellow of the Royal Society in recognition of his work on the legacy of Bayes. The French mathematician Pierre-Simon Laplace reproduced and extended Bayes’ results in 1774, apparently quite unaware of Bayes’ work; the Bayesian interpretation of probability was developed mainly by Laplace. Stephen Stigler suggested in 1983 that Bayes’ theorem was discovered by Nicholas Saunderson, a blind English mathematician, some time before Bayes; that interpretation, however, has been disputed. Martyn Hooper and Sharon McGrayne have argued that Richard Price’s contribution was substantial: by modern standards, Price discovered Bayes’ work, recognized its importance, corrected it, contributed to the article, and found a use for it; the modern convention of employing Bayes’ name alone is unfair but so entrenched that anything else makes little sense. As an example, suppose a drug test is 99% sensitive and 99% specific. That is, the test will produce 99% true positive results for drug users and 99% true negative results for non-users. Suppose that 0.5% of people are users of the drug. If a randomly selected individual tests positive, what is the probability that he is a user? The answer is only about 33%. This surprising result arises because the number of non-users is very large compared to the number of users, so the number of false positives outweighs the number of true positives. To use concrete numbers, if 1000 individuals are tested, there are expected to be 995 non-users and 5 users; from the 995 non-users, 0.01 × 995 ≈ 10 false positives are expected.
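The drug-test example can be computed directly from Bayes’ theorem: P(user | positive) = P(positive | user) P(user) / P(positive), where P(positive) is expanded over users and non-users by the law of total probability. This sketch just plugs in the numbers from the text.

```python
# Bayes' theorem applied to the drug-test example from the text.
sensitivity = 0.99   # P(positive | user)
specificity = 0.99   # P(negative | non-user)
prevalence = 0.005   # P(user) = 0.5%

# Law of total probability: P(positive) over users and non-users.
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
p_user_given_positive = sensitivity * prevalence / p_positive

print(f"P(user | positive) = {p_user_given_positive:.1%}")  # about 33%
```

The roughly 10 false positives versus 5 true positives per 1000 people tested is exactly why the posterior probability is only about one in three despite the test's 99% accuracy.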
7.
Copula (probability theory)
–
In probability theory and statistics, a copula is a multivariate probability distribution for which the marginal probability distribution of each variable is uniform. Copulas are used to describe the dependence between random variables; their name comes from the Latin for "link" or "tie", similar but unrelated to grammatical copulas in linguistics. Copulas have been used widely in quantitative finance to model and minimize tail risk, and they are popular in high-dimensional statistical applications as they allow one to easily model and estimate the distribution of random vectors by estimating marginals and copulae separately. There are many parametric copula families available, which usually have parameters that control the strength of dependence; some popular parametric copula models are outlined below. Consider a random vector (X_1, X_2, …, X_d) and suppose its marginals are continuous, i.e. the marginal CDFs F_i(x) = P(X_i ≤ x) are continuous functions. By applying the probability integral transform to each component, the random vector (U_1, U_2, …, U_d) = (F_1(X_1), F_2(X_2), …, F_d(X_d)) has uniformly distributed marginals. The copula of (X_1, X_2, …, X_d) is defined as the joint cumulative distribution function of (U_1, U_2, …, U_d). The importance of the above is that the reverse of these steps can be used to generate pseudo-random samples from general classes of multivariate probability distributions. That is, given a procedure to generate a sample (U_1, U_2, …, U_d) from the copula distribution, the required sample can be constructed as (X_1, X_2, …, X_d) = (F_1^{-1}(U_1), F_2^{-1}(U_2), …, F_d^{-1}(U_d)); the inverses F_i^{-1} are unproblematic as the F_i were assumed to be continuous. Sklar's theorem, named after Abe Sklar, provides the foundation for the application of copulas. Sklar's theorem states that every multivariate cumulative distribution function H(x_1, …, x_d) = P(X_1 ≤ x_1, …, X_d ≤ x_d) of a random vector (X_1, X_2, …, X_d) can be expressed in terms of its marginals F_i(x_i) = P(X_i ≤ x_i) and a copula C as H(x_1, …, x_d) = C(F_1(x_1), …, F_d(x_d)). In case the multivariate distribution has a density h, and if this is available, it holds further that h(x_1, …, x_d) = c(F_1(x_1), …, F_d(x_d)) · f_1(x_1) ⋯ f_d(x_d), where c is the density of the copula.
The theorem also states that, given H, the copula is unique on Ran(F_1) × ⋯ × Ran(F_d), the Cartesian product of the ranges of the marginal CDFs; this implies that the copula is unique if the marginals F_i are continuous. The converse is also true: given a copula C : [0, 1]^d → [0, 1] and margins F_i, the function C(F_1(x_1), …, F_d(x_d)) defines a d-dimensional cumulative distribution function. The Fréchet–Hoeffding theorem states that for any copula C : [0, 1]^d → [0, 1] and any (u_1, …, u_d) ∈ [0, 1]^d the following bounds hold: W(u_1, …, u_d) ≤ C(u_1, …, u_d) ≤ M(u_1, …, u_d). The function W is called the lower Fréchet–Hoeffding bound and is defined as W(u_1, …, u_d) = max(1 − d + Σ_{i=1}^d u_i, 0). The function M is called the upper Fréchet–Hoeffding bound and is defined as M(u_1, …, u_d) = min(u_1, …, u_d). The upper bound is sharp: M is always a copula, and it corresponds to comonotone random variables. In two dimensions, i.e. the bivariate case, the Fréchet–Hoeffding theorem states max(u + v − 1, 0) ≤ C(u, v) ≤ min(u, v). Several families of copulae have been described. The Gaussian copula is a distribution over the unit cube [0, 1]^d. It is constructed from a multivariate normal distribution over R^d by using the probability integral transform. While there is no simple analytical formula for the copula function C_R^Gauss, it can be upper or lower bounded.
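The Gaussian copula construction above can be sketched with the standard library alone: draw a correlated bivariate normal, then apply the standard normal CDF componentwise (the probability integral transform) so that each marginal becomes uniform on [0, 1]. The correlation value 0.7 is an illustrative assumption, not a value from the text.

```python
import math
import random

# Sketch: sampling from a bivariate Gaussian copula via the probability
# integral transform. rho = 0.7 is an illustrative choice.
random.seed(1)

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def gaussian_copula_sample(rho):
    # Correlated bivariate normal via a 2x2 Cholesky factor.
    z1 = random.gauss(0, 1)
    z2 = rho * z1 + math.sqrt(1 - rho**2) * random.gauss(0, 1)
    return normal_cdf(z1), normal_cdf(z2)   # (u1, u2) in [0, 1]^2

samples = [gaussian_copula_sample(0.7) for _ in range(10_000)]
mean_u1 = sum(u for u, _ in samples) / len(samples)
print(round(mean_u1, 2))  # near 0.5, since each marginal is uniform
```

Mapping each u_i back through an inverse marginal CDF F_i^{-1} would then yield a sample with those marginals and Gaussian dependence, which is the "reverse of these steps" described above.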
8.
Conditional probability
–
In probability theory, conditional probability is a measure of the probability of an event given that another event has occurred. For example, the probability that any given person has a cough on any given day may be only 5%. But if we know or assume that the person has a cold, then the conditional probability of coughing given that you have a cold might be a much higher 75%. The concept of conditional probability is one of the most fundamental in probability theory, but conditional probabilities can be quite slippery and require careful interpretation. For example, there need not be a causal or temporal relationship between A and B. In general, P(A | B) may or may not be equal to P(A), the unconditional probability of A. If P(A | B) = P(A), then events A and B are said to be independent; in such a case, knowledge about either event does not change the likelihood of the other. Also, in general, P(A | B) is not equal to P(B | A). For example, if you have cancer you might have a 90% chance of testing positive for cancer; in this case what is being measured is the probability of the event A (test is positive) given that the event B (having cancer) has occurred. Alternatively, if you test positive for cancer you may have only a 10% chance of actually having cancer, because cancer is very rare; in this case what is being measured is the probability of the event B (having cancer) given that the event A (test is positive) has occurred. Falsely equating the two probabilities causes various errors of reasoning such as the base rate fallacy. Conditional probabilities can be reversed using Bayes' theorem. The conditional probability of A given B is defined as P(A | B) = P(A ∩ B) / P(B). The logic behind this equation is that if the possible outcomes are restricted to B, the probability of A is rescaled by the probability of B. Note that this is a definition and not a theoretical result: we simply denote the quantity P(A ∩ B) / P(B) as P(A | B) and call it the conditional probability of A given B. Rearranged, this definition gives the multiplication rule P(A ∩ B) = P(A | B) P(B), which introduces a symmetry with the summation axiom for mutually exclusive events. If P(B) = 0, then the conditional probability P(A | B) is undefined. However, it is possible to define a conditional probability with respect to a σ-algebra of such events. The case where B has zero measure is problematic; see conditional expectation for more information.
Conditioning on an event may be generalized to conditioning on a random variable. Let X be a random variable; we assume for the sake of presentation that X is discrete, that is, X takes on only finitely many values x. The conditional probability of A given X is then defined as the random variable, written P(A | X), that takes on the value P(A | X = x) whenever X takes the value x.
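The definition P(A | B) = P(A ∩ B) / P(B) can be checked on a fair die with exact arithmetic; the events below are illustrative choices.

```python
from fractions import Fraction

# Conditional probability on a fair die: P(A | B) = P(A ∩ B) / P(B).
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    return Fraction(len(event), len(sample_space))

def cond_prob(a, b):
    """P(A | B), defined only when P(B) > 0."""
    return prob(a & b) / prob(b)

even = {2, 4, 6}
at_most_4 = {1, 2, 3, 4}

print(cond_prob(even, at_most_4))  # P(even | roll <= 4) = 2/4 = 1/2
print(prob(even))                  # unconditional P(even) is also 1/2
```

Here P(A | B) happens to equal P(A), so these two particular events are independent; choosing, say, B = {1, 2} instead would give P(even | B) = 1/2 as well, but B = {1, 2, 3} would give 1/3 ≠ 1/2, showing dependence.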
9.
Probability measure
–
In mathematics, a probability measure is a real-valued function defined on a set of events in a probability space that satisfies measure properties such as countable additivity. The difference between a probability measure and the more general notion of measure is that a probability measure must assign the value 1 to the entire probability space. Probability measures have applications in diverse fields, from physics to finance. The requirements for a function μ to be a probability measure on a probability space are that: μ must return results in the unit interval [0, 1]; and μ must satisfy the countable additivity property that for all countable collections (E_i)_{i ∈ I} of pairwise disjoint sets, μ(⋃_{i ∈ I} E_i) = Σ_{i ∈ I} μ(E_i). For example, given three elements 1, 2 and 3 with probabilities 1/4, 1/4 and 1/2, the value assigned to {1, 3} is 1/4 + 1/2 = 3/4. The conditional probability based on the intersection of events, defined as P(B | A) = P(A ∩ B) / P(A), satisfies the probability measure requirements so long as P(A) is not zero. In finance, if there is a unique probability measure that must be used to price assets in a market, then the market is called a complete market. Not all measures that intuitively represent chance or likelihood are probability measures; for instance, although the fundamental concept of a system in statistical mechanics is a measure space, such measures are not always probability measures. Probability measures are also used in mathematical biology; for instance, in comparative sequence analysis a probability measure may be defined for the likelihood that a variant may be permissible for an amino acid in a sequence.
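The three-element example above can be implemented directly: define the measure on singletons and extend to other events by additivity. This is an illustrative sketch of the two requirements (values in the unit interval summing to 1 on the whole space, and additivity over disjoint sets).

```python
from fractions import Fraction

# A probability measure on {1, 2, 3}, defined by its values on singletons
# and extended to arbitrary subsets by (finite) additivity.
weights = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 2)}

def mu(event):
    """Measure of a subset of {1, 2, 3}."""
    return sum((weights[x] for x in event), Fraction(0))

print(mu({1, 3}))     # 1/4 + 1/2 = 3/4, as in the text
print(mu({1, 2, 3}))  # total measure 1, as a probability measure requires
```

Disjoint additivity holds by construction here: mu({1} ∪ {3}) equals mu({1}) + mu({3}).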
10.
Joint probability distribution
–
In probability theory, the joint probability distribution for two or more random variables gives the probability that each falls in any particular range or set of values. In the case of only two random variables, this is called a bivariate distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution. The joint probability distribution can be expressed either in terms of a joint cumulative distribution function or in terms of a joint probability density function (in the case of continuous variables) or joint probability mass function (in the case of discrete variables). Consider the flip of two fair coins; let A and B be discrete random variables associated with the outcomes of the first and second coin flips respectively. If a coin displays heads, then the associated random variable is 1, and 0 otherwise. The joint probability mass function of A and B defines probabilities for each pair of outcomes. All possible outcomes are (A = 0, B = 0), (A = 0, B = 1), (A = 1, B = 0), and (A = 1, B = 1). Since each outcome is equally likely, the joint probability mass function becomes P(A, B) = 1/4 when A, B ∈ {0, 1}. Since the coin flips are independent, the joint probability mass function is the product of the marginals: P(A, B) = P(A) P(B). In general, each coin flip is a Bernoulli trial and the sequence of flips follows a Bernoulli distribution. Consider the roll of a fair die and let A = 1 if the number is even (i.e. 2, 4 or 6) and A = 0 otherwise. Furthermore, let B = 1 if the number is prime (i.e. 2, 3 or 5) and B = 0 otherwise. Then the joint distribution of A and B, expressed as a probability mass function, is P(A = 0, B = 0) = P({1}) = 1/6, P(A = 1, B = 0) = P({4, 6}) = 2/6, P(A = 0, B = 1) = P({3, 5}) = 2/6, P(A = 1, B = 1) = P({2}) = 1/6. These probabilities necessarily sum to 1, since the probability of some combination of A and B occurring is 1. The joint probability mass function of two discrete random variables X, Y satisfies P(X = x and Y = y) = P(X = x | Y = y) · P(Y = y) = P(Y = y | X = x) · P(X = x). For continuous random variables with joint density f_{X,Y}, one has ∫_x ∫_y f_{X,Y}(x, y) dy dx = 1; formally, f_{X,Y} is the probability density function of (X, Y) with respect to the product measure on the respective supports of X and Y. Two discrete random variables X and Y are independent if the joint probability mass function satisfies P(X = x and Y = y) = P(X = x) · P(Y = y) for all x and y.
Similarly, two absolutely continuous random variables are independent if f_{X,Y}(x, y) = f_X(x) · f_Y(y) for all x and y. Such conditional independence relations can be represented with a Bayesian network or copula functions.
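The die example above (A = "even", B = "prime") can be tabulated by counting outcomes, which reproduces the joint probability mass function given in the text.

```python
from fractions import Fraction

# Joint distribution of A = "roll is even" and B = "roll is prime" for a
# fair die, built by counting the six equally likely outcomes.
joint = {}
for n in range(1, 7):
    a = 1 if n % 2 == 0 else 0
    b = 1 if n in (2, 3, 5) else 0
    joint[(a, b)] = joint.get((a, b), Fraction(0)) + Fraction(1, 6)

print(joint[(0, 0)])        # P(A=0, B=0) = P({1}) = 1/6
print(joint[(1, 0)])        # P(A=1, B=0) = P({4, 6}) = 2/6
print(sum(joint.values()))  # the four joint probabilities sum to 1
```

Note that these two variables are not independent: P(A=1, B=1) = 1/6, while P(A=1) · P(B=1) = (1/2)(1/2) = 1/4.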
11.
Measure (mathematics)
–
In mathematical analysis, a measure on a set is a systematic way to assign a number to each suitable subset of that set, intuitively interpreted as its size. In this sense, a measure is a generalization of the concepts of length, area, and volume. For instance, the Lebesgue measure of the interval [0, 1] in the real numbers is its length in the everyday sense of the word, specifically 1. Technically, a measure is a function that assigns a non-negative real number or +∞ to (certain) subsets of a set X. It must further be countably additive: the measure of a subset that can be decomposed into a finite (or countably infinite) number of smaller disjoint subsets is the sum of the measures of the smaller subsets. In general, if one wants to associate a consistent size to each subset of a given set while satisfying the other axioms of a measure, one only finds trivial examples like the counting measure. This problem was resolved by defining measure only on a sub-collection of all subsets, the so-called measurable subsets, which are required to form a σ-algebra. This means that countable unions, countable intersections and complements of measurable subsets are measurable. Non-measurable sets in a Euclidean space, on which the Lebesgue measure cannot be defined consistently, are necessarily complicated in the sense of being badly mixed up with their complement; indeed, their existence is a non-trivial consequence of the axiom of choice. Measure theory was developed in successive stages during the late 19th and early 20th centuries by Émile Borel, Henri Lebesgue, Johann Radon and others. The main applications of measures are in the foundations of the Lebesgue integral and in Andrey Kolmogorov's axiomatisation of probability theory. Probability theory considers measures that assign to the whole set the size 1, and considers measurable subsets to be events whose probability is given by the measure. Ergodic theory considers measures that are invariant under, or arise naturally from, a dynamical system. Let X be a set and Σ a σ-algebra over X. A function μ from Σ to the extended real number line is called a measure if it satisfies the following properties. Non-negativity: for all E in Σ, μ(E) ≥ 0. Null empty set: μ(∅) = 0.
Countable additivity (or σ-additivity): for all countable collections {E_k}, k = 1, 2, …, of pairwise disjoint sets in Σ, μ(⋃_{k=1}^∞ E_k) = Σ_{k=1}^∞ μ(E_k). One may require that at least one set E has finite measure; then the empty set automatically has measure zero because of countable additivity, since μ(E) = μ(E ∪ ∅ ∪ ∅ ∪ …) = μ(E) + μ(∅) + μ(∅) + …, which implies (as μ(E) is finite) that μ(∅) = 0. If only the second and third conditions of the definition of measure above are met, the pair (X, Σ) with μ is still of interest (see signed measure); the pair (X, Σ) itself is called a measurable space, and the members of Σ are called measurable sets. If (X, Σ_X) and (Y, Σ_Y) are two measurable spaces, then a function f : X → Y is called measurable if for every Y-measurable set B ∈ Σ_Y, the preimage f^{-1}(B) is in Σ_X. (See also Measurable function#Caveat about another setup.) A triple (X, Σ, μ) is called a measure space. A probability measure is a measure with total measure one, i.e. μ(X) = 1; a probability space is a measure space with a probability measure.
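The counting measure on a finite set is the simplest concrete instance of this definition: every subset is measurable (Σ is the power set) and the measure of a set is its number of elements. This sketch checks additivity and the null empty set directly.

```python
# The counting measure on a finite set X: mu(E) = number of elements of E.
# Here the sigma-algebra is the full power set, so every subset is measurable.
X = {"a", "b", "c", "d"}

def mu(E):
    assert E <= X, "E must be a subset of X"
    return len(E)

# Additivity on pairwise disjoint sets, and mu of the empty set:
E1, E2 = {"a"}, {"b", "c"}
print(mu(E1 | E2), mu(E1) + mu(E2))  # equal, since E1 and E2 are disjoint
print(mu(set()), mu(X))              # 0 4
```

Dividing mu by mu(X) = 4 would turn this into a probability measure, namely the uniform distribution on X.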
12.
Marginal distribution
–
In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset. It gives the probabilities of various values of the variables in the subset without reference to the values of the other variables; this contrasts with a conditional distribution, which gives the probabilities contingent upon the values of the other variables. The term marginal variable is used to refer to those variables in the subset of variables being retained. These terms are dubbed "marginal" because they used to be found by summing values in a table along rows or columns, and writing the sum in the margins of the table. The distribution of the marginal variables is obtained by marginalizing over the distribution of the variables being discarded. Several different analyses may be done, each treating a different subset of variables as the marginal variables. Given two random variables X and Y whose joint distribution is known, the marginal distribution of X is simply the probability distribution of X averaging over information about Y. It is the probability distribution of X when the value of Y is not known; this is typically calculated by summing or integrating the joint probability distribution over Y. For discrete random variables, the marginal probability mass function can be written as Pr(X = x). This is Pr(X = x) = Σ_y Pr(X = x, Y = y) = Σ_y Pr(X = x | Y = y) Pr(Y = y); in this case, the variable Y has been marginalized out. Bivariate marginal and joint probabilities for discrete random variables are often displayed as two-way tables. Similarly, for continuous random variables, the marginal probability density function can be written as p_X(x). This is p_X(x) = ∫_y p_{X,Y}(x, y) dy = ∫_y p_{X|Y}(x | y) p_Y(y) dy. Again, the variable Y has been marginalized out; note that p_X(x) = E_Y[p_{X|Y}(x | Y)], which follows from the definition of expected value. As an example, consider the probability that a pedestrian is hit by a car while crossing the road. Let H be a random variable taking one value from {Hit, Not Hit}. Let L be a random variable taking one value from {Red, Yellow, Green}, the state of the traffic light.
Realistically, H will be dependent on L. That is, P(H = Hit) will take different values depending on whether L is red, yellow or green (and likewise for P(H = Not Hit)). A person is, for example, far more likely to be hit by a car when trying to cross while the lights for cross traffic are green than when they are red. In general, a pedestrian can be hit if the lights are red OR if the lights are yellow OR if the lights are green. So in this case the answer for the marginal probability P(H = Hit) can be found by summing P(H = Hit, L = l) over all possible values of l, with each value of l weighted by its probability of occurring. Here is a table showing the conditional probabilities of being hit, depending on the state of the lights.
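The marginalization above can be sketched numerically. The light probabilities and the conditional probabilities of being hit below are illustrative assumptions (the section's own table is not part of this excerpt), chosen so that being hit is most likely when cross traffic has a green light.

```python
from fractions import Fraction

# Illustrative (assumed) distribution of the light state L ...
p_light = {"red": Fraction(1, 5), "yellow": Fraction(1, 10),
           "green": Fraction(7, 10)}
# ... and assumed conditional probabilities P(H = Hit | L = l).
p_hit_given_light = {"red": Fraction(1, 100), "yellow": Fraction(5, 100),
                     "green": Fraction(20, 100)}

# Marginalize out L: P(H = Hit) = sum over l of P(H = Hit | L = l) P(L = l).
p_hit = sum(p_hit_given_light[l] * p_light[l] for l in p_light)
print(p_hit)  # 147/1000
```

The exact numbers are hypothetical, but the computation is the general pattern: a marginal probability is the probability-weighted sum of the corresponding conditional probabilities.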