A randomized algorithm is an algorithm that employs a degree of randomness as part of its logic. The algorithm uses uniformly random bits as an auxiliary input to guide its behavior, in the hope of achieving good performance in the "average case" over all possible choices of random bits. Formally, the algorithm's performance will be a random variable determined by the random bits. One has to distinguish between algorithms that use the random input so that they always terminate with the correct answer, but where the expected running time is finite (Las Vegas algorithms), and algorithms which have a chance of producing an incorrect result (Monte Carlo algorithms) or of failing to produce a result, either by signaling a failure or by failing to terminate. In the second case, with random performance and random output, the term "algorithm" for such a procedure is somewhat questionable: in the case of random output, it is no longer formally effective. However, in some cases, probabilistic algorithms are the only practical means of solving a problem. In common practice, randomized algorithms are approximated using a pseudorandom number generator in place of a true source of random bits.
As a motivating example, consider the problem of finding an 'a' in an array of n elements. Input: an array of n ≥ 2 elements, in which half are 'a's and the other half are 'b's. Output: find an 'a' in the array. We give one Las Vegas algorithm and one Monte Carlo algorithm (both are sketched below). Las Vegas algorithm: repeatedly pick a random element until an 'a' is found. This algorithm succeeds with probability 1; the number of iterations varies and can be arbitrarily large, but the expected number of iterations is

$$\lim_{n \to \infty} \sum_{i=1}^{n} \frac{i}{2^{i}} = 2.$$

Since this is constant, the expected run time over many calls is $\Theta(1)$. Monte Carlo algorithm: pick a random element up to k times; if an 'a' is found, the algorithm succeeds, else the algorithm fails. After k iterations, the probability of finding an 'a' is

$$\Pr[\text{find an 'a'}] = 1 - \left(\tfrac{1}{2}\right)^{k}.$$

This algorithm does not guarantee success, but the run time is bounded: the number of iterations is always at most k. Taking k to be constant, the run time is $\Theta(1)$. Randomized algorithms are useful when faced with a malicious "adversary" or attacker who deliberately tries to feed a bad input to the algorithm, such as in the Prisoner's dilemma.
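The two algorithms can be sketched in Python as follows (the function names are ours, not from any library):

```python
import random

def find_a_las_vegas(arr):
    # Las Vegas: keep sampling random positions until an 'a' turns up.
    # Always correct on return; with half the entries equal to 'a', the
    # number of iterations is geometrically distributed with expectation 2.
    while True:
        i = random.randrange(len(arr))
        if arr[i] == 'a':
            return i

def find_a_monte_carlo(arr, k):
    # Monte Carlo: at most k random probes; may fail (return None)
    # with probability (1/2)**k.
    for _ in range(k):
        i = random.randrange(len(arr))
        if arr[i] == 'a':
            return i
    return None
```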
It is for this reason that randomness is ubiquitous in cryptography. In cryptographic applications, pseudo-random numbers cannot be used, since the adversary can predict them, making the algorithm effectively deterministic. Therefore, either a source of truly random numbers or a cryptographically secure pseudo-random number generator is required. Another area in which randomness is inherent is quantum computing. In the example above, the Las Vegas algorithm always outputs the correct answer, but its running time is a random variable; the Monte Carlo algorithm is guaranteed to complete in an amount of time that can be bounded by a function of the input size and its parameter k, but allows a small probability of error. Observe that any Las Vegas algorithm can be converted into a Monte Carlo algorithm by having it output an arbitrary, possibly incorrect answer if it fails to complete within a specified time. Conversely, if an efficient verification procedure exists to check whether an answer is correct, then a Monte Carlo algorithm can be converted into a Las Vegas algorithm by running the Monte Carlo algorithm repeatedly until a correct answer is obtained.
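One possible shape of the two conversions just described, assuming the Las Vegas algorithm is packaged as a single randomized attempt and a verifier is available for the Monte Carlo direction (all names here are illustrative):

```python
def monte_carlo_from_las_vegas(attempt, k, fallback):
    # Cap a Las Vegas algorithm (one randomized `attempt` returning an
    # answer or None) at k tries; past the budget, return an arbitrary
    # and possibly incorrect `fallback` answer.
    for _ in range(k):
        answer = attempt()
        if answer is not None:
            return answer
    return fallback

def las_vegas_from_monte_carlo(mc_run, is_correct):
    # Repeat a Monte Carlo algorithm until the verifier accepts; the
    # output is then always correct, but the running time is random.
    while True:
        answer = mc_run()
        if is_correct(answer):
            return answer
```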
Computational complexity theory models randomized algorithms as probabilistic Turing machines. Both Las Vegas and Monte Carlo algorithms are considered, and several complexity classes are studied. The most basic randomized complexity class is RP, the class of decision problems for which there is an efficient (polynomial-time) randomized algorithm which recognizes NO-instances with absolute certainty and recognizes YES-instances with a probability of at least 1/2. The complement class for RP is co-RP. Problem classes having algorithms with polynomial-time average-case running time whose output is always correct are said to be in ZPP. The class of problems for which both YES- and NO-instances are allowed to be identified with some error is called BPP. This class acts as the randomized equivalent of P, i.e. BPP represents the class of efficient randomized algorithms. Historically, the first randomized algorithm was a method developed by Michael O. Rabin for the closest pair problem in computational geometry. The study of randomized algorithms was spurred by the 1977 discovery of a randomized primality test by Robert M. Solovay and Volker Strassen.
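The probability-1/2 threshold in the definition of RP is not restrictive, because one-sided error shrinks geometrically under repetition; a minimal sketch, where rp_decide stands in for any such procedure:

```python
def amplify_rp(rp_decide, x, k):
    # rp_decide never accepts a NO-instance and accepts a YES-instance
    # with probability >= 1/2, so k independent runs miss a YES-instance
    # with probability at most (1/2)**k, while NO-instances are still
    # rejected with certainty.
    return any(rp_decide(x) for _ in range(k))
```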
Soon afterwards Michael O. Rabin demonstrated that Miller's 1976 primality test can be turned into a randomized algorithm. At that time, no practical deterministic algorithm for primality was known. The Miller-Rabin primality test relies on a binary relation between two positive integers k and n that can be expressed by saying that k "is a witness to the compositeness of" n. It can be shown that if there is a witness to the compositeness of n, then n is indeed composite, and that if n is composite, then at least three-quarters of the natural numbers less than n are witnesses to its compositeness.
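A standard Python rendering of the resulting Monte Carlo test (the function name and round count are our choices):

```python
import random

def is_probable_prime(n, rounds=40):
    # Miller-Rabin: a 'composite' answer is always correct; 'probably
    # prime' is wrong with probability at most (1/4)**rounds, since at
    # least three-quarters of the candidate bases witness a composite n.
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 = 2**r * d with d odd.
    r, d = 0, n - 1
    while d % 2 == 0:
        r += 1
        d //= 2
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness to the compositeness of n
    return True  # no witness found: n is probably prime
```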
In statistics, quality assurance, and survey methodology, sampling is the selection of a subset of individuals from within a statistical population to estimate characteristics of the whole population. Statisticians attempt to collect samples that are representative of the population in question. Two advantages of sampling are lower cost and faster data collection than measuring the entire population. Each observation measures one or more properties of observable bodies distinguished as independent objects or individuals. In survey sampling, weights can be applied to the data to adjust for the sample design, particularly in stratified sampling. Results from probability theory and statistical theory are employed to guide the practice. In business and medical research, sampling is used for gathering information about a population. Acceptance sampling is used to determine if a production lot of material meets the governing specifications. Successful statistical practice is based on focused problem definition. In sampling, this includes defining the "population".
A population can be defined as including all people or items with the characteristic one wishes to understand. Because there is rarely enough time or money to gather information from everyone or everything in a population, the goal becomes finding a representative sample of that population. Sometimes what defines a population is obvious. For example, a manufacturer needs to decide whether a batch of material from production is of high enough quality to be released to the customer, or should be sentenced for scrap or rework due to poor quality. In this case, the batch is the population. Although the population of interest often consists of physical objects, sometimes it is necessary to sample over time, space, or some combination of these dimensions. For instance, an investigation of supermarket staffing could examine checkout line length at various times, or a study on endangered penguins might aim to understand their usage of various hunting grounds over time. For the time dimension, the focus may be on discrete occasions.
In other cases, the examined 'population' may be less tangible. For example, Joseph Jagger studied the behaviour of roulette wheels at a casino in Monte Carlo, and used this to identify a biased wheel. In this case, the 'population' Jagger wanted to investigate was the overall behaviour of the wheel, while his 'sample' was formed from observed results from that wheel. Similar considerations arise when taking repeated measurements of some physical characteristic such as the electrical conductivity of copper; this situation arises when seeking knowledge about the cause system of which the observed population is an outcome. In such cases, sampling theory may treat the observed population as a sample from a larger 'superpopulation'. For example, a researcher might study the success rate of a new 'quit smoking' program on a test group of 100 patients, in order to predict the effects of the program if it were made available nationwide. Here the superpopulation is "everybody in the country, given access to this treatment" – a group which does not yet exist, since the program isn't yet available to all.
Note that the population from which the sample is drawn may not be the same as the population about which information is desired. Often there is a large but not complete overlap between these two groups due to frame issues, etc. Sometimes they may be entirely separate – for instance, one might study rats in order to get a better understanding of human health, or one might study records from people born in 2008 in order to make predictions about people born in 2009. Time spent making the sampled population and the population of concern precise is well spent, because it raises many issues and questions that would otherwise have been overlooked at this stage. In the most straightforward case, such as the sampling of a batch of material from production, it would be most desirable to identify and measure every single item in the population and to include any one of them in our sample. However, in the more general case this is not possible or practical. There is no way to identify all rats in the set of all rats. Where voting is not compulsory, there is no way to identify which people will vote at a forthcoming election.
These imprecise populations are not amenable to sampling in any of the ways below, and we could not apply statistical theory to them. As a remedy, we seek a sampling frame which has the property that we can identify every single element and include any of them in our sample. The most straightforward type of frame is a list of elements of the population with appropriate contact information. For example, in an opinion poll, possible sampling frames include an electoral register and a telephone directory. A probability sample is a sample in which every unit in the population has a chance of being selected in the sample, and this probability can be accurately determined. The combination of these traits makes it possible to produce unbiased estimates of population totals, by weighting sampled units according to their probability of selection. Example: we want to estimate the total income of adults living in a given street. We visit each household in that street, identify all adults living there, and randomly select one adult from each household. We then interview the selected person and find their income.
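A sketch of the inverse-probability weighting this example relies on (the household figures are invented for illustration):

```python
import random

def estimate_total_income(households):
    # An adult in a household of m adults is selected with probability
    # 1/m, so their income is weighted by m (the inverse of that
    # probability); over repeated runs the estimate is unbiased for the
    # true street total.
    total = 0.0
    for incomes in households:
        selected = random.choice(incomes)  # one adult per household
        total += selected * len(incomes)   # weight = 1 / selection probability
    return total

# Hypothetical street: a single adult, a couple, and a three-adult household.
street = [[30000.0], [25000.0, 40000.0], [20000.0, 35000.0, 50000.0]]
print(estimate_total_income(street))  # unbiased estimate of the true 200000
```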
Travelling salesman problem
The travelling salesman problem (TSP) asks the following question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city and returns to the origin city?" It is an NP-hard problem in combinatorial optimization, important in operations research and theoretical computer science. The travelling purchaser problem and the vehicle routing problem are both generalizations of TSP. In the theory of computational complexity, the decision version of the TSP (where, given a length L, the task is to decide whether there is a tour of length at most L) belongs to the class of NP-complete problems. Thus, it is possible that the worst-case running time for any algorithm for the TSP increases superpolynomially with the number of cities. The problem was first formulated in 1930 and is one of the most intensively studied problems in optimization. It is used as a benchmark for many optimization methods. Though the problem is computationally difficult, a large number of heuristics and exact algorithms are known, so that some instances with tens of thousands of cities can be solved exactly and problems with millions of cities can be approximated within a small fraction of 1%.
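To make the growth rate concrete, the naive exact method enumerates all tours; a minimal sketch, assuming distances are given as a matrix dist[i][j]:

```python
import itertools

def brute_force_tsp(dist):
    # Exact solution by enumerating every tour that starts and ends at
    # city 0: (n - 1)! permutations, which is why the running time grows
    # superpolynomially and the approach is hopeless beyond a dozen cities.
    n = len(dist)
    best_tour, best_len = None, float('inf')
    for perm in itertools.permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len
```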
The TSP has several applications even in its purest formulation, such as planning, logistics, and the manufacture of microchips. Slightly modified, it appears as a sub-problem in many areas, such as DNA sequencing. In these applications, the concept 'city' represents, for example, soldering points or DNA fragments, and the concept 'distance' represents travelling times or cost, or a similarity measure between DNA fragments. The TSP also appears in astronomy, as astronomers observing many sources will want to minimize the time spent moving the telescope between the sources. In many applications, additional constraints such as limited resources or time windows may be imposed. The origins of the travelling salesman problem are unclear. A handbook for travelling salesmen from 1832 mentions the problem and includes example tours through Germany and Switzerland, but contains no mathematical treatment. The travelling salesman problem was mathematically formulated in the 1800s by the Irish mathematician W. R. Hamilton and by the British mathematician Thomas Kirkman.
Hamilton’s Icosian Game was a recreational puzzle based on finding a Hamiltonian cycle. The general form of the TSP appears to have been first studied by mathematicians during the 1930s in Vienna and at Harvard, notably by Karl Menger, who defined the problem, considered the obvious brute-force algorithm, and observed the non-optimality of the nearest neighbour heuristic (sketched in code below): "We denote by messenger problem the task to find, for finitely many points whose pairwise distances are known, the shortest route connecting the points. Of course, this problem is solvable by finitely many trials. Rules which would push the number of trials below the number of permutations of the given points are not known. The rule that one first should go from the starting point to the closest point, then to the point closest to this, etc., in general does not yield the shortest route." It was first considered mathematically in the 1930s by Merrill M. Flood, who was looking to solve a school bus routing problem. Hassler Whitney at Princeton University introduced the name travelling salesman problem soon after.
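Menger's nearest-neighbour rule in code, a minimal sketch assuming the same distance-matrix convention as above:

```python
def nearest_neighbour_tour(dist, start=0):
    # Repeatedly hop to the closest unvisited city. Fast (O(n^2)) but,
    # as Menger noted, "in general does not yield the shortest route".
    n = len(dist)
    unvisited = set(range(n)) - {start}
    tour = [start]
    while unvisited:
        nxt = min(unvisited, key=lambda j: dist[tour[-1]][j])
        unvisited.remove(nxt)
        tour.append(nxt)
    tour.append(start)  # return to the origin city
    return tour
```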
In the 1950s and 1960s, the problem became increasingly popular in scientific circles in Europe and the USA after the RAND Corporation in Santa Monica offered prizes for steps in solving the problem. Notable contributions were made by George Dantzig, Delbert Ray Fulkerson and Selmer M. Johnson from the RAND Corporation, who expressed the problem as an integer linear program and developed the cutting plane method for its solution. They wrote what is considered the seminal paper on the subject, in which, with these new methods, they solved an instance with 49 cities to optimality by constructing a tour and proving that no other tour could be shorter. Dantzig, Fulkerson and Johnson speculated that, given a near-optimal solution, one may be able to find optimality or prove optimality by adding a small number of extra inequalities (cuts); they used this idea to solve their initial 49-city problem using a string model, and found that only a few cuts were needed. While this paper did not give an algorithmic approach to TSP problems, the ideas that lay within it were indispensable to creating exact solution methods for the TSP, though it would take 15 years to find an algorithmic approach to creating these cuts.
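In modern notation, the Dantzig-Fulkerson-Johnson integer linear program for a symmetric instance with edge costs $c_{ij}$ can be written as follows (writing $x_{ij} = x_{ji}$; the subtour elimination constraints in the middle line are exactly what the cutting plane method adds on demand):

```latex
\begin{aligned}
\min \quad & \sum_{i < j} c_{ij} x_{ij} \\
\text{s.t.} \quad & \sum_{j \neq i} x_{ij} = 2 && \text{for every city } i, \\
& \sum_{i, j \in S,\, i < j} x_{ij} \le |S| - 1 && \text{for every } S \subsetneq V \text{ with } |S| \ge 3, \\
& x_{ij} \in \{0, 1\}.
\end{aligned}
```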
As well as cutting plane methods, Dantzig, Fulkerson and Johnson used branch and bound algorithms, perhaps for the first time. In the following decades, the problem was studied by many researchers from mathematics, computer science, chemistry and other sciences. In the 1960s, however, a new approach was created: instead of seeking optimal solutions, one would produce a solution whose length is provably bounded by a multiple of the optimal length, and in doing so create lower bounds for the problem. One method of doing this was to create a minimum spanning tree of the graph and double all its edges, which produces the bound that the length of an optimal tour is at most twice the weight of a minimum spanning tree (a sketch of this follows below). Christofides made a big advance in this approach, giving a method with a known worst-case guarantee: the Christofides algorithm, given in 1976, produces a tour at worst 1.5 times longer than the optimal solution. As the algorithm was so simple and quick, many hoped it would give way to a near-optimal solution method.
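A compact sketch of that double-tree bound, assuming a symmetric distance matrix obeying the triangle inequality (Prim's algorithm and the preorder shortcut are our choice of concrete steps):

```python
def double_tree_tour(dist):
    # Build a minimum spanning tree (Prim's algorithm), conceptually
    # double its edges, and shortcut the resulting walk into a tour via
    # a DFS preorder; with triangle-inequality distances the tour is at
    # most twice the optimal length.
    n = len(dist)
    in_tree = [False] * n
    parent = [0] * n
    cost = [float('inf')] * n
    cost[0] = 0.0
    children = [[] for _ in range(n)]
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=cost.__getitem__)
        in_tree[u] = True
        if u != 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist[u][v] < cost[v]:
                cost[v], parent[v] = dist[u][v], u
    # Preorder walk of the tree = doubled-edge walk with shortcuts.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    tour.append(0)  # close the tour at the starting city
    return tour
```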
The Christofides algorithm remains the method with the best worst-case guarantee.