In computer science and information theory, Tunstall coding is a form of entropy coding used for lossless data compression. Tunstall coding was the subject of Brian Parker Tunstall's PhD thesis in 1967, while at Georgia Institute of Technology; the subject of that thesis was "Synthesis of noiseless compression codes". Its design is a precursor to Lempel–Ziv coding. Unlike variable-length codes, which include Huffman and Lempel–Ziv coding, Tunstall coding is a code which maps source symbols to a fixed number of bits. Both Tunstall codes and Lempel–Ziv codes represent variable-length words by fixed-length codes. Unlike typical set encoding, Tunstall coding parses a stochastic source with codewords of variable length. It can be shown that, for a large enough dictionary, the number of bits per source letter can be arbitrarily close to H, the entropy of the source. The algorithm requires as input an alphabet U, along with a distribution of probabilities for each letter of the alphabet. It also requires an arbitrary constant C, an upper bound on the size of the dictionary that it will compute.
The dictionary in question, D, is constructed as a tree of probabilities, in which each edge is associated with a letter from the input alphabet. The algorithm goes like this: D := tree of |U| leaves, one for each letter in U; while |D| < C, convert the most probable leaf to a tree with |U| leaves. Let's imagine that we wish to encode the string "hello, world". Let's further assume that the input alphabet U contains only characters from the string "hello, world", that is, 'h', 'e', 'l', ',', ' ', 'w', 'o', 'r', 'd'. We can therefore compute the probability of each character based on its statistical appearance in the input string. For instance, the letter 'l' appears three times in a string of 12 characters: its probability is 3/12. We initialize the tree, starting with a tree of |U| = 9 leaves; each word is therefore directly associated with a letter of the alphabet. The 9 words that we thus obtain can be encoded into a fixed-sized output of ⌈log2(9)⌉ = 4 bits. We then take the leaf of highest probability and convert it to yet another tree of |U| = 9 leaves, one for each character.
We re-compute the probabilities of those leaves. For instance, the two-letter sequence 'll' happens once; given that the letter 'l' occurs three times and is followed by another 'l' only once, the resulting probability is 1/3 ⋅ 3/12 = 1/12. This yields the expanded dictionary of 17 words. Note that we could iterate further, increasing the number of words by |U| − 1 = 8 every time. Tunstall coding requires the algorithm to know, prior to the parsing operation, what the distribution of probabilities for each letter of the alphabet is; this issue is shared with Huffman coding. Its requiring a fixed-length block output makes it inferior to Lempel–Ziv coding, which has a similar dictionary-based design but a variable-sized block output.
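As a rough illustration of the dictionary-building loop described above, here is a minimal Python sketch under the same memoryless source model estimated from the input string; the function name tunstall_dictionary and the parameter max_size are illustrative, not taken from the original description.

```python
from collections import Counter
from math import ceil, log2

def tunstall_dictionary(text, max_size):
    """Build a Tunstall dictionary (set of variable-length words) for text."""
    counts = Counter(text)
    total = len(text)
    probs = {ch: n / total for ch, n in counts.items()}

    # Start with one dictionary word (leaf) per source letter.
    leaves = dict(probs)

    # Keep splitting while the expanded dictionary still fits within max_size.
    while len(leaves) + len(probs) - 1 <= max_size:
        word = max(leaves, key=leaves.get)      # most probable leaf
        p = leaves.pop(word)
        for ch, q in probs.items():             # replace it by |U| children
            leaves[word + ch] = p * q
    return leaves

dictionary = tunstall_dictionary("hello, world", max_size=17)
bits = ceil(log2(len(dictionary)))              # fixed-length codeword size
for index, word in enumerate(sorted(dictionary)):
    print(f"{index:0{bits}b} -> {word!r}")
```

With max_size = 17 the loop performs exactly one expansion, splitting the most probable leaf 'l' into nine two-letter words, which reproduces the single step worked through in the example above; the 17 resulting words are then assigned fixed 5-bit codewords.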
David J. C. MacKay
Sir David John Cameron MacKay was a British physicist and academic. He was the Regius Professor of Engineering in the Department of Engineering at the University of Cambridge and from 2009 to 2014 was Chief Scientific Adviser to the UK Department of Energy and Climate Change. MacKay was the author of the book Sustainable Energy – Without the Hot Air. MacKay was educated at Newcastle High School and represented Britain in the International Physics Olympiad in Yugoslavia in 1985, receiving the first prize for experimental work. He continued his education at Trinity College, Cambridge, and received a Bachelor of Arts degree in Natural Sciences in 1988. He went to the California Institute of Technology as a Fulbright Scholar, where his supervisor was John Hopfield, and was awarded a PhD in 1992. In January 1992 MacKay was appointed the Royal Society Smithson Research Fellow at Darwin College, Cambridge, continuing his cross-disciplinary research in the Cavendish Laboratory, the Department of Physics of the University of Cambridge.
In 1995 he was made a University Lecturer in the Cavendish Laboratory. He was promoted in 1999 to a Readership, in 2003 to a Professorship in Natural Philosophy, and in 2013 to the post of Regius Professor of Engineering. MacKay's contributions in machine learning and information theory include the development of Bayesian methods for neural networks, the rediscovery of low-density parity-check codes, and the invention of Dasher, a software application for communication popular with those who cannot use a traditional keyboard. He also cofounded the knowledge management company Transversal. In 2003, his book Information Theory, Inference, and Learning Algorithms was published. His interests beyond research included the development of effective teaching methods and African development. In 2008 he completed a book on energy consumption and energy production without fossil fuels called Sustainable Energy – Without the Hot Air. MacKay used £10,000 of his own money to publish the book; the initial print run of 5,000 sold within days.
The book received praise from The Economist, The Guardian, and Bill Gates, who called it "one of the best books on energy that has been written." Like his textbook on information theory, MacKay made the book available for free online. In March 2012 he gave a TED talk on renewable energy. MacKay was appointed Chief Scientific Advisor of the Department of Energy and Climate Change, United Kingdom, in September 2009. In October 2014, at the end of his five-year term, he was succeeded by John Loughhead. MacKay was elected a Fellow of the Royal Society in 2009. In the 2016 New Year Honours, MacKay was appointed a Knight Bachelor "for services to Scientific Advice in Government and Science Outreach" and was therefore granted the title of Sir. David MacKay was born the fifth child of Donald MacCrimmon MacKay and Valerie MacKay; his elder brother Robert S. MacKay FRS is Professor of Mathematics at the University of Warwick. David was a vegetarian. He married Ramesh Ghiassi in 2011, and they had a daughter.
MacKay was diagnosed with inoperable stomach cancer in July 2015, for which he underwent palliative chemotherapy, a process he documented in detail on his public personal blog. He died in the afternoon of 14 April 2016 and is survived by his wife and two children.
Probability is the measure of the likelihood that an event will occur (see also the glossary of probability and statistics). Probability is quantified as a number between 0 and 1, where, loosely speaking, 0 indicates impossibility and 1 indicates certainty; the higher the probability of an event, the more likely it is that the event will occur. A simple example is the tossing of a fair coin. Since the coin is fair, the two outcomes ("heads" and "tails") are equally probable. These concepts have been given an axiomatic mathematical formalization in probability theory, which is used in such areas of study as mathematics, finance, science, artificial intelligence/machine learning, computer science, game theory, and philosophy to, for example, draw inferences about the expected frequency of events. Probability theory is also used to describe the underlying mechanics and regularities of complex systems. When dealing with experiments that are random and well-defined in a purely theoretical setting, probabilities can be numerically described by the number of desired outcomes divided by the total number of all outcomes.
For example, tossing a fair coin twice will yield "head-head", "head-tail", "tail-head", and "tail-tail" outcomes. The probability of getting an outcome of "head-head" is 1 out of 4 outcomes, or, in numerical terms, 1/4, 0.25, or 25%. However, when it comes to practical application, there are two major competing categories of probability interpretations, whose adherents possess different views about the fundamental nature of probability: Objectivists assign numbers to describe some objective or physical state of affairs. The most popular version of objective probability is frequentist probability, which claims that the probability of a random event denotes the relative frequency of occurrence of an experiment's outcome when the experiment is repeated. This interpretation considers probability to be the relative frequency "in the long run" of outcomes. A modification of this is propensity probability, which interprets probability as the tendency of some experiment to yield a certain outcome even if it is performed only once.
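The two-toss figure above can be checked by brute-force enumeration; the following is a tiny Python sketch (the variable names are illustrative).

```python
from itertools import product

# Enumerate the four equally likely outcomes of two fair coin tosses.
outcomes = list(product("HT", repeat=2))
p_head_head = outcomes.count(("H", "H")) / len(outcomes)
print(outcomes)        # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]
print(p_head_head)     # 0.25, i.e. 1 outcome out of 4
```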
Subjectivists assign numbers according to subjective probability, that is, as a degree of belief. The degree of belief has been interpreted as "the price at which you would buy or sell a bet that pays 1 unit of utility if E, 0 if not E." The most popular version of subjective probability is Bayesian probability, which includes expert knowledge as well as experimental data to produce probabilities. The expert knowledge is represented by some prior probability distribution, and the data are incorporated in a likelihood function. The product of the prior and the likelihood results in a posterior probability distribution that incorporates all the information known to date. By Aumann's agreement theorem, Bayesian agents whose prior beliefs are similar will end up with similar posterior beliefs. However, sufficiently different priors can lead to different conclusions regardless of how much information the agents share. The word probability derives from the Latin probabilitas, which can also mean "probity", a measure of the authority of a witness in a legal case in Europe, often correlated with the witness's nobility.
In a sense, this differs much from the modern meaning of probability, which, in contrast, is a measure of the weight of empirical evidence, and is arrived at from inductive reasoning and statistical inference. The scientific study of probability is a modern development of mathematics. Gambling shows that there has been an interest in quantifying the ideas of probability for millennia, but exact mathematical descriptions arose much later. There are reasons for the slow development of the mathematics of probability. Whereas games of chance provided the impetus for the mathematical study of probability, fundamental issues are still obscured by the superstitions of gamblers. According to Richard Jeffrey, "Before the middle of the seventeenth century, the term 'probable' meant approvable, and was applied in that sense, unequivocally, to opinion and to action. A probable action or opinion was one such as sensible people would undertake or hold, in the circumstances." However, in legal contexts especially, 'probable' could apply to propositions for which there was good evidence.
The sixteenth-century Italian polymath Gerolamo Cardano demonstrated the efficacy of defining odds as the ratio of favourable to unfavourable outcomes. Aside from the elementary work by Cardano, the doctrine of probabilities dates to the correspondence of Pierre de Fermat and Blaise Pascal. Christiaan Huygens gave the earliest known scientific treatment of the subject. Jakob Bernoulli's Ars Conjectandi and Abraham de Moivre's Doctrine of Chances treated the subject as a branch of mathematics. See Ian Hacking's The Emergence of Probability and James Franklin's The Science of Conjecture for histories of the early development of the concept of mathematical probability. The theory of errors may be traced back to Roger Cotes's Opera Miscellanea, but a memoir prepared by Thomas Simpson in 1755 first applied the theory to the discussion of errors of observation. The reprint of this memoir lays down the axioms that positive and negative errors are equally probable, and that certain assignable limits define the range of all errors.
Simpson discusses c
International Standard Serial Number
An International Standard Serial Number (ISSN) is an eight-digit serial number used to uniquely identify a serial publication, such as a magazine. The ISSN is helpful in distinguishing between serials with the same title. ISSNs are used in ordering, interlibrary loans, and other practices in connection with serial literature. The ISSN system was first drafted as an International Organization for Standardization (ISO) international standard in 1971 and published as ISO 3297 in 1975. ISO subcommittee TC 46/SC 9 is responsible for maintaining the standard. When a serial with the same content is published in more than one media type, a different ISSN is assigned to each media type. For example, many serials are published in both print and electronic media; the ISSN system refers to these types as print ISSN and electronic ISSN, respectively. Conversely, as defined in ISO 3297:2007, every serial in the ISSN system is also assigned a linking ISSN (ISSN-L), typically the same as the ISSN assigned to the serial in its first published medium, which links together all ISSNs assigned to the serial in every medium.
The format of the ISSN is an eight-digit code, divided by a hyphen into two four-digit numbers. As an integer number, it can be represented by the first seven digits; the last code digit, which may be 0-9 or an X, is a check digit. Formally, the general form of the ISSN code can be expressed as follows: NNNN-NNNC, where N is in the set {0, 1, 2, ..., 9}, a digit character, and C is in {0, 1, 2, ..., 9, X}. The ISSN of the journal Hearing Research, for example, is 0378-5955, where the final 5 is the check digit, so C = 5. To calculate the check digit, the following algorithm may be used: calculate the sum of the first seven digits of the ISSN, each multiplied by its position in the number counting from the right (that is, by 8, 7, 6, 5, 4, 3, 2, respectively): 0⋅8 + 3⋅7 + 7⋅6 + 8⋅5 + 5⋅4 + 9⋅3 + 5⋅2 = 0 + 21 + 42 + 40 + 20 + 27 + 10 = 160. The remainder of this sum modulo 11 is then calculated; if the remainder is 0, the check digit is 0, and otherwise the remainder is subtracted from 11 to give the check digit (here 160 mod 11 = 6 and 11 − 6 = 5). For calculations, an upper case X in the check digit position indicates a check digit of 10. To confirm the check digit, calculate the sum of all eight digits of the ISSN, each multiplied by its position in the number, counting from the right.
The remainder of this sum modulo 11 must then be 0. There is an online ISSN checker. ISSN codes are assigned by a network of ISSN National Centres located at national libraries and coordinated by the ISSN International Centre based in Paris. The International Centre is an intergovernmental organization created in 1974 through an agreement between UNESCO and the French government. The International Centre maintains a database of all ISSNs assigned worldwide, the ISDS Register, otherwise known as the ISSN Register. At the end of 2016, the ISSN Register contained records for 1,943,572 items. ISSN and ISBN codes are similar in concept. An ISBN might be assigned for particular issues of a serial, in addition to the ISSN code for the serial as a whole. An ISSN, unlike the ISBN code, is an anonymous identifier associated with a serial title, containing no information as to the publisher or its location. For this reason a new ISSN is assigned to a serial each time it undergoes a major title change. Since the ISSN applies to an entire serial, a new identifier, the Serial Item and Contribution Identifier (SICI), was built on top of it to allow references to specific volumes, articles, or other identifiable components.
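The check-digit rule described above is easy to express in code; the following is a minimal Python sketch, where the function names issn_check_digit and issn_is_valid are illustrative and not part of the standard.

```python
def issn_check_digit(first_seven):
    """Compute the ISSN check digit from the first seven digits."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    remainder = total % 11
    if remainder == 0:
        return "0"
    return "X" if remainder == 1 else str(11 - remainder)

def issn_is_valid(issn):
    """Check an ISSN such as '0378-5955' (the hyphen is ignored)."""
    digits = issn.replace("-", "")
    return issn_check_digit(digits[:7]) == digits[7].upper()

print(issn_check_digit("0378595"))   # '5', matching the Hearing Research example
print(issn_is_valid("0378-5955"))    # True
```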
Separate ISSNs are needed for serials in different media. Thus, the print and electronic media versions of a serial need separate ISSNs, and a CD-ROM version and a web version of a serial require different ISSNs since two different media are involved. However, the same ISSN can be used for different file formats of the same online serial. This "media-oriented identification" of serials made sense in the 1970s. In the 1990s and onward, with personal computers, better screens, and the Web, it makes sense to consider only content, independent of media. This "content-oriented identification" of serials was a repressed demand for a decade, but no ISSN update or initiative occurred. A natural extension of the ISSN, the unique identification of the articles in the serials, was the main demanded application. An alternative model for serials' contents arrived with the indecs Content Model and its application, the digital object identifier (DOI), an ISSN-independent initiative consolidated in the 2000s. Only in 2007 was the ISSN-L defined, in the new ISSN standard ISO 3297:2007.
Range encoding is an entropy coding method defined by G. Nigel N. Martin in a 1979 paper, which rediscovered the FIFO arithmetic code first introduced by Richard Clark Pasco in 1976. Given a stream of symbols and their probabilities, a range coder produces a space-efficient stream of bits to represent these symbols and, given the stream and the probabilities, a range decoder reverses the process. Range coding is similar to arithmetic encoding, except that encoding is done with digits in any base, instead of with bits, so it is faster when using larger bases at small cost in compression efficiency. After the expiration of the first arithmetic coding patent, range encoding appeared to be free of patent encumbrances; this drove interest in the technique in the open source community. Since that time, patents on various well-known arithmetic coding techniques have expired. Range encoding conceptually encodes all the symbols of the message into one number, unlike Huffman coding which assigns each symbol a bit-pattern and concatenates all the bit-patterns together.
Thus range encoding can achieve greater compression ratios than the one-bit-per-symbol lower bound on Huffman encoding, and it does not suffer the inefficiencies that Huffman does when dealing with probabilities that are not exact powers of two. The central concept behind range encoding is this: given a large-enough range of integers and a probability estimation for the symbols, the initial range can be divided into sub-ranges whose sizes are proportional to the probability of the symbol they represent. Each symbol of the message can then be encoded in turn, by reducing the current range down to just that sub-range which corresponds to the next symbol to be encoded. The decoder must have the same probability estimation the encoder used, which can either be sent in advance, derived from already-transferred data, or be part of the compressor and decompressor. When all symbols have been encoded, identifying the sub-range is enough to communicate the entire message. A single integer is sufficient to identify the sub-range, and it may not even be necessary to transmit the entire integer.
Suppose we want to encode the message "AABA<EOM>", where <EOM> is the end-of-message symbol. For this example it is assumed that the decoder knows that we intend to encode five symbols in the base-10 number system (five digits) using the probability distribution {A: 0.60, B: 0.20, <EOM>: 0.20}. The encoder breaks down the range [0, 100000) into three subranges: A: [0, 60000); B: [60000, 80000); <EOM>: [80000, 100000). Since our first symbol is an A, it reduces our initial range down to [0, 60000). The second symbol choice leaves us with three sub-ranges of this range; we show them following the already-encoded 'A': AA: [0, 36000); AB: [36000, 48000); A<EOM>: [48000, 60000). With two symbols encoded, our range is now [0, 36000), and our third symbol leads to the following choices: AAA: [0, 21600); AAB: [21600, 28800); AA<EOM>: [28800, 36000). This time it is the second of our three choices that represents the message we want to encode, and our range becomes [21600, 28800). It may look harder to determine our sub-ranges in this case, but it is not: we can simply subtract the lower bound from the upper bound to determine that there are 7200 numbers in our range.
Adding back the lower bound gives us our ranges: AABA: [21600, 25920); AABB: [25920, 27360); AAB<EOM>: [27360, 28800). Finally, with our range narrowed down to [21600, 25920), we have just one more symbol to encode. Using the same technique as before for dividing up the range between the lower and upper bound, we find that the three sub-ranges are: AABAA: [21600, 24192); AABAB: [24192, 25056); AABA<EOM>: [25056, 25920). And since <EOM> is our final symbol, our final range is [25056, 25920). Because all five-digit integers starting with "251" fall within our final range, "251" is one of the three-digit prefixes we could transmit that would unambiguously convey our original message. The central problem may appear to be selecting an initial range large enough that, no matter how many symbols we have to encode, we will always have a current range large enough to divide into non-zero sub-ranges. In practice, this is not a problem, because instead of starting with a very large range and gradually narrowing it down, the encoder works with a smaller range of numbers at any given time.
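The sub-range narrowing worked through above can be reproduced mechanically. Here is a small Python sketch, assuming the integer symbol frequencies A: 6, B: 2, <EOM>: 2 (out of 10) that the example's sub-ranges imply; the variable names are illustrative.

```python
# Reproduce the interval narrowing for the message "AABA<EOM>".
freqs = {"A": 6, "B": 2, "<EOM>": 2}   # cumulative starts: A = 0, B = 6, <EOM> = 8
total = sum(freqs.values())

low, size = 0, 100000                  # initial range [0, 100000)
for symbol in ["A", "A", "B", "A", "<EOM>"]:
    start = 0
    for sym, f in freqs.items():
        if sym == symbol:
            low, size = low + size * start // total, size * f // total
            break
        start += f
    print(symbol, [low, low + size])
# The last line prints "<EOM> [25056, 25920]", matching the final range above.
```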
After some number of digits have been encoded, the leftmost digits will not change. In the example, after encoding just three symbols we already knew that our final result would start with "2". More digits are then shifted in on the right as the leading digits are emitted; this digit-shifting step is illustrated in the sketch below. To finish off we may need to emit a few extra digits: the top digit of low is probably too small, so we need to increment it, but we have to make sure we don't increment it past low + range, so first we need to make sure range is large enough. One problem that can occur with such an encode function is that range might become very small while low and low + range still have differing leading digits.
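As a rough stand-in for the code referred to above, here is a simplified base-10 range encoder sketch in Python; it assumes integer symbol frequencies, deliberately ignores the carry/underflow case just described, and the names used (encode, TOP, freqs) are illustrative rather than taken from the original listing.

```python
TOP = 100000                          # five decimal digits of working precision

def encode(symbols, freqs):
    total = sum(freqs.values())
    low, size, out = 0, TOP, ""
    for symbol in symbols:
        start = 0
        for sym, f in freqs.items():  # narrow [low, low + size) to the symbol's sub-range
            if sym == symbol:
                low, size = low + size * start // total, size * f // total
                break
            start += f
        # Digit shifting: once the range is small, emit the leading digit of low
        # and shift a new digit in on the right (carries are not handled here).
        while size <= TOP // 10:
            out += str(low // (TOP // 10))
            low = (low % (TOP // 10)) * 10
            size *= 10
    return out + str(low).zfill(5)    # finish off by emitting the remaining digits

print(encode(["A", "A", "B", "A", "<EOM>"], {"A": 6, "B": 2, "<EOM>": 2}))
# prints "2505600"; its leading digits 25056 lie in the final range [25056, 25920)
```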
Claude Elwood Shannon was an American mathematician, electrical engineer, and cryptographer known as "the father of information theory". Shannon is noted for having founded information theory with a landmark paper, A Mathematical Theory of Communication, which he published in 1948. He is also well known for founding digital circuit design theory in 1937, when, as a 21-year-old master's degree student at the Massachusetts Institute of Technology (MIT), he wrote his thesis demonstrating that electrical applications of Boolean algebra could construct any logical numerical relationship. Shannon contributed to the field of cryptanalysis for national defense during World War II, including his fundamental work on codebreaking and secure telecommunications. Shannon was born in Petoskey, Michigan, and grew up in Gaylord, Michigan. His father, Claude Sr., a descendant of early settlers of New Jersey, was a self-made businessman and, for a while, a Judge of Probate. Shannon's mother, Mabel Wolf Shannon, was a language teacher who also served as the principal of Gaylord High School.
Most of the first 16 years of Shannon's life were spent in Gaylord, where he attended public school, graduating from Gaylord High School in 1932. Shannon showed an inclination towards electrical things, and his best subjects were science and mathematics. At home he constructed such devices as models of planes, a radio-controlled model boat, and a barbed-wire telegraph system to a friend's house a half-mile away. While growing up, he worked as a messenger for the Western Union company. His childhood hero was Thomas Edison, who, he later learned, was a distant cousin: both Shannon and Edison were descendants of John Ogden, a colonial leader and an ancestor of many distinguished people. Shannon was an atheist. In 1932, Shannon entered the University of Michigan, where he was introduced to the work of George Boole. He graduated in 1936 with two bachelor's degrees: one in electrical engineering and the other in mathematics. In 1936, Shannon began his graduate studies in electrical engineering at MIT, where he worked on Vannevar Bush's differential analyzer, an early analog computer.
While studying the complicated ad hoc circuits of this analyzer, Shannon designed switching circuits based on Boole's concepts. In 1937, he wrote his master's thesis, A Symbolic Analysis of Relay and Switching Circuits; a paper from this thesis was published in 1938. In this work, Shannon proved that his switching circuits could be used to simplify the arrangement of the electromechanical relays that were then used in telephone call routing switches. Next, he expanded this concept, proving that these circuits could solve all problems that Boolean algebra could solve. In the last chapter, he presented diagrams of several circuits, including a 4-bit full adder. Using this property of electrical switches to implement logic is the fundamental concept that underlies all electronic digital computers. Shannon's work became the foundation of digital circuit design, as it became known in the electrical engineering community during and after World War II; the theoretical rigor of Shannon's work superseded the ad hoc methods. Howard Gardner called Shannon's thesis "possibly the most important, and also the most noted, master's thesis of the century." Shannon received his Ph.D.
degree from MIT in 1940. Vannevar Bush had suggested that Shannon should work on his dissertation at the Cold Spring Harbor Laboratory, in order to develop a mathematical formulation for Mendelian genetics; this research resulted in Shannon's PhD thesis, called An Algebra for Theoretical Genetics. In 1940, Shannon became a National Research Fellow at the Institute for Advanced Study in Princeton, New Jersey. In Princeton, Shannon had the opportunity to discuss his ideas with influential scientists and mathematicians such as Hermann Weyl and John von Neumann, and he had occasional encounters with Albert Einstein and Kurt Gödel. Shannon worked across disciplines, and this ability may have contributed to his development of mathematical information theory. Shannon joined Bell Labs to work on fire-control systems and cryptography during World War II, under a contract with section D-2 of the National Defense Research Committee. Shannon is credited with the invention of signal-flow graphs, in 1942. He discovered the topological gain formula while investigating the functional operation of an analog computer.
For two months early in 1943, Shannon came into contact with the leading British mathematician Alan Turing. Turing had been posted to Washington to share with the U.S. Navy's cryptanalytic service the methods used by the British Government Code and Cypher School at Bletchley Park to break the ciphers used by the Kriegsmarine U-boats in the North Atlantic Ocean. He was also interested in the encipherment of speech and to this end spent time at Bell Labs. Shannon and Turing met at teatime in the cafeteria. Turing showed Shannon his 1936 paper that defined what is now known as the "Universal Turing machine". In 1945, as the war was coming to an end, the NDRC was issuing a summary of technical reports as a last step prior to its eventual closing down. Inside the volume on fire control, a special essay titled Data Smoothing and Prediction in Fire-Control Systems, coauthored by Shannon, Ralph Beebe Blackman, and Hendrik Wade Bode, formally treated the problem of smoothing the data in fire-control by analogy with "the problem of separating a signal from interfering noise in communications systems."
In other words, it modeled the problem in terms of data and signal processing and thus heralded the coming of the Information Age. Shannon's w
In mathematics, the logarithm is the inverse function to exponentiation. That means the logarithm of a given number x is the exponent to which another fixed number, the base b, must be raised to produce that number x. In the simplest case, the logarithm counts repeated multiplication of the same factor; the logarithm of x to base b is denoted logb(x). More generally, exponentiation allows any positive real number to be raised to any real power, always producing a positive result, so the logarithm, for any two positive real numbers b and x where b is not equal to 1, is always a unique real number y. More explicitly, the defining relation between exponentiation and logarithm is: logb(x) = y exactly if b^y = x. For example, log2(64) = 6, as 2^6 = 64. The logarithm to base 10 is called the common logarithm and has many applications in science and engineering. The natural logarithm has the number e as its base; the binary logarithm uses base 2 and is used in computer science. Logarithms were introduced by John Napier in the early 17th century as a means to simplify calculations.
They were adopted by navigators, scientists, and others to perform computations more easily using slide rules and logarithm tables. Tedious multi-digit multiplication steps can be replaced by table look-ups and simpler addition because of the fact, important in its own right, that the logarithm of a product is the sum of the logarithms of the factors: logb(xy) = logb(x) + logb(y), provided that b, x and y are all positive and b ≠ 1. The present-day notion of logarithms comes from Leonhard Euler, who connected them to the exponential function in the 18th century. Logarithmic scales reduce wide-ranging quantities to tiny scopes. For example, the decibel is a unit used to express ratios as logarithms, for signal power and amplitude. In chemistry, pH is a logarithmic measure for the acidity of an aqueous solution. Logarithms are commonplace in scientific formulae, and in measurements of the complexity of algorithms and of geometric objects called fractals. They help to describe frequency ratios of musical intervals, appear in formulas counting prime numbers or approximating factorials, inform some models in psychophysics, and can aid in forensic accounting.
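A quick numerical check of the product rule just stated, using Python's standard math module (the particular values 8 and 4 are arbitrary):

```python
import math

x, y = 8, 4
print(math.log2(x * y))              # 5.0
print(math.log2(x) + math.log2(y))   # 5.0: the multiplication becomes an addition of logarithms
```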
In the same way as the logarithm reverses exponentiation, the complex logarithm is the inverse function of the exponential function applied to complex numbers. The discrete logarithm is another variant. Addition, multiplication, and exponentiation are three of the most fundamental arithmetic operations. Addition, the simplest of these, can be undone by subtraction: adding, say, 2 to 3 gives 5, and the process of adding 2 can be undone by subtracting 2: 5 − 2 = 3. Multiplication, the next-simplest operation, can be undone by division: doubling a number x, i.e. multiplying x by 2, gives 2x; to get back x, it is necessary to divide by 2. For example, 2 ⋅ 3 = 6, and the process of multiplying by 2 is undone by dividing by 2: 6 / 2 = 3. The idea and purpose of logarithms is likewise to undo a fundamental arithmetic operation, namely raising a number to a certain power, an operation known as exponentiation. For example, raising 2 to the third power yields 8, because 8 is the product of three factors of 2: 2^3 = 2 × 2 × 2 = 8. The logarithm of 8 to base 2 is 3, reflecting the fact that 2 was raised to the third power to get 8.
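The inverse relationship just described can be checked directly with Python's standard math module:

```python
import math

print(2 ** 3)          # 8: raising 2 to the third power
print(math.log2(8))    # 3.0: the base-2 logarithm recovers the exponent
print(math.log2(64))   # 6.0, matching log2(64) = 6 from the introduction above
```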
This subsection contains a short overview of the exponentiation operation, which is fundamental to understanding logarithms. Raising b to the n-th power, where n is a natural number, is done by multiplying n factors equal to b; the n-th power of b is written b^n, so that b^n = b × b × ⋯ × b (n factors). Exponentiation may be extended to b^y, where b is a positive number and the exponent y is any real number. For example, b^(−1) is the reciprocal of b, and raising b to the power 1/2 gives the square root of b. More generally, raising b to a rational power p/q, where p and q are integers, is given by b^(p/q) = (b^p)^(1/q), the q-th root of b^p. Any irrational number y can be approximated to arbitrary precision by rational numbers; this can be used to compute the y-th power of b: for example, √2 ≈ 1.414... and