1.
Claude Shannon
–
Claude Elwood Shannon was an American mathematician, electrical engineer, and cryptographer known as the father of information theory. Shannon founded information theory with his paper "A Mathematical Theory of Communication". During World War II he contributed to the field of cryptanalysis for national defense, including work on codebreaking. Shannon was born in Petoskey, Michigan, and grew up in Gaylord. His father, Claude Sr., a descendant of early settlers of New Jersey, was a self-made businessman and, for a while, a judge of probate. Shannon's mother, Mabel Wolf Shannon, was a language teacher. Most of the first 16 years of Shannon's life were spent in Gaylord, where he attended public school, graduating from Gaylord High School in 1932. Shannon showed an inclination toward mechanical and electrical things, and his best subjects were science and mathematics. At home he constructed devices such as model planes and a model boat. While growing up, he worked as a messenger for the Western Union company under Andrew Coltrey. His childhood hero was Thomas Edison, whom he later learned was a distant cousin; both were descendants of John Ogden, a colonial leader and an ancestor of many distinguished people. Shannon was apolitical and an atheist. In 1932, Shannon entered the University of Michigan, where he was introduced to the work of George Boole. He graduated in 1936 with two degrees, one in electrical engineering and the other in mathematics. In 1936, Shannon began his graduate studies in electrical engineering at MIT, where he worked on Vannevar Bush's differential analyzer. While studying the complicated ad hoc circuits of this analyzer, Shannon designed switching circuits based on Boole's concepts. In 1937, he wrote his master's degree thesis, A Symbolic Analysis of Relay and Switching Circuits; a paper drawn from this thesis was published in 1938.
In this work, Shannon proved that his switching circuits could be used to simplify the arrangement of the electromechanical relays then used in telephone call routing switches. Next, he expanded this concept, proving that these circuits could solve all problems that Boolean algebra could solve. In the last chapter, he presents diagrams of several circuits. Using this property of electrical switches to implement logic is the fundamental concept that underlies all electronic digital computers. Shannon's work became the foundation of digital design as it became widely known in the electrical engineering community. The theoretical rigor of Shannon's work superseded the ad hoc methods that had prevailed previously. Howard Gardner called Shannon's thesis "possibly the most important, and also the most noted, master's thesis of the century".
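Shannon's central observation can be illustrated with a toy model: treat each relay contact as a Boolean variable, series wiring as AND, and parallel wiring as OR. The sketch below is a hypothetical illustration of this correspondence, not code from Shannon's thesis; all function names are invented for this example.

```python
# Toy model of Shannon's insight: relay networks compute Boolean functions.
# Series wiring conducts only if both contacts are closed (AND);
# parallel wiring conducts if either contact is closed (OR).

def series(a, b):
    return a and b

def parallel(a, b):
    return a or b

def circuit(x, y, z):
    """Contact x wired in series with the parallel pair (y, z)."""
    return series(x, parallel(y, z))

# The circuit realizes the Boolean expression x AND (y OR z); simplifying
# the expression with Boolean algebra simplifies the relay arrangement.
for x in (False, True):
    for y in (False, True):
        for z in (False, True):
            assert circuit(x, y, z) == (x and (y or z))
```

Because every relay network corresponds to such an expression, simplifying the algebra simplifies the hardware, which is the result the thesis established.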
2.
Data compression
–
In signal processing, data compression, source coding, or bit-rate reduction involves encoding information using fewer bits than the original representation. Compression can be lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy; no information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. The process of reducing the size of a data file is referred to as data compression. In the context of data transmission, it is called source coding, in opposition to channel coding. Compression is useful because it reduces the resources required to store and transmit data. Computational resources are consumed in the compression process and, usually, in the reversal of the process (decompression); data compression is thus subject to a space-time complexity trade-off. Lossless data compression algorithms usually exploit statistical redundancy to represent data without losing any information, so that the process is reversible. Lossless compression is possible because most real-world data exhibit statistical redundancy. For example, an image may have areas of color that do not change over several pixels; instead of coding "red pixel, red pixel, ...", the data may be encoded as "279 red pixels". This is a basic example of run-length encoding; there are many schemes to reduce file size by eliminating redundancy. The Lempel–Ziv (LZ) compression methods are among the most popular algorithms for lossless storage. DEFLATE is a variation on LZ optimized for decompression speed and compression ratio, but compression can be slow. DEFLATE is used in PKZIP, Gzip, and PNG; LZW is used in GIF images. LZ methods use a table-based compression model where table entries are substituted for repeated strings of data; for most LZ methods, this table is generated dynamically from earlier data in the input. The table itself is often Huffman encoded. Current LZ-based coding schemes that perform well are Brotli and LZX.
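The run-length idea in the pixel example above can be sketched in a few lines. The function names here are illustrative, not from any particular library:

```python
from itertools import groupby

def rle_encode(data):
    """Collapse runs of repeated values into (count, value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(data)]

def rle_decode(pairs):
    """Expand (count, value) pairs back into the original sequence."""
    return [value for count, value in pairs for _ in range(count)]

pixels = ["red"] * 279 + ["blue"] * 2
encoded = rle_encode(pixels)
assert encoded == [(279, "red"), (2, "blue")]
assert rle_decode(encoded) == pixels  # the process is fully reversible
```

Note that run-length encoding only wins when runs are long; on data with no repeated neighbors the (count, value) pairs are larger than the input, which is why practical compressors combine it with other models.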
LZX is used in Microsoft's CAB format. The best modern lossless compressors use probabilistic models, such as prediction by partial matching (PPM). The Burrows–Wheeler transform can also be viewed as a form of statistical modelling. The basic task of grammar-based codes is constructing a context-free grammar deriving a single string; Sequitur and Re-Pair are practical grammar compression algorithms for which software is publicly available. In a further refinement of the use of probabilistic modelling, arithmetic coding is a more modern coding technique that uses the mathematical calculations of a finite-state machine to produce a string of encoded bits from a series of input data symbols.
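The Burrows–Wheeler transform mentioned above can be sketched naively by sorting all rotations of the input and keeping the last column; real implementations use suffix arrays instead, and the end-of-string sentinel chosen here is an assumption of this sketch:

```python
def bwt(s, eos="\x03"):
    """Burrows-Wheeler transform: last column of the sorted rotations of s."""
    s = s + eos  # sentinel marks the original row so the inverse is unique
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

def inverse_bwt(transformed, eos="\x03"):
    """Invert the BWT by repeatedly prepending the transform and re-sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(transformed[i] + table[i] for i in range(len(transformed)))
    return next(row for row in table if row.endswith(eos))[:-1]

assert inverse_bwt(bwt("banana")) == "banana"
```

The transform itself compresses nothing; it reorders the input so that equal characters cluster together, which makes the output far more compressible by a downstream entropy coder, and it is fully reversible.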
3.
Algorithm
–
In mathematics and computer science, an algorithm is a self-contained sequence of actions to be performed. Algorithms can perform calculation, data processing, and automated reasoning tasks. An algorithm is an effective method that can be expressed within a finite amount of space and time, and in a well-defined formal language, for calculating a function. The transition from one state to the next is not necessarily deterministic; some algorithms, known as randomized algorithms, incorporate random input. Giving a formal definition of algorithms, corresponding to the intuitive notion, remains a challenging problem. In English, the word was first used in about 1230 and then by Chaucer in 1391. English adopted the French term, but it wasn't until the late 19th century that "algorithm" took on the meaning that it has in modern English. Another early use of the word is from 1240, in a manual titled Carmen de Algorismo composed by Alexandre de Villedieu. It begins thus: Haec algorismus ars praesens dicitur, in qua / Talibus Indorum fruimur bis quinque figuris, which translates as: "Algorism is the art by which at present we use those Indian figures, which number two times five." The poem is a few hundred lines long and summarizes the art of calculating with the new style of Indian dice (Talibus Indorum), or Hindu numerals. An informal definition could be "a set of rules that precisely defines a sequence of operations", which would include all computer programs, including programs that do not perform numeric calculations. Generally, a program is only an algorithm if it stops eventually. But humans can do something equally useful, in the case of certain enumerably infinite sets: they can give explicit instructions for determining the nth member of the set, for arbitrary finite n. An enumerably infinite set is one whose elements can be put into one-to-one correspondence with the integers. The concept of algorithm is also used to define the notion of decidability.
That notion is central for explaining how formal systems come into being, starting from a set of axioms. In logic, the time that an algorithm requires to complete cannot be measured, as it is not apparently related to our customary physical dimension. From such uncertainties, which characterize ongoing work, stems the unavailability of a definition of algorithm that suits both concrete and abstract usage of the term. Algorithms are essential to the way computers process data. Thus, an algorithm can be considered to be any sequence of operations that can be simulated by a Turing-complete system. Although this may seem extreme, the arguments in its favor are hard to refute. In Gurevich's words, Turing's informal argument in favor of his thesis justifies a stronger thesis: every algorithm can be simulated by a Turing machine. According to Savage, an algorithm is a computational process defined by a Turing machine. Typically, when an algorithm is associated with processing information, data can be read from an input source, written to an output device, and stored for further processing. Stored data are regarded as part of the internal state of the entity performing the algorithm. In practice, the state is stored in one or more data structures. For some such computational processes, the algorithm must be rigorously defined: specified in the way it applies in all possible circumstances that could arise. That is, any conditional steps must be dealt with, case by case.
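A classic instance of such a rigorously defined procedure is Euclid's algorithm for the greatest common divisor; a brief sketch:

```python
def gcd(m, n):
    """Euclid's algorithm for the greatest common divisor of m and n.

    The state is the pair (m, n); each transition is deterministic, and the
    process terminates because n strictly decreases yet stays non-negative,
    so only finitely many states can occur.
    """
    while n != 0:
        m, n = n, m % n
    return m

assert gcd(48, 18) == 6
```

Every case is covered by the single loop condition, which is exactly the "specified in all possible circumstances" requirement described above.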
4.
Adaptive Huffman coding
–
Adaptive Huffman coding is an adaptive coding technique based on Huffman coding. It permits building the code as the symbols are being transmitted, with no prior knowledge of the source distribution. The benefit of this one-pass procedure is that the source can be encoded in real time, though it becomes more sensitive to transmission errors. There are a number of implementations of this method; the most notable are the FGK (Faller–Gallager–Knuth) and Vitter algorithms. FGK is an online coding technique based on Huffman coding. Having no initial knowledge of occurrence frequencies, it permits dynamically adjusting the Huffman tree as data are being transmitted. In a FGK Huffman tree, a special external node, called the 0-node, is used to identify a newly arriving character. That is, whenever new data are encountered, output the path to the 0-node followed by the data; for a previously seen character, just output the path of the data in the current Huffman tree. Most importantly, we have to adjust the FGK Huffman tree if necessary: as the frequency of a datum is increased, the sibling property of the Huffman tree may be broken. The adjustment is triggered for this reason, and it is accomplished by consecutive swappings of nodes, subtrees, or both. The data node is swapped with the highest-numbered node of the same frequency in the Huffman tree, and all ancestor nodes of the node are processed in the same manner. Since the FGK algorithm has some drawbacks concerning node-or-subtree swapping, Vitter proposed another algorithm to improve it. Some important terminology and constraints:
- Implicit numbering: nodes are numbered in increasing order by level and from left to right, i.e. nodes at the bottom level have low implicit numbers compared to upper-level nodes, and nodes on the same level are numbered in increasing order from left to right.
- Invariant: for each weight w, all leaves of weight w precede all internal nodes having weight w.
- Blocks: nodes of the same weight.
- Leader: the highest-numbered node in a block.
Blocks are interlinked in increasing order of their weights; a leaf block always precedes an internal block of the same weight, thus maintaining the invariant. NYT (not yet transferred) is a special node used to represent symbols which are not yet transferred. Encoder and decoder start with only the root node, which has the maximum number; in the beginning it is our initial NYT node. When we transmit an NYT symbol, we have to transmit the code for the NYT node, then its generic (fixed-length) code. For every symbol that is already in the tree, we only have to transmit the code for its leaf node. Encoding "abb" gives 0110000100110001011.
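The one-pass idea can be demonstrated with a deliberately naive sketch that, unlike FGK or Vitter's incremental swaps, simply rebuilds the Huffman code from the current counts after every symbol. The NYT escape followed by a fixed-length 8-bit literal mirrors the scheme described above; all function names are illustrative.

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Build a Huffman code (symbol -> bitstring) from a frequency table."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    tiebreak = count()  # keeps heap comparisons away from the dicts
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

def adaptive_encode(text):
    NYT = object()          # escape for not-yet-transmitted characters
    freqs = {NYT: 1}        # start with no knowledge of the source
    out = []
    for ch in text:
        code = huffman_code(freqs)      # rebuild (naive; FGK updates instead)
        if ch in freqs:
            out.append(code[ch])
            freqs[ch] += 1
        else:
            out.append(code[NYT])                # escape path...
            out.append(format(ord(ch), "08b"))   # ...then raw 8-bit literal
            freqs[ch] = 1
    return "".join(out)

def adaptive_decode(bits):
    NYT = object()
    freqs = {NYT: 1}
    out, i = [], 0
    while i < len(bits):
        inverse = {b: s for s, b in huffman_code(freqs).items()}
        j = i + 1
        while bits[i:j] not in inverse:  # prefix codes decode greedily
            j += 1
        symbol, i = inverse[bits[i:j]], j
        if symbol is NYT:
            ch = chr(int(bits[i:i + 8], 2))
            i += 8
            freqs[ch] = 1
        else:
            ch = symbol
            freqs[ch] += 1
        out.append(ch)
    return "".join(out)

assert adaptive_decode(adaptive_encode("abracadabra")) == "abracadabra"
```

Because encoder and decoder update identical frequency tables in the same order, they always derive the same code without any table ever being transmitted, which is the essence of the adaptive scheme; FGK and Vitter achieve the same effect with O(log n) tree adjustments per symbol instead of a full rebuild.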
5.
Convolution
–
In mathematics, convolution is an operation on two functions f and g that produces a third function. It has applications that include probability, statistics, computer vision, natural language processing, image and signal processing, engineering, and differential equations. The convolution can be defined for functions on spaces other than Euclidean space: for example, periodic functions, such as those treated by the discrete-time Fourier transform, can be defined on a circle, and a discrete convolution can be defined for functions on the set of integers. Computing the inverse of the convolution operation is known as deconvolution. The convolution of f and g is written f∗g, using an asterisk or star, and is defined as the integral of the product of the two functions after one is reversed and shifted:

(f ∗ g)(t) = ∫_{−∞}^{∞} f(τ) g(t − τ) dτ.

As such, it is a kind of integral transform. While the symbol t is used above, it need not represent the time domain; but in that context, the convolution formula can be described as a weighted average of the function f at the moment t, where the weighting is given by g simply shifted by amount t. As t changes, the weighting function emphasizes different parts of the input function. For the multi-dimensional formulation of convolution, see Domain of definition. A primarily engineering convention that one sees writes the arguments on the functions themselves, f(t) ∗ g(t) ≝ ∫_{−∞}^{∞} f(τ) g(t − τ) dτ, which can be misleading: for instance, f(t) ∗ g(t − t₀) is equivalent to (f ∗ g)(t − t₀), but f(t − t₀) ∗ g(t − t₀) is in fact equivalent to (f ∗ g)(t − 2t₀). Convolution describes the output of an important class of operations known as linear time-invariant (LTI). See LTI system theory for a derivation of convolution as the result of LTI constraints. In terms of the Fourier transforms of the input and output of an LTI operation, no new frequency components are created; the existing ones are only modified. In other words, the output transform is the pointwise product of the input transform with a third transform. See the convolution theorem for a derivation of that property of convolution. Conversely, convolution can be derived as the inverse Fourier transform of the pointwise product of two Fourier transforms.
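The discrete analogue of the integral above replaces it with a sum, (f ∗ g)[n] = Σ_k f[k]·g[n − k]; a direct sketch (illustrative names, quadratic time rather than the FFT-based methods used in practice):

```python
def convolve(f, g):
    """Discrete convolution: (f*g)[n] = sum over k of f[k] * g[n-k]."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj  # every pair (i, j) with i + j == n
    return out

# Convolving coefficient lists multiplies polynomials:
# (1 + 2x + 3x^2) * (1 + x) = 1 + 3x + 5x^2 + 3x^3
assert convolve([1, 2, 3], [1, 1]) == [1, 3, 5, 3]
```

The polynomial-multiplication reading makes the commutativity of the operation obvious, and it is the discrete counterpart of the reversed-and-shifted product in the integral definition.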
One of the earliest uses of the convolution integral appeared in d'Alembert's derivation of Taylor's theorem in Recherches sur différents points importants du système du monde. Soon thereafter, convolution operations appear in the works of Pierre Simon Laplace, Jean-Baptiste Joseph Fourier, Siméon Denis Poisson, and others. The term itself did not come into wide use until the 1950s or 60s; prior to that it was known as Faltung, composition product, or superposition integral. Yet it appears as early as 1903, though the definition is rather unfamiliar in older uses. The operation

∫₀ᵗ φ(s) ψ(t − s) ds,  0 ≤ t < ∞,

is a particular case of the composition products considered by the Italian mathematician Vito Volterra in 1913. In the discrete, periodic setting, the analogous summation over shifted copies is called a periodic summation of the function f.
6.
Range encoding
–
Range encoding is an entropy coding method defined by G. Nigel N. Martin in a 1979 paper, which effectively rediscovered the FIFO arithmetic code first introduced by Richard Clark Pasco in 1976. After the expiration of the first arithmetic coding patent, range encoding appeared to be clearly free of patent encumbrances, and this particularly drove interest in the technique in the open-source community. Since that time, patents on various well-known arithmetic coding techniques have also expired. Range encoding conceptually encodes all the symbols of the message into one number, unlike Huffman coding, which assigns each symbol a bit pattern and concatenates all the bit patterns together. Each symbol of the message can then be encoded in turn. The decoder must have the same probability estimation the encoder used, which can either be sent in advance, derived from already-transferred data, or be part of the compressor and decompressor. When all symbols have been encoded, merely identifying the sub-range is enough to communicate the entire message. Suppose we want to encode the message "AABA<EOM>", where <EOM> is the end-of-message symbol. Because all five-digit integers starting with 251 fall within our final range, 251 is one of the three-digit prefixes we could transmit that would unambiguously convey our original message. In practice, however, precision is not a problem, because instead of starting with a very large range and gradually narrowing it down, the encoder works with a limited range of digits at any one time. After some number of digits have been encoded, the leftmost digits will not change; in the example, after encoding just three symbols we already knew that our final result would start with 2. More digits are shifted in on the right as digits on the left are sent off. To finish off we may need to emit a few extra digits: the top digit of low is probably too small, so we need to increment it, but first we need to make sure the range is large enough.
One problem that can occur with the encoding procedure above is that range might become very small while low and low + range still have differing first digits; this could result in the interval having insufficient precision to distinguish between all of the symbols in the alphabet. When this happens we need to fudge a little: output the first couple of digits even though we might be off by one, and adjust afterwards. The decoder will be following the same steps, so it will know when it needs to do this to keep in sync. Base 10 was used in this example, but a real implementation would just use binary; instead of 10000 and 1000 you would likely use hexadecimal constants such as 0x1000000 and 0x10000, and instead of emitting a digit at a time you would emit a byte at a time, using a byte-shift operation instead of multiplying by 10. Decoding uses exactly the same algorithm, with the addition of keeping track of the current code value, consisting of the digits read from the compressor. Instead of emitting the top digit of low you just throw it away, and you append each newly read digit to code (an AppendDigit routine in place of EmitDigit). In order to determine which probability intervals to apply, the decoder needs to look at the current value of code within the interval [low, low + range). For the AABA<EOM> example above, this would return a value in the range 0 to 9: values 0 through 5 would represent A, 6 and 7 would represent B, and 8 and 9 would represent <EOM>.
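Setting aside the digit-renormalization machinery described above, the core interval arithmetic can be shown with exact fractions. This unbounded-precision sketch assumes the probability model P(A) = 0.6, P(B) = 0.2, P(<EOM>) = 0.2 (matching the 0-5 / 6-7 / 8-9 digit ranges above); the function names are illustrative:

```python
from fractions import Fraction

# Assumed probability model for the AABA<EOM> example:
# P(A) = 0.6, P(B) = 0.2, P(<EOM>) = 0.2
MODEL = {"A": (Fraction(0, 10), Fraction(6, 10)),
         "B": (Fraction(6, 10), Fraction(8, 10)),
         "<EOM>": (Fraction(8, 10), Fraction(10, 10))}

def encode(symbols):
    """Narrow [low, low + range) once per symbol; any number inside the
    final interval identifies the whole message."""
    low, rng = Fraction(0), Fraction(1)
    for sym in symbols:
        start, end = MODEL[sym]
        low, rng = low + rng * start, rng * (end - start)
    return low, rng

def decode(value):
    """Recover the symbols by locating value inside successive sub-intervals."""
    low, rng, out = Fraction(0), Fraction(1), []
    while not out or out[-1] != "<EOM>":
        target = (value - low) / rng
        for sym, (start, end) in MODEL.items():
            if start <= target < end:
                out.append(sym)
                low, rng = low + rng * start, rng * (end - start)
                break
    return out

message = ["A", "A", "B", "A", "<EOM>"]
low, rng = encode(message)
# Final interval is [0.25056, 0.2592), so every number starting 251...
# falls inside it -- hence the three-digit prefix from the example.
assert (low, low + rng) == (Fraction(25056, 100000), Fraction(25920, 100000))
assert decode(low) == message
```

A production range coder gets the same answer while keeping only a machine-word-sized window of this number, emitting settled top digits and shifting new ones in, exactly as the renormalization discussion above describes.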
7.
Tunstall coding
–
In computer science and information theory, Tunstall coding is a form of entropy coding used for lossless data compression. Tunstall coding was the subject of Brian Parker Tunstall's PhD thesis in 1967, "Synthesis of noiseless compression codes". Its design is a precursor to Lempel–Ziv. Unlike variable-length codes, which include Huffman and Lempel–Ziv coding, Tunstall coding is a code which maps source symbols to a fixed number of bits. Both Tunstall codes and Lempel–Ziv codes represent variable-length words by fixed-length codes. Unlike typical set encoding, Tunstall coding parses a stochastic source with codewords of variable length. It can be shown that, for a large enough dictionary, the number of bits per source letter can be arbitrarily close to H, the entropy of the source. The algorithm requires as input an alphabet U, along with a distribution of probabilities for each input letter. It also requires an arbitrary constant C, which is an upper bound on the size of the dictionary that it will compute. The dictionary in question, D, is constructed as a tree of probabilities. The algorithm goes like this:

D := tree of |U| leaves, one for each letter in U.
While |D| < C:
    convert the most probable leaf to a tree with |U| leaves.

Let's imagine that we wish to encode the string "hello, world". Let's further assume that the input alphabet U contains only the characters appearing in "hello, world", so that |U| = 9. We can therefore compute the probability of each character based on its statistical appearance in the input string; for instance, the letter 'l' appears thrice in a string of 12 characters, for a probability of 3/12. We initialize the tree, starting with a tree of |U| = 9 leaves. Each word is therefore directly associated with a letter of the alphabet. The 9 words that we thus obtain can be encoded into a fixed-size output of ⌈log₂(9)⌉ = 4 bits. We then take the leaf of highest probability and convert it to yet another tree of |U| = 9 leaves, and we re-compute the probabilities of those leaves.
For instance, the sequence of the two letters "ll" happens once; given that there are three occurrences of letters followed by an 'l', the resulting probability is 1/3 · 3/12 = 1/12. We obtain 17 words, which can each be encoded into an output of ⌈log₂(17)⌉ = 5 bits. Note that we could iterate further, increasing the number of words by |U| − 1 = 8 every time. Tunstall coding requires the algorithm to know, prior to the parsing operation, what the distribution of probabilities for each letter of the alphabet is; this issue is shared with Huffman coding. Its requirement of a fixed-length block output makes it less flexible than Lempel–Ziv, which has a similar dictionary-based design but a variable-sized block output.
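The dictionary-construction loop described above (start with |U| leaves, repeatedly expand the most probable one) can be sketched directly; the function name is illustrative:

```python
import heapq
from math import ceil, log2

def tunstall_dictionary(probs, C):
    """Grow a Tunstall dictionary: start with one word per letter, then
    repeatedly replace the most probable word by its |U| one-letter
    extensions, while the dictionary size stays within the bound C."""
    words = dict(probs)                      # word -> probability
    heap = [(-p, w) for w, p in words.items()]
    heapq.heapify(heap)
    while len(words) + len(probs) - 1 <= C:  # each expansion adds |U| - 1 words
        neg_p, word = heapq.heappop(heap)    # most probable leaf
        del words[word]
        for letter, p_letter in probs.items():
            extended = word + letter
            words[extended] = -neg_p * p_letter
            heapq.heappush(heap, (neg_p * p_letter, extended))
    return words

text = "hello, world"
probs = {c: text.count(c) / len(text) for c in set(text)}  # |U| = 9 letters
table = tunstall_dictionary(probs, C=17)
assert len(table) == 17                      # 9 + (9 - 1) after one expansion
assert ceil(log2(len(table))) == 5           # 5-bit fixed-length codewords
```

The word probabilities always sum to one, since each expansion replaces a leaf's probability mass by the probability mass of its children; that is what makes the dictionary a complete parsing of the source.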
8.
Companding
–
In telecommunication and signal processing, companding is a method of mitigating the detrimental effects of a channel with limited dynamic range; the name is a combination of the words compressing and expanding. The use of companding allows signals with a large dynamic range to be transmitted over facilities that have a smaller dynamic range capability. Companding is employed in telephony and other applications such as professional wireless microphones. The dynamic range of a signal is compressed before transmission and is expanded to the original value at the receiver. The electronic circuit that does this is called a compander and works by compressing or expanding the dynamic range of an analog electronic signal such as sound recorded by a microphone. One variety is a triplet of amplifiers: a logarithmic amplifier, followed by a variable-gain linear amplifier and an exponential amplifier. Such a triplet has the property that its output voltage is proportional to the input voltage raised to an adjustable power. This type of quantization is used in telephony systems. The two most popular compander functions used for telecommunications are the A-law and μ-law functions. Companding is used in digital telephony systems: compressing before input to an analog-to-digital converter, and then expanding after a digital-to-analog converter. This is equivalent to using a non-linear ADC, as in a T-carrier telephone system that implements A-law or μ-law companding. The method is also used in digital file formats for a better signal-to-noise ratio at lower bit rates; this is effectively a form of audio data compression. Professional wireless microphones do this since the dynamic range of the microphone audio signal itself is larger than the dynamic range provided by radio transmission. Companding also reduces the noise and crosstalk levels at the receiver. Companders are used in concert audio systems and in some noise-reduction schemes such as dbx and Dolby NR. The use of companding in a picture transmission system was patented by A. B. Clark.
In 1942, Clark and his team completed the SIGSALY secure voice system, which included the first use of companding in a PCM system. In 1953, B. Smith showed that a nonlinear DAC could be complemented by the inverse nonlinearity in a successive-approximation ADC configuration. In 1970, H. Kaneko developed the uniform description of segment companding laws that had by then been adopted in digital telephony. In the 1980s, many equipment manufacturers used companding when compressing the library waveform data in their digital synthesizers.
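The μ-law function mentioned above has a simple closed form, y = sgn(x)·ln(1 + μ|x|)/ln(1 + μ); a sketch of the compressor/expander pair, assuming μ = 255 as used in North American telephony (function names are illustrative):

```python
from math import log, copysign

MU = 255.0  # value used in North American and Japanese telephony

def mu_law_compress(x):
    """Compress x in [-1, 1]: quiet amplitudes get boosted resolution."""
    return copysign(log(1.0 + MU * abs(x)) / log(1.0 + MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress: recover the original amplitude."""
    return copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)

# The round trip is (up to floating point) the identity, while quiet
# signals occupy a much larger share of the companded range:
assert abs(mu_law_expand(mu_law_compress(0.01)) - 0.01) < 1e-12
assert mu_law_compress(0.01) > 0.2   # 1% amplitude maps above 20% of range
```

Quantizing the companded value instead of the raw sample is what gives the telephone channel its improved signal-to-noise ratio for quiet speech at the same bit rate.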
9.
Huffman coding
–
In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding and/or using such a code proceeds by means of Huffman coding, an algorithm developed by David A. Huffman while he was a student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes". The output from Huffman's algorithm can be viewed as a code table for encoding a source symbol. The algorithm derives this table from the probability or frequency of occurrence for each possible value of the source symbol. As in other entropy encoding methods, more common symbols are represented using fewer bits than less common symbols. Huffman's method can be implemented in time linear to the number of input weights if these weights are sorted. However, although optimal among methods encoding symbols separately, Huffman coding is not always optimal among all compression methods; specifically, it is optimal only if the probabilities of the symbols are natural powers of 1/2. This is usually not the case, and this sub-optimality is repaired in arithmetic coding and in the more recent, faster asymmetric numeral systems (ANS) family of entropy codings. In 1951, David A. Huffman and his MIT information theory classmates were given the choice of a term paper or a final exam. The professor, Robert M. Fano, assigned a term paper on the problem of finding the most efficient binary code. In doing so, Huffman outdid Fano, who had worked with information theory inventor Claude Shannon to develop a similar code, by building the tree from the bottom up instead of the top down. Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code. Given a set of symbols and their weights, the problem is to find a prefix-free binary code with minimum expected codeword length. Input: alphabet A = (a_1, a_2, …, a_n), which is the symbol alphabet of size n.
Set W = (w_1, w_2, …, w_n), which is the set of the symbol weights. Output: code C = (c_1, c_2, …, c_n), which is the tuple of binary codewords, where c_i is the codeword for a_i, 1 ≤ i ≤ n. Goal: let L(C) = Σ_{i=1..n} w_i × length(c_i) be the weighted path length of code C; the condition is L(C) ≤ L(T) for any code T. We give an example of the result of Huffman coding for a code with five characters and given weights. We will not verify that it minimizes L over all codes, but we will compute L and compare it to the Shannon entropy H of the set of weights. For any code that is biunique, meaning that the code is uniquely decodeable, the sum of the probability budgets across all symbols is always less than or equal to one. In this example, the sum is strictly equal to one; as a result, the code is termed a complete code.
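A compact sketch of the algorithm: repeatedly merge the two lowest-weight subtrees, prefixing '0' to the codewords of one and '1' to the other. The five example weights below are illustrative, not the ones from the original example:

```python
import heapq
from itertools import count

def huffman(weights):
    """Build a Huffman prefix code: symbol -> bitstring."""
    tiebreak = count()  # keeps heap comparisons away from the dicts
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, code1 = heapq.heappop(heap)  # two least-weight subtrees
        w2, _, code2 = heapq.heappop(heap)
        merged = {s: "0" + bits for s, bits in code1.items()}
        merged.update({s: "1" + bits for s, bits in code2.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

weights = {"a": 0.10, "b": 0.15, "c": 0.30, "d": 0.16, "e": 0.29}
code = huffman(weights)
# Expected codeword length L(C) = sum of w_i * length(c_i)
L = sum(w * len(code[s]) for s, w in weights.items())
assert abs(L - 2.25) < 1e-9
# Kraft sum = 1, so this code is complete in the sense described above:
assert sum(2.0 ** -len(bits) for bits in code.values()) == 1.0
```

Here L = 2.25 bits per symbol, slightly above the Shannon entropy of these weights (about 2.20 bits), the gap being exactly the sub-optimality that arithmetic coding and ANS remove.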