1.
Data compression
–
In signal processing, data compression, source coding, or bit-rate reduction involves encoding information using fewer bits than the original representation. Compression can be lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy; no information is lost in the process. Lossy compression reduces bits by removing unnecessary or less important information. The process of reducing the size of a data file is often referred to as data compression. In the context of data transmission, it is called source coding, in opposition to channel coding. Compression is useful because it reduces the resources required to store and transmit data. Computational resources are consumed in the compression process and, usually, in the reversal of the process, so data compression is subject to a space–time complexity trade-off.

Lossless data compression algorithms usually exploit statistical redundancy to represent data without losing any information, so that the process is reversible. Lossless compression is possible because most real-world data exhibits statistical redundancy. For example, an image may have areas of color that do not change over several pixels; instead of coding "red pixel, red pixel, ..." the data may be encoded as "279 red pixels". This is a basic example of run-length encoding, and there are many schemes that reduce file size by eliminating redundancy.

The Lempel–Ziv (LZ) compression methods are among the most popular algorithms for lossless storage. DEFLATE is a variation on LZ optimized for decompression speed and compression ratio, but compression can be slow. DEFLATE is used in PKZIP, Gzip, and PNG; LZW is used in GIF images. LZ methods use a table-based compression model where table entries are substituted for repeated strings of data. For most LZ methods, this table is generated dynamically from earlier data in the input; the table itself is often Huffman encoded. Current LZ-based coding schemes that perform well are Brotli and LZX.
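The run-length idea described above can be sketched in a few lines of Python. The string pixel values and the list-of-runs representation here are illustrative choices for the sketch, not part of any particular file format:

```python
def rle_encode(pixels):
    # Collapse consecutive repeats into [count, value] runs,
    # e.g. 279 identical "red" pixels become one run.
    runs = []
    for p in pixels:
        if runs and runs[-1][1] == p:
            runs[-1][0] += 1
        else:
            runs.append([1, p])
    return runs

def rle_decode(runs):
    # Expand each [count, value] run back into the original sequence.
    return [p for n, p in runs for _ in range(n)]

print(rle_encode(["red"] * 279 + ["blue"]))  # [[279, 'red'], [1, 'blue']]
```

Because no information is discarded, the decode step inverts the encode step exactly, which is the defining property of lossless compression.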
LZX is used in Microsoft's CAB format. The best modern lossless compressors use probabilistic models, such as prediction by partial matching. The Burrows–Wheeler transform can also be viewed as a form of statistical modelling. The basic task of grammar-based codes is constructing a context-free grammar deriving a single string; Sequitur and Re-Pair are practical grammar compression algorithms for which software is publicly available. In a further refinement of the use of probabilistic modelling, arithmetic coding is a more modern coding technique that uses the mathematical calculations of a finite-state machine to produce a string of encoded bits from a series of input data symbols.
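The DEFLATE scheme discussed above is available directly in Python's standard library via the zlib module, which makes the lossless round trip easy to demonstrate on redundant input:

```python
import zlib

# Highly redundant input, like an image row of identical pixels.
data = b"red pixel, " * 279

# zlib.compress applies DEFLATE: LZ77 string matching plus Huffman coding.
packed = zlib.compress(data, 9)

assert zlib.decompress(packed) == data  # lossless: the round trip is exact
print(len(data), "->", len(packed))
```

The compressed size is a small fraction of the original here precisely because the input has so much statistical redundancy for the LZ77 stage to exploit.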
2.
Dynamic range
–
Dynamic range, abbreviated DR, DNR, or DYR, is the ratio between the largest and smallest values that a certain quantity can assume. It is often used in the context of signals, like sound and light. It is measured either as a ratio or as a base-10 or base-2 logarithmic value of the ratio between the smallest and largest signal values.

The human senses of sight and hearing have a high dynamic range. A human is capable of hearing anything from a quiet murmur in a room to the sound of the loudest heavy metal concert; such a difference can exceed 100 dB, which represents a factor of 100,000 in amplitude. A human cannot perform these feats of perception at both extremes of the scale at the same time: the eyes take time to adjust to different light levels, and the instantaneous dynamic range of human audio perception is similarly subject to masking, so that, for example, a whisper cannot be heard in loud surroundings. In practice, it is difficult to achieve the full dynamic range experienced by humans using electronic equipment. For example, a good-quality LCD has a dynamic range of around 1000:1, and paper reflectance can achieve a range of about 100:1. A professional ENG camcorder such as the Sony Digital Betacam achieves a dynamic range of greater than 90 dB in audio recording. A nighttime scene will usually contain duller colours and will often be lit with blue lighting. The dynamic range of human hearing is roughly 140 dB, varying with frequency, from the threshold of hearing to the threshold of pain. The dynamic range of music as normally perceived in a concert hall does not exceed 80 dB. The dynamic range differs from the ratio of the maximum to minimum amplitude a given device can record, as a properly dithered recording device can record signals well below the noise RMS amplitude. Digital audio with undithered 20-bit digitization is theoretically capable of 120 dB dynamic range; 24-bit digital audio calculates to 144 dB dynamic range.
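The decibel figures quoted above all follow from the standard amplitude-ratio formula, 20·log10 of the ratio, which gives roughly 6.02 dB per bit of quantization. A quick sketch:

```python
import math

def amplitude_db(ratio):
    # An amplitude (field-quantity) ratio expressed in decibels.
    return 20 * math.log10(ratio)

def quantization_db(bits):
    # Theoretical dynamic range of n-bit digitization: 20*log10(2**n),
    # i.e. about 6.02 dB per bit.
    return amplitude_db(2 ** bits)

print(amplitude_db(100_000))  # ~100 dB, the hearing example above
print(quantization_db(20))    # ~120 dB for undithered 20-bit audio
print(quantization_db(24))    # ~144 dB for 24-bit audio
```

The same formula explains why each extra bit of resolution buys a fixed ~6 dB: doubling the largest representable amplitude adds 20·log10(2) ≈ 6.02 dB.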
Multiple noise processes determine the noise floor of a system. Noise can be picked up from microphone self-noise, preamp noise, wiring and interconnection noise, media noise, etc. Early 78 rpm phonograph discs had a dynamic range of up to 40 dB, soon reduced to 30 dB. Ampex tape recorders in the 1950s achieved 60 dB in practical usage. The peak of professional analog magnetic recording tape technology reached 90 dB dynamic range in the midband frequencies at 3% distortion, or about 80 dB in practical broadband applications. Dolby SR noise reduction gave a further 20 dB of range, resulting in 110 dB in the midband frequencies at 3% distortion. Specialized bias and record-head improvements by Nakamichi and Tandberg, combined with Dolby C noise reduction, yielded 72 dB dynamic range for the cassette.
3.
Adaptive Huffman coding
–
Adaptive Huffman coding is an adaptive coding technique based on Huffman coding. It permits building the code as the symbols are being transmitted, with no initial knowledge of the source distribution. The benefit of this one-pass procedure is that the source can be encoded in real time, though it becomes more sensitive to transmission errors. There are a number of implementations of this method; the most notable are FGK (Faller–Gallager–Knuth) and Vitter.

FGK is an online coding technique based on Huffman coding. Having no initial knowledge of occurrence frequencies, it permits dynamically adjusting the Huffman tree as data are being transmitted. In an FGK Huffman tree, a special external node, called the 0-node, is used to identify a newly arriving character: whenever new data are encountered, output the path to the 0-node followed by the data. For a character seen before, just output the path to its node in the current Huffman tree. Most importantly, the FGK Huffman tree must be adjusted when necessary: as the frequency of a datum is increased, the sibling property of the Huffman tree may be broken. The adjustment is triggered for this reason, and it is accomplished by consecutive swappings of nodes, subtrees, or both. The data node is swapped with the highest-numbered node of the same frequency in the Huffman tree, and all ancestor nodes of that node should be processed in the same manner. Since the FGK algorithm has some drawbacks concerning node-or-subtree swapping, Vitter proposed another algorithm to improve it. Some important terminology and constraints:

- Implicit numbering: nodes are numbered in increasing order by level and from left to right; that is, nodes at the bottom level have lower implicit numbers than upper-level nodes, and nodes on the same level are numbered in increasing order from left to right.
- Invariant: for each weight w, all leaves of weight w precede all internal nodes of weight w.
- Block: a set of nodes of the same weight.
- Leader: the highest-numbered node in a block.
Blocks are interlinked in increasing order of their weights; a leaf block always precedes an internal block of the same weight, thus maintaining the invariant. NYT (not yet transferred) is a special node used to represent symbols which have not yet been transferred. Encoder and decoder start with only the root node, which has the maximum number; in the beginning it is our initial NYT node. When we transmit a symbol for the first time, we transmit the code for the NYT node, followed by the symbol's generic code. For every symbol that is already in the tree, we only have to transmit the code for its leaf node. Encoding "abb" gives 0110000100110001011.
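The NYT escape mechanism can be illustrated with a deliberately naive one-pass coder: before each symbol it rebuilds a Huffman code from the counts seen so far, rather than performing the incremental FGK/Vitter tree update. The 8-bit literal escape and the tie-breaking by insertion order are assumptions of this sketch, not part of the published algorithms:

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    # Build a prefix code from the current symbol counts.
    # (Naive full rebuild -- real FGK/Vitter updates the tree in place.)
    tick = count()  # unique tiebreaker so dicts are never compared
    heap = [(f, next(tick), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {s: "0" for s in heap[0][2]}
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tick), merged))
    return heap[0][2]

NYT = object()  # the not-yet-transferred escape symbol

def adaptive_encode(message):
    counts = {NYT: 1}
    bits = []
    for ch in message:
        codes = huffman_codes(counts)
        if ch in counts:
            bits.append(codes[ch])             # symbol already in the tree
        else:
            # Escape: NYT path, then the raw 8-bit literal for the symbol.
            bits.append(codes[NYT] + format(ord(ch), "08b"))
        counts[ch] = counts.get(ch, 0) + 1     # update the model
    return "".join(bits)

print(adaptive_encode("abb"))
```

For "abb" this emits an escaped literal for a, an escaped literal for b, then a one-bit code for the now-known b. The exact bit pattern differs from a true FGK/Vitter coder because it depends on tie-breaking, which is exactly why encoder and decoder must resolve ties identically to stay in sync.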
4.
Natural number
–
In mathematics, the natural numbers are those used for counting and ordering. In common language, words used for counting are cardinal numbers and words used for ordering are ordinal numbers. Texts that exclude zero from the natural numbers sometimes refer to the natural numbers together with zero as the whole numbers, but in other writings that term is used instead for the integers. These chains of extensions make the natural numbers canonically embedded in the other number systems. Properties of the natural numbers, such as divisibility and the distribution of prime numbers, are studied in number theory. Problems concerning counting and ordering, such as partitioning and enumerations, are studied in combinatorics.

The most primitive method of representing a natural number is to put down a mark for each object. Later, a set of objects could be tested for equality, excess, or shortage by striking out a mark. The first major advance in abstraction was the use of numerals to represent numbers, which allowed systems to be developed for recording large numbers. The ancient Egyptians developed a powerful system of numerals with distinct hieroglyphs for 1, 10, and all the powers of 10 up to over 1 million. A stone carving from Karnak, dating from around 1500 BC and now at the Louvre in Paris, depicts 276 as 2 hundreds, 7 tens, and 6 ones, and similarly for the number 4,622. A much later advance was the development of the idea that 0 can be considered as a number, with its own numeral. The use of a 0 digit in place-value notation dates back as early as 700 BC by the Babylonians; the Olmec and Maya civilizations used 0 as a separate number as early as the 1st century BC, but this usage did not spread beyond Mesoamerica. The use of a numeral 0 in modern times originated with the Indian mathematician Brahmagupta in 628. The first systematic study of numbers as abstractions is usually credited to the Greek philosophers Pythagoras and Archimedes.
Some Greek mathematicians treated the number 1 differently than larger numbers. Independent studies also occurred at around the same time in India, China, and Mesoamerica. In 19th-century Europe, there was mathematical and philosophical discussion about the nature of the natural numbers. A school of Naturalism stated that the natural numbers were a direct consequence of the human psyche; Henri Poincaré was one of its advocates, as was Leopold Kronecker, who summarized his view as "God made the integers, all else is the work of man." In opposition to the Naturalists, the constructivists saw a need to improve the logical rigor in the foundations of mathematics. In the 1860s, Hermann Grassmann suggested a recursive definition for natural numbers, thus stating they were not really natural. Later, two classes of such formal definitions were constructed; still later, they were shown to be equivalent in most practical applications. The second class of definitions was introduced by Giuseppe Peano and is now called Peano arithmetic. It is based on an axiomatization of the properties of ordinal numbers: each natural number has a successor, and every non-zero natural number has a unique predecessor. Peano arithmetic is equiconsistent with several systems of set theory.
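The recursive style of definition attributed to Grassmann above can be written out directly. Here is a minimal sketch in Python of the zero/successor construction and recursive addition (a + 0 = a; a + S(b) = S(a + b)); the class names are illustrative:

```python
from dataclasses import dataclass

class Nat:
    """A natural number built from zero and successor."""

@dataclass
class Zero(Nat):
    pass

@dataclass
class Succ(Nat):
    pred: Nat  # the number this one succeeds

def add(a: Nat, b: Nat) -> Nat:
    # Grassmann-style recursion: a + 0 = a ; a + S(b) = S(a + b)
    if isinstance(b, Zero):
        return a
    return Succ(add(a, b.pred))

def to_int(n: Nat) -> int:
    # Count successor applications to read the numeral back off.
    k = 0
    while isinstance(n, Succ):
        n, k = n.pred, k + 1
    return k

two = Succ(Succ(Zero()))
three = Succ(two)
print(to_int(add(two, three)))  # 5
```

Every natural number is reached from Zero by finitely many Succ applications, which is exactly the successor structure that Peano's axioms formalize.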
5.
Convolution
–
Convolution is a mathematical operation on two functions, f and g, that produces a third function expressing how the shape of one is modified by the other. It has applications that include probability, statistics, computer vision, natural language processing, image and signal processing, engineering, and differential equations. The convolution can be defined for functions on spaces other than Euclidean space. For example, periodic functions, such as the discrete-time Fourier transform, can be defined on a circle, and a discrete convolution can be defined for functions on the set of integers. Computing the inverse of the convolution operation is known as deconvolution.

The convolution of f and g is written f ∗ g, using an asterisk or star. It is defined as the integral of the product of the two functions after one is reversed and shifted; as such, it is a kind of integral transform:

    (f ∗ g)(t) = ∫−∞^∞ f(τ) g(t − τ) dτ

While the symbol t is used above, it need not represent the time domain. But in that context, the convolution formula can be described as a weighted average of the function f(τ) at the moment t, where the weighting is given by g(−τ) simply shifted by amount t. As t changes, the weighting function emphasizes different parts of the input function. For the multi-dimensional formulation of convolution, see Domain of definition.

A primarily engineering convention that one sees is

    f(t) ∗ g(t) ≝ ∫−∞^∞ f(τ) g(t − τ) dτ = (f ∗ g)(t),

which has to be interpreted carefully. For instance, f(t) ∗ g(t − t₀) is equivalent to (f ∗ g)(t − t₀), but f(t − t₀) ∗ g(t − t₀) is in fact equivalent to (f ∗ g)(t − 2t₀).

Convolution describes the output (in terms of the input) of an important class of operations known as linear time-invariant (LTI); see LTI system theory for a derivation of convolution as the result of LTI constraints. In terms of the Fourier transforms of the input and output of an LTI operation, no new frequency components are created; the existing ones are only modified. In other words, the output transform is the pointwise product of the input transform with a third transform. See Convolution theorem for a derivation of that property of convolution; conversely, convolution can be derived as the inverse Fourier transform of the pointwise product of two Fourier transforms.
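For sequences rather than functions, the integral above becomes a sum, (f ∗ g)[n] = Σₖ f[k] · g[n − k]. A minimal sketch of discrete convolution:

```python
def convolve(f, g):
    # Discrete convolution: (f*g)[n] = sum over k of f[k] * g[n-k].
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fv in enumerate(f):
        for j, gv in enumerate(g):
            out[i + j] += fv * gv  # term k = i contributes at index n = i + j
    return out

print(convolve([1, 2, 3], [0, 1, 0.5]))  # [0.0, 1.0, 2.5, 4.0, 1.5]
```

Reading the inner loop as "slide g past f and accumulate products" is exactly the reversed-and-shifted weighted average described above, restricted to integer indices.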
One of the earliest uses of the convolution integral appeared in D'Alembert's derivation of Taylor's theorem in Recherches sur différents points importants du système du monde. Soon thereafter, convolution operations appear in the works of Pierre Simon Laplace, Jean-Baptiste Joseph Fourier, Siméon Denis Poisson, and others. The term itself did not come into use until the 1950s or 60s; prior to that it was known as Faltung (German for "folding"), composition product, or superposition integral. Yet it appears as early as 1903, though the definition is rather unfamiliar in older uses. The operation ∫₀^t φ(s) ψ(t − s) ds, 0 ≤ t < ∞, is a particular case of composition products considered by the Italian mathematician Vito Volterra in 1913. Such a summation is called a periodic summation of the function f.
6.
Range encoding
–
Range encoding is an entropy coding method defined by G. Nigel N. Martin in a 1979 paper, which effectively rediscovered the FIFO arithmetic code first introduced by Richard Clark Pasco in 1976. After the expiration of the first arithmetic coding patent, range encoding appeared to be clearly free of patent encumbrances, and this particularly drove interest in the technique in the open source community. Since that time, patents on various well-known arithmetic coding techniques have also expired.

Range encoding conceptually encodes all the symbols of the message into one number, unlike Huffman coding, which assigns each symbol a bit-pattern and concatenates all the bit-patterns together. Each symbol of the message is encoded in turn by narrowing the current range to the sub-range that corresponds to that symbol. The decoder must have the same probability estimation the encoder used, which can either be sent in advance, derived from already transferred data, or be part of the compressor and decompressor. When all symbols have been encoded, merely identifying the sub-range is enough to communicate the entire message.

Suppose we want to encode the message AABA<EOM>, where <EOM> is the end-of-message symbol. Because all five-digit integers starting with 251 fall within our final range, 251 is one of the three-digit prefixes we could transmit that would unambiguously convey our original message. In practice, however, this is not a problem, because instead of starting with a very large range and gradually narrowing it down, the encoder works with a limited window of digits at any one time. After some number of digits have been encoded, the leftmost digits will not change; in the example, after encoding just three symbols, we already knew that our final result would start with 2. More digits are shifted in on the right as digits on the left are sent off. To finish off we may need to emit a few extra digits: the top digit of low is probably too small, so we need to increment it, but first we need to make sure the range is large enough.
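Using the probability model from this example (digits 0–5 encode A, 6–7 encode B, 8–9 encode <EOM>) and five decimal digits of precision, the successive narrowing can be computed directly. The function name and tuple representation are illustrative choices for the sketch:

```python
# Cumulative model from the walk-through: A gets 60% of each range,
# B 20%, and <EOM> the remaining 20%.
MODEL = {"A": (0, 6), "B": (6, 8), "EOM": (8, 10)}

def final_interval(symbols, digits=5):
    # Narrow [low, low + rng) once per symbol, integer arithmetic only.
    low, rng = 0, 10 ** digits
    for s in symbols:
        start, end = MODEL[s]
        low += rng * start // 10
        rng = rng * (end - start) // 10
    return low, low + rng

print(final_interval(["A", "A", "B", "A", "EOM"]))  # (25056, 25920)
```

Every five-digit integer beginning with 251 lies inside [25056, 25920), which is why the three digits 251 suffice to convey the whole message.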
One problem that can occur with the encoding procedure above is that range might become very small while low and low + range still have differing first digits. This could result in the interval having insufficient precision to distinguish between all of the symbols in the alphabet. When this happens we need to fudge a little: output the first couple of digits even though we might be off by one, and adjust the range afterward. The decoder will be following the same steps, so it will know when it needs to do this to keep in sync.

Base 10 was used in this example, but a real implementation would just use binary. Instead of 10000 and 1000 you would likely use hexadecimal constants such as 0x1000000 and 0x10000, and instead of emitting a digit at a time you would emit a byte at a time, using a byte-shift operation instead of multiplying by 10. Decoding uses exactly the same algorithm, with the addition of keeping track of the current code value consisting of the digits read from the compressor. Instead of emitting the top digit of low, you just throw it away and shift in the next digit of code (an AppendDigit operation in place of EmitDigit). In order to determine which probability intervals to apply, the decoder needs to look at the current value of code within the interval [low, low + range). For the AABA<EOM> example above, this would yield a value in the range 0 to 9: values 0 through 5 would represent A, 6 and 7 would represent B, and 8 and 9 would represent <EOM>.
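A matching decoder mirrors the narrowing, at each step locating which symbol's sub-range contains the code value. This sketch takes the whole code value at once instead of streaming digits in with AppendDigit, which is a simplification:

```python
# Same illustrative model: digits 0-5 -> A, 6-7 -> B, 8-9 -> <EOM>.
MODEL = {"A": (0, 6), "B": (6, 8), "EOM": (8, 10)}

def decode(code, digits=5):
    # Mirror the encoder: find the sub-range of [low, low + rng)
    # containing the code value, emit that symbol, then narrow.
    low, rng = 0, 10 ** digits
    out = []
    while True:
        value = (code - low) * 10 // rng  # a digit from 0 to 9
        symbol = next(s for s, (a, b) in MODEL.items() if a <= value < b)
        if symbol == "EOM":
            return out
        out.append(symbol)
        start, end = MODEL[symbol]
        low += rng * start // 10
        rng = rng * (end - start) // 10

print(decode(25100))  # ['A', 'A', 'B', 'A']
```

Feeding in 25100, the value that the three transmitted digits 251 imply, walks the same intervals the encoder produced and recovers AABA before hitting <EOM>.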
7.
Tunstall coding
–
In computer science and information theory, Tunstall coding is a form of entropy coding used for lossless data compression. Tunstall coding was the subject of Brian Parker Tunstall's 1967 PhD thesis, "Synthesis of noiseless compression codes". Its design is a precursor to Lempel–Ziv. Unlike variable-length codes, which include Huffman and Lempel–Ziv coding, Tunstall coding is a code which maps source symbols to a fixed number of bits. Both Tunstall codes and Lempel–Ziv codes represent variable-length words by fixed-length codes. Unlike typical set encoding, Tunstall coding parses a stochastic source with codewords of variable length. It can be shown that, for a large enough dictionary, the number of bits per source letter can be arbitrarily close to H, the entropy of the source.

The algorithm requires as input an alphabet U, along with a distribution of probabilities for each letter. It also requires an arbitrary constant C, which is an upper bound on the size of the dictionary that it will compute. The dictionary in question, D, is constructed as a tree of probabilities. The algorithm goes like this:

    D := tree of |U| leaves, one for each letter in U
    while |D| < C:
        convert the most probable leaf to a tree with |U| leaves

Let's imagine that we wish to encode the string "hello, world". Let's further assume that the input alphabet U contains only the characters occurring in "hello, world", so that |U| = 9. We can therefore compute the probability of each character based on its statistical appearance in the input string; for instance, the letter l appears thrice in a string of 12 characters, giving probability 3⁄12. We initialize the tree, starting with a tree of |U| = 9 leaves. Each word is therefore directly associated with a letter of the alphabet, and the 9 words that we thus obtain can be encoded into a fixed-sized output of ⌈log₂ 9⌉ = 4 bits. We then take the leaf of highest probability and convert it to yet another tree of |U| = 9 leaves, and we re-compute the probabilities of those leaves.
For instance, the sequence of two letters "ll" happens once; given that there are three occurrences of letters followed by an l, the resulting probability is 1⁄3 ⋅ 3⁄12 = 1⁄12. We obtain 17 words, which can each be encoded into a fixed-sized output of ⌈log₂ 17⌉ = 5 bits. Note that we could iterate further, increasing the number of words by |U| − 1 = 8 every time. Tunstall coding requires the algorithm to know, prior to the parsing operation, what the distribution of probabilities for each letter of the alphabet is; this issue is shared with Huffman coding. Its requirement of a fixed-length block output makes it less flexible than Lempel–Ziv, which has a similar dictionary-based design but a variable-sized block output.
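The leaf-expansion loop above is short enough to sketch directly. This naive version keeps the dictionary as a word-to-probability map rather than an explicit tree, and it estimates extended-word probabilities by multiplying independent letter probabilities, a simplification relative to the conditional estimate used in the example:

```python
import math
from collections import Counter

def tunstall_words(text, max_words):
    # Letter probabilities taken from the text itself, as in the example.
    n = len(text)
    probs = {c: k / n for c, k in Counter(text).items()}
    words = dict(probs)  # start with one word per letter (|U| leaves)
    # Each expansion removes one leaf and adds |U| children,
    # growing the dictionary by |U| - 1 words.
    while len(words) + len(probs) - 1 <= max_words:
        best = max(words, key=words.get)   # most probable leaf
        p = words.pop(best)
        for c, pc in probs.items():
            words[best + c] = p * pc       # extend the word (independence assumed)
    return words

words = tunstall_words("hello, world", 17)
print(len(words), math.ceil(math.log2(len(words))))  # 17 words -> 5-bit codewords
```

Starting from 9 letters, one expansion of the most probable leaf (l, at 3⁄12) yields the 17 words of the example, each addressable by a fixed 5-bit codeword.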