Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. As of June 2016, the most recent version is Unicode 9.0; the standard is maintained by the Unicode Consortium. Unicode's success at unifying character sets has led to its widespread adoption: the standard has been implemented in many recent technologies, including modern operating systems, XML, Java, and the .NET Framework.

Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8, UTF-16, and the now-obsolete UCS-2. UTF-8 uses one byte for any ASCII character, all of which have the same code values in both UTF-8 and ASCII encoding, and up to four bytes for other characters. UCS-2 uses a 16-bit code unit for each character but cannot encode every character in the current Unicode standard. UTF-16 extends UCS-2, using one 16-bit unit for the characters that were representable in UCS-2 and two 16-bit units (a surrogate pair) to handle each of the additional characters.

Many traditional character encodings share a common problem in that they allow bilingual computer processing, but not multilingual computer processing. Unicode, in intent, encodes the underlying characters—graphemes and grapheme-like units—rather than the variant glyphs for such characters. In the case of Chinese characters, this leads to controversies over distinguishing the underlying character from its variant glyphs.

In text processing, Unicode takes the role of providing a unique code point (a number) for each character. In other words, Unicode represents a character in an abstract way and leaves the visual rendering to other software, such as a web browser or word processor. This simple aim becomes complicated, however, because of concessions made by Unicode's designers in the hope of encouraging a more rapid adoption of Unicode. For example, the first 256 code points were made identical to the content of ISO-8859-1, so as to make it trivial to convert existing Western text.
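The variable-width behavior described above can be observed directly. As a brief illustrative sketch (the sample characters are our own choices, not drawn from the text), Python reports how many bytes each encoding needs per character:

```python
# UTF-8 uses 1 byte for ASCII and up to 4 bytes for other characters;
# UTF-16 uses one 16-bit unit inside the BMP and two units (a surrogate
# pair) for characters outside it.
for ch in ["A", "é", "€", "𝄞"]:  # ASCII, Latin-1, BMP, and a Plane-1 character
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    print(f"U+{ord(ch):04X}: UTF-8 = {len(utf8)} bytes, UTF-16 = {len(utf16)} bytes")
```

Running this shows "A" taking 1 byte in UTF-8, while the musical symbol U+1D11E takes 4 bytes in both encodings, since it lies outside the BMP and needs a surrogate pair in UTF-16.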
For other examples, see duplicate characters in Unicode.

Joe Becker explained that "the name 'Unicode' is intended to suggest a unique, unified, universal encoding". In this document, entitled Unicode 88, Becker outlined a 16-bit character model: Unicode could be roughly described as "wide-body ASCII" that has been stretched to 16 bits to encompass the characters of all the world's living languages. In a properly engineered design, 16 bits per character are more than sufficient for this purpose, as Unicode aims in the first instance at the characters published in modern text, whose number is undoubtedly far below 2^14 = 16,384.

By the end of 1990, most of the work on mapping existing character encoding standards had been completed. The Unicode Consortium was incorporated in California on January 3, 1991, and in October 1991, the first volume of the Unicode standard was published. The second volume, covering Han ideographs, was published in June 1992. In 1996, a surrogate character mechanism was implemented in Unicode 2.0, so that Unicode was no longer restricted to 16 bits. The Microsoft TrueType specification version 1.0 from 1992 used the name "Apple Unicode" instead of "Unicode" for the Platform ID in the naming table.

Unicode defines a codespace of 1,114,112 code points in the range 0x0 to 0x10FFFF. Normally a Unicode code point is referred to by writing "U+" followed by its hexadecimal number: for code points in the Basic Multilingual Plane (BMP), four digits are used; for code points outside the BMP, five or six digits are used, as required. Code points in Planes 1 through 16 are accessed as surrogate pairs in UTF-16. Within each plane, characters are allocated within named blocks of related characters.
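The surrogate-pair mechanism mentioned above maps each code point beyond the 16-bit range onto two 16-bit units. A minimal sketch of that mapping (the helper name is ours, not part of any standard library):

```python
def to_surrogate_pair(cp):
    """Map a code point outside the BMP (U+10000..U+10FFFF) to its
    UTF-16 high and low surrogate code units."""
    assert 0x10000 <= cp <= 0x10FFFF
    v = cp - 0x10000                # 20-bit value to split across the pair
    high = 0xD800 + (v >> 10)       # high (lead) surrogate: top 10 bits
    low = 0xDC00 + (v & 0x3FF)      # low (trail) surrogate: bottom 10 bits
    return high, low

# U+1D11E (MUSICAL SYMBOL G CLEF) lies in Plane 1
print([hex(u) for u in to_surrogate_pair(0x1D11E)])  # ['0xd834', '0xdd1e']
```

This is why UTF-16 can reach all 1,114,112 code points: the 2,048 reserved surrogate values (U+D800 to U+DFFF) combine in pairs to address the 16 supplementary planes.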
Many modern applications can render a substantial subset of the many scripts in Unicode, as demonstrated by this screenshot from the OpenOffice.org application.
Various Cyrillic characters shown with and without italics.