Category:Scripts encoded in Unicode 5.1
Pages in category "Scripts encoded in Unicode 5.1"
The following 11 pages are in this category, out of 11 total. This list may not reflect recent changes (learn more).
1. Script (Unicode) – In Unicode, a script is a collection of letters and other written signs used to represent textual information in one or more writing systems. Some scripts support one and only one writing system and language. Other scripts support many different writing systems; for example, the Latin script supports English, French, German, Italian, Vietnamese, Latin itself, and several other languages. Some languages make use of multiple alternate writing systems and thus also use several scripts. In Turkish, for example, the Arabic script was used before the 20th century, but the language transitioned to Latin in the early part of the 20th century. For a list of languages supported by each script, see the list of languages by writing system. More or less complementary to scripts are symbols and Unicode control characters. The unified diacritical characters and unified punctuation characters frequently have the common or inherited script property. Unicode 9.0 defines 135 separate scripts, including 84 modern scripts and 51 ancient or historic scripts. More scripts are in the process of being encoded or have been allocated for encoding in roadmaps. When multiple languages make use of the same script, there are frequently some differences, particularly in diacritics. For example, Swedish and English both use the Latin script; however, Swedish includes the character ‘å’ while English has no such character. Nor does English make use of the diacritic combining circle above for any character. In general, languages sharing the same script share many of the same characters. Despite these peripheral differences between the Swedish and English writing systems, they are said to use the same Latin script, so the Unicode abstraction of scripts is a basic organizing technique. The differences between different alphabets or writing systems remain and are supported through Unicode’s flexible scripts and combining marks. The term writing system is sometimes treated as a synonym for script.
However, it can also be used to mean the specific concrete writing system supported by a script. For example, the Vietnamese writing system is supported by the Latin script. A writing system may also cover more than one script; for example, the Japanese writing system makes use of the Han, Hiragana and Katakana scripts. The term complex system is used to describe those where the admixture makes classification problematic. Unicode supports all of these types of writing systems through its numerous scripts. Unicode also adds further properties to characters to help differentiate the various characters and the ways they behave within Unicode text processing algorithms. In addition to explicit or specific script properties, Unicode uses three special values, one of which is Common. Unicode can assign a character in the UCS to a single script only. However, many characters, namely those that are not part of a natural language writing system or are unified across many writing systems, may be used in more than one script. Examples are currency signs, symbols, numerals and punctuation marks. In these cases Unicode defines them as belonging to the common script.
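The Swedish ‘å’ example can be made concrete with Python's standard unicodedata module. The sketch below is illustrative only: it shows that the precomposed letter decomposes canonically into the Latin letter a followed by a combining mark (the diacritic described above as a combining circle, which the character database names COMBINING RING ABOVE). The variable names are chosen here for illustration and do not come from the entry.

```python
import unicodedata

a_ring = "\u00e5"  # the Swedish letter 'å' as a single precomposed code point

# Canonical decomposition (NFD): a base letter followed by a combining mark.
decomposed = unicodedata.normalize("NFD", a_ring)
names = [unicodedata.name(ch) for ch in decomposed]

# Canonical composition (NFC) yields the single precomposed code point again.
recomposed = unicodedata.normalize("NFC", decomposed)

print(names)
print(recomposed == a_ring)
```

The base letter is the same Latin a that English uses; only the combining mark is particular to languages such as Swedish, which is one way to picture how two writing systems can share a script.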
2. Unicode – Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. As of June 2016, the most recent version is Unicode 9.0. The standard is maintained by the Unicode Consortium. Unicode's success at unifying character sets has led to its widespread use, and the standard has been implemented in many recent technologies, including modern operating systems, XML, Java, and the .NET Framework. Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8, UTF-16 and the now-obsolete UCS-2. UTF-8 uses one byte for any ASCII character, all of which have the same values in both UTF-8 and ASCII encoding, and up to four bytes for other characters. UCS-2 uses a 16-bit code unit for each character but cannot encode every character in the current Unicode standard. UTF-16 extends UCS-2, using one 16-bit unit for the characters that were representable in UCS-2 and two 16-bit units to handle each of the additional characters. Many traditional character encodings share a common problem in that they allow bilingual computer processing but not multilingual computer processing. Unicode, in intent, encodes the underlying characters (graphemes and grapheme-like units) rather than the variant glyphs for such characters. In the case of Chinese characters, this leads to controversies over distinguishing the underlying character from its variant glyphs. In text processing, Unicode takes the role of providing a unique code point, a number, for each character. In other words, Unicode represents a character in an abstract way and leaves the visual rendering to other software, such as a web browser or word processor. This simple aim becomes complicated, however, because of concessions made by Unicode's designers in the hope of encouraging a more rapid adoption of Unicode. The first 256 code points were made identical to the content of ISO-8859-1 so as to make it trivial to convert existing western text.
For other examples, see duplicate characters in Unicode. Becker explained that the name Unicode is intended to suggest a unique, unified, universal encoding. In a document entitled Unicode 88, he outlined a 16-bit character model in which Unicode could be roughly described as wide-body ASCII that has been stretched to 16 bits to encompass the characters of all the world's living languages. In his view, 16 bits per character would be more than sufficient for this purpose in a properly engineered design, since Unicode aimed in the first instance at the characters published in modern text, whose number is undoubtedly far below 2^14 = 16,384. By the end of 1990, most of the work on mapping existing character encoding standards had been completed. The Unicode Consortium was incorporated in California on January 3, 1991, and in October 1991 the first volume of the Unicode standard was published. The second volume, covering Han ideographs, was published in June 1992. In 1996, a surrogate character mechanism was implemented in Unicode 2.0, so that Unicode was no longer restricted to 16 bits. The Microsoft TrueType specification version 1.0 from 1992 used the name Apple Unicode instead of Unicode for the Platform ID in the naming table. Unicode defines a codespace of 1,114,112 code points in the range 0 to 10FFFF (hexadecimal). Normally a Unicode code point is referred to by writing U+ followed by its hexadecimal number. For code points in the Basic Multilingual Plane (BMP), four digits are used; for code points outside the BMP, five or six digits are used, as required. Code points in Planes 1 through 16 are accessed as surrogate pairs in UTF-16. Within each plane, characters are allocated within named blocks of related characters.
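The encoding arithmetic described in this entry can be checked with a short Python sketch. The helper names utf16_units and u_plus are hypothetical, introduced only for illustration; the surrogate constants 0xD800 and 0xDC00 are the standard UTF-16 values, and Python's built-in codecs are used as a cross-check rather than as the definition of the encodings.

```python
def utf16_units(code_point):
    """Return the list of 16-bit UTF-16 code units for one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode codespace")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not characters")
    if code_point < 0x10000:
        # Basic Multilingual Plane: a single 16-bit unit, as in UCS-2.
        return [code_point]
    # Planes 1 through 16: split the 20-bit offset into a surrogate pair.
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


def u_plus(code_point):
    """Format a code point as U+ followed by at least four hex digits."""
    return f"U+{code_point:04X}"


# 17 planes of 65,536 code points each give the codespace size.
codespace_size = 17 * 0x10000

# UTF-8 lengths: one byte for ASCII, up to four bytes for other characters.
utf8_lengths = [len(ch.encode("utf-8")) for ch in ("A", "\u00e5", "\U00010280")]

print(codespace_size, utf8_lengths)
print(u_plus(0x41), u_plus(0x10280), [hex(u) for u in utf16_units(0x10280)])
```

U+10280 is used as the sample supplementary code point because it lies in Plane 1, so it needs a surrogate pair in UTF-16 and four bytes in UTF-8.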
3. Cham alphabet – The Cham alphabet is an abugida used to write Cham, an Austronesian language spoken by some 230,000 Chams in Vietnam and Cambodia. It is written left to right, as in English. The Cham script is a descendant of the Brahmi script of India. Cham was one of the first scripts to develop from a Tamil Brahmi script called the Grantha alphabet some time around 200 CE. It came to Southeast Asia as part of the expansion of Hinduism and Buddhism. Hindu stone temples of the Champa civilization contain both Sanskrit and Chamic language stone inscriptions. The earliest inscriptions in Vietnam are found in Mỹ Sơn, a temple dated to around 400 CE. The oldest inscription is written in faulty Sanskrit. After this, inscriptions alternate between Sanskrit and the Cham language of the times. Cham kings studied classical Indian texts such as the Dharmaśāstra, and inscriptions make reference to Sanskrit literature. By the 8th century, the Cham script had outgrown Sanskrit and the Cham language was in full use. Most preserved manuscripts focus on rituals, epic battles and poems. Modern Chamic languages have the Southeast Asian areal features of monosyllabicity and tonality. However, they had reached the Southeast Asian mainland disyllabic and non-tonal. The script needed to be altered to meet these changes. The Cham now live in two groups, the Western Cham of Cambodia and the Eastern Cham of Vietnam. For the first millennium AD, the Chamic languages were a chain along the Vietnam coast. The division of Cham into Western and Phan Rang Cham immediately followed the Vietnamese overthrow of the last Cham polity. Each group uses a distinct variety of the script, although the Western Cham are mostly Muslim and now prefer to use the Arabic alphabet. The Eastern Cham are mostly Hindu and still use the Cham script. During French colonial times, both groups had to use the Latin alphabet. The script is highly valued in Cham culture, but this does not mean that people are learning it.
There have been efforts to simplify the spelling and to promote learning the script. Traditionally, boys learned the script around the age of twelve. However, women and girls did not typically learn to read. The traditional Indic Cham script is known and used by Vietnam's Eastern Cham. As an abugida, Cham writes individual consonants supplemented by obligatory vowel diacritics attached to the consonant. Most consonant letters include an inherent vowel, which does not need to be written.
4. Lepcha alphabet – The Lepcha script, or Róng script, is an abugida used by the Lepcha people to write the Lepcha language. Unusually for an abugida, syllable-final consonants are written as diacritics. Lepcha is derived from the Tibetan script, and may have some Burmese influence. According to tradition, it was devised at the beginning of the 18th century by prince Phyagdor Namgyal of the Tibetan dynasty in Sikkim. Early Lepcha manuscripts were written vertically, a sign of Chinese influence. When they were written horizontally, the letters remained in their new orientations. This resulted in a method of writing final consonants. As in most other Brahmic scripts, the short vowel /-a/ is not written. Other vowels are written with diacritics before or after the initial consonant. The length mark, however, is written over the initial, as well as any final consonant diacritic, and fuses with /-o/ and /-u/. Initial vowels do not have letters, but are written with the vowel diacritics on an &-shaped zero-consonant letter. There are postposed diacritics for medial /-y-/ and /-r-/, which may be combined. For medial /-l-/, however, there are seven dedicated conjunct letters. That is, there is a letter for /kla/ which does not resemble the letter for /ka/. One of the final letters, /-ŋ/, is an exception to these patterns. First, unlike the other finals, final /-ŋ/ is written to the left of the initial consonant rather than on top. That is, /kiŋ/ is written ngki. Second, there is no inherent vowel before /-ŋ/; even short /-a-/ must be written. That is, /kaŋ/ is written ngka, rather than ngk as would be expected from the general pattern. Lepcha script was added to the Unicode Standard in April 2008 with the release of version 5.1. The Unicode block for Lepcha is U+1C00–U+1C4F.
5. Lycian alphabet – The Lycian alphabet was used to write the Lycian language. It was an extension of the Greek alphabet, with half a dozen additional letters for sounds not found in Greek. It was largely similar to the Lydian and the Phrygian alphabets. The Lycian alphabet contains letters for 29 sounds. Some sounds are represented by more than one symbol, which is considered one letter. There are six vowel letters, including one for each of the four oral vowels of Lycian. Nine of the Lycian letters do not appear to derive from the Greek alphabet. The Lycian alphabet was added to the Unicode Standard in April 2008 with the release of version 5.1. It is encoded in Plane 1. The Unicode block for Lycian is U+10280–U+1029F.
6. Ol Chiki script – The Ol Chiki script, also known as Ol Cemetʼ, Ol Ciki, Ol, and sometimes as the Santali alphabet, was created in 1925 by Raghunath Murmu for the Santali language. Previously, Santali had been written with the Latin alphabet. A detailed analysis was given by Byomkes Chakrabarti in his Comparative Study of Santali and Bengali. Missionaries brought the Latin script, which is better at representing Santali stops, phonemes and nasal sounds with the use of diacritical marks. Unlike most Indic scripts, which are derived from Brahmi, Ol Chiki is not an abugida; vowels are given equal representation with consonants. Additionally, it was designed specifically for the language, but one letter could not be assigned to each phoneme because the vowel in Ol Chiki is still problematic. Ol Chiki has 30 letters, the forms of which are intended to evoke natural shapes. It is written from left to right. Ol Chiki script was added to the Unicode Standard in April 2008 with the release of version 5.1. The Unicode block for Ol Chiki is U+1C50–U+1C7F.
7. Rejang script – The Rejang script, sometimes spelt Redjang and locally known as Surat Ulu, is an abugida of the Brahmic family, and is related to other scripts of the region, like Batak, Buginese, and others. Rejang is a member of the related group of Surat Ulu scripts, which include the script variants of Bengkulu, Lembak, Lintang, and Lebong. Other scripts that are related, and sometimes included in the Surat Ulu group, include Kerinci. The script was in use prior to the introduction of Islam to the Rejang area. The Rejang script is sometimes also known as the KaGaNga script, following the first three letters of the alphabet. The term KaGaNga was never used by the script's community of users. There are five dialects of Rejang, including Lebong, Musi, Kebanagung, and Pesisir. Most of its users live in remote rural areas, and slightly fewer than half of them are literate. The traditional Rejang corpus consists chiefly of ritual texts and medical incantations. Rejang script was added to the Unicode Standard in April 2008 with the release of version 5.1. The Unicode block for Rejang is U+A930–U+A95F.
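The block ranges quoted in the entries above can be collected into a small lookup table. The following Python sketch is illustrative only: the names BLOCKS, block_of and plane_of are hypothetical, and the table holds just the four ranges stated on this page, not the full Unicode block list.

```python
# Inclusive block ranges exactly as quoted in the entries on this page.
BLOCKS = {
    "Lepcha": (0x1C00, 0x1C4F),
    "Lycian": (0x10280, 0x1029F),
    "Ol Chiki": (0x1C50, 0x1C7F),
    "Rejang": (0xA930, 0xA95F),
}


def block_of(code_point):
    """Return the name of the listed block containing code_point, or None."""
    for name, (first, last) in BLOCKS.items():
        if first <= code_point <= last:
            return name
    return None


def plane_of(code_point):
    """Return the plane number (0 is the Basic Multilingual Plane)."""
    return code_point >> 16


# Number of code points reserved by each block.
sizes = {name: last - first + 1 for name, (first, last) in BLOCKS.items()}

print(sizes)
print(block_of(0x1C50), plane_of(0x10280), plane_of(0xA930))
```

Under these ranges, Lycian is the only one of the four blocks outside the Basic Multilingual Plane, which matches the entry's statement that it is encoded in Plane 1.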