Category:Scripts encoded in Unicode 5.1
Pages in category "Scripts encoded in Unicode 5.1"
The following 11 pages are in this category, out of 11 total. This list may not reflect recent changes.
1. Script (Unicode) – In Unicode, a script is a collection of letters and other written signs used to represent textual information in one or more writing systems. Some scripts support one and only one writing system and language. Other scripts support many different writing systems; for example, the Latin script supports English, French, German, Italian, Vietnamese, Latin itself, and several other languages. Some languages make use of multiple alternate writing systems and thus also use several scripts. Turkish, for example, was written in the Arabic script before the 20th century but transitioned to Latin in the early part of the 20th century. For a list of languages supported by each script, see the list of languages by writing system. More or less complementary to scripts are symbols and Unicode control characters. The unified diacritical characters and unified punctuation characters frequently have the Common or Inherited script property. Unicode 9.0 defines 135 separate scripts, including 84 modern scripts and 51 ancient or historic scripts. More scripts are in the process of being encoded or have been allocated for encoding in roadmaps. When multiple languages make use of the same script, there are frequently some differences, particularly in diacritics. For example, Swedish and English both use the Latin script; however, Swedish includes the character ‘å’ while English has no such character, nor does English make use of the diacritic combining ring above for any character. In general, languages sharing the same script share many of the same characters. Despite these peripheral differences in the Swedish and English writing systems, they are said to use the same Latin script, so the Unicode abstraction of scripts is a basic organizing technique. The differences between alphabets or writing systems remain and are supported through Unicode’s flexible scripts and combining marks. The term writing system is sometimes treated as a synonym for script.
However, it can also refer to the specific writing system supported by a script. For example, the Vietnamese writing system is supported by the Latin script. A writing system may also cover more than one script; for example, the Japanese writing system makes use of the Han, Hiragana and Katakana scripts. The term complex system is used to describe those where the admixture makes classification problematic. Unicode supports all of these types of writing systems through its numerous scripts. Unicode also adds further properties to characters to help differentiate the various characters and the ways they behave within Unicode text processing algorithms. In addition to explicit or specific script properties, Unicode uses three special values, one of which is Common. Unicode can assign a character in the UCS to a single script only. However, many characters (those that are not part of a natural language writing system or are unified across many writing systems) may be used in more than one script. Examples include currency signs, symbols, numerals and punctuation marks. In these cases Unicode defines them as belonging to the Common script.
2. Unicode – Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. As of June 2016, the most recent version is Unicode 9.0. The standard is maintained by the Unicode Consortium. Unicode's success at unifying character sets has led to its widespread use. The standard has been implemented in many recent technologies, including modern operating systems, XML, Java, and the .NET Framework. Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8, UTF-16 and the now-obsolete UCS-2. UTF-8 uses one byte for any ASCII character, all of which have the same values in both UTF-8 and ASCII encoding, and up to four bytes for other characters. UCS-2 uses a 16-bit code unit for each character but cannot encode every character in the current Unicode standard. UTF-16 extends UCS-2, using one 16-bit unit for the characters that were representable in UCS-2 and two 16-bit units to handle each of the additional characters. Many traditional character encodings share a common problem in that they allow bilingual, but not multilingual, computer processing. Unicode, in intent, encodes the underlying characters (graphemes and grapheme-like units) rather than the variant glyphs for such characters. In the case of Chinese characters, this leads to controversies over distinguishing the underlying character from its variant glyphs. In text processing, Unicode takes the role of providing a unique code point (a number) for each character. In other words, Unicode represents a character in an abstract way and leaves the visual rendering to other software, such as a web browser or word processor. This simple aim becomes complicated, however, because of concessions made by Unicode's designers in the hope of encouraging a more rapid adoption of Unicode. The first 256 code points were made identical to the content of ISO-8859-1 so as to make it trivial to convert existing western text.
For other examples, see duplicate characters in Unicode. In a document entitled Unicode 88, Becker explained that the name Unicode is intended to suggest a unique, unified, universal encoding, and outlined a 16-bit character model: Unicode could be roughly described as wide-body ASCII that has been stretched to 16 bits to encompass the characters of all the world's living languages. In a properly engineered design, 16 bits per character are more than sufficient for this purpose. Unicode aims in the first instance at the characters published in modern text, whose number is undoubtedly far below 2^14 = 16,384. By the end of 1990, most of the work on mapping existing character encoding standards had been completed. The Unicode Consortium was incorporated in California on January 3, 1991, and in October 1991 the first volume of the Unicode standard was published. The second volume, covering Han ideographs, was published in June 1992. In 1996, a surrogate character mechanism was implemented in Unicode 2.0, so that Unicode was no longer restricted to 16 bits. The Microsoft TrueType specification version 1.0 from 1992 used the name Apple Unicode instead of Unicode for the Platform ID in the naming table. Unicode defines a codespace of 1,114,112 code points in the range 0 to 10FFFF (hexadecimal). Normally a Unicode code point is referred to by writing U+ followed by its hexadecimal number. For code points in the Basic Multilingual Plane (BMP), four digits are used; for code points outside the BMP, five or six digits are used, as required. Code points in Planes 1 through 16 are accessed as surrogate pairs in UTF-16. Within each plane, characters are allocated within named blocks of related characters.
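The encoding rules summarized in this entry can be checked with a short Python sketch. The function names below are illustrative only and are not part of the Unicode standard or any library; the sketch relies solely on the built-in `str.encode` and on the arithmetic of the UTF-16 surrogate mechanism.

```python
from typing import Tuple

# 17 planes (0 through 16) of 65,536 code points each.
CODESPACE_SIZE = 17 * 0x10000


def utf8_length(ch: str) -> int:
    """Number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to encode one character in UTF-16."""
    # "utf-16-be" avoids the byte-order mark that plain "utf-16" prepends.
    return len(ch.encode("utf-16-be")) // 2


def surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a code point outside the BMP into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points in Planes 1 through 16 use surrogates")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def u_plus(cp: int) -> str:
    """Format a code point as U+XXXX, using at least four hex digits."""
    return f"U+{cp:04X}"


if __name__ == "__main__":
    for cp in (0x41, 0xE5, 0xA500, 0x10280):
        ch = chr(cp)
        print(u_plus(cp), utf8_length(ch), utf16_units(ch))
```

An ASCII letter takes one UTF-8 byte and one UTF-16 unit, while a code point beyond the BMP takes four UTF-8 bytes and two UTF-16 units, which is the behaviour the entry describes.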
3. Cham alphabet – The Cham alphabet is an abugida used to write Cham, an Austronesian language spoken by some 230,000 Chams in Vietnam and Cambodia. It is written left to right, as in English. The Cham script is a descendant of the Brahmi script of India. Cham was one of the first scripts to develop from a Tamil Brahmi script called the Grantha alphabet, some time around 200 CE. It came to Southeast Asia as part of the expansion of Hinduism and Buddhism. Hindu stone temples of the Champa civilization contain both Sanskrit and Chamic-language stone inscriptions. The earliest inscriptions in Vietnam are found in Mỹ Sơn, a temple dated to around 400 CE. The oldest inscription is written in faulty Sanskrit. After this, inscriptions alternate between Sanskrit and the Cham language of the time. Cham kings studied classical Indian texts such as the Dharmaśāstra, and inscriptions make reference to Sanskrit literature. By the 8th century, the Cham script had outgrown Sanskrit and the Cham language was in full use. Most preserved manuscripts focus on rituals, epic battles and poems. Modern Chamic languages have the Southeast Asian areal features of monosyllabicity and tonality. However, they had reached the Southeast Asian mainland disyllabic and non-tonal, so the script needed to be altered to meet these changes. The Cham now live in two groups, the Western Cham of Cambodia and the Eastern Cham of Vietnam. For the first millennium AD, the Chamic languages were a chain along the Vietnam coast. The division of Cham into Western and Phan Rang Cham immediately followed the Vietnamese overthrow of the last Cham polity. Each group uses a distinct variety of the script, although the Western Cham are mostly Muslim and now prefer to use the Arabic alphabet. The Eastern Cham are mostly Hindu and still use the Cham script. During French colonial times, both groups had to use the Latin alphabet. The script is highly valued in Cham culture, but this does not mean that people are learning it.
There have been efforts to simplify the spelling and to promote learning the script. Traditionally, boys learned the script around the age of twelve, when they were considered old enough. Women and girls, however, did not typically learn to read. The traditional Indic Cham script is known and used by Vietnam's Eastern Cham. As an abugida, Cham writes individual consonants supplemented by obligatory vowel diacritics attached to the consonant. Most consonant letters include an inherent vowel, which does not need to be written.
4. Lepcha alphabet – The Lepcha script, or Róng script, is an abugida used by the Lepcha people to write the Lepcha language. Unusually for an abugida, syllable-final consonants are written as diacritics. Lepcha is derived from the Tibetan script and may have some Burmese influence. According to tradition, it was devised at the beginning of the 18th century by prince Phyagdor Namgyal of the Tibetan dynasty in Sikkim. Early Lepcha manuscripts were written vertically, a sign of Chinese influence. When they were written horizontally, the letters remained in their new orientations. This resulted in a method of writing final consonants. As in most other Brahmic scripts, the short vowel /-a/ is not written. Other vowels are written with diacritics before, after, or under the initial consonant. The length mark, however, is written over the initial, as well as any final consonant diacritic, and fuses with /-o/ and /-u/. Initial vowels do not have letters, but are written with the vowel diacritics on an &-shaped zero-consonant letter. There are postposed diacritics for medial /-y-/ and /-r-/, which may be combined. For medial /-l-/, however, there are seven dedicated conjunct letters. That is, there is a letter for /kla/ which does not resemble the letter for /ka/. One of the finals, /-ŋ/, is an exception to these patterns. First, unlike the other finals, final /-ŋ/ is written to the left of the initial consonant rather than on top. That is, /kiŋ/ is written ngki. Second, there is no inherent vowel before /-ŋ/; even short /-a-/ must be written. That is, /kaŋ/ is written ngka, rather than ngk as would be expected from the general pattern. The Lepcha script was added to the Unicode Standard in April 2008 with the release of version 5.1. The Unicode block for Lepcha is U+1C00–U+1C4F.
5. Lycian alphabet – The Lycian alphabet was used to write the Lycian language. It was an extension of the Greek alphabet, with half a dozen additional letters for sounds not found in Greek. It was largely similar to the Lydian and the Phrygian alphabets. The Lycian alphabet contains letters for 29 sounds. Some sounds are represented by more than one symbol, which is considered one letter. There are six vowel letters, including one for each of the four oral vowels of Lycian. Nine of the Lycian letters do not appear to derive from the Greek alphabet. The Lycian alphabet was added to the Unicode Standard in April 2008 with the release of version 5.1. It is encoded in Plane 1. The Unicode block for Lycian is U+10280–U+1029F.
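As a rough illustration of what "encoded in Plane 1" means, the plane of a code point can be computed by dividing its value by 65,536, the size of one plane. The sketch below is a minimal example; the helper names `plane_of` and `is_bmp` are hypothetical and chosen only for this illustration.

```python
PLANE_SIZE = 0x10000      # 65,536 code points per plane
MAX_CODE_POINT = 0x10FFFF  # last code point of Plane 16


def plane_of(cp: int) -> int:
    """Return the plane number (0 through 16) that contains a code point."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError("code point outside the Unicode codespace")
    return cp // PLANE_SIZE


def is_bmp(cp: int) -> bool:
    """True if the code point lies in the Basic Multilingual Plane (Plane 0)."""
    return plane_of(cp) == 0


if __name__ == "__main__":
    # Both ends of the Lycian block quoted above fall in Plane 1.
    print(plane_of(0x10280), plane_of(0x1029F))
    # A four-digit code point such as U+0041 is in the BMP.
    print(is_bmp(0x0041))
```

Any code point written with five or six hexadecimal digits yields a non-zero plane, which is why the Lycian block requires surrogate pairs in UTF-16.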
6. Ol Chiki script – The Ol Chiki script, also known as Ol Cemetʼ, Ol Ciki, Ol, and sometimes as the Santali alphabet, was created in 1925 by Raghunath Murmu for the Santali language. Previously, Santali had been written with the Latin alphabet. A detailed analysis was given by Byomkes Chakrabarti in his Comparative Study of Santali and Bengali. Missionaries brought the Latin script, which is better at representing Santali stops, phonemes and nasal sounds with the use of diacritical marks. Unlike most Indic scripts, which are derived from Brahmi, Ol Chiki is not an abugida: vowels are given equal representation with consonants. Additionally, it was designed specifically for the language, but one letter could not be assigned to each phoneme because the vowel in Ol Chiki is still problematic. Ol Chiki has 30 letters, the forms of which are intended to evoke natural shapes. It is written from left to right. The Ol Chiki script was added to the Unicode Standard in April 2008 with the release of version 5.1. The Unicode block for Ol Chiki is U+1C50–U+1C7F.
7. Rejang script – The Rejang script, sometimes spelt Redjang and locally known as Surat Ulu, is an abugida of the Brahmic family, and is related to other scripts of the region, such as Batak and Buginese. Rejang is a member of the closely related group of Surat Ulu scripts that includes the script variants of Bengkulu, Lembak, Lintang and Lebong. Other scripts that are related, and sometimes included in the Surat Ulu group, include Kerinci. The script was in use prior to the introduction of Islam to the Rejang area. The Rejang script is sometimes also known as the KaGaNga script, following the first three letters of the alphabet. The term KaGaNga was never used by the script's own user community. There are five dialects of Rejang, including Lebong, Musi, Kebanagung and Pesisir. Most of its users live in remote rural areas, and slightly less than half of them are literate. The traditional Rejang corpus consists chiefly of ritual texts and medical incantations. The Rejang script was added to the Unicode Standard in April 2008 with the release of version 5.1. The Unicode block for Rejang is U+A930–U+A95F.
8. Saurashtra alphabet – Saurashtra is a script used to write the Saurashtra language. Its usage has declined, and the Tamil and Latin scripts are now used more commonly. The Saurashtra language is written in its own script. Because this is a minority language not taught in schools, people learn to write in the Sourashtra script through voluntary organisations like Sourashtra Vidya Peetam. Sourashtra is the popular spelling, and it refers to both the Sourashtra language and a person who speaks Sourashtram. Saurashtra is an area in Gujarat State in India, from where the present Sourashtras in Tamil Nadu are traditionally believed to have migrated some centuries back. Vrajlal Sapovadia describes the Saurashtra language as a hybrid of Gujarati, Marathi and Tamil. The language has had its own script for centuries, the earliest one available dating from 1880. The language is not taught in schools and hence had been confined to being merely a spoken language. However, many great works such as the Bhagavath Gita and Tirukkural have been translated into Sourashtram, and it is now a literary language. Sahitya Akademi has recognized the language by conferring Bhasha Samman awards on Sourashtra scholars. Some books were printed in the Devanagari script, but this failed to register the growth of the language. Writing Sourashtram in Devanagari requires seven additional symbols to denote the short e and o, and one more symbol to mark the sound of half yakara, which is peculiar to the Sourashtra language. The books printed in Devanagari were discarded because they did not represent the sounds properly. The Commissioner for Linguistic Minorities, Allahabad, by his letter No. [...]. The leaders in the community did not recognize the importance of teaching the mother tongue in schools and showed little interest in producing textbooks in Sourashtram for class use.
Of late, on the internet, many Sourashtra Yahoo groups use the Roman script for the Sourashtra language. A Sourashtra font is now available on computers, which has enabled supporters of the Sourashtra script to print books in the script. There is also an electronic journal printed in the Sourashtra script. One journal, Bhashabhimani, is published from Madurai in the Sourashtra script. Another journal, Jaabaali, is published from Madurai by the same editor. The Zeeg Sourashtra script practice magazine is also published from Madurai. All three journals support the Sourashtra script only. There is a journal in Devanagari called Palkar Sourashtra Samachar. The letter order of the Saurashtra script is similar to that of other Brahmic scripts. The letters are vowels, consonants, and the letters which are formed essentially by adding a vowel sound to a consonant. The Saurashtra script was added to the Unicode Standard in April 2008 with the release of version 5.1. The Unicode block for Saurashtra is U+A880–U+A8DF.
9. Sundanese script – Sundanese script is a writing system used by the Sundanese people. It is based on the Old Sundanese script, which was used by the ancient Sundanese between the 14th and 18th centuries. The government of West Java Province has announced Peraturan Daerah no. 6 of 1996 about the Sundanese language, literature and script. The regulation was motivated by a Keputusan Presiden. It is now also agreed among scholars that the script can simply be called Aksara Sunda. There were variants in writing due to materials and period. Considering completeness and practicality, the variant found in soft-material documents is to be used for modern usage. There was previously a tendency among some people to refer to the Cacarakan script as Sundanese script. However, this can be traced back to the earliest source, a book written by G. J. Grashuis. The book taught how to write Sundanese script but used Cacarakan. The Cacarakan script itself contains only around 10% of innovation by Sundanese people, especially by reducing and simplifying the sounds in Javanese to suit the Sundanese language. From the cultural point of view, Sundanese script is one part of Sundanese civilization. Therefore, spreading and utilizing Sundanese script should be integrated with the task of maintaining and conserving Sundanese culture as a whole. Thus, it will have a scope as broad as that of the people itself. Re-spreading and re-utilizing Sundanese script is to be done in several steps, since it was not well known by the community within the last three centuries. Relevant instruments include no. 434/SK.614-Dis.PK/99 about Standardization of Sundanese Script and Local Government Regulation no. 5 of 2003 about Conservation of Local Language, Literature, and Script. There are two non-standard sounds, kha and sya, for writing the foreign Arabic consonants خ and ش.
These are considered non-standard because their usage is supported by only a few Sundanese people. There are also rarangkéns, or attachments, for removing, modifying, or adding a vowel or consonant sound to the base characters. In addition, there are glyphs for number characters, from zero to nine. Graphically, Ngalagena characters, including rarangkéns, have an angle of 45° – 75°. In general, the ratio is 4:4, except for the Ngalagena characters ra, ba and nya. Rarangkéns have dimension ratio 2:2, except for panyecek, panglayar, panyakra and pamaéh. Numbers have ratio 4:4, except for the numbers 4 and 5. By position, rarangkéns fall into three groups: (a) rarangkéns above the base glyph, (b) rarangkéns below the base glyph, and (c) rarangkéns inline with the base glyph. In texts, numbers are written surrounded by a pair of pipe signs (|); for example, |᮲᮰᮱᮵| = 2015. For modern use, Latin punctuation is used: comma, dot, semicolon, colon, exclamation mark, question mark, quotes and parentheses. Simple words or sentences can be written directly, for example by arranging Ngalagena letters which represent the sounds. However, in words, compound consonants can be found.
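The pipe-delimited number notation described above can be illustrated with a small Python sketch. It assumes that the Sundanese digits zero to nine occupy the ten consecutive code points starting at U+1BB0; that code point is not stated in this entry and is labelled as an assumption in the code. The function names are hypothetical.

```python
# Assumption (not stated in the entry): SUNDANESE DIGIT ZERO is U+1BB0,
# and the digits one to nine follow it consecutively.
SUNDANESE_ZERO = 0x1BB0


def parse_sundanese_number(text: str) -> int:
    """Convert a number such as '|᮲᮰᮱᮵|' to an int, ignoring the pipe signs."""
    digits = text.strip("|")
    if not digits:
        raise ValueError("no digits found")
    value = 0
    for ch in digits:
        d = ord(ch) - SUNDANESE_ZERO
        if not 0 <= d <= 9:
            raise ValueError(f"not a Sundanese digit: {ch!r}")
        value = value * 10 + d
    return value


def format_sundanese_number(n: int) -> str:
    """Write a non-negative int with Sundanese digits between pipe signs."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    body = "".join(chr(SUNDANESE_ZERO + int(d)) for d in str(n))
    return "|" + body + "|"


if __name__ == "__main__":
    print(parse_sundanese_number("|\u1BB2\u1BB0\u1BB1\u1BB5|"))
    print(format_sundanese_number(2015))
```

Working from the offset to the zero digit keeps the sketch independent of how a particular runtime classifies these characters.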
10. Vai syllabary – The Vai syllabary is a syllabic writing system devised for the Vai language by Momolu Duwalu Bukele of Jondu, in what is now Grand Cape Mount County, Liberia. Bukele is regarded within the Vai community, as well as by most scholars, as the syllabary's inventor and chief promoter when it was first documented in the 1830s. It is one of the two most successful indigenous scripts in West Africa in terms of the number of current users and the availability of literature written in the script. Vai is a script written from left to right that represents CV syllables. Originally there were separate glyphs for syllables ending in a nasal, such as don, with a long vowel, such as soo, and with a diphthong, such as bai. However, these have been dropped from the modern script. There are relatively few glyphs for nasal vowels because only a few occur with each consonant. In recent years evidence has emerged suggesting that the Cherokee syllabary of North America provided a model for the design of the Vai syllabary in Liberia. The Vai syllabary emerged about 1832/33. The link appears to have been Cherokee who emigrated to Liberia after the invention of the Cherokee syllabary but before the invention of the Vai syllabary. One such man, the Cherokee Austin Curtis, married into a prominent Vai family. It is notable that the romantic inscription on a house that first drew the world's attention to the existence of the Vai script was in fact on the home of Curtis, a Cherokee. Vai has distinct basic punctuation marks. Additional punctuation marks are taken from European usage. The oldest Vai texts used various logograms. Of these, only ꘓ and ꘘ are still in use. One roughly fifty-page manuscript contains several now-obsolete symbols. The Vai syllabary was added to the Unicode Standard in April 2008 with the release of version 5.1. In Windows 7 and earlier, names are available only for characters released in Unicode 5.0 and earlier.
The Unicode block for Vai is U+A500–U+A63F.
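The block ranges quoted in the entries above can be collected into a small lookup table. The following Python sketch is a minimal illustration only: it covers just the six blocks whose ranges are given on this page, and the names `BLOCKS` and `block_of` are hypothetical rather than taken from any library.

```python
from typing import Optional

# (first code point, last code point, block name), as quoted in the entries.
BLOCKS = [
    (0x1C00, 0x1C4F, "Lepcha"),
    (0x1C50, 0x1C7F, "Ol Chiki"),
    (0xA500, 0xA63F, "Vai"),
    (0xA880, 0xA8DF, "Saurashtra"),
    (0xA930, 0xA95F, "Rejang"),
    (0x10280, 0x1029F, "Lycian"),
]


def block_of(ch: str) -> Optional[str]:
    """Return the block name for a single character, or None if not listed."""
    cp = ord(ch)
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return None


if __name__ == "__main__":
    for cp in (0x1C00, 0x1C50, 0xA500, 0x10280, 0x0041):
        print(f"U+{cp:04X}", block_of(chr(cp)))
```

Note that a block is a contiguous range of code points, which is not the same thing as the script property discussed in the first entry; a complete implementation would read the ranges from the Unicode Character Database rather than hard-coding them.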