Tibetan Unicode: Difference between revisions

Revision as of 09:53, 5 November 2006

Before the Unicode Standard came along, there were hundreds of different encoding systems, standardized and non-standardized, for the characters of different writing systems. No single character encoding had enough code points to cover all the characters of all the writing systems used in the world. Even for a language like English, which has an uncomplicated writing system, the 7-bit and 8-bit computer character sets such as ASCII used for encoding the Latin script were inadequate for all the letters, punctuation, and technical symbols in common use.

Most Tibetan word-processing applications used non-standardized, proprietary font-based encodings - mapping Tibetan glyphs in fonts to character sets originally designed for encoding Latin or Chinese characters. Because each Tibetan system used its own encoding, files could not be shared between different Tibetan word-processing programs and other applications, which was one of the greatest obstacles to using electronic Tibetan data.

Tibetan texts may contain thousands of different character combinations (or ligatures), but many of these systems mapped the glyphs in their fonts to 8-bit character sets supporting at most 256 characters. As a result, these applications had to spread the glyph set required for Tibetan across a whole series of separate fonts.
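The two problems described above can be illustrated with a minimal Python sketch. The font tables `FONT_A` and `FONT_B`, the helper names, and the glyph count of 3000 are all hypothetical, invented for illustration; they do not describe any real Tibetan font or application.

```python
import math
import unicodedata

# Hypothetical legacy font tables: each maps an 8-bit code (here the codes
# for Latin 'A' and 'B') to a Tibetan glyph. The mappings are invented.
FONT_A = {0x41: "\u0F40", 0x42: "\u0F41"}  # 'A' -> KA, 'B' -> KHA
FONT_B = {0x41: "\u0F42", 0x42: "\u0F40"}  # 'A' -> GA, 'B' -> KA


def render(data: bytes, font: dict) -> str:
    """Interpret each byte through a font-specific glyph table."""
    return "".join(font.get(byte, "\uFFFD") for byte in data)


def fonts_needed(glyph_count: int, slots_per_font: int = 256) -> int:
    """Minimum number of 8-bit fonts needed to hold glyph_count glyphs."""
    return math.ceil(glyph_count / slots_per_font)


data = b"AB"                    # the same stored bytes...
text_a = render(data, FONT_A)   # ...read with one system's font
text_b = render(data, FONT_B)   # ...read with another system's font

for ch in text_a + text_b:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Assumed glyph count, chosen only to show the arithmetic.
print(fonts_needed(3000))
```

In this sketch the same two bytes produce different Tibetan letters depending on which font table is applied, which is why files written by one system were unreadable in another. A glyph set in the thousands also cannot fit in a single 256-slot font, so it has to be split across several fonts.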

External links

What is Unicode?
Encoding model of the Tibetan script in the UCS, by Christopher Fynn - explains how Tibetan characters are encoded in the ISO 10646 / Unicode Standard.
Tibetan Block of The Unicode Standard (code chart)

@@ Line 1: / Line 1: @@
+Before the Unicode Standard came along, there were hundreds of different encoding systems, standardized and non-standardized, for the characters of different writing systems. No single character encoding had enough code points to cover all the characters of all the writing systems used in the world. Even for a language like English, which has an uncomplicated writing system, the 7-bit and 8-bit computer character sets such as ASCII used for encoding the Latin script were inadequate for all the letters, punctuation, and technical symbols in common use.
+Most Tibetan word-processing applications used non-standardized, proprietary font-based encodings - mapping Tibetan glyphs in fonts to character sets originally designed for encoding Latin or Chinese characters. Because each Tibetan system used its own encoding, files could not be shared between different Tibetan word-processing programs and other applications, which was one of the greatest obstacles to using electronic Tibetan data.
+Tibetan texts may contain thousands of different character combinations (or ligatures), but many of these systems mapped the glyphs in their fonts to 8-bit character sets supporting at most 256 characters. As a result, these applications had to spread the glyph set required for Tibetan across a whole series of separate fonts.
 See also
 [[Tibetan Fonts]]
 ===External links===
+* [http://www.unicode.org/standard/WhatIsUnicode.html What is Unicode?]
 * [http://www.thdl.org/xml/showEssay.php?xml=/tools/encodingTib.xml&m=all Encoding model of the Tibetan script in the UCS], by [[Christopher Fynn]] - explains how Tibetan characters are encoded in the ISO 10646 / Unicode Standard.
 * [http://www.unicode.org/charts/PDF/U0F00.pdf Tibetan Block of The Unicode Standard] (code chart)
