History of Unicode

Revision 23 . . November 23, 2001 3:33 am by Stephen Gilbert [linkify]
Revision 22 . . November 10, 2001 2:16 am by Lee Daniel Crocker [Removed initial paragraph--see Talk]
Revision 21 . . November 10, 2001 2:15 am by Lee Daniel Crocker [Removed initial paragraph--see Talk]
Revision 20 . . (edit) November 10, 2001 12:11 am by HannesHirzel
Revision 19 . . November 9, 2001 11:50 pm by Dmerrill [+initial paragraph with more general rationale for why Unicode was created, reference to ASCII.]
Revision 18 . . November 9, 2001 11:38 pm by Kwaku
Revision 17 . . October 7, 2001 10:48 am by Carey Evans [Unicode 3.1 was released this year]
  

Difference (from prior major revision) (no other diffs)

Changed: 1c1
The California-based [Unicode Consortium] first published "The Unicode Standard" in 1991, and continues to develop standards based on that original work. The goal of Unicode is to assign a single unique integer, called a code point, to every character needed by every human language. These code points can be used to define character encodings and to facilitate translation among other encodings. The Unicode character repertoire is kept aligned with [ISO 10646]?, the corresponding standard of the International Organization for Standardization.
The California-based [Unicode Consortium]? first published "The Unicode Standard" in 1991, and continues to develop standards based on that original work. The goal of Unicode is to assign a single unique integer, called a code point, to every character needed by every human language. These code points can be used to define character encodings and to facilitate translation among other encodings. The Unicode character repertoire is kept aligned with [ISO 10646]?, the corresponding standard of the International Organization for Standardization.

Changed: 3c3
The character set is divided into 17 planes, each of which supports 65536 code points, of which only the first, the Basic Multilingual Plane (BMP), is normally used. (The remaining planes are mainly for ancient [Egyptian hieroglyphics]?, rare Chinese characters, and other specialized uses.) The Unicode standard allows for 1,114,112 code points overall, from U+0000 to U+10FFFF. The first 256 code points precisely match those of ISO 8859-1, a widely used 8-bit character encoding for Western European languages.
The character set is divided into 17 planes, each of which supports 65536 code points, of which only the first, the Basic Multilingual Plane (BMP), is normally used. (The remaining planes are mainly for ancient [Egyptian hieroglyphics]?, rare Chinese characters, and other specialized uses.) The Unicode standard allows for 1,114,112 code points overall, from U+0000 to U+10FFFF. The first 256 code points precisely match those of ISO 8859-1, a widely used 8-bit character encoding for Western European languages.
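
The plane layout described above can be illustrated with a small sketch. It assumes the code space U+0000 to U+10FFFF used by current versions of the standard; the function names plane_of and matches_latin1 are hypothetical helpers made up for this example, not part of any library.

```python
def plane_of(code_point):
    """Return the plane number of a code point (0 is the BMP)."""
    # Assumption: the code space ends at U+10FFFF, i.e. 17 planes.
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode code space")
    # Each plane holds 65536 (2**16) code points, so the plane number
    # is whatever remains above the low 16 bits.
    return code_point >> 16


def matches_latin1(byte_value):
    """Check that an ISO 8859-1 byte decodes to the same-numbered code point."""
    return bytes([byte_value]).decode("latin-1") == chr(byte_value)


if __name__ == "__main__":
    for cp in (0x41, 0x20AC, 0x13000, 0x20000):
        print(f"U+{cp:04X} is in plane {plane_of(cp)}")
    print(all(matches_latin1(b) for b in range(256)))
```

Shifting right by 16 bits works only because every plane has exactly 65536 entries; a different plane size would need a division instead.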

Changed: 5c5
Several encodings of Unicode have been defined. One of these is UCS-2, which is a 16-bit encoding, sufficient to encode every code point in the BMP in one 16-bit word. UCS-2 itself cannot represent code points from other planes; the closely related UTF-16 extends it by representing each such code point as two 16-bit words, known as a surrogate pair. This 16-bit form is what is often meant by "Unicode". The names UTF-16 and UCS-2 are often used interchangeably, but they are not identical: UCS-2 originates in the ISO 10646 standard, while UTF-16 is the form specified by the Unicode Consortium standard; for text confined to the BMP the two produce the same 16-bit words.
Several encodings of Unicode have been defined. One of these is UCS-2, which is a 16-bit encoding, sufficient to encode every code point in the BMP in one 16-bit word. UCS-2 itself cannot represent code points from other planes; the closely related UTF-16 extends it by representing each such code point as two 16-bit words, known as a surrogate pair. This 16-bit form is what is often meant by "Unicode". The names UTF-16? and UCS-2 are often used interchangeably, but they are not identical: UCS-2 originates in the [ISO 10646]? standard, while UTF-16 is the form specified by the Unicode Consortium standard; for text confined to the BMP the two produce the same 16-bit words.
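
As a rough illustration of how a code point outside the BMP ends up as two 16-bit words, here is a minimal sketch of the UTF-16 surrogate-pair calculation. The function name to_utf16_units is a hypothetical helper for this example, and the code assumes the U+0000 to U+10FFFF code space.

```python
def to_utf16_units(code_point):
    """Return the list of 16-bit words that encode one code point in UTF-16."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        # This range is reserved for surrogates and is not a character.
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x10000:
        # BMP code points fit in a single 16-bit word.
        return [code_point]
    # Other planes: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # leading (high) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # trailing (low) surrogate
    return [high, low]


if __name__ == "__main__":
    for cp in (0x41, 0x20AC, 0x13000):
        words = to_utf16_units(cp)
        print(f"U+{cp:04X} ->", " ".join(f"{w:04X}" for w in words))
```

The result can be cross-checked against Python's built-in "utf-16-be" codec, which yields the same words as big-endian bytes.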

Changed: 7c7
Another encoding is UCS-4, which is a 32-bit encoding. This encoding is capable of expressing every Unicode code point, from any plane, in one 32-bit word. It is seldom used for storage or interchange because every character occupies four bytes, but many programs use it internally since it is the easiest representation to manipulate (if full Unicode support, including non-BMP planes, is sought). UTF-32 is another name for this encoding: UCS-4 implies the ISO 10646 standard, while UTF-32 implies the Unicode Consortium standard; but the two differ only on a few minor points.
Another encoding is UCS-4?, which is a 32-bit encoding. This encoding is capable of expressing every Unicode code point, from any plane, in one 32-bit word. It is seldom used for storage or interchange because every character occupies four bytes, but many programs use it internally since it is the easiest representation to manipulate (if full Unicode support, including non-BMP planes, is sought). UTF-32? is another name for this encoding: UCS-4 implies the ISO 10646 standard, while UTF-32 implies the Unicode Consortium standard; but the two differ only on a few minor points.
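
The fixed-width property of the 32-bit encoding can be seen in a short sketch. It relies on Python's standard "utf-32-be" codec and the struct module; the function name utf32_be_code_points and the sample string are assumptions made for illustration.

```python
import struct


def utf32_be_code_points(data):
    """Split big-endian UTF-32 bytes into a list of code points."""
    if len(data) % 4 != 0:
        raise ValueError("UTF-32 data must be a multiple of 4 bytes")
    count = len(data) // 4
    # Every code point is exactly one unsigned 32-bit word.
    return list(struct.unpack(f">{count}I", data))


if __name__ == "__main__":
    # One BMP letter, one Latin-1 letter, one character from plane 1.
    sample = "A\u00e9\U00013000"
    encoded = sample.encode("utf-32-be")
    print(len(sample), "characters,", len(encoded), "bytes")
    print([f"U+{cp:04X}" for cp in utf32_be_code_points(encoded)])
```

Because each word is one code point, indexing the n-th character is a simple offset calculation, which is why this form is convenient inside programs despite the storage cost.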

Changed: 9c9
Another common encoding, UTF-8, expresses each Unicode character as a sequence of one or more 8-bit bytes. Text that uses only the first 128 code points is byte-for-byte identical to its ASCII encoding.
Another common encoding, UTF-8, expresses each Unicode character as a sequence of one or more 8-bit bytes. Text that uses only the first 128 code points is byte-for-byte identical to its ASCII encoding.
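
To make the byte-sequence idea concrete, the following is a minimal sketch of a UTF-8 encoder for a single code point. It assumes the modern form of UTF-8, which uses at most four bytes for the U+0000 to U+10FFFF code space; the function name encode_utf8 is hypothetical and the routine is for illustration only, since Python's own str.encode("utf-8") already does this.

```python
def encode_utf8(code_point):
    """Encode one code point as UTF-8 bytes (illustrative sketch)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x80:
        # The first 128 code points are a single byte, same as ASCII.
        return bytes([code_point])
    if code_point < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x13000):
        print(f"U+{cp:04X} ->", encode_utf8(cp).hex(" "))
```

The first branch is what gives UTF-8 its ASCII compatibility: any code point below 128 is emitted as the same single byte ASCII would use.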

Changed: 26c26
* [www.unicode.org]
* [Unicode Consortium]
