UCS

HomePage | Recent Changes | Preferences

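Universal Character Set (UCS), the character set defined in ISO/IEC 10646. UCS is kept synchronized, code point for code point, with Unicode. Its code space was originally designed as 31 bits wide (over two billion code points) and has since been restricted to U+0000 through U+10FFFF (1,114,112 code points) to match Unicode. The first 65,536 code points form the Basic Multilingual Plane (BMP), which holds the characters of most scripts in everyday use; the higher planes hold, among other things, historic scripts such as ancient Egyptian hieroglyphs and rare Chinese characters.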
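As a rough illustration of how the code space divides into planes of 65,536 code points each, the following Python sketch computes the plane of a code point. The function names `plane` and `in_bmp` are made up for this example, and the upper bound U+10FFFF is assumed to be the modern limit shared with Unicode.

```python
MAX_CODE_POINT = 0x10FFFF  # assumed upper limit, shared with Unicode


def plane(code_point: int) -> int:
    """Return the plane number (0 to 16) that contains code_point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a valid code point: {code_point:#x}")
    # Each plane holds 0x10000 (65,536) code points.
    return code_point >> 16


def in_bmp(code_point: int) -> bool:
    """True if code_point lies in the Basic Multilingual Plane (plane 0)."""
    return plane(code_point) == 0


if __name__ == "__main__":
    for cp in (ord("A"), 0x4E2D, 0x13000, 0x20000):
        print(f"U+{cp:04X}: plane {plane(cp)}, in BMP: {in_bmp(cp)}")
```

Here U+13000 is the first code point of the Egyptian Hieroglyphs block and U+20000 begins a block of less common Chinese characters; both fall outside the BMP.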

ISO 10646 defines several encodings for the Universal Character Set. The most common is UCS-2, which uses two bytes for each character, so every code point in the BMP is represented by exactly two bytes. UCS-2 by itself cannot represent code points outside the BMP; the UTF-16 extension represents them in four bytes, as a pair of two-byte code units (a surrogate pair).
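The two-byte and four-byte cases can be sketched in Python as follows. This is a minimal illustration of the UTF-16 surrogate-pair arithmetic, not a full encoder; the function name `to_16bit_units` is hypothetical, and big-endian byte order is assumed when comparing against Python's built-in `utf-16-be` codec.

```python
import struct


def to_16bit_units(code_point: int) -> list[int]:
    """Return the 16-bit code units that represent code_point.

    BMP code points yield one unit (the UCS-2 case); code points
    above U+FFFF yield a high/low surrogate pair (the UTF-16 case).
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not characters")
    if code_point < 0x10000:
        return [code_point]
    offset = code_point - 0x10000      # a 20-bit value
    high = 0xD800 | (offset >> 10)     # top 10 bits
    low = 0xDC00 | (offset & 0x3FF)    # bottom 10 bits
    return [high, low]


def to_bytes_be(code_point: int) -> bytes:
    """Serialize the code units as big-endian bytes."""
    units = to_16bit_units(code_point)
    return struct.pack(f">{len(units)}H", *units)


if __name__ == "__main__":
    for cp in (0x0041, 0x4E2D, 0x13000):
        print(f"U+{cp:04X} ->", to_bytes_be(cp).hex(" "))
```

A BMP character such as U+4E2D comes out as two bytes, while U+13000 comes out as four.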

Another encoding is UCS-4, which uses four bytes for each character and can therefore represent every code point in the character set, including those outside the BMP. Its advantage over UCS-2 is that every character has the same encoded length even outside the BMP, which makes strings simpler to index and manipulate; its cost is that text confined to the BMP takes twice as much storage as in UCS-2.
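The storage and indexing trade-off can be seen in a short Python sketch. It assumes that Python's `utf-32-be` and `utf-16-be` codecs are acceptable stand-ins for UCS-4 and UCS-2 on BMP-only text; the helper `nth_char_ucs4` is a made-up name for illustration.

```python
def nth_char_ucs4(data: bytes, n: int) -> str:
    """Fetch character n from big-endian UCS-4 data by plain arithmetic.

    Every character occupies exactly four bytes, so no scanning is needed.
    """
    chunk = data[4 * n : 4 * n + 4]
    if len(chunk) != 4:
        raise IndexError("character index out of range")
    return chr(int.from_bytes(chunk, "big"))


text = "A\u00e9\u4e2d"                   # three characters, all in the BMP
as_ucs4 = text.encode("utf-32-be")       # four bytes per character
as_ucs2 = text.encode("utf-16-be")       # two bytes per BMP character

if __name__ == "__main__":
    print(len(as_ucs2), "bytes in the two-byte form")
    print(len(as_ucs4), "bytes in the four-byte form")
    print(nth_char_ucs4(as_ucs4, 2))
```

For this BMP-only string the four-byte form is exactly twice the size of the two-byte form, and any character can be located by multiplying its index by four.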


Last edited September 19, 2001 11:39 pm by Simon J Kissane