Character encoding


A character encoding is a code that pairs each character in a set of natural-language characters (such as an alphabet or a syllabary) with something else, such as a number or a pattern of electrical pulses. Common examples include Morse code, which encodes letters of the Roman alphabet as sequences of long and short depressions of a telegraph key, and ASCII, which encodes letters, numerals, and other symbols as integers.
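The pairing described above can be sketched in a few lines of Python (an illustration, not part of the original article), using ASCII as the code that maps characters to integers:

```python
# A minimal sketch: ASCII pairs each character with an integer.
text = "Hi!"

# Encode: map each character to its ASCII integer code.
codes = [ord(ch) for ch in text]
print(codes)  # [72, 105, 33]

# Decode: map the integers back to the original characters.
decoded = "".join(chr(n) for n in codes)
print(decoded)  # Hi!
```

Because the mapping is one-to-one, decoding the integers recovers the original text exactly.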

In some contexts (especially computer storage and communication) it makes sense to distinguish a character set or character repertoire, which is the full set of characters that a system supports, from a character encoding, which specifies how characters from that set are represented as sequences of codes. For example, the full repertoire of Unicode encompasses tens of thousands of characters, each assigned a unique number (its code point). But since most applications use only a small subset of that repertoire, there are more space-efficient ways to represent Unicode characters in computer storage or communications, for example UTF-8 and UTF-16. These are variable-length encodings: commonly used characters (those with small code points) get shorter byte sequences, so a large repertoire can be represented compactly.

Popular character encodings include:

* ASCII
* UTF-8
* UTF-16

Last edited December 8, 2001 6:55 pm by 213.121.100.xxx