Character encoding

HomePage | Recent Changes | Preferences

Showing revision 8
A character encoding is a code that pairs a set of natural-language characters (such as an alphabet or syllabary) with a set of something else, such as numbers or electrical pulses. Common examples include Morse code, which encodes letters of the Roman alphabet as sequences of long and short depressions of a telegraph key, and ASCII, which encodes letters, numerals, and other symbols as integers.
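The ASCII pairing of characters with integers can be seen directly in most programming languages. A minimal sketch in Python, using the built-in ord() and chr() functions (not mentioned in the original text, but standard Python):

```python
# ASCII assigns each character a small integer; ord() returns it,
# and chr() performs the reverse mapping.
for ch in "A9!":
    print(ch, "->", ord(ch))

# The mapping is invertible: integer 65 decodes back to "A".
print(chr(65))
```

Running this prints the ASCII code of each character (for example, "A" is 65), illustrating the pairing between symbols and integers that the definition describes.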

In some contexts (especially computer storage and communication) it makes sense to distinguish a character set or character repertoire, which is the whole mapping between a full set of characters and integers, from a character encoding (in a narrower sense of the term), which specifies how to represent characters from that set as a sequence of codes. For example, the full repertoire of Unicode encompasses over a million possible characters, each with a unique integer code. But since most applications use only a small subset, there are more efficient ways to represent Unicode characters in computer storage or communications using only 8-bit bytes; for example, UTF-8. This type of character encoding (which could also be considered a simple text encoding) uses variable-length codes to represent a large repertoire compactly, giving the most common characters the shortest codes.
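The variable-length nature of UTF-8 described above can be demonstrated with a short sketch in Python, whose str.encode method produces the UTF-8 bytes for a string:

```python
# UTF-8 represents low code points in one byte and higher ones in
# two, three, or four bytes, so common (ASCII) characters stay compact.
for ch in ("A", "é", "€"):
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")
```

Here "A" (U+0041) occupies a single byte identical to its ASCII code, while "é" takes two bytes and "€" takes three, showing how one repertoire (Unicode) can be carried by an 8-bit encoding.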

Popular character encodings include ASCII and Unicode encodings such as UTF-8.

See also Text encoding, Unicode.


Edited September 20, 2001 8:01 am by Lee Daniel Crocker (diff)