Wikipedia: Character encoding

Showing revision 8

A character encoding is a code that pairs a set of natural language characters (such as an alphabet or syllabary) with a set of something else, such as numbers or electrical pulses. Common examples include Morse code, which encodes letters of the Roman alphabet as series of long and short depressions of a [telegraph key]?; and ASCII, which encodes letters, numerals, and other symbols as integers.

In some contexts (especially computer storage and communication) it makes sense to distinguish a character set or character repertoire, which is the whole mapping between a full set of characters and integers, from a character encoding (in a more narrow sense of the term) which specifies how to represent characters from that set using a smaller number of codes. For example, the full repertoire of Unicode encompasses millions of characters, each with a unique 32-bit code. But since most applications use only a small subset, there are more efficient ways to represent Unicode characters in computer storage or communications using only 8-bit bytes, for example, UTF-8. This type of character encoding (which could also be considered a simple text encoding) uses data compression techniques to represent a large repertoire with a smaller number of codes.

Popular character encodings:

ISO 8859
ASCII
EBCDIC
Unicode (And subsets thereof, such as UCS-16)