The integral data types of computing (so called because they are most frequently used to represent integers) generally consist of some number of bits, usually a power of two, treated as a unit of storage or manipulation. **Bit** is a contraction of *binary digit* and denotes the fundamental unit of computer storage: a single 0 or 1, off or on. Everything else is just a bunch of bits.

### Representing integers


See also: Kilobyte, Megabyte, Gigabyte, Terabyte, Petabyte, Exabyte, Zettabyte, Yottabyte

The table below lists data types recognized by common processors. Additional data types found in high-level programming languages, such as bit-fields and extended-precision integers, are not discussed here. The table is followed by additional usage notes and then by details on number representation.

See also: real data type

| bits | name | comments |
|---|---|---|
| 1 | bit | status, Boolean flag |
| 4 | nibble, nybble | humorous play on *byte*, meaning half a byte; usually holds a single BCD digit |
| 8 | byte, octet | small integers, characters |
| 16 | word | larger integers, pointers |
| 32 | longword | usually shortened to *long*; larger integers, pointers |
| 64 | quadword, long long | larger integers, pointers; VMS internal date/time format |
| 80 | tenbyte | Intel x87-specific; used mainly for extended-precision floating-point values |
| 128 | octword | VAX *octaword*; very large integers |

In addition to their interpretation as sizes of numerical values, three of these terms (*bit*, *byte*, and *word*) have other common usages. **Word** is ambiguous: it often indicates the "most efficient size" of data for a processor, typically the size of its internal registers. Thus different processor families, and sometimes different models within a family, have used different word sizes: 8-, 12-, 16-, 32-, 36-, 60- and 64-bit words have all been used. **Byte** sometimes means a quantity of bits other than 8; 36-bit word architectures commonly had 9-bit bytes.
The term *octet* can be used for more clarity and always refers to eight bits.
The other terms in the table are typically used only when the content is to be interpreted numerically.

Telecommunications and network data rates are usually described in *bits per second*. For example, a *56k modem* can transfer data at up to 56 kilo*bits*/second, and Ethernet transfers data at speeds from 10 mega*bits*/second to 1000 mega*bits*/second and beyond.

A **byte**, usually called an **octet** in a networking context, is used to specify the size or amount of computer memory or storage, regardless of the type of data represented: for example, a 50-byte text string, 100 KB (kilobyte) files, 128 MB (megabytes) of RAM, or 30 GB (gigabytes) of disk storage.
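Because line rates are quoted in bits and sizes in bytes, estimating a transfer time means multiplying the byte count by 8. A minimal sketch; the helper `transfer_seconds` is hypothetical, and it assumes 1 KB = 1000 bytes and ignores protocol overhead and compression.

```python
def transfer_seconds(size_bytes: int, bits_per_second: float) -> float:
    """Idealized time to move size_bytes over a link rated in bits per second.

    Hypothetical helper: ignores framing, protocol overhead and compression.
    """
    return size_bytes * 8 / bits_per_second   # 8 bits per byte


# A 100 KB file (assumed to be 100_000 bytes) over a 56 kbit/s modem:
print(f"{transfer_seconds(100_000, 56_000):.1f} s")   # 14.3 s
```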

**Pointer** is a generic term used to indicate an integral value (or a structure thereof) that is used to specify ("point to") a location (address) in memory.


*Complementing* a binary number simply means changing all the *0*s to *1*s and all the *1*s to *0*s,
nothing more.
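Python's integers are unbounded, so `~x` evaluates to `-x - 1` rather than a fixed-width bit pattern. This minimal sketch instead XORs with a mask of all ones; the function name `complement` and its 8-bit default width are illustrative assumptions.

```python
def complement(x: int, bits: int = 8) -> int:
    """Flip every bit of x within a field of the given width."""
    mask = (1 << bits) - 1               # all ones, e.g. 0b11111111 for 8 bits
    if not 0 <= x <= mask:
        raise ValueError(f"{x} does not fit in {bits} bits")
    return x ^ mask                      # XOR with all ones turns each 0 into 1 and each 1 into 0


print(format(complement(0b00101011), "08b"))   # 11010100
```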

A byte, holding 8 bits, can represent the values 00000000 (0) to 11111111 (255) if all bits
are used to represent the magnitude of the number. This is called an *unsigned* integer.
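The same rule holds for any width: n bits give unsigned values from 0 to 2^n - 1. A small sketch (the function name `unsigned_range` is hypothetical) that prints the ranges for the widths listed in the table:

```python
def unsigned_range(bits: int) -> tuple[int, int]:
    """Smallest and largest unsigned value that fits in the given number of bits."""
    return 0, (1 << bits) - 1


for width in (8, 16, 32, 64):
    low, high = unsigned_range(width)
    print(f"{width:2d} bits: {low} to {high}")

print(int("11111111", 2))   # 255: all eight bits contribute to the magnitude
```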

To represent both positive and negative (*signed*) integers, the convention is that the
*most significant bit* (MSB) of the binary representation indicates the sign of the number.
Three formats have been used for signed integers: sign-and-magnitude, one's complement, and two's complement, which is by far the most common today.

Sign-and-magnitude is the simplest format and the closest to the way people write signed numbers.
The MSB is set to *0*
for a positive number and *1* for a negative number, and the remaining bits give the (positive) magnitude. In a byte, the seven
bits other than the sign bit hold a magnitude from 0000000 (0) to 1111111 (127), so numbers from
-127 to +127 can be represented. Encoded this way, -43 is 10101011.
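A minimal sketch of sign-and-magnitude encoding and decoding for an 8-bit field; the function names are hypothetical. Decoding the pattern 10000000 shows the "negative zero" that this format allows.

```python
def to_sign_magnitude(value: int, bits: int = 8) -> int:
    """Encode value with the MSB as sign bit and the remaining bits as magnitude."""
    limit = (1 << (bits - 1)) - 1          # 127 for 8 bits
    if not -limit <= value <= limit:
        raise ValueError(f"{value} does not fit in {bits}-bit sign-and-magnitude")
    sign = 1 if value < 0 else 0
    return (sign << (bits - 1)) | abs(value)


def from_sign_magnitude(pattern: int, bits: int = 8) -> int:
    """Decode a sign-and-magnitude bit pattern back to an ordinary integer."""
    magnitude = pattern & ((1 << (bits - 1)) - 1)   # drop the sign bit
    return -magnitude if pattern >> (bits - 1) else magnitude


print(format(to_sign_magnitude(-43), "08b"))   # 10101011
print(from_sign_magnitude(0b10000000))         # 0 (the "negative zero" pattern)
```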

The *one's-complement* representation of a negative number is created by taking the
complement of its positive counterpart. For example, 00101011 (43) negated becomes 11010100 (-43).
(Notice how this differs from the sign-and-magnitude convention, where the same bit pattern would be -84.)
The PDP-1 used one's-complement arithmetic.
The range of signed numbers using one's complement in a byte is -127 to +127.
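The same idea as a minimal sketch for an 8-bit field (function names hypothetical): a negative value is the bitwise complement of its magnitude, and decoding complements again whenever the sign bit is set. Both 00000000 and 11111111 decode to zero.

```python
def to_ones_complement(value: int, bits: int = 8) -> int:
    """Encode value in one's complement: a negative number is the complement of its magnitude."""
    limit = (1 << (bits - 1)) - 1          # 127 for 8 bits
    if not -limit <= value <= limit:
        raise ValueError(f"{value} does not fit in {bits}-bit one's complement")
    mask = (1 << bits) - 1
    return value if value >= 0 else (-value) ^ mask   # flip every bit of |value|


def from_ones_complement(pattern: int, bits: int = 8) -> int:
    """Decode a one's-complement pattern; both zero patterns decode to 0."""
    mask = (1 << bits) - 1
    if pattern >> (bits - 1):               # sign bit set: complement again to get the magnitude
        return -(pattern ^ mask)
    return pattern


print(format(to_ones_complement(-43), "08b"))   # 11010100
print(from_ones_complement(0b11111111))         # 0 (the "negative zero" pattern)
```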

Both one's complement and sign-and-magnitude have two representations of zero: 00000000 (+0) and a negative zero, which is 11111111 in one's complement and 10000000 in sign-and-magnitude. This can be problematic, because hardware for addition and subtraction may be more complicated, and so may testing for zero.

To avoid this, and also to make integer addition simpler, the *two's-complement* representation is the one generally used. The two's-complement representation of a negative number is created by complementing its positive counterpart and then adding 1. Thus 00101011 (43) becomes 11010100 + 1 = 11010101 (-43).
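A minimal sketch of the rule just described (complement, then add 1), masked to the field width so that any carry out of the top bit is dropped; the function name is hypothetical. For every in-range value the result matches the shortcut `value & mask`, which is how Python exposes the low n bits of a negative integer.

```python
def to_twos_complement(value: int, bits: int = 8) -> int:
    """Encode value in two's complement by complementing |value| and adding 1."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError(f"{value} does not fit in {bits}-bit two's complement")
    mask = (1 << bits) - 1
    if value >= 0:
        return value
    return (((-value) ^ mask) + 1) & mask   # complement, add 1, drop any carry


print(format(to_twos_complement(-43), "08b"))   # 11010101
```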

In two's-complement, there is only one zero (00000000). Negating a negative number involves the same operation: complementing, then adding 1. The pattern 11111111 now represents -1 and 10000000 represents -128;
that is, the range of two's-complement integers in a byte is -128 to +127.
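Decoding can be sketched by giving the sign bit a weight of -128 (in general -2^(n-1)) while the other bits keep their usual weights. The function name is hypothetical; Python's built-in `int.from_bytes(..., signed=True)` performs the same decoding and is used here as a cross-check.

```python
def from_twos_complement(pattern: int, bits: int = 8) -> int:
    """Decode an n-bit two's-complement pattern.

    The low bits keep their usual weights, while the sign bit counts as -2**(bits-1).
    """
    sign_bit = 1 << (bits - 1)
    return (pattern & (sign_bit - 1)) - (pattern & sign_bit)


print(from_twos_complement(0b11111111))   # -1
print(from_twos_complement(0b10000000))   # -128
print(from_twos_complement(0b01111111))   # 127

# The standard library gives the same answer:
print(int.from_bytes(b"\xd5", "big", signed=True))   # -43 (0xD5 is 11010101)
```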

To add two two's-complement integers, treat them as unsigned numbers, add them, and ignore any carry out of the most significant bit (this is the great advantage two's complement has over the other conventions). The result is the correct two's-complement number unless both summands were positive and the result is negative, or both summands were negative and the result is non-negative. These cases are referred to as "overflow" or "wrap-around": the true sum cannot be represented in 8-bit two's complement. For example:

| first operand | second operand | 8-bit sum |
|---|---|---|
| 00101011 (+43) | 11010101 (-43) | 00000000 (0) |
| 11010101 (-43) | 11100011 (-29) | 10111000 (-72) |
| 00101011 (+43) | 11100011 (-29) | 00001110 (+14) |
| 10011010 (-102) | 10110001 (-79) | 01001011 (overflow: -181 is out of range) |
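The addition rule can be sketched as an unsigned add masked to 8 bits, plus an overflow check that follows the sign rule in the text: the operands share a sign but the result's sign differs. The function name `add_8bit` is hypothetical; the loop reruns the four sums from the example.

```python
def add_8bit(a: int, b: int) -> tuple[int, bool]:
    """Add two 8-bit two's-complement patterns as if they were unsigned.

    Returns the 8-bit result (carry out discarded) and an overflow flag that is
    set when both operands have the same sign but the result's sign differs.
    """
    result = (a + b) & 0xFF                       # unsigned add, drop the carry
    sign_a, sign_b, sign_r = a >> 7, b >> 7, result >> 7
    overflow = sign_a == sign_b and sign_r != sign_a
    return result, overflow


examples = [(0b00101011, 0b11010101), (0b11010101, 0b11100011),
            (0b00101011, 0b11100011), (0b10011010, 0b10110001)]
for a, b in examples:
    result, overflow = add_8bit(a, b)
    print(f"{a:08b} + {b:08b} = {result:08b}" + ("  (overflow)" if overflow else ""))
```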

When an integer is represented with multiple bytes, the actual ordering of those bytes in memory, or the sequence in which they are transmitted over some medium, is subject to convention. This is similar to the situation in written languages, where some are written left-to-right, while others are written right-to-left.

Using a 4-byte integer, written as "ABCD", where A is the most significant byte and D is the least significant byte, the *big-endian* convention would store the number
in successive memory locations as A (lowest address), then B, then C, and finally D, while the *little-endian* convention would store the bytes in D-C-B-A order.
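Python's `int.to_bytes` takes the byte order as an argument, which makes the two conventions easy to compare. In this sketch the value 0x0A0B0C0D stands in for "ABCD" (A = 0x0A is the most significant byte); `sys.byteorder` reports the native order of the machine running it.

```python
import sys

value = 0x0A0B0C0D                      # A=0x0A (most significant) ... D=0x0D (least)
big = value.to_bytes(4, "big")          # lowest address first: A, B, C, D
little = value.to_bytes(4, "little")    # lowest address first: D, C, B, A

print(big.hex(" "))      # 0a 0b 0c 0d
print(little.hex(" "))   # 0d 0c 0b 0a
print("this machine is", sys.byteorder, "endian")

# Reading bytes back with the wrong convention scrambles the value:
print(hex(int.from_bytes(little, "big")))   # 0xd0c0b0a
```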

*Network byte order* is, by convention, big-endian: the bytes are sent onto the medium in the order A, then B, and so on.
It is the responsibility of the transmitting and receiving systems to convert, if
necessary, to and from their internal endian format.
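In Python, the `struct` module's `!` prefix selects network (big-endian) byte order, and `socket.htonl` / `socket.ntohl` perform the host-to-network conversion described above. A minimal sketch, reusing the 0x0A0B0C0D stand-in for "ABCD":

```python
import socket
import struct

value = 0x0A0B0C0D

# "!I" packs an unsigned 32-bit integer in network byte order (A goes first).
wire = struct.pack("!I", value)
print(wire.hex(" "))                    # 0a 0b 0c 0d, regardless of the host

# The receiving side unpacks from network order back into a native integer.
(received,) = struct.unpack("!I", wire)
assert received == value

# htonl/ntohl convert a 32-bit value between host and network order:
# they swap bytes on a little-endian host and do nothing on a big-endian one.
print(hex(socket.htonl(value)))
```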

Processor families that use big-endian storage: Motorola 68000, IBM System/370

Processor families that use little-endian storage: Intel x86, DEC VAX

Processor families that use either (determined by software): MIPS, DEC Alpha, PowerPC

The PDP-11, which stored each 16-bit word little-endian but placed the more significant word of a 32-bit value first, used the unusual pattern B-A-D-C (that is, byte-swap within words), sometimes called *PDP-endian* or *middle-endian*.
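The B-A-D-C layout described above can be sketched by splitting a 32-bit value into two 16-bit words, writing the more significant word first and each word little-endian. The helper `to_pdp_endian` is hypothetical and only models the byte order, not any PDP-11 instruction.

```python
def to_pdp_endian(value: int) -> bytes:
    """Lay out a 32-bit value as B-A-D-C: high 16-bit word first, each word little-endian."""
    if not 0 <= value < 1 << 32:
        raise ValueError("value must fit in 32 bits")
    high_word, low_word = value >> 16, value & 0xFFFF
    return high_word.to_bytes(2, "little") + low_word.to_bytes(2, "little")


print(to_pdp_endian(0x0A0B0C0D).hex(" "))   # 0b 0a 0d 0c, i.e. B-A-D-C
```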

The term *big-endian* is derived from the Big-Endians of Jonathan Swift's
*Gulliver's Travels*.
