Wikipedia: Data compression

Data compression, a fundamental topic of computer science, is the process of encoding data so that it takes less storage space or less transmission time than it would if it were not compressed. This is possible because most real-world data is very redundant.

One very simple means of compression, for example, is run-length encoding, wherein large runs of consecutive identical data values are replaced by a simple code with the data value and length of the run. This is an example of lossless data compression, where the data is compressed in such a way that it can be recovered exactly. For symbolic data such as spreadsheets, text, executable programs, etc., losslessness is essential because changing even a single bit cannot be tolerated (except in some limited cases).

In other kinds of data such as sounds and pictures, a small loss of quality can be tolerated without losing the essential nature of the data, so lossy data compression methods can be used. These frequently offer a range of compression efficiencies, where the user can choose whether he wants highly-compressed data with noticeable loss of quality or higher-quality data with less compression. In particular, compression of images and sounds can take advantage of limitations of the human sensory system to compress data in ways that is lossy, but nearly indistinguishable from the original.

Many data compression systems are best viewed with a four stage compression model.

Closely allied with data compression are the fields of [coding theory]? and cryptography. Theoretical background is provided by information theory and algorithmic information theory. When compressing information in the form of signals we often use digital signal processing methods. The idea of data compression is deeply connected with statistical inference and particularly with the maximum likelihood principle.

Data compression topics:

Common Data compression algorithms:

Run-length encoding
Huffman coding (simple entropy coding)
arithmetic coding (more advanced entropy coding; encumbered by patents as of October 2001)
LZW
Other LZ compression methods Deflation is one, LhA? is another, etc. etc.
Deflation? (used in PKZIP?, gzip, and PNG)
MP3
JPEG (image compression using a windowed cosine transform, then quantization, then Huffman coding)
fractal compression

The Lempel-Ziv (LZ) compression methods are the most popular algorithms for perfect storage (i.e. not lossy). Deflation is a variation on LZ which is optimized for decompression speed and compression ratio. Compression can be slow. Deflation is used in PKZIP, gzip and PNG. LZR (Lempel-Ziv-Renau) is patented by Unisys?, and is used in GIF images. This patent is the main reason for GIF's obsolescence. LZ methods utilize a table based compression model where table entries are subsitituted for redundant data. For most LZ methods, this table is generated dynamically from earlier data in the input. The table itself is often Huffman encoded (eg. SHRI, LZX). The current LZ based code that performs best is the obsolete (!) LZX, although RAR and ACE are now coming close.

Compression of sounds is generally called audio compression, where we use methods of psychoacoustics to remove non-audible components of the signal to make compression more efficient. Audio compression is therefore lossy compression. Different audio compression standards are listed under audio codecs.

/Talk