[Home]Data compression/LZW

HomePage | Data compression | Recent Changes | Preferences

LZW is a lossless compression algorithm. It stands for Lempel-Ziv-Welch. Most of this method was invented and published by Lempel and Ziv in 1978. A few minor details were not specified in that paper. Most of those remaining details (along with the basic Lempel-Ziv method) were described by Welch in 1984.

The LZW algorithm is patented in the USA, and possibly in other countries.

There are many legal issues surrounding this method which I hope someone in a country with free speech will write up (it would be illegal for me to mention many of the basic legal facts).

The key insight of the method is that it is possible to automatically build a dictionary of previously seen strings in the text being compressed. The dictionary does not have to be transmitted with the compressed text, since the decompressor can build it the same way the compressor does, and if coded correctly, will have exactly the same strings that the compressor dictionary had at the same point in the text.

The dictionary starts off with 256 entries, one for each possible single byte string. Every time a string already in the dictionary is seen, a longer string consisting of that string with the single character following it, is stored in the dictionary.

The output consists of integer indices into the dictionary. These initially are 9 bits each, and as the dictionary grows, can increase to up to 16 bits. A special symbol is reserved for "flush the dictionary" which takes the dictionary back to the original 256 entries, and 9 bit indices. This is useful if compressing a text which has variable charactistics, so that a dictionary of early material is no longer much use later in the text.

This use of variable increasing index sizes is one of Welch's contributions. Another was to specify an efficient data structure to store the dictionary.

The method became moderately widely used in the program "compress" which became a more or less standard utility in Unix systems circa 1986. (It has since disappeared from many for both legal and technical reasons.) Several other popular compression utilities also used the method, or closely related ones.

It became very widely used after it became part of the GIF image format in 1987.

LZW compression provided a better compression ratio, in most applications, than any well known method available up to that time. It became the first widely used general purpose data compression method on computers. On large English texts, it typically compressed to about half original size. Other kinds of data were also quite usefully compressed in many cases.

LZW compression has now been superseded for most purposes by [Data compression/deflation]? and [Data compression/Burrows Wheeler transform]? methods, partly because these newer methods compress better in most cases, and partly for legal reasons. However the GIF image format with LZW compression is still in widespread use in 2001.


HomePage | Data compression | Recent Changes | Preferences
This page is read-only | View other revisions
Last edited October 9, 2001 1:18 am by Zundark (diff)
Search: