[Home]Data compression/entropy

HomePage | Data compression | Recent Changes | Preferences

Entropy is a concept in information theory. It was named by anology with the concept of entropy in thermodynamics. The two concepts do actually have something in common, but this might not be obvious unless you are an expert in both fields.

Conceptually, entropy is the actual amount of (information theoretic) infomation in a piece of data. Entirely random ascii data has an entropy of about 8, since you never know what the next character will be. A long string of A's has an entropy of 0, since you know that the next character will always be an 'A'. The entropy of english text is about 1.5 (Try compressing it with PPM!) The entropy of a data source means the average number of bits per character needed to encode it.

1. Many of the bits in the data may not be conveying any infomation. For instance it is often the case that data structures store information redundantly, or have sections that are always the same regardless of the information in the data structure.

2. The amount of entropy is not always an integer number of bits.

Entropy is effectively the lowest compression possible. This is useful for determining whether a compression algorithm has any significant advantages of another for a data source. The definition of entropy is based on the [Markov Model]? of text. For an order-0 (each character is selected independent of the last characters) source, the entropy is:

H(S)=-Σ pi log2 pi

Where pi is the probability of i. For a higher order markov source (one in which probabilities are dependent on the preceding characters), the entropy is:

H(S)=ΣiΣj Pi pi (j) log2 pi (j)

Where i is a state (certain preceding characters) and pi (j) is the probability of j given i as the previous character (s).

More information about entopy can be found on the page describing Information theory.

BTW Larry "I don't believe in subpages" Sanger should have a good look at this and maybe talk someone who knows both the entropies (or are there more than 2?) into finding a better home for this page.

Oh, yes, and the page is a mess, feel free to fix.

Fix? Well I tried... I think I just mangled it.


HomePage | Data compression | Recent Changes | Preferences
This page is read-only | View other revisions
Last edited December 2, 2001 7:51 pm by Boaz (diff)
Search: