Wikipedia: History of Cryptography/Hashfunction

Difference (from prior major revision) (minor diff, author diff)

Changed: 1c1

Hash functions are used in Cryptography for many purposes; for example, verifying the validity of a public key (this is commonly known as the key's fingerprint). A hash function returns a representation of the bits of a string in the form of a hash value, which has a fixed size. This is done in such a manner that the probability of two different strings having the same hash value is infinitesimal. Changing just one character in string will result in a different hash value.

#REDIRECT Hash function

Changed: 3c3,9

Practical example: The record companies came up with the idea of identifying MP3 files by using their hash value in order to be able to trace the files on the Internet and in various filesharing communities. However, by changing or adding just one bit to the file will defeat this system completely.

A hash function in the sense of computer science is a function which converts anything in the domain of the hash function into an integer (typically 4, 8 or 16 bytes long). There are various kinds of hash functions that vary according to their characteristics. For example, some hash functions have very desirable properties with respect to [error correction]? in that given a string that has been slightly corrupted and the hash of the original (correct) string, then the corruption can be reversed if it is small enough. In other applications, such as verifying that software has not been modified maliciously, hash functions are used that are very difficult to spoof?.

Formally speaking, a hash function is defined by its domain (typically variable length strings of bytes), its range (typically fixed length bit-sequences) and the defining function (let's call this function H). The key desired characteristics of a hash function in general are that H(x) ≠ H(y) implies x ≠ y and that H(x) = H(y) probably implies that x = y. Unfortunately, the use of probably here is just too inexact to let us succeed. If the set of all possible values of H(x) is much smaller than the set of all possible values of x, then this latter condition cannot be true if all sequences are equally likely. Thus, the second condition is often restated in different ways to make it plausible. For instance, cryptographic hashes use the second condition in the form that given x, it is computationally very difficult to find y such that H(x) = H(y). Additionally, it is desirable that if we are given x and H(x + s) where + can be bit changing or concatenation, then we can't find s short of exhaustive enumeration. In practice, for most applications other than error correction, cryptographic hashes serve very nicely. The MD5 and SHA algorithms are two of the most popular algorithms although any cryptosystem can be used to create a hash function (as, indeed, any cryptographically secure hash can be used to create a cryptosystem).

For error correction, a distribution of likely perturbations is assumed at least approximately. Perturbations to a string are then classified into large (improbable) and small (probable) errors. The second criterion is then restated so that if we are given H(x) and x+s, then we can compute x efficiently if s is small. Such hash functions are known as error correction codes. Important sub-class of these correction codes are cyclic redundancy codes and Reed-Solomon codes.

For audio identification such as finding out whether an MP3 file matches one of a list of known items, one could use a conventional hash function such as MD5, but this would be very sensitive to highly likely perturbations such as time-shifting, CD read errors, different compression algorithms or implementations or changes in volume. Using something like MD5 is useful as a first pass to find exactly identical files, but another more advanced algorithm is required to find all identical items. Contrary to an opinion often voiced by people who don't follow the industry careful, hashing algorithms do exist that are robust to these minor differences. Most of the algorithms available are not extremely robust, but some are so robust that they can identify music played on loud-speakers in a noisy room.

	Revision 4 . . (edit) September 22, 2001 2:28 am by Css [redirect]
	Revision 3 . . (edit) August 18, 2001 4:55 pm by (logged).108.233.xxx
	Revision 1 . . June 11, 2001 4:24 pm by Hornlo [moved from CyrptographY/Hashfunctions]