String search algorithms

Shouldn't this be [string search algorithm]?

String search algorithms try to find a place where one string (the "needle") occurs inside another (the "haystack"). They include the following, each of which deserves an article about it:

Naïve string search

The simplest way to find where one string occurs inside another is to check each place it could be, one by one. First we see if there's a copy of the needle starting at the first character of the haystack; if not, we look for a copy starting at the second character; if not, at the third, and so forth. In the normal case we only have to look at one or two characters at each wrong position to see that it is wrong, so in the average case this takes O(n + m) steps, where n is the length of the haystack and m is the length of the needle. In the worst case, though, such as searching for "aaaab" in "aaaaaaaaab", nearly the whole needle has to be compared at every position, and it takes O(nm) steps.
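The procedure described above can be sketched in a few lines of Python. This is only an illustration; the function name naive_search and the convention of returning -1 when nothing is found are assumptions made for the example, not part of any standard.

```python
def naive_search(haystack: str, needle: str) -> int:
    """Return the index of the first occurrence of needle in haystack, or -1."""
    n, m = len(haystack), len(needle)
    # Try every position where the needle could start.
    for start in range(n - m + 1):
        # Compare character by character, stopping at the first mismatch.
        k = 0
        while k < m and haystack[start + k] == needle[k]:
            k += 1
        if k == m:
            return start  # all m characters matched
    return -1  # no starting position worked


print(naive_search("aaaaaaaaab", "aaaab"))  # prints 5
```

In the "aaaab" example the inner loop runs through four matching characters before failing at each of the first five positions, which is where the O(nm) worst case comes from.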


KMP (Knuth-Morris-Pratt) computes a deterministic finite state automaton that recognizes inputs with the string to search for as a suffix, so it never needs to back up in the haystack. Boyer-Moore compares the needle starting from its last character, so on a mismatch it can usually jump ahead a whole needle-length at each step. The Baeza-Yates and Gonnet algorithm uses the bits of a machine word to keep track of which prefixes of the search string match the most recently read characters, and is therefore adaptable to fuzzy matching etc. These descriptions are insufficient!
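As a rough illustration of the three ideas, here are minimal Python sketches. Several simplifications are assumptions of this example rather than the algorithms as originally published: the KMP sketch uses the common failure-table formulation instead of building an explicit automaton; the second function is Horspool's simplification of Boyer-Moore (bad-character shifts only), not full Boyer-Moore; and the third is the "shift-and" form of the bit-parallel method, relying on Python's unbounded integers in place of a fixed-size machine word. All function names are made up for the example.

```python
def kmp_search(haystack: str, needle: str) -> int:
    """Knuth-Morris-Pratt with a failure table; never moves backwards in haystack."""
    m = len(needle)
    if m == 0:
        return 0
    # fail[i] = length of the longest proper prefix of needle[:i+1]
    # that is also a suffix of needle[:i+1].
    fail = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and needle[i] != needle[k]:
            k = fail[k - 1]
        if needle[i] == needle[k]:
            k += 1
        fail[i] = k
    k = 0  # number of needle characters currently matched
    for i, c in enumerate(haystack):
        while k > 0 and c != needle[k]:
            k = fail[k - 1]  # fall back to a shorter matching prefix
        if c == needle[k]:
            k += 1
        if k == m:
            return i - m + 1
    return -1


def horspool_search(haystack: str, needle: str) -> int:
    """Horspool's simplification of Boyer-Moore: compare from the needle's end."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    # For each character of needle[:-1], the distance from its last
    # occurrence to the end of the needle.
    shift = {c: m - 1 - i for i, c in enumerate(needle[:-1])}
    pos = 0
    while pos + m <= n:
        k = m - 1
        while k >= 0 and haystack[pos + k] == needle[k]:
            k -= 1
        if k < 0:
            return pos
        # A character that does not occur in needle[:-1] allows a jump of m.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1


def shift_and_search(haystack: str, needle: str) -> int:
    """Bit-parallel search: bit j of state is set when needle[:j+1] matches
    the characters of haystack ending at the current position."""
    m = len(needle)
    if m == 0:
        return 0
    masks = {}  # masks[c] has bit i set wherever needle[i] == c
    for i, c in enumerate(needle):
        masks[c] = masks.get(c, 0) | (1 << i)
    accept = 1 << (m - 1)
    state = 0
    for i, c in enumerate(haystack):
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            return i - m + 1
    return -1
```

All three return the index of the first match, or -1 if there is none, so they can be checked against each other and against Python's built-in str.find.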

Could somebody expand this entry?

Last edited December 10, 2001 9:38 am by Taral