Wikipedia: History of Regular expression

Difference (from prior major revision) (minor diff, author diff)

Changed: 3c3

The origin of regular expressions lies in [Automata theory]? and [Formal language theory]? (both part of theoretical computer science) that study computational automata and ways to describe languages, i.e., sets of strings, and processing them. In the 1940s, [Warren McCulloch]? and [Walter Pitts]? described the nervous system by modelling neurons as small simple automata. The mathematician, Stephen Kleene, later described these models using his mathematical notation called regular sets. Ken Thompson built this notation into qed (the predecessor of the Unix ed) and eventually into grep?. Ever since that time, regular expressions have been widely used in Unix and Unix-like utilities, such as expr?, awk, Emacs and Perl, most of them using the regex library built by [Henry Spencer]?.

The origin of regular expressions lies in [automata theory]? and formal language theory (both part of theoretical computer science). These fields study models of computation (automata) and ways to describe and classify formal languages. A formal language is nothing but a set of strings. In the 1940s, [Warren McCulloch]? and [Walter Pitts]? described the nervous system by modelling neurons as small simple automata. The mathematician, Stephen Kleene, later described these models using his mathematical notation called regular sets. Ken Thompson built this notation into qed (the predecessor of the Unix ed) and eventually into grep?. Ever since that time, regular expressions have been widely used in Unix and Unix-like utilities, such as expr?, awk, Emacs and Perl, most of them using the regex library built by [Henry Spencer]?.

Changed: 14c14

# (Kleene star) R* denoting the smallest superset of R that contains ε and is closed under string concatenation. This is the set of all strings that can be made by concatenating zero or more strings in R. For example, {"ab", "c"}* = {ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", ... }.

Changed: 21,24c21,25

: a U b* denotes {"a", ε, "b", "bb", "bbb", ...}
: (a U b)* denotes the set of all strings consisting of 'a's and 'b's, including the empty string
: (ab*)* the same
: ab*(c U ε) denotes the set of strings starting with 'a', then zero or more 'b's and finally optionally a 'c'.

* a U b* denotes {"a", ε, "b", "bb", "bbb", ...}
* (a U b)* denotes the set of all strings consisting of 'a's and 'b's, including the empty string
* (ab*)* the same
* ab*(c U ε) denotes the set of strings starting with 'a', then zero or more 'b's and finally optionally a 'c'.
* ( bb U a(bb)*aa U a(bb)*(ab U ba)(bb)*(ab U ba) )* denotes the set of all strings which contain an even number of 'b's and a number of 'a's divisible by three.

	Revision 37 . . (edit) December 17, 2001 4:40 am by (logged).117.133.xxx [Added link to superset]
	Revision 36 . . October 13, 2001 6:34 am by AxelBoldt [More advanced example]
	Revision 35 . . September 4, 2001 9:33 pm by (logged).177.80.xxx [reverting GNU/Linux to Linux]