Lexical analysis is the process of converting a raw input stream--usually but not exclusively a stream of characters--into a stream of tokens for use by a parser. For example, the string "net_worth_future = (assets - liabilities)*1.1" might be converted into the stream NAME EQUALS OPEN_PARENS NAME MINUS NAME CLOSE_PARENS TIMES NUMBER. This is done because it makes writing a parser much easier. Instead of being forced to deal with the fact that net_worth_future is a name, and assets is a name, and foo_barQUuX is a name, the parser need only concern itself with syntactic matters. This leads to efficiency of programming, if not efficiency of execution.
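As a sketch of the idea, here is a minimal lexer in Python. The token names match the stream above; the use of the re module and this particular table-driven structure are illustrative choices, not prescribed by the text.

```python
import re

# Each token is a (name, pattern) pair. Order matters: NUMBER is tried
# before NAME so "1.1" is not split apart. These names are the ones used
# in the example token stream; the patterns are illustrative.
TOKEN_SPEC = [
    ("NUMBER",       r"\d+(\.\d+)?"),
    ("NAME",         r"[a-zA-Z_][a-zA-Z_0-9]*"),
    ("EQUALS",       r"="),
    ("OPEN_PARENS",  r"\("),
    ("CLOSE_PARENS", r"\)"),
    ("MINUS",        r"-"),
    ("TIMES",        r"\*"),
    ("SKIP",         r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(text):
    """Yield (token_name, lexeme) pairs; whitespace is discarded."""
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            yield m.lastgroup, m.group()

tokens = [name for name, _ in
          tokenize("net_worth_future = (assets - liabilities)*1.1")]
# tokens == ['NAME', 'EQUALS', 'OPEN_PARENS', 'NAME', 'MINUS',
#            'NAME', 'CLOSE_PARENS', 'TIMES', 'NUMBER']
```

The lexer emits only token names and lexemes; everything about structure (what the parentheses group, what "=" binds) is deferred to the parser.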
Lexers typically operate by using simple regular expressions to recognise tokens in the input stream. For example, a NAME might be any alphabetical character or the underscore, followed by any number of alphanumeric characters or underscores. This can be represented compactly by the pattern [a-zA-Z_][a-zA-Z_0-9]*, which means "any character a-z, A-Z or _, then 0 or more of a-z, A-Z, _ or 0-9." Regular expressions represent such flat, repetitive patterns compactly. What they cannot do is handle recursively defined patterns, such as "n opening parentheses, followed by a statement, followed by n closing parentheses." It takes a full-fledged parser (there exist tools which accept context-free grammars as input and emit parsers as output) to recognise such patterns.
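The NAME pattern can be checked directly. A small sketch, assuming Python's re module; note that the second repetition uses * (zero or more), so a single-character name like "x" is still a valid NAME.

```python
import re

# The NAME pattern from the text: one letter or underscore, then zero
# or more letters, underscores, or digits.
NAME = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")

print(NAME.fullmatch("net_worth_future") is not None)  # True
print(NAME.fullmatch("x") is not None)                 # True: * allows zero trailing chars
print(NAME.fullmatch("1st_place") is not None)         # False: must not start with a digit
```

By contrast, no single regular expression accepts exactly the strings with n opening parentheses, a statement, and n matching closing parentheses for unbounded n; counting nesting depth is the parser's job.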