[Home]Compiler

HomePage | Recent Changes | Preferences

Showing revision 21
A compiler is a program that translates source code, written in a computer programming language, into an output language, usually into machine language that can be directly executed by a computer.

Typical compilers will first translate the source code into an intermediate form called object code. A set of object files, which need not have all come from a single compiler, may then be linked together to create the final executable which can be run directly by a user.

Compiler design

Modern compilers share a common 'two stage' design. The first stage, the 'compiler frontend' translates the source language into an intermediate representation. The second stage, the 'compiler backend' works with the internal representation to produce code in the output language. While compiler design is believed to be a complex task, this approach allows to exchange either the frontend or backend to retarget the compilers source and output language respectivly. This way modern compilers are often portable and allow multiple dialects of a language to be compiled.

Compiler frontend

The compiler frontend consists of multiple phases itself, each informed by formal language theory:

  1. Scanning - breaking the source code text into small pieces, tokens - sometimes called 'terminals' - each representing a single piece of the language, for instance a keyword, identifier? or [symbol names]?. The token language is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize it.
  2. Parsing - Identifying syntactic structures - so called 'non-terminals' - constructed from one or more tokens and non-terminals, representing complicated language elements, for instance assignments, conditions and loops. This is typically done with a parser for a context-free grammar, often an [LL parser]? or LR parser from a parser generator. (Most programming languages are only almost context-free, so there's often some extra logic hacked in.)
  3. Intermediate language generation - an equivalent to the original program is created in a special purpose intermediate language

Compiler backend

While there are applications where only the compiler frontend is necessary, such as static language verification tools, a real compiler hands the intermediate representation generated by the frontend to the backend, which produces a functional equivalent program in the output language. This is done in multiple steps:

  1. Optimization - the intermediate language representation is transformed into functionaly equivalent but faster (or smaller) forms.
  2. Code generation - the transformed intermediate language is translated into the output language, usually the native machine language of the system. This involves resource and storage decisions, such as deciding which variables to fit into registers and memory and the selection and scheduling of appropriate [machine instructions]?.

Compiled vs. Interpreted languages

Many people divide higher level programming languages into two categories: compiled languages and interpreted languages. However, in fact most of these languages can be implemented either through compilation or interpretation, the categorisation merely reflecting which method is most commonly used to implement that language. (Some interpreted languages, however, cannot easily be implemented through compilation, especially those which allow self-modifying programs.)

Further reading

The so called 'Dragon Book', ISBN 0201100886 (amazon.com, search) from 1985 is considered to be the standard authority on compiler basics and makes a good primer for the techniques mentioned above.

In the recent decade a large number of free compilers and compiler development tools were developed for all kinds of languages. Some of them are considered to be of very high quality and their free source code make a nice read for everyone interested in modern compiler concepts.

See also: assembler, interpreter, linker

/Talk


HomePage | Recent Changes | Preferences
This page is read-only | View other revisions | View current revision
Edited October 26, 2001 2:08 pm by Kragen (diff)
Search: