Compiler Construction—The Art of Niklaus Wirth

Hanspeter Mössenböck
University of Linz
moessenboeck@ssw.uni-linz.ac.at

Abstract

Niklaus Wirth is not only a master of language design but also a pioneer of compiler construction. For four decades he has refined his techniques for building simple, efficient and reliable compilers. This paper tries to collect some general principles behind his work. It is not a paper about new compilation techniques but a reflection on Wirth's way of writing compilers.

1 Introduction

Among the numerous research topics that Niklaus Wirth approached during his professional career, programming languages and their compilers were clearly the dominating theme. There is hardly anybody in Computer Science who has such a deep insight into language design as Niklaus Wirth. Since the early sixties he has been active in this field and has contributed such seminal languages as Algol-W, Pascal, Modula-2 and Oberon. And being an engineer, he was always as much concerned about the simplicity and efficiency of his compilers as about the elegance of his languages.

In this paper I will try to summarize some principles of Wirth's art of compiler construction. Although I have no personal recollections of Wirth's early projects, I have studied his compilers from the early eighties on and was lucky to have the chance to work with him for several years during the Oberon project. So this paper is more about what I learned from him and what I appreciate about his work. Wirth himself gave a much more thorough description of his compiler technology in his recent compiler book [22], in his Turing Award lecture [2] and in his recollections at the second conference on the History of Programming Languages [21].

When I write about Wirth's art of compiler construction I also want to honour the work of his colleagues and assistants [1, 3, 7, 9, 10, 11, 12]. However, although many of the well-known compilers from ETH Zurich were implemented by his bright PhD students, it was always Wirth who set the direction and determined the style of ETH compilers.

I will start this paper with some general principles that I learned from Wirth and then look at more detailed guidelines and techniques for the individual compilation phases.

2 Guiding Principles

Besides using specific techniques, data structures and algorithms, it is also important that a compiler designer follows some general guidelines, which set the framework for the whole development. Here are a few principles which I believe to be central in Wirth's compilers. The list is perhaps incomplete and somewhat arbitrary but hopefully still offers some food for thought.

• Use straightforward techniques

Compiler technology is one of the most intensively researched fields in Computer Science. There are highly sophisticated algorithms for syntax analysis, semantic analysis, optimization and code generation. However, more sophisticated techniques do not necessarily lead to better compilers. Wirth's compilers use only the simplest techniques such as recursive descent parsing, context-free semantic analysis and single-pass code generation with only a few optimizations. Nevertheless these compilers generate sufficiently fast code and run much faster than most commercial compilers. A compiler writer should have the courage to refrain from using highly sophisticated algorithms if he can see that more straightforward techniques are perfectly sufficient.
In his Turing Award lecture [2] Wirth wrote: "Within the Euler, Algol-W and PL360 projects, much consideration was given to the development of table-driven, bottom-up syntax analysis techniques. Later I switched back to the simple recursive-descent, top-down method, which is easily comprehensible and unquestionably sufficiently powerful, if the syntax of the language is wisely chosen."

• Simple compilers are more reliable compilers

As in any software project it is essential that a single person can grasp the whole design of a compiler. This is what Brooks calls the conceptual integrity of a software system [6]. It can only be achieved if the compiler is simple enough. A compiler that cannot be understood as a whole by a single software engineer cannot be guaranteed to be correct and cannot be maintained without introducing new errors. In his Turing Award lecture Wirth confessed: "… that this Modula compiler is already at the limits of comprehensible complexity, and I would feel utterly incapable of constructing a good compiler for Ada."

• Generate good code for the frequent case; don't optimize the exceptional case

It is much more important that frequent language constructs such as procedure calls, array accesses, or general control structures are compiled efficiently than to put too much effort into optimizations such as loop unrolling or induction variable analysis that can be exploited only in rare cases. Wirth's compilers don't use any of the more advanced optimization techniques. Instead they generate good code for ordinary programs by fully exploiting the machine architecture and its addressing modes.

• Generate good code from the beginning; don't generate bad code and optimize later

Some compilers first generate poor intermediate code because it is optimized later anyway. For such compilers it is often reported that optimizations achieve substantial speedups. However, this is not surprising since the unoptimized code was rather inefficient. If a compiler generates good code from the beginning there is much less room for speedups. Brandis and others [3] report that non-optimizing Oberon compilers for the Motorola 68020, the Sun SPARC and the DEC MIPS architectures generated code that was as efficient as the code generated by the standard C compiler with the optimization option switched on (cc -O). Only for the IBM RS/6000 was the optimizing C compiler able to generate more efficient code than the Oberon compiler.

• Don't let the compiler repair poor programming style

Some people expect a compiler to smooth out inefficiencies caused by poor programming style. This includes some cases of common subexpression elimination, loop invariant code motion and copy propagation. It is, however, simpler and often also clearer if the programmer performs these transformations in the source code instead of relying on the compiler to do them behind the scenes. Although not all of these transformations can be expressed in source code, most of them can, and the remaining ones usually don't cause major inefficiencies, except in some frequently executed inner loops.

• Generate predictable code

Optimizations often have the effect that minor changes in the source code cause major differences in the run time of the program, because a small change in the code may allow or disallow an optimization. Programmers feel uneasy if they cannot predict the behaviour of the generated code. It impedes debugging and benchmarking.
• What is difficult to compile is usually also difficult to understand

Like a programmer, a compiler has to analyze (i.e., to "understand") a program before it can translate it. If a language construct is difficult to analyze for a compiler, it is often difficult to understand for a human, too. For example, in C the construct "x (*y) ();" declares a function pointer y whereas the syntactically similar construct "x (*y);" denotes a call of the procedure x. Since both constructs can occur at the same place in a program, it is difficult to keep them apart syntactically, both for the compiler and for the programmer.

• Instead of applying more sophisticated compilation techniques, simplify the language

Of course, this advice can only be followed if the compiler writer is at the same time also the language designer (as Wirth always was for his compilers). If the implementers of C had not relied on powerful compiler construction tools such as Yacc, the syntax of C would probably be simpler.

• Strive for short compilation times

It is not only important that the compiler generates efficient code but also that it runs efficiently itself. These two goals are often in conflict, and one has to find a reasonable trade-off. Unfortunately, many commercial compilers emphasize code efficiency and accept long compilation times. However, even a compilation time of 5 to 10 seconds per module can be perceived as unduly long, especially if several modules have to be recompiled after a program modification. During software development, the productivity of programmers can be boosted significantly if their work is not disrupted by unduly long compilation pauses.

• A compiler is also an educational tool; it should inhibit bad programming style

Clearly a programming language can have an educational effect by providing only clean and structured language features. But a compiler can have a similar effect by introducing certain restrictions, which limit the size of a module, the number of imported modules, or the number of global variables to a reasonable maximum. It can also penalize questionable programming constructs, such as the access to non-local variables, by not optimizing such cases. Although these measures are not very popular with programmers, they can have an educational value and may prevent major style deficiencies. Such restrictions are a problem, however, when source code is generated by a tool. In that case the length of names or the size of modules cannot be influenced by the programmer.

• Write the compiler in its own language

For Wirth, language design and compiler construction always went hand in hand. He writes [21]: "I never could separate the design of a language from its implementation, for a rigid definition without the feedback from the construction of its compiler would seem to me presumptuous and unprofessional." Writing a compiler in its own language not only proves that the language is powerful enough but also has the advantage that any improvement of the generated code makes the compiler faster, too.

Other people will probably find even more principles. Wirth himself published some general compiler guidelines in an early paper [16], but I wish he would do it again today in the light of the Modula and Oberon projects.

In the remaining sections I will look at the individual compilation phases and try to extract some general principles from them.

3 Scanning

There is not much to say about scanning.
Wirth's scanners are hand-crafted and optimized, since the scanner is one of the most time-consuming parts of a compiler. Token positions for error messages are recorded as character positions and not as line/column pairs, which simplifies bookkeeping. Attributes and token positions are returned in global variables to avoid parameter-passing overhead for information that is not needed with every token. The length of names is restricted to a reasonable amount (24 or 32 characters). This can be seen as an educational measure, since unduly long names can impair the readability of programs as much as cryptically short names do.

4 Parsing

The design of a parser is mainly determined by the selection of the parsing technique and the error recovery strategy.

Since the times of Pascal, Wirth's compilers use top-down recursive descent parsing. Recursive descent is the simplest and fastest parsing technique. Although there are more powerful techniques, such as bottom-up LALR parsing, recursive descent is powerful enough if the syntax of the language is properly designed. Recursive descent parsing requires the grammar of a language to be LL(1), i.e., it must be possible to analyze the language top-down from left to right with a single lookahead symbol. Whenever there is an alternative in the grammar, the lookahead symbol must lead the way and tell the compiler which alternative to select.

This is probably also the way humans read and analyze a program. Programmers read source code from left to right. Whenever they read a symbol (keyword, name, number, etc.) they want to know which language construct it belongs to. In contrast, bottom-up parsers read symbols without deciding right away which language construct they belong to. They simply push the symbols onto a stack. As soon as the top of the stack contains an appropriate right-hand side of a production, they reduce this symbol sequence to the corresponding nonterminal symbol. Thus, bottom-up parsers delay the recognition of language constructs in a program as long as possible, which makes them more powerful but also less natural for humans. Programmers don't have large stacks in their heads. They do not want to read many symbols ahead before they know what a program construct means. Programs are notations for humans and not so much for machines. They should be easy to read and therefore suitable for top-down parsing, because it is more natural than bottom-up parsing.

Wirth always designed his languages so that recursive descent parsing could be applied. This did not compromise the flexibility or the power of the languages. I know of no language feature that must inherently be expressed in a way that is not amenable to recursive descent parsing. I even think that the availability of powerful bottom-up parser generators such as Yacc is responsible for the cryptic syntax of C. If the designers of C had been forced to make their language LL(1), it would probably have become more readable.

Unfortunately, even Wirth's languages contain a few constructs which are not LL(1). To analyze such constructs, it is not sufficient to rely on syntactic information; one must also take semantic information into account. For example, a type guard x(T) in Oberon cannot be distinguished from a procedure call syntactically. One has to consult the symbol table to find out whether x is a procedure or a variable. This ambiguity could easily have been avoided, for example by using curly brackets instead of round brackets.
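To make the flavour of this technique concrete, the following is a minimal sketch of a recursive descent parser for a tiny LL(1) expression grammar. It is not Wirth's code; the grammar, the token names and the helper functions are invented for the illustration.

```python
# A minimal recursive descent parser for the LL(1) grammar
#   Expr   = Term {("+" | "-") Term}.
#   Term   = Factor {("*" | "/") Factor}.
#   Factor = number | "(" Expr ")".
# The grammar, token names and error handling are illustrative only.
import re

TOKEN = re.compile(r"\s*(?:(\d+)|(.))")       # numbers or single-character symbols

def tokenize(src):
    for number, symbol in TOKEN.findall(src):
        yield ("number", int(number)) if number else (symbol, symbol)
    yield ("eof", None)

class Parser:
    def __init__(self, src):
        self.tokens = tokenize(src)
        self.sym, self.val = next(self.tokens)   # the single lookahead symbol

    def advance(self):
        self.sym, self.val = next(self.tokens)

    def expect(self, sym):
        if self.sym != sym:
            raise SyntaxError(f"{sym} expected, found {self.sym}")
        self.advance()

    # One procedure per nonterminal; the lookahead alone selects every alternative.
    def expr(self):
        x = self.term()
        while self.sym in ("+", "-"):
            op = self.sym; self.advance()
            x = x + self.term() if op == "+" else x - self.term()
        return x

    def term(self):
        x = self.factor()
        while self.sym in ("*", "/"):
            op = self.sym; self.advance()
            x = x * self.factor() if op == "*" else x // self.factor()
        return x

    def factor(self):
        if self.sym == "number":
            x = self.val; self.advance(); return x
        self.expect("(")
        x = self.expr()
        self.expect(")")
        return x

print(Parser("2 * (3 + 4)").expr())           # prints 14
```

Each parsing procedure corresponds to one nonterminal, and the single lookahead symbol decides every alternative. A construct like the Oberon type guard, which cannot be decided on syntax alone, would additionally have to consult the symbol table at this point.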
The second design decision in a parser is the selection of an appropriate error handling technique. Although good error recovery is more difficult for hand-coded recursive descent parsers than for table-driven parsers, there are a number of suitable standard techniques. The general goal of error handling is to find as many errors as possible in a single compilation and to avoid spurious errors. In addition, error recovery should not slow down error-free parsing and should not clutter the parsing code with too many error handling statements.

The standard error recovery technique, which was also applied in the ETH Pascal and Modula-2 compilers, is to pass to every parsing procedure the set of follower symbols of the nonterminal that the procedure is to recognize. If an error is detected, the input is skipped until a symbol is found that is a legal follower of a currently active parsing procedure. The procedure stack is then unwound to the caller of this procedure and parsing can continue there. Although this technique can be applied systematically, it requires a large number of set operations and additional parameters of the parsing procedures. This slows down error-free parsing and tends to hide the parsing actions among the error recovery statements.

Therefore, Wirth selected a different error recovery strategy for his Oberon compiler. Given that computers are becoming faster and faster, it is no longer necessary to find all errors in a single compilation. If a compilation takes less than a second, it is acceptable to find just some errors, correct them, and then compile the program again to find the remaining errors. Thus the Oberon compiler does not synchronize with the followers of every nonterminal symbol but selects a few synchronization points in the grammar where recovery is particularly easy and safe. These synchronization points are the beginning of a statement, a declaration or a type, where symbols such as "IF", "WHILE", "TYPE" or "RECORD" are expected. These symbols are considered particularly safe, because they are unlikely to be encountered in the wrong context. If the parser detects an error, it reports it and continues up to the next synchronization point (i.e., to the beginning of the next statement, declaration or type). There it skips input symbols until one of the expected safe symbols occurs. Parsing can be resumed with that symbol. To avoid spurious error messages, an error is only reported if a certain number of tokens (e.g., three) have been recognized successfully since the last error.

This simple error recovery technique can be implemented in about ten lines of code. It does not slow down error-free parsing and does not bury the parsing statements between error recovery actions. It is another example of Wirth's search for simplicity and adequacy.

5 Symbol Table Handling

A symbol table maps the names in a program to their attributes such as their types and their addresses. It consists of several subtables (scopes), one for every module, procedure and record type. As a consequence of single-pass compilation, which Wirth preferred in his later compilers, the scopes can be organized in a stack-like way. A new scope is opened at the top of the stack whenever the parser encounters a procedure, and it is closed when the parser has finished with this procedure. This has the advantage that most nodes of a scope can be discarded (or left for the garbage collector) after the corresponding procedure has been parsed.
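The following sketch shows how such a stack of scopes might be organized. It is an illustration only, not the data structure of any particular ETH compiler; the class names, the fields and the linear-list lookup are assumptions made for the example.

```python
# A stack-like symbol table: one scope per module or procedure, opened and
# closed as the parser enters and leaves the corresponding construct.
class Obj:
    """One declared name (constant, variable, procedure, ...)."""
    def __init__(self, name, kind, typ=None):
        self.name, self.kind, self.typ = name, kind, typ

class Scope:
    def __init__(self, outer):
        self.outer = outer          # enclosing scope (None for the outermost one)
        self.objs = []              # simple linear list of declarations

class SymTab:
    def __init__(self):
        self.top = Scope(None)      # module (global) scope

    def open_scope(self):           # called when the parser reaches a procedure
        self.top = Scope(self.top)

    def close_scope(self):          # called when the procedure has been parsed;
        self.top = self.top.outer   # its local objects become garbage

    def insert(self, name, kind, typ=None):
        if any(o.name == name for o in self.top.objs):
            raise NameError(f"{name} declared twice")
        obj = Obj(name, kind, typ)
        self.top.objs.append(obj)
        return obj

    def find(self, name):           # search from the innermost scope outwards
        scope = self.top
        while scope is not None:
            for o in scope.objs:
                if o.name == name:
                    return o
            scope = scope.outer
        raise NameError(f"{name} not declared")
```

With this organization, closing a scope simply pops it off the stack and its objects become unreachable; lookups walk a plain list, which is acceptable for the small procedure scopes discussed below.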
Only the nodes for the formal parameters have to be kept, since they are still needed to check later calls of the procedure.

For the implementation of the symbol table, Wirth usually chose rather simple data structures such as binary search trees or even linear lists. Although a hash table would be more efficient for large scopes, it is more complicated to handle (e.g., to traverse, to enlarge or to shrink) and consumes more memory. Since too many names in a scope are often an indication of questionable programming style, Wirth felt that large scopes should not be encouraged. Measurements showed that for procedure scopes, which usually don't contain more than ten names, a simple linear list is sufficient.

Symbol tables also store the types of objects. For type compatibility checks, Wirth again preferred the simpler technique (name equivalence) over the more powerful but also more complicated technique (structural equivalence). With name equivalence two types are the same if they are denoted by the same type name, i.e., by the same type node in the symbol table. This can be checked with a simple pointer comparison. In contrast, with structural equivalence (as it is used in Modula-3) two types are the same if they have the same structure, i.e., if they have the same components in the same order and if these components are pairwise structurally equivalent. This requires a recursive comparison of the type structures and is not only more complicated but also more time consuming (a small sketch below contrasts the two checks).

A symbol table contains different kinds of objects (e.g., constants, variables, procedures, etc.) as well as different kinds of structures (e.g., basic types, arrays, records, etc.). With type extension, which was introduced in Oberon, it would seem natural to organize the objects and structures in type hierarchies, i.e., in an object-oriented way. There would be a base type with the common data of all object kinds as well as several extensions with the special data for constants, variables or procedures. Although this organization is elegant and is in fact recommended in some compiler books, it is complicated to handle in practice. Since a compiler often works with general objects, their actual extension types would have to be checked using run-time type tests, and their fields would have to be accessed with guarded type conversions. Alternatively, all operations on objects could be defined in terms of dynamically bound methods, but this would require a large number of methods and would compromise the efficiency of the compiler. Besides, there are some operations which depend on more than one operand. This would require methods that are dynamically bound depending on two or more objects (so-called multimethods). Thus, Wirth chose to represent the fields of all objects in a single flat record type and to use only those fields that belong to the respective object variant. Although less elegant, this technique makes the access to object fields simple and efficient. It is another example of an engineering trade-off, which can be found so often in Wirth's compilers.

6 Separate Compilation

A major achievement of Modula-2 was the concept of separate compilation, which emerged in the seventies in languages such as Mesa [13] and later also in Ada. In contrast to independent compilation (e.g., as in Fortran and C), where subprograms can be compiled independently, separate compilation performs type checking across module boundaries, thus guaranteeing that separately compiled modules can be safely fitted together by the linker.
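Before turning to how this cross-module checking is organized, here is the small sketch promised above contrasting name equivalence with structural equivalence. The Type representation is hypothetical, and the structural check is deliberately simplified (it ignores field names and cyclic types).

```python
# Name equivalence versus structural equivalence, sketched on a toy type node.
# The Type class and its fields are invented for this illustration.
class Type:
    def __init__(self, form, base=None, length=0, fields=()):
        self.form = form            # "INT", "ARRAY", "RECORD", ...
        self.base = base            # element type of an array
        self.length = length
        self.fields = fields        # (name, type) pairs of a record

def name_equivalent(a, b):
    # Two types are compatible only if they are the very same type node:
    # a single pointer (identity) comparison.
    return a is b

def structurally_equivalent(a, b):
    # Recursive comparison of the whole type structure (no cycle handling here).
    if a is b:
        return True
    if a.form != b.form:
        return False
    if a.form == "ARRAY":
        return a.length == b.length and structurally_equivalent(a.base, b.base)
    if a.form == "RECORD":
        return (len(a.fields) == len(b.fields) and
                all(structurally_equivalent(ta, tb)
                    for (_, ta), (_, tb) in zip(a.fields, b.fields)))
    return True                     # basic types with equal form

INT = Type("INT")
v1 = Type("ARRAY", base=INT, length=10)
v2 = Type("ARRAY", base=INT, length=10)
print(name_equivalent(v1, v2))            # False: two distinct type nodes
print(structurally_equivalent(v1, v2))    # True: same structure
```

The identity test is all that Wirth's compilers need; the recursive variant hints at the extra machinery that structural equivalence would drag in.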
When a module is compiled, the symbol tables of the imported modules have to be made available as well, so that the compiler can access the types of the imported modules. In order to do that, the module interfaces have to be described in so-called definition modules, i.e., in files that contain the declarations of the exported objects. Instead of recompiling the interface of a module every time this module is imported, Wirth chose to store the precompiled symbol tables in so-called symbol files [10]. Importing a module then simply requires the compiler to read the respective symbol file and to add the imported symbol table to the symbol table of the currently compiled module. Since symbol files are more compact than source files, they usually fit into a single disk block and can be read with a single disk access.

A special problem is that the interface of a module can depend on the interfaces of other modules. This would require reading several symbol files when a module is imported. In order to avoid this, symbol files are made self-contained, i.e., they also contain parts of the symbol tables of those modules on which they depend. Thus every import requires just one symbol file to be read. However, it can now happen that a partial symbol table of the same module gets imported twice from different symbol files. One has to make sure that these repeatedly imported symbol tables are consistent. This can be done by attaching to every interface a unique version number (either a time stamp derived from the compilation time of the interface or a fingerprint derived from the source code of the definition module). By comparing these version numbers, it can easily be checked whether repeatedly imported symbol tables are the same (a small sketch of such a consistency check is given below). The version numbers of the imported modules are also stored in the object file of the importing module so that the linker can check whether the object files to be linked match the interfaces that were seen during compilation. Thus separate compilation not only performs type checking across module boundaries but also makes sure that consistent versions of modules get bound together by the linker.

In a recent dissertation [8], Régis Crelier refined the version checking between separately compiled modules and implemented it at a finer granularity. The problem with the basic version checking technique is that whenever the interface of a module is extended it gets a new version number. This invalidates all modules that import this interface, even if they use only features that did not change. The new idea is to provide distinct version numbers for all exported objects instead of a single version number for the whole interface. If an object is added to the interface or some exported object is changed, the version numbers of the other objects in the interface remain unchanged. This technique gives a programmer much more flexibility in changing interfaces without having to recompile dependent modules. The implementation of this technique is non-trivial and is described in [8].

As a surprise to the software engineering community, Wirth abandoned the definition modules in Oberon. However, he did not abandon the concept of separate compilation, but only the necessity to write down a definition module explicitly. In Oberon, there are just implementation modules. The programmer simply marks those declarations that are to be exported in the source code, and the compiler generates a symbol file from them during the normal compilation.
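The sketch below illustrates the version consistency check described above: an importing compiler reads several self-contained symbol files and verifies that every module they embed appears with the same version key everywhere. The file format, the JSON encoding and the function names are all invented for the example; real symbol files are compact binary encodings of the symbol table.

```python
# Version checking for self-contained symbol files (illustrative sketch).
# Each symbol file carries, for every module whose partial symbol table it
# embeds, a version key (e.g. a fingerprint derived from the module interface).
import json

def read_symbol_file(path):
    """Hypothetical reader: returns {module_name: (version_key, symbol_table)}."""
    with open(path) as f:
        data = json.load(f)
    return {name: (entry["key"], entry["symbols"]) for name, entry in data.items()}

def import_modules(symbol_file_paths):
    imported = {}                      # module name -> (version key, symbol table)
    for path in symbol_file_paths:
        for name, (key, table) in read_symbol_file(path).items():
            if name in imported:
                old_key, _ = imported[name]
                if old_key != key:     # same module seen with two different interfaces
                    raise ImportError(
                        f"inconsistent versions of module {name}: recompile its clients")
                # same version key: the partial tables are identical, nothing to do
            else:
                imported[name] = (key, table)
    return imported
```

The same version keys would also be recorded in the object file of the importing module, so that the linker can repeat the check when the modules are finally bound together.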
A definition module can still be generated from the symbol file of a module, but it serves only as documentation. A similar technique is also used in other modern languages such as Eiffel and Java.

The elimination of explicit definition modules caused much debate among Modula-2 enthusiasts. In fact there are good arguments in favour of definition modules, but there are also good arguments against them. A definition module is a clear description of a module interface. It forces a programmer to think about the design of a module before starting its implementation. This is good software engineering practice. On the other hand, definition modules introduce redundancy into a language. For example, the interfaces of exported procedures have to be described both in the definition module and in the implementation module. Every time a procedure interface is modified, it has to be modified in two places, and the compiler has to check the consistency of these declarations. Definition modules also make it harder to selectively export record fields, because the record with the exported fields would have to be declared in the definition module and the rest of the record in the implementation module. Finally, explicit definition modules double the number of source files and make it harder to maintain a large software system.

Personally, I think that the elimination of definition modules was a good decision, since it simplifies the language and its compiler. One can still start with designing the interface of a module by writing a skeleton of the implementation module that contains only the exported declarations. Later this skeleton can be fleshed out by adding private declarations and code. Nevertheless, I must admit that it is tempting to mark the exported declarations only after the whole implementation module has been written.

7 Code Generation

The code generation phase is not as systematic to describe and to implement as the earlier compiler phases. It heavily depends on the peculiarities of the target machine, which is often irregular and full of machine-specific details. Nevertheless the code generators of Wirth's compilers show some common characteristics. One principle that was already mentioned earlier is to generate reasonably good code from the beginning. This can be achieved by delayed code generation using the Item technique described below, by exploiting the special instructions and addressing modes of the target machine, and by applying simple optimizations such as constant folding and strength reduction.

The spine of Wirth's code generators is what I call the Item technique. It is described in Wirth's compiler book [22], although it is not given a specific name there. The general idea is that every value that turns up during code generation is described by a structure that is called an item. Items can describe constants, variables, intermediate results, or condition codes resulting from compare instructions. Every parsing procedure processes a language construct and returns an item that describes the value of this construct. Items are thus the attributes of the imaginary syntax tree. While the parsing procedures move top-down, the resulting items are propagated bottom-up, creating more complex items and generating code along the way. For example, the code generation for the addition x + y first leads to two items describing the variables x and y.
In the next step, load instructions are generated for x and y, and the items describing the variables are transformed into items describing register contents, because the values of x and y are now in registers. Finally an add instruction is emitted and the two register items are combined into a single register item, which describes the result of the addition. If the target machine supports the addition of memory operands, only x has to be loaded into a register and y is still described by a variable item.

In many cases, items can be constructed without emitting code. For example, a record selection r.f results in an item describing the record variable r and an object describing the field f. In the next step, a new item is constructed that describes a memory cell with the base address of r and the offset of f. No code has to be emitted in this step. The transformation just leads to a compile-time addition of an offset. Using such compile-time operations, code generation is delayed as long as possible. An instruction is only generated if the result of the transformation cannot be produced at compile time.

The kinds of items depend both on the source language and on the target machine. For every object kind in the symbol table (i.e., for constants, variables, fields, procedures, etc.) an item kind has to be provided that describes that object. These are the source-level items. Similarly, for every addressing mode of the target machine (i.e., for immediate, register, register-relative, indexed, etc.) an item kind has to be provided that describes a value accessed by this addressing mode. These are the machine-level items. Code generation proceeds by transforming source-level items (e.g., variables) into machine-level items accessed by a specific addressing mode (e.g., register-relative). All items are implemented by a flat record that encodes the item kind and its attributes. Usually some source-level items are identical to some machine-level items (for example, a constant item corresponds to an immediate item, a variable item corresponds to a register-relative item, etc.). In this case the two item kinds can be combined into a single item kind.

Having identified the necessary item kinds, one has to implement three sets of procedures: (a) procedures that transform objects into source-level items (e.g., variable objects into variable items), (b) procedures that transform one item kind into another (e.g., from register-relative to register by loading the value into a register), and (c) procedures that generate the actual machine instructions: for every machine instruction op a, b one has to implement a procedure Op(a, b) where the parameters a and b are items. This procedure is implemented by a case statement considering all possible combinations of item kinds that a and b can assume. By applying these guidelines, item kinds and the corresponding code generation procedures can be developed systematically for any given machine (a small sketch of the scheme is given below).

Another important issue in code generation is register allocation. It is especially important for RISC machines, where a large number of registers are available and should be exploited to allow faster access to variables and intermediate values. Good register allocation, however, cannot be done in a single pass. It requires live range analysis of programs and a coverage of the live ranges by registers using a technique that is based on graph colouring. Since Wirth's compilers do code generation in a single pass, a trade-off is used.
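The following sketch illustrates the Item technique for the expression x + y on an imaginary load/store machine, together with the kind of statement-scoped register handling discussed next. The item kinds, instruction names and register pool are assumptions made for the illustration, not Wirth's code.

```python
# Delayed code generation with items on an imaginary load/store machine.
# Item kinds, instruction names and the register pool are illustrative only.
VAR, CONST, REG = "Var", "Const", "Reg"      # item kinds (addressing modes)

class Item:
    def __init__(self, mode, value=0, reg=None):
        self.mode = mode        # Var: value = frame offset; Const: value = constant
        self.value = value      # Reg: reg = number of the allocated register
        self.reg = reg

code = []                       # emitted instructions
free_regs = set(range(8))       # trivial register pool

def get_reg():
    return free_regs.pop()      # any free register will do

def release_all():              # called at the end of every statement
    free_regs.update(range(8))

def load(x):
    """Transform an item into a register item, emitting code only if necessary."""
    if x.mode == REG:
        return x
    r = get_reg()
    if x.mode == CONST:
        code.append(f"MOVI R{r}, {x.value}")
    else:                       # Var: register-relative load from the frame pointer
        code.append(f"LOAD R{r}, {x.value}(FP)")
    return Item(REG, reg=r)

def add(x, y):
    """Op procedure for '+': fold constants at compile time, otherwise emit ADD."""
    if x.mode == CONST and y.mode == CONST:
        return Item(CONST, x.value + y.value)   # no code emitted at all
    x, y = load(x), load(y)
    code.append(f"ADD R{x.reg}, R{x.reg}, R{y.reg}")
    return Item(REG, reg=x.reg)

# z := x + y, with x at offset 4, y at offset 8 and z at offset 12 in the frame
result = add(Item(VAR, 4), Item(VAR, 8))
code.append(f"STORE R{result.reg}, 12(FP)")
release_all()
print("\n".join(code))
```

The register handling used here is exactly the simple scheme described in the next paragraphs: take any free register when one is needed, and return them all at the end of the statement.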
The basic register allocation technique is quite simple. Whenever a new register is needed, the register allocator returns an arbitrary free register and marks it as used. At the end of every statement all registers are returned. This technique is simple and efficient but exploits only a small subset of the available registers. As a simple optimization, a compiler can try to keep the first few variables of every procedure in registers. However, this can only be done if these variables are not accessed by nested procedures and if they are not passed to other procedures as reference parameters. Instead of allocating a memory cell for such register variables, the symbol table simply holds the register numbers where these variables are stored. Of course, register variables must not be marked as free at the end of every statement. One also has to save and restore register variables across procedure calls, since the called procedures will use the registers for their own variables. A detailed description of this technique can be found in [3].

8 Single-Pass versus Multi-Pass Compilers

Wirth designed his languages so that they can be compiled in a single pass: every name has to be declared before it is used; pointers and procedures can be forward-declared in order to allow cyclic dependencies. While it was necessary a few years ago to decompose a compiler into several passes due to memory restrictions, this is no longer an argument: current computers have enough memory to hold a whole compiler.

The predominant reason for multi-pass compilers is optimization. Advanced optimization requires data flow analysis, which has to be performed on an intermediate program representation such as the control flow graph. Building this graph and analyzing it requires several passes. Wirth's compilers avoid sophisticated optimizations, thus no intermediate program representation is necessary and the code can be generated on the fly during parsing. Although the resulting code is not as efficient as it could be, it compares well with the code generated by optimizing C compilers on most platforms (measurements are given in [3]).

There is, however, another good reason to decompose a compiler into two passes, namely portability. If a compiler is decomposed into a language-dependent front end and a machine-dependent back end, porting the compiler to a new machine only requires rewriting the back end. This solution was chosen in the Oberon OP2 compiler [7], designed by Régis Crelier and ported to about ten different platforms by various PhD students. All these compilers use the same front end, which contains the scanner, the parser and the symbol table handler. It generates an intermediate program representation in the form of an abstract syntax tree and a symbol table. These data structures are passed to the back end, which contains the code generator. The syntax tree also allows for more optimizations in the back end, such as the elimination of array index checks in certain situations. The back end for the IBM RS/6000 even applies heavy optimizations before code is generated [4]: it turned out that this superscalar processor requires instruction scheduling to achieve a code quality comparable to the code generated by optimizing C compilers. Therefore the syntax tree is first transformed into a control flow graph using a simplified version of the static single assignment form [5].
Various optimizations such as common subexpression elimination, constant propagation, loop invariant code motion and instruction scheduling are applied to this graph before code is generated.

The intermediate program representation between front end and back end turned out to be of value for yet another purpose: mobile code. Michael Franz found a compact encoding of the syntax tree and the symbol table that can be transferred over a network to some target machine, where it is compiled to machine code by a code-generating loader [9]. This is an alternative to the Java bytecodes and allows "write once, run everywhere" software. It is interesting to note that this idea was published before the advent of Java.

9 Conclusion

This paper tried to highlight some principles that I believe are central in Wirth's compilers. Maybe my view is biased and incomplete, but I think that it reflects some valuable ideas. I always enjoyed studying Wirth's compilers. Although they contain hardly a line of comments, their structure is so clear, simple and elegant that they are a pleasure to read. The full source code of some of his compilers can be found in his books on compiler construction [22] and on the Oberon project [20]. The source code of the OP2 compiler is also available online from ETH Zurich as part of the Oberon distribution.

For Wirth, compiler construction is an engineering discipline and not pure science. He prefers well-proven techniques over fashionable results. For him, a compiler is a tool that has to be as useful and practical as possible and not so much a piece of scientific experimentation. I regard Wirth's way of compiler construction as an art, not in the sense of black magic but in the sense of a craft built on four decades of experience.

References

1. Ammann U.: The Zurich Implementation. In "Pascal - The Language and its Implementation", Wiley, 1978.
2. Ashenhurst R.L., Graham S.: ACM Turing Award Lectures - The First Twenty Years, 1966-1985. ACM Press, 1987.
3. Brandis M., Crelier R., Franz M., Templ J.: The Oberon System Family. Software—Practice and Experience 25(12): 1331-1366, 1995.
4. Brandis M.: Optimizing Compilers for Structured Programming Languages. ETH dissertation, 1995.
5. Brandis M., Mössenböck H.: Single-Pass Generation of Static Single Assignment Form for Structured Languages. ACM Transactions on Programming Languages and Systems 16(6): 1684-1698, 1994.
6. Brooks F.P.: The Mythical Man-Month - Essays on Software Engineering. Addison-Wesley, 1975.
7. Crelier R.: OP2: A Portable Oberon-2 Compiler. Proceedings of the 2nd International Modula-2 Conference, Loughborough, 1991.
8. Crelier R.: Separate Compilation and Module Extension. ETH dissertation, 1994.
9. Franz M.: Code-Generation on the Fly - A Key to Portable Software. ETH dissertation, 1994.
10. Geissmann L.B.: Separate Compilation in Modula-2 and the Structure of the Modula-2 Compiler on the Personal Computer Lilith. ETH dissertation, 1983.
11. Jacobi C.: Code Generation and the Lilith Architecture. ETH dissertation, 1983.
12. Le V.K.: The Module: A Tool for Structured Programming. ETH dissertation, 1978.
13. Mitchell J.G., Maybury W., Sweet R.: Mesa Language Manual. CSL-79-3, Xerox PARC, 1979.
14. Wirth N.: The Programming Language Pascal. Acta Informatica 1: 35-63, 1971.
15. Wirth N.: The Design of a Pascal Compiler. Software—Practice and Experience 1: 309-333, 1971.
16. Wirth N.: Programming Languages: What to Demand and How to Assess Them. Symposium on Software Engineering, Belfast, April 1976.
17. Wirth N.: Design and Implementation of Modula. Software—Practice and Experience 7(1): 67-84, 1977.
18. Wirth N.: Programming in Modula-2. Springer-Verlag, 1982.
19. Wirth N.: The Programming Language Oberon. Software—Practice and Experience 18(7): 671-690, 1988.
20. Wirth N., Gutknecht J.: Project Oberon. The Design of an Operating System and Compiler. Addison-Wesley, 1992.
21. Wirth N.: Recollections about the Development of Pascal. Proceedings History of Programming Languages II, SIGPLAN Notices 28(3): 333-342, 1993.
22. Wirth N.: Compiler Construction. Addison-Wesley, 1996.