C--:

a portable assembly language
that supports garbage collection

Simon Peyton Jones1, Norman Ramsey2, and Fermin Reig3
1

, Microsoft Research Ltd
, University of Virginia
, University of Glasgow

simonpj@microsoft.com
2
nr@cs.virginia.edu
3
reig@dcs.gla.ac.uk

Abstract. For a compiler writer, generating good machine code for a
variety of platforms is hard work. One might try to reuse a retargetable
code generator, but code generators are complex and dicult to use,
and they limit one's choice of implementation language. One might try
to use C as a portable assembly language, but C limits the compiler
writer's exibility and the performance of the resulting code. The wide
use of C, despite these drawbacks, argues for a portable assembly language. C-- is a new language designed expressly for this purpose. The
use of a portable assembly language introduces new problems in the support of such high-level run-time services as garbage collection, exception
handling, concurrency, pro ling, and debugging. We address these problems by combining the C-- language with a C-- run-time interface. The
combination is designed to allow the compiler writer a choice of sourcelanguage semantics and implementation techniques, while still providing
good performance.

1 Introduction
Suppose you are writing a compiler for a high-level language. How are you to
generate high-quality machine code? You could do it yourself, or you could try
to take advantage of the work of others by using an o -the-shelf code generator. Curiously, despite the huge amount of research in this area, only three
retargetable, optimizing code generators appear to be freely available: VPO
(Benitez and Davidson 1988), ML-RISC (George 1996), and the gcc back end
(Stallman 1992). Each of these impressive systems has a rich, complex, and illdocumented interface. Of course, these interfaces are quite di erent from one
another, so once you start to use one, you will be unable to switch easily to another. Furthermore, they are language-speci c. To use ML-RISC you must write
your front end in ML, to use the gcc back end you must write it in C, and so on.
All of this is most unsatisfactory. It would be much better to have one portable
assembly language that could be generated by a front end and implemented by
any of the available code generators. So pressing is this need that it has become common to use C as a portable assembly language (Atkinson et al. 1989;

Bartlett 1989b; Peyton Jones 1992; Tarditi, Acharya, and Lee 1992; Henderson,
Conway, and Somogyi 1995; Pettersson 1995; Serrano and Weis 1995). Unfortunately, C was never intended for this purpose | it is a programming language,
not an assembly language. C locks the implementation into a particular calling
convention, makes it impossible to compute targets of jumps, provides no support for garbage collection, and provides very little support for exceptions or
debugging (Section 2).
The obvious way forward is to design a language speci cally as a compiler target language. Such a language should serve as the interface between a compiler
for a high-level language (the front end ) and a retargetable code generator (the
back end ). The language would not only make the compiler writer's life much
easier, but would also give the author of a new code generator a ready-made customer base. In an earlier paper we propose a design for just such a language, C-(Peyton Jones, Oliva, and Nordin 1998), but the story does not end there. Separating the front and back ends greatly complicates run-time support. In general,
the front end, back end, and run-time system for a programming language are
designed together. They cooperate intimately to support such high-level features
as garbage collection, exception handling, debugging, pro ling, and concurrency
| high-level run-time services. If the back end is a portable assembler like C--,
we want the cooperation without the intimacy; an implementation of C-- should
be independent of the front ends with which it will be used.
One alternative is to make all these high-level services part of the abstraction offered by the portable assembler. For example, the Java Virtual Machine, which
provides garbage collection and exception handling, has been used as a target for languages other than Java, including Ada (Taft 1996), ML (Benton,
Kennedy, and Russell 1998), Scheme (Clausen and Danvy 1998), and Haskell
(Wakeling 1998). But a sophisticated platform like a virtual machine embodies
too many design decisions. For a start, the semantics of the virtual machine may
not match the semantics of the language being compiled (e.g., the exception semantics). Even if the semantics happen to match, the engineering tradeo s may
di er dramatically. For example, functional languages like Haskell or Scheme
allocate like crazy (Diwan, Tarditi, and Moss 1993), and JVM implementations
are typically not optimised for this case. Finally, a virtual machine typically
comes complete with a very large infrastructure | class loaders, veri ers and
the like | that may well be inappropriate. Our intended level of abstraction is
much, much lower.
Our problem is to enable a client to implement high-level services, while still
using C-- as a code generator. As we discuss in Section 4, supporting highlevel services requires knowledge from both the front and back ends. The insight
behind our solution is that C-- should include not only a low-level assembly
language, for use by the compiler, but also a low-level run-time system, for use
by the front end's run-time system. The only intimate cooperation required is
between the C-- back end and its run-time system; the front end works with
C-- at arm's length, through a well-de ned language and a well-de ned run2

time interface (Section 5). This interface adds something fundamentally new:
the ability to inspect and modify the state of a suspended computation.
It is not obvious that this approach is workable. Can just a few assembly-language
capabilities support many high-level run-time services? Can the front-end runtime system easily implement high-level services using these capabilities? How
much is overall eciency compromised by the arms-length relationship between
the front-end runtime and the C-- runtime? We cannot yet answer these questions de nitively. Instead, the primary contributions of this paper are to identify
needs that are common to various high-level services, and to propose speci c
mechanisms to meet these needs. We demonstrate only how to use C-- to implement the easiest of our intended services, namely garbage collection. Re ning
our design to accommodate exceptions, concurrency, pro ling, and debugging
has emerged as an interesting research challenge.

2 It's impossible | or it's C
The dream of a portable assembler has been around at least since UNCOL
(Conway 1958). Is it an impossible dream, then? Clearly not: C's popularity as
an assembler is clear evidence that a need exists, and that something useful can
be done.
If C is so popular, then perhaps C is perfectly adequate? Not so. There are many
diculties, of which the most fundamental are these:

{ The C route rewards those who can map their high-level language rather

directly onto C. A high-level language procedure becomes a C procedure,
and so on. But this mapping is often awkward, and sometimes impossible.

For example, some source languages fundamentally require tail-call optimisation; a procedure call whose result is returned to the caller of the current
procedure must be executed in the stack frame of the current procedure. This
optimisation allows iteration to be implemented eciently using recursion.
More generally, it allows one to think of a procedure as a labelled extended
basic block that can be jumped to, rather than as sub-program that can only
be called. Such procedures give a front end the freedom to design its own
control ow.
It is very dicult to implement the tail-call optimisation in C, and no C compiler known to us does so across separately compiled modules. Those using C
have been very ingenious in nding ways around this de ciency (Steele 1978;
Tarditi, Acharya, and Lee 1992; Peyton Jones 1992; Henderson, Conway, and
Somogyi 1995), but the results are complex, fragile, and heavily tuned for
one particular implementation of C (usually gcc).
3

{ A C compiler may lay out its stack frames as it pleases. This makes it dicult

for a garbage collector to nd the live pointers. Implementors either arrange
not to keep pointers on the C stack, or they use a conservative garbage
collector. These restrictions are Draconian.

{ The unknown stack-frame layout also complicates support for exception

handling, debugging, pro ling, and concurrency. For example, an exceptionhandling mechanism needs to walk the stack, perhaps removing stack frames
as it goes. Again, C makes it essentially impossible to implement such mechanisms, unless they can be closely mapped onto what C provides (i.e., setjmp
and longjmp).

{ A C compiler has to be very conservative about the possibility of memory

aliasing. This seriously limits the ability of the instruction scheduler to permute memory operations or hoist them out of a loop. The front-end compiler
often knows that aliasing cannot occur, but there is no way to convey this
information to the C compiler.

So much for fundamental issues. C also lacks the ability to control a number
of important low-level features, including returning multiple values in registers
from a procedure, mis-aligned memory accesses, arithmetic, data layout, and
omitting range checks on multi-way jumps.
In short, C is awkward to use as a portable assembler, and many of these diculties translate into performance hits. A portable assembly language should be
able to o er better performance, as well as greater ease of use.

3 An overview of C-In this section we give an overview of the design of C--. Fuller descriptions
can be found in Peyton Jones, Oliva, and Nordin (1998) and in Reig and Peyton
Jones (1998). Figure 1 gives examples of some C-- procedures that give a avour
of the language. Despite its name C-- is by no means a subset of C, as will become
apparent; C was simply our jumping-o point.

3.1 What is a portable assember?
C-- is an assembly language | an abstraction of hardware | not a high-level
programming language. Hardware provides computation, control ow, memory,
and registers; C-- provides corresponding abstractions.

{

expressions and assignments are abstractions of computation. C-- provides a rich set of computational operators, but these operators work only
on machine-level data types: bytes, words, etc. The expression abstraction

C--

4

/* Loops */
export sp3;
sp3( bits32 n ) {
bits32 s, p;
s = 1; p = 1;

/* Ordinary recursion */
export sp1;
sp1( bits32 n ) {
bits32 s, p;
if n == 1 {
return( 1, 1 );
} else {
s, p = sp1( n-1 );
return( s+n, p*n );
}
}

loop:
if n==1 {
return( s, p );
} else {
s = s+n;
p = p*n;
n = n-1;
goto loop;
}

/* Tail recursion */
export sp2;
sp2( bits32 n ) {
jump sp2_help( n, 1, 1 );
}

}

sp2_help( bits32 n, bits32 s, bits32 p ) {
if n==1 {
return( s, p );
} else {
jump sp2_help( n-1, s+n, p*n )
}
}

Fig. 1. Three functions that compute the sum P =1 and product Q =1 , written
n
i

in C--.

i

n
i

i

hides the particular combination of machine instructions needed to compute values, and it hides the machine registers that may be needed to hold
intermediate results.

{

C--'s goto and if statements are abstractions of control ow. (For convenience, C-- also provides structured control- ow constructs.) The if abstraction hides the machine's \condition codes;" branch conditions are arbitrary
Boolean expressions.

{

C-- treats memory much as the machine does, except that addresses used
in C-- programs may be arbitrary expressions. This abstraction hides the
limitations of the machine's addressing modes.

{

C-- variables are an abstraction of registers. A C-- back end puts as many
variables as possible in registers; others go in memory. This abstraction hides
the number and conventional uses of the machine's registers.

{ In addition, C-- provides a procedure abstraction, the feature that looks least

like an abstraction of a hardware primitive. However, many processor archi5

tectures provide direct support for procedures, although the nature of that
support varies widely (procedure call or multiple register save instructions,
register windows, link registers, branch prediction for return instructions,
and so on). Because of this variety, calling conventions and activation-stack
management are notoriously architecture dependent and hard to specify. C-therefore o ers prodedures as a primitive abstraction, albeit in a slightly unusual form (Section 3.4).
Our goal is to make it easy to retarget front ends, not to make every C-- program
runnable everywhere. Although every C-- program has a well-de ned semantics
that is independent of any machine, a front end translating a single source program might need to generate two di erent C-- programs for two di erent target
architectures. For example, a C-- program generated for a machine without
oating-point instructions would be di erent from a C-- program generated for
a machine without oating-point instructions.
Our goal contrasts sharply with \write once; run anywhere," the goal of such distribution formats as Java class les, Juice les (Franz 1997), and ANDF or TenDRA (Macrakis 1993). These formats are abstractions of high-level languages,
not of underlying machines. Their purpose is binary portability, and they retain enough high-level language semantics to permit e ective compilation at the
remote installation site.
Even though C-- exposes a few architecture-speci c details, like word size, the
whole point is to hide those details, so that the front end job can largely independent of the target architecture. A good C-- implementation therefore must
do substantial architecture-dependent work. For example:

{
{
{
{
{

Register allocation.
Instruction selection, exploiting complex instructions and addressing modes.
Instruction scheduling.
Stack-frame layout.
Classic back-end optimisations such as common-subexpression elimination
and copy propagation.

{ If-conversion for predicated architectures.
Given these requirements, C-- resembles a typical compiler's intermediate language more than a typical machine's assembly language.

3.2 Types
C-- supports a bare minimum of data types: a family of bits types (bits8,
bits16, bits32, bits64), and a family of oating-point types (float32, float64,

6

float80). These types encode only the size (in bits) and the kind of register
(general-purpose or oating-point) required for the datum.
Not all types are available on all machines; for example, a C-- program emitted
by a front-end compiler for a 64-bit machine might be rejected if fed to a C-implementation for a 32-bit machine. It is easy to tell a front end how big to
make its data types, and doing so makes the front end's job easier in some ways;
for example, it can compute o sets statically.
The bits types are used for characters, bit vectors, integers, and addresses
(pointers). On each architecture, a bits type is designated the \native word
type " of the machine. A \native code-pointer type " and \native data-pointer
type " are also designated; exported and imported names must have one of these
pointer types. On many machines, all three types are the same, e.g, bits32.

3.3 Static allocation
o ers detailed control of static memory layout, much as ordinary assemblers do. A data block consists of a sequence of labels, initialised data values,
uninitialised arrays, and alignment directives. For example:

C--

data {
foo:

bits32{10};
/* One bits32 initialised to 10 */
bits32{1,2,3,4}; /* Four initialised bits32's */
bits32[8];
/* Uninitialised array of 8 bits32's */

baz1:
baz2: bits8
end:

/* An uninitialised byte */

}

Here foo is the address of the rst bits32, baz1 and baz2 are both the address
of the bits8, and end is the address of the byte after the bits8. The labels
foo, baz1, etc, should be thought of as addresses, not as memory locations.
They are all immutable constants of the native data-pointer type; they cannot
be assigned to.
How, then, can one access the memory at location foo? Memory accesses (loads
and stores) are typed, and denoted with square brackets. Thus the statement:
bits32[foo] = bits32[foo] + 1;

loads a bits32 from the location whose address is in foo, adds one to it, and
stores it at the same location. The mnemonic for this syntax is to think of bits32
as a C-like array representing all of memory, and bits32[foo] as a particular
element of that array. The semantics of the address is not C-like, however; the
expression in brackets is the byte address of the item. Further, foo's type is
always the native data-pointer type; the type of value stored at foo is speci ed
by the load or store operation itself. So this is perfectly legal:
bits8[foo+2] = bits8[foo+2] - 1;

7

This statement modi es only the byte at address foo+2.
Unlike C, C-- has no implicit alignment or padding. Therefore, the address
relationships between the data items within a single data block are machineindependent; for example, baz1 = foo +52. An explicit align directive provides
alignment where that is required.
C-- supports multiple, named data sections. For example:
data "debug" {
...
}

This syntax declares the block of data to belong to the section named "debug".
Code is by default placed in the section "text", and a data directive with no
explicit section name defaults to the section "data". Procedures can be enclosed
in code "mytext" { ... } to place them in a named section "mytext".
C-- expects that, when linking object les, the linker concatenates sections with
the same name. (For backwards compatibility with some existing linkers, front
ends may wish to emit an alignment directive at the beginning of each C-section.) C-- assigns no other semantics to the names of data sections, but particular implementations may assign machine-dependent semantics. For example,
a MIPS implementation might assume that data in sections named "rodata" is
read-only.

3.4 Procedures
supports procedures that are both more and less general than C procedures
| for example, C-- procedures o er multiple results and full tail calls, but they
have a xed number of arguments. Speci cally:

C--

{ A C-- procedure, such as sp1 in Figure 1, has parameters, such as n, and local

variables, such as s and p. Parameters and variables are mapped onto machine registers where possible, and only spilled to the stack when necessary.
In this absolutely conventional way C-- abstracts away from the number of
machine registers actually available. As with registers, C-- provides no way
to take the address of a parameter or local variable.

{

C-- supports fully general tail calls, identi ed as \jumps". Control does not
return from jumps, and C-- implementations must deallocate the caller's
stack frame before each jump. For example, the procedure sp2_help in Figure 1 uses a jump to implement tail recursion.

{

C-- supports procedures with

multiple results, just as it supports procedures
with multiple arguments. Indeed, a return is somewhat like a jump to a
procedure whose address happens to be held in the topmost activation record
on the control stack, rather than being speci ed explicitly. All the procedures
8

in Figure 1 return two results; procedure sp1 contains a call site for such a
procedure.
{ A C-- procedure call is always a complete statement, which passes expressions as parameters and assigns results to local variables. Although high-level
languages allow a call to occur in an expression, C-- forbids it. For example,
it is illegal to write
r = f( g(x) );

/* illegal */

because the result returned by g(x) cannot be an argument to f. Instead,
one must write two separate calls:
y = g(x);
r = f(y);

This restriction makes explicit the order of evaluation, the location of each
call site, and the names and types of temporaries used to hold the results
of calls. (For similar reasons, assignments in C-- are statements, not expressions, and C-- operators have no side e ects. In particular, C-- provides no
analog of C's \p++.")
{ To handle high-level variables that can't be represented using C--'s primitive
types, C-- can be asked to allocate named areas in the procedure's activation
record.
f (bits32 x) {
bits32 y;
stack { p : bits32;
q : bits32[40];
}
/* Here, p and q are the addresses of the relevant chunks
of data. Their type is the native data-pointer type. */
}

is rather like data; it has the same syntax between the braces, but it
allocates on the stack. As with data, the names are bound to the addresses
of the relevant locations, and they are immutable. C-- makes no provision
for dynamically-sized stack allocation (yet).
{ The name of a procedure is a C-- expression of native code-pointer type. The
procedure speci ed in a call statement can be an arbitrary expression, not
simply the statically-visible name of a procedure. For example, the following
statements are both valid, assuming the procedure sp1 is de ned in this
compilation unit, or imported from another one:
stack

9

bits32[ptr] = sp1;
...
r,s = (bits32[ptr])( 4 );

/* Store procedure address */
/* Call stored procedure */

{ A C-- procedure, like sp3 in Figure 1, may contain gotos and labels, but

they serve only to allow a textual representation of the control- ow graph.
Unlike procedure names, labels are not values, and they have no representation at run time. Because this restriction makes it impossible for front
ends to build jump tables from labels, C-- includes a switch statement, for
which the C-- back end generates ecient code. The most ecient mix of
conditional branches and indexed branches may depend on the architecture
(Bernstein 1985).

Jump tables of procedure addresses (rather than labels) can be built, of
course, and a C-- procedure can use the jump statement to make a tail call
to a computed address.

3.5 Calling conventions
The calling convention for C-- procedures is entirely a matter for the C-- implementation | we call it the standard C-- calling convention. In particular, C-need not use the C calling convention.
The standard calling convention places no restrictions on the number of arguments passed to a function or the number of results returned from a function. The
only restrictions are that the number and types of actual parameters must match
those in the procedure declaration, and similarly, that the number and types of
values returned must match those expected at the call site. These restrictions enable ecient calling sequences with no dynamic checks. (A C-- implementation
need not check that C-- programs meet these restrictions.)
We note the following additional points:

{ If a C-- function does not \escape" | if all sites where it is called can

be identi ed statically | then the C-- back end is free to create and use
specialised instances, with specialised calling conventions, for each call site.
Escape analysis is necessarily conservative, but a function may be deemed
to escape only if its name is used other than in a call, or if it is named in
an export directive.

{ Support for unrestricted tail calls requires an unusual calling convention, so
that a procedure making a tail call can deallocate its activation record while
still leaving room for parameters that do not t in registers.

{

C-- allows the programmer to specify

a particular calling convention (chosen
from a small set of standard conventions) for an individual procedure, so that
10

C-- code can interoperate with foreign code. For example, even though C--'s
standard calling convention may di er from C's, one can ask for a particular
procedure to use C's convention, so that the procedure can be called from
an external C program. Similarly, external C procedures can be called from
a C-- procedure by specifying the calling convention at the call site.

Some C-- implementations may provide two versions of C's calling convention. The lightweight version would be like an ordinary C call, but it would be
useful only when the C procedures terminate quickly; if control were transferred to the run-time system while a C procedure was active, the run-time
system might not be able to nd values that were in callee-saves registers
at the time of the call. The heavyweight version would keep all its state on
the stack, not in callee-saves registers, so the run-time system could handle
a stack containing a mix of C and C-- activations.

3.6 Miscellaneous
Like other assemblers, C-- gives programmers the ability to name compile-time
constants, e.g., by
const GC = 2;

variables may be declared global, in which case the C-- compiler attempts
to put them in registers. For example, given the declaration

C--

global {
bits32 hp;
}

the implementation attempts to put variable hp in a register, but if no register
is available, it puts hp in memory. C-- programs use and assign to hp without
knowing whether it is in a register or in memory. Unlike storage allocated by
data, there is no such thing as \the address of a global", so memory stores to
unknown addresses cannot a ect the value of a global. This permits a global to be
held in a register and, even if it has to be held in memory, the optimiser does not
need to worry about re-loading it after a store to an unknown memory address.
All separately compiled modules must have identical global declarations, or
horribly strange things will happen.
global declarations may name speci c (implementation-dependent) registers,
for example:
global {
bits32 hp
"%ebx";
bits32 hplim "%esi";
}

We remarked in Section 2 that the front end may know a great deal about
(lack of) aliasing between memory access operations. We do not yet have a way
11

to express such knowledge in C--, but an adaptation of Novack, Hummel, and
Nicolau (1995) looks promising.

4 The problem of run-time support
When a front end and back end are written together, as part of a single compiler,
they can cooperate intimately to support high-level run-time services, such as
garbage collection, exception handling, pro ling, concurrency, and debugging. In
the C-- framework, the front and back ends work at arm's length. As mentioned
earlier, our guiding principle is this:
should make it possible to implement high-level run-time services,
but it should not actually implement any of them. Rather, it should
provide just enough \hooks" to allow the front-end run-time system to
implement them.
C--

Separating policy from mechanism in this way is easier said than done. It might
appear more palatable to incorporate garbage collection, exception handling,
and debugging into the C-- language, as (say) the Java Virtual Machine does.
But doing so would guarantee that C-- would never be used. Di erent source
languages require di erent support, di erent object layouts, and di erent exception semantics | especially when performance matters. No one back end could
satisfy all customers.
Why is the separation between front and back end hard to achieve? High-level
run-time services need to inspect and modify the state of a suspended program. A
garbage collector must nd, and perhaps modify, all live pointers. An exception
handler must navigate, and perhaps unwind, the call stack. A pro ler must
correlate object-code locations with source-code locations, and possibly navigate
the call stack. A debugger must allow the user to inspect, and perhaps modify,
the values of variables. All of these tasks require information from both front
and back ends. The rest of this section elaborates.

Finding roots for garbage collection. If the high-level language requires accurate garbage collection, then the garbage collector must be able to nd all
the roots that point into the heap. If, furthermore, the collector supports
compaction, the locations of heap objects may change during garbage collection, and the collector must be able to redirect each root to point to the
new location of the corresponding heap object.

The diculty is that neither the front end nor the back end has all the knowledge needed to nd roots at run time. Only the front end knows which sourcelanguage variables, and therefore which C-- variables, represent pointers into
the heap. Only the back end, which maps variables to registers and stack
slots, knows where those variables are located at run time. Even the back
12

end can't always identify exact locations; variables mapped to callee-saves
registers may be saved arbitrarily far away in the call stack, at locations not
identi able until run time.

Printing values in a debugger. A debugger needs compiler support to print

the values of variables. For this task, information is divided in much the
same way as for garbage collection. Only the front end knows how sourcelanguage variables are mapped onto (collections of) C-- variables. Only the
front end knows how to print the value of a variable, e.g., as determined by
the variable's high-level-language type. Only the back end knows where to
nd the values of the C-- variables.

Loci of control A debugger must be able to identify the \locus of control" in

each activation, and to associate that locus with a source-code location. This
association is used both to plant breakpoints and to report the source-code
location when a program faults.

An exception mechanism also needs to identify the locus of control, because
in some high-level languages, that locus determines which handler should
receive the exception. When it identi es a handler, the exception mechanism
unwinds the stack and changes the locus of control to refer to the handler.
A pro ler must map loci of control into entities that are pro led: procedures,
statements, source-code regions, etc.
At run time, loci of control are represented by values of the program counter
(e.g., return addresses), but at the source level, loci of control are associated
with statements in a high-level language or in C--. Only the front end knows
how to associate high-level source locations or exception-handler scopes with
C-- statements. Only the back end knows how to associate C-- statements
with the program counter.

Liveness. Depending on the semantics of the original source language, the locus

of control may determine which variables of the high-level language are visible. Depending on the optimizations performed by the back end, the locus
of control may determine which C-- variables are live, and therefore have
values. Debuggers should not print dead variables. Garbage collectors should
not trace them; tracing dead pointers could cause space leaks. Worse, tracing a register that once held a root but now holds a non-pointer value could
violate the collector's invariants. Again, only the front end knows which variables are interesting for debugging or garbage collection, but only the back
end knows which are live at a given locus of control.

Exception values. In addition to unwinding the stack and changing the locus

of control, the exception mechanism may have to communicate a value to an
exception handler. Only the front end knows which variable should receive
this value, but only the back end knows where variables are located.
13

Succinctly stated, each of these operations must combine two kinds of information:

{ Information that only the front end has:


Which C-- parameters and local variables are heap pointers.



How to map source-language variables to C-- variables and how to associate source-code locations with C-- statements.



Which exception handlers are in scope at which C-- statements, and
which variables are visible at which C-- statements.

{ Information that only the back end has:


Whether each C-- local variable and parameter is live, where it is located (if live), and how this information changes as the program counter
changes.



Which program-counter values correspond to which C-- statements.



How to nd activations of all active procedures and how to unwind
stacks.

5 Support for high-level run-time services
The main challenge, then, is arranging for the back end and front end to share
information, without having to implement them as a single integrated unit. In
this section we describe a framework that allows this to be done. We focus on
garbage collection as our illustrative example. Other high-level run-time services
can t in the same framework, but each requires service-speci c extensions; we
sketch some ideas in Section 7.
In what follows, we use the term \variable" to mean either a parameter of the
procedure or a locally-declared variable.

5.1 The framework
We assume that executable programs are divided into three parts, each of which
may be found in object les, libraries, or a combination.

{ The front end compiler translates the high-level source program into one or
more C-- modules, which are separately translated to generated object code
by the C-- compiler.
14

{ The front end comes with a (probably large) front-end run-time system.

This run-time system includes the garbage collector, exception handler, and
whatever else the source language needs. It is written in a programming
language designed for humans, not in C--; in what follows we assume that
the front end run-time system is written in C.

{ Every C-- implementation comes with a (hopefully small) C-- run-time sys-

tem. The main goal of this run-time system is to maintain and provide access
to information that only the back end can know. It makes this information
available to the front end run-time system through a C-language run-time
interface, which we describe in Section 5.2. Di erent front ends may interoperate with the same C-- run-time system.

To make an executable program, we link generated object code with both runtime systems.
In outline, C-- can support high-level run-time services, such as garbage collection, as follows. When garbage collection is required, control is transferred to
the front-end run-time system (Section 6.1). The garbage collector then walks
the C-- stack, by calling access routines provided by the C-- run-time system
(Section 5.2). In each activation record on the C-- stack, the garbage collector
nds the location of each live variable, using further procedures provided by
the C-- runtime. However, the C-- runtime cannot know which of these variables holds a pointer. To answer this question, the front-end compiler builds a
statically-allocated data block that identi es pointer variables, and it uses a span
directive (Section 5.3) to associate this data block with the corresponding procedure's range of program counter values. The garbage collector combines these
two sources of information to decide whether to treat the procedure's variable
as a root. Section 5.4 describes one possible garbage collector in more detail.

5.2 The C-- run-time interface
This section presents the core run-time interface provided by the C-- run-time
system. Using this interface, a front-end run-time system can inspect and modify
the state of a suspended C-- computation. Rather than specify representations of
a suspended computation or its activation records, we hide them behind simple
abstractions. These abstractions are presented to the front-end run-time system
through a set of C procedures.
The state of a C-- computation consists of some saved registers and a logical
stack of procedure activations. This logical stack is usually implemented as some
sort of physical stack, but the correspondence between the two may not be very
direct. Notably, callee-saves registers that logically belong with one activation are
not necessarily stored with that activation, or even with the adjacent activation;
they may be stored in the physical record of an activation that is arbitrarily
far away. This problem is the reason that C's setjmp and longjmp functions
15

don't necessarily restore callee-saves registers, which is why some C compilers
make pessimistic assumptions when compiling procedures containing setjmp
(Harbison and Steele 1995, x19.4).
We hide this complexity behind a simple abstraction, the activation. The idea of
an activation of procedure P is that it approximates the state the machine will
be in when control returns to P. The approximation is not completely accurate
because other procedures may change the global store or P's stack variables
before control returns to P. At the machine level, the activation corresponds to
the \abstract memory" of Ramsey (1992), Chapter 3, which gives the contents
of memory, including P's activation record (stack frame), and of registers.
The activation abstraction hides machine-dependent details and raises the level
of abstraction to the C-- source-code level. In particular, the abstraction hides:

{ The layout of an activation record, and the encoding used to record that
layout for the bene t of the front end runtime,

{ The details of manipulating callee-saves registers (whether to use callee saves
registers is entirely up to the C-- implementation), and

{ The direction in which the stack grows.
All of these matters become private to the back end and the C-- runtime.
In the C-- run-time interface, an activation record is represented by an activation
handle, which is a value of type activation. Arbitrary registers and memory
addresses are represented by variables, which are referred to by number.
The procedures in the C-- run-time interface include:
asks an activation handle for the location of any parameter or local variable in the activation
record to which the handle refers. The variables of a procedure are indexed
by numbering them in the order in which they are declared in that procedure,
starting with zero. FindVar returns the address of the location containing
the value of the speci ed variable. The front end is thereby able to examine
or modify the value. FindVar returns NULL if the variable is dead. It is a
checked runtime error to pass a var_index that is out of range.

void *FindVar( activation *a, int var_index )

.

void FirstActivation( tcb *t, activation *a ) When execution of a
C-- program is suspended, its state is captured by the C-- run-time system. FirstActivation uses that state to initialise an activation handle that

corresponds to the procedure that will execute when the program's execution
is resumed.

int NextActivation( activation *a ) modi es the activation handle a to
refer to the activation record of a's caller, or more precisely, to the activation to which control will return when a returns. NextActivation returns

16

nonzero if there is such an activation record, and zero if there is not. That is,
returns zero if and only if activation handle a refers
to the bottom-most record on the C-- stack.

NextActivation(&a)

Notice that FindVar always returns a pointer to a memory location, even though
the speci ed variable might be held in a register at the moment at which garbage collection is required. But by the time the garbage collector is walking the
stack, the C-- implementation must have stored all the registers away in memory somewhere, and it is up to the C-- run-time system to gure out where the
variable is, and to return the address of the location holding it.
Names bound by stack declarations are considered variables for purposes of
FindVar, even though they are immutable. For such names, FindVar returns
the value that the name has in C-- source code, i.e., the address of the stackallocated block of storage. Storing through this address is meaningful; it alters
the contents of the activation record a. Stack locations are not subject to liveness
analysis.

5.3 Front-end information
Suppose the garbage collector is examining a particular activation record. It can
use FindVar to locate variable number 1, but how can it know whether that
variable is a pointer? The front end compiler cooperates with C-- to answer this
question, as follows:

{ The front end builds a static initialised data block (Section 3.3), or descrip-

tor, that says which of the parameters and local variables of a procedure are
heap pointers. The format of this data block is known only to the front-end
compiler and run-time system; the C-- run-time system does not care.
{ The front end tells C-- to associate a particular range of program counters
with this descriptor, using a span directive.
{ The C-- run-time system provides a call, GetDescriptor, that maps an
activation handle to the descriptor associated with the program counter at
which the activation is suspended.

We discuss each of these steps in more detail. As an example, suppose we have
a function f(x, y), with no other variables, in which x holds a pointer into
the heap and y holds an integer. The front end can encode the heap-pointer
information by emitting a data block, or descriptor, associating 1 with x and
0 with y:
data {
gc1: bits32 2;
bits8 1;
bits8 0;
}

/* this procedure has two variables */
/* x is a pointer */
/* y is a non-pointer */

17

This encoding does not use the names of the variables; instead, each variable is
assigned an integer index, based on the textual order in which it appears in the
de nition of f. Therefore x has index 0 and y has index 1.
Many other encodings are possible. The front end might emit a table that uses
one bit per variable, instead of one byte. It might emit a list of the indices of
variables that contain pointers. It might arrange for pointer variables to have
continuous indices and emit only the rst and last such index.1 The key property
of our design is that the encoding matters only to the front end and its runtime
system. C-- does not know or care about the encoding.
To associate the garbage-collection descriptor with f, the front end places the
de nition of f in a C-- span :
span GC gc1 {
f( bits32 x, bits32 y ) {
...code for f...
}
}

A span may apply to a sequence of function de nitions, or to a sequence of
statements within a function de nition. In this case, the span applies to all of f.
There may be several independent span mappings in use simultaneously, e.g.,
one for garbage collection, one for exceptions, one for debugging, and so on.
C-- uses integer tokens to distinguish these mappings from one another; GC is
the token in the example above. C-- takes no interest in the tokens; it simply
provides a map from a (token, PC) pair to an address. Token values are usually
de ned using a const declaration (Section 3.6).
When the garbage collector (say) walks the stack, using an activation handle a,
it can call the following C-- run-time procedure:
void *GetDescriptor( activation *a, int token ) returns the address of
the descriptor associated with the smallest C-- span tagged with token and
containing the program point where the activation a is suspended.

There are no constraints on the form of the descriptor that gc1 labels; that form
is private to the front end and its run-time system. All C-- does is transform
span directives into mappings from program counters to values.
The front end may emit descriptors and spans to support other services, not just
garbage collection. For example, to support exception handling or debugging, the
front end may record the scopes of exception handlers or the names and types
of variables. C-- supports multiple spans, but they must not overlap. Spans can
nest, however; the innermost span bearing a given token takes precedence. One
can achieve the e ect of overlapping by binding the same data block to multiple
spans.
1

This scenario presumes the front end has the privilege of reordering parameters;
otherwise, it would have to use some other scheme for parameters.

18

5.4 Garbage collection
This section explains in more detail how the C-- run-time interface might be used
to help implement a garbage collector. Our primary concern is how the collector
nds, and possibly updates, roots. Other tasks, such as nding pointers in heap
objects and compacting the heap, can be managed entirely by the front-end
run-time system (allocator and collector) with no support from the back end.
C-- takes no responsibility for heap pointers passed to code written in other
languages. It is up to the front end to pin such pointers or to negotiate changing
them with the foreign code. We defer until Section 6.1 the question of how control
is transferred from running C-- code to the garbage collector.
To help the collector nd roots in global variables, the front end can arrange to
deposit the addresses of such variables in a special data section. To nd roots in
local variables, the collector must walk the activation stack. For each activation
handle a, it calls GetDescriptor(&a, GC) to get the garbage-collection descriptor deposited by the front end. The descriptor tells it how many variables there
are and which contain pointers. For each pointer variable, it gets the address
of that variable by calling FindVar. If the result is NULL, the variable is dead,
and need not be traced. Otherwise the collector marks or moves the object the
variable points to, and it may redirect the variable to point to the object's new
location. Note that the collector need not know which variables were stored on
the stack and which were kept in callee-saves registers; FindVar provides the
location of the variable no matter where it is. Figure 2 shows a simple copying
collector based on Appel (1989), targeted to the C-- run-time interface and the
descriptors shown in Section 5.3.
A more complicated collector might have to do more work to decide which variables represent heap pointers. TIL is the most complicated example we know of
(Tarditi et al. 1996). In TIL, whether a parameter is a pointer may depend on
the value of another parameter. For example, a C-- procedure generated by TIL
might look like this:
f( bits32 ty, bits32 a, bits32 b ) { ... }

The rst parameter, ty, is a pointer to a heap-allocated type record. It is not
statically known, however, whether a is a heap pointer. At run time, the rst eld
of the type record that ty points to describes whether a is a pointer. Similarly,
the second eld of the type record describes whether b is a pointer.
To support garbage collection, we attach to f's body a span that points to a
statically allocated descriptor, which encodes precisely the information in the
preceding paragraph. How this encoding is done is a private matter between the
front end and the garbage collector; even this rather complicated situation is
easily handled with no further support from C--.
19

struct gc_descriptor {
unsigned var_count;
char heap_ptr[1];
};
void gc(void) {
activation a;
FirstActivation(tcb, &a);
for (;;) {
struct gc_descriptor *d = GetDescriptor(&a, GC);
if (d != NULL) {
int i;
for (i = 0; i < d->var_count; i++)
if (d->heap_ptr[i]) {
typedef void *pointer;
pointer *rootp = FindVar(a, i);
if (rootp != NULL) *rootp = gc_forward(*rootp);
/* copying forward, as in Appel, if live */
}
}
if (NextActivation(&a) == NULL)
break;
}
gc_copy(); /* from-space to to-space, as in Appel */
}

Fig. 2. Part of a simple copying garbage collector

5.5 Implementing the C-- run-time interface
Can spans and the C-- run-time interface be implemented eciently? By sketching a possible implementation, we argue that they can. Because the implementation is private to the back end and the back-end run-time system, there is wide
latitude for experimentation. Any technique is acceptable provided it implements
the semantics above at reasonable cost. We argue below that well-understood
techniques do just that.

Implementing spans The span mappings of Section 5.3 take their inspi-

ration from table mappings for exception handling, and the key procedure,
GetDescriptor, can be implemented in similar ways (Chase 1994a). The main
challenge is to build a mapping from object-code locations (possible values of the
program counter) to source-code location ranges (spans). The most common way
is to use tables sorted by program counter. If suitable linker support is available,
tables for di erent tokens can go in di erent sections, and they will automatically be concatenated at link time. Otherwise, tables can be chained together
(or consolidated) by an initialisation procedure called when the program starts.
20

Bottom of stack
Activation handle
Pointer to an
activation record

Activation
record

Pointer to first
callee-save register
Pointer to second
callee-save register
Top of stack

The activation handle points to an activation record, which
may contain values of some local variables. Other local variables may be stored in callee-saves registers, in which case
their values are not saved in the current activation record,
but in the activation records of one or more called procedures.
These activation records can't be determined until run time,
so the stack walker incrementally builds a map of the locations of callee-save registers, by noting the saved locations of
each procedure.

Fig. 3. Walking a stack

Implementing stack walking In our sketch implementation, the call stack is

a contiguous stack of activation records. An activation handle is a static record
consisting of a pointer to an activation record on the stack, together with pointers
to the locations containing the values that the non-volatile registers2 had at the
moment when control left the activation record (Figure 3). FirstActivation
initialises the activation handle to point to the topmost activation record on the
stack and to the private locations in the C-- runtime that hold the values of
registers. Depending on the mechanism used to suspend execution, the runtime
2

The non-volatile registers are those registers whose values are unchanged after return
from a procedure call. They include not only the classic callee-saves registers, but
also registers like the frame pointer, which must be saved and restored but which
aren't always thought of as callee-saves registers.

21

might have values of all registers or only of non-volatile registers, but this detail
is hidden behind the run-time interface. Ramsey (1992) discusses retargetable
stack walking in Chapters 3 and 8.
The run-time system executes only when execution of C-- procedures is suspended. We assume that C-- execution is suspended only at a \safe point."
Broadly speaking, a safe point is a point at which the C-- run-time system is
guaranteed to work; we discuss the details in Section 6.2. For each safe point,
the C-- code generator builds a statically-allocated activation-record descriptor
that gives:

{ The size of the activation record; NextActivation can use this to move to
the next activation record.

{ The liveness of each local variable, and the locations of live variables, indexed

by variable number. The \location" of a live variable might be an o set
within the activation record, or it might be the name of a callee-saves register.
GetVar uses this \location" to nd the address of the true memory location
containing the variable's value, either by computing an address within the
activation record itself, or by returning the address of the location holding
the appropriate callee-saves register, as recorded in the activation handle
(Figure 3).

{ If the safe point is a call site, the locations where the callee is expected to
put results returned from the call.

{ The locations where the caller's callee-saves registers may be found. Again,

these may be locations within the activation record, or they may be this
activation's callee-saves registers. NextActivation uses this information to
update the pointers-to-callee-saves-registers in the activation handle.

The C-- runtime can map activations to descriptors using the same mechanism
it uses to implement the span mappings of Section 5.3. The run-time interface
can cache these descriptors in the activation handle, so the lookup need be done
only when NextActivation is called, i.e., when walking the stack. An alternative
that avoids the lookup is to store a pointer to the descriptor in the code space,
immediately following the call, and for the call to return to the instruction after
the pointer. The SPARC C calling convention uses a similar trick for functions
returning structures (SPARC 1992, Appendix D).
The details of descriptors and mapping of activations to descriptors are important for performance. At issue is the space overhead of storing descriptors
and maps, and the time overhead of nding descriptors that correspond to PCs.
Liskov and Snyder (1979) suggests that sharing descriptors between di erent call
sites has a signi cant impact on performance. Because these details are private
between the back end and the back-end run-time system, we can experiment
with di erent techniques without changing the approach, the run-time interface,
or the front end.
22

6 Re ning the design
The basic idea of providing a run-time interface that allows the state of a suspended C-- computation to be inspected and modi ed seems quite exible and
robust. But working out the detailed application of this idea to a variety of
run-time services, and specifying precisely what the semantics of the resulting
language is, remains challenging. In this section we elaborate some of the details that were not covered in the preceding section, and discuss mechanisms
that support run-time services other than garbage collection. Our design is not
nalised, so this section is somewhat speculative.

6.1 Suspension and introspection
All our intended high-level run-time services must be able to suspend a C-computation, inspect its state, and modify it, before resuming execution.
In many implementations of high-level languages, the run-time system runs on
the same physical stack as the program itself. In such implementations, walking
the stack or unwinding the stack requires a thorough understanding of system
calling conventions, especially if an interrupt can cause a transfer of control
from generated code to the run-time system. We prefer not to expose this implementation technique through the C-- run-time interface, but to take a more
abstract view. The C-- runtime therefore operates as if the generated code and
the run-time system run on separate stacks, as separate threads:

{ The system thread runs on the system stack supplied by the operating system. The front-end run-time system runs in the system thread, and it can
easily inspect and modify the state of the C-- thread.

{ The C-- thread runs on a separate C-- stack. When execution of the C-thread is suspended, the state of the C-- thread is saved in the C-- threadcontrol block, or TCB.

We have to say how a C-- thread is created, and how control is transferred
between the system thread and a C-- thread.

{ The system thread calls InitTCB to create a new thread. In addition to passing the program counter for a C-- procedure without parameters, the system
thread must provide space for a stack on which the thread can execute, as
well as space for a thread-control block.

{ The system thread calls
thread.

Resume

to transfer control to a suspended
23

C--

{ Execution of a C-- thread continues until that thread calls the C-- procedure
yield,

which suspends execution of the C-- thread and causes a return from
the system thread's Resume call. The C-- thread passes a yield code, which
is returned as the result of Resume.

For example, garbage collection can be invoked via a call to yield, when the
allocator runs out of space. Here is how the code might look if allocation takes
place in a single contiguous area, pointed to by a heap pointer hp, and bounded
by heap_limit:
f( bits32 a,b,c ) {
while (hp+12 > heap_limit) {
yield( GC );
/* Need to GC */
}
hp = hp+12;
...
}

It may seem unusual, even undesirable, to speak of two \threads" in a completely
sequential setting. In a more tightly-integrated system it would be more usual
simply to call the garbage collector. But simply making a foreign call to the
garbage collector will not work here. How is the garbage collector to nd the top
of the C-- portion of the stack that it must traverse? What if live variables (such
as a, b, c) are stored in C's callee-saves registers across the call to the garbage
collector? Such complications a ect not only the garbage collector, but any highlevel run-time service that needs to walk the stack. Our two-thread conceptual
model abstracts away from these complications by allowing the system thread
to inspect and modify a tidily frozen C-- thread.
Using \threads" does not imply a high implementation cost. Though we call
them threads, \coroutines" may be a more accurate term. The system thread
never runs concurrently with the C-- thread, and the two can be implemented
by a single operating-system thread.
Another merit of this two-thread view is that it extends smoothly to accommodate multiple C-- threads. Indeed, though it is not the focus of this paper, we
intend that C-- should support many very lightweight threads, in the style of
Concurrent ML (Reppy 1991), Concurrent Haskell (Peyton Jones, Gordon, and
Finne 1996), and many others.

6.2 Safe points
When can the system thread safely take control? We say a program-counter
value within a procedure is a safe point if it is safe to suspend execution of the
procedure at that point, and to inspect and modify its variables. We require the
following precondition for execution in the front-end run-time system:
A C-- thread can be suspended only at a safe point.
24

A call to yield must be a safe point, and because any procedure could call
the code generator must ensure that every call site is a safe point. This
safe point is associated with the state in which the call has been made and
the procedure is suspended awaiting the return. C-- does not guarantee that
every instruction is a safe point; recording local-variable liveness and location
information for every instruction might increase the size of the program by a
signi cant fraction (Stichnoth, Lueh, and Cierniak 1999).
So far we have suggested that a C-- program can only yield control voluntarily,
through a yield call. What happens if an interrupt or fault occurs, transferring
control to the front-end run-time system, and the currently executing C-- procedure is not at a safe point? This may happen if a user deliberately causes an
interrupt, e.g., to request that the stack be unwound or the debugger invoked. It
may happen if a hardware exception (e.g., divide by zero) is to be converted to
a software exception. It may happen in a concurrent program if timer interrupts
are used to pre-empt threads. The answer to the question remains a topic for
research; asynchronous pre-emption is dicult to implement, not only in C-- but
in any system. Chase (1994b) and Shivers, Clark, and McGrath (1999) discuss
some of the problems. One common technique is to ensure that every loop is
cut by a safe point, and to permit an interrupted program to execute until it
reaches a safe point. C-- therefore enables the front end to insert safe points, by
inserting the C-- statement
yield,

safepoint;

6.3 Call-site invariants
In the presence of garbage collection and debugging, calls have an unusual property: live local variables are potentially modi ed by any call. For example, a compacting garbage collector might modify pointers saved across a call. Consider
this function, in which a+8 is a common subexpression:
f( bits32 a ) {
bits32[a+8] = 10;
g( a );
bits32[a+8] = 0;
return;
}

/* put 10 in 32-bit word at address a+8 */
/* put 0

in 32-bit word at address a+8 */

If g invokes the garbage collector, the collector might modify a during the call
to g, so the code generator must recompute a+8 after the call | it would be
unsafe to save a+8 across the call. The same constraint supports a debugger that
might change the values of local variables. Calls may also modify C-- values that
are declared to be allocated on the stack.
A compiler writer might reasonably object to the performance penalty imposed
by this constraint; the back end pays for compacting garbage collection whether
the front end needs it or not. To eliminate this penalty, the front end can
25

mark C-- parameters
invariant, thus:

and variables as invariant across calls, using the keyword

f( invariant bits32 a ) {
invariant bits16 b;
bits32 c;
...
g( a, b, c );
/* "a" and "b" are not modified
by the call, but "c" might be */
...
}

The invariant keyword places an obligation on the front-end run-time system,
not on the caller of f. The keyword constitutes a promise to the C-- compiler
that the value of an invariant variable will not change \unexpectedly" across a
call. The run-time system and debugger may not change the values of invariant
variables.
If variables will not be changed by a debugger, a front end can safely mark nonpointer variables as invariant across calls, and front ends using mostly-copying
collectors (Bartlett 1988; Bartlett 1989a) or non-compacting collectors (Boehm
and Weiser 1988) can safely mark all variables as invariant across calls.

7 Exceptions and other services
In Section 4 we argued that many high-level run-time services share at least
some requirements in common. In general, they all need to suspend a running
C-- thread, and to inspect and modify its state. The spans of Section 5.3 and the
run-time interface of Section 5.2 provide this basic service, but each high-level
run-time service requires extra, special-purpose support. Garbage collection is
enhanced by the invariant annotation of Section 6.3. Exception handling requires rich mechanisms for changing the ow of control. Interrupt-based pro ling
requires the ability to inspect (albeit in a very modest way) the state of a thread
interrupted asynchronously. Debugging requires all of the above, and more besides. We believe that our design can be extended to deal with these situations,
and that many of C--'s capabilities will be used by more than one high-level
service. Here we give an indicative discussion of just one other service, exception
handling.
Making a single back end support a variety of di erent exception-handling mechanisms is signi cantly harder than supporting a variety of garbage collectors, in
part because exceptions alter the control ow of the program. If raising an exception could change the program counter arbitrarily, chaos would ensue; two
di erent program points may hold their live variables in di erent locations, and
they may have di erent ideas about the layout of the activation record and
the contents of callee-saves registers. They may even have di erent ideas about
which variables are alive and which are dead. In other words, unconstrained,
26

dynamic changes in locus of control make life hard for the register allocator and
the optimiser; if the program counter can change arbitrarily, there is no such
thing as dead code, and a variable live anywhere is live everywhere.
Typically, handling an exception involves rst unwinding the stack to the caller
of the current procedure, or its caller, etc., and then directing control to an exception handler. Many of the mechanisms used for garbage collection are also useful
for exception handling; for example, stack walking and spans can be used to nd
exactly which handler should catch a particular exception. But the mechanisms
we have described so far don't allow for changes in control ow. C-- controls
such changes by requiring annotations on procedure calls.
The key idea is that, in the presence of exceptions, a call might return to more
than one location, and every C-- program speci es explicitly all the locations to
which a call could return. In e ect, a call has many possible continuations instead
of just one. When an activation is suspended at a call site, three outcomes are
possible.

{ The call returns normally, and execution continues at the statement following
the call.

{ The call raises an exception that is handled in the activation, so the call
terminates by transferring control to a di erent location in that activation.

{ The call raises an exception that is not handled in the current activation,

so the activation is aborted, and the run-time system transfers control to a
handler in some calling procedure.

C--'s

call-site annotations specify these outcomes in detail.
We are currently re ning a design that supports suitable annotations, plus a
variety of mechanisms for transfer of control (Ramsey and Peyton Jones 1999).
Exception dispatch might unwind the stack one frame at a time, looking for
a handler, or it might use an auxiliary data structure to nd the handler, then
\cut the stack" directly to that handler in constant time. Our design also permits
exception dispatch to be implemented either in the front-end run-time system
or in generated code. The C-- run-time system provides supporting procedures
that can unwind the stack, change the address to which a call returns, and pass
values to exception handlers.

8 Status and conclusions
The core design of C-- is stable, and an implementation based on ML-RISC
is freely available from the authors. This implementation supports the features
described in Section 3, but it does not yet include the span directive or a C-run-time system.
27

Many open questions remain. What small set of mechanisms might support the
entire gamut of high-level language exception semantics? How about the range of
known implementation techniques? Support for debugging is even harder than
dealing with exceptions. Exactly what is the meaning of a breakpoint? How
should breakpoints interact with optimization? What are the primitive \hooks"
required for concurrency support? How should C-- cope with pre-emption?
These questions are not easily answered, but the prize is considerable. Reuse
of code generators is a critically important problem for language implementors.
Code generators embedded in C compilers have been widely reused, but the
nature of C makes it impossible to use the best known implementations of highlevel run-time services like garbage collection, exception handling, debugging,
and concurrency | C imposes a ceiling on reuse.
We hope to break through this ceiling by taking a new approach: design a lowlevel, reusable compiler-target language in tandem with a low-level, reusable
run-time system. Together, C-- and its run-time system should succeed in hiding
machine-dependent details of calling conventions and stack-frame layout. They
should eliminate the distinction between variables living in registers and variables
living on the stack. By doing so, they should

{ Permit sophisticated register allocation, even in the presence of a garbage
collector or debugger.

{ Make the results of liveness analyses available at run time, e.g., to a garbage
collector.

{ Support the best known garbage-collection techniques, and possibly enable
experimentation with new techniques.

Although the details are beyond the scope of this paper, we have some reason to
believe C-- can also support the best known techniques for exception handling,
as well as supporting pro ling, concurrency, and debugging.

Acknowledgements
We thank Xavier Leroy, Simon Marlow, Gopalan Nadathur, Mike O'Donnell,
and Julian Seward for their helpful feedback on earlier drafts of this paper. We
also thank Richard Black, Lal George, Thomas Johnsson, Greg Morrisett, Nikhil,
Olin Shivers, and David Watt for feedback on the design of C--.

28

References
Appel, Andrew W. 1989 (February). Simple generational garbage collection and
fast allocation. Software|Practice & Experience, 19(2):171{183.
Atkinson, Russ, Alan Demers, Carl Hauser, Chnristian Jacobi, Peter Kessler,
and Mark Weiser. 1989 (July). Experiences creating a portable Cedar.
Proceedings of the '89 SIGPLAN Conference on Programming Language
Design and Implementation, SIGPLAN Notices, 24(7):322{329.
Bartlett, Joel F. 1988 (February). Compacting garbage collection with ambiguous roots. Technical Report 88/2, DEC WRL, 100 Hamilton Avenue, Palo
Alto, California 94301.
. 1989a (October). Mostly-copying garbage collection picks up generations
and C++. Technical Report TN-12, DEC WRL, 100 Hamilton Avenue, Palo
Alto, California 94301.
. 1989b. SCHEME to C: A portable Scheme-to-C compiler. Technical
Report RR 89/1, DEC WRL.
Benitez, Manuel E. and Jack W. Davidson. 1988 (July). A portable global optimizer and linker. In ACM Conference on Programming Languages Design
and Implementation (PLDI'88), pages 329{338. ACM.
Benton, Nick, Andrew Kennedy, and George Russell. 1998 (September). Compiling Standard ML to Java bytecodes. In ACM Sigplan International Conference on Functional Programming (ICFP'98), pages 129{140, Balitmore.
Bernstein, Robert L. 1985 (October). Producing good code for the case statement. Software Practice and Experience, 15(10):1021{1024.
Boehm, Hans-Juergen and Mark Weiser. 1988 (September). Garbage collection in an uncooperative environment. Software Practice and Experience,
18(9):807{820.
Chase, David. 1994a (June). Implementation of exception handling, Part I. The
Journal of C Language Translation, 5(4):229{240.
. 1994b (September). Implementation of exception handling, Part II:
Calling conventions, asynchrony, optimizers, and debuggers. The Journal
of C Language Translation, 6(1):20{32.
Clausen, LR and O Danvy. 1998 (April). Compiling proper tail recursion and
rst-class continuations: Scheme on the Java virtual machine. Technical
report, Department of Computer Science, University of Aarhus, BRICS.
Conway, ME. 1958 (October). Proposal for an UNCOL. Communications of the
ACM, 1(10):5{8.
Diwan, A, D Tarditi, and E Moss. 1993 (January). Memory subsystem performance of programs using copying garbage collection. In 21st ACM Symposium on Principles of Programming Languages (POPL'94), pages 1{14.
Charleston: ACM.

Franz, Michael. 1997 (October). Beyond Java: An infrastructure for highperformance mobile code on the World Wide Web. In Lobodzinski, S.
and I. Tomek, editors, Proceedings of WebNet'97, World Conference of the
WWW, Internet, and Intranet, pages 33{38. Association for the Advancement of Computing in Education.
George, Lal. 1996. MLRISC: Customizable and reusable code generators. Unpublished report available from http://www.cs.bell-labs.com/ george/.
Harbison, Samuel P. and Guy L. Steele, Jr. 1995. C: A Reference Manual. fourth
edition. Englewood Cli s, NJ: Prentice Hall.
Henderson, Fergus, Thomas Conway, and Zoltan Somogyi. 1995. Compiling
logic programs to C using GNU C as a portable assembler. In ILPS'95
Postconference Workshop on Sequential Implementation Technologies for
Logic Programming, pages 1{15, Portland, Or.
Liskov, Barbara H. and Alan Snyder. 1979 (November). Exception handling in
CLU. IEEE Transactions on Software Engineering, SE-5(6):546{558.
Macrakis, Stavros. 1993 (January). The Structure of ANDF: Principles and
Examples. Open Systems Foundation.
Novack, Steven, Joseph Hummel, and Alexandru Nicolau. 1995. A Simple Mechanism for Improving the Accuracy and Eciency of Instruction-level Disambiguation, chapter 19. Lecture Notes in Computer Science. Springer Verlag.
Pettersson, M. 1995. Simulating tail calls in C. Technical report, Department
of Computer Science, Linkoping University.
Peyton Jones, Simon L., A. J. Gordon, and S. O. Finne. 1996 (January). Concurrent Haskell. In 23rd ACM Symposium on Principles of Programming
Languages (POPL'96), pages 295{308, St Petersburg Beach, Florida.
Peyton Jones, SL, D Oliva, and T Nordin. 1998. C--: A portable assembly
language. In Proceedings of the 1997 Workshop on Implementing Functional Languages (IFL'97), Lecture Notes in Computer Science, pages 1{19.
Springer Verlag.
Peyton Jones, Simon L. 1992 (April). Implementing lazy functional languages
on stock hardware: The spineless tagless G-machine. Journal of Functional
Programming, 2(2):127{202.
Ramsey, Norman and Simon L. Peyton Jones. 1999. Exceptions need not be
exceptional. Draft available from http://www.cs.virginia.edu/nr.
Ramsey, Norman. 1992 (December). A Retargetable Debugger. PhD thesis,
Princeton University, Department of Computer Science. Also Technical
Report CS-TR-403-92.
Reig, F and SL Peyton Jones. 1998. The C-- manual. Technical report, Department of Computing Science, University of Glasgow.
Reppy, JH. 1991 (June). CML: a higher-order concurrent language. In
ACM Conference on Programming Languages Design and Implementation
(PLDI'91). ACM.

30

Serrano, Manuel and Pierre Weis. 1995 (September). Bigloo: a portable and
optimizing compiler for strict functional languages. In 2nd Static Analysis
Symposium, Lecture Notes in Computer Science, pages 366{381, Glasgow,
Scotland.
Shivers, Olin, James W. Clark, and Roland McGrath. 1999 (September). Atomic
heap transactions and ne-grain interrupts. In ACM Sigplan International
Conference on Functional Programming (ICFP'99), Paris.
SPARC International. 1992. The SPARC Architecture Manual, Version 8. Englewood Cli s, NJ: Prentice Hall.
Stallman, Richard M. 1992 (February). Using and Porting GNU CC (Version
2.0). Free Software Foundation.
Steele, Guy L., Jr. 1978 (May). Rabbit: A compiler for Scheme. Technical Report
AI-TR-474, Arti cial Intelligence Laboratory, MIT, Cambridge, MA.
Stichnoth, JM, G-Y Lueh, and M Cierniak. 1999 (May). Suppport for garbage
collection at every instruction in a Java compiler. In ACM Conference
on Programming Languages Design and Implementation (PLDI'99), pages
118{127, Atlanta.
Taft, Tucker. 1996. Programming the Internet in Ada 95. In Strohmeier, Alfred, editor, 1996 Ada-Europe International Conference on Reliable Software
Technologies, Vol. 1088 of Lecture Notes in Computer Science, pages 1{16,
Berlin. Available through www.appletmagic.com.
Tarditi, David, Anurag Acharya, and Peter Lee. 1992. No assembly required:
compiling Standard ML to C. ACM Letters on Programming Languages
and Systems, 1(2):161{177.
Tarditi, D, G Morrisett, P Cheng, C Stone, R Harper, and P Lee. 1996 (May).
TIL: A type-directed optimizing compiler for ML. In ACM Conference
on Programming Languages Design and Implementation (PLDI'96), pages
181{192. Philadelphia: ACM.
Wakeling, D. 1998 (September). Mobile Haskell: compiling lazy functional languages for the Java virtual machine. In Proceedings of the 10th International
Symposium on Programming Languages, Implementations, Logics and Programs (PLILP'98), Pisa.

31