Flow-Directed Lightweight Closure Conversion
JEFFREY MARK SISKIND
NEC Research Institute, Inc.

This paper presents a lightweight closure-conversion method that is driven by the results of wholeprogram interprocedural flow, reachability, points-to, and escape analyses. The method has been
implemented and evaluated as part of a complete Scheme compiler. When compared with a
baseline closure-conversion method that does no optimization, as well as conventional closureconversion methods that only do optimizations that do not rely on interprocedural analysis, this
method significantly increases the degree of closure, variable-slot, parent-slot, closure-pointer–
slot, variable, parent-parameter, parent-passing, closure-pointer, variable-spilling, and parentspilling elimination. It also significantly increases the degree of parent-slot, closure-pointer–slot,
and parent-parameter compression and reduces the number of indirections per variable reference.
Finally, code produced by this compiler runs significantly faster than code produced by other
state-of-the-art Scheme compilers.
Categories and Subject Descriptors: D.3.3 [Programming Languages]: Language Constructs
and Features—procedures, functions, and subroutines; D.3.4 [Programming Languages]: Processors—
compilers; optimization
General Terms: Experimentation, Languages, Performance
Additional Key Words and Phrases: Closure conversion, compiler construction, global optimization/flow analysis, higher-order functional languages

1.

INTRODUCTION

This paper describes the lightweight closure-conversion process used by Stalin.
Stalin is an aggressively optimizing whole-program compiler for Scheme. Experiments described in section 4 illustrate that code produced by Stalin is regularly several times faster, and often an order of magnitude or two faster, than
code produced by other state-of-the-art Scheme compilers such as Scheme->C,
Gambit-C, Bigloo, and Chez. As also illustrated in section 4, much of this
speedup depends on whole-program analysis and lightweight closure conversion.
While the techniques described in this paper are formulated in terms of Scheme,
and have been implemented and evaluated for Scheme, they should be generally
This research was supported, in part, by a Samuel and Miriam Wein Academic Lectureship and
by the Natural Sciences and Engineering Research Council of Canada. Part of this research was
performed while the author was at the University of Toronto Department of Computer Science,
the Technion Department of Electrical Engineering, and the University of Vermont Department of
Electrical Engineering and Computer Science. I wish to thank Ken Anderson, Robert Cartwright,
Henry Cejtin, Matthias Felleisen, Cormac Flanagan, Suresh Jagannathan, Jacob Katzenelson,
Shriram Krishnamurthi, David McAllester, Richard O’Keefe, Stephen Weeks, and Andrew Wright
for many useful discussions pertaining to the material presented in this paper.
Permission to make digital/hard copy of all or part of this material without fee is granted
provided that the copies are not made or distributed for profit or commercial advantage, the
ACM copyright/server notice, the title of the publication, and its date appear, and notice is given
that copying is by permission of the Association for Computing Machinery, Inc. (ACM). To copy
otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific
permission and/or a fee.

2

·

Jeffrey Mark Siskind

applicable to any lexically-scoped higher-order language, such as Common Lisp,
ml, Haskell, Smalltalk, Dylan, etc.
Stalin is a batch-mode compiler. Unlike traditional interactive Scheme implementations, Stalin does not provide a read-eval-print loop or the ability to eval or
load new code into a running program. Stalin compiles the Scheme source program into a single executable, indirectly via c. The behaviour of this executable is
equivalent to loading the source program into a virgin interactive Scheme implementation and terminating its execution. Any computation results from evaluating
top-level expressions. Thus Stalin is intended more for application delivery and
production research runs than program development.
Stalin performs numerous whole-program analyses and optimizations. First,
it does polyvariant interprocedural flow and reachability analysis. The results
of this analysis support interprocedural points-to, escape, and lifetime analyses,
lightweight closure and CPS conversion, automatic in-lining, region-based storage
management, unboxing, and program-specific and program-point-specific low-level
representation selection and code generation with on-the-fly coercion between different representations. This paper describes only the interprocedural points-to and
escape analyses, automatic in-lining, and lightweight closure conversion. Companion papers [Siskind 2000a; 2000b; 2000c; 2000d] describe the remaining analyses
and optimizations.
The remainder of this paper is organized as follows. Section 2 gives an overview
of the lightweight closure-conversion process used by Stalin by way of comparison
to alternate implementations that do little or no optimization. Section 3 presents
the lightweight closure-conversion process in detail. Section 4 presents experimental
results. Section 5 concludes with a discussion of related work.
2.

OVERVIEW

To understand the optimizations that Stalin performs during closure conversion,
it is useful to compare the lightweight closure-conversion process performed by
Stalin with both a baseline implementation that does no optimization as well as
a conventional implementation1 that does limited optimization. For simplicity, the
baseline, conventional, and lightweight implementations are presented here using a
linked closure representation.2
The baseline implementation has the following characteristics:
(1) Each procedure has a closure.
(2) The closure for each procedure contains a variable slot for each variable bound
by that procedure.
(3) The closure for each procedure contains a parent slot, a pointer to the closure
for the immediate lexically surrounding procedure. The parent slot of the closure
for the top-level procedure contains a null pointer.
1 Throughout

this paper, I use the term conventional implementation to refer to optimizations
that can be performed without interprocedural analysis, regardless of whether any actual implementation performs any or all of these optimizations.
2 The lightweight closure-conversion process used by Stalin is not particular to a linked closure
representation. It can be applied to display and flat closure representations as well.

Flow-Directed Lightweight Closure Conversion

·

3

(4) A procedure object contains a closure-pointer slot, a pointer to the closure
for the immediate lexically surrounding procedure. The closure-pointer slot for the
top-level procedure contains a null pointer.
(5) A procedure object contains a code-pointer slot, a pointer to the code object
for that procedure.
(6) The code object for a procedure has a variable parameter for each variable
bound by that procedure.
(7) The code object for a procedure has a parent parameter, a pointer to the
closure for the immediate lexically surrounding procedure. The parent parameter
for the top-level procedure will be bound to the null pointer.
(8) Procedure calls indirect to the code object pointed to by the target procedure
object.
(9) Variable passing: A procedure call passes each argument to its corresponding
variable parameter in the code object.
(10) Parent passing: A procedure call passes the closure pointer of the target
procedure object to the parent parameter of the code object.
(11) Each code object contains a closure pointer, a local variable that holds a
pointer to its closure. Each code object begins with a preamble that allocates a
closure and stores a pointer to that closure in the closure pointer.
(12) Variable spilling: The preamble spills each variable parameter into the corresponding variable slot of the closure.
(13) Parent spilling: The preamble spills the parent parameter into the parent
slot of the closure.
(14) Variables are referenced3 indirectly via the closure pointer.
The baseline, conventional, and lightweight implementations differ in the following
ways:
2.1

Direct procedure calls

Baseline and conventional implementations generate code like the following for a
lambda expression (lambda (x1) . . .):
t1.tag = ONE_ARGUMENT_PROCEDURE;
t1.closure = closure;
t1.code = &p47;
OBJECT p47(CLOSURE parent, OBJECT x1)
{ CLOSURE closure;
.
.
.
/* slots: CLOSURE parent; OBJECT x1; */
closure = allocate_closure47();
closure.parent = parent;
closure.x1 = x1;
. . .}
3 Throughout

this paper, I use the term reference to mean access or assignment.

4

·

Jeffrey Mark Siskind

and code like the following for a procedure call (e1 e2 ):
t3
t4
if
t5

= e1 ;
= e2 ;
(t3.tag!=ONE_ARGUMENT_PROCEDURE) error();
= *(t3.code)(t3.closure, t4);

Here, a procedure object p has three slots: its type tag p.tag, its closure-pointer slot
p.closure, and its code-pointer slot p.code.4 The type tag of a procedure object
encodes its arity and a call to a procedure indirects through its code-pointer slot. In
contrast, Stalin always generates direct procedure calls. This is possible because
Stalin does whole-program interprocedural flow analysis and can determine all
potential targets of each call site. Stalin uses the type tag of a procedure object
to encode the identity of the procedure, rather than its arity, and dispatches on the
type tag to a collection of direct procedure calls, one for each potential target. Thus,
for example, Stalin generates code like the following5 for a lambda expression
(lambda (x1) . . .):
t1.tag = PROCEDURE47;
t1.closure = closure;
OBJECT p47(CLOSURE parent, OBJECT x1)
{ CLOSURE closure;
.
.
.
/* slots: CLOSURE parent; OBJECT x1; */
closure = allocate_closure47();
closure.parent = parent;
closure.x1 = x1;
. . .}
and code like the following for a procedure call (e1 e2 ), when e1 can take on several
potential objects, including the procedures 47 and 63:

4 Throughout

this paper, I am deliberately vague as to whether objects are boxed or unboxed and
uniformly use o.s to denote a reference to slot s of object o rather than distinguishing between
o.s and o->s. Furthermore, when objects are boxed, I am also deliberately vague as to whether
slots, like the type tag, are represented in the object or its handle, and whether they are somehow
compressed or encoded, say, in unused addresses or address bits in the handle. See Siskind [2000c]
for a discussion of such representation issues.
5 The actual c code generated by Stalin is fully typed. Each procedure has a corresponding
closure of a distinct c type and parent slots, closure-pointer slots, parent parameters, and closure
pointers are all typed appropriately to the type of closure that they contain. c variables and slots
that can hold objects of multiple types, such as t3 in this example, are represented as c unions.
The code generated by Stalin contains the appropriate casts at each reference. For simplicity,
throughout this paper, all such type declarations and casts are eliminated.

Flow-Directed Lightweight Closure Conversion

·

5

t3 = e1 ;
t4 = e2 ;
switch (t3.tag)
{ case PROCEDURE47: t5 = p47(t3.closure, t4);
case PROCEDURE63: t5 = p63(t3.closure, t4);
.
.
.
default: error();}
The default branch of the dispatch is eliminated when flow analysis determines that
the target can only be a procedure of the correct arity. Furthermore, the dispatch
itself is eliminated when flow analysis determines that there is only one potential
target.6 This yields code like the following:
t3 = e1 ;
t4 = e2 ;
t5 = p47(t3.closure, t4);
Using direct procedure calls, and encoding the identity of the target procedure in
the type tag, allows Stalin to eliminate the code-pointer slot of procedure objects.
2.2

Parent-slot, closure-pointer–slot, parent-parameter, parent-passing, and parentspilling elimination

The baseline implementation has a parent slot, a closure-pointer slot, a parent
parameter, parent passing, and parent spilling for every procedure. Conventional
implementations eliminate the parent slot, closure-pointer slot, parent parameter,
parent passing, and parent spilling for the top-level procedure because the pointers
are always null. Conventional implementations also eliminate the parent slot of
a closure if that slot is never accessed. Stalin carries this optimization further,
eliminating the closure-pointer–slot, parent-parameter, parent-passing, and parentspilling for a procedure when the parent parameter is never accessed. Situations
where the parent parameter is never accessed occur even more frequently in Stalin
than in conventional implementations because parent parameters are used primarily
to access variable slots of closures of lexically surrounding procedures and Stalin
performs more aggressive variable and variable-slot elimination, as described below.
Furthermore, Stalin is able to perform more aggressive parent-slot elimination
than conventional implementations for the same reason. Experiments reported in
section 4 show that, in practice, almost all parent slots, closure-pointer slots, parent
parameters, parent passing, and parent spilling are eliminated. Closure-pointer–
slot, parent-parameter, parent-passing, and parent-spilling elimination creates a
complication. Since some procedures have closure-pointer slots, parent parameters,
and parent passing, while others do not, the code generated for a procedure call
will vary depending on the potential targets. This is possible both because flow
analysis determines the potential targets and because the use of dispatch to direct
procedure calls allows a target-specific call sequence to be generated when there is
more than one target. For example, Stalin generates code like the following for a
lambda expression (lambda (x1) . . .) that has a parent parameter:
6 When

flow analysis determines that the target cannot be a procedure of the correct arity, the
dispatch is eliminated and a direct call to the error handler is generated.

6

·

Jeffrey Mark Siskind

t1.tag = PROCEDURE47;
t1.closure = closure;

1

OBJECT p47( CLOSURE parent,
{ CLOSURE closure;
.
.
.

2

OBJECT x1)

3

/* slots: CLOSURE parent; OBJECT x1; */
closure = allocate_closure47();
closure.parent = parent;
closure.x1 = x1;
. . .}

4

code like the following for a lambda expression (lambda (x2) . . .) that does not
have a parent parameter:
t2.tag = PROCEDURE63;
OBJECT p63(OBJECT x2)
{ CLOSURE closure;
.
.
.
/* slots: OBJECT x2; */
closure = allocate_closure63();
closure.x2 = x2;
. . .}
and code like the following for a procedure call (e1 e2 ), where the potential targets
include procedure 47, which has a parent parameter, and procedure 63, which does
not:
t3 = e1 ;
t4 = e2 ;
switch (t3.tag)
{ case PROCEDURE47: t5 = p47( t3.closure,
case PROCEDURE63: t5 = p63(t4);
.
.
.

5

t4);

default: error();}
Note that the parent slot (box 3 above), closure-pointer slot (box 1 above), parent
parameter (box 2 above), and parent spilling (box 4 above) that are present in
procedure 47 are absent from procedure 63. Also note that the calling sequence
for procedure 47 contains parent passing (box 5 above) which is absent from the
calling sequence for procedure 63. Such per-target differentiation is possible because
multiple-target calls generate dispatches to direct procedure calls.
2.3

Eliminating indirection through the closure pointer

The baseline implementation always references variables indirectly through the closure pointer. Conventional implementations eliminate such indirection in two situ-

Flow-Directed Lightweight Closure Conversion

·

7

ations. First, because closure.parent must alias parent, a free variable reference
can proceed via the parent parameter instead of via the closure pointer. The baseline implementation would generate the following code:
closure. parent. . . . .parent .x
|
{z
}
n

for a free reference to a variable x that is bound n levels up while a conventional
implementation would generate the following code:
parent. . . . .parent .x
|
{z
}
n

instead. Second, if a variable is never assigned, a bound access can proceed via its
variable parameter rather than via the closure pointer. The baseline implementation would generate closure.x to access the local variable x while a conventional
implementation would simply generate x. Stalin applies this optimization in a
wider set of circumstances because it uses a much more complex notion of what
constitutes a free vs. bound reference, as described below.
2.4

Eliminating fictitious variables

The baseline implementation has a slot, a parameter, passing, and spilling for each
variable. The slot, parameter, passing, and spilling for a variable can be eliminated
when a variable can hold only a single concrete object. Such a variable is called fictitious. Stalin approximates the fictitious property, with the results of flow analysis,
as any variable that can hold only a single abstract object where that abstract object contains a single concrete object. The abstract interpretation used by Stalin
during flow analysis treats (), #t, #f, the end-of-file object, and procedures created by a given lambda expression as distinct abstract objects. All of these except
abstract procedures always contain a single concrete object. Abstract procedures
contain single concrete procedures when the procedure has no closure-pointer slot,
due to closure-pointer–slot elimination, as described below. Furthermore, the concrete aggregate objects in abstract aggregate objects such as pairs, strings, vectors,
symbols, continuations, and procedures are indistinguishable when their identity
is not important and the components of those aggregate objects are fictitious or
unaccessed. Such a collection of indistinguishable concrete objects can be treated
as a single concrete object. Finally, continuations can be indistinguishable in certain circumstances that require a must-alias property to hold. The method used
by Stalin to approximate this must-alias property is the crux of the lightweight
closure-conversion process and is described in detail in section 3.12. Experiments
reported in section 4 show that closure-pointer–slot elimination, in practice, allows
most procedures to be fictitious and the variables holding such procedures to be
eliminated. Not only does the notion of fictitious variables affect code generation,
by eliminating the slots, parameters, passing, and spilling for such variables, as well
as references to such variables, it also impacts the lightweight closure-conversion
process itself. Free references to fictitious variables are ignored, allowing a greater
degree of parent-slot, closure-pointer–slot, parent-parameter, parent-passing, and
parent-spilling elimination.

8

·

2.5

Eliminating variables that aren’t accessed

Jeffrey Mark Siskind

The baseline implementation has a slot, a parameter, passing, and spilling for each
variable. Conventional implementations eliminate the slot, parameter, passing,
and spilling for a non-global variable that is never accessed. Stalin carries this
optimization further. First, Stalin eliminates all unaccessed variables, not just
non-global ones. Second, only reached, non-fictitious accesses, as determined by
flow and reachability analysis, count as accesses. An access might be fictitious,
even if the variable itself is not fictitious, if the access is in the consequent or
alternate of a conditional. For example, an access to x in e1 in
(if (eq? x ’a) e1 e2 )
would be fictitious even if x itself were not. Third, as for the case of eliminating fictitious variables, this optimization not only affects code generation, by eliminating
the slots, parameters, passing, and spilling for such variables, as well as assignments to such variables, it also impacts the lightweight closure-conversion process
itself. Free assignments to unaccessed variables are ignored, allowing a greater
degree of parent-slot, closure-pointer–slot, parent-parameter, parent-passing, and
parent-spilling elimination.
2.6

Variable-slot elimination

The baseline implementation has a slot and spill for each variable. Furthermore,
all variable references are to variable slots via the closure pointer. Conventional
implementations eliminate the slot and spill for a variable that is never freely referenced, compiling references to such a variable to the parameter instead of the
slot. Stalin carries this optimization further. First, only reached free references,
as determined by flow and reachability analysis, count as free references. Second,
only non-fictitious free accesses, as determined by flow analysis, count as free accesses. Third, free assignments to hidden variables don’t count as free assignments.
Hiding will be described below. Fourth, slots are eliminated for globalized variables. Globalization will be described below. Fifth, slots and spills are eliminated
for hidden variables. Sixth, the slot and spill for a variable can be eliminated, and
references to such a variable can be compiled to the parameter instead of the slot, if
two conditions are met. First, all references to the variable must be from within the
c function that binds that variable. With the current Stalin code generator, this
happens when the variable reference is in-lined in the same procedure where the
variable is bound. Second, at every reference to the variable, the slot that would be
accessed by the baseline implementation must alias the corresponding parameter
in the most recent active invocation of the procedure that binds that variable. The
method used by Stalin to approximate this must-alias property is the crux of the
lightweight closure-conversion process and is described in detail in section 3.12.
2.7

Closure elimination, closure-pointer elimination, and parent-slot compression

The baseline implementation has a closure and closure pointer for every procedure.
And the parent slot for a procedure always points to the closure for the immediate
lexically surrounding procedure (or is null if there is none). Conventional implementations eliminate a closure, and the corresponding closure pointer, when all of its
variable slots are eliminated. Conventional implementations also perform parent-

Flow-Directed Lightweight Closure Conversion

·

9

slot compression: whenever the parent slot of one closure would point to another
closure but is only used to access the parent slot of the later closure, the parent
slot of the former closure can point to the closure that the parent slot of the later
closure points to. Parent-slot compression is necessary when a closure is eliminated,
since all parent slots of other closures that would point to an eliminated closure
must instead point to the closure that the parent slot of the eliminated closure
would have pointed to or be eliminated if the parent slot of the eliminated closure
was eliminated. Parent-slot compression reduces indirection in variable reference.
Stalin performs more aggressive closure elimination, closure-pointer elimination,
and parent-slot compression than conventional implementations because these optimizations are driven by variable and variable-slot elimination and Stalin performs
more aggressive variable and variable-slot elimination. Experiments reported in
section 4 show that, in practice, almost all closures and closure pointers are eliminated. When they are not, the number of indirections needed to reference free
variables is almost always reduced to very low levels.
2.8

Closure-pointer–slot and parent-parameter compression

In the baseline implementation, the closure-pointer slot and parent parameter for
a procedure always point to the closure for the immediate lexically surrounding
procedure (or are null if there is none). Conventional implementations perform
closure-pointer–slot and parent-parameter compression: whenever the parent parameter would point to a closure but is only used to access the parent slot of
that closure, the closure-point-slot and parameter parameter can point to the closure that the parent slot of the original closure points to. Closure-pointer–slot and
parent-parameter compression reduce indirection in variable reference. Stalin performs more aggressive closure-pointer–slot and parent-parameter compression than
conventional implementations because these optimizations are driven by variable
and variable-slot elimination and Stalin performs more aggressive variable and
variable-slot elimination. Experiments reported in section 4 show that, in practice, the number of indirections needed to reference free variables is almost always
reduced to very low levels.
2.9

Globalization

The baseline implementation generates slots in closures for all variables. Conventional implementations treat variables that are bound at the top level specially,
compiling them as global variables rather than as slots of a closure. Stalin generalizes this optimization. A variable can be globalized when it can have at most one
live instance. Stalin approximates this one-live-instance property. A variable has
at most one live instance if the procedure that binds that variable is not called more
than once. And a procedure p is not called more than once if all procedures that can
call p, either directly or indirectly, have a single reached call site. A variable also
has at most one live instance if the procedure that binds that variable does not call
itself directly or indirectly through a non-tail call and all accesses to that variable
must alias the most recently created instance. The method used by Stalin to approximate this must-alias property is the crux of the lightweight closure-conversion
process and is described in detail in section 3.12. Globalization impacts code generation. Globalized variables need a form of spilling but do not need parameters

10

·

Jeffrey Mark Siskind

or passing. Furthermore, globalized variables are referenced without indirection.
Globalization also impacts the lightweight closure-conversion process itself. Since
globalized variables do not require variable slots and are referenced without access
to parent parameters or parent slots, globalization allows more aggressive parentslot, closure-pointer-slot, and parent parameter compression and closure, parentslot, closure-pointer–slot, parent-parameter, parent-passing, closure-pointer, and
parent-spilling elimination.
2.10

Hiding

The situation often arises where a variable slot always points to the closure containing that slot. Such a situation arises in the following example:
(define (f x)
(define (g y) . . . x . . .)
(define (h z) . . . x . . .)
. . .)
assuming that there are no assignments to the variables g and h (other than those
implicit in the definitions). Here, f requires a closure because x is freely referenced.
And g and h both take the closure for f as their parent parameter to allow them
to reference x. Thus the procedure objects for g and h will have the closure for
f as their closure-pointer slot. Furthermore, the variables slots for g and h in the
closure for f will always contain the procedure objects for g and h respectively.
Since Stalin eliminates code-pointer slots of procedure objects, and since the type
tags for the slots g and h can be eliminated because they contain known procedures,
the only information in the variable slots for g and h is their closure-pointer slot.
But these will always point to the closure that contains those slots. So such variable
slots, as well as the corresponding parameters, passing, and spilling, are redundant
and can be eliminated as hidden. A conventional implementation would compile a
bound access to a variable x as x and a free access to a variable x that is bound
n levels up as:
parent. . . . .parent .x
|
{z
}
n

When x is hidden, Stalin compiles a bound access to x as an access to the closure
pointer closure and a free access to x that is bound n levels up as:
parent. . . . .parent
|
{z
}
n

Stalin generalizes the hiding optimization even further. A variable is hidden not
only when it must point to the closure that contains its slot, but also when it must
point to a specific closure for a procedure that lexically surrounds the procedure
whose closure contains that slot. In this situation, if x is a variable that must point
to a closure m levels up, Stalin compiles a bound access to x as:
parent. . . . .parent
|
{z
}
m

Flow-Directed Lightweight Closure Conversion

·

11

and a free access to x that is bound n levels up as:
parent. . . . .parent
|
{z
}
m+n

Sound application of this optimization requires that the variable slot to be hidden
must point to the correct ancestor closure. The method used by Stalin to approximate this must-alias property is the crux of the lightweight closure-conversion
process and is described in detail in section 3.12. Hiding impacts code generation.
Hidden variables do not need slots, parameters, passing, or spilling. Hiding also
impacts the lightweight closure-conversion process itself. Since hidden variables do
not require variable slots, are never assigned, and are accessed by accessing a widerscope closure than the closure of the procedure that binds that variable, hiding allows more aggressive parent-slot, closure-pointer–slot, and parent-parameter compression and closure, parent-slot, closure-pointer–slot, parent-parameter, parentpassing, closure-pointer, and parent-spilling elimination.
3.

LIGHTWEIGHT CLOSURE CONVERSION

Closure conversion is often formulated as a source-to-source transformation, often
as a series of small transformations. In contrast, the approach to lightweight closure
conversion taken in this paper is to view closure conversion as an analysis phase
that annotates the source program with certain properties and relations followed by
a code-generation phase that uses those annotations to control aspects of the code
being generated. In this case, the code generator itself is rather straightforward.
The annotations control the presence or absence of fragments of the procedure call
and entry sequences: variable parameters and slots, parent parameters and slots,
closures, closure pointers, and closure-pointer slots, variable and parent spilling,
and variable and parent passing. The annotations also control the code generated
for variable references. In contrast, the analysis phase is rather complex. This
section presents that analysis phase.
Because the analysis phase is complex, I will start with an overview. The input to
the analysis phase is the abstract-syntax tree of the program. The output consists
of the following annotations:
Local(x)
Global(x)
Hidden(x)
Slotted(x)
HasClosure(p)
HasParentSlot(p)
ParentSlot(p)
HasParentParameter(p)
ParentParameter(p)

x will be allocated as a local variable.
x will be allocated as a global variable.
x will be allocated as a hidden closure slot.
x will be allocated as a closure slot.
p will have a closure and closure pointer.
p will have a parent slot.
The parent slot for p will point to the closure
for ParentSlot(p).
p will have a parent parameter and closurepointer slot.
The parent parameter and closure-pointer slot
for p will point to the closure for
ParentParameter(p).

The first four annotations are properties that apply to variables. The remaining five

12

·

Jeffrey Mark Siskind

annotations apply to abstract procedures. At most one of the first four properties
will be true of any given variable. That property indicates how that variable is represented. If none of these four properties are true of some variable, then that variable
is eliminated and not explicitly represented at run time. This can happen either
because the variable is not accessed or because the variable holds a compile-time determinable object. As for the five abstract-procedure annotations, three are properties and two are functions. The properties HasClosure(p), HasParentSlot(p),
and HasParentParameter(p) indicate whether closure, parent-slot, or closurepointer–slot and parent-parameter elimination apply to a given abstract procedure p. ParentSlot(p) and ParentParameter(p) are functional annotations
that indicate parent-slot and closure-pointer-slot or parent-parameter compression.
If a given abstract procedure p has a parent slot, then ParentSlot(p) indicates
which abstract procedure creates the closure that the parent slot of p points to.
Likewise, if a given abstract procedure p has a closure-pointer–slot or parent parameter, then ParentParameter(p) indicates which abstract procedure creates
the closure that the closure-pointer–slot or parent parameter of p points to.
Before producing the ultimate annotations that drive the closure-conversion aspects of code generation from the undecorated abstract-syntax tree, Stalin produces a number of intermediate annotations. First, Stalin performs flow analysis
(described in section 3.2). This determines the potential values of expressions, variables, and other locations. As part of flow analysis, Stalin performs reachability
analysis (described in section 3.3). This determines which expressions are reached
and which variables are accessed and assigned. Among other things, this allows
Stalin to ignore unreached accesses when deciding whether a variable is accessed,
to ignore unreached references when deciding whether a variable is freely referenced,
and to eliminate unaccessed variables. Next, Stalin determines which components
of aggregate objects are accessed and assigned (described in section 3.4). This allows Stalin to eliminate unaccessed components of aggregate objects, and in turn,
to eliminate an aggregate object itself when all of its components are eliminated
and the identity of the object is unimportant. Next, Stalin computes the call
graph (described in section 3.5). Because all call sites are higher-order in Scheme,
this requires the output of flow analysis to do effectively. Next, Stalin uses the
call graph to determine which abstract procedures are not called more than once
(described in section 3.6). This information supports globalization of variables,
since variables can be globalized when they have only one live instance and this is
trivially true when the abstract procedure that binds a variable is not called more
than once. Next, Stalin computes the free variables for each abstract procedure
(described in section 3.7). This computation uses the results of flow and reachability analysis to ignore unreached references. The free-variable computation is used,
in part, to support variable-slot elimination. Next, Stalin does points-to analysis,
driven by the output of flow analysis, to determine which objects can point to other
objects (described in section 3.8). Points-to analysis supports escape analysis (described in section 3.9) which determines which objects can be accessed after their
creating procedure returns. Escape analysis is used to support must-alias analysis
which in turn supports variable-slot elimination, globalization, hiding, and determining when continuations are fictitious. Next, Stalin determines which abstract
procedures to in-line (described in section 3.10). In-lining decisions are driven by

Flow-Directed Lightweight Closure Conversion

·

13

the results of flow analysis and are used to support variable-slot elimination. Finally, Stalin determines which procedures are reentrant (described in section 3.11).
This is important since variables that are bound by reentrant procedures cannot
be globalized because they can have more that one live instance.
Stalin makes a single pass through the above analyses. Since each analysis
depends on prior analyses, the order in which the analyses can be performed is
fairly tightly constrained. But there are no circularities in these analyses. Thus a
single pass suffices.7
The above analyses are precursors to the analyses that are directly associated
with lightweight closure conversion. In fact, the above analyses support other analyses and optimizations that Stalin performs in addition to lightweight closure
conversion. After performing the above analyses, Stalin performs the analyses to
directly support lightweight closure conversion. First, Stalin performs must-alias
analysis (described in section 3.12). This is used to justify variable-slot elimination, globalization, and hiding and to determine when continuations are fictitious.
Next, Stalin determines which abstract locations are fictitious, i.e. always contain
the same compile-time determinable concrete object (described in section 3.13).
Next, Stalin determines which variable references are trivial and can be ignored
(described in section 3.14). Next, Stalin determines which variable slots can be
eliminated (described in section 3.15). Next, Stalin determines which variables
can be globalized (described in section 3.16). Next, Stalin determines the ancestor
relation (described in section 3.17). Next, Stalin determines which variables can
be hidden (described in section 3.18). Finally, Stalin completes the lightweight
closure-conversion process by determining the representation for each variable and
the static backchain for each abstract procedure (described in section 3.19). Unlike
the analyses in sections 3.2 through 3.11, the analyses in sections 3.12 through 3.19
are circular, requiring the computation of a least fixpoint.
Because the analyses described in this section are fairly complex and numerous,
I have adopted several conventions in terminology and notation. I often define
properties or relations that apply to expressions e or expression invocations ê. In
all such cases, the property or relation is intended to apply to e or ê itself, not any
subexpressions of e or ê and not any expressions in procedures that might be called
by e or ê.
I often define a base relation along with its transitive and reflexive-transitive
variants. When doing so, the symbol for the transitive variant will contain a ‘+’
symbol, while the symbol for the reflexive-transitive variant will contain a ‘∗’ symbol. Often, the + or ∗ symbol will be overlayed on the symbol for the base relation,
as in +
◦ or ◦∗ respectively. This is to distinguish between an inherently transitive
or reflexive-transitive relation, and ◦+ or ◦∗ , which I use to denote the transitive
or reflexive-transitive closures of the ◦ relation respectively. The reason for such
distinction is that ◦+ or ◦∗ might be less precise than +
◦ or ◦∗ respectively.
I often give English names to relations. When doing so, the English name for
7 Actually,

in one little way, flow analysis depends on the fictitious property which is computed
after flow analysis. (eq? e1 e2 ) can be determined not to yield #f when β(e1 ) and β(e2 ) each
contain the same single fictitious abstract object. Stalin eliminates the circularity in this case by
using a weaker noncircular approximation to the fictitious property during flow analysis.

14

·

Jeffrey Mark Siskind

the base relation will contain the term ‘direct(ly)’ while the English name for the
transitive variant will contain the term ‘proper(ly).’ The English name for the
reflexive-transitive variant will not contain any special term.
I often overload the notation for properties or relations to refer to both concrete
properties or relations, i.e. ones that apply to concrete entities, and abstract properties or relations, i.e. ones that apply to abstract entities. (As will be described
later, abstract entities are taken to be sets of concrete entitites.) Associated with
each such pair of properties or relations is an abstraction lemma which states that
the abstract property or relation holds between some abstract entities if the concrete property or relation holds between some concrete entities in those abstract
entities.8 Note that the converse might not be true. For example, suppose that k1
and k2 are concrete entities in the abstract entity σ1 and that k3 and k4 are concrete
entities in the abstract entity σ2 . And suppose that the concrete relations k1 ◦ k3
and k2 ◦ k4 hold. Then, by the abstraction lemma, the abstract relation σ1 ◦ σ2
holds even though the concrete relations k1 ◦ k4 and k2 ◦ k3 might not hold. Thus
abstraction can introduce a degree of approximation.
I often define a pair of properties or relations: the first being an underlying property or relation with the second being an approximation to the first. The need for
approximations follows from the fact that the underlying properties and relations
are often undecidable. Underlying properties and relations, as well as syntactic
properties and relations that do not need approximations, are written without an
overbar. In contrast, a property or relation written with an overbar, as in ◦, denotes an approximation to the corresponding underlying property or relation ◦.
The soundness of the analysis, and thus the soundness of the lightweight closureconversion process in general, follows from the fact that the approximations are
conservative. Associated with each approximation is a conservative approximation
lemma which states that the approximation holds between some abstract entities
if the underlying property or relation holds between those abstract entities.9 Note
that it is trivially possible to derive sound approximations by taking ◦ to be always true. This corresponds to disabling all optimization. On the other hand, it
is not possible to derive a tight approximation because the underlying relations
are undecidable. Thus it is always possible to derive tighter approximations. The
approximations presented in this paper are a good compromise in that they empirically yield good results, yet can be computed efficiently.
Most of the analyses described in this paper are presented in three stages: concrete properties or relations, followed by abstract properties or relations, followed,
in turn, by conservative approximations to the abstract properties or relations. Ac8 In

some cases, particularly for the fictitious property described in section 3.13, the polarity is
reversed. In such cases, the abstraction lemma states that the abstract property or relation does
not hold between some abstract entitites if the concrete property or relation does not hold between
some concrete entitites in those abstract entitites.
9 In some cases, particularly for the must-alias, fictitious, localizable, globalizable, and hideable
properties described in sections 3.12, 3.13, 3.15, 3.16, and 3.18 respectively, the polarity is reversed.
In such cases, the conservative approximation lemma states that approximation does not hold
between some abstract entities if the underlying property or relation does not hold between those
abstract entities. In such cases, it is trivially possible to derive sound approximations by taking ◦
to be always false.

Flow-Directed Lightweight Closure Conversion
e

⇒
|
|
|
|
|
|

(quote k)
x
(set! x e)
(e0 e1 . . . en )
(q e1 . . . en )
(lambda (x1 . . . xn ) e)
(if e0 e1 e2 )

Fig. 1.

·

15

constant
access
assignment
call
primcall
lambda
conditional

The Stalin abstract syntax.

cordingly, most of the analyses have two sources of imprecision: the abstraction
lemma mapping the concrete properties or relations to abstract properties or relations and the conservative approximation lemma mapping abstract properties or
relations to approximations of those properties or relations.
Because the analysis leading to lightweight closure conversion is complex and
involves numerous intermediate program annotations, the terminology and notation
used in the remainder of this section may appear to be quite cumbersome. To help
guide the reader, I have summarized the terminology and notation in a glossary in
appendix A.
3.1

Preprocessing

Stalin prepends a standard prelude to every source program that defines the builtin
procedures from R4RS [Clinger and Rees 1991]. Stalin then macro expands this
extended source program and converts it into an abstract-syntax tree that is essentially of the form specified in figure 1. This macro expansion is done using
essentially the same rewrite rules as given in the appendix of R4RS. The result of
this macro expansion is a single closed lambda expression that contains the entire
source program, including the standard prelude. Running the program corresponds
to evaluating this lambda expression to yield a procedure and calling this procedure. All subsequent analyses are performed on the abstract-syntax tree and take
the form of computing properties of, and relations between, nodes in this tree and
abstract objects and locations.
In the abstract syntax, I refer to the subexpression of an assignment as its source,
the first subexpression of a call or primcall as its callee, any remaining subexpressions of a call or primcall as its arguments, any variables of a lambda expression
as its parameters, the number of arguments of a call or primcall or the number
of parameters of a lambda expression as its arity, the subexpression of a lambda
expression as its body, and the subexpressions of a conditional as its antecedent,
consequent, and alternate respectively. Note that the macro-expansion process produces abstract-syntax trees where the body of each lambda expression consists of
a single expression and the alternate of a conditional is always present.
Throughout this paper, I use the symbol x to denote variables, the symbol q
to denote primitives, the symbol e to denote expressions, the symbol p to denote
lambda expressions, and the symbol p0 to denote the top-level lambda expression
that contains the entire source program including the standard prelude. Furthermore, I use the symbols X, E, A, S, C, Ci , R, Ri , and P to denote the set of
all variables, expressions, accesses, assignments, calls, calls of arity i, primcalls,
primcalls of arity i, and lambda expressions in the source program respectively.

16

·

Jeffrey Mark Siskind

pair?
set-car!
eq?
string->symbol
exact?
>
positive?
+
quotient
truncate
sin
acos
exact->inexact
integer->char
string-length
make-vector
vector-set!
input-port?
close-input-port
eof-object?

Table I. The Stalin primitives.
cons
car
set-cdr!
not
null?
symbol?
number?
real?
inexact?
=
<=
>=
negative?
max
*
remainder
floor
round
exp
cos
tan
atan
sqrt
inexact->exact
char?
string?
make-string
string-ref
string-set!
vector
vector-length
procedure?
apply
output-port?
open-input-file
close-output-port
read-char
char-ready?
write-char

cdr
boolean?
symbol->string
integer?
<
zero?
min
/
ceiling
log
asin
expt
char->integer
string
vector?
vector-ref
call/cc
open-output-file
peak-char

Throughout this paper, I use equality between expressions or between variables
to denote intensional rather than extensional equality, i.e. eq? rather than equal?.
This also affects notions derived from equality such as set membership. Thus I use
e1 = e2 to mean that e1 and e2 denote the same expression in the source program,
not that they have the same form. Similarly, I use x1 = x2 to mean that x1 and x2
denote the same variable in the source program, not that they have the same name.
This obviates the need for alpha renaming and the complexity of expression indices
as used by Steckler and Wand [1997].
Table I lists essentially the primitives available in Stalin.10 Note that primitives
are not first-class objects. A primitive can only appear as the first subexpression
of a primcall. Primitives are taken to be disjoint from variables. This allows
distinguishing primcalls from calls by whether or not the first subexpression is
a primitive. The names of primitives are intentionally similar to the names of
some builtin R4RS procedures because they implement the same functionality.
User programs, however, never contain primcalls. Instead, they call procedures
defined in the standard prelude which contain the corresponding primcalls. This
is essentially η expansion. As an optimization, Stalin replaces certain calls in the
user program that are known to be to certain procedures that are defined in the
standard prelude with the corresponding primcalls, thus performing η reduction.
Since the analyses described in this paper are formulated independently of this
optimization, it will be safely ignored.
For the sake of expository simplicity, the abstract syntax presented in figure 1,
as well as all of the analyses presented in this paper, make three crucial simplifications. First, the primitive apply is not supported. Second, variable-arity procedures
(i.e. ‘rest’ arguments) are not supported. Third, the primitive call/cc cannot be
passed a continuation as the first argument. The actual implementation does not
10 Note

that Stalin does not currently implement complex and rational numbers.

Flow-Directed Lightweight Closure Conversion

·

17

impose these restrictions.
Throughout this paper, I use the following functions applied to nodes in the
abstract-syntax tree:
Source(e)
x(e)
Callee(e)
Arguments(e)
Argumenti (e)
Parameteri (e)
Arity(e)
Body(e)
p(x)
p(e)

Denotes the source subexpression of an assignment e.
Denotes the variable in an access or assignment e.
Denotes the callee subexpression of a call or primcall e.
Denotes the set of argument subexpressions of a call or
primcall e.
Denotes the ith argument subexpression of a call or
primcall e.
Denotes the ith parameter of a lambda expression e.
Denotes the arity of a call, primcall, or lambda
expression e.
Denotes the body subexpression of a lambda expression e.
Denotes the lambda expression in which x is bound.
Denotes the narrowest lambda expression that properly
contains e.

Furthermore, if p2 = p(p1 ), I say that p1 is directly nested in p2 , denoted p1 ≺ p2 .
Let ≺+ and ≺∗ be the transitive and reflexive-transitive closures of ≺ respectively. If
p1 ≺+ p2 or p1 ≺∗ p2 , I say that p1 is properly nested in or nested in p2 respectively.
Finally, I say that an expression e is in tail position, denoted InTailPosition(e),
if and only if it is
—the body of a lambda expression or
—the consequent or alternate of a conditional that is in tail position.
Note that this definition of tail position agrees with the definition given in R5RS
[Kelsey et al. 1998]. The above annotations all denote simple syntactic properties
of the source program and do not involve any approximation.
3.2

Flow analysis

After macro expansion, Stalin performs whole-program interprocedural flow analysis. Flow analysis determines the potential values of expressions. The lightweight
closure-conversion process uses the results of flow analysis in numerous ways. Knowing the potential targets of call sites allows building a call graph which in turn allows
determining which procedures are reentrant or are called more than once. This supports globalization of variables. Flow analysis also supports escape analysis which
forms the basis of a crucial must-alias criterion for variable-slot elimination, globalization, hiding, and determining when continuations are fictitious. Flow analysis is
actually a precursor, directly or indirectly, to almost all of the subsequent analyses
that support lightweight closure conversion.
Flow analysis is an abstract interpretation that partitions the set of all concrete
objects into a set of abstract objects that is finite for a given program and the
set of all concrete locations into a set of abstract locations that is also finite for
a given program. Thus each abstract object is a set of concrete objects and each
abstract location is a set of concrete locations. Concrete interpretation, i.e. evaluation, associates concrete locations with variable instances, expression invocations,

18

·

Jeffrey Mark Siskind

the slots of concrete pairs, and the elements of concrete strings and vectors. Abstract interpretation, i.e. flow analysis, associates abstract locations with variables,
expressions, the slots of abstract pairs, and the elements of abstract strings and
vectors. Throughout this paper, I use the symbol k to denote concrete objects, the
symbol l to denote concrete locations, the symbol σ to denote abstract objects, and
the symbol β to denote abstract locations.
A concrete location can be viewed as the set of all concrete objects that that
concrete location can have as its value. Similarly, an abstract location can be
viewed as the set of all abstract objects that that abstract location can have as its
value. The principal relation produced by flow analysis is σ ∈ β.
Flow analysis obeys the following abstraction lemma:
Lemma 1. For all concrete objects k, concrete locations l, abstract objects σ,
and abstract locations β, if, in some state in some execution of the program, k ∈ σ,
l ∈ β, and k ∈ l, then σ ∈ β.
Flow analysis also obeys the following conservative approximation lemma:
Lemma 2. For all abstract objects σ and abstract locations β, if σ ∈ β, then σ ∈
β.
Different flow analyses differ in the way that they group concrete objects and
locations into abstract objects and locations as well as how they approximate ∈
as ∈. The lightweight closure-conversion process described in this paper works for
any flow analysis that meets the following criteria:
(1) All concrete procedures created by a given lambda expression are in the same
abstract procedure.
(2) All concrete locations associated with different instances of a given variable
are in the same abstract location.
(3) All concrete locations associated with different invocations of a given expression are in the same abstract location.
(4) If an abstract object contains a concrete procedure then it does not contain
any concrete non-procedures.
(5) If an abstract object contains a concrete pair then it does not contain any
concrete non-pairs.
(6) If an abstract object contains a concrete string then it does not contain any
concrete non-strings.
(7) If an abstract object contains a concrete vector then it does not contain any
concrete non-vectors.
(8) If an abstract object contains a concrete symbol then it does not contain any
concrete non-symbols.
(9) If an abstract object contains a concrete continuation then it does not contain
any concrete non-continuations.
(10) All concrete locations associated with the car slots of different concrete
pairs in the same abstract pair are in the same abstract location.
(11) All concrete locations associated with the cdr slots of different concrete
pairs in the same abstract pair are in the same abstract location.

Flow-Directed Lightweight Closure Conversion

·

19

(12) All concrete locations associated with all of the elements of different concrete
strings in the same abstract string are in the same abstract location.
(13) All concrete locations associated with all of the elements of different concrete
vectors in the same abstract vector are in the same abstract location.
(14) All concrete locations associated with the print-name–string slot of different
concrete symbols in the same abstract symbol are in the same abstract location.
(15) All concrete continuations created from different invocations of a given expression are in the same abstract continuation.
Criteria (4) through (9) imply that it is meaningful to talk about abstract procedures, pairs, strings, vectors, symbols, and continuations respectively. Criteria (10)
through (14) imply that it is meaningful to talk about the slots and elements of
abstract pairs, strings, vectors, and symbols respectively. Criteria (1) through (3)
imply that the flow analysis is monovariant (i.e. computed with 0-CFA [Shivers
1988; 1990; 1991a; 1991b; Heintze 1993; 1994]). Stalin actually performs a novel
polyvariant analysis [Siskind 2000b] which requires enhancements to the lightweight
closure-conversion analyses presented here. For expository simplicity, these enhancements are not presented. They complicate the presentation of the analyses
but do not add any conceptual novelty.
As a result of criterion (1) above, there is a one-to-one correspondence between
lambda expressions and abstract procedures. Thus I often use the symbol p to
denote either a lambda expression or an abstract procedure. Furthermore, the
following properties and functions are meaningful because of the above constraints
on flow analysis:
β(x)
β(e)
Pair?(σ)
Car(σ)
Cdr(σ)
String?(σ)
String-Ref(σ)
Vector?(σ)
Vector-Ref(σ)
Symbol?(σ)
Symbol->String(σ)

Continuation?(σ)
e(σ)

Denotes the abstract location associated with x.
Denotes the abstract location associated with the
result of evaluating e.
True if σ is an abstract pair.
Denotes the abstract location containing the car slots
of the concrete pairs in the abstract pair σ.
Denotes the abstract location containing the cdr slots
of the concrete pairs in the abstract pair σ.
True if σ is an abstract string.
Denotes the abstract location containing the elements
of the concrete strings in the abstract string σ.
True if σ is an abstract vector.
Denotes the abstract location containing the elements
of the concrete vectors in the abstract vector σ.
True if σ is an abstract symbol.
Denotes the abstract location containing the printname strings of the concrete symbols in the abstract
symbol σ.
True if σ is an abstract continuation.
Denotes the call to call/cc where the concrete
continuations in the abstract continuation σ were
created.

·

20

3.3

Jeffrey Mark Siskind

Reachability analysis

As part of performing flow analysis, Stalin also performs reachability analysis.
Reachability analysis determines which expressions are reached and, in turn, which
variables are accessed or assigned. Reachability analysis supports lightweight closure conversion in numerous ways. Unaccessed variables can be eliminated. Unreached free references can be ignored when determining whether variables slots
can be eliminated. Like flow analysis, reachability analysis is actually a precursor, directly or indirectly, to almost all of the subsequent analyses that support
lightweight closure conversion.
Before defining the concrete reachability properties, the following auxiliary definitions are needed:
Definition 1. Calling a procedure created by evaluating a lambda expression p
creates an expression invocation ê for each expression e where p(e) = p and creates
a variable instance x̂ for each variable x where p(x) = p.
Each expression invocation has program points just before and just after that
expression invocation. Assignment, call, and primcall invocations also have intermediate program points. Assignment invocations have an intermediate program
point just after the source subexpression invocation but before the actual assignment takes place. Call invocations have an intermediate program point just after
the callee and argument subexpression invocations but before the actual procedure
call takes place. Primcall invocations have an intermediate program point just after
the argument subexpression invocations but before the actions associated with the
primitive take place.
Given the above, the concrete reachability properties are defined as follows:
Definition 2. An expression invocation ê is reached, denoted Reached(ê), when
control flows to the program point just before ê. An expression invocation ê returns,
denoted Returns(ê), when control flows to the program point just after ê. An
assignment, call, or primcall invocation ê is executed, denoted Executed(ê), when
control flows to the intermediate program point in ê. A variable instance x̂ is
accessed, denoted Accessed(x̂), when some access invocation ê to x is reached.
A variable instance x̂ is assigned, denoted Assigned(x̂), when some assignment
invocation ê to x is executed. A call invocation ê is successful if it is executed and
the arity of the call site equals the arity of the target procedure or continuation. A
primcall invocation ê is successful if it is executed and the arity of the call site is
allowed by the primitive.
Given the above, the abstract reachability properties are defined as follows:
Definition 3. An expression e is reached, returns, or is executed, denoted
Reached(e), Returns(e), or Executed(e), if some invocation ê of e is reached,
returns, or is exectuted in some execution of the program respectively. A variable x
is accessed or assigned, denoted Accessed(x) or Assigned(x), if some instance x̂
of x is accessed or assigned in some execution of the program respectively.
Flow analysis computes Reached, Returns, Executed, Accessed, and
Assigned as approximations to Reached, Returns, Executed, Accessed, and

Flow-Directed Lightweight Closure Conversion

·

21

Assigned respectively. In particular, flow analysis directly produces the approximations Reached and Returns. Executed, Accessed, and Assigned are
derived from Reached and Returns by the following:11


e ∈ S ∧ Returns(Source(e))∨







4
e ∈ C ∧ Returns(Callee(e))∧
∨
Executed(e) =
0
0
(∀e ∈ Arguments(e))Returns(e )






e ∈ R ∧ (∀e0 ∈ Arguments(e))Returns(e0 )
4

Accessed(x) = (∃e ∈ A)x(e) = x ∧ Reached(e)
4

Assigned(x) = (∃e ∈ S)x(e) = x ∧ Executed(e)
Reachability analysis obeys the following abstraction lemma:
Lemma 3. For all expressions e and invocations ê of e, if Reached(ê), then
Reached(e), if Returns(ê), then Returns(e), and if Executed(ê), then
Executed(e). For all variables x and instances x̂ of x, if Accessed(x̂), then
Accessed(x) and if Assigned(x̂), then Assigned(x).
Reachability analysis also obeys the following conservative approximation lemma:
Lemma 4. For all expressions e, if Reached(e), then Reached(e), if
Returns(e), then Returns(e), and if Executed(e), then Executed(e). For
all variables x, if Accessed(x), then Accessed(x) and if Assigned(x), then
Assigned(x).
3.4

Approximating the abstract aggregate access and assignment properties and relations

After flow and reachability analysis, Stalin approximates the abstract aggregate
accesses and assigns relations and the abstract aggregate accessed and assigned
properties. Just as a variable or variable instance can be accessed or assigned, components of concrete and abstract aggregate objects can be accessed or assigned. And
just as Stalin can eliminate unaccessed variables and perform certain other optimizations on unassigned variables, Stalin can eliminate unaccessed components
of abstract aggregate objects. And just as Stalin can eliminate a closure when all
of its variable slots are eliminated, Stalin can eliminate an aggregate object when
11 Stalin

actually uses a more precise notion of Accessed(x). A variable is accessed if some access
to that variable is accessed. And a reached expression is accessed in the following cases:
—An access is accessed.
—The source of an assignment is accessed if its destination is accessed.
—The callee of a call is accessed.
—An argument of a call to a procedure is accessed if the corresponding parameter is accessed.
—An argument of a primcall is accessed.
—The body of a lambda expression is accessed if some call to that lambda expression is accessed.
—The antecedant of a conditional is accessed.
—The consequent and alternate of an accessed conditional are accessed.
—The body of the top-level lambda expression is accessed.

The objective of this more complex analysis is to treat x1 as unaccessed when x2 is unaccessed in
code like (let ((x2 x1 )) . . .).

22

·

Jeffrey Mark Siskind

all of its components are eliminated. To do this, Stalin needs a precise notion of
what are all of the different kinds of aggregate objects and their components and
under what circumstances these components are accessed and assigned.
In addition to procedures, whose closures are handled by other mechanisms,
Scheme has four kinds of aggregate objects: pairs, strings, vectors, and symbols.
Pairs have two components: their car and cdr slots. Vectors and strings have
two kinds of components: their length slot and their elements. Symbols have one
component: their print-name–string slot. Additionally, all objects have a typetag slot. Each of these components is accessed or assigned by a disjoint set of
primitives. The car and cdr primitives access the car and cdr slots respectively.
The set-car! and set-cdr! primitives assign the car and cdr slots respectively.
The string-length and vector-length primitives access the string and vector
length slots respectively. The string-ref and vector-ref primitives access string
and vector elements respectively. The string-set! and vector-set! primitives
assign string and vector elements respectively. The symbol->string primitive accesses the print-name–string slot. And the not, boolean?, pair?, null?, symbol?,
number?, real?, integer?, char?, string?, vector?, procedure?, input-port?,
output-port?, and eof-object? primitives access the type-tag slot. Note that it
is not possible to assign length, print-name–string, or type-tag slots.
Stalin is able to eliminate abstract locations that contain a single compile-time–
determinable concrete object. Such locations are called fictitious. This optimization
will be discussed in detail in section 3.19. What is relevant here is that if none of
the components of an abstract aggregate object are ever accessed then all of the
concrete objects in that abstract object are indistinguishable. And a location that
contains only indistinguishable objects can be treated as fictitious and eliminated.
But in order for this to be sound, one other condition must be met. In Scheme,
aggregate objects have identity in addition to components. Even if the components
of an object are never accessed, it is still possible to check the identity of an object
with the eq? primitive. In order to treat the concrete aggregate objects in an
abstract aggregate object as indistinguishable, two conditions must hold: all of its
components must be unaccessed and its identity must never be checked with the
eq? primitive.
Scheme nominally has a fifth kind of aggregate object: continuations. But
continuations don’t have components that can be accessed or assigned, except for a
type-tag slot. Nonetheless, it is convenient to treat continuations as accessed when
they are called. This allows the concrete continuations in an abstract continuation
to be treated as indistinguishable when their type-tag slot is never accessed and
they are never called. Note that it is not necessary to determine that the identity of
an abstract continuation is never checked in order to treat its concrete continuations
as indistinguishable since R4RS does not define eq?-ness for continuations.
Similarly, it is convenient to treat procedures as accessed when they are called.
This allows the concrete procedures in an abstract procedure to be treated as indistinguishable when their type-tag slot is never accessed and they are never called.
Again, note that it is not necessary to determine that the identity of an abstract
procedure is never checked in order to treat its concrete procedures as indistinguishable since R4RS does not define eq?-ness for procedures.
Stalin is able to do one further optimization. It can eliminate the code for prim-

Flow-Directed Lightweight Closure Conversion

·

23

itive predicates when flow analysis determines that a primcall to such a predicate
will always return #t or always return #f.
The concrete aggregate accesses12 and assigns relations are defined as follows:
Definition 4. A one-argument primcall invocation ê to not, boolean?, pair?,
null?, symbol?, number?, real?, integer?, char?, string?, vector?,
procedure?, input-port?, output-port?, or eof-object? type-tag accesses a
concrete object k passed as the argument, denoted TypeTagAccesses(ê, k), when
ê is executed, unless flow analysis has determined that the invocation must return #t
or must return #f. A two-argument primcall invocation ê to eq? eq? accesses a concrete object k passed as one of the arguments, denoted Eq?Accesses(ê, k), when ê
is executed, unless flow analysis has determined that the invocation must return
#t or must return #f. A one-argument primcall invocation ê to car car accesses a
concrete pair k passed as the argument, denoted CarAccesses(ê, k), when ê is executed. A one-argument primcall invocation ê to cdr cdr accesses a concrete pair k
passed as the argument, denoted CdrAccesses(ê, k), when ê is executed. A oneargument primcall invocation ê to string-length string-length accesses a concrete
string k passed as the argument, denoted StringLengthAccesses(ê, k), when ê is
executed. A two-argument primcall invocation ê to string-ref string-ref accesses a
concrete string k passed as the first argument, denoted StringRefAccesses(ê, k),
when ê is executed and an exact in-bounds integer is passed as the second argument.
A one-argument primcall invocation ê to vector-length vector-length accesses a
concrete vector k passed as the argument, denoted
VectorLengthAccesses(ê, k), when ê is executed. A two-argument primcall
invocation ê to vector-ref vector-ref accesses a concrete vector k passed as the
first argument, denoted VectorRefAccesses(ê, k), when ê is executed and an
exact in-bounds integer is passed as the second argument. A one-argument primcall invocation ê to symbol->string symbol-to-string accesses a concrete symbol k
passed as the argument, denoted SymbolToStringAccesses(ê, k), when ê is executed. A call invocation ê to a concrete continuation k continuation accesses k,
denoted ContinuationAccesses(ê, k), when ê is successful. A call invocation ê to
a concrete procedure k procedure accesses k, denoted ProcedureAccesses(ê, k),
when ê is successful. A two-argument primcall invocation ê to set-car! car assigns a concrete pair k passed as the first argument, denoted CarAssigns(ê, k),
when ê is executed. A two-argument primcall invocation ê to set-cdr! cdr assigns
a concrete pair k passed as the first argument, denoted CdrAssigns(ê, k), when ê is
executed. A three-argument primcall invocation ê to string-set! string-ref assigns
a concrete string k passed as the first argument, denoted StringRefAssigns(ê, k),
when ê is executed, an exact in-bounds integer is passed as the second argument,
and a character is passed as the third argument. A three-argument primcall invocation ê to vector-set! vector-ref assigns a concrete vector k passed as the first
argument, denoted VectorRefAssigns(ê, k), when ê is executed and an exact
in-bounds integer is passed as the second argument.
Given the above, the concrete aggregate accessed and assigned properties are
12 Note

that since Stalin does not currently implement rational and complex numbers, the primitives rational? and complex? are not included in the notion of type-tag access.

24

·

Jeffrey Mark Siskind

defined as follows:
Definition 5. A concrete object k is type-tag, eq?, car, cdr, string-length, stringref, vector-length, vector-ref, symbol-to-string, continuation, or procedure accessed,
or car, cdr, string-ref, or vector-ref assigned, denoted TypeTagAccessed(k),
Eq?Accessed(k), CarAccessed(k), CdrAccessed(k),
StringLengthAccessed(k), StringRefAccessed(k),
VectorLengthAccessed(k), VectorRefAccessed(k),
SymbolToStringAccessed(k), ContinuationAccessed(k),
ProcedureAccessed(k), CarAssigned(k), CdrAssigned(k),
StringRefAssigned(k), or VectorRefAssigned(k), if some expression invocation ê type-tag, eq?, car, cdr, string-length, string-ref, vector-length, vector-ref,
continuation, or procedure accesses or car, cdr, string-ref, or vector-ref assigns k
respectively.
Given the above, the abstract aggregate accesses and assigns relations are defined
as follows:
Definition 6. An expression e type-tag, eq?, car, cdr, string-length, string-ref,
vector-length, vector-ref, continuation, or procedure accesses, or car, cdr, stringref, or vector-ref assigns an abstract object σ, denoted TypeTagAccesses(e, σ),
Eq?Accesses(e, σ), CarAccesses(e, σ), CdrAccesses(e, σ),
StringLengthAccesses(e, σ), StringRefAccesses(e, σ),
VectorLengthAccesses(e, σ), VectorRefAccesses(e, σ),
SymbolToStringAccesses(e, σ), ContinuationAccesses(e, σ),
ProcedureAccesses(e, σ), CarAssigns(e, σ), CdrAssigns(e, σ),
StringRefAssigns(e, σ), or VectorRefAssigns(e, k), if some invocation ê of e
type-tag, eq?, car, cdr, string-length, string-ref, vector-length, vector-ref, continuation, or procedure accesses or car, cdr, string-ref, or vector-ref assigns a concrete
object k ∈ σ in some execution of the program respectively.
Given the above, the abstract aggregate accessed and assigned properties are
defined as follows:
Definition 7. An abstract object σ is type-tag, eq?, car, cdr,
string-length, string-ref, vector-length, vector-ref, symbol-to-string, continuation,
or procedure accessed, or car, cdr, string-ref, or vector-ref assigned, denoted
TypeTagAccessed(σ), Eq?Accessed(σ), CarAccessed(σ),
CdrAccessed(σ), StringLengthAccessed(σ), StringRefAccessed(σ),
VectorLengthAccessed(σ), VectorRefAccessed(σ),
SymbolToStringAccessed(σ), ContinuationAccessed(σ),
ProcedureAccesses(σ,), CarAssigned(σ), CdrAssigned(σ),
StringRefAssigned(σ), or VectorRefAssigned(σ), if some expression invocation ê type-tag, eq?, car, cdr, string-length, string-ref, vector-length, vector-ref,
continuation, or procedure accesses or car, cdr, string-ref, or vector-ref assigns a
concrete object k ∈ σ in some execution of the program respectively.
Stalin approximates the abstract aggregate accesses and assigns relations as

Flow-Directed Lightweight Closure Conversion

·

25

follows:


4

TypeTagAccesses(e, σ) =

4

Eq?Accesses(e, σ) =

4

CarAccesses(e, σ) =

4

CdrAccesses(e, σ) =

4

StringLengthAccesses(e, σ) =

4

StringRefAccesses(e, σ) =

4

VectorLengthAccesses(e, σ) =

4

VectorRefAccesses(e, σ) =


e ∈ R1 ∧ Executed(e)∧




not,












boolean?,












pair?,












null?,












number?,












real?,






 


integer?,
 Callee(e) ∈
∧


char?,












string?,












vector?,












procedure?,












input-port?,











output-port?, 






 


eof-object?


 σ ∈ β(Argument1 (e))∧

#t ∈ β(e) ∧ #f ∈ β(e)


e ∈ R2 ∧ Executed(e)∧

 Callee(e) = eq?∧

 
 σ ∈ β(Argument1 (e))∨


∧

 σ ∈ β(Argument2 (e))
#t ∈ β(e) ∧ #f ∈ β(e)


e ∈ R1 ∧ Executed(e)∧
 Callee(e) = car∧

σ ∈ β(Argument1 (e))


e ∈ R1 ∧ Executed(e)∧
 Callee(e) = cdr∧

σ ∈ β(Argument1 (e))


e ∈ R1 ∧ Executed(e)∧
 Callee(e) = string-length∧ 
σ ∈ β(Argument1 (e))


e ∈ R2 ∧ Executed(e)∧
 Callee(e) = string-ref∧



 σ ∈ β(Argument1 (e))∧

number ∈ β(Argument2 (e))


e ∈ R1 ∧ Executed(e)∧
 Callee(e) = vector-length∧ 
σ ∈ β(Argument1 (e))


e ∈ R2 ∧ Executed(e)∧
 Callee(e) = vector-ref∧



 σ ∈ β(Argument1 (e))∧

number ∈ β(Argument2 (e))

26

·

Jeffrey Mark Siskind

4

SymbolToStringAccesses(e, σ) =
4

ContinuationAccesses(e, σ) =
4

ProcedureAccesses(e, σ) =

4

CarAssigns(e, σ) =

4

CdrAssigns(e, σ) =

4

StringRefAssigns(e, σ) =

4

VectorRefAssigns(e, σ) =




e ∈ R1 ∧ Executed(e)∧
 Callee(e) = symbol->string∧ 
σ ∈ β(Argument1 (e))


e ∈ C1 ∧ Executed(e)∧
σ ∈ β(Callee(e))


e ∈ C ∧ Executed(e)∧

 σ ∈ β(Callee(e))∧
Arity(e) = Arity(σ)


e ∈ R2 ∧ Executed(e)∧
 Callee(e) = set-car!∧ 
σ ∈ β(Argument1 (e))


e ∈ R2 ∧ Executed(e)∧
 Callee(e) = set-cdr!∧ 
σ ∈ β(Argument1 (e))


e ∈ R3 ∧ Executed(e)∧

 Callee(e) = string-set!∧


 σ ∈ β(Argument1 (e))∧



 number ∈ β(Argument2 (e))∧ 
char ∈ β(Argument3 (e))


e ∈ R3 ∧ Executed(e)∧
 Callee(e) = vector-set!∧ 


 σ ∈ β(Argument1 (e))∧

number ∈ β(Argument2 (e))

Stalin approximates the abstract aggregate accessed and assigned properties as
follows:
4

TypeTagAccessed(σ) = (∃e ∈ R1 )TypeTagAccesses(e, σ)
4

Eq?Accessed(σ) = (∃e ∈ R2 )Eq?Accesses(e, σ)
4

CarAccessed(σ) = (∃e ∈ R1 )CarAccesses(e, σ)
4

CdrAccessed(σ) = (∃e ∈ R1 )CdrAccesses(e, σ)
4

StringLengthAccessed(σ) = (∃e ∈ R1 )StringLengthAccesses(e, σ)
4

StringRefAccessed(σ) = (∃e ∈ R2 )StringRefAccesses(e, σ)
4

VectorLengthAccessed(σ) = (∃e ∈ R1 )VectorLengthAccesses(e, σ)
4

VectorRefAccessed(σ) = (∃e ∈ R2 )VectorRefAccesses(e, σ)
4

SymbolToStringAccessed(σ) = (∃e ∈ R1 )SymbolToStringAccesses(e, σ)
4

ContinuationAccessed(σ) = (∃e ∈ C1 )ContinuationAccesses(e, σ)
4

ProcedureAccessed(σ) = (∃e ∈ C)ProcedureAccesses(e, σ)
4

CarAssigned(σ) = (∃e ∈ R2 )CarAssigns(e, σ)

Flow-Directed Lightweight Closure Conversion

·

27

4

CdrAssigned(σ) = (∃e ∈ R2 )CdrAssigns(e, σ)
4

StringRefAssigned(σ) = (∃e ∈ R3 )StringRefAssigns(e, σ)
4

VectorRefAssigned(σ) = (∃e ∈ R3 )VectorRefAssigns(e, σ)
The aggregate access and assignment properties and relations obey the following
abstraction lemma:
Lemma 5. For all expressions e, invocations ê of e, abstract objects σ, and concrete objects k ∈ σ,
if TypeTagAccesses(ê, k), then TypeTagAccesses(e, σ),
if Eq?Accesses(ê, k), then Eq?Accesses(e, σ),
if CarAccesses(ê, k), then CarAccesses(e, σ),
if CdrAccesses(ê, k), then CdrAccesses(e, σ),
if StringLengthAccesses(ê, k), then StringLengthAccesses(e, σ),
if StringRefAccesses(ê, k), then StringRefAccesses(e, σ),
if VectorLengthAccesses(ê, k), then VectorLengthAccesses(e, σ),
if VectorRefAccesses(ê, k), then VectorRefAccesses(e, σ),
if SymbolToStringAccesses(ê, k), then SymbolToStringAccesses(e, σ),
if ContinuationAccesses(ê, k), then ContinuationAccesses(e, σ),
if ProcedureAccesses(ê, k), then ProcedureAccesses(e, σ),
if CarAssigns(ê, k), then CarAssigns(e, σ),
if CdrAssigns(ê, k), then CdrAssigns(e, σ),
if StringRefAssigns(ê, k), then StringRefAssigns(e, σ), and
if VectorRefAssigns(ê, k), then VectorRefAssigns(e, σ).
For all σ and k ∈ σ,
if TypeTagAccessed(k), then TypeTagAccessed(σ),
if Eq?Accessed(k), then Eq?Accessed(σ),
if CarAccessed(k), then CarAccessed(σ),
if CdrAccessed(k), then CdrAccessed(σ),
if StringLengthAccessed(k), then StringLengthAccessed(σ),
if StringRefAccessed(k), then StringRefAccessed(σ),
if VectorLengthAccessed(k), then VectorLengthAccessed(σ),
if VectorRefAccessed(k), then VectorRefAccessed(σ),
if SymbolToStringAccessed(k), then SymbolToStringAccessed(σ),
if ContinuationAccessed(k), then ContinuationAccessed(σ),
if ProcedureAccessed(k), then ProcedureAccessed(σ),
if CarAssigned(k), then CarAssigned(σ),
if CdrAssigned(k), then CdrAssigned(σ),
if StringRefAssigned(k), then StringRefAssigned(σ), and
if VectorRefAssigned(k), then VectorRefAssigned(σ).
The aggregate access and assignment properties and relations also obey the following conservative approximation lemma:
Lemma 6. For all expressions e and abstract objects σ,
if TypeTagAccesses(e, σ), then TypeTagAccesses(e, σ),
if Eq?Accesses(e, σ), then Eq?Accesses(e, σ),
if CarAccesses(e, σ), then CarAccesses(e, σ),

·

28

Jeffrey Mark Siskind

if CdrAccesses(e, σ), then CdrAccesses(e, σ),
if StringLengthAccesses(e, σ), then StringLengthAccesses(e, σ),
if StringRefAccesses(e, σ), then StringRefAccesses(e, σ),
if VectorLengthAccesses(e, σ), then VectorLengthAccesses(e, σ),
if VectorRefAccesses(e, σ), then VectorRefAccesses(e, σ),
if SymbolToStringAccesses(e, σ), then SymbolToStringAccesses(e, σ),
if ContinuationAccesses(e, σ), then ContinuationAccesses(e, σ),
if ProcedureAccesses(e, σ), then ProcedureAccesses(e, σ),
if CarAssigns(e, σ), then CarAssigns(e, σ),
if CdrAssigns(e, σ), then CdrAssigns(e, σ),
if StringRefAssigns(e, σ), then StringRefAssigns(e, σ), and
if VectorRefAssigns(e, σ), then VectorRefAssigns(e, σ).
For all σ,
if TypeTagAccessed(σ), then TypeTagAccessed(σ),
if Eq?Accessed(σ), then Eq?Accessed(σ),
if CarAccessed(σ), then CarAccessed(σ),
if CdrAccessed(σ), then CdrAccessed(σ),
if StringLengthAccessed(σ), then StringLengthAccessed(σ),
if StringRefAccessed(σ), then StringRefAccessed(σ),
if VectorLengthAccessed(σ), then VectorLengthAccessed(σ),
if VectorRefAccessed(σ), then VectorRefAccessed(σ),
if SymbolToStringAccessed(σ), then SymbolToStringAccessed(σ),
if ContinuationAccessed(σ), then ContinuationAccessed(σ),
if ProcedureAccessed(σ), then ProcedureAccessed(σ),
if CarAssigned(σ), then CarAssigned(σ),
if CdrAssigned(σ), then CdrAssigned(σ),
if StringRefAssigned(σ), then StringRefAssigned(σ), and
if VectorRefAssigned(σ), then VectorRefAssigned(σ).
3.5

Approximating the abstract call-graph properties and relations

After approximating the abstract aggregate access and assignment properties and
relations, Stalin approximates the abstract call-graph properties and relations.
The concrete direct call-graph properties and relations are defined as follows:
Definition 8. If k1 and k2 are concrete procedures that were created by evaluating p1 and p2 respectively, the call invocation ê was created by calling k1 , the callee
of ê is k2 , and ê is successful, then k2 is called and ê and k1 directly call k2 , denoted
Called(k2 ), ê  k2 , and k1  k2 respectively. Furthermore, if e is in tail position,
then ê and k1 directly tail call k2 , denoted ê t k2 and k1 t k2 respectively. And
if e is not in tail position, then ê and k1 directly non-tail call k2 , denoted ê t k2
and k1 t k2 respectively. And if e is in tail position and p1 is in-lined in p2 ,
denoted p1 ,→∗ p2 , then ê and k1 directly self-tail call k2 , denoted ê st k2 and
k1 st k2 respectively. And if either e is not in tail position or p1 is not in-lined
in p2 , then ê and k1 directly non-self-tail call k2 , denoted ê st k2 and k1 st k2
respectively.
Given the above, the concrete transitive call-graph relations are defined as follows:
Definition 9. If n ≥ 2 and k1 , . . . , kn are concrete procedures that were created

Flow-Directed Lightweight Closure Conversion

·

29

by evaluating p1 , . . . , pn respectively, the call invocations eb1 , . . . , ed
n−1 were created
by calling k1 , . . . , kn−1 respectively, the callee of each eb1 , . . . , ed
n−1 is k2 , . . . , kn respectively, and eb1 , . . . , ed
are
successful,
then
e
b
and
k
properly
call kn , denoted
n−1
1
1
+
+
eb1 
kn and k1 
kn respectively. Furthermore, if e1 , . . . , en−1 are all in tail
+
+
position, then eb1 and k1 properly tail call kn , denoted eb1 
t kn and k1 
t kn
respectively. And if one or more of e1 , . . . , en−1 are not in tail position, then eb1
+
+
and k1 properly non-tail call kn , denoted eb1 
t kn and k1  t kn respectively.
And if e1 , . . . , en−1 are all in tail position, and each p1 , . . . , pn−1 is in-lined in
pn , p1 , . . . , pn−2 respectively, then eb1 and k1 properly self-tail call kn , denoted
+
+
eb1 
st kn and k1 
st kn respectively. And if either one or more of e1 , . . . , en−1 are
not in tail position or one or more of p1 , . . . , pn−1 are not in-lined in pn , p1 , . . . , pn−2
+
respectively, then eb1 and k1 properly non-self-tail call kn , denoted eb1 
st kn and
+
k
respectively.
k1 
st n
Given the above, the concrete reflexive-transitive call-graph relations are defined
as follows:
+
∗
Definition 10. If k1 = k2 or k1 
k2 then k1 calls k2 , denoted k1 
k2 . If
+
∗
+
k1 = k2 or k1 
t k2 then k1 tail calls k2 , denoted k1 
t k2 . If k1 = k2 or k1 
t k2
+
∗
k
.
If
k
=
k
or
k

k
then
k
self-tail
then k1 non-tail calls k2 , denoted k1 
2
1
2
1
st
2
1
t
∗
+
calls k2 , denoted k1 
st k2 . If k1 = k2 or k1 
st k2 then k1 non-self-tail calls k2 ,
∗
k
.
denoted k1 
2
st

Given the above, the abstract call-graph properties and relations are defined as
follows:
Definition 11. An abstract procedure p is called, denoted Called(p), if some
concrete procedure k ∈ p is called in some execution of the program. A call e
directly calls an abstract procedure p, denoted e  p, if some call invocation ê of e
directly calls some concrete procedure k ∈ p in some execution of the program. An
abstract procedure p1 directly calls an abstract procedure p2 , denoted p1  p2 , if
some concrete procedure k1 ∈ p1 directly calls some concrete procedure k2 ∈ p2 in
+ , 
+
some execution of the program. Likewise for the relations t , t , st , st , 
t,
+
+
∗ , 
∗
∗
∗
∗
+
,

,

,

,

,

,
and

.

st
t
st
t
st
t
st
Note that the direct and proper tail, non-tail, self-tail, and non-self-tail call
relations are all independent because it is possible for an abstract procedure to
both directly tail and non-tail call another abstract procedure by different paths.
Stalin approximates the abstract call-graph properties and relations as follows:
4

Called(p) = (∃e ∈ C)e  p
4

e  p = Executed(e) ∧ p ∈ β(Callee(e)) ∧ Arity(e) = Arity(p)
4

p1  p2 = (∃e ∈ C)p(e) = p1 ∧ e  p2
4

e t p = e  p ∧ InTailPosition(e)
4

p1 t p2 = (∃e ∈ C)p(e) = p1 ∧ e t p2
4

e t p = e  p ∧ ¬InTailPosition(e)
4

p1 t p2 = (∃e ∈ C)p(e) = p1 ∧ e t p2

30

·

Jeffrey Mark Siskind
4

e st p = e  p ∧ InTailPosition(e) ∧ p ,→∗ p(e)
4

p1 st p2 = (∃e ∈ C)p(e) = p1 ∧ e st p2
4

e st p = e  p ∧ (¬InTailPosition(e) ∨ p 6,→∗ p(e))
4

p1 st p2 = (∃e ∈ C)p(e) = p1 ∧ e st p2
4

+
+ p
p1 
2 = p1  p2
4

+
+
p1 
t p 2 = p1  t p 2
4

0
0
0 ∗
+
∗
p1 
t p2 = (∃p, p ∈ P )p1  p ∧ p t p ∧ p  p2
4

+
+
p1 
st p2 = p1 st p2 ∨ (∃p ∈ P )p1 t p ∧ p ,→ p1 ∧ p 
st p2


+
4
p1 t p ∧ p ,→ p1 ∧ p 
st p2 ∨
+
p1 
p
=
p

p
∨
(∃p
∈
P
)
1
st 2
st 2
+ p
(p1 t p ∨ p1 t p ∧ p 6,→∗ p1 ) ∧ p 
2

4

0
0
0 ∗
+ p = (∃p ∈ P )e  p ∧ p 
e
p
4

0
0
0 ∗
+
e
t p = (∃p ∈ P )e t p ∧ p 
t p
4

0
0 ∗
0
0
0 ∗
+
e
t p = (∃p ∈ P )e t p ∧ p  p ∨ e t p ∧ p  t p
4

0
0
0
0 +
+
e
st p = e st p ∨ (∃p ∈ P )e t p ∧ p ,→ p(e) ∧ p 
st p2


0
0
0 +
4
e

p
∧
p
,→
p(e)
∧
p

p∨
t
0
st
+
e
st p = e st p ∨ (∃p ∈ P ) (e  p0 ∨ e  p0 ∧ p0 6,→∗ p(e)) ∧ p0 
+ p
t
t
4

∗
+ p
p1 
p 2 = p1 = p 2 ∨ p 1 
2

4

∗
+
p1 
t p 2 = p1 = p 2 ∨ p 1 
t p2

4

∗
+
p1 
t p 2 = p1 = p 2 ∨ p 1  t p2

4

∗
+
p1 
st p2 = p1 = p2 ∨ p1 
st p2

4

∗
+
p1 
st p2 = p1 = p2 ∨ p1  st p2

The call-graph properties and relations obey the following abstraction lemma:
Lemma 7. For all expressions e, invocations ê of e, abstract procedures p, p1 ,
and p2 , and concrete procedures k ∈ p, k1 ∈ p1 , and k2 ∈ p2 : if Called(k), then
Called(p), if ê  k, then e  p, if ê t k, then e t p, if ê t k, then e t p, if
ê st k, then e st p, if ê st k, then e st p, if k1  k2 , then p1  p2 , if k1 t k2 ,
then p1 t p2 , if k1 t k2 , then p1 t p2 , if k1 st k2 , then p1 st p2 , if k1 st k2 ,
+ k, then e 
+ p, if ê 
+
+
+
+
then p1 st p2 , if ê 
t k, then e 
t p, if ê 
t k, then e  t p,
+
+
+
+
+
+
if ê 
k,
then
e

p,
if
k

k
,
then
p

p2 , if
st k, then e 
st p, if ê 
1
2
1
st
st
+
+
+
+
+
+
k1 
k
,
then
p

p
,
if
k

k
,
then
p

t k2 , then p1 
t p2 , if k1 
2
1
2
1
st
2
1
st p2 ,
t
t
+
+
∗
∗
∗
∗
if k1 
k
,
then
p

p
,
if
k

k
,
then
p

p
,
if
k

k
,
then
p

2
1
2
1
2
1
2
1
t
2
1
t p2 ,
st
st
∗
∗
∗
∗
∗
if k1 
k
,
then
p

p
,
if
k

k
,
then
p

p
,
and
if
k

k
,
then
2
1
2
1
st
2
1
st
2
1
2
t
t
st
∗
p1 
p
.
2
st
The call-graph properties and relations also obey the following conservative approximation lemma:
Lemma 8. For all expressions e and abstract procedures p, p1 , and p2 : if

Flow-Directed Lightweight Closure Conversion

·

31

Called(k), then Called(p), if e  p, then e  p, if e t p, then e t p, if e t p,
then e t p, if e st p, then e st p, if e st p, then e st p, if p1  p2 , then
p1  p2 , if p1 t p2 , then p1 t p2 , if p1 t p2 , then p1 t p2 , if p1 st p2 , then
+ p, then e 
+ p, if e 
+
+
p1 st p2 , if p1 st p2 , then p1 st p2 , if e 
t p, then e 
t p,
+
+
+
+
+
+
+ p ,
if e 
p,
then
e

p,
if
e

p,
then
e

p,
if
e

p,
then
e

p,
if
p

st
st
1
2
t
t
st
st
+ p , if p 
+
+
+
+
then p1 
2
1 + t p2 , then p1 
t p2 , if p1 
t p2 , then p1  t p2 , if p1  st p2 ,
+
+
+
∗
∗
∗
then p1 
st p2 , if p1 
st p2 , then p1  st p2 , if p1  p2 , then p1  p2 , if p1  t p2 ,
∗
∗
∗
∗
∗
then p1 
p
,
if
p

p
,
then
p

p
,
if
p

p
,
then
p

p
,
1
1
st 2
1
t 2
1
st 2 and if
t 2
t 2
∗
∗
p1 
p
,
then
p

p
.
1
st 2
st 2
3.6

Approximating the abstract called-more-than-once property

After approximating the abstract call-graph properties and relations, Stalin approximates the abstract called-more-than-once property. This information is used
to determine which variables can be globalized. A variable can be globalized if it
can have at most one live instance. A variable will have only one instance, hence
at most one live instance, if the abstract procedure that binds that variable is not
called more than once.
The concrete called-more-than-once property is defined as follows:
Definition 12. A concrete procedure k is called more than once, denoted
CalledMoreThanOnce(k), if there are two or more successful call invocations
that target k.
Given the above, the abstract called-more-than-once property is defined as follows:
Definition 13. An abstract procedure p is called more than once, denoted
CalledMoreThanOnce(p), if some concrete procedure k ∈ p is called more than
once in some execution of the program.
Stalin approximates the property CalledMoreThanOnce(p) as follows:
4

CalledMoreThanOnce(p)
=


p
=
6
p
∧
0

 

(∃e ∈ C)e  p ∧ CalledMoreThanOnce(p(e)) ∨ 
(∃e1 , e2 ∈ C)e1 6= e2 ∧ e1  p ∧ e2  p
The called-more-than-once property obeys the following abstraction lemma:
Lemma 9. For all abstract procedures p and concrete procedures k ∈ p, if
CalledMoreThanOnce(k), then CalledMoreThanOnce(p).
The called-more-than-once property also obeys the following conservative approximation lemma:
Lemma 10. For all abstract procedures p, if CalledMoreThanOnce(p), then
CalledMoreThanOnce(p).
3.7

Approximating the abstract free-in relation

After approximating the abstract called-more-than-once property, Stalin approximates the abstract free-in relation. This information is used to support to support

·

32

Jeffrey Mark Siskind

parent-slot, closure-pointer–slot, and parent-parameter compression and elimination. If an abstract procedure p1 binds a variable x that is free in an abstract procedure p2 , then the closure for p1 must be accessible to p2 via the static backchain.
Conversely, if an abstract procedure p1 does not bind any variables that are free in
an abstract procedure p2 , then the closure for p1 , if it exists, need not be accessible
to p2 , allowing parent-slot, closure-pointer–slot, and parent-parameter compression
and elimination.
Conventional implementations use a purely syntactic definition of the free-in
relation: a variable x is free in an abstract procedure p that doesn’t bind x if
there are any references to x nested in p. Stalin uses a definition that is more
precise in two ways. First, a variable is not free if it is never accessed, even if it is
freely assigned. This is sound because Stalin eliminates unaccessed variables and
assignments to unaccessed variables. Second, unreached accesses and unexecuted
assignments to a variable are ignored. A variable can be free only if it has reached
free accesses or executed free assignments. This is sound because Stalin eliminates
unreached accesses and unexecuted assignments.
The concrete free-in relation is defined as follows:
Definition 14. A variable x is free in some procedure p, denoted FreeIn(x, p),
if p is properly nested in p(x), x is accessed, and there is some reached access e or
executed assignment e that references x such that p(e) is nested in p.
Stalin approximates the relation FreeIn(x, p) as follows:


+
p
 p(x) ∧ Accessed(x)∧


≺
4
FreeIn(x, p) =  (∃e ∈ A)Reached(e) ∧ x(e) = x ∧ p(e) ≺∗ p ∨ 
(∃e ∈ S)Executed(e) ∧ x(e) = x ∧ p(e) ≺∗ p
The free-in relation obeys the following conservative approximation lemma:
Lemma 11. For all abstract procedures p and variables x, if FreeIn(x, p), then
FreeIn(x, p).
3.8

Approximating the abstract points-to relation

After approximating the abstract free-in relation, Stalin approximates the abstract points-to relation. The abstract points-to relation is used to approximate
the abstract escape relation which forms the basis of a must-alias analysis that is
used to support variable-slot elimination, globalization, hiding, and determining
when continuations are fictitious.
The concrete points-to relation is defined as follows:
Definition 15. A concrete object k directly points to a concrete location l, denoted k ; l, if l is a slot or element of k. A concrete location l directly points to a
concrete object k, denoted l ; k, if k is the value of l. A concrete object k1 points
to a concrete object k2 , denoted k1 ∗; k2 , if k1 ;∗ k2 . A concrete object k points
to a concrete location l, denoted k ∗; l, if k ;∗ l. A concrete location l points to
a concrete object k, denoted l ∗; k, if l ;∗ k. A concrete location l1 points to a
concrete location l2 , denoted l1 ∗; l2 , if l1 ;∗ l2 .
Given the above, the abstract points-to relation is defined as follows:

Flow-Directed Lightweight Closure Conversion

·

33

Definition 16. An abstract object σ directly points to an abstract location β,
denoted σ ; β, if some concrete object in k ∈ σ directly points to some concrete
location l ∈ β in some state in some execution of the program. An abstract location β directly points to an abstract object σ, denoted β ; σ, if some concrete
location l ∈ β directly points to some concrete object k ∈ σ in some state in some
execution of the program. An abstract object σ1 points to an abstract object σ2 ,
denoted σ1 ∗; σ2 , if some concrete object k1 ∈ σ1 points to some concrete object k2 ∈ σ2 in some state in some execution of the program. An abstract object σ
points to an abstract location β, denoted σ ∗; β, if some concrete object k ∈ σ
points to some concrete location l ∈ β in some state in some execution of the program. An abstract location β points to an abstract object σ, denoted β ∗; σ, if
some concrete location l ∈ β points to some concrete object k ∈ σ in some state
in some execution of the program. An abstract location β1 points to an abstract
location β2 , denoted β1 ∗; β2 , if some concrete location l1 ∈ β1 points to some
concrete location l2 ∈ β2 in some state in some execution of the program.
Stalin approximates the ; relation as follows:
4

β;σ = σ∈
β
 

σ
 ∈ P ∧ ProcedureAccessed(σ)∧
 ∨


(∃x ∈ X)Accessed(x)
∧ β = β(x) ∧ FreeIn(x,


 σ)



Car(σ)
=
β
∧
CarAccessed(σ)∨

 Pair?(σ) ∧
∨


Cdr(σ)
=
β
∧
CdrAccessed(σ)




4 
String?(σ)
∧
String-Ref(σ)
=
β∧


σ;β = 
∨


  StringRefAccessed(σ)


 Vector?(σ) ∧ Vector-Ref(σ) = β∧


∨

 VectorRefAccessed(σ)




 Symbol?(σ) ∧ Symbol->String(σ) = β∧
SymbolToStringAccessed(σ)

Stalin then approximates the ∗; relation, denoted ∗; , as ;∗ .
The points-to relation obeys the following abstraction lemma:
Lemma 12. For all pairs of abstract objects σ1 and σ2 , concrete objects k1 ∈ σ1 ,
concrete objects k2 ∈ σ2 , pairs of abstract locations β1 and β2 , concrete locations l1 ∈ β1 , and concrete locations l2 ∈ β2 , if, in some state in some execution
of the program, k1 ; l1 , l1 ; k1 , k1 ∗; k2 , k1 ∗; l2 , l1 ∗; k2 , or l1 ∗; l2 , then
σ1 ; β1 , β1 ; σ1 , σ1 ∗; σ2 , σ1 ∗; β2 , β1 ∗; σ2 , or β1 ∗; β2 respectively.
The points-to relation also obeys the following conservative approximation
lemma:
Lemma 13. For all pairs of abstract objects σ1 and σ2 and pairs of abstract
locations β1 and β2 , if σ1 ; β1 , β1 ; σ1 , σ1 ∗; σ2 , σ1 ∗; β2 , β1 ∗; σ2 , or
β1 ∗; β2 , then σ1 ; β1 , β1 ; σ1 , σ1 ∗; σ2 , σ1 ∗; β2 , β1 ∗; σ2 , or β1 ∗; β2
respectively.
3.9

Approximating the abstract escape relation

After approximating the abstract points-to relation, Stalin approximates the abstract escape relation. The abstract escape relation forms the basis of a must-alias

34

·

Jeffrey Mark Siskind

analysis that is used to support variable-slot elimination, globalization, hiding, and
determining when continuations are fictitious.
Before defining the concrete escape relation, the following auxiliary definitions
are needed:

Definition 17. A reached lambda expression invocation creates a concrete procedure. A successful primcall invocation to cons creates a concrete pair. A successful
primcall invocation to string or make-string creates a concrete string. A successful primcall invocation to vector or make-vector creates a concrete vector.
A successful primcall invocation to string->symbol creates a symbol. A successful primcall invocation to call/cc creates a continuation. An expression invocation that type-tag, eq?, car, cdr, string-length, string-ref, vector-length, vector-ref,
symbol-to-string, or continuations accesses a concrete object accesses that object.
An expression invocation that car, cdr, string-ref, or vector-ref assigns a concrete
object assigns that object. A reached access invocation ê accesses a concrete procedure k, if k binds x̂(ê). An executed assignment invocation ê assigns a concrete
procedure k, if k binds x̂(ê).

Given the above, concrete escape relation is defined as follows:

Definition 18. A concrete object k escapes escapes an expression invocation ê,
denoted k ⇑ ê, if k is created after ê is reached but before ê returns and is accessed
after ê returns.

Given the above, the abstract escape relation is defined as follows:

Definition 19. An abstract object σ escapes an expression e, denoted σ ⇑ e, if
some concrete object k ∈ σ escapes some invocation ê of e in some execution of the
program.

The escape relation will be needed only when e is the body of a lambda expression.
This leads to the following definition:

Definition 20. An abstract object σ escapes an abstract procedure p, denoted
σ ⇑ p, if σ escapes Body(p).

Flow-Directed Lightweight Closure Conversion

·

35

Stalin approximates the relation σ ⇑ p as follows:


Returns(Body(p))
∧ β(Body(p)) ∗; σ∨








0


Executed(e)
∧
Returns(e
)∧






0


 Accessed(x(e)) ∧ p(e ) 


+ p 
∗
p(e)∧






∃
e
∈
S,


0
0




∗
∗
β(Source(e))
;
σ
∧
β(e
)
;
p
∧


0






e
∈
E,
∨


0
+




p
≺
p(x(e))∧


0






p
∈
P


00
00




Reached(e
)
∧
x(e
)
=
x(e)∧


00


(∃e ∈ A)


00
∗ 0


p(e
)
≺
p








0



 Executed(e) ∧ p 
+ p 
∗


p(e)∧


∃ e ∈ C1 ,


0






Continuation?(σ
)∧
0






p ∈ P,
∨






∗


β(Argument
(e))
;
σ∧
1
0


σ ∈ β(Callee(e))


0
0




p 
∈ β(Argument2 (e(σ )))

4
0
σ⇑p =
Executed(e) ∧ Returns(e )∧





  p(e0 ) 
+ p 
∗

p(e) ∧ Pair?(σ 0
)∧ 



 
 ∃ e ∈ C2 ,


 
 Callee(e) = set-car!∨


0








e ∈ E,
∨
∧






Callee(e)
=
set-cdr!
0






σ ∈ β(Argument1 (e)) 



∗


β(Argument
(e))
;
σ∧
2




0
0


∗


β(e
)
;
σ








0


Executed(e)
∧
Returns(e
)∧






0
0
0




+ p 
∗
∗


p(e
)

p(e)
∧
β(e
)
;
σ
∧




 ∃ e ∈ C3 ,



0




Vector?(σ
)∧


0






e ∈ E,






Callee(e)
=
vector-set!∧


0




σ ∈ β(Argument1 (e)) 





number
∈
β(Argument
(e))∧
2




β(Argument3 (e)) ∗; σ

Note that calls to string-set! do not cause their third argument to escape because
that argument must always be a character and characters are not created.
The escape relation obeys the following abstraction lemma:
Lemma 14. For all abstract objects σ, expressions e, concrete objects k ∈ σ, and
invocations ê of e, if k ⇑ ê, then σ ⇑ e.
The escape relation also obeys the following conservative approximation lemma:
Lemma 15. For all abstract objects σ and abstract procedures p, if σ ⇑ p, then
σ⇑p.
3.10

Deciding which procedures to in-line

After approximating the abstract escape relation, Stalin decides which procedures
to in-line. The resulting in-lined-in relation is important since an accessed, nonfictitious, non-global variable slot can be eliminated only if all reached accesses and
executed assignments are in-lined in the procedure that binds that variable.
Stalin in-lines procedures that have a unique call site. Stalin translates self
tail calls as gotos. Such self tail calls do not count as call sites when making inlining decisions. Moreover, the notion of self tail call used by Stalin is sensitive to
in-lining decisions.
Definition 21. A call e is a unique call site of p, denoted UniqueCallSite(e, p),
if e directly non-self-tail calls p and there is no other call e0 that directly non-self-

·

36

Jeffrey Mark Siskind

tail calls p. If e is the unique call site of p then p is directly in-lined in p(e), denoted
p ,→ p(e). Let ,→∗ denote the reflexive-transitive closure of ,→. If p1 ,→∗ p2 , then p1
is in-lined in p2 .
The above definition must be approximated, because the direct non-self-tail call
relation must be approximated. Furthermore, the above definition is circular. The
directly–in-lined-in relation ,→ depends on the notion of unique call site, which
depends on the notion of direct non-self-tail call, which depends on the relation ,→∗ ,
which depends back on the relation ,→. To approximate the in-lining relation and
break this circularity, Stalin finds a least fixed point to approximations of the
above definitions. In other words, it finds a minimal relation ,→ that satisfies the
following constraints:
4

e st p = e  p ∧ (¬InTailPosition(e) ∨ p(e) 6,→∗ p)
4

UniqueCallSite(e, p) = e st p ∧ ¬(∃e0 ∈ E)e0 6= e ∧ e0 st p
4

p1 ,→ p2 = (∃e ∈ C)p2 = p(e) ∧ UniqueCallSite(e, p1 )
It may seem overly restrictive to in-line only procedures that have a unique call
site. Stalin overcomes this restriction by doing a polyvariant flow analysis and
splitting procedures, assigning different copies to different call sites. As a result
of such splitting, individual copies might have a unique call site, and thus might
be in-lined, even if the original procedure would have multiple call sites under a
monovariant analysis. The details of the polyvariant flow analysis and splitting
procedure are beyond the scope of this paper. They are discussed in detail in
Siskind [2000b].
3.11

Approximating the reentrant property

After deciding which procedures to in-line, Stalin approximates the reentrant
property. The reentrant property is used to justify globalization. A variable can
be globalized when it has at most one live instance. Variables bound by reentrant
procedures can have more than one live instance, one for each reentrant invocation.
Thus non-reentrancy is one requirement for globalization.
A procedure is recursive if it can call itself. A procedure is reentrant if it can
have more than one live activation record. Since tail-merged calls do not create new
activation records, not all recursive procedures are reentrant. In an implementation
that merged all tail calls, a procedure would be reentrant only if it properly non-tail
called itself. Stalin, however, does not merge all tail calls. It only merges self tail
calls. Because of this, the concrete reentrant property is defined as follows:
Definition 22. A concrete procedure k is reentrant, denoted Reentrant(k), if k
properly non-self-tail calls itself.
Given the above, the abstract reentrant property is defined as follows:
Definition 23. An abstract procedure p is reentrant, denoted Reentrant(p), if
some concrete procedure k ∈ p is reentrant in some execution of the program.
Stalin approximates the property Reentrant(p), denoted Reentrant(p), as
+
p
st p.
The reentrant property obeys the following abstraction lemma:

Flow-Directed Lightweight Closure Conversion

·

37

Lemma 16. For all abstract procedures p and concrete procedures k ∈ p, if
Reentrant(k), then Reentrant(p).
The reentrant property also obeys the following conservative approximation
lemma:
Lemma 17. For all abstract procedures p, if Reentrant(p), then
Reentrant(p).
3.12

Approximating the abstract must-alias properties

Sound lightweight closure conversion requires determining a number of abstract
must-alias properties. These are defined as follows:
Definition 24. A variable x must alias, denoted MustAlias(x), if every reached
access invocation ê to x accesses the instance x̂ of x bound by the most recent active invocation of some concrete procedure k ∈ p(x). An abstract continuation σ
must alias, denoted MustAlias(σ), if for every successful call invocation ê to some
concrete continuation k ∈ σ, k was created by the most recent active invocation
of e(σ). An abstract procedure p must alias, denoted MustAlias(p), either if p has
no parent parameter or if for every successful call invocation ê to some concrete procedure k ∈ p, the most recent active invocation of ParentParameter(p) when k
is called is the same as the most recent active invocation of ParentParameter(p)
when k was created.
The variable must-alias property is used to justify variable-slot elimination and
globalization. The abstract-continuation must-alias property is used to determine
when continuations are fictitious. The abstract-procedure must-alias property is
used to justify hiding.
The variable must-alias property can be violated in only two ways: either
a. some reference is to an instance in an invocation that has already returned or
b. some reference is to an instance in a recursive invocation that is not the current
invocation.
Condition (a) occurs only when the following situation arises:
(lambda (. . . x . . .)
.
.
.
(lambda (. . .) . . . xe or (set! x . . .)e . . .)p . . .)p(x)
Here, a nontrivial reference e to x is nested in p, which is, in turn, properly nested
in p(x), and p escapes p(x). Note that p(e) could be the same as p but, in order
for p to escape p(x), p cannot be the same as p(x). Calling p after p escapes p(x)
allows a reference to x after p(x) returns.
Condition (b) occurs only when the following situation arises:
(lambda (. . . x0 . . .)
.
.
.
(lambda (. . . x . . .)
. . . x0e0 . . .
(lambda (. . .) . . . xe or (set! x . . .)e . . .)p . . .)p(x) . . .)p(x0 )

38

·

Jeffrey Mark Siskind

Here, p(x) is nested in p(x0 ), p(x) is recursive, a nontrivial reference e to x is nested
in p, which is, in turn, properly nested in and properly called by p(x), and an
access e0 to x0 is nested in p(x), where β(e0 ) points to p. Note that p(x) can be
the same as p(x0 ), p(e0 ) can be the same as p(x), p(e) can be the same as p, and e0
can be nested anywhere in p(x), including inside p. Passing p, which contains a
reference to x, created on one invocation of p(x), to a recursive invocation, and
calling p in that recursive invocation, allows an access to x in other than the most
recent invocation. Such passing can only happen via an x0 . Note that p cannot be
the same as p(x) in order for e to access a different instance of x than the one in
the most recent invocation.
Stalin approximates the property MustAlias(x) as follows:
4

MustAlias(x)
=

 

x(e) = x ∧ p(e) ≺∗ p ≺+ p(x)∧
¬(∃e
∈
A
∪
S,
p
∈
P
)
∧

NonTrivialReference(e)
∧ p⇑p(x)  





NonTrivialReference(e)∧



 
 p(e0 ) ≺∗ p(x) ≺∗ p(x(e0 ))∧




 ¬(∃e ∈ A ∪ S, e0 ∈ A, p ∈ P )  p(e) ≺∗ p ≺+ p(x)∧
 


 

 p(x) 
 
+ p(x) ∧ x(e) = x∧
0 ∗
+ p
β(e ) ; p ∧ p(x) 
Note that the above approximation uses the property
NonTrivialReference(e) which will be defined in section 3.14.
The abstract-continuation must-alias property can be violated in only two ways:
either
a. the continuation is accessed after the expression invocation that created that
continuation returns or
b. the continuation is accessed when there is an active recursive invocation of the
expression that created that continuation that is not the invocation that created
that continuation.
Condition (a) occurs only when the following situation arises:
(call/cc . . .)e(σ) . . . e
Here, σ escapes e(σ) and some expression e accesses σ.
Condition (b) occurs only when the following situation arises:
(lambda (. . .) . . . e . . .)p(e)
.
.
.
(lambda (. . . x . . .)
.
.
.
(lambda (x0 ) . . . xe0 . . .)p
.
.
.
(lambda (. . .) . . . (call/cc . . .)e(σ) . . .)p(e(σ)) . . .)p(x)
Here, a call e accesses the abstract continuation σ, p is passed σ by virtue of the
fact that it is called by the call e(σ) to call/cc, p is properly nested in p(x),

Flow-Directed Lightweight Closure Conversion

·

39

p(e(σ)) is nested in p(x), p and p(e(σ)) properly call each other, and an access e0
to x is nested in p, where β(e0 ) points to σ.
Stalin approximates the property MustAlias(σ) as follows:
4

MustAlias(σ)
=

 
 
TypeTagAccessed(σ)
∨ ContinuationAccessed(σ)
∧









p ∈ P ∧ σ ∈ β(Parameter1 (p))∧  ∧ 

 ¬


(∃p ∈ β(Argument1 (e(σ))))




σ⇑p







+ p(e(σ)) 
+ p ∧ p 
∗

p(e)∧
p



 β(e0 ) ∗; σ ∧ p ≺+ p(x(e0 ))∧
∃ e ∈ C,





∗
0
0






p(e(σ))
≺
p(x(e
))∧
e ∈ A,
¬












TypeTagAccesses(e, σ)∨
p ∈ β(Argument1 (e(σ)))






ContinuationAccesses(e, σ)
Note that Eq?Accessed(p) is not needed in the above since R4RS does not specify
the conditions for equality between continuations.
The abstract-procedure must-alias property can be violated in only two ways:
either
a. p is accessed after ParentParameter(p) returns or
b. p is accessed when there is an active recursive invocation of
ParentParameter(p) that is not the invocation that created p.
Condition (a) occurs only when the following situation arises:
(lambda (. . .)
.
.
.
(lambda (. . .) . . .)p . . .)ParentParameter(p)
.
.
.
e
Here, p escapes ParentParameter(p) and some expression e accesses p.
Condition (b) occurs only when the following situation arises:
(lambda (. . .) . . . e . . .)p(e)
.
.
.
(lambda (. . . x . . .) . . . (lambda (. . .) . . . xe0 . . .)p0 . . .)p(x)
Here p(e0 ) is nested in p0 which is in turn nested in p(x(e0 )), ParentParameter(p)
is nested in p0 , β(e0 ) points to p, p0 is recursive, p0 calls p(e), and e accesses p.

·

40

Jeffrey Mark Siskind

Stalin approximates the property MustAlias(p) as follows:
4

MustAlias(p)
=
 
 
HasParentParameter(p)
∧ p⇑ParentParameter(p)∧




¬
∧




TypeTagAccessed(p)
∨
ProcedureAccessed(p)




 


0
∗
0
∗
0
0
0


+ p ∧
p(e ) ≺ p ≺ p(x(e )) ∧ p 


0
0

 β(e ) ∗; p ∧ p 
∗
p(e)∧
 



 
HasParentParameter(p)∧
¬(∃e ∈ C, e0 ∈ A, p0 ∈ P ) 



 




 ParentParameter(p) ≺∗ p0 ∧
 







TypeTagAccesses(e, p) ∨ e  p

Note that the above approximation uses the property
HasParentParameter(p) and the function ParentParameter(p) which will
be defined in section 3.19. Also note that Eq?Accessed(p) is not needed in the
above since R4RS does not specify the conditions for equality between procedures.
The must-alias properties obey the following conservative approximation lemma:
Lemma 18. For all variables x, if ¬MustAlias(x), then ¬MustAlias(x). For
all abstract continuations σ, if ¬MustAlias(σ), then ¬MustAlias(σ). For all
abstract procedures p, if ¬MustAlias(p), then ¬MustAlias(p).
3.13

Approximating the abstract fictitious property

Stalin is able to eliminate locations, and the code to manipulate such locations,
when they are known to always hold the same concrete object. Such data is called
fictitious.
The concrete fictitious property is defined as follows:
Definition 25. A concrete location l is fictitious, denoted Fictitious(l), if it
always contains the same concrete object.
Given the above, the abstract fictitious property is defined as follows:
Definition 26. An abstract location β is fictitious, denoted Fictitious(β), if
every l ∈ β is fictitious in every execution of the program.
Stalin approximates the fictitious property by declaring an abstract location
to be fictitious if it contains a single abstract object that, in turn, contains a
single concrete object. The particular abstract interpretation that Stalin uses puts
each of the concrete objects (), #t, #f, and eof-object into abstract objects that
contain only that concrete object. Furthermore, if an abstract procedure p is not
procedure accessed or does not have a parent parameter then all of the concrete
procedures k ∈ p are indistinguishable so p can be treated as if it contained a
single concrete procedure. Additionally, if all of the components of some abstract
pair, string, vector, or external symbol σ are never accessed, or contain fictitious
locations, then all of the concrete objects k ∈ σ are indistinguishable so σ can be
treated as if it contained a single concrete object. Finally, Stalin compiles a call
to an abstract continuation σ as a goto when
a. the call to σ is in-lined in the call to call/cc that created σ and
b. σ must alias.

Flow-Directed Lightweight Closure Conversion

·

41

This is described in greater detail in Siskind [2000a]. When this occurs for all calls
to σ, or when σ is not continuation accessed, all of the concrete continuations k ∈
σ are indistinguishable so σ can be treated as if it contained a single concrete
continuation.
Stalin approximates the property Fictitious(β) as follows:
4

Fictitious(σ)
=


¬TypeTagAccessed(σ)∧


  σ = () ∨ σ = #t ∨ σ = #f ∨ σ = eof-object∨












¬ProcedureAccessed(σ)∨





∨
σ
∈
P
∧





¬HasParentParameter(σ))











Pair?(σ) ∧ ¬Eq?Accessed(σ)∧














Fictitious(Car(σ)) ∨ ¬CarAccessed(σ) ∧ ∨












  Fictitious(Cdr(σ)) ∨ ¬CdrAccessed(σ)








String?(σ) ∧ ¬Eq?Accessed(σ)∧






∨




¬StringLengthAccessed(σ)
∧
¬StringRefAccessed(σ)













 Vector?(σ) ∧ ¬Eq?Accessed(σ)∧


 
¬VectorLengthAccessed(σ)∧  

 
∨




 Fictitious(Vector-Ref(σ))∨ 














¬VectorRefAccessed(σ)












Symbol?(σ)
∧
¬Eq?Accessed(σ)∧





∨





¬SymbolToStringAccessed(σ)












Continuation?(σ)∧












  ¬ContinuationAccessed(σ)∨






















MustAlias(σ)∧











   (∀e ∈ C1 )
 








 
ContinuationAccesses(e, σ) → p(e) ,→∗ p(e(σ))
4

Fictitious(β) = ||β|| = 1 ∧ (∀σ ∈ β)Fictitious(σ)

Note that the above approximation uses the property
HasParentParameter(p) which will be defined in section 3.19.
The fictitious property obeys the following abstraction lemma:
Lemma 19. For all abstract locations β and concrete locations l ∈ β, if
¬Fictitious(l), then ¬Fictitious(β).
The fictitious property also obeys the following conservative approximation
lemma:
Lemma 20. For all abstract locations β, if ¬Fictitious(β), then
¬Fictitious(β).
3.14

Approximating the abstract nontrivial-reference property

Stalin does not generate any code for unreached accesses, unexecuted assignments,
and assignments to fictitious or hidden variables. Furthermore, an access to a
variable which can only yield a single compile-time determinable object need not
generate code to access that variable and can instead generate code to return that
object as a constant. All other references are called nontrivial. Only nontrivial
references count as free references.

·

42

Jeffrey Mark Siskind

The concrete nontrivial-reference property is defined as follows:
Definition 27. An access invocation ê is nontrivial, denoted
NonTrivialReference(ê), if it is reached and returns an object that is not known
at compile time. An assignment invocation ê is nontrivial, denoted
NonTrivialReference(ê), if it is executed, x(e) is not hidden, and there is a
subsequent nontrivial access to x̂(ê).
Given the above, the abstract nontrivial-reference property is defined as follows:
Definition 28. A reference e is nontrivial, denoted NonTrivialReference(e),
if some invocation ê of e is nontrivial in some execution of the program.
Stalin approximates the property NonTrivialReference(e) as follows:
4

NonTrivialReference(e)
=
 

e
∈
A
∧
Reached(e)
∧
¬Fictitious(β(e)) ∨


 e ∈ S ∧ Executed(e) ∧ Accessed(x(e))∧ 
¬Fictitious(β(x(e))) ∧ ¬Hidden(x(e))
Note that the above approximation uses the property Hidden(x) which will be
defined in section 3.19.
The nontrivial-reference property obeys the following abstraction lemma:
Lemma 21. For all references e and invocations ê of e, if
NonTrivialReference(ê), then NonTrivialReference(e).
The nontrivial-reference property also obeys the following conservative approximation lemma:
Lemma 22. For all references e, if NonTrivialReference(e), then
NonTrivialReference(e).
3.15

Approximating the abstract localizable property

In order to eliminate a variable slot and represent a variable as a c local variable
it must be localizable.
The abstract localizable property is defined as follows:
Definition 29. A variable x is localizable, denoted Localizable(x), if x must
alias and all nontrivial references to x are in-lined in the procedure that binds x.
Stalin approximates the property Localizable(x) as follows:
4

Localizable(x)
=


MustAlias(x)∧
¬(∃e ∈ A ∪ S)x(e) = x ∧ NonTrivialReference(e) ∧ p(e) 6,→∗ p(x)
The localizable property obeys the following conservative approximation lemma:
Lemma 23. For all variables x, if ¬Localizable(x), then ¬Localizable(x).

Flow-Directed Lightweight Closure Conversion

3.16

·

43

Approximating the abstract globalizable property

In order to represent a variable as a c global variable it must be globalizable.
The abstract globalizable property is defined as follows:
Definition 30. A variable x is globalizable, denoted Globalizable(x), if it can
have at most one live instance.
If the procedure that binds a variable is not called more than once then that
variable can have at most one live instance. Also, if the procedure that binds the
variable is not reentrant and the variable must alias then that variable can have at
most one live instance.
Stalin approximates the property Globalizable(x) as follows:


4
¬CalledMoreThanOnce(p(x))∨
Globalizable(x) =
¬Reentrant(p(x)) ∧ MustAlias(x)
The globalizable property obeys the following conservative approximation lemma:
Lemma 24. For all variables x, if ¬Globalizable(x), then
¬Globalizable(x).
3.17

Approximating the abstract ancestor relation

Certain expressions, such as free references to slotted and hidden variables, as well
as lambda expressions that contain such free references, access the parent parameter
and parent slots of procedures to get a pointer to the requisite closure.
The abstract ancestor relation is defined as follows:
Definition 31. A procedure p1 is an ancestor of a procedure p2 , denoted
Ancestor(p1 , p2 ), if some reached expression in p2 accesses a pointer to the closure
for p1 .
Stalin approximates the relation Ancestor(p1 , p2 ) as follows:
4

Ancestor(p1 , p2 ) =

∧ p(x) = p1 ∧ FreeIn(x, p2 )∨  
 Slotted(x)

(∃x ∈ X)
Hidden(x) ∧ (p(x) = p2 ∨ FreeIn(x, p2 ))


(∃p ∈ P )β(x) = {p} ∧ Ancestor(p1 , p)

Note that the above approximation uses the properties Slotted(x) and
Hidden(x) which will be defined in section 3.19.
The ancestor relation obeys the following conservative approximation lemma:
Lemma 25. For all abstract procedures p1 and p2 , if Ancestor(p1 , p2 ), then
Ancestor(p1 , p2 ).
3.18

Approximating the abstract hideable property

In order to eliminate a variable slot and have all accesses to that variable access
some closure record on the static backchain it must be hideable.
The abstract hideable property is defined as follows:
Definition 32. A variable x is hideable, denoted Hideable(x), if
a. the abstract location of x contains a single abstract procedure p,

·

44

Jeffrey Mark Siskind

b. for all nontrivial accesses e to x, p(e) is nested in every ancestor of p, and
c. p must alias.
Stalin approximates the property Hideable(x) as follows:
4

Hideable(x)
=
β(x) = {p} ∧ MustAlias(p)∧
 (∀e ∈ A, p0 ∈ P )


(∃p ∈ P ) 
 x(e) = x ∧ Ancestor(p0 , p)∧
→ p(e) ≺∗ p0
NonTrivialReference(e)






The hideable property obeys the following conservative approximation lemma:
Lemma 26. For all variables x, if ¬Hideable(x), then ¬Hideable(x).
3.19

Lightweight closure conversion

It is now possible to describe the lightweight closure-conversion process. Recall that
the output of lightweight closure conversion consists of the following annotations:
Local(x)
Global(x)
Hidden(x)
Slotted(x)
HasClosure(p)
HasParentSlot(p)
ParentSlot(p)
HasParentParameter(p)
ParentParameter(p)

x will be allocated as a local variable.
x will be allocated as a global variable.
x will be allocated as a hidden closure slot.
x will be allocated as a closure slot.
p will have a closure and closure pointer.
p will have a parent slot.
The parent slot for p will point to the closure
for ParentSlot(p).
p will have a parent parameter and closurepointer slot.
The parent parameter and closure-pointer slot
for p will point to the closure for
ParentParameter(p).

Unaccessed and fictitious variables are eliminated. The ones that remain are made
either local, global, hidden, or slotted. Localizable variables can be made local.
Globalizable variables can be made global. Hideable variables can be made hidden.
Variables that are neither localizable, globalizable, nor hideable must be made slotted. It is possible for more than one of Localizable(x), Globalizable(x), and
Hideable(x) to be true of a given variable x. In this situation, making a variable
local is given preference to making a variable global since access to local variables
is faster, especially since current c compilers do intraprocedural optimization but
not interprocedural optimization.
4

Local(x) = Accessed(x) ∧ ¬Fictitious(β(x)) ∧ Localizable(x)


4
Accessed(x) ∧ ¬Fictitious(β(x))∧
Global(x) =
Globalizable(x) ∧ ¬Local(x)


4
Accessed(x) ∧ ¬Fictitious(β(x))∧
Hidden(x) =
Hideable(x) ∧ ¬Local(x) ∧ ¬Global(x)

Flow-Directed Lightweight Closure Conversion
4

Slotted(x) =



Accessed(x) ∧ ¬Fictitious(β(x))∧
¬Local(x) ∧ ¬Global(x) ∧ ¬Hidden(x)

·

45



An abstract procedure has a closure if it binds some slotted variable.
4

HasClosure(p) = (∃x ∈ X)p(x) = p ∧ Slotted(x)
An abstract procedure p has a parent slot if some abstract procedure p0 is an
ancestor of p and p is an ancestor of some abstract procedure p00 that is directly
nested in p.
4

HasParentSlot(p) = (∃p0 ∈ P, p00 ≺ p)Ancestor(p, p00 ) ∧ Ancestor(p0 , p)
The parent slot of an abstract procedure p is the narrowest such procedure p0 .
4

ParentSlot(p) = min
{p0 ∈ P |(∃p00 ≺ p)Ancestor(p, p00 ) ∧ Ancestor(p0 , p)}
∗
≺

An abstract procedure p has a parent parameter and closure-pointer slot if it has
an ancestor.
4

HasParentParameter(p) = (∃p0 ∈ P )Ancestor(p0 , p)
The parent parameter and closure-pointer slot of an abstract procedure p is the
narrowest such ancestor.
4

ParentParameter(p) = min
{p0 ∈ P |Ancestor(p0 , p)}
∗
≺

The optimizations described in section 2 can now be formulated in terms of the
properties and relations enumerated above.
Parent-slot, closure-pointer–slot, parent-parameter, parent-passing, and
parent-spilling elimination: The parent slot and spilling for x are eliminated
when ¬HasParentSlot(x). The closure-pointer slot, parent parameter, and parent passing for x are eliminated when ¬HasParentParameter(x).
Eliminating indirection through the closure pointer: A bound access to x
can proceed via its variable parameter rather than via the closure pointer if
¬Assigned(x).
Eliminating fictitious variables: The slot, parameter, passing, and spilling for x
are eliminated when Fictitious(β(x)).
Eliminating variables that aren’t accessed: The slot, parameter, passing, and
spilling for x are eliminated when ¬Accessed(x).
Variable-slot elimination: The slot and spilling for x are eliminated when
¬Slotted(x).
Closure elimination, closure-pointer elimination, and parent-slot compression: The closure and closure-pointer for p are eliminated when
¬HasClosure(p). When the parent slot for p is not eliminated, it is compressed
to ParentSlot(p).
Closure-pointer–slot and parent-parameter compression: When the
closure-pointer slot and parent parameter for p are not eliminated, they are compressed to ParentParameter(p).
Globalization: A variable x is globalized when Global(x).
Hiding: A variable x is hidden when Hidden(x).

·

46

Jeffrey Mark Siskind

The soundness of lightweight closure conversion is summarized by the following
theorem:
Theorem 1. For all variables x,
a. at most one of Local(x), Global(x), Hidden(x), and Slotted(x) are true,
b. if there is a nontrivial access to x, then one of Local(x), Global(x),
Hidden(x), or Slotted(x) is true,
c. if Local(x), then x is localizable,
d. if Global(x), then x is globalizable,
e. if Hidden(x), then x is hideable, and
f. if Slotted(x), then HasClosure(p(x)).
For all nontrivial references e, if Slotted(x(e)) and p(x(e)) 6= p(e), then
g. HasParentParameter(p(e)) and
h. there exists i ≥ 0 such that
i. for all 0 ≤ j < i,
HasParentSlot(ParentSlotj (ParentParameter(p(e)))) and
j. p(x(e)) = ParentSloti (ParentParameter(p(e))).
The approximations given in sections 3.12 through 3.19 are circular. To resolve
this circularity, Stalin uses prioritized circumscription [McCarthy 1980]. It first
finds a solution to the approximations with a set of hidden variables such that there
is no solution to the approximations with a superset of that set of hidden variables.
Fixing this set of hidden variables, it then finds a solution to the approximations
with a set of abstract objects that are fictitious and a set of pairs of procedures p1
and p2 for which p1 is not an ancestor of p2 such that there is no solution to
the approximations with a superset of those fictitious variables and those pairs of
procedures p1 and p2 for which p1 is not an ancestor of p2 .
4.

EXPERIMENTAL RESULTS

The lightweight closure-conversion method described in this paper has been implemented as part of the Stalin compiler13 for Scheme. As originally implemented,
lightweight closure conversion is a necessary component of Stalin. In order to
evaluate the lightweight closure-conversion method while keeping all other aspects
of the compilation process invariant, Stalin was modified to allow it to use two
other closure-conversion methods. The first, the baseline method, is as described
in section 2. In this method, there is no variable, parameter, slot, passing, spilling,
closure, or closure-pointer elimination, no slot or parameter compression, no globalization, and no hiding. This method, however, still uses direct procedure calls and
eliminates indirection through the closure pointer for free references, though not for
bound accesses of unassigned variables. This is an overly conservative method. The
second, the conventional method, is an attempt to model common practice in other
non–whole-program compilers for Scheme and similar lexically-scoped higher-order
13 The

version of the compiler used to run the experiments described in this paper, as well as all
of the benchmark code, is available from ftp://ftp.nj.nec.com/pub/qobi/stalin-0.9.tar.Z.

Flow-Directed Lightweight Closure Conversion

·

47

languages. In this method, the only optimizations performed are those that can be
driven by simple syntactic analysis of individual top-level procedures without flow
analysis.
More precisely, the baseline method was implemented by taking all of the properties and relations as defined in section 3 with the following exceptions:
4

Reached(e) = true
4

Returns(e) = true
4

Fictitious(β) = false
4

Accessed(x) = true
4

Assigned(x) = true
4

Localizable(x) = false
4

Globalizable(x) = false
4

Hideable(x) = false
And the conventional method was implemented by taking all of the properties and
relations as defined in section 3 with the following exceptions:
4

Reached(e) = true
4

Returns(e) = true
4

Fictitious(β) = false
4

Accessed(x) = x is defined at the top level ∨ (∃e ∈ A)x(e) = x
4

Assigned(x) = x is defined at the top level ∨ (∃e ∈ S)x(e) = x


x(e) = x ∧ p(e) 6= p(x(e))∧
4
Localizable(x) = ¬(∃e ∈ A ∪ S)
NonTrivialReference(e)


4
x is bound by a let expression∧
Globalizable(x) =
every surrounding expression is also a let
4

Hideable(x) = false
The above definitions for Reached, Accessed, and Assigned were used only
during closure conversion. The full definition of these properties, along with the
full whole-program interprocedural flow analysis, was used for all other analyses,
optimization, and code generation performed by the compiler.
The code generator needed some modification to support the baseline and conventional methods. As originally implemented, the code generator never generated
slots, elements, parameters, locals, or globals for fictitious locations. Because such
locations were always eliminated under the lightweight method. With the baseline
and conventional methods, however, fictitious locations are not eliminated. Thus
the code generator was modified so that dummy c ints were used wherever a
fictitious object was needed.

48

·

Jeffrey Mark Siskind

Twenty Scheme benchmarks14 were compiled using each of the baseline, conventional, and lightweight methods. All runs were done on one processor of an unloaded
Dell PowerEdge 6350 with four 450MHz Pentium II Xeon processors with 512K L2
cache and 1G main memory running Red Hat 6.0 with kernel 2.2.10/SMP and
egcs 2.91.66. Tables II, III, and IV give the variable-, procedure-, and referenceelimination statistics for these runs respectively. All entries are static counts except
for the last column in table IV which gives run time in CPU seconds. The first,
second, and third lines for each benchmark show the results for the baseline, conventional, and lightweight methods respectively. Table II shows the total number of
variables, the number of variables for which Accessed(x) or Assigned(x) is false
or Fictitious(β(x)) is true, the number of variables that are eliminated, and the
number of variables for which Local(x), Global(x), Hidden(x), and Slotted(x)
are true. Table III shows the total number of procedures, the number of procedures
for which HasClosure(p), HasParentSlot(p), and HasParentParameter(p)
are true, and the amount of parent parameter and slot compression, the number
of procedures for which HasParentSlot(p) and HasParentParameter(p) is
true but for which ParentSlot(p) and ParentParameter(p) do not equal p(p)
respectively. Table IV shows the average number of indirections (i.e. c -> operators) per reference, the total number of accesses, the number of nontrivial accesses,
the total number of assignments, the number of nontrivial assignments, and the
user+system CPU time for the benchmarks as reported by the time command.
Note that the variable, procedure, and reference counts include variables, procedures, and references in the standard prelude as well as those that are introduced
by macro expanding the source program, except where such variables, procedures,
and references are eliminated by a simple syntax-directed dead-code–elimination
prepass.
These results show some important trends. First, flow-directed reachability analysis allows a substantial increase in the number of variables that are determined
to be unassigned or unaccessed. Second, in all of the benchmarks, the lightweight
method eliminates many more variables, accesses, and assignments and substantially reduces the number of slotted variables, the number of procedures that require
closures, parent parameters, and parent slots, and the average number of indirections per reference, as compared with both the baseline and conventional methods.
In half of the benchmarks, the lightweight method eliminates all slotted variables
and, with that, all closures, parent parameters, parent slots, and indirection during variable reference. Third, in all of the benchmarks, the code produced by the
lightweight method runs substantially faster than the code produced by both the
baseline and conventional methods.
Table V compares the run times for these twenty benchmarks compiled with
Stalin using the lightweight method as compared with Scheme->C, Gambit-C,
14 The

version of boyer that was used was obtained from the Scheme repository. Andrew Wright
provided the versions of graphs and lattice that were used. Saumya Debray provided the versions
of nucleic2, matrix, earley, scheme, and conform that were used. William Clinger provided the
versions of nboyer, sboyer, and dynamic that were used. Bruno Haible provide the version of
fannkuch that was used. Richard O’Keefe provided the versions of integ, gold, and sort that
were used. The simplex, em-functional, em-imperative, and nfm benchmarks were written by
the author.

·

Flow-Directed Lightweight Closure Conversion

49

Table II. Variable-elimination statistics for twenty Scheme benchmarks. The first, second, and
third row for each benchmark give the statistics for the baseline, conventional, and lightweight
methods respectively. All entries are static counts.
vars
boyer

graphs

lattice

nucleic2

matrix

earley

scheme

conform

nboyer

sboyer

dynamic

fannkuch

simplex

em-functional

em-imperative

nfm

integ

gold

sort

rrr

2193
2193
2193
2473
2473
2473
2258
2258
2258
3212
3212
3212
2617
2617
2617
2899
2899
2899
2928
2928
2928
2711
2711
2711
2380
2380
2380
2384
2384
2384
7822
7822
7822
2158
2158
2158
2240
2240
2240
3628
3628
3628
2807
2807
2807
4481
4481
4481
2104
2104
2104
2185
2185
2185
2192
2192
2192
2755
2755
2755

not
assigned

not
accessed

fictitious

eliminated

local

global

hidden

slotted

0
1315
1937
0
1589
2209
0
1380
1998
0
2055
2673
0
1711
2329
0
1962
2579
0
1951
2438
0
1737
2324
0
1480
2087
0
1483
2090
0
6513
7071
0
1287
1907
0
1346
1970
0
2582
3190
0
1820
2426
0
3347
3932
0
1243
1859
0
1310
1926
0
1318
1935
0
1812
2426

0
393
2056
0
536
2209
0
384
2043
0
658
2450
0
417
2067
0
556
2228
0
704
1248
0
528
2051
0
432
2041
0
433
2042
0
905
2169
0
388
2072
0
441
2145
0
716
2282
0
604
2168
0
733
2150
0
360
2017
0
381
2038
0
388
2049
0
547
2175

0
0
2106
0
0
2260
0
0
2116
0
0
2658
0
0
2254
0
0
2357
0
0
1763
0
0
2290
0
0
2149
0
0
2151
0
0
3932
0
0
2098
0
0
2174
0
0
2666
0
0
2358
0
0
3123
0
0
2045
0
0
2077
0
0
2110
0
0
2331

0
393
2109
0
536
2306
0
384
2126
0
658
2724
0
417
2261
0
556
2366
0
704
1815
0
528
2297
0
432
2153
0
433
2155
0
905
3978
0
388
2103
0
441
2180
0
716
2672
0
604
2366
0
733
3156
0
360
2047
0
381
2083
0
388
2119
0
547
2343

0
953
80
0
977
142
0
992
117
0
1286
406
0
1144
307
0
1104
530
0
950
1014
0
1108
388
0
1004
219
0
1006
221
0
4510
3698
0
934
53
0
938
50
0
1368
944
0
1062
429
0
2043
1315
0
939
52
0
961
97
0
950
71
0
1188
375

0
240
4
0
247
16
0
251
4
0
473
65
0
274
10
0
244
2
0
489
14
0
309
12
0
287
8
0
288
8
0
460
59
0
234
2
0
239
10
0
325
12
0
304
12
0
274
10
0
237
4
0
246
5
0
244
2
0
262
36

0
0
0
0
0
2
0
0
1
0
0
2
0
0
3
0
0
0
0
0
0
0
0
2
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0

2193
607
0
2473
713
7
2258
631
10
3212
795
15
2617
782
36
2899
995
1
2928
785
85
2711
766
12
2380
657
0
2384
657
0
7822
1947
87
2158
602
0
2240
622
0
3628
1219
0
2807
837
0
4481
1431
0
2104
568
1
2185
597
0
2192
610
0
2755
758
1

50

·

Jeffrey Mark Siskind

Table III. Procedure-elimination statistics for twenty Scheme benchmarks. The first, second, and third row for each benchmark give the statistics for the baseline, conventional, and
lightweight methods respectively. All entries are static counts.

boyer

graphs

lattice

nucleic2

matrix

earley

scheme

conform

nboyer

sboyer

dynamic

fannkuch

simplex

em-functional

em-imperative

nfm

integ

gold

sort

rrr

procedures

closures

parent
slots

parent
parameters

parent
slot
compression

parent
parameter
compression

2285
2285
2285
2351
2351
2351
2181
2181
2181
2868
2868
2868
2533
2533
2533
2640
2640
2640
2773
2773
2773
2626
2626
2626
2466
2466
2466
2467
2467
2467
6830
6830
6830
2109
2109
2109
2249
2249
2249
3458
3458
3458
2761
2761
2761
4170
4170
4170
2045
2045
2045
2100
2100
2100
2139
2139
2139
2600
2600
2600

1132
450
0
1330
536
6
1195
476
4
1427
583
7
1432
582
24
1536
676
1
1404
573
49
1414
582
5
1243
503
0
1243
503
0
4656
1463
87
1133
460
0
1201
490
0
2097
920
0
1543
662
0
2548
1077
0
1092
438
1
1132
452
0
1153
466
0
1452
559
1

34
22
0
109
87
4
47
30
2
137
68
1
136
101
15
241
218
0
444
326
9
214
156
0
109
44
0
109
44
0
845
535
27
43
35
0
67
62
0
437
343
0
273
233
0
554
431
0
38
26
0
54
41
0
43
35
0
160
126
0

110
82
0
270
240
32
143
111
14
462
297
30
368
315
110
489
443
9
1495
1313
103
576
432
40
319
216
0
319
216
0
3532
2238
94
105
89
0
179
167
0
1071
948
0
639
585
0
1613
1432
0
100
79
1
134
110
0
124
110
0
432
388
1

6
2
0
35
33
2
8
2
0
27
25
0
30
23
8
53
42
0
163
120
0
31
22
0
27
18
0
27
18
0
112
86
0
11
10
0
26
25
0
59
51
0
75
69
0
98
90
0
7
7
0
16
15
0
7
7
0
47
42
0

31
23
0
83
77
25
43
31
9
64
58
23
96
80
65
121
95
8
629
567
54
155
102
25
85
66
0
85
66
0
810
643
5
30
27
0
78
76
0
212
196
0
187
177
0
348
336
0
24
20
0
42
36
0
31
30
0
149
141
0

Flow-Directed Lightweight Closure Conversion

·

51

Table IV. Reference-elimination statistics for twenty Scheme benchmarks. The first, second, and third row for each benchmark give the statistics for the baseline, conventional, and
lightweight methods respectively. All entries are static counts, except for the last column which
is in CPU seconds.

boyer

graphs

lattice

nucleic2

matrix

earley

scheme

conform

nboyer

sboyer

dynamic

fannkuch

simplex

em-functional

em-imperative

nfm

integ

gold

sort

rrr

average
indirections
per reference

accesses

nontrivial
accesses

assignments

nontrivial
assignments

1.147
0.357
0.000
1.690
1.038
0.044
1.164
0.385
0.026
1.316
0.314
0.006
1.425
0.616
0.065
1.862
1.384
0.003
1.743
0.742
0.020
1.336
0.555
0.032
1.257
0.428
0.000
1.257
0.426
0.000
1.322
0.473
0.008
1.267
0.478
0.000
3.529
2.233
0.000
1.770
0.933
0.000
2.300
1.361
0.000
1.651
0.843
0.000
1.162
0.298
0.001
1.448
0.706
0.000
1.187
0.440
0.000
2.804
1.201
0.000

4402
4402
4402
4604
4604
4604
4311
4311
4311
7659
7659
7659
5280
5280
5280
5246
5246
5246
5823
5823
5823
5144
5144
5144
4898
4898
4898
4908
4908
4908
16602
16602
16602
3995
3995
3995
4289
4289
4289
6505
6505
6505
5239
5239
5239
8282
8282
8282
3938
3938
3938
4160
4160
4160
4114
4114
4114
5319
5319
5319

4402
4402
212
4604
4604
501
4311
4311
281
7659
7659
1147
5280
5280
729
5246
5246
1113
5823
5823
3280
5144
5144
981
4898
4898
630
4908
4908
638
16602
16602
6555
3995
3995
146
4289
4289
334
6505
6505
2109
5239
5239
1271
8282
8282
2400
3938
3938
167
4160
4160
351
4114
4114
204
5319
5319
1109

483
483
483
478
478
478
472
472
472
751
751
751
500
500
500
531
531
531
572
572
572
573
573
573
514
514
514
515
515
515
945
945
945
465
465
465
500
500
500
642
642
642
584
584
584
728
728
728
455
455
455
469
469
469
470
470
470
545
545
545

263
263
13
264
264
1
260
260
4
539
539
50
288
288
2
320
320
3
524
524
48
392
392
20
313
312
32
314
313
32
793
793
106
251
251
1
282
282
23
440
440
16
384
384
25
549
549
7
245
245
0
259
259
3
258
258
4
337
337
13

run
time
149.740
106.450
94.820
73.470
47.560
15.270
332.790
239.490
53.480
28.340
20.570
17.510
488.060
257.770
100.380
84.120
42.660
22.720
444.220
257.750
235.630
527.740
289.690
198.320
35.310
27.540
23.700
33.130
23.880
17.250
57.670
31.950
28.430
525.030
511.530
52.750
15.320
11.190
2.210
70.440
30.920
7.360
18.130
9.400
2.410
18.600
15.500
5.030
4.390
3.540
1.460
16.990
6.110
5.520
60.650
31.780
12.500
748.320
493.390
143.100

52

·

Jeffrey Mark Siskind

Bigloo, and Chez. For all but three of the benchmarks, Stalin significantly
outperforms the other compilers. Note that many of the compilers were run with
options that violate R4RS semantics. In particular, Scheme->C was run with
-On for many of the benchmarks, Gambit-C was run with standard-bindings
and extended-bindings for all of the benchmarks and with fixnum for many of
the benchmarks, and Bigloo was run with -farithmetic for many of the benchmarks. This, in essence, manually gives these compilers information that can only
be determined by whole-program interprocedural flow analysis. Information that
Stalin soundly and automatically determines on its own. When Scheme->C,
Gambit-C, and Bigloo are run without these options, the ratio of run times relative to Stalin are even more favourable to Stalin than the results presented in
table V.
5.

CONCLUSION

A number of researchers have pursued work along the lines of the work presented
in this paper. Deutsch [1990], Emami and Hendren [1994], and Jagannathan et al.
[1998] present aliasing analyses that are based on flow analysis. Henglein [1992],
Wand and Steckler [1994], Steckler [1994b], Steckler [1994c], Steckler [1994a], Minamide et al. [1996], and Steckler and Wand [1997] present alternate approaches
to lightweight closure conversion. The techniques of Steckler and Wand, in particular, are incomparable to the techniques presented in this paper. Cases can be
constructed where the approach of Steckler and Wand performs optimizations that
the approach presented in this paper does not perform, and vice versa.
Whole program analysis, in particular flow, reachability, points-to, and escape
analysis, when used to support the lightweight closure-conversion method described
in this paper, offers significant reduction in
—variable parameters and variable slots,
—parent parameters and parent slots,
—closures, closure pointers, and closure-pointer slots,
—variable parameter and parent parameter spilling,
—variable parameter and parent parameter passing, and
—indirection in variable reference.
Furthermore, it results in substantial improvements in compiled-code speed.
REFERENCES
Clinger, W. and Rees, J. 1991. Revised4 Report on the Algorithmic Language Scheme.
Deutsch, A. 1990. On determining lifetime and aliasing of dynamically allocated data in higherorder functional specifications. In Proceedings of the 17th Annual ACM Symposium on Principles of Programming Languages. 157–168.
Emami, M. and Hendren, R. G. L. J. 1994. Context-sensitive interprocedural points-to analysis
in the presence of function pointers. In Proceedings of the 1994 SIGPLAN Conference on
Programming Language Design and Implementation. 242–256.
Heintze, N. 1993. Set based analysis of ml programs. Tech. Rep. CMU CS 93-193, School of
Computer Science, Carnegie-Mellon University. July.
Heintze, N. 1994. Set based analysis of ml programs. In Proceedings of the 1994 ACM Conference
on Lisp and Functional Programming.

Flow-Directed Lightweight Closure Conversion

·

53

Table V. The ratio of CPU times to run the twenty benchmarks under Scheme->C
(15mar93), Gambit-C (3.0), Bigloo (2.0e), and Chez (5.0a) relative to Stalin (0.9). All
runs were done on one processor of an unloaded Dell PowerEdge 6350 with four 450MHz
Pentium II Xeon processors with 512K L2 cache and 1G main memory running Red Hat 6.0
with kernel 2.2.10/SMP and egcs 2.91.66. All compilers were run with settings to give
maximal speed at the expense of minimal safety. Scheme->C was run with -Ob -Og -Ot
-O3 -fomit-frame-pointer -freg-struct-return.
Scheme->C was given the additional
option -On for the boyer, graphs, lattice, matrix, earley, scheme, conform, nboyer,
sboyer, dynamic, fannkuch, and rrr benchmarks. Gambit-C was run with r4rs-scheme,
block, standard-bindings, extended-bindings, not safe, not interrupts-enabled, and
-O3 -fomit-frame-pointer -freg-struct-return -D___SINGLE_HOST. Gambit-C was given
the additional option fixnum for the same benchmarks that Scheme->C was given the
-On option.
Bigloo was run with -call/cc -unsafe -Obench -O6 -fstack -copt "-O3
-fomit-frame-pointer -freg-struct-return".
Bigloo was given the additional option
-farithmetic for the same benchmarks that Scheme->C was given the -On option. Chez
was run with optimize-level 3. Stalin was run with -Ob -Om -On -Or -Ot -copt -O3
-copt -fomit-frame-pointer -copt -freg-struct-return. Stalin was given the additional
options
-clone-size-limit 0 -do-not-index-allocated-structure-types-by-expression
-do-not-index-constant-structure-types-by-expression -treat-all-symbols-as-external
for the scheme benchmark, the additional option
-do-not-index-allocated-structure-types-by-expression for the dynamic benchmark, and
the additional option -d for those benchmarks that were not given -On under Scheme->C.
Times are relative best-of-three-successive user+system CPU times as reported by the time
command. Runs where Stalin performs worse than other implementations are highlighted in
bold.
Scheme->C
Gambit-C
Bigloo
Chez
Stalin
Stalin
Stalin
Stalin
boyer
1.611
4.095
1.331
2.225
graphs
3.742
8.764
18.112
2.036
lattice
2.493
3.123
1.740
1.795
nucleic2
12.391
44.851
9.962
8.420
matrix
1.773
2.948
1.545
1.889
earley
1.106
3.668
1.454
1.844
scheme
0.555
0.816
0.345
0.422
conform
1.204
2.765
1.025
1.544
nboyer
3.527
4.145
5.437
3.972
sboyer
1.228
2.406
1.963
1.647
dynamic
0.366
1.620
0.479
0.518
fannkuch
1.104
1.075
0.742
3.677
simplex
3.789
30.426
5.108
4.441
em-functional
6.993
18.503
4.868
3.758
em-imperative
6.680
24.768
6.755
5.141
nfm
3.645
163.917
6.683
7.950
integ
17.828
44.953
15.485
9.931
gold
13.578
47.564
12.758
8.391
sort
3.410
12.467
2.449
1.569
rrr
1.289
10.804
1.020
5.010

54

·

Jeffrey Mark Siskind

Henglein, F. 1992. Simple closure analysis. Tech. Rep. D-193, Department of Computer Science,
University of Copenhagen. Mar.
Jagannathan, S., Thiemann, P., Weeks, S. T., and Wright, A. K. 1998. Single and loving
it: Must-alias analysis for higher-order languages. In Proceedings of the 1998 Symposium on
Principles of Programming Languages.
Kelsey, R., Clinger, W., and Rees, J. 1998. Revised5 report on the algorithmic language
Scheme. Higher-Order and Symbolic Computation 11, 1 (Sept.).
McCarthy, J. 1980. Circumscription—a form of non-monotonic reasoning. Artificial Intelligence 13, 1–2, 27–39.
Minamide, Y., Morrisett, J. G., and Harper, R. 1996. Typed closure conversion. In Proceedings
of the 1996 Symposium on Principles of Programming Languages. 271–283.
Shivers, O. 1988. Control flow analysis in Scheme. In Proceedings of the 1988 SIGPLAN
Conference on Programming Language Design and Implementation. 164–174.
Shivers, O. 1990. Data-flow analysis and type recovery in Scheme. In Topics in Advanced
Language Implementation. The MIT Press. Appears as Technical Report CMU-CS-90-115.
Shivers, O. 1991a. Control-flow analysis of higher-order languages or taming lambda. Ph.D.
thesis, School of Computer Science, Carnegie-Mellon University.
Shivers, O. 1991b. The semantics of Scheme control-flow analysis. In ACM Symposium on Partial
Evaluation and Semantics-Based Program Manipulation. 190–198. Appears as Technical Report
CMU-CS-91-119.
Siskind, J. M. 2000a. Flow-directed lightweight CPS conversion. In preparation.
Siskind, J. M. 2000b. Flow-directed polyvariance. In preparation.
Siskind, J. M. 2000c. Flow-directed representation selection. In preparation.
Siskind, J. M. 2000d. Flow-directed storage management. In preparation.
Steckler, P. A. 1994a. Correct higher-order program transformations. Ph.D. thesis, College of
Computer Science, Northeastern University.
Steckler, P. A. 1994b. Correct separate and selective closure conversion. Tech. rep., Laboratory
for the Foundations of Computer Science, Edinburgh University. Oct.
Steckler, P. A. 1994c. Tracking available values for lightweight closures. Tech. rep., College of
Computer Science, Northeastern University. Mar.
Steckler, P. A. and Wand, M. 1997. Lightweight closure conversion. ACM Transactions on
Programming Languages and Systems 19, 1 (Jan.), 48–86.
Wand, M. and Steckler, P. A. 1994. Selective and lightweight closure conversion. In Proceedings
of the 21nd Annual ACM Symposium on Principles of Programming Languages. 435–445.

Flow-Directed Lightweight Closure Conversion

A.

·

GLOSSARY

A.1

Syntactic terminology
source
callee
argument
parameters
arity
body
antecedent
consequent
alternate
x
q
e
p
p0
X
E
A
S
C
Ci
R
Ri
P
Source(e)
x(e)
Callee(e)
Arguments(e)
Argumenti (e)
Parameteri (e)
Arity(e)
Body(e)
p(x)
p(e)

p1 ≺ p2
p1 ≺+ p2
p1 ≺ ∗ p2
InTailPosition(e)

the first subexpression of an assignment
the first subexpression of a call or primcall
any subexpressions of a call or primcall except the first
the variables of a lambda expression
the number of arguments of a call or primcall or the
number of parameters of a lambda expression
the subexpression of a lambda expression
the first subexpression of a conditional
the second subexpression of a conditional
the third subexpression of a conditional
a variable
a primitive
an expression
a lambda expression, also an abstract procedure
the top-level lambda expression that contains the
entire source program
the set of all variables in the source program
the set of all expressions in the source program
the set of all accesses in the source program
the set of all assignments in the source program
the set of all calls in the source program
the set of all calls of arity i in the source program
the set of all primcalls in the source program
the set of all primcalls of arity i in the source program
the set of all lambda expressions in the source program
the source subexpression of an assignment e
the variable in an access or assignment e
the callee subexpression of a call or primcall e
the set of argument subexpressions of a call or
primcall e
the ith argument subexpression of a call or primcall e
the ith parameter of a lambda expression e
the arity of a call, primcall, or lambda expression e
the body subexpression of a lambda expression e
the lambda expression in which x is bound
the narrowest lambda expression that properly
contains e
p1 is directly nested in p2
p1 is properly nested in p2
p1 is nested in p2
e is in tail position

55

56

·

A.2

Flow analysis terminology

Jeffrey Mark Siskind

k
l
σ
β
k∈l
σ∈β
β(x)
β(e)
Pair?(σ)
Car(σ)
Cdr(σ)
String?(σ)
String-Ref(σ)
Vector?(σ)
Vector-Ref(σ)
Symbol?(σ)
Symbol->String(σ)

Continuation?(σ)
e(σ)

a concrete object
a concrete location
an abstract object
an abstract location
l can take on k as its value
some l ∈ β can take on some k ∈ σ as its value in
some execution of the program
the abstract location associated with x
the abstract location associated with the result of
evaluating e
σ is an abstract pair
the abstract location containing the car slots of the
concrete pairs in the abstract pair σ
the abstract location containing the cdr slots of the
concrete pairs in the abstract pair σ
σ is an abstract string
the abstract location containing the elements of the
concrete strings in the abstract string σ
σ is an abstract vector
the abstract location containing the elements of the
concrete vectors in the abstract vector σ
σ is an abstract symbol
the abstract location containing the print-name
strings of the concrete symbols in the abstract
symbol σ
σ is an abstract continuation
the call to call/cc where the concrete continuations
in the abstract continuation σ were created

Flow-Directed Lightweight Closure Conversion

A.3

·

Reachability analysis terminology

ê
x̂
Reached(ê)
Returns(ê)
Executed(ê)
Accessed(x̂)
Assigned(x̂)
successful

Reached(e)
Returns(e)
Executed(e)

Accessed(e)
Assigned(e)

an invocation of e
an instance of x
control flows to the program point just before ê
control flows to the program point just after ê
control flows to the intermediate program point in an
assignment, call, or primcall invocation ê
some access invocation ê to x is reached
some assignment invocation ê to x is reached
the call invocation ê is executed and the arity of the call site
equals the arity of the target procedure or continuation or
the primcall invocation ê is executed and the arity of the call
site is allowed by the primitive
some invocation ê of e is reached in some execution of
the program
some invocation ê of e returns in some execution of
the program
some invocation ê of an assignment, call, or
primcall e is executed in some execution of the
program
some instance x̂ of x is accessed in some execution of
the program
some instance x̂ of x is assigned in some execution of
the program

57

58

·

A.4

Concrete aggregate access and assignment terminology—I

Jeffrey Mark Siskind

TypeTagAccesses(ê, k)
Eq?Accesses(ê, k)
CarAccesses(ê, k)
CdrAccesses(ê, k)
StringLengthAccesses(ê, k)
StringRefAccesses(ê, k)
VectorLengthAccesses(ê, k)
VectorRefAccesses(ê, k)
SymbolToStringAccesses(ê, k)

ContinuationAccesses(ê, k)
ProcedureAccesses(ê, k)
CarAssigns(ê, k)
CdrAssigns(ê, k)
StringRefAssigns(ê, k)
VectorRefAssigns(ê, k)

the primcall invocation ê accesses the
type tag slot of k
the primcall invocation ê accesses the
identity of k
the primcall invocation ê accesses the
car slot of the concrete pair k
the primcall invocation ê accesses the
cdr slot of the concrete pair k
the primcall invocation ê accesses the
length slot of the concrete string k
the primcall invocation ê accesses an
element of the concrete string k
the primcall invocation ê accesses the
length slot of the concrete vector k
the primcall invocation ê accesses an
element of the concrete vector k
the primcall invocation ê accesses the
print-name–string slot of the concrete
symbol k
the call invocation ê calls the concrete
continuation k
the call invocation ê successfully calls
the concrete procedure k
the primcall invocation ê assigns the
car slot of the concrete pair k
the primcall invocation ê assigns the
cdr slot of the concrete pair k
the primcall invocation ê assigns an
element of the concrete string k
the primcall invocation ê assigns an
element of the concrete vector k

Flow-Directed Lightweight Closure Conversion

A.5

·

Concrete aggregate access and assignment terminology—II
TypeTagAccessed(k)
Eq?Accessed(k)
CarAccessed(k)
CdrAccessed(k)
StringLengthAccessed(k)
StringRefAccessed(k)
VectorLengthAccessed(k)
VectorRefAccessed(k)

SymbolToStringAccessed(k)
ContinuationAccessed(k)
ProcedureAccessed(k)
CarAssigned(k)
CdrAssigned(k)
StringRefAssigned(k)
VectorRefAssigned(k)

the type-tag slot of k is accessed
the identity of k is accessed
the car slot of the concrete pair k is
accessed
the cdr slot of the concrete pair k is
accessed
the length slot of the concrete string k
is accessed
an element of the concrete string k is
accessed
the length slot of the concrete vector k
is accessed
an element of the concrete vector k is
accessed
the print-name–string slot of the concrete
symbol k is accessed
the concrete continuation k is called
the concrete procedure k is called
the car slot of the concrete pair k is
assigned
the cdr slot of the concrete pair k is
assigned
an element of the concrete string k is
assigned
an element of the concrete vector k is
assigned

59

60

·

A.6

Abstract aggregate access and assignment terminology—I

Jeffrey Mark Siskind

TypeTagAccesses(e, σ)
Eq?Accesses(e, σ)
CarAccesses(e, σ)

CdrAccesses(e, σ)

StringLengthAccesses(e, σ)

StringRefAccesses(e, σ)

VectorLengthAccesses(e, σ)

VectorRefAccesses(e, σ)

SymbolToStringAccesses(e, σ)

ContinuationAccesses(e, σ)
ProcedureAccesses(e, σ)

CarAssigns(e, σ)

CdrAssigns(e, σ)

StringRefAssigns(e, σ)

VectorRefAssigns(e, σ)

some invocation ê of the primcall e
accesses the type-tag slot of some k ∈ σ
some invocation ê of the primcall e
accesses the identity of some k ∈ σ
some invocation ê of the primcall e
accesses the car slot of some concrete
pair k ∈ σ
some invocation ê of the primcall e
accesses the cdr slot of some concrete
pair k ∈ σ
some invocation ê of the primcall e
accesses the length slot of some concrete
string k ∈ σ
some invocation ê of the primcall e
accesses an element of some concrete
string k ∈ σ
some invocation ê of the primcall e
accesses the length slot of some concrete
vector k ∈ σ
some invocation ê of the primcall e
accesses an element of some concrete
vector k ∈ σ
some invocation ê of the primcall e
accesses the print-name–string slot of
some concrete symbol k ∈ σ
some invocation ê of the call e calls
some concrete continuation k ∈ σ
some invocation ê of the call e
successfully calls some concrete
procedure k ∈ σ
some invocation ê of the primcall e
assigns the car slot of some concrete
pair k ∈ σ
some invocation ê of the primcall e
assigns the cdr slot of some concrete
pair k ∈ σ
some invocation ê of the primcall e
assigns an element of some concrete
string k ∈ σ
some invocation ê of the primcall e
assigns an element of some concrete
vector k ∈ σ

Flow-Directed Lightweight Closure Conversion

A.7

·

61

Abstract aggregate access and assignment terminology—II
TypeTagAccessed(σ)
Eq?Accessed(σ)
CarAccessed(σ)
CdrAccessed(σ)
StringLengthAccessed(σ)
StringRefAccessed(σ)
VectorLengthAccessed(σ)
VectorRefAccessed(σ)

SymbolToStringAccessed(σ)
ContinuationAccessed(σ)
ProcedureAccessed(σ)
CarAssigned(σ)
CdrAssigned(σ)
StringRefAssigned(σ)
VectorRefAssigned(σ)

the type-tag slot of some k ∈ σ is accessed
the identity of some k ∈ σ is accessed
the car slot of some concrete pair k ∈ σ is
accessed
the cdr slot of some concrete pair k ∈ σ is
accessed
the length slot of some concrete
string k ∈ σ is accessed
an element of some concrete string k ∈ σ is
accessed
the length slot of some concrete
vector k ∈ σ is accessed
an element of some concrete vector k ∈ σ is
accessed
the print-name–string slot of some concrete
symbol k ∈ σ is accessed
some concrete continuation k ∈ σ is called
some concrete procedure k ∈ σ is called
the car slot of some concrete pair k ∈ σ is
assigned
the cdr slot of some concrete pair k ∈ σ is
assigned
an element of some concrete string k ∈ σ is
assigned
an element of some concrete vector k ∈ σ is
assigned

62

·

A.8

Concrete call-graph terminology—I

Jeffrey Mark Siskind

Called(k)
ê  k
ê t k
ê t k
ê st k
ê st k
k1  k2
k1  t k2
k1  t k2
k1 st k2
k1 st k2

the concrete procedure k is called
the call invocation ê directly calls the concrete procedure k
the call invocation ê directly tail calls the concrete
procedure k
the call invocation ê directly non-tail calls the concrete
procedure k
the call invocation ê directly self-tail calls the concrete
procedure k
the call invocation ê directly non-self-tail calls the concrete
procedure k
the concrete procedure k1 directly calls the concrete
procedure k2
the concrete procedure k1 directly tail calls the concrete
procedure k2
the concrete procedure k1 directly non-tail calls the concrete
procedure k2
the concrete procedure k1 directly self-tail calls the concrete
procedure k2
the concrete procedure k1 directly non-self-tail calls the concrete
procedure k2

Flow-Directed Lightweight Closure Conversion

A.9

·

Concrete call-graph terminology—II
+ k
ê 
+
ê 
t k

+
ê 
t k

+
ê 
st k

+
ê 
st k

+ k
k1 
2

+
k1 
t k2

+
k1 
t k2

+
k1 
st k2

+
k1 
st k2

∗
k1 
k2
∗
k1 
t k2
∗
k1 
t k2

∗
k1 
st k2

∗
k1 
st k2

the call invocation ê properly calls the concrete procedure k
the call invocation ê properly tail calls the concrete
procedure k
the call invocation ê properly non-tail calls the concrete
procedure k
the call invocation ê properly self-tail calls the concrete
procedure k
the call invocation ê properly non-self-tail calls the concrete
procedure k
the concrete procedure k1 properly calls the concrete
procedure k2
the concrete procedure k1 properly tail calls the concrete
procedure k2
the concrete procedure k1 properly non-tail calls the concrete
procedure k2
the concrete procedure k1 properly self-tail calls the concrete
procedure k2
the concrete procedure k1 properly non-self-tail calls the concrete
procedure k2
the concrete procedure k1 calls the concrete procedure k2
the concrete procedure k1 tail calls the concrete procedure k2
the concrete procedure k1 non-tail calls the concrete
procedure k2
the concrete procedure k1 self-tail calls the concrete
procedure k2
the concrete procedure k1 non-self-tail calls the concrete
procedure k2

63

64

A.10

·

Jeffrey Mark Siskind

Abstract call-graph terminology—I
Called(p)
ep
e t p
e t p
e st p
e st p
p 1  p2
p1  t p2
p1  t p2
p1 st p2
p1 st p2

some
some
some
some
some
some
some
some
some
some
some
some
some
some
some
some
some
some
some
some
some

concrete procedure k ∈ p is called
invocation e of the call e directly calls
concrete procedure k ∈ p
invocation e of the call e directly tail calls
concrete procedure k ∈ p
invocation e of the call e directly non-tail calls
concrete procedure k ∈ p
invocation e of the call e directly self-tail calls
concrete procedure k ∈ p
invocation e of the call e directly non-self-tail calls
concrete procedure k ∈ p
concrete procedure k1 ∈ p1 directly calls
concrete procedure k2 ∈ p2
concrete procedure k1 ∈ p1 directly tail calls
concrete procedure k2 ∈ p2
concrete procedure k1 ∈ p1 directly non-tail calls
concrete procedure k2 ∈ p2
concrete procedure k1 ∈ p1 directly self-tail calls
concrete procedure k2 ∈ p2
concrete procedure k1 ∈ p1 directly non-self-tail calls
concrete procedure k2 ∈ p2

Flow-Directed Lightweight Closure Conversion

A.11

·

Abstract call-graph terminology—II
+ p
e

+
e
t p

+
e
t p

+
e
st p

+
e
st p

+ p
p1 
2

+
p1 
t p2

+
p1 
t p2

+
p1 
st p2

+
p1 
st p2

∗
p1 
p2

∗
p1 
t p2

∗
p1 
t p2

∗
p1 
st p2

∗
p1 
st p2

some invocation e of the call e properly calls some
concrete procedure k ∈ p
some invocation e of the call e properly tail calls some
concrete procedure k ∈ p
some invocation e of the call e properly non-tail calls
some concrete procedure k ∈ p
some invocation e of the call e properly self-tail calls
some concrete procedure k ∈ p
some invocation e of the call e properly non-self-tail calls
some concrete procedure k ∈ p
some concrete procedure k1 ∈ p1 properly calls some
concrete procedure k2 ∈ p2
some concrete procedure k1 ∈ p1 properly tail calls some
concrete procedure k2 ∈ p2
some concrete procedure k1 ∈ p1 properly non-tail calls
some concrete procedure k2 ∈ p2
some concrete procedure k1 ∈ p1 properly self-tail calls
some concrete procedure k2 ∈ p2
some concrete procedure k1 ∈ p1 properly non-self-tail calls
some concrete procedure k2 ∈ p2
some concrete procedure k1 ∈ p1 calls some concrete
procedure k2 ∈ p2
some concrete procedure k1 ∈ p1 tail calls some concrete
procedure k2 ∈ p2
some concrete procedure k1 ∈ p1 non-tail calls some
concrete procedure k2 ∈ p2
some concrete procedure k1 ∈ p1 self-tail calls some
concrete procedure k2 ∈ p2
some concrete procedure k1 ∈ p1 non-self-tail calls some
concrete procedure k2 ∈ p2

65

·

66

A.12

Jeffrey Mark Siskind

Called-more-than-once, points-to, escape, in-lining, and reentrancy terminology

CalledMoreThanOnce(k)
CalledMoreThanOnce(p)
FreeIn(x, p)
k;l
l;k
k1 ∗; k2
k ∗; l
l ∗; k
l1 ∗; l2
σ;β
β;σ
σ1 ∗; σ2
σ ∗; β
β ∗; σ
β1 ∗; β2
k ⇑ ê
σ⇑e
σ⇑p
UniqueCallSite(e, p)
p1 ,→ p2
p1 ,→∗ p2
Reentrant(k)
Reentrant(p)

the concrete procedure k is called more than
once
some concrete procedure k ∈ p is called more
than once
x is free in p
k directly points to l
l directly points to k
k1 points to k2
k points to l
l points to k
l1 points to l2
some k ∈ σ directly points to some l ∈ β
some l ∈ β directly points to some k ∈ σ
some k1 ∈ σ1 points to some k2 ∈ σ2
some k ∈ σ points to some l ∈ β
some l ∈ β points to some k ∈ σ
some l1 ∈ β1 points to some l2 ∈ β2
k escapes ê
some k ∈ σ escapes some invocation ê
of e in some execution of the program
σ escapes Body(p)
the call e is the unique call site of p
p1 is directly in-lined in p2
p1 is in-lined in p2
the concrete procedure k is reentrant
some concrete procedure k ∈ p is reentrant
in some execution of the program

Flow-Directed Lightweight Closure Conversion

A.13

·

67

Internal lightweight closure-conversion terminology

MustAlias(x)

MustAlias(σ)

MustAlias(p)

Fictitious(l)
Fictitious(β)
NonTrivialReference(ê)

NonTrivialReference(e)
Localizable(x)
Globalizable(x)
Ancestor(p1 , p2 )
Hideable(x)

every reached access invocation ê to x accesses
the instance x̂ bound by the most recent active
invocation of some concrete procedure k ∈ p(x)
for every successful call invocation ê to some
concrete continuation k ∈ σ, k was created by
the most recent active invocation of e(σ)
either p has no parent parameter or for every
successful call invocation ê to some concrete
procedure k ∈ p, the most active invocation of
ParentParameter(p) when k is called is the
same as the most active invocation of
ParentParameter(p) when k was created
l always contains the same concrete object
every l ∈ β is fictitious in every execution of
the program
an access invocation e is reached and returns an
object that is not known at compile time or an
assignment invocation e is executed, x(e) is not
hidden, and there is a subsequent nontrivial
access to x̂(ê)
some invocation ê of the reference e is
nontrivial in some execution of the program
x must alias and all nontrivial references to x
are in-lined in the procedure that binds x
x can have at most one live instance
some reached expression in p2 accesses a pointer
to the closure for p1
the abstract location of x contains a single
abstract procedure p, for all nontrivial
accesses e to x, p(e) is nested in every ancestor
of p, and p must alias

·

68

A.14

Jeffrey Mark Siskind

Output lightweight closure-conversion terminology

Local(x)
Global(x)
Hidden(x)
Slotted(x)
HasClosure(p)
HasParentSlot(p)
ParentSlot(p)
HasParentParameter(p)
ParentParameter(p)

x will be allocated as a local variable
x will be allocated as a global variable
x will be allocated as a hidden closure slot
x will be allocated as a closure slot
p will have a closure and closure pointer
p will have a parent slot
The parent slot for p will point to the closure
for ParentSlot(p)
p will have a parent parameter and closurepointer slot
The parent parameter and closure-pointer slot
for p will point to the closure for
ParentParameter(p)