Compiling with Continuations, Continued

Andrew Kennedy
Microsoft Research Cambridge
akenn@microsoft.com

Abstract

We present a series of CPS-based intermediate languages suitable for functional language compilation, arguing that they have practical benefits over direct-style languages based on A-normal form (ANF) or monads. Inlining of functions demonstrates the benefits most clearly: in ANF-based languages, inlining involves a renormalization step that rearranges let expressions and possibly introduces a new ‘join point’ function, and in monadic languages, commuting conversions must be applied; in contrast, inlining in our CPS language is a simple substitution of variables for variables. We present a contification transformation implemented by simple rewrites on the intermediate language. Exceptions are modelled using so-called ‘double-barrelled’ CPS. Subtyping on exception constructors then gives a very straightforward effect analysis for exceptions. We also show how a graph-based representation of CPS terms can be implemented extremely efficiently, with linear-time term simplification.

Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors – Compilers

General Terms Languages

Keywords Continuations, continuation passing style, monads, optimizing compilation, functional programming languages

1. Introduction

Compiling with continuations is out of fashion. So report the authors of two classic papers on Continuation-Passing Style in recent retrospectives:

“In 2002, then, CPS would appear to be a lesson abandoned.” (McKinley 2004; Shivers 1988)

“Yet, compiler writers abandoned CPS over the ten years following our paper anyway.” (McKinley 2004; Flanagan et al. 1993)

This paper argues for a reprieve for CPS: “Compiler writers, give continuations a second chance.” This conclusion is borne of practical experience.
In the MLj and SML.NET whole-program compilers for Standard ML, co-implemented by the current author, we adopted a direct-style, monadic intermediate language (Benton et al. 1998, 2004b). In part, we were interested in effect-based program transformations, so monads were a natural choice for separating computations from values in both terms and types. But, given the history of CPS, probably there was also a feeling that “CPS is for call/cc”, something that is not a feature of Standard ML. Recently, the author has re-implemented all stages of the SML.NET compiler pipeline to use a CPS-based intermediate language. Such a change was not undertaken lightly, amounting to roughly 25,000 lines of replaced or new code. There are many benefits: the language is smaller and more uniform, simplification of terms is more straightforward and extremely efficient, and advanced optimizations such as contification are more easily expressed. We use CPS only because it is a good place to do optimization; we are not interested in first-class control in the source language (call/cc), or as a means of implementing other features such as concurrency.

[ICFP’07, October 1–3, 2007, Freiburg, Germany. Copyright © 2007 ACM. Author’s version.]
Indeed, as SML.NET targets .NET IL, a call-stack-based intermediate language with support for structured exception handling, the compilation process can be summarized as “transform direct style (SML) into CPS; optimize CPS; transform CPS back to direct style (.NET IL)”.

1.1 Some history

CPS. What’s special about CPS? As Appel (1992, p2) put it, “Continuation-passing style is a program notation that makes every aspect of control flow and data flow explicit”. An important consequence is that full β-reduction (function inlining) is sound. In contrast, for call-by-value languages based on the lambda calculus, only the weaker β-value rule is sound. For example, β-reduction cannot be applied to (λx.0) (f y) because f y may have a side-effect or fail to terminate; but its CPS transform, f y (λz.(λx.λk.k 0) z k), can be reduced without prejudice. There are obvious drawbacks: the complexity of CPS terms; the need to eliminate administrative redexes introduced by the CPS transformation; and the cost of allocating closures for lambdas introduced by the CPS transformation, unless some static analysis is first applied. In fact, these drawbacks are more apparent than real: the complexity of CPS terms is really a benefit, assigning useful names to all intermediate computations and control points; the CPS transformation can be combined with administrative reduction; and by employing a syntactic separation of continuation- and source-lambdas it is possible to generate good code directly from CPS terms.

ANF. In their influential paper “The Essence of Compiling with Continuations”, Flanagan et al. (1993) observed that “fully developed CPS compilers do not need to employ the CPS transformation but can achieve the same results with a simple source-level transformation”. They proposed a direct-style intermediate language based on A-normal forms, in which a let construct assigns names to every intermediate computation.
For example, the term above is represented as let z = f y in (λx.0) z, to which β-reduction can be applied, obtaining the semantically equivalent let z = f y in 0. This style of language has become commonplace, not only in compilers, but also to simplify the study of semantics for impure functional languages (Pitts 2005, §7.4).

Monads. Very similar to ANF are so-called monadic languages based on Moggi’s computational lambda calculus (Moggi 1991). Monads also make sequencing of computations explicit through a let x ⇐ M in N binding construct, the main difference from ANF being that let constructs can themselves be let-bound. The separation of computations from values also provides a place to hang effect annotations (Wadler and Thiemann 1998) which compilers can use to perform effect-based optimizing transformations (Benton et al. 1998).

1.2 The problem

A-Normal Form is put forward as a compiler intermediate language with all the benefits of CPS (Flanagan et al. 1993, §6). Unfortunately, the normal form is not preserved under useful compiler transformations such as function inlining (β-reduction). Consider the ANF term

M ≡ let x = (λy.let z = a b in c) d in e.

Now naive β-reduction produces let x = (let z = a b in c) in e, which is not in normal form. The ‘fix’ is to define a more complex notion of β-reduction that re-normalizes let constructs (Sabry and Wadler 1997), in this case producing the normal form let z = a b in (let x = c in e).

In contrast, the CPS transform of M, namely (λy.λk.a b (λz.k c)) d (λx.k e), simplifies by simple β-reduction to a b (λz.(λx.k e) c). As Sabry and Wadler explain in their study of the relationship between CPS and monadic languages, “the CPS language achieves this normalization using the metaoperation of substitution which traverses the CPS term to locate k and replace it by the continuation thus effectively ‘pushing’ the continuation deep inside the term” (Sabry and Wadler 1997, §8).

Monadic languages permit let expressions to be nested, but incorporate so-called commuting conversions (cc’s) such as

let y ⇐ (let x ⇐ M in N) in P → let x ⇐ M in (let y ⇐ N in P).

ANF can be seen as a monadic language in which β-reduction is combined with cc-normalization, ensuring that terms remain in cc-normal form. All of the above seems quite benign, except for two things:

1. Commuting conversions increase the complexity of simplifying intermediate language terms. Reductions that strictly decrease the size of the term can be applied exhaustively on CPS terms, the number of reductions applied being linear in the size of the term. The equivalent ANF or monadic reductions must necessarily involve commuting conversions, which leads to O(n²) reductions in the worst case. Moreover, as Appel and Jim (1997) have shown, given a suitable term representation, shrinking reductions on CPS can be applied in time O(n); it is far from clear how to amortize the cost of commuting conversions to obtain a similar measure for ANF or monadic simplification.

2. Real programming languages include conditional expressions, or, more generally, case analysis on datatype constructors. These add considerable complexity to reductions on ANF or monadic terms. Consider the term

let z = (λx.if x then a else b) c in M.

This is in ANF, but β-reduction produces let z = (if c then a else b) in M, which is not in normal form because it contains a let-bound conditional expression. To reduce it to normal form, one must either apply a standard commuting conversion that duplicates the term M, producing

if c then let z = a in M else let z = b in M,

or introduce a ‘join-point’ function for term M, to give

let k z = M in if c then let z = a in k z else let z = b in k z.

Observe that k is simply a continuation! In our CPS language, k is already available in the original term, being the (named) continuation that is passed to the function to be inlined.
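The soundness of full β-reduction in CPS, versus its unsoundness in direct-style call-by-value, can be checked concretely. Below is a small Python sketch (the function names and the logging are our own illustration, not from the paper): β-reducing the direct-style term (λx.0) (f y) to 0 would discard f’s effect, whereas the CPS transform f y (λz.(λx.λk.k 0) z k) and its fully β-reduced form f y (λz.k 0) behave identically.

```python
log = []

def f_cps(y, k):
    """CPS version of an effectful function f: logs, then passes y + 1 to k."""
    log.append("f ran")
    return k(y + 1)

def term(y, k):
    # CPS transform of (lambda x. 0)(f y):
    #   f y (lambda z. (lambda x. lambda k. k 0) z k)
    return f_cps(y, lambda z: (lambda x: lambda k2: k2(0))(z)(k))

def term_reduced(y, k):
    # After full beta-reduction: f y (lambda z. k 0).
    # The call to f, and hence its effect, is preserved.
    return f_cps(y, lambda z: k(0))
```

Because the application of f is sequenced before the continuation is invoked, the reduction cannot lose the side effect or change termination behaviour.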
The desire to share subterms almost forces some kind of continuation construct into the language. Better to start off with a language that makes continuations explicit.

1.3 Contribution

Much of the above has been said before by others, though not always in the context of compilation; in this author’s opinion, the most illuminating works are Appel (1992); Danvy and Filinski (1992); Hatcliff and Danvy (1994); Sabry and Wadler (1997). One contribution of this paper, then, is to draw together these observations in a form accessible to implementers of functional languages. As is often the case, the devil is in the details, and so another purpose of this paper is to advocate a certain style of CPS that works very smoothly for compilation. Continuations are named and mandatory (just as every intermediate value is named, so is every control point), are second-class (they’re not general lambdas), can represent basic blocks and loops, can be shared (typically, through common continuations of branches), represent exceptional control flow (using double-barrelled CPS), and are typeable (but can be used in untyped form too). By refining the types of exception values in the double-barrelled variant we get an effect system for exceptions ‘for free’.

We make two additional contributions. Following Appel and Jim (1997), we describe a graph-based representation of CPS terms that supports the application of shrinking β-reductions in time linear in the size of the term. We improve on Appel and Jim’s selective use of back pointers for accessing variable binders, and employ the union-find data structure to give amortized near-constant-time access to binders for all variable occurrences. This leads to efficient implementation of η-reductions and other transformations.
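The union-find idea can be sketched quickly. In the Python fragment below (ours, not the compiler’s actual code), each variable occurrence starts in its own class; renaming one variable to another merges their classes, and find with path compression gives the amortized near-constant-time access to the class representative, which in the compiler would carry the pointer to the binder:

```python
class UnionFind:
    """Disjoint sets with union by rank and path compression,
    giving amortized near-constant-time find."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make(self, x):
        # Each new variable occurrence starts as its own representative.
        self.parent.setdefault(x, x)
        self.rank.setdefault(x, 0)

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:      # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        # Substituting one variable for another = merging their classes.
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return rx
```

One caveat (our assumption, not spelled out here): in the compiler the merge must be directed so that the surviving representative is the one holding the binder information; plain union-by-rank, as above, would need that refinement.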
We present benchmark results comparing our graph-CPS representation with (a) an earlier graphical representation of the original monadic language used in our compiler, and (b) the original functional representation of that language. Lastly, we show how to transform functions into local continuations using simple term rewriting rules. This approach to contification avoids the need for a global dominator analysis (Fluet and Weeks 2001), and furthermore supports nested and first-class functions.

2. Untyped CPS

We start by defining an untyped continuation-passing language λU_CPS that supports non-recursive functions, the unit value, pairs, and tagged values. Even for such a simple language, we can cover many of the issues and demonstrate advantages over alternative, direct-style languages. In Section 3, we add recursive functions, types, polymorphism, exceptions, and effect annotations. At that point, the language resembles a practical CPS-based intermediate language of the sort that could form the core of a compiler for SML, Caml, or Scheme.

Figure 1 presents the syntax of the untyped language. Here ordinary variables are ranged over by x, y, f, and g, and continuation variables are ranged over by k and j. Indices i range over 1, 2. We specify scoping of variables using well-formedness rules for values and terms. Here Γ ⊢ V ok means that value V is well-formed in the scope of a list of ordinary variables Γ, and Γ; ∆ ⊢ K ok means that term K is well-formed in the scope of a list of continuation variables ∆ and a list of ordinary variables Γ. Complete programs are well-formed in the context of a distinguished top-level continuation halt. (For the typed variant of our language there will be typing rules with Γ and ∆ generalized to typing contexts.)

Grammar
(terms)  CTm ∋ K, L ::= letval x = V in K | let x = πi y in K | letcont k x = K in L | k x | f k x | case x of k1 8 k2
(values) CVal ∋ V, W ::= () | (x, y) | ini x | λk x.K

Well-formed terms Γ; ∆ ⊢ K ok:
(let)   Γ ⊢ V ok and Γ, x; ∆ ⊢ K ok imply Γ; ∆ ⊢ letval x = V in K ok
(proj)  x ∈ Γ, Γ, y; ∆ ⊢ K ok and i ∈ {1, 2} imply Γ; ∆ ⊢ let y = πi x in K ok
(letc)  Γ, x; ∆ ⊢ K ok and Γ; ∆, k ⊢ L ok imply Γ; ∆ ⊢ letcont k x = K in L ok
(appc)  k ∈ ∆ and x ∈ Γ imply Γ; ∆ ⊢ k x ok
(app)   k ∈ ∆ and f, x ∈ Γ imply Γ; ∆ ⊢ f k x ok
(case)  x ∈ Γ and k1, k2 ∈ ∆ imply Γ; ∆ ⊢ case x of k1 8 k2 ok

Well-formed values Γ ⊢ V ok:
(unit)  Γ ⊢ () ok
(pair)  x, y ∈ Γ implies Γ ⊢ (x, y) ok
(tag)   x ∈ Γ and i ∈ {1, 2} imply Γ ⊢ ini x ok
(abs)   Γ, x; k ⊢ K ok implies Γ ⊢ λk x.K ok

Well-formed programs:
(prog)  {}; halt ⊢ K ok

Figure 1. Syntax and scoping rules for untyped language λU_CPS

We describe the constructs of the language in turn.

• The expression letval x = V in K binds a value V to a variable x in the term K. This is the only way a value V can be used in a term; arguments to functions, case scrutinees, components of pairs, and so on, must all be simple variables. Even the unit value () must be bound to a variable before being used (in the full language, the same holds even for constants such as 42). This means that there is no need for a general notion of substitution: we only substitute variables for variables. Notice also that there is no notion of redundant binding such as let x ⇐ y in K.

• Values include the unit value (), pairs (x, y) and tagged values ini x. Function values λk x.K include a return continuation k and argument x. Note carefully the well-formedness rule (abs): its continuation context includes only the return continuation k, thus enforcing locality of continuations introduced by letcont.

• The expression let x = πi y in K projects the i’th component of a pair y and binds it to variable x in K.

• The expression letcont k x = K in L introduces a local continuation k whose single argument is x and whose body is K, to be used in term L. It corresponds to a labelled block in traditional lower-level representations. In Section 3 we extend local continuations with support for recursion, and so represent loops directly.

• A continuation application k x corresponds to a jump (if k is a local continuation) or a return (if k is the return continuation of a function value). As with values, continuations must be named: function application expressions and case constructs do not have subterms, but instead mention continuations by name. We need only ever substitute continuation variables for continuation variables. Local continuations can be applied more than once, as in

letcont j y = K in
letcont k1 x1 = (letval x = V1 in j x) in
letcont k2 x2 = (letval x = V2 in j x) in
case z of k1 8 k2.

Here j is the common continuation, or ‘join point’, for branches k1 and k2.

• The expression f k x is the application of a function f to an argument x and a continuation k whose parameter receives the result of applying the function. If k is the return continuation for the nearest enclosing λ, then the application is a ‘tail call’. For example, consider the function value λk x.(letcont j y = g k y in f j x). Here g is in tail position, and f is not. In effect, we are defining λx.g(f(x)).

• The construct case x of k1 8 k2 expects x to be bound to a tagged value ini y and then dispatches to the appropriate continuation ki, passing y as argument.

The semantics is given by environment-style evaluation rules, presented in Figure 2. As is conventional, we define a syntax of run-time values, ranged over by r, supporting the unit value, pairs, constructor applications, and closures. Environments map variables to run-time values, and continuation variables to continuation values. Continuation values are represented in a closure form, which gives the impression that they are first-class. An alternative would be to model stack frames more directly and thereby demonstrate that continuations are in fact just code pointers.
For the purpose of simply defining the meaning of programs we prefer the closure-based semantics. The function J·Kρ interprets a value expression in an environment ρ. Terms are evaluated in an environment ρ; the only observations that we can make of programs are termination, i.e. the application of the top-level continuation halt to a unit value.

Runtime values:      r ::= () | (r1, r2) | ini r | ⟨ρ, λk x.K⟩
Continuation values: c ::= ⟨ρ, λx.K⟩
Environments:        ρ ::= • | ρ, x ↦ r | ρ, k ↦ c

Interpretation of values:
J()Kρ = ()
J(x, y)Kρ = (ρ(x), ρ(y))
Jini xKρ = ini(ρ(x))
Jλk x.KKρ = ⟨ρ, λk x.K⟩

Evaluation rules:
(e-let)  ρ ⊢ letval x = V in K ⇓ if ρ, x ↦ JV Kρ ⊢ K ⇓
(e-letc) ρ ⊢ letcont k x = K in L ⇓ if ρ, k ↦ ⟨ρ, λx.K⟩ ⊢ L ⇓
(e-proj) ρ ⊢ let y = πi x in K ⇓ if ρ(x) = (r1, r2) and ρ, y ↦ ri ⊢ K ⇓
(e-appc) ρ ⊢ k x ⇓ if ρ(k) = ⟨ρ′, λy.K⟩ and ρ′, y ↦ ρ(x) ⊢ K ⇓
(e-case) ρ ⊢ case x of k1 8 k2 ⇓ if ρ(x) = ini r, ρ(ki) = ⟨ρ′, λy.K⟩ and ρ′, y ↦ r ⊢ K ⇓
(e-app)  ρ ⊢ f k x ⇓ if ρ(f) = ⟨ρ′, λj y.K⟩ and ρ′, j ↦ ρ(k), y ↦ ρ(x) ⊢ K ⇓
(e-halt) ρ ⊢ halt x ⇓

Figure 2. Evaluation rules for λU_CPS

2.1 CPS transformation

To illustrate how the CPS-based language can be used for functional language compilation, consider a fragment of Standard ML whose expressions (ranged over by e) have the following syntax:

ML ∋ e ::= x | e e′ | fn x => e | (e, e′) | #i e | () | ini e
        | let val x = e in e′ end
        | case e of in1 x1 => e1 | in2 x2 => e2

We assume a datatype declared by

datatype (’a,’b) sum = in1 of ’a | in2 of ’b

Expressions in this language can be translated into untyped CPS terms using the function shown in Figure 3. This is an adaptation of the standard higher-order one-pass call-by-value transformation (Danvy and Filinski 1992). An alternative, first-order, transformation is described by Danvy and Nielsen (2003). The transformation works by taking a translation-time function κ as argument, representing the ‘context’ into which the translation of the source term is embedded. For our language, the context’s argument is a variable, as all intermediate results are named. Note some conventions used in Figure 3: translation-time lambda abstraction is written z.… and translation-time application is written κ(…), to distinguish from λ and juxtaposition used to denote lambda abstraction and application in the target language. Also note that any object variables present in the target terms but not in the source are assumed fresh with respect to all other bound variables.

J·K : ML → (Var → CTm) → CTm
JxK κ = κ(x)
J()K κ = letval x = () in κ(x)
Je1 e2K κ = Je1K (z1. Je2K (z2. letcont k x = κ(x) in z1 k z2))
J(e1,e2)K κ = Je1K (z1. Je2K (z2. letval x = (z1, z2) in κ(x)))
Jini eK κ = JeK (z. letval x = ini z in κ(x))
J#i eK κ = JeK (z. let x = πi z in κ(x))
Jfn x => eK κ = letval f = λk x.JeK (z. k z) in κ(f)
Jlet val x = e1 in e2 endK κ = letcont j x = Je2K κ in Je1K (z. j z)
Jcase e of in1 x1 => e1 | in2 x2 => e2K κ =
    JeK (z. letcont k1 x1 = Je1K κ in
            letcont k2 x2 = Je2K κ in
            case z of k1 8 k2)

Figure 3. Naive CPS transformation of toy ML into λU_CPS

The translation is one-pass in the sense that it introduces no ‘administrative reductions’ (here, β-redexes for continuations) that must be removed in a separate phase, except for let constructs (to avoid these also would require analysis of the let expression; we prefer to apply simplifying rewrites on the output of the transformation). However, the translation is naive in two ways. First, it introduces η-redexes for continuations when translating tail function applications. For example, Jfn x => f (x,y)K κ produces

letval g = λk x.(letval p = (x, y) in letcont j z = k z in f j p) in κ(g)

whose η-redex, the subterm letcont j z = k z, can be eliminated to obtain the more compact

letval g = (λk x.letval p = (x, y) in f k p) in κ(g).

Second, the translation of case duplicates the context; consider, for example, f (case x of in1 x1 => e1 | in2 x2 => e2), whose translation involves two calls to f. The more sophisticated translation scheme of Figure 4 avoids both these problems; again, this is based on Danvy and Filinski (1992). The translation function J·K is as before, except (a) it introduces a join point continuation to avoid context duplication for case, and (b) for terms in tail position it uses an alternative translation function L·M that takes an explicit continuation variable as argument instead of a context.

2.2 Rewrites

After translating from source language to intermediate language, most functional language compilers perform a number of optimization phases that are implemented as transformations on intermediate language terms. Some phases are specific (for example, arity-raising of functions, or hoisting expressions out of loops) but usually there is some set of general rewrites based on standard reductions in the lambda-calculus. Figure 5 presents some general rewrites for our CPS-based language. The rewrites look more complicated than the equivalent reductions in the lambda-calculus because the naming of intermediate values forces introduction and elimination forms apart. For example, β-reduction on pairs, which in the lambda calculus is simply πi(e1, e2) → ei, has to support an intervening context C. In practice, the rewrites are not hard to implement. In functional style, value bindings (e.g. pairs) are stored in an environment which is accessed at the reduction site (e.g. a projection). In imperative style, bindings are accessed directly through pointers, as we shall see in Section 4.1. The payoff from this style of rewrite is the selective use of β rules.
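The higher-order one-pass transformation of Figure 3 is short enough to transcribe directly. The Python sketch below is our own rendering, not the compiler’s code: source terms are tagged tuples, target terms are plain strings, fresh names x0, x1, … come from a shared counter, and the translation-time context κ is an ordinary Python function.

```python
import itertools

fresh = (f"x{i}" for i in itertools.count())   # fresh target-language variables

def cps(e, kappa):
    """Naive one-pass CPS transform in the style of Figure 3 (our transcription).
    Source terms: ('var', x), ('unit',), ('app', e1, e2), ('pair', e1, e2),
    ('fn', x, e).  kappa maps a target variable name to a target term string."""
    tag = e[0]
    if tag == 'var':
        return kappa(e[1])
    if tag == 'unit':
        x = next(fresh)
        return f"letval {x} = () in {kappa(x)}"
    if tag == 'app':
        def ctx1(z1):
            def ctx2(z2):
                x, k = next(fresh), next(fresh)
                # letcont k x = kappa(x) in z1 k z2
                return f"letcont {k} {x} = {kappa(x)} in {z1} {k} {z2}"
            return cps(e[2], ctx2)
        return cps(e[1], ctx1)
    if tag == 'pair':
        def ctx1(z1):
            def ctx2(z2):
                x = next(fresh)
                return f"letval {x} = ({z1}, {z2}) in {kappa(x)}"
            return cps(e[2], ctx2)
        return cps(e[1], ctx1)
    if tag == 'fn':
        f, k = next(fresh), next(fresh)
        body = cps(e[2], lambda z: f"{k} {z}")
        return f"letval {f} = λ{k} {e[1]}.{body} in {kappa(f)}"
    raise ValueError(tag)
```

For instance, translating the application f y with top-level context halt yields a term of the shape letcont k x = halt x in f k y, with k and x freshly generated, just as the Je1 e2K clause of Figure 3 prescribes.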
For example, in a lambda-calculus extended with a let construct, one might perform the reduction let p = (x, y) in M → M[(x, y)/p], but this would be undesirable unless every substitution of (x, y) for p in M produced a redex. In our language,

letval p = (x, y) in … k p … let z = π1 p in K

reduces to

letval p = (x, y) in … k p … K[x/z]

which applies the β-PAIR rule to π1 p but preserves other occurrences of p. It is easy to show that all rewrites preserve well-formedness of terms. In particular, the scoping of local continuations is respected. The β-FUN and β-CONT reductions are inlining transformations for functions and continuations. The remainder of the reductions we call shrinking reductions, as they strictly decrease the size of terms (Appel and Jim 1997). The β-CONT-LIN and β-FUN-LIN reductions are special cases of β-reduction for linear uses of a variable, in effect combining DEAD- and β-reductions.

J·K : ML → (Var → CTm) → CTm
Jfn x => eK κ = letval f = λk x.LeM k in κ(f)
Jlet val x = e1 in e2 endK κ = letcont j x = Je2K κ in Le1M j
Jcase e of in1 x1 => e1 | in2 x2 => e2K κ =
    JeK (z. letcont j x = κ(x) in
            letcont k1 x1 = Le1M j in
            letcont k2 x2 = Le2M j in
            case z of k1 8 k2)

L·M : ML → CVar → CTm
LxM k = k x
Le1 e2M k = Je1K (x1. Je2K (x2. x1 k x2))
Lfn x => eM k = letval f = λj x.LeM j in k f
L(e1,e2)M k = Je1K (x1. Je2K (x2. letval x = (x1, x2) in k x))
Lini eM k = JeK (z. letval x = ini z in k x)
L()M k = letval x = () in k x
L#i eM k = JeK (z. let x = πi z in k x)
Llet val x = e1 in e2 endM k = letcont j x = Le2M k in Le1M j
Lcase e of in1 x1 => e1 | in2 x2 => e2M k =
    JeK (z. letcont k1 x1 = Le1M k in
            letcont k2 x2 = Le2M k in
            case z of k1 8 k2)

Figure 4. Tail CPS transformation (changes and additions only shown)

Contexts:
C ::= [] | letval x = V in C | let x = πi y in C | letval x = λk x.C in K | letcont k x = C in K | letcont k x = K in C

DEAD-CONT   letcont k x = K in L → L  (k not free in L)
DEAD-VAL    letval x = V in K → K  (x not free in K)
β-CONT      letcont k x = K in C[k y] → letcont k x = K in C[K[y/x]]
β-FUN       letval f = λk x.K in C[f j y] → letval f = λk x.K in C[K[y/x, j/k]]
β-CASE      letval x = ini y in C[case x of k1 8 k2] → letval x = ini y in C[ki y]
β-PAIR      letval x = (x1, x2) in C[let y = πi x in K] → letval x = (x1, x2) in C[K[xi/y]]
β-CONT-LIN  letcont k x = K in C[k y] → C[K[y/x]]  (if k not free in C)
β-FUN-LIN   letval f = λk x.K in C[f j y] → C[K[y/x, j/k]]  (f ≠ y, f not free in C)
η-CONT      letcont k x = j x in K → K[j/k]
η-FUN       letval f = λk x.g k x in K → K[g/f]
η-PAIR      let xi = πi x in C[let xj = πj x in C′[letval y = (x1, x2) in K]] →
            let xi = πi x in C[let xj = πj x in C′[K[x/y]]]  ({i, j} = {1, 2})
η-CASE      letcont ki x1 = (letval y1 = ini x1 in k y1) in C[letcont kj x2 = (letval y2 = inj x2 in k y2) in C′[case x of k1 8 k2]] →
            letcont ki x1 = (letval y1 = ini x1 in k y1) in C[letcont kj x2 = (letval y2 = inj x2 in k y2) in C′[k x]]  ({i, j} = {1, 2})

Figure 5. General rewrites for λU_CPS
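As an illustration of how these rewrites operate on named bindings, here is a minimal Python sketch (our own toy representation, assuming all binders are distinct and variable names differ from the tuple tags) of β-PAIR combined with DEAD-VAL:

```python
def free_vars(t):
    """Free variables of a toy CPS term.
    Terms: ('letval', x, v, K)  where v is ('unit',) or ('pair', a, b)
           ('proj', i, y, x, K) for  let x = pi_i y in K
           ('appc', k, x)       for a jump  k x."""
    tag = t[0]
    if tag == 'letval':
        _, x, v, body = t
        fv = free_vars(body) - {x}
        return fv | ({v[1], v[2]} if v[0] == 'pair' else set())
    if tag == 'proj':
        _, _i, y, x, body = t
        return (free_vars(body) - {x}) | {y}
    if tag == 'appc':
        return {t[2]}
    raise ValueError(tag)

def subst(t, x, z):
    """Replace variable x by z (toy version: assumes no shadowing)."""
    if t == x:
        return z
    if isinstance(t, tuple):
        return tuple(subst(s, x, z) for s in t)
    return t

def shrink(t, env=None):
    """beta-PAIR plus DEAD-VAL in one recursive pass.
    env maps variables to the pair values they are letval-bound to."""
    env = env or {}
    tag = t[0]
    if tag == 'letval':
        _, x, v, body = t
        if v[0] == 'pair':
            env = {**env, x: v}
        body = shrink(body, env)
        if x not in free_vars(body):       # DEAD-VAL: binding now unused
            return body
        return ('letval', x, v, body)
    if tag == 'proj':
        _, i, y, x, body = t
        if y in env:                        # beta-PAIR: y is a known pair
            return shrink(subst(body, x, env[y][i]), env)
        return ('proj', i, y, x, shrink(body, env))
    return t
```

Running this on letval p = (x, y) in let z = π1 p in k z rewrites the projection to k x and then removes the dead pair binding; when p has other occurrences, the binding is preserved, exactly as the β-PAIR rule requires.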
Shrinking reductions can be applied exhaustively on a term, and are typically used to ‘clean up’ a term after some special-purpose global transformation such as arity-raising or monomorphisation. Clearly the number of such reductions will be linear in the size of the term; moreover, using the representation of terms described in Section 4 it is possible to perform such reductions in linear time. 2.3 Comparison with a monadic language The original implementations of the MLj and SML.NET compilers used monadic languages inspired by Moggi’s computational lambda calculus (Moggi 1991). Figure 6 presents syntax for a monadic language λmon and selected reduction rules. The defining feature of monadic languages is that sequencing of computations is made explicit through the let construct; values are converted into trivial computations using the val construct. Monadic languages share with CPS languages the property that familiar β-reduction on functions is sound, as evaluation of the function argument is made explicit through let. But there are drawbacks, as we outlined in the Introduction. (An orthogonal issue – as for CPS based languages – is whether values can appear anywhere except inside val. In λmon , for ease of presentation, we permit values to be embedded in applications, pairs, and so on, whereas for λU CPS we insist that they are named. The difference shows up in the reduction rules, which in λU CPS make use of contexts. It should be noted that the drawbacks of monadic languages that we are about to discuss are unaffected by this choice.) Problem 1: need for let/let commuting conversion. The basic reductions listed in Figure 5 have corresponding reductions in CPS. The let construct itself has β and η rules which correspond to β-C ONT and η-C ONT for λU CPS (consider the CPS transforms of the terms). 
In contrast to CPS-based languages, though, monadic Grammar MTm 3 M, N ::= MVal 3 v, w ::= Reductions β-L ET η-L ET CC-L ET let x ⇐ val v in M → M [v/x] let x ⇐ M in val x → M let x2 ⇐ (let x1 ⇐ M1 in M2 ) in N → let x1 ⇐ M1 in (let x2 ⇐ M2 in N ) let x ⇐ (case v of in1 x1 .M1 8 in2 x2 .M2 ) in N → let f ⇐ val λx.N in case v of in1 x1 .let x ⇐ M1 in f x 8 in2 x2 .let x ⇐ M2 in f x CC-C ASE β-PAIR β-F UN β-C ASE val v | let x ⇐ M in N | v w | πi v | case v of in1 x1 .M1 8 in2 x2 .M2 x | λx.M | (v, w) | ini v | () πi (v1 , v2 ) → vi (λx.M ) v → M [v/x] case ini v of in1 x1 .M1 8 in2 x2 .M2 → Mi [v/xi ] Figure 6. Syntax and selected rewrites for monadic language λmon languages include a so-called commuting conversion, expressing associativity for let: CC-L ET let x2 ⇐ (let x1 ⇐ M1 in M2 ) in N → let x1 ⇐ M1 in (let x2 ⇐ M2 in N ) This reduction plays a vital role in exposing further reductions. Consider the source expression #1 ((fn x => (g x,x)) y) Its translation into λmon is let z2 ⇐ (λx.let z1 ⇐ g x in val (z1 , x)) y in π1 z2 . Problem 2: quadratic blowup. The CC-L ET reduction seems innocent enough. But observe that it is not a shrinking reduction – so it’s not immediately clear whether reduction will terminate. Fortunately, the combination of CC-L ET and shrinking β/η-reductions of Figure 6 does terminate (Lindley 2005), and moreover there is a formal correspondence between the reductions of the monadic language and CPS (Hatcliff and Danvy 1994). Unfortunately, the order in which conversions are applied is critical to the efficiency of simplification by reduction. Consider the following term in λmon : let fn ⇐ val (λxn .let yn ⇐ g xn in g yn ) in let fn−1 ⇐ val (λxn−1 .let yn−1 ⇐ fn xn−1 in g yn−1 ) in .. . let f1 ⇐ val (λx1 .let y1 ⇐ f2 x1 in g y1 ) inf1 a If (linear) β-F UN is applied to all functions in this term, followed by a sequence of CC-L ET reductions, then no redexes remain after O(n) reductions. 
If, however, the commuting conversions are interleaved with β-F UN, then O(n2 ) reductions are required. (There are other examples where it is better to apply commuting conversions first.) Although this is a pathological example, the ‘simplifier’ was a major bottleneck in the MLj and SML.NET compilers (Benton et al. 2004a), in part (we believe) because of the need to perform commuting conversions. Solution 2: Use CPS. It is interesting to note that monadic terms can be translated into CPS in linear-time; shrinking reductions can be applied exhaustively there in linear-time (see Section 4); and the term can be translated back into CPS in linear-time. Therefore the quadratic blowup we saw above is not fundamental, and there may be some means of amortizing the cost of commuting conversions so that exhaustive reductions can be peformed in linear time. Nevertheless, it is surely better to have the term in CPS from the start, and enjoy the benefit of linear-time simplification. Now suppose that we apply β-F UN, to get let z2 ⇐ (let z1 ⇐ g y in val (z1 , y)) in π1 z2 . In order to make any further progress, we must use CC-L ET to get let z1 ⇐ g y in let z2 ⇐ val (z1 , y) in π1 z2 . Now we can apply β-L ET and β-PAIR to get let z1 ⇐ g y in z1 which further reduces by η-L ET to g y. Solution 1: Use CPS. Now take the original source expression and translate it into our CPS-based language, with k representing the enclosing continuation. let f = λj1 x. (letcont j2 z1 = (letval z2 = (z1 , x) in j1 z2 ) in g j2 x) in letcont j3 z3 = (let z4 = π1 z3 in k z4 ) in f j3 y Applying β-F UN-L IN gives the following, with substitutions highlighted: letcont j3 z3 = (let z4 = π1 z3 in k z4 ) in letcont j2 z1 = (letval z2 = (z1 , y ) in j3 z2 ) in g j2 y and by β-C ONT -L IN on j3 we get letcont j2 z1 = (letval z2 = (z1 , y) in let z4 = π1 z2 in k z4 ) in g j2 y. Finally, use of β-PAIR and D EAD-VAL produces letcont j2 z1 = k z1 in g j2 y which reduces by η-C ONT to g k y. 
All reductions were simple uses of β and η rules, without the need for the additional ‘administrative’ reduction CC-L ET . Problem 3: need for let/case commuting conversion. Matters become more complicated with conditionals or case constructs. Consider the source expression g 0 (g((fn x => case x of in1 x1 => (x1 ,x3 )| in2 x2 => g 00 x) y)) Its translation into λmon is let z ⇐ (λx.case x of in1 x1 .val (x1 , x3 ) 8 in2 x2 .g 00 x) y in let z 0 ⇐ g z in g 0 z 0 . This reduces by β-F UN to let z ⇐ (case y of in1 x1 .val (x1 , x3 ) 8 in2 x2 .g 00 y) in let z 0 ⇐ g z in g 0 z 0 . At this point, we want to ‘float’ the case expression out of the let. The proof-theoretic commuting conversion that expresses this rewrite is let x ⇐ (case v of in1 x1 .M1 8 in2 x2 .M2 ) in N → case v of in1 x1 .(let x ⇐ M1 in N ) 8 in2 x2 .(let x ⇐ M2 in N ) This can have the effect of exposing more redexes; unfortunately, it also duplicates N which is not so desirable. So instead, compilers typically adopt a variation of this commuting conversion that shares M between the branches, creating a so-called join point function: CC-C ASE let x ⇐ (case v of in1 x1 .M1 8 in2 x2 .M2 ) in N → let f ⇐ val λx.N in case v of in1 x1 .let x ⇐ M1 in f x 8 in2 x2 .let x ⇐ M2 in f x Applying this to our example produces the result let f ⇐ val (λz.let z 0 ⇐ g z in g 0 z 0 ) in case x of in1 x1 .(let z ⇐ val (x1 , x3 ) in f z) 8 in1 x2 .(let z ⇐ g 00 x in f z). As observed earlier, join points such as f are just continuations. Solution 3: Use CPS. Consider the CPS transformation of the original source expression, with k being the enclosing return continuation. letcont j 0 z 0 = g 0 k z 0 in letcont j z = g j 0 z in letval f = λj 00 x. 
(letcont k1 x1 = (letval z'' = (x1, x3) in j'' z'') in
 letcont k2 x2 = g'' j'' x in
 case x of k1 | k2)
in f j y

Applying β-FUN-LIN immediately produces the following term, with substitutions highlighted:

letcont j' z' = g' k z' in
letcont j z = g j' z in
letcont k1 x1 = (letval z'' = (x1, x3) in j z'') in
letcont k2 x2 = g'' j y in
case y of k1 | k2

There is no need to apply anything analogous to CC-CASE, or to introduce a join point: the original term already had one, namely j, which was substituted for the return continuation j'' of the function.

The absence of explicit join points in monadic languages is an annoyance in itself. When join points are represented as ordinary functions, a separate static analysis is needed to determine which functions can be compiled efficiently as basic blocks. Explicitly named local continuations in CPS have the advantage that locality is immediate from the syntax and preserved under transformation; furthermore, traditional intra-procedural compiler optimizations (such as those performed on SSA representations) can be adapted to operate on functions in CPS form.

2.4 Comparison with ANF

Flanagan et al. (1993) propose an alternative to CPS which they call A-Normal Form, or ANF for short. This is defined as the image of the composition of the CPS, administrative normalization and inverse CPS transformations:

[diagram: CS −CPS→ • −β-normalization→ • −un-CPS→ A(CS), with a dotted arrow A directly from CS to A(CS)]

The source language CS is Core Scheme (corresponding to our fragment of ML), and their CPS transformation composed with β-normalization is equivalent to our one-pass transformation J·K of Figure 4. Instead of going via a CPS language, the transformation into ANF can be performed in one pass, as suggested by the dotted line A in the diagram above.¹ A similar transformation has been studied by Danvy (2003). The language A(CS) corresponds precisely to CC-LET/CC-CASE normal forms in λmon. We can express these normal forms by a grammar:

ATm  ∋ A, B ::= R | let x ⇐ R in A | case v of in1 x1.A1 | in2 x2.A2
ACmp ∋ R    ::= v w | πi v | v
AVal ∋ v, w ::= x | λx.A | (v, w) | ini v | ()

¹ Though, curiously, the 'A-normalization algorithm' in (Flanagan et al. 1993, Fig. 9) does not actually normalize terms, as it leaves let-bound conditionals alone.

As Flanagan et al. (1993) suggest, the "back end of an A-normal form compiler can employ the same code generation techniques that a CPS compiler uses". However, as we mentioned in the Introduction, it is not so apparent whether ANF is ideally suited to optimization. After all, it is not even closed under the usual rule for β-reduction (λx.A) v → A[v/x]. As Sabry and Wadler (1997) later explained, it is necessary to combine substitution with re-normalization to get a sound rule for β-reduction: essentially the repeated application of CC-LET. They do not consider conditionals or case constructs, but presumably, to maintain terms in ANF, it is necessary to normalize with respect to CC-LET and CC-CASE following function inlining. It is clear, then, that ANF suffers all the same problems that affect monadic languages: the need for (non-shrinking) commuting conversions, quadratic blowup of 'linear' reductions, and the absence of explicit join points.

3. Typed CPS with exceptions

We now add types and other features to the language of Section 2. In the untyped world, we can model recursion using a call-by-value fixed-point combinator. For a typed language, we must add explicit support for recursive functions – which, in any case, is more practical. Moreover, we would like to express recursive continuations too, in order to represent loops. Finally, to support exceptions, functions in the extended language take two continuations: an exception-handler continuation and a return continuation. This is the so-called double-barrelled continuation-passing style (Thielecke 2002).

Figure 7 presents the syntax and typing rules for the extended language λTCPS. Types of values are ranged over by τ, σ and include unit, a type of exceptions, products, sums and functions. (To save space, we omit constructs for manipulating exception values.) Continuation types have the form ¬τ, interpreted as 'continuations accepting values of type τ'. Note that for simplicity of presentation we do not annotate terms with types; it is an easy exercise to add sufficient annotations to determine unique typing derivations. Typing judgments for values have the form Γ ⊢ V : τ, in which Γ maps variables to value types. Judgments for terms have the form Γ; ∆ ⊢ K ok, in which the additional context ∆ maps continuation variables to continuation types. Complete programs are typed in the context of a single top-level continuation halt accepting unit values.

Grammar:
(value types)   τ, σ ::= unit | exn | τ × σ | τ + σ | τ → σ
(values)        CVal ∋ V, W ::= () | (x, y) | ini x
(terms)         CTm ∋ K, L ::= letval x = V in K | let x = πi y in K
                             | letcont C in K | letfun F in K
                             | k x | f k h x | case x of k1 | k2
(function def.) FunDef ∋ F ::= f k h x = K
(cont. def.)    ContDef ∋ C ::= k x = K

Well-typed values:
(var)   x:τ ∈ Γ  ⟹  Γ ⊢ x : τ
(unit)  Γ ⊢ () : unit
(pair)  Γ ⊢ x : τ   Γ ⊢ y : σ  ⟹  Γ ⊢ (x, y) : τ × σ
(tag)   Γ ⊢ x : τi   i ∈ {1, 2}  ⟹  Γ ⊢ ini x : τ1 + τ2

Continuation variables:
(contvar)  k:¬τ ∈ ∆  ⟹  ∆ ⊢ k : ¬τ

Well-typed terms:
(letv)   Γ ⊢ V : τ   Γ, x:τ; ∆ ⊢ K ok  ⟹  Γ; ∆ ⊢ letval x = V in K ok
(proj)   Γ ⊢ x : τ1 × τ2   Γ, y:τi; ∆ ⊢ K ok   i ∈ {1, 2}  ⟹  Γ; ∆ ⊢ let y = πi x in K ok
(letc)   {Γ, xi:τi; ∆, k1:¬τ1, . . . , kn:¬τn ⊢ Ki ok}1≤i≤n   Γ; ∆, k1:¬τ1, . . . , kn:¬τn ⊢ L ok
         ⟹  Γ; ∆ ⊢ letcont k1 x1 = K1, . . . , kn xn = Kn in L ok
(letrec) {Γ, xi:τi, f1:τ1→σ1, . . . , fn:τn→σn; ki:¬σi, hi:¬exn ⊢ Ki ok}1≤i≤n   Γ, f1:τ1→σ1, . . . , fn:τn→σn; ∆ ⊢ L ok
         ⟹  Γ; ∆ ⊢ letfun f1 k1 h1 x1 = K1, . . . , fn kn hn xn = Kn in L ok
(appc)   Γ ⊢ x : τ   ∆ ⊢ k : ¬τ  ⟹  Γ; ∆ ⊢ k x ok
(app)    Γ ⊢ f : τ → σ   Γ ⊢ x : τ   ∆ ⊢ k : ¬σ   ∆ ⊢ h : ¬exn  ⟹  Γ; ∆ ⊢ f k h x ok
(case)   Γ ⊢ x : τ1 + τ2   ∆ ⊢ k1 : ¬τ1   ∆ ⊢ k2 : ¬τ2  ⟹  Γ; ∆ ⊢ case x of k1 | k2 ok

Well-typed programs:
(prog)  {}; halt:¬unit ⊢ K ok

Figure 7. Syntax and typing rules for typed language λTCPS

We consider each construct in turn.

• The letval construct is as before, with the obvious typing rule and associated value typing rules. Likewise for projections.

• The letcont construct is generalized to support mutually recursive continuations. These represent loops directly. Local continuations are also used for exception handlers.

• The letfun construct introduces a set of mutually recursive functions; each function takes a return continuation k, an exception handler continuation h, and an argument x. As a language construct, there is nothing special about the handler continuation except that its type is fixed to be ¬exn, and so a function type τ → σ is constructed from the argument type τ and the type ¬σ of the return continuation. What really distinguishes exceptions is (a) their role in the translation from source language into CPS, and (b) typical strategies for generating code.

• Continuation application k x is as before. Now there are four possibilities for k: it may be a recursive or a non-recursive occurrence of a letcont-bound continuation, compiled as a jump; it may be the return continuation; or it may be a handler continuation, in which case the application is interpreted as raising an exception.

• Function application f k h x includes a handler continuation argument h. If k is the return continuation for the nearest enclosing function, and h is its handler continuation, then the application is a tail call. If k is a local continuation and h is the handler continuation for the enclosing function, then the application is a non-tail call without an explicit exception handler – so exceptions are propagated to the context. Otherwise, h is an explicit handler for exceptions raised by the function. (Other combinations are possible; for example, in letfun f k h x = C[g h h y] in K the function application is essentially raise (g y) in a tail position.)

• Branching using case is as before.

3.1 CPS transformation

We can extend the fragment of ML described in Section 2.1 with exceptions and recursive functions:

ML ∋ e ::= . . . | raise e | e1 handle x => e2 | let fun d in e end
MLDef ∋ d ::= f x = e

The revised CPS transformation is shown in Figure 8 (see (Kim et al. 1998) for the selective use of a double-barrelled CPS transformation). Both J·K and L·M take an additional argument: a continuation h for the exception handler in scope. Then raise e is translated as an application of h. For e1 handle x => e2, a local handler continuation h' is declared whose body is the translation of e2; this is then used as the handler passed to the translation function for e1.

3.2 Rewrites

The rewrites of Figure 5 can be adapted easily to λTCPS, and extended with transformations such as 'loop unrolling':

β-REC:
letfun f1 k1 h1 x1 = C[fi k h x]
       f2 k2 h2 x2 = K2
       . . .
       fn kn hn xn = Kn
in K
→
letfun f1 k1 h1 x1 = C[Ki[k/ki, h/hi, x/xi]]
       f2 k2 h2 x2 = K2
       . . .
       fn kn hn xn = Kn
in K

β-RECCONT:
letcont k1 x1 = C[ki x]
        k2 x2 = K2
        . . .
        kn xn = Kn
in K
→
letcont k1 x1 = C[Ki[x/xi]]
        k2 x2 = K2
        . . .
        kn xn = Kn
in K

There are no special rewrites for exception handling, e.g. corresponding to (raise M) handle x.N → let x ⇐ M in N. Standard β-reduction on functions and continuations gives us this for free.
For example, the CPS transform of

let fun f x = raise x in f y handle z => (z,z) end

is

letfun f k0 h0 x = h0 x in
letcont j z = (letval z' = (z, z) in k z') in
f k j y

which reduces by β-FUN and β-CONT to letval z' = (y, y) in k z'.

J·K : ML → CVar → (Var → CTm) → CTm
JxK h κ = κ(x)
Je1 e2K h κ = Je1K h (x1. Je2K h (x2. letcont k x = κ(x) in x1 k h x2))
Jfn x => eK h κ = letfun f k h' x = LeM h' k in κ(f)
J(e1,e2)K h κ = Je1K h (x1. Je2K h (x2. letval x = (x1, x2) in κ(x)))
Jini eK h κ = JeK h (z. letval x = ini z in κ(x))
J()K h κ = letval x = () in κ(x)
J#i eK h κ = JeK h (z. let x = πi z in κ(x))
Jlet val x = e1 in e2 endK h κ = letcont j x = Je2K h κ in Le1M h j
Jlet fun d in e endK h κ = letfun JdK in JeK h κ
Jraise eK h κ = JeK h (z. h z)
Je1 handle x => e2K h κ = letcont j x = κ(x) in letcont h' x = Le2M h j in Le1M h' j
Jcase e of in1 x1 => e1 | in2 x2 => e2K h κ =
  JeK h (z. letcont j x = κ(x) in
            letcont k1 x1 = Le1M h j in
            letcont k2 x2 = Le2M h j in
            case z of k1 | k2)

J·K : MLDef → FunDef
Jf x = eK = f k h x = LeM h k

L·M : ML → CVar → CVar → CTm
LxM h k = k x
Le1 e2M h k = Je1K h (x1. Je2K h (x2. x1 k h x2))
Lfn x => eM h k = letfun f j h' x = LeM h' j in k f
L(e1,e2)M h k = Je1K h (x1. Je2K h (x2. letval x = (x1, x2) in k x))
Lini eM h k = JeK h (z. letval x = ini z in k x)
L()M h k = letval x = () in k x
L#i eM h k = JeK h (z. let x = πi z in k x)
Llet val x = e1 in e2 endM h k = letcont j x = Le2M h k in Le1M h j
Llet fun d in e endM h k = letfun JdK in LeM h k
Lraise eM h k = JeK h (z. h z)
Le1 handle x => e2M h k = letcont h' x = Le2M h k in Le1M h' k
Lcase e of in1 x1 => e1 | in2 x2 => e2M h k =
  JeK h (z. letcont k1 x1 = Le1M h k in
            letcont k2 x2 = Le2M h k in
            case z of k1 | k2)

Figure 8.
Tail CPS transformation for λTCPS

Likewise, commuting conversions are not required, in contrast with monadic languages, where in order to define well-behaved conversions it is necessary to generalize the usual M handle x ⇒ N construct to try y ⇐ M in N1 unless x ⇒ N2, incorporating a success 'continuation' N1 (Benton and Kennedy 2001).

3.3 Other features

It is straightforward to extend λTCPS with other features useful for compiling full-scale programming languages such as Standard ML.

• Recursive types of the form µα.τ can be supported by adding suitable introduction and elimination constructs: a value fold x and a term let x = unfold y in K.

• Binary products and sums generalize to the n-ary case. For optimizing representations it is common for intermediate languages to support functions with multiple arguments and results, and constructors taking multiple arguments. This is easy: function definitions have the form f k h x̄ = K, and continuations have the form k x̄ = K and are used for passing multiple results and for case branches where the constructor takes multiple arguments.

• Polymorphic types of the form ∀α.τ can be added. Typing contexts are extended with a set of type variables V. Then to support ML-style let-polymorphism, each value binding construct (letval, letfun, and projection) must incorporate polymorphic generalization. For example:

(letv)  V, ᾱ; Γ ⊢ V : τ   V; Γ, x:∀ᾱ.τ; ∆ ⊢ K ok  ⟹  V; Γ; ∆ ⊢ letval x = V in K ok

For elimination, we simply adapt the variable rule (var) to incorporate polymorphic specialization:

(var)  x:∀ᾱ.τ ∈ Γ  ⟹  Γ ⊢ x : τ[σ̄/ᾱ]

3.4 Effect analysis and transformation

The use of continuations in an explicit 'handler-passing style' lends itself very nicely to an effect analysis for exceptions. Suppose, for simplicity, that there is a finite number of exception constructors, ranged over by E. We make the following changes to λTCPS:

• We introduce exception set types of the form {E1, . . . , En}, representing exception values built with any of the constructors E1, . . . , En. Set inclusion induces a subtype ordering on exception types, with top type exn representing any exception and bottom type {} representing no exception.

• The types of handler continuations in function definitions are refined to describe the exceptions that the function is permitted to throw. For example:

(1) letfun f k (h:¬{}) x = K in . . .
(2) letfun f k (h:¬exn) x = K in . . .
(3) letfun f k (h:¬{E, E'}) x = K in . . .

The type of (1) tells us that K never raises an exception; in (2) the function can raise any exception; and in (3) the function might raise E or E'.

• Now that handlers are annotated with more precise types, function types must reflect this too. We write τ →^{σ'} σ for the type of functions that either return a result of type σ or raise an exception of type σ' <: exn. Subtyping on function types and continuation types is specified by the following rules:

τ2 <: τ1   σ1 <: σ2   σ1' <: σ2'  ⟹  τ1 →^{σ1'} σ1 <: τ2 →^{σ2'} σ2
σ2 <: σ1  ⟹  ¬σ1 <: ¬σ2

Exception effects enable effect-specific transformations (Benton and Buchlovsky 2007). Suppose that the type of f is τ →^{E1} σ. Then we can apply a 'dead-handler' rewrite on the following:

letcont h:¬{E1, E2} x = (case x of E1.k1 | E2.k2) in f k h y
→ letcont h:¬{E1} x = (case x of E1.k1) in f k h y

In fact, there is nothing exception-specific about this rewrite: it is just employing refined types for constructed values. The use of continuations has given us exception effects 'for free'.

4. Implementing CPS

Many compilers for functional languages represent intermediate-language terms in a functional style, as instances of an algebraic datatype of syntax trees, and manipulate them functionally.
For example, the language λUCPS can be implemented by an SML datatype, here using integers for variables, with all bound variables distinct:

type Var = int and CVar = int

datatype CVal =
    Unit
  | Pair of Var * Var
  | Inj of int * Var
  | Lam of CVar * Var * CTm

and CTm =
    LetVal of Var * CVal * CTm
  | LetProj of Var * int * Var * CTm
  | LetCont of CVar * Var * CTm * CTm
  | AppCont of CVar * Var
  | App of Var * CVar * Var
  | Case of Var * CVar * CVar

Rewrites such as those of Figure 5 are then implemented by a function that maps terms to terms, applying as many rewrites as possible in a single pass. Here is a typical fragment that applies the β-PAIR and DEAD-VAL reductions:

fun simp census env S K =
  case K of
    LetVal(x, V, L) =>
      if count(census, x) = 0                        (* Dead-Val *)
      then simp census env S L
      else LetVal(x, simpVal census env S V,
                  simp census (addEnv(env, x, V)) S L)
  | LetProj(x, 1, y, L) =>
      let val y' = applySubst S y
      in case lookup(env, y') of
           Pair(z, _) =>                             (* Beta-Pair *)
             simp census env (extendSubst S (x, z)) L
         | _ => LetProj(x, 1, y', simp census env S L)
      end

In addition to the term K itself, the simplifier function simp takes a parameter env that tracks letval bindings, a parameter S used to substitute variables for variables, and a parameter census that maps each variable to the number of occurrences of the variable, computed prior to applying the function. The census becomes out of date as reductions are applied, and this may cause reductions to be missed until the census is recalculated and simp applied again. For example, a β-PAIR reduction may trigger a DEAD-VAL in an enclosing letval binding (consider letval x = (y1, y2) in . . . let z = π1 x in . . . where x occurs only once).
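For concreteness, the census itself can be computed by a single pass over the term before simplification begins. The following sketch is ours, not SML.NET's: it assumes the CTm/CVal datatype above, counts occurrences of value variables only (a real census would also count continuation variables, to enable DEAD-CONT and β-CONT-LIN), and represents the census as a function for brevity where a production implementation would use a mutable table.

```sml
(* Hypothetical sketch: census as a function from variables to counts. *)
type Census = Var -> int
val emptyCensus : Census = fn _ => 0

(* Record one more occurrence of x. *)
fun bump (c : Census) (x : Var) : Census =
  fn y => if y = x then c y + 1 else c y

(* Count free occurrences of value variables in values and terms.
   Binding occurrences (the x in letval x = ..., etc.) are not counted. *)
fun censusVal c (Pair (x, y))   = bump (bump c x) y
  | censusVal c (Inj (_, x))    = bump c x
  | censusVal c (Lam (_, _, K)) = census c K
  | censusVal c Unit            = c

and census c (LetVal (_, V, K))     = census (censusVal c V) K
  | census c (LetProj (_, _, y, K)) = census (bump c y) K
  | census c (LetCont (_, _, K, L)) = census (census c K) L
  | census c (AppCont (_, x))       = bump c x
  | census c (App (f, _, x))        = bump (bump c f) x
  | census c (Case (x, _, _))       = bump c x
```

With this in hand, count(census, x) in the simplifier fragment above is just an application census x; the staleness problem described next is precisely that this map is not updated as rewrites fire.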
Maintaining accurate census information as rewrites are performed can increase the number of reductions performed in a single pass (Appel and Jim 1997), but even with up-to-date census information it is not possible to perform shrinking reductions exhaustively in a single pass, so a number of iterations may be required before all redexes have been eliminated. In the worst case, this leads to O(n²) behaviour. What's more, each pass essentially copies the entire term, leaving the original term to be picked up by the garbage collector. This can be expensive. (Nonetheless, the simplicity of our CPS language, with substitutions only of variables for variables, and the lack of the commuting conversions that are required in ANF or monadic languages, leads to a very straightforward simplifier algorithm.)

4.1 Graphical representation of terms

An alternative is to represent the term using a graph, and to perform rewrites by destructive update of the graph. Appel and Jim (1997) devised a representation for which exhaustive application of the shrinking β-reductions of Figure 5 takes time linear in the size of the term. We improve on their representation to support efficient η-reductions and other transformations. The representation has three ingredients.

1. The term structure itself is a doubly-linked tree: every subterm has an up-link to its immediately enclosing term. This supports constant-time replacement, deletion, and insertion of subterms.

2. Each bound variable contains a link to one of its free occurrences, or is null if the variable is dead, and the free occurrences themselves are connected together in a doubly-linked circular list. This permits the following operations to be performed in constant time:

• determining whether a bound variable has zero, one, or more than one occurrence, and, if it has only one occurrence, locating that occurrence;
• determining whether a free variable is unique;
• merging two occurrence lists.
Furthermore, we separate recursive and non-recursive uses of variables; in essence, instead of letfun f k h x = K in L we write let f = rec g k h x.K[g/f] in L. This lets us detect DEAD-? and β-?-LIN reductions.

3. Free occurrences are partitioned into same-binder equivalence classes using the union-find data structure (Cormen et al. 2001).² The representative of each equivalence class (that is, the root of the union-find tree) is linked to its binding occurrence. This supports amortized near-constant-time access to the binder (the find operation) and merging of occurrence lists (the union operation).

² Readers familiar with type inference may recall that union-find underpins the almost-linear-time algorithm for term unification (Baader and Nipkow 1998).

Substitution of variable x for variable y is implemented in near-constant time by (a) merging the circular lists of occurrences so that x now points to the merged list, and (b) applying a union operation so that the occurrences of y are now associated with the binder for x.

Consider the following value term, with the doubly-linked tree structure and union-find structure left implicit, but with binder-to-free pointers shown as dotted arrows and circular occurrence lists shown as solid arrows:

[diagram omitted: a term of the form letval p = (x, y) in . . . let z = π1 p in . . ., showing the circular occurrence lists for x, p and z]

Now suppose that we wish to apply β-PAIR to the projection π1 p. Using the find operation on the union-find structure we can locate the pair (x, y) in near-constant time. Now we substitute x for z by disconnecting z's binder from its circular list and connecting x's occurrence list in its place, merging the two lists, in constant time. At the same time, we apply the union operation to merge the binder equivalence classes (not shown).

[diagram omitted: the same term after substituting x for z, with the occurrence lists of x and z merged]
Finally we remove the projection itself, deleting the occurrence of p from the circular list, again in constant time:

[diagram omitted: the term after deleting the projection let z = π1 p]

One issue remains: the classical union-find data structure does not support deletion. There are recent techniques that extend union-find with amortized near-constant-time deletion (Kaplan et al. 2002). However, the representation is non-trivial, and might add unacceptable overhead to the union and find operations, so we chose instead a simpler solution: do nothing! Deleted occurrences remain in the union-find data structure, possibly as root nodes, or as nodes on the path to the root. In theory, the efficiency of rewriting is then dependent on the 'peak' size of the term, not its current size, but we have not found this to be a problem in practice.

Each of the shrinking reductions of Figure 5 can be implemented in almost-constant time using our graph representation. To put these together and apply them exhaustively to a term, we follow Appel and Jim (1997):

• First sweep over the term, detecting redexes and collecting them in a worklist.
• Then pull items off the worklist one at a time (in any order), applying the appropriate rewrite, and adding to the worklist any new redexes that are triggered by the rewrite. For example, the removal of a free occurrence (as can happen for multiple variables when applying DEAD-VAL) can induce a DEAD-? reduction (if no occurrences remain) or a β-?-LIN reduction (if only a single occurrence remains).

In the current implementation the worklist is represented as a queue, but it should be possible to thread it through the term itself. Shrinking reductions could then be performed with constant space overhead.

4.2 Comparison with Appel/Jim

The representation of Appel and Jim (1997) did not make use of union-find to locate binders.
Instead, (a) the circular list of variable occurrences included the bound occurrence, thus giving constant-time access to the binder in the case that the free variable is unique; and (b) for letval-bound variables, each free occurrence contained an additional pointer to its binder. When performing a substitution operation, these binder links must be updated, taking time linear in the number of occurrences; fortunately, for any particular variable this can happen only once during shrinking reductions, as letval-bound variables cannot become rebound. Thus the cost is amortized across the shrinking reductions.

Unfortunately, the lack of binder links for non-letval-bound variables makes other optimizations, such as η-reduction, less efficient. Take an instance of η-PAIR:

let x1 = π1 x in C[let x2 = π2 x in C'[letval y = (x1, x2) in K]]
→ let x1 = π1 x in C[let x2 = π2 x in C'[K[x/y]]]

Just to locate the binders for x1 and x2 would take time linear in the number of occurrences. Our use of union-find gives us an efficient implementation of all shrinking reductions, and of other transformations too; moreover, when analysing efficiency we need not be concerned with whether variables are letval-bound or not.

4.3 Performance results

We have modified the SML.NET compiler to make use of a typed CPS intermediate language only mildly more complex than that shown in Figure 7. It employs the graphical representation of terms described above; in particular, the simplifier performs shrinking reductions exhaustively on a term representing the whole program, and it is invoked a total of 15 times during compilation. Table 1 presents some preliminary benchmark results, showing average time spent in simplification, time spent in monomorphisation, and time spent in unit removal (e.g. transformation of unit*int values to int).
We compare (a) the released version of SML.NET, implementing a monadic intermediate language (MIL) and a functional-style simplification algorithm; (b) the Appel/Jim-style graph representation adapted to MIL terms, implemented by Lindley (Benton et al. 2004a; Lindley 2005); and (c) the new graph-based CPS representation with union-find. Tests were run on a 3GHz Pentium 4 PC with 1GB of RAM running Windows Vista. The SML.NET compiler is implemented in Standard ML and compiled using the MLton optimizing compiler, which generates high-quality code from both functional and imperative coding styles – so giving both techniques a fair shot.

Table 1. Optimization times (in seconds)

Benchmark   Lines    Phase    MIL    g-MIL   g-CPS
raytrace     2,500   Simp     0.12   0.01    0.01
mlyacc       6,200   Simp     0.44   0.02    0.02
smlnet      80,000   Simp     7.29   0.29    0.15
                     Mono     0.75   n/a     1.41
                     Deunit   0.76   1.3     0.6
hamlet      20,000   Simp     0.97   0.08    0.04
                     Mono     0.15   n/a     0.19
                     Deunit   0.12   0.16    0.14

As can be seen from the figures, the graph-based simplifier for the monadic language is significantly faster than the functional simplifier – and although all times are small, bear in mind that the simplifier is run many times during compilation. Unit removal is roughly comparable in performance across implementations. Interestingly, the graph-based CPS implementation of monomorphisation runs up to twice as slow as the functional monadic implementation. We conjecture that this is because monomorphisation necessarily copies (and specializes) terms, CPS terms tend to be larger than MIL terms, and the graph representation is larger still.

These figures come with a caveat: the comparison is somewhat "apples and oranges". There are differences between the MIL, g-MIL and g-CPS representations that are unrelated to monads or CPS. Future work is to make a fairer comparison, implementing a functional version of the CPS terms, and perhaps also a precise monadic analogue.

5.
Contification

Our CPS languages make a syntactic distinction between functions and local continuations. The former are typically compiled as heap-allocated closures or as known functions, whilst the latter can always be compiled as inline code, with continuation applications compiled as jumps. For efficiency it is therefore desirable to transform functions into continuations, a process that has been termed contification (Fluet and Weeks 2001). Functions can be contified when they always return to the same place. Consider the following code, written in the subset of SML studied in Section 2:

let fun f x = ...
in g (case d of in1 d1 => f y | in2 d2 => f d2)
end

If f returns at all, it must pass control to g. Here this is obvious, but for more complex examples it is not so apparent. Now consider its CPS transform:

letval f = (λk x.· · · k · · ·) in
letcont k0 w = g r w in
letcont j1 d1 = f k0 y in
letcont j2 d2 = f k0 d2 in
case d of j1 | j2

It is clear that f is always passed the same continuation k0 – and so, unless it diverges, it must return through k0 and so pass control to g. We can transform f into a local continuation, as follows:

letcont k0 w = g r w in
letcont j x = · · · k0 · · · in
letcont j1 d1 = j y in
letcont j2 d2 = j d2 in
case d of j1 | j2

We have done three things: (a) we have replaced the function f by a continuation j, deleting the return continuation at both definition and call sites; (b) we have substituted the argument k0 for the formal k in the body of f; and (c) we have moved j so that it is in the scope of k0.

Fluet and Weeks (2001) use the dominator tree of a program's call graph to contify programs that consist of a collection of mutually recursive first-order functions. They show that their algorithm is optimal: no contifiable functions remain after applying the transformation.
Their dominator-based analysis can be adapted to our CPS languages, and is simpler to describe in this context because all function definitions and uses have a named continuation (Fluet and Weeks use named continuations only for non-tail calls). When applied to top-level functions the transformation is simpler too, but in the presence of first-class functions and general block structure the transformation becomes significantly more complex to describe. We prefer an approach based on incremental transformation, in essence repeatedly applying the rewrite illustrated above until no further rewrites are possible. We consider first the case of non-recursive functions, then generalize to mutually recursive functions, and conclude by relating our technique to dominator-based contification.

5.1 Non-recursive functions

In the untyped language λUCPS without recursion, it is particularly straightforward to spot contifiable functions: they are those for which all occurrences are applications with the same continuation argument. We define the following rewrite:

CONT (f not free in C, D, and D minimal):
letval f = λk x.K in C[D[f k0 x1, . . . , f k0 xn]]
→ C[letcont j x = K[k0/k] in D[j x1, . . . , j xn]]

Here C is a single-hole context as presented in Figure 5 and D is a multi-hole context whose formalization we omit. The CONT rewrite combines three actions: (a) the function f is replaced by a continuation j, with each application replaced by a continuation application; (b) the common continuation k0 is substituted for the formal continuation parameter k in the body K of f; and (c) the new continuation j is pulled into the scope of the continuation k0. The multi-hole context D is the smallest context enclosing all uses of f, which ensures that j is in scope after transformation. The analysis is trivial (just check call sites for a common continuation argument), yet iterating this transformation leads to optimal contification, in the sense of Fluet and Weeks (2001).
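The call-site check can be sketched in a few lines of SML. This is our own illustration, not the paper's implementation: the use datatype and the contifiable function are hypothetical names, and we assume some prior pass has collected, for each letval-bound function, its occurrences classified as direct calls or escaping uses.

```sml
(* Hypothetical sketch of the analysis for the CONT rewrite. *)
type Var = int and CVar = int

datatype use =
    Call of CVar * Var   (* occurrence is an application  f k x  *)
  | Escape               (* occurrence passes f as a value       *)

(* A function is contifiable iff it never escapes and all calls pass
   one and the same continuation; return that continuation if so. *)
fun contifiable (uses : use list) : CVar option =
  let fun go ([], common) = common
        | go (Escape :: _, _) = NONE
        | go (Call (k, _) :: rest, NONE) = go (rest, SOME k)
        | go (Call (k, _) :: rest, SOME k0) =
            if k = k0 then go (rest, SOME k0) else NONE
  in go (uses, NONE) end
```

If contifiable returns SOME k0, the CONT rewrite applies: bind a fresh continuation j with body K[k0/k], replace each call f k0 x by j x, and place j in the scope of k0. Rewriting one function can change the call sites of others, which is why the transformation is iterated to a fixed point.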
Here is an example, adapted from loc. cit. §5.2:

letval h = λkh xh.· · · in
letval g1 = λk1 x1.· · · h k1 z1 · · · k1 z8 · · · in
letval g2 = λk2 x2.· · · h k2 z2 · · · in
letval f = λkf xf.· · · g1 kf z3 · · · g2 kf z4 · · · g2 kf z5 · · · in
letval m = λkm xm.· · · f j1 z6 · · · f j2 z7 in
. . .

We can immediately see that g1 and g2 (but not h) are always passed the same continuation kf, and so we can apply CONT to contify them both:

letval h = λkh xh.· · · in
letval f = λkf xf.
  (letcont kg1 x1 = · · · h kf z1 · · · kf z8 · · · in
   letcont kg2 x2 = · · · h kf z2 · · · in
   · · · kg1 z3 · · · kg2 z4 · · · kg2 z5 · · ·) in
letval m = λkm xm.· · · f j1 z6 · · · f j2 z7 in
. . .

Now h can be contified, as it is always passed kf:

letval f = λkf xf.
  (letcont kh xh = · · · in
   letcont kg1 x1 = · · · kh z1 · · · kf z8 · · · in
   letcont kg2 x2 = · · · kh z2 · · · in
   · · · kg1 z3 · · · kg2 z4 · · · kg2 z5 · · ·) in
letval m = λkm xm.· · · f j1 z6 · · · f j2 z7 in
. . .

5.2 Recursive functions

Generalizing to recursive functions and continuations is a little trickier. Suppose we have a λTCPS term of the form

letfun f1 k1 h1 x1 = K1
       · · ·
       fn kn hn xn = Kn
in K.

A set of functions F ⊆ {f1, . . . , fn} can be contified collectively, written Contifiable(F), if there is some pair of continuations k0 and h0 such that each occurrence of f ∈ F is either a tail call within F or is a call with continuation arguments k0 and h0. Intuitively, each function (eventually) returns to the same place (k0), or throws an exception that is caught by the same handler (h0), though control may pass tail-recursively through other functions in F. There may be many such subsets F; we assume that F is in fact strongly connected with respect to the tail calls contained within it (or is a trivial singleton with no tail calls). Then for a given letfun term there is a unique partial partition of the functions into disjoint subsets satisfying Contifiable(−). Let F = {f1, . . . , fm}.
Define a translation on function applications

(f k h x)* = ji x        if f = fi ∈ F
(f k h x)* = f k h x     otherwise

and extend this to all terms. Assuming that Contifiable(F) holds, there are two possibilities.

1. All applications of the form f k0 h0 x for f ∈ F are in the term K. Then we can apply the following rewrite, which is the direct analogue of CONT.

RECCONT (f1, . . . , fm not free in C, and K minimal):

letfun f1 k1 h1 x1 = K1
       · · ·
       fn kn hn xn = Kn
in C[K]
→
letfun fm+1 km+1 hm+1 xm+1 = Km+1
       · · ·
       fn kn hn xn = Kn
in C[letcont j1 x1 = K1*[k0/k1, h0/h1]
             · · ·
             jm xm = Km*[k0/km, h0/hm]
     in K*]

2. Otherwise, all applications of the form f k0 h0 x for f ∈ F are in the body of one of the functions outside of F; without loss of generality we assume this is fn.

RECCONT2 (f1, . . . , fm not free in C, and Kn minimal):

letfun f1 k1 h1 x1 = K1
       · · ·
       fn−1 kn−1 hn−1 xn−1 = Kn−1
       fn kn hn xn = C[Kn]
in K
→
letfun fm+1 km+1 hm+1 xm+1 = Km+1
       · · ·
       fn−1 kn−1 hn−1 xn−1 = Kn−1
       fn kn hn xn = C[letcont j1 x1 = K1*[k0/k1, h0/h1]
                               · · ·
                               jm xm = Km*[k0/km, h0/hm]
                       in Kn*]
in K

For an example of the latter, more complex, transformation, consider the following SML code:

let fun unif (Ap(a,xs), Ap(b,ys)) = (unif(a,b); unifV(xs,ys))
      | unif (Ar(a,b), Ar(c,d)) = unifV([a,b],[c,d])
    and unifV (x::xs, y::ys) = (unif(x,y); unifV(xs,ys))
      | unifV ([], []) = ()
in unif end

The function unifV can be contified into the definition of unif: it tail-calls itself, and its uses inside unif have the same continuation.

5.3 Comparison with dominator-based contification

The dominator-based approach of Fluet and Weeks (2001) can be recast in our CPS language as follows. (For simplicity we do not consider exception handler continuations here.) First construct a continuation flow graph for the whole program. Nodes consist of continuation variables and a distinguished root node.
Then for each function f with return continuation k, if f is passed around as a first-class value then create an edge from root to k; otherwise, for each application f j x create an edge from j to k. Finally, for each local continuation k create an edge from root to k. The non-recursive CONT rewrite has the effect of merging two nodes in the graph:

[Diagram: a node k, connected to node k0 by a single edge, is merged with k0; the remaining edges incident on k are inherited by k0.]

The recursive RECCONT and RECCONT2 rewrites are similar, except that in place of k we have a strongly-connected component {k1, . . . , km}:

[Diagram: a strongly-connected component {k1, . . . , km}, connected to the rest of the graph only through k0, collapses into the single node k0.]

Conversely, any part of the flow graph matching the left-hand side of these diagrams corresponds to a contifiable subset of functions in a letfun to which the RECCONT or RECCONT2 rules can be applied. It is immediately clear that exhaustive rewriting terminates, as the flow graph decreases in size with each rewrite, eventually producing a graph with no occurrences of the pattern above. The algorithm described by Fluet and Weeks (2001) contifies k if it is strictly dominated by some continuation j whose immediate dominator is root. It can be shown that if a rooted graph contains such a pair of nodes j and k, then some part of the graph matches the pattern above. Hence exhaustive rewriting has the same effect as optimal contification based on dominator trees.

6. Related work and conclusion

The use of continuation-passing style for functional languages has its origins in Scheme compilers (Steele 1978; Kranz et al. 1986). It later formed the basis of the Standard ML of New Jersey compiler (Appel 1992; Shao and Appel 1995).
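To illustrate the dominator criterion of §5.3 concretely, here is a small Python sketch. It is not the paper's (or MLton's) implementation: the graph encoding and node names are invented, and the simple iterative dominator computation stands in for an efficient algorithm such as Lengauer-Tarjan.

```python
# Hedged sketch of dominator-based contification on a continuation flow
# graph.  Nodes are continuation variables plus a distinguished root.

def dominators(succ, root):
    """dom[n] = the set of nodes that dominate n (iterative data-flow)."""
    nodes = set(succ) | {s for ss in succ.values() for s in ss}
    pred = {n: set() for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            pred[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in nodes - {root}:
            new = {n}
            if pred[n]:
                new |= set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def contifiable_continuations(succ, root):
    """A continuation k is contifiable iff it is strictly dominated by some
    j whose immediate dominator is root -- equivalently, iff k's own
    immediate dominator is not root."""
    dom = dominators(succ, root)
    result = set()
    for k in dom:
        if k == root:
            continue
        # Immediate dominator: the deepest strict dominator of k.
        idom = max(dom[k] - {k}, key=lambda d: len(dom[d]))
        if idom != root:
            result.add(k)
    return result

# Flow graph for the earlier contification example (functions h, g1, g2,
# f, m): j1, j2 are local continuations in m; kf, k1, k2, kh, km are the
# return continuations of f, g1, g2, h, m (m is assumed to escape).
graph = {
    'root': ['j1', 'j2', 'km'], 'j1': ['kf'], 'j2': ['kf'],
    'kf': ['k1', 'k2'], 'k1': ['kh'], 'k2': ['kh'], 'kh': [], 'km': [],
}
```

On this graph the sketch reports {'k1', 'k2', 'kh'} as contifiable, i.e. g1, g2 and h may all be contified into f, which agrees with the rewriting-based account.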
In early compilers, lambdas originating from the CPS transformation were not distinguished from lambdas present in the source, so some effort was expended at code generation time to determine which lambdas could be stack-allocated and which had to be heap-allocated. Later compilers made a syntactic distinction between true functions and ‘second-class’ continuations introduced by CPS, and sometimes transformed one into the other (Kelsey and Hudak 1989), though contification was not studied formally.

A number of more recent compilers use what has been called almost CPS. The Sequentialized Intermediate Language (SIL) employed by Tolmach and Oliva (1998) is a monadic-style language in which a letcont-like feature is used to introduce join points. Somewhat closer to our CPS language is the First Order Language (FOL) of the MLton compiler (Fluet and Weeks 2001). It goes further than SIL in making use of named local continuations in all branch constructs and non-tail calls. However, functions are not parameterized on return (or handler) continuations, and there is special syntax for tail calls and returns. This non-uniform treatment of continuations complicates transformations: inlining of non-tail functions must replace all ‘return points’ with jumps, and the contification analysis and transformation must treat tail and non-tail calls differently. We have found the uniform treatment of continuations in our CPS language to be a real benefit, not only as a simplifying force in implementation, but also in thinking about compiler optimizations: contification, in particular, is difficult to characterize in the absence of a notion of continuation passing.

As far as we are aware, we are the first to implement linear-time shrinking reductions in the style of Appel and Jim (1997). An earlier term-graph implementation by Lindley was for a monadic language and had worst-case O(n²) behaviour due to commuting conversions (Benton et al. 2004a; Lindley 2005).
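Part of what makes linear-time simplification feasible is that inlining in CPS only ever substitutes variables for variables. As a hedged illustration (in the spirit of the union-find technique credited in the acknowledgments, but not the paper's actual code), such substitutions can be recorded in a union-find structure so that each one costs a single union, with occurrences resolved on demand:

```python
# Hedged sketch: substitution [y/x] of a variable for a variable becomes a
# single union, and each occurrence of x resolves to y's representative.
# The class and its names are invented for this illustration.

class VarSubst:
    def __init__(self):
        self.parent = {}

    def find(self, v):
        """Resolve v through any chain of recorded substitutions,
        compressing the path as we go."""
        self.parent.setdefault(v, v)
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[v] != root:
            self.parent[v], v = root, self.parent[v]
        return root

    def substitute(self, x, y):
        """Record the substitution [y/x]."""
        # The union is directed towards y: the representative must be the
        # variable substituted *in*, so union-by-rank is not applicable.
        self.parent[self.find(x)] = self.find(y)
```

Substitutions compose as expected: after substitute('x', 'y') followed by substitute('y', 'z'), find('x') yields 'z'.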
Shivers and Wand (2005) have proposed a rather different graph representation for lambda terms, with the goal of sharing subterms after β-reduction. Their representation does bear some resemblance to ours, though, with up-links from subterms to enclosing terms, and circular lists that connect the sites where a term is substituted for a variable.

This paper would not be complete without a mention of Static Single Assignment form (SSA), the currently fashionable intermediate representation for imperative languages. As is well known, SSA is in some sense equivalent to CPS (Kelsey 1995) and to ANF (Appel 1998). Its focus is intra-procedural optimization (as with ANF, it is necessary to renormalize when inlining functions, in contrast to CPS), and there is a large body of work on such optimizations. Future work is to transfer SSA-based optimizations to CPS. We conjecture that CPS is a good fit for both functional and imperative paradigms.

Acknowledgments

I would like to thank Nick Benton, Olivier Danvy, Sam Lindley, Simon Peyton Jones and Claudio Russo for fruitful discussions on compiler intermediate languages. Georges Gonthier suggested the use of union-find in the graphical representation of terms.

References

Andrew W. Appel. Compiling with Continuations. Cambridge University Press, 1992.
Andrew W. Appel. SSA is functional programming. SIGPLAN Notices, 33(4):17–20, 1998.
Andrew W. Appel and Trevor Jim. Shrinking lambda expressions in linear time. Journal of Functional Programming, 7(5):515–540, 1997.
Franz Baader and Tobias Nipkow. Term Rewriting and All That. Cambridge University Press, 1998.
Nick Benton and Peter Buchlovsky. Semantics of an effect analysis for exceptions. In ACM SIGPLAN International Workshop on Types in Language Design and Implementation (TLDI), pages 15–26, 2007.
Nick Benton and Andrew Kennedy. Exceptional syntax. Journal of Functional Programming, 11(4):395–410, 2001.
Nick Benton, Andrew Kennedy, and George Russell.
Compiling Standard ML to Java bytecodes. In 3rd ACM SIGPLAN International Conference on Functional Programming. ACM Press, September 1998.
Nick Benton, Andrew Kennedy, Sam Lindley, and Claudio Russo. Shrinking reductions in SML.NET. In 16th International Workshop on Implementation and Application of Functional Languages (IFL), 2004a.
Nick Benton, Andrew Kennedy, and Claudio Russo. Adventures in interoperability: The SML.NET experience. In 6th International Conference on Principles and Practice of Declarative Programming (PPDP), 2004b.
Thomas Cormen, Charles Leiserson, Ronald Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, second edition, 2001.
Olivier Danvy. A new one-pass transformation into monadic normal form. In 12th International Conference on Compiler Construction (CC'03), 2003.
Olivier Danvy and Andrzej Filinski. Representing control: A study of the CPS transformation. Mathematical Structures in Computer Science, 2(4):361–391, 1992.
Olivier Danvy and Lasse R. Nielsen. A first-order one-pass CPS transformation. Theoretical Computer Science, 308(1–3):239–257, 2003.
Cormac Flanagan, Amr Sabry, Bruce F. Duba, and Matthias Felleisen. The essence of compiling with continuations (with retrospective). In McKinley (2004), pages 502–514.
Matthew Fluet and Stephen Weeks. Contification using dominators. In ICFP'01: Proceedings of the Sixth ACM SIGPLAN International Conference on Functional Programming, pages 2–13. ACM Press, September 2001.
John Hatcliff and Olivier Danvy. A generic account of continuation-passing styles. In Principles of Programming Languages (POPL), pages 458–471, 1994.
Haim Kaplan, Nira Shafrir, and Robert E. Tarjan. Union-find with deletions. In SODA '02: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 19–28. Society for Industrial and Applied Mathematics, 2002.
Richard Kelsey. A correspondence between continuation passing style and static single assignment form. In Intermediate Representations Workshop, pages 13–23, 1995.
Richard A. Kelsey and Paul Hudak. Realistic compilation by program transformation. In Principles of Programming Languages (POPL). ACM, January 1989.
Jung-taek Kim, Kwangkeun Yi, and Olivier Danvy. Assessing the overhead of ML exceptions by selective CPS transformation. In ACM SIGPLAN Workshop on ML, pages 112–119, 1998. Also appears as BRICS technical report RS-98-15.
David A. Kranz, Richard A. Kelsey, Jonathan A. Rees, Paul Hudak, and James Philbin. ORBIT: An optimizing compiler for Scheme. In Proceedings of the ACM SIGPLAN Symposium on Compiler Construction, pages 219–233, June 1986.
Sam Lindley. Normalisation by evaluation in the compilation of typed functional programming languages. PhD thesis, University of Edinburgh, 2005.
Kathryn S. McKinley, editor. 20 Years of the ACM SIGPLAN Conference on Programming Language Design and Implementation 1979–1999, A Selection. ACM, 2004.
Eugenio Moggi. Notions of computation and monads. Information and Computation, 93:55–92, 1991.
A. M. Pitts. Typed operational reasoning. In B. C. Pierce, editor, Advanced Topics in Types and Programming Languages, chapter 7, pages 245–289. The MIT Press, 2005.
Amr Sabry and Philip Wadler. A reflection on call-by-value. ACM Transactions on Programming Languages and Systems (TOPLAS), 19(6):916–941, November 1997.
Zhong Shao and Andrew W. Appel. A type-based compiler for Standard ML. In Proceedings of the 1995 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 116–129, La Jolla, CA, June 1995.
Olin Shivers. Higher-order control-flow analysis in retrospect: Lessons learned, lessons abandoned (with retrospective). In McKinley (2004), pages 257–269.
Olin Shivers and Mitchell Wand. Bottom-up β-reduction: Uplinks and λ-DAGs. In European Symposium on Programming (ESOP), pages 217–232, 2005.
Guy L. Steele. RABBIT: A compiler for SCHEME. Technical Report AI-TR-474, MIT, May 1978.
Hayo Thielecke. Comparing control constructs by double-barrelled CPS. Higher-Order and Symbolic Computation, 15(2/3):141–160, 2002.
Andrew P. Tolmach and Dino Oliva. From ML to Ada: Strongly-typed language interoperability via source translation. Journal of Functional Programming, 8(4):367–412, 1998.
Philip Wadler and Peter Thiemann. The marriage of effects and monads. In ACM SIGPLAN International Conference on Functional Programming (ICFP), 1998.