Compiling with Continuations, Continued

Andrew Kennedy
Microsoft Research Cambridge
akenn@microsoft.com

Abstract

We present a series of CPS-based intermediate languages suitable for functional language compilation, arguing that they have practical benefits over direct-style languages based on A-normal form (ANF) or monads. Inlining of functions demonstrates the benefits most clearly: in ANF-based languages, inlining involves a renormalization step that rearranges let expressions and possibly introduces a new ‘join point’ function, and in monadic languages, commuting conversions must be applied; in contrast, inlining in our CPS language is a simple substitution of variables for variables. We present a contification transformation implemented by simple rewrites on the intermediate language. Exceptions are modelled using so-called ‘double-barrelled’ CPS. Subtyping on exception constructors then gives a very straightforward effect analysis for exceptions. We also show how a graph-based representation of CPS terms can be implemented extremely efficiently, with linear-time term simplification.

Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors – Compilers

General Terms Languages

Keywords Continuations, continuation passing style, monads, optimizing compilation, functional programming languages

1. Introduction

Compiling with continuations is out of fashion. So report the authors of two classic papers on Continuation-Passing Style in recent retrospectives:

“In 2002, then, CPS would appear to be a lesson abandoned.” (McKinley 2004; Shivers 1988)

“Yet, compiler writers abandoned CPS over the ten years following our paper anyway.” (McKinley 2004; Flanagan et al. 1993)

This paper argues for a reprieve for CPS: “Compiler writers, give continuations a second chance.” This conclusion is borne of practical experience.
In the MLj and SML.NET whole-program compilers for Standard ML, co-implemented by the current author, we adopted a direct-style, monadic intermediate language (Benton et al. 1998, 2004b). In part, we were interested in effect-based program transformations, so monads were a natural choice for separating computations from values in both terms and types. But, given the history of CPS, probably there was also a feeling that “CPS is for call/cc”, something that is not a feature of Standard ML. Recently, the author has re-implemented all stages of the SML.NET compiler pipeline to use a CPS-based intermediate language. Such a change was not undertaken lightly, amounting to roughly 25,000 lines of replaced or new code. There are many benefits: the language is smaller and more uniform, simplification of terms is more straightforward and extremely efficient, and advanced optimizations such as contification are more easily expressed. We use CPS only because it is a good place to do optimization; we are not interested in first-class control in the source language (call/cc), or as a means of implementing other features such as concurrency.

[ICFP’07, October 1–3, 2007, Freiburg, Germany. Copyright © 2007 ACM. Author’s version.]
Indeed, as SML.NET targets .NET IL, a call-stack-based intermediate language with support for structured exception handling, the compilation process can be summarized as “transform direct style (SML) into CPS; optimize CPS; transform CPS back to direct style (.NET IL)”.

1.1 Some history

CPS. What’s special about CPS? As Appel (1992, p2) put it, “Continuation-passing style is a program notation that makes every aspect of control flow and data flow explicit”. An important consequence is that full β-reduction (function inlining) is sound. In contrast, for call-by-value languages based on the lambda calculus, only the weaker β-value rule is sound. For example, β-reduction cannot be applied to (λx.0) (f y) because f y may have a side-effect or fail to terminate; but its CPS transform, f y (λz.(λx.λk.k 0) z k), can be reduced without prejudice. There are obvious drawbacks: the complexity of CPS terms; the need to eliminate administrative redexes introduced by the CPS transformation; and the cost of allocating closures for lambdas introduced by the CPS transformation, unless some static analysis is first applied. In fact, these drawbacks are more apparent than real: the complexity of CPS terms is really a benefit, assigning useful names to all intermediate computations and control points; the CPS transformation can be combined with administrative reduction; and by employing a syntactic separation of continuation- and source-lambdas it is possible to generate good code directly from CPS terms.

ANF. In their influential paper “The Essence of Compiling with Continuations”, Flanagan et al. (1993) observed that “fully developed CPS compilers do not need to employ the CPS transformation but can achieve the same results with a simple source-level transformation”. They proposed a direct-style intermediate language based on A-normal forms, in which a let construct assigns names to every intermediate computation.
For example, the term above is represented as let z = f y in (λx.0) z, to which β-reduction can be applied, obtaining the semantically equivalent let z = f y in 0. This style of language has become commonplace, not only in compilers, but also to simplify the study of semantics for impure functional languages (Pitts 2005, §7.4).

Monads. Very similar to ANF are so-called monadic languages based on Moggi’s computational lambda calculus (Moggi 1991). Monads also make sequencing of computations explicit through a let x ⇐ M in N binding construct, the main difference from ANF being that let constructs can themselves be let-bound. The separation of computations from values also provides a place to hang effect annotations (Wadler and Thiemann 1998) which compilers can use to perform effect-based optimizing transformations (Benton et al. 1998).

1.2 The problem

A-Normal Form is put forward as a compiler intermediate language with all the benefits of CPS (Flanagan et al. 1993, §6). Unfortunately, the normal form is not preserved under useful compiler transformations such as function inlining (β-reduction). Consider the ANF term

M ≡ let x = (λy.let z = a b in c) d in e.

Now naive β-reduction produces let x = (let z = a b in c) in e, which is not in normal form. The ‘fix’ is to define a more complex notion of β-reduction that re-normalizes let constructs (Sabry and Wadler 1997), in this case producing the normal form let z = a b in (let x = c in e).

In contrast, the CPS transform of M, namely (λy.λk.a b (λz.k c)) d (λx.k e), simplifies by simple β-reduction to a b (λz.(λx.k e) c). As Sabry and Wadler explain in their study of the relationship between CPS and monadic languages, “the CPS language achieves this normalization using the metaoperation of substitution which traverses the CPS term to locate k and replace it by the continuation thus effectively ‘pushing’ the continuation deep inside the term” (Sabry and Wadler 1997, §8).

Monadic languages permit let expressions to be nested, but incorporate so-called commuting conversions (cc’s) such as

let y ⇐ (let x ⇐ M in N) in P → let x ⇐ M in (let y ⇐ N in P).

ANF can be seen as a monadic language in which β-reduction is combined with cc-normalization, ensuring that terms remain in cc-normal form. All of the above seems quite benign, except for two things:

1. Commuting conversions increase the complexity of simplifying intermediate language terms. Reductions that strictly decrease the size of the term can be applied exhaustively on CPS terms, the number of reductions applied being linear in the size of the term. The equivalent ANF or monadic reductions must necessarily involve commuting conversions, which leads to O(n²) reductions in the worst case. Moreover, as Appel and Jim (1997) have shown, given a suitable term representation, shrinking reductions on CPS can be applied in time O(n); it is far from clear how to amortize the cost of commuting conversions to obtain a similar measure for ANF or monadic simplification.

2. Real programming languages include conditional expressions, or, more generally, case analysis on datatype constructors. These add considerable complexity to reductions on ANF or monadic terms. Consider the term

let z = (λx.if x then a else b) c in M.

This is in ANF, but β-reduction produces let z = (if c then a else b) in M, which is not in normal form because it contains a let-bound conditional expression. To reduce it to normal form, one must either apply a standard commuting conversion that duplicates the term M, producing

if c then let z = a in M else let z = b in M,

or introduce a ‘join-point’ function for term M, to give

let k z = M in if c then let z = a in k z else let z = b in k z.

Observe that k is simply a continuation! In our CPS language, k is already available in the original term, being the (named) continuation that is passed to the function to be inlined.
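The soundness of full β-reduction in CPS, versus its unsoundness in direct-style call-by-value, can be checked concretely. Below is a small Python sketch (the function names and the logging are our own illustration, not from the paper): β-reducing the direct-style term (λx.0) (f y) to 0 would discard f’s effect, whereas the CPS transform f y (λz.(λx.λk.k 0) z k) and its fully β-reduced form f y (λz.k 0) behave identically.

```python
log = []

def f_cps(y, k):
    """CPS version of an effectful function f: logs, then passes y + 1 to k."""
    log.append("f ran")
    return k(y + 1)

def term(y, k):
    # CPS transform of (lambda x. 0)(f y):
    #   f y (lambda z. (lambda x. lambda k. k 0) z k)
    return f_cps(y, lambda z: (lambda x: lambda k2: k2(0))(z)(k))

def term_reduced(y, k):
    # After full beta-reduction: f y (lambda z. k 0).
    # The call to f, and hence its effect, is preserved.
    return f_cps(y, lambda z: k(0))
```

Because the application of f is sequenced before the continuation is invoked, the reduction cannot lose the side effect or change termination behaviour.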
The desire to share subterms almost forces some kind of continuation construct into the language. Better to start off with a language that makes continuations explicit.

1.3 Contribution

Much of the above has been said before by others, though not always in the context of compilation; in this author’s opinion, the most illuminating works are Appel (1992); Danvy and Filinski (1992); Hatcliff and Danvy (1994); Sabry and Wadler (1997). One contribution of this paper, then, is to draw together these observations in a form accessible to implementers of functional languages. As is often the case, the devil is in the details, and so another purpose of this paper is to advocate a certain style of CPS that works very smoothly for compilation. Continuations are named and mandatory (just as every intermediate value is named, so is every control point), are second-class (they’re not general lambdas), can represent basic blocks and loops, can be shared (typically, through common continuations of branches), represent exceptional control flow (using double-barrelled CPS), and are typeable (but can be used in untyped form too). By refining the types of exception values in the double-barrelled variant we get an effect system for exceptions ‘for free’.

We make two additional contributions. Following Appel and Jim (1997), we describe a graph-based representation of CPS terms that supports the application of shrinking β-reductions in time linear in the size of the term. We improve on Appel and Jim’s selective use of back pointers for accessing variable binders, and employ the union-find data structure to give amortized near-constant-time access to binders for all variable occurrences. This leads to efficient implementation of η-reductions and other transformations.
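The union-find idea can be sketched quickly. In the Python fragment below (ours, not the compiler’s actual code), each variable occurrence starts in its own class; renaming one variable to another merges their classes, and find with path compression gives the amortized near-constant-time access to the class representative, which in the compiler would carry the pointer to the binder:

```python
class UnionFind:
    """Disjoint sets with union by rank and path compression,
    giving amortized near-constant-time find."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make(self, x):
        # Each new variable occurrence starts as its own representative.
        self.parent.setdefault(x, x)
        self.rank.setdefault(x, 0)

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:      # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        # Substituting one variable for another = merging their classes.
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return rx
```

One caveat (our assumption, not spelled out here): in the compiler the merge must be directed so that the surviving representative is the one holding the binder information; plain union-by-rank, as above, would need that refinement.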
We present benchmark results comparing our graph-CPS representation with (a) an earlier graphical representation of the original monadic language used in our compiler, and (b) the original functional representation of that language. Lastly, we show how to transform functions into local continuations using simple term rewriting rules. This approach to contification avoids the need for a global dominator analysis (Fluet and Weeks 2001), and furthermore supports nested and first-class functions.

2. Untyped CPS

We start by defining an untyped continuation-passing language λU_CPS that supports non-recursive functions, the unit value, pairs, and tagged values. Even for such a simple language, we can cover many of the issues and demonstrate advantages over alternative, direct-style languages. In Section 3, we add recursive functions, types, polymorphism, exceptions, and effect annotations. At that point, the language resembles a practical CPS-based intermediate language of the sort that could form the core of a compiler for SML, Caml, or Scheme.

Figure 1 presents the syntax of the untyped language. Here ordinary variables are ranged over by x, y, f, and g, and continuation variables are ranged over by k and j. Indices i range over 1, 2. We specify scoping of variables using well-formedness rules for values and terms. Here Γ ⊢ V ok means that value V is well-formed in the scope of a list of ordinary variables Γ, and Γ; ∆ ⊢ K ok means that term K is well-formed in the scope of a list of continuation variables ∆ and a list of ordinary variables Γ. Complete programs are well-formed in the context of a distinguished top-level continuation halt. (For the typed variant of our language there will be typing rules with Γ and ∆ generalized to typing contexts.)

Grammar
(terms)  CTm ∋ K, L ::= letval x = V in K | let x = πi y in K | letcont k x = K in L | k x | f k x | case x of k1 8 k2
(values) CVal ∋ V, W ::= () | (x, y) | ini x | λk x.K

Well-formed terms Γ; ∆ ⊢ K ok:
(let)   Γ ⊢ V ok and Γ, x; ∆ ⊢ K ok imply Γ; ∆ ⊢ letval x = V in K ok
(proj)  x ∈ Γ, Γ, y; ∆ ⊢ K ok and i ∈ {1, 2} imply Γ; ∆ ⊢ let y = πi x in K ok
(letc)  Γ, x; ∆ ⊢ K ok and Γ; ∆, k ⊢ L ok imply Γ; ∆ ⊢ letcont k x = K in L ok
(appc)  k ∈ ∆ and x ∈ Γ imply Γ; ∆ ⊢ k x ok
(app)   k ∈ ∆ and f, x ∈ Γ imply Γ; ∆ ⊢ f k x ok
(case)  x ∈ Γ and k1, k2 ∈ ∆ imply Γ; ∆ ⊢ case x of k1 8 k2 ok

Well-formed values Γ ⊢ V ok:
(unit)  Γ ⊢ () ok
(pair)  x, y ∈ Γ implies Γ ⊢ (x, y) ok
(tag)   x ∈ Γ and i ∈ {1, 2} imply Γ ⊢ ini x ok
(abs)   Γ, x; k ⊢ K ok implies Γ ⊢ λk x.K ok

Well-formed programs:
(prog)  {}; halt ⊢ K ok

Figure 1. Syntax and scoping rules for untyped language λU_CPS

We describe the constructs of the language in turn.

• The expression letval x = V in K binds a value V to a variable x in the term K. This is the only way a value V can be used in a term; arguments to functions, case scrutinees, components of pairs, and so on, must all be simple variables. Even the unit value () must be bound to a variable before being used (in the full language, the same holds even for constants such as 42). This means that there is no need for a general notion of substitution: we only substitute variables for variables. Notice also that there is no notion of redundant binding such as let x ⇐ y in K.

• Values include the unit value (), pairs (x, y) and tagged values ini x. Function values λk x.K include a return continuation k and argument x. Note carefully the well-formedness rule (abs): its continuation context includes only the return continuation k, thus enforcing locality of continuations introduced by letcont.

• The expression let x = πi y in K projects the i’th component of a pair y and binds it to variable x in K.

• The expression letcont k x = K in L introduces a local continuation k whose single argument is x and whose body is K, to be used in term L. It corresponds to a labelled block in traditional lower-level representations. In Section 3 we extend local continuations with support for recursion, and so represent loops directly.

• A continuation application k x corresponds to a jump (if k is a local continuation) or a return (if k is the return continuation of a function value). As with values, continuations must be named: function application expressions and case constructs do not have subterms, but instead mention continuations by name. We need only ever substitute continuation variables for continuation variables. Local continuations can be applied more than once, as in

letcont j y = K in
letcont k1 x1 = (letval x = V1 in j x) in
letcont k2 x2 = (letval x = V2 in j x) in
case z of k1 8 k2.

Here j is the common continuation, or ‘join point’, for branches k1 and k2.

• The expression f k x is the application of a function f to an argument x and a continuation k whose parameter receives the result of applying the function. If k is the return continuation for the nearest enclosing λ, then the application is a ‘tail call’. For example, consider the function value λk x.(letcont j y = g k y in f j x). Here g is in tail position, and f is not. In effect, we are defining λx.g(f(x)).

• The construct case x of k1 8 k2 expects x to be bound to a tagged value ini y and then dispatches to the appropriate continuation ki, passing y as argument.

The semantics is given by environment-style evaluation rules, presented in Figure 2. As is conventional, we define a syntax of run-time values, ranged over by r, supporting the unit value, pairs, constructor applications, and closures. Environments map variables to run-time values, and continuation variables to continuation values. Continuation values are represented in a closure form, which gives the impression that they are first-class. An alternative would be to model stack frames more directly and thereby demonstrate that continuations are in fact just code pointers.
For the purpose of simply defining the meaning of programs we prefer the closure-based semantics. The function J·Kρ interprets a value expression in an environment ρ. Terms are evaluated in an environment ρ; the only observations that we can make of programs are termination, i.e. the application of the top-level continuation halt to a unit value.

Runtime values:      r ::= () | (r1, r2) | ini r | ⟨ρ, λk x.K⟩
Continuation values: c ::= ⟨ρ, λx.K⟩
Environments:        ρ ::= • | ρ, x ↦ r | ρ, k ↦ c

Interpretation of values:
J()Kρ = ()
J(x, y)Kρ = (ρ(x), ρ(y))
Jini xKρ = ini(ρ(x))
Jλk x.KKρ = ⟨ρ, λk x.K⟩

Evaluation rules:
(e-let)  ρ ⊢ letval x = V in K ⇓ if ρ, x ↦ JV Kρ ⊢ K ⇓
(e-letc) ρ ⊢ letcont k x = K in L ⇓ if ρ, k ↦ ⟨ρ, λx.K⟩ ⊢ L ⇓
(e-proj) ρ ⊢ let y = πi x in K ⇓ if ρ(x) = (r1, r2) and ρ, y ↦ ri ⊢ K ⇓
(e-appc) ρ ⊢ k x ⇓ if ρ(k) = ⟨ρ′, λy.K⟩ and ρ′, y ↦ ρ(x) ⊢ K ⇓
(e-case) ρ ⊢ case x of k1 8 k2 ⇓ if ρ(x) = ini r, ρ(ki) = ⟨ρ′, λy.K⟩ and ρ′, y ↦ r ⊢ K ⇓
(e-app)  ρ ⊢ f k x ⇓ if ρ(f) = ⟨ρ′, λj y.K⟩ and ρ′, j ↦ ρ(k), y ↦ ρ(x) ⊢ K ⇓
(e-halt) ρ ⊢ halt x ⇓

Figure 2. Evaluation rules for λU_CPS

2.1 CPS transformation

To illustrate how the CPS-based language can be used for functional language compilation, consider a fragment of Standard ML whose expressions (ranged over by e) have the following syntax:

ML ∋ e ::= x | e e′ | fn x => e | (e, e′) | #i e | () | ini e
        | let val x = e in e′ end
        | case e of in1 x1 => e1 | in2 x2 => e2

We assume a datatype declared by

datatype (’a,’b) sum = in1 of ’a | in2 of ’b

Expressions in this language can be translated into untyped CPS terms using the function shown in Figure 3. This is an adaptation of the standard higher-order one-pass call-by-value transformation (Danvy and Filinski 1992). An alternative, first-order, transformation is described by Danvy and Nielsen (2003). The transformation works by taking a translation-time function κ as argument, representing the ‘context’ into which the translation of the source term is embedded. For our language, the context’s argument is a variable, as all intermediate results are named. Note some conventions used in Figure 3: translation-time lambda abstraction is written z.… and translation-time application is written κ(…), to distinguish from λ and juxtaposition used to denote lambda abstraction and application in the target language. Also note that any object variables present in the target terms but not in the source are assumed fresh with respect to all other bound variables.

J·K : ML → (Var → CTm) → CTm
JxK κ = κ(x)
J()K κ = letval x = () in κ(x)
Je1 e2K κ = Je1K (z1. Je2K (z2. letcont k x = κ(x) in z1 k z2))
J(e1,e2)K κ = Je1K (z1. Je2K (z2. letval x = (z1, z2) in κ(x)))
Jini eK κ = JeK (z. letval x = ini z in κ(x))
J#i eK κ = JeK (z. let x = πi z in κ(x))
Jfn x => eK κ = letval f = λk x.JeK (z. k z) in κ(f)
Jlet val x = e1 in e2 endK κ = letcont j x = Je2K κ in Je1K (z. j z)
Jcase e of in1 x1 => e1 | in2 x2 => e2K κ =
    JeK (z. letcont k1 x1 = Je1K κ in
            letcont k2 x2 = Je2K κ in
            case z of k1 8 k2)

Figure 3. Naive CPS transformation of toy ML into λU_CPS

The translation is one-pass in the sense that it introduces no ‘administrative reductions’ (here, β-redexes for continuations) that must be removed in a separate phase, except for let constructs (to avoid these also would require analysis of the let expression; we prefer to apply simplifying rewrites on the output of the transformation). However, the translation is naive in two ways. First, it introduces η-redexes for continuations when translating tail function applications. For example, Jfn x => f (x,y)K κ produces

letval g = λk x.(letval p = (x, y) in letcont j z = k z in f j p) in κ(g)

whose η-redex, the subterm letcont j z = k z, can be eliminated to obtain the more compact

letval g = (λk x.letval p = (x, y) in f k p) in κ(g).

Second, the translation of case duplicates the context; consider, for example, f (case x of in1 x1 => e1 | in2 x2 => e2), whose translation involves two calls to f. The more sophisticated translation scheme of Figure 4 avoids both these problems; again, this is based on Danvy and Filinski (1992). The translation function J·K is as before, except (a) it introduces a join point continuation to avoid context duplication for case, and (b) for terms in tail position it uses an alternative translation function L·M that takes an explicit continuation variable as argument instead of a context.

2.2 Rewrites

After translating from source language to intermediate language, most functional language compilers perform a number of optimization phases that are implemented as transformations on intermediate language terms. Some phases are specific (for example, arity-raising of functions, or hoisting expressions out of loops) but usually there is some set of general rewrites based on standard reductions in the lambda-calculus. Figure 5 presents some general rewrites for our CPS-based language. The rewrites look more complicated than the equivalent reductions in the lambda-calculus because the naming of intermediate values forces introduction and elimination forms apart. For example, β-reduction on pairs, which in the lambda calculus is simply πi(e1, e2) → ei, has to support an intervening context C. In practice, the rewrites are not hard to implement. In functional style, value bindings (e.g. pairs) are stored in an environment which is accessed at the reduction site (e.g. a projection). In imperative style, bindings are accessed directly through pointers, as we shall see in Section 4.1. The payoff from this style of rewrite is the selective use of β rules.
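The higher-order one-pass transformation of Figure 3 is short enough to transcribe directly. The Python sketch below is our own rendering, not the compiler’s code: source terms are tagged tuples, target terms are plain strings, fresh names x0, x1, … come from a shared counter, and the translation-time context κ is an ordinary Python function.

```python
import itertools

fresh = (f"x{i}" for i in itertools.count())   # fresh target-language variables

def cps(e, kappa):
    """Naive one-pass CPS transform in the style of Figure 3 (our transcription).
    Source terms: ('var', x), ('unit',), ('app', e1, e2), ('pair', e1, e2),
    ('fn', x, e).  kappa maps a target variable name to a target term string."""
    tag = e[0]
    if tag == 'var':
        return kappa(e[1])
    if tag == 'unit':
        x = next(fresh)
        return f"letval {x} = () in {kappa(x)}"
    if tag == 'app':
        def ctx1(z1):
            def ctx2(z2):
                x, k = next(fresh), next(fresh)
                # letcont k x = kappa(x) in z1 k z2
                return f"letcont {k} {x} = {kappa(x)} in {z1} {k} {z2}"
            return cps(e[2], ctx2)
        return cps(e[1], ctx1)
    if tag == 'pair':
        def ctx1(z1):
            def ctx2(z2):
                x = next(fresh)
                return f"letval {x} = ({z1}, {z2}) in {kappa(x)}"
            return cps(e[2], ctx2)
        return cps(e[1], ctx1)
    if tag == 'fn':
        f, k = next(fresh), next(fresh)
        body = cps(e[2], lambda z: f"{k} {z}")
        return f"letval {f} = λ{k} {e[1]}.{body} in {kappa(f)}"
    raise ValueError(tag)
```

For instance, translating the application f y with top-level context halt yields a term of the shape letcont k x = halt x in f k y, with k and x freshly generated, just as the Je1 e2K clause of Figure 3 prescribes.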
For example, in a lambda-calculus extended with a let construct, one might perform the reduction let p = (x, y) in M → M[(x, y)/p], but this would be undesirable unless every substitution of (x, y) for p in M produced a redex. In our language,

letval p = (x, y) in … k p … let z = π1 p in K

reduces to

letval p = (x, y) in … k p … K[x/z]

which applies the β-PAIR rule to π1 p but preserves other occurrences of p. It is easy to show that all rewrites preserve well-formedness of terms. In particular, the scoping of local continuations is respected. The β-FUN and β-CONT reductions are inlining transformations for functions and continuations. The remainder of the reductions we call shrinking reductions, as they strictly decrease the size of terms (Appel and Jim 1997). The β-CONT-LIN and β-FUN-LIN reductions are special cases of β-reduction for linear uses of a variable, in effect combining DEAD- and β-reductions.

J·K : ML → (Var → CTm) → CTm
Jfn x => eK κ = letval f = λk x.LeM k in κ(f)
Jlet val x = e1 in e2 endK κ = letcont j x = Je2K κ in Le1M j
Jcase e of in1 x1 => e1 | in2 x2 => e2K κ =
    JeK (z. letcont j x = κ(x) in
            letcont k1 x1 = Le1M j in
            letcont k2 x2 = Le2M j in
            case z of k1 8 k2)

L·M : ML → CVar → CTm
LxM k = k x
Le1 e2M k = Je1K (x1. Je2K (x2. x1 k x2))
Lfn x => eM k = letval f = λj x.LeM j in k f
L(e1,e2)M k = Je1K (x1. Je2K (x2. letval x = (x1, x2) in k x))
Lini eM k = JeK (z. letval x = ini z in k x)
L()M k = letval x = () in k x
L#i eM k = JeK (z. let x = πi z in k x)
Llet val x = e1 in e2 endM k = letcont j x = Le2M k in Le1M j
Lcase e of in1 x1 => e1 | in2 x2 => e2M k =
    JeK (z. letcont k1 x1 = Le1M k in
            letcont k2 x2 = Le2M k in
            case z of k1 8 k2)

Figure 4. Tail CPS transformation (changes and additions only shown)

Contexts:
C ::= [] | letval x = V in C | let x = πi y in C | letval x = λk x.C in K | letcont k x = C in K | letcont k x = K in C

DEAD-CONT   letcont k x = K in L → L  (k not free in L)
DEAD-VAL    letval x = V in K → K  (x not free in K)
β-CONT      letcont k x = K in C[k y] → letcont k x = K in C[K[y/x]]
β-FUN       letval f = λk x.K in C[f j y] → letval f = λk x.K in C[K[y/x, j/k]]
β-CASE      letval x = ini y in C[case x of k1 8 k2] → letval x = ini y in C[ki y]
β-PAIR      letval x = (x1, x2) in C[let y = πi x in K] → letval x = (x1, x2) in C[K[xi/y]]
β-CONT-LIN  letcont k x = K in C[k y] → C[K[y/x]]  (if k not free in C)
β-FUN-LIN   letval f = λk x.K in C[f j y] → C[K[y/x, j/k]]  (f ≠ y, f not free in C)
η-CONT      letcont k x = j x in K → K[j/k]
η-FUN       letval f = λk x.g k x in K → K[g/f]
η-PAIR      let xi = πi x in C[let xj = πj x in C′[letval y = (x1, x2) in K]] →
            let xi = πi x in C[let xj = πj x in C′[K[x/y]]]  ({i, j} = {1, 2})
η-CASE      letcont ki x1 = (letval y1 = ini x1 in k y1) in C[letcont kj x2 = (letval y2 = inj x2 in k y2) in C′[case x of k1 8 k2]] →
            letcont ki x1 = (letval y1 = ini x1 in k y1) in C[letcont kj x2 = (letval y2 = inj x2 in k y2) in C′[k x]]  ({i, j} = {1, 2})

Figure 5. General rewrites for λU_CPS
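As an illustration of how these rewrites operate on named bindings, here is a minimal Python sketch (our own toy representation, assuming all binders are distinct and variable names differ from the tuple tags) of β-PAIR combined with DEAD-VAL:

```python
def free_vars(t):
    """Free variables of a toy CPS term.
    Terms: ('letval', x, v, K)  where v is ('unit',) or ('pair', a, b)
           ('proj', i, y, x, K) for  let x = pi_i y in K
           ('appc', k, x)       for a jump  k x."""
    tag = t[0]
    if tag == 'letval':
        _, x, v, body = t
        fv = free_vars(body) - {x}
        return fv | ({v[1], v[2]} if v[0] == 'pair' else set())
    if tag == 'proj':
        _, _i, y, x, body = t
        return (free_vars(body) - {x}) | {y}
    if tag == 'appc':
        return {t[2]}
    raise ValueError(tag)

def subst(t, x, z):
    """Replace variable x by z (toy version: assumes no shadowing)."""
    if t == x:
        return z
    if isinstance(t, tuple):
        return tuple(subst(s, x, z) for s in t)
    return t

def shrink(t, env=None):
    """beta-PAIR plus DEAD-VAL in one recursive pass.
    env maps variables to the pair values they are letval-bound to."""
    env = env or {}
    tag = t[0]
    if tag == 'letval':
        _, x, v, body = t
        if v[0] == 'pair':
            env = {**env, x: v}
        body = shrink(body, env)
        if x not in free_vars(body):       # DEAD-VAL: binding now unused
            return body
        return ('letval', x, v, body)
    if tag == 'proj':
        _, i, y, x, body = t
        if y in env:                        # beta-PAIR: y is a known pair
            return shrink(subst(body, x, env[y][i]), env)
        return ('proj', i, y, x, shrink(body, env))
    return t
```

Running this on letval p = (x, y) in let z = π1 p in k z rewrites the projection to k x and then removes the dead pair binding; when p has other occurrences, the binding is preserved, exactly as the β-PAIR rule requires.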
Shrinking reductions can be applied exhaustively on a term, and are typically used to ‘clean up’ a term after some special-purpose global transformation such as arity-raising or monomorphisation. Clearly the number of such reductions will be linear in the size of the term; moreover, using the representation of terms described in Section 4 it is possible to perform such reductions in linear time. 2.3 Comparison with a monadic language The original implementations of the MLj and SML.NET compilers used monadic languages inspired by Moggi’s computational lambda calculus (Moggi 1991). Figure 6 presents syntax for a monadic language λmon and selected reduction rules. The defining feature of monadic languages is that sequencing of computations is made explicit through the let construct; values are converted into trivial computations using the val construct. Monadic languages share with CPS languages the property that familiar β-reduction on functions is sound, as evaluation of the function argument is made explicit through let. But there are drawbacks, as we outlined in the Introduction. (An orthogonal issue – as for CPS based languages – is whether values can appear anywhere except inside val. In λmon , for ease of presentation, we permit values to be embedded in applications, pairs, and so on, whereas for λU CPS we insist that they are named. The difference shows up in the reduction rules, which in λU CPS make use of contexts. It should be noted that the drawbacks of monadic languages that we are about to discuss are unaffected by this choice.) Problem 1: need for let/let commuting conversion. The basic reductions listed in Figure 5 have corresponding reductions in CPS. The let construct itself has β and η rules which correspond to β-C ONT and η-C ONT for λU CPS (consider the CPS transforms of the terms). 
In contrast to CPS-based languages, though, monadic Grammar MTm 3 M, N ::= MVal 3 v, w ::= Reductions β-L ET η-L ET CC-L ET let x ⇐ val v in M → M [v/x] let x ⇐ M in val x → M let x2 ⇐ (let x1 ⇐ M1 in M2 ) in N → let x1 ⇐ M1 in (let x2 ⇐ M2 in N ) let x ⇐ (case v of in1 x1 .M1 8 in2 x2 .M2 ) in N → let f ⇐ val λx.N in case v of in1 x1 .let x ⇐ M1 in f x 8 in2 x2 .let x ⇐ M2 in f x CC-C ASE β-PAIR β-F UN β-C ASE val v | let x ⇐ M in N | v w | πi v | case v of in1 x1 .M1 8 in2 x2 .M2 x | λx.M | (v, w) | ini v | () πi (v1 , v2 ) → vi (λx.M ) v → M [v/x] case ini v of in1 x1 .M1 8 in2 x2 .M2 → Mi [v/xi ] Figure 6. Syntax and selected rewrites for monadic language λmon languages include a so-called commuting conversion, expressing associativity for let: CC-L ET let x2 ⇐ (let x1 ⇐ M1 in M2 ) in N → let x1 ⇐ M1 in (let x2 ⇐ M2 in N ) This reduction plays a vital role in exposing further reductions. Consider the source expression #1 ((fn x => (g x,x)) y) Its translation into λmon is let z2 ⇐ (λx.let z1 ⇐ g x in val (z1 , x)) y in π1 z2 . Problem 2: quadratic blowup. The CC-L ET reduction seems innocent enough. But observe that it is not a shrinking reduction – so it’s not immediately clear whether reduction will terminate. Fortunately, the combination of CC-L ET and shrinking β/η-reductions of Figure 6 does terminate (Lindley 2005), and moreover there is a formal correspondence between the reductions of the monadic language and CPS (Hatcliff and Danvy 1994). Unfortunately, the order in which conversions are applied is critical to the efficiency of simplification by reduction. Consider the following term in λmon : let fn ⇐ val (λxn .let yn ⇐ g xn in g yn ) in let fn−1 ⇐ val (λxn−1 .let yn−1 ⇐ fn xn−1 in g yn−1 ) in .. . let f1 ⇐ val (λx1 .let y1 ⇐ f2 x1 in g y1 ) inf1 a If (linear) β-F UN is applied to all functions in this term, followed by a sequence of CC-L ET reductions, then no redexes remain after O(n) reductions. 
If, however, the commuting conversions are interleaved with β-F UN, then O(n2 ) reductions are required. (There are other examples where it is better to apply commuting conversions first.) Although this is a pathological example, the ‘simplifier’ was a major bottleneck in the MLj and SML.NET compilers (Benton et al. 2004a), in part (we believe) because of the need to perform commuting conversions. Solution 2: Use CPS. It is interesting to note that monadic terms can be translated into CPS in linear-time; shrinking reductions can be applied exhaustively there in linear-time (see Section 4); and the term can be translated back into CPS in linear-time. Therefore the quadratic blowup we saw above is not fundamental, and there may be some means of amortizing the cost of commuting conversions so that exhaustive reductions can be peformed in linear time. Nevertheless, it is surely better to have the term in CPS from the start, and enjoy the benefit of linear-time simplification. Now suppose that we apply β-F UN, to get let z2 ⇐ (let z1 ⇐ g y in val (z1 , y)) in π1 z2 . In order to make any further progress, we must use CC-L ET to get let z1 ⇐ g y in let z2 ⇐ val (z1 , y) in π1 z2 . Now we can apply β-L ET and β-PAIR to get let z1 ⇐ g y in z1 which further reduces by η-L ET to g y. Solution 1: Use CPS. Now take the original source expression and translate it into our CPS-based language, with k representing the enclosing continuation. let f = λj1 x. (letcont j2 z1 = (letval z2 = (z1 , x) in j1 z2 ) in g j2 x) in letcont j3 z3 = (let z4 = π1 z3 in k z4 ) in f j3 y Applying β-F UN-L IN gives the following, with substitutions highlighted: letcont j3 z3 = (let z4 = π1 z3 in k z4 ) in letcont j2 z1 = (letval z2 = (z1 , y ) in j3 z2 ) in g j2 y and by β-C ONT -L IN on j3 we get letcont j2 z1 = (letval z2 = (z1 , y) in let z4 = π1 z2 in k z4 ) in g j2 y. Finally, use of β-PAIR and D EAD-VAL produces letcont j2 z1 = k z1 in g j2 y which reduces by η-C ONT to g k y. 
All reductions were simple uses of β and η rules, without the need for the additional ‘administrative’ reduction CC-L ET . Problem 3: need for let/case commuting conversion. Matters become more complicated with conditionals or case constructs. Consider the source expression g 0 (g((fn x => case x of in1 x1 => (x1 ,x3 )| in2 x2 => g 00 x) y)) Its translation into λmon is let z ⇐ (λx.case x of in1 x1 .val (x1 , x3 ) 8 in2 x2 .g 00 x) y in let z 0 ⇐ g z in g 0 z 0 . This reduces by β-F UN to let z ⇐ (case y of in1 x1 .val (x1 , x3 ) 8 in2 x2 .g 00 y) in let z 0 ⇐ g z in g 0 z 0 . At this point, we want to ‘float’ the case expression out of the let. The proof-theoretic commuting conversion that expresses this rewrite is let x ⇐ (case v of in1 x1 .M1 8 in2 x2 .M2 ) in N → case v of in1 x1 .(let x ⇐ M1 in N ) 8 in2 x2 .(let x ⇐ M2 in N ) This can have the effect of exposing more redexes; unfortunately, it also duplicates N which is not so desirable. So instead, compilers typically adopt a variation of this commuting conversion that shares M between the branches, creating a so-called join point function: CC-C ASE let x ⇐ (case v of in1 x1 .M1 8 in2 x2 .M2 ) in N → let f ⇐ val λx.N in case v of in1 x1 .let x ⇐ M1 in f x 8 in2 x2 .let x ⇐ M2 in f x Applying this to our example produces the result let f ⇐ val (λz.let z 0 ⇐ g z in g 0 z 0 ) in case x of in1 x1 .(let z ⇐ val (x1 , x3 ) in f z) 8 in1 x2 .(let z ⇐ g 00 x in f z). As observed earlier, join points such as f are just continuations. Solution 3: Use CPS. Consider the CPS transformation of the original source expression, with k being the enclosing return continuation. letcont j 0 z 0 = g 0 k z 0 in letcont j z = g j 0 z in letval f = λj 00 x. 
(letcont k1 x1 = (letval z'' = (x1, x3) in j'' z'') in
 letcont k2 x2 = g'' j'' x in
 case x of k1 | k2)
in f j y

Applying β-FUN-LIN immediately produces the following term, with substitutions highlighted:

letcont j' z' = g' k z' in
letcont j z = g j' z in
letcont k1 x1 = (letval z'' = (x1, x3) in j z'') in
letcont k2 x2 = g'' j y in
case y of k1 | k2

There is no need to apply anything analogous to CC-CASE, or to introduce a join point: the original term already had one, namely j, which was substituted for the return continuation j'' of the function.

The absence of explicit join points in monadic languages is an annoyance in itself. When join points are represented as ordinary functions, a separate static analysis is needed to determine which functions can be compiled efficiently as basic blocks. Explicitly named local continuations in CPS have the advantage that locality is immediate from the syntax and preserved under transformation; furthermore, traditional intra-procedural compiler optimizations (such as those performed on SSA representations) can be adapted to operate on functions in CPS form.

2.4 Comparison with ANF

Flanagan et al. (1993) propose an alternative to CPS which they call A-Normal Form, or ANF for short. This is defined as the image of the composition of the CPS, administrative normalization and inverse CPS transformations:

[diagram: CS −CPS→ • −β-normalization→ • −un-CPS→ A(CS), with a dotted arrow A directly from CS to A(CS)]

The source language CS is Core Scheme (corresponding to our fragment of ML), and their CPS transformation composed with β-normalization is equivalent to our one-pass transformation J·K of Figure 4. Instead of going via a CPS language, the transformation into ANF can be performed in one pass, as suggested by the dotted line A in the diagram above.¹ A similar transformation has been studied by Danvy (2003). The language A(CS) corresponds precisely to CC-LET/CC-CASE normal forms in λmon. We can express these normal forms by a grammar:

ATm  ∋ A, B ::= R | let x ⇐ R in A | case v of in1 x1.A1 | in2 x2.A2
ACmp ∋ R    ::= v w | πi v | v
AVal ∋ v, w ::= x | λx.A | (v, w) | ini v | ()

¹ Though, curiously, the 'A-normalization algorithm' in (Flanagan et al. 1993, Fig. 9) does not actually normalize terms, as it leaves let-bound conditionals alone.

As Flanagan et al. (1993) suggest, the "back end of an A-normal form compiler can employ the same code generation techniques that a CPS compiler uses". However, as we mentioned in the Introduction, it is not so apparent whether ANF is ideally suited to optimization. After all, it is not even closed under the usual rule for β-reduction (λx.A) v → A[v/x]. As Sabry and Wadler (1997) later explained, it is necessary to combine substitution with re-normalization to get a sound rule for β-reduction: essentially the repeated application of CC-LET. They do not consider conditionals or case constructs, but presumably, to maintain terms in ANF, it is necessary to normalize with respect to CC-LET and CC-CASE following function inlining. It is clear, then, that ANF suffers all the same problems that affect monadic languages: the need for (non-shrinking) commuting conversions, quadratic blowup of 'linear' reductions, and the absence of explicit join points.

3. Typed CPS with exceptions

We now add types and other features to the language of Section 2. In the untyped world, we can model recursion using a call-by-value fixed-point combinator. For a typed language, we must add explicit support for recursive functions – which, in any case, is more practical. Moreover, we would like to express recursive continuations too, in order to represent loops. Finally, to support exceptions, functions in the extended language take two continuations: an exception-handler continuation and a return continuation. This is the so-called double-barrelled continuation-passing style (Thielecke 2002).

Figure 7 presents the syntax and typing rules for the extended language λTCPS. Types of values are ranged over by τ, σ and include unit, a type of exceptions, products, sums and functions. (To save space, we omit constructs for manipulating exception values.) Continuation types have the form ¬τ, interpreted as 'continuations accepting values of type τ'. Note that for simplicity of presentation we do not annotate terms with types; it is an easy exercise to add sufficient annotations to determine unique typing derivations. Typing judgments for values have the form Γ ⊢ V : τ, in which Γ maps variables to value types. Judgments for terms have the form Γ; ∆ ⊢ K ok, in which the additional context ∆ maps continuation variables to continuation types. Complete programs are typed in the context of a single top-level continuation halt accepting unit values.

Grammar:
(value types)   τ, σ ::= unit | exn | τ × σ | τ + σ | τ → σ
(values)        CVal ∋ V, W ::= () | (x, y) | ini x
(terms)         CTm ∋ K, L ::= letval x = V in K | let x = πi y in K
                             | letcont C in K | letfun F in K
                             | k x | f k h x | case x of k1 | k2
(function def.) FunDef ∋ F ::= f k h x = K
(cont. def.)    ContDef ∋ C ::= k x = K

Well-typed values:
(var)   x:τ ∈ Γ  ⟹  Γ ⊢ x : τ
(unit)  Γ ⊢ () : unit
(pair)  Γ ⊢ x : τ   Γ ⊢ y : σ  ⟹  Γ ⊢ (x, y) : τ × σ
(tag)   Γ ⊢ x : τi   i ∈ {1, 2}  ⟹  Γ ⊢ ini x : τ1 + τ2

Continuation variables:
(contvar)  k:¬τ ∈ ∆  ⟹  ∆ ⊢ k : ¬τ

Well-typed terms:
(letv)   Γ ⊢ V : τ   Γ, x:τ; ∆ ⊢ K ok  ⟹  Γ; ∆ ⊢ letval x = V in K ok
(proj)   Γ ⊢ x : τ1 × τ2   Γ, y:τi; ∆ ⊢ K ok   i ∈ {1, 2}  ⟹  Γ; ∆ ⊢ let y = πi x in K ok
(letc)   {Γ, xi:τi; ∆, k1:¬τ1, . . . , kn:¬τn ⊢ Ki ok}1≤i≤n   Γ; ∆, k1:¬τ1, . . . , kn:¬τn ⊢ L ok
         ⟹  Γ; ∆ ⊢ letcont k1 x1 = K1, . . . , kn xn = Kn in L ok
(letrec) {Γ, xi:τi, f1:τ1→σ1, . . . , fn:τn→σn; ki:¬σi, hi:¬exn ⊢ Ki ok}1≤i≤n   Γ, f1:τ1→σ1, . . . , fn:τn→σn; ∆ ⊢ L ok
         ⟹  Γ; ∆ ⊢ letfun f1 k1 h1 x1 = K1, . . . , fn kn hn xn = Kn in L ok
(appc)   Γ ⊢ x : τ   ∆ ⊢ k : ¬τ  ⟹  Γ; ∆ ⊢ k x ok
(app)    Γ ⊢ f : τ → σ   Γ ⊢ x : τ   ∆ ⊢ k : ¬σ   ∆ ⊢ h : ¬exn  ⟹  Γ; ∆ ⊢ f k h x ok
(case)   Γ ⊢ x : τ1 + τ2   ∆ ⊢ k1 : ¬τ1   ∆ ⊢ k2 : ¬τ2  ⟹  Γ; ∆ ⊢ case x of k1 | k2 ok

Well-typed programs:
(prog)  {}; halt:¬unit ⊢ K ok

Figure 7. Syntax and typing rules for typed language λTCPS

We consider each construct in turn.

• The letval construct is as before, with the obvious typing rule and associated value typing rules. Likewise for projections.

• The letcont construct is generalized to support mutually recursive continuations. These represent loops directly. Local continuations are also used for exception handlers.

• The letfun construct introduces a set of mutually recursive functions; each function takes a return continuation k, an exception handler continuation h, and an argument x. As a language construct, there is nothing special about the handler continuation except that its type is fixed to be ¬exn, and so a function type τ → σ is constructed from the argument type τ and the type ¬σ of the return continuation. What really distinguishes exceptions is (a) their role in the translation from source language into CPS, and (b) typical strategies for generating code.

• Continuation application k x is as before. Now there are four possibilities for k: it may be a recursive or a non-recursive occurrence of a letcont-bound continuation, compiled as a jump; it may be the return continuation; or it may be a handler continuation, in which case the application is interpreted as raising an exception.

• Function application f k h x includes a handler continuation argument h. If k is the return continuation for the nearest enclosing function, and h is its handler continuation, then the application is a tail call. If k is a local continuation and h is the handler continuation for the enclosing function, then the application is a non-tail call without an explicit exception handler – so exceptions are propagated to the context. Otherwise, h is an explicit handler for exceptions raised by the function. (Other combinations are possible; for example, in letfun f k h x = C[g h h y] in K the function application is essentially raise (g y) in a tail position.)

• Branching using case is as before.

3.1 CPS transformation

We can extend the fragment of ML described in Section 2.1 with exceptions and recursive functions:

ML ∋ e ::= . . . | raise e | e1 handle x => e2 | let fun d in e end
MLDef ∋ d ::= f x = e

The revised CPS transformation is shown in Figure 8 (see (Kim et al. 1998) for the selective use of a double-barrelled CPS transformation). Both J·K and L·M take an additional argument: a continuation h for the exception handler in scope. Then raise e is translated as an application of h. For e1 handle x => e2, a local handler continuation h' is declared whose body is the translation of e2; this is then used as the handler passed to the translation function for e1.

3.2 Rewrites

The rewrites of Figure 5 can be adapted easily to λTCPS, and extended with transformations such as 'loop unrolling':

β-REC:
letfun f1 k1 h1 x1 = C[fi k h x]
       f2 k2 h2 x2 = K2
       . . .
       fn kn hn xn = Kn
in K
→
letfun f1 k1 h1 x1 = C[Ki[k/ki, h/hi, x/xi]]
       f2 k2 h2 x2 = K2
       . . .
       fn kn hn xn = Kn
in K

β-RECCONT:
letcont k1 x1 = C[ki x]
        k2 x2 = K2
        . . .
        kn xn = Kn
in K
→
letcont k1 x1 = C[Ki[x/xi]]
        k2 x2 = K2
        . . .
        kn xn = Kn
in K

There are no special rewrites for exception handling, e.g. corresponding to (raise M) handle x.N → let x ⇐ M in N. Standard β-reduction on functions and continuations gives us this for free.
For example, the CPS transform of

let fun f x = raise x in f y handle z => (z,z) end

is

letfun f k0 h0 x = h0 x in
letcont j z = (letval z' = (z, z) in k z') in
f k j y

which reduces by β-FUN and β-CONT to letval z' = (y, y) in k z'.

J·K : ML → CVar → (Var → CTm) → CTm
JxK h κ = κ(x)
Je1 e2K h κ = Je1K h (x1. Je2K h (x2. letcont k x = κ(x) in x1 k h x2))
Jfn x => eK h κ = letfun f k h' x = LeM h' k in κ(f)
J(e1,e2)K h κ = Je1K h (x1. Je2K h (x2. letval x = (x1, x2) in κ(x)))
Jini eK h κ = JeK h (z. letval x = ini z in κ(x))
J()K h κ = letval x = () in κ(x)
J#i eK h κ = JeK h (z. let x = πi z in κ(x))
Jlet val x = e1 in e2 endK h κ = letcont j x = Je2K h κ in Le1M h j
Jlet fun d in e endK h κ = letfun JdK in JeK h κ
Jraise eK h κ = JeK h (z. h z)
Je1 handle x => e2K h κ = letcont j x = κ(x) in letcont h' x = Le2M h j in Le1M h' j
Jcase e of in1 x1 => e1 | in2 x2 => e2K h κ =
  JeK h (z. letcont j x = κ(x) in
            letcont k1 x1 = Le1M h j in
            letcont k2 x2 = Le2M h j in
            case z of k1 | k2)

J·K : MLDef → FunDef
Jf x = eK = f k h x = LeM h k

L·M : ML → CVar → CVar → CTm
LxM h k = k x
Le1 e2M h k = Je1K h (x1. Je2K h (x2. x1 k h x2))
Lfn x => eM h k = letfun f j h' x = LeM h' j in k f
L(e1,e2)M h k = Je1K h (x1. Je2K h (x2. letval x = (x1, x2) in k x))
Lini eM h k = JeK h (z. letval x = ini z in k x)
L()M h k = letval x = () in k x
L#i eM h k = JeK h (z. let x = πi z in k x)
Llet val x = e1 in e2 endM h k = letcont j x = Le2M h k in Le1M h j
Llet fun d in e endM h k = letfun JdK in LeM h k
Lraise eM h k = JeK h (z. h z)
Le1 handle x => e2M h k = letcont h' x = Le2M h k in Le1M h' k
Lcase e of in1 x1 => e1 | in2 x2 => e2M h k =
  JeK h (z. letcont k1 x1 = Le1M h k in
            letcont k2 x2 = Le2M h k in
            case z of k1 | k2)

Figure 8.
Tail CPS transformation for λTCPS

Likewise, commuting conversions are not required, in contrast with monadic languages, where in order to define well-behaved conversions it is necessary to generalize the usual M handle x ⇒ N construct to try y ⇐ M in N1 unless x ⇒ N2, incorporating a success 'continuation' N1 (Benton and Kennedy 2001).

3.3 Other features

It is straightforward to extend λTCPS with other features useful for compiling full-scale programming languages such as Standard ML.

• Recursive types of the form µα.τ can be supported by adding suitable introduction and elimination constructs: a value fold x and a term let x = unfold y in K.

• Binary products and sums generalize to the n-ary case. For optimizing representations it is common for intermediate languages to support functions with multiple arguments and results, and constructors taking multiple arguments. This is easy: function definitions have the form f k h x̄ = K, and continuations have the form k x̄ = K and are used for passing multiple results and for case branches where the constructor takes multiple arguments.

• Polymorphic types of the form ∀α.τ can be added. Typing contexts are extended with a set of type variables V. Then to support ML-style let-polymorphism, each value binding construct (letval, letfun, and projection) must incorporate polymorphic generalization. For example:

(letv)  V, ᾱ; Γ ⊢ V : τ   V; Γ, x:∀ᾱ.τ; ∆ ⊢ K ok  ⟹  V; Γ; ∆ ⊢ letval x = V in K ok

For elimination, we simply adapt the variable rule (var) to incorporate polymorphic specialization:

(var)  x:∀ᾱ.τ ∈ Γ  ⟹  Γ ⊢ x : τ[σ̄/ᾱ]

3.4 Effect analysis and transformation

The use of continuations in an explicit 'handler-passing style' lends itself very nicely to an effect analysis for exceptions. Suppose, for simplicity, that there is a finite number of exception constructors, ranged over by E. We make the following changes to λTCPS:

• We introduce exception set types of the form {E1, . . . , En}, representing exception values built with any of the constructors E1, . . . , En. Set inclusion induces a subtype ordering on exception types, with top type exn representing any exception and bottom type {} representing no exception.

• The types of handler continuations in function definitions are refined to describe the exceptions that the function is permitted to throw. For example:

(1) letfun f k (h:¬{}) x = K in . . .
(2) letfun f k (h:¬exn) x = K in . . .
(3) letfun f k (h:¬{E, E'}) x = K in . . .

The type of (1) tells us that K never raises an exception; in (2) the function can raise any exception; and in (3) the function might raise E or E'.

• Now that handlers are annotated with more precise types, function types must reflect this too. We write τ →^{σ'} σ for the type of functions that either return a result of type σ or raise an exception of type σ' <: exn. Subtyping on function types and continuation types is specified by the following rules:

τ2 <: τ1   σ1 <: σ2   σ1' <: σ2'  ⟹  τ1 →^{σ1'} σ1 <: τ2 →^{σ2'} σ2
σ2 <: σ1  ⟹  ¬σ1 <: ¬σ2

Exception effects enable effect-specific transformations (Benton and Buchlovsky 2007). Suppose that the type of f is τ →^{E1} σ. Then we can apply a 'dead-handler' rewrite on the following:

letcont h:¬{E1, E2} x = (case x of E1.k1 | E2.k2) in f k h y
→ letcont h:¬{E1} x = (case x of E1.k1) in f k h y

In fact, there is nothing exception-specific about this rewrite: it is just employing refined types for constructed values. The use of continuations has given us exception effects 'for free'.

4. Implementing CPS

Many compilers for functional languages represent intermediate-language terms in a functional style, as instances of an algebraic datatype of syntax trees, and manipulate them functionally.
For example, the language λUCPS can be implemented by an SML datatype, here using integers for variables, with all bound variables distinct:

type Var = int and CVar = int

datatype CVal =
    Unit
  | Pair of Var * Var
  | Inj of int * Var
  | Lam of CVar * Var * CTm

and CTm =
    LetVal of Var * CVal * CTm
  | LetProj of Var * int * Var * CTm
  | LetCont of CVar * Var * CTm * CTm
  | AppCont of CVar * Var
  | App of Var * CVar * Var
  | Case of Var * CVar * CVar

Rewrites such as those of Figure 5 are then implemented by a function that maps terms to terms, applying as many rewrites as possible in a single pass. Here is a typical fragment that applies the β-PAIR and DEAD-VAL reductions:

fun simp census env S K =
  case K of
    LetVal(x, V, L) =>
      if count(census, x) = 0                        (* Dead-Val *)
      then simp census env S L
      else LetVal(x, simpVal census env S V,
                  simp census (addEnv(env, x, V)) S L)
  | LetProj(x, 1, y, L) =>
      let val y' = applySubst S y
      in case lookup(env, y') of
           Pair(z, _) =>                             (* Beta-Pair *)
             simp census env (extendSubst S (x, z)) L
         | _ => LetProj(x, 1, y', simp census env S L)
      end

In addition to the term K itself, the simplifier function simp takes a parameter env that tracks letval bindings, a parameter S used to substitute variables for variables, and a parameter census that maps each variable to the number of occurrences of the variable, computed prior to applying the function. The census becomes out of date as reductions are applied, and this may cause reductions to be missed until the census is recalculated and simp applied again. For example, a β-PAIR reduction may trigger a DEAD-VAL in an enclosing letval binding (consider letval x = (y1, y2) in . . . let z = π1 x in . . . where x occurs only once).
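For concreteness, the census itself can be computed by a single pass over the term before simplification begins. The following sketch is ours, not SML.NET's: it assumes the CTm/CVal datatype above, counts occurrences of value variables only (a real census would also count continuation variables, to enable DEAD-CONT and β-CONT-LIN), and represents the census as a function for brevity where a production implementation would use a mutable table.

```sml
(* Hypothetical sketch: census as a function from variables to counts. *)
type Census = Var -> int
val emptyCensus : Census = fn _ => 0

(* Record one more occurrence of x. *)
fun bump (c : Census) (x : Var) : Census =
  fn y => if y = x then c y + 1 else c y

(* Count free occurrences of value variables in values and terms.
   Binding occurrences (the x in letval x = ..., etc.) are not counted. *)
fun censusVal c (Pair (x, y))   = bump (bump c x) y
  | censusVal c (Inj (_, x))    = bump c x
  | censusVal c (Lam (_, _, K)) = census c K
  | censusVal c Unit            = c

and census c (LetVal (_, V, K))     = census (censusVal c V) K
  | census c (LetProj (_, _, y, K)) = census (bump c y) K
  | census c (LetCont (_, _, K, L)) = census (census c K) L
  | census c (AppCont (_, x))       = bump c x
  | census c (App (f, _, x))        = bump (bump c f) x
  | census c (Case (x, _, _))       = bump c x
```

With this in hand, count(census, x) in the simplifier fragment above is just an application census x; the staleness problem described next is precisely that this map is not updated as rewrites fire.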
Maintaining accurate census information as rewrites are performed can increase the number of reductions performed in a single pass (Appel and Jim 1997), but even with up-to-date census information it is not possible to perform shrinking reductions exhaustively in a single pass, so a number of iterations may be required before all redexes have been eliminated. In the worst case, this leads to O(n²) behaviour. What's more, each pass essentially copies the entire term, leaving the original term to be picked up by the garbage collector. This can be expensive. (Nonetheless, the simplicity of our CPS language, with substitutions only of variables for variables, and the lack of the commuting conversions that are required in ANF or monadic languages, leads to a very straightforward simplifier algorithm.)

4.1 Graphical representation of terms

An alternative is to represent the term using a graph, and to perform rewrites by destructive update of the graph. Appel and Jim (1997) devised a representation for which exhaustive application of the shrinking β-reductions of Figure 5 takes time linear in the size of the term. We improve on their representation to support efficient η-reductions and other transformations. The representation has three ingredients.

1. The term structure itself is a doubly-linked tree: every subterm has an up-link to its immediately enclosing term. This supports constant-time replacement, deletion, and insertion of subterms.

2. Each bound variable contains a link to one of its free occurrences, or is null if the variable is dead, and the free occurrences themselves are connected together in a doubly-linked circular list. This permits the following operations to be performed in constant time:

• determining whether a bound variable has zero, one, or more than one occurrence, and, if it has only one occurrence, locating that occurrence;
• determining whether a free variable is unique;
• merging two occurrence lists.
Furthermore, we separate recursive and non-recursive uses of variables; in essence, instead of letfun f k h x = K in L we write let f = rec g k h x.K[g/f] in L. This lets us detect DEAD-? and β-?-LIN reductions.

3. Free occurrences are partitioned into same-binder equivalence classes using the union-find data structure (Cormen et al. 2001).² The representative of each equivalence class (that is, the root of the union-find tree) is linked to its binding occurrence. This supports amortized near-constant-time access to the binder (the find operation) and merging of occurrence lists (the union operation).

² Readers familiar with type inference may recall that union-find underpins the almost-linear-time algorithm for term unification (Baader and Nipkow 1998).

Substitution of variable x for variable y is implemented in near-constant time by (a) merging the circular lists of occurrences so that x now points to the merged list, and (b) applying a union operation so that the occurrences of y are now associated with the binder for x.

Consider the following value term, with the doubly-linked tree structure and union-find structure left implicit, but with binder-to-free pointers shown as dotted arrows and circular occurrence lists shown as solid arrows:

[diagram omitted: a term of the form letval p = (x, y) in . . . let z = π1 p in . . ., showing the circular occurrence lists for x, p and z]

Now suppose that we wish to apply β-PAIR to the projection π1 p. Using the find operation on the union-find structure we can locate the pair (x, y) in near-constant time. Now we substitute x for z by disconnecting z's binder from its circular list and connecting x's occurrence list in its place, merging the two lists, in constant time. At the same time, we apply the union operation to merge the binder equivalence classes (not shown).

[diagram omitted: the same term after substituting x for z, with the occurrence lists of x and z merged]
Finally we remove the projection itself, deleting the occurrence of p from the circular list, again in constant time:

[diagram omitted: the term after deleting the projection let z = π1 p]

One issue remains: the classical union-find data structure does not support deletion. There are recent techniques that extend union-find with amortized near-constant-time deletion (Kaplan et al. 2002). However, the representation is non-trivial, and might add unacceptable overhead to the union and find operations, so we chose instead a simpler solution: do nothing! Deleted occurrences remain in the union-find data structure, possibly as root nodes, or as nodes on the path to the root. In theory, the efficiency of rewriting is then dependent on the 'peak' size of the term, not its current size, but we have not found this to be a problem in practice.

Each of the shrinking reductions of Figure 5 can be implemented in almost-constant time using our graph representation. To put these together and apply them exhaustively to a term, we follow Appel and Jim (1997):

• First sweep over the term, detecting redexes and collecting them in a worklist.
• Then pull items off the worklist one at a time (in any order), applying the appropriate rewrite, and adding to the worklist any new redexes that are triggered by the rewrite. For example, the removal of a free occurrence (as can happen for multiple variables when applying DEAD-VAL) can induce a DEAD-? reduction (if no occurrences remain) or a β-?-LIN reduction (if only a single occurrence remains).

In the current implementation the worklist is represented as a queue, but it should be possible to thread it through the term itself. Shrinking reductions could then be performed with constant space overhead.

4.2 Comparison with Appel/Jim

The representation of Appel and Jim (1997) did not make use of union-find to locate binders.
Instead, (a) the circular list of variable occurrences included the bound occurrence, thus giving constant-time access to the binder in the case that the free variable is unique; and (b) for letval-bound variables, each free occurrence contained an additional pointer to its binder. When performing a substitution operation, these binder links must be updated, taking time linear in the number of occurrences; fortunately, for any particular variable this can happen only once during shrinking reductions, as letval-bound variables cannot become rebound. Thus the cost is amortized across the shrinking reductions.

Unfortunately, the lack of binder links for non-letval-bound variables makes other optimizations, such as η-reduction, less efficient. Take an instance of η-PAIR:

let x1 = π1 x in C[let x2 = π2 x in C'[letval y = (x1, x2) in K]]
→ let x1 = π1 x in C[let x2 = π2 x in C'[K[x/y]]]

Just to locate the binders for x1 and x2 would take time linear in the number of occurrences. Our use of union-find gives us an efficient implementation of all shrinking reductions, and of other transformations too; moreover, when analysing efficiency we need not be concerned with whether variables are letval-bound or not.

4.3 Performance results

We have modified the SML.NET compiler to make use of a typed CPS intermediate language only mildly more complex than that shown in Figure 7. It employs the graphical representation of terms described above; in particular, the simplifier performs shrinking reductions exhaustively on a term representing the whole program, and it is invoked a total of 15 times during compilation. Table 1 presents some preliminary benchmark results, showing average time spent in simplification, time spent in monomorphisation, and time spent in unit removal (e.g. transformation of unit*int values to int).
We compare (a) the released version of SML.NET, implementing a monadic intermediate language (MIL) and a functional-style simplification algorithm; (b) the Appel/Jim-style graph representation adapted to MIL terms, implemented by Lindley (Benton et al. 2004a; Lindley 2005); and (c) the new graph-based CPS representation with union-find. Tests were run on a 3GHz Pentium 4 PC with 1GB of RAM running Windows Vista. The SML.NET compiler is implemented in Standard ML and compiled using the MLton optimizing compiler, which generates high-quality code from both functional and imperative coding styles – so giving both techniques a fair shot.

Table 1. Optimization times (in seconds)

Benchmark   Lines    Phase    MIL    g-MIL   g-CPS
raytrace     2,500   Simp     0.12   0.01    0.01
mlyacc       6,200   Simp     0.44   0.02    0.02
smlnet      80,000   Simp     7.29   0.29    0.15
                     Mono     0.75   n/a     1.41
                     Deunit   0.76   1.3     0.6
hamlet      20,000   Simp     0.97   0.08    0.04
                     Mono     0.15   n/a     0.19
                     Deunit   0.12   0.16    0.14

As can be seen from the figures, the graph-based simplifier for the monadic language is significantly faster than the functional simplifier – and although all times are small, bear in mind that the simplifier is run many times during compilation. Unit removal is roughly comparable in performance across implementations. Interestingly, the graph-based CPS implementation of monomorphisation runs up to twice as slow as the functional monadic implementation. We conjecture that this is because monomorphisation necessarily copies (and specializes) terms, CPS terms tend to be larger than MIL terms, and the graph representation is larger still.

These figures come with a caveat: the comparison is somewhat "apples and oranges". There are differences between the MIL, g-MIL and g-CPS representations that are unrelated to monads or CPS. Future work is to make a fairer comparison, implementing a functional version of the CPS terms, and perhaps also a precise monadic analogue.

5.
Contification

Our CPS languages make a syntactic distinction between functions and local continuations. The former are typically compiled as heap-allocated closures or as known functions, whilst the latter can always be compiled as inline code, with continuation applications compiled as jumps. For efficiency it is therefore desirable to transform functions into continuations, a process that has been termed contification (Fluet and Weeks 2001). Functions can be contified when they always return to the same place. Consider the following code, written in the subset of SML studied in Section 2:

let fun f x = ...
in g (case d of in1 d1 => f y | in2 d2 => f d2)
end

If f returns at all, it must pass control to g. Here this is obvious, but for more complex examples it is not so apparent. Now consider its CPS transform:

letval f = (λk x.· · · k · · ·) in
letcont k0 w = g r w in
letcont j1 d1 = f k0 y in
letcont j2 d2 = f k0 d2 in
case d of j1 | j2

It is clear that f is always passed the same continuation k0 – and so, unless it diverges, it must return through k0 and so pass control to g. We can transform f into a local continuation, as follows:

letcont k0 w = g r w in
letcont j x = · · · k0 · · · in
letcont j1 d1 = j y in
letcont j2 d2 = j d2 in
case d of j1 | j2

We have done three things: (a) we have replaced the function f by a continuation j, deleting the return continuation at both definition and call sites; (b) we have substituted the argument k0 for the formal k in the body of f; and (c) we have moved j so that it is in the scope of k0.

Fluet and Weeks (2001) use the dominator tree of a program's call graph to contify programs that consist of a collection of mutually recursive first-order functions. They show that their algorithm is optimal: no contifiable functions remain after applying the transformation.
Their dominator-based analysis can be adapted to our CPS languages, and is simpler to describe in this context because all function definitions and uses have a named continuation (Fluet and Weeks use named continuations only for non-tail calls). When applied to top-level functions the transformation is simpler too, but in the presence of first-class functions and general block structure the transformation becomes significantly more complex to describe. We prefer an approach based on incremental transformation, in essence repeatedly applying the rewrite illustrated above until no further rewrites are possible. We consider first the case of non-recursive functions, then generalize to mutually recursive functions, and conclude by relating our technique to dominator-based contification.

5.1 Non-recursive functions

In the untyped language λUCPS without recursion, it is particularly straightforward to spot contifiable functions: they are those for which all occurrences are applications with the same continuation argument. We define the following rewrite:

CONT (f not free in C, D, and D minimal):
letval f = λk x.K in C[D[f k0 x1, . . . , f k0 xn]]
→ C[letcont j x = K[k0/k] in D[j x1, . . . , j xn]]

Here C is a single-hole context as presented in Figure 5 and D is a multi-hole context whose formalization we omit. The CONT rewrite combines three actions: (a) the function f is replaced by a continuation j, with each application replaced by a continuation application; (b) the common continuation k0 is substituted for the formal continuation parameter k in the body K of f; and (c) the new continuation j is pulled into the scope of the continuation k0. The multi-hole context D is the smallest context enclosing all uses of f, which ensures that j is in scope after transformation. The analysis is trivial (just check call sites for a common continuation argument), yet iterating this transformation leads to optimal contification, in the sense of Fluet and Weeks (2001).
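The call-site check can be sketched in a few lines of SML. This is our own illustration, not the paper's implementation: the use datatype and the contifiable function are hypothetical names, and we assume some prior pass has collected, for each letval-bound function, its occurrences classified as direct calls or escaping uses.

```sml
(* Hypothetical sketch of the analysis for the CONT rewrite. *)
type Var = int and CVar = int

datatype use =
    Call of CVar * Var   (* occurrence is an application  f k x  *)
  | Escape               (* occurrence passes f as a value       *)

(* A function is contifiable iff it never escapes and all calls pass
   one and the same continuation; return that continuation if so. *)
fun contifiable (uses : use list) : CVar option =
  let fun go ([], common) = common
        | go (Escape :: _, _) = NONE
        | go (Call (k, _) :: rest, NONE) = go (rest, SOME k)
        | go (Call (k, _) :: rest, SOME k0) =
            if k = k0 then go (rest, SOME k0) else NONE
  in go (uses, NONE) end
```

If contifiable returns SOME k0, the CONT rewrite applies: bind a fresh continuation j with body K[k0/k], replace each call f k0 x by j x, and place j in the scope of k0. Rewriting one function can change the call sites of others, which is why the transformation is iterated to a fixed point.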
Here is an example, adapted from loc. cit. §5.2:

letval h = λkh xh.· · · in
letval g1 = λk1 x1.· · · h k1 z1 · · · k1 z8 · · · in
letval g2 = λk2 x2.· · · h k2 z2 · · · in
letval f = λkf xf.· · · g1 kf z3 · · · g2 kf z4 · · · g2 kf z5 · · · in
letval m = λkm xm.· · · f j1 z6 · · · f j2 z7 in
. . .

We can immediately see that g1 and g2 (but not h) are always passed the same continuation kf, and so we can apply CONT to contify them both:

letval h = λkh xh.· · · in
letval f = λkf xf.
  (letcont kg1 x1 = · · · h kf z1 · · · kf z8 · · · in
   letcont kg2 x2 = · · · h kf z2 · · · in
   · · · kg1 z3 · · · kg2 z4 · · · kg2 z5 · · ·) in
letval m = λkm xm.· · · f j1 z6 · · · f j2 z7 in
. . .

Now h can be contified, as it is always passed kf:

letval f = λkf xf.
  (letcont kh xh = · · · in
   letcont kg1 x1 = · · · kh z1 · · · kf z8 · · · in
   letcont kg2 x2 = · · · kh z2 · · · in
   · · · kg1 z3 · · · kg2 z4 · · · kg2 z5 · · ·) in
letval m = λkm xm.· · · f j1 z6 · · · f j2 z7 in
. . .

5.2 Recursive functions

Generalizing to recursive functions and continuations is a little trickier. Suppose we have a λTCPS term of the form

letfun f1 k1 h1 x1 = K1
       · · ·
       fn kn hn xn = Kn
in K.

A set of functions F ⊆ {f1, . . . , fn} can be contified collectively, written Contifiable(F), if there is some pair of continuations k0 and h0 such that each occurrence of f ∈ F is either a tail call within F or is a call with continuation arguments k0 and h0. Intuitively, each function (eventually) returns to the same place (k0), or throws an exception that is caught by the same handler (h0), though control may pass tail-recursively through other functions in F. There may be many such subsets F; we assume that F is in fact strongly connected with respect to the tail calls contained within it (or is a trivial singleton with no tail calls). Then for a given letfun term there is a unique partial partition of the functions into disjoint subsets satisfying Contifiable(−). Let F = {f1, . . . , fm}.
Define a translation on function applications

(f k h x)* = ji x        if f = fi ∈ F
(f k h x)* = f k h x     otherwise

and extend this to all terms. Assuming that Contifiable(F) holds, there are two possibilities.

1. All applications of the form f k0 h0 x for f ∈ F are in the term K. Then we can apply the following rewrite, which is the direct analogue of CONT.

RECCONT (f1, . . . , fm not free in C, and K minimal):

letfun f1 k1 h1 x1 = K1
       · · ·
       fn kn hn xn = Kn
in C[K]
→
letfun fm+1 km+1 hm+1 xm+1 = Km+1
       · · ·
       fn kn hn xn = Kn
in C[letcont j1 x1 = K1*[k0/k1, h0/h1]
             · · ·
             jm xm = Km*[k0/km, h0/hm]
     in K*]

2. Otherwise, all applications of the form f k0 h0 x for f ∈ F are in the body of one of the functions outside of F; without loss of generality we assume this is fn.

RECCONT2 (f1, . . . , fm not free in C, and Kn minimal):

letfun f1 k1 h1 x1 = K1
       · · ·
       fn−1 kn−1 hn−1 xn−1 = Kn−1
       fn kn hn xn = C[Kn]
in K
→
letfun fm+1 km+1 hm+1 xm+1 = Km+1
       · · ·
       fn−1 kn−1 hn−1 xn−1 = Kn−1
       fn kn hn xn = C[letcont j1 x1 = K1*[k0/k1, h0/h1]
                               · · ·
                               jm xm = Km*[k0/km, h0/hm]
                       in Kn*]
in K

For an example of the latter, more complex, transformation, consider the following SML code:

let fun unif (Ap(a,xs), Ap(b,ys)) = (unif(a,b); unifV(xs,ys))
      | unif (Ar(a,b), Ar(c,d)) = unifV([a,b],[c,d])
    and unifV (x::xs, y::ys) = (unif(x,y); unifV(xs,ys))
      | unifV ([], []) = ()
in unif end

The function unifV can be contified into the definition of unif: it tail-calls itself, and its uses inside unif have the same continuation.

5.3 Comparison with dominator-based contification

The dominator-based approach of Fluet and Weeks (2001) can be recast in our CPS language as follows. (For simplicity we do not consider exception handler continuations here.) First construct a continuation flow graph for the whole program. Nodes consist of continuation variables and a distinguished root node.
Then for each function f with return continuation k, if f is passed around as a first-class value then create an edge from root to k; otherwise, for each application f j x create an edge from j to k. Finally, for each local continuation k create an edge from root to k. The non-recursive CONT rewrite has the effect of merging two nodes in the graph:

[Diagram: a node k, connected to node k0 by a single edge, is merged with k0; the remaining edges incident on k are inherited by k0.]

The recursive RECCONT and RECCONT2 rewrites are similar, except that in place of k we have a strongly-connected component {k1, . . . , km}:

[Diagram: a strongly-connected component {k1, . . . , km}, connected to the rest of the graph only through k0, collapses into the single node k0.]

Conversely, any part of the flow graph matching the left-hand side of these diagrams corresponds to a contifiable subset of functions in a letfun to which the RECCONT or RECCONT2 rules can be applied. It is immediately clear that exhaustive rewriting terminates, as the flow graph decreases in size with each rewrite, eventually producing a graph with no occurrences of the pattern above. The algorithm described by Fluet and Weeks (2001) contifies k if it is strictly dominated by some continuation j whose immediate dominator is root. It can be shown that if a rooted graph contains such a pair of nodes j and k, then some part of the graph matches the pattern above. Hence exhaustive rewriting has the same effect as optimal contification based on dominator trees.

6. Related work and conclusion

The use of continuation-passing style for functional languages has its origins in Scheme compilers (Steele 1978; Kranz et al. 1986). It later formed the basis of the Standard ML of New Jersey compiler (Appel 1992; Shao and Appel 1995).
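To illustrate the dominator criterion of §5.3 concretely, here is a small Python sketch. It is not the paper's (or MLton's) implementation: the graph encoding and node names are invented, and the simple iterative dominator computation stands in for an efficient algorithm such as Lengauer-Tarjan.

```python
# Hedged sketch of dominator-based contification on a continuation flow
# graph.  Nodes are continuation variables plus a distinguished root.

def dominators(succ, root):
    """dom[n] = the set of nodes that dominate n (iterative data-flow)."""
    nodes = set(succ) | {s for ss in succ.values() for s in ss}
    pred = {n: set() for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            pred[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in nodes - {root}:
            new = {n}
            if pred[n]:
                new |= set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def contifiable_continuations(succ, root):
    """A continuation k is contifiable iff it is strictly dominated by some
    j whose immediate dominator is root -- equivalently, iff k's own
    immediate dominator is not root."""
    dom = dominators(succ, root)
    result = set()
    for k in dom:
        if k == root:
            continue
        # Immediate dominator: the deepest strict dominator of k.
        idom = max(dom[k] - {k}, key=lambda d: len(dom[d]))
        if idom != root:
            result.add(k)
    return result

# Flow graph for the earlier contification example (functions h, g1, g2,
# f, m): j1, j2 are local continuations in m; kf, k1, k2, kh, km are the
# return continuations of f, g1, g2, h, m (m is assumed to escape).
graph = {
    'root': ['j1', 'j2', 'km'], 'j1': ['kf'], 'j2': ['kf'],
    'kf': ['k1', 'k2'], 'k1': ['kh'], 'k2': ['kh'], 'kh': [], 'km': [],
}
```

On this graph the sketch reports {'k1', 'k2', 'kh'} as contifiable, i.e. g1, g2 and h may all be contified into f, which agrees with the rewriting-based account.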
In early compilers, lambdas originating from the CPS transformation were not distinguished from lambdas present in the source, so some effort was expended at code generation time to determine which lambdas could be stack-allocated and which had to be heap-allocated. Later compilers made a syntactic distinction between true functions and ‘second-class’ continuations introduced by CPS, and sometimes transformed one into the other (Kelsey and Hudak 1989), though contification was not studied formally.

A number of more recent compilers use what has been called almost CPS. The Sequentialized Intermediate Language (SIL) employed by Tolmach and Oliva (1998) is a monadic-style language in which a letcont-like feature is used to introduce join points. Somewhat closer to our CPS language is the First Order Language (FOL) of the MLton compiler (Fluet and Weeks 2001). It goes further than SIL in making use of named local continuations in all branch constructs and non-tail calls. However, functions are not parameterized on return (or handler) continuations, and there is special syntax for tail calls and returns. This non-uniform treatment of continuations complicates transformations: inlining of non-tail functions must replace all ‘return points’ with jumps, and the contification analysis and transformation must treat tail and non-tail calls differently. We have found the uniform treatment of continuations in our CPS language to be a real benefit, not only as a simplifying force in implementation, but also in thinking about compiler optimizations: contification, in particular, is difficult to characterize in the absence of a notion of continuation passing.

As far as we are aware, we are the first to implement linear-time shrinking reductions in the style of Appel and Jim (1997). An earlier term-graph implementation by Lindley was for a monadic language and had worst-case O(n²) behaviour due to commuting conversions (Benton et al. 2004a; Lindley 2005).
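Part of what makes linear-time simplification feasible is that inlining in CPS only ever substitutes variables for variables. As a hedged illustration (in the spirit of the union-find technique credited in the acknowledgments, but not the paper's actual code), such substitutions can be recorded in a union-find structure so that each one costs a single union, with occurrences resolved on demand:

```python
# Hedged sketch: substitution [y/x] of a variable for a variable becomes a
# single union, and each occurrence of x resolves to y's representative.
# The class and its names are invented for this illustration.

class VarSubst:
    def __init__(self):
        self.parent = {}

    def find(self, v):
        """Resolve v through any chain of recorded substitutions,
        compressing the path as we go."""
        self.parent.setdefault(v, v)
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[v] != root:
            self.parent[v], v = root, self.parent[v]
        return root

    def substitute(self, x, y):
        """Record the substitution [y/x]."""
        # The union is directed towards y: the representative must be the
        # variable substituted *in*, so union-by-rank is not applicable.
        self.parent[self.find(x)] = self.find(y)
```

Substitutions compose as expected: after substitute('x', 'y') followed by substitute('y', 'z'), find('x') yields 'z'.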
Shivers and Wand (2005) have proposed a rather different graph representation for lambda terms, with the goal of sharing subterms after β-reduction. Their representation does bear some resemblance to ours, though, with up-links from subterms to enclosing terms, and circular lists that connect the sites where a term is substituted for a variable.

This paper would not be complete without a mention of Static Single Assignment form (SSA), the currently fashionable intermediate representation for imperative languages. As is well known, SSA is in some sense equivalent to CPS (Kelsey 1995) and to ANF (Appel 1998). Its focus is intra-procedural optimization (as with ANF, it is necessary to renormalize when inlining functions, in contrast to CPS), and there is a large body of work on such optimizations. Future work is to transfer SSA-based optimizations to CPS. We conjecture that CPS is a good fit for both functional and imperative paradigms.

Acknowledgments

I would like to thank Nick Benton, Olivier Danvy, Sam Lindley, Simon Peyton Jones and Claudio Russo for fruitful discussions on compiler intermediate languages. Georges Gonthier suggested the use of union-find in the graphical representation of terms.

References

Andrew W. Appel. Compiling with Continuations. Cambridge University Press, 1992.
Andrew W. Appel. SSA is functional programming. SIGPLAN Notices, 33(4):17–20, 1998.
Andrew W. Appel and Trevor Jim. Shrinking lambda expressions in linear time. Journal of Functional Programming, 7(5):515–540, 1997.
Franz Baader and Tobias Nipkow. Term Rewriting and All That. Cambridge University Press, 1998.
Nick Benton and Peter Buchlovsky. Semantics of an effect analysis for exceptions. In ACM SIGPLAN International Workshop on Types in Language Design and Implementation (TLDI), pages 15–26, 2007.
Nick Benton and Andrew Kennedy. Exceptional syntax. Journal of Functional Programming, 11(4):395–410, 2001.
Nick Benton, Andrew Kennedy, and George Russell.
Compiling Standard ML to Java bytecodes. In 3rd ACM SIGPLAN International Conference on Functional Programming. ACM Press, September 1998.
Nick Benton, Andrew Kennedy, Sam Lindley, and Claudio Russo. Shrinking reductions in SML.NET. In 16th International Workshop on Implementation and Application of Functional Languages (IFL), 2004a.
Nick Benton, Andrew Kennedy, and Claudio Russo. Adventures in interoperability: The SML.NET experience. In 6th International Conference on Principles and Practice of Declarative Programming (PPDP), 2004b.
Thomas Cormen, Charles Leiserson, Ronald Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, second edition, 2001.
Olivier Danvy. A new one-pass transformation into monadic normal form. In 12th International Conference on Compiler Construction (CC'03), 2003.
Olivier Danvy and Andrzej Filinski. Representing control: A study of the CPS transformation. Mathematical Structures in Computer Science, 2(4):361–391, 1992.
Olivier Danvy and Lasse R. Nielsen. A first-order one-pass CPS transformation. Theoretical Computer Science, 308(1–3):239–257, 2003.
Cormac Flanagan, Amr Sabry, Bruce F. Duba, and Matthias Felleisen. The essence of compiling with continuations (with retrospective). In McKinley (2004), pages 502–514.
Matthew Fluet and Stephen Weeks. Contification using dominators. In ICFP'01: Proceedings of the Sixth ACM SIGPLAN International Conference on Functional Programming, pages 2–13. ACM Press, September 2001.
John Hatcliff and Olivier Danvy. A generic account of continuation-passing styles. In Principles of Programming Languages (POPL), pages 458–471, 1994.
Haim Kaplan, Nira Shafrir, and Robert E. Tarjan. Union-find with deletions. In SODA '02: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 19–28. Society for Industrial and Applied Mathematics, 2002.
Richard Kelsey. A correspondence between continuation passing style and static single assignment form. In Intermediate Representations Workshop, pages 13–23, 1995.
Richard A. Kelsey and Paul Hudak. Realistic compilation by program transformation. In Principles of Programming Languages (POPL). ACM, January 1989.
Jung-taek Kim, Kwangkeun Yi, and Olivier Danvy. Assessing the overhead of ML exceptions by selective CPS transformation. In ACM SIGPLAN Workshop on ML, pages 112–119, 1998. Also appears as BRICS technical report RS-98-15.
David A. Kranz, Richard A. Kelsey, Jonathan A. Rees, Paul Hudak, and James Philbin. ORBIT: An optimizing compiler for Scheme. In Proceedings of the ACM SIGPLAN Symposium on Compiler Construction, pages 219–233, June 1986.
Sam Lindley. Normalisation by evaluation in the compilation of typed functional programming languages. PhD thesis, University of Edinburgh, 2005.
Kathryn S. McKinley, editor. 20 Years of the ACM SIGPLAN Conference on Programming Language Design and Implementation 1979–1999, A Selection. ACM, 2004.
Eugenio Moggi. Notions of computation and monads. Information and Computation, 93:55–92, 1991.
A. M. Pitts. Typed operational reasoning. In B. C. Pierce, editor, Advanced Topics in Types and Programming Languages, chapter 7, pages 245–289. The MIT Press, 2005.
Amr Sabry and Philip Wadler. A reflection on call-by-value. ACM Transactions on Programming Languages and Systems (TOPLAS), 19(6):916–941, November 1997.
Zhong Shao and Andrew W. Appel. A type-based compiler for Standard ML. In Proceedings of the 1995 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 116–129, La Jolla, CA, June 1995.
Olin Shivers. Higher-order control-flow analysis in retrospect: Lessons learned, lessons abandoned (with retrospective). In McKinley (2004), pages 257–269.
Olin Shivers and Mitchell Wand. Bottom-up β-reduction: Uplinks and λ-DAGs. In European Symposium on Programming (ESOP), pages 217–232, 2005.
Guy L. Steele. RABBIT: A compiler for SCHEME. Technical Report AI-TR-474, MIT, May 1978.
Hayo Thielecke. Comparing control constructs by double-barrelled CPS. Higher-Order and Symbolic Computation, 15(2/3):141–160, 2002.
Andrew P. Tolmach and Dino Oliva. From ML to Ada: Strongly-typed language interoperability via source translation. Journal of Functional Programming, 8(4):367–412, 1998.
Philip Wadler and Peter Thiemann. The marriage of effects and monads. In ACM SIGPLAN International Conference on Functional Programming (ICFP), 1998.