Flow-Directed Lightweight Closure Conversion JEFFREY MARK SISKIND NEC Research Institute, Inc. This paper presents a lightweight closure-conversion method that is driven by the results of wholeprogram interprocedural flow, reachability, points-to, and escape analyses. The method has been implemented and evaluated as part of a complete Scheme compiler. When compared with a baseline closure-conversion method that does no optimization, as well as conventional closureconversion methods that only do optimizations that do not rely on interprocedural analysis, this method significantly increases the degree of closure, variable-slot, parent-slot, closure-pointer– slot, variable, parent-parameter, parent-passing, closure-pointer, variable-spilling, and parentspilling elimination. It also significantly increases the degree of parent-slot, closure-pointer–slot, and parent-parameter compression and reduces the number of indirections per variable reference. Finally, code produced by this compiler runs significantly faster than code produced by other state-of-the-art Scheme compilers. Categories and Subject Descriptors: D.3.3 [Programming Languages]: Language Constructs and Features—procedures, functions, and subroutines; D.3.4 [Programming Languages]: Processors— compilers; optimization General Terms: Experimentation, Languages, Performance Additional Key Words and Phrases: Closure conversion, compiler construction, global optimization/flow analysis, higher-order functional languages 1. INTRODUCTION This paper describes the lightweight closure-conversion process used by Stalin. Stalin is an aggressively optimizing whole-program compiler for Scheme. Experiments described in section 4 illustrate that code produced by Stalin is regularly several times faster, and often an order of magnitude or two faster, than code produced by other state-of-the-art Scheme compilers such as Scheme->C, Gambit-C, Bigloo, and Chez. As also illustrated in section 4, much of this speedup depends on whole-program analysis and lightweight closure conversion. While the techniques described in this paper are formulated in terms of Scheme, and have been implemented and evaluated for Scheme, they should be generally This research was supported, in part, by a Samuel and Miriam Wein Academic Lectureship and by the Natural Sciences and Engineering Research Council of Canada. Part of this research was performed while the author was at the University of Toronto Department of Computer Science, the Technion Department of Electrical Engineering, and the University of Vermont Department of Electrical Engineering and Computer Science. I wish to thank Ken Anderson, Robert Cartwright, Henry Cejtin, Matthias Felleisen, Cormac Flanagan, Suresh Jagannathan, Jacob Katzenelson, Shriram Krishnamurthi, David McAllester, Richard O’Keefe, Stephen Weeks, and Andrew Wright for many useful discussions pertaining to the material presented in this paper. Permission to make digital/hard copy of all or part of this material without fee is granted provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery, Inc. (ACM). To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee. 2 · Jeffrey Mark Siskind applicable to any lexically-scoped higher-order language, such as Common Lisp, ml, Haskell, Smalltalk, Dylan, etc. Stalin is a batch-mode compiler. Unlike traditional interactive Scheme implementations, Stalin does not provide a read-eval-print loop or the ability to eval or load new code into a running program. Stalin compiles the Scheme source program into a single executable, indirectly via c. The behaviour of this executable is equivalent to loading the source program into a virgin interactive Scheme implementation and terminating its execution. Any computation results from evaluating top-level expressions. Thus Stalin is intended more for application delivery and production research runs than program development. Stalin performs numerous whole-program analyses and optimizations. First, it does polyvariant interprocedural flow and reachability analysis. The results of this analysis support interprocedural points-to, escape, and lifetime analyses, lightweight closure and CPS conversion, automatic in-lining, region-based storage management, unboxing, and program-specific and program-point-specific low-level representation selection and code generation with on-the-fly coercion between different representations. This paper describes only the interprocedural points-to and escape analyses, automatic in-lining, and lightweight closure conversion. Companion papers [Siskind 2000a; 2000b; 2000c; 2000d] describe the remaining analyses and optimizations. The remainder of this paper is organized as follows. Section 2 gives an overview of the lightweight closure-conversion process used by Stalin by way of comparison to alternate implementations that do little or no optimization. Section 3 presents the lightweight closure-conversion process in detail. Section 4 presents experimental results. Section 5 concludes with a discussion of related work. 2. OVERVIEW To understand the optimizations that Stalin performs during closure conversion, it is useful to compare the lightweight closure-conversion process performed by Stalin with both a baseline implementation that does no optimization as well as a conventional implementation1 that does limited optimization. For simplicity, the baseline, conventional, and lightweight implementations are presented here using a linked closure representation.2 The baseline implementation has the following characteristics: (1) Each procedure has a closure. (2) The closure for each procedure contains a variable slot for each variable bound by that procedure. (3) The closure for each procedure contains a parent slot, a pointer to the closure for the immediate lexically surrounding procedure. The parent slot of the closure for the top-level procedure contains a null pointer. 1 Throughout this paper, I use the term conventional implementation to refer to optimizations that can be performed without interprocedural analysis, regardless of whether any actual implementation performs any or all of these optimizations. 2 The lightweight closure-conversion process used by Stalin is not particular to a linked closure representation. It can be applied to display and flat closure representations as well. Flow-Directed Lightweight Closure Conversion · 3 (4) A procedure object contains a closure-pointer slot, a pointer to the closure for the immediate lexically surrounding procedure. The closure-pointer slot for the top-level procedure contains a null pointer. (5) A procedure object contains a code-pointer slot, a pointer to the code object for that procedure. (6) The code object for a procedure has a variable parameter for each variable bound by that procedure. (7) The code object for a procedure has a parent parameter, a pointer to the closure for the immediate lexically surrounding procedure. The parent parameter for the top-level procedure will be bound to the null pointer. (8) Procedure calls indirect to the code object pointed to by the target procedure object. (9) Variable passing: A procedure call passes each argument to its corresponding variable parameter in the code object. (10) Parent passing: A procedure call passes the closure pointer of the target procedure object to the parent parameter of the code object. (11) Each code object contains a closure pointer, a local variable that holds a pointer to its closure. Each code object begins with a preamble that allocates a closure and stores a pointer to that closure in the closure pointer. (12) Variable spilling: The preamble spills each variable parameter into the corresponding variable slot of the closure. (13) Parent spilling: The preamble spills the parent parameter into the parent slot of the closure. (14) Variables are referenced3 indirectly via the closure pointer. The baseline, conventional, and lightweight implementations differ in the following ways: 2.1 Direct procedure calls Baseline and conventional implementations generate code like the following for a lambda expression (lambda (x1) . . .): t1.tag = ONE_ARGUMENT_PROCEDURE; t1.closure = closure; t1.code = &p47; OBJECT p47(CLOSURE parent, OBJECT x1) { CLOSURE closure; . . . /* slots: CLOSURE parent; OBJECT x1; */ closure = allocate_closure47(); closure.parent = parent; closure.x1 = x1; . . .} 3 Throughout this paper, I use the term reference to mean access or assignment. 4 · Jeffrey Mark Siskind and code like the following for a procedure call (e1 e2 ): t3 t4 if t5 = e1 ; = e2 ; (t3.tag!=ONE_ARGUMENT_PROCEDURE) error(); = *(t3.code)(t3.closure, t4); Here, a procedure object p has three slots: its type tag p.tag, its closure-pointer slot p.closure, and its code-pointer slot p.code.4 The type tag of a procedure object encodes its arity and a call to a procedure indirects through its code-pointer slot. In contrast, Stalin always generates direct procedure calls. This is possible because Stalin does whole-program interprocedural flow analysis and can determine all potential targets of each call site. Stalin uses the type tag of a procedure object to encode the identity of the procedure, rather than its arity, and dispatches on the type tag to a collection of direct procedure calls, one for each potential target. Thus, for example, Stalin generates code like the following5 for a lambda expression (lambda (x1) . . .): t1.tag = PROCEDURE47; t1.closure = closure; OBJECT p47(CLOSURE parent, OBJECT x1) { CLOSURE closure; . . . /* slots: CLOSURE parent; OBJECT x1; */ closure = allocate_closure47(); closure.parent = parent; closure.x1 = x1; . . .} and code like the following for a procedure call (e1 e2 ), when e1 can take on several potential objects, including the procedures 47 and 63: 4 Throughout this paper, I am deliberately vague as to whether objects are boxed or unboxed and uniformly use o.s to denote a reference to slot s of object o rather than distinguishing between o.s and o->s. Furthermore, when objects are boxed, I am also deliberately vague as to whether slots, like the type tag, are represented in the object or its handle, and whether they are somehow compressed or encoded, say, in unused addresses or address bits in the handle. See Siskind [2000c] for a discussion of such representation issues. 5 The actual c code generated by Stalin is fully typed. Each procedure has a corresponding closure of a distinct c type and parent slots, closure-pointer slots, parent parameters, and closure pointers are all typed appropriately to the type of closure that they contain. c variables and slots that can hold objects of multiple types, such as t3 in this example, are represented as c unions. The code generated by Stalin contains the appropriate casts at each reference. For simplicity, throughout this paper, all such type declarations and casts are eliminated. Flow-Directed Lightweight Closure Conversion · 5 t3 = e1 ; t4 = e2 ; switch (t3.tag) { case PROCEDURE47: t5 = p47(t3.closure, t4); case PROCEDURE63: t5 = p63(t3.closure, t4); . . . default: error();} The default branch of the dispatch is eliminated when flow analysis determines that the target can only be a procedure of the correct arity. Furthermore, the dispatch itself is eliminated when flow analysis determines that there is only one potential target.6 This yields code like the following: t3 = e1 ; t4 = e2 ; t5 = p47(t3.closure, t4); Using direct procedure calls, and encoding the identity of the target procedure in the type tag, allows Stalin to eliminate the code-pointer slot of procedure objects. 2.2 Parent-slot, closure-pointer–slot, parent-parameter, parent-passing, and parentspilling elimination The baseline implementation has a parent slot, a closure-pointer slot, a parent parameter, parent passing, and parent spilling for every procedure. Conventional implementations eliminate the parent slot, closure-pointer slot, parent parameter, parent passing, and parent spilling for the top-level procedure because the pointers are always null. Conventional implementations also eliminate the parent slot of a closure if that slot is never accessed. Stalin carries this optimization further, eliminating the closure-pointer–slot, parent-parameter, parent-passing, and parentspilling for a procedure when the parent parameter is never accessed. Situations where the parent parameter is never accessed occur even more frequently in Stalin than in conventional implementations because parent parameters are used primarily to access variable slots of closures of lexically surrounding procedures and Stalin performs more aggressive variable and variable-slot elimination, as described below. Furthermore, Stalin is able to perform more aggressive parent-slot elimination than conventional implementations for the same reason. Experiments reported in section 4 show that, in practice, almost all parent slots, closure-pointer slots, parent parameters, parent passing, and parent spilling are eliminated. Closure-pointer– slot, parent-parameter, parent-passing, and parent-spilling elimination creates a complication. Since some procedures have closure-pointer slots, parent parameters, and parent passing, while others do not, the code generated for a procedure call will vary depending on the potential targets. This is possible both because flow analysis determines the potential targets and because the use of dispatch to direct procedure calls allows a target-specific call sequence to be generated when there is more than one target. For example, Stalin generates code like the following for a lambda expression (lambda (x1) . . .) that has a parent parameter: 6 When flow analysis determines that the target cannot be a procedure of the correct arity, the dispatch is eliminated and a direct call to the error handler is generated. 6 · Jeffrey Mark Siskind t1.tag = PROCEDURE47; t1.closure = closure; 1 OBJECT p47( CLOSURE parent, { CLOSURE closure; . . . 2 OBJECT x1) 3 /* slots: CLOSURE parent; OBJECT x1; */ closure = allocate_closure47(); closure.parent = parent; closure.x1 = x1; . . .} 4 code like the following for a lambda expression (lambda (x2) . . .) that does not have a parent parameter: t2.tag = PROCEDURE63; OBJECT p63(OBJECT x2) { CLOSURE closure; . . . /* slots: OBJECT x2; */ closure = allocate_closure63(); closure.x2 = x2; . . .} and code like the following for a procedure call (e1 e2 ), where the potential targets include procedure 47, which has a parent parameter, and procedure 63, which does not: t3 = e1 ; t4 = e2 ; switch (t3.tag) { case PROCEDURE47: t5 = p47( t3.closure, case PROCEDURE63: t5 = p63(t4); . . . 5 t4); default: error();} Note that the parent slot (box 3 above), closure-pointer slot (box 1 above), parent parameter (box 2 above), and parent spilling (box 4 above) that are present in procedure 47 are absent from procedure 63. Also note that the calling sequence for procedure 47 contains parent passing (box 5 above) which is absent from the calling sequence for procedure 63. Such per-target differentiation is possible because multiple-target calls generate dispatches to direct procedure calls. 2.3 Eliminating indirection through the closure pointer The baseline implementation always references variables indirectly through the closure pointer. Conventional implementations eliminate such indirection in two situ- Flow-Directed Lightweight Closure Conversion · 7 ations. First, because closure.parent must alias parent, a free variable reference can proceed via the parent parameter instead of via the closure pointer. The baseline implementation would generate the following code: closure. parent. . . . .parent .x | {z } n for a free reference to a variable x that is bound n levels up while a conventional implementation would generate the following code: parent. . . . .parent .x | {z } n instead. Second, if a variable is never assigned, a bound access can proceed via its variable parameter rather than via the closure pointer. The baseline implementation would generate closure.x to access the local variable x while a conventional implementation would simply generate x. Stalin applies this optimization in a wider set of circumstances because it uses a much more complex notion of what constitutes a free vs. bound reference, as described below. 2.4 Eliminating fictitious variables The baseline implementation has a slot, a parameter, passing, and spilling for each variable. The slot, parameter, passing, and spilling for a variable can be eliminated when a variable can hold only a single concrete object. Such a variable is called fictitious. Stalin approximates the fictitious property, with the results of flow analysis, as any variable that can hold only a single abstract object where that abstract object contains a single concrete object. The abstract interpretation used by Stalin during flow analysis treats (), #t, #f, the end-of-file object, and procedures created by a given lambda expression as distinct abstract objects. All of these except abstract procedures always contain a single concrete object. Abstract procedures contain single concrete procedures when the procedure has no closure-pointer slot, due to closure-pointer–slot elimination, as described below. Furthermore, the concrete aggregate objects in abstract aggregate objects such as pairs, strings, vectors, symbols, continuations, and procedures are indistinguishable when their identity is not important and the components of those aggregate objects are fictitious or unaccessed. Such a collection of indistinguishable concrete objects can be treated as a single concrete object. Finally, continuations can be indistinguishable in certain circumstances that require a must-alias property to hold. The method used by Stalin to approximate this must-alias property is the crux of the lightweight closure-conversion process and is described in detail in section 3.12. Experiments reported in section 4 show that closure-pointer–slot elimination, in practice, allows most procedures to be fictitious and the variables holding such procedures to be eliminated. Not only does the notion of fictitious variables affect code generation, by eliminating the slots, parameters, passing, and spilling for such variables, as well as references to such variables, it also impacts the lightweight closure-conversion process itself. Free references to fictitious variables are ignored, allowing a greater degree of parent-slot, closure-pointer–slot, parent-parameter, parent-passing, and parent-spilling elimination. 8 · 2.5 Eliminating variables that aren’t accessed Jeffrey Mark Siskind The baseline implementation has a slot, a parameter, passing, and spilling for each variable. Conventional implementations eliminate the slot, parameter, passing, and spilling for a non-global variable that is never accessed. Stalin carries this optimization further. First, Stalin eliminates all unaccessed variables, not just non-global ones. Second, only reached, non-fictitious accesses, as determined by flow and reachability analysis, count as accesses. An access might be fictitious, even if the variable itself is not fictitious, if the access is in the consequent or alternate of a conditional. For example, an access to x in e1 in (if (eq? x ’a) e1 e2 ) would be fictitious even if x itself were not. Third, as for the case of eliminating fictitious variables, this optimization not only affects code generation, by eliminating the slots, parameters, passing, and spilling for such variables, as well as assignments to such variables, it also impacts the lightweight closure-conversion process itself. Free assignments to unaccessed variables are ignored, allowing a greater degree of parent-slot, closure-pointer–slot, parent-parameter, parent-passing, and parent-spilling elimination. 2.6 Variable-slot elimination The baseline implementation has a slot and spill for each variable. Furthermore, all variable references are to variable slots via the closure pointer. Conventional implementations eliminate the slot and spill for a variable that is never freely referenced, compiling references to such a variable to the parameter instead of the slot. Stalin carries this optimization further. First, only reached free references, as determined by flow and reachability analysis, count as free references. Second, only non-fictitious free accesses, as determined by flow analysis, count as free accesses. Third, free assignments to hidden variables don’t count as free assignments. Hiding will be described below. Fourth, slots are eliminated for globalized variables. Globalization will be described below. Fifth, slots and spills are eliminated for hidden variables. Sixth, the slot and spill for a variable can be eliminated, and references to such a variable can be compiled to the parameter instead of the slot, if two conditions are met. First, all references to the variable must be from within the c function that binds that variable. With the current Stalin code generator, this happens when the variable reference is in-lined in the same procedure where the variable is bound. Second, at every reference to the variable, the slot that would be accessed by the baseline implementation must alias the corresponding parameter in the most recent active invocation of the procedure that binds that variable. The method used by Stalin to approximate this must-alias property is the crux of the lightweight closure-conversion process and is described in detail in section 3.12. 2.7 Closure elimination, closure-pointer elimination, and parent-slot compression The baseline implementation has a closure and closure pointer for every procedure. And the parent slot for a procedure always points to the closure for the immediate lexically surrounding procedure (or is null if there is none). Conventional implementations eliminate a closure, and the corresponding closure pointer, when all of its variable slots are eliminated. Conventional implementations also perform parent- Flow-Directed Lightweight Closure Conversion · 9 slot compression: whenever the parent slot of one closure would point to another closure but is only used to access the parent slot of the later closure, the parent slot of the former closure can point to the closure that the parent slot of the later closure points to. Parent-slot compression is necessary when a closure is eliminated, since all parent slots of other closures that would point to an eliminated closure must instead point to the closure that the parent slot of the eliminated closure would have pointed to or be eliminated if the parent slot of the eliminated closure was eliminated. Parent-slot compression reduces indirection in variable reference. Stalin performs more aggressive closure elimination, closure-pointer elimination, and parent-slot compression than conventional implementations because these optimizations are driven by variable and variable-slot elimination and Stalin performs more aggressive variable and variable-slot elimination. Experiments reported in section 4 show that, in practice, almost all closures and closure pointers are eliminated. When they are not, the number of indirections needed to reference free variables is almost always reduced to very low levels. 2.8 Closure-pointer–slot and parent-parameter compression In the baseline implementation, the closure-pointer slot and parent parameter for a procedure always point to the closure for the immediate lexically surrounding procedure (or are null if there is none). Conventional implementations perform closure-pointer–slot and parent-parameter compression: whenever the parent parameter would point to a closure but is only used to access the parent slot of that closure, the closure-point-slot and parameter parameter can point to the closure that the parent slot of the original closure points to. Closure-pointer–slot and parent-parameter compression reduce indirection in variable reference. Stalin performs more aggressive closure-pointer–slot and parent-parameter compression than conventional implementations because these optimizations are driven by variable and variable-slot elimination and Stalin performs more aggressive variable and variable-slot elimination. Experiments reported in section 4 show that, in practice, the number of indirections needed to reference free variables is almost always reduced to very low levels. 2.9 Globalization The baseline implementation generates slots in closures for all variables. Conventional implementations treat variables that are bound at the top level specially, compiling them as global variables rather than as slots of a closure. Stalin generalizes this optimization. A variable can be globalized when it can have at most one live instance. Stalin approximates this one-live-instance property. A variable has at most one live instance if the procedure that binds that variable is not called more than once. And a procedure p is not called more than once if all procedures that can call p, either directly or indirectly, have a single reached call site. A variable also has at most one live instance if the procedure that binds that variable does not call itself directly or indirectly through a non-tail call and all accesses to that variable must alias the most recently created instance. The method used by Stalin to approximate this must-alias property is the crux of the lightweight closure-conversion process and is described in detail in section 3.12. Globalization impacts code generation. Globalized variables need a form of spilling but do not need parameters 10 · Jeffrey Mark Siskind or passing. Furthermore, globalized variables are referenced without indirection. Globalization also impacts the lightweight closure-conversion process itself. Since globalized variables do not require variable slots and are referenced without access to parent parameters or parent slots, globalization allows more aggressive parentslot, closure-pointer-slot, and parent parameter compression and closure, parentslot, closure-pointer–slot, parent-parameter, parent-passing, closure-pointer, and parent-spilling elimination. 2.10 Hiding The situation often arises where a variable slot always points to the closure containing that slot. Such a situation arises in the following example: (define (f x) (define (g y) . . . x . . .) (define (h z) . . . x . . .) . . .) assuming that there are no assignments to the variables g and h (other than those implicit in the definitions). Here, f requires a closure because x is freely referenced. And g and h both take the closure for f as their parent parameter to allow them to reference x. Thus the procedure objects for g and h will have the closure for f as their closure-pointer slot. Furthermore, the variables slots for g and h in the closure for f will always contain the procedure objects for g and h respectively. Since Stalin eliminates code-pointer slots of procedure objects, and since the type tags for the slots g and h can be eliminated because they contain known procedures, the only information in the variable slots for g and h is their closure-pointer slot. But these will always point to the closure that contains those slots. So such variable slots, as well as the corresponding parameters, passing, and spilling, are redundant and can be eliminated as hidden. A conventional implementation would compile a bound access to a variable x as x and a free access to a variable x that is bound n levels up as: parent. . . . .parent .x | {z } n When x is hidden, Stalin compiles a bound access to x as an access to the closure pointer closure and a free access to x that is bound n levels up as: parent. . . . .parent | {z } n Stalin generalizes the hiding optimization even further. A variable is hidden not only when it must point to the closure that contains its slot, but also when it must point to a specific closure for a procedure that lexically surrounds the procedure whose closure contains that slot. In this situation, if x is a variable that must point to a closure m levels up, Stalin compiles a bound access to x as: parent. . . . .parent | {z } m Flow-Directed Lightweight Closure Conversion · 11 and a free access to x that is bound n levels up as: parent. . . . .parent | {z } m+n Sound application of this optimization requires that the variable slot to be hidden must point to the correct ancestor closure. The method used by Stalin to approximate this must-alias property is the crux of the lightweight closure-conversion process and is described in detail in section 3.12. Hiding impacts code generation. Hidden variables do not need slots, parameters, passing, or spilling. Hiding also impacts the lightweight closure-conversion process itself. Since hidden variables do not require variable slots, are never assigned, and are accessed by accessing a widerscope closure than the closure of the procedure that binds that variable, hiding allows more aggressive parent-slot, closure-pointer–slot, and parent-parameter compression and closure, parent-slot, closure-pointer–slot, parent-parameter, parentpassing, closure-pointer, and parent-spilling elimination. 3. LIGHTWEIGHT CLOSURE CONVERSION Closure conversion is often formulated as a source-to-source transformation, often as a series of small transformations. In contrast, the approach to lightweight closure conversion taken in this paper is to view closure conversion as an analysis phase that annotates the source program with certain properties and relations followed by a code-generation phase that uses those annotations to control aspects of the code being generated. In this case, the code generator itself is rather straightforward. The annotations control the presence or absence of fragments of the procedure call and entry sequences: variable parameters and slots, parent parameters and slots, closures, closure pointers, and closure-pointer slots, variable and parent spilling, and variable and parent passing. The annotations also control the code generated for variable references. In contrast, the analysis phase is rather complex. This section presents that analysis phase. Because the analysis phase is complex, I will start with an overview. The input to the analysis phase is the abstract-syntax tree of the program. The output consists of the following annotations: Local(x) Global(x) Hidden(x) Slotted(x) HasClosure(p) HasParentSlot(p) ParentSlot(p) HasParentParameter(p) ParentParameter(p) x will be allocated as a local variable. x will be allocated as a global variable. x will be allocated as a hidden closure slot. x will be allocated as a closure slot. p will have a closure and closure pointer. p will have a parent slot. The parent slot for p will point to the closure for ParentSlot(p). p will have a parent parameter and closurepointer slot. The parent parameter and closure-pointer slot for p will point to the closure for ParentParameter(p). The first four annotations are properties that apply to variables. The remaining five 12 · Jeffrey Mark Siskind annotations apply to abstract procedures. At most one of the first four properties will be true of any given variable. That property indicates how that variable is represented. If none of these four properties are true of some variable, then that variable is eliminated and not explicitly represented at run time. This can happen either because the variable is not accessed or because the variable holds a compile-time determinable object. As for the five abstract-procedure annotations, three are properties and two are functions. The properties HasClosure(p), HasParentSlot(p), and HasParentParameter(p) indicate whether closure, parent-slot, or closurepointer–slot and parent-parameter elimination apply to a given abstract procedure p. ParentSlot(p) and ParentParameter(p) are functional annotations that indicate parent-slot and closure-pointer-slot or parent-parameter compression. If a given abstract procedure p has a parent slot, then ParentSlot(p) indicates which abstract procedure creates the closure that the parent slot of p points to. Likewise, if a given abstract procedure p has a closure-pointer–slot or parent parameter, then ParentParameter(p) indicates which abstract procedure creates the closure that the closure-pointer–slot or parent parameter of p points to. Before producing the ultimate annotations that drive the closure-conversion aspects of code generation from the undecorated abstract-syntax tree, Stalin produces a number of intermediate annotations. First, Stalin performs flow analysis (described in section 3.2). This determines the potential values of expressions, variables, and other locations. As part of flow analysis, Stalin performs reachability analysis (described in section 3.3). This determines which expressions are reached and which variables are accessed and assigned. Among other things, this allows Stalin to ignore unreached accesses when deciding whether a variable is accessed, to ignore unreached references when deciding whether a variable is freely referenced, and to eliminate unaccessed variables. Next, Stalin determines which components of aggregate objects are accessed and assigned (described in section 3.4). This allows Stalin to eliminate unaccessed components of aggregate objects, and in turn, to eliminate an aggregate object itself when all of its components are eliminated and the identity of the object is unimportant. Next, Stalin computes the call graph (described in section 3.5). Because all call sites are higher-order in Scheme, this requires the output of flow analysis to do effectively. Next, Stalin uses the call graph to determine which abstract procedures are not called more than once (described in section 3.6). This information supports globalization of variables, since variables can be globalized when they have only one live instance and this is trivially true when the abstract procedure that binds a variable is not called more than once. Next, Stalin computes the free variables for each abstract procedure (described in section 3.7). This computation uses the results of flow and reachability analysis to ignore unreached references. The free-variable computation is used, in part, to support variable-slot elimination. Next, Stalin does points-to analysis, driven by the output of flow analysis, to determine which objects can point to other objects (described in section 3.8). Points-to analysis supports escape analysis (described in section 3.9) which determines which objects can be accessed after their creating procedure returns. Escape analysis is used to support must-alias analysis which in turn supports variable-slot elimination, globalization, hiding, and determining when continuations are fictitious. Next, Stalin determines which abstract procedures to in-line (described in section 3.10). In-lining decisions are driven by Flow-Directed Lightweight Closure Conversion · 13 the results of flow analysis and are used to support variable-slot elimination. Finally, Stalin determines which procedures are reentrant (described in section 3.11). This is important since variables that are bound by reentrant procedures cannot be globalized because they can have more that one live instance. Stalin makes a single pass through the above analyses. Since each analysis depends on prior analyses, the order in which the analyses can be performed is fairly tightly constrained. But there are no circularities in these analyses. Thus a single pass suffices.7 The above analyses are precursors to the analyses that are directly associated with lightweight closure conversion. In fact, the above analyses support other analyses and optimizations that Stalin performs in addition to lightweight closure conversion. After performing the above analyses, Stalin performs the analyses to directly support lightweight closure conversion. First, Stalin performs must-alias analysis (described in section 3.12). This is used to justify variable-slot elimination, globalization, and hiding and to determine when continuations are fictitious. Next, Stalin determines which abstract locations are fictitious, i.e. always contain the same compile-time determinable concrete object (described in section 3.13). Next, Stalin determines which variable references are trivial and can be ignored (described in section 3.14). Next, Stalin determines which variable slots can be eliminated (described in section 3.15). Next, Stalin determines which variables can be globalized (described in section 3.16). Next, Stalin determines the ancestor relation (described in section 3.17). Next, Stalin determines which variables can be hidden (described in section 3.18). Finally, Stalin completes the lightweight closure-conversion process by determining the representation for each variable and the static backchain for each abstract procedure (described in section 3.19). Unlike the analyses in sections 3.2 through 3.11, the analyses in sections 3.12 through 3.19 are circular, requiring the computation of a least fixpoint. Because the analyses described in this section are fairly complex and numerous, I have adopted several conventions in terminology and notation. I often define properties or relations that apply to expressions e or expression invocations ê. In all such cases, the property or relation is intended to apply to e or ê itself, not any subexpressions of e or ê and not any expressions in procedures that might be called by e or ê. I often define a base relation along with its transitive and reflexive-transitive variants. When doing so, the symbol for the transitive variant will contain a ‘+’ symbol, while the symbol for the reflexive-transitive variant will contain a ‘∗’ symbol. Often, the + or ∗ symbol will be overlayed on the symbol for the base relation, as in + ◦ or ◦∗ respectively. This is to distinguish between an inherently transitive or reflexive-transitive relation, and ◦+ or ◦∗ , which I use to denote the transitive or reflexive-transitive closures of the ◦ relation respectively. The reason for such distinction is that ◦+ or ◦∗ might be less precise than + ◦ or ◦∗ respectively. I often give English names to relations. When doing so, the English name for 7 Actually, in one little way, flow analysis depends on the fictitious property which is computed after flow analysis. (eq? e1 e2 ) can be determined not to yield #f when β(e1 ) and β(e2 ) each contain the same single fictitious abstract object. Stalin eliminates the circularity in this case by using a weaker noncircular approximation to the fictitious property during flow analysis. 14 · Jeffrey Mark Siskind the base relation will contain the term ‘direct(ly)’ while the English name for the transitive variant will contain the term ‘proper(ly).’ The English name for the reflexive-transitive variant will not contain any special term. I often overload the notation for properties or relations to refer to both concrete properties or relations, i.e. ones that apply to concrete entities, and abstract properties or relations, i.e. ones that apply to abstract entities. (As will be described later, abstract entities are taken to be sets of concrete entitites.) Associated with each such pair of properties or relations is an abstraction lemma which states that the abstract property or relation holds between some abstract entities if the concrete property or relation holds between some concrete entities in those abstract entities.8 Note that the converse might not be true. For example, suppose that k1 and k2 are concrete entities in the abstract entity σ1 and that k3 and k4 are concrete entities in the abstract entity σ2 . And suppose that the concrete relations k1 ◦ k3 and k2 ◦ k4 hold. Then, by the abstraction lemma, the abstract relation σ1 ◦ σ2 holds even though the concrete relations k1 ◦ k4 and k2 ◦ k3 might not hold. Thus abstraction can introduce a degree of approximation. I often define a pair of properties or relations: the first being an underlying property or relation with the second being an approximation to the first. The need for approximations follows from the fact that the underlying properties and relations are often undecidable. Underlying properties and relations, as well as syntactic properties and relations that do not need approximations, are written without an overbar. In contrast, a property or relation written with an overbar, as in ◦, denotes an approximation to the corresponding underlying property or relation ◦. The soundness of the analysis, and thus the soundness of the lightweight closureconversion process in general, follows from the fact that the approximations are conservative. Associated with each approximation is a conservative approximation lemma which states that the approximation holds between some abstract entities if the underlying property or relation holds between those abstract entities.9 Note that it is trivially possible to derive sound approximations by taking ◦ to be always true. This corresponds to disabling all optimization. On the other hand, it is not possible to derive a tight approximation because the underlying relations are undecidable. Thus it is always possible to derive tighter approximations. The approximations presented in this paper are a good compromise in that they empirically yield good results, yet can be computed efficiently. Most of the analyses described in this paper are presented in three stages: concrete properties or relations, followed by abstract properties or relations, followed, in turn, by conservative approximations to the abstract properties or relations. Ac8 In some cases, particularly for the fictitious property described in section 3.13, the polarity is reversed. In such cases, the abstraction lemma states that the abstract property or relation does not hold between some abstract entitites if the concrete property or relation does not hold between some concrete entitites in those abstract entitites. 9 In some cases, particularly for the must-alias, fictitious, localizable, globalizable, and hideable properties described in sections 3.12, 3.13, 3.15, 3.16, and 3.18 respectively, the polarity is reversed. In such cases, the conservative approximation lemma states that approximation does not hold between some abstract entities if the underlying property or relation does not hold between those abstract entities. In such cases, it is trivially possible to derive sound approximations by taking ◦ to be always false. Flow-Directed Lightweight Closure Conversion e ⇒ | | | | | | (quote k) x (set! x e) (e0 e1 . . . en ) (q e1 . . . en ) (lambda (x1 . . . xn ) e) (if e0 e1 e2 ) Fig. 1. · 15 constant access assignment call primcall lambda conditional The Stalin abstract syntax. cordingly, most of the analyses have two sources of imprecision: the abstraction lemma mapping the concrete properties or relations to abstract properties or relations and the conservative approximation lemma mapping abstract properties or relations to approximations of those properties or relations. Because the analysis leading to lightweight closure conversion is complex and involves numerous intermediate program annotations, the terminology and notation used in the remainder of this section may appear to be quite cumbersome. To help guide the reader, I have summarized the terminology and notation in a glossary in appendix A. 3.1 Preprocessing Stalin prepends a standard prelude to every source program that defines the builtin procedures from R4RS [Clinger and Rees 1991]. Stalin then macro expands this extended source program and converts it into an abstract-syntax tree that is essentially of the form specified in figure 1. This macro expansion is done using essentially the same rewrite rules as given in the appendix of R4RS. The result of this macro expansion is a single closed lambda expression that contains the entire source program, including the standard prelude. Running the program corresponds to evaluating this lambda expression to yield a procedure and calling this procedure. All subsequent analyses are performed on the abstract-syntax tree and take the form of computing properties of, and relations between, nodes in this tree and abstract objects and locations. In the abstract syntax, I refer to the subexpression of an assignment as its source, the first subexpression of a call or primcall as its callee, any remaining subexpressions of a call or primcall as its arguments, any variables of a lambda expression as its parameters, the number of arguments of a call or primcall or the number of parameters of a lambda expression as its arity, the subexpression of a lambda expression as its body, and the subexpressions of a conditional as its antecedent, consequent, and alternate respectively. Note that the macro-expansion process produces abstract-syntax trees where the body of each lambda expression consists of a single expression and the alternate of a conditional is always present. Throughout this paper, I use the symbol x to denote variables, the symbol q to denote primitives, the symbol e to denote expressions, the symbol p to denote lambda expressions, and the symbol p0 to denote the top-level lambda expression that contains the entire source program including the standard prelude. Furthermore, I use the symbols X, E, A, S, C, Ci , R, Ri , and P to denote the set of all variables, expressions, accesses, assignments, calls, calls of arity i, primcalls, primcalls of arity i, and lambda expressions in the source program respectively. 16 · Jeffrey Mark Siskind pair? set-car! eq? string->symbol exact? > positive? + quotient truncate sin acos exact->inexact integer->char string-length make-vector vector-set! input-port? close-input-port eof-object? Table I. The Stalin primitives. cons car set-cdr! not null? symbol? number? real? inexact? = <= >= negative? max * remainder floor round exp cos tan atan sqrt inexact->exact char? string? make-string string-ref string-set! vector vector-length procedure? apply output-port? open-input-file close-output-port read-char char-ready? write-char cdr boolean? symbol->string integer? < zero? min / ceiling log asin expt char->integer string vector? vector-ref call/cc open-output-file peak-char Throughout this paper, I use equality between expressions or between variables to denote intensional rather than extensional equality, i.e. eq? rather than equal?. This also affects notions derived from equality such as set membership. Thus I use e1 = e2 to mean that e1 and e2 denote the same expression in the source program, not that they have the same form. Similarly, I use x1 = x2 to mean that x1 and x2 denote the same variable in the source program, not that they have the same name. This obviates the need for alpha renaming and the complexity of expression indices as used by Steckler and Wand [1997]. Table I lists essentially the primitives available in Stalin.10 Note that primitives are not first-class objects. A primitive can only appear as the first subexpression of a primcall. Primitives are taken to be disjoint from variables. This allows distinguishing primcalls from calls by whether or not the first subexpression is a primitive. The names of primitives are intentionally similar to the names of some builtin R4RS procedures because they implement the same functionality. User programs, however, never contain primcalls. Instead, they call procedures defined in the standard prelude which contain the corresponding primcalls. This is essentially η expansion. As an optimization, Stalin replaces certain calls in the user program that are known to be to certain procedures that are defined in the standard prelude with the corresponding primcalls, thus performing η reduction. Since the analyses described in this paper are formulated independently of this optimization, it will be safely ignored. For the sake of expository simplicity, the abstract syntax presented in figure 1, as well as all of the analyses presented in this paper, make three crucial simplifications. First, the primitive apply is not supported. Second, variable-arity procedures (i.e. ‘rest’ arguments) are not supported. Third, the primitive call/cc cannot be passed a continuation as the first argument. The actual implementation does not 10 Note that Stalin does not currently implement complex and rational numbers. Flow-Directed Lightweight Closure Conversion · 17 impose these restrictions. Throughout this paper, I use the following functions applied to nodes in the abstract-syntax tree: Source(e) x(e) Callee(e) Arguments(e) Argumenti (e) Parameteri (e) Arity(e) Body(e) p(x) p(e) Denotes the source subexpression of an assignment e. Denotes the variable in an access or assignment e. Denotes the callee subexpression of a call or primcall e. Denotes the set of argument subexpressions of a call or primcall e. Denotes the ith argument subexpression of a call or primcall e. Denotes the ith parameter of a lambda expression e. Denotes the arity of a call, primcall, or lambda expression e. Denotes the body subexpression of a lambda expression e. Denotes the lambda expression in which x is bound. Denotes the narrowest lambda expression that properly contains e. Furthermore, if p2 = p(p1 ), I say that p1 is directly nested in p2 , denoted p1 ≺ p2 . Let ≺+ and ≺∗ be the transitive and reflexive-transitive closures of ≺ respectively. If p1 ≺+ p2 or p1 ≺∗ p2 , I say that p1 is properly nested in or nested in p2 respectively. Finally, I say that an expression e is in tail position, denoted InTailPosition(e), if and only if it is —the body of a lambda expression or —the consequent or alternate of a conditional that is in tail position. Note that this definition of tail position agrees with the definition given in R5RS [Kelsey et al. 1998]. The above annotations all denote simple syntactic properties of the source program and do not involve any approximation. 3.2 Flow analysis After macro expansion, Stalin performs whole-program interprocedural flow analysis. Flow analysis determines the potential values of expressions. The lightweight closure-conversion process uses the results of flow analysis in numerous ways. Knowing the potential targets of call sites allows building a call graph which in turn allows determining which procedures are reentrant or are called more than once. This supports globalization of variables. Flow analysis also supports escape analysis which forms the basis of a crucial must-alias criterion for variable-slot elimination, globalization, hiding, and determining when continuations are fictitious. Flow analysis is actually a precursor, directly or indirectly, to almost all of the subsequent analyses that support lightweight closure conversion. Flow analysis is an abstract interpretation that partitions the set of all concrete objects into a set of abstract objects that is finite for a given program and the set of all concrete locations into a set of abstract locations that is also finite for a given program. Thus each abstract object is a set of concrete objects and each abstract location is a set of concrete locations. Concrete interpretation, i.e. evaluation, associates concrete locations with variable instances, expression invocations, 18 · Jeffrey Mark Siskind the slots of concrete pairs, and the elements of concrete strings and vectors. Abstract interpretation, i.e. flow analysis, associates abstract locations with variables, expressions, the slots of abstract pairs, and the elements of abstract strings and vectors. Throughout this paper, I use the symbol k to denote concrete objects, the symbol l to denote concrete locations, the symbol σ to denote abstract objects, and the symbol β to denote abstract locations. A concrete location can be viewed as the set of all concrete objects that that concrete location can have as its value. Similarly, an abstract location can be viewed as the set of all abstract objects that that abstract location can have as its value. The principal relation produced by flow analysis is σ ∈ β. Flow analysis obeys the following abstraction lemma: Lemma 1. For all concrete objects k, concrete locations l, abstract objects σ, and abstract locations β, if, in some state in some execution of the program, k ∈ σ, l ∈ β, and k ∈ l, then σ ∈ β. Flow analysis also obeys the following conservative approximation lemma: Lemma 2. For all abstract objects σ and abstract locations β, if σ ∈ β, then σ ∈ β. Different flow analyses differ in the way that they group concrete objects and locations into abstract objects and locations as well as how they approximate ∈ as ∈. The lightweight closure-conversion process described in this paper works for any flow analysis that meets the following criteria: (1) All concrete procedures created by a given lambda expression are in the same abstract procedure. (2) All concrete locations associated with different instances of a given variable are in the same abstract location. (3) All concrete locations associated with different invocations of a given expression are in the same abstract location. (4) If an abstract object contains a concrete procedure then it does not contain any concrete non-procedures. (5) If an abstract object contains a concrete pair then it does not contain any concrete non-pairs. (6) If an abstract object contains a concrete string then it does not contain any concrete non-strings. (7) If an abstract object contains a concrete vector then it does not contain any concrete non-vectors. (8) If an abstract object contains a concrete symbol then it does not contain any concrete non-symbols. (9) If an abstract object contains a concrete continuation then it does not contain any concrete non-continuations. (10) All concrete locations associated with the car slots of different concrete pairs in the same abstract pair are in the same abstract location. (11) All concrete locations associated with the cdr slots of different concrete pairs in the same abstract pair are in the same abstract location. Flow-Directed Lightweight Closure Conversion · 19 (12) All concrete locations associated with all of the elements of different concrete strings in the same abstract string are in the same abstract location. (13) All concrete locations associated with all of the elements of different concrete vectors in the same abstract vector are in the same abstract location. (14) All concrete locations associated with the print-name–string slot of different concrete symbols in the same abstract symbol are in the same abstract location. (15) All concrete continuations created from different invocations of a given expression are in the same abstract continuation. Criteria (4) through (9) imply that it is meaningful to talk about abstract procedures, pairs, strings, vectors, symbols, and continuations respectively. Criteria (10) through (14) imply that it is meaningful to talk about the slots and elements of abstract pairs, strings, vectors, and symbols respectively. Criteria (1) through (3) imply that the flow analysis is monovariant (i.e. computed with 0-CFA [Shivers 1988; 1990; 1991a; 1991b; Heintze 1993; 1994]). Stalin actually performs a novel polyvariant analysis [Siskind 2000b] which requires enhancements to the lightweight closure-conversion analyses presented here. For expository simplicity, these enhancements are not presented. They complicate the presentation of the analyses but do not add any conceptual novelty. As a result of criterion (1) above, there is a one-to-one correspondence between lambda expressions and abstract procedures. Thus I often use the symbol p to denote either a lambda expression or an abstract procedure. Furthermore, the following properties and functions are meaningful because of the above constraints on flow analysis: β(x) β(e) Pair?(σ) Car(σ) Cdr(σ) String?(σ) String-Ref(σ) Vector?(σ) Vector-Ref(σ) Symbol?(σ) Symbol->String(σ) Continuation?(σ) e(σ) Denotes the abstract location associated with x. Denotes the abstract location associated with the result of evaluating e. True if σ is an abstract pair. Denotes the abstract location containing the car slots of the concrete pairs in the abstract pair σ. Denotes the abstract location containing the cdr slots of the concrete pairs in the abstract pair σ. True if σ is an abstract string. Denotes the abstract location containing the elements of the concrete strings in the abstract string σ. True if σ is an abstract vector. Denotes the abstract location containing the elements of the concrete vectors in the abstract vector σ. True if σ is an abstract symbol. Denotes the abstract location containing the printname strings of the concrete symbols in the abstract symbol σ. True if σ is an abstract continuation. Denotes the call to call/cc where the concrete continuations in the abstract continuation σ were created. · 20 3.3 Jeffrey Mark Siskind Reachability analysis As part of performing flow analysis, Stalin also performs reachability analysis. Reachability analysis determines which expressions are reached and, in turn, which variables are accessed or assigned. Reachability analysis supports lightweight closure conversion in numerous ways. Unaccessed variables can be eliminated. Unreached free references can be ignored when determining whether variables slots can be eliminated. Like flow analysis, reachability analysis is actually a precursor, directly or indirectly, to almost all of the subsequent analyses that support lightweight closure conversion. Before defining the concrete reachability properties, the following auxiliary definitions are needed: Definition 1. Calling a procedure created by evaluating a lambda expression p creates an expression invocation ê for each expression e where p(e) = p and creates a variable instance x̂ for each variable x where p(x) = p. Each expression invocation has program points just before and just after that expression invocation. Assignment, call, and primcall invocations also have intermediate program points. Assignment invocations have an intermediate program point just after the source subexpression invocation but before the actual assignment takes place. Call invocations have an intermediate program point just after the callee and argument subexpression invocations but before the actual procedure call takes place. Primcall invocations have an intermediate program point just after the argument subexpression invocations but before the actions associated with the primitive take place. Given the above, the concrete reachability properties are defined as follows: Definition 2. An expression invocation ê is reached, denoted Reached(ê), when control flows to the program point just before ê. An expression invocation ê returns, denoted Returns(ê), when control flows to the program point just after ê. An assignment, call, or primcall invocation ê is executed, denoted Executed(ê), when control flows to the intermediate program point in ê. A variable instance x̂ is accessed, denoted Accessed(x̂), when some access invocation ê to x is reached. A variable instance x̂ is assigned, denoted Assigned(x̂), when some assignment invocation ê to x is executed. A call invocation ê is successful if it is executed and the arity of the call site equals the arity of the target procedure or continuation. A primcall invocation ê is successful if it is executed and the arity of the call site is allowed by the primitive. Given the above, the abstract reachability properties are defined as follows: Definition 3. An expression e is reached, returns, or is executed, denoted Reached(e), Returns(e), or Executed(e), if some invocation ê of e is reached, returns, or is exectuted in some execution of the program respectively. A variable x is accessed or assigned, denoted Accessed(x) or Assigned(x), if some instance x̂ of x is accessed or assigned in some execution of the program respectively. Flow analysis computes Reached, Returns, Executed, Accessed, and Assigned as approximations to Reached, Returns, Executed, Accessed, and Flow-Directed Lightweight Closure Conversion · 21 Assigned respectively. In particular, flow analysis directly produces the approximations Reached and Returns. Executed, Accessed, and Assigned are derived from Reached and Returns by the following:11   e ∈ S ∧ Returns(Source(e))∨        4 e ∈ C ∧ Returns(Callee(e))∧ ∨ Executed(e) = 0 0 (∀e ∈ Arguments(e))Returns(e )       e ∈ R ∧ (∀e0 ∈ Arguments(e))Returns(e0 ) 4 Accessed(x) = (∃e ∈ A)x(e) = x ∧ Reached(e) 4 Assigned(x) = (∃e ∈ S)x(e) = x ∧ Executed(e) Reachability analysis obeys the following abstraction lemma: Lemma 3. For all expressions e and invocations ê of e, if Reached(ê), then Reached(e), if Returns(ê), then Returns(e), and if Executed(ê), then Executed(e). For all variables x and instances x̂ of x, if Accessed(x̂), then Accessed(x) and if Assigned(x̂), then Assigned(x). Reachability analysis also obeys the following conservative approximation lemma: Lemma 4. For all expressions e, if Reached(e), then Reached(e), if Returns(e), then Returns(e), and if Executed(e), then Executed(e). For all variables x, if Accessed(x), then Accessed(x) and if Assigned(x), then Assigned(x). 3.4 Approximating the abstract aggregate access and assignment properties and relations After flow and reachability analysis, Stalin approximates the abstract aggregate accesses and assigns relations and the abstract aggregate accessed and assigned properties. Just as a variable or variable instance can be accessed or assigned, components of concrete and abstract aggregate objects can be accessed or assigned. And just as Stalin can eliminate unaccessed variables and perform certain other optimizations on unassigned variables, Stalin can eliminate unaccessed components of abstract aggregate objects. And just as Stalin can eliminate a closure when all of its variable slots are eliminated, Stalin can eliminate an aggregate object when 11 Stalin actually uses a more precise notion of Accessed(x). A variable is accessed if some access to that variable is accessed. And a reached expression is accessed in the following cases: —An access is accessed. —The source of an assignment is accessed if its destination is accessed. —The callee of a call is accessed. —An argument of a call to a procedure is accessed if the corresponding parameter is accessed. —An argument of a primcall is accessed. —The body of a lambda expression is accessed if some call to that lambda expression is accessed. —The antecedant of a conditional is accessed. —The consequent and alternate of an accessed conditional are accessed. —The body of the top-level lambda expression is accessed. The objective of this more complex analysis is to treat x1 as unaccessed when x2 is unaccessed in code like (let ((x2 x1 )) . . .). 22 · Jeffrey Mark Siskind all of its components are eliminated. To do this, Stalin needs a precise notion of what are all of the different kinds of aggregate objects and their components and under what circumstances these components are accessed and assigned. In addition to procedures, whose closures are handled by other mechanisms, Scheme has four kinds of aggregate objects: pairs, strings, vectors, and symbols. Pairs have two components: their car and cdr slots. Vectors and strings have two kinds of components: their length slot and their elements. Symbols have one component: their print-name–string slot. Additionally, all objects have a typetag slot. Each of these components is accessed or assigned by a disjoint set of primitives. The car and cdr primitives access the car and cdr slots respectively. The set-car! and set-cdr! primitives assign the car and cdr slots respectively. The string-length and vector-length primitives access the string and vector length slots respectively. The string-ref and vector-ref primitives access string and vector elements respectively. The string-set! and vector-set! primitives assign string and vector elements respectively. The symbol->string primitive accesses the print-name–string slot. And the not, boolean?, pair?, null?, symbol?, number?, real?, integer?, char?, string?, vector?, procedure?, input-port?, output-port?, and eof-object? primitives access the type-tag slot. Note that it is not possible to assign length, print-name–string, or type-tag slots. Stalin is able to eliminate abstract locations that contain a single compile-time– determinable concrete object. Such locations are called fictitious. This optimization will be discussed in detail in section 3.19. What is relevant here is that if none of the components of an abstract aggregate object are ever accessed then all of the concrete objects in that abstract object are indistinguishable. And a location that contains only indistinguishable objects can be treated as fictitious and eliminated. But in order for this to be sound, one other condition must be met. In Scheme, aggregate objects have identity in addition to components. Even if the components of an object are never accessed, it is still possible to check the identity of an object with the eq? primitive. In order to treat the concrete aggregate objects in an abstract aggregate object as indistinguishable, two conditions must hold: all of its components must be unaccessed and its identity must never be checked with the eq? primitive. Scheme nominally has a fifth kind of aggregate object: continuations. But continuations don’t have components that can be accessed or assigned, except for a type-tag slot. Nonetheless, it is convenient to treat continuations as accessed when they are called. This allows the concrete continuations in an abstract continuation to be treated as indistinguishable when their type-tag slot is never accessed and they are never called. Note that it is not necessary to determine that the identity of an abstract continuation is never checked in order to treat its concrete continuations as indistinguishable since R4RS does not define eq?-ness for continuations. Similarly, it is convenient to treat procedures as accessed when they are called. This allows the concrete procedures in an abstract procedure to be treated as indistinguishable when their type-tag slot is never accessed and they are never called. Again, note that it is not necessary to determine that the identity of an abstract procedure is never checked in order to treat its concrete procedures as indistinguishable since R4RS does not define eq?-ness for procedures. Stalin is able to do one further optimization. It can eliminate the code for prim- Flow-Directed Lightweight Closure Conversion · 23 itive predicates when flow analysis determines that a primcall to such a predicate will always return #t or always return #f. The concrete aggregate accesses12 and assigns relations are defined as follows: Definition 4. A one-argument primcall invocation ê to not, boolean?, pair?, null?, symbol?, number?, real?, integer?, char?, string?, vector?, procedure?, input-port?, output-port?, or eof-object? type-tag accesses a concrete object k passed as the argument, denoted TypeTagAccesses(ê, k), when ê is executed, unless flow analysis has determined that the invocation must return #t or must return #f. A two-argument primcall invocation ê to eq? eq? accesses a concrete object k passed as one of the arguments, denoted Eq?Accesses(ê, k), when ê is executed, unless flow analysis has determined that the invocation must return #t or must return #f. A one-argument primcall invocation ê to car car accesses a concrete pair k passed as the argument, denoted CarAccesses(ê, k), when ê is executed. A one-argument primcall invocation ê to cdr cdr accesses a concrete pair k passed as the argument, denoted CdrAccesses(ê, k), when ê is executed. A oneargument primcall invocation ê to string-length string-length accesses a concrete string k passed as the argument, denoted StringLengthAccesses(ê, k), when ê is executed. A two-argument primcall invocation ê to string-ref string-ref accesses a concrete string k passed as the first argument, denoted StringRefAccesses(ê, k), when ê is executed and an exact in-bounds integer is passed as the second argument. A one-argument primcall invocation ê to vector-length vector-length accesses a concrete vector k passed as the argument, denoted VectorLengthAccesses(ê, k), when ê is executed. A two-argument primcall invocation ê to vector-ref vector-ref accesses a concrete vector k passed as the first argument, denoted VectorRefAccesses(ê, k), when ê is executed and an exact in-bounds integer is passed as the second argument. A one-argument primcall invocation ê to symbol->string symbol-to-string accesses a concrete symbol k passed as the argument, denoted SymbolToStringAccesses(ê, k), when ê is executed. A call invocation ê to a concrete continuation k continuation accesses k, denoted ContinuationAccesses(ê, k), when ê is successful. A call invocation ê to a concrete procedure k procedure accesses k, denoted ProcedureAccesses(ê, k), when ê is successful. A two-argument primcall invocation ê to set-car! car assigns a concrete pair k passed as the first argument, denoted CarAssigns(ê, k), when ê is executed. A two-argument primcall invocation ê to set-cdr! cdr assigns a concrete pair k passed as the first argument, denoted CdrAssigns(ê, k), when ê is executed. A three-argument primcall invocation ê to string-set! string-ref assigns a concrete string k passed as the first argument, denoted StringRefAssigns(ê, k), when ê is executed, an exact in-bounds integer is passed as the second argument, and a character is passed as the third argument. A three-argument primcall invocation ê to vector-set! vector-ref assigns a concrete vector k passed as the first argument, denoted VectorRefAssigns(ê, k), when ê is executed and an exact in-bounds integer is passed as the second argument. Given the above, the concrete aggregate accessed and assigned properties are 12 Note that since Stalin does not currently implement rational and complex numbers, the primitives rational? and complex? are not included in the notion of type-tag access. 24 · Jeffrey Mark Siskind defined as follows: Definition 5. A concrete object k is type-tag, eq?, car, cdr, string-length, stringref, vector-length, vector-ref, symbol-to-string, continuation, or procedure accessed, or car, cdr, string-ref, or vector-ref assigned, denoted TypeTagAccessed(k), Eq?Accessed(k), CarAccessed(k), CdrAccessed(k), StringLengthAccessed(k), StringRefAccessed(k), VectorLengthAccessed(k), VectorRefAccessed(k), SymbolToStringAccessed(k), ContinuationAccessed(k), ProcedureAccessed(k), CarAssigned(k), CdrAssigned(k), StringRefAssigned(k), or VectorRefAssigned(k), if some expression invocation ê type-tag, eq?, car, cdr, string-length, string-ref, vector-length, vector-ref, continuation, or procedure accesses or car, cdr, string-ref, or vector-ref assigns k respectively. Given the above, the abstract aggregate accesses and assigns relations are defined as follows: Definition 6. An expression e type-tag, eq?, car, cdr, string-length, string-ref, vector-length, vector-ref, continuation, or procedure accesses, or car, cdr, stringref, or vector-ref assigns an abstract object σ, denoted TypeTagAccesses(e, σ), Eq?Accesses(e, σ), CarAccesses(e, σ), CdrAccesses(e, σ), StringLengthAccesses(e, σ), StringRefAccesses(e, σ), VectorLengthAccesses(e, σ), VectorRefAccesses(e, σ), SymbolToStringAccesses(e, σ), ContinuationAccesses(e, σ), ProcedureAccesses(e, σ), CarAssigns(e, σ), CdrAssigns(e, σ), StringRefAssigns(e, σ), or VectorRefAssigns(e, k), if some invocation ê of e type-tag, eq?, car, cdr, string-length, string-ref, vector-length, vector-ref, continuation, or procedure accesses or car, cdr, string-ref, or vector-ref assigns a concrete object k ∈ σ in some execution of the program respectively. Given the above, the abstract aggregate accessed and assigned properties are defined as follows: Definition 7. An abstract object σ is type-tag, eq?, car, cdr, string-length, string-ref, vector-length, vector-ref, symbol-to-string, continuation, or procedure accessed, or car, cdr, string-ref, or vector-ref assigned, denoted TypeTagAccessed(σ), Eq?Accessed(σ), CarAccessed(σ), CdrAccessed(σ), StringLengthAccessed(σ), StringRefAccessed(σ), VectorLengthAccessed(σ), VectorRefAccessed(σ), SymbolToStringAccessed(σ), ContinuationAccessed(σ), ProcedureAccesses(σ,), CarAssigned(σ), CdrAssigned(σ), StringRefAssigned(σ), or VectorRefAssigned(σ), if some expression invocation ê type-tag, eq?, car, cdr, string-length, string-ref, vector-length, vector-ref, continuation, or procedure accesses or car, cdr, string-ref, or vector-ref assigns a concrete object k ∈ σ in some execution of the program respectively. Stalin approximates the abstract aggregate accesses and assigns relations as Flow-Directed Lightweight Closure Conversion · 25 follows:  4 TypeTagAccesses(e, σ) = 4 Eq?Accesses(e, σ) = 4 CarAccesses(e, σ) = 4 CdrAccesses(e, σ) = 4 StringLengthAccesses(e, σ) = 4 StringRefAccesses(e, σ) = 4 VectorLengthAccesses(e, σ) = 4 VectorRefAccesses(e, σ) =  e ∈ R1 ∧ Executed(e)∧     not,             boolean?,             pair?,             null?,             number?,             real?,           integer?,  Callee(e) ∈ ∧   char?,             string?,             vector?,             procedure?,             input-port?,            output-port?,            eof-object?    σ ∈ β(Argument1 (e))∧  #t ∈ β(e) ∧ #f ∈ β(e)   e ∈ R2 ∧ Executed(e)∧   Callee(e) = eq?∧     σ ∈ β(Argument1 (e))∨   ∧   σ ∈ β(Argument2 (e)) #t ∈ β(e) ∧ #f ∈ β(e)   e ∈ R1 ∧ Executed(e)∧  Callee(e) = car∧  σ ∈ β(Argument1 (e))   e ∈ R1 ∧ Executed(e)∧  Callee(e) = cdr∧  σ ∈ β(Argument1 (e))   e ∈ R1 ∧ Executed(e)∧  Callee(e) = string-length∧  σ ∈ β(Argument1 (e))   e ∈ R2 ∧ Executed(e)∧  Callee(e) = string-ref∧     σ ∈ β(Argument1 (e))∧  number ∈ β(Argument2 (e))   e ∈ R1 ∧ Executed(e)∧  Callee(e) = vector-length∧  σ ∈ β(Argument1 (e))   e ∈ R2 ∧ Executed(e)∧  Callee(e) = vector-ref∧     σ ∈ β(Argument1 (e))∧  number ∈ β(Argument2 (e)) 26 · Jeffrey Mark Siskind 4 SymbolToStringAccesses(e, σ) = 4 ContinuationAccesses(e, σ) = 4 ProcedureAccesses(e, σ) = 4 CarAssigns(e, σ) = 4 CdrAssigns(e, σ) = 4 StringRefAssigns(e, σ) = 4 VectorRefAssigns(e, σ) =   e ∈ R1 ∧ Executed(e)∧  Callee(e) = symbol->string∧  σ ∈ β(Argument1 (e))   e ∈ C1 ∧ Executed(e)∧ σ ∈ β(Callee(e))   e ∈ C ∧ Executed(e)∧   σ ∈ β(Callee(e))∧ Arity(e) = Arity(σ)   e ∈ R2 ∧ Executed(e)∧  Callee(e) = set-car!∧  σ ∈ β(Argument1 (e))   e ∈ R2 ∧ Executed(e)∧  Callee(e) = set-cdr!∧  σ ∈ β(Argument1 (e))   e ∈ R3 ∧ Executed(e)∧   Callee(e) = string-set!∧    σ ∈ β(Argument1 (e))∧     number ∈ β(Argument2 (e))∧  char ∈ β(Argument3 (e))   e ∈ R3 ∧ Executed(e)∧  Callee(e) = vector-set!∧     σ ∈ β(Argument1 (e))∧  number ∈ β(Argument2 (e)) Stalin approximates the abstract aggregate accessed and assigned properties as follows: 4 TypeTagAccessed(σ) = (∃e ∈ R1 )TypeTagAccesses(e, σ) 4 Eq?Accessed(σ) = (∃e ∈ R2 )Eq?Accesses(e, σ) 4 CarAccessed(σ) = (∃e ∈ R1 )CarAccesses(e, σ) 4 CdrAccessed(σ) = (∃e ∈ R1 )CdrAccesses(e, σ) 4 StringLengthAccessed(σ) = (∃e ∈ R1 )StringLengthAccesses(e, σ) 4 StringRefAccessed(σ) = (∃e ∈ R2 )StringRefAccesses(e, σ) 4 VectorLengthAccessed(σ) = (∃e ∈ R1 )VectorLengthAccesses(e, σ) 4 VectorRefAccessed(σ) = (∃e ∈ R2 )VectorRefAccesses(e, σ) 4 SymbolToStringAccessed(σ) = (∃e ∈ R1 )SymbolToStringAccesses(e, σ) 4 ContinuationAccessed(σ) = (∃e ∈ C1 )ContinuationAccesses(e, σ) 4 ProcedureAccessed(σ) = (∃e ∈ C)ProcedureAccesses(e, σ) 4 CarAssigned(σ) = (∃e ∈ R2 )CarAssigns(e, σ) Flow-Directed Lightweight Closure Conversion · 27 4 CdrAssigned(σ) = (∃e ∈ R2 )CdrAssigns(e, σ) 4 StringRefAssigned(σ) = (∃e ∈ R3 )StringRefAssigns(e, σ) 4 VectorRefAssigned(σ) = (∃e ∈ R3 )VectorRefAssigns(e, σ) The aggregate access and assignment properties and relations obey the following abstraction lemma: Lemma 5. For all expressions e, invocations ê of e, abstract objects σ, and concrete objects k ∈ σ, if TypeTagAccesses(ê, k), then TypeTagAccesses(e, σ), if Eq?Accesses(ê, k), then Eq?Accesses(e, σ), if CarAccesses(ê, k), then CarAccesses(e, σ), if CdrAccesses(ê, k), then CdrAccesses(e, σ), if StringLengthAccesses(ê, k), then StringLengthAccesses(e, σ), if StringRefAccesses(ê, k), then StringRefAccesses(e, σ), if VectorLengthAccesses(ê, k), then VectorLengthAccesses(e, σ), if VectorRefAccesses(ê, k), then VectorRefAccesses(e, σ), if SymbolToStringAccesses(ê, k), then SymbolToStringAccesses(e, σ), if ContinuationAccesses(ê, k), then ContinuationAccesses(e, σ), if ProcedureAccesses(ê, k), then ProcedureAccesses(e, σ), if CarAssigns(ê, k), then CarAssigns(e, σ), if CdrAssigns(ê, k), then CdrAssigns(e, σ), if StringRefAssigns(ê, k), then StringRefAssigns(e, σ), and if VectorRefAssigns(ê, k), then VectorRefAssigns(e, σ). For all σ and k ∈ σ, if TypeTagAccessed(k), then TypeTagAccessed(σ), if Eq?Accessed(k), then Eq?Accessed(σ), if CarAccessed(k), then CarAccessed(σ), if CdrAccessed(k), then CdrAccessed(σ), if StringLengthAccessed(k), then StringLengthAccessed(σ), if StringRefAccessed(k), then StringRefAccessed(σ), if VectorLengthAccessed(k), then VectorLengthAccessed(σ), if VectorRefAccessed(k), then VectorRefAccessed(σ), if SymbolToStringAccessed(k), then SymbolToStringAccessed(σ), if ContinuationAccessed(k), then ContinuationAccessed(σ), if ProcedureAccessed(k), then ProcedureAccessed(σ), if CarAssigned(k), then CarAssigned(σ), if CdrAssigned(k), then CdrAssigned(σ), if StringRefAssigned(k), then StringRefAssigned(σ), and if VectorRefAssigned(k), then VectorRefAssigned(σ). The aggregate access and assignment properties and relations also obey the following conservative approximation lemma: Lemma 6. For all expressions e and abstract objects σ, if TypeTagAccesses(e, σ), then TypeTagAccesses(e, σ), if Eq?Accesses(e, σ), then Eq?Accesses(e, σ), if CarAccesses(e, σ), then CarAccesses(e, σ), · 28 Jeffrey Mark Siskind if CdrAccesses(e, σ), then CdrAccesses(e, σ), if StringLengthAccesses(e, σ), then StringLengthAccesses(e, σ), if StringRefAccesses(e, σ), then StringRefAccesses(e, σ), if VectorLengthAccesses(e, σ), then VectorLengthAccesses(e, σ), if VectorRefAccesses(e, σ), then VectorRefAccesses(e, σ), if SymbolToStringAccesses(e, σ), then SymbolToStringAccesses(e, σ), if ContinuationAccesses(e, σ), then ContinuationAccesses(e, σ), if ProcedureAccesses(e, σ), then ProcedureAccesses(e, σ), if CarAssigns(e, σ), then CarAssigns(e, σ), if CdrAssigns(e, σ), then CdrAssigns(e, σ), if StringRefAssigns(e, σ), then StringRefAssigns(e, σ), and if VectorRefAssigns(e, σ), then VectorRefAssigns(e, σ). For all σ, if TypeTagAccessed(σ), then TypeTagAccessed(σ), if Eq?Accessed(σ), then Eq?Accessed(σ), if CarAccessed(σ), then CarAccessed(σ), if CdrAccessed(σ), then CdrAccessed(σ), if StringLengthAccessed(σ), then StringLengthAccessed(σ), if StringRefAccessed(σ), then StringRefAccessed(σ), if VectorLengthAccessed(σ), then VectorLengthAccessed(σ), if VectorRefAccessed(σ), then VectorRefAccessed(σ), if SymbolToStringAccessed(σ), then SymbolToStringAccessed(σ), if ContinuationAccessed(σ), then ContinuationAccessed(σ), if ProcedureAccessed(σ), then ProcedureAccessed(σ), if CarAssigned(σ), then CarAssigned(σ), if CdrAssigned(σ), then CdrAssigned(σ), if StringRefAssigned(σ), then StringRefAssigned(σ), and if VectorRefAssigned(σ), then VectorRefAssigned(σ). 3.5 Approximating the abstract call-graph properties and relations After approximating the abstract aggregate access and assignment properties and relations, Stalin approximates the abstract call-graph properties and relations. The concrete direct call-graph properties and relations are defined as follows: Definition 8. If k1 and k2 are concrete procedures that were created by evaluating p1 and p2 respectively, the call invocation ê was created by calling k1 , the callee of ê is k2 , and ê is successful, then k2 is called and ê and k1 directly call k2 , denoted Called(k2 ), ê  k2 , and k1  k2 respectively. Furthermore, if e is in tail position, then ê and k1 directly tail call k2 , denoted ê t k2 and k1 t k2 respectively. And if e is not in tail position, then ê and k1 directly non-tail call k2 , denoted ê t k2 and k1 t k2 respectively. And if e is in tail position and p1 is in-lined in p2 , denoted p1 ,→∗ p2 , then ê and k1 directly self-tail call k2 , denoted ê st k2 and k1 st k2 respectively. And if either e is not in tail position or p1 is not in-lined in p2 , then ê and k1 directly non-self-tail call k2 , denoted ê st k2 and k1 st k2 respectively. Given the above, the concrete transitive call-graph relations are defined as follows: Definition 9. If n ≥ 2 and k1 , . . . , kn are concrete procedures that were created Flow-Directed Lightweight Closure Conversion · 29 by evaluating p1 , . . . , pn respectively, the call invocations eb1 , . . . , ed n−1 were created by calling k1 , . . . , kn−1 respectively, the callee of each eb1 , . . . , ed n−1 is k2 , . . . , kn respectively, and eb1 , . . . , ed are successful, then e b and k properly call kn , denoted n−1 1 1 + + eb1  kn and k1  kn respectively. Furthermore, if e1 , . . . , en−1 are all in tail + + position, then eb1 and k1 properly tail call kn , denoted eb1  t kn and k1  t kn respectively. And if one or more of e1 , . . . , en−1 are not in tail position, then eb1 + + and k1 properly non-tail call kn , denoted eb1  t kn and k1  t kn respectively. And if e1 , . . . , en−1 are all in tail position, and each p1 , . . . , pn−1 is in-lined in pn , p1 , . . . , pn−2 respectively, then eb1 and k1 properly self-tail call kn , denoted + + eb1  st kn and k1  st kn respectively. And if either one or more of e1 , . . . , en−1 are not in tail position or one or more of p1 , . . . , pn−1 are not in-lined in pn , p1 , . . . , pn−2 + respectively, then eb1 and k1 properly non-self-tail call kn , denoted eb1  st kn and + k respectively. k1  st n Given the above, the concrete reflexive-transitive call-graph relations are defined as follows: + ∗ Definition 10. If k1 = k2 or k1  k2 then k1 calls k2 , denoted k1  k2 . If + ∗ + k1 = k2 or k1  t k2 then k1 tail calls k2 , denoted k1  t k2 . If k1 = k2 or k1  t k2 + ∗ k . If k = k or k  k then k self-tail then k1 non-tail calls k2 , denoted k1  2 1 2 1 st 2 1 t ∗ + calls k2 , denoted k1  st k2 . If k1 = k2 or k1  st k2 then k1 non-self-tail calls k2 , ∗ k . denoted k1  2 st Given the above, the abstract call-graph properties and relations are defined as follows: Definition 11. An abstract procedure p is called, denoted Called(p), if some concrete procedure k ∈ p is called in some execution of the program. A call e directly calls an abstract procedure p, denoted e  p, if some call invocation ê of e directly calls some concrete procedure k ∈ p in some execution of the program. An abstract procedure p1 directly calls an abstract procedure p2 , denoted p1  p2 , if some concrete procedure k1 ∈ p1 directly calls some concrete procedure k2 ∈ p2 in + ,  + some execution of the program. Likewise for the relations t , t , st , st ,  t, + + ∗ ,  ∗ ∗ ∗ ∗ + ,  ,  ,  ,  ,  , and  .  st t st t st t st Note that the direct and proper tail, non-tail, self-tail, and non-self-tail call relations are all independent because it is possible for an abstract procedure to both directly tail and non-tail call another abstract procedure by different paths. Stalin approximates the abstract call-graph properties and relations as follows: 4 Called(p) = (∃e ∈ C)e  p 4 e  p = Executed(e) ∧ p ∈ β(Callee(e)) ∧ Arity(e) = Arity(p) 4 p1  p2 = (∃e ∈ C)p(e) = p1 ∧ e  p2 4 e t p = e  p ∧ InTailPosition(e) 4 p1 t p2 = (∃e ∈ C)p(e) = p1 ∧ e t p2 4 e t p = e  p ∧ ¬InTailPosition(e) 4 p1 t p2 = (∃e ∈ C)p(e) = p1 ∧ e t p2 30 · Jeffrey Mark Siskind 4 e st p = e  p ∧ InTailPosition(e) ∧ p ,→∗ p(e) 4 p1 st p2 = (∃e ∈ C)p(e) = p1 ∧ e st p2 4 e st p = e  p ∧ (¬InTailPosition(e) ∨ p 6,→∗ p(e)) 4 p1 st p2 = (∃e ∈ C)p(e) = p1 ∧ e st p2 4 + + p p1  2 = p1  p2 4 + + p1  t p 2 = p1  t p 2 4 0 0 0 ∗ + ∗ p1  t p2 = (∃p, p ∈ P )p1  p ∧ p t p ∧ p  p2 4 + + p1  st p2 = p1 st p2 ∨ (∃p ∈ P )p1 t p ∧ p ,→ p1 ∧ p  st p2   + 4 p1 t p ∧ p ,→ p1 ∧ p  st p2 ∨ + p1  p = p  p ∨ (∃p ∈ P ) 1 st 2 st 2 + p (p1 t p ∨ p1 t p ∧ p 6,→∗ p1 ) ∧ p  2 4 0 0 0 ∗ + p = (∃p ∈ P )e  p ∧ p  e p 4 0 0 0 ∗ + e t p = (∃p ∈ P )e t p ∧ p  t p 4 0 0 ∗ 0 0 0 ∗ + e t p = (∃p ∈ P )e t p ∧ p  p ∨ e t p ∧ p  t p 4 0 0 0 0 + + e st p = e st p ∨ (∃p ∈ P )e t p ∧ p ,→ p(e) ∧ p  st p2   0 0 0 + 4 e  p ∧ p ,→ p(e) ∧ p  p∨ t 0 st + e st p = e st p ∨ (∃p ∈ P ) (e  p0 ∨ e  p0 ∧ p0 6,→∗ p(e)) ∧ p0  + p t t 4 ∗ + p p1  p 2 = p1 = p 2 ∨ p 1  2 4 ∗ + p1  t p 2 = p1 = p 2 ∨ p 1  t p2 4 ∗ + p1  t p 2 = p1 = p 2 ∨ p 1  t p2 4 ∗ + p1  st p2 = p1 = p2 ∨ p1  st p2 4 ∗ + p1  st p2 = p1 = p2 ∨ p1  st p2 The call-graph properties and relations obey the following abstraction lemma: Lemma 7. For all expressions e, invocations ê of e, abstract procedures p, p1 , and p2 , and concrete procedures k ∈ p, k1 ∈ p1 , and k2 ∈ p2 : if Called(k), then Called(p), if ê  k, then e  p, if ê t k, then e t p, if ê t k, then e t p, if ê st k, then e st p, if ê st k, then e st p, if k1  k2 , then p1  p2 , if k1 t k2 , then p1 t p2 , if k1 t k2 , then p1 t p2 , if k1 st k2 , then p1 st p2 , if k1 st k2 , + k, then e  + p, if ê  + + + + then p1 st p2 , if ê  t k, then e  t p, if ê  t k, then e  t p, + + + + + + if ê  k, then e  p, if k  k , then p  p2 , if st k, then e  st p, if ê  1 2 1 st st + + + + + + k1  k , then p  p , if k  k , then p  t k2 , then p1  t p2 , if k1  2 1 2 1 st 2 1 st p2 , t t + + ∗ ∗ ∗ ∗ if k1  k , then p  p , if k  k , then p  p , if k  k , then p  2 1 2 1 2 1 2 1 t 2 1 t p2 , st st ∗ ∗ ∗ ∗ ∗ if k1  k , then p  p , if k  k , then p  p , and if k  k , then 2 1 2 1 st 2 1 st 2 1 2 t t st ∗ p1  p . 2 st The call-graph properties and relations also obey the following conservative approximation lemma: Lemma 8. For all expressions e and abstract procedures p, p1 , and p2 : if Flow-Directed Lightweight Closure Conversion · 31 Called(k), then Called(p), if e  p, then e  p, if e t p, then e t p, if e t p, then e t p, if e st p, then e st p, if e st p, then e st p, if p1  p2 , then p1  p2 , if p1 t p2 , then p1 t p2 , if p1 t p2 , then p1 t p2 , if p1 st p2 , then + p, then e  + p, if e  + + p1 st p2 , if p1 st p2 , then p1 st p2 , if e  t p, then e  t p, + + + + + + + p , if e  p, then e  p, if e  p, then e  p, if e  p, then e  p, if p  st st 1 2 t t st st + p , if p  + + + + then p1  2 1 + t p2 , then p1  t p2 , if p1  t p2 , then p1  t p2 , if p1  st p2 , + + + ∗ ∗ ∗ then p1  st p2 , if p1  st p2 , then p1  st p2 , if p1  p2 , then p1  p2 , if p1  t p2 , ∗ ∗ ∗ ∗ ∗ then p1  p , if p  p , then p  p , if p  p , then p  p , 1 1 st 2 1 t 2 1 st 2 and if t 2 t 2 ∗ ∗ p1  p , then p  p . 1 st 2 st 2 3.6 Approximating the abstract called-more-than-once property After approximating the abstract call-graph properties and relations, Stalin approximates the abstract called-more-than-once property. This information is used to determine which variables can be globalized. A variable can be globalized if it can have at most one live instance. A variable will have only one instance, hence at most one live instance, if the abstract procedure that binds that variable is not called more than once. The concrete called-more-than-once property is defined as follows: Definition 12. A concrete procedure k is called more than once, denoted CalledMoreThanOnce(k), if there are two or more successful call invocations that target k. Given the above, the abstract called-more-than-once property is defined as follows: Definition 13. An abstract procedure p is called more than once, denoted CalledMoreThanOnce(p), if some concrete procedure k ∈ p is called more than once in some execution of the program. Stalin approximates the property CalledMoreThanOnce(p) as follows: 4 CalledMoreThanOnce(p) =   p = 6 p ∧ 0     (∃e ∈ C)e  p ∧ CalledMoreThanOnce(p(e)) ∨  (∃e1 , e2 ∈ C)e1 6= e2 ∧ e1  p ∧ e2  p The called-more-than-once property obeys the following abstraction lemma: Lemma 9. For all abstract procedures p and concrete procedures k ∈ p, if CalledMoreThanOnce(k), then CalledMoreThanOnce(p). The called-more-than-once property also obeys the following conservative approximation lemma: Lemma 10. For all abstract procedures p, if CalledMoreThanOnce(p), then CalledMoreThanOnce(p). 3.7 Approximating the abstract free-in relation After approximating the abstract called-more-than-once property, Stalin approximates the abstract free-in relation. This information is used to support to support · 32 Jeffrey Mark Siskind parent-slot, closure-pointer–slot, and parent-parameter compression and elimination. If an abstract procedure p1 binds a variable x that is free in an abstract procedure p2 , then the closure for p1 must be accessible to p2 via the static backchain. Conversely, if an abstract procedure p1 does not bind any variables that are free in an abstract procedure p2 , then the closure for p1 , if it exists, need not be accessible to p2 , allowing parent-slot, closure-pointer–slot, and parent-parameter compression and elimination. Conventional implementations use a purely syntactic definition of the free-in relation: a variable x is free in an abstract procedure p that doesn’t bind x if there are any references to x nested in p. Stalin uses a definition that is more precise in two ways. First, a variable is not free if it is never accessed, even if it is freely assigned. This is sound because Stalin eliminates unaccessed variables and assignments to unaccessed variables. Second, unreached accesses and unexecuted assignments to a variable are ignored. A variable can be free only if it has reached free accesses or executed free assignments. This is sound because Stalin eliminates unreached accesses and unexecuted assignments. The concrete free-in relation is defined as follows: Definition 14. A variable x is free in some procedure p, denoted FreeIn(x, p), if p is properly nested in p(x), x is accessed, and there is some reached access e or executed assignment e that references x such that p(e) is nested in p. Stalin approximates the relation FreeIn(x, p) as follows:   + p  p(x) ∧ Accessed(x)∧   ≺ 4 FreeIn(x, p) =  (∃e ∈ A)Reached(e) ∧ x(e) = x ∧ p(e) ≺∗ p ∨  (∃e ∈ S)Executed(e) ∧ x(e) = x ∧ p(e) ≺∗ p The free-in relation obeys the following conservative approximation lemma: Lemma 11. For all abstract procedures p and variables x, if FreeIn(x, p), then FreeIn(x, p). 3.8 Approximating the abstract points-to relation After approximating the abstract free-in relation, Stalin approximates the abstract points-to relation. The abstract points-to relation is used to approximate the abstract escape relation which forms the basis of a must-alias analysis that is used to support variable-slot elimination, globalization, hiding, and determining when continuations are fictitious. The concrete points-to relation is defined as follows: Definition 15. A concrete object k directly points to a concrete location l, denoted k ; l, if l is a slot or element of k. A concrete location l directly points to a concrete object k, denoted l ; k, if k is the value of l. A concrete object k1 points to a concrete object k2 , denoted k1 ∗; k2 , if k1 ;∗ k2 . A concrete object k points to a concrete location l, denoted k ∗; l, if k ;∗ l. A concrete location l points to a concrete object k, denoted l ∗; k, if l ;∗ k. A concrete location l1 points to a concrete location l2 , denoted l1 ∗; l2 , if l1 ;∗ l2 . Given the above, the abstract points-to relation is defined as follows: Flow-Directed Lightweight Closure Conversion · 33 Definition 16. An abstract object σ directly points to an abstract location β, denoted σ ; β, if some concrete object in k ∈ σ directly points to some concrete location l ∈ β in some state in some execution of the program. An abstract location β directly points to an abstract object σ, denoted β ; σ, if some concrete location l ∈ β directly points to some concrete object k ∈ σ in some state in some execution of the program. An abstract object σ1 points to an abstract object σ2 , denoted σ1 ∗; σ2 , if some concrete object k1 ∈ σ1 points to some concrete object k2 ∈ σ2 in some state in some execution of the program. An abstract object σ points to an abstract location β, denoted σ ∗; β, if some concrete object k ∈ σ points to some concrete location l ∈ β in some state in some execution of the program. An abstract location β points to an abstract object σ, denoted β ∗; σ, if some concrete location l ∈ β points to some concrete object k ∈ σ in some state in some execution of the program. An abstract location β1 points to an abstract location β2 , denoted β1 ∗; β2 , if some concrete location l1 ∈ β1 points to some concrete location l2 ∈ β2 in some state in some execution of the program. Stalin approximates the ; relation as follows: 4 β;σ = σ∈ β    σ  ∈ P ∧ ProcedureAccessed(σ)∧  ∨   (∃x ∈ X)Accessed(x) ∧ β = β(x) ∧ FreeIn(x,    σ)    Car(σ) = β ∧ CarAccessed(σ)∨   Pair?(σ) ∧ ∨   Cdr(σ) = β ∧ CdrAccessed(σ)     4  String?(σ) ∧ String-Ref(σ) = β∧   σ;β =  ∨     StringRefAccessed(σ)    Vector?(σ) ∧ Vector-Ref(σ) = β∧   ∨   VectorRefAccessed(σ)      Symbol?(σ) ∧ Symbol->String(σ) = β∧ SymbolToStringAccessed(σ) Stalin then approximates the ∗; relation, denoted ∗; , as ;∗ . The points-to relation obeys the following abstraction lemma: Lemma 12. For all pairs of abstract objects σ1 and σ2 , concrete objects k1 ∈ σ1 , concrete objects k2 ∈ σ2 , pairs of abstract locations β1 and β2 , concrete locations l1 ∈ β1 , and concrete locations l2 ∈ β2 , if, in some state in some execution of the program, k1 ; l1 , l1 ; k1 , k1 ∗; k2 , k1 ∗; l2 , l1 ∗; k2 , or l1 ∗; l2 , then σ1 ; β1 , β1 ; σ1 , σ1 ∗; σ2 , σ1 ∗; β2 , β1 ∗; σ2 , or β1 ∗; β2 respectively. The points-to relation also obeys the following conservative approximation lemma: Lemma 13. For all pairs of abstract objects σ1 and σ2 and pairs of abstract locations β1 and β2 , if σ1 ; β1 , β1 ; σ1 , σ1 ∗; σ2 , σ1 ∗; β2 , β1 ∗; σ2 , or β1 ∗; β2 , then σ1 ; β1 , β1 ; σ1 , σ1 ∗; σ2 , σ1 ∗; β2 , β1 ∗; σ2 , or β1 ∗; β2 respectively. 3.9 Approximating the abstract escape relation After approximating the abstract points-to relation, Stalin approximates the abstract escape relation. The abstract escape relation forms the basis of a must-alias 34 · Jeffrey Mark Siskind analysis that is used to support variable-slot elimination, globalization, hiding, and determining when continuations are fictitious. Before defining the concrete escape relation, the following auxiliary definitions are needed: Definition 17. A reached lambda expression invocation creates a concrete procedure. A successful primcall invocation to cons creates a concrete pair. A successful primcall invocation to string or make-string creates a concrete string. A successful primcall invocation to vector or make-vector creates a concrete vector. A successful primcall invocation to string->symbol creates a symbol. A successful primcall invocation to call/cc creates a continuation. An expression invocation that type-tag, eq?, car, cdr, string-length, string-ref, vector-length, vector-ref, symbol-to-string, or continuations accesses a concrete object accesses that object. An expression invocation that car, cdr, string-ref, or vector-ref assigns a concrete object assigns that object. A reached access invocation ê accesses a concrete procedure k, if k binds x̂(ê). An executed assignment invocation ê assigns a concrete procedure k, if k binds x̂(ê). Given the above, concrete escape relation is defined as follows: Definition 18. A concrete object k escapes escapes an expression invocation ê, denoted k ⇑ ê, if k is created after ê is reached but before ê returns and is accessed after ê returns. Given the above, the abstract escape relation is defined as follows: Definition 19. An abstract object σ escapes an expression e, denoted σ ⇑ e, if some concrete object k ∈ σ escapes some invocation ê of e in some execution of the program. The escape relation will be needed only when e is the body of a lambda expression. This leads to the following definition: Definition 20. An abstract object σ escapes an abstract procedure p, denoted σ ⇑ p, if σ escapes Body(p). Flow-Directed Lightweight Closure Conversion · 35 Stalin approximates the relation σ ⇑ p as follows:   Returns(Body(p)) ∧ β(Body(p)) ∗; σ∨         0   Executed(e) ∧ Returns(e )∧       0    Accessed(x(e)) ∧ p(e )    + p  ∗ p(e)∧       ∃ e ∈ S,   0 0     ∗ ∗ β(Source(e)) ; σ ∧ β(e ) ; p ∧   0       e ∈ E, ∨   0 +     p ≺ p(x(e))∧   0       p ∈ P   00 00     Reached(e ) ∧ x(e ) = x(e)∧   00   (∃e ∈ A)   00 ∗ 0   p(e ) ≺ p         0     Executed(e) ∧ p  + p  ∗   p(e)∧   ∃ e ∈ C1 ,   0       Continuation?(σ )∧ 0       p ∈ P, ∨       ∗   β(Argument (e)) ; σ∧ 1 0   σ ∈ β(Callee(e))   0 0     p  ∈ β(Argument2 (e(σ )))  4 0 σ⇑p = Executed(e) ∧ Returns(e )∧        p(e0 )  + p  ∗  p(e) ∧ Pair?(σ 0 )∧        ∃ e ∈ C2 ,      Callee(e) = set-car!∨   0         e ∈ E, ∨ ∧       Callee(e) = set-cdr! 0       σ ∈ β(Argument1 (e))     ∗   β(Argument (e)) ; σ∧ 2     0 0   ∗   β(e ) ; σ         0   Executed(e) ∧ Returns(e )∧       0 0 0     + p  ∗ ∗   p(e )  p(e) ∧ β(e ) ; σ ∧      ∃ e ∈ C3 ,    0     Vector?(σ )∧   0       e ∈ E,       Callee(e) = vector-set!∧   0     σ ∈ β(Argument1 (e))       number ∈ β(Argument (e))∧ 2     β(Argument3 (e)) ∗; σ Note that calls to string-set! do not cause their third argument to escape because that argument must always be a character and characters are not created. The escape relation obeys the following abstraction lemma: Lemma 14. For all abstract objects σ, expressions e, concrete objects k ∈ σ, and invocations ê of e, if k ⇑ ê, then σ ⇑ e. The escape relation also obeys the following conservative approximation lemma: Lemma 15. For all abstract objects σ and abstract procedures p, if σ ⇑ p, then σ⇑p. 3.10 Deciding which procedures to in-line After approximating the abstract escape relation, Stalin decides which procedures to in-line. The resulting in-lined-in relation is important since an accessed, nonfictitious, non-global variable slot can be eliminated only if all reached accesses and executed assignments are in-lined in the procedure that binds that variable. Stalin in-lines procedures that have a unique call site. Stalin translates self tail calls as gotos. Such self tail calls do not count as call sites when making inlining decisions. Moreover, the notion of self tail call used by Stalin is sensitive to in-lining decisions. Definition 21. A call e is a unique call site of p, denoted UniqueCallSite(e, p), if e directly non-self-tail calls p and there is no other call e0 that directly non-self- · 36 Jeffrey Mark Siskind tail calls p. If e is the unique call site of p then p is directly in-lined in p(e), denoted p ,→ p(e). Let ,→∗ denote the reflexive-transitive closure of ,→. If p1 ,→∗ p2 , then p1 is in-lined in p2 . The above definition must be approximated, because the direct non-self-tail call relation must be approximated. Furthermore, the above definition is circular. The directly–in-lined-in relation ,→ depends on the notion of unique call site, which depends on the notion of direct non-self-tail call, which depends on the relation ,→∗ , which depends back on the relation ,→. To approximate the in-lining relation and break this circularity, Stalin finds a least fixed point to approximations of the above definitions. In other words, it finds a minimal relation ,→ that satisfies the following constraints: 4 e st p = e  p ∧ (¬InTailPosition(e) ∨ p(e) 6,→∗ p) 4 UniqueCallSite(e, p) = e st p ∧ ¬(∃e0 ∈ E)e0 6= e ∧ e0 st p 4 p1 ,→ p2 = (∃e ∈ C)p2 = p(e) ∧ UniqueCallSite(e, p1 ) It may seem overly restrictive to in-line only procedures that have a unique call site. Stalin overcomes this restriction by doing a polyvariant flow analysis and splitting procedures, assigning different copies to different call sites. As a result of such splitting, individual copies might have a unique call site, and thus might be in-lined, even if the original procedure would have multiple call sites under a monovariant analysis. The details of the polyvariant flow analysis and splitting procedure are beyond the scope of this paper. They are discussed in detail in Siskind [2000b]. 3.11 Approximating the reentrant property After deciding which procedures to in-line, Stalin approximates the reentrant property. The reentrant property is used to justify globalization. A variable can be globalized when it has at most one live instance. Variables bound by reentrant procedures can have more than one live instance, one for each reentrant invocation. Thus non-reentrancy is one requirement for globalization. A procedure is recursive if it can call itself. A procedure is reentrant if it can have more than one live activation record. Since tail-merged calls do not create new activation records, not all recursive procedures are reentrant. In an implementation that merged all tail calls, a procedure would be reentrant only if it properly non-tail called itself. Stalin, however, does not merge all tail calls. It only merges self tail calls. Because of this, the concrete reentrant property is defined as follows: Definition 22. A concrete procedure k is reentrant, denoted Reentrant(k), if k properly non-self-tail calls itself. Given the above, the abstract reentrant property is defined as follows: Definition 23. An abstract procedure p is reentrant, denoted Reentrant(p), if some concrete procedure k ∈ p is reentrant in some execution of the program. Stalin approximates the property Reentrant(p), denoted Reentrant(p), as + p st p. The reentrant property obeys the following abstraction lemma: Flow-Directed Lightweight Closure Conversion · 37 Lemma 16. For all abstract procedures p and concrete procedures k ∈ p, if Reentrant(k), then Reentrant(p). The reentrant property also obeys the following conservative approximation lemma: Lemma 17. For all abstract procedures p, if Reentrant(p), then Reentrant(p). 3.12 Approximating the abstract must-alias properties Sound lightweight closure conversion requires determining a number of abstract must-alias properties. These are defined as follows: Definition 24. A variable x must alias, denoted MustAlias(x), if every reached access invocation ê to x accesses the instance x̂ of x bound by the most recent active invocation of some concrete procedure k ∈ p(x). An abstract continuation σ must alias, denoted MustAlias(σ), if for every successful call invocation ê to some concrete continuation k ∈ σ, k was created by the most recent active invocation of e(σ). An abstract procedure p must alias, denoted MustAlias(p), either if p has no parent parameter or if for every successful call invocation ê to some concrete procedure k ∈ p, the most recent active invocation of ParentParameter(p) when k is called is the same as the most recent active invocation of ParentParameter(p) when k was created. The variable must-alias property is used to justify variable-slot elimination and globalization. The abstract-continuation must-alias property is used to determine when continuations are fictitious. The abstract-procedure must-alias property is used to justify hiding. The variable must-alias property can be violated in only two ways: either a. some reference is to an instance in an invocation that has already returned or b. some reference is to an instance in a recursive invocation that is not the current invocation. Condition (a) occurs only when the following situation arises: (lambda (. . . x . . .) . . . (lambda (. . .) . . . xe or (set! x . . .)e . . .)p . . .)p(x) Here, a nontrivial reference e to x is nested in p, which is, in turn, properly nested in p(x), and p escapes p(x). Note that p(e) could be the same as p but, in order for p to escape p(x), p cannot be the same as p(x). Calling p after p escapes p(x) allows a reference to x after p(x) returns. Condition (b) occurs only when the following situation arises: (lambda (. . . x0 . . .) . . . (lambda (. . . x . . .) . . . x0e0 . . . (lambda (. . .) . . . xe or (set! x . . .)e . . .)p . . .)p(x) . . .)p(x0 ) 38 · Jeffrey Mark Siskind Here, p(x) is nested in p(x0 ), p(x) is recursive, a nontrivial reference e to x is nested in p, which is, in turn, properly nested in and properly called by p(x), and an access e0 to x0 is nested in p(x), where β(e0 ) points to p. Note that p(x) can be the same as p(x0 ), p(e0 ) can be the same as p(x), p(e) can be the same as p, and e0 can be nested anywhere in p(x), including inside p. Passing p, which contains a reference to x, created on one invocation of p(x), to a recursive invocation, and calling p in that recursive invocation, allows an access to x in other than the most recent invocation. Such passing can only happen via an x0 . Note that p cannot be the same as p(x) in order for e to access a different instance of x than the one in the most recent invocation. Stalin approximates the property MustAlias(x) as follows: 4 MustAlias(x) =     x(e) = x ∧ p(e) ≺∗ p ≺+ p(x)∧ ¬(∃e ∈ A ∪ S, p ∈ P ) ∧  NonTrivialReference(e) ∧ p⇑p(x)        NonTrivialReference(e)∧       p(e0 ) ≺∗ p(x) ≺∗ p(x(e0 ))∧      ¬(∃e ∈ A ∪ S, e0 ∈ A, p ∈ P )  p(e) ≺∗ p ≺+ p(x)∧         p(x)    + p(x) ∧ x(e) = x∧ 0 ∗ + p β(e ) ; p ∧ p(x)  Note that the above approximation uses the property NonTrivialReference(e) which will be defined in section 3.14. The abstract-continuation must-alias property can be violated in only two ways: either a. the continuation is accessed after the expression invocation that created that continuation returns or b. the continuation is accessed when there is an active recursive invocation of the expression that created that continuation that is not the invocation that created that continuation. Condition (a) occurs only when the following situation arises: (call/cc . . .)e(σ) . . . e Here, σ escapes e(σ) and some expression e accesses σ. Condition (b) occurs only when the following situation arises: (lambda (. . .) . . . e . . .)p(e) . . . (lambda (. . . x . . .) . . . (lambda (x0 ) . . . xe0 . . .)p . . . (lambda (. . .) . . . (call/cc . . .)e(σ) . . .)p(e(σ)) . . .)p(x) Here, a call e accesses the abstract continuation σ, p is passed σ by virtue of the fact that it is called by the call e(σ) to call/cc, p is properly nested in p(x), Flow-Directed Lightweight Closure Conversion · 39 p(e(σ)) is nested in p(x), p and p(e(σ)) properly call each other, and an access e0 to x is nested in p, where β(e0 ) points to σ. Stalin approximates the property MustAlias(σ) as follows: 4 MustAlias(σ) =      TypeTagAccessed(σ) ∨ ContinuationAccessed(σ) ∧          p ∈ P ∧ σ ∈ β(Parameter1 (p))∧  ∧    ¬   (∃p ∈ β(Argument1 (e(σ))))     σ⇑p        + p(e(σ))  + p ∧ p  ∗  p(e)∧ p     β(e0 ) ∗; σ ∧ p ≺+ p(x(e0 ))∧ ∃ e ∈ C,      ∗ 0 0       p(e(σ)) ≺ p(x(e ))∧ e ∈ A, ¬             TypeTagAccesses(e, σ)∨ p ∈ β(Argument1 (e(σ)))       ContinuationAccesses(e, σ) Note that Eq?Accessed(p) is not needed in the above since R4RS does not specify the conditions for equality between continuations. The abstract-procedure must-alias property can be violated in only two ways: either a. p is accessed after ParentParameter(p) returns or b. p is accessed when there is an active recursive invocation of ParentParameter(p) that is not the invocation that created p. Condition (a) occurs only when the following situation arises: (lambda (. . .) . . . (lambda (. . .) . . .)p . . .)ParentParameter(p) . . . e Here, p escapes ParentParameter(p) and some expression e accesses p. Condition (b) occurs only when the following situation arises: (lambda (. . .) . . . e . . .)p(e) . . . (lambda (. . . x . . .) . . . (lambda (. . .) . . . xe0 . . .)p0 . . .)p(x) Here p(e0 ) is nested in p0 which is in turn nested in p(x(e0 )), ParentParameter(p) is nested in p0 , β(e0 ) points to p, p0 is recursive, p0 calls p(e), and e accesses p. · 40 Jeffrey Mark Siskind Stalin approximates the property MustAlias(p) as follows: 4 MustAlias(p) =     HasParentParameter(p) ∧ p⇑ParentParameter(p)∧     ¬ ∧     TypeTagAccessed(p) ∨ ProcedureAccessed(p)         0 ∗ 0 ∗ 0 0 0   + p ∧ p(e ) ≺ p ≺ p(x(e )) ∧ p    0 0   β(e ) ∗; p ∧ p  ∗ p(e)∧        HasParentParameter(p)∧ ¬(∃e ∈ C, e0 ∈ A, p0 ∈ P )            ParentParameter(p) ≺∗ p0 ∧          TypeTagAccesses(e, p) ∨ e  p Note that the above approximation uses the property HasParentParameter(p) and the function ParentParameter(p) which will be defined in section 3.19. Also note that Eq?Accessed(p) is not needed in the above since R4RS does not specify the conditions for equality between procedures. The must-alias properties obey the following conservative approximation lemma: Lemma 18. For all variables x, if ¬MustAlias(x), then ¬MustAlias(x). For all abstract continuations σ, if ¬MustAlias(σ), then ¬MustAlias(σ). For all abstract procedures p, if ¬MustAlias(p), then ¬MustAlias(p). 3.13 Approximating the abstract fictitious property Stalin is able to eliminate locations, and the code to manipulate such locations, when they are known to always hold the same concrete object. Such data is called fictitious. The concrete fictitious property is defined as follows: Definition 25. A concrete location l is fictitious, denoted Fictitious(l), if it always contains the same concrete object. Given the above, the abstract fictitious property is defined as follows: Definition 26. An abstract location β is fictitious, denoted Fictitious(β), if every l ∈ β is fictitious in every execution of the program. Stalin approximates the fictitious property by declaring an abstract location to be fictitious if it contains a single abstract object that, in turn, contains a single concrete object. The particular abstract interpretation that Stalin uses puts each of the concrete objects (), #t, #f, and eof-object into abstract objects that contain only that concrete object. Furthermore, if an abstract procedure p is not procedure accessed or does not have a parent parameter then all of the concrete procedures k ∈ p are indistinguishable so p can be treated as if it contained a single concrete procedure. Additionally, if all of the components of some abstract pair, string, vector, or external symbol σ are never accessed, or contain fictitious locations, then all of the concrete objects k ∈ σ are indistinguishable so σ can be treated as if it contained a single concrete object. Finally, Stalin compiles a call to an abstract continuation σ as a goto when a. the call to σ is in-lined in the call to call/cc that created σ and b. σ must alias. Flow-Directed Lightweight Closure Conversion · 41 This is described in greater detail in Siskind [2000a]. When this occurs for all calls to σ, or when σ is not continuation accessed, all of the concrete continuations k ∈ σ are indistinguishable so σ can be treated as if it contained a single concrete continuation. Stalin approximates the property Fictitious(β) as follows: 4 Fictitious(σ) =   ¬TypeTagAccessed(σ)∧     σ = () ∨ σ = #t ∨ σ = #f ∨ σ = eof-object∨             ¬ProcedureAccessed(σ)∨      ∨ σ ∈ P ∧      ¬HasParentParameter(σ))            Pair?(σ) ∧ ¬Eq?Accessed(σ)∧               Fictitious(Car(σ)) ∨ ¬CarAccessed(σ) ∧ ∨               Fictitious(Cdr(σ)) ∨ ¬CdrAccessed(σ)         String?(σ) ∧ ¬Eq?Accessed(σ)∧       ∨     ¬StringLengthAccessed(σ) ∧ ¬StringRefAccessed(σ)               Vector?(σ) ∧ ¬Eq?Accessed(σ)∧     ¬VectorLengthAccessed(σ)∧      ∨      Fictitious(Vector-Ref(σ))∨                ¬VectorRefAccessed(σ)             Symbol?(σ) ∧ ¬Eq?Accessed(σ)∧      ∨      ¬SymbolToStringAccessed(σ)             Continuation?(σ)∧               ¬ContinuationAccessed(σ)∨                       MustAlias(σ)∧               (∀e ∈ C1 )             ContinuationAccesses(e, σ) → p(e) ,→∗ p(e(σ)) 4 Fictitious(β) = ||β|| = 1 ∧ (∀σ ∈ β)Fictitious(σ) Note that the above approximation uses the property HasParentParameter(p) which will be defined in section 3.19. The fictitious property obeys the following abstraction lemma: Lemma 19. For all abstract locations β and concrete locations l ∈ β, if ¬Fictitious(l), then ¬Fictitious(β). The fictitious property also obeys the following conservative approximation lemma: Lemma 20. For all abstract locations β, if ¬Fictitious(β), then ¬Fictitious(β). 3.14 Approximating the abstract nontrivial-reference property Stalin does not generate any code for unreached accesses, unexecuted assignments, and assignments to fictitious or hidden variables. Furthermore, an access to a variable which can only yield a single compile-time determinable object need not generate code to access that variable and can instead generate code to return that object as a constant. All other references are called nontrivial. Only nontrivial references count as free references. · 42 Jeffrey Mark Siskind The concrete nontrivial-reference property is defined as follows: Definition 27. An access invocation ê is nontrivial, denoted NonTrivialReference(ê), if it is reached and returns an object that is not known at compile time. An assignment invocation ê is nontrivial, denoted NonTrivialReference(ê), if it is executed, x(e) is not hidden, and there is a subsequent nontrivial access to x̂(ê). Given the above, the abstract nontrivial-reference property is defined as follows: Definition 28. A reference e is nontrivial, denoted NonTrivialReference(e), if some invocation ê of e is nontrivial in some execution of the program. Stalin approximates the property NonTrivialReference(e) as follows: 4 NonTrivialReference(e) =    e ∈ A ∧ Reached(e) ∧ ¬Fictitious(β(e)) ∨    e ∈ S ∧ Executed(e) ∧ Accessed(x(e))∧  ¬Fictitious(β(x(e))) ∧ ¬Hidden(x(e)) Note that the above approximation uses the property Hidden(x) which will be defined in section 3.19. The nontrivial-reference property obeys the following abstraction lemma: Lemma 21. For all references e and invocations ê of e, if NonTrivialReference(ê), then NonTrivialReference(e). The nontrivial-reference property also obeys the following conservative approximation lemma: Lemma 22. For all references e, if NonTrivialReference(e), then NonTrivialReference(e). 3.15 Approximating the abstract localizable property In order to eliminate a variable slot and represent a variable as a c local variable it must be localizable. The abstract localizable property is defined as follows: Definition 29. A variable x is localizable, denoted Localizable(x), if x must alias and all nontrivial references to x are in-lined in the procedure that binds x. Stalin approximates the property Localizable(x) as follows: 4 Localizable(x) =   MustAlias(x)∧ ¬(∃e ∈ A ∪ S)x(e) = x ∧ NonTrivialReference(e) ∧ p(e) 6,→∗ p(x) The localizable property obeys the following conservative approximation lemma: Lemma 23. For all variables x, if ¬Localizable(x), then ¬Localizable(x). Flow-Directed Lightweight Closure Conversion 3.16 · 43 Approximating the abstract globalizable property In order to represent a variable as a c global variable it must be globalizable. The abstract globalizable property is defined as follows: Definition 30. A variable x is globalizable, denoted Globalizable(x), if it can have at most one live instance. If the procedure that binds a variable is not called more than once then that variable can have at most one live instance. Also, if the procedure that binds the variable is not reentrant and the variable must alias then that variable can have at most one live instance. Stalin approximates the property Globalizable(x) as follows:   4 ¬CalledMoreThanOnce(p(x))∨ Globalizable(x) = ¬Reentrant(p(x)) ∧ MustAlias(x) The globalizable property obeys the following conservative approximation lemma: Lemma 24. For all variables x, if ¬Globalizable(x), then ¬Globalizable(x). 3.17 Approximating the abstract ancestor relation Certain expressions, such as free references to slotted and hidden variables, as well as lambda expressions that contain such free references, access the parent parameter and parent slots of procedures to get a pointer to the requisite closure. The abstract ancestor relation is defined as follows: Definition 31. A procedure p1 is an ancestor of a procedure p2 , denoted Ancestor(p1 , p2 ), if some reached expression in p2 accesses a pointer to the closure for p1 . Stalin approximates the relation Ancestor(p1 , p2 ) as follows: 4 Ancestor(p1 , p2 ) =  ∧ p(x) = p1 ∧ FreeIn(x, p2 )∨    Slotted(x)  (∃x ∈ X) Hidden(x) ∧ (p(x) = p2 ∨ FreeIn(x, p2 ))   (∃p ∈ P )β(x) = {p} ∧ Ancestor(p1 , p) Note that the above approximation uses the properties Slotted(x) and Hidden(x) which will be defined in section 3.19. The ancestor relation obeys the following conservative approximation lemma: Lemma 25. For all abstract procedures p1 and p2 , if Ancestor(p1 , p2 ), then Ancestor(p1 , p2 ). 3.18 Approximating the abstract hideable property In order to eliminate a variable slot and have all accesses to that variable access some closure record on the static backchain it must be hideable. The abstract hideable property is defined as follows: Definition 32. A variable x is hideable, denoted Hideable(x), if a. the abstract location of x contains a single abstract procedure p, · 44 Jeffrey Mark Siskind b. for all nontrivial accesses e to x, p(e) is nested in every ancestor of p, and c. p must alias. Stalin approximates the property Hideable(x) as follows: 4 Hideable(x) = β(x) = {p} ∧ MustAlias(p)∧  (∀e ∈ A, p0 ∈ P )   (∃p ∈ P )   x(e) = x ∧ Ancestor(p0 , p)∧ → p(e) ≺∗ p0 NonTrivialReference(e)     The hideable property obeys the following conservative approximation lemma: Lemma 26. For all variables x, if ¬Hideable(x), then ¬Hideable(x). 3.19 Lightweight closure conversion It is now possible to describe the lightweight closure-conversion process. Recall that the output of lightweight closure conversion consists of the following annotations: Local(x) Global(x) Hidden(x) Slotted(x) HasClosure(p) HasParentSlot(p) ParentSlot(p) HasParentParameter(p) ParentParameter(p) x will be allocated as a local variable. x will be allocated as a global variable. x will be allocated as a hidden closure slot. x will be allocated as a closure slot. p will have a closure and closure pointer. p will have a parent slot. The parent slot for p will point to the closure for ParentSlot(p). p will have a parent parameter and closurepointer slot. The parent parameter and closure-pointer slot for p will point to the closure for ParentParameter(p). Unaccessed and fictitious variables are eliminated. The ones that remain are made either local, global, hidden, or slotted. Localizable variables can be made local. Globalizable variables can be made global. Hideable variables can be made hidden. Variables that are neither localizable, globalizable, nor hideable must be made slotted. It is possible for more than one of Localizable(x), Globalizable(x), and Hideable(x) to be true of a given variable x. In this situation, making a variable local is given preference to making a variable global since access to local variables is faster, especially since current c compilers do intraprocedural optimization but not interprocedural optimization. 4 Local(x) = Accessed(x) ∧ ¬Fictitious(β(x)) ∧ Localizable(x)   4 Accessed(x) ∧ ¬Fictitious(β(x))∧ Global(x) = Globalizable(x) ∧ ¬Local(x)   4 Accessed(x) ∧ ¬Fictitious(β(x))∧ Hidden(x) = Hideable(x) ∧ ¬Local(x) ∧ ¬Global(x) Flow-Directed Lightweight Closure Conversion 4 Slotted(x) =  Accessed(x) ∧ ¬Fictitious(β(x))∧ ¬Local(x) ∧ ¬Global(x) ∧ ¬Hidden(x) · 45  An abstract procedure has a closure if it binds some slotted variable. 4 HasClosure(p) = (∃x ∈ X)p(x) = p ∧ Slotted(x) An abstract procedure p has a parent slot if some abstract procedure p0 is an ancestor of p and p is an ancestor of some abstract procedure p00 that is directly nested in p. 4 HasParentSlot(p) = (∃p0 ∈ P, p00 ≺ p)Ancestor(p, p00 ) ∧ Ancestor(p0 , p) The parent slot of an abstract procedure p is the narrowest such procedure p0 . 4 ParentSlot(p) = min {p0 ∈ P |(∃p00 ≺ p)Ancestor(p, p00 ) ∧ Ancestor(p0 , p)} ∗ ≺ An abstract procedure p has a parent parameter and closure-pointer slot if it has an ancestor. 4 HasParentParameter(p) = (∃p0 ∈ P )Ancestor(p0 , p) The parent parameter and closure-pointer slot of an abstract procedure p is the narrowest such ancestor. 4 ParentParameter(p) = min {p0 ∈ P |Ancestor(p0 , p)} ∗ ≺ The optimizations described in section 2 can now be formulated in terms of the properties and relations enumerated above. Parent-slot, closure-pointer–slot, parent-parameter, parent-passing, and parent-spilling elimination: The parent slot and spilling for x are eliminated when ¬HasParentSlot(x). The closure-pointer slot, parent parameter, and parent passing for x are eliminated when ¬HasParentParameter(x). Eliminating indirection through the closure pointer: A bound access to x can proceed via its variable parameter rather than via the closure pointer if ¬Assigned(x). Eliminating fictitious variables: The slot, parameter, passing, and spilling for x are eliminated when Fictitious(β(x)). Eliminating variables that aren’t accessed: The slot, parameter, passing, and spilling for x are eliminated when ¬Accessed(x). Variable-slot elimination: The slot and spilling for x are eliminated when ¬Slotted(x). Closure elimination, closure-pointer elimination, and parent-slot compression: The closure and closure-pointer for p are eliminated when ¬HasClosure(p). When the parent slot for p is not eliminated, it is compressed to ParentSlot(p). Closure-pointer–slot and parent-parameter compression: When the closure-pointer slot and parent parameter for p are not eliminated, they are compressed to ParentParameter(p). Globalization: A variable x is globalized when Global(x). Hiding: A variable x is hidden when Hidden(x). · 46 Jeffrey Mark Siskind The soundness of lightweight closure conversion is summarized by the following theorem: Theorem 1. For all variables x, a. at most one of Local(x), Global(x), Hidden(x), and Slotted(x) are true, b. if there is a nontrivial access to x, then one of Local(x), Global(x), Hidden(x), or Slotted(x) is true, c. if Local(x), then x is localizable, d. if Global(x), then x is globalizable, e. if Hidden(x), then x is hideable, and f. if Slotted(x), then HasClosure(p(x)). For all nontrivial references e, if Slotted(x(e)) and p(x(e)) 6= p(e), then g. HasParentParameter(p(e)) and h. there exists i ≥ 0 such that i. for all 0 ≤ j < i, HasParentSlot(ParentSlotj (ParentParameter(p(e)))) and j. p(x(e)) = ParentSloti (ParentParameter(p(e))). The approximations given in sections 3.12 through 3.19 are circular. To resolve this circularity, Stalin uses prioritized circumscription [McCarthy 1980]. It first finds a solution to the approximations with a set of hidden variables such that there is no solution to the approximations with a superset of that set of hidden variables. Fixing this set of hidden variables, it then finds a solution to the approximations with a set of abstract objects that are fictitious and a set of pairs of procedures p1 and p2 for which p1 is not an ancestor of p2 such that there is no solution to the approximations with a superset of those fictitious variables and those pairs of procedures p1 and p2 for which p1 is not an ancestor of p2 . 4. EXPERIMENTAL RESULTS The lightweight closure-conversion method described in this paper has been implemented as part of the Stalin compiler13 for Scheme. As originally implemented, lightweight closure conversion is a necessary component of Stalin. In order to evaluate the lightweight closure-conversion method while keeping all other aspects of the compilation process invariant, Stalin was modified to allow it to use two other closure-conversion methods. The first, the baseline method, is as described in section 2. In this method, there is no variable, parameter, slot, passing, spilling, closure, or closure-pointer elimination, no slot or parameter compression, no globalization, and no hiding. This method, however, still uses direct procedure calls and eliminates indirection through the closure pointer for free references, though not for bound accesses of unassigned variables. This is an overly conservative method. The second, the conventional method, is an attempt to model common practice in other non–whole-program compilers for Scheme and similar lexically-scoped higher-order 13 The version of the compiler used to run the experiments described in this paper, as well as all of the benchmark code, is available from ftp://ftp.nj.nec.com/pub/qobi/stalin-0.9.tar.Z. Flow-Directed Lightweight Closure Conversion · 47 languages. In this method, the only optimizations performed are those that can be driven by simple syntactic analysis of individual top-level procedures without flow analysis. More precisely, the baseline method was implemented by taking all of the properties and relations as defined in section 3 with the following exceptions: 4 Reached(e) = true 4 Returns(e) = true 4 Fictitious(β) = false 4 Accessed(x) = true 4 Assigned(x) = true 4 Localizable(x) = false 4 Globalizable(x) = false 4 Hideable(x) = false And the conventional method was implemented by taking all of the properties and relations as defined in section 3 with the following exceptions: 4 Reached(e) = true 4 Returns(e) = true 4 Fictitious(β) = false 4 Accessed(x) = x is defined at the top level ∨ (∃e ∈ A)x(e) = x 4 Assigned(x) = x is defined at the top level ∨ (∃e ∈ S)x(e) = x   x(e) = x ∧ p(e) 6= p(x(e))∧ 4 Localizable(x) = ¬(∃e ∈ A ∪ S) NonTrivialReference(e)   4 x is bound by a let expression∧ Globalizable(x) = every surrounding expression is also a let 4 Hideable(x) = false The above definitions for Reached, Accessed, and Assigned were used only during closure conversion. The full definition of these properties, along with the full whole-program interprocedural flow analysis, was used for all other analyses, optimization, and code generation performed by the compiler. The code generator needed some modification to support the baseline and conventional methods. As originally implemented, the code generator never generated slots, elements, parameters, locals, or globals for fictitious locations. Because such locations were always eliminated under the lightweight method. With the baseline and conventional methods, however, fictitious locations are not eliminated. Thus the code generator was modified so that dummy c ints were used wherever a fictitious object was needed. 48 · Jeffrey Mark Siskind Twenty Scheme benchmarks14 were compiled using each of the baseline, conventional, and lightweight methods. All runs were done on one processor of an unloaded Dell PowerEdge 6350 with four 450MHz Pentium II Xeon processors with 512K L2 cache and 1G main memory running Red Hat 6.0 with kernel 2.2.10/SMP and egcs 2.91.66. Tables II, III, and IV give the variable-, procedure-, and referenceelimination statistics for these runs respectively. All entries are static counts except for the last column in table IV which gives run time in CPU seconds. The first, second, and third lines for each benchmark show the results for the baseline, conventional, and lightweight methods respectively. Table II shows the total number of variables, the number of variables for which Accessed(x) or Assigned(x) is false or Fictitious(β(x)) is true, the number of variables that are eliminated, and the number of variables for which Local(x), Global(x), Hidden(x), and Slotted(x) are true. Table III shows the total number of procedures, the number of procedures for which HasClosure(p), HasParentSlot(p), and HasParentParameter(p) are true, and the amount of parent parameter and slot compression, the number of procedures for which HasParentSlot(p) and HasParentParameter(p) is true but for which ParentSlot(p) and ParentParameter(p) do not equal p(p) respectively. Table IV shows the average number of indirections (i.e. c -> operators) per reference, the total number of accesses, the number of nontrivial accesses, the total number of assignments, the number of nontrivial assignments, and the user+system CPU time for the benchmarks as reported by the time command. Note that the variable, procedure, and reference counts include variables, procedures, and references in the standard prelude as well as those that are introduced by macro expanding the source program, except where such variables, procedures, and references are eliminated by a simple syntax-directed dead-code–elimination prepass. These results show some important trends. First, flow-directed reachability analysis allows a substantial increase in the number of variables that are determined to be unassigned or unaccessed. Second, in all of the benchmarks, the lightweight method eliminates many more variables, accesses, and assignments and substantially reduces the number of slotted variables, the number of procedures that require closures, parent parameters, and parent slots, and the average number of indirections per reference, as compared with both the baseline and conventional methods. In half of the benchmarks, the lightweight method eliminates all slotted variables and, with that, all closures, parent parameters, parent slots, and indirection during variable reference. Third, in all of the benchmarks, the code produced by the lightweight method runs substantially faster than the code produced by both the baseline and conventional methods. Table V compares the run times for these twenty benchmarks compiled with Stalin using the lightweight method as compared with Scheme->C, Gambit-C, 14 The version of boyer that was used was obtained from the Scheme repository. Andrew Wright provided the versions of graphs and lattice that were used. Saumya Debray provided the versions of nucleic2, matrix, earley, scheme, and conform that were used. William Clinger provided the versions of nboyer, sboyer, and dynamic that were used. Bruno Haible provide the version of fannkuch that was used. Richard O’Keefe provided the versions of integ, gold, and sort that were used. The simplex, em-functional, em-imperative, and nfm benchmarks were written by the author. · Flow-Directed Lightweight Closure Conversion 49 Table II. Variable-elimination statistics for twenty Scheme benchmarks. The first, second, and third row for each benchmark give the statistics for the baseline, conventional, and lightweight methods respectively. All entries are static counts. vars boyer graphs lattice nucleic2 matrix earley scheme conform nboyer sboyer dynamic fannkuch simplex em-functional em-imperative nfm integ gold sort rrr 2193 2193 2193 2473 2473 2473 2258 2258 2258 3212 3212 3212 2617 2617 2617 2899 2899 2899 2928 2928 2928 2711 2711 2711 2380 2380 2380 2384 2384 2384 7822 7822 7822 2158 2158 2158 2240 2240 2240 3628 3628 3628 2807 2807 2807 4481 4481 4481 2104 2104 2104 2185 2185 2185 2192 2192 2192 2755 2755 2755 not assigned not accessed fictitious eliminated local global hidden slotted 0 1315 1937 0 1589 2209 0 1380 1998 0 2055 2673 0 1711 2329 0 1962 2579 0 1951 2438 0 1737 2324 0 1480 2087 0 1483 2090 0 6513 7071 0 1287 1907 0 1346 1970 0 2582 3190 0 1820 2426 0 3347 3932 0 1243 1859 0 1310 1926 0 1318 1935 0 1812 2426 0 393 2056 0 536 2209 0 384 2043 0 658 2450 0 417 2067 0 556 2228 0 704 1248 0 528 2051 0 432 2041 0 433 2042 0 905 2169 0 388 2072 0 441 2145 0 716 2282 0 604 2168 0 733 2150 0 360 2017 0 381 2038 0 388 2049 0 547 2175 0 0 2106 0 0 2260 0 0 2116 0 0 2658 0 0 2254 0 0 2357 0 0 1763 0 0 2290 0 0 2149 0 0 2151 0 0 3932 0 0 2098 0 0 2174 0 0 2666 0 0 2358 0 0 3123 0 0 2045 0 0 2077 0 0 2110 0 0 2331 0 393 2109 0 536 2306 0 384 2126 0 658 2724 0 417 2261 0 556 2366 0 704 1815 0 528 2297 0 432 2153 0 433 2155 0 905 3978 0 388 2103 0 441 2180 0 716 2672 0 604 2366 0 733 3156 0 360 2047 0 381 2083 0 388 2119 0 547 2343 0 953 80 0 977 142 0 992 117 0 1286 406 0 1144 307 0 1104 530 0 950 1014 0 1108 388 0 1004 219 0 1006 221 0 4510 3698 0 934 53 0 938 50 0 1368 944 0 1062 429 0 2043 1315 0 939 52 0 961 97 0 950 71 0 1188 375 0 240 4 0 247 16 0 251 4 0 473 65 0 274 10 0 244 2 0 489 14 0 309 12 0 287 8 0 288 8 0 460 59 0 234 2 0 239 10 0 325 12 0 304 12 0 274 10 0 237 4 0 246 5 0 244 2 0 262 36 0 0 0 0 0 2 0 0 1 0 0 2 0 0 3 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2193 607 0 2473 713 7 2258 631 10 3212 795 15 2617 782 36 2899 995 1 2928 785 85 2711 766 12 2380 657 0 2384 657 0 7822 1947 87 2158 602 0 2240 622 0 3628 1219 0 2807 837 0 4481 1431 0 2104 568 1 2185 597 0 2192 610 0 2755 758 1 50 · Jeffrey Mark Siskind Table III. Procedure-elimination statistics for twenty Scheme benchmarks. The first, second, and third row for each benchmark give the statistics for the baseline, conventional, and lightweight methods respectively. All entries are static counts. boyer graphs lattice nucleic2 matrix earley scheme conform nboyer sboyer dynamic fannkuch simplex em-functional em-imperative nfm integ gold sort rrr procedures closures parent slots parent parameters parent slot compression parent parameter compression 2285 2285 2285 2351 2351 2351 2181 2181 2181 2868 2868 2868 2533 2533 2533 2640 2640 2640 2773 2773 2773 2626 2626 2626 2466 2466 2466 2467 2467 2467 6830 6830 6830 2109 2109 2109 2249 2249 2249 3458 3458 3458 2761 2761 2761 4170 4170 4170 2045 2045 2045 2100 2100 2100 2139 2139 2139 2600 2600 2600 1132 450 0 1330 536 6 1195 476 4 1427 583 7 1432 582 24 1536 676 1 1404 573 49 1414 582 5 1243 503 0 1243 503 0 4656 1463 87 1133 460 0 1201 490 0 2097 920 0 1543 662 0 2548 1077 0 1092 438 1 1132 452 0 1153 466 0 1452 559 1 34 22 0 109 87 4 47 30 2 137 68 1 136 101 15 241 218 0 444 326 9 214 156 0 109 44 0 109 44 0 845 535 27 43 35 0 67 62 0 437 343 0 273 233 0 554 431 0 38 26 0 54 41 0 43 35 0 160 126 0 110 82 0 270 240 32 143 111 14 462 297 30 368 315 110 489 443 9 1495 1313 103 576 432 40 319 216 0 319 216 0 3532 2238 94 105 89 0 179 167 0 1071 948 0 639 585 0 1613 1432 0 100 79 1 134 110 0 124 110 0 432 388 1 6 2 0 35 33 2 8 2 0 27 25 0 30 23 8 53 42 0 163 120 0 31 22 0 27 18 0 27 18 0 112 86 0 11 10 0 26 25 0 59 51 0 75 69 0 98 90 0 7 7 0 16 15 0 7 7 0 47 42 0 31 23 0 83 77 25 43 31 9 64 58 23 96 80 65 121 95 8 629 567 54 155 102 25 85 66 0 85 66 0 810 643 5 30 27 0 78 76 0 212 196 0 187 177 0 348 336 0 24 20 0 42 36 0 31 30 0 149 141 0 Flow-Directed Lightweight Closure Conversion · 51 Table IV. Reference-elimination statistics for twenty Scheme benchmarks. The first, second, and third row for each benchmark give the statistics for the baseline, conventional, and lightweight methods respectively. All entries are static counts, except for the last column which is in CPU seconds. boyer graphs lattice nucleic2 matrix earley scheme conform nboyer sboyer dynamic fannkuch simplex em-functional em-imperative nfm integ gold sort rrr average indirections per reference accesses nontrivial accesses assignments nontrivial assignments 1.147 0.357 0.000 1.690 1.038 0.044 1.164 0.385 0.026 1.316 0.314 0.006 1.425 0.616 0.065 1.862 1.384 0.003 1.743 0.742 0.020 1.336 0.555 0.032 1.257 0.428 0.000 1.257 0.426 0.000 1.322 0.473 0.008 1.267 0.478 0.000 3.529 2.233 0.000 1.770 0.933 0.000 2.300 1.361 0.000 1.651 0.843 0.000 1.162 0.298 0.001 1.448 0.706 0.000 1.187 0.440 0.000 2.804 1.201 0.000 4402 4402 4402 4604 4604 4604 4311 4311 4311 7659 7659 7659 5280 5280 5280 5246 5246 5246 5823 5823 5823 5144 5144 5144 4898 4898 4898 4908 4908 4908 16602 16602 16602 3995 3995 3995 4289 4289 4289 6505 6505 6505 5239 5239 5239 8282 8282 8282 3938 3938 3938 4160 4160 4160 4114 4114 4114 5319 5319 5319 4402 4402 212 4604 4604 501 4311 4311 281 7659 7659 1147 5280 5280 729 5246 5246 1113 5823 5823 3280 5144 5144 981 4898 4898 630 4908 4908 638 16602 16602 6555 3995 3995 146 4289 4289 334 6505 6505 2109 5239 5239 1271 8282 8282 2400 3938 3938 167 4160 4160 351 4114 4114 204 5319 5319 1109 483 483 483 478 478 478 472 472 472 751 751 751 500 500 500 531 531 531 572 572 572 573 573 573 514 514 514 515 515 515 945 945 945 465 465 465 500 500 500 642 642 642 584 584 584 728 728 728 455 455 455 469 469 469 470 470 470 545 545 545 263 263 13 264 264 1 260 260 4 539 539 50 288 288 2 320 320 3 524 524 48 392 392 20 313 312 32 314 313 32 793 793 106 251 251 1 282 282 23 440 440 16 384 384 25 549 549 7 245 245 0 259 259 3 258 258 4 337 337 13 run time 149.740 106.450 94.820 73.470 47.560 15.270 332.790 239.490 53.480 28.340 20.570 17.510 488.060 257.770 100.380 84.120 42.660 22.720 444.220 257.750 235.630 527.740 289.690 198.320 35.310 27.540 23.700 33.130 23.880 17.250 57.670 31.950 28.430 525.030 511.530 52.750 15.320 11.190 2.210 70.440 30.920 7.360 18.130 9.400 2.410 18.600 15.500 5.030 4.390 3.540 1.460 16.990 6.110 5.520 60.650 31.780 12.500 748.320 493.390 143.100 52 · Jeffrey Mark Siskind Bigloo, and Chez. For all but three of the benchmarks, Stalin significantly outperforms the other compilers. Note that many of the compilers were run with options that violate R4RS semantics. In particular, Scheme->C was run with -On for many of the benchmarks, Gambit-C was run with standard-bindings and extended-bindings for all of the benchmarks and with fixnum for many of the benchmarks, and Bigloo was run with -farithmetic for many of the benchmarks. This, in essence, manually gives these compilers information that can only be determined by whole-program interprocedural flow analysis. Information that Stalin soundly and automatically determines on its own. When Scheme->C, Gambit-C, and Bigloo are run without these options, the ratio of run times relative to Stalin are even more favourable to Stalin than the results presented in table V. 5. CONCLUSION A number of researchers have pursued work along the lines of the work presented in this paper. Deutsch [1990], Emami and Hendren [1994], and Jagannathan et al. [1998] present aliasing analyses that are based on flow analysis. Henglein [1992], Wand and Steckler [1994], Steckler [1994b], Steckler [1994c], Steckler [1994a], Minamide et al. [1996], and Steckler and Wand [1997] present alternate approaches to lightweight closure conversion. The techniques of Steckler and Wand, in particular, are incomparable to the techniques presented in this paper. Cases can be constructed where the approach of Steckler and Wand performs optimizations that the approach presented in this paper does not perform, and vice versa. Whole program analysis, in particular flow, reachability, points-to, and escape analysis, when used to support the lightweight closure-conversion method described in this paper, offers significant reduction in —variable parameters and variable slots, —parent parameters and parent slots, —closures, closure pointers, and closure-pointer slots, —variable parameter and parent parameter spilling, —variable parameter and parent parameter passing, and —indirection in variable reference. Furthermore, it results in substantial improvements in compiled-code speed. REFERENCES Clinger, W. and Rees, J. 1991. Revised4 Report on the Algorithmic Language Scheme. Deutsch, A. 1990. On determining lifetime and aliasing of dynamically allocated data in higherorder functional specifications. In Proceedings of the 17th Annual ACM Symposium on Principles of Programming Languages. 157–168. Emami, M. and Hendren, R. G. L. J. 1994. Context-sensitive interprocedural points-to analysis in the presence of function pointers. In Proceedings of the 1994 SIGPLAN Conference on Programming Language Design and Implementation. 242–256. Heintze, N. 1993. Set based analysis of ml programs. Tech. Rep. CMU CS 93-193, School of Computer Science, Carnegie-Mellon University. July. Heintze, N. 1994. Set based analysis of ml programs. In Proceedings of the 1994 ACM Conference on Lisp and Functional Programming. Flow-Directed Lightweight Closure Conversion · 53 Table V. The ratio of CPU times to run the twenty benchmarks under Scheme->C (15mar93), Gambit-C (3.0), Bigloo (2.0e), and Chez (5.0a) relative to Stalin (0.9). All runs were done on one processor of an unloaded Dell PowerEdge 6350 with four 450MHz Pentium II Xeon processors with 512K L2 cache and 1G main memory running Red Hat 6.0 with kernel 2.2.10/SMP and egcs 2.91.66. All compilers were run with settings to give maximal speed at the expense of minimal safety. Scheme->C was run with -Ob -Og -Ot -O3 -fomit-frame-pointer -freg-struct-return. Scheme->C was given the additional option -On for the boyer, graphs, lattice, matrix, earley, scheme, conform, nboyer, sboyer, dynamic, fannkuch, and rrr benchmarks. Gambit-C was run with r4rs-scheme, block, standard-bindings, extended-bindings, not safe, not interrupts-enabled, and -O3 -fomit-frame-pointer -freg-struct-return -D___SINGLE_HOST. Gambit-C was given the additional option fixnum for the same benchmarks that Scheme->C was given the -On option. Bigloo was run with -call/cc -unsafe -Obench -O6 -fstack -copt "-O3 -fomit-frame-pointer -freg-struct-return". Bigloo was given the additional option -farithmetic for the same benchmarks that Scheme->C was given the -On option. Chez was run with optimize-level 3. Stalin was run with -Ob -Om -On -Or -Ot -copt -O3 -copt -fomit-frame-pointer -copt -freg-struct-return. Stalin was given the additional options -clone-size-limit 0 -do-not-index-allocated-structure-types-by-expression -do-not-index-constant-structure-types-by-expression -treat-all-symbols-as-external for the scheme benchmark, the additional option -do-not-index-allocated-structure-types-by-expression for the dynamic benchmark, and the additional option -d for those benchmarks that were not given -On under Scheme->C. Times are relative best-of-three-successive user+system CPU times as reported by the time command. Runs where Stalin performs worse than other implementations are highlighted in bold. Scheme->C Gambit-C Bigloo Chez Stalin Stalin Stalin Stalin boyer 1.611 4.095 1.331 2.225 graphs 3.742 8.764 18.112 2.036 lattice 2.493 3.123 1.740 1.795 nucleic2 12.391 44.851 9.962 8.420 matrix 1.773 2.948 1.545 1.889 earley 1.106 3.668 1.454 1.844 scheme 0.555 0.816 0.345 0.422 conform 1.204 2.765 1.025 1.544 nboyer 3.527 4.145 5.437 3.972 sboyer 1.228 2.406 1.963 1.647 dynamic 0.366 1.620 0.479 0.518 fannkuch 1.104 1.075 0.742 3.677 simplex 3.789 30.426 5.108 4.441 em-functional 6.993 18.503 4.868 3.758 em-imperative 6.680 24.768 6.755 5.141 nfm 3.645 163.917 6.683 7.950 integ 17.828 44.953 15.485 9.931 gold 13.578 47.564 12.758 8.391 sort 3.410 12.467 2.449 1.569 rrr 1.289 10.804 1.020 5.010 54 · Jeffrey Mark Siskind Henglein, F. 1992. Simple closure analysis. Tech. Rep. D-193, Department of Computer Science, University of Copenhagen. Mar. Jagannathan, S., Thiemann, P., Weeks, S. T., and Wright, A. K. 1998. Single and loving it: Must-alias analysis for higher-order languages. In Proceedings of the 1998 Symposium on Principles of Programming Languages. Kelsey, R., Clinger, W., and Rees, J. 1998. Revised5 report on the algorithmic language Scheme. Higher-Order and Symbolic Computation 11, 1 (Sept.). McCarthy, J. 1980. Circumscription—a form of non-monotonic reasoning. Artificial Intelligence 13, 1–2, 27–39. Minamide, Y., Morrisett, J. G., and Harper, R. 1996. Typed closure conversion. In Proceedings of the 1996 Symposium on Principles of Programming Languages. 271–283. Shivers, O. 1988. Control flow analysis in Scheme. In Proceedings of the 1988 SIGPLAN Conference on Programming Language Design and Implementation. 164–174. Shivers, O. 1990. Data-flow analysis and type recovery in Scheme. In Topics in Advanced Language Implementation. The MIT Press. Appears as Technical Report CMU-CS-90-115. Shivers, O. 1991a. Control-flow analysis of higher-order languages or taming lambda. Ph.D. thesis, School of Computer Science, Carnegie-Mellon University. Shivers, O. 1991b. The semantics of Scheme control-flow analysis. In ACM Symposium on Partial Evaluation and Semantics-Based Program Manipulation. 190–198. Appears as Technical Report CMU-CS-91-119. Siskind, J. M. 2000a. Flow-directed lightweight CPS conversion. In preparation. Siskind, J. M. 2000b. Flow-directed polyvariance. In preparation. Siskind, J. M. 2000c. Flow-directed representation selection. In preparation. Siskind, J. M. 2000d. Flow-directed storage management. In preparation. Steckler, P. A. 1994a. Correct higher-order program transformations. Ph.D. thesis, College of Computer Science, Northeastern University. Steckler, P. A. 1994b. Correct separate and selective closure conversion. Tech. rep., Laboratory for the Foundations of Computer Science, Edinburgh University. Oct. Steckler, P. A. 1994c. Tracking available values for lightweight closures. Tech. rep., College of Computer Science, Northeastern University. Mar. Steckler, P. A. and Wand, M. 1997. Lightweight closure conversion. ACM Transactions on Programming Languages and Systems 19, 1 (Jan.), 48–86. Wand, M. and Steckler, P. A. 1994. Selective and lightweight closure conversion. In Proceedings of the 21nd Annual ACM Symposium on Principles of Programming Languages. 435–445. Flow-Directed Lightweight Closure Conversion A. · GLOSSARY A.1 Syntactic terminology source callee argument parameters arity body antecedent consequent alternate x q e p p0 X E A S C Ci R Ri P Source(e) x(e) Callee(e) Arguments(e) Argumenti (e) Parameteri (e) Arity(e) Body(e) p(x) p(e) p1 ≺ p2 p1 ≺+ p2 p1 ≺ ∗ p2 InTailPosition(e) the first subexpression of an assignment the first subexpression of a call or primcall any subexpressions of a call or primcall except the first the variables of a lambda expression the number of arguments of a call or primcall or the number of parameters of a lambda expression the subexpression of a lambda expression the first subexpression of a conditional the second subexpression of a conditional the third subexpression of a conditional a variable a primitive an expression a lambda expression, also an abstract procedure the top-level lambda expression that contains the entire source program the set of all variables in the source program the set of all expressions in the source program the set of all accesses in the source program the set of all assignments in the source program the set of all calls in the source program the set of all calls of arity i in the source program the set of all primcalls in the source program the set of all primcalls of arity i in the source program the set of all lambda expressions in the source program the source subexpression of an assignment e the variable in an access or assignment e the callee subexpression of a call or primcall e the set of argument subexpressions of a call or primcall e the ith argument subexpression of a call or primcall e the ith parameter of a lambda expression e the arity of a call, primcall, or lambda expression e the body subexpression of a lambda expression e the lambda expression in which x is bound the narrowest lambda expression that properly contains e p1 is directly nested in p2 p1 is properly nested in p2 p1 is nested in p2 e is in tail position 55 56 · A.2 Flow analysis terminology Jeffrey Mark Siskind k l σ β k∈l σ∈β β(x) β(e) Pair?(σ) Car(σ) Cdr(σ) String?(σ) String-Ref(σ) Vector?(σ) Vector-Ref(σ) Symbol?(σ) Symbol->String(σ) Continuation?(σ) e(σ) a concrete object a concrete location an abstract object an abstract location l can take on k as its value some l ∈ β can take on some k ∈ σ as its value in some execution of the program the abstract location associated with x the abstract location associated with the result of evaluating e σ is an abstract pair the abstract location containing the car slots of the concrete pairs in the abstract pair σ the abstract location containing the cdr slots of the concrete pairs in the abstract pair σ σ is an abstract string the abstract location containing the elements of the concrete strings in the abstract string σ σ is an abstract vector the abstract location containing the elements of the concrete vectors in the abstract vector σ σ is an abstract symbol the abstract location containing the print-name strings of the concrete symbols in the abstract symbol σ σ is an abstract continuation the call to call/cc where the concrete continuations in the abstract continuation σ were created Flow-Directed Lightweight Closure Conversion A.3 · Reachability analysis terminology ê x̂ Reached(ê) Returns(ê) Executed(ê) Accessed(x̂) Assigned(x̂) successful Reached(e) Returns(e) Executed(e) Accessed(e) Assigned(e) an invocation of e an instance of x control flows to the program point just before ê control flows to the program point just after ê control flows to the intermediate program point in an assignment, call, or primcall invocation ê some access invocation ê to x is reached some assignment invocation ê to x is reached the call invocation ê is executed and the arity of the call site equals the arity of the target procedure or continuation or the primcall invocation ê is executed and the arity of the call site is allowed by the primitive some invocation ê of e is reached in some execution of the program some invocation ê of e returns in some execution of the program some invocation ê of an assignment, call, or primcall e is executed in some execution of the program some instance x̂ of x is accessed in some execution of the program some instance x̂ of x is assigned in some execution of the program 57 58 · A.4 Concrete aggregate access and assignment terminology—I Jeffrey Mark Siskind TypeTagAccesses(ê, k) Eq?Accesses(ê, k) CarAccesses(ê, k) CdrAccesses(ê, k) StringLengthAccesses(ê, k) StringRefAccesses(ê, k) VectorLengthAccesses(ê, k) VectorRefAccesses(ê, k) SymbolToStringAccesses(ê, k) ContinuationAccesses(ê, k) ProcedureAccesses(ê, k) CarAssigns(ê, k) CdrAssigns(ê, k) StringRefAssigns(ê, k) VectorRefAssigns(ê, k) the primcall invocation ê accesses the type tag slot of k the primcall invocation ê accesses the identity of k the primcall invocation ê accesses the car slot of the concrete pair k the primcall invocation ê accesses the cdr slot of the concrete pair k the primcall invocation ê accesses the length slot of the concrete string k the primcall invocation ê accesses an element of the concrete string k the primcall invocation ê accesses the length slot of the concrete vector k the primcall invocation ê accesses an element of the concrete vector k the primcall invocation ê accesses the print-name–string slot of the concrete symbol k the call invocation ê calls the concrete continuation k the call invocation ê successfully calls the concrete procedure k the primcall invocation ê assigns the car slot of the concrete pair k the primcall invocation ê assigns the cdr slot of the concrete pair k the primcall invocation ê assigns an element of the concrete string k the primcall invocation ê assigns an element of the concrete vector k Flow-Directed Lightweight Closure Conversion A.5 · Concrete aggregate access and assignment terminology—II TypeTagAccessed(k) Eq?Accessed(k) CarAccessed(k) CdrAccessed(k) StringLengthAccessed(k) StringRefAccessed(k) VectorLengthAccessed(k) VectorRefAccessed(k) SymbolToStringAccessed(k) ContinuationAccessed(k) ProcedureAccessed(k) CarAssigned(k) CdrAssigned(k) StringRefAssigned(k) VectorRefAssigned(k) the type-tag slot of k is accessed the identity of k is accessed the car slot of the concrete pair k is accessed the cdr slot of the concrete pair k is accessed the length slot of the concrete string k is accessed an element of the concrete string k is accessed the length slot of the concrete vector k is accessed an element of the concrete vector k is accessed the print-name–string slot of the concrete symbol k is accessed the concrete continuation k is called the concrete procedure k is called the car slot of the concrete pair k is assigned the cdr slot of the concrete pair k is assigned an element of the concrete string k is assigned an element of the concrete vector k is assigned 59 60 · A.6 Abstract aggregate access and assignment terminology—I Jeffrey Mark Siskind TypeTagAccesses(e, σ) Eq?Accesses(e, σ) CarAccesses(e, σ) CdrAccesses(e, σ) StringLengthAccesses(e, σ) StringRefAccesses(e, σ) VectorLengthAccesses(e, σ) VectorRefAccesses(e, σ) SymbolToStringAccesses(e, σ) ContinuationAccesses(e, σ) ProcedureAccesses(e, σ) CarAssigns(e, σ) CdrAssigns(e, σ) StringRefAssigns(e, σ) VectorRefAssigns(e, σ) some invocation ê of the primcall e accesses the type-tag slot of some k ∈ σ some invocation ê of the primcall e accesses the identity of some k ∈ σ some invocation ê of the primcall e accesses the car slot of some concrete pair k ∈ σ some invocation ê of the primcall e accesses the cdr slot of some concrete pair k ∈ σ some invocation ê of the primcall e accesses the length slot of some concrete string k ∈ σ some invocation ê of the primcall e accesses an element of some concrete string k ∈ σ some invocation ê of the primcall e accesses the length slot of some concrete vector k ∈ σ some invocation ê of the primcall e accesses an element of some concrete vector k ∈ σ some invocation ê of the primcall e accesses the print-name–string slot of some concrete symbol k ∈ σ some invocation ê of the call e calls some concrete continuation k ∈ σ some invocation ê of the call e successfully calls some concrete procedure k ∈ σ some invocation ê of the primcall e assigns the car slot of some concrete pair k ∈ σ some invocation ê of the primcall e assigns the cdr slot of some concrete pair k ∈ σ some invocation ê of the primcall e assigns an element of some concrete string k ∈ σ some invocation ê of the primcall e assigns an element of some concrete vector k ∈ σ Flow-Directed Lightweight Closure Conversion A.7 · 61 Abstract aggregate access and assignment terminology—II TypeTagAccessed(σ) Eq?Accessed(σ) CarAccessed(σ) CdrAccessed(σ) StringLengthAccessed(σ) StringRefAccessed(σ) VectorLengthAccessed(σ) VectorRefAccessed(σ) SymbolToStringAccessed(σ) ContinuationAccessed(σ) ProcedureAccessed(σ) CarAssigned(σ) CdrAssigned(σ) StringRefAssigned(σ) VectorRefAssigned(σ) the type-tag slot of some k ∈ σ is accessed the identity of some k ∈ σ is accessed the car slot of some concrete pair k ∈ σ is accessed the cdr slot of some concrete pair k ∈ σ is accessed the length slot of some concrete string k ∈ σ is accessed an element of some concrete string k ∈ σ is accessed the length slot of some concrete vector k ∈ σ is accessed an element of some concrete vector k ∈ σ is accessed the print-name–string slot of some concrete symbol k ∈ σ is accessed some concrete continuation k ∈ σ is called some concrete procedure k ∈ σ is called the car slot of some concrete pair k ∈ σ is assigned the cdr slot of some concrete pair k ∈ σ is assigned an element of some concrete string k ∈ σ is assigned an element of some concrete vector k ∈ σ is assigned 62 · A.8 Concrete call-graph terminology—I Jeffrey Mark Siskind Called(k) ê  k ê t k ê t k ê st k ê st k k1  k2 k1  t k2 k1  t k2 k1 st k2 k1 st k2 the concrete procedure k is called the call invocation ê directly calls the concrete procedure k the call invocation ê directly tail calls the concrete procedure k the call invocation ê directly non-tail calls the concrete procedure k the call invocation ê directly self-tail calls the concrete procedure k the call invocation ê directly non-self-tail calls the concrete procedure k the concrete procedure k1 directly calls the concrete procedure k2 the concrete procedure k1 directly tail calls the concrete procedure k2 the concrete procedure k1 directly non-tail calls the concrete procedure k2 the concrete procedure k1 directly self-tail calls the concrete procedure k2 the concrete procedure k1 directly non-self-tail calls the concrete procedure k2 Flow-Directed Lightweight Closure Conversion A.9 · Concrete call-graph terminology—II + k ê  + ê  t k + ê  t k + ê  st k + ê  st k + k k1  2 + k1  t k2 + k1  t k2 + k1  st k2 + k1  st k2 ∗ k1  k2 ∗ k1  t k2 ∗ k1  t k2 ∗ k1  st k2 ∗ k1  st k2 the call invocation ê properly calls the concrete procedure k the call invocation ê properly tail calls the concrete procedure k the call invocation ê properly non-tail calls the concrete procedure k the call invocation ê properly self-tail calls the concrete procedure k the call invocation ê properly non-self-tail calls the concrete procedure k the concrete procedure k1 properly calls the concrete procedure k2 the concrete procedure k1 properly tail calls the concrete procedure k2 the concrete procedure k1 properly non-tail calls the concrete procedure k2 the concrete procedure k1 properly self-tail calls the concrete procedure k2 the concrete procedure k1 properly non-self-tail calls the concrete procedure k2 the concrete procedure k1 calls the concrete procedure k2 the concrete procedure k1 tail calls the concrete procedure k2 the concrete procedure k1 non-tail calls the concrete procedure k2 the concrete procedure k1 self-tail calls the concrete procedure k2 the concrete procedure k1 non-self-tail calls the concrete procedure k2 63 64 A.10 · Jeffrey Mark Siskind Abstract call-graph terminology—I Called(p) ep e t p e t p e st p e st p p 1  p2 p1  t p2 p1  t p2 p1 st p2 p1 st p2 some some some some some some some some some some some some some some some some some some some some some concrete procedure k ∈ p is called invocation e of the call e directly calls concrete procedure k ∈ p invocation e of the call e directly tail calls concrete procedure k ∈ p invocation e of the call e directly non-tail calls concrete procedure k ∈ p invocation e of the call e directly self-tail calls concrete procedure k ∈ p invocation e of the call e directly non-self-tail calls concrete procedure k ∈ p concrete procedure k1 ∈ p1 directly calls concrete procedure k2 ∈ p2 concrete procedure k1 ∈ p1 directly tail calls concrete procedure k2 ∈ p2 concrete procedure k1 ∈ p1 directly non-tail calls concrete procedure k2 ∈ p2 concrete procedure k1 ∈ p1 directly self-tail calls concrete procedure k2 ∈ p2 concrete procedure k1 ∈ p1 directly non-self-tail calls concrete procedure k2 ∈ p2 Flow-Directed Lightweight Closure Conversion A.11 · Abstract call-graph terminology—II + p e + e t p + e t p + e st p + e st p + p p1  2 + p1  t p2 + p1  t p2 + p1  st p2 + p1  st p2 ∗ p1  p2 ∗ p1  t p2 ∗ p1  t p2 ∗ p1  st p2 ∗ p1  st p2 some invocation e of the call e properly calls some concrete procedure k ∈ p some invocation e of the call e properly tail calls some concrete procedure k ∈ p some invocation e of the call e properly non-tail calls some concrete procedure k ∈ p some invocation e of the call e properly self-tail calls some concrete procedure k ∈ p some invocation e of the call e properly non-self-tail calls some concrete procedure k ∈ p some concrete procedure k1 ∈ p1 properly calls some concrete procedure k2 ∈ p2 some concrete procedure k1 ∈ p1 properly tail calls some concrete procedure k2 ∈ p2 some concrete procedure k1 ∈ p1 properly non-tail calls some concrete procedure k2 ∈ p2 some concrete procedure k1 ∈ p1 properly self-tail calls some concrete procedure k2 ∈ p2 some concrete procedure k1 ∈ p1 properly non-self-tail calls some concrete procedure k2 ∈ p2 some concrete procedure k1 ∈ p1 calls some concrete procedure k2 ∈ p2 some concrete procedure k1 ∈ p1 tail calls some concrete procedure k2 ∈ p2 some concrete procedure k1 ∈ p1 non-tail calls some concrete procedure k2 ∈ p2 some concrete procedure k1 ∈ p1 self-tail calls some concrete procedure k2 ∈ p2 some concrete procedure k1 ∈ p1 non-self-tail calls some concrete procedure k2 ∈ p2 65 · 66 A.12 Jeffrey Mark Siskind Called-more-than-once, points-to, escape, in-lining, and reentrancy terminology CalledMoreThanOnce(k) CalledMoreThanOnce(p) FreeIn(x, p) k;l l;k k1 ∗; k2 k ∗; l l ∗; k l1 ∗; l2 σ;β β;σ σ1 ∗; σ2 σ ∗; β β ∗; σ β1 ∗; β2 k ⇑ ê σ⇑e σ⇑p UniqueCallSite(e, p) p1 ,→ p2 p1 ,→∗ p2 Reentrant(k) Reentrant(p) the concrete procedure k is called more than once some concrete procedure k ∈ p is called more than once x is free in p k directly points to l l directly points to k k1 points to k2 k points to l l points to k l1 points to l2 some k ∈ σ directly points to some l ∈ β some l ∈ β directly points to some k ∈ σ some k1 ∈ σ1 points to some k2 ∈ σ2 some k ∈ σ points to some l ∈ β some l ∈ β points to some k ∈ σ some l1 ∈ β1 points to some l2 ∈ β2 k escapes ê some k ∈ σ escapes some invocation ê of e in some execution of the program σ escapes Body(p) the call e is the unique call site of p p1 is directly in-lined in p2 p1 is in-lined in p2 the concrete procedure k is reentrant some concrete procedure k ∈ p is reentrant in some execution of the program Flow-Directed Lightweight Closure Conversion A.13 · 67 Internal lightweight closure-conversion terminology MustAlias(x) MustAlias(σ) MustAlias(p) Fictitious(l) Fictitious(β) NonTrivialReference(ê) NonTrivialReference(e) Localizable(x) Globalizable(x) Ancestor(p1 , p2 ) Hideable(x) every reached access invocation ê to x accesses the instance x̂ bound by the most recent active invocation of some concrete procedure k ∈ p(x) for every successful call invocation ê to some concrete continuation k ∈ σ, k was created by the most recent active invocation of e(σ) either p has no parent parameter or for every successful call invocation ê to some concrete procedure k ∈ p, the most active invocation of ParentParameter(p) when k is called is the same as the most active invocation of ParentParameter(p) when k was created l always contains the same concrete object every l ∈ β is fictitious in every execution of the program an access invocation e is reached and returns an object that is not known at compile time or an assignment invocation e is executed, x(e) is not hidden, and there is a subsequent nontrivial access to x̂(ê) some invocation ê of the reference e is nontrivial in some execution of the program x must alias and all nontrivial references to x are in-lined in the procedure that binds x x can have at most one live instance some reached expression in p2 accesses a pointer to the closure for p1 the abstract location of x contains a single abstract procedure p, for all nontrivial accesses e to x, p(e) is nested in every ancestor of p, and p must alias · 68 A.14 Jeffrey Mark Siskind Output lightweight closure-conversion terminology Local(x) Global(x) Hidden(x) Slotted(x) HasClosure(p) HasParentSlot(p) ParentSlot(p) HasParentParameter(p) ParentParameter(p) x will be allocated as a local variable x will be allocated as a global variable x will be allocated as a hidden closure slot x will be allocated as a closure slot p will have a closure and closure pointer p will have a parent slot The parent slot for p will point to the closure for ParentSlot(p) p will have a parent parameter and closurepointer slot The parent parameter and closure-pointer slot for p will point to the closure for ParentParameter(p)