Maximal Laziness
An Efficient Interpretation Technique for Purely Functional DSLs

Eelco Dolstra
Utrecht University
eelco@cs.uu.nl

Abstract

In lazy functional languages, any variable is evaluated at most once. This paper proposes the notion of maximal laziness, in which syntactically equal terms are evaluated at most once: if two terms e1 and e2 arising during the evaluation of a program have the same abstract syntax representation, then only one will be evaluated, while the other will reuse the former's evaluation result. Maximal laziness can be implemented easily in interpreters for purely functional languages based on term rewriting systems that have the property of maximal sharing — if two terms are equal, they have the same address. It makes it easier to write interpreters, as techniques such as closure updating, which would otherwise be required for efficiency, are not needed. Instead, a straight-forward translation of call-by-name semantic rules yields a call-by-need interpreter, reducing the gap between the language specification and its implementation. Moreover, maximal laziness obviates the need for optimisations such as memoisation and let-floating.

1 Introduction

In lazy functional languages such as Haskell [18], the value of a variable binding is computed only when it is needed, and then only once. For instance, in the Haskell function

    f x y = if x == 0 then y else z + z
      where z = product [1..x]

the function argument y is only computed when x = 0, and the local variable z is computed only when x ≠ 0, and then only once. Laziness is a useful property because it allows the programmer to abstract over the ordering of computations, and enables the construction of infinite data structures and the definition of control structures within the language [12].

This paper proposes the notion of maximal laziness, in which any set of “equal” terms is evaluated at most once during the execution of a program. For instance, in a function such as

    f n = fac n + fac n
      where fac n = product [1..n]

under a maximal laziness regime, the expression fac n will be computed only once, while the second occurrence of the expression will reuse the result from the first. To be precise, if two terms e1 and e2 arising during the evaluation of a program have the same abstract syntax representation, then only one will be evaluated, while the other will reuse the former's evaluation result. Hence, it is a rather stronger property than static common subexpression elimination. For instance, in the program

    f n = fac n + fac 10
      where fac n = product [1..n]

if f is called with argument 10, fac 10 will be computed only once.

Maximal laziness is an expensive property to implement in a general purpose, compiled language. Indeed, in compiled code there is generally no notion of the abstract syntax tree of a value — certainly not one that relates in a meaningful way to the abstract syntax of the language. However, for domain specific languages (DSLs), one typically does not want to implement a full compiler but rather an interpreter that performs sufficiently well without too much implementation effort. As a motivating example of such a DSL, this paper uses the Nix expression language (described in Section 2), a purely functional language used by the Nix software deployment system [7,5] to specify how to build and compose software packages.

In interpreters for functional languages based on term rewriting, maximal laziness is much easier to achieve.
In a term rewriting approach, the abstract syntax term representing the program is rewritten according to the semantic rules of the language until a normal form — the evaluation result — is reached. In fact, maximal laziness comes naturally when one implements the interpreter in a term rewriting system that has the property of maximal sharing, such as ASF+SDF [23] or the Stratego/XT program transformation system [24], both of which rely on the ATerm library [20] to implement maximal sharing of terms. In such systems, if two terms are syntactically equal, then they occupy the same location in memory — i.e., any term is stored only once (a technique known as hash-consing in Lisp). This makes it easy and cheap to add a simple memoisation to the term rewriting code to map abstract syntax trees to their normal forms, thus “caching” evaluation results and achieving maximal laziness.

Maximal laziness has a number of advantages:

• The implementation of the language becomes simpler and stays closer to the specification of the semantics of the language. The semantics of a purely functional language can be specified conveniently as a set of rewrite rules over the terms of the language, e.g., β-reduction to execute function calls: (λx.e1) e2 ↦ e1[x := e2]. However, direct implementation of such “call-by-name” semantics generally gives extremely poor performance, as one gets a lot of work duplication: for instance, in the β-reduction rule, the computation of e2 will be duplicated for every occurrence of x in e1. To prevent this, an entirely different, call-by-need style of implementation is required. We cannot simply substitute variables; rather, they must be represented explicitly and updated when they have been computed (see, e.g., [6] for an attempt to do so in a rewriting formalism). But with maximal laziness, the naive implementation of the semantics has the required “updating” behaviour. For instance, in the case of β-reduction, multiple occurrences of e2 will be evaluated at most once due to the memoisation of evaluation results. This is shown in detail in Section 3 for the case of the interpreter for the Nix language.

• Maximal laziness can give a nice performance improvement over a “traditional” implementation of sharing (Section 5): for typical, large Nix expressions, the traditional implementation performs 40% to 280% more rewrite steps than the maximally lazy one. So maximal laziness gives faster but simpler interpreters. Also, several years of experience with maximal laziness in the Nix expression evaluator has shown that memory use scales well despite the naive evaluation result caching that never discards any result.

• Maximal laziness obviates the need for optimisations such as let-floating [19] (also known as the full laziness transformation) and function memoisation in many cases (Section 4). However, function memoisation is tricky in the case of non-strict languages.

2 Motivating example: Nix expressions

This section introduces a motivating example: a purely functional, domain specific language for which we want to implement an efficient interpreter without much effort. The DSL in question is the Nix expression language, which is used in the Nix deployment system [7,5] to specify how to build and compose software packages. A purely functional language is a good fit to the problem of specifying the building of software packages, because packages often need to be built in different versions or variants.
For instance, different packages in the system may need to be built with different versions of the C compiler; and packages often have a great deal of variability in the functionality that can be compiled into them, such as whether to build Mozilla Firefox with support for Scalable Vector Graphics, which requires additional dependencies. Thus, it makes sense to describe packages as functions of their variability in terms of dependencies and optional features, so that such functions can be called any number of times with different arguments to create the desired instances of a package. The evaluation of a Nix expression yields a graph of build actions that must be performed to build a specific instance of a package with all its dependencies.

As an example, Figure 1 shows a Nix expression that builds the GNU Hello package.

    helloFun = {stdenv, fetchurl, perl}:       (1)
      stdenv.mkDerivation {                    (2)
        name = "hello-2.1.1";
        src = fetchurl {
          url = mirror://gnu/hello/hello-2.1.1.tar.gz;
          md5 = "70c9ccf9fac07f762c24f2df2290784d";
        };
        buildInputs = [perl];
      };

    hello = helloFun {                         (3)
      inherit fetchurl stdenv perl;
    };

    stdenv = ...; perl = ...; fetchurl = ...;

    Fig. 1. Nix expression for GNU Hello

First, it specifies at point (1) a function named helloFun that builds the Hello package, given values that describe the dependencies required by that package, such as perl. Functions have the syntax arg: body. Thus, the body of helloFun is the call to the function stdenv.mkDerivation (at (2)). The helper function stdenv.mkDerivation returns a special value called a derivation, which is simply the build graph for this particular instance of the Hello package. The arguments to mkDerivation specify the various inputs to the build, such as the package's name, its source code (src), and its dependencies (buildInputs). The source code is obtained by calling the function fetchurl, which specifies a derivation that downloads a file from the network. The evaluation of a Nix expression, at top-level, must yield one or more of these derivations, which are then used to perform the required build actions imperatively. Thus, the declarative, purely functional Nix expression language is used to specify a set of imperative build steps.

Since helloFun is a function, to actually build an instance of the Hello package, we must call it. This is done at point (3), and the resulting build graph is bound to the variable hello. The function is called with a set of arguments inherited from the surrounding lexical scope using the inherit keyword. (inherit x is merely syntactic sugar for an argument specification x = x;, i.e. the argument x is the expression x, where the latter x refers to the variable x in the surrounding scope.) Of course, the function can be called any number of times. For instance, if we had a value perl6 representing a different version of Perl, we could build Hello with it: helloWithPerl6 = helloFun {inherit fetchurl stdenv; perl = perl6;}. This is a lazy language: expressions are only evaluated, and the build graphs they represent only built, when they are actually needed.

It is not the purpose of this paper to give a full treatment of Nix or its expression language. (These can be found in [5] and in the Nix manual at http://nix.cs.uu.nl/.) Instead, the remainder of this section shows a part of the syntax and semantics of Nix expressions to illustrate, in Section 3, how such a semantics can be turned into an efficient interpreter using term rewriting and maximal laziness.
2.1 Syntax

The Nix expression language has several data types, such as strings, Booleans (with values true and false), lists (between square brackets), and attribute sets (between curly braces). The most important data type in the language is the attribute set, which is a set of name/value pairs, e.g., { x = "foo"; y = 123; }. Attribute names are identifiers, and attribute values are arbitrary expressions. The order of attributes is irrelevant, but any attribute name can occur only once in a set. Attributes can be selected using the . operator. E.g., { x = "foo"; y = 123; }.y evaluates to 123.

Recursive attribute sets allow attribute values to refer to each other. They are constructed using the rec keyword. Formally, each attribute in the set is added to the scope of the entire attribute set. Hence, rec { x = y; y = 123; }.x evaluates to 123. If rec were omitted, the identifier y in the definition of the attribute x would refer to some y bound in the surrounding scope. Recursive attribute sets introduce the possibility of recursion, including non-termination, e.g. rec { x = x; }.x. Recursion is used in the Nix language for many purposes, such as defining packages that are an input to themselves, e.g., the bootstrap process of compilers.

As we saw above, when defining an attribute set, attribute values can be inherited from the surrounding lexical scope or from other attribute sets. The expression x: { inherit x; y = 123; } defines a function that returns an attribute set with two attributes: x which is inherited from the function argument named x, and y which is declared normally. As the inherit construct is just syntactic sugar, the previous expression could also have been written as x: { x = x; y = 123; }. Note that the right-hand side of the attribute x = x refers to the function argument x, not to the attribute x. Thus, x = x is not a recursive definition.

The language has two types of functions. The first takes a single argument and has the form x: e. For instance, the (anonymous) identity function can be defined as x: x. Of course, this style of function is just a plain λ-abstraction from the λ-calculus. Though this style only allows functions with a single argument, since this is a functional language we can still define (in a sense) functions with multiple arguments, e.g., x: y: x + y, which is a function taking an argument x that returns another function that accepts an argument y.

The second style of function definition, and the one used in Figure 1, is more important in this language. It takes an attribute set and binds the attributes defined therein to local variables. Thus, {x, y}: x + y declares a function that accepts an attribute set with attributes x and y (and nothing else), and the expression ({x, y}: x + y) {y = "bar"; x = "foo";} yields "foobar".

2.2 Semantics

The operational semantics of the language is specified using semantic rules of the form e1 ↦ e2 that transform expression e1 into e2. Rules may only be applied to closed terms, i.e., terms that have no free variables. Thus it is not allowed to arbitrarily apply rules to subterms. An expression e1 is said to evaluate to e2, notation e1 ↦* e2, if there exists a sequence of zero or more applications of semantic rules to e1 that transform it into e2 such that no rule is applicable to e2. Thus e2 is the normal form of e1.
Since rules are only allowed to be applied to an expression at top level (i.e., not to subexpressions), a normal form corresponds to the notion of a weak head normal form (WHNF) [16, Section 11.3.1]. Weak head normal form differs from the notion of head normal form in that right-hand sides of functions need not be normalised. A nice property of this style of evaluation is that there can be no name capture [3], which simplifies the evaluation machinery.

Not all expressions have a normal form. For instance, the expression (rec {x = x;}).x does not terminate. But if evaluation does terminate, there must be a single normal form. This confluence property [2] follows from the fact that at most one rule applies to any expression.

The semantic rules are stated below in the following general form:

    Rule:  condition
           ─────────
            e ↦ e′

That is, we can conclude that e evaluates to e′ if the proposition condition holds. If there are no conditions, the rule is simply written as Rule: e ↦ e′.

As an example of a simple rewrite rule, consider conditionals, if e1 then e2 else e3. Conditional expressions first evaluate the condition expression. It must evaluate to a Boolean. (Evaluation fails if it is not, but for simplicity I will not consider type errors here.) The conditional then evaluates to one of its alternatives.

    IfThen:  e1 ↦* true
             ──────────────────────────
             if e1 then e2 else e3 ↦ e2

    IfElse:  e1 ↦* false
             ──────────────────────────
             if e1 then e2 else e3 ↦ e3

The Select rule implements attribute selection. This rule governs successful selection, i.e., it applies only if the given attribute name exists in the attribute set.

    Select:  e ↦* {as}  ∧  ⟨n = e′⟩ ∈ as
             ───────────────────────────
             e.n ↦ e′

Here as are the elements of an attribute set, and ⟨n = e⟩ ∈ as denotes that the attribute set as has an attribute named n with value e. Note that there is no rule for failure. If attribute n is not in as, evaluation fails and a nice error message is printed in the actual implementation.

For the remaining rules below, we need a notion of substitution of expressions for variables in other expressions. The substitution function subst(subs, e) (not shown here) performs a set of substitutions subs in the expression e. The set subs consists of substitutions of the form x := e that replace a variable x with an expression e. subst replaces all free variables for which there is a substitution. A variable is free in a subexpression if it is not bound by any of its enclosing expressions. Variables are bound in functions and in recursive attribute sets. In recursive attribute sets, only the recursive attributes (as1) bind variables; the non-recursive attributes (as2) do not. It is assumed that the expressions in subs contain no free variables, so subst does not have to perform renaming to prevent name capture.

A recursive attribute set is desugared to a normal attribute set by replacing all occurrences of references to the attributes with the recursive attribute set. For instance, if e = rec {x = f x y; y = x;}, then e is desugared to {x = f (e.x) (e.y); y = e.x;}, or in full, {x = f ((rec {x = f x y; y = x;}).x) ((rec {x = f x y; y = x;}).y); y = (rec {x = f x y; y = x;}).x;}. This desugaring is implemented by the Rec rule:

    Rec:  rec {as} ↦ subst(subs, {as})
          where subs = {n := (rec {as}).n | n ∈ names(as)}

and names(as) is the set of attribute names occurring in the left hand side of a set of attributes as. As we shall see in Section 3, due to maximal sharing, this substitution does not lead to a potential explosion in the size of expressions.
Function calls to single-argument functions (i.e., lambdas) are just plain β-reduction in the λ-calculus [3].

    β-Reduce:  e1 ↦* x: e3
               ────────────────────────────
               e1 e2 ↦ subst({x := e2}, e3)

(As argued in Section 3, expression e2 can contain no free variables. Therefore, there is no danger of name capture in subst.) Calls to multi-argument functions, i.e., functions that match an attribute set, are a bit more complicated:

    β-Reduce′:  e1 ↦* {fs}: e3  ∧  e2 ↦* {as}  ∧  names(as) = fs
                ─────────────────────────────────────────────────
                e1 e2 ↦ subst({n := e | ⟨n = e⟩ ∈ as}, e3)

(fs (for “formals”) is the set of names of arguments of a multi-argument function.) Note that a multi-argument function call is strict in its argument — the attribute set — but not in the values of the attributes.

3 Implementation

Using term rewriting, it is straight-forward to turn the semantic rules from the previous section into a concrete interpreter for the language. However, without maximal laziness, such an interpreter would not perform well. This section shows how we can obtain an efficient interpreter from a straight-forward translation of the semantic rules using term rewriting and maximal laziness.

3.1 Evaluation through rewriting

A typical way to derive an interpreter from rewrite rules is to select some abstract syntax representation for terms, and then to translate the rewrite rules into whatever meta-language the interpreter is implemented in. The Nix expression evaluator uses ATerms (for Annotated Terms) [20] to represent terms. The ATerm library is a C library that allows the efficient creation and manipulation of term data structures in C. An ATerm t is the application of an n-ary constructor to n subterms, denoted C(t1, ..., tn); or a list of n terms [t1, ..., tn]; or some terminal term such as an integer or string (actually a nullary constructor). For example, the expression (x: x) 123 is represented as the ATerm Call(Function1("x",Var("x")),Int(123)).

Both the ASF+SDF Meta-Environment [23] and Stratego/XT [24] program transformation systems use ATerms for representing abstract syntax trees, and can be used to manipulate them conveniently. For example, in Stratego/XT the IfThen rule could be implemented as

    eval: If(e1, e2, e3) -> e2 where <eval> e1 => Bool(True)

which is an almost literal translation of that rule. However, the Nix expression evaluator is written in C++, thus the translation of the rules is a bit more verbose. Figure 2 shows the outline of the function eval that implements the Nix expression evaluator, with the code corresponding to the IfThen and β-Reduce rules.

    Expr eval(Expr e) {
        Expr e1, e2, e3;
        if (matchIf(e, e1, e2, e3) && evalBool(e1))
            return eval(e2);
        ATerm x;
        if (matchCall(e, e1, e2) && matchFunction1(eval(e1), x, e3)) {
            ATermMap subs;
            subs.set(x, e2);
            return eval(subst(subs, e3));
        }
        /* ... more rules ... */
    }

    Fig. 2. Implementation of some of the semantic rules in C++

It takes a pointer e to the ATerm representing the term to be evaluated, and returns a pointer to the ATerm representing the resulting normal form. Helper functions such as matchIf are used to recognise and build ATerms. The elided helper function evalBool(e) calls eval(e) and returns true if the resulting term is Bool(True).
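For concreteness, the following is a sketch (not taken from the actual source) of how a matching helper such as matchIf might be written on top of the ATerm library's ATmatch function, assuming Expr is a typedef for ATerm:

    /* Hypothetical sketch: recognise If(e1, e2, e3) and bind its
       subterms. ATmatch matches a term against a pattern string and
       fills in the <term> placeholders. */
    static bool matchIf(Expr e, Expr & e1, Expr & e2, Expr & e3)
    {
        ATerm t1, t2, t3;
        if (!ATmatch(e, "If(<term>,<term>,<term>)", &t1, &t2, &t3))
            return false;
        e1 = t1; e2 = t2; e3 = t3;
        return true;
    }

Helpers that build terms can be phrased analogously in terms of the library's ATmake function.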
The evaluator in Figure 2 is extremely slow. This is a result of a lack of sharing in the evaluation of variables. For instance, the code for the β-Reduce rule simply replaces every occurrence of Var(x) in the body of the function with the term representing the argument value. Thus, if x occurs n times in the body of the function, it is possible for e2 (the argument) to be evaluated n times. Indeed, if x is passed as an argument to other functions, it may be duplicated even further, quickly leading to an exponential running time.

The typical solution to this explosion is to arrange for sharing of variable evaluation. For instance, β-Reduce could be defined as follows:

    β-Reduce:  e1 ↦* x: e3
               ──────────────────────────
               e1 e2 ↦ let x = e2 in e3

where we give let a special “destructive update” semantics so that the evaluation result of x is written back into the right-hand side of the let-binding. Of course, to give a semantics to let, we need to maintain some kind of environment, which makes the semantic rules rather more complicated [6]. At runtime, there is the same problem: in an interpreter, we need to keep an environment of the bound variables that are in scope (which is much more work than the simple call to subst in Figure 2), while in compiled code, x would be a pointer that points to a piece of memory containing code and environment pointers (the closure or thunk [17]), which after evaluation is overwritten with the actual result.

3.2 Maximal sharing with ATerms

A very nice property of the ATerm library, which will be critical in solving the performance problems described above, is its maximal sharing: if two terms are syntactically equal, then they occupy the same location in memory. This means that a shallow pointer equality test is sufficient to perform a deep syntactic equality test. Maximal sharing is implemented through a hash table. Whenever a new term is created through the ATerm API (using functions such as ATmakeAppl), the term to be created is looked up in the hash table. If the term already exists, the address of the term obtained from the hash table is returned. Otherwise, the term is allocated, initialised, added to the hash table, and returned. A garbage collector takes care of freeing terms that are no longer referenced.

Maximal sharing makes term creation slower, due to the hash table check. However, this is offset by the fact that memory use is reduced and the overhead of allocating a term that already exists is removed. More importantly, testing for equality between terms is very cheap, namely a pointer equality test. This makes the implementation of operations such as substitutions and memoisation very cheap. Empirical results on the efficiency of maximal sharing of ATerms are given in [20,21].

3.3 Maximal laziness

Maximal sharing is extremely useful in the implementation of a Nix expression interpreter since it allows easy caching of evaluation results, which speeds up expression evaluation by removing unnecessary evaluation of identical terms. The interpreter maintains a hash lookup table cache : ATerm → ATerm that maps ATerms representing Nix expressions to their normal forms. Figure 3 shows pseudo-code for the caching evaluation function eval, which “wraps” the real eval function from Figure 2 (now renamed to realEval) in a memoisation layer.

    Expr eval(Expr e):
        if cache[e] ≠ ε:
            return cache[e]
        else:
            e′ ← realEval(e)
            cache[e] ← e′
            return e′

    Fig. 3. Evaluation caching (pseudo-code)

It is assumed that realEval calls back into eval to evaluate subterms (i.e., every time a rule uses the relation ↦* in a condition). Thus we obtain the desired caching. The special value ε denotes that no mapping exists in the cache for the expression.
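In C++ this wrapper is only a few lines. The following is a minimal sketch of it (not the production code), assuming an ATermMap, as in Figure 2, whose get method returns a null ATerm for unmapped keys:

    /* Minimal sketch of the memoisation layer of Figure 3. The cache
       is keyed on the maximally shared ATerm pointer. */
    static ATermMap cache;

    Expr eval(Expr e)
    {
        Expr nf = cache.get(e);    /* ε is represented by a null ATerm */
        if (nf) return nf;         /* cache hit: reuse the normal form */
        nf = realEval(e);          /* realEval calls back into eval */
        cache.set(e, nf);
        return nf;
    }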
Note that thanks to maximal sharing, the lookup cache[e] is very cheap: it is a lookup of a pointer in a hash table. Since syntactically equal terms are now evaluated at most once, the interpreter in Figure 3 is maximally lazy.

So does this solve the performance problem with the “naive” implementation of rules such as β-Reduce? Intuitively this seems to be the case, because multiple occurrences of x will be replaced by the same argument term e2, and due to the memoisation in Figure 3, repeated encounters of e2 will reuse the normal form of e2 computed on the first encounter. However, there is a catch: what if later substitutions in subexpressions of the body of the function cause the copies of e2 to change in different ways? Consider the function call (x: (rec {y = "foo"; z = x;}.z) + (rec {y = "bar"; z = x;}.z)) y, which would reduce to (rec {y = "foo"; z = y;}.z) + (rec {y = "bar"; z = y;}.z). Here, the two occurrences of the variable x in the original expression evaluate to different results ("foo" and "bar" respectively). This is of course the result of unhygienic substitution: the free variable y in the argument becomes bound after β-reduction. However, it is easy to see that this situation can never occur because all top-level terms are always closed. (This fact is proven in [5, Section 4.4] and follows from the observation that all rules produce closed terms when applied to closed terms.)

Thus, a straight-forward, substitution-based reduction scheme such as the naive implementation of β-Reduce has at least as much sharing as a more difficult implementation based on closure updating. More importantly, this property comes at almost no additional cost, as Section 5 shows.

4 Optimisations

4.1 Optimising substitution

While the memoisation of term evaluation prevents unnecessary recomputation, there is still a problem with substitution-based semantic rules such as β-Reduce. Consider the expression (x: y: e1) e2 e3, where e2 is a large expression. With normal substitution, we first replace all occurrences of x in e1 with e2. Then, we replace all occurrences of y in the resulting term with e3. This substitution also descends into the e2 replacements of x, even though those subterms are closed. Since e2 is large, this is inefficient. A naive implementation of subst that recurses over the structure of the term may thus perform a lot of redundant work by substituting repeatedly in syntactically equal subterms.

It is important to recognise that under maximal sharing, a term should be treated as a graph rather than a tree. Thus, one optimisation for subst is to memoise it (taking into account the fact that when substitutions are removed from the mapping subs in some of the recursive cases, a new memoisation table must be used for the recursive call). There is, however, a much simpler and more efficient solution that uses the fact that all substitution terms are closed. The optimisation is that we can mark replacement terms to indicate to the substitution function that it need not descend into such subterms. Since substitution terms are always closed, we can adapt the substitution function subst as follows:

    subst(subs, x) = closed(e)   if (x := closed(e)) ∈ subs
                     closed(e)   if (x := e) ∈ subs
                     x           otherwise

That is, replacement terms e are placed inside a wrapper closed(e). (The first case merely prevents repeated wrapping in closed nodes, e.g., closed(closed(e)), which reduces the effectiveness of caching.) The wrapper denotes that e is a closed subterm under which no substitution is necessary, since it has no free variables. To actually make use of this optimisation, we also add a case to subst to stop at closed terms, namely subst(subs, closed(e)) = closed(e). Of course, during evaluation we must get rid of closed eventually. That is easily implemented through a rule Closed: closed(e) ↦ e, as a closed term is semantically equivalent to the term that it wraps. Since reduction only takes place at top-level, the closed wrapper is only discarded when the term actually needs to be evaluated.
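As a sketch of what these cases might look like in the C++ style of Figure 2 (the helper names matchClosed, matchVar and makeClosed are hypothetical, and the real subst must handle all remaining term constructors):

    /* Sketch of the closed term optimisation inside subst. */
    Expr subst(ATermMap & subs, Expr e)
    {
        Expr body;
        if (matchClosed(e, body))    /* closed(e): no free variables, */
            return e;                /* so nothing to substitute */
        ATerm x;
        if (matchVar(e, x)) {
            Expr repl = subs.get(x);
            if (!repl) return e;     /* no substitution for this variable */
            /* wrap, unless the replacement is already wrapped */
            return matchClosed(repl, body) ? repl : makeClosed(repl);
        }
        /* ... recurse over the remaining term constructors ... */
    }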
4.2 Blackholing

Figure 4 shows a simple modification of the eval function in Figure 3 that, in addition to maximal laziness, implements a trick known as blackholing [17] that allows detection of certain simple kinds of infinite recursion.

    Expr eval(Expr e):
        if cache[e] ≠ ε:
            if cache[e] = blackhole: abort
            return cache[e]
        else:
            cache[e] ← blackhole
            e′ ← realEval(e)
            cache[e] ← e′
            return e′

    Fig. 4. Evaluation caching with blackholing

When we evaluate an expression e, we store in the cache a preliminary “fake” normal form blackhole. If, during the evaluation of e, we need to evaluate e again, the cache will contain blackhole as the normal form for e. Due to the determinism and purity of the language, this necessarily indicates an infinite loop, since if we start evaluating e again, we will eventually encounter it another time, and so on.

Note that blackholing as implemented here differs from conventional blackholing, which overwrites a value being evaluated with a black hole. This allows discovery of self-referential values, e.g., x = ... x ...;. But it does not detect infinite recursions like in the expression (rec {f = x: f x;}).f 10, since every recursive call to f creates a new value of x, and so blackholing will not catch the infinite recursion. In contrast, our blackholing does detect it, since it is keyed on maximally shared ATerms that represent syntactically equal expressions. Figure 5 shows the evaluation of this expression. Note that the final expression is equal to the first (which is blackholed at this time), and so an infinite recursion is signalled.

       (rec {f = x: f x;}).f 10
    ↦  {f = x: (rec {f = x: f x;}).f x;}.f 10    (Rec)
    ↦  (x: (rec {f = x: f x;}).f x) 10           (Select)
    ↦  (rec {f = x: f x;}).f 10                  (β-Reduce)

    Fig. 5. Detecting infinite recursion
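Rendered in the C++ style of the earlier memoisation sketch, the modification is small. Again, this is only a sketch: the blackhole marker term and the error type are assumptions, not the actual implementation.

    /* Sketch of blackholing on top of the memoising eval. The marker
       is itself a maximally shared ATerm, so a pointer comparison
       suffices to detect it. */
    static Expr blackhole = makeBlackhole();  /* hypothetical unique marker */

    Expr eval(Expr e)
    {
        Expr nf = cache.get(e);
        if (nf == blackhole)
            throw EvalError("infinite recursion detected");  /* hypothetical */
        if (nf) return nf;
        cache.set(e, blackhole);   /* preliminary "fake" normal form */
        nf = realEval(e);
        cache.set(e, nf);          /* overwrite the black hole */
        return nf;
    }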
4.3 Optimisations

Because of maximal laziness, we get some optimisations that are conventionally applied to purely functional languages for free. For instance, the full laziness transformation [19] makes code more efficient by moving subexpressions outward as far as possible, e.g. let {f x = let {y = fac 100} in x + y} in f 1 + f 2, which computes fac 100 twice, can be transformed into let {y = fac 100; f x = x + y} in f 1 + f 2, which computes it only once in a conventional lazy implementation. With maximal laziness, this transformation is unnecessary: repeated occurrences of the same subexpression across multiple calls to a function will be computed only once.

Another, usually explicit optimisation in purely functional programs is to memoise specific functions [4]. Intuitively, one would expect that memoisation of the language evaluation function (Figure 3) also memoises functions in the language. This is not necessarily the case, however, in the presence of non-strict arguments. For instance, consider the Fibonacci function:

    fib = n: if n == 0 then 0
             else if n == 1 then 1
             else fib (n-1) + fib (n-2);

Without memoisation, this function is very inefficient. But maximal laziness won't memoise it for us in a non-strict language. This is because the arguments won't be evaluated terms such as 1, 2, etc., but unevaluated expressions such as ((9-1)-1)-1. Some memoisation will occur, but not enough to make the function run in O(n) time. Note that the function fib is in fact strict in its argument; the conditional if n == 0 ... forces evaluation of the argument. But by the time we evaluate the argument, we are already in the evaluation of the function, and it's too late.

The Nix expression evaluator implements a technique called function short-circuiting that cuts off evaluation of a function when the normal form of its argument becomes known and the function has been called before with an argument with the same normal form. It does so by keeping track of which function calls are currently being evaluated. In eval, after realEval returns with a normal form e′ for some expression e, we check if we are currently evaluating some function call Call(f, e) and cache[Call(f, e′)] ≠ ε. If so, we unwind the stack to the eval call for Call(f, e) (by throwing an exception), and return cache[Call(f, e′)]. Also, memoised Calls in cache must be stored with their normalised argument. That is, when eval has computed that Call(f, e) ↦* e′, and cache[e] ≠ ε (i.e., the function has evaluated its argument), it sets cache[Call(f, cache[e])] to e′.

5 Evaluation

To see how the variants of maximal laziness perform compared to no sharing and to a conventional implementation based on closure updating, Table 1 shows the execution times of several variants of the Nix expression evaluator on a number of Nix expressions. The execution times are in seconds. The tests were performed on a Linux-based Athlon 64 X2 3800+ with 1 GiB of memory. Entries marked “-” in the table mean that the test did not finish in a reasonable amount of time because of the exponential explosion due to the lack of sharing or the blow-up in substitutions. Table 2 shows the number of calls to eval for each of the Nix expressions in Table 1 for some of the variants, along with the number of cache hits (the number of times that a call to eval could be satisfied from the cache).

The tested Nix expressions are:

1) The function fib (Section 4.3) with n = 25.

2) A variant of fib in which the recursive call reads strict fib (n-1) + strict fib (n-2), where strict is a built-in function that reduces its second argument to normal form before applying the first to it.

3) While fib is a toy problem for which the Nix expression DSL isn't even intended, the remaining tests are realistic. The third test computes the derivation graph of the gcc attribute in the Nix Packages collection, a set of Nix expressions for over a thousand Unix software packages.

4) The evaluation performed by the Nix command nix-env -qa '*' --drv-path --out-path (which shows all packages defined by an expression) applied to the Nix Packages collection, which involves computing the derivation graphs of all packages, and causes the evaluation of 743 source files containing 22191 lines of code.

5) The computation of the derivation graph for the installation CD of NixOS, a Linux distribution based on Nix, which involves the evaluation of 162 source files containing 13503 lines of code.
The variants of the Nix expression evaluator (available at https://svn.cs.uu.nl:12443/repos/trace/nix/branches/sharing-hackery/) are as follows.

• No sharing: the naive term-rewriting based interpreter from Section 3. Unsurprisingly, it performs very poorly.

• Traditional sharing: an implementation that updates variable bindings after they have been evaluated. This is the sharing model in most implementations of functional languages. This implementation is a modification of the existing (maximally lazy) Nix expression evaluator, made for comparison purposes. It therefore does use the ATerm library, and maximal sharing to store terms efficiently.

• Maximal laziness: the no sharing variant with memoisation as in Figure 3. Table 2 shows that the number of rewrite steps is substantially smaller than with traditional sharing, and the number of cache hits is substantial. (The number of rewrite steps is the same as for the “maximal laziness + both” variant in Table 2.) However, term blow-up due to substitutions causes it to perform poorly on large terms: it is outperformed by traditional sharing on the GCC test, and doesn't finish in a reasonable time frame on the nix-env and NixOS tests.

• Maximal laziness with substitution memoisation is the previous variant with memoisation around the subst function (see Section 4.1). It helps performance a bit, but not enough to save it.

• Maximal laziness with closed term optimisation wraps substituted terms in closed nodes as described in Section 4.1. This very simple change alone makes maximal laziness fast enough: operations such as nix-env -qa now run in a few seconds, which is actually faster than similar operations in other package management tools that do not have package descriptions in a full-fledged programming language. We can thus conclude that the closed term optimisation is essential to make maximal laziness feasible. This is the variant that the production version of Nix uses.

• Maximal laziness with both combines substitution memoisation and closed term optimisation. It does not give an appreciable improvement over the latter.

• Maximal laziness with short-circuiting adds the short-circuiting technique described in Section 4.3. It does indeed succeed in turning the non-strict fib function automatically into a memoised function that runs in O(n) time. However, it doesn't do much for Nix expressions in the real world.

                                          fib 25   fib 25    GCC      nix-env  NixOS
                                                   (strict)
    No sharing                            -        28.866    151.633  -        -
    Traditional sharing                   32.576   33.178    0.788    45.304   10.752
    Maximal laziness                      23.904   0.018     2.778    -        -
    Maximal laziness +
      substitution memoisation            15.253   0.023     1.025    -        -
    Maximal laziness +
      closed term optimisation            6.558    0.018     0.212    2.750    0.988
    Maximal laziness + both               6.184    0.022     0.195    2.752    0.939
    Maximal laziness + short-circuiting   0.022    0.022     0.197    2.820    1.181

    Table 1. CPU times in seconds for sharing variants

                       No sharing    Traditional    Maximal laziness  Maximal laziness
                                     sharing        + both            + short-circuiting
                       Steps  Hits   Steps  Hits    Steps   Hits      Steps   Hits
    fib 25             -      n/a    5838K  n/a     3820K   1850K     675     292
    fib 25 (strict)    6809K  n/a    6809K  n/a     705     342       682     319
    GCC                4421K  n/a    6516   n/a     4538    2254      4538    2254
    nix-env -qa        -      n/a    561K   n/a     368K    225K      367K    225K
    NixOS              -      n/a    975K   n/a     253K    111K      253K    111K

    Table 2. Rewrite steps and cache hits for sharing variants

So what is the cost in terms of memory use of maximal laziness? The nix-env -qa test, which represents the largest computation occurring in practice, takes around 21 MiB, a fairly modest amount of memory on current systems.
On the other hand, the atypical non-strict fib 25 test on the interpreter with maximal laziness and the closed term optimisation (the one with 3820K reduction steps) takes around 170 MiB as a result of a lack of identical subterms (which short-circuiting solves).

The main lesson of this evaluation is that maximal laziness only works well with the closed term optimisation, which is fortunately trivial to implement. However, cache pruning becomes necessary to control memory consumption when evaluating programs with little sharing, though such programs do not arise in practice with the Nix DSL.

6 Related work

Maximal sharing, the technique upon which maximal laziness is implemented, goes back a long way. It is known as hash-consing in Lisp [1,8], where its utility is limited by the impurity of Lisp [10]. Type-safe hash-consing in OCaml that ensures that programmers cannot make unshared terms is discussed in [9], which uses operations on λ-terms (similar to the term rewriting in Section 2) as an example.

The Nix expression evaluator is built on top of the ATerm library [20,22], which is used in numerous term rewriting systems such as Stratego/XT [24] and in particular the ASF+SDF Meta-Environment [23], for which the ATerm library was originally developed. The evaluator would certainly have been easier to implement in Stratego than in C++, but this was not done as 1) C++ is a more suitable language for general systems programming, and 2) being a deployment tool, Nix should have as few dependencies as possible to ensure portability and ease of installation.

The ASF+SDF compiler relies heavily on maximal sharing for performance; see [21] for an in-depth discussion. The compiler can be instructed to generate memoisation around explicitly specified ASF+SDF functions. The authors note that “memoization may easily become counterproductive if the memoized functions are not called with the same arguments sufficiently often, and finding the right subset of functions to memoize may require considerable experimentation and insight.” In this paper, we have suggested memoising everything, which is not a feasible strategy for general purpose languages, but, as we have seen, may simplify the implementation of DSLs while providing sufficient performance. Unlimp [13] also appears to memoise arbitrary term evaluation in the context of a purely functional language, but does not discuss experience with non-trivial programs.

The “purity” of purely functional languages naturally suggests the use of maximal sharing, since, contrary to impure languages, one can unconditionally memoise functions to obtain optimised versions. This relationship is explored in depth in [11], which describes a language that has maximal sharing as a part of its runtime system to ensure that all data is maximally shared, and shows that maximal sharing is not an expensive feature (a fact also borne out by the experiences with ASF+SDF and Stratego/XT). It also discusses the sharing of computations (as opposed to data) through memoisation using maximal sharing — precisely the subject of this paper. However, as in [4], memoisation is not automatically applied to all computations.

There is a great deal of theoretical work on optimal reduction strategies for the λ-calculus (in particular Lamping's work [14]; an overview is given in [15]). The main restriction in the Nix evaluator, compared to optimal reduction, is that it only shares closed terms (as these are the only terms that eval ever sees).
Thus, in an expression such as (f: f 1 + f 2) (x: e), the expression e is duplicated, with only subterms of e not containing x being shared between the calls. On the other hand, the evaluator does optimise evaluation for terms that are not initially shared but become syntactically equal after a number of reduction steps.

7 Conclusion

This paper has given a practical demonstration of the use of maximal sharing as an implementation technique for interpreters. It shows that maximal sharing is efficient enough to allow the evaluation of a practical purely functional DSL to be completely memoised, giving rise to the highly useful property of maximal laziness.

Acknowledgements. This paper builds heavily upon the ATerm library and its property of maximal sharing, developed by Mark van den Brand, Hayco de Jong, Paul Klint and Pieter Olivier. I would also like to thank the contributors to the Nix package manager, in particular Martin Bravenboer and Eelco Visser, as well as the anonymous reviewers for their comments.

References

[1] Allen, J., “Anatomy of LISP,” McGraw-Hill, Inc., New York, NY, USA, 1978.

[2] Baader, F. and T. Nipkow, “Term rewriting and all that,” Cambridge University Press, 1998.

[3] Barendregt, H., “The Lambda Calculus: Its Syntax and Semantics,” Studies in Logic and the Foundations of Mathematics II, Elsevier Science Publishers, Amsterdam, 1984, second edition.

[4] Cook, B. and J. Launchbury, Disposable memo functions (extended abstract), in: ICFP ’97: Proc. ACM SIGPLAN Intl. Conf. on Functional Programming (1997), p. 310.

[5] Dolstra, E., “The Purely Functional Software Deployment Model,” Ph.D. thesis, Faculty of Science, Utrecht University, The Netherlands (2006).

[6] Dolstra, E. and E. Visser, Building interpreters with rewriting strategies, in: M. van den Brand and R. Lämmel, editors, Workshop on Language Descriptions, Tools and Applications (LDTA’02), Electronic Notes in Theoretical Computer Science 65/3 (2002), pp. 57–76.

[7] Dolstra, E., E. Visser and M. de Jonge, Imposing a memory management discipline on software deployment, in: Proc. 26th Intl. Conf. on Software Engineering (ICSE 2004) (2004), pp. 583–592.

[8] Ershov, A. P., On programming of arithmetic operations, Commun. ACM 1 (1958), pp. 3–6.

[9] Filliâtre, J.-C. and S. Conchon, Type-safe modular hash-consing, in: ML ’06: Proceedings of the 2006 workshop on ML (2006), pp. 12–19.

[10] Goto, E., Monocopy and associative algorithms in an extended Lisp, Technical Report TR-74-03, University of Tokyo (1974).

[11] Goubault, J., Implementing functional languages with fast equality, sets and maps: an exercise in hash consing, in: Journées Francophones des Langages Applicatifs (JFLA’93), 1993, pp. 222–238.

[12] Hudak, P., Conception, evolution, and application of functional programming languages, ACM Computing Surveys 21 (1989), pp. 359–411.

[13] Kahrs, S., Unlimp — Uniqueness as a Leitmotiv for implementation, in: M. Bruynooghe and M. Wirsing, editors, Proc. of the 4th Intl. Symposium on Programming Language Implementation and Logic Programming PLILP ’92 (1992), pp. 115–129.

[14] Lamping, J., An algorithm for optimal lambda calculus reduction, in: POPL ’90: Proceedings of the 17th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (1990), pp. 16–30.

[15] Lawall, J. L. and H. G. Mairson, Optimality and inefficiency: what isn’t a cost model of the lambda calculus?, SIGPLAN Notices 31 (1996), pp. 92–101.

[16] Peyton Jones, S., “The Implementation of Functional Programming Languages,” Prentice Hall, 1987.
[17] Peyton Jones, S., Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine, Journal of Functional Programming 2 (1992), pp. 127–202.

[18] Peyton Jones, S., editor, “Haskell 98 Language and Libraries: The Revised Report,” Cambridge University Press, 2004.

[19] Peyton Jones, S., W. Partain and A. Santos, Let-floating: moving bindings to give faster programs, in: ICFP ’96: Proc. ACM SIGPLAN Intl. Conf. on Functional Programming (1996), pp. 1–12.

[20] van den Brand, M. G. J., H. A. de Jong, P. Klint and P. A. Olivier, Efficient annotated terms, Software: Practice and Experience 30 (2000), pp. 259–291.

[21] van den Brand, M. G. J., J. Heering, P. Klint and P. A. Olivier, Compiling language definitions: the ASF+SDF compiler, ACM Trans. Program. Lang. Syst. 24 (2002), pp. 334–368.

[22] van den Brand, M. G. J. and P. Klint, ATerms for manipulation and exchange of structured data: It’s all about sharing, Inf. Softw. Technol. 49 (2007), pp. 55–64.

[23] van Deursen, A., J. Heering and P. Klint, editors, “Language Prototyping. An Algebraic Specification Approach,” AMAST Series in Computing 5, World Scientific, Singapore, 1996.

[24] Visser, E., Program transformation with Stratego/XT: Strategies, tools, and systems in StrategoXT-0.9, in: C. Lengauer, D. S. Batory, C. Consel and M. Odersky, editors, Domain-Specific Program Generation, Lecture Notes in Computer Science 3016, Springer-Verlag, 2004, pp. 216–238.