Implementing Explicit and Finding Implicit Sharing in Embedded DSLs Oleg Kiselyov oleg@okmij.org Aliasing, or sharing, is prominent in many domains, denoting that two differently-named objects are in fact identical: a change in one object (memory cell, circuit terminal, disk block) is instantly reflected in the other. Languages for modelling such domains should let the programmer explicitly define the sharing among objects or expressions. A DSL compiler may find other identical expressions and share them, implicitly. Such common subexpression elimination is crucial to the efficient implementation of DSLs. Sharing is tricky in embedded DSL, since host aliasing may correspond to copying of the underlying objects rather than their sharing. This tutorial summarizes discussions of implementing sharing in Haskell DSLs for automotive embedded systems and hardware description languages. The technique has since been used in a Haskell SAT solver and the DSL for music synthesis. We demonstrate the embedding in pure Haskell of a simple DSL with a language form for explicit sharing. The DSL also has implicit sharing, implemented via hash-consing. Explicit sharing greatly speeds up hash-consing. The seemingly imperative nature of hash-consing is hidden beneath a simple combinator language. The overall implementation remains pure functional and easy to reason about. I think all DSLs suffer from the same problems: sharing and recursion. I’ve used wrappers for CSound, SuperCollider, MetaPost, they all have these problems. Henning Thielemann [19] 1 Introduction We present implicit and explicit sharing in the original context of embedded domains-specific language (DSL) compilers. The sharing implementation techniques have since found other uses, e.g., writing SAT solvers [3]. Embedded compilers – typical for circuit description, embedded control systems or GPU programming DSLs – complement the familiar embeddings of a DSL in a host language as a library or an interpreter. For example, we may build a circuit model in Haskell using gate descriptions and combinators provided by the DSL library. We may test the circuit in Haskell by running the model on sample inputs. Eventually we have to produce a Verilog code or a Netlist to manufacture the circuit. Likewise, a control system DSL program should eventually be compiled into machine code and burned as firmware. An embedded compiler thus is a host language program that turns a DSL program into a (lower-level) code. One of the important tasks of a compiler is the so-called “common subexpression elimination” – detecting subexpressions denoting the same computation and arranging for that computation to be performed once and the results shared. This optimization (significantly) improves both the running time and compactness of the code and is particularly important for hardware description and firmware compilers. As DSL implementers, it becomes our duty to detect duplicate subexpressions and have them shared. We call this detection implicit sharing, to contrast with the sharing explicitly declared by users. Imperative languages, where it matters whether an expression or its result are duplicated, have to provide a Olivier Danvy, Chung-chieh Shan (Eds.): IFIP Working Conference on Domain-Specific Languages 2011 (DSL 2011). EPTCS 66, 2011, pp. 210–225, doi:10.4204/EPTCS.66.11 c Oleg Kiselyov This work is licensed under the Creative Commons Attribution License. Oleg Kiselyov 211 form (some sort of a local variable introduction) for the programmer to declare the sharing of expression’s result. In the present paper we limit ourselves to side-effect–free expressions such as arithmetic expressions or combinational circuits. Explicit sharing is still important: sharing declarations can significantly reduce the search space for common subexpressions and prevent exponential explosions. We shall cite several examples later. Sharing declarations also help human readers of the program, drawing their attention to the ‘common’ computations. A common pitfall in implementing explicit sharing is attempting to use the let form of the host language to express sharing in the DSL code. §5 will show that although the let -form may speed up some DSL interpretations, it leads to code duplication rather than sharing in the compilation result. Before we get to that discussion, we introduce our running example in §2, describing an unsuccessful attempt to detect sharing in the course of implementing real DSLs. Show-stopping was the expression comparison, which, in pure language is structural and requires the full traversal of expressions. In §3 we review the main approaches to speed-up the comparison, all relying on some sort of ‘pointer’ equality. Our method of expression comparison and sharing detection is presented in §4. The method is a pure veneer over hash-consing, alleviating the cost of comparison. Its main benefit is the facilitation of explicit sharing, described in §5. The code accompanying the paper is available online at http://okmij.org/ftp/tagless-final/ sharing/. 2 Detecting sharing: necessity and difficulty Our running example is a show-stopping problem encountered by the implementer of an embedded DSL compiler for an embedded control system, posed in [12]. The example illustrates the need and the difficulty of common subexpression elimination. The problem was posed for arithmetic expressions, which are part of nearly every DSL. Typically arithmetic expressions are embedded in Haskell as the values of the datatype1 data Exp = Add Exp Exp | Variable String | Constant Int deriving (Eq, Ord, Show) Here are the sample expressions in our DSL: exp a = Add (Constant 10) (Variable ”i1 ”) exp b = Add exp a (Variable ”i2 ”) The implementer wanted to compile these DSL expressions into C or the machine code, using the standard approach of finding common subexpressions and sharing them. To make the sharing explicit, the expression tree is converted into a directed acyclic graph (DAG). The graph is then topologically sorted and each subexpression is assigned a C operation or a machine instruction. The first step, detecting identical subexpressions, was the most troublesome. The step is crucial since common subexpressions are abound, being easy to create. We show two real-life examples, to be used throughout the paper. The first example is the multiplication by a known integer constant. Our DSL does not have the multiplication operation (8-bit CPUs rarely have the needed instruction). Nevertheless, 1 See the file ExpI.hs in the accompaniment for the complete code. 212 Sharing in embedded DSLs we can multiply a DSL expression by the known constant using repeated addition. Here is the standard efficient procedure based on the recursive subdivision: mul : mul 0 mul 1 mul n mul n Int → Exp → Exp = Constant 0 x =x x | n ‘ mod‘ 2 == 0 = mul (n ‘div‘ 2) (Add x x) x = Add x (mul (n−1) x) The result of the sample expression exp mul4 = mul 4 (Variable ”i1 ”) shows two identical subexpressions of adding the variable i1 to itself: Add (Add (Variable ”i1 ”) (Variable ”i1 ”)) (Add (Variable ”i1 ”) (Variable ”i1 ”)) The result of mul 8 (Variable ”i1 ”) shows twice as much duplication. The other running example, from the domain of hardware description, is sklansky by Naylor [15], with further credit to Sheeran and Axelsson. The example computes the running sum of given expressions; like the previous multiplication example, sklansky uses recursive subdivision to expose more parallelism and reduce latency: sklansky : sklansky f sklansky f sklansky f where (left , left ’ right ’ (a → a → a) → [ a] → [ a] [] = [] [ x] = [x] xs = left ’ ++ [f (last left ’) r | r ← right ’] right ) = splitAt (length xs ‘ div‘ 2) xs = sklansky f left = sklansky f right The pretty-printed result of sklansky Add (map (Variable ◦show) [1..4] [ ”v1”,”(v1+v2)”,”((v1+v2)+v3)”,”((v1+v2)+(v3+v4))”] demonstrates the triplication of the subexpression v1+v2. The duplication should be eliminated when we build the circuit. In the process of converting expressions like exp mul4 to a DAG, the implementer [12] had to compare subexpressions. It is there he encountered a problem: in a pure language, we may only compare datatype values structurally. General pointer comparison destroys referential transparency and parametricity. To check that the summands of the top-level addition in exp mul4 are identical, we have to therefore traverse them completely. Such comparisons of two expression trees take more and more time with larger programs (say, as we multiply by bigger integers). “As these trees grow in size, the equality comparison in graph construction quickly becomes the bottleneck for DSL compilation. What’s worse, the phase transition from tractable to intractable is very sharp. In one of my DSL programs, I made a seemingly small change, and compilation time went from milliseconds to not-in-a-million-years.”[12]. His message, entitled “I love purity, but it’s killing me” was a cry for help: he was about to give up on Haskell. He wondered how to dramatically speed-up comparisons, or find a better method of detecting common subexpressions and sharing them. Oleg Kiselyov 3 213 Pointer comparison Sharing detection, specifically, fast identity comparison of expressions, is a common metaprogramming problem and has been investigated extensively. This section reviews the main approaches, which are all based on some sort of ‘pointer’ equality. They associate with an expression a unique datum, e.g., an integer, admitting efficient comparison. For instance, we may define the data type of labeled expressions, where each variant carries a unique integer label2 : type Lab = Int data ExpL = AddL Lab ExpL ExpL | VariableL Lab String | ConstantL Lab Int deriving Show ExpL expressions are fast to compare, by comparing their labels. The full traversal of expressions is no longer needed: instance Eq ExpL where e1 == e2 = label e1 == label e2 where label (AddL p ) =p label (ConstantL p ) = p label (VariableL p ) = p The first approach to building labeled expressions is manual labeling. For example, we construct our sample expressions as follows, taking great care to pick unique labels: expL a = AddL 3 (ConstantL 1 10) (VariableL 2 ”i1 ”) expL b = AddL 4 expL a (VariableL 5 ”i2”) Needless to say this approach is greatly error-prone, let alone tedious. When implementing the mul example we stumble on another complication: the manual threading of a counter to generate unique labels. Some sort of automation is direly needed. A promising and increasingly popular way to hide and automate label assignment is Template Haskell (see the thesis [1] for the extensive discussion). A DSL implemented in Template Haskell quotations can hardly be called ‘embedded’ however. Another approach is the State monad hiding the counter used for the generation of unique labels: type ExpM = State Lab ExpL new labelM : State Lab Lab new labelM = do p ← get put (p+1) return p run expM : ExpM →ExpL run expM m = evalState m 0 2 The complete code for this section is in the file Ptrs.hs. 214 Sharing in embedded DSLs The computation new labelM, or ‘gensym’, yields a unique label; the function run expM runs the monadic computation returning the produced labeled expression. To hide the labeling of expression’s constructors, we build expressions with ‘constructor functions’, often called ‘smart constructors’: varM : String → ExpM varM v = new labelM >>=\p →return $ VariableL p v constM : Int → ExpM constM x = new labelM >>=\p →return $ ConstantL p x addM : ExpL → ExpL → ExpM addM x y = new labelM >>=\p →return $ AddL p x y The sample expression looks awful expM a = do xv ← constM 10 yv ← varM ”i1” addM xv yv although appropriate combinators may improve the appearance. In fact, the multiplication example below is quite perspicuous: the labeling is well-hidden (compare with mul in §2). The occasional return and the ‘call-by-value application’ (=<<) betray the effectful computation: mulM : Int → ExpL → ExpM mulM 0 = constM 0 mulM 1 x = return x mulM n x | n ‘ mod‘ 2 == 0 = mulM (n ‘div‘ 2) =<