Lazy v. Yield: Incremental, Linear Pretty-printing Oleg Kiselyov, Simon Peyton-Jones, and Amr Sabry 1 2 3 oleg@okmij.org simonpj@microsoft.com sabry@cs.indiana.edu Abstract. We propose a programming style for incremental stream processing based on typed simple generators. It promotes modularity and decoupling of producers and consumers just like lazy evaluation. Simple generators, however, expose the implicit suspension and resumption inherent in lazy evaluation as computational effects, and hence are robust in the presence of other effects. Simple generators let us accurately reason about memory consumption. To substantiate our claims we give a new solution to the notorious pretty-printing problem. Like earlier solutions, it is linear, backtracking-free and with bounded latency. It is also simpler to write and reason about, and is compatible with effects including IO, letting us read the source document from a file, and format it as we read. 1 Introduction Lazy evaluation is regarded as one of the main reasons why functional programming matters [1]. Lazy evaluation lets us write producers and consumers separately, whereas the two are inextricably intertwined in a call-by-value language. This separation allows a modular style of programming, in which a variety of producers, consumers, and transformers can readily be “plugged together.” Lazy evaluation is also an elegant implementation of a form of coroutines, suspending and resuming computations based on the demand for values, giving us memory-efficient, incremental computation ‘for free’ [2–4]. Extensive experience in Haskell has, however, exposed severe drawbacks of lazy evaluation, which are especially grievous for stream processing of large amounts of data. Lazy evaluation is fundamentally incompatible with computational effects, can cause fatal memory leaks, and greatly inhibits modular reasoning, especially about termination and space consumption. Seemingly innocuous and justified changes to the code or code compositions may lead to divergence, or explosion in memory consumption. We review and illustrate these drawbacks in more detail in §2. It is therefore worth investigating alternatives to lazy evaluation, which are nearly as convenient, modular, and conducive to incremental computation, and yet are more robust under composition with computational effects. Since lazy evaluation corresponds to an implicit and stylized use of coroutines [5–8], it is 2 natural to consider an explicit stylized use of coroutines. Such alternatives, historically known as generators or iterators, have indeed been used to structure programs using “pipelines” of producers and consumers [9] following Jackson’s principles of program design [10]. They have recently re-emerged in various modern languages such as Ruby, Python, C#, and JavaScript. Although the details differ, such languages offer an operator yield that captures a particular pattern in which a computation is suspended, allowed to communicate with another context, and then resumed.4 Survey [11] documents a rich variety of yield operators, whose most general variant is tantamount to full first-class delimited continuations. In this paper we study a much lighter-weight variant of yield, which we call simple generators. We make the following specific contributions: – We introduce typed simple generators (§3), offering a library of combinators that support efficient stream producers, transducers, and consumers. Despite their simplicity (implementable on a single linear stack without copying), simple generators are capable to solve a variety of stream-processing problems that would normally require lazy evaluation, and yet compose readily with effects. – We show that simple generators are expressive enough to efficiently implement a particularly recondite problem: that of bounded-lookahead, lineartime, incremental, pretty-printing (§4). This problem has been widely considered to require full coroutines or ‘tying of the knot,’ which in turn demands lazy evaluation. Our new solution, derived in §5, is distinguished by modularity and ease and precision of the analysis about latency, time, and especially space. The overall pretty-printing becomes a composition of independently developed, tested, and analyzed components. We can swap the components, for example, replacing the component that traverses an inmemory document with a generator that reads the document from a file. The new structure helped us discover and correct edge cases neglected in previous implementations. – We give benchmarks in §5.4 to validate our analyses of space and time. – Although we mostly use Haskell for concreteness and the ease of contrasting yield and lazy evaluation, our approach extends to other languages. Monadic style, needed to work with generators in Haskell, is a model of call-by-value languages. Since monads make effects (sometimes painfully) explicit, they give good guidance to implementing and using generators in strict languages – as we demonstrate for OCaml. Appendix A of the full paper http://okmij.org/ftp/continuations/PPYield/ yield-pp.pdf gives the derivation of optimal pretty-printing in all detail. The complete Haskell and OCaml code accompanying the paper is available online in the same directory. 4 In some other languages such as Java, the interleaving of execution between producers and consumers is achieved with threads or even bytecode post-processing (weaving). 3 2 The problem with laziness The real virtue of lazy evaluation, emphasized in Hughes’s famous paper [1], is that it supports modular programming by allowing the producer and consumer of a data structure to be separated, and then composed in a variety of ways in “plug-and-play” fashion. Alas, lazy evaluation is also fragile: – Coroutining via lazy evaluation is incompatible with computational effects. With effects, the evaluation order is significant and cannot be left implicit. Adding effects requires re-writing of the code, and, as in call-by-value languages, tangles data producers and consumers (see §2.1); – Lazy evaluation allows us to implement cyclic data transformers as well as linear ones; this is called ‘tying the knot.’ A number of elegant Haskell algorithms are written that way, e.g., the famous repmin [3]. Alas, it is particularly hard to reason about the termination of such algorithms, in part because the types do not tell us anything about demand and supply of values (see §2.2). – Reasoning about space requirements of lazy algorithms is notoriously hard, and non-modular: how much space a function may need depends on the context. One case study [12] reports that it took three weeks to write a prototype genomic data mining application in Haskell, and two months to work around laziness frequently causing heap overflows. More examples are discussed at the end of §3 and in a recent paper by the first author [13]. The rest of the section illustrates these problems and the next one demonstrates how yield answers them. 2.1 Effects Our running example is typical file processing: to read and print a file expanding all tabs5 . Ideally, reading file data, tab expansion itself, and writing the result should be separated, so each can be replaced. For example, the programmer may chose between two tab expansion algorithms: the naı̈ve replacement of ’\t’ with 8 spaces, and the sophisticated one, adding spaces up to the next tab stop (the multiple of 8). In idiomatic, lazy Haskell we start by writing the naı̈ve tabX0 and sophisticated algorithms tabX1 assuming that the input data are all in memory, in a String. We do not worry how data got into memory or how it is written out: tabX0, tabX1 :: String → String tabX0 [] = [] tabX0 (’\t’: rest ) = replicate 8 ’ ’ + + tabX0 rest tabX0 (c:rest ) = c : tabX0 rest tabX1 = go 0 where go pos [] = [] go pos (’\ t’: rest ) = let pos’ = (pos +8) − pos 8 mod8 8 in replicate (pos’ − pos) ’ ’ + + go pos’ rest go pos (c: rest ) = c : go (if c == ’\n’ then 0 else pos + 1) rest 5 We do not assume that the file is made of lines of bounded width. 4 The sophisticated version keeps a bit of local state, the pos argument, for the current output position. The complete function is as follows: expandFile lazy :: String → IO () expandFile lazy filename = do h ← openFile filename ReadMode str ← hGetContents h putStr (tabX0 str ) It opens the file, reads its contents, calls tabX0 to transform the contents, and writes out the result. It is modular, in the sense that it would be the work of a moment to replace tabX0 with tabX1, or by some other transformation entirely. But we obviously don’t want to read in the entire contents of the file into memory, then transform it, and then write it out. We would prefer to read the file on demand, so to process arbitrarily large files in bounded memory. Haskell’s hGetContents function does exactly that, returning a string containing lazy thunks that, when forced (ultimately by putStr), read the file. This on-demand input/output is called “lazy IO.” Although very convenient, it is fraught with problems, including deadlocks and leaks of space, file descriptors and other resources [13]. The problems are apparent already in our code. For example, on file reading error an exception would be raised not in hGetContents but rather at some later indeterminate point in putStr. Moreover, file reading operations interleave uncontrollably with other IO operations, which may be fatal if the input file is actually a pipe. These problems really do arise in practice; the standard response is “you should use strict I/O.”6 Thus lazy evaluation indeed does not work with effects. If we use effects, we have to re-write our tab-expansion code, for example, as follows: expandFile strict :: String → IO () expandFile strict filename = do h ← openFile filename ReadMode; loop h; hClose h where loop h = do done ← hIsEOF h if done then return () else hGetChar h = check  loop h check ’\ t’ = putStr (replicate 8 ’ ’) check c = putStr [ c] We now distinguish EOF from other input errors. We explicitly close the file as soon as we do not need it. Alas, the tab expansion, reading, and checking for the end of data are all intertwined. Although we can abstract out processing of a character, we cannot abstract out the processing of the entire stream, or easily replace the naı̈ve tab expansion algorithm with the smart tab expansion: we have to re-write the whole reading-writing loop to thread the output pos. We clearly see what John Hughes meant when saying that strict evaluation entangles consumers and producers and inhibits modularity. The same considerations would apply to any effectful producer or transformer. Thus, there is a real tension between the modular programming style advocated in “Why functional programming matters” and computational effects. 6 e.g. http://stackoverflow.com/questions/2981582/haskell-lazy-i-o-and-closing-files. 5 2.2 Recursive knots Richard Bird’s famous repmin function [3], shown below, has made a compelling case for lazy evaluation. The function takes a tree and returns a new tree of the same shape, with every leaf value replaced by the minimum leaf value of the original tree. The new tree is constructed on-the-fly and the original tree is traversed only once: data Tree = Leaf Int | Node Tree Tree repmin :: Tree → Tree repmin t = tr where (mn, tr) = walk mn t walk :: Int → Tree → (Int , Tree) walk mn (Leaf n) = (n, Leaf mn) walk mn (Node t1 t2) = (n1 8 min8 n2, Node tr1 tr2 ) where (n1,tr1) = walk mn t1 (n2,tr2 ) = walk mn t2 The main function walk takes the value to put into leaves and a tree and returns its minimum leaf value and the transformed tree. In repmin we pass the minimum leaf value computed by walk as an argument to walk itself, “tying the knot.” Crucial is putting the minimum computation mn into a Leaf without evaluating it. Once the computation is eventually evaluated, the resulting minimum value shows up in all leaves. Lazy evaluation is indispensable for repmin. Alas, small changes easily make the program diverge. For example, suppose we only wish to replace those leaves whose value is more than twice the minimum. If we replace the Leaf n case in walk with the following: walk mn (Leaf n) | n > 2 ∗ mn = (n, Leaf mn) | otherwise = (n, Leaf n) we get a divergent repmin. Now walk does need the value of mn before finishing the traversal, demanding the value it is yet to produce. We may try one of the following fixes: walk mn (Leaf n) = (n, if n > 2 ∗ mn then Leaf mn else Leaf n) walk mn (Leaf n) = (n, Leaf (if n > 2 ∗ mn then mn else n)) We leave it as an exercise to determine which one works. The answer is nonobvious, requiring us to do global dataflow analysis in our head, determining which terms are evaluated when. Even experienced functional programmers can make mistakes, and often confess a lack of complete certainty about whether the program is now right. However elegant, “tying the knot” is fragile. To make matters worse, just imagine needing to modify repmin to print the leaves as they are walked! 3 Yield We now describe how yield can be used as an alternative to lazy evaluation that is robust and compatible with arbitrary effects, and yet has attractive features of lazy evaluation in untangling producers from consumers and allowing them to be developed separately. Since there are many possible variants of yield [11], 6 as well as several Haskell libraries based on iteratees that are similar to ours, we begin by reviewing some background, and then present our particular design contrasting it others. Finally, we revisit the examples in the previous section using our library. Although we will be using Haskell to introduce yield, we will rely on monadic style, which is a model of call-by-value languages. The modeling of simple generators in Haskell helps implementing them in other languages, such as OCaml. 3.1 Background Our design of simple generators is inspired by CLU’s iterators [14] which are themselves inspired by the generators of Alphard [15]. Generators of Alphard were meant as a compositional abstraction of iteration: “generators abstract over control as functions abstract over operations” [15]. These simple generators can be viewed as asymmetric coroutines, a producer and a consumer, that pass data in one direction only, from the producer to the consumer. Besides unidirectionality, simple generators are further restricted: they can be nested but they cannot run side-by-side (and so cannot solve problems like ‘same-fringe’). More generally, generators with varying degrees of expressiveness have spread to many languages. A uniform way to understand the variations in expressiveness is to view the various designs as imposing restrictions on delimited continuations [11]. The most general design gives full first-class delimited continuations. Some other (like that of Javascript) expose the continuation as a first-class object in the form of an external iterator but restrict the continuation to be a one-shot continuation. Even more restricted designs such as Ruby’s never expose the continuation and only provide internal iterators that can only be used in the context of a foreach-like loop. One simple generators are also restricted in that sense: they never expose the continuation and restrict the impliict continuation to be one-shot. The restriction enables a simple and efficient implementation of simple generators on a single linear stack without copying [14]. Haskell’s Hackage has a package generator for a very simple version of Python generators. It is inefficient, relying on the full delimited continuation monad, and, mainly, does not offer stream transducers. Like generators, iteratees [13] provide incremental, compositional input processing and a sound alternative to lazy IO. Whereas generators focus on production of values, iteratees are designed around consumption [16]. There are many implementations of iteratees in many languages: just Haskell Hackage has the libraries iteratee, enumerator, monad-coroutine, iterIO, pipes, conduit and even broad categories ‘Enumerator’ and ‘Conduit’. Underlying all these iteratee implementations is the resumption monad, which is tantamount to first-class, multi-shot delimited continuations. Compared to simple generators, iteratees are thus more expressive but much more heavier-weight. They cannot be implemented on linear stack without copying or building auxiliary data structures. The remarkable implementation simplicity and efficiency of simple generators strongly motivates investigating and pushing the limits of their expressiveness. 7 −− Simple generators type GenT e m type Producer m e type Consumer m e type Transducer m1 m2 e1 e2 = = = = ReaderT (e → m ()) m GenT e m () e → m () Producer m1 e1 → Producer m2 e2 yield :: Monad m ⇒ e → Producer m e runGenT :: Monad m ⇒ Producer m e → Consumer m e → m () foldG :: Monad m ⇒ (s → e → m s) → s → Producer (StateT s m) e → m s newtype ReaderT env m a = ReaderT { runReaderT :: env → m a } ask :: ReaderT env m r newtype StateT s m a = StateT { runStateT :: s → m (a, s) } Fig. 1. The interface of our Haskell yield library. For completeness, we also include the ReaderT and StateT types from the transformers library. type type type type ’ a gen producer consumer ’ a transducer = = = = unit → ’ a unit gen exn → unit ’a gen → ’ a gen val yield : exn → unit val iterate : producer → consumer → unit val foldG : (’ s → exn → ’ s) → ’ s → producer → (unit → ’ s) Fig. 2. The interface of our OCaml library of simple generators. A yielded value is encapsulated in an exception object. 3.2 Producers, Consumers, Transducers Our design is lightweight: other than type abbreviations, it consists of just three functions. It is summarized in the top portion of Fig. 1 and discussed in detail in the remainder of this section. For comparison, Fig. 2 shows the corresponding interface in OCaml. The users of simple generators may regard GenT e m as an abstract monad: the type GenT e m a is that of computations that emit values of type e and eventually return a value of type a. The alias Producer emphasizes this interpretation. The concrete type of GenT e m reveals that a producer is structured as an environment (Reader) monad over an arbitrary m. The consumer is stored in an environment that the generator can query, or “ask.” The consumer thus acts as a loop body that the producer invokes to process the ‘emitted’ element. With yield being the only primitive producer, and GenT e m being a monad, we may write more complex producers, e.g., emitting characters read from a file: fileGen :: MonadIO m ⇒ Producer m Char fileGen = do h ← liftIO $ openFile ”/tmp/testf.txt ” ReadMode loop h; liftIO $ hClose h where loop h = do done ←liftIO $ hIsEOF h if done then return () 8 else liftIO (hGetChar h) = yield  loop h The standard System.IO function putChar is a sample consumer. We hook the producer and putChar by simply saying fileGen 8 runGenT8 putChar, which prints characters as they are read. The type of yield is an instance of Consumer m a where m is GenT a m’. Therefore, we can build consumers that transform the received element emitting (producing) the result – in short, act as stream transducers. Since yield is a consumer that immediately emits the consumed element, runGenT gen yield is the same as gen, and \gen → runGenT gen yield is the identity transducer. Here is a version of the naı̈ve tab expander tabX0 from §2.1 expressed as a transducer: tabY0 :: Monad m ⇒ Transducer (GenT String m) m Char String tabY0 gen = runGenT gen con where con ’\t’ = yield (replicate 8 ’ ’) con c = yield [ c] Thus tabY0 fileGen is a producer that reads the file and tabifies it, and which can be combined with a consumer like putStr using runGenT. We may regard Producer e m as an effectful analogue of [e], representing a sequence of elements whose production may require an effect m. Transducers are hence analogues of list transformers, and the list combinators map, fold, etc. have natural counterparts on generators. The following transducer is a map-like one: mapG :: Monad m ⇒ (e1 → e2) → Transducer (GenT e2 m) m e1 e2 mapG f gen = runGenT gen (yield ◦ f) Since transducers are Producer-to-Producer functions, they can be combined by functional composition – letting us add more stream processing stages by composing-in more transducers. For example, a producer which reads the file, upper-cases and tabifies it is composed as (tabY1 ◦ mapG toUpper) fileGen. The library’s implementation in terms of an environment monad guarantees that such whole stream processing happens in constant memory. So far, we merely rewrote expandFile strict of §2.1 using our generator library. In contrast to §2.1, we can now replace the naı̈ve tab expansion with a sophisticated one without re-writing the file reader. Recall that the sophisticated tab expansion has local state, the current output position. Since our library is parametrized over a monad m, we add local state by merely instantiating m to be the state monad. tabY1 :: Monad m ⇒ Transducer (StateT Int (GenT String m)) m Char String tabY1 gen = evalStateT (runGenT gen con) 0 where con e1 = get = (\s → lift (f s e1)) = put f pos ’\ t’ = let pos’ = (pos +8) − pos 8 mod8 8 in yield (replicate (pos’ − pos) ’ ’)  return pos’ f pos c = yield [ c]  return (if c == ’\n’ then 0 else pos + 1) To add the sophisticated tab expansion, all we need is to replace tabY0 with tabY1 in the previous code fragments. The examination of tabY1 points to further abstraction. The internal function f looks like tabX1 in the lazy evaluation example in §2.1, only without the argument rest. Its type fits the pattern s → e → m s of a monadic state trans- 9 former (with the local state s being Int). We capture the pattern in combinator similar to List.fold: foldG :: Monad m ⇒ (s → e → m s) → s → Producer (StateT s m) e → m s foldG f s0 gen = execStateT (runGenT gen consumer) s0 where consumer x = get = (\s → lift $ f s x) = put We rewrite tabY1 using foldG as: tabY1’ = foldG t 0  return () where t pos ’\ t’ = let pos’ = (pos +8) − pos 8 mod8 8 in yield (replicate (pos’ − pos) ’ ’)  return pos’ t pos c = yield [ c]  return (if c == ’\n’ then 0 else pos + 1) (we will abbreviate foldG t s  return () as foldG t s.) 3.3 Cycles The function repmin seems out of scope for generators, which are intended for stream processing. The function repmin builds a tree rather than a stream. We can bring repmin into the scope for generators by serializing the resulting tree into a stream of XML-like nodes: data TreeStream = BegNode | EndNode | LeafData Int serializeX :: Tree → [ TreeStream] We now show how to write serializeX ◦ repmin with generators, with the ‘side effect’ of being able to add arbitrary effects such as debug printing. We start by writing the generator traverse that turns a tree into a producer of TreeStream elements, and collect which collects the elements into a list: traverse :: Monad m ⇒ Tree → Producer m TreeStream traverse (Leaf i ) = yield (LeafData i ) traverse (Node t1 t2) = do yield BegNode; traverse t1; traverse t2; yield EndNode collect :: Monad m ⇒ Producer (StateT [e] m) e → m [e] collect g = foldG (\s e → return $ e:s) [] g = return ◦ reverse Then collect ◦ traverse (in the Identity monad) is equivalent to serializeX. We now need to insert a version of repmin that processes the TreeStream elements, replacing LeafData elements by the minimum. This operation clearly needs a lookahead buffer of type [TreeStream]. The first encountered LeafData node switches on the look-ahead, triggering the accumulation, which continues through the end. We will see a similar buffering in §5.2: repminT gen = foldG go (0,[]) gen = \(m,buf) → mapM (flush m) (reverse buf) where go (m, []) BegNode = yield BegNode  return (m, []) go (m, []) EndNode = yield EndNode  return (m, []) go (m, []) (LeafData x) = return (x,[ LeafData x]) go (m, b) (LeafData x) = return (x 8 min8 m,LeafData x : b) go (m, b) e = return (m, e: b) flush m (LeafData ) = yield (LeafData m) flush e = yield e The serialized repmin-ed tree is then the result of the modular composition collect ◦ repminT ◦ traverse. If we wish to add debug printing, we insert into the cascade a stream transformer that re-emits the elements while printing them. 10 The stream transformer repminT obviously has unbounded look-ahead. Although more difficult to discern, the original repmin also requires unbounded look-ahead, with the tree itself used as an implicit look-ahead ‘buffer.’ 4 Pretty-printing specification Oppen [17] defined pretty-printing as a ‘nice’ formatting of a (tree-structured) document within a fixed page width. The core of his specification of nice formatting – used in all other Haskell implementations – takes documents of the following abstract form: data Doc = Text String | Line | Doc :+: Doc | Group Doc An abstract document is either a string (Text), a potential line break (Line), a composition of two documents side-by-side (:+: ), or a group. A group specifies a unit whose linebreaks are interpreted consistently. If a group, with all Linebreaks within it interpreted as spaces, fits onto the remainder of the line, the group is formatted this way. Otherwise, the Linebreaks in the group (but not within embedded groups) are treated as newlines. For example, the following simple document: doc1 = Group (Text ”A” :+: (Line :+: Group (Text ”B” :+: (Line :+: Text ”C”)))) would be formatted as shown on the left if the width of the page is 5 or more characters, and as shown in the middle if the width of the page is 3 or 4 characters, and as shown on the right if the width of the page is 1 or 2 characters: ABC A BC A B C As an executable specification of the problem we take the following inefficient pretty-printer (also used as a starting point by Chitil [18]): type Fit type Width = Bool = Int type WidthLeft = Int type PageWidth = Int pretty1 :: PageWidth → Doc → String pretty1 w d = fst $ format False w d where format :: Fit → WidthLeft → Doc → (String, WidthLeft) format f r (Text z) = (z, r − length z) format True r Line = (” ”, r −1) format False r Line = (”\n”,w) format f r (d1 :+: d2) = (s1+ + s2, r2) where (s1,r1) = format f r d1 (s2, r2) = format f r1 d2 format f r (Group d) = format (f | | width d ≤ r ) r d width width width width width :: Doc → Width (Text z) = length z Line =1 (d1 :+: d2) = width d1 +width d2 (Group d) = width d The function pretty1 is invoked with the width of the page and the document to format. It immediately invokes an interpreter, format, which recursively 11 traverses and formats the document maintaining a boolean-valued environment variable f and an integer-valued state variable r. The flag f tells if the current document is part of the group that fits on the current line. The flag affects the formatting of Line. The state r tells the remaining available space on the current line; the fit of a group is determined by comparing r with the expected length of the group. This length is calculated by the function width which traverses the document adding up the lengths of its constituent strings. The executable specification of pretty-printing is clear but greatly inefficient: in the worst case, it takes time exponential in the size n of the input document. The width of an inner group may be repeatedly recomputed, as part of the width computation for ancestor groups as they are being formatted. Furthermore, we cannot begin to format a group until we computed its width; therefore the algorithm has an unbounded look-ahead and its latency is O(n). In contrast, Oppen’s original algorithm which is imperative and uses explicit coroutines, is linear in the size of the document, is independent of the page width w to which the document is formatted, and is incremental with a latency bounded by O(w). Attempts to algebraically derive just as efficient Haskell implementation have so far failed [19, 20]. Most Haskell pretty-printing libraries use some form of backtracking and hence cannot have bounded latency. Standing out is Chitil’s implementation [18], which matches the classical one in efficiency. It is written however in an iterative style, which amounts to performing a manual continuation-passing style transformation. Swierstra bolstered the case for laziness by showing that a linear, incremental lazy pretty-printing function exists after all [21]. It crucially relies on tying the knot, and its non-divergence is hard to see. Other analyses, in particular estimating space complexity, are difficult as well. The code is complex, with five state parameters, one of which is computed ‘backwards.’ In fact the solution was developed as an attribute grammar, and hand-translated into Haskell. (Swierstra et al. have since developed mechanical translations and even embedding of attribute grammars in Haskell [4, 22].) 5 Stepwise generation of pretty-printer We build an efficient pretty-printer by combining two key optimizations: (i) avoiding re-computations of group width by memoization or pre-computation and (ii) pruning, computing the width of a group only as far as needed to determine if the group fits on the line. These optimizations are present in one form or another in all optimal pretty-printing implementations. Our development is distinguished by a systematic, modular and compositional application of the optimizations. We build the pretty-printer as a cascade of separately developed, tested and analyzed stream transducers. We stress the ease of analysis and its composability. Here is a general idea. To avoid re-computing group widths, we may compute the width of all groups beforehand – for example, by traversing the whole document tree and annotating each node with its width. The traversal is standard post-order, linear in the size of the tree. Alas, the annotated tree needs as much 12 space as the original one. Since we have to traverse all children of the root node to compute its width, we really have to build the whole annotated tree first before we start formatting. Applying the pruning optimization seems non-obvious, until we make the traversal of the document tree incremental, as a generator of a stream of traversed nodes. The width computation becomes a transducer that adds width annotations to stream elements. The annotated tree is never explicitly constructed. Pruning becomes a straightforward optimization of the group width transducer, bounding its look-ahead. We realize this plan below, step-wise. §5.1 converts the document tree to a stream of nodes, which we then annotate with the horizontal position. §5.2 modifies the annotations so they effectively become group width annotations. §5.3 optimizes the annotation algorithm using pruning. The widthannotated stream is formatted in §5.4. To save space, we focus on the key steps and relegate the details to Appendix A of the full paper. 5.1 Generating document stream The first step of our plan is converting the document tree to a stream of nodes. The elements of the stream are of the following type: data StreamB = TE String | LE | GBeg | GEnd with constructors for Text and Line and a pair of constructors for entering and leaving a group. The function genB generates a bare stream by in-order traversing the document tree: genB :: Monad m ⇒ Doc → Producer m StreamB Analysis. As genB reaches a text or a line node, it (like traverse in §3) immediately emits the corresponding stream element. Hence genB has unit latency. Since genB is a simple in-order traversal of the tree, the total time to generate the whole stream is linear in the size of the tree. The function needs stack space proportional to the depth d of the tree since genB is not tail-recursive. We annotate the stream elements with the rolling width, or the horizontal position HP in a hypothetical formatting of the document in a single line: type HP = Int type HPB = Int data StreamHPB = TEb HP String | LEb HP | GBegb HPB | GEndb HP All stream elements except GBeg are annotated with the horizontal position at the end of formatting of that element on the hypothetical single line. In particular, GEnd is annotated with the final HP for its group. The node GBeg is however annotated with the horizontal position HPB at the beginning of the formatting of the group. In other words, each node is annotated with the sum of the widths of all preceding nodes including the current. The annotation is done by the simple state transducer trHPB, consuming StreamB and emitting StreamHPB of annotated elements. The horizontal position is the state: trHPB :: Monad m ⇒ Transducer (StateT HP (GenT StreamHPB m)) m StreamB StreamHPB 13 Analysis. The transforming function merely increments the current horizontal position. It hence does constant amount of work, has unit latency and runs in constant space. The total transformation time is linear in the size of the input stream. 5.2 Determining group widths The annotated stream is not directly suitable for formatting: when we encounter a group, that is, a GBeg element, we have to decide if the group fits; hence we need the width of the group, or the horizontal position of the group’s end. Therefore, we transform StreamHPB into StreamHPA where GBeg will be annotated with final rather than initial HP of the group, that is, the HP of the GEnd element of the group. Clearly this requires look-ahead. Furthermore, since groups can be nested, the look-ahead buffer must be structured, so that we can track several groups in progress: data StreamHPA = TEa HP String | LEa HP | GBega HP | GEnda HP type Buffer m = [Buf StreamHPA m] The overall look-ahead Buffer m is a list of simple buffers Buf that each correspond to one unfinished, nested group. A Buf accumulates stream elements corresponding a tree branch, after GBeg and up to and including the matching GEnd. A simple buffer Buf should permit the following operations: buf empty :: (D ) :: (E ) :: buf ccat :: buf emit :: Monad Monad Monad Monad Monad m m m m m ⇒ Buf e m ⇒ Buf e m → e → Buf e m ⇒ e → Buf e m → Buf e m ⇒ Buf e m → Buf e m → Buf e m ⇒ Buf e m → Producer m e that is, the creation of the empty buffer, appending an element to the buffer b D e and prepending an element e E b in constant time, concatenation of two buffers in constant time, and emitting all elements in the buffer in linear time. The producer of StreamHPA is also a state transducer, from the stream StreamHPB built in the previous section. The state is the look-ahead Buffer m: type St m = StateT (Buffer m) (GenT StreamHPA m) trHPA :: Monad m ⇒ Transducer (St m) m StreamHPB StreamHPA trHPA = foldG go [] where = return (buf empty:q) go q (GBegb ) go (b: q) (GEndb p) = pop q (GBega p E (b D GEnda p)) go [] (TEb p) = yield (TEa p)  return [] −− ditto for LE go (b: q) (TEb p) = return (( b D TEa p): q) −− ditto for LE pop [] b = buf emit b  return [] pop (b’: q) b = return (( buf ccat b’ b): q) GBeg adds a new layer to the Buffer ready to accumulate elements of the new group. Text and Line elements outside of a group are emitted immediately to the output stream. Otherwise, they are accumulated in the Buf of their parent group. GEnd p supplies the final horizontal position p of the group, letting us emit GBeg p and flush the accumulated elements in Buf. Since the terminated group may be part of another, still unfinished group, we delay emitting elements 14 of the terminated group and put them into the look-ahead buffer of the parent group. Only when the outer group is terminated we finally empty the look-ahead buffer emitting all its elements. Analysis. Since we cannot emit any group element until we see GEnd, the latency is of the order of n, the size of the whole document (stream). The look-ahead Buffer m is the extra space, again linear in n. Total time is determined by amortization. Assume that each element of the input stream brings us the credit of 2. We spend one credit to yield the element, and to put the element into the buffer (in general, for any constant amount of work within go). Thus all elements in the buffer have one credit left, enough to pay for the linear-time operation buf emit. Thus, the total time complexity is linear in n. Hooking up the stream StreamHPA to a linear-time constant-space formatter (similar to the one in §5.4 below) gives the overall pretty-printer, with lineartime complexity but unbounded, O(n) latency and the corresponding amount of extra space. To bound the look-ahead we apply the second optimization below, pruning. 5.3 Pruning We have just seen that determining the width of each group is expensive since we have to scan the whole group first. However, the exact group width is not necessary: if the width is greater than the page width, we do not need to know by how much. We introduce an ‘approximate horizontal position’ HPP: data HPP = Small HP | TooFar data StreamHPP = TEp HP String | LEp HP | GBegp HPP | GEndp HP to use instead of the exact final horizontal position HP to annotate GBeg elements with. GBeg is annotated with TooFar if the final horizontal position of the group is farther than the page width w away from the group’s initial horizontal position. Computing HPP requires only bounded, by w, look-ahead. The stream transformer trHPP below is the pruned version of trHPA of the previous section. The look-ahead BufferP, like the look-ahead Buffer of trHPA, is a sequence of simple Bufs that accumulate delayed elements following a GBeg up to and including the corresponding GEnd. We need to efficiently access the sequence from both ends however; the simple list no longer suffices. The Haskell basis library provides the data structure Seq with the needed algorithmic properties (we import Data.Sequence as S): type BufferP m = (HPL, S.Seq (HPL, Buf StreamHPP m)) bufferP empty = (0,S.empty) type HPL = Int If HP is the beginning horizontal position of the group, HPL is a w-offset position: any position after HPL is TooFar. For each accumulated group we compute HPL and make it easily accessible. Furthermore, fst BufferP provides the HPL for the outermost group, so we can easily see if the current HP is too far for that group. If so, we can emit GBeg TooFar and empty its look-ahead Buf. The transformer trHPP of StreamHPB to StreamHPP is the ‘pruned’ version of trHPA: 15 type St m = StateT (BufferP m) (GenT StreamHPP m) trHPP :: Monad m ⇒ PageWidth → Transducer (St m) m StreamHPB StreamHPP trHPP w = foldG go bufferP empty where ... check :: BufferP m → HP → GenT StreamHPP m () check (p0,q) p | p ≤ p0 && S.length q ≤ w = return (p0,q) = check ( , q) p | ( , b) : < q’ ← S.viewl q buf emit (GBegp TooFar E b)  check’ q’ p check’ q p | (p’, ) : < ← S.viewl q = check (p’, q) p | otherwise = return bufferP empty Except for check, it is essentially the same as trHPA of §5.2. The function check prunes the look-ahead: it checks to see if the current horizontal position p exceeds p0, the HPL of the outer group. If so, the outer group is wider than w, which lets us immediately emit GBegp TooFar and the elements accumulated in the outer Buf. The not-yet-terminated inner group may also turn out too wide: we have to recursively check. The function check also prunes the look-ahead BufferP when it becomes deeper than w, which may happen in the edge case of a document: Group (Group (Group ... :+: Group ...) :+: Group (Group ... :+: Group ...)) whose StreamHPB includes an arbitrarily long sequence of GBeg p with the same initial group position p. The first pruning criterion will not be triggered then. In genB of §A.1 we have ensured that each group is at least one character-wide, with no group as a sole child. Therefore, a group that contains at least w descendant groups must be wider than w. Incidentally, this edge case has not been accounted for in [21]; the latter algorithm would need to add yet another state parameter to the formatting function. Analysis. All S.Seq operations used in the code – adding to the left (C ) or to the right (B ) end of the queue and deconstructing the left or the right end with S.viewl or S.viewr – take constant amortized time. Therefore, the analysis of trHPP is similar to the analysis of trHPA, modulo the fact that the total size of the look-ahead BufferP is bounded by w. Therefore, latency and the extra space for the look-ahead buffer are bounded by the page width. The total processing time remains linear in the size of the input stream. 5.4 Putting it all together and benchmarking The final step of pretty-printing is the formatting: transforming the pruned StreamHPP to a stream of Strings. To format stream elements LE as spaces or newlines, the formatter keeps track of an indicator if the current group and its ancestors fit on the remainder of the line. The formatter trFormat is straightforward, with unit latency and the overall linear running time, operating in constant space. The complete pretty-printer of a document is a cascade of the width estimators and the formatter, applied to the initial stream generator: pp :: Monad m ⇒ PageWidth → Doc → Producer m String pp w = trFormat w ◦ trHPP w ◦ trHPB ◦ genB Final Analysis. The total latency is the sum of latencies contributed by all transducers, which is bounded by the page width w. Since all transducers process 16 the whole stream in time linear to the size of the stream, the total running time of the pretty-printer is linear in the source document size. We need extra space: O(w) for the look-ahead BufferP in trHPP and O(d) (where d is the document depth) for the initial generator genB. To validate the analyses, we ran a benchmark meant to resemble the full binary tree prepared for pretty-printing. The benchmark, rather than writing 5 3 4.5 4 2.5 CPU time (s) CPU time (s) 3.5 3 2.5 2 1.5 2 1.5 1 0.5 0 1 2e+02 4e+02 6e+02 Document size, Kbytes 8e+02 10 15 20 25 30 35 Page width 40 45 50 55 the formatted document to a file, accumulates, in the Writer monad, its total size and the number of lines. The benchmark was compiled with GHC -O2 version 7.0.4 and ran on a i386 FreeBSD 2GHz Pentium 4 system. The running times (in seconds) are the medians of five consecutive runs. The figure on the left plots the running time against the size of the formatted output, for the fixed page size w = 50. The figure on the right plots the running time of pretty-printing the same benchmark document (output size 414,870 bytes) against the different values of the page width w. The benchmark validates the analyzes: the running time is a linear function of the document length, independent of w. 6 Conclusions We have described simple generators to complement or supplant lazy evaluation in stream-processing programs. Like lazy evaluation, simple generators promote modularity, stepwise development and incremental testing by decoupling stream producers, consumers and transformers. Unlike lazy evaluation, simple generators are compatible with effects including IO, and allow modular, composable reasoning about time, latency, and especially about space consumption. We have implemented simple generators as libraries in Haskell and OCaml. The Haskell monadic implementation guides implementations in other languages, making connections with the visitor pattern and dynamic binding clear. In future work, we will formalize the yield calculus and formally relate with call-by-need. We have illustrated simple generators and demonstrated their expressive power by solving the challenging efficient pretty-printing problem. Our implementation is a new and unexpected solution: efficient pretty-printing was believed to require full delimited continuations or coroutines, which simple generators do not provide. Like the other optimal solutions, it is linear in the size of the input document and has bounded latency. Our solution however was assembled from separately developed and tested components. We have also analyzed time 17 and space complexity component-by-component, combining the analyses at the end. Our precise analyses discovered previously overlooked edge cases. Acknowledgments. We thank S. Doaitse Swierstra for helpful discussions. Many helpful suggestions by anonymous reviewers are gratefully acknowledged. References [1] Hughes, J.: Why functional programming matters. Comput. J. 32 (1989) 98–107 [2] McIlroy, M.D.: Power series, power serious. J. Funct. Program. 9 (1999) 325–337 [3] Bird, R.S.: Using circular programs to eliminate multiple traversals of data. Acta Informatica 21 (1984) 239–250 10.1007/BF00264249. [4] Viera, M., Swierstra, S.D., Swierstra, W.: Attribute grammars fly first-class: how to do aspect oriented programming in Haskell. In: ICFP. (2009) 245–256 [5] Henderson, P., Morris, Jr., J.H.: A lazy evaluator. In: POPL, New York, NY, USA, ACM (1976) 95–103 [6] Ariola, Z.M., Maraist, J., Odersky, M., Felleisen, M., Wadler, P.: A call-by-need lambda calculus. In: POPL, New York, NY, USA, ACM (1995) 233–246 [7] Garcia, R., Lumsdaine, A., Sabry, A.: Lazy evaluation and delimited control. In: POPL, New York, NY, USA, ACM (2009) 153–164 [8] Chang, S., Van Horn, D., Felleisen, M.: Evaluating call-by-need on the control stack. In: Trends in functional programming, Springer (2011) 1–15 [9] Kay, M.: You pull, I’ll push: on the polarity of pipelines. In: Proc. Balisage: The Markup Conference. Volume 3 of Balisage Series on Markup Technologies. (2009) [10] Jackson, M.A.: Principles of Program Design. Academic Press, Inc., Orlando, FL, USA (1975) [11] James, R.P., Sabry, A.: Yield: Mainstream deliminted continuations. In: Theory and Practice of Delimited Continuations. (2011) [12] Clare, A., King, R.D.: Data mining the yeast genome in a lazy functional language. In: Practical Aspects of Declarative Languages. Volume 2562 of LNCS. SpringerVerlag (2003) 19–36 [13] Kiselyov, O.: Iteratees. In: FLOPS. Volume 7294 of LNCS., Springer (2012) 166–181 [14] Liskov, B.: A history of CLU. Technical Report 561, MIT LCS (1992) [15] Shaw, M., Wulf, W.A., London, R.L.: Abstraction and verification in Alphard: defining and specifying iteration and generators. Comm. ACM 20 (1977) 553–564 [16] Lato, J.W.: Understandings of iteratees. http://johnlato.blogspot.com/2012/ 06/understandings-of-iteratees.html (2012) [17] Oppen, D.C.: Prettyprinting. ACM Trans. Program. Lang. Syst. 2 (1980) 465–483 [18] Chitil, O.: Pretty printing with lazy dequeues. ACM Trans. Program. Lang. Syst. 27 (2005) 163–184 [19] Hughes, J.: The design of a pretty-printing library. In: Advanced Functional Programming, First Int. Spring School, Springer-Verlag (1995) 53–96 [20] Wadler, P.: A prettier printer. In: The Fun of Programming. A Symposium in honour of Professor Richard Bird’s 60th birthday, Oxford (2003) [21] Swierstra, S.D.: Linear, online, functional pretty printing (corrected and extended version). Technical Report UU-CS-2004-025a, Utrecht University (2004) [22] Dijkstra, A.: Stepping through Haskell. PhD thesis, Utrecht University, Department of Information and Computing Sciences (2005) 18 A Detailed stepwise generation of pretty-printer This section fills in the outline in §5 of the stepwise development of the optimal pretty-printer, and gives the complete details. §A.1 converts the document tree to a stream of nodes, which we annotate with the horizontal position in §A.2. §A.3 modifies the annotations so they effectively become group width annotations. §A.4 optimizes the annotation algorithm using pruning. The width-annotated stream is formatted in §A.5 and (more optimally) §A.6. A.1 Generating document stream The first step of our plan is converting the document tree to a stream of nodes, to be transformed and eventually formatted. The elements of the stream are of the following type: data E ab a = TE a String | LE a | GBeg ab | GEnd a type StreamB = E () () with the variants for a Text and Line element and for the entering and leaving a group. The development in the main body of the paper, §5, defined a sequence of closely-related data types StreamB, StreamHPB, etc. To avoid the need to constantly re-define TE, LE and similar constructors, we introduce a single data type E and suitably parametrize it by the annotations ab and a to cover the whole range of data types in §5, from StreamB to StreamHPP. In the latter, GBeg had a different annotation (an estimate of the horizontal position) from the annotations on the other elements. StreamB, as before, describes an annotationfree, bare stream. The function genB generates a bare stream by in-order traversing the document tree. The function is quite like traverse already explained in §3: genB :: Monad m ⇒ Doc → GenT StreamB m () genB (Text ””) = return () genB (Text z) = yield (TE () z) genB Line = yield (LE ()) genB (d1 :+: d2) = genB d1  genB d2 genB (Group d0) | Just d ← norm d0 = yield (GBeg ())  genB d  yield (GEnd ()) where norm (Group d) = norm d norm (Text ””) = Nothing norm (Text ”” :+: d) = norm d norm d = Just d genB = return () If the document were on a disk, in a sort of XML file, genB could be understood as reading the file and emitting the nodes as they are parsed. The function genB performs normalization of the document, ensuring that: (i) every group is strictly wider than any of its children groups (thus eliminating Group (Group ...)); (ii) any group is at least one-character wide. Normalization is often overlooked, yet critical: without it no pretty-printing algorithm can have bounded look-ahead. For example, in a document with a branch (Group ◦ Group ◦ ... ◦ Group) (Text ””) 19 with an arbitrarily long sequence of Groups, any pretty-printing algorithm without normalization has to scan the whole branch, which can be arbitrarily long, to determine that it does not contribute to the width of the current branch. We discuss this bad edge-case in more detail in §A.4. Analysis. As genB reaches a text or a line node (whose content is reflected in the formatted output), it immediately emits the corresponding stream element. Hence genB has unit latency. Since genB is a simple in-order traversal of the tree, the total time to generate the whole stream is linear in the size n of the tree. The function needs stack space proportional to the depth d of the tree since genB is not tail-recursive (and neither were formatting functions in §4). A.2 Computing the horizontal position We now annotate the stream elements with the rolling width, or the horizontal position HP in a hypothetical formatting of the document in a single long line: type HP = Int type HPB = Int type StreamHPB = E HPB HP All stream elements except GBeg are annotated with the horizontal position at the end of formatting of that element on the hypothetical single line. In particular, GEnd is annotated with final HP for its group. The node GBeg is however annotated with the horizontal position HPB at the beginning of the formatting of the group. In other words, each node is annotated with the sum of the widths of all preceding nodes including the current. The annotation is done by the simple state transducer, consuming StreamB and emitting StreamHPB of annotated elements. The horizontal position is the state: trHPB :: Monad m ⇒ GenT StreamB (StateT HP (GenT StreamHPB m)) () → GenT StreamHPB m () trHPB = foldG go 0 where go :: Monad m ⇒ HP → StreamB → GenT StreamHPB m HP go p (TE z) = let p’ = p +length z in yield (TE p’ z)  return p’ go p (LE ) = let p’ = p +1 in yield (LE p’)  return p’ go p (GBeg ) = yield (GBeg p)  return p go p (GEnd ) = yield (GEnd p)  return p Here is the annotated stream generated for our sample document doc1 from §4: Generated: Generated: Generated: Generated: Generated: Generated: Generated: GBeg 0 TE 1 ”A” LE 2 GBeg 2 TE 3 ”B” LE 4 TE 5 ”C” 20 Generated: GEnd 5 Generated: GEnd 5 The last HP is 5, at which the string ”C” and both groups end. Analysis. The transforming function go does constant amount of work. Therefore, trHPB has unit latency; total transformation time is linear in the size of the input stream. The function trHPB works in constant space. A.3 Determining group widths The annotated stream is not directly suitable for formatting: when we encounter a group, that is, a GBeg element, we have to decide if the group fits; hence we need the width of the group, or the horizontal position of the group’s end. Therefore, we transform StreamHPB into StreamHPA where GBeg will be annotated with final rather than initial HP of the group, that is, the HP of the GEnd element of the group. Clearly this requires look-ahead. Furthermore, since groups can be nested, the look-ahead buffer must be structured, so we can track several groups in progress: type StreamHPA = E HP HP type Buffer m = [Buf StreamHPA m] The overall look-ahead Buffer m is a list of simple buffers Buf that each correspond to one unfinished, nested group. A Buf accumulates stream elements corresponding a tree branch, after GBeg and up to and including the matching GEnd. The simple buffer Buf should permit the following operations: buf empty :: Monad m ⇒ Buf e m (D ) :: Monad m ⇒ Buf e m → e → Buf e m (E ) :: Monad m ⇒ e → Buf e m → Buf e m buf ccat :: Monad m ⇒ Buf e m → Buf e m → Buf e m buf emit :: Monad m ⇒ Buf e m → GenT e m () that is, the creation of the empty buffer, appending an element to the buffer b D e and prepending an element e E b in constant time, concatenation of two buffers in constant time, and emitting all elements in the buffer in linear time. Essentially, Buf is a tree, which buf emit traverses. We chose the following implementation for Buf: type Buf e m = GenT e m () buf empty = return () b D e = b  yield e e E b = yield e  b buf ccat b1 b2 = b1  b2 buf emit b = b One may substitute any other implementation: All code below accesses the buffer only through the abstract interface. The producer of StreamHPA is also a state transducer, from the stream StreamHPB built in the previous section. The look-ahead Buffer m is the state: trHPA :: Monad m ⇒ 21 GenT StreamHPB (StateT (Buffer m) (GenT StreamHPA m)) () → GenT StreamHPA m () trHPA = foldG go [] where = return (buf empty:q) go q (GBeg ) go (b: q) e@(GEnd p) = pop q (GBeg p E (b De)) go [] e = yield e  return [] go (b: q) e = return (( b D e): q) pop [] b = buf emit b  return [] pop (b’: q) b = return (( buf ccat b’ b): q) GBeg adds a new layer to the Buffer ready to accumulate elements of the new group. Text and Line elements outside of a group are emitted immediately to the output stream. Otherwise, the are accumulated in the Buf of their parent group. GEnd p brings us the final horizontal position p of the group, letting us emit GBeg p and flush the accumulated elements in the Buf. Since the terminated group may be part of another, still unfinished group, we delay the emitting elements of the terminated group and put them into the look-ahead buffer of the parent group. Only when the outer group is terminated we finally empty the look-ahead buffer emitting all its elements. The following code, reflecting the sequence of transformations so far: runGenT (trHPA ◦ trHPB ◦ genB $ doc1) + show i) (\i → putStrLn $ ”Generated: ” + prints StreamHPA for the sample document doc1: Generated: Generated: Generated: Generated: Generated: Generated: Generated: Generated: Generated: GBeg 5 TE 1 ”A” LE 2 GBeg 5 TE 3 ”B” LE 4 TE 5 ”C” GEnd 5 GEnd 5 Compared with streamHPB in §A.2, GBeg tells the final position of its group. We stress the incremental development our algorithm: after one part of the overall transformation is developed, we may immediately test it, and analyze it. Analysis. Since we cannot emit any group element until we see GEnd, the latency is of the order of n, the size of the whole document (stream). The look-ahead Buffer m is the extra space, again linear in n. Total time is determined by amortization. Assume that each element of the input stream brings us the credit of 2. We spend one credit to yield the element, and to put the element into the buffer (in general, for any constant amount of work within go). Thus all elements in the buffer have one credit left, enough to pay for the linear-time operation buf emit. Thus, the total time complexity is linear in n. Furthermore, we built the new algorithm in incremental fashion, without all tied up in knots. Not only can we clearly see non-divergence, we are also able to accurately and easily estimate time and space complexities. 22 If we hook up the stream StreamHPA to a linear-time constant-space formatter (similar to the one in §A.6 below), we obtain the overall pretty-printer, with linear-time complexity but unbounded, O(n) latency, taking the corresponding amount of extra space. To bound the look-ahead we apply the second optimization, pruning. A.4 Pruning We have just seen that determining the width of each group is expensive since we have to scan the whole group first. However, the exact group width is not necessary: if the width is greater than the page width, we do not need to know by how much. We introduce an ‘approximate horizontal position’ HPP: data HPP = Small HP | TooFar deriving Show type StreamHPP = E HPP HP to use instead of the exact final horizontal position HP to annotate GBeg elements with. GBeg is annotated with TooFar if the final horizontal position of the group is farther than the page width w away from the group’s initial horizontal position. Computing HPP requires only bounded, by w, look-ahead. The stream transformer trHPP described in this section is the optimized, pruned, version of trHPA of the previous section. The look-ahead BufferP, like the look-ahead Buffer of trHPA, is a sequence of simple Bufs that accumulate delayed elements following a GBeg up to and including the corresponding GEnd. We will need to efficiently access the sequence from both ends, however; the simple list no longer suffices. The Haskell basis library provides the data structure Seq with the needed algorithmic properties (we import Data.Sequence as S): type BufferP m = (HPL, S.Seq (HPL, Buf StreamHPP m)) bufferP empty = (0,S.empty) type HPL = Int If HP is the beginning horizontal position of the group, HPL is a w-offset position: any position after HPL is TooFar. For each accumulated group we compute HPL and make it easily accessible. Furthermore, fst BufferP provides HPL for the outermost group, so we can easily see if the current HP is too far for that group. If so, we can emit GBeg TooFar and empty its look-ahead Buf. The transformer trHPP of StreamHPB to StreamHPP is the ‘pruned’ version of trHPA (we also add debug printing to show the trace of the consumed stream). It is the state transducer with the state BufferP: trHPP :: (Monad m, MonadIO m) ⇒ PageWidth → GenT StreamHPB (StateT (BufferP m) (GenT StreamHPP m)) () → GenT StreamHPP m () trHPP w = foldG go’ bufferP empty where go’ s e = do liftIO ◦ putStrLn $ ”trHPP: read: ” + + show e 23 go s e go b@( yield go b@( yield go b@( yield ,q) (TE p z) | S.null q = (TE p z)  return b ,q) (LE p) | S.null q = (LE p)  return b ,q) (GEnd p) | S.null q = (GEnd p)  return b go ( , q) (GBeg p) | S.null q = return (p+w,S.singleton (p+w,buf empty)) go (p0,q) (GBeg p) = check (p0,q B (p+w,buf empty)) p go (p0,q) (GEnd p) | q’ : > ( , b) ← S.viewr q = pop p0 q’ (GBeg (Small p) E (b D (GEnd p))) go (p0,q) (TE p z) = check (p0,push (TE p z) q) p go (p0,q) (LE p) = check (p0,push (LE p) q) p push e q | q’ : > (p,b) ← S.viewr q = q’ B (p, b D e) pop p0 q b | S.null q = buf emit b  return bufferP empty pop p0 q b | q’ : > (p,b’) ← S.viewr q = return (p0, q’ B (p, buf ccat b’ b)) check (p0,q) p | p ≤ p0 && S.length q ≤ w = return (p0,q) check ( , q) p | ( , b) : < q’ ← S.viewl q = buf emit (GBeg TooFar E b)  check’ q’ p check’ q p | (p’, ) : < | otherwise ← S.viewl q = check (p’, q) p = return bufferP empty It is the longest function in our code. Except for check, it is essentially the same as trHPA of §A.3. When no accumulation takes place (BufferP has no Bufs), input stream elements are immediately relayed to the output stream. Receiving GBeg switches on the accumulation. We compute HPL for that group as p+w and push a new Buf to BufferP to accumulate the elements to be later emitted for that group. GEnd p terminates the accumulation for the group. The accumulated elements, preceded by GBeg (Small p), are added to the buffer of the parent group or emitted, see pop. The new, compared to trHPA, function check prunes the look-ahead: it checks to see if the current horizontal position p exceeds p0, the HPL of the outer group. If so, the outer group is wider than w, which lets us immediately emit GBeg TooFar and the elements accumulated in the outer Buf. The not-yet-terminated inner group may also turn out too wide: we have to recursively check. The function check also prunes the look-ahead BufferP when it becomes deeper than w, which may happen in the edge case of a document like the following: Group ( 24 Group (Group ... :+: Group ...) :+: Group (Group ... :+: Group ...)) whose StreamHPB includes an arbitrarily long sequence of GBeg p with the same initial group position p. The first pruning criterion will not be triggered then. Recall that genB of §A.1 has ensured that each group is at least one characterwide, with no group as a sole child. Therefore, a group that contains at least w descendant groups must be wider than w. Incidentally, this edge case has not been accounted for in [21]; the latter algorithm would need to add yet another state parameter to the formatting function. The clarity of our algorithm helped us discover the edge case. The sample pruned StreamHPP produced by trHPP for the page width 3 is as follows: trHPP: read: GBeg 0 trHPP: read: TE 1 ”A” trHPP: read: LE 2 trHPP: read: GBeg 2 trHPP: read: TE 3 ”B” trHPP: read: LE 4 Generated: GBeg TooFar Generated: TE 1 ”A” Generated: LE 2 trHPP: read: TE 5 ”C” trHPP: read: GEnd 5 Generated: GBeg (Small 5) Generated: TE 3 ”B” Generated: LE 4 Generated: TE 5 ”C” Generated: GEnd 5 trHPP: read: GEnd 5 Generated: GEnd 5 The listing contains the debugging messages printed as soon as an element of the input StreamHPB is received. The receiving and emitting of stream elements clearly interleave. When the looked-ahead group elements TE ”A”, LE, TE ”B”, LE tell that the group must be already wider than the page width 3, we indeed emit GBeg TooFar right away, without waiting for GEnd. Analysis. All S.Seq operations used in the code – adding to the left (C ) or to the right (B ) end of the queue and deconstructing the left or the right end with S.viewl or S.viewr – take constant amortized time. Therefore, the analysis of trHPP is similar to the analysis of trHPA, modulo the fact that the total size of the look-ahead BufferP is bounded by w. Therefore, latency and the extra space for the look-ahead buffer are bounded by the page width. The total processing time remains linear in the size of the input stream. A.5 Formatting The final step of pretty-printing is the formatting: transforming the pruned StreamHPP to a stream of Strings. To format stream elements TL as spaces 25 or newlines, the formatter keeps track of the boolean indicator if the current group fits on the remainder of the line. Since groups nest, we maintain the stack of boolean indicators: type Fits = [Bool] The formatter is the state transducer with the state (Fits,HPL). The HPL part of the state is the horizontal position at the end of the current line. If the final position of a group is over that HPL, the group does not fit. The group certainly does not fit if its final position is TooFar: trFormat0 :: Monad m ⇒ PageWidth → GenT StreamHPP (StateT (Fits, HPL) (GenT String m)) () → GenT String m () trFormat0 w = foldG go ([ False ], w) where go (f , l ) (TE z) = yield z  return (f , l ) go (f@(True: ), l ) (LE ) = yield ” ”  return (f , l ) go (f@(False : ), l ) (LE p) = yield ”\n”  return (f , p+w) go (f , l ) (GBeg TooFar) = return (False : f , l ) go (f , l ) (GBeg (Small p)) = return (( p ≤ l ): f , l ) go ( : f , l ) (GEnd ) = return (f , l ) The result of formatting the sample document for different page widths matches the expected one, shown in §4. Analysis. Since we immediately emit the received data, latency is unit. Overall processing time is linear in the size of the stream. Alas, we need extra space for the stack Fits, which we eliminate next. A.6 Better Formatting We exploit a particular form of the stack Fits of indicators if a group and its ancestor groups fit within the rest of the current line. Clearly if a group fits, all of its children groups fit as well. Therefore, if an element of the Fits stack is False, the elements below must all be False. Conversely, if an element is True, the elements above it, through the top of the stack, must all be True. Thus the whole stack is adequately represented by a single number, the number of True elements at the top: type FitI = Int The optimized formatting transducer is as follows: trFormat :: Monad m ⇒ PageWidth → GenT StreamHPP (StateT (FitI, HPL) → GenT String m () trFormat w = foldG go (0, w) where  return go (f , l ) (TE z) = yield z go (0, l ) (LE p) = yield ”\n”  return go (f , l ) (LE ) = yield ” ”  return go (0, l ) (GBeg TooFar) = return (0, l ) (GenT String m)) () (f , l ) (0, p+w) (f , l ) 26 go (0, l ) (GBeg (Small p)) = return (if p go (f , l ) (GBeg ) = return go (0, l ) (GEnd ) = return = return go (f , l ) (GEnd ) ≤ l then 1 else 0, l ) (f +1,l) (0, l ) (f −1,l) Analysis. Like the earlier trFormat0, the formatter clearly has unit latency and the overall linear running time. It now operates in constant space.