Fully Persistent JAMES Lists with Catenation R. DRISCOLL Dannouth College, DANIEL Hanocer, New Hampshire D. K. SLEATOR Carnegie-Mellon Unitwrsi@ Pittsburgh, Penn.~11 unia AND ROBERT E. TARJAN Princeton Unilersi& Prmcetorr, New’ Jersey, and NEC Research Institt{te, Proweton. Nc\v Jersey Abstract. This paper considers the problem of represmrtirrg stacks with catenation so that any stack, old or new, is available for access or update operations. Th]s problem arises in the implementation of list-based and functional programming languages. A solution is proposed requiring constant time and space for each stack operation except catenation, which requmes O(log log k) time and space. Here k is the number of stack operations done before the mtenation. All the resource bounds are amortized over the sequence of operations. Techniques]: Categories and Subject Descriptors: D. 1.1 [Programming Lists; F.2.2 [Analysis of Algorithms Programming; E.1 [Data Structures]: ity]: Nonnumerical Algorithms and Problems—co~~zpz~tufiot~s mz ducrete General Terms: Algorithms, Additional functional Key Words programming, A preliminary $mnposiurzz version on Discrete Design, Languages. Applicative and Problem (Functional) Complex- structures Theory and Phrases: Amortization, LISP. list. queue, stack catenation, concatenation, data structures, of tlzc 2nd Atztzaal ACM–SIAhf of this paper appeared in Proceedurgs (San Francisco, Cal if’., Jan. 28-30). ACM, New York, 1991, pp. Aigorithrz.s 89-99. The research of J. R. Driscoll wds supported in part by the National Sc]ence Foundation (NSF) under grant CCR 88-09573 and by DARPA as monitored by the Air Force OffIce of Sczcrrtific Research under gmrrt AFOSR-90-0292. The research of D. D. K. Sleator was supported in part by the NSF under grant CCR 86-58139 and by DIMACS (Center for Discrete Mathematics and Theoretical Computer Science), a National Science Foundation Science and Technology Center—NSF-STC 88-09648. The research of R. E. Tarjan at Princeton University was partially supported by the NSF under grant CCR 89-20505, by the Office of Naval Research (ONR) under grant NOO014-91-J- 1463. and by DIMACS. Authors’ addresses: J. H. Driscoll, Department of Mathematics and Computer Science, Dartmouth College, Hanover, NH 03755; D. D. K. Sleator, School of Computer Science, CarnegieMellon University, Pittsburgh, PA 15213; R. E. Tarjan, Computer Science Department, Princeton University, Princeton, NJ 08544, and NEC Research Institute, Princeton, NJ 08540. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title is @ Perm issi~n Of the of the publication and its date appe=, and notice is given that copying Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and\or specific permission. 01994 ACM 0004-541 1/94/0900-0943 $03.50 Journal of the Asmcldtmn for Compuzmg M~chlnmy. Vol 41. No 5, September 1Y94. pp. 943–959 944 J. R. DRISCOLL ET AI.. 1. Introduction In this paper we consider the problem side-effect-free procedures makelist( d ): Create and return element Return is d. the first Return a new jlrst( X): pop(x): Y, with list). X This element list first, implementing lists. These that of list X. is the same 1 whose as list are: first X with and only its first operation does not effect X. is the result of catenating list X and list followed operation a set of procedures a new list of length element deleted. This Return a new list that cate~zate(X, Y): of efficiently for manipulating by Y (X has no effect and Y may be the same on X or Y. runs in We show how to implement these operations so that makelist(d) runs in constant time and constant time and consumes constant space, first(X) consumes no space, pop(X) runs in constant time and consumes constant to space, and cuterzate( X, Y) runs in time (and consumes space) proportional of list operations done up to the time of the log log k, where k is the number catenation. All of these sequence of operations. time and space bounds are amortized over the Note. Another important operation on lists is pusMd.X), which returns the list formed by adding element d to the front of list X. We treat this operation as a special effect case of catenation, as push(d, This list languages into such simple since manipulation problem such as lisp or scheme. a language would allow procedures caterzate( makelist[ d~, X ) has the for manipulating arises naturally Incorporating a programmer lists without in list-based to solving these practical incurring problems programming our representation for lists to use these conceptually a severe cost penalty. Another application is the implementation of continuations programming language [Felleisen et al. 1988]. In addition very in a functional efficiently, the first nontrivial example of a fully persistent data structure that operation that combines two versions. To put this work in context, summarize A typical A query previous work data structure merely same X). retrieves and terminology on persistence. allows two types of operations: information, whereas an update our result is supports an we need to queries and updates. changes the informa- tion represented. Such a data structure is said to be ephemeral if queries and updates may only be done on the current version. With an ephemeral structure, the sequence of versions (one for mch update done) leading to the current version is lost. data structure, queries may In the partially persistent form of an ephemeral be performed on any version that ever existed, while updates may be performed only on the most recent version. Thus, the collection of versions forms a linear sequence, each one (except the current exactly one version. In the filly persistertt form both queries and updates are allowed on any fore, one version may be the predecessor of update that was applied to it). The graph of tree. one) being the predecessor of of an ephemeral data structure version of the structure. Thereseveral versions (one for each relations between versions is a Fully Persistent Lists with Catenation Several researchers structures either have partially 945 constructed or fully methods persistent for making and some general specific data techniques have been developed.] In particular, our previous joint work with Neil Sarnak [Driscoll et al. 1989] gives efficient methods for transforming a pointer-based ephemeral data structure into one that is partially or fully persistent in a manner over that that change satisfies in the in the limitations, ideal ephemeral however. structure operate catenating two resource ephemeral structure. One of on a single lists, bounds: structure, is not these allowed. (number of pointers pointing must be bounded by a fixed A fully persistent version factor a constant These version. a constant and techniques is that the Combining Another have updates in node) the ephemeral together, is that structure the that per important such as in-degree in the ephemeral data time of space several two versions limitation to a particular constant. of an ephemeral in query amount structure supports an in which two different versions are combined is said to be conjluent(v between versions of a confluently persistent data persistent. The relationship structure is that of a directed acyclic graph. The resulk of this paper, efficient update confluently which persistent persistence As a starting approaches. with point for lists, expands thinking If we represent then the range about this problem,, the lists as singly-linked e~ztly node for each version a special version, catenatable of data structures to can be applied. two lists X and that we mention two simple lists in the standard points Y can be catenated to the first in time tional to the length of X. This is done by duplicating duplicate X’, creating a new entry node that points element of X‘ point to the beginning of Y. Popping way, element and space of the propor- the list X and calling the to X’, and making the last the first element of a list is accomplished by creating a new entry node and making it point to the second element of the given list. This takes constant time and space. This solution is an example of a more general technique known as path- copying [Reps node, and recursively. et al. 1983]: When change (Such nodes is small all nodes a method and the paths will a node that is to be changed, point obviously in the structure to work it by replace applying best when the it by a new same the in-degree are short, ) By representing idea of the lists as balanced search trees and using the path-copying technique, pop and catenate can be made to run in O(log n ) time and space (n is the length of the longest list involved in the operation). This improves the performance of catenate, but makes popping worse. In our search for a more efficient solution, we began node-copying method of Driscoll et al. [1989], which by trying to apply the can make lists fully persistent using constant time and space per change. This idea breaks because the method does not allow a pointer in a particular version changed to point to a node that is not part of the same version. This down to be is because the node-copying method assigns a Lersion tmrnber to each version, maintains a data structure that contains version numbers throughout; navigate (follow pointers) through a particular version requires comparing and to the 1 For examples of partially m- fully persistent data structures, sec Chazelle [19S5], Cole [1986]. Dobkin and Munro [1985], Hood and Melville [1981], Krijnen and Mcertens [1983], Myers [ 1982: 1983; 1984], Reps et al. [1983], and Swart [1985]. For examples of general techniques. see Driscoll et u]. [1987], Dietz [1989], Overmars [1981], and Sarnak [1986]. 946 J. R. DRISCOLL version number making decisions of a pointer of that based to point version with on these into those encountered comparisons. a different in the traversal, To accommodate version would require ET AL. and the changing a new navigation algorithm. Rather than inventing a more powerful seem plausible), we instead constructed itself made up of fully persistent (but navigation scheme an efficient not efficiently (which does not list representation that is catenatable) lists. These lists are linked together to form a tree whose leaves are the elements of the list to be represented. We devised two solutions based on this underlying representation. In our first method the subtrees of the root increase in size exponentially from left to right, be defined later and these that are 3-collapsible subtrees roughly says that the leaves We call this the increasing subtree method. root). in size exponentially. length n. Popping only O(log operates of the 3-collapsibility n) of them to run the Because subtree in constant left property to are near the the subtrees are required on the leftmost property (a structural near increase to represent of the root, time. a list of and makes Catenating moves use the subtrees of the right list into the data structure of the left list one at a time in such a way as to preserve 3-collapsibility and the exponential growth of the subtrees. Because catenatable lists can get quite large with very few operations (repeatedly length catenating 2~ ) we have a singleton devised list a method to itself that k times allows results this O(log in a list n) bound with to be improved to O(log k), where k is the total number of list operations. In our more sophisticated solution (called the finger tree method), we change the representation of the list of exponentially a finger search tree. This tree This structure allows popping lists of total length ~~ in O(log growing subtrees, replacing it by has O(log n) leaves, so its depth is O(log log n). in constant time and space and catenating two log n) time and space. As before, the n in this bound can be replaced by k, where k is the total number of list operations. This is a dramatic improvement over the naive method, which can require 0(2~ ) time, and even over an O(log n)-time catenation method, which can require This 0(k) paper memory time and space per catenate. is organized as follows: Section architecture and explains we shall the fundamental use, describes operations Section 2 outlines the basic the fully representation persistent of lists, of delete, link, and pull that will be 3 describes the increasing subtree applied to this representation. method and the trick for improving the time and space bound for catenation to O(log k). Section 4 outlines the finger tree method, and Section 5 concludes with open problems and directions for future research. 2. Basic Representation Our representation of a list is a rooted ordered tree with the list elements in the leaves, such that a left-to-right traversal of the tree visits the elements in the order they occur in the list. Each of the internal nodes of the tree has two or more (except Each pointers children. No further structural constraint is imposed implicitly by the operations that manipulate them). internal node of the tree is represented by a header to the first element of a singly-linked list of pointers on these trees node that has to its children. Fully 947 dlef9 FIG. 1. A representation of the list [a, b, c, d, e, f, g, h, i, j, k]. Cln the left is a picture of the structure. The bracketed circles represent singly-linked lists of pointers to children. These pointers are shown as dotted lines. The portions of the structure corresponding to the four internal nodes of the tree are circled. Each of these is made fully persistent, and can be thought of as a single memory element. On the right is a picture of the tree. The root node is slightly more complicated. Here, the list elements for the children that are not leaves are singly linked together by additional pointers. The header node for the root also has pointers to the first and last of these nonleaf children, and to the last child. (See Figure 1.) Since the representation types, bounded persistent in-degree, of an internal node has a constant number of node and a single entry point, it can be made fully at a cost of constant space per change by the node-copying method [Driscoll et al. 1989]. Each time such a change occurs, a new version of that uersion number. The version number node is created and is given a unique serves as a pointer to the node. It is natural to view the persistence machinery as providing contains header memory elements (pointed all the node data and necessary a linked list). to by version to represent Such numbers), an internal a memory each of which node ellement can of the tree be modified (creating a new memory element of this type) at a cost of constant space for each elementary change. While navigating pointers within element, the version number of the element is used. When a pointer (a additional a memory from one memory element to another is followed, a new version number is given to the navigation algorithm for use within the new memory element. We now describe the three primitive operations on these trees: link, delete, and pull. The link operation takes two trees and makes the root of the second tree the last child of the root of the first. This requires adding a new element to the end of the root list of the first tree, making it point to the root of the second tree (the pointer being the root list of the second tree), the version number of the list representing updating the link from the last nonnull child 948 J. R. DRISCOLL E-l_ AL, a*d #z W/x bc FIG, 2, Linking two trees, of the root to point to the new child (if this new child is not a leaf), and updating the last three fields of the header node. When a root node becomes a nonroot node. certain information the pointer to the rightmost No harm is done by this child, extra stored there and the links information. is no longer needed (such as among nonnull child pointers). All of these changes can be accomplished in constant time and space; and, because the lists are persistent, the original trees are still available. (See Figure 2.) It is also possible to link the left tree below the root of the right tree, making the root of the left tree the first child of the balanced linking It is important lists does beginning problem method not of root of the method to note allow that the of right tree. This form of linking described in Section 3. that the persistence mechanism the root list of the first second. If this were tree to allowed, is useful in the used to represent point directly it would the to the solve the of efficient catenation. Instead, we must settle for the less satisfactory described above and work to ensure that the way we link lists together keeps the beginning of the list easily accessible. The delete operation removes the first child of the root, provided it is a leaf. It is invalid to apply delete to a tree whose first child is not a leaf, and in what follows we shall maintain the invariant that the first child of the root is always a leaf. The operation of deletion can be accomplished in constant time and space while maintaining persistence. The objective of the pull operation is to bring subtrees up toward the root. It is the tool we shall use to maintain the invariant that the first child of the root is a leaf. Pull takes the first nonleaf child of the root, and “pulls” its leftmost an internal node with only child into the root list. (See Figure 3.) If this creates one child, this child is pulled up as well, and the now childless node is removed from the root list. This is shown in the second pull of Figure 3. If there is no applied to a nonleaf child of the root, the tree is said to flat. The pull operation flat tree does nothing. The pointer to the first nonnull child (and the links among the nonnull children) allow this operation to be accomplished in constant time, and to be made fully persistent with only constant space consumption. LEMMA 1. The number of pulls required to flatten a tree is the number of leaues minus the degree of the root. Full yPemisten tLists with Catenation 949 ?-4)-+ de FIG.3. This degree lemma is a consequence of the root by exactly The pull operation. of the observation Note. In our basic representation all the nodes Therefore, on the path in designing that each pull increases the one. leading any change from in the tree the location our data structure, requires of the change we must be wary changing to the root. of making changes deep in the tree. 3. Preliminav Methods The basic operations varying efficiency the root of each leaves it operation leaf by We denote linking repeatedly is then to give several different algorithms with trade-offs. We first describe the balanced linking method. tree, we keep the size of the tree, which is the number contains. implemented can be combined deleted. the pulls the size smaller the tree until Catenation takes of tree a tree below T the the left child constant time by IT 1. Catenation larger of the root tree. The At of is pop is a leaf. This and space, and pop takes time and space at most proportional to the number of catenations done before the pop. This algorithm is very closely related to the method of path compression with weighted union for maintaining disjoint from the analysis of path compression structure the amortized operations. Unfortunately, sufficient to give efficient sets [Tarjan 1983 b]. In fact, it follows that in an ephemeral version of the cost of a pop is 0( a (k)), an amortized full persistence. where k is the number bound of this form is simply This is because a “bad” version, Of list not one in which pop is expensive, can generate arbitrarily many other bad versions. This would happen if, for example, several different short lists are catenated to the same bad version. Popping each of these new lists will be expensive and will ruin any global amortized bound. There are two ways to circumvent this problem. One is to find some way to make popping globally improve the representations of the lists (rather than locally improving a single version of a list). Another is to find a way to make the amortized bound worst-case. We pursue the latter avenue. applying “delete, then pull c times” until Call a tree c-collapsible if repeatly the tree becomes empty preserves the invariant that the first child of the root is a leaf. Our goal is to arrange that all trees are c-collapsible for some constant c. We now show that c-collapsibility is preserved under linking with a sufficiently small tree. 950 J. R. DRISCOLL ET AL. Let T, and Tz be two trees, and let T be the result of limking T1 to LEMMA 2. t~le right of TI, making the root of Tz the last child of the root of T,. If T, is c-collapsible and IT1 I s (c – 1) “ IT, I then T is c-collapsible. PROOF. TI Because is c-collapsible, ITI I collapsing steps (each one is a of T? does not will work on T (the presence At this point, C]T1I pulls have been applied to T. delete followed by c pulls) interfere with the deletions). IT I = IT, I + ITZI s CIT11, so CIT1I pulls are sufficient to flatten T. ThereIT1I collapsing steps, T is flat: that is. all the leaves are children of But fore, after the root. ❑ In particular, if we choose c = 3, then a tree of size up to 2n can be linked to a tree of size n while preserving 3-collapsibility. tion to provide a solution with O(log n) catenation Represent a list by a fully persistent list of trees We now use this observatime. satisfying the invariant that every tree is 3-collapsible and the ith tree has more than twice as many leaves as the (i – 1)st tree. With each tree in the list, store the size of the tree. The doubling property implies that have at least 2’ elements. followed empty by 3 pulls tree from to the first the front is more tree. trees in its representation t >2 element of such a list, apply If this leaves an empty tree, then of the list of trees. This operation and space, and the resulting Catenation a list with To pop the first list has the doubling complicated. To catenate must delete, delete uses constant the time property. two lists of trees, process each tree of the right list in order from left to right. For each such tree T, comtree T’ of the left list. pare the size of T with the current size of the rightmost If IT I s 2. IT’1, then T onto link T’. Otherwise, make T the new rightmost tree of the left list, (After this happens. no further links will occur in the catenation. ) (See Figure 4.) This process preserves the doubling property and 3collapsibility. Catenation takes O(log n) time and space because the number of trees representing a list of n elements is at most logz(n + 1). Although involved, terms catenation a sequence of the has a cost that of k catenates number is logarithmic can generate k, the of operations in the size of the a list of length previous result lists 2~. Thus. only gives in us an 0(k2 ) bound on the total time and space used to process the sequence. We can do better in an amortized sense with the following trick: Make initial guess k,] = 2 of the Carry out the operations total normally. comprising a list. Keep a record list becomes longer than logzkt) length of the resulting number list of list but keep track operations of the total size of all trees of all the list operations performed during a catenation then truncate of trees is exactly logz L(l. an to be performed. This as well. If a it so that the ensures that catenation will cost O(log k(l) time and space, and that at least the first LO elements of each list will be correctly representccl. As long as no more than ktl pops occur, all the pops will return the correct values. Should the actual number of operations exceed the guess k,,, then throw away the lists, pick a ncw guess k, = k;, and rebuild the lists using the recorded sequence of operations and this new guess. The following analysis shows that with this method the amortized cost of a pop is 0(1) and the total amortized cost of a catenate is O(log k). Since the guess k, at least doubles at every phase, the total number of pops, each costing 0(1), This performed in all phases, is at most k + k + k/2 0(k) cost is a cost of 0(1) per operation, whether + k/4 a pop + . . . = O(k). or a catenate. 951 Fully Persistent Lists with Catenation 4JY849 20 100 I 201 1 FIG. 4. Catenating two lists in O(log H) time. 20 Since guess k~ is at most the final kz, since logz k, +, = 2 logzkl, and since in a is O(log k,), the total cost of repeating each phase the cost of a catenate catenate during each phase is 0[&logk]=0(210gk+ The same trick path-copying logk+;logk+]=O( can be used algorithm to transform mentioned in logk). the the O(log n) pop introduction into k ) per operation. This will work even if the length = 2k, ) rather than being squared at each iteration. bound O(log and one catenate with cost is doubled(k,+, 4. The Finger Tree Method The idea of the finger tree method is to modify the list-of-trees method of Section 3 by replacing the list of collapsible trees~ by a balanced binary tree. Each leaf of this balanced tree is the root of one of the collapsible trees. We retain the size of its linked one catenated copying stops invariant that each successive collapsible tree is more than twice the tlhe catenation operation predecessor. In the list-of-trees method, collapsible tree at a time until the next onle was too big, and then the remainder of the O(log n)-length list of collapsible trees by the list. The point linking method and uses the starts same in the list of collapsible catenating split point, is called but takes trees at which split point. the advantage the algorithm The of the finger tree balanced tree structure to streamline the process of transferring the collapsible trees. The algorithm to catenate a structure B to the right of a structure A works roughly as follows: Find the split point in B. Split the balanced binary tree of B in two at the split point, creating Bl and B.. Link BI onto the rightmost All of the collapsible tree of A. Join the binaty trees of A and B, together. steps will be done in a fully persistent fashion, and tihey all will be shown to take which 4.1 Tarjan time and is O(log space proportional to the depth of the balanced binary tree, log n). R~D-BLAcK TREES. 1983b] is a binary A red-black tree in which tree [Guibas each internal and Sedgewick nctdes has pointers 1980: to both L For brewty we suppress the “c” of “’c-collapsibility” when wc wish to avoid specifying a constant. Wc shall show later that all the trees that we call collapsible are actually 4-collapsible. 3 The uttemal nodes of o binary tree have two children, and the external }zodes or [ea{ws of the tree have no children. Every node of a binary tree is either an internal node or a Icaf. 952 J. R. DRISCOLL of its childreni and a one bit field storing the color of the node, which ET AL. is either red or black. The colors of the nodes satisfy the following three conditions: (a) Every leaf is black; (b) Every red node has a black parent (or is the root): (c) Every path from the root to a leaf contains the same number of black nodes. For purposes of analyzing rank that red–black is a function trees, it is convenient to assign to each node an integral of the coloring of the tree. The rank of each leaf is O, and the rank of an internal and one greater than that node is equal to the rank of its red children (if any) of its black children (if any). We can interpret the constraints on the coloring of the tree with respect to the ranks of the nodes as follows: (1) The rank of each leaf is O; (2) The rank of each node that is the parent of a leaf is 1; (3) In traversing from a node to its parent, increases by O or 1; (4) In traversing from a node to its grandparent, increases by 1 or 2. depth of a red–black The a path from modified form the depth later 3. 2t PROOF. then the induction, The tree is the maximum a leaf. The 4.1 of Tarjan’s tree with number following of internal lemma monograph (which [Tarjan ~z leaves is O(log nodes on is a slightly 1983 b]) shows that n), and will be useful to us purposes. Tile subtree rooted at the node a node of rank r in a red–black tree contains [eales. If r = O, then is a leaf, and the lemma holds. If r > (), node is internal and has hvo children of rank at least r – 1. By each of these is the root of a subtree with at least 2’”– 1 leaves. ❑ fundamental inserting to of Lemma for other least root of a red–black LR’WA at the the rank the rank a new leaf, trees together. operations that we shall deleting a leaf, splitting The split operation takes need a tree as input to apply into to these two, a red–black trees and joining tree along are two with a particular leaf 1, and produces two red–black trees, one containing all the leaves to the left of 1, and one containing 1 and all the leaves to the right of f. The join operation is the inverse whose leaves are those of the first input tree. Algorithms of split. It produces input tree, followed exist to perform a new red–black tree by those of the second all ot’ these operations in time O(log /z) of leaves in all of the trees involved in the operation). (where n is the number The bottom-up algorithms for insertion and deletion [Tarjan 1983a; 1983b] can roughly be described as follows: A constant amount of restructuring occurs in the vicinity of the change, then color changes are made on some initial part of the path from the change to the root of the tree, then up to three rotations are done at the top of this path of color changes. for representing a list of elements—one A finger search tree is a data structure of which is a special element called the finger-that allows very efficient accesses, insertions and deletions in the vicinity of the finger.h In particular, where d is the these three operations can be performed in time O(logd), ~ Parent pointers may also be included lf needed They are not needed in our application. A rotation is a local restructuring operation in a binary tree that maintams the symmetric order of the leaves and changes the depths of various nodes, h For example, see Brown and TarJan [ 1!)80]. Guibas et al. [1977], Kosaraju [1981], and Tsakalidis [1 984]. Fully Persistent Lists with Catenation distance in the list between method of Tsakalidis In Tsakalidis’s We assume tree.’ The the finger [1984] method, throughout leftmost consists It of internal has the of nodes with (independent search the spine colors, and of following is the leftmost of the tree (along tree. leaf of the with the internal in a special way, but the rest of tree. path and adjacent internal nodes data structure without explicitly important number causes only This the The properties: of fields; (2) Each (1) The node spine of the spine node) is pointed to only by nodes of the spine; (3) The (the number of pointers to it) is bounded by a constant tree to change. the finger nodes of the size of the tree); the finger of the operation. for our purposes. of the list is a leaf in a red–black that represented a red–black the leftmost use of this a bounded (except for one entry indegree of any node useful each element this discussion path it here. and the location is particularly nodes adjacent to this path) are the tree is just as it would be in Tsakalidis’s representation of is called the spine. We make describing 953 a constant is a consequence 0(1) red–black trees. These four properties (4) An restructuring allow insertion or deletion operation in number of pointers and fields in of an implicit insertion us to make and representation deletion this data structure of the algorithms fully in persistent by the node-copying method [Driscoll et al. 1989] using only constant space per insertion and deletion, while preserving the O(log d) running time for these operations. 4.2 THE REPRESENTATION. We are now ready to describe our representa- tion of lists. The list is partitioned into consecutive blocks of elements of sizes b,, b ~,. . . . b~. These sizes are arbitrary, subject to the constraint that b,+, >2 b, for all 1< i < k. Each block of elements is stored in the leaves of a 4collapsible tree. Each of these collapsible trees is stored according to the basic representation described in Section 2, and the root of each tree stores its size. A persistent version of Tsakalidis’s structure—which we shall call the finger tree—connects the collapsible trees together. are the leaves of the finger tree. The internal data structure) memory of the elements. to the root stored as a fully the spine. To find finger As usual, of a collapsible the persistent split tree tree) version point are a pointer efficiently The roots of the collapsible trees nodes (except those in the spine represented from nodes a version number. is actually of Tsakalidis’ we as standard one of these store ephemeral additional persistent to another The (or spine representation information is of in the internal nodes of the finger tree (except those in the spine). In such a node x, whose descendants are collapsible trees of sizes b,, b,+ ~,..., b,, we store the following dants information: of this node) can be maintained 1 and r are the S, = and ~, ~ ~~, bl (thenumber of leaves QX = max, ~ ~~ ~{bl – 2 ~, efficiently left and ~., < lb~}. under any local restructuring right child, respectively, of SX = Sl + S, and Q. = max{Q1, Q, – 2S1}. This completes the static description of our describe and analyze pop and catenate. data that are descen- These quantities of the tree since, if a node x, we have structure. 7 In our application we only need this special case. Furthermore, such a solution to allow an arbitra~ finger location by maintaining two such trees. It remains to can be adapted 954 J. R. DRISCOLL ET AL. 4.3 POP. We first observe that the leftmost collapsible tree can be deleted from the structure (or a new leftmost collapsible tree can be inserted into the structure) persistently in constant time and space. This is because the operation on the finger tree is taking place within a constant distance of the finger (O or 1 for deletion and insertion, respectively), and because the values of S, and Q. can be updated nodes in constant in the spine, because of the rotations the spine update for which rule time. no updating that occur we must above. There To perform pop, Delete the leftmost (Since of these store these values. 0(1) these It is possible, we may introduce compute are only we do not is needed. values however, a new node This can be done for that outside of using the such nodes. ) delete the leftmost collapsible leaf from it, and apply four pull tree from operations. the finger tree. If the resulting tree is empty, the operation is complete. If it is not empty, then insert the collapsible tree back into the finger tree. All of these steps take constant time and use constant The resulting decreasing bl collapsible tree space. structure still satisfies and applying 4.4 CATENATE. We all of the required bz > 2b1, maintains four describe and pulls deleting leaves a catenation structural one a 4-collapsible method constraints; element that from a 4- tree. has the that property the time (and space) required is proportional to the depth of the finger trees involved (which is O(log log n) for forming a list of length ~Z), and that maintains all the structural constraints above. The first step of the algorithm is to transform the two finger trees into red–black trees, persistent memory in which element each internal node and is endowed with Q,. This transformation can be done in time of the spine tree, is proportional tree. spine, The of a finger colors which can be extracted and the data values from a color by a standard and values for S, and and space proportional their can be computed is represented to the depth implicit from to the size of the finger representation those of their in children, the which are available. Let the two structures to be catenated be A and B, and let the collapsible trees of A be Al, A~,. ... Af of sizes al, az,. ... af, and the collapsible trees of Bbe Bt, Bz, . . .. Bk of sizes bl, bl, ..., bk. The split point is defined to be 2a~ < b, the least i > 1 such that b, > 2 (af + ~ ]s, s, -1 b,) (or equivalently . Z&, <[-l b, ). If this inequality is not true for any i s k, then the split if the split point is defined to be k + 1. We can now describe how to determine point is in a subtree T rooted at the node x given Sl (the total size of the trees to the left of the subtree T) and af. If 2a~ < QX – 2 S1, then a point i satisfying there is no such point in T. Using the above inequalities is in T‘. otherwise, this test, it is easy to locate the split point in time proportional to the depth of the red–black tree: merely walk down from the root (updating S1), always choosing the leftmost option that contains a point satisfying the inequality. The leaf we reach in this way is the split point. i has been found, we split the Once the split point point, creating a new red–black tree TI with collapsible and a new red–black tree ~ with point is k + 1, then ~ is empty. now change the representation persistent memory element, link red–black tree at this trees B, ,Bz, . . . . B,_,, collapsible trees B,, B,+,, . . . . B~. (If the split If the split point is 1, then T, is empty.) We a single of the leftmost path of T1 into the resulting structure to the right of Af, and Fully Persistent Lists with Catenation call the result collapsibility A. Making from Af A;. (The of A;.) such change The link a change to the root 955 in representation forming persistent be changed is required A’f occurs requires (just to guarantee deep in the red –black that all the as in path-copying). nodes the tree of on the Furthermore, path the S, and Q, fields of all the nodes on this path must be updated. The cost of these changes is proportional to the depth of the red–black tree. Finally, we join the new version of the red–black tree of A with the red–black tree of T., and transform the resulting red–black tree into a finger tree (by building the spine).x The process is shown schematically The O(log log n) bound on immediately from the structural that In more check these the constraints data than the are preserved structure resulting twice the size of two places where in Figure 5. the time and space cost of catenate constraints on these trees. It remains by our from follows to prove algorithm. a catenation, its predecessor. this constraint each To verify might be collapsible tree is this, we need only violated. These are between Af_ ~ and A; and between A; and B,. The former place satisfies the constraint since a; > a~. By our choice of the split point, we know that b, > 2(af + bl + oco + b,_ ~) = 2a;, which proves that the latter point also satisfies the constraint. The only remaining detail is to show that the tree requires a more detailed analysis of the structure properties of the pull A} is 4-collapsible. This of red–black trees and operation. Let 1 be the ith leaf (i > 2) of a red--black tree. The distance LEMMA 4. between 1 and the nearest internal node on the left path is at most 2~log il. PROOF. red–black Let r = [log il. Let x be the deepest node on the left path of the tree that is of rank r. By Lemma 3, the subtree rooted at x contains at least 2’ leaves. These i, this subtree Since 2’> In traversing the path from a node of rank O increases least the rank by at least r + 1 must have been leaves are consecutive starting from the must contain 1, so x is an ancestor of from 1 toward the root of the tree, the to one of rank 1. Subsequently, every be reached. passed. Therefore, 1. Therefore, Since after .x is on this x is reached after leftmost leaf. 1. first step goes pair of steps 2r + 11steps, a node of rank at path and has rank r, itmust at most 2r steps. ❑ LEMMA 5. For any j (1 s j s i – 1),let 1 be the number of leales to the left of B] in A). The number of pulls that must be applied to A’~ until a state is reached in which B] and all the leaues to the left of B] are adjacent to the root of the tree is at most 21. PROOF. Let F be the tree (reached by applying and all of the 1 leaves to the left of Bj are adjacent pulls in which B, (See Figure 6.) to A’f) to the root. X In linking Tl into Af, we have in a sense mixed apples and oranges, since the memory elements of the nodes of Tr contain color and size fields. After the link is done, the data in these fields is no longer needed. No harm is done by the presence of this extra useless information in the tree. A slightly different approach is to completely avoid the use of the persistent memory architecture in the nodes of the red–black tree. This approach works, but a different non-uniformity results when the link is done. Now the resulting tree contains a mixture of nodes, some of which are ordinary red–black tree nodes, while others are persistent nodes. It is too expensive to clean up the entire tree immediately. Instead the pop algorithm must replace the red–black nodes by persistent nodes gradually as they are encountered. 956 J. R. DRISCOLL 4s++4$+ &.&& Af Af AI ET AI.. ‘k B 1-1 ‘k A2 Af El B (-1 FIG. 5. The top picture above shows the mltial trees .4 and B. The center plcturc intermediate point m catcrrating A and B The bottom picture shows the find result, IS an I k P pulls “ Af ---- % FIG. 6. . . B ‘J !-1 The process of transforming ,4’, to F by applying p pulls, Let p be the number of pulls required to reach this state starting from tree A\. Our goal is to bound p by 21. by starting with A; and deleting all nodes that The stripped tree S is obtained are not ancestors of a leaf in B, or a leaf to the left of B,. The path from the path of S. (See Figure 7.) Let d be root of BJ to the root of S is a rightmost the distance of A;). Let between the root q be the number the root of B~ and the root of S (or equivalently of nodes on this path that have only one child. this may not be (Although all internal nodes in A; have at least two children, true in S.) form of F. We shall now evaluate p‘. the number of Let F‘ be the stripped pulls required the root increases the degree S into F‘. Each pull either an internal node with one child. Therefore, to transform by one or deletes p‘ = (degree of root of F’) = 1 + 1 + q – (degree – (degree ofrootof of root of of S) + q S) sl+l+q–2=l+q–l. What is the relationship between p and p‘ ? Stripping accelerates the pulling process by at most one for every node on the path from the parent of the root of B] to the root of S, except for the root and the nodes on this path with one Fully Persistent Lists with Catenation 957 s d %y ,.. FIG. 7. The stripped tree S. % ‘J child. This is because promotes the subtree subtree containing the nodes with more to the left B,. to the right of the node or not.) than of the This extra promotion in the unstrapped tree. (It depends sequence subtree for immediately one child, path may not on whether To summarize, pull that also promotes the the happen stripping in the pull removed a we have p2. For j = 1, the distance between the root of BI and the root Also of A~f is 2. This note that Combining these completes observations 6. PROOF. lemma) The tree A; The into number a tree of the observation. B,, Bz, . . . . BJ_, has at least one leaf. gives p