EÆcient Hash-Consing of Recursive Types Je rey Considine jconsidi@cs.bu.edu January 29, 2000 Abstract EÆcient storage of types within a compiler is necessary to avoid large blowups in space during compilation. Recursive types in particular are important to consider, as naive representations of recursive types may be arbitrarily larger than necessary through unfolding. Hash-consing has been used to eÆciently store non-recursive types [7]. Deterministic nite automata techniques have been used to eÆciently perform various operations on recursive types [4]. We present a new system for storing recursive types combining hash-consing and deterministic nite automata techniques. The space requirements are linear in the number of distinct types. Both update and lookup operations take polynomial time and linear space and type equality can be checked in constant time once both types are in the system. 1 Introduction A recent trend in compilers is the use of typed intermediate languages while compiling to generate safer and more optimized code. One disadvantage to this approach is that saving type information can easily cause a large blowup in the space used by a compiler. Types of simple expressions can blowup exponentially when represented using a tree data structure. Recursive types allow arbitrary expansion through simple unfolding. We use the idea of hash-consing to maintain a set of types in which all equivalent types share the same representation. [7] reports sucessful applications of these techniques. We take the idea one step farther and add DFA techniques to allow the same bene ts to be extended to recursive types. DFA techniques have been used for various operations on recursive types [4]. We use them to minimize recursive type representations and avoid duplicate representations. We also present a canonical ordering algorithm to avoid the \isomorphic under permutations of states" clause in many presentations of DFA minimization algorithms. Use of canonical orderings allows linear time DFA comparisons and simpli es comparing recursive types. Given a set S of types and a type  , our system allows eÆcient update and lookup operations to be performed in two steps: a pre-processing step taking O(  2) time and a step actually accessing S using O (  log S ) time. The space required to represent S is linear in the number of reachable distinct types so it is asymptotically optimal in space. j j j j j j This work was done during the period of October 1999 to December 1999. In the mean time, Laurent Mauborgne defended his PhD thesis on representing sets of in nite trees [5] and submitted an excerpt [6] for publication in ESOP 2000. The core algorithm for eÆciently constructing in nite trees is virtually identical to the algorithm given here for representing recursive types, though the presentation is drastically di erent. The key di erence between the two representations is the space utilization - ignoring logarithmic factors, the size used per distinct type is constant in the representation presented here and dependent on the number of mutually recursive trees in the representation of Mauborgne. 1 2 Recursive Types 2.1 Syntax The common syntax of recursive types is ::= b   v: v where b is a base type and v is a type variable. We generalize this syntax to include arbitrary type constructors, in addition to \ ". Let B be a nite set of base types and C a nite set of type constructors. B and C are disjoint. Let V be the set of type variables.  ::= b c(; : : :) v: v where b B , c C , and v V .  j ! j j ! j 2 2 j j 2 2.2 Equivalence Informally, we de ne two types to be equivalent if their corresponding labeled in nite trees are the same. We will formally de ne the labeled in nite tree corresponding to a recursive type in 3.3. 3 In nite Trees In describing in nite trees, we will use the notation that ~i, ~j , and ~k are strings of positive integers and i, j , and k are individual positive integers. Let P be the set of positive integers.  is the empty string and P  is  the set nof all strings of positive o integers. is an overloaded string concatenation operator. Given I; J P , I J = ~i ~ j (~i I ) (~ j J) .    j 2 ^ 2 2 3.1 Trees is a tree if the following are true: is a non-empty set of strings built from positive integers     ~i is pre x-closed - ~i; ~j ~i ~j T T . De nition 3.1 (Trees) 1. 2. T T T 8  2 !  ;  T  P . 2     ~i j 3. T is leftward-closed - ~i; j; k (j k) ~i k T T . We call the strings in T the paths of T . Note that the rst two properties imply that  is a path of T . In words,  identi es the root node of T and T is a potentially in nite set of nite paths. De nition 3.2 (Subtrees) Let T be a tree and ~t a path in T . The subtree of T reached by ~t is de ned as follows:  o  n ~ subtree T ; ~ t = ~i t ~i T 8  !  2 ! j   2 2 Lemma 3.3 Let T be a tree and ~t a path of T . Then, subtree(T ; ~t) is a tree. Proof Sketch: If ~s is a witness that subtree(T ; ~t) is not a tree, ~t ~s is similarly a witness that T is not a tree. De nition 3.4 (Degree) Let T be a tree. The degree of T is de ned as follows: degree(T ) = T P In words, degree(T ) is the number of children of the root node of T .  j 2 \ j 3.2 Labeled Trees De nition 3.5 (Labeled Trees) Let arity be a function from C to P returning the arity of its input and let k be the maximum value returned by arity. We de ne labeled trees as follows - T = (T ; l) is a tree if the following are true: 1. T is a tree. 2. l is a function from P  to B ((t t((t  8t 2  8 2 ) T) T ! ! (l(t) (l(t) 2 2 ) C) B [ such that C ! ! (degree(subtree(T ; t)) = 0) (degree(subtree(T ; t)) = arity(l(t))) De nition 3.6 (Labeled Subtrees) Let T = (T ; l) be a labeled tree and ~t a path of T . The labeled subtree of T reached by ~t is de ned as follows:  labeledsubtree T ; ~ t n = ~i  j ~ t  ~i o 2 T ; x:l ~ tx  Lemma 3.7 Let T = (T ; l) be a labeled tree and ~t a path of T . Then, labeledsubtree De nition 3.8 (Labeled Paths) Let follows: T   ~ t; x ~ t2T j De nition 3.9 (Labeled Tree Isomorphism) Let and T2 are isomorphic i T1 = T2 and  ;~ t = (T ; l) be a labeled tree. The labeled paths of T ( )= paths T 8~ t ~ t 2 T1 ! T1  ^ is a labeled tree. T is de ned as (x = l(t)) = (T1 ; l1 )and = l2 ~t . l1 ~ t T2 = (T2 ; l2 ) be labeled trees. T1 3.3 Labeled Tree Construction from Recursive Types De nition 3.10 (unfold) Let  be a recursive type. We de ne the function unfold as follows:1  ]if  = t: unfold ( ) = [t:=t otherwise De nition 3.11 (rec unfold) Let  be a recursive type. We de ne the function rec unfold as follows:2 rec unfold ( ) = 8 b > > < if  = b B if  = c(1 ; : : :) if  = v V unfold (t:rec unfold ()) if  = t: 2 (rec unfold (1); : : :) c > > v : 2 De nition 3.12 (Finite Trees of Recursive Types) Let  be a recursive type. We de ne the nite tree of  as follows: nite tree ( ) = 8 > > < > > : fg [i (i nite tree ( ))  i fg nite tree () if  B if  = c(1 ; : : :) if  V if  = t: 2 [ fg 2 In words, nite tree ( ) is the nite set of all nite paths traversing  without unfolding. Later, a similar but potentially in nite set will be described.  1 The common axiom t: = [t:=t] , 2 As with unfold ,  = rec unfold .   unfold ( ). where  = denotes equivalence, allows one to prove  = 3 Lemma 3.13 Let  be a recursive type. Then, nite tree ( ) is a tree. Proof Sketch: This lemma may be proven using structural induction over  . Lemma 3.14 Let  be a recursive type. Then, nite tree ( ) nite tree (rec unfold ( )). De nition 3.15 (In nite Trees of Recursive Types) Let  be a recursive type. Let  = rec unfold ( ), the result of applying rec unfold to  i times. Let p = nite tree ( ). The in nite tree of  is de ned as follows: in nite tree ( ) = p In words, in nite tree ( ) is the potentially in nite set of all nite paths traversing  unfolded an in nite number of types. If  is not recursive, in nite tree ( ) = nite tree ( ), a nite set. Lemma 3.16 Let  be a recursive type. Then, in nite tree ( ) is a tree.  i i i i [i i Proof Sketch: This lemma follows from lemmas 3.13 and 3.14. Lemma 3.17 Let  be a recursive type. Then, in nite tree ( ) is decidable. Proof Sketch: Note that ~t in nite tree ( ) i ~t nite tree (rec unfold j j ( )). Let be a distinguished symbol which stands for an unde ned result. De nition 3.18 (Finite Labeling of Recursive Types) Let  be a recursive type and x a path. We de ne the nite labeling of  as follows: 8 b if  = b B and x =  > > < nite labeling (  )(x0 ) if  = c(1 ; : : :) and x = i x0 nite labeling ( )(x) = if  V > > : nite labeling ()(x) if  = t: De nition 3.19 (In nite Labeling of Recursive Types) Let  be a recursive type and x be a path. The in nite labeling of  is de ned as follows:  in nite labeling ( )(x) = y if there exists i such that nite labeling (rec unfold ( ))(x) = y otherwise Note in nite labeling ( ) has an in nite domain i in nite tree ( ) is in nite. Lemma 3.20 Let  be a recursive type. Then, in nite labeling ( ) is a well de ned and total computable 2 ~ t 2 2  i 2 i function. Proof Sketch: Note nite labeling ( )(x) = y implies nite labeling (rec unfold ( ))(x) = y and the number of applications of rec unfold necessary is bounded by the length of x. Lemma 3.21 Let  be a recursive type. Let T = (T ; l ) is a labeled tree. T = in nite tree ( ) and l = in nite labeling ( ). Then, De nition 3.22 (Recursive Type Labeled In nite Trees) Let  be a recursive type. Let T = in nite tree ( ) and l = in nite labeling ( ). The labeled in nite tree corresponding to  is de ned to be = (T ; l). T 4 4 Deterministic Finite Automata Other work has been done using deterministic nite automata (DFA's) to eÆciently implement operations such as subtyping [4]. We start with DFA's for processing strings and modify them to establish a correspondence with types in our system. We can then use adaptations of standard DFA algorithms to minimize the size of the types and traditional graph algorithms to analyze relations between types. 4.1 Standard DFA's Traditionally, a string processing DFA can be considered as a directed multi-graph with each edge labeled with a member of the input alphabet. Each node corresponds to a state of the DFA and is labeled nal or non- nal. Strictly speaking, each node has exactly one outgoing edge for each member of the input alphabet. However, it is common practive to leave out edges to \error" states, non- nal states in which all edges are self-loops, since once an error state is reached, it is impossible to reach a nal state [3]. When processing strings with standard DFA's, one node of the graph is designated as the start state of the DFA. Input to the DFA is read one character at a time and the corresponding edge is followed. When the end of input is reached, the sDFA accepts the input string i last state of the traversal is a nal state. We avoid a formal de nition of DFA's which can be found in many textbooks on automata theory. Unless otherwise speci ed, we follow the terminology of [3]. In contrast to the modi ed DFA's of 4.2, we will refer to standard DFA's as sDFA's. 4.2 Modi ed DFA's Instead of using sDFA's, we use a modi ed DFA (mDFA) with a more general labeling scheme beyond nal/non- nal. mDFA's accept strings built from members of 1; : : : ; k . Each state is labeled with the identi er of either a type constructor or a base type. In 4.3, it will be seen that type constructor labels and base type labels will be distinguishable by the number of outgoing edges. The presence of a sink state is implied but not used; all missing edges go to this sink state. String processing with mDFA's is similar to that with sDFA's. Traversal of the mDFA graph is the same as with a DFA, but the output is di erent. If the last state reached is a sink state, the mDFA rejects. Otherwise, it accepts and outputs the state's label. De nition 4.1 (mDFA Output) Let A be an mDFA. The output of A is the set of pairs of strings accepted by A and the associated output. f ( ) = (~x; y) output A f j A g accepts ~x and A outputs y given ~x g 4.3 mDFA Construction from Recursive Types De nition 4.2 (mDFA/Recursive Type Equivalence) Let  be a recursive type and A be an mDFA. Let be the labeled tree corresponding to type  . We de ne A and  to be equivalent i output(A) = paths( ). To convert a recursive type  into an mDFA A, the tree structure of  is essentially copied into an mDFA graph and back references are added to replace bound type variables. T T Lemma 4.3 (mDFA Construction from Recursive Types Algorithm) Let  be a recursive type. There exists an algorithm running in O(j j log j j) time and O(j j) space outputting mDFA A such that A is equivalent to  and if A has n states, n 2 O(j j). Proof Sketch: A can be built using two traversals of the tree representation of  . The rst pass assigns states to components of  and the second builds A. 5 The rst pass annotates the tree representation of  with state numbers in a bottom up fashion. Fresh states are assigned to base types and applications of type constructors and passed back up the tree. Type bindings are annotated with the state passed up by the body of binding and this state is passed up the tree again. This pass takes O(  ) time. Bound type variables are not assigned states. The second pass actually generates the adjacency list representation of the mDFA graph.  is traversed in a depth rst manner passing a binding environment down to each child. When a base type is reached, an appropriately labeled state with no outgoing edges is created. When an application of a type constructor is reached, an appropriately labeled state with a numbered edge to each child is created; if the child was not assigned a state in the rst pass, it is a bound type variable and looked up in the binding environment (if it is not in the binding environment,  is invalid). When a type binding is reached, a binding of the type variable to the assigned state is added to the environment that will be passed to the body of the type binding. Again, no action is taken when a bound type variable is reached. This pass takes O(  log  ) time using a balanced tree representation for the environment. This process preserves the tree structure of  except when type variables are bound and used. Ignoring uses of bound type variables for a moment, the semantics of a type binding give it the same structure as its body so sharing the state of its body is semantically correct. Considering uses of bound type variables again, the semantics of using a bound type variable are that it is equivalent to the inner most binding of that type variable so using the state the type variable is currently bound too is also correct. j j j j j j 4.4 Minimization of mDFA's De nition 4.4 (mDFA Equivalence) Let A and B be mDFA's. Then, A and B are de ned to be equiv- alent i ( ) = output(B ). output A Lemma 4.5 (mDFA Minimization Algorithm) Let A be an mDFA with n states. There exists an algorithm running in O(n log n) and O(n) space outputing an mDFA B such that A and B are equivalent and B is minimal. That is, for all mDFAs C , if A and C are equivalent and C has m states, then n  m. Proof Sketch: An sDFA is usually minimized by calculating the equivalence relation or equivalence classes of states [8].3 Once the equivalence classes of an sDFA A are known, it is straightforward to create a minimal equivalent sDFA B using a homomorphism mapping each equivalence class of A to a distinct state in B . The rst approximation of the equivalence classes is a partitionning of the states according to whether they are labeled nal or not. This approximation is re ned further by splitting equivalence classes according to the states various inputs bring them to. If there are n states in A , at most n splits are necessary, and there are many sDFA minimization algorithms running in O(n2) time. Most of these sDFA minimization algorithms also run in O(n) space.4 Adapting these algorithms to mDFA's simply consists of replacing the original partitionning with one according to the mDFA labels. There may be as many sets in the partition as there are type constructors and base types, as opposed to merely two with sDFA's. This partitionning is exactly that produced by checking 0-distinguishability.5 Since this is the criteria used for proving the correctness of the original partitionning in sDFA algorithms, the correctness proof for the sDFA algorithm can be trivially transformed into a correctness proof for the mDFA algorithm. This gives many algorithms for transforming an mDFA A into a minimal equivalent mDFA B in O(n2) time and O(n) space. Hopcroft's O(n log n) time algorithm can be transformed in the same fashion. j j 3 A notable exception is the algorithm in [3] which calculates the di erence relation from which it is trivial to calculate the equivalence relation. 4 The algorithm in [3] is an exception again since it uses O (n2 ) space to calculate the distinguishability relation. 5 A and B are k -distinguishable if there is a string of length k that is a counter example to A being equivalent to B . 6 4.5 Reconstruction of Recursive Types from mDFA's Once  has been converted into an mDFA A and minimized to get another mDFA B , we may want to reconstruct a minimal tree representation of a type  which is equivalent to  . Lemma 4.6 (Reconstruction of Recursive Types from mDFA's) Let A be an mDFA with n states. There exists an algorithm outputing a recursive type  such that A and  are equivalent,  is minimal, and the algorithm runs in O(j j log n) and O(j j) space. Proof Sketch: The tree structure of  can be generated by a depth rst traversal of B allowing nodes to be visited more than once. An environment containing all ancestors in the traversal is passed down the traversal to detect back references. Subtrees and the set of all states referenced is passed back up the tree. At each state visited, if the state is an ancestor of itself, the state is converted into a bound variable and passed back up the tree with the singleton set of that state being the only referenced state. If the state is not an ancestor of itself and has a base type label, the base type is passed back with an empty set of referenced states. If the state is not an ancestor of itself and has a type constructor label, its children are generated and their sets of free references are unioned together. The type constructor is applied to the child subtrees. If the state is in the set of free references, a type binding is added and it is removed from the set of free references. Both the type and set of free references are then passed back. All set operations except union take O(log B ) time. Union takes O( B ) time, so O( B ) is spent at each node of . The entire process takes O(  B ) time. In terms of  , this is O(  2) time.  will have the same in nite tree structure as  , but each in nite branch will be terminated at the rst node equivalent to a parent. The tree structure of  is provably minimal type since earlier termination implies di erent structure from  and later termination is not minimal. The  bindings in  are determined by the nodes at which  is terminated: the  bindings present are exactly the bindings used. j j jj j j j j j j j j 4.6 Graph Properties It is interesting to consider some properties of the graph induced by the mDFA edges. We mention some without proof but they should be intuitive. If a state can reach itself, the corresponding type is recursive. If two states can reach each other, their corresponding types are mutually recursive. If one state can not reach a second state, then the type corresponding to the rst state may be analyzed independently of the type corresponding to the second.    5 Hash-Consing Historically, hash consing is a technique originally used in LISP to avoid duplication of lists. In LISP, list structures are only created by the cons operation. By modifying cons with the help of hashing techniques, no two invocations of cons would ever return distinct copies of the same data. An early example of this technique is presented in [2]. While limiting the ability to modify lists generated in such a manner, this technique allows greater space eÆciency and constant time equality checking [1]. We use similar ideas to represent a set of types S . The equivalence class c( ) of a type  is the set of all types equivalent to  ; a set C of types is an equivalence class if C = c( ) for some type  . We think of S as a nite collection of equivalence classes, where each equivalence class is uniquely represented by a canonical member of the class and accessed through a unique handle. Internally, S is represented by A mapping from base type identi ers to handles (natural numbers). Given a total order of base types, this mapping accesible in O(S ) time. Typically there will be a xed number of base types so this will be accessible in O(1) time.  7  A mapping from tuples of type constructors and the handles of their children to handles. Given a total order of type constructors, there is a total order of these tuples and this mapping is accessible in O (log S ) time. A mapping from canonically ordered mDFA's to handles. These mDFA's have a slight modi cation from the mDFA's discussed in 4.2: base type identi ers in labels are replaced with hash-cons handles. Since the states of the mDFA's have a canonical order, the mDFA's can be given a meaningful total order. For an mDFA A, this mapping is accessible in O( A log S ) time. Semantically, each mDFA maps to the handle of its rst state and successive states are mapped to successive handles. A mapping inverting the union of the previous mappings. This mapping can be sorted by handle and accessed in O(log S ) time. j  j j  j j j j j 5.1 Hash-Consed Recursive Types When hash-consing recursive types, two modi cations from the syntax given in 2.1 are used. First, handles of hash-consed types are added to the allowed syntax for recursive types. Second, free type variables are replaced with extra base types to allow uniform treatment of type variables in 5.4. 5.2 Hash-Consing Base Types Lemma 5.1 There is an algorithm to hash-cons base types in O(log jS j) time. If the set of base types is nite and xed, there is an algorithm to hash-cons base types in O(1) time. Proof Sketch: Base types are a trivial case. Since base types are unrelated to other types, the data structures of S can be updated in O(log S ) time. If the set of base types is xed and the base types are hash-consed ahead of time, this improves to O(1) time. j j 5.3 Hash-Consing Type Constructor Applications Lemma 5.2 Let  be a recursive type without any type bindings or bound type variables. There is an algorithm to hash-cons  in O(j j log jS j) time. Proof Sketch: can be hash-consed in a bottom up manner using O(log S ) time per subtree. The total time is ( log S ).  O j j j j j j 5.4 Hash-Consing Recursive Types Lemma 5.3 Let  be a recursive type. Let h be the last hash-cons handle6 mentionned in  . Let m be the number of states of the mDFA refered to by h or 0 if h is not recursive or there are no hash-cons handles in  . There is an algorithm to hash-cons  in O (j j2 + j jm + j j log jS j) time. Proof Sketch: It is useful to separate the subtrees of  which have no free variables and consider them independently. When calculating sets of free variables in the traditional bottom up fashion, any subtree of  with no free variables can be considered independently of the rest of  . These subtrees can be hash-consed and replaced with their handle leaving a smaller but equivalent version of  to examine. 6 Hash-cons handles are ordered in the same way as natural numbers and allocated squentially in increasing order. 8 Suppose  is such a subtree of  . If 's children have no free variables, then  can be hash-consed normally as discussed in 5.2 and 5.3. Otherwise,  is a recursive type and the subtrees of  that have not been hash-consed represent a set of mutually recursive types. This is easily veri ed by induction over the structure of  by noting that each such subtree has at least one free variable which must refer to an ancestor in the tree.  is converted into an mDFA7 A and minimized with a canonical ordering to get mDFA B . All states in B not labeled with handles correspond to types that are mutually recursive with each other. However, it is not known yet whether states remaining to be hash-consed are mutually recursive with any previously hash-consed states.8 Checking all states of hash-consed mDFA's represented by handles in  is prohibitively expensive: there are O( B ) possible handles, O( B ) states that need to be checked for each possible form of mutual recursion, and the size m of the largest mDFA represented by a handle is arbitrarily large. In practice, these checks should be prunable but this is still O( B 2 m) time. By noting that only the mDFA represented by the greatest handle in  can be mutually recursive with the states of B ,9 this can be brought down to O( B m) time. If  is mutually recursive with a previously hash-consed state, then the certi cate (a successful traversal mapping  to previously hash-cons handles) gives the handle of . Otherwise hash-consing  and B 's states remaining to be hash-consed in O(  log S ) time. Once  is separated from  , O(  m +  log S ) time is spent hash-consing . Separating  into subtrees takes O(  2) time and the total time hash-consing  is O(  2 +  m +  log S ). When  has no hash-cons handles, this is O(  2 +  log S ) and O(  ) space. j j j j j j j j j j j j j j j j j j j j j j j j j j j j j j j j j j j j Theorem 5.4 (Recursive Type Hash-Consing Algorithm) Let  be a recursive type with no references to hash-consed types. There is an algorithm to hash-cons  in O(j j2 + j j log jS j) time and O(j j) space. Proof Sketch: This theorem follows from lemmas 5.1, 5.1, and 5.3. 5.5 Hashing Hashing techniques have been ignored so far but can be used to speed up the average case of various lookups. [7] reports great success with using hashing in their hash-consing scheme. If the base types and type constructors are xed before hand, the mappings from hash-cons entry to handle can be split according to base type or type constructor and the correct mapping can be chosen in constant time. 6 Conclusion We have presented an eÆcient system for managing types. Key features of this system include asymptotically optimal space usage and constant time type equality tests. While the component ideas of hash-consing and representing recursive types as DFA's have both been used before, we believe that their combination to support recursive types while hash-consing is new. An earlier form of this system has been implemented in SML. All changes since this implementation have been both simpli cations and algorithmic improvements. Later, we plan to adapt this system for use in the Church Project SML compiler. 7  is converted into a variant of mDFA's using hash-cons handles in place of base type identi ers 8 A recursive type may be de ned in terms of a previous de nition of itself. 9 The manner in which handles are assigned to mDFA's embeds the partial order induced by mDFA 9 dependencies. 7 Acknowledgements This report was written to summarize some work done in a Directed Study with my advisor, Assaf Kfoury. The original motivation for work on this problem was given by Joe Wells. Allyn Dimock was helpful in explaining the context of the Church compiler into which this work will be integrated. A DFA Canonical Ordering De nition A.1 (DFA Canonical Ordering) A function f : DF A ! DF A gives a canonical ordering to mDFA's i forall mDFAs A and B , A and B are equivalent i applies to all DFA variants we have discussed. ( ) and f (B ) are identical.10 This de nition f A Lemma A.2 (DFA Canonical Ordering Algorithm) Given an mDFA A with n states, there is an algorithm running in O(n2 ) time and O(n) space outputing B such that B is minimal and B = f (A) for some canonical ordering f . Proof Sketch: We present an algorithm to minimize a DFA to produce an minimum isomorphic DFA with a canonical state ordering. The algorithm given is for standard DFA's, but is applicable to all DFA variants we have mentionned. The algorithm constructs a canonically ordered DFA in a manner similar to the way it is minimized - the main distinction from DFA minimization is that the partitions approximating the equivalence relation are ordered internally. First, the sets of the original partition are totally ordered, placing the set of nal sets before the set of non- nal sets. As sets are split according to the previous partitionning, the subsets are ordered according to the sets that split them. Details of this splitting and ordering process follow. Approximations of the equivalence relation are represented as lists of sets of states. Given such a list L, the n states are given a partial ordering by mapping all states in the ith set in L to i. States are compared by comparing their associated values. The mapping can be created in O(n) time and accessed in O(1) time using an array indexed by state number. An element S of L is split if the members of S do not have their children mapped to the same sets. That is, each element s of S is characterized by the values its children are mapped to. The characterization of an element s can be determined in O(1) time since there is a constant upper bound on the number of children. Checking whether S needs to be split (i.e. checking whether some of the characterizations are di erent) can be done in O( S ) time. If S needs to be split, it can be split in O(n) time using pigeon-hole sort. When s is split, the subsets are ordered according to the children that split them. Splits based on di erent children are done separately in a consistent order. The total number of approximations is at most n since there are at most n non-isomorphic states and each approximation must add at least one split. The total time checking an approximation for splits is O(n) time so the total time checking for splits is O(n2). The total time performing splits is O(n2) since there are at most n splits and each split takes O(n) time. Therefore, the total time is O(n2). The proof of this algorithm's correctness is similar to that of the corresponding minimization which does not order the sets. The kth approximation separates states that are k-distinguishable. It is trivial to verify that the states are ordered by the rst string distinguishing them (note states distinguished by shorter strings are already ordered).11 j j References [1] John Allen. Anatomy of LISP. McGraw-Hill Book Company, New York, 1978. 10 Here, 11 Here, we say two DFAs are identical if their internal representations including labeling are identical. we order strings primarily by length and secondarily by lexicographic order. 10 [2] E. Goto. Monocopy and Associative Algorithms in an Extended Lisp. University of Tokyo, Japan, May 1974. [3] John E. Hopcroft, Je rey D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley Publishing Company, Inc., Reading, Massachuesetts, 1979. [4] Dexter Kozen, Jens Palsberg, Michael I. Schwartzbach. EÆcient Recursive Subtyping. In Mathematical Structures in Computer Science, 5(1):113-125, 1995. [5] Laurent Mauborgne. Representation of Sets of Trees for Abstract Interpretation. http://www.di.ens.fr/ mauborgn/t.ps.gz, November 1999. [6] Laurent Mauborgne. Improving the Representation of In nite Trees to Deal with Sets of Trees. to appear in ESOP 2000. [7] Zhong Shao, Christopher League, and Stefan Monnier. Implementing Typed Intermediate Languages. In Proc. 1998 ACM SIGPLAN International Conference on Functional Programming (ICFP'98), Baltimore, Maryland, pages 313-323, September 1998. [8] Bruce W. Watson. A taxonomy of nite automata minimization algorithms. Computing Science Report 93/44, Faculty of Mathematics and Computing Science, Eindhoven University of Technology, Eindhoven, The Netherlands, 1993. 11