Binary Decision Diagrams

Fabio SOMENZI
Department of Electrical and Computer Engineering
University of Colorado at Boulder

Abstract

We review Binary Decision Diagrams, presenting the properties and algorithms that are most relevant to their application to the verification of sequential systems.

1 Introduction

Many tasks in the design and verification of digital systems manipulate large propositional formulae. It is therefore important to have efficient ways of representing and manipulating such formulae. In recent times Binary Decision Diagrams (BDDs) have emerged as the representation of choice for many applications. Though BDDs are relatively old [39, 1], it was the work of Bryant [8] that attracted the attention and renewed the interest of many researchers. Bryant observed that reduced, ordered, binary decision diagrams are a canonical representation of boolean functions. The import of this observation can be seen in the fact that nowadays reduction and ordering are usually implied when referring to BDDs. Following the established use, we refer to reduced ordered binary decision diagrams simply as BDDs.

Canonicity reduces the semantic notion of equivalence to the syntactic notion of isomorphism. It is the source of both efficiency and ease of use for BDDs. On the one hand, canonicity enhances the effectiveness of memoization techniques. On the other hand, it makes the test for equivalence inexpensive. Canonicity also has one important drawback: it is the prime reason why BDDs are less concise than circuits in general. There are families of functions for which conjunctive normal forms are exponentially more concise than BDDs, and vice versa. The same happens when comparing some diagrammatic representations. It would therefore be a mistake to use BDDs indiscriminately whenever a representation for boolean functions is required. It would be equally mistaken to dismiss BDDs on the grounds that multipliers have provably exponential BDDs [9]. The best results in many applications come from a flexible approach that combines the strengths of several representations (e.g., [14]).

The application that led to the initial development of BDDs is switch-level simulation of MOS circuits [7]. Since then, their use has proliferated to the point that an exhaustive list of all the problems to which decision diagrams have been applied is not feasible here. We limit ourselves to mentioning some of the more popular and significant applications of BDDs and their derivatives. Verification of the correctness of hardware relies on BDDs for both the representation of the circuits and the manipulation of sets of states. Model checking algorithms based on this implicit representation of sets have successfully verified properties of systems with very large numbers of states (10^20 or more) [12, 44, 6]. In the optimization of logic circuits, BDDs are used, among other things, to represent "don't care" conditions [56], and to translate boolean functions into circuits based on a specific implementation technology [42, 13, 25]. Testing and optimization of sequential circuits also benefit from the use of BDDs [18]. Various forms of decision diagrams can be applied to the solution of large systems of linear equations, in particular those found in the analysis of large Markov chains [30], and to graph algorithms like maximum flow in a network [31].

The rest of this chapter is organized as follows.
In Section 2 we present BDDs informally. After introducing notation and background in Section 3, we give a definition of BDDs in Section 4. In Section 5 we discuss the main algorithms used to manipulate BDDs. Section 6 is devoted to the problem of variable ordering. Section 7 discusses the efficient implementation of BDD packages. Section 8 surveys the application of BDDs to the verification of sequential hardware. Section 9 summarizes and concludes.

2 From Decision Trees to Decision Diagrams

2.1 BDDs Are Reduced Decision Trees

The left part of Figure 1 shows an ordered decision tree for the disjunction of two variables a and b. The node at the top, labeled f, is the function node. The elliptical nodes labeled with variable names are the internal nodes, and the rectangular nodes at the bottom are the terminal nodes. The terminal nodes are labeled by either 1 (signifying true) or 0 (signifying false).

Figure 1: Decision tree and decision diagram for the disjunction of a and b.

Evaluation of f for a given valuation of a and b consists of following a path from the function node to a terminal node. The label of the terminal node is the value sought. At each internal node the path follows the solid arc if the variable labeling the node evaluates to 1, and the dashed arc otherwise. The solid arc is called the then arc, while the dashed arc is called the else arc. The order in which the variables appear is the same along all paths; in this case a comes before b. (The smallest element of the order is the variable closest to the function node.) The right part of Figure 1 shows the Binary Decision Diagram for the same order. It is obtained from the corresponding decision tree by a process called reduction.

Definition 1 Reduction consists of the application of the following two rules, starting from the decision tree and continuing until neither rule can be applied.

1. If two nodes are terminal and have the same label, or are internal and have the same children, they are merged.

2. If an internal node has identical children, it is removed from the graph and its incoming arcs are redirected to the child.

In the case of Figure 1, reduction proceeds by merging the three terminal nodes labeled 1. As a consequence, the children of one of the internal nodes labeled b become the same node. This causes the application of the second rule, which produces the graph on the right of Figure 1. No further application of either rule is possible and reduction terminates. The result of reduction depends only on the function to which it is applied and on the order of the variables; it is independent of the order in which the rules are applied.

2.2 Shared BDDs and Strong Canonical Form

In practice one does not build the decision tree and then reduce it. Rather, BDDs are created starting from the BDDs for the constants and the variables by application of the usual boolean connectives, and they are kept reduced at all times. At the same time, several functions are represented by one multi-rooted diagram. Indeed, each node of a BDD has a function associated with it. If we have several functions, they will have subfunctions in common. For instance, if we have two functions f and g with common subfunctions, we represent them as in Figure 2.

Figure 2: Shared BDDs.

As a special case, two equivalent functions are represented by the same BDD (not just two identical BDDs). This approach therefore makes the equivalence check a constant-time operation. Its implementation is based on a dictionary of all the BDD nodes in existence in an application. This dictionary is called the unique table.
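To make the mechanism concrete, the following Python sketch shows a toy unique table. The representation (integer handles into a node list, and a dictionary keyed by the triple variable, else child, then child) and all of the names are illustrative assumptions, not the interface of any real BDD package; complement arcs are ignored, so both terminal nodes are kept. The discipline enforced when a node is requested is exactly the one described in the next paragraphs.

class Bdd:
    """Toy shared-BDD manager: all nodes live in one table, kept reduced at all times."""
    def __init__(self):
        self.nodes = [('0',), ('1',)]   # handles 0 and 1 are the two terminal nodes
        self.unique = {}                # unique table: (var, else_child, then_child) -> handle

    def mk(self, var, lo, hi):
        """Return the node testing var, with else child lo and then child hi."""
        if lo == hi:                    # second reduction rule: never create a redundant test
            return lo
        key = (var, lo, hi)
        if key not in self.unique:      # first reduction rule: never duplicate a node
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def var(self, i):
        """BDD of the projection function for the variable at level i."""
        return self.mk(i, 0, 1)

b = Bdd()
x = b.var(0)
assert b.var(0) == x        # equal functions receive the same handle,
                            # so the equivalence check is a single comparison
assert b.mk(1, x, x) == x   # a node with identical children is never created

In a real package the handle would be a pointer into the node store, and the equivalence test would literally be a pointer comparison.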
Operations that build BDDs start from the bottom (the constant functions) and proceed up to the function nodes. Whenever an operation needs to add a node to a BDD that it is building, it knows already the two nodes (say,  and  ) that are going to be the new node’s children, and the variable (say,  ) that is  . If indeed the two children are the same, no going to label it. Therefore the operation first checks if   , it looks up new node needs to be created. (This is in accordance to the second reduction rule.) If  the unique table for the existence of the triple     , that is, for a node with the desired characteristics. Only if such node is not found, it is created (in accordance to the first reduction rule). The representation we have just outlined is called strong canonical form and is quite common in software packages that manipulate BDDs, in which equivalence is tested by a simple pointer comparison.            2.3 Attributed Arcs   The BDDs for and  are very similar. The only difference being the values of the leaves, which are interchanged. This suggests the possibility of actually using the same subgraph to represent both and  .  3    Suppose a BDD represents . If we are interested in   , it is then sufficient to remember that the function we have in the multi-rooted graph is the complement of  . This can be accomplished by attaching an attribute to the arc pointing to the top node of . An arc with the complement attribute is called a complement arc. The arcs without the attributes are called regular arcs. The use of complement arcs slightly complicates the manipulation of BDDs, but has three advantages. Obviously, it decreases the memory requirements. 1 The second, more important consequence of using complement arcs is the fact that complementation can be done in constant time—the BDD is already in place—and checking two functions for one being the complement of the other also takes constant time. Finally, inexpensive complementation allows an implementor to use De Morgan’s laws freely. For instance, once an algorithm for conjunctive decomposition of a function has been coded, disjunctive decomposition can be obtained with negligible overhead by complementing the conjunctive decomposition of the complement. Note that with complement arcs we need only one constant function (we choose 1) and hence only one leaf in the multi-rooted DAG. In Section 5 we shall see that in order to preserve the canonicity of the representation, only the arcs out of function nodes and the else arcs of internal nodes can be complement arcs. In the figures, therefore, a dotted line unambiguously designates a complement arc. Also, in drawings we shall normally align all the nodes labeled by the same variable, and indicate the variable only once at the left. This leaves the interior of the node free for a unique identifier, to which the examples will often refer. The attribute mechanism is quite general: Other attributes have been used for other purposes [47, 36].  3 Preliminaries Binary Decision Diagrams represent boolean functions. In this section we introduce the notation we use for boolean formulae and recall the basic result that we need for BDD manipulation. Given a boolean algebra , the boolean formulae on the variables     are obtained by recursively applying negation, conjunction,  and disjunction to the elements of and to the variables. We denote conjunction by ‘  ’, disjunction by ‘ ’,       is a formula. When and negation by ‘ ’. Unless otherwise noted, we assume  . 
Thus,     no confusion arises we write    instead of    and we drop parentheses assuming that negation has the highest precedence, and conjunction binds more strongly than disjunction. Formulae designate functions in the usual way. Definition 2 Let        be a boolean function. Then      are the positive and negative cofactors of                                with respect to   .    The cofactors are given several different names in the literature, e.g., restrictions. If    then depends on   and   is said to belong to the support of . The cofactors of with respect to   do not depend on   . Cofactors commute, that is:             1 We are assuming that the overhead for storing the attributes is negligible. This indeed the case because it is customary to store the complementation flag in the otherwise unused least significant bit of a pointer. The maximum decrease in memory requirements is by a factor of 2, but in practice this limit is seldom approached. 4 and they distribute over negation, conjunction, and disjunction:                              The following result, known as expansion theorem, is due to Boole and forms the basis for most algorithms that manipulate BDDs. Theorem 1 Let         Example 1 Suppose    be a boolean function. Then                                 (1)  . Then              4 BDDs and Their Canonicity  Definition 3 A BDD is a directed acyclic graph       representing a vector of boolean functions  . The nodes are partitioned into three subsets.  is the set of the internal nodes. The outdegree of   is 2. Every node  has a label    in the support of . 1 is the terminal node: Its outdegree is 0. is the set of function nodes: The outdegree of  is 1 and its indegree is 0. The function nodes are in one-to-one  correspondence with the components of . The outgoing arcs of function nodes may have the complement attribute. The two outgoing arcs for a node  are labeled  and  , respectively. The  arc may have the complement attribute. We use           to indicate an internal node and its two outgoing arcs.  The variables in the support of are ordered: If   is a descendant of   (     ), then         .  The function vector represented by a BDD is defined as follows:  1. The function of the terminal node is the constant function 1. 2. The function of a regular arc is the function of the head node; the function of a complement arc is the complement of the function of the head node. 3. The function of a node  (   ). is given by          "!   , where "! (  ) is the function of    4. The function of #$ is the function of its outgoing arc. The distinction between the function of a node and the function of an arc allows us to deal with attributed arcs in a natural way. Theorem 2 BDDs are canonical (the representation of a vector of boolean functions variable ordering) if: 5  is unique for a given 1. There are no distinct internal nodes   and     .  2. For every node, !  such that        ,       , and     . 3. All internal nodes are descendants of some node in .    Proof. For this proof it is convenient to reverse the established use of having the smallest variable of the  order at the top of the BDD. We suppose therefore that the variables in are       with    closest to the function node. The proof is by induction on . If , is a vector of constant functions whose representation is clearly unique because the BDD contains only one terminal node. 
The constant 1 is represented as a regular arc pointing to the terminal node, and the constant 0 is represented as a complement  and all functions of      have unique arc pointing to the terminal node. Suppose now that   representations as BDDs. Let be a component of . We can write         . Both   and  are uniquely determined by and   and have unique representations. Suppose does not depend on  . Then the representation of is the representation of   , which is unique (and identical to the representation of  ). Indeed the second condition of the theorem prevents the representation of from containing a node labeled   . Suppose now that does depend on   . We need to distinguish the case in which the outgoing arc of the function node for  is regular from the case in which it is complemented. Suppose the arc is regular. Then the representation of must consist of a regular arc pointing to a node labeled   with children representing   and  . Suppose the arc is complemented. Then the representation of must consist of a complement arc pointing to a node labeled   with children representing    and   . Hence, in both cases it is unique by the induction hypothesis and the first condition of the theorem. The last condition  of the theorem simply guarantees that the representation of consists solely of the representations of its components, which are unique.                                  In the following, we only consider BDDs that conform to the requirements of Theorem 2. Note that the restriction that the  arc must be regular is imposed to guarantee canonicity. Suppose we did not impose the restriction. Then from     we would have two distinct ways to represent                  in terms of its cofactors:                 (2) Instead, we use this equivalence to choose the one form in which the then arc is regular. Evaluation of a function represented according to Definition 3 amounts to counting the number of complement arcs on the path from the function node to the terminal node. An even number of complement arcs signals that the function evaluates to 1. A consequence of the requirement that the then arcs be regular is that the value of a function when all variables evaluate to 1 can be determined by inspection of the arc out of the function node, since no other complement arc can be encountered along the path. This simple observation leads to the following interpretation of the restriction of the then arcs. We represent directly only half of the functions of variables: Those that evaluate to 1 when all variables evaluate to 1.   6 5 Basic Manipulation of BDDs 5.1 Conjunction of BDDs and Related Operations The usual way of generating new BDDs is to combine existing BDDs with connectives like conjunction (AND), disjunction (OR), and symmetric difference (EX-OR). As a starting point one takes the simple BDDs 2 We are therefore interested in an for the functions    , for all the variables in the functions of interest.    , where algorithm that, given BDDs for and  , will build the BDD for is a binary connective (a boolean function of two arguments). The basic idea comes—not surprisingly—from the Theorem 1, since:                     So, if  is the top variable of and  , we can first cofactor the two functions with respect to  and solve two simpler problems recursively, and then create a node labeled  that points to the results of the two subproblems (if such a node does not exist yet; otherwise we just return the existing node). 
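As an illustration, here is a minimal Python sketch of this recursion for the case of conjunction, built on the toy manager sketched in Section 2.2. It is only a sketch under simplifying assumptions: complement arcs are ignored, the helper names are invented, and the computed table introduced in Section 5.1.1 is omitted; how the cofactors are obtained and which terminal cases apply is explained in the text that follows.

INF = float('inf')

def top_var(bdd, f):
    return bdd.nodes[f][0] if f > 1 else INF   # terminal nodes carry no variable

def cofactors(bdd, f, v):
    if top_var(bdd, f) != v:                   # f does not depend on v
        return f, f
    _, lo, hi = bdd.nodes[f]
    return lo, hi                              # else and then children of the top node

def bdd_and(bdd, f, g):
    if f == 0 or g == 0: return 0              # terminal cases
    if f == 1: return g
    if g == 1: return f
    if f == g: return f
    v = min(top_var(bdd, f), top_var(bdd, g))  # top variable of f and g
    f0, f1 = cofactors(bdd, f, v)
    g0, g1 = cofactors(bdd, g, v)
    return bdd.mk(v, bdd_and(bdd, f0, g0),     # reduction rules applied by mk
                     bdd_and(bdd, f1, g1))

b = Bdd()
x, y = b.var(0), b.var(1)
assert bdd_and(b, x, y) == bdd_and(b, y, x)    # canonical result: same handle either way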
3 , that Finding the cofactors of and  with respect to  is easy: If does not depend on  ,    is, the cofactors are the function itself. If, on the other hand,  is the top variable of , the two cofactors are the two children of the top node of . Similarly for  .        5.1.1 Conjunction     The algorithm that takes ,  , and as arguments and returns  is called Apply in the literature. We now examine in detail the special case that computes the conjunction of two BDDs. The pseudo-code is shown in Figure 3. As mentioned in Section 2.3, the application of De Morgan’s laws incurs negligible overhead thanks to the complement arcs. Therefore all boolean functions of two operands can be computed efficiently given procedures that compute the conjunction and the symmetric difference of two operands. This is indeed the approach followed in many packages because the specialized algorithms are more efficient than the general one. The algorithm of Figure 3 employs two major data structures. One is the unique table discussed in Section 2.2. Its purpose is to guarantee the canonicity of the BDDs. The second data structure is the computed table, which stores results of previous computations so that repeated computations may be avoided. The computed table is often referred to as the cache. We will not use this name to avoid confusion with the hardware cache of the computed on which the BDD application runs. The terminal cases for the recursion are: case     result 0 0          0  is called a projection function. The function  Here for simplicity we ignore the treatment of complement arcs. It is almost always the case that once an algorithm has been devised to operate on BDDs without complement arcs, it can be easily extended to operate on BDDs with complement arcs by applying Equation 2. 2 3 7 AND     if (terminal case)  return result;   else if (computed table has entry   )  return result;  else   let  be the top variable of   ; AND     ;  AND       ; if ( equals  )  ; else  = findOrAddUniqueTable   insertComputedTable     ; return  ;       ;   Figure 3: Pseudo-code for the conjunction algorithm. All these conditions can be tested in constant time. If none of them prevails the computed table is consulted. Two observations are in order here. First, the results of trivial problems—the terminal cases—are not stored in the computed table, because memory is a precious resource which should not wasted to record results that can be recomputed in about the same time that it would take to look them up. (Or less, in the case of a hardware cache miss caused by the memory access necessary to read the computed table.) Second, the  lookup is for the set   . In other words, the order of the operands is immaterial because conjunction is commutative. If the computed table lookup fails to return a result, the procedure proceeds to apply the expansion theorem (Equation 1) and solve the problem recursively. It then applies the reduction rules to the partial result by making sure that no nodes with two identical children and no duplicates of existing nodes are ever created. Finally, the result is stored in the computed table and is returned to the caller. The details of handling complement arcs are not shown in Figure 3, but are analogous to those discussed in the proof of Theorem 2.  Example 2 Consider   and  of Figure 4. The computation of  AND          AND AND      AND      AND      proceeds as follows. AND     unique table lookup    the procedure looks up the unique table. 
In this case the node is Notice that before creating a node already part of . Therefore, of the three nodes of  (two internal plus the terminal node—the function node is not usually included in the node count) only one needs to be created.  8 f g h a b q c r d t p u s 1 Figure 4: BDDs for Example 2. f g h a b c p q t r d s 1 Figure 5: BDDs for Example 3. 9 u  Example 3 Consider and  of Figure 5. These functions are intended to demonstrate the use of the computed table. There are two paths in from the root to node  . As a result there are two recursive calls with operands  and  . The second benefits from the cached result of the first. The computation of    proceeds as follows. AND       AND  AND        In this case three nodes are created:      AND        AND   AND         AND  AND     AND  AND   AND         is cached here cache lookup   ,     AND  AND          AND AND    AND    , and   . The time complexity of procedure AND is readily established by observing that each pair of nodes    with  in and  in  is examined at most four times thanks to the computed table. (The factor of four is due to possible paths with different complementation parity reaching  and  .) Hence, the number of recursive  calls is less than or equal to    , where   is the number of nodes in the BDD for . Since all steps of the procedure, except the recursive calls, take time bounded by a constant, we have established the following result.      Theorem 3 Procedure AND runs in time         . Though the bound is tight, it is seldom achieved in practice if the order of the variables is reasonable. It is important to observe that without the computed table, the runtime of procedure AND would grow exponentially with the number of variables. On the other hand, one may question the practicality of storing all intermediate results of very large computations. Indeed, in most implementations the computed table has fixed maximum capacity, which depends on the available memory and the computation at hand, and is managed as a hash-based cache. This implies that each new result is stored in an entry of the table determined by computing a hash function. Various replacement policies can be used to determine what results, if any, are evicted from the table to make room for the new one. In the simplest scheme, each entry holds exactly one result, and every conflict result in the eviction of the older result. This simple scheme works well provided enough memory is available, and provided hashing works well. (Something not to be taken for granted when the computed table has several million entries.) In summary, the worst case performance of practical implementations of AND is exponential in the number of variables, but the exponential behavior is seldom observed. Similar remarks apply to most algorithms that operate on BDDs. Almost invariably they use a computed table, and have runtimes that are linear in the size of each operand if a lossless computed table is used. It should be noted also that in many implementations the computed table is not flushed once a top-level call to AND terminates. This proves especially advantageous in some model checking experiments. 10 5.1.2 The If-Then-Else Operator The approach based on the expansion theorem can be applied also to the if-then-else operator (ite)—a ternary operator defined as follows:       ite        where   are three boolean functions. One interesting property of the ITE operator is that all two argument operators can be expressed in terms of it. For instance,    ite    . 
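Continuing the toy sketch of the previous sections (again without complement arcs or a computed table, and with invented names), ite can be written with the same expansion-based recursion; the last line illustrates how disjunction is obtained from it.

def bdd_ite(bdd, f, g, h):
    if f == 1: return g                        # terminal cases
    if f == 0: return h
    if g == h: return g
    if g == 1 and h == 0: return f
    v = min(top_var(bdd, u) for u in (f, g, h))
    f0, f1 = cofactors(bdd, f, v)
    g0, g1 = cofactors(bdd, g, v)
    h0, h1 = cofactors(bdd, h, v)
    return bdd.mk(v, bdd_ite(bdd, f0, g0, h0),
                     bdd_ite(bdd, f1, g1, h1))

b = Bdd()
x, y = b.var(0), b.var(1)
assert bdd_ite(b, x, 1, y) == bdd_ite(b, y, 1, x)   # f + g computed as ite(f, 1, g)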
A detailed analysis of ite can be found in [5].   5.2 Satisfiability Problems with BDDs The satisfiability problem is the problem of deciding whether there is an assignment to the variables of a function that makes the function 1. We have seen that this problem is trivial when the function is given as a BDD, because it is sufficient to check whether the BDD is the constant 0 function and that takes constant time. A related, equally simple problem, is tautology. Checking whether a function is a tautology also takes constant time4 . In this section we consider related problems, namely: Finding one satisfying assignment; Counting the number of satisfying assignments; Finding one minimum-cost satisfying assignment (this is also known as binate covering problem). All these problems have in common the correspondence between satisfying assignments and paths in the BDD that go from a function node to the constant node. With complement arcs, all paths lead to the unique constant node and what determines the value associated with a path is its complementation parity. If a path contains an even number of complement arcs then its associated value is 1, otherwise it is 0. Multiple assignments may correspond to the same path. This occurs whenever one or more variables do not appear along the path. The path actually identifies a cube of the function, that is the conjunction of some variables or their complements. The variables that appear in the cube are those encountered along the path. A variable appears complemented in the cube if and only if the path goes through an else arc coming out of a node labeled with that variable. 5.2.1 Finding One Satisfying Assignment This problem corresponds to finding a path with even complementation parity. We present in Figure 6 a recursive algorithm, though a non recursive algorithm is obviously possible. The algorithm actually computes a cube in the ON-set of the function. Procedure OneSat is called with three arguments. The first ( ) is the  top node of the BDD for which a satisfying assignment is sought. The second ( ) represents the complementation parity of the path. It is initially 1 if the outgoing arc of the function node is regular and 0 if it is complemented. Finally, the third argument ( ) is an array of cells ( being the number of variables), initialized to all don’t cares. The analysis of the performance of this algorithm is based on the observation that every internal node of the BDD has paths to the terminal node with both parities: Otherwise, the node would correspond to a constant function, which contradicts canonicity. Therefore, the procedure may only backtrack at the nodes for which the first child examined is the terminal node. Backtrack occurs in such a case if the parity is odd 4 Clearly, we assume that the BDD has been built already—a non negligible assumption. 11   OneSat   if ( is terminal node) return ;    ;   ) return 1; if (OneSat     ;     is complemented) complement ; if (  ;  return OneSat           Figure 6: Algorithm to find one satisfying assignment.   SatHowMany   if ( is terminal node) return ; if ( is in table) return result from table; ;  countT = SatHowMany  ;  countE = SatHowMany    is complemented) countE = if (  count = (countT countE)/2; insert ( , count) in table; return count;      countE;  Figure 7: Algorithm to compute the number of satisfying assignments.   ). However, in that case the procedure is guaranteed to succeed for the other child. Hence the total (  . 
number of nodes visited is at most and the complexity is   5.2.2 Counting the Number of Satisfying Assignments This problems amounts to computing the number of minterms in the ON-set of a function. The result   changes with the number of variables on which the function depends. For instance,     has one satisfying assignment, whereas          has two. Therefore, the computation of the number of satisfying assignments takes the number of variables as one of its arguments. The algorithm we outline here is based on the post-order traversal of the BDD. For every node (arc) we compute the number ( ) of satisfying assignments for the function represented by that node (arc). We assume that the number of variables is , the index of a function node is 0 and the index of the terminal   node is . If  is the terminal node . For a regular arc connecting node  to node , ,  whereas, for a complement arc, . Letting  and  be the two arcs emanating from an internal node  , we have:              !    This can be seen by noting that the two terms of the expansion with respect to   are disjoint and the cofactors do not depend on   . Figure 7 gives the pseudo-code of a procedure based on these remarks. The two parameters are the top node of the BDD and the number of variables. From the efficiency point of view,   time, whereas the table makes the complexity notice that without a table the procedure would take   12  , assuming additions and multiplications can be done in constant time.5 The table is not normally saved  across top level calls to this procedure. We need a top level procedure—not shown here—to perform the initialization of the table and also to take care of the possible complement attribute of the function pointer. 5.2.3 Finding One Minimum-Cost Satisfying Assignment  In this problem variable   has a non-negative cost  associated with it. The cost of an assignment is the sum of the costs of all the variables set to 1. For instance,      is a satisfying assignment for                  . Its cost is  . This assignment is not of minimum cost unless , because       is     also a satisfying assignment of cost .  When is represented by a BDD , the following result [40] allows the solution of the problem in time  linear in the size of .   arc is 0 and the length of a  arc out of a node labeled  Definition 4 The length of an  Theorem 4 Let a solution to  be a BDD for     .       is  . Then the minimum cost assignment to       that is      is given by the shortest path of even parity connecting the root of  to the terminal node. Proof. Every path of even parity from the root to the terminal node represents a set of satisfying assignments. By taking all the variables that do not appear along the path as 0, one obtains a minimal assignment corresponding to the path. For this assignment, the cost is equal to the length of the path. (If some costs are 0, there may be several minimal assignments.) Let  be a minimal assignment corresponding to a shortest path  of  . We prove that  is minimum. Suppose there is another assignment   such that cost(  ) cost(  ).       corresponds to a path  from the root of to the terminal node. Then cost(  )  length(  )  length( ) = cost( ), a contradiction.   Note that the length of the shortest path does not depend on the variable ordering chosen. Example 4 Consider the function                  The corresponding BDD for the ordering is shown in Figure 8. 
One can see that the path ,   , and (the path goes through the filled nodes) is the shortest path with even parity and that it identifies an assignment of cost 2 (assuming unit costs).  time. (See, Finding the shortest path in a DAG with bounded outdegree (2 in our case) can be done in  for instance, [20, p. 536].) The additional complication arising from the complement arcs does not increase the asymptotic complexity. 5 This assumption is not superfluous, if we consider the large numbers that may be involved. In practice, one would use floating point numbers or arbitrary precision integer arithmetic to prevent integer overflow. 64-bit floating-point arithmetic may give accuracy problems even for relatively few variables, and does not work for, say, 2000 variables. Arbitrary precision arithmetic violates the assumption that additions and multiplications can be done in constant time. Furthermore, storing an integer with hundreds of digits for each node may cause the computed table to grow too large. 13 f a b c d e 1 Figure 8: Example BDD for the minimum-cost assignment problem. 5.3 Cofactors, Function Composition, and Quantification The computation of cofactors is a key ingredient in many BDD-based applications. Other important operations on BDDs, like function composition, and universal and existential quantification, are defined in terms of cofactors. We shall first analyze the algorithms for cofactors with respect to single variables and with respect to cubes in Section 5.3.1. 5.3.1 Cofactors with Respect to Single Variables and Cubes We know how to efficiently compute the cofactors of a BDD with respect to its top variable—they are just the two children of the top node—and with respect to a variable that does not appear in the BDD. In this last case the cofactors coincide with the function. We use this knowledge to solve the general case. To this end,  suppose we want to compute —the cofactor of with respect to —where is an arbitrary function and   is a cube. Let be the index of the top variable of and  be the index of the top variable of . We shall    also write  , where  is either  or   and is a cube, possibly the function 1. We first notice that  if or is constant, then . This provides the termination condition. Assuming and not constant, we consider the three following cases.                  . In this case , and as a consequence, do not depend on the BDD from the root to the leaf.) Hence,         . (We assume that the indices grow in    In practice, we just have to move down along the BDD for and recur. . In this case we write: Hence, we cofactor          with respect to  , we move down along , and recur. 14  . In this case we use the commutativity of cofactors and write:                Hence, we need to recursively compute the cofactors of  them as then and else children to a node labeled  .   and   with respect to  and connect 5.3.2 Function Composition In function composition, we want to compute:                      . From the expansion theorem, where  is also a function of      we derive by substitution:     This implies that we can compose computing                    and  by finding the cofactors of ite  (3)   with respect to and   and then         We have seen the general method to compute the cofactors in Section 5.3.1. Here we present a more efficient procedure for function composition that only requires the computation of the cofactors with respect to the top variable of a BDD and its complement. 
As we saw, these cofactors are simply given by the two children of the top node. The algorithm for this case is indeed reminiscent of that for cofactors with respect to cubes, which we saw in Section 5.3.1. Let be the index of the top variable of . We have three mutually exclusive cases.   . In this case,  . In this case,   does not depend on   is the top variable of   and the result of the composition is simply .  and we can apply Equation 3 directly.   . In this case, we find the variable of least index between the top variable of   . Let   be that variable: We expand with respect to                                           and recur:              and the top variable of                                (Note that we used the commutativity of cofactors here.) 5.3.3 Quantification Given a boolean function respect to   as:        and the universal quantification of   of  variables, we define the existential quantification of        with respect to  as:           15  with Example 5 As an example, let us consider the following function: Suppose we want to quantify    If  and  to verify that            Hence,   in . The two cofactors are:                and   and          are considered as defined over the same -dimensional boolean space as , then it is easy             More specifically,  is the smallest function independent of function independent of   that is contained in . Quantification is monotonic, i.e.,               and   is the largest    that contains   This can be easily seen by writing:     whence:          and               from which the stated inequalities can be derived. Quantifications of the same type commute, i.e.,  and                                 The existential and universal quantifications with respect to a cube are therefore well-defined. However, quantifications of different types do not commute. Existential quantification distributes over disjunction and universal quantification distributes over conjunction, but not vice versa, unless and  have disjoint support.                                                        These properties can be easily verified by substituting the definitions. One can also easily verify the following properties, that are useful when quantifications are performed on BDDs with complement arcs.                  A recursive algorithm for the quantification of the variables of a cube in a function for the computation of cofactors. 16  is derived as the one  Example 6 We compute      , for             using the variable order                                                                              .     Existential quantification is very easy when the function is given in sum of product form. It is indeed  sufficient to suppress the quantified variables from each product term. If all variables are suppressed from a    term, the term becomes the constant 1. In the previous example,         . Likewise, universal quantification is easy when is given in product of sum form. By contrast, quantification of either type may increase the size of a BDD.   5.4 Cofactors with Respect to Functions    A set    of -variable boolean functions is an orthonormal set if it satisfies:       and       A common example of orthonormal set is      : It is used in Boole’s expansion theorem. Also, for every  function       , the set     is orthonormal.   Given an orthonormal set    , we can expand a function with respect to it in the form                The coefficients of the expansion are related to   Theorem 5 Let                    . Then Proof. 
Suppose     Conversely, if                  , then                                                 17  by the following result. an orthonormal set of -variable functions and if and only if    and          an -variable function. Then  This proves the theorem. It should be noted that this theorem does not define uniquely the coefficients of the expansion, but imposes restrictions on what they can be. In other words, the coefficients are incompletely specified. It is possible to prove that every function in the interval          satisfies the condition of the theorem for  . This can also be interpreted by saying that  has to agree with wherever  is 1 and is don’t care otherwise. We shall see in the rest of this section how to exploit this freedom to obtain compact representations of the coefficients.   5.4.1 The Constrain Operator We can now define the cofactors of a function with respect to another function, in terms of orthonormal expansions. Definition 5 Let  and  be two boolean functions and let             be an expansion of with respect to the orthonormal set  generalized cofactor of with respect to  .    . Then  (   ) is a positive (negative) The generalized cofactors are incompletely specified by their defining equality. Hence, the problem arises of finding the best cofactors according to some criterion. A common criterion is the size of the representations, that is, when BDDs are used, the number of nodes. . The problem is therefore to choose the values of As we saw, the values of  are fixed wherever    where  . Coudert et al. [21] introduced an operator on BDDs, called constrain that normally returns compact cofactors. The strategy of constrain is to map each minterm in the offset of  into a minterm of the  onset of  and use this map to define the values of  wherever  . The map depends on  and on the following definition of distance.    Definition 6 Let      be variables with ordering  Let        is given by:    and       be two minterms. The distance between  and , written            ,  This definition of distance reflects the dissimilarity of the paths associated to the two minterms in the BDD with ordering      . We are now ready to define the constrain operator. Definition 7 For functions   constrained by  , written     if     if  and  , the function  where is the minterm such that      and    is minimum.  18  is defined by Notice that implies  . Therefore, is uniquely identified in Definition 6. The  algorithm for constrain visits recursively the BDDs for and  . The recursion is based on the following theorem:   Theorem 6 Let ,  , and   be boolean functions. Let          Then:       and                               Proof. We need to prove that  and   satisfy Theorem 5. We consider  , since the case for   is similar.                                                                         We choose  as the variable with the lowest index among those appearing in and  , so that we can apply cofactoring easily. Note that the straightforward expansion yields an expression that is unsuitable for the recursive formulation:               , because we are dealing with incompletely specified functions. We cannot simply say that   The recursion has several termination cases. If is constant, . If   we return 1 and if   we return 0. These are correct solutions in general and will be seen to be the only ones satisfying Definition 7. If  , then the only solution is  . If  and we are at the top-level of the recursion, any solution is acceptable. 
We choose to return 0, for reasons that will become apparent later. We  take care of avoiding recursive calls with  as follows.   Whenever  or   , we have identified a group of minterms of that need to be mapped.    first and then perform the mapping by taking . We compute We consider the case for      . Since the two cofactors are the same, we don’t create a new node and just return    .      Similarly, if  we return    .                Example 7 An example of application of constrain is illustrated in Figure 9.                               This example demonstrates some of the special cases. In particular,    the other branch. 19  and we return the result from f g h x p y r q t s z 1 Figure 9: BDDs for Example 7. f g h t p a b q c r d u s 1 Figure 10: BDDs for Example 8. 20 c x y z 1 Figure 11: BDD for Example 8 For another example, let us consider        . and  of Figure 10. We have:                   It should be clear that the choices we have made in our algorithm are aimed at producing a result as small as possible. It remains to be proved that the algorithm we have outlined actually conforms to Definition 7. We shall outline a possible proof. We consider first the case when cause  . All minterms for which  to be 1 as well. Hence, minterm of Definition 7 must cause to be 1. This is why we return 1. The case for   is similar.   Suppose next that   at some point of the recursion. Let be the cube of the variables on which whose distance from the splitting has occurred up to that point. We have to find the minterms with    minterms in the cube    is minimum. We first notice that these minterms must be in   . This is because the weights in the definition of distance decrease exponentially from the root to the leaf. Hence, there is   a minterm in   closer to each minterm in    than every other minterm in the rest of the BDD. Take        . If     is such that  , then  . Otherwise, will be the minterm such that  closest to  . But this is exactly what our algorithm does.  Figure 11 shows a BDD for the cube    . It illustrates the typical structure of the BDD of any cube: It is a single string of nodes. All internal nodes except the one at the bottom have one arc pointing to 0 (pointing to 1 with odd parity). Because of this, the constrain algorithm always returns the result of one branch and it ends up computing exactly the ordinary cofactor. The effects of the use of the computed table on the complexity of constrain are the same as for AND. We complete our discussion of constrain by discussing a problem it has and a possible optimization.  has fewer nodes than the BDD for , sometimes the reverse is true. Though in general the BDD for This most often occurs when the BDD for  is large and depends on many variables does not depend on.  , causing an undesirable growth of the BDD. This inconvenience These variables may be introduced in can sometimes be avoided in a rather inexpensive way by existentially quantifying the undesired variables from  (we shall see how in Section 5.4.2), but the resulting operator loses one property that is important            21     restrict   if ( or is constant) return ;  if ( ) return 1;  if (  ) return 0;  if ( ) return 0;   let  be the top variable of  ;   if ( ) return restrict  ;   ; if (  ) return restrict  if ( is not the top variable of )  return restrict ite  ;   restrict   return ite  restrict   ; Figure 12: Procedure restrict (without computed table). for FSM verification, namely, being an image restrictor. 
For this reason, the ‘un-optimized version’ is also of interest and we have presented it here. 5.4.2 The Restrict Operator  In Section 5.4.1 we noted that the constrain operator may introduce variables that do not appear in into  . We now see how we can use existential quantification to solve the problem. We need the following lemma.    Lemma 1 Let ,  , and  be -variable boolean functions and suppose          . Then  In other words, a cofactor computed with respect to  is also a cofactor with respect to  .  Proof. It is sufficient to observe that a function must agree with , whenever  with respect to  . This is certainly the case of , since  .    , in order to be a cofactor    . Hence, we can existentially Lemma 1 applies in particular to the constrain operator and to  quantify all the variables in  that do not appear in before computing  . Let  be the result of these  is a valid cofactor of quantifications. Then with respect to  . It is also possible to integrate the quantification into the computation of constrain. The resulting algorithm is called restrict in [22] and indicated by  . Whenever the top variable of  has a lower index than the top variable of , the procedure returns              The pseudo-code for restrict is given in Figure 12. Note that the call ite   quantifies  in .  Also, some implementations prefer to return a failure value if . This may help catch bugs in the program, but may sometimes be restrictive. Notice that restrict may actually produce smaller BDDs than the approach previously described (quantification followed by constrain). It is indeed possible to quantify variables, which only some cofactors of are independent of, from the corresponding cofactors of  . On the other hand, quantifications are performed one at the time and performance may be substantially slower, if constrain uses a cache table and restrict does not.  22 Example 9 We return now to the example of Figure 9 and we compute                                              .    One can see that in this case  has fewer nodes than  , because we avoided introducing a node  labeled in the positive cofactor with respect to  . In general, however, it may still be the case that has more nodes than . In that case, one may decide to return as the result of cofactoring.     5.4.3 Properties of Generalized Cofactors In the following we discuss properties of generalized cofactors that are useful in sequential verification. We use to indicate a generic operator.        and  be an operator satisfying: Lemma 2 Let     . Let    and                  (i.e., Proof. From operators are monotonic, we have:           Then:                        , and              is a generalized cofactor) we get    be boolean functions,  (4)       The reverse inequality follows directly from (4).   . Since the quantification  In this lemma, is a generalized cofactor operator that satisfies (4). We are obviously interested in seeing if the lemma applies to the cofactoring operators that we use. We recall that the constrain operator is defined in terms of a mapping: if          if         where is the minterm such that  rule. Then:  and  is minimum. Let  be the mapping defined by this            and   . In words,  does not depend on some of the variables in the support Let   of  . We denote by the restriction of the mapping to the variables in  . It is easily seen that:          and     must be minimal. With these preliminaries, we can now because the distance between        prove the following result.      
23    by the definitions of        and          Proof. Suppose    Theorem 7 Let     . Then:         be boolean functions,               , and         and        and of   . Hence     is the  sought in (4) and Lemma 2 applies. . Then:        We can use the fact that restrict is related to constrain to prove a result similar to Theorem 7 for the former. Specifically,          Where  is a function obtained by quantifying variables in some cofactors of  . From the definition of restrict,    , and the support of  is a subset of that of  . The following property of  is of particular interest.   Lemma 3 Let     ,    If        and    . Let     , then there exists    be boolean functions, be such that:                 Proof. Let   and         . Let     if if   such that:       Since  is obtained by existential quantification of some variables in some cofactors of  ,     implies that there is a subset of variables that were quantified along the path that includes . Let this subset     be       , let       , and let      . Notice that . If we cofactor  and  with respect to , the resulting functions,  are constants. For , this derives from the requirement that the quantification is performed only if the current cofactor of does not depend on the current variable. For  , it follows from the quantifications performed on it. Furthermore,  must be 1 for at least one minterm     and    would be 0. Hence, there is a minterm of , or   such that .                 Lemma 3 allows us to prove the analog of Theorem 7 for restrict.   Theorem 8 Let     . Then:      and     Proof. Let    be the function such that                           be boolean functions,                   Then, with argument similar to the proof of Theorem 7,   and 24               . Then,         , and Let that      . If          and      , we are done. Suppose   . Then, by Lemma 3, there exists  . Hence is the  sought in (4) and Lemma 2 applies.      such Notice that it is possible to build generalized cofactors that do not satisfy (4). Each operator has a distinct advantage over the other. On the one hand, constrain distributes over product, as shown by the following result [59], where denotes composition. Lemma 4 Let      ,   , and            be boolean functions. Then:     Proof.                             In particular, by taking     one obtains distributivity with respect to product and can write the following equation that we shall use in Section 8.3.                                              (5) Unfortunately, restrict does not enjoy the same property, due to the different mappings applied to each   For instance, for    ,     , and       , we have       , but           . Therefore we can write:                                                       .                  (6)       (7) However, we cannot simultaneously apply restrict to the terms of the product and eliminate it, as we do with constrain. On the other hand, constrain may introduce in  variables that are in  , but not in . This is likely to occur when  is a complicated expression. This is a problem, especially for the method of Section 8.5.1, that restrict does not have.    5.5 Minimization The constrain and restrict operators described the previous sections are designed to produce small resulting BDDs. For this reason they have been extensively used for the purpose of BDD minimization. Suppose we 25      and the BDD of are given an interval   and that we are asked for a function such that     , the care set of , and then  is as small as possible. 
The problem can be solved by computing computing itself as  or as  . Any operator that computes  could be used instead of constrain or restrict, and any function from   can be used instead of  .6 Normally restrict performs better than constrain in this capacity because it does not pollute the support of the result, but there are exceptions. Besides, neither constrain nor restrict guarantees that the BDD for is smaller than the BDD for  . Indeed, when the BDD for is considerably larger than the BDD for  , it is often the case that the “minimized” BDD is larger than the original one. In the terminology of [33], these  operators are not safe. The lack of safeness is easily explained. Suppose that in computing    a node  of  is visited twice—first in conjunction with node  of , and then in conjunction with node of . Since the constraints are different in the two recursive calls originating at   , different results may be returned. The result may therefore contain more than one node corresponding to   . The compaction algorithm of [33] avoids this problem by performing minimization in two steps. In the first step the BDDs are analyzed and the cases in which duplication may not occur are identified. In the second step, safe minimization is carried out according to the information gathered in the first step. The compaction algorithm does produce on average better results than either constrain or restrict, but it is considerably slower. Another approach to improving BDD minimization relies on extending the remapping mechanism of constrain. Consider the computation of             . Simplification of the result   occurs when either   or    . In such cases the result is either   or  . One can do better by  observing that if         then there are solutions  independent of  for  . One such solution    . The resulting procedure is more powerful is given by minimizing         with the care set   than restrict, but much more expensive. Approaches intermediate in power are often preferable. Details and assessment can be found in [58]. Our discussion has been in terms of a given function to be minimized, and a care set, so that we could easily relate minimization to constrain and restrict; however, minimization algorithms can be directly formulated in terms of the interval   .                      5.6 Approximation BDD approximation is the problem of deriving from a given BDD another BDD smaller in size, and whose function is at a low Hamming distance from the input BDD. (That is, differing from the input BDD in a  be the BDD produced by the application of approximation small number of input assignments.) Let algorithm to the BDD of . Usually, the function of the approximating BDD is required to be either a subset or a superset of the input function, that is,      or            For an underapproximation algorithm (such that ),     . Hence, we only discuss underapproximation. Underapproximation algorithms must trade off the size of the result for the distance from . The two trivial solutions, 0 and itself, are seldom useful. A natural way to rank different approximations is by their   and defined in [53] by: density, denoted by           (  is the number of minterms of  .) High density corresponds to a concise representation, and is therefore desirable. (For overapproximation, we want to maximize the density of the complement of the result.) 6  In the case of constrain, replacing with any function in   does not affect the result. 
26 The decision version of the underapproximation problem can be stated as follow: Given the BDD of a of the boolean function of variables for variable order , and integers and , find a function  same variables that is true for at least assignments to the variables, and such that its BDD for variable order has at most nodes. This problem is conjectured to be NP-complete. Hence, heuristic techniques are used to solve it. The algorithms proposed so far work by redirecting some arcs of the BDD for is such a way that the resulting BDD loses nodes and does not acquire any new satisfying assignment. The simplest approach redirects arcs from internal nodes to the constant 0 node. A more sophisticated approach allows redirection of an arc from one internal node to another, subject to a containment check. As a result of these transformations, some internal nodes of the BDD become unreachable from the root are are dropped. The various algorithms also differ in the order in which they select the arcs to be redirected. All algorithms examine the BDD for before they compute the approximation to gather information on the distribution of nodes and satisfying assignment in the graph. In this Linear time algorithms like Heavy-Branch Subsetting and Short-Path Subsetting [53], the approximation then proceeds on the basis of this information. Quadratic algorithms [57, 52] update the information during the approximation, and produce denser results  and a minimization algorithm at the cost of higher runtimes. Finally, given an approximation algorithm      , it is easy to see that is another approximation algorithm.              5.7 Decomposition  It is often convenient to express a function as either the conjunction or the disjunction of other functions.   If     , then   . Hence, we can concentrate on disjunctive decomposition without loss of generality. A simple approach to this problem is based on the expansion theorem, which can be applied to    obtain , with      and     . The choice of the splitting variable   influences the quality of the result. A simple yet effective procedure to estimate the size of the decomposition was proposed in [15]. Another approach to decomposition is based on the recursive application of the following formula:                        This decomposition is shown to be canonical for a given variable order in [45], which also discusses several interesting properties of constrain and of the decomposition itself. 6 Variable Ordering The size of the BDD for a given function depends on the order chosen for the variables. There are functions—sums of products where each product has support disjoint from the others—for which some orders lead to BDDs that are linear in the number of variables, whereas other orders give numbers of nodes that are exponential in the number of variables. The same occurs for the BDDs of adders. At the other end of the spectrum there are functions for which all possible BDDs are provably exponential in size and functions whose BDDs are linear for all orderings. Though clearly one would not want to choose a worst case ordering for an adder, the importance of the ordering problem cannot be argued from the existence of functions exhibiting such an extreme behavior; equally well, the ordering problem cannot be dismissed simply because there are functions that are insensitive to the ordering. Indeed, even simple heuristics easily avoid the worst case behavior for adders. 
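The order sensitivity of sums of products with disjoint supports, mentioned above, can be reproduced with the toy sketches of the previous sections (the specific function and all names are illustrative). The same function, a disjunction of three two-variable products, is built once with the variables of each product adjacent in the order and once with them spread apart.

def disjoint_sum(bdd, pairs):
    """BDD of the disjunction of the products x_i * x_j over the given (i, j) pairs."""
    f = 0
    for i, j in pairs:
        prod = bdd_and(bdd, bdd.var(i), bdd.var(j))
        f = bdd_ite(bdd, f, 1, prod)              # f OR prod
    return f

good = Bdd()
bad = Bdd()
# the two variables of each product adjacent in the order ...
g = disjoint_sum(good, [(0, 1), (2, 3), (4, 5)])
# ... versus separated by the variables of the other products
h = disjoint_sum(bad, [(0, 3), (1, 4), (2, 5)])
print(bdd_size(good, g), bdd_size(bad, h))        # 6 versus 14 internal nodes

With n products the adjacent order yields 2n internal nodes, while the spread-out order yields 2^(n+1) - 2, so the gap grows exponentially with the number of products.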
The practical relevance of the variable ordering problem rests on the existence of functions lacking a well 27 understood structure, for which different orders—all derived by plausible methods—give widely different results. In Section 6.1 we consider the problem of finding an optimum order. The exact algorithms can be used for the design of various types of CMOS gates, where an order of the input variables that minimizes the transistor count is sought and the number of variables is small. The complexity of the problem is such that the exact algorithms cannot be applied to medium and large-size functions. However, their analysis will illustrate some features of BDDs with respect to variable ordering and will suggest techniques that allow one to iteratively improve an existing BDD by changing the variable order. These techniques will be the subject of Section 6.2. Additional insight is provided by some results—some of general validity and some applicable to special classes of functions—that characterize optimal orders and that will be presented in Section 6.5. Finally, Section 6.6 will cover heuristic algorithms that are used to build a BDD starting from a circuit description. 6.1 Exact Ordering Algorithms The simplest exact method to find an optimum variable order is to try all possible orders. For variables there are different orders and the cost of building a BDD is exponential in in the worst case. Hence, a   brute-force approach requires  time. This time can be improved by observing that two permutations of the variables that have a suffix in common will generate two BDDs with identical lower parts.    Suppose is a boolean function of variables     . Let be the set of the variable indices. An order of     is a permutation of . The BDD for under order is denoted  . Let be a subset of and  the set of orders whose last   members belong to , by BDD that is,                         . The variables We shall refer to the variables whose indices are in as to the bottom variables of whose indices are in will be the top variables. Also, for an order and a variable  , we define  as the number of nodes labeled   in BDD  . The following lemma [26] expresses the fact that  given a horizontal cut in a BDD for that separates top variables from bottom variables, the order of the top variables does not influence the part of the BDD below the cut. (Likewise, the order of the bottom variables does not affect the part of the BDD above the cut.)         Lemma 5 Let that for each   ,      satisfying   and   a variable such that   . we have      . Then there is a constant such  Proof. Note that is the position in the order corresponding to the first bottom variable. We can cofactor with respect to any combination of values (0 and 1) assigned to the top variables. Since cofactors with respect to variables commute, the resulting function is independent of the order of the top variables and  . Repeating this argument for all possible assignments to the top is therefore the same for all  variables, we see that for each unique cofactor of with respect to an assignment to the top variables there must be a node (possibly the constant node) in the lower part of the BDD that is the root for that cofactor or its complement. This node is unique, because the BDD is canonical. The number of unique cofactors does not depend on the order of the top and the bottom variables, but only on and . Of the unique cofactors, some will depend on   . 
Again, the number of those that depend on   only depends on and , but not on the relative order of the bottom variables. For every cofactor that depends on   there will be exactly one node labeled   , because   is the first of the bottom variables, so that it must label the root node if it appears at all. Hence, the number of unique cofactors with respect to all different assignments to the top variables     28  that depend on     gives the constant and the theorem is proved. Lemma 5 has many applications in the study of the size of BDDs. We now see how it is applied to the optimum ordering problem. If the optimal sizes of the lower parts of the BDD for all possible choices of of size are known, we can derive the optimal size of the lower part of the BDD for a given of size as follows [32]. We consider each variable whose index is in as first bottom variable. For   as first   . The minimum size for the lower part of the BDD with  first bottom variable, we consider  bottom variable is the minimum size previously computed for  plus the number of nodes labeled   . Note that Lemma 5 guarantees that this number does not depend on the specific order of the variables in  and hence it is sufficient to save the size of the best order of  . (And the order itself, of course.) The algorithm proceeds then by progressively increasing . The size of the lower part of the BDD for  is known to be 1 (the constant node). The algorithm then works its way up, until the best overall order is found. It remains to discuss how the numbers of nodes labeled with the first bottom variable are computed. One method is to build a BDD for for any order that is optimal for  and that has   as first bottom variable. The number of nodes labeled by   can then be counted by inspecting the BDD. Computing the BDDs for all from scratch and storing them would be wasteful; hence we keep just one BDD and obtain all the desired orders by permuting the variables in that BDD. In general, any permutation can be decomposed in a sequence of pairwise interchanges of variables. However, for our algorithm, all we need is to swap two variables at the time. We start from an arbitrary   and by pairwise interchange we obtain the remaining order  . We build BDD BDDs each such that having a different variable at the bottom. In this way, we have taken care of all the         . We then consider all the subsets of size 2. Let          . We locate the two BDDs  with   and  as bottom variables (they correspond to     and     , respectively). If   is the bottom variable and   is already next to last, it is sufficient to count the nodes labeled   . However, if   is higher in the BDD and the next to last variable is  , we swap  and  before counting. The process is repeated with  as bottom variable and the best result is kept as representative of . The other BDD is freed.  For   we need to consider   subsets. For each of them, we have to perform up to swaps.   Example 10 As an example we consider finding an optimal order for    . The purpose of this example is rather to show how the algorithm works than to illustrate its efficiency. For the latter, larger functions are required.7 Suppose the initial order is      . The corresponding BDD is  of    . The Figure 13 and it represents both and . We represent this by writing      algorithm first creates the BDDs with ( ) and ( ) by swapping  with  and   with  in . This completes all the subsets of size 1. The costs are all the same, namely 2. 
(The constant node plus the one node labeled with the variable whose index is in .) Suppose the first subset  of size 2 that is considered is  . Also, suppose that  is the first variable considered as first bottom variable. The BDD  already has  right on top of   . Hence, it is sufficient to count the number    of nodes labeled  . Since there is only one, the minimum cost for is if  is on top. Next the algorithm tries   on top. To this end, it takes and swaps   and  . The resulting BDD is . The cost associated with this choice is again 3. Since this does not improve on the previous result and there are no other variables to try, the algorithm selects  and releases .                    7 Note that   , but  !  for "!# . 29             f1 f2 f3 x1 x1 x3 x3 x2 x3 x2 x2 x1 f4 f5 f6 x3 x2 x2 x3 x1 x2 x3 x3 x1 x1 x3 Figure 13: BDDs created by the exact ordering algorithm. The arcs with an open circle are regular else arcs. The arcs with a black dot are complement arcs. Dangling arcs point to the constant 1 node. 30      With similar procedure, the BDD for is found to be the best between and  . The latter  and is released. Finally, is found to be  . Having completed the is chosen, so that  generation of the BDDs for subsets of size 2, the BDDs for the subsets of size 1 that are not also used by a larger subset are released. This is the case of in our example. All the costs are again the same, namely 3. The last step involves no swaps, since the only variable not in  must be at the top. In this example there is a three-way tie that is broken in favor of the first solution found.           There are several observations that may speed up the algorithm, sometimes dramatically. First of all, if we ever get a BDD where there is exactly one node for each variable, we can immediately stop, because the order is optimal. In our example, the initial order was optimal; hence, we would have avoided the entire computation. As a second remark, some functions are symmetric in some of their variables. In our example, was symmetric in   and  . The presence of symmetric variables results in ties between BDDs where two or more symmetric variables are permuted. If some variables are known to be symmetric, some swaps may be avoided [35]. Finally, since the algorithm implicitly enumerates the variable orders, it is natural to think of a branch and bound approach [34]. The upper bound to the optimal size is given at any point in time by the size of the can smallest BDD (not just the size of its lower part) representing a subset . The lower bound for be computed as the cost associated to (the size of the lower part of the associated BDD) plus an estimate of the minimum number of nodes required to complete the BDD. If the number of nodes labeled with the nodes are required to produce a connected DAG with a single first bottom variable is , then at least root. Indeed with nodes it is possible to build a binary tree with leaves. Therefore, a lower bound to the cost of an order obtained from is the cost associated to plus . If this exceeds the current upper bound, and its associated BDD are dropped. It is also possible to process the BDD top-down instead of bottom up. In many cases this allows a more effective bounding [24]. The effectiveness of bounding also depends on the initial order. The better the order, the tighter the upper bound. How to determine a good initial order will be covered in Section 6.6.    
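The size measure that drives these exact algorithms can be made concrete with a small illustrative sketch. The Python fragment below is not the dynamic-programming algorithm just described: it counts the internal nodes of the BDD of a function under a given order by counting distinct subfunctions (no complement arcs), and finds an optimum order of a small function by plain enumeration of all n! orders.

from itertools import permutations

def bdd_size(fn, n, order):
    """Internal nodes of the BDD of fn (a function of a list of n bits) under 'order'."""
    # Truth table with order[0] as the most significant index bit.
    rows = []
    for i in range(2 ** n):
        vals = [0] * n
        for k in range(n):
            vals[order[k]] = (i >> (n - 1 - k)) & 1
        rows.append(fn(vals))
    level = {tuple(rows)}                 # distinct subfunctions at the current level
    size = 0
    for k in range(n):
        half = len(next(iter(level))) // 2
        next_level = set()
        for t in level:
            lo, hi = t[:half], t[half:]
            if lo != hi:                  # the subfunction depends on order[k]
                size += 1
            next_level.add(lo)
            next_level.add(hi)
        level = next_level
    return size

def best_order(fn, n):
    return min(permutations(range(n)), key=lambda p: bdd_size(fn, n, p))

# f = x0*x1 + x2*x3: a non-interleaved order needs one node per variable,
# while an interleaved order needs more.
f = lambda v: (v[0] & v[1]) | (v[2] & v[3])
print(bdd_size(f, 4, (0, 1, 2, 3)), bdd_size(f, 4, (0, 2, 1, 3)))   # 4 6
print(best_order(f, 4))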
6.2 Iterative Improvement Techniques

The exact algorithm of Section 6.1 implicitly explores all variable orders by swapping variables in the BDDs. This idea can be applied as a heuristic by limiting the number of permutations that are explored. This results in a local search scheme, whereby an initial order is repeatedly perturbed by rearranging groups of variables until some termination condition is met. Three aspects characterize the possible variants of this method: the rule for generating a new order from the given one; the criterion for accepting a new order; and the termination condition. Among the rules proposed for generating a new order (these rules define the neighborhood of an order) we find:

Pairwise interchange of adjacent variables [28];

All permutations of groups of k adjacent variables, usually for a small value of k [34];

Pairwise interchange of arbitrary variables;

Sifting, that is, the movement of each variable up and down in the order, until it reaches a local minimum [54];

Group sifting, that is, the movement of a group of variables up and down in the order, until it reaches a local minimum [50, 49].

All strategies can achieve any permutation of an initial order given enough time. The advantage of considering only adjacent variables lies in the lower cost of performing the swap, which boils down to local manipulation of the BDDs. The interchange of non-adjacent variables can be effected by a series of swaps of adjacent variables. Sifting normally obtains better results than methods based on pairwise interchange or on the permutations of fixed-size groups, because it performs a more systematic exploration of the search space. Even better results are obtained by group sifting. As for the acceptance and termination criteria, all known combinatorial search strategies can be applied, going from greedy to simulated annealing [4] and genetic algorithms [23]. In selecting a time-consuming strategy, one has to consider carefully whether the time spent in optimizing the size of the BDDs will be recovered in the successive manipulations.

6.3 Swapping Variables in Place

The iterative improvement methods of Section 6.2 are all based on successive swaps of adjacent variables. Each swap can be performed in time that is proportional to the number of nodes labeled by the two variables being swapped, if the data structures are properly chosen. Specifically, if the unique table is organized as a set of subtables (one for each variable), it is possible to access all the nodes labeled by a variable in time that does not depend on the total size of the BDDs being reordered, but only on the size of the subtable and the number of nodes stored in it. To make the swap time independent of the total size of the BDDs it is also necessary to perform the swapping "in place," that is, by modifying the given BDDs, instead of creating a modified copy of them. This is possible thanks to Lemma 5, which guarantees that the parts of the BDDs above and below the variables being swapped are unaffected by the swap.

Suppose variables x_i and x_j must be swapped, and suppose x_i precedes x_j in the order. The subtable corresponding to x_i is scanned and all nodes are considered in turn. Let f be the function associated with one such node; writing f_ab for the cofactor of f with respect to x_i = a and x_j = b, it can be written as:

    f = x_i·x_j·f_11 + x_i·¬x_j·f_10 + ¬x_i·x_j·f_01 + ¬x_i·¬x_j·f_00 .

Then swapping x_i and x_j corresponds to rearranging the terms of f:

    f = x_j·x_i·f_11 + x_j·¬x_i·f_01 + ¬x_j·x_i·f_10 + ¬x_j·¬x_i·f_00 .

Suppose, for simplicity, that f_11, f_10, f_01, and f_00 correspond to four distinct nodes.
Then the node labeled  can be relabeled ,  ,  ,   , and  can be reutilized without changes, while new nodes need to be created for   and   . Reutilization of the node labeled  is very important, because this node will be pointed in general by other nodes, or by the application. If a new node were used, the pointers to the old node would have to be searched for and changed. Thus, the operation would not be local to the subtables for  and . The nodes for  and  are no longer pointed by the node for after the swap. Once the table for  has been scanned entirely, all nodes in the table for that are no longer needed are freed. Such nodes are identified by associating a reference count to them. It is possible for some nodes in the table to retain        32 a non-zero reference count at the end of the process. These nodes have pointers from above  : They are moved to the table that now contains the nodes labeled . We now remove the assumption that  ,  ,   , and  correspond to distinct nodes. If neither child of the node for is labeled , then the node is moved to the other subtable; otherwise swapping proceeds   , no node for   will be created. If in addition, as described above. However, if, for instance,    , and    , then the swap will cause the node count to decrease by one. Each elementary swap operation may increase, decrease, or leave unchanged the node count. In the example just discussed, the decrease in node count is due to the elimination of a node with identical children. It is equally possible for a node to be saved as a result of increased sharing. Consider the two functions  and   . With the order   , they they do no share any node, whereas with the order share the one node labeled  . Hence, the total size decreases, even though the size of each function does not change.               6.4 Dynamic Reordering Techniques that iteratively improve variable orders are frequently employed “dynamically:” if the size of the BDDs grows past a given threshold during the execution of an operation, the operation itself is suspended and reordering is performed, usually by some variant of sifting. After reordering the operation that was interrupted is usually restarted. This dynamic approach has proved very effective in many applications, because it allows reordering to intervene when it is actually needed. On the other hand, in sequential verification it is not uncommon that most of the time is taken by reordering, and it appears that a certain amount of control on the process by the application is beneficial [38]. Writing the manipulation functions in such a way that they can be safely interrupted by reordering requires some care. The details depend on the implementation of the BDD package. 6.5 Some Results on Optimal Ordering When generating BDDs for the outputs of a circuit, we may use information on the structure of the circuit to find a good, sometimes optimal order for the variables. Consider the case of sums of products where each product term has support disjoint from the other terms. For instance:             For these functions every order that does not interleave the variables of two different product terms is optimal because there is exactly one node per variable. We shall refer to any such order as to a non-interleaved order. The reason why non-interleaved orders give linear-size BDDs for these sums of products is that the amount of information that needs to be remembered by a sequential processor that processes the inputs in the noninterleaved order is bound by a constant. 
This argument can be generalized by introducing the notion of width of a circuit. We regard a combina   . The nodes of the graph correspond to the gates of the circuit and the tional circuit as a DAG edges to the interconnections. A topological order  of  is a partial order such that:     $     There are many such orders in general. The width of  !           33       at level  of order  is defined as:            z1 z . . . . q y1 . . . ys . . . . x1 z q+1 z m . . . . . . xp xp+1 xn Figure 14: Illustration of the concept of width of a circuit. that is, the number of edges connecting nodes at level less than  to nodes at levels  and above. The width of with respect to  is:  and finally the width of is given by:    !      !   !     !   Through this definition, the width of a circuit is related to the amount of information that crosses a section of the circuit. The amount of information is related, in turn, to the number of wires that carry it. This is illustrated in Figure 14. Every level partitions the circuit in two. On the left in Figure 14 are all the gates and primary inputs with levels less than  . On the right are the gates and primary inputs with levels greater than or equal to  . Shown between the two sub-circuits are the connections whose number ( ) gives the width of the circuit at level  , for the chosen topological order.        Example 11 A simple circuit is shown in Figure 15. One can see that ! . ! ! However, it is possible to change the topological order so that the width is 4 and by decomposing the three input OR gate into two two-input OR gates, the width can be further reduced to 3. This example illustrates that the width depends on the circuit representation as well as on the topological order. A variable order consistent with this topological order is       It is intuitive that circuits with low width have small BDDs. With reference to Figure 14, suppose the    variables are ordered from   to  . Taking all cofactors of the function of the circuit with respect to  all input variables at levels  or higher (     ) must result in unique cofactors at most. Hence the  number of nodes labeled by  is less than or equal to . Let be a (total) order of the variables compatible  —the BDD with a topological order  8 . It is possible to prove a result [3] linking the size of    . for built with order —to !    Theorem 9 Let    be the DAG of a circuit with input variables,       , and  is a topological order of  , and is a variable order compatible with  , then:  8         Note that a topological order specifies a level for all input variables. 34       outputs. If a b 0 c 0 d 1 1 2 3 1 e F 2 1 4 2 f 2 g r=1 r=2 3 r=3 r=4 Figure 15: An example of a circuit with a topological order and the widths at the different levels. Notice that even with the best topological order the bound may be loose, especially for multiple-output functions, where no sharing among the different outputs is assumed. The interest of Theorem 9 is not in the ability to establish a tight bound for the size of a specific BDD. It can be used to prove that a family of circuits has polynomial-sized BDDs, if it is possible to prove an upper bound on the width of the circuits in the set9 . Non-interleaved orders are optimal for fanout-free circuits. The graph of a fanout-free circuit is a rooted tree. A non-interleaved order is obtained, for instance, by drawing the tree without intersecting arcs and then numbering the leaves—each corresponding to a variable—from left to right. 
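The width just defined is easy to compute for a given level assignment. The sketch below uses a hypothetical netlist representation: a list of wires (u, v) and a map from gates and primary inputs to their levels; the width at level r counts the wires crossing the cut between levels r-1 and r.

def width_profile(edges, level):
    """Width at each level r: wires (u, v) with level[u] < r <= level[v]."""
    levels = sorted(set(level.values()))
    return {r: sum(1 for u, v in edges if level[u] < r <= level[v]) for r in levels[1:]}

def width(edges, level):
    profile = width_profile(edges, level)
    return max(profile.values()) if profile else 0

# a AND b -> g1;  g1 OR c -> g2
edges = [("a", "g1"), ("b", "g1"), ("g1", "g2"), ("c", "g2")]
level = {"a": 1, "b": 1, "c": 1, "g1": 2, "g2": 3}
print(width_profile(edges, level), width(edges, level))    # {2: 3, 3: 2} 3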
6.6 Heuristic Algorithms In this section we consider the problem of finding a good order of the primary inputs, to build a BDD for a combinational circuit. The order may be used as starting point for the exact algorithm of Section 6.1 or the iterative improvement algorithms of Section 6.2. It may also be used directly, when building any BDD for the given function is the objective and the resulting BDD is not too large. In all cases, the quality of the orders found affects the time required or even the ability to achieve the desired result. However, since the methods we shall consider here are intended to work on fairly large circuits, they are rather simple procedures, inspired by the results of Section 6.5 and by other heuristics that we shall introduce. We shall consider single output functions first, and then we shall deal with the case of multiple outputs. 6.6.1 Single-Output Functions We begin by considering a circuit in sum-of-product form. A good choice for the variable to appear at the top of the BDD is the most binate variable, i.e., the variable that explicitly appears in most product terms. This choice minimizes the total number of product terms in the two cofactors. If we assume the number of product terms as a measure of the complexity of a function, then simplifying the cofactors, i.e., the two children of the top node, is likely to yield a smaller BDD. One can also see the most binate variable as the one having the largest effect on the value of the function. A simple, yet effective, method consists therefore of 9 Specifically, the width should grow at most logarithmically. 35 ordering the variables from the most binate to the least binate. An alternative to this method is to transform the two-level circuit into multiple-level and then apply the methods discussed next. For a fanout-free circuit composed of AND, OR, and NOT gates, a BDD with exactly one node per variable can be built by ordering the variables with a simple depth-first search of the circuit graph (a rooted tree in this case)10 . The order of the variables is given by the order in which the corresponding leaves of the tree are visited. Assume for simplicity that the tree is binary. It is easily seen that the depth-first search will reach all the leaves in one sub-tree rooted at the top node of the tree, before reaching any leaf in the other sub-tree. The recursive application of this argument shows that the order produced by depth-first search is non-interleaved. For circuits that contain fanout, it is still possible to use depth-first search to order the variables. This will have the effect of approximating a non-interleaved order, though no guarantee of optimality will be made. Furthermore, the order in which the children of a node are visited is no longer immaterial. We present two criteria to decide the order in which the children of a node are visited. The first criterion tends to place the input variables that fan out closer to the top of the BDD than those variables that do not fan out. More precisely, the children of each node are divided in three sets. The first set contains the input variables that fan out and that are not in the order yet.11 These variables are immediately inserted in the order. The second set contains the internal lines and the third set contains the input variables that do not fan out. The variables in the third set are collected in a list and inserted in the order only after all the recursive calls for the internal lines have been completed. 
The variables that do not fan out are inserted after the last variables that fans out in that part of the tree. Placing the variables that fan out closer to the top of the BDD can be seen as the analogous in the multiple-level context of the most binate variable heuristic. The second criterion to rank the children of a node is given by the depth of the subgraph rooted at that node. In the case of the carry out of an adder, visiting the deepest subgraph first will order the inputs from the least significant to the most significant bit. Visiting the most shallow subgraph first will result in ordering the inputs in the reverse order. This suggests that the method may work well also for less structured circuits. Indeed one may observe that visiting the subgraphs in depth order is related to finding a topological order of the circuit graph with a low width. An alternative approach to ordering the inputs of a circuit is also based on the idea of putting at the top of the BDD the variables that influence the values of the output most. To this purpose, a weight is assigned to each net in the circuit. The output is assigned a weight of 1, while the other nets are assigned weights according to the following rules. Each input of a gate with inputs receives of the weight of the gate output; a fanout stem receives the sum of the weights of its branches. These rules are applied until all inputs have weights assigned to them. The heaviest input is then chosen as the first variable of the order. The chosen input is then removed from the circuit. (I.e., it is assigned a noncontrolling value; for an exclusive OR, the assigned value is immaterial.) The weights are then recomputed for the simplified circuits and the second variable of the order is thus identified. The process continues until all inputs are ordered. Example 12 An example of weight assignment is shown in Figure 16. Assuming ties are broken in favor 10 NAND and NOR gates pose no problem. However, if the circuit contains also exclusive OR and exclusive NOR gates, there may be more than one node per variable and the exact number of nodes may depend on the particular non-interleaved order chosen. This is because exclusive OR and NOR gates have no controlling values. 11 These variables may have been visited during the search of another node. Hence, they may already be in the order. 36 1/16 a b 1/16 c d 1/8 1/8 1 1/8 1/4 3 1/8 1/4 2 5 1/2 1/8 e F 1/4 4 1/8 7 1 1/4 f g 1/2 6 1/4 Figure 16: Example of weight assignment.  of the input that comes first in alphabetic order, the first input selected is . The complete order is:      This method does not directly address the main concern of the ones based on depth-first search, namely, keeping together the inputs that affect the same part of the circuit. However, when an input is removed from the circuit, the weights of its neighbors increase, thus increasing the likelihood that they will be selected next. One may verify that on a fanout-free circuit this method produces one of the depth-first orders with the depths of the subtrees used as tie breakers. 6.6.2 Multiple-Output Functions The usual method to order the inputs of multiple output functions is to order the support of each output in turn. The problem is thus split into finding a good order for the outputs and finding a good order for the inputs. We have seen methods for the latter and hence we briefly discuss here the problem of ordering the outputs. Once again, we take the adder as a paradigm. 
If we want the most significant bits of the inputs at the top of the BDD, we need to consider the most significant output first. This corresponds to considering first the outputs with the deepest subtree. This heuristic based on depth is related, as when applied to input ordering, to the width of the circuit graph. An alternative approach orders the outputs so as to minimize the number of variables that are added to the support when adding a new output to those already ordered. Let    be the outputs of the circuit for which a variable order is sought. Let  be the support of  , i.e., the set of variables on which  depends. Then one may try to minimize                An exact solution to this problem is computationally expensive, and hence a greedy strategy is normally applied. One such strategy tries all the -tuples of outputs as first outputs of the order and then completes the rest of the order by choosing each time the locally best output, i.e., the one whose addition causes the least increase in the size of the support. Once the orders for two or more outputs are given, they must be merged to yield an single order for all the outputs. The simplest approach is initialize the total order to the order of the first output and then append the variables of the successive outputs that do not appear yet in the order. This approach is often ineffective,   37            as argued in [27], because it unnecessarily separates variables that the orders of the individual outputs put   and have been together. Consider the case of two outputs for which the orders   determined. The combined order based on appending yields . However, the order    is equally good for the first output (which does not depend on ) and is presumably better for the second output. This observation is at the basis of the interleaving algorithm of [27].       7 Implementation Issues 7.1 Memory Management Consider building the BDD for the function of a single output circuit. The typical approach is to start by building BDDs for all the input variables of the circuit. One then computes the BDDs for the outputs of the gates fed by primary inputs only, and so on, until the BDD for the primary output is built. In the process, one computes the BDDs for many intermediate functions that are no longer of interest, once the result is obtained. One would like to release the memory used by those BDDs, but there are two problems. First, some subgraphs may be shared by more than one function and we must be sure that none of those functions is of interest any longer, before releasing the associated memory. Second, BDD nodes are pointed from the unique table and the computed table, as well as from other BDD nodes. There are therefore multiple threads and one cannot arbitrarily free a node without taking care of all the threads going through it 12 . A solution to these two problems is garbage collection. Garbage collection consists of deferring the release of memory until there is enough memory to be released to make the required effort worthwhile. A conceptually simple scheme for garbage collection is based on keeping a reference count for each internal and terminal node. This is a count of how many BDD nodes (internal and function nodes) point to an internal node. This count is incremented any time a new arc points to the node (for instance, during the execution of AND) and is decremented when nodes are freed. When a node is freed, its reference count is decremented. 
If the decrement results in a count of 0, then the reference counts of the children are decremented. The process is recursive. A node with a reference count of 0 is called dead. When there are enough dead nodes, garbage collection is started. During garbage collection, the unique and computed tables are scanned and all entries pointing to dead nodes are cleared. The dead nodes are disposed of at this point. The cost of scanning the tables is amortized over a relatively high number of dead nodes. When garbage collection is invoked, if the collision lists are too long, one may increase the size of the tables and rehash all the entries that do not point to dead nodes. One advantage of garbage collection is that a dead node may be resuscitated if a look-up in the computed or unique tables returns a dead node. Had the dead node been freed immediately, the re-computation would have been necessary. In this way, however, a better utilization of the allocated memory is possible. A reclaim operation is performed when a look-up returns a dead node. This operation undoes what the freeing of the node had done. The reference count of the node and of its children is incremented and if the children were also dead, the process is repeated recursively. Both freeing and reclaiming update the count of dead nodes. The memory layout for a 32-bit machine is described in Figure 17. The computed table—not shown—is just a table pointing to some of the nodes. One should note that the collision list is implemented by an extra pointer in the node data structure. This integration of the unique-table with the DAG minimizes memory occupation and access time simultaneously. The reference count and the variable index are shown sharing a 12 Locating the entry of the unique table pointing to a node just requires a hash-table look-up; on the other hand, deciding which entries, if any, of the computed table point to a node require keeping an extra pointer or scanning the table. Both solutions are less than optimal. 38 ref index then else next ref index ref index then then else else next next unique-table ref index then else next Figure 17: The memory picture. 39 single computer word. Assuming each field takes two bytes, we have limitations on the number of variables and the maximum reference count. The limit of 65535 variables is normally not serious, since in general one gets into trouble with much fewer variables for other reasons13 . On the other hand, it is quite possible for the constant node to exceed a reference count of 65535. The solution to this problem—besides the obvious solution of allocating one four-byte word for each field—is based on saturating increments. If a reference count reaches 65535, then it is no longer changed. This may cause a dead node to be considered non-dead, but in practice the nodes that reach saturation are very few and hence the consequences are negligible. 14 All things considered, the memory occupation is about 18 bytes/node on a 32-bit machine. This number is based on a load factor of 2 for the unique table (the average length of the collision list) and is so divided: 4 words for the node itself (see Figure 17); one half of the hash table bucket (2 bytes). To this one must add the memory requirements for the computed table, which vary considerably with the application. It is thus conceivable to store several million nodes on a machine with a large physical memory15 . 
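A stripped-down Python sketch of the reference-count bookkeeping described above follows. It is deliberately simplified (no reduction rule, no computed table, no saturating counts), but it shows the three ingredients of the scheme: freeing only marks nodes dead, dead nodes are disposed of in bulk, and a dead node found again in the unique table is reclaimed instead of being rebuilt.

class Node:
    def __init__(self, var, then, else_):
        self.var, self.then, self.else_ = var, then, else_
        self.ref = 1                        # the caller's reference

class Manager:
    GC_THRESHOLD = 1000                     # collect once this many nodes are dead

    def __init__(self):
        self.unique = {}                    # (var, id(then), id(else)) -> Node
        self.dead = 0

    def deref(self, n):
        """Release one reference; a node whose count reaches 0 becomes dead."""
        if not isinstance(n, Node):
            return                          # terminals are never collected here
        n.ref -= 1
        if n.ref == 0:
            self.dead += 1
            self.deref(n.then)
            self.deref(n.else_)

    def reclaim(self, n):
        """Acquire one reference, resuscitating the node if it was dead."""
        if not isinstance(n, Node):
            return
        n.ref += 1
        if n.ref == 1:                      # the node was dead
            self.dead -= 1
            self.reclaim(n.then)
            self.reclaim(n.else_)

    def find_or_add(self, var, then, else_):
        key = (var, id(then), id(else_))
        n = self.unique.get(key)
        if n is not None:
            self.reclaim(n)                 # a dead node found in the table is reused
            return n
        n = Node(var, then, else_)
        self.reclaim(then)
        self.reclaim(else_)
        self.unique[key] = n
        if self.dead > self.GC_THRESHOLD:
            self.collect_garbage()
        return n

    def collect_garbage(self):
        """Dispose of all dead nodes in one sweep of the unique table."""
        self.unique = {k: n for k, n in self.unique.items() if n.ref > 0}
        self.dead = 0

mgr = Manager()
ZERO, ONE = "0", "1"
x0 = mgr.find_or_add(0, ONE, ZERO)          # the function x0
g = mgr.find_or_add(1, x0, ZERO)            # x1*x0
mgr.deref(g)                                # g becomes dead (x0 is still referenced)
print(mgr.dead)                             # 1
h = mgr.find_or_add(1, x0, ZERO)            # found in the table: g is resuscitated
print(h is g, mgr.dead)                     # True 0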
7.2 Efficient Utilization of the Memory Hierarchy Modern computers rely on hierarchical organization to provide low average access time to large memories. The success of this approach is postulated on the locality of reference of programs. Unfortunately, the BDD algorithms that we have presented so far have poor locality, and therefore tend to incur large penalties in their accesses to memory. In practice, if a process that manipulates BDDs grows in size significantly beyond the available physical memory, its page fault rate normally becomes so severe as to prevent termination. One approach to increase locality of reference in BDD manipulation consists of using contiguous memory locations for all the nodes labeled by the same variables and processing all the nodes for one variable before moving to another variable [48, 2, 55, 61]. This approach is commonly referred to as breadth-first processing of BDDs, though it is properly levelized processing. The algorithm for levelized calculation of a generic boolean connective is shown in Figure 18. It consists of two passes: During the fist, top-down pass requests are generated for the pairs of nodes. These requests correspond to nodes in a non-reduced version of the result. The second, bottom-up pass performs reduction by forwarding some requests to others. The requests generated in the first pass double as entries of the computed table. In other words, the levelized approach features a built-in computed table that stores all intermediate results of a given call to bfOp. (Inter-operation caching of results is not covered by this mechanism.) The lossless computed table may be very memory-consuming for operations in which the non-reduced result is much larger than the reduced one. The extra memory required for the request queues may thus offset the advantages of the increased locality of access. Combining the levelized approach with the conventional recursive one can substantially alleviate the problem [61]. Additional speed-up techniques including superscalarity and pipelining are described in [55]. Reordering tends to destroy the segregation of nodes according to the variables. The typical approach is to restore such segregation after every reordering. The levelized approach to BDD manipulation is quite effective in reducing the number of page faults. When the processes fit in main memory the situation is more complex. On the one hand, improved locality 13 However, some schemes that assign variable indexes sparsely may conflict with such a limit. Sometimes, only 8 bits are reserved for the reference count, so as to free 8 bits for flags without increasing the size of the node. If this is done, care must be exercised to periodically adjust reference counts. Alternatively, an overflow table must be kept for those reference counts that become too large. The details of how this is done depend on the implementation. 15 Virtual memory is of limited help in these cases, as explained in Section 7.2. 14 40  bfOp(op, , )   if terminal case (op, , ) return result;  minIndex = minimum variable index of ( , );  create a R EQUEST ( , ) and insert in Q UEUE[minIndex]; /* Top-down A PPLY phase. 
*/   for (i = minIndex; i numVars; i ) bfApply(op,i); /* Bottom-up reduce phase */ for (i = numVars; i  minIndex; i ) bfReduce(i); return R EQUEST or the node to which it is forwarded;  bfApply(op,i)   is the variable with index i; while (R EQUEST Q UEUE[i] not empty)   R EQUEST( , ) = unprocessed request from R EQUEST Q UEUE [i];  if (not terminal case ((op,  ,  ),result))   nextI = minimum index of (  ,  );   result = find or add (  ,  ) in R EQUEST Q UEUE [nextI]; R EQUEST T HEN = result;  if (not terminal case ((op,   ,   ),result))   nextI = minimum index of (   ,   );   result = find or add (   ,   ) in R EQUEST Q UEUE[nextI]; R EQUEST E LSE = result;     bfReduce(i)   is the variable with index i; while (R EQUEST Q UEUE[i] not empty)   R EQUEST( , ) = unprocessed request from R EQUEST Q UEUE [i]; if (R EQUEST T HEN is forwarded to  ) R EQUEST T HEN =  ; if (R EQUEST E LSE is forwarded to  ) R EQUEST T HEN =  ; if (R EQUEST T HEN == R EQUEST E LSE ) forward R EQUEST to R EQUEST T HEN; else if ((R EQUEST T HEN,R EQUEST E LSE) found in U NIQUE TABLE[i]) forward R EQUEST to that node; else insert R EQUEST in U NIQUE TABLE[i];            Figure 18: Levelized BDD manipulation algorithm. 41 of reference should translate into increased cache hit rate. On the other hand, the overhead inherent to the two-pass approach is fairly sizable [43]. Experimental evidence suggests that the levelized approach is better than the recursive one when building BDDs for combinational circuits, but the opposite is true for sequential verification [60]. Improving memory locality for the conventional recursive approach is also possible. In [41] it is shown that the number of cache misses incurred while accessing the unique table may be substantially reduced by sorting the collision lists and making sure that older nodes occupy lower addresses in memory. An additional benefit of this approach is an efficient generational garbage collection scheme. 8 Sequential Verification We suppose we are given a sequential system modeled as a finite state machine, and a property that the system is supposed to verify. Since the system is finite-state, several logics commonly used to express properties admit decision procedures based on state enumeration. In this section we focus on the most elementary of these procedures: Reachability Analysis. Our choice is motivated by the following observations. First, reachability analysis is sufficient to decide an important class of properties: invariants, that is, propositional formulae that should hold in all possible states of the system. Second, reachability analysis exemplifies the fixed point computations that are the major constituents of more general decision procedures. Therefore reachability analysis gives us the opportunity to examine most of the issues connected with the efficient deployment of BDDs in model checking.   , where is the input We define a deterministic finite state machine (FSM) as a 6-tuple    alphabet, is the set of states,  is the output alphabet,  and  are the next-state (or transition) and output   and functions, respectively, and is the set of initial   (reset) states. In the sequel we assume that     for some and . Therefore  is a multiple-output boolean function. We call such a FSM encoded. A state  is reachable from state  , if there exist a sequence of states   and a sequence of inputs      .         , (possibly repeated), such that   ,   , and   A state is simply reachable if it is reachable from any state in . 
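As a small illustration of these definitions, consider a hypothetical encoded FSM, a modulo-4 counter with a count-enable input, and the explicit enumeration of its reachable states one at a time; the symbolic techniques developed below manipulate whole sets of states instead.

def delta(state, w):
    """Next-state function of a 2-bit encoded FSM (a modulo-4 counter with enable w)."""
    x1, x0 = state
    return (x1 ^ (x0 & w), x0 ^ w)

def reachable(delta, init, inputs):
    """Explicit enumeration of the reachable states, one state at a time."""
    reached, frontier = set(init), set(init)
    while frontier:
        frontier = {delta(s, w) for s in frontier for w in inputs} - reached
        reached |= frontier
    return reached

print(sorted(reachable(delta, {(0, 0)}, (0, 1))))
# [(0, 0), (0, 1), (1, 0), (1, 1)]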
We will define reachability analysis as the problem of enumerating all reachable states of a finite state machine. Note that in this problem we are only concerned with the next-state function  . Hence, in the following, we ignore the output function  . Reachability analysis can be performed in time linear in the number of edges of the state transition graph by depth-first search. However, the number of states of a FSM grows exponentially with the number of memory elements. Hence, even a linear time algorithm may be inadequate in practice. One approach to achieve sub-linear runtimes for some problem instances is based on the use of characteristic functions.          8.1 Characteristic Functions and Relations     Let and let be a set of points of the boolean space called the characteristic function of if it is one exactly for the points of  #  $           . A function  that are in . That is:  is  We can extend the definition of characteristic function to subsets of an arbitrary finite set , by providing  an injective mapping from to . Such a mapping is called a binary encoding of . Let be a binary encoding of , that is,   42  for     . Then the image of     We define the characteristic function of              defined by:  as the characteristic function of  : according to   is  according to    Whenever the encoding is understood, we shall simply write  . With these definitions, we can associate a characteristic function to every finite set. In the following we shall mainly deal with characteristic functions  of subsets of . All set theoretic manipulations can be performed directly on the characteristic functions. In particular  : one can easily see that, for   and for arbitrary finite sets             and encoding               , we have:               Characteristic functions are interesting because they often provide a compact representation and an efficient manipulation of sets. Consider the characteristic function       One can verify that:             However, the representation of  in the form of a BDD only takes two internal nodes. Though almost all sets require exponentially large BDDs to represent their characteristic functions (see [46] for a proof of this classical result), it is true in general that we can manipulate much larger sets by dealing with their characteristic functions than we would by explicitly representing the elements of the sets. Among all sets that can be represented by characteristic functions, we now consider relations, i.e., sub be a binary relation16 . We shall use different sets of sets of a suitable cartesian product. Let  variables to identify the elements of  and and we shall therefore write                           noting that the cartesian product of two sets is obtained by taking the product of the respective characteristic functions, when they have disjoint support. This easily generalizes to -ary relations. Note that we have to use different sets of variables also in the case of  , because the elements of the relation are ordered pairs and we must be able to tell the first element from the second. The most important relation that we shall deal with in the sequel is the transition relation of a FSM:17  16 and  . Binary here means that the elements of the relation are pairs of elements from  In the following, we implicitly assume that we are dealing with characteristic functions, so that, with abuse of notation, we write  instead of  . 
17 43 Definition 8 Let     is given by: relation of    be the transition function of an encoded FSM                         . Then the transition   The transition relation describes all the pairs of states that are connected by an arc in the state transition graph of and the inputs labeling that arc. The transition relation is not restricted to the representation of deterministic machines. We could have defined FSMs directly in terms of their transition relations. We did not follow this approach for two reasons. First, there is no loss of generality in restricting ourselves to deterministic transition structures. We can apply Choueka’s determinization procedure [19] to remove nondeterminism. Second, we need to define  because we are going to discuss an algorithm based on it, and it is both easier and more commonly done to derive  from  than vice versa. 8.2 Symbolic Breadth-First Search Characteristic functions allow us to represent and manipulate large sets of states and relations among them. Depth-first reachability analysis, however, deals with states individually. Breadth-first search, on the other hand, allows one to deal naturally with multiple states simultaneously and has thus become the method of choice for the traversal of large machines. When multiple states are processed simultaneously in breadth-first manner, the algorithm is called symbolic breadth-first search or traversal. The sets of states that are processed are represented by their characteristic functions and the problem is then to evaluate  for a given characteristic function as input. The result of the evaluation is the characteristic function of the set of states that are reachable in one step from any state in the set whose characteristic function is the input to  . This process is sometimes called symbolic simulation of  . We shall view this evaluation as the computation of the image of an appropriate function. From this point on, we assume that all functions are represented by BDDs. In particular,  can be represented in two ways: 1. By a vector of BDDs, one for each next state variable; 2. by the transition relation associated with  . These two methods result in two algorithms, called the transition function algorithm and the transition relation algorithm, that have a common structure. In order to present this structure, we now consider two   . definitions concerning the image of a function with domain and range Definition 9 Let is defined by:    be a multiple-output boolean function. The image of  , denoted by Img   ,   Img      #              Informally, the image of is the set of output values that can produce. If  belongs to the image of , then there is an input  such that    . We want to impose a constraint on the inputs that can be applied to and thus restrict the values that may produce. To this end, we introduce the following definition.       Definition 10 Let the image of constrained by    be a multiple-output boolean function and let , denoted by Img Img        , is defined by:   #    44    be a subset of  . Then    traverse   Reached = From = ; do  To = Img  ,From); New = To Reached; From = New; Reached = Reached  New;  while (New ); return Reached;   Figure 19: Pseudo-code of the symbolic breadth-first traversal.    . The set is called the constraint and the problem Img According to this definition, Img   is called constrained image computation. 
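Definitions 9 and 10 can be illustrated by computing the image through plain enumeration of the domain; the sketch below does exactly that on a hypothetical three-output function (unrelated to Example 13). Symbolic image computation, discussed in the following sections, obtains the same sets without enumerating B^n.

from itertools import product

def img(f, n, constraint=lambda x: True):
    """Img(f, C) = { f(x) : x in B^n and x in C }, by enumeration of the domain."""
    return {f(x) for x in product((0, 1), repeat=n) if constraint(x)}

# A three-output function of three inputs: f = (a*b, a+c, b xor c).
f = lambda x: (x[0] & x[1], x[0] | x[2], x[1] ^ x[2])
print(sorted(img(f, 3)))                        # the unconstrained image Img(f, B^3)
print(sorted(img(f, 3, lambda x: x[0] == 1)))   # the image constrained to a = 1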
In the traversal of a FSM, the constraint of finding Img  corresponds to finding all the states that are represents a group of current states and computing Img   reachable in one step from the states represented by . Also, in the case of FSM traversal, ,  where is the number of state variables (that are both inputs and outputs of the next state function) and  , but it can be regarded as defined is the number of primary inputs. The constraint is thus defined over     as well, by  just assuming that it is defined over , but does not depend on the variables over ranging over the last coordinates. We are now ready to present a first version of the symbolic breadth-first traversal procedure. The pseudo-code is given in Figure 19. The union and difference of sets are actually performed by manipulating the corresponding characteristic functions. The number of iterations performed gives the sequential depth of the FSM, i.e., the longest of the shortest paths connecting each state to an initial state. In other words, traverse reaches each state in the minimum number of steps. This is a general property of breadth-first methods. Before we consider in detail how to compute Img  From  , we examine the assignment: From New  (8)      This statement simply says that the newly reached states will be considered as current states at the next iteration. Notice, however, that from the point of view of correctness, it is possible to assign to From any set such that: New From New  Reached    (Note that Reached has not been updated yet when the assignment to From takes place.) In particular, From could be set to equal To or Reached  New. If From New, then some states will be examined more than once. If states were individually processed that would be a poor choice. However, we are manipulating characteristic functions; the size of the BDDs and the complexity of the operations on them are only loosely related to the number of minterms of the corresponding functions. It is indeed possible that a function with more minterms than another actually has a smaller BDD. In our case, we could find a choice for From that includes some previously reached states, but results in a smaller BDD and/or a faster computation of Img  From  . One may observe that: New  New  Reached   New  Reached  Therefore, a possible choice for From that is expected to result in a compact BDD is given by: New  Reached  45 and this expression could replace New in the right-hand side of (8). The image of  can be derived by exhaustive simulation. If a constraint on the inputs is given, only those inputs that satisfy the constraint are applied. However, this method is impractical for all but the simplest cases. Therefore we seek methods of computing Img  From  , that only manipulate the characteristic functions of the sets involved in the computation. 8.3 Image Computation In this section we concentrate on the problems of image computation for the two algorithms based on the transition function and the transition relation. Throughout this section, we shall use the following example to illustrate the various techniques. Example 13 We want to compute the image of     where:              (9) For this simple example, it is easy to compute the image by exhaustive simulation: 0 0 0 0 1 1 1 1 By inspection, we see that the image of shown in Figure 20.    0 0 1 1 1 1 0 0 0 1 1 0 0 1 1 0   0 0 0 0 1 1 1 0  0 0 1 0 1 1 0 0 0 0 1 0 0 1 1 0  is  000,011,101,110,111 . 
The BDDs for the three functions are 8.4 Image Computation for Transition Functions      We now examine how to compute the image of a function , when the transition function is  by breaking the given as a set of BDDs, one for each next state variable. It is possible to find Img computation in two steps [21]: 1. Find a new function whose image over the entire domain domain . 2. Find the (unconstrained) image of the new function. 46   equals the image of  over the restricted f1 f2 f3 a a a b b b c Figure 20: BDDs for the image computation examples. f Img(f,C) Bm n B C x1 v1 v x Img(f,Bn ) Figure 21: Constrained image computation. 47 The first step of the procedure is pictorially represented in Figure 21. It requires an operator that, when   as another outside onto the same point of applied to a multiple-output function, maps a point #  point   . Any such operator is called an image restrictor. Noting the similarity between the re-mapping performed by an image restrictor and that of a generalized cofactor, we may ask whether any of the operators we already know is indeed an image restrictor. The answer is affirmative, in that the constrain operator (indicated by ) of Section 5.4.1 is indeed an image restricting  , the mapping generalized cofactor. This can be easily seen by observing that, when computing  performed by constrain only depends on  . Therefore, when constrain is applied to all the components of     , it performs the same mapping for all the components. This guarantees that the image of  will become the image of   , as shown in Figure 21. In summary, we have:      Img    Img   where it is understood that constrain is applied to each component of .18 Note that the restrict operator (  ) is not an image restrictor, because the mapping of the points outside  depends also on  , the function being cofactored; indeed,  determines what variables should be quantified in  . Though it is the only one that we shall consider in detail, constrain is not the only image restrictor we can define. For instance, one and map all the points outside onto   . In most cases this will may choose an arbitrary element $ be less efficient than applying constrain, though. We now assume that the constraint has already been taken care of, by applying the constrain operator and we concentrate on two techniques for the unconstrained image computation problem.     8.4.1 Image Computation by Input Splitting The first technique is based on the following expansion: Img    Img     Img     (10) Intuitively this identity says that the image of the function is the union of the set of output values that can  and the set of output values that can be obtained by setting   ; hence be obtained by setting   the name of image computation by input splitting. Equation 10 is applied recursively until a terminal case      , where    is found. The simple terminal case is when is constant. Suppose . Then   if  the characteristic function of the image of is   , where  and   otherwise. Though this terminal case is sufficient to build a correct algorithm, efficiency requires a more sophisticated analysis of terminal conditions. In particular, we consider three mechanisms.            Decomposition due to disjoint support; Identical and complementary components; Identical subproblems. Decomposition due to Disjoint Support. Consider the case of are disjoint. 
Then, Img  Img    Img    18       , where the supports of   and  It is important for the variable order not to change during image restriction with constrain because the mapping, although independent of the component of , does depend on the variable order. 48 Img            Img     disjoint  Img           Figure 22: Example of image decomposition. In terms of characteristic functions, it is sufficient in this case to take the product of the characteristic functions computed for the images of  and . This result can be easily generalized to the case of a multiway partition of     , such that the component functions in one block have support disjoint from the functions of the other blocks. An example is shown in Figure 22. The effect of partitioning, when it occurs, is dramatic, so that one should try to cause partitioning as early and as often as possible, by appropriately choosing the splitting variable.            and   Identical and Complementary Components. Suppose . Assuming  is not     constant, then Img  . If   , then Img  . In general, it is possible to remove all but one identical or complementary components from a problem, solve the simplified problem, and finally  or    . This technique does reintroduce the missing variables, by adding clauses of the form   not directly affect the size of the search space. However, it reduces the amount of work for each recursive invocation of the algorithm and also increases the chances of early termination due to the technique to be discussed next.      Identical Subproblems. It is often the case that different cofactors of are identical or strictly related. By maintaining a table of the problems solved thus far, an algorithm may avoid repeated computations of the same result. This is the familiar technique applied in BDD manipulation algorithms. As in that case, optimal exploitation of the table of computed results is an issue that may be addressed by trying to normalize the calls. The normalization is obtained by eliminating all repeated and complement components, complementing all the components pointed by complement arcs, and sorting the components according to the order of the pointers. The recognition of identical subproblems is particularly important when the image being computed is represented by a small BDD. Example 14 Consider the case when the image of a function       49    is defined by:    f x n-1 yn- 1 yn-1 xn-2 y n-2 yn-2 xn-3 x0 y0  Figure 23: A BDD for  . (Here and  are both sets of output variables.) The corresponding BDD, for an appropriate variable order, is shown in Figure 23. This BDD grows linearly with . This happens frequently with arithmetic functions that can be expressed iteratively. If identical subproblems are not detected, though, an exponential number of recursive calls will be performed, thus effectively expanding the BDD into a tree. Example 15 For the function defined by (9), the computation based on input splitting proceeds as follows. We initially select an input variable for splitting. From symmetry considerations, we see that in this case all variables give similar results; hence we choose . We then compute Img   . We have:               This is not yet a terminal case, so we split again, this time with respect to , and find:       This is a terminal case for which the solution is        We then consider Img                       50          . Therefore:              . We have:     . 
Considering then the negative cofactors, we get: This is again a terminal case, from which we get Img       This is a terminal case and we get:  Img       Finally, adding the two partial results, we have: Img           in agreement with the results obtained by exhaustive enumeration. 8.4.2 Image Computation by Output Splitting Rather than splitting the problem with respect to the input variables, one can partition with respect to the output functions, by using the following formula.  Img                     Img  Img                   (11)  Intuitively, this identity says that the image of can be dichotomized into the points for which  and  those for which  . The resulting method is again recursive and at each step we have to compute two constrained images. This can be done by applying constrain, as we have seen at the beginning of Section 8.4. The termination conditions are similar to those used with input splitting. The number of component functions decreases at each iteration. The number of inputs may or may not decrease, depending on the simplification that occurs in applying constrain. When the problem is reduced  the image is  , and if  , the to a single component  , if  is not constant, the image is 1. If  image is   . The efficiency of this procedure can be increased by application of the same techniques we have seen for input splitting, namely decomposition, recognition of identical or complementary components, and the use of a computed table. It is also possible to combine the input and output splitting techniques, by choosing at every step whether to decompose the domain or the range of the function. There are cases where input splitting is better than output splitting, and vice versa. At first glance, output splitting has the advantage of performing a partitioning of the problem (there are no common image points in the two sub-problems) and of visiting a ). However, this is often offset by the smaller search space in the case of FSM traversal (because increased complexity brought about by constrain, and by the decreased occurrence of decomposition. In most cases, it has been observed that input splitting is more efficient.       Example 16 We now apply the output splitting method to our example. We choose the splitting order as    . Figure 24 shows the effect of applying constrain to  . We initially compute Img    and  . We now compute and . Let                    Img and Img Hence, Img                   51 Img  Img      f 2 f1 f 3 f1 b b c Figure 24: First step in the image computation by output splitting. f 2 f’1 f 3 f’1 a a b b c c Figure 25: Second step in the image computation by output splitting. 52         . The result of applying constrain with respect to   is shown in We must now compute Img Figure 25. Since the two BDDs are identical and non-constant, Finally, combining the two sub-problems,  Img        Img                     in agreement with our previous results. 8.5 Image Computation for Transition Relations The image of a next-state function described by the associated transition relation  can be computed as:                        where is a constraint on the inputs, describing a set of present states. If no such constraint is given— alternatively, if —we simply have to existentially quantify the input variables in  . Example 17 For the function defined by (9), the transition relation is given by:                              and we want to compute: Img                      In this case the order of quantification is immaterial. We follow the alphabetic order.    
                                                                                                                                                                             This simple example shows that the computation is more complex when using the transition relation than when using transition function. This is mostly due to the increased number of variables. In its original form, this method is indeed less efficient than the ones previously seen. However, there are ways to speed it up that make it competitive. We begin their study by proving some results on generalized cofactors, and constrain and restrict in particular, that are of interest in this context. 53 8.5.1 Partitioned Image Computation The major computational problem in image computation with the transition relation derives from taking  may be small, their product the conjunction of the components. Even if the individual BDDs for   may be too large. The final result—after quantification—may also be small, but we may exceed the allotted resources while building some intermediate result. A substantial increase in speed an memory efficiency comes from the following simple observation, whose proof is immediate.     Lemma 6 Let function of            be a function of   . Then:  ,                          and                be a  Lemma 6 says that we can partially distribute the quantifications over the product, if the terms of the product do not all depend on all the variables. Since the quantification of a variable normally simplifies the result, we can hope that by intertwining products and quantifications, the size of the intermediate BDDs will be better kept under control [11, 59]. The following example illustrates this point.  Example 18 We want to compute Img   , with         Img          ,             and      . According to (7), we have:                                                                           Applying restrict is simple in these cases. In particular, the second and third term of the transition relation  do not depend on   . This variable is therefore quantified in    , resulting in the constant 1 function. Hence: Img                                                                                                                                          54              Figure 26: Different tree organizations for the partitioned computation of images.                                In this example, Equation 7 was particularly handy. However, it has been observed in practice that Equation 5 is often more efficient. Several issues need to be considered in defining an algorithm based on Equations 5–7, besides the choice of what equation to adopt. The first is the general organization of the image computation, which consists of the product of several factors, and the quantification of part of the variables. If the product of two factors is taken at each step, a binary tree results. The leaves of the tree correspond to the factors of Equations 5–7. Each internal node corresponds to the product of its two children and the quantification of all the variables appearing in the result that do not appear in the rest of the tree. The tree is visited in post-order and at the end of the visit, one of Equations 5–7 is computed at the root of the tree. There are many different trees with the same number of leaves. Figure 26 shows the two extreme cases: A perfectly balanced tree and a totally unbalanced one. The number of nodes in the two trees is the same. For leaves, both trees have internal nodes. 
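To make the interplay of conjunction and early quantification concrete, the following small Python sketch processes the factors of a partitioned transition relation along a totally unbalanced tree, quantifying each variable as soon as no remaining factor depends on it, in the spirit of Lemma 6. Explicit truth tables stand in for BDDs, and the names (table, conjoin, exists, image_partitioned) and the two-bit example relation are chosen only for this illustration; they are not part of the chapter's notation or of any BDD package.

from itertools import product

# A relation is a pair (names, rows): 'names' is a tuple of variable names and
# 'rows' is the set of satisfying assignments, each a tuple of 0/1 values.

def table(names, pred):
    """Build the explicit table of the predicate 'pred' over 'names'."""
    rows = {r for r in product((0, 1), repeat=len(names))
            if pred(dict(zip(names, r)))}
    return names, rows

def conjoin(f, g):
    """Conjunction of two relations, joining on their shared variables."""
    fv, fr = f
    gv, gr = g
    shared = [v for v in fv if v in gv]
    out = tuple(fv) + tuple(v for v in gv if v not in fv)
    rows = set()
    for a in fr:
        for b in gr:
            if all(a[fv.index(v)] == b[gv.index(v)] for v in shared):
                merged = dict(zip(fv, a))
                merged.update(zip(gv, b))
                rows.add(tuple(merged[v] for v in out))
    return out, rows

def exists(f, drop):
    """Existentially quantify the variables in 'drop'."""
    fv, fr = f
    keep = tuple(v for v in fv if v not in drop)
    idx = [fv.index(v) for v in keep]
    return keep, {tuple(r[i] for i in idx) for r in fr}

def image_partitioned(terms, to_quantify):
    """Conjoin the terms one at a time (a totally unbalanced tree), quantifying
    each variable of 'to_quantify' as soon as no remaining term mentions it."""
    result = ((), {()})                     # the constant-1 relation
    remaining = list(terms)
    while remaining:
        result = conjoin(result, remaining.pop(0))
        needed = {v for tv, _ in remaining for v in tv}
        early = [v for v in result[0] if v in to_quantify and v not in needed]
        if early:
            result = exists(result, early)
    return result

# Example: T1 encodes y1 = x1 AND x2, T2 encodes y2 = NOT x2.
T1 = table(('x1', 'x2', 'y1'), lambda a: a['y1'] == (a['x1'] & a['x2']))
T2 = table(('x2', 'y2'), lambda a: a['y2'] == 1 - a['x2'])
img = image_partitioned([T1, T2], to_quantify={'x1', 'x2'})
print(img[0], sorted(img[1]))   # ('y1', 'y2') [(0, 0), (0, 1), (1, 0)]

In a BDD-based implementation the same schedule is followed, with conjoin and exists replaced by the BDD product and existential quantification, or by the combined operation discussed in Section 8.5.2.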
A second, important issue is the order of the factors, that is, how the factors of Equations 5–7 are mapped onto the leaves of the binary tree. The order of the factors should try to minimize the size of the intermediate results. In one strategy, the factors  are ordered so as to minimize the increase in total support (see the output ordering for the BDDs of multiple-output functions). In another approach, the order is chosen so as to maximize the number of variables that will be quantified early. This is done by looking at the private support of each term, defined as the variables that only appear in that term. Practical solutions combine in various ways these and other heuristics [29, 51]. A final consideration concerns the clustering of the factors in Equations 5–7. If many image computations must be performed, it may be convenient to partition the factors into a few blocks, and then take the product of each block. This decreases the number of products that must be taken during image computation. There is a trade-off between the reduction in the number of products and the cost of each product. The usual approach to decide how many blocks should be created is to impose a limit on the size of the BDD for each block. One factor is chosen as the seed for a block, and other factors are conjoined to it, until the product grows larger than a given threshold. The procedure is then repeated on the remaining factors. Notice that clustering the blocks has the effect of changing the structure of the computation tree.   55  8.5.2 Combining Conjunction and Quantification In the method described in the previous section, at each node of the tree two BDDs are conjoined and some variables are existentially quantified from the result. It is possible to gain some efficiency in terms of both CPU time and memory by combining the two steps into a single operation. The algorithm that computes      is based on the usual recursion. As in the case of the quantification algorithm of Section 5.3.3, is the cube of the variables to be quantified, is the index of the lowest indexed variable of and  and      is the index of the top variable of . We shall write , where is a cube, possibly the function 1. If     or   is constant (either factor is 0 or both are 1) then     . Otherwise, we consider the three following cases.       If  and  do not depend on  . In this case we write:    . In this case                                AND           AND                       AND  AND     OR    . Hence,     , then the result is 1 and it is not necessary to perform the second recursive call.  and write:   AND     . In this case we expand with respect to                    AND          8.5.3 Partitioning, Identical Subproblems, and Splitting Partitioning works in the case of the transition relation as in the case of the transition function, thanks to the following identity:                    The recognition of identical sub-problems can also be made to work in the following way. Suppose that      , the -th term of Equations 5–7, has  as top variable. Suppose next that no , appears in the the -th term. This is certainly true if no terms are combined. Terms   and  represent  identical sub-problems if   and:          They represent complementary subproblems if         or     or        and:         If terms  and  represent identical sub-problems,     is replaced by         . If they represent   complementary sub-problems,      is replaced by    can be tested      . The condition   efficiently. 
It is also possible to test for a stronger condition, namely   , in constant time. Notice that 56 before restrict or constrain are applied, this stronger condition is always satisfied, since all terms are of the . form  Finally, it is possible to use a table to store the results of image computations, with an approach similar to the one taken for the transition function methods. Identical problems may occur at different iterations of the breadth-first traversal. It is also possible that identical sub-problems arise in the course of one computation, if the image computation is decomposed according to the following identity:                                                                                        The similarity of this decomposition to the output splitting technique is apparent. The identity can be generalized by observing that the essential requirement for output splitting is that no term except   depend on  . Notice that Theorem 8 can be applied to the two sub-problems. There is also the equivalent of input splitting, thanks to the following identity:                                                                  We finally notice that, although we have so far assumed that there is one term for each next state variable, it is possible to reduce the number of terms by selectively taking the product of groups of terms. This may be advantageous if the BDD obtained by taking the product is not larger than the sum of the operands. This technique can be seen as a way of altering the structure of the evaluation tree. 8.6 Renaming Variables in the Traversal Procedure In a FSM we distinguish present state variables from next state variables. When the image of the next state function is computed, a set of next states is found that is reachable in one transition from a given set of present states. At each iteration of the traversal procedure, after the set of newly reached states is computed, we need to manipulate it as if it were a set of present states. This requires that the result of the image computation be expressed with the same variables that are used to represent the sets of present states (Reached, From). This problem is solved differently depending on the algorithm used for image computation. In the case of the transition function, one can directly use the same variables for From and To. There is no problem there, because the BDDs for  From and To are separate entities at all times. Therefore, one can implicitly apply the renaming of the variables that corresponds to clocking the state register to make the next states present.  57 In the case of the transition relation, it is not possible to use the same variables for both present and next states, because these variables appear together in the characteristic function of the transition relation. We have to wait until a present state variables has been quantified, before renaming the corresponding next state variable. (The simplest approach is to rename the variables after the image is computed.) The renaming of the variables is best seen as a case of function composition. Replacing  by   in the characteristic equation of the image is equivalent to computing: To    To             that we know we can compute as:   To      To     To   When all next state variables must be replaced, it is advantageous to perform the substitution in a single pass, by applying the following identity: To           To             To          This identity is the basis for a simple recursive algorithm. 
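The traversal loop, the image computation based on the transition relation, and the renaming of next state variables can be put together in a short sketch. In the following Python fragment, characteristic functions are ordinary predicates over valuations, explicit enumeration stands in for BDD operations, a modulo-4 counter with an enable input serves as the example machine, and the names (image, rename_to_present, states_of) are illustrative assumptions rather than the chapter's notation. The renaming step is exactly the composition To(y <- x) discussed above.

from itertools import product

# Present state variables X, primary inputs W, next state variables Y.
X = ('x1', 'x2')
W = ('w',)
Y = ('y1', 'y2')

# Transition relation of a modulo-4 counter with enable input w:
# (y1 y2) = (x1 x2) + w (mod 4), x1 being the most significant bit.
def T(a):
    nxt = (2 * a['x1'] + a['x2'] + a['w']) % 4
    return a['y1'] == nxt // 2 and a['y2'] == nxt % 2

def image(From, T):
    """To(Y) = exists X, W . T(X, W, Y) AND From(X), by explicit enumeration."""
    def To(a):                               # 'a' assigns values to Y only
        for xs in product((0, 1), repeat=len(X)):
            if From(dict(zip(X, xs))):
                for ws in product((0, 1), repeat=len(W)):
                    full = dict(zip(X, xs))
                    full.update(zip(W, ws))
                    full.update(a)
                    if T(full):
                        return True
        return False
    return To

def rename_to_present(To):
    """Express To, computed over Y, in terms of the present state variables:
    the composition To(Y <- X), substituting x_i for y_i."""
    return lambda a: To({y: a[x] for x, y in zip(X, Y)})

def states_of(chi, names):
    return {s for s in product((0, 1), repeat=len(names))
            if chi(dict(zip(names, s)))}

# Breadth-first forward traversal from the initial state (0, 0).
Reached = From = lambda a: a['x1'] == 0 and a['x2'] == 0
while True:
    To = rename_to_present(image(From, T))
    New = lambda a, To=To, R=Reached: To(a) and not R(a)
    if not states_of(New, X):
        break
    Reached = lambda a, R=Reached, N=New: R(a) or N(a)
    From = New
print(sorted(states_of(Reached, X)))         # all four counter states

With BDDs, rename_to_present would be realized by the simultaneous substitution identity above, rather than by wrapping a predicate, and the sets Reached, From, and New would be kept as BDDs over the present state variables.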
8.7 Variable Selection in Image Computation We have seen that for both the transition function and the transition relation methods image computation can be decomposed by input and output splitting. An obvious question concerns the criterion by which to choose the variable to be used for the expansion. We first consider input splitting for the transition function method. Remember that in this case, we have a vector of BDDs, one for each output. These BDDs only depend on the input variables. The simplest choice is therefore the variable with lowest index appearing in those BDDs. The advantages of this method are that the cofactors can be computed trivially and that no new nodes are created. The choice of the top variable is therefore attractive from the point of view of memory. However, we may be willing to trade some memory occupation for faster computation, by choosing a splitting variable that reduces the search space more than the top variable. To that effect, we notice that the single most dramatic factor in reducing the size of the search space is partitioning based on disjoint support. Hence we look for splitting variables whose choice leads to earlier decomposition of the image computation. According to this criterion, good candidates are those variables on which many outputs depend. This simple heuristic is fairly simple to implement and has excellent performance [37]. In the method based on the transition function, output splitting does not necessarily reduce the support of the residual functions. In the transition relation method, though, both kinds of splitting can be used to reduce the supports of the residual functions. 8.8 Backward Traversal The form of reachability analysis we have examined so far can be considered forward traversal: We start from the initial states and we trace paths forward. Those states that are along the paths are clearly reachable from some initial state. If we want to know whether a given property holds at all reachable states of a finite state machine, we can find all the reachable states and then examine them for possible violations of the property. In finite state machine equivalence, the property is  for the product of the two machines being compared. 58 There is an alternative approach to the same problem: We identify all the states where the property is violated, and then find whether any of these states is reachable from any initial state. Finding whether a state is reachable from any initial state can be done by tracing a path backward in the state transition graph. Tracing paths backward requires the computation of the states from which a given set of states is reachable. This goes under the name of pre-image computation. Therefore, we now examine pre-image computation, and then we discuss the relative merits of forward and backward traversal. As in the case of image computation, there are two algorithms for pre-image computation: One is based on the transition function, and one is based on the transition relation. For the sake of conciseness, we only consider the latter. 8.8.1 Pre-Image Computation Based on the Transition Relation The computation of the pre-image based on the transition relation can be performed with the image computation algorithm. It is sufficient to interchange the roles of input and output variables. Example 19 Let us consider the following transition relation:                                          where , , and are the input variables and  , , and are the output variables. Suppose we want to compute the pre-image of  . 
With argument similar to that employed for image computation, we can write: Pre                                 In this case the two generalized cofactors are identical, because the constraint is a cube. Performing the computation, we get: Pre                        In general, when the transition relation is given in the form:        and the constraint is a cube, considerable simplifications occur. Since the cofactor of a function with respect to a cube does not depend on the variables appearing in the cube and cofactoring with respect to a cube distributes, all terms of the product contain zero or one output variables after cofactoring. The existential quantification fully distributes, since each output variable appears in at most one term. Finally, if a term is   after the cofactoring (this is the case if  does not appear in the cube), then the still of the form   existential quantification returns 1. Otherwise, it returns either  or   . Therefore we see that the result of the pre-image computation is simply the product of those  such that  appears in the constraint cube, taken with the appropriate phase. In our example, we can write:              59       and one can verify that the previous result is correct with minimum computation. When the constraint is not a cube, the general algorithm based on interleaving quantification and product applies, as in the case of image computation. 8.8.2 Forward versus Backward Traversal The relative effectiveness of the two methods depends primarily on the structure of the graph being traversed. Suppose, for instance, that we are comparing two identical counters for equivalence. We know that the forward traversal method has difficulty in dealing with this example, because of the large sequential depth of the produce machine. If we are comparing two modulo- counters, each with one initial state, breadthfirst forward traversal will require iterations. At each iteration, one new state is added to the set of reachable states, and this accounts for the ineffectiveness of the procedure. By contrast, in backward traversal, we initialize our search to all states of the product machine that give different counts in the two counters. This set of states has a compact characteristic function. A single pre-image computation leads to convergence, as a one minute’s thought will confirm: The pre-image is precisely the same set of states that we started from, and it does not include the initial state. Hence, we arrive immediately at the conclusion that the two counters are equivalent. Clearly, not all examples work like the counters. If the set of states where a violation occurs does not have a compact characteristic function, or if the machine to be traversed is not deep, forward traversal will typically perform better.   8.9 Transitive Closure and Iterative Squaring Breadth-first symbolic traversal is remarkably efficient, in that it allows realistic machines with very many    ) to be traversed. The efficiency stems from the ability to process many states at states (more than one time. It is then easy to see that the advantage of breadth-first symbolic traversal is lost when the FSM to be traversed is, for instance, a counter. A counter is characterized by a very high sequential depth—it actually equals the number of states. In this section we consider a technique, called iterative squaring, that  be the transition relation of FSM addresses this problem [12]. The basic idea is quite simple. 
Let   , where  is the vector of present state variables, is the vector of primary input variables, and is the  such that from state  of , state can set of next state variables.  describes triples of the form  be reached by applying input . If we existentially quantify all primary input variables from  , we obtain another relation:                  that describes pairs of adjacent states, i.e., states that are connected by at least one edge in the transition   in a way to be described graph of . We can compute the transitive closure of  , that we call shortly. The transitive closure describes the pairs of states that are connected by at least one path in the  state graph of . Finding the set of states reachable from then amounts to computing: Reached              since the product of and yields the pairs of states connected by a path, such that the first state of the   is pair is an initial state; the quantification returns the second components of the pairs. The term necessary, because some initial states may not be reachable from any other initial state. The transitive closure can be found by computing the least fixed point of the following recurrence equation:         60        00 10 01 11 Figure 27: A state transition graph.  gives the name of iterative squaring to this technique and is responsible for its  The term      ,    is set to    and then efficiency. In order to find                       for some . To see how the recurrence equation actually computes     is computed until    describes all the pairs of states   such that can the transitive closure of  , suppose that       be reached from  in at least 1 and at most steps. This is certainly true for . We can then prove    such that can be reached from in at least 1 and at most that  describes all the pairs of states  steps. This follows from observing that       describes triples of states   , such that   steps. Hence is reachable in at least 2 and ( ) is reachable from  ( ) in at least 1 and at most   obtained by quantification give all pairs of states that satisfy that at most steps from  . The pairs   , the pairs of states that are precisely one step apart are included, thus proving condition. By adding   our statement.      Example 20 Consider the state transition graph of Figure 27, where the primary inputs do not appear for simplicity. We have:                      The computation of the transitive closure proceeds as follows:                                                          One can verify that                                                                                                                                            , consistent with the fact that the graph of Figure 27 has no cycles. It should be noted that the number of iterations required to compute the transitive closure is logarithmic in the maximum distance between two states19 . The transitive closure also considers paths originating at 19 This is also called the diameter of the state transition graph. 61 states that cannot be reached from the initial states. Hence it may be inefficient, if a large fraction of the state graph is unreachable or if the sequential depth of the machine is low. The major problem with the transitive closure is that the BDDs that are produced during the computation may grow too large. 8.10 Approximate Reachability Analysis Even with the best algorithms and heuristics, there remain many circuits that are too difficult for the reachability techniques that we have discussed so far. 
Circuits with a few thousand latches are normally too difficult, and sometimes even circuits with fewer than a hundred latches pose significant problems. In such cases, however, it may still be possible to perform an approximate reachability analysis. Specifically, it may be possible to find a superset of all reachable states. Alternatively, it may be possible to compute a subset of all reachable states. Both approximations may be useful.

Suppose we want to prove an invariant. Let us examine the superset first. If the invariant holds for all the states of the superset, then it holds for all reachable states. On the other hand, if the invariant does not hold for some states of the superset, it may be that the states where violations occur are not really reachable. Therefore, verification based on supersets of the reachable states may produce false negatives, or, in other words, it is conservative. If we prove a circuit correct by applying a conservative technique, we know it is correct. On the other hand, for conservative verification to be useful, false negatives should be relatively rare, because showing that a false negative is indeed spurious may be impractical, or at least very time consuming. The closer the superset of reachable states is to the actual set, the lower the chance of getting false negatives. On the other hand, better approximations require more time and memory to be computed. Notice that a trivial solution to the approximation of the reachable state set from above is given by the set of all 2^k states, where k is the number of latches.

Let us now turn our attention to the computation of subsets of the reachable states. A reachability analysis method based on forward traversal produces a sequence of sets of states that eventually converges to the set of all reachable states. The elements of the sequence provide increasingly accurate approximations from below to the exact solution. If we cannot complete reachability analysis, we can always stop when we run out of memory or time, and claim that we have obtained an approximation from below to the solution. The challenge, therefore, lies in devising algorithms that produce good approximations with limited resources. When verification of an invariant is based on a subset of all reachable states, it is partial: It may be that the invariant does not hold, but the states where violations occur have not been visited. On the other hand, if a violation is reported for a state in the subset, then we know that the invariant does not hold. The usefulness of partial verification relies on the fact that a new design normally contains many mistakes, and partial verification may uncover most such mistakes quickly. In this respect, partial verification is not different from simulation: Indeed, simulation is one form of partial verification. Methods based on the reachability analysis techniques we have examined thus far, however, can be much more efficient than simulation.

8.10.1 Computing Supersets of the Reachable States

In a sequential circuit, each state variable depends in general on some primary inputs and some state variables. If we replace some connections to state variables with connections to new primary inputs, the resulting circuit will exhibit a wider variety of behaviors: It will be able to do more things than the original circuit. This simple idea is at the heart of efficient methods for the computation of supersets of the reachable states. The circuit is partitioned into submachines, and the connections among the submachines are cut. (See Figure 28.)

Figure 28: Partitioning for Approximate Reachability Analysis.
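The effect of cutting connections can be seen on a toy example. In the following Python sketch, explicit enumeration again stands in for symbolic traversal, and the two-latch machine and the names (step_concrete, step_abstract, reachable) are invented for the illustration: replacing the connection from one latch into the next-state logic of the other with a fresh free input can only add behaviors, so the reachable states of the abstracted machine form a superset of those of the original machine.

from itertools import product

# A toy circuit with two latches x1, x2 and one primary input w.
# The next-state logic of x1 reads the other latch x2 (a latch-to-latch
# connection), and x2 simply samples x1.
def step_concrete(x1, x2, w):
    return (x2 and w, x1)

# Abstraction: the connection from x2 into the logic of x1 is cut and
# replaced by a fresh free primary input u.
def step_abstract(x1, x2, w, u):
    return (u and w, x1)

def reachable(step, n_inputs, init=(False, False)):
    """Explicit breadth-first reachability analysis (a stand-in for the
    symbolic traversal discussed in the text)."""
    reached = frontier = {init}
    while frontier:
        image = {step(*s, *ins)
                 for s in frontier
                 for ins in product((False, True), repeat=n_inputs)}
        frontier = image - reached
        reached = reached | frontier
    return reached

exact = reachable(step_concrete, 1)   # only the initial state is reachable
upper = reachable(step_abstract, 2)   # all four states are reachable
assert exact <= upper                 # the abstraction over-approximates
print(sorted(exact), sorted(upper))

Here the concrete machine never leaves its initial state, while the abstracted machine reaches all four states: the over-approximation is safe but, as the text notes, possibly coarse if the cut connections are simply ignored.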
Once a circuit is partitioned, we can traverse each submachine in isolation. Then, we can take the superset of the reachable states of the original machine to be the Cartesian product of the sets of states of the submachines. This procedure is simple, but a bit too drastic in simplifying the problem: the fact that the connections among the submachines are completely ignored causes a considerable loss of accuracy. Instead, we can devise several methods to retain in part the flow of information among the submachines. We outline one such approach.

We process the submachines in some order. For the first machine, we perform reachability analysis, assuming that the inputs from the other machines are unrestricted. When we consider the second machine, though, we assume that the inputs coming from the first machine cannot take values corresponding to states that were found to be unreachable during the first traversal. We then consider the third machine, constraining the inputs from the first and second machines, and so on. Once all machines have been considered, we can return to the first one. This time, instead of using unconstrained inputs from the other machines, we use the constraints determined by the first round of reachability analyses. This may produce a smaller reachable set of states than the initial traversal. We continue to refine the approximation, until we reach a point at which no reachable state set shrinks further. At that point we have converged to a superset of the reachable states for the original circuit.

Constraining the traversal of each submachine can be accomplished with minor extensions to the image computation techniques that we already know. Specifically, the constraint used in image computation comes from both the state variables of the submachine and the inputs fed by the other submachines. The scheme we have discussed can be improved and modified in several ways. Another important problem is how to partition the circuit into submachines. For these aspects, the reader is referred to [16, 17].

8.10.2 Computing Subsets of the Reachable States

The standard forward traversal algorithm provides a sequence of approximations from below for the reachable states. The challenge is how to provide better approximations within given time and memory limits. We outline one possible approach. We assume that we use BDDs to represent the sets of states visited during reachability analysis. We observe that BDDs are successful because they provide a dense representation for sets. Here dense means that the ratio of the cardinality of the set to the size of the representation is high. Breadth-first traversal is more or less successful, depending on how dense the BDDs are. Therefore, if we want to improve the efficiency of traversal, we may want to increase the density of the BDDs. One way to improve the density is to improve the variable order. However, for some circuits, the set of reached states may still have a large BDD in spite of our best reordering efforts. Our next observation is that breadth-first search goes through a fixed sequence of sets of states: All the initial states, followed by all the states at distance 0 or 1 from the initial states, and so on. If the states within a given distance from the initial states are randomly scattered through the state space, their representation as a BDD is likely to be large.
However, by visiting the states in some order other than the one imposed by breadth-first search, we may visit states that are tightly packed in some subspace. Such states will have a dense representation. We are therefore going to mix breadth-first and depth-first search, letting the density of the BDDs guide our meandering through the state graph: We still use the breadth-first search algorithm as the basic approach. However, at each iteration we check the size of the To set. If it is too large, we extract a dense subset from To using the algorithms of Section 5.6 and we use it instead of To as the constraint in the image computation. The states in To that are not in the dense subset are discarded. Therefore, we do not have a breadth-first search any more. A few interesting properties of breadth-first search are lost. For instance, when the set of new states is empty, we are not guaranteed that all states have been reached. In spite of this and similar disadvantages, the traversal based on density is appealing, because it may visit many more states than pure breadth-first search in the same amount of time [53, 52].

9 Conclusions

Binary Decision Diagrams are a popular data structure for verification algorithms and for several tasks in the design of calculational systems. We have reviewed the properties of BDDs that make them amenable to such applications and we have presented the main algorithms for their manipulation. We have emphasized the application of BDDs to reachability analysis, because reachability analysis is the computational task at the basis of model checking algorithms. Many variants of BDDs have been proposed and several are in use. We have not attempted to give even a cursory treatment of them, even though some of the variants have found applications in verification algorithms. The interested reader will find two good references on the subject in [10, 46].

References

[1] S. B. Akers. Binary decision diagrams. IEEE Transactions on Computers, C-27(6):509–516, June 1978.

[2] P. Ashar and M. Cheong. Efficient breadth-first manipulation of binary decision diagrams. In Proceedings of the International Conference on Computer-Aided Design, pages 622–627, San Jose, CA, November 1994.

[3] C. L. Berman. Circuit width, register allocation, and reduced function graphs. Technical Report RC 14129, IBM T. J. Watson Research Center, Yorktown Heights, NY, October 1988.

[4] B. Bollig, M. Löbbing, and I. Wegener. Simulated annealing to improve variable orderings for OBDDs. Presented at the International Workshop on Logic Synthesis, Granlibakken, CA, May 1995.

[5] K. S. Brace, R. L. Rudell, and R. E. Bryant. Efficient implementation of a BDD package. In Proceedings of the 27th Design Automation Conference, pages 40–45, Orlando, FL, June 1990.

[6] R. K. Brayton et al. VIS: A system for verification and synthesis. In T. Henzinger and R. Alur, editors, Eighth Conference on Computer Aided Verification (CAV'96), pages 428–432. Springer-Verlag, Rutgers University, 1996. LNCS 1102.

[7] R. E. Bryant. Symbolic manipulation of boolean functions using a graphical representation. In Proceedings of the 22nd Design Automation Conference, June 1985.

[8] R. E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35(8):677–691, August 1986.

[9] R. E. Bryant. On the complexity of VLSI implementations and graph representations of boolean functions with application to integer multiplication. IEEE Transactions on Computers, 40(2):205–213, February 1991.

[10] R. E.
Bryant. Binary decision diagrams and beyond: Enabling technologies for formal verification. In Proceedings of the International Conference on Computer-Aided Design, pages 236–243, San Jose, CA, November 1995. [11] J. R. Burch, E. M. Clarke, and D. E. Long. Representing circuits more efficiently in symbolic model checking. In Proceedings of the Design Automation Conference, pages 403–407, San Francisco, CA, June 1991.  [12] J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and L. J. Hwang. Symbolic model checking: and beyond. In Proceedings of the Fifth Annual Symposium on Logic in Computer Science, June 1990. states [13] J. R. Burch and D. E. Long. Efficient boolean function matching. In Proceedings of the International Conference on Computer-Aided Design, pages 408–411, Santa Clara, CA, November 1992. [14] J. R. Burch and V. Singhal. Tight integration of combinational verification methods. In Proceedings of the International Conference on Computer-Aided Design, pages 570–576, San Jose, CA, November 1998. [15] G. Cabodi, P. Camurati, and S. Quer. Improved reachability analysis of large finite state machines. In Proceedings of the International Conference on Computer-Aided Design, pages 354–360, Santa Clara, CA, November 1996. [16] H. Cho, G. D. Hachtel, E. Macii, B. Plessier, and F. Somenzi. Algorithms for approximate FSM traversal based on state space decomposition. IEEE Transactions on Computer-Aided Design, 15(12):1465–1478, December 1996. [17] H. Cho, G. D. Hachtel, E. Macii, M. Poncino, and F. Somenzi. Automatic state space decomposition for approximate FSM traversal based on circuit analysis. IEEE Transactions on Computer-Aided Design, 15(12):1451–1464, December 1996. [18] H. Cho, G. D. Hachtel, and F. Somenzi. Redundancy identification/removal and test generation for sequential circuits using implicit state enumeration. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 12(7):935–945, July 1993. [19] Y. Choueka. Theories of automata on  -tapes: A simplified approach. Journal of Computer and System Sciences, 8:117–141, 1974. [20] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. An Introduction to Algorithms. McGraw-Hill, New York, 1990. [21] O. Coudert, C. Berthet, and J. C. Madre. Verification of sequential machines using boolean functional vectors. In L. Claesen, editor, Proceedings IFIP International Workshop on Applied Formal Methods for Correct VLSI Design, pages 111–128, Leuven, Belgium, November 1989. [22] O. Coudert and J. C. Madre. A unified framework for the formal verification of sequential circuits. In Proceedings of the IEEE International Conference on Computer Aided Design, pages 126–129, November 1990. [23] R. Drechsler, B. Becker, and N. Göckel. A genetic algorithm for variable ordering of OBDDs. Presented at the International Workshop on Logic Synthesis, Granlibakken, CA, May 1995. [24] R. Drechsler, N. Drechsler, and W. Günther. Fast exact minimization of BDDs. In Proceedings of the Design Automation Conference, pages 200–205, San Francisco, CA, June 1998. [25] F. Ferrandi, A. Macii, E. Macii, M. Poncino, R. Scarsi, and F. Somenzi. Symbolic algorithms for layout-oriented synthesis of pass transistor logic circuits. In Proceedings of the International Conference on Computer-Aided Design, pages 235–241, San Jose, CA, November 1998. [26] S. J. Friedman and K. J. Supowit. Finding the optimal variable ordering for binary decision diagrams. IEEE Transactions on Computers, 39(5):710–713, May 1990. [27] H. Fujii, G. Ootomo, and C. Hori. 
Interleaving based variable ordering methods for ordered binary decision diagrams. In Proceedings of the International Conference on Computer-Aided Design, pages 38–41, Santa Clara, CA, November 1993. 65 [28] M. Fujita, Y. Matsunaga, and T. Kakuda. On variable ordering of binary decision diagrams for the application of multi-level logic synthesis. In Proceedings of the European Conference on Design Automation, pages 50–54, Amsterdam, February 1991. [29] D. Geist and I. Beer. Efficient model checking by automated ordering of transition relation partitions. In D. L. Dill, editor, Sixth Conference on Computer Aided Verification (CAV’94), pages 299–310, Berlin, 1994. SpringerVerlag. LNCS 818. [30] G. D. Hachtel, E. Macii, A. Pardo, and F. Somenzi. Markovian analysis of large finite state machines. IEEE Transactions on Computer-Aided Design, 15(12):1479–1493, December 1996. [31] G. D. Hachtel and F. Somenzi. A symbolic algorithm for maximum flow in 0-1 networks. Journal of Formal Methods in Systems Design, 10(2/3):207–219, 1997. [32] M. Held and R. M. Karp. A dynamic programming approach to sequencing problems. J. SIAM, 10(1):196–210, 1962. [33] Y. Hong, P. A. Beerel, J. R. Burch, and K. L. McMillan. Safe BDD minimization using don’t cares. In Proceedings of the Design Automation Conference, pages 208–213, Anaheim, CA, June 1997. [34] N. Ishiura, H. Sawada, and S. Yajima. Minimization of binary decision diagrams based on exchanges of variables. In Proceedings of the International Conference on Computer-Aided Design, pages 472–475, Santa Clara, CA, November 1991. [35] S.-W. Jeong, T.-S. Kim, and F. Somenzi. An efficient method for optimal BDD ordering computation. In International Conference on VLSI and CAD (ICVC’93), Taejon, Korea, November 1993. [36] S.-W. Jeong, B. Plessier, G. D. Hachtel, and F. Somenzi. Extended BDD’s: Trading off canonicity for structure in verification algorithms. In Proceedings of the IEEE International Conference on Computer Aided Design, pages 464–467, Santa Clara, CA, November 1991. [37] S.-W. Jeong, B. Plessier, G. D. Hachtel, and F. Somenzi. Variable ordering and selection for FSM traversal. In Proceedings of the IEEE International Conference on Computer Aided Design, pages 476–479, Santa Clara, CA, November 1991. [38] G. Kahmi and L. Fix. Adaptive variable reordering for symbolic model checking. In Proceedings of the International Conference on Computer-Aided Design, pages 359–365, San Jose, CA, November 1998. [39] C. Y. Lee. Binary decision programs. Bell System Technical Journal, 38(4):985–999, July 1959. [40] B. Lin and F. Somenzi. Minimization of symbolic relations. In Proceedings of the IEEE International Conference on Computer Aided Design, pages 88–91, Santa Clara, CA, November 1990. [41] D. E. Long. The design of a cache-friendly BDD library. In Proceedings of the International Conference on Computer-Aided Design, pages 639–645, San Jose, CA, November 1998. [42] F. Mailhot and G. De Micheli. Technology mapping using boolean matching. In Proceedings of the European Conference on Design Automation, pages 180–185, Glasgow, UK, March 1990. [43] S. Manne, D. C. Grunwald, and F. Somenzi. Remembrance of things past: Locality and memory in BDDs. In Proceedings of the Design Automation Conference, pages 196–201, Anaheim, CA, June 1997. [44] K. L. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, Boston, MA, 1994. [45] K. L. McMillan. A conjunctively decomposed boolean representation for symbolic model checking. In R. Alur and T. A. 
Henzinger, editors, 8th Conference on Computer Aided Verification (CAV’96), pages 13–25. SpringerVerlag, Berlin, August 1996. LNCS 1102. [46] C. Meinel and T. Theobald. Algorithms and Data Structures in VLSI Design. Springer, Berlin, 1998. 66 [47] S.-I. Minato, N. Ishiura, and S. Yajima. Shared binary decision diagram with attributed edges for efficient boolean function manipulation. In Proceedings of the Design Automation Conference, pages 52–57, Orlando, FL, June 1990. [48] H. Ochi, K. Yasuoka, and S. Yajima. Breadth-first manipulation of very large binary-decision diagrams. In Proceedings of the International Conference on Computer-Aided Design, pages 48–55, Santa Clara, CA, November 1993. [49] S. Panda and F. Somenzi. Who are the variables in your neighborhood. In Proceedings of the International Conference on Computer-Aided Design, pages 74–77, San Jose, CA, November 1995. [50] S. Panda, F. Somenzi, and B. F. Plessier. Symmetry detection and dynamic variable ordering of decision diagrams. In Proceedings of the International Conference on Computer-Aided Design, pages 628–631, San Jose, CA, November 1994. [51] R. K. Ranjan, A. Aziz, R. K. Brayton, B. F. Plessier, and C. Pixley. Efficient BDD algorithms for FSM synthesis and verification. Presented at IWLS95, Lake Tahoe, CA., May 1995. [52] K. Ravi, K. L. McMillan, T. R. Shiple, and F. Somenzi. Approximation and decomposition of decision diagrams. In Proceedings of the Design Automation Conference, pages 445–450, San Francisco, CA, June 1998. [53] K. Ravi and F. Somenzi. High-density reachability analysis. In Proceedings of the International Conference on Computer-Aided Design, pages 154–158, San Jose, CA, November 1995. [54] R. Rudell. Dynamic variable ordering for ordered binary decision diagrams. In Proceedings of the International Conference on Computer-Aided Design, pages 42–47, Santa Clara, CA, November 1993. [55] J. V. Sanghavi, R. K. Ranjan, R. K. Brayton, and A. Sangiovanni-Vincentelli. High performance BDD package based on exploiting memory hierarchy. In Proceedings of the Design Automation Conference, pages 635–640, Las Vegas, NV, June 1996. [56] H. Savoj, R. K. Brayton, and H. J. Touati. Extracting local don’t cares for network optimization. In Proceedings of the International Conference on Computer-Aided Design, pages 514–517, Santa Clara, CA, November 1991. [57] T. R. Shiple. Formal Analysis of Synchronous Circuits. PhD thesis, University of California at Berkeley, Electronics Research Laboratory, College of Engineering, University of California, Berkeley, CA 94720, October 1996. Memorandum No. UCB/ERL M96/76. [58] T. R. Shiple, R. Hojati, A. L. Sangiovanni-Vincentelli, and R. K. Brayton. Heuristic minimization of BDDs using don’t cares. In Proceedings of the Design Automation Conference, pages 225–231, San Diego, CA, June 1994. [59] H. Touati, H. Savoj, B. Lin, R. K. Brayton, and A. Sangiovanni-Vincentelli. Implicit enumeration of finite state machines using BDD’s. In Proceedings of the IEEE International Conference on Computer Aided Design, pages 130–133, November 1990. [60] B. Yang, R. E. Bryant, D. R. O’Hallaron, A. Biere, O. Coudert, G. Janssen, R. K. Ranjan, and F. Somenzi. A performance study of BDD-based model checking. In Formal Methods in Computer Aided Design, November 1998. [61] B. Yang, Y.-A. Chen, R. Bryant, and D. O’Hallaron. Space- and time-efficient BDD construction via working set control. In Proc. of Asia and South Pacific Design Automation Conference (ASPDAC’98), Yokohama, Japan, February 1998. 67