Type-Safe Modular Hash-Consing
Jean-Christophe Filliâtre

Sylvain Conchon

LRI
Université Paris Sud 91405 Orsay France
filliatr@lri.fr

LRI
Université Paris Sud 91405 Orsay France
conchon@lri.fr

Abstract
Hash-consing is a technique to share values that are structurally
equal. Beyond the obvious advantage of saving memory blocks,
hash-consing may also be used to speed up fundamental operations
and data structures by several orders of magnitude when sharing
is maximal. This paper introduces an O CAML hash-consing library
that encapsulates hash-consed terms in an abstract datatype, thus
safely ensuring maximal sharing. This library is also parameterized
by an equality that allows the user to identify terms according to an
arbitrary equivalence relation.
Categories and Subject Descriptors
ing]: Coding Tools and Techniques
General Terms
Keywords

D.2.3 [Software engineer-

Design, Performance

Hash-consing, sharing, data structures

1. Introduction
Hash-consing is a technique to share purely functional data that are
structurally equal [8, 9]. The name hash-consing comes from Lisp:
the only allocating function is cons and sharing is traditionally
realized using a hash table [2]. One obvious use of hash-consing
is to save memory space.
Hash-consing is part of the programming folklore but, in most
programming languages, it is more a design pattern than a library.
The standard way of doing hash-consing is to use a global hash
table to store already allocated values and to look for an existing
equal value in this table every time we want to create a new value.
For instance, in the Objective Caml programming language1 it
reduces to the following four lines of code using hash tables from
the O CAML standard library:
let table = Hashtbl.create 251
let hashcons x =
try Hashtbl.find table x
with Not_found → Hashtbl.add table x x; x
The Hashtbl module uses the polymorphic structural equality and
a generic hash function. The initial size of the hash table is clearly
1 We

use the syntax of Objective Caml (O CAML for short) [1] throughout
this paper, but this could be easily translated to any ML implementation.

Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
ML’06 September 16, 2006, Portland, Oregon, USA.
Copyright c 2006 ACM 1-59593-483-9/06/0009. . . $5.00.

context dependent and we will not discuss its choice in this paper—
anyway, choosing a prime number is always a good idea.
As a running example of a datatype on which to perform hashconsing, we choose the following type term for λ-terms with de
Bruijn indices:
type term =
| Var of int
| Lam of term
| App of term × term
Instantiated on this type, the hashcons function has the following
signature:
val hashcons : term → term
If we want to get maximal sharing—the property that two values
are indeed shared as soon as they are structurally equal—we need
to systematically apply hashcons each time we build a new term.
Therefore it is a good idea to introduce smart constructors performing hash-consing:
let var n = hashcons (Var n)
let lam u = hashcons (Lam u)
let app (u,v) = hashcons (App (u,v))
By applying var, lam and app instead of Var, Lam and App directly, we ensure that all the values of type term are always hashconsed. Thus maximal sharing is achieved and physical equality
(==) can be substituted for structural equality (=) since we now have
x = y ⇐⇒ x == y.
In particular, the equality used in the hash-consing itself can now be
improved by using physical equality on sub-terms, since they are already hash-consed by assumption. To do such a bootstrapping, we
need custom hash tables based on this new equality. Fortunately,
the O CAML standard library provides generic hash tables parameterized by arbitrary equality and hash function. To get custom hash
tables, we simply need to define a module that packs together the
type term, an equality and a hash function
module Term = struct
type t = term
let equal x y = match x,y with
| Var n, Var m → n == m
| Lam u, Lam v → u == v
| App (u1,u2), App (v1,v2) →
u1 == v1 && u2 == v2
| → false
let hash = Hashtbl.hash
end
and then to apply the Hashtbl.Make functor:
module H = Hashtbl.Make(Term)

tag : int;
hkey : int;

Finally, we can redefine our hash-consing function in order to use
this improved implementation:
let table = H.create 251
let hashcons x =
try H.find table x
with Not_found → H.add table x x; x
Given a good hash function, the above implementation of hashconsing is already quite efficient. But it still has several drawbacks:
1. There is no way to distinguish between the values that are hashconsed and those which are not. Thus the programmer has to
carefully use the smart constructors only and never the true
constructors directly. Of course, the type term could be made
abstract (but then we would lose the pattern-matching facility)
or better a private datatype (where pattern-matching is allowed
but direct construction is not).
2. Maximal sharing allows us to use physical equality instead of
structural equality, which can yield substantial speedups. However, we would also like to improve data structures containing
hash-consed terms, such as sets or dictionaries implemented using hash tables or balanced trees for instance. We could think
of using the pointers to get O(1) hashing or total ordering functions over hash-consed terms (as done in [10]) but this is not
possible for two reasons. First, the O CAML compiler does not
give access to the pointer value. Second, the O CAML garbage
collector is likely to move allocated blocks underneath, thus
changing the values of pointers.
3. There is no possibility for the garbage collector to reclaim terms
that are not alive anymore (from the user point of view) since
they are referenced in the hash-consing table. If this table is
defined at toplevel (and it usually is), the hash-consed terms
will never be collected.
4. Our implementation of the hash-consing function computes
twice the hash value associated to its argument when it is not
in the hash table: once to look it up and once to add it2 . It is also
wasting space in the buckets of the hash table by storing twice
the pointer to x.
In this paper, we present an O CAML hash-consing library addressing all these issues. Section 2 introduces the design and the
implementation of this library. Then Section 3 presents several case
studies and reports on the time and space performance effects of
hash-consing.

}
The field node contains the value itself and the field tag the unique
integer. The private modifier prevents the user from building
values of type hash consed manually, yet allowing field access
and pattern matching over this type. The field hkey stores the hash
key of the hash-consed term; it is justified in the implementation
section below.
The library consists in a functor Make taking a type equipped
with equality and hashing functions as argument and returning a
data structure for hash-consing tables. The input signature is the
following:
module type HashedType = sig
type t
val equal: t × t → bool
val hash: t → int
end
with the implicit assumption that the hashing function is consistent
with the equality function i.e. hash x = hash y as soon as
equal (x, y) holds. Then the functor looks like:
module Make(H : HashedType) : sig
type t
val create : int → t
val hashcons : t → H.t → H.t hash_consed
end
The datatype t for hash-consing tables is abstract. create n builds
a new hash-consing table with initial size n. As for traditional hash
tables, this size is somewhat arbitrary (the table will be resized
when necessary). Then, if h is a hash-consing table, hashcons h t
builds the hash-consed value for a given term t.
In practice, there is also a function iter to iterate over all the
terms contained in a hash-consing table and a function stats to
get statistics on a table (number of entries, biggest bucket length,
etc.). There is also a non-functorial version of the library, based on
structural equality and the generic O CAML hash function, but this
is clearly less useful (these two functions do not exploit the sharing
of sub-terms).
Due to the private type hash consed, maximal sharing is now
easily enforced by type checking (as long as we use a single hashconsing table, of course). As a consequence, the following holds:
x = y ⇐⇒ x == y ⇐⇒ x.tag = y.tag

(1)

2. A hash-consing library
This section presents an O CAML hash-consing library. We first
detail its design and interface, then its practical use and finally its
implementation.
2.1 Design and interface
The main idea is to tag the hash-consed values with unique integers.
First, it introduces a type distinction between the values that are
hash-consed and those that are not. Second, these tags can be used
to build efficient data structures over hash-consed values, such
as hash tables or balanced trees. For the purpose of tagging, we
introduce the following record type:
type α hash_consed = private {
node : α;
2 On top of that,

O CAML’s hash tables also waste time recomputing the hash
values when resizing the hash tables, contrary to SML’s hash tables where
hash keys are stored in the buckets. We also address this issue in the next
section.

2.2 Usage
Before entering the implementation details of the hash-consing library, let us demonstrate its use on our running example. First,
we need to adapt our datatype for λ-terms to interleave the
hash consed nodes with our original definition:
type term = term_node hash_consed
and term_node =
| Var of int
| Lam of term
| App of term × term
It is important to notice that this modification is not intrusive at all
(from the syntactic point of view) and applies to mutually recursive
types as well. Indeed, the definition of a set of types such as
type t1 = hdefinition1 i
..
.
and tn = hdefinitionn i

simply needs to be turned into
type t1 = t1 node hash consed and t1 node = hdefinition1 i
..
.
and tn = tn node hash consed and tn node = hdefinitionn i
and we notice that the type definitions hdefinitioni i are left unchanged.
Then we can equip the type term node with suitable equality
and hashing function in order to apply the hash-consing functor. As
we did in Section 1, equality may safely use physical equality on
sub-terms and thus runs in O(1). The hashing function can also be
improved to run in constant time by making use of the hash keys
for the sub-terms (the hkey field) or equivalently their tags (the tag
field). Altogether, we get the following module definition
module Term_node = struct
type t = term_node
let equal t1 t2 = match t1, t2 with
| Var i, Var j → i == j
| Lam u, Lam v → u == v
| App (u1,v1), App (u2,v2) →
u1 == u2 && v1 == v2
| → false
let hash = function
| Var i → i
| Lam t → abs (19 × t.hkey + 1)
| App (u,v) →
abs (19 × (19 × u.hkey + v.hkey) + 2)
end
We get an implementation of hash-consing tables for this type via
a simple functor application:
module Hterm = Hashcons.Make(Term_node)
Then we can define a global hash-consing table and smart constructors for Var, Lam and App performing hash-consing, as follows:
let
let
let
let

ht = Hterm.create 251
var n = Hterm.hashcons ht (Var n)
lam u = Hterm.hashcons ht (Lam u)
app (u,v) = Hterm.hashcons ht (App (u,v))

These “constructors” have the expected types, namely
val var : int → term
val lam : term → term
val app : term × term → term
and thus, from this point, the user can simply ignore the hashconsing mechanics that is performed underneath when building
terms.
We can now exploit the unique tags to build efficient data structures over terms. For instance, these tags define a total ordering
over terms and we can use it to build balanced binary trees containing terms. Using functors from O CAML ’s standard library, we get
implementations of sets of terms and dictionaries indexed by terms
as follows3 :
module OrderedTerm = struct
type t = term
let compare x y =
Pervasives.compare x.tag y.tag
end
module TermSet = Set.Make(OrderedTerm)
module TermMap = Map.Make(OrderedTerm)
3 Pervasives.compare

integer overflows.

is to be preferred to a subtraction due to possible

The modules Set and Map have fairly good performances. But since
elements are actually (tagged with) integers, we can build even
more efficient data structures such as Patricia trees [12]. Our library
comes with Patricia trees-based sets and dictionaries specialized for
hash-consed values.
Similarly, we could build efficient hash tables indexed by terms,
using the field hkey (or even the tag) as an immediate hash value.
2.3 Implementation
Up to now, we have solved issues 1 and 2 that were listed Section 1
(a distinct type for hash-consed values and efficient data structures
based on physical equality). We now address issues 3 and 4.
Regarding time and space inefficiency related to the use of hash
tables from O CAML ’s standard library, the first improvement is to
build a custom hash table where buckets are simply lists of values,
and not mapping of values to themselves, saving one pointer for
each value. Then the next improvement is to write a single lookupor-insert function that computes the hash key only once. Finally, the
last improvement is to record the hash key (in the field hkey which
is exposed to the user) to avoid recomputing it when resizing the
table (if required).
Regarding the ability for the garbage collector to reclaim the
hash-consed values that are not referenced anymore (from anywhere else than the hash-consing table), the solution is to use weak
pointers. A weak pointer is precisely a reference that does not prevent the garbage collector from erasing the value it points to. Said
otherwise, a memory cell may be collected as soon as it is referenced only by weak pointers. The O CAML standard library provides arrays of weak pointers [1] where the access operation may
return either Some x when there is some available element x and
None otherwise (which means that the element has been reclaimed
by the GC at some point in the past).
Combining the ideas of a custom hash table and the use of weak
pointers, we get a weak hash table 4 , that is a hash table where the
buckets are arrays of weak pointers. The beginning of our hashconsing functor is thus as follows:
module Make(H : HashedType) = struct
type t = {
table : H.t hash_consed Weak.t array;
...
}
We omit the other fields that contain irrelevant data related to the
resizing heuristics. The insertion and resizing code is standard and
we give the main ideas only. The sole difference with respect to
O CAML weak hash tables is that we do no need to (re)compute the
hash key when inserting or resizing since it is contained in the data
itself.
let rec resize t =
... increase the size of the main array
and redistribute the elements in the
new buckets
and add t d =
let index =
d.hkey mod (Array.length t.table)
in
... lookup for an empty slot in the
bucket t.data.(index)
... if found then insert d
else increase the bucket and insert d
4 The O CAML standard library also provides weak hash tables and we
borrowed most code from this implementation.

... if the limit is reached then resize t
A SSUME

Then the main hashcons operation consists in a single lookup-orinsert function. The pseudo-code is as follows:
let hashcons t d =
let hkey = H.hash d in
let index =
hkey mod (Array.length t.table)
in
... lookup in the bucket t.data.(index)
for a value v such that H.equal v.node d
... if found then return v else
... let n = { hkey = hkey; tag = newtag ();
node = d } in
add t n;
n
where newtag is a function returning distinct integers. It may seem
rather inefficient to first scan the bucket for an equal value already
hash-consed and then, in case of a failure, to scan it again for an
empty slot to insert it. We could indeed remember any empty slot
encountered while scanning for an equal value and then use it for
insertion, if any. But in practice this is not worth maintaining this
information. Indeed, the buckets are quite small (we start with 3
elements buckets and add only 3 new slots each time we increase
them).

3. Case Studies
This section demonstrates the benefits of hash-consing on several
case studies.
3.1 Reducing λ-terms
This first case study uses our running example to perform a little benchmark that is representative of massive symbolic computations, as arising for instance in proof assistants. The benchmark is
inspired by Huet’s Constructive Computation Theory [11]. It consists in running a pure λ-calculus version of the quicksort algorithm
on a λ-term encoding a list of integers, themselves represented using Church’s numerals encoding (see [11], Section 2.2.1, pages 26–
28). The datatype for λ-terms is the one we have already used as
our running example. Only a very small code excerpt from [11]
is needed to perform the computation, namely the four functions
lift (lifting), subst (substitution), hnf (head normal form) and
nf (normal form), all of them totalizing 31 lines of code. The list
that is sorted is the 6 element list [0;3;5;2;4;1]. It may seem
rather small but it already involves 1.6 million elementary substitutions (due to an exponential complexity).
The code was compiled with the O CAML native-code compiler
(ocamlopt) on a Pentium 4 processor running under Linux. The
timings are given in seconds and correspond to CPU time. The
memory use corresponds to the maximum size reached by the major heap, in kilobytes, as reported by the Gc.stat library function.
Each program is run five times in a row and the median of the last
three runs is reported.
We first run the computation with and without hash-consing.
The results are the following:
time
memory use

without hash-consing
91.5 s
1,680 kb

with hash-consing
195 s
480 kb

As expected, the use of hash-consing greatly reduces the amount
of memory used (more than 3 times less) and this has a cost (the
program is more than 2 times slower). Indeed, much time is spent
in lookups while terms are created. It is a usual tradeoff between

B CP

Γ, l ⊢ ∆
Γ ⊢ ∆, l

8
Γ, l ⊢ ∆, C
>
>
>
>
< Γ, l ⊢ ∆, l̄ ∨ C
>
>
>
>
:

U NSAT

Γ, l ⊢ ∆
Γ, l ⊢ ∆, l ∨ C

Γ, l ⊢ ∆
Γ, l̄ ⊢ ∆
Γ⊢∆

Figure 1. Abstract version of DPLL
time and space. Though it may seem disappointing, the best is to
come.
Then we add memoization (also know as dynamic programming): each of the four functions lift, subst, hnf and nf records
its results in a hash table to avoid performing the same computation
twice. For the code without hash-consing, we use traditional hash
tables based on structural equality and a structural hash function.
But for the code where terms are hash-consed we use hash tables
indexed by the integers tags. The results are the following:
time
memory use

without hash-consing
54 s
95,300 kb

with hash-consing
5.06 s
720 kb

Now the version with hash-consing is much faster than its counterpart (more than 10 times faster) but it is also using much less memory. When compared to the original version without hash-consing
nor memoization, the final code with both features enabled performs much better regarding time and space. This example is typical of the use of hash-consing to improve not only the memory
use but, above all, the efficiency of data structures involving hashconsed terms.
3.2 Boolean Satisfiability Checking
We show in this section how the hash-consing library presented so
far can be used to write an efficient SAT-solver.
3.2.1 SAT-solvers
SAT-solvers decide the satisfiability of propositional formulas. Traditionally, they are based on the DPLL procedure [7, 6] for which
we give an abstract version Figure 1. The state of the procedure is
represented by a sequent Γ ⊢ ∆ where Γ is a set of literals (propositional variables or their negations) and ∆ is a set of clauses (disjunctions of literals).
A propositional formula F is satisfiable if and only if a sequent
Γ ⊢ ∅ can be derived from the initial sequent ∅ ⊢ ∆ where ∆ is the
conjunctive normal form (CNF) of F .
Putting a propositional formula in CNF may cause an exponential blow-up in size. We show in the next section how the use of
our hash-consing library allows us to solve elegantly the CNF conversion problem. Then Section 3.2.3 shows the implementation of
a SAT-solver on top of this CNF conversion.
3.2.2 Equisatisfiable CNF
A well-known solution to the CNF blow-up consists in introducing
proxy-variables (proxies for short) for all subformulas of the input
formula F and to initialize the solver with all the clauses defining
these proxies plus the proxy variable for F . The set of clauses thus
obtained is equisatisfiable to the original formula.

For instance, introducing a proxy X for a sub-formula P ∧ Q
(where P and Q are propositional variables) amounts to introducing the three following proxy-clauses (pclauses for short)
¬X ∨ P
¬X ∨ Q
X ∨ P̄ ∨ Q̄
whose conjuction is equivalent to X ↔ (P ∧ Q). Now, in order
to restrict further the number of clauses manipulated by the solver,
these pclauses can be incrementally introduced in the context of
the solver at the moment where the proxy X or its negation ¬X
are asserted. In that case, it is obvious that only the clause P̄ ∨
Q̄ has to be introduced when ¬X is assumed and only P and
Q when X is asserted. The following table shows the pclauses
introduced by the solver when it asserts a proxy X (or its negation)
representing formulas made with standard boolean connectives (P
is a propositional variable, X, Y and Z are proxies).
Proxy
X ↔P
X ↔Y ∧Z
X ↔Y ∨Z
X ↔ (Y → Z)

X asserted
{P }
{Y } {Z}
{Y ∨ Z}
{¬Y ∨ Z}

¬X asserted
{P̄ }
{¬Y ∨ ¬Z}
{¬Y } {¬Z}
{Y } {¬Z}

We notice that a pclause is either a disjunction of two proxies,
a set of two unit clauses (each of them containing a proxy), or a
single unit clause (with a propositional variable). From that remark,
we follow the same steps as in Section 2.2 to define the following
types for representing equisatisfiable CNF formulas.
type pclause =
C of t×t | U of t×t | L of string×bool
and view = { pos : pclause; neg : pclause}
and t = view Hashcons.hash_consed
Hash-consed values of type t represent proxies. The type
pclause is the type of pclauses. Its constructor C stands for
pclauses made of two proxies, U represents the conjunction of two
unit clauses containing proxies and L is used for building unit
clauses containing literals. Record values of type view contain the
pclauses that should be added when asserting (positively or negatively) a proxy.
In order to improve sharing, we define an equality relation
(and a suitable hashing function) that attempts to identify some
subformulas that are logically equivalent. For instance, proxies
for X → Y and ¬X ∨ Y would refer to the same variable.
We get an implementation of hash-consing tables by applying the
Hashcons.Make functor to the following module:
module View = struct
open Hashcons
type t = view
let eqc c1 c2 = match c1,c2 with
U(f1,f2) , U(g1,g2)
| C(f1,f2) , C(g1,g2) →
f1==g1 && f2==g2 || f1==g2 && f2==g1
| L(x1,b1) , L(x2,b2) → x1=x2 && b1=b2
| → false
let equal f1 f2 =
eqc f1.pos f2.pos && eqc f1.neg f2.neg
let hashc acc = function
U(f1,f2) | C(f1,f2) →
let min = min f1.tag f2.tag in
let max = max f1.tag f2.tag in
(acc×19 + min)×19 + max
| L as z → Hashtbl.hash z

let hash f = abs (hashc (hashc 1 f.pos) f.neg)
end
module H = Hashcons.Make(View)
Then, we introduce a global table to store the equisatisfiable
CNF and the corresponding smart constructors.
open Hashcons
let tbl = H.create 251
let view t = t.node
let compare f1 f2 = compare f1.tag f2.tag
let equal f1 f2 = f1.tag == f2.tag
let mk_atom a =
H.hashcons tbl ({pos=L(a,true);neg=L(a,false)})
let mk_not f = let f = view f in
H.hashcons tbl ({pos=f.neg;neg=f.pos})
let mk_and f1 f2 = if equal f1 f2 then f1 else
H.hashcons tbl
{pos=U(f1,f2); neg=C(mk_not f1,mk_not f2)}
let mk_or f1 f2 = ...
let mk_imp f1 f2 = ...
A slight improvement is made here so that the proxies returned for
X ∧ X and X ∨ X are both X.
It is then easy to define a function cnf : formula → t that
computes the CNF conversion of a propositional formula by recursively applying the smart constructors mk and, mk imp, . . .
3.2.3 Implementation
We now assume that the code presented in the previous section is
in a module Cnf. The state of our SAT solver is represented by the
set of proxies that have been assumed and a list of pair of proxies:
module S = Set.Make(Cnf)
type t = { gamma : S.t ;
delta : (Cnf.t×Cnf.t) list}
The core of the solver is a pair of two mutually recursive functions
assume and bcp. The former assumes a proxy and thus adds the
corresponding pclauses to the context. The latter performs boolean
constraint propagation.
exception Sat
exception Unsat
let rec assume env f =
if S.mem (Cnf.mk_not f) env.gamma then
raise Unsat;
if S.mem f env.gamma then env
else
let env =
{ env with gamma = S.add f env.gamma }
in
match Cnf.view f with
Cnf.Proxy {Cnf.pos=Cnf.U(f1,f2)} →
assume (assume env f1) f2
| Cnf.Proxy {Cnf.pos=Cnf.C(f1,f2)} →
bcp { env with
delta=(f1,f2)::env.delta }
| → bcp env
and bcp env =
let cl , u =
List.fold_left
(fun (cl,u) (f1,f2) →
if S.mem f1 env.gamma

|| S.mem f2 env.gamma
then (cl,u)
else if S.mem (Cnf.mk_not f1) env.gamma
then (cl,f2::u)
else if S.mem (Cnf.mk_not f2) env.gamma
then (cl,f1::u)
else (f1,f2)::cl , u
) ([],[]) env.delta

10000

100
10

in
List.fold_left assume {env with delta=cl} u
We note that all operations on Γ are efficient thanks to hash-consing
since the comparison of two proxies is a constant time operation.
The main function of the solver performs the case splitting:
let rec unsat f env =
try
let env = assume env f in
match env.clauses with
[] → raise Sat
| (a,b)::l →
unsat a {env with delta=l};
unsat (Cnf.mk_not a)
(assume {env with delta=l} b)
with Unsat → ()
Finally, the function is sat : formula → bool checks the satisfiability of a formula:

1
0.1
0.01
2

10

12

14

16

with hash-consing
without hash-consing

14
12
10
8

3.2.4 Benchmarks

0

The second one is the famous pigeon-hole principle saying that
if n + 1 pigeons occupy n holes then at least one hole contains two
pigeons:

8

16

6

i=0

6

4

Figure 2. Benchmarking deb(n) formulas

let is_sat f =
try
unsat (cnf f) {gamma=S.empty;delta=[]}; false
with Sat → true

We perform some quick benchmarks of this SAT solver on two
different kinds of valid formulas whose sizes are parameters. The
first one is due to de Bruijn and claims that, given an odd number of
boolean variables on a circular list, there are at least two adjacent
variables with the same value:
!
2n
^
deb(n) =
(pi ↔ pi+1 mod 2n ) → c → c

with hash-consing
without hash-consing

1000

4
2
0

5

10

15

20

25

30

35

Figure 3. Benchmarking ph(n) formulas
• Node(v, l, h), representing the formula (v∧l)∨(¬v∧h), where

v is a variable and l and h two BDDs.
A BDD is said to be reduced and ordered (with respect to a given
order over variables) whenever the following properties hold:
• on any path from the root to a leaf, the variables respect the

given order;
• no two distinct nodes are structurally equal (maximal sharing);

ph(n) =

n+1
^

n
_

xp,h

p=1 h=1

!

→

h n+1
_
_
_ n+1

xp,h ∧ xq,h

h=1 p=1 q=1

Figures 2 and 3 show the timings for two versions of the SAT
solver, one with hash-consing as presented in the previous sections
and one with disabled hash-consing. As we can see, turning hashconsing on is always a winning strategy. On the first example, we
even observe a different asymptotic behavior.
3.3 Binary Decision Diagrams
In this section, we show how to use our hash-consing library
to quickly implement a Binary Decision Diagrams (BDD) package [5]. The purpose is not to build a competitive BDD library
but to show how two main features of BDDs, namely sharing of
equal trees and dynamic programming, are provided for free by our
library. A BDD is a dag representing a boolean formula. It is either
• Zero, representing false,
• One, representing true, or

• no node Node(v, l, h) is such that l = h.

One fundamental property of (reduced ordered) BDDs is that each
propositional formula has a unique representation. As a consequence, a formula represented as a BDD is valid if and only if it
is reduced to One and is satisfiable if and only if it is not reduced
to Zero. Said otherwise, all the computation is concentrated in the
BDD building.
In the following, we show how our hash-consing library removes most of the implementation difficulty in the design of a simple BDD library (see for instance Andersen’s lecture notes [3] for
an introduction to BDD algorithms). To build a datatype of hashconsed values, we follow the same steps as in Section 2.2. We first
define the O CAML type for BDDs:
type variable = int
type bdd =
node hash_consed
and node =
| Zero | One

apply op b1 b2 =
if b1 and b2 are 0 or 1 then return op b1 b2
if var b1 = var b2 then
mk (var b1 ) (apply op (low b1 ) (low b2 ))
(apply op (high b1 ) (high b2 ))
else if var b1 < var b2 then
mk (var b1 ) (apply op (low b1 ) b2 )
(apply op (high b1 ) b2 )
else
mk (var b2 ) (apply op b1 (low b2 )
(apply op b1 (high b2 ))
Figure 4. Applying a boolean operation to two BDDs
| Node of variable × bdd × bdd
Then we apply the Hashcons.Make functor to suitable equality and
hashing functions to get an implementation of hash-consing tables:
module Node = struct
type t = node
let equal x y = ...
let hash x = ...
end
module HC = Hashcons.Make(Node)
Then we introduce a global table to store the BDD nodes and the
corresponding hash-consing function:
let table = HC.create 251
let hashcons = HC.hashcons table
Finally we define the smart constructors for BDDs:
let zero = hashcons htable Zero
let one = hashcons htable One
let mk v low high =
if low == high then low
else hashcons htable (Node (v, low, high))
This last constructor takes care of building reduced BDDs and
maximal sharing is ensured by the hash-consing function.
To turn a propositional formula into a BDD, it is useful to rely
on the following generic operation
val apply :
(bool → bool → bool) → bdd → bdd → bdd
which applies an arbitrary binary boolean operator to two BDDs
while maintaining the variables ordering invariant. Then we will
have the usual connectives ∧, ∨ and → as particular instantiations.
The algorithm for apply is quite simple and is given Figure 4,
where var, low and high are the accessors for the three fields
of Node. To get an efficient implementation of this algorithm, one
has to add dynamic programming i.e. to remember the results of
previous calls, as we did Section 3.1. With BDDs as hash-consed
terms, it is immediate to build a hash table indexed by a pair of
BDDs (using physical equality and a O(1) hash function based on
tags):
module H2 = Hashtbl.Make(
struct
type t = bdd × bdd
let equal (u1,v1) (u2,v2) = ...
let hash (u,v) = ...
end)
and to use it to implement dynamic programming in apply:
let apply op b1 b2 =
let cache = H2.create 251 in

let rec apply u1 u2 =
try
H2.find cache (u1,u2)
with Not_found →
let res = ... in
H2.add cache (u1,u2) res; res
in
apply b1 b2
Putting all together, we get a minimal BDD library fitting in less
than 80 lines of O CAML code. If it cannot be compared with a stateof-the-art BDD library, its performances are not so bad, as shown
on the de Bruijn (deb(n)) benchmark:
n
time in seconds

100
0.19

200
2.25

300
7.99

400
18.5

500
30.4

13
52.0

14
150

and on the pigeonhole (ph(n)) benchmark:
n
time in seconds

10
2.28

11
7.45

12
22.1

4. Conclusion
We have presented an O CAML implementation of a type-safe modular hash-consing library. Our approach has many practical benefits with respect to an ad hoc implementation. First, it introduces
a type distinction between hash-consed values and normal values,
that statically ensures maximal sharing. Then, this library is parameterized by a user equality that allows to identify terms according
to an arbitrary equivalence relation. Finally, the library tags hashconsed values with unique integers that can be used to improve data
structures such as hash tables, sets and dictionaries.
Hash-consing is also used in slightly different contexts such as
ML runtimes where it is performed in a systematic way by the
garbage collector [4, 10]. Our approach is less intrusive and the
programmer is free to use hash-consing at relevant places in his
code.
Our library is a free software available at http://www.lri.
fr/∼filliatr/software.en.html, together with implementations of sets and dictionaries using Patricia trees following [12]. It
is already used in several O CAML applications (regular expression
library, rewriting toolbox, first-order decision procedure). It would
be interesting to see its effect on other existing O CAML applications
such as the Coq proof assistant.
Our library should be easily translated to any other ML dialect.
Indeed, x == y can be replaced by x.tag = y.tag, with only a
small efficiency loss, when pointer equality is not available. Furthermore, when weak pointers are not available, traditional hashtables can be substituted to weak hash-tables, at the extra cost of
possible memory leaks.
There are still open issues. First, values always have to be constructed before being searched for in the hash-consing table, even
when they happen to be already allocated. Thus the garbage collector still has to reclaim values that were unnecessarily allocated. To
avoid this pitfall, we could have one hash-consing function for each
constructor but such a library would require meta-programming.
Serialization is another issue. Indeed, the uniqueness of tags
cannot be preserved when hash-consed values are written to files
and reloaded afterwards. The user must implement his own input/output functions to rehash all serialized values (either by saving
untagged values or by ignoring tags at loading time).
Last, the pattern-matching over hash-consed values is less convenient, especially for deep patterns. It could be easily solved with
a native support for views [13] in O CAML .

Acknowledgments
We thank the anonymous referees for their detailed comments and
suggestions.

References
[1] The Objective Caml language. http://caml.inria.fr/.
[2] John Allen. Anatomy of Lisp. McGraw-Hill Book Compagny, 1978.
[3] Henrik Reif Andersen. An Introduction to Binary Decision Diagrams.
Lecture notes, 1998. http://www.itu.dk/people/hra/.
[4] Andrew W. Appel and Marcelo J. R. Gonçalves. Hash-consing
Garbage Collection. Technical Report CS–TR–412–93, Princeton
University, February 1993.
[5] Randal E. Bryant. Graph-based algorithms for Boolean function
manipulation. IEEE Transactions on Computers, 8(C-35):677–691,
1986.
[6] M. Davis, G. Logemann, and D. Loveland. A machine program for
theorem proving. Communications of the ACM, 5(7):394–397, July
1962.

[7] M. Davis and H. Putnam. A computing procedure for quantification
theory. Journal of the ACM, 7(3):201–215, July 1960.
[8] A. P. Ershov. On programming of arithmetic operations. Commun.
ACM, 1(8):3–6, 1958.
[9] Eiichi Goto. Monocopy and associative algorithms in extended Lisp.
Technical Report TR 74–03, University of Tokyo, May 1974.
[10] Jean Goubault. Implementing Functional Languages with Fast
Equality, Sets and Maps: an Exercise in Hash Consing. In Journées
Francophones des Langages Applicatifs (JFLA’93), pages 222–238,
Annecy, February 1993.
[11] Gérard Huet. Constructive Computation Theory. Courses notes.
Available at http://pauillac.inria.fr/∼ huet/CCT/.
[12] Chris Okasaki and Andrew Gill. Fast Mergeable Integer Maps. In
Workshop on ML, pages 77–86, September 1998.
[13] Philip Wadler. Views: A way for pattern matching to cohabit with data
abstraction. In 14th ACM Symposium on Principles of Programming
Languages, Munich, January 1987.