Deconstructing Process Isolation
Mark Aiken, Manuel Fähndrich, Chris Hawblitzel, Galen Hunt, James Larus
Microsoft Research
One Microsoft Way
Redmond, WA, 98052
+1 (425) 882-8080

{maiken, maf, chrishaw, galenh, larus} @microsoft.com

ABSTRACT
Most operating systems enforce process isolation through hardware
protection mechanisms such as memory segmentation, page mapping,
and differentiated user and kernel instructions. Singularity is a new
operating system that uses software mechanisms to enforce process
isolation. A software isolated process (SIP) is a process whose
boundaries are established by language safety rules and enforced by
static type checking. SIPs provide a low cost isolation mechanism that
provides failure isolation and fast inter-process communication.
To compare the performance of Singularity’s SIPs against traditional
isolation techniques, we implemented an optional hardware isolation
mechanism. Protection domains are hardware-enforced address
spaces, which can contain one or more SIPs. Domains can either run
at the kernel’s privilege level or be fully isolated from the kernel and
run at the normal application privilege level. With protection
domains, we can construct Singularity configurations that are similar
to micro-kernel and monolithic kernel systems. We found that
hardware-based isolation incurs non-trivial performance costs (up to
25-33%) and complicates system implementation. Software isolation
has less than 5% overhead on these benchmarks.
The lower run-time cost of SIPs makes their use feasible at a finer
granularity than conventional processes. However, hardware isolation
remains valuable as a defense-in-depth against potential failures in
software isolation mechanisms. Singularity’s ability to employ
hardware isolation selectively enables careful balancing of the costs
and benefits of each isolation technique.

Categories and Subject Descriptors
D.4.1 [Operating Systems]: Process Management, D.4.7
[Operating Systems]: Organization and Design

General Terms
Performance, Design, Reliability, Experimentation

Keywords
Singularity, hardware protection domain, hardware isolated
process (HIP), software isolated process (SIP)

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
MSPC’06, October 22, 2006, San Jose, CA, USA.
Copyright 2006 ACM 1-59593-578-9/06/0010…$5.00.

1. INTRODUCTION
Process isolation is a fundamental function of most operating systems.
Isolation protects system integrity by preventing one process from
interfering with another’s, or the system’s, code or data, and by
preventing untrusted code from accessing protected resources.
Isolation also contributes to system resilience by providing failure
boundaries that permit part of a system to fail without compromising
the whole.
Most operating systems use a CPU’s memory management hardware
to provide process isolation, using two mechanisms. First, processes
are only allowed access to certain pages of physical memory. Second,
privilege levels prevent untrusted code from manipulating the system
resources that implement processes, for example, the memory
management unit (MMU) or interrupt controllers. These mechanisms’
non-trivial performance costs are largely hidden, since there is no
widely used alternative approach to compare them to. Mapping from
virtual to physical addresses can incur overheads up to 10–30% due to
exception handling, inline translation lookaside buffer (TLB) lookup,
TLB reloads, and
maintenance of kernel data structures such as page tables [29]. In
addition, virtual memory and privilege levels increase the cost of
inter-process communication.
As a consequence, most operating systems provide processes that are
too expensive to be used for fine-grain isolation, say between an
application and a code extension. To avoid this overhead, closely-coupled
extensions are often loaded into the same address space as
their host, without any sort of isolation. Moreover, even when
components are separated into processes, the high cost of inter-process
communication encourages use of shared memory, which
couples processes’ failure behavior [18].
The system architectures that result from these performance pressures
are a major source of problems in software reliability, security, and
compatibility [14]. Although extension code may be untrusted,
unverified, faulty, or even malicious, it is loaded into its host’s
address space with no hard interface or boundary between the two—
because the cost of traditional hardware protection mechanisms is too
high. The outcome is often unpleasant, because a failure in the
extension compromises or terminates the host. For example, Swift
reports that faulty device drivers cause 85% of diagnosed Windows
system crashes [38]. Moreover, without hard boundaries, extensions
can bypass public abstractions to access implementation internals,
which constrains future program evolution and compels extensive
compatibility testing.
To address these problems, we constructed a new operating system
called Singularity [27] that uses two mechanisms to offer isolation
with a wider range of cost-benefit tradeoffs. At one end of the range
are hardware isolated processes (HIPs), which are analogous to
processes in other operating systems. Hardware isolation is built on a
mechanism called a protection domain. At the other extreme are
software isolated processes (SIPs), which use programming language
safety and system architecture to provide a less expensive isolation
mechanism. Singularity also supports hybrid combinations of
hardware and software isolation. One or more SIPs can reside in a
protection domain, in configurations ranging from pure SIP to pure
HIP. Figure 1 shows protection domains used to support combinations
of software and hardware isolation with different cost/benefit tradeoffs.

Figure 1. Alternative architectures enabled by software isolated
processes. Panels: conventional OS processes; Singularity (SIP-only);
Singularity (hybrid). Legend: hardware-isolated address space
(protection domain); software-isolated object space; process (code +
data).

The design and implementation of a system based on SIPs is a major
contribution of this work. A software isolated process is a collection
of memory pages and a language safety mechanism that ensures that
code in a process cannot access another process’s pages. A SIP
replaces hardware memory protection with static verification of
program safety. Singularity uses language safety and a fast
communication mechanism built on channels [15] to enforce a
system-wide invariant that neither the kernel nor any other process
contains a reference into a given process’s object space. Because
different processes’ object spaces always reside on disjoint memory
pages, memory reclamation is straightforward when processes
terminate.
This architecture provides fine-granularity isolation and high
performance. If a process fails, no other process’s data is left in an
inconsistent state, and failure notification is cleanly propagated
through communication channels. Without hardware protection,
system calls and inter-process communication run significantly faster
(30–500%) and a communication-intensive benchmark runs
approximately 20% faster.
A second major contribution of this work is a direct comparison of the
costs of hardware and software isolation mechanisms. Previous
studies implemented one or the other approach or used simulation to
quantify overhead. Because Singularity implements both isolation
mechanisms, we can directly compare costs on the same platform. We
found that the two main contributors to the cost of memory
management were establishing and maintaining the virtual to physical
mapping and changing privilege levels between applications and the
kernel. Changing address spaces incurred minor costs for
micro-benchmarks, but was a significant overhead for a macro-benchmark
with a large amount of communications and context switching. In
addition, hardware protection increased the cost and complexity of
inter-process communication.
An additional contribution of this work is an initial exploration of
Singularity’s ability to use hardware isolation selectively, rather than
at every process boundary. For example, system processes and device
drivers (each of which runs in its own process) can, but need not,
reside in the same address space as the kernel. Using a single
address space permits fast communication, but still provides strong
memory and failure isolation. For example, a driver can fail without
crashing the system. Similarly, an application process may share an
address space with its extensions. The use of hardware protection,
however, provides a defense against errors in the implementation of
software isolation. Singularity’s flexibility allows this tradeoff to be
carefully balanced.
We emphasize that this paper analyzes the use of a processor’s virtual
memory hardware for process isolation and protection. Virtual
memory is often used for on-demand paging of memory to secondary
storage, and has other uses [4]. These other uses are often orthogonal
to protection, and don’t necessarily require the full cost of hardware
process protection.
The rest of the paper is organized as follows. Section 2 describes
software isolated processes (SIPs). Section 3 describes the
implementation of Singularity’s protection domains and virtual
memory. Section 4 contains performance results. Section 5 concludes.

2. SOFTWARE ISOLATED PROCESSES
Software isolated processes (SIPs) use software verification, rather
than hardware protection, to isolate portions of a system. They rely on
verifying code’s safe behavior to prevent it from accessing another
process’s (or the kernel’s) instructions or data. A verifiably safe
program can only access data that it allocated or was passed, and it
cannot construct or corrupt a memory reference.1
This invariant is usually expressed as two simpler properties: type and
memory safety. Type safety ensures that the only operations applied
to a value are those defined for instances of its type. Memory safety
ensures the validity of memory references by preventing null pointer
references, references outside an array’s bounds, or references to
deallocated memory. These properties can be partially verified by a
compiler, but in some circumstances require run-time tests. Java and
C# are expressive, safe programming languages that can be compiled
with a small performance penalty for safety [16]. The overhead of
these tests in two large Singularity benchmarks is approximately 4.5%
(Section 4.4). Other modern languages, such as Perl, Python, and
Ruby, are also safe.
In Singularity, all untrusted code that runs in a SIP must be verifiably
safe. The SIP may also contain trusted, unverified code that cannot be
expressed in a safe language, for example, parts of the language
runtime, a memory allocator, or an accessor for memory-mapped I/O.
The correctness and safety of this type of code needs to be verified by
other means.
A SIP is a closed object space that resides on a collection of memory
pages exclusively owned by a process (Figure 2). An object space is
the complete collection of objects reachable by a program at a point
in its execution. Singularity ensures that two processes’ object
spaces are always disjoint by not providing shared-memory
communication, and by using the language type system to prevent
processes from sending an object reference through inter-process
communication [15].

1 Object allocation creates new references and requires trusted, unsafe
code. In Singularity, object allocation occurs within a trusted
runtime, which logically is an extension of the kernel. Verifying the
safety of a memory allocator or garbage collector is an interesting
open research question [43].

Figure 2. Software Isolated Processes (SIPs). Labels: kernel,
Process 1, Process 2, memory pages, exchange heap.

In Singularity, all inter-process communication occurs through
message passing across channels, which allows two processes to
exchange data through an area of memory called the exchange heap
(Figure 2) [15]. Data blocks in this heap are structs, not objects (i.e.,
they do not contain methods) and are not allowed to contain object
references. Messages can contain references to exchange heap data,
making it possible to send structured data between processes.
However, an object reference cannot be embedded in a message.
Moreover, the system maintains the invariant that there exists at most
one pointer to an item in the exchange heap. When a process sends a
message, it loses its reference to the message, which is transferred to
the receiving process (analogous to sending a letter by postal mail).
Therefore, processes cannot use this heap as shared memory, and
messages can be exchanged very efficiently through pointer passing,
not copying.
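
The ownership-transfer rule above can be illustrated with a small sketch. Singularity enforces the invariant statically in its compiler; the Python below only emulates it with a run-time check. The names ExchangeHeap, Channel, transfer, send, and receive are hypothetical and chosen for illustration; they are not Singularity APIs.

```python
class ExchangeHeap:
    """Toy exchange heap: every block has exactly one owner at a time."""

    def __init__(self):
        self._blocks = {}   # block id -> bytearray payload
        self._owner = {}    # block id -> name of the owning process
        self._next_id = 0

    def allocate(self, owner, payload):
        block_id = self._next_id
        self._next_id += 1
        self._blocks[block_id] = bytearray(payload)
        self._owner[block_id] = owner
        return block_id

    def read(self, process, block_id):
        # Run-time stand-in for the static ownership check (an assumption
        # of this sketch; the real system rejects such code at compile time).
        if self._owner.get(block_id) != process:
            raise PermissionError(f"{process} does not own block {block_id}")
        return bytes(self._blocks[block_id])

    def transfer(self, sender, receiver, block_id):
        if self._owner.get(block_id) != sender:
            raise PermissionError(f"{sender} does not own block {block_id}")
        # Only the owner field changes; the payload itself is not copied.
        self._owner[block_id] = receiver


class Channel:
    """Toy one-way channel that passes block ids rather than data."""

    def __init__(self, heap, sender, receiver):
        self.heap, self.sender, self.receiver = heap, sender, receiver
        self._queue = []

    def send(self, block_id):
        # Sending gives up ownership, like posting a letter.
        self.heap.transfer(self.sender, self.receiver, block_id)
        self._queue.append(block_id)

    def receive(self):
        return self._queue.pop(0)


heap = ExchangeHeap()
chan = Channel(heap, "P1", "P2")
msg = heap.allocate("P1", b"hello")
chan.send(msg)
received = chan.receive()
```

After the send, a read by "P2" succeeds, while a read by "P1" raises PermissionError in this sketch, which mirrors the sender losing its reference.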
Unlike Singularity’s segregation of SIPs on disjoint memory pages,
other safe language systems have taken the approach of having
processes allocate memory from a single, shared heap. Singularity’s
design has several advantages. First, it reduces coupling between
processes by eliminating the shared memory allocator and garbage
collector and by permitting each process to allocate and reclaim its
own memory using the techniques that are appropriate. The large
number of existing garbage collection algorithms, and experience,
suggest that no one garbage collector is appropriate for all systems or
applications, so this flexibility helps achieve good system
performance [17]. A shared heap and garbage collector constrain the
object format and run-time system of every process that uses it.
Moreover, these shared facilities are a common point of failure.
Second, Singularity’s design facilitates process termination, as the
system can simply reclaim entire memory pages, rather than relying
on garbage collection to reclaim individual objects.
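
The second advantage can be sketched as follows. This is a minimal model, assuming a flat array of pages with one recorded owner each; PageAllocator and its methods are hypothetical names, not part of Singularity. Termination frees whole pages by owner and never inspects the objects stored on them.

```python
class PageAllocator:
    """Toy page allocator that records a single owner per page."""

    def __init__(self, num_pages):
        self.owner = [None] * num_pages   # None means the page is free

    def allocate(self, process, count):
        free = [i for i, o in enumerate(self.owner) if o is None]
        if len(free) < count:
            raise MemoryError("out of pages")
        pages = free[:count]
        for i in pages:
            self.owner[i] = process
        return pages

    def terminate(self, process):
        """Reclaim every page owned by `process` without tracing objects."""
        reclaimed = [i for i, o in enumerate(self.owner) if o == process]
        for i in reclaimed:
            self.owner[i] = None
        return reclaimed


allocator = PageAllocator(8)
a_pages = allocator.allocate("A", 3)
b_pages = allocator.allocate("B", 2)
freed = allocator.terminate("A")   # process B's pages are untouched
```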
Memory pages from different SIPs can reside in the same address
space (physical or virtual), so that switching processes need not incur
the usual overhead of flushing the TLB. Moreover, since there are no
cross-process references, SIPs can equally well reside in distinct
address spaces, either to overcome a 32-bit address limitation, or to
enhance software isolation with hardware mechanisms (below).
Singularity also verifies that untrusted code does not contain
privileged instructions, so all processes can run at Ring 0 (kernel
level) on an x86 processor. As a consequence, when hardware
protection is not used, system calls and inter-process communication
on Singularity are up to an order of magnitude faster than other
systems (Section 3.4). With lower overheads, SIPs can be used at a
very fine granularity for failure containment.

2.1 Discussion
In Singularity, SIP isolation is built from language safety and system
design. Language safety is a desirable end in itself, as it helps
eliminate or mitigate software defects such as buffer-overrun attacks.
When safe languages are already in use, the isolation provided by
SIPs incurs little additional cost, beyond a deliberate loss of the
ability to share data between processes. However, because SIPs
preclude unsafe languages such as C++, they involve a potential loss
of language freedom. This has not been a problem in Singularity,
since Sing#, our slightly extended dialect of C#, is a flexible and
expressive language. While we implemented Singularity in Sing#, any
language with a compiler that emits type-safe MSIL could run on the
system (e.g., Visual Basic, F#, or IronPython). We note in passing
that software fault isolation (SFI) [42] could encapsulate unsafe code
in a SIP. However, SFI is less attractive since its run-time overhead is
higher than that of a well-implemented safe language, and SFI does not
correct defect-prone features of unsafe languages.
A serious concern with SIPs, however, is the correctness of the
implementation of language safety. At present, we depend on our
compiler (Bartok) to verify the type safety of a program, not introduce
errors during optimization, and produce run-time checks. Bartok uses
a typed intermediate language [9], which helps ensure that it
maintains type safety through the compilation process from typed
MSIL to untyped x86 code. Nevertheless, it is a complex, optimizing
compiler. We are working toward removing Bartok from the trusted
computing base by using typed assembly language (TAL) [19, 31]. A
TAL compiler produces a proof of the type safety of compiled x86
code along with the code itself. A properly structured proof of type
safety can be verified by a small, simple checker whose correctness is
far easier to ensure than a compiler’s. We are confident that TAL can
prevent unsafe code from executing on Singularity.
SIPs are also dependent on the correctness of trusted, unsafe code,
such as the runtime and, most notably, the garbage collector. We are
actively investigating techniques for verifying the correctness of these
parts of the system.
Hardware isolation is able to detect errors in trusted, unsafe code that
underpins software isolation, for which full verification of correctness
is not yet possible. This demonstrates its usefulness in systems where
the strongest possible isolation is desirable. Of course, hardware
protection comes at a cost. Section 3 describes how hardware
protection features are implemented in Singularity and Section 4
quantifies its overhead.
Another concern about SIPs is the possibility of hardware faults
corrupting a pointer value or a computation, which could cause a
program to violate type safety guarantees and subvert its SIP. The
same concern applies when using hardware protection.
Govindavajhala and Appel showed that memory errors could be used
to subvert type safety in a Java virtual machine [22]. Their
recommendation, which we share, is increased use of hardware error
detection and correction techniques, such as error correcting codes
(ECC) on memory and data paths, to ensure the assumption of correct
execution that underlies all program safety. Advances in
semiconductors are making transistors less reliable, so hardware error
detection and correction is likely to become more common in future
processors [32].
Singularity differs from other single address space operating systems, such as
Pilot, Cedar, Smalltalk, Lisp Machines, Oberon, or Inferno [12, 20,
35, 39, 44], which encouraged sharing objects between processes and
did not segregate a process’s objects. These systems presented a
programming model similar to threads in a process, rather than SIPs’
process-like, segmented object spaces. They also relied on garbage
collection to reclaim memory when a process terminates. Garbage
collection has a higher per-object overhead than reclaiming entire
pages, and can be thwarted by pointers to “dead” objects. These
systems also used a single garbage collector for all processes and did
not have the flexibility to select a collection algorithm appropriate to
a computation.
The JX system is similar to Singularity in many respects. It is a
microkernel system written almost entirely in a safe language (Java)
[21]. Processes on JX do not share memory and communicate through
synchronous RPC with deep copying of parameters. The processes
run in a single hardware address space and rely on language safety for
isolation. The primary differences between JX and Singularity are the
communication and extension mechanisms. Singularity processes
exchange data, not objects, which eliminates the need for
communicating processes to share a common object layout and
method bodies. Moreover, Singularity uses asynchronous message
passing over strongly typed channels, which is more flexible than
RPC, but still permits verification of communication behavior and
system-wide liveness properties [34]. Finally, Singularity does not
allow dynamic code loading, but instead runs extensions in their own
SIP(s).
Other systems implemented isolation mechanisms for Java. The J-Kernel
implemented protection domains in a JVM process, provided
revocable capabilities to control object sharing, and developed clean
semantics for domain termination [24]. Luna refined the J-Kernel’s
run-time mechanisms with an extension to the Java type system that
distinguishes shared data and permits control of sharing [25].
KaffeOS provides a process abstraction in a JVM along with
mechanisms to control resource utilization in a group of processes
[5]. Java incorporated some of these ideas into isolates [33], which
are similar to AppDomains in Microsoft’s CLR. Singularity differs
from these systems in several ways. It eliminates the duplication of
resource management and isolation mechanisms between an operating
system and language runtime system by integrating the two. It also
segregates a process’s objects, so that a terminated process’s memory
can be reclaimed without garbage collection. SIPs also strictly prevent
sharing, which provides a greater amount of isolation and fault
tolerance and permits each process to have a fully independent
language runtime system, even to the extent of entirely different
runtimes and garbage collectors.

3. HARDWARE PROTECTION DOMAINS
For Singularity, hardware protection offers a supplemental layer of
protection beyond the isolation provided by SIPs. Hardware isolation
offers the ability to detect violations of software-isolation
mechanisms, providing a potentially valuable defense-in-depth.
Singularity implements hardware isolation through a protection
domain (domain for short), which is a hardware-enforced protection
boundary that can host one or more SIPs. Each protection domain
consists of a distinct virtual address space. The processor’s MMU
enforces memory isolation in a conventional manner. Each domain
has its own exchange heap, which is used for communications
between SIPs within the domain. A protection domain that does not
isolate its SIPs from the kernel is called a kernel domain. All SIPs in
a kernel domain run at the processor’s supervisor privilege level
(“Ring 0” on the x86 architecture), and share the kernel’s exchange
heap, thereby simplifying transitions and communication between the
processes and the kernel.
Communication within a protection domain continues to use
Singularity’s efficient reference-passing scheme. However, because
each protection domain resides in a separate address space,
communication across domains involves data copying. The message-passing
semantics of Singularity channels make the two
implementations indistinguishable to application code (except for
performance).
A protection domain could, in principle, host a single process
containing unverifiable code written in an unsafe language such as
C++. We have not explored this possibility. Rather, protection
domains always contain a SIP, which continues to provide isolation
and a failure boundary in case the original process code loads
extensions, libraries, or other code.
Because multiple SIPs can be hosted within a protection domain,
domains can be employed selectively to provide hardware isolation
between specific processes, or between the kernel and processes. The
mapping of SIPs to protection domains is determined by system
policy configuration and can be changed by an administrator. A
Singularity system with a distinct protection domain for each process
is analogous to a traditional hardware-isolated microkernel system. A
Singularity system with a kernel domain hosting the file system,
network stack, device drivers, and kernel is analogous to a
conventional, monolithic operating system, but is more resilient to
driver or subsystem failures because each component is contained in a
SIP.
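
The two analogies above can be made concrete with a small sketch of a policy table. The table layout, the SIP and domain names, and the helper crosses_hardware_boundary are all assumptions for illustration; the paper does not specify how Singularity represents its policy configuration.

```python
KERNEL_DOMAIN = "kernel"

# Monolithic-like policy: system components share the kernel domain,
# but each still runs in its own SIP.
monolithic_like = {
    "fs": KERNEL_DOMAIN,
    "netstack": KERNEL_DOMAIN,
    "disk_driver": KERNEL_DOMAIN,
    "app": "d_app",
}

# Microkernel-like policy: every SIP gets its own protection domain.
microkernel_like = {
    "fs": "d_fs",
    "netstack": "d_net",
    "disk_driver": "d_disk",
    "app": "d_app",
}


def crosses_hardware_boundary(policy, sip_a, sip_b):
    """True if a channel between the two SIPs spans protection domains."""
    return policy[sip_a] != policy[sip_b]
```

Under the first policy a channel between the file system and a driver stays within one domain; under the second, the same channel crosses a hardware boundary.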

3.1 Virtual Memory
Virtual memory is the primary mechanism that a traditional operating
system uses to enforce isolation between processes. Singularity uses
virtual memory in a similar way to implement protection domains.
Singularity only uses virtual memory for protection and does not page
memory to disk.
As with many other systems, Singularity organizes its memory into
two main regions: the kernel range and the process, or “user,” range.
The user memory range in a given protection domain maps to
different physical pages than any other protection domain’s user
range, which ensures hardware-enforced isolation. Every domain’s
address space includes an identical mapping for the kernel range of
memory, which ensures that kernel data structures are always
addressable, regardless of which process is running. Singularity uses
a processor feature that marks the memory-mapping entries for the
kernel range of memory as “global,” which indicates that they should
not be flushed from the TLB when the TLB is implicitly invalidated
during an address-space switch.
Compared to the basic Singularity configuration, which only uses
physical memory, satisfying a kernel or process request to allocate a
(virtually) contiguous range of memory is significantly more complex.
First, the kernel must find a suitable, unused range of virtual memory
addresses. Next, the kernel must find an unused physical page of
memory for each page within the range, and manipulate the mapping
structures to map the two addresses. The additional complexity causes
a roughly five-fold increase in memory-management cost (Section
4.1).
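
The allocation steps described above can be sketched as follows. This is a simplified model under stated assumptions: a flat dictionary stands in for the hardware mapping structures, the page size is taken to be 4096 bytes, and AddressSpace and its methods are hypothetical names rather than Singularity's implementation.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class AddressSpace:
    """Toy address space: a dict maps virtual pages to physical frames."""

    def __init__(self, num_virtual_pages, free_physical):
        self.page_table = {}                  # virtual page -> physical frame
        self.num_virtual_pages = num_virtual_pages
        self.free_physical = free_physical    # list of free physical frames

    def _find_virtual_range(self, count):
        # Step 1: find `count` consecutive unmapped virtual pages.
        run_start, run_len = 0, 0
        for vpage in range(self.num_virtual_pages):
            if vpage in self.page_table:
                run_start, run_len = vpage + 1, 0
            else:
                run_len += 1
                if run_len == count:
                    return run_start
        raise MemoryError("no contiguous virtual range")

    def allocate(self, count):
        start = self._find_virtual_range(count)
        if len(self.free_physical) < count:
            raise MemoryError("out of physical pages")
        for vpage in range(start, start + count):
            frame = self.free_physical.pop()   # step 2: any free frame will do
            self.page_table[vpage] = frame     # step 3: install the mapping
        return start * PAGE_SIZE               # base virtual address


free_frames = [7, 3, 9, 1]       # physical frames need not be contiguous
space = AddressSpace(16, free_frames)
space.page_table[1] = 5          # pretend virtual page 1 is already mapped
base = space.allocate(3)
```

In the physical-memory-only configuration only the search for free pages is needed, which is one way to see where the extra cost comes from.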
Virtual memory also imposes additional costs on some context
switches. The Singularity scheduler saves and restores the processor
context when switching between runnable threads. When scheduling a
thread from a different protection domain, Singularity loads the
processor control register that controls the address space, which
invalidates all non-global TLB entries. As a result, all references to
the user memory range incur additional memory accesses to refill the
TLB. A system running a large number of protection domains (and
therefore address spaces) will incur overhead due to TLB misses. This
effect is quantified in Section 4.3.
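
The interaction between address-space switches and global mappings can be sketched with a toy TLB model. The class TLB, the page numbers, and the two domain tables below are hypothetical; the model has unlimited capacity and only counts refills, so it illustrates the mechanism rather than measuring it.

```python
class TLB:
    """Toy untagged TLB whose entries may be marked global."""

    def __init__(self):
        self.entries = {}   # virtual page -> (physical frame, is_global)
        self.misses = 0

    def translate(self, vpage, page_table, global_pages):
        if vpage not in self.entries:
            self.misses += 1   # refill from the page table
            self.entries[vpage] = (page_table[vpage], vpage in global_pages)
        return self.entries[vpage][0]

    def switch_address_space(self):
        # Loading the address-space register drops all non-global entries.
        self.entries = {v: e for v, e in self.entries.items() if e[1]}


KERNEL_PAGES = {100, 101}                    # identical in every domain
domain_a = {100: 1, 101: 2, 0: 10, 1: 11}
domain_b = {100: 1, 101: 2, 0: 20, 1: 21}

tlb = TLB()
for v in (100, 0, 1):
    tlb.translate(v, domain_a, KERNEL_PAGES)   # three cold misses
tlb.switch_address_space()
for v in (100, 0, 1):
    tlb.translate(v, domain_b, KERNEL_PAGES)   # kernel page hits, user pages miss
```

The kernel-range entry survives the switch, so only the two user-range references miss again.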

3.2 Privilege Modes
Singularity can behave like most systems and ensure the integrity of
the process/kernel hardware boundary by running processes outside a
kernel domain at a lower privilege level (“Ring 3” on x86 processors).
Since the instructions that manipulate the virtual memory mappings
are invalid at lower privilege levels, this mechanism ensures that the
operating system retains sole control of memory protection. In
addition, the kernel range pages are marked as inaccessible to
unprivileged code, which ensures the integrity of kernel memory.
A major cost of the privilege mechanism arises when a process
invokes the kernel. When a Singularity process is running in a kernel
protection domain, and therefore executing at elevated privilege, the
kernel can be invoked with the same procedure-call mechanism that a
process uses for internal calls. The only additional overhead is the
bookkeeping necessary to mark a transition between two garbage-collection
domains. This demarcation on the stack ensures that the
kernel’s garbage collector does not traverse process data structures
(and vice versa). A process running in a non-kernel protection domain
(at lowered privilege) incurs additional costs. It must use an
instruction that invokes the kernel while simultaneously and safely
elevating the processor’s privilege level. Singularity uses the sysenter
/ sysexit mechanism offered by modern x86-class processors. This
pair of instructions offers a streamlined mechanism for invoking
privileged code from an unprivileged context, but the process remains
considerably more expensive than a simple procedure call. The four-fold
cost increase incurred by privilege transitions is detailed in
Section 4.1.

3.3 Inter-process Communication
Singularity allows the efficient exchange of structured data (but not
objects) through its communication channels. Message passing within
a protection domain is inexpensive since the system exploits
language-level invariants and a common exchange heap to avoid
copying data between sender and receiver.
Our C# language dialect imposes a linearity constraint on data in the
exchange heap: exactly one reference exists to each object [15]. This
requires the sender of data through a channel to relinquish any
reference to the transmitted data, and application code is statically
checked to ensure that it respects this constraint. This invariant
enables Singularity processes in the same domain to exchange data
without copying, since the sending process cannot read or modify the
data after it is sent (and cannot distinguish copying from pointer
passing). Entire trees of data can be safely moved solely by reference
passing, due to the guarantee that no references may exist to data
within the tree at the moment the tree is transmitted.
A communication channel consists of two endpoints, each
owned by exactly one process (both can be owned by the same
process). If a channel spans a non-kernel protection domain boundary,
the sending and receiving processes do not share an exchange heap,
and the usual reference-passing technique is no longer usable. Data
must be copied from one protection domain to the other, with the
assistance of the kernel. In our current implementation, data is first
copied from the transmitting process into the kernel domain’s
exchange heap. As the data is copied, it is deallocated in the sending
process’s exchange heap, preserving the invariant that the sending
process loses access to transmitted data. When the receiving process
attempts to retrieve channel data, the kernel copies the data into the
receiving process’s exchange heap.

This multi-step process is considerably more costly than reference
passing. When communicating across protection domains, the cost of
sending a one-byte message is roughly 10 times higher, and it
increases with message size to 25 times for a 32KB message (Section
4.1). An obvious optimization performed by our implementation is to
avoid the copy into the kernel domain, if the sender and the kernel
share the same domain, and similarly in the case the receiver shares
its domain with the kernel. For blocks of page size or larger, we also
perform memory remapping instead of copying. These optimizations
are all transparent to the sender and receiver. We are also
investigating techniques to eliminate the need to copy data in two
stages.
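
The copy path and the first optimization described above can be sketched as follows. This is a simplified model: it performs the second copy eagerly instead of when the receiver retrieves the data, it omits the page-remapping optimization, and Domain and send are hypothetical names rather than Singularity APIs.

```python
class Domain:
    """Toy protection domain with its own exchange heap."""

    def __init__(self, name):
        self.name = name
        self.heap = {}        # block id -> bytes
        self._next_id = 0

    def allocate(self, data):
        block_id = self._next_id
        self._next_id += 1
        self.heap[block_id] = bytes(data)   # models copying into this heap
        return block_id


def send(src, dst, kernel, block_id):
    """Move a block from src to dst; return (new block id, copies made)."""
    if src is dst:
        return block_id, 0                  # same domain: pass the reference
    copies = 0
    data = src.heap.pop(block_id)           # sender loses access to the block
    if src is not kernel and dst is not kernel:
        staged = kernel.allocate(data)      # stage 1: copy into the kernel heap
        copies += 1
        data = kernel.heap.pop(staged)      # kernel releases the staged block
    new_id = dst.allocate(data)             # final copy into the receiver's heap
    copies += 1
    return new_id, copies


kernel = Domain("kernel")
app_a = Domain("A")
app_b = Domain("B")
blk = app_a.allocate(b"payload")
new_blk, n_copies = send(app_a, app_b, kernel, blk)
```

When either endpoint shares the kernel's domain, the sketch makes a single copy, which corresponds to skipping the intermediate stage.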

3.4 Discussion
Hardware isolated processes (HIPs) are the norm for modern
operating systems, such as Unix or Windows. HIPs rely on a
processor’s memory management unit (MMU) to implement a distinct
virtual address space for each process. Each process has a per-process
binding for each virtual address, and the MMU detects references to
undefined addresses. Mapping and protection are implemented by a
virtual memory system, which is a combination of hardware and
software whose design varies among machines [28]. Hardware for
HIPs does not come for free, though its costs are diffused and difficult
to quantify:
Virtual memory systems (with the exception of software-only systems
such as SPUR [46]) rely on a hardware cache of address translations
to avoid accessing page tables at every processor cache miss.
Managing TLB entries has a cost, which Jacob and Mudge estimated
at 5–10% on a simulated MIPS-like processor [29]. The virtual
memory system also brings its data, and in some systems, code as
well, into a processor’s caches, which evicts user code and data.
Jacob and Mudge estimate that, with small caches, these induced
misses can increase the overhead to 10–20%. Furthermore, they found
that virtual memory induced interrupts can increase the overhead to
10–30%. Other studies found similar or even higher overheads,
though the actual costs are very dependent on system details and
benchmarks [3, 6, 10, 26, 36, 40, 41]. In addition, TLB access is on
the critical path of many processor designs [2, 30] and so might affect
processor clock speed.
Virtual memory increases the cost of calls into the kernel and process
context switches [3]. The reported overhead of crossing the kernel
protection boundary is 82 cycles on a Pentium processor [23]. In
addition, on processors with an untagged TLB, such as most current
implementations of the x86 architecture, a process context switch
requires flushing the TLB, which incurs refill costs.
HIPs provide protection at the granularity of a memory page
(typically 4KB). This size is much larger than a typical object (in
SPECjvm98 benchmark applications, the average ranges from 22 to
29,949 bytes, with most well under 50 bytes; see footnote 2) or a
memory-mapped I/O port, so a protection boundary encompasses large
pieces of code or entire data segments, not individual objects.
Alternatives, such as Mondrian memory protection [45], provide finer
granularity access control, but are not yet available.
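
The TLB refill cost that an untagged TLB incurs on context switches, noted above, can be illustrated with a toy model. This is a sketch under simplifying assumptions (an unbounded TLB, a hand-written access trace, hypothetical function and variable names) and does not model any real processor.

```python
def count_tlb_misses(accesses, tagged):
    """Count misses in a toy, unbounded TLB.

    accesses: sequence of (process_id, virtual_page) pairs in execution order.
    tagged:   if True, entries carry the process id and survive a context
              switch; if False, the TLB is flushed whenever the process changes.
    """
    tlb = set()
    misses = 0
    current = None
    for pid, page in accesses:
        if pid != current:
            if not tagged:
                tlb.clear()  # untagged TLB: flush on every context switch
            current = pid
        key = (pid, page) if tagged else page
        if key not in tlb:
            misses += 1      # a refill would be paid here
            tlb.add(key)
    return misses

# Two processes alternate, each touching the same two pages.
trace = [("A", 1), ("A", 2), ("B", 1), ("B", 2)] * 3
print(count_tlb_misses(trace, tagged=False))  # 12
print(count_tlb_misses(trace, tagged=True))   # 4
```

With the untagged model every access after a switch misses again, whereas the tagged model pays only for the first touch of each page by each process.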
Singularity is a microkernel operating system that differs in several
respects from other microkernel systems, such as Mach, L4, SPIN,
Vino, and Exokernel [1, 7, 13, 23, 37]. Microkernel operating systems
partition a monolithic kernel into components that run in separate
processes. Previous systems, with the exception of kernel extensions for
SPIN, were written in an unsafe programming language and used
processor memory management hardware and protection rings as an
isolation mechanism. Singularity uses language safety and
message-passing communication to isolate processes at a lower cost, thereby
addressing an important difficulty in other microkernel systems.

2 Emery Berger, personal communication, 3/21/06.

[Figure 3: diagram of the six configurations (SIP-Phys, SIP-Page, HIP-R0, HIP-R0-S, HIP-R3, HIP-R3-S). Legend: kernel, system process, application, SIP, kernel domain, non-kernel domain.]
Figure 3. Measured Singularity configurations.

Because hardware-enforced processes incur considerable overhead,
microkernel systems evolved kernel extensions to lessen this cost
while protecting system integrity. SPIN implemented extensions in a
safe language and used programming language features to restrict
access to kernel interfaces [8]. Vino used sandboxing to prevent
unsafe extensions from accessing kernel code and data and
lightweight transactions to control resource usage [37]. Both systems
allowed extensions to directly manipulate kernel data, which left open
the possibility of corruption through incorrect or malicious operations
and inconsistent data after extension failure. Singularity’s stronger
extension model prevents data sharing between a parent and an
extension.
Nooks provides lightweight protection domains for Linux kernel
extensions such as device drivers [38]. These domains use the MMU
to restrict a driver to read-only access to the portion of the kernel’s
address space that does not belong to a driver. Nooks also interposes
instrumentation on all control and data transfers between the kernel
and the driver. Drivers are written in an unsafe language, so Nooks’s
isolation mechanism is more expensive and domain-specific than
SIPs.
Van Doorn built a Java virtual machine that isolated classes in
distinct, hardware-enforced address spaces [11]. To retain Java’s
semantics, it allowed cross-domain pointers, which required
considerable effort to maintain compatible mappings in different
spaces. These pointers, and the shared language runtime and garbage
collector, undermined much of the failure isolation.

4. PERFORMANCE
We measured the performance of six Singularity configurations
(Figure 3):
SIP-Phys. The kernel and all processes execute in SIPs running in
Ring 0 and accessing physical memory (the virtual memory hardware
is entirely disabled).
SIP-Page. The kernel and all processes execute in SIPs running in
Ring 0, with the MMU’s virtual to physical mapping enabled.
Performance differences relative to SIP-Phys measure the cost of
manipulating the MMU and the additional cache misses to refill the
TLB.
HIP-R0. The kernel, device drivers, and system processes execute in
one kernel protection domain. The application executes in a different
kernel domain (which shares an exchange heap with the kernel). The
kernel and all processes run in Ring 0. Note that code in a domain
executes in a SIP, so it still incurs the overhead of enforcing language
safety. Performance differences relative to SIP-Page measure the cost
of changing the address space (but not privilege level) when
transferring control between the application and the system.
HIP-R0-S. This configuration is the same as HIP-R0, except that
each process (including drivers and system processes) executes in its
own kernel domain. All domains share an exchange heap.
Performance differences relative to HIP-R0 measure the cost of
changing the address space (but not privilege level) when transferring
control between all processes and the kernel.
HIP-R3. The kernel, drivers, and system processes execute in one
protection domain. The application executes in a different, non-kernel
protection domain (which does not share an exchange heap with the
kernel). The kernel domain runs in Ring 0 and application domain
runs in Ring 3. This is comparable to protection on a conventional
operating system, except for the isolation provided by SIPs in the
kernel domain. Performance differences relative to SIP-Phys measure
the cost of conventional hardware isolation. Differences relative to
HIP-R0 measure the cost of privilege levels.
HIP-R3-S. This configuration is the same as HIP-R3, except that
each process (including drivers and system processes) runs in its own
unprivileged, non-kernel domain in Ring 3. The kernel runs in a
kernel domain in Ring 0. Each domain has its own exchange heap.
This configuration is comparable to the protection on a traditional,
hardware-isolated micro-kernel operating system.

[Figure 4: bar chart. Y-axis: Cycles (Relative to SIP-Phys), 0.0 to 6.0, with off-scale bars labeled 9.6, 21.5, 11.4, 15.3, and 24.6. X-axis: Benchmark (ABICall, PageAlloc, CreateThread, ThreadYield, CreateChan, NameBind, CreateProc, SR-1, SR-32, SR-1K, SR-32K, PSR-1, PSR-32, PSR-1K, PSR-32K). Series: SIP-Phys, SIP-Page, HIP-R0, HIP-R0-S, HIP-R3, HIP-R3-S.]
Figure 4. Microbenchmark performance.

All configurations ran on an AMD Athlon 64 3000+ (1.8 GHz) with an
NVIDIA nForce4 Ultra chipset, 1GB RAM, a Western Digital
WD2500JD 250GB 7200RPM SATA disk (without command
queuing), and the nForce4 Ultra native Gigabit NIC (without
hardware TCP offload acceleration). Singularity ran with a
non-concurrent mark-sweep collector in both the kernel and processes
(including drivers) and a simple round-robin scheduler.

4.1 Micro Benchmarks
The micro benchmarks (Figure 4) report the cost of low-level
systems operations, relative to the SIP-Phys configuration. The
“ABICall” benchmark measures the cost of a simple call on the
Singularity kernel. The “PageAlloc” benchmark measures the cost of
allocating and committing one page of memory. The “CreateThread”
benchmark measures the cost of creating a thread in an existing
process. The “ThreadYield” benchmark measures the cost of
scheduling a thread in a process. The “CreateChannel” benchmark
measures the cost of creating a channel. The “NameBind” benchmark
measures the cost of binding a channel to a name in the system’s
name server. The “CreateProc” benchmark measures the cost of
creating a process. The “SR” benchmarks measure the cost of
sending and receiving a message (1, 32, 1K, and 32K bytes,
respectively) between two threads in the same process. The “PSR”
benchmarks measure the same scenario between threads in distinct
processes. Numbers are the average time (in cycles) to execute a test
20,000 times, except ABICall, PageAlloc, and CreateProc, which
executed 10,000, 1,000, and 100 times, respectively.
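
The averaging procedure described above can be sketched as a small harness. This is a hedged illustration, not the harness used for the measurements: it times in nanoseconds with Python's `time.perf_counter_ns` rather than in processor cycles, and the names `average_cost_ns` and `iterations_for` are hypothetical.

```python
import time

# Iteration counts stated in the text; all other benchmarks ran 20,000 times.
ITERATIONS = {"ABICall": 10_000, "PageAlloc": 1_000, "CreateProc": 100}
DEFAULT_ITERATIONS = 20_000

def iterations_for(benchmark):
    """Look up how many times a named benchmark is repeated."""
    return ITERATIONS.get(benchmark, DEFAULT_ITERATIONS)

def average_cost_ns(operation, iterations):
    """Average wall-clock cost of `operation` in nanoseconds per call."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        operation()
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations

# Stand-in workload (assumption): an empty function in place of a real system call.
cost = average_cost_ns(lambda: None, iterations_for("ABICall"))
print(f"average cost: {cost:.1f} ns per call")
```

Averaging over many iterations amortizes timer overhead, which is why the cheaper operations are repeated more often than process creation.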

The cost of these benchmarks increases dramatically when virtual
memory and processor privilege levels are implemented (SIP-Page
and HIP-R3, respectively). The performance effects of separate
address spaces are minor. For example, the ABICall benchmark
shows that privilege levels increase the cost of a process-kernel
transition by a factor of 3.8 (80→304 cycles: SIP-Page→HIP-R3).
Similarly, the PageAlloc benchmark shows that maintaining page
tables increases the cost of allocating a page of memory by a factor of
4.9 (385→1876: SIP-Phys→SIP-Page). Processor privilege levels
further increase this cost by 30% (2,198: HIP-R3).
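
The two factors quoted above follow directly from the reported cycle counts. A minimal check, with a hypothetical helper name:

```python
def cost_factor(before_cycles, after_cycles):
    """Ratio of the cost after a mechanism is added to the cost before."""
    return after_cycles / before_cycles

abicall = cost_factor(80, 304)       # SIP-Page -> HIP-R3 (privilege levels)
pagealloc = cost_factor(385, 1876)   # SIP-Phys -> SIP-Page (page tables)

print(round(abicall, 1))    # 3.8
print(round(pagealloc, 1))  # 4.9
```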
For other benchmarks, page tables increase the cost of creating a
thread by a factor of 2.4 (16,933→41,406 cycles: SIP-Phys→SIP-Page),
creating a channel by a factor of 2 (4,007→7,940), binding a
channel by 24% (39,067→48,251), and creating a process by 35%
(388,162→522,727). Changing address spaces has relatively little
effect on these benchmarks, except that name binding increases in
cost by 18% (48,381→57,276: HIP-R0→HIP-R0-S) and process
creation by 14% (520,473→593,418) when each process runs in its
own domain. Implementing privilege levels has a larger effect.
Creating a thread becomes 83% more expensive (41,616→76,188:
HIP-R0→HIP-R3), creating a channel becomes 90% more expensive
(8,192→11,784), name binding 81% more expensive
(48,381→75,819), and process creation 60% more expensive
(520,473→830,999).
The cost of communication is dependent on the system configuration
and message size. Implementing virtual memory increases the cost of
a one-byte intra-process communication by approximately 33%
(984→1,306 cycles: SIP-Phys→SIP-Page) and a one-byte inter-process
communication by 36% (1,041→1,415 cycles: SIP-Phys→SIP-Page).
Separate address spaces in kernel domains have
relatively little effect on communication cost. However, implementing
privilege levels increases the cost of a one-byte intra-process
communication by 94% (1,312→2,534: HIP-R0→HIP-R3) and of a
one-byte inter-process communication by 41% (1,826→2,580:
HIP-R0→HIP-R3). The effect of copying data, as opposed to pointer
passing, is illustrated by inter-process communication in HIP-R3-S,
which is 3.9–9.6 times as expensive as in HIP-R3, in which the
communicating processes are in the same protection domain.

[Figure 5: two bar-chart panels, Bartok and WebFiles. Y-axis: Relative performance. X-axis groups: Cycles, CPI, Insts Retired, TLB_L2_Requests. Series: SIP-Phys, SIP-Page, HIP-R0, HIP-R0-S, HIP-R3, HIP-R3-S. Off-scale bar labels in the WebFiles panel: 5.6, 5.5, 6.6, 6.8.]
Figure 5. Macrobenchmark performance. Metrics are relative to SIP-Phys, except for TLB cache misses, which are relative to SIP-Page.

4.2 Other Systems
To validate these results, we compared simple operations on
Singularity and several other operating systems. We used FreeBSD
5.3, Red Hat Fedora Core 4 (kernel version 2.6.11-1.1369_FC4), and
Windows XP (SP2).
Table 1. Cost of basic operations (CPU cycles).

System                  ABI Call   Thread yield     PSR-1   Create Proc
Singularity SIP-Phys          80            365     1,041       388,162
Singularity HIP-R3           304            638     2,580       830,999
FreeBSD                      878            911    13,304     1,032,254
Linux                        437            906     5,797       719,447
Windows                      627            753     6,344     5,375,735

Table 1 reports the cost of the basic operations in Singularity and
three other operating systems. On the Unix systems, the ABI call was
clock_getres(); on Windows, it was SetFilePointer(); and on
Singularity, it was ProcessService.GetCyclesPerSecond(). All these
calls operate on a readily available data structure in the respective
kernels. The Unix thread tests ran on user-space scheduled pthreads.
Kernel scheduled threads performed significantly worse. The “PSR-1”
test measured the cost of sending a 1-byte message from one process
to another and then back to the original process. On Unix, we used
sockets; on Windows, a named pipe; and on Singularity, a channel.
Basic thread operations in Singularity, such as yielding the processor
or synchronizing two threads, are comparable to or slightly faster than
those on the other systems. However, because of Singularity’s SIP
architecture, cross-process operations run significantly faster than in
the mature systems. Calls from a process to the kernel are 5–10 times
faster on Singularity, since the call does not cross a hardware
protection boundary. A simple RPC-like interaction between two
processes is 4–9 times faster. Creating a process is 2–14 times
faster than on the other systems.
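
The process-creation range can be checked against the Create Proc column of Table 1. This short sketch assumes that column reads 1,032,254 cycles for FreeBSD, 719,447 for Linux, and 5,375,735 for Windows, with Singularity SIP-Phys at 388,162; the variable names are hypothetical.

```python
SIP_PHYS_CREATE_PROC = 388_162  # cycles, Singularity SIP-Phys

other_systems = {
    "FreeBSD": 1_032_254,
    "Linux": 719_447,
    "Windows": 5_375_735,
}

# How many times more cycles each system needs to create a process.
speedup = {name: cycles / SIP_PHYS_CREATE_PROC
           for name, cycles in other_systems.items()}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.1f}x")
```

The smallest and largest ratios round to 2 and 14, matching the range stated in the text.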
Singularity is a new system and its performance has not been heavily
tuned. Nevertheless, this comparison shows that Singularity’s
implementation is competitive with commercial operating systems,
and helps validate the Singularity measurements.

4.3 Macro Benchmarks
We used two macro benchmarks to measure the overall performance
of the various system configurations. The first benchmark (“Bartok”)
was an execution of the Bartok compiler compiling the Singularity
kernel. The kernel is approximately 165,000 lines of code (1.6MB of
MSIL) and compiles in approximately 40 seconds. The compiler is a
single process that uses hundreds of megabytes of memory, so this
benchmark consists of a single memory-intensive process with little
inter-process communication. The other benchmark (“WebFiles”) is a
synthetic program that replays the file accesses that occur during the
SPECWeb99 benchmark. It consists of a modified SPECWeb client
that directly accesses the file system, without invoking a web server.
This benchmark reads 50,000 files consuming 50.3MB of disk space.
On Singularity, a file read involves several processes (the application,
the file system, and the disk driver), so this benchmark measures a
communication-intensive collection of processes.
Figure 5 reports the performance metrics for these two benchmarks
under the different configurations. The bars labeled “Cycles” report
their execution times, in processor clock cycles, relative to SIP-Phys
(70 and 22.9 billion cycles for Bartok and WebFiles, respectively).
“CPI” reports the average processor cycles per instruction, relative to
SIP-Phys (1.29 and 1.30 CPI, respectively). “Insts Retired” reports
the number of executed instructions, again relative to SIP-Phys (54.4
and 17.6 billion instructions, respectively). “TLB_L2_Requests”
reports the number of L2 cache misses that result from TLB refills,
this time relative to SIP-Page (66.0 and 20.4 million, respectively).
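
The three SIP-Phys baseline figures are related by instructions = cycles / CPI, so the reported cycle counts and CPI values imply instruction counts in the billions. A minimal consistency check, with a hypothetical helper name:

```python
def instructions_billion(cycles_billion, cpi):
    """Instructions retired (in billions) implied by a cycle count and a CPI."""
    return cycles_billion / cpi

bartok = instructions_billion(70.0, 1.29)    # Bartok, SIP-Phys
webfiles = instructions_billion(22.9, 1.30)  # WebFiles, SIP-Phys

print(round(bartok, 1), round(webfiles, 1))
```

The results land within rounding distance of the reported 54.4 and 17.6; the small gap for Bartok is consistent with the cycle count and CPI themselves being rounded.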

The compute-intensive Bartok benchmark incurs an overhead of
approximately 2.5% (in cycles) due to TLB misses in all system
configurations, except the base configuration, SIP-Phys, which does
not use the TLB. The non-SIP-Phys configurations incur a TLB miss
approximately every 1080 instructions and execute approximately 2%
more instructions than the base configuration. The cost of processing
these misses is unaffected by the protection mechanisms, since most
of this benchmark executes in a single process.
The picture is very different for the other benchmark. The
performance of WebFiles is considerably reduced by hardware
protection boundaries, because this benchmark heavily exercises
context switching and communications. The SIP-Page configuration
runs 6.3% slower than the baseline, as it executes 8% more
instructions and has 20 million cache misses due to TLB refills.
Adding protection domains (HIP-R0) decreases performance by
another 12.6%, relative to SIP-Phys. The number of instructions does
not change over SIP-Page, but cache misses due to TLB refills
increase by a factor of almost 6. Adding privilege levels (HIP-R3)
increases the execution time by another 14.0% relative to SIP-Phys,
which is a combination of executing 11.9% more instructions
and 16.8% more cache misses. Overall, HIP-R3 executes 33.0% more
cycles than SIP-Phys and 25.1% more cycles than SIP-Page. Isolating
every process in its own domain (the microkernel solution) further
increases the hardware overhead. HIP-R3-S executes 37.7% more
cycles than SIP-Phys and 29.6% more cycles than SIP-Page.
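
The figures relative to SIP-Page follow from the figures relative to SIP-Phys by dividing out the 6.3% SIP-Page overhead. A short sketch of that conversion, using a hypothetical helper name and only the percentages quoted above:

```python
SIP_PAGE_OVER_PHYS = 6.3  # percent slower than SIP-Phys

def relative_to_sip_page(percent_over_phys):
    """Convert an overhead relative to SIP-Phys into one relative to SIP-Page."""
    ratio = (100.0 + percent_over_phys) / (100.0 + SIP_PAGE_OVER_PHYS)
    return (ratio - 1.0) * 100.0

hip_r3 = relative_to_sip_page(33.0)
hip_r3_s = relative_to_sip_page(37.7)

print(round(hip_r3, 1), round(hip_r3_s, 1))
```

Both results agree with the reported 25.1% and 29.6% to within rounding of the inputs.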

4.4 Software Safety Overhead
To measure the overhead cost of the run-time tests needed to ensure
language safety for SIPs, we prevented the Bartok compiler from
generating these tests and reran the two macro benchmarks. These
tests were the code generated for array references, pointer
dereferences, value unboxing, and type casts. Without these safety
checks, the Bartok benchmark executed 4.5% fewer cycles and the
WebFiles benchmark executed 4.7% fewer cycles. The run-time
overhead of language safety is slightly higher than hardware isolation
for the Bartok benchmark and far lower than hardware isolation for
WebFiles. However, language safety offers important benefits not
provided by hardware process protection, for example, detecting
in-process errors such as buffer overruns.

5. CONCLUSION
Virtual memory hardware is a powerful, multi-faceted mechanism,
which originally permitted the implementation of demand paging in
an era of small memories and eventually became the default
implementation for process isolation. Although most operating
systems implement isolation with this mechanism, its limitations, both
in performance and protection granularity, make alternative protection
mechanisms worth considering. This paper describes two new
mechanisms and compares them directly against conventional
systems. A software isolated process (SIP) is a process whose
boundaries are established by language safety rules and enforced by
static type checking. With proper system support, SIPs can provide a
low cost isolation mechanism that provides failure isolation and fast
inter-process communication. Protection domains are hardware-enforced
address spaces, which can contain one or more SIPs.
Domains can either run at the kernel’s privilege level and share an
exchange heap or be fully isolated from the kernel and run at the
normal application privilege level. These two mechanisms can be
flexibly combined in hybrid arrangements that balance the security of
hardware isolation against its costs.

As techniques become available to ever more thoroughly verify the
safety of software isolation techniques, hardware isolation may
become less and less necessary in practice. In the meantime,
Singularity allows hybrid system architectures that are more efficient
than hardware protection alone, but provide in-depth protection
against isolation failures. This paper also identifies an opportunity to
revisit the design of MMUs: in situations where full hardware
protection is not used, it may be possible to streamline the remaining
virtualization functions.

6. ACKNOWLEDGMENTS
Many thanks to Paul Barham, Tim Harris, and Rebecca Isaacs for
their perceptive comments and assistance with this paper.

7. REFERENCES
1. Accetta, M., Baron, R., Bolosky, W., Golub, D., Rashid, R.,
Tevanian, A. and Young, M. A New Kernel Foundation for UNIX
Development. in Summer USENIX Conference, Atlanta, GA, 1986, 93-112.
2. Allen, D.H., Dhong, S.H., Hofstee, H.P., Leenstra, J., Nowka, K.J.,
Stasiak, D.L. and Wendel, D.F. Custom Circuit Design as a Driver of
Microprocessor Performance. IBM Journal of Research and
Development, 44 (6), 2000.
3. Anderson, T.E., Levy, H.M., Bershad, B.N. and Lazowska, E.D. The
Interaction of Architecture and Operating System Design Proceedings of
the Fourth International Conference on Architectural Support for
Programming Languages and Operating Systems, Santa Clara, CA, 1991,
108-120.
4. Appel, A.A. and Li, K. Virtual Memory Primitives for User
Programs Proceedings of the Fourth International Conference on
Architectural Support for Programming Languages and Operating
Systems, Santa Clara, CA, 1991, 96-107.
5. Back, G., Hsieh, W.C. and Lepreau, J. Processes in KaffeOS:
Isolation, Resource Management, and Sharing in Java Proceedings of the
4th USENIX Symposium on Operating Systems Design & Implementation
(OSDI), San Diego, CA, 2000.
6. Bala, K., Kaashoek, M.F. and Weihl, W.E. Software Prefetching and
Caching for Translation Lookaside Buffers Proceedings of the First
Symposium on Operating Systems Design and Implementation (OSDI),
1994, 243-253.
7. Bershad, B.N., Chambers, C., Eggers, S., Maeda, C., McNamee, D.,
Pardyak, P., Savage, S. and Sirer, E.G. SPIN: An Extensible Microkernel
for Application-specific Operating System Services Proceedings of the
6th ACM SIGOPS European Workshop, Wadern, Germany, 1994, 68-71.
8. Bershad, B.N., Savage, S., Pardyak, P., Sirer, E.G., Fiuczynski, M.,
Becker, D., Eggers, S. and Chambers, C. Extensibility, Safety and
Performance in the SPIN Operating System Proceedings of the Fifteenth
ACM Symposium on Operating System Principles, Copper Mountain
Resort, CO, 1995, 267-284.
9. Chen, J. and Tarditi, D. A Simple Typed Intermediate Language for
Object-oriented Languages Proceedings of the 32nd ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages (POPL
05), Long Beach, CA, 2005, 38-49.
10. Chen, J.B., Borg, A. and Jouppi, N.P. A Simulation Based Study of
TLB Performance Proceedings of the 19th Annual International
Symposium on Computer Architecture (ISCA '92), Queensland, Australia,
1992, 114-123.
11. Doorn, L.v. A Secure Java™ Virtual Machine Proceedings of the
9th USENIX Security Symposium, Denver, CO, 2000, 19-34.
12. Dorward, S., Pike, R., Presotto, D.L., Ritchie, D.M., Trickey, H. and
Winterbottom, P. The Inferno Operating System. Bell Labs Technical
Journal, 2 (1), 1997, 5-18.

13. Engler, D.R., Kaashoek, M.F. and O'Toole, J., Jr. Exokernel: an
Operating System Architecture for Application-Level Resource
Management Proceedings of the Fifteenth ACM Symposium on Operating
System Principles, Copper Mountain Resort, CO, 1995, 251-266.
14. Erlingsson, Ú. and MacCormick, J., Ad hoc Extensibility and
Access Control. Report MSR-TR-2005-143, Microsoft Research, 2005.
15. Fähndrich, M., Aiken, M., Hawblitzel, C., Hodson, O., Hunt, G.,
Larus, J.R. and Levi, S. Language Support for Fast and Reliable Message
Based Communication in Singularity OS. in To appear: EuroSys2006,
Leuven, Belgium, 2005.
16. Fitzgerald, R., Knoblock, T.B., Ruf, E., Steensgaard, B. and Tarditi,
D. Marmot: an Optimizing Compiler for Java. Software-Practice and
Experience, 30 (3), 2000, 199-232.

17. Fitzgerald, R. and Tarditi, D. The Case for Profile-directed Selection
of Garbage Collectors Proceedings of the 2nd International Symposium
on Memory Management (ISMM '00), Minneapolis, MN, 2000, 111-120.
18. Flatt, M. and Findler, R.B. Kill-safe Synchronization Abstractions
Proceedings of the ACM SIGPLAN 2004 Conference on Programming
Language Design and Implementation (PLDI 04), Washington, DC,
2004, 47-58.
19. Glew, N. and Morrisett, G. Type-safe Linking and Modular
Assembly Language Proceedings of the 26th ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages, San Antonio, TX,
1999, 250-261.
20. Goldberg, A. and Robson, D. Smalltalk-80: The Language and Its
Implementation. Addison-Wesley, 1983.
21. Golm, M., Felser, M., Wawersich, C. and Kleinoeder, J. The JX
Operating System Proceedings of the USENIX 2002 Annual Conference,
Monterey, CA, 2002, 45-58.
22. Govindavajhala, S. and Appel, A.W. Using Memory Errors to
Attack a Virtual Machine Proceedings of the 2003 Symposium on
Security and Privacy, Oakland, CA, 2003, 154-165.
23. Härtig, H., Hohmuth, M., Liedtke, J. and Schönberg, S. The
Performance of µ-kernel-based Systems Proceedings of the Sixteenth
ACM Symposium on Operating Systems Principles (SOSP '97), Saint
Malo, France, 1997, 66-77.
24. Hawblitzel, C., Chang, C.-C., Czajkowski, G., Hu, D. and Eicken,
T.v. Implementing Multiple Protection Domains in Java Proceedings of
the 1998 USENIX Annual Technical Conference, New Orleans, LA, 1998,
259-270.
25. Hawblitzel, C. and Eicken, T.v. Luna: A Flexible Java Protection
System Proceedings of the Fifth ACM Symposium on Operating System
Design and Implementation (OSDI '02), Boston, MA, 2002, 391-402.
26. Huck, J. and Hays, J. Architectural Support for Translation Table
Management in Large Address Space Machines Proceedings of the 20th
Annual International Symposium on Computer Architecture (ISCA '93),
San Diego, CA, 1993, 29-50.
27. Hunt, G., Larus, J., Abadi, M., Aiken, M., Barham, P., Fähndrich,
M., Hawblitzel, C., Hodson, O., Levi, S., Murphy, N., Steensgaard, B.,
Tarditi, D., Wobber, T. and Zill, B., An Overview of the Singularity
Project. Report MSR-TR-2005-135, Microsoft Research, 2005.
28. Jacob, B. and Mudge, T. Virtual Memory in Contemporary
Microprocessors. IEEE Micro, 18 (4), 1998, 60-75.
29. Jacob, B.L. and Mudge, T.N. A Look at Several Memory
Management Units, TLB-refill Mechanisms, and Page Table
Organizations Proceedings of the Eighth International Conference on
Architectural Support for Programming Languages and Operating
Systems (ASPLOS'98), San Jose, CA, 1998, 295-306.
30. Kongetira, P., Aingaran, K. and Olukotun, K. Niagara: A 32-Way
Multithreaded Sparc Processor. IEEE Micro, 25 (2), 2005, 21-29.
31. Morrisett, G., Walker, D., Crary, K. and Glew, N. From System F to
Typed Assembly Language. ACM Transactions on Programming
Languages and Systems, 21 (3), 1999, 527-568.
32. Mukherjee, S.S., Weaver, C.T., Emer, J., Reinhardt, S.K. and
Austin, T. Measuring Architectural Vulnerability Factors. IEEE Micro, 23
(6), 2003, 70-75.
33. Process, J.C. Application Isolation API Specification Java
Specification Request, 2003, JSR-000121.
34. Rajamani, S.K. and Rehof, J. Conformance Checking for Models of
Asynchronous Message Passing Software Proceedings of the
International Conference on Computer Aided Verification (CAV 02),
Springer, Copenhagen, Denmark, 2002, 166-179.
35. Redell, D.D., Dalal, Y.K., Horsley, T.R., Lauer, H.C., Lynch, W.C.,
McJones, P.R., Murray, H.G. and Purcell, S.C. Pilot: An Operating
System for a Personal Computer. Communications of the ACM, 23 (2),
1980, 81-92.
36. Rosenblum, M., Bugnion, E., Herrod, S.A., Witchel, E. and Gupta,
A. The Impact of Architectural Trends on Operating System Performance
Proceedings of the Fifteenth ACM Symposium on Operating System
Principles, Copper Mountain Resort, CO, 1995, 285-298.
37. Seltzer, M.I., Endo, Y., Small, C. and Smith, K.A. Dealing with
Disaster: Surviving Misbehaved Kernel Extensions Proceedings of the
Second USENIX Symposium on Operating Systems Design and
Implementation (OSDI 96), Seattle, WA, 1996, 213-227.
38. Swift, M.M., Bershad, B.N. and Levy, H.M. Improving the
Reliability of Commodity Operating Systems Proceedings of the 19th
ACM Symposium on Operating Systems Principles (SOSP '03), Bolton
Landing, NY, 2003, 207-222.
39. Swinehart, D.C., Zellweger, P.T., Beach, R.J. and Hagmann, R.B. A
Structural View of the Cedar Programming Environment. ACM
Transactions on Programming Languages and Systems, 8 (4), 1986,
419-490.
40. Talluri, M. and Hill, M.D. Surpassing the TLB Performance of
Superpages with Less Operating System Support Proceedings of the Sixth
International Conference on Architectural Support for Programming
Languages and Operating Systems, San Jose, CA, 1994, 171-182.
41. Uhlig, R., Nagle, D., Stanley, T., Mudge, T., Sechrest, S. and
Brown, R. Design Tradeoffs for Software-Managed TLBs. ACM
Transactions on Computer Systems, 12 (3), 1994, 175-205.
42. Wahbe, R., Lucco, S., Anderson, T.E. and Graham, S.L. Efficient
Software-Based Fault Isolation Proceedings of the Fourteenth ACM
Symposium on Operating System Principles, Asheville, NC, 1993,
203-216.
43. Wang, D.C. and Appel, A.W. Type-preserving Garbage Collectors
Proceedings of the ACM SIGPLAN 2002 Conference on Programming
Language Design and Implementation (PLDI '02), Berlin, Germany,
2002, 166-178.
44. Weinreb, D. and Moon, D. Lisp Machine Manual. Symbolics, Inc,
Cambridge, MA, 1981.
45. Witchel, E., Cates, J. and Asanović, K. Mondrian Memory
Protection Proceedings of the 10th International Conference on
Architectural Support for Programming Languages and Operating
Systems, San Jose, CA, 2002, 304-316.
46. Wood, D.A., Eggers, S.J., Gibson, G., Hill, M.D., Pendleton, J.,
Ritchie, S.A., Katz, R.H. and Patterson, D.A. An In-Cache Address
Translation Mechanism Proceedings of the Thirteenth Annual
International Symposium on Computer Architecture, Tokyo, Japan, 1986,
158-166.