Multiscale not Multicore: Efficient Heterogeneous Cloud Computing

Anil Madhavapeddy, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, United Kingdom. avsm2@cl.cam.ac.uk
Richard Mortier, University of Nottingham, Sir Colin Campbell Building, Nottingham NG7 2TU, United Kingdom. rmm@cs.nott.ac.uk
Jon Crowcroft, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, United Kingdom. jon.crowcroft@cl.cam.ac.uk
Steven Hand, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, United Kingdom. steven.hand@cl.cam.ac.uk

© The Authors. Published by the British Informatics Society Ltd. Proceedings of ACM-BCS Visions of Computer Science 2010.

In this paper, we present a vision of the future of heterogeneous cloud computing. Ours is a clean-slate approach, sweeping away decades of accreted system software. We believe the advent of the latest technology discontinuity—the move to the virtual cloud—makes this a necessary step to take, but one capable of delivering significant benefits in the security, reliability and efficiency of our digital infrastructure at all scales. We motivate this vision by presenting two challenges arising in different fields yet with fundamental commonalities best addressed by a unifying software platform supporting devices ranging from virtual servers in the cloud, through desktops, to mobile smartphones. After drawing out this common ground, we describe our solution and its benefits. We then describe the initial steps we have taken toward our solution, the Mirage framework, as well as ongoing future work.

1. INTRODUCTION

Within Horizon, our goal is to construct novel infrastructure in support of the digital economy, suitable for decades of use by society. To begin, we are building a clean-slate programming model for an energy-constrained, highly mobile society that generates vast amounts of digital data for sharing and archiving.1 We refer to this model as multiscale computing, its key characteristic being that it enables the generation of binaries suitable for execution on resources ranging from individual smart-phones to sets of cloud resources.

1 XKCD expresses this better than we can: http://xkcd.com/676/

The Internet has experienced huge growth in the last decade, and hundreds of millions of users now depend on it for daily news, entertainment and commerce. Thousands of large datacenters have sprung up to fuel the demand for compute power to drive these Internet sites, fundamentally changing the economics of hosting in recent years. Cloud computing (§1.1) means that software infrastructure now runs on virtual machines, and usage is charged “by the second” as resources are consumed.

Simultaneously, we have seen mobile devices and social networking become popular with consumers as a way of operating on the move. Devices such as the iPhone boast fully-featured software stacks and gigabytes of solid-state storage. However, battery technology has not kept pace at the same rate, and so an efficient software stack is vital to having a device that works for reasonable periods of time.
The common theme here is that software efficiency is increasingly valuable alongside traditional themes such as performance or security. However, the evolutionary nature of the computer industry has resulted in many software layers building up that are not well-suited to small, efficient outputs.

We first briefly introduce cloud computing (§1.1) and mobile computing (§1.2) to a non-expert audience, and then motivate our vision with two different challenges: scientific computing (§2) and social networking (§3). We then draw out the commonalities between these seemingly diverse topics (§4), and describe how we address them in Mirage (§5). Finally, we review related work (§6), and discuss the current status of the project and plans for future work (§7).

1.1. Cloud Computing

In the late 1990s, it was common for a company needing Internet hosting to purchase physical machines at a hosting provider. As the company’s popularity grew, its reliability and availability requirements also grew beyond a single provider. Sites such as Google and Amazon started building datacenters spanning the USA and Europe, causing their reliability and energy demands also to grow.

Figure 1: A modern datacenter using cheap commodity hardware (source: scienceblogs.com)

In 2006 the EPA estimated that datacenters consumed 1.5% of the total electricity bill of the entire USA—around $4.5bn [12]—and there was a rush for hydro-electric power in more remote locations in the US. Companies were forced to over-provision to deal with peak demand (e.g., Christmas time), and thus had a lot of idle servers during off-peak periods.

In early 2000 researchers began to examine the possibility of dividing up commodity hardware into logical chunks of compute, storage and networking which could be rented on-demand by companies with only occasional need for them.
The XenoServers project forecast a network of these datacenters spread across the world [14]. The Xen hypervisor was developed as the low-level mechanism to divide up a physical computer into multiple chunks [2]. It grew to become the leading open-source virtualization solution, and was adopted by Amazon as part of its “Elastic Computing” service which rents spare capacity to anyone willing to pay for it.2 Amazon became the first commercial provider of what is now dubbed “cloud computing”—the ability to rent a slice of a huge datacenter to provide on-demand computation resources that can be dynamically scaled up and down according to demand.

2 http://aws.amazon.com/

Cloud computing has proved popular in the era of Web 2.0 services, which experience frequent “flash traffic” that requires large amounts of resources for short periods of time. Other uses are for one-off tasks, such as the New York Times scanning their back catalogue of 11 million articles from 1851 to 1922 into PDF format for on-line delivery: the whole process took 100 virtual machines over a 24 hour period [16].

Finally, companies with large data processing requirements have been constructing distributed systems to scale up to thousands of machines, e.g., Google’s BigTable [6] and Amazon’s Dynamo [9].

The evolutionary nature of the rise of cloud computing has led to some significant economic and environmental challenges. Virtualization encapsulates an entire operating system and emulates it in a virtual environment, introducing yet another layer to the already bloated modern software stack. Vinge has noted that a compatibility software layer is added every decade or so (operating systems, user processes, threading, high-level language runtimes, etc.), and that without some consolidation we are headed for a future of “software archaeology” where it will be necessary to dig down into software layers long forgotten by the programmers of the day [33].

In the context of a large datacenter, these layers are an efficiency disaster: they potentially add millions of lines of code performing redundant operations. It is here that we position our work: compressing all these layers into just one that executes directly against the virtual hardware interface.

1.2. Mobile Handsets

Over the past decade mobile phones have evolved dramatically. Starting out as simple (and bulky!) handsets capable of making and receiving calls, sending and receiving SMS messages, and providing simple address books, they are now mobile computation devices with processors running at gigahertz speeds, many megabytes of memory, gigabytes of persistent storage, large screens supporting multi-touch input, and a wide array of sensors including light-sensors, accelerometers, and accurate global positioning system (GPS) location devices. As a result so-called “smartphones” are now capable of running multiple user-installed applications, with many vendors opening online application stores to support users discovering, purchasing and installing applications on their devices.3

3 The biggest is the Apple App Store at http://store.apple.com/

Figure 2: Comparing a phone from a couple of decades ago with a modern Apple iPhone

We observe a similar shift toward distributed computation in the mobile space, with use of crowd-sourcing to collect sensor data from a global audience. Each individual mobile phone is quite powerful and so, with only limited co-operation, can enable impressive distributed computation such as the assembly of 3D models of entire cities [1].

Devices such as the iPhone and Android phones run commodity operating systems (OSs)—OS X and Linux respectively—with selected customisations. For example, both those platforms remove page swapping from the virtual memory subsystem, instead warning and then halting entire applications when memory pressure is detected. As the persistent storage in these devices is of commensurate size and speed to the RAM, both of those platforms view swapping out individual pages as of little benefit.

Notwithstanding such customisations, power remains a basic resource constraint that is difficult to properly address when building on commodity OSs. Battery technology for mobile handsets has not kept pace with the development of their other resources such as compute, memory and storage. As a result, while very basic mobile phones have had standby lifetimes of over a week and talk-times of several hours for most of the last decade, modern smartphones fare dramatically worse. For example, battery lifetimes of a device such as the iPhone—which one might expect has been heavily optimised in this regard as the hardware and OS are under the control of a single company—are reported to be on the order of a couple of days standby and no more than a couple of hours talk-time.

Consequently, modern smartphones present a situation where efficiency is already key, and is only increasing in importance as energy-hungry capabilities increase. At the same time, we persist in running system software stacks that can trace many of their core technologies back to the days of mainframes! We thus position our work here not so much to reduce the aggregate energy demand of these devices or to improve their performance particularly, but to dramatically improve the user experience by making the devices last longer on a single battery charge.
2. SCIENTIFIC COMPUTING

Over the last 20–30 years, many branches of science have undergone a revolution in terms of the sizes of the datasets that are gathered and processed: datasets are now massive, containing millions, billions and more datapoints. For example, gene sequencing generates hundreds of gigabytes of data with a single experimental run, and many terabytes or even petabytes over the course of an experiment. Modern astronomical instruments generate multi-MB images containing millions of pixels, e.g., HERSCHEL generates 4 MB images from a 1024×1024 CCD camera, and will take many thousands of such images over its lifetime. In particle physics, instruments such as the Large Hadron Collider at CERN are expected to generate data at 1.5 GB/s for over a decade when fully operational [29].

Datasets need not be observational in nature and may present challenges other than sheer data volume: the Millennium Run, a large-scale cosmological simulation, involved simulating 10 billion “particles”, each representing about a billion solar masses of dark matter, in a cube region about 2 billion light years in length [31]. The simulation used a super-computer in Garching, Germany for over a month and generated 25 TB of output data.

A consequence of the availability of these massive datasets is that scientific and statistical methods are evolving: it is no longer necessary to extract all possible information from tens or hundreds of datapoints. Instead, the problem is how to build data processing pipelines that can handle these enormous datasets, reducing them to manageable pieces of information, often for subsequent, more traditional, analysis.

When dealing with such large datasets, cloud computing is often the only economically feasible approach: since these massive computation farms of hundreds or thousands of machines are only sporadically required for a few days at a time, it is just not worth purchasing, operating and maintaining them for the entire lifetime of a project. Instead, cloud facilities such as Amazon AWS are used as-needed.

At the same time, the software for these processing pipelines needs to be developed, using manageable but representative fragments of the full dataset. Code development is generally best carried out using the scientist/developer’s local compute platform, typically a laptop or desktop PC. Development like this is greatly accelerated if the local runtime environment is closely matched to the cloud environment in which the code will be run over the entire dataset: a prime example of multiscale computing.

3. PERSONAL CONTAINERS

The Internet is used by millions of users to communicate and share content such as pictures and notes. Internet users spend more time at “social networking” sites than any other.4 The largest site, Facebook, has over 300 million registered users, all sharing photos, movies and notes.

4 http://www.nielsen-online.com/pr/pr_090713.pdf
Personal privacy has suffered due to the rapid growth of these sites [21] as users upload their public and private photos and movies to a few huge online hosting providers such as Google and Facebook.

These companies act as a hub in the digital economy by sharing data, but their centralized nature causes serious issues with information leaks, identity theft, history loss, and ownership. The current economics of social networking are not sustainable in the long term without further privacy problems. Companies storing petabytes of private data cover their costs by selling access to it, and this becomes more intrusive as the cost of the tail of historical data increases over time.5

5 e.g., Facebook’s Misrepresentation of Beacon’s Threat to Privacy: Tracking users who opt out or are not logged in, http://bit.ly/i2s4I

This dependence on commercial providers to safeguard data also has significant wider societal implications. The British Library warned that “historians face a black hole of lost material unless urgent action is taken to preserve websites and other digital records.”6 If Facebook suffered large-scale data loss, huge amounts of personal content would be irretrievably lost. The article notes it is impractical for a single entity to archive the volume of data produced by society.

6 “Websites ‘must be saved for history’”, Guardian, http://www.guardian.co.uk/technology/2009/jan/25/preserving-digital-archive

We believe that an inversion of the current centralized social networking model is required to empower users to regain control of the digital record of their lives. Rather than being forced to use online services, every individual needs their own “personal container” to collect their own private data under their control. By moving data closer to its owners, it is possible to create a more sustainable way of archiving digital history, similar to the archival of conventional physical memorabilia. Users can choose how to share this data securely with close friends and family, what to publish, and what to bequeath to society as a digital will.

Personal Containers are purpose-built distributed servers for storing large amounts of digital content pertaining to one person. They run on a variety of environments from mobile phones to desktops to the cloud. Sources feeding data into an individual’s personal container can include local applications (e.g., Outlook, iPhoto), online services (e.g., Google, Facebook), or devices (e.g., real-time location data, biometrics). Unlike existing storage devices, a personal container also has built-in logic to manage this data via, e.g., email or the web. This encapsulation of data and logic is powerful, allowing users to choose how they view their own data rather than being forced to use a particular website’s interface. It also ensures data never leaves the personal container unless the user requests it, an important guarantee when sensitive information such as location or medical history is involved.

The personal container network is built on the same distributed systems principles as email, in which no central system has overall control. Instead, individuals direct queries to their personal container, whether it is running on their phone, at their home, or in the cloud. In this way we liberate our data from the control of a few large corporations, reclaiming it to ourselves. It thus becomes a matter of incentive and utility whether we choose to share our data with others, whether individuals or corporations.

However, building personal containers requires a highly efficient software stack, since the economies of scale available to today’s massive entities, such as Facebook, are unavailable to the individual. Furthermore, personal containers are envisaged running on such a wide range of platforms that the software stack in question must be multiscale.

Building personal containers using today’s existing programming models is difficult. From a security perspective it is dangerous to store decades of personal digital data in a single location without strong formal guarantees about the entire software stack on which the software is built. Simply archiving the data in one place is not as useful as distributing it securely around the myriad of personal devices and cloud resources to which we now have access. Thus our vision for the future of a lifetime of personal data management also benefits from a new multiscale programming model.
4. MULTISCALE NOT MULTICORE

In the previous sections we presented quite different applications which nonetheless share common ground: the need for multiscale computing. We distinguish multiscale from multicore: the benefit is gained not by being able to utilise the many cores now available on modern processors, but by being able to scale the application up to make use of available resources, whether one or many handheld devices, or a virtual datacenter in the cloud.

Note that we are not addressing the general problem of how best to segment a particular problem for parallel execution: it is still up to the programmer to manage distribution of work across multiple available resources. However, our approach should make the programmer’s life easier by providing a consistent programming framework to help them manage their use of the available resources, as well as modern tools such as type-checking to help them manage common problems such as concurrency.

The basic underlying principle of multiscale computing is that of simplicity, giving rise to attendant benefits such as efficiency, reliability, and security. Current software stacks, such as LAMP,7 are too “thick,” containing now-unnecessary support for legacy systems and code. By using results from the last 20 years of computer science research in fields such as privacy, virtualization, and languages (both practical runtimes and theoretical underpinnings), we can dramatically simplify the software stack in use. As a result we believe we can achieve the following benefits:

7 Linux, Apache, MySQL, PHP

• Simplicity. By providing a common development toolchain and runtime environment, a single program can be easily retargeted to different platforms, ranging from mobile devices to cloud-hosted virtual machines. This allows the programmer to focus on their program without needing to explicitly manage differences in storage, networking and other infrastructure.

• Efficiency. As noted earlier, the legacy support in modern software stacks introduces overheads, wasting energy and, as a result, money.8 Further, services such as AWS are now introducing “spot pricing,” providing the ability for consumers to bid for spare capacity, paying in accordance with demand.9 To make efficient use of such a facility requires the ability to instantiate and destroy virtual machine images with very low latency, which is difficult with today’s heavyweight software stacks such as LAMP.

8 We leave it to the reader to decide which is more important.
9 http://cloudexchange.org/

• Security. By providing a single bootable image, generated from code written in a strongly typed and automatically type-checked language, the reliability of programs is increased. Additionally, since the virtual machine’s configuration is compiled into the image, it becomes possible to reason about policies applied when accessing resources such as the network, further increasing the application’s security.

Although we rely heavily on recent research in the areas mentioned, we believe there remain further research challenges in fully realising this vision; we briefly note some of them here:

• Programming languages. How can we ensure that personal containers are secure and efficient enough to hold a lifetime of personal data?

• Ubiquitous computing. How do we enable personal containers to be run on diverse hardware platforms such as mobile devices or online cloud computers?

• Resource management. How do we efficiently handle storage, encryption and history, and ensure easy availability without losing privacy?

• Formal methods. Can we mathematically reason about both our stored data itself and our access to it, to make stronger privacy guarantees while still allowing useful datamining?

In the following sections we describe the initial steps we are taking to address these challenges with the Mirage platform in the context of the Horizon Institute.

5. THE MIRAGE APPROACH

Mirage is a clean-slate system that compiles applications directly into a high-performance, energy-efficient kernel that can run directly on mobile devices and virtual clouds.
Applications constructed through Mirage become operating systems in their own right, able to be targeted at both Xen-based cloud infrastructure and ARM-based mobile devices. Mirage can also output binaries which execute under POSIX-compliant operating systems, aiding development and debugging. Mirage also exposes new capabilities directly to the software, allowing applications to make more effective use of live relocation [7], resource hotplug [38], and low-energy computation modes.

In this section, we first describe how we simplify the software stack by eliminating runtime abstractions (§5.1). We also take advantage of advances in modern programming languages to perform a number of checks at compilation time, so they can be removed from running code (§5.2). Continuing in the same spirit, Mirage statically specializes applications at compilation time based on configuration variables; this eliminates unused features entirely, keeping the deployed executables smaller (§5.3). Finally, we provide brief overviews of the concurrency model (§5.4) and storage (§5.5) in Mirage.

5.1. Reducing Layers

Although virtualization is good for consolidating underutilized physical hosts, it comes with a cost in operational efficiency. The typical software stack is already heavy with layers: (i) an OS kernel; (ii) user processes; (iii) a language runtime such as the JVM or .NET CLR; and (iv) threads in a shared address space. Virtualization introduces yet another runtime layer which costs resources and energy to maintain, but is necessary to run existing software unmodified (Figure 3).

Figure 3: A conventional software stack (left) and the Mirage approach with statically-linked kernel and application (right)

The combination of operating system kernels (comprising millions of lines of code) and language runtimes (also typically millions of lines of code) has lacked synergy due to the strong divide between “kernel” and “userspace” programming. Standards designed decades ago, such as POSIX, have dictated the interface to the kernel, and language runtimes have been forced to use this inefficient legacy interface to access physical resources even as hardware has rapidly evolved. Similarly, there are numerous models for concurrency: virtual machines, the OS kernel, processes and threads are all scheduled independently and preemptively. Modern multicore machines thus spend an increasing amount of time simply switching between tasks, instead of performing useful work. Mirage exposes a simpler concurrency model to applications (§5.4).

Mirage is written in the spirit of vertical operating systems such as Nemesis [17] or Exokernel [10], but differs in certain aspects: (i) apart from a small runtime, the operating system and support code (e.g., threading) is entirely written in OCaml; and (ii) it is statically specialised at compile time by assembling only the required components (e.g., a read-only file system which fits into memory will be directly linked in as an immutable data structure). These features mean that it provides a stronger basis for the practical application of formal methods such as model checking, and the removal of redundant safety checks greatly improves the energy efficiency of the system.
5.2. Objective Caml

Objective Caml [23] (or OCaml) is a modern functional language based on the ML type system [22]. ML is a pragmatic type system that strikes a balance between unsafe imperative languages (e.g., C) and pure functional languages (e.g., Haskell). It features type inference, algebraic data types, and higher-order functions, but also permits the use of references and mutable data structures—and guarantees that all such side-effects are always type-safe and will never cause memory corruption. Type safety is achieved by two methods: (i) static checking at compilation time, by performing type inference and verifying the correct use of variables; and (ii) dynamic bounds checking of array and string buffers.

A typical functional language application consists of two components: (i) a “mutator” which represents the main program; and (ii) a garbage collector to manage the memory heap. When executing under a UNIX-like operating system, an application uses the malloc(3) or mmap(2) library functions to allocate an area of memory, and the garbage collector manages all further activity within that region. However, this can result in some harmful interactions: consider a server which invokes fork(2) for every incoming network connection. The garbage collector, now present in multiple independent processes due to the fork, will sweep across memory to find unused objects, reclaim them, and compact the remaining memory. This compaction touches the complete process memory map, and the kernel allocates new pages for each process’s garbage collector. The result is an unnecessary increase in memory usage even if there is a lot of data shared between processes.

This is just one example of how standard operating system idioms such as copy-on-write memory can introduce inefficiency into functional language implementations. Others include the process abstraction, as the static type safety guarantees made by typical functional languages eliminate the requirement for hardware memory protection unless there remain other unsafe components in the system. As Mirage is intended for applications written exclusively in type-safe languages, we can eliminate a number of these layers of isolation, in turn reducing the work done by the system at runtime.

The inefficiency of functional languages compared to C or Fortran has traditionally been cited as the reason they are not more widely adopted for tasks such as scientific computing, along with the lack of mature tool-chains and good documentation [36]. We are aiming to close the performance gap to C with Mirage, building on our previous work in this area [26; 24].

OCaml is particularly noted among functional languages for the speed of the binaries it produces, for several reasons: (i) a native code compiler which directly emits optimised assembler for a number of platforms; (ii) type information is discarded at compile time, reducing runtime overhead; and (iii) a fast generational, incremental garbage collector which minimises program interruptions.
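To give a concrete flavour of these language features (this fragment is our own illustration rather than code taken from Mirage), the following OCaml snippet defines an algebraic data type, pattern-matches over it exhaustively, and passes a function to List.map; every type is inferred, and misuse of a constructor is rejected at compile time rather than at runtime:

    (* Illustrative only: an algebraic data type describing where a binary runs. *)
    type target =
      | Cloud of string            (* datacenter region *)
      | Mobile of int              (* battery percentage remaining *)
      | Desktop

    (* Pattern matching is checked for exhaustiveness by the compiler. *)
    let describe t =
      match t with
      | Cloud region -> "cloud VM in " ^ region
      | Mobile batt  -> Printf.sprintf "mobile device at %d%% battery" batt
      | Desktop      -> "developer desktop"

    (* Type inference gives [List.map describe] the type [target list -> string list]. *)
    let () =
      let targets = [ Cloud "eu-west"; Mobile 80; Desktop ] in
      List.iter print_endline (List.map describe targets)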
We do not modify the OCaml compiler itself, but rather the runtime libraries it provides to interface OCaml with UNIX or Win32. This code is mostly written in C, and includes the garbage collector and memory allocation routines. We recreate from scratch the built-in library to support a subset of the standard features (e.g., we omit threading and signal handling). The application is structured as a single co-operative thread and executed until that thread terminates.

The lack of dynamic type information greatly contributes to the simplicity of the OCaml runtime libraries, and also to the lightweight memory layout of data structures. OCaml is thus a useful language to use when leveraging formal methods. It supports a variety of programming styles (e.g., functional, imperative, and object-oriented) with a well-designed, theoretically-sound type system that has been developed since the 1970s. Proof assistants such as Coq [20] exist which convert precise formal specifications of algorithms into certified executables.

5.3. Static Specialization

Operating systems provide various services to host applications, such as file systems, network interfaces, process scheduling and memory management. Conventional monolithic kernels include large amounts of code to support many permutations of these services on-demand, an essential approach when running varied applications such as desktop software or development environments. However, these large codebases can make bug-hunting and formal reasoning (e.g., model checking) about the complete system a daunting task [30].

Specialized devices such as network routers split these concerns into: (i) a general-purpose control plane; and (ii) a specialized data plane for high-volume traffic. The control plane is designed to be flexible to allow easy management of a network device. It exists primarily to configure the data plane, through which data flows with much less computation.

We treat a general-purpose OS such as Linux as the Control Operating System (COS). The COS, running under Xen as a guest, exists purely to configure, compile, and launch a Mirage application. It does not provide application services to the outside world, and is accessed purely through a management interface.

An example of specialization for a web server would be to compile read-only static content into the binary, along with a database library to handle more dynamic queries. If write interfaces aren’t required, then they are simply not compiled in to the final binary, further reducing its size.
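As a sketch of how such specialization might look at the source level (the data and functions below are hypothetical illustrations, not Mirage’s actual build interface), read-only content can be compiled in as ordinary immutable OCaml data, so the resulting kernel contains exactly the pages it serves and no filesystem or write path at all:

    (* Hypothetical example: static pages baked into the binary at build time.
       A build step could generate this module from a directory of files. *)
    let static_pages = [
      "/",      "<html><body>Hello from a specialised kernel</body></html>";
      "/about", "<html><body>About this site</body></html>";
    ]

    (* Lookup is a pure function over statically-linked, immutable data. *)
    let respond path =
      try Some (List.assoc path static_pages)
      with Not_found -> None

Because unused code paths are never linked in, the trade-off is made at compile time rather than by runtime configuration.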
Static specialization simplifies the memory layout of a Mirage instance, since it results in a single executable kernel which is directly executed by Xen as a guest operating system. At start-of-day, Xen allocates a fixed number of memory pages to Mirage, all in a single address space; since OCaml is type-safe, we do not require multiple virtual address spaces. The general layout for 32-bit x86 processors is illustrated in Figure 4 and has the following regions: (i) the top 64MB, reserved for the Xen hypervisor; (ii) the code area, holding the executable program’s instructions and static data; (iii) the network area, where pages for remote data traffic are mapped; and (iv) the heap area, managed by the garbage collector.

Figure 4: Virtual memory layout of a Mirage kernel

The exact sizes allocated to each of the regions are configurable, but can be aligned to megabyte boundaries to facilitate the use of large page sizes, gaining the benefits of reduced TLB pressure.

The runtime libraries (e.g., the garbage collector and interface routines to Xen) are written in C, and are thus potentially unsafe. To simplify the code and make static analysis easier, the runtime is single-threaded and dynamic memory allocation is minimized. The standard OCaml garbage collector is used to manage the dynamic heap area, with the memory allocation routines modified for the new address layout. Excess heap and stack pages are loaned back to the hypervisor when not in active use.

5.4. Concurrency

Mirage defines two distinct types of concurrency: (i) lightweight control-flow threads for managing I/O and timeouts; and (ii) parallel computation workers. This is an important distinction, since it is difficult to create a threading model which is suitable for both large-scale distributed computation and highly scalable single-server I/O [34].

Our I/O threads are based on the Lwt co-operative threading library [35], which has a syntax extension to hide the yield points from the programmer, giving the illusion of normal pre-emptive threading. Lwt has a thread scheduler written in OCaml that can be overridden by the programmer. This opens up the possibility of user-defined policies for thread scheduling depending on the hardware platform in use. On mobile phones, for example, the scheduler could be power-aware and run some threads at a lower CPU speed, or even deliberately starve unnecessary computation in the event of low battery charge. For server operation, lightweight threading is significantly faster than using pre-emptive threads due to the removal of frequent context switching.10

10 http://eigenclass.org/hiki/lightweight-threads-with-lwt
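To illustrate the cooperative style (a minimal sketch using the standard Lwt library under Unix rather than Mirage’s own runtime), the two lightweight threads below are composed with the bind operator and run inside a single OS thread; each sleep is a yield point at which the scheduler switches to other work:

    (* Requires the lwt and lwt.unix packages. *)
    open Lwt

    (* A cooperative thread: the sleep yields to the scheduler rather than
       blocking an OS thread. *)
    let worker name delay =
      Lwt_unix.sleep delay >>= fun () ->
      Lwt_io.printlf "%s finished after %.1fs" name delay

    (* Run two workers concurrently and wait for both to complete. *)
    let () =
      Lwt_main.run (join [ worker "a" 0.5; worker "b" 0.2 ])

Because the blocking behaviour is visible in the thread type (unit Lwt.t here), scheduler policies of the kind described above can in principle be swapped without touching application code.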
A useful feature of Lwt is that the blocking nature of a function is exposed in its OCaml type, letting the programmer be certain when a given library call might take some time. Any blocking function can be macro-expanded at compile time to block more often, for example with a high-level debugger. We have previously explored these ideas while integrating model-checking into server applications [25] and are now prototyping a web-based debugger specially designed to find problems in Mirage code.

For running computations that require more than one core, we adopt a parallel task model based on the join calculus [13], which is integrated directly into OCaml. The actual implementation depends on the hardware platform: (i) cloud platforms spawn separate VMs which use fast inter-domain communication, or networking if running on different hosts; (ii) desktops running an operating system spawn separate user processes using shared memory; and (iii) mobiles, which are energy constrained, remain single-threaded and use Lwt to interleave the tasks as with lightweight threads.

The use of separate lightweight virtual machines to support concurrency ensures that our task model scales from multicore machines to large compute clusters of separate physical machines. It also permits the low-latency use of cloud resources which allow “bidding” for resources on-demand. A credit scheduler could thus be given financial constraints on running a parallel task across, e.g., 1000 machines at a low-usage time, or constraining it to just 10 machines during expensive periods.

We are not trying to solve the general problem of breaking an application down into parallel chunks. Instead, we anticipate new use-cases for building distributed applications that exploit the cross-platform aspect. For example, an application to perform processing of images on mobile phones could be written as a single application with some threads running on mobile phones, and others running in the cloud to perform the actual processing. This way energy is not wasted on the mobile device by default, but the user can manually trigger the task to run there if desired, e.g., if they do not trust the cloud or lack connectivity, since the code is compatible.
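Mirage’s join-calculus integration is not reproduced here; as a rough sketch of the kind of interface such a platform-dependent task model could present (every name below is hypothetical), a single signature can hide whether the workers are separate virtual machines, separate processes, or Lwt threads interleaved on one core:

    (* Hypothetical interface: the implementing module would be selected at
       compile time according to the target platform. *)
    module type WORKERS = sig
      (* Apply [f] to each element, distributing the work across whatever
         parallel resources the platform provides. *)
      val parallel_map : ('a -> 'b) -> 'a list -> 'b list
    end

    (* A trivial single-core instance, as an energy-constrained mobile
       target might use. *)
    module Sequential : WORKERS = struct
      let parallel_map = List.map
    end

    let () =
      let squares = Sequential.parallel_map (fun x -> x * x) [ 1; 2; 3; 4 ] in
      List.iter (Printf.printf "%d ") squares;
      print_newline ()

The application code is written once against the signature; only the module bound to it changes between the phone and the cloud.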
5.5. Storage

These days data storage is cheap and we have to deal with massive datasets. Scientists, market analysts, traders and even supermarkets compute statistics over huge datasets, often gathered globally. We have already recounted examples of the size of modern scientific computing experiments (§2). Multiscale computing thus needs low-overhead access to large datasets, as well as the ability to quickly partition them for distribution across smaller devices such as desktop computers or mobile devices.

Since Mirage is designed to be platform-independent, it does not expose a conventional filesystem, but instead has a persistence mechanism that operates directly as an extension to OCaml. We use multi-stage programming [32] to analyze user-defined types at compilation time and automatically generate the appropriate code for the target hardware platform.

This approach has several advantages: (i) the programmer can persist arbitrary OCaml types with no explicit conversion to and from the storage backend; (ii) code is statically generated that can work efficiently on the target platform (e.g., SQL on phones, or direct block access in the cloud); and (iii) a high degree of static type-safety is obtained by auto-generating interfaces from a single type specification.
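The sketch below shows the intended programming style; note that the with orm annotation and the generated function names (photo_init, photo_save, photo_get) are illustrative assumptions on our part rather than the documented interface of the released extension. The point is that a single type definition is the only schema the programmer writes, with the persistence code derived from it at compile time:

    (* Illustrative sketch only; requires the persistence syntax extension,
       and the generated names below are assumptions, not a documented API. *)
    type photo = {
      filename : string;
      taken    : float;          (* UNIX timestamp *)
      tags     : string list;
    } with orm                   (* derive persistence code from the type *)

    let () =
      (* On a phone this might compile to SQLite access; in the cloud, to
         direct block access. The application code is unchanged. *)
      let db = photo_init "photos.db" in
      photo_save db { filename = "cat.jpg"; taken = 1262304000.; tags = [ "pets" ] };
      List.iter (fun p -> print_endline p.filename) (photo_get db)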
The storage backend is immutable by design, and so changes to data structures cause records to be copied. This means that it is naturally versioned and reverting to older versions is easy. Values no longer required can be released and, as with the memory heap, a storage garbage collector cleans up unused values. The use of an integrated and purely functional storage backend is a key design decision for Mirage as it permits many parallel workers to “fork” the storage safely and cheaply, even if a huge dataset is being shared. Concurrent access to records is guaranteed to be safe due to the immutability of the backend.

This immutability overcomes one of the biggest barriers to using the online cloud for large datasets—getting the data in and out is expensive and slow.11 An immutable data store ensures that a single copy of the data can be worked on by many consumers, avoiding the need to fork it even if the operations being performed on it are very different.

11 To mitigate this, Amazon offers physical import/export to EC2, see http://aws.amazon.com/importexport

In scientific computing most datasets are written as streams and rarely modified. Instead, analysis code is run over the data and derived values computed, which are often hard to track as they depend on the versions of both the input dataset and the code used. We are prototyping a version-controlled database which tracks both values and code side-by-side, ensuring that scientists can easily trace the provenance of potentially huge datasets by examining the code that generated the intermediate analyses. For personal containers, the database lets us easily archive personal history as it changes. Older data can be “garbage collected” into cheaper, off-line archival storage while current data remains live and searchable from the database.

This approach is not initially intended to be suitable for all purposes (e.g., it would struggle with mutation-heavy, richly-linked graph data), but we anticipate that it will find key uses in dealing with read-heavy data that can be stored a single time in the cloud and made available to multiple users safely and efficiently. For development, the storage extension also has a SQL backend to integrate with existing database servers and work with smaller slices of the larger datasets [15].

6. RELATED WORK

The Mirage operating system is structured as a “vertical operating system” in the style of Nemesis [17] and Exokernel [10]. However, in contrast to either, it simultaneously targets the hardware-independent virtual cloud platform and resource-constrained mobile devices. Mirage also provides high-level abstractions for threading and storage using OCaml, making it far easier to construct applications than writing them directly in C. The theme of energy as a first-class scheduling variable was also explored in Nemesis [28].

More recently, Barrelfish [3] provides an environment specifically for multicore systems, using fast shared-memory communication between processes. It also incorporates support for high-level domain-specific languages to generate boiler-plate code for common operations [8]. Mirage aims to remove the distinction between multicore and clustered computers entirely, by adopting multiscale compiler support to use a communication model appropriate to the hardware platform’s constraints.

Our use of OCaml as the general-purpose programming language has similarities to Singularity [19], which is derived from the Common Language Runtime. Singularity features software-isolated processes, contract-based communication channels, and manifest-based programs for formal verification of system properties. Mirage takes a similar approach to language-based safety, but with more emphasis on generative meta-programming [26] and a lighter statically-typed runtime. Our multiscale threading model also helps target smaller mobile devices which would struggle to run the full CLR environment.

Parallel processing of large datasets is an active area of research, e.g., direct language support with Data Parallel Haskell [5] or software transactional memory [18]. Our main contribution here is to provide an efficient runtime for single-core processing, and a common immutable data store to allow large clusters of cloud-based VMs to operate over the data.

The concept of Personal Containers was inspired by the Personal Server from Intel Research [37]. The Personal Server is a small, efficient mobile device carrying personal data and wirelessly communicating with other devices. We seek to generalise this concept into containers that work seamlessly between the cloud and personal devices, and also provide a programming model to make it easier to write secure, fast, ubiquitous computing applications.

7. STATUS

We are in the process of bringing this vision into reality. We have the first revision of Mirage running,12 targeting Xen cloud systems and desktop operating systems, and are working on retargeting it for ARM-based handhelds. There are many “bare-metal” optimizations in progress for the cloud target which promise improved performance, as well as integration of improved versions of our previous work on fast, type-safe network parsing [26] and model-checking [25]. Even unoptimised, the runtime library for a cloud backend is around 1MB in size, in contrast to Linux distributions which are difficult to squeeze below 16MB and Windows which runs into hundreds of megabytes.

12 http://mirage.github.com/

The database backend has been implemented with a SQL backend to ensure that the syntax extension, immutable semantics and type-introspection algorithms are sound. It is available for standalone use in other OCaml code as open-source software.13 The custom cloud backend for Xen will have the same behaviour but with higher performance due to the removal of the SQL translation layer.

13 http://github.com/mirage/orm

Personal Containers have been prototyped in collaboration with the PRIMMA project14 in the form of Personal Butlers which guard users’ privacy online [39]. We aim to release a first public and open-source version of personal containers in 2010, with support for a variety of protocols such as IMAP and POP3, Facebook integration, and local application support for Mac OS X.

14 http://primma.open.ac.uk/
We are now investigating possible deployment prospects within work being carried out in the Horizon Institute.15 This is looking at research under the broad heading of the Digital Economy, with infrastructure one of its three main challenges. We aim to make use of Mirage to build some of this infrastructure, allowing us to deploy and test both the Mirage multiscale OS and Personal Containers at scale.

15 http://www.horizon.ac.uk

We are also collaborating with hardware vendors to investigate the possibility of pushing the approach taken by Mirage even further down the stack, e.g., into the networking hardware itself using SR-IOV and smart NICs [27], allowing even greater optimisation possibilities in terms of the efficiency and safety of ICT equipment. Moving up the stack, the use of OCaml as our base language provides us with a solid foundation to integrate more experimental type systems, such as dependently-typed DSLs [4] and linear typing [11]. These allow more dynamic checks to be eliminated safely at compilation time, further improving the efficiency of runtime code without sacrificing safety properties.

REFERENCES

[1] S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, and R. Szeliski. Building Rome in a day. In International Conference on Computer Vision. IEEE, 2009.

[2] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. Xen and the art of virtualization. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP), pages 164–177, New York, NY, USA, 2003. ACM Press.

[3] A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania. The multikernel: a new OS architecture for scalable multicore systems. In SOSP ’09: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, pages 29–44, New York, NY, USA, 2009. ACM.

[4] S. Bhatti, E. Brady, K. Hammond, and J. McKinna. Domain specific languages (DSLs) for network protocols (position paper). In ICDCSW ’09: Proceedings of the 2009 29th IEEE International Conference on Distributed Computing Systems Workshops, pages 208–213, Washington, DC, USA, 2009. IEEE Computer Society.

[5] M. M. T. Chakravarty, G. Keller, R. Lechtchinsky, and W. Pfannenstiel. Nepal - nested data parallelism in Haskell. In Euro-Par ’01: Proceedings of the 7th International Euro-Par Conference on Parallel Processing, Manchester, pages 524–534, London, UK, 2001. Springer-Verlag.

[6] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: a distributed storage system for structured data. In OSDI ’06: Proceedings of the 7th Symposium on Operating Systems Design and Implementation, pages 205–218, Berkeley, CA, USA, 2006. USENIX Association.

[7] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield. Live migration of virtual machines. In Proceedings of the 2nd Symposium on Networked Systems Design and Implementation (NSDI), May 2005.

[8] P.-E. Dagand, A. Baumann, and T. Roscoe. Filet-o-Fish: practical and dependable domain-specific languages for OS development. In 5th Workshop on Programming Languages and Operating Systems (PLOS). ACM, 2009.

[9] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon’s highly available key-value store. In SOSP ’07: Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles, pages 205–220, New York, NY, USA, 2007. ACM.

[10] D. R. Engler, M. F. Kaashoek, and J. O’Toole Jr. Exokernel: an operating system architecture for application-level resource management. In Proceedings of the 15th ACM Symposium on Operating Systems Principles (SOSP), pages 251–266, Copper Mountain Resort, Colorado, December 1995.

[11] R. Ennals, R. Sharp, and A. Mycroft. Linear types for packet processing. In D. A. Schmidt, editor, 13th European Symposium on Programming (ESOP), part of the Joint European Conferences on Theory and Practice of Software (ETAPS), volume 2986 of Lecture Notes in Computer Science, pages 204–218, Barcelona, Spain, April 2004. Springer.

[12] EPA. EPA report to Congress on server and data center energy efficiency. Technical report, U.S. Environmental Protection Agency, 2007.

[13] F. Le Fessant and L. Maranget. Compiling join-patterns. Electronic Notes in Theoretical Computer Science, 16(3), 1998.

[14] K. A. Fraser, S. M. Hand, T. L. Harris, I. M. Leslie, and I. A. Pratt. The XenoServer computing infrastructure. Technical Report UCAM-CL-TR-552, University of Cambridge, January 2003.
[15] T. Gazagnaire and A. Madhavapeddy. Statically-typed value persistence for ML. In Proceedings of the Workshop on Generative Technologies (WGT 2010), March 2010.

[16] D. Gottfrid. Self-service, prorated super computing fun! Open blog, nytimes.com, New York Times, November 2007.

[17] S. M. Hand. Self-paging in the Nemesis operating system. In 3rd Symposium on Operating Systems Design and Implementation (OSDI), pages 73–86, February 1999.

[18] T. Harris and K. Fraser. Language support for lightweight transactions. SIGPLAN Notices, 38(11):388–402, 2003.

[19] G. C. Hunt and J. R. Larus. Singularity: rethinking the software stack. SIGOPS Operating Systems Review, 41(2):37–49, 2007.

[20] INRIA-Rocquencourt. The Coq proof assistant.

[21] B. Krishnamurthy and C. E. Wills. On the leakage of personally identifiable information via online social networks. In Second ACM SIGCOMM Workshop on Online Social Networks, 2009.

[22] X. Leroy. The ZINC experiment: an economical implementation of the ML language. Technical Report 117, INRIA, 1990.

[23] X. Leroy, D. Doligez, J. Garrigue, D. Rémy, and J. Vouillon. The Objective Caml system, 2005.

[24] A. Madhavapeddy. Creating High-Performance Statically Type-Safe Network Applications. PhD thesis, University of Cambridge, 2007.

[25] A. Madhavapeddy. Combining static model checking with dynamic enforcement using the statecall policy language. In K. Breitman and A. Cavalcanti, editors, ICFEM, volume 5885 of Lecture Notes in Computer Science, pages 446–465. Springer, 2009.

[26] A. Madhavapeddy, A. Ho, T. Deegan, D. Scott, and R. Sohan. Melange: creating a “functional” Internet. SIGOPS Operating Systems Review, 41(3):101–114, 2007.

[27] K. Mansley, G. Law, D. Riddoch, G. Barzini, N. Turton, and S. Pope. Getting 10 Gb/s from Xen: safe and fast device access from unprivileged domains. In L. Bougé, M. Forsell, J. L. Träff, A. Streit, W. Ziegler, M. Alexander, and S. Childs, editors, Euro-Par Workshops, volume 4854 of Lecture Notes in Computer Science, pages 224–233. Springer, 2007.

[28] R. Neugebauer and D. McAuley. Energy is just another resource: energy accounting and energy pricing in the Nemesis OS. In HotOS, pages 67–72. IEEE Computer Society, 2001.

[29] G. L. Presti, O. Barring, A. Earl, R. M. G. Rioja, S. Ponce, G. Taurelli, D. Waldron, and M. C. D. Santos. CASTOR: a distributed storage resource facility for high performance data processing at CERN. In MSST ’07: Proceedings of the 24th IEEE Conference on Mass Storage Systems and Technologies, pages 275–280, Washington, DC, USA, 2007. IEEE Computer Society.

[30] B. Schwarz, H. Chen, D. Wagner, J. Lin, W. Tu, G. Morrison, and J. West. Model checking an entire Linux distribution for security violations. In Proceedings of the 21st Annual Computer Security Applications Conference (ACSAC), pages 13–22. IEEE Computer Society, 2005.

[31] V. Springel, S. D. M. White, A. Jenkins, C. S. Frenk, N. Yoshida, L. Gao, J. Navarro, R. Thacker, D. Croton, J. Helly, J. A. Peacock, S. Cole, P. Thomas, H. Couchman, A. Evrard, J. Colberg, and F. Pearce. Simulations of the formation, evolution and clustering of galaxies and quasars. Nature, 435(7042), 2005.
[32] W. Taha. A gentle introduction to multi-stage programming. In Domain-Specific Program Generation, volume 3016 of Lecture Notes in Computer Science, pages 30–50, Dagstuhl Castle, Germany, March 2004. Springer.

[33] V. Vinge. A Deepness in the Sky. Tor Books, March 1999.

[34] R. von Behren, J. Condit, and E. Brewer. Why events are a bad idea (for high-concurrency servers). In HotOS ’03: Proceedings of the 9th Conference on Hot Topics in Operating Systems, Berkeley, CA, USA, 2003. USENIX Association.

[35] J. Vouillon. Lwt: a cooperative thread library. In ML ’08: Proceedings of the 2008 ACM SIGPLAN Workshop on ML, pages 3–12, New York, NY, USA, 2008. ACM.

[36] P. Wadler. Why no one uses functional languages. SIGPLAN Notices, 33(8):23–27, 1998.

[37] R. Want, T. Pering, G. Danneels, M. Kumar, M. Sundar, and J. Light. The Personal Server: changing the way we think about ubiquitous computing. In UbiComp ’02: Proceedings of the 4th International Conference on Ubiquitous Computing, pages 194–209, London, UK, 2002. Springer-Verlag.

[38] A. Warfield, K. Fraser, S. Hand, and T. Deegan. Facilitating the development of soft devices. In Proceedings of the 2005 USENIX Annual Technical Conference (General Track), pages 379–382. USENIX, April 2005.

[39] R. Wishart, D. Corapi, A. Madhavapeddy, and M. Sloman. Privacy Butler: a personal privacy rights manager for online presence. In IEEE PerCom Workshop on Smart Environments, September 2010.