Back to the future in one week —
implementing a Smalltalk VM in PyPy

?

Carl Friedrich Bolz2 , Adrian Kuhn1 , Adrian Lienhard1 , Nicholas D. Matsakis3 ,
Oscar Nierstrasz1 , Lukas Renggli1 , Armin Rigo, and Toon Verwaest1
1

2

Software Composition Group
University of Bern, Switzerland
Softwaretechnik und Programmiersprachen
Heinrich-Heine-Universität Düsseldorf
3
ETH Zurich, Switzerland

Abstract. We report on our experiences with the Spy project, including
implementation details and benchmark results. Spy is a re-implementation
of the Squeak (i.e., Smalltalk-80) VM using the PyPy toolchain. The
PyPy project allows code written in RPython, a subset of Python, to be
translated to a multitude of different backends and architectures. During the translation, many aspects of the implementation can be independently tuned, such as the garbage collection algorithm or threading
implementation. In this way, a whole host of interpreters can be derived from one abstract interpreter definition. Spy aims to bring these
benefits to Squeak, allowing for greater portability and, eventually, improved performance. The current Spy codebase is able to run a small set
of benchmarks that demonstrate performance superior to many similar
Smalltalk VMs, but which still run slower than in Squeak itself. Spy was
built from scratch over the course of a week during a joint Squeak-PyPy
Sprint in Bern last autumn.

1

Introduction

In this paper we present a preliminary report on the Spy project. Spy is an
implementation of the Squeak[4] variant of Smalltalk built using the PyPy
toolchain[9]. The goals of the Spy project are to allow the popular Squeak
platform to be easily ported to high-level runtimes, such as the Java Virtual
Machine (JVM) and Common Language Runtime (CLR), as well as to eventually improve Squeak’s performance through the use of PyPy’s Just-in-time
(JIT) compiler generation techniques. The Spy project also serves to highlight
some of the distinctive features in PyPy’s approach to building virtual machines,
especially when it is compared to Squeak.
Squeak is an open source, full-featured implementation of Smalltalk. One of
its distinctive features is that the virtual machine itself is written in Slang[4], a
restricted subset of Smalltalk. Slang is designed to be easily translated into C,
?

In Self-Sustaining Systems, LNCS, vol. 5142, Springer, 2008, pp. 123-139.

2

C.F. Bolz, et al.

meaning that the core VM can be translated into C and then compiled with a
standard C compiler. This allows Squeak to enjoy reasonably fast execution and
high portability, while preserving the ability to read and understand the VM
source code without leaving Smalltalk.
The PyPy project4 is a toolchain for building interpreters[9]. It allows interpreters to be written in RPython, a restricted form of Python, and then
translated to a more efficient lower-level language for execution. PyPy is able to
translate RPython programs to many different backends, ranging from C source
code, to JavaScript (for execution in the browser), to bytecodes for the JVM
or CLR, although not all of these backends are as full-featured as the others.
In addition to simple translation, PyPy can introduce optimizations along the
way, and can even generate a just-in-time compiler semi-automatically from the
interpreter source. These features are described in depth in other publications[1,
6].
At first glance, it may seem that the role of Slang in Squeak and RPython
in Spy/PyPy are very similar. Both are restricted forms of a dynamic language
used to code the core of the interpreter. However, the similarity is only skin deep.
Slang, the restricted form of Smalltalk used by Squeak, is designed to be easily
translated to C. Slang only contains constructs that can be directly mapped to
C. RPython, on the other hand, is much closer to the full Python language and
includes such features as garbage collection, classes with virtual functions, and
exceptions. Having such facilities available frees the programmer to focus on the
language design issues and ignore the mundane, low-level details of writing a
VM.
The main contributions of this paper are
– We report on our experiences using the PyPy toolchain to realize Spy, a
Smalltalk-80 VM.
– We present implementation details and discuss design decisions of the realized VM.
– We compare benchmarks of Spy with those of similar Smalltalk VMs.
The remainder of this paper is structured as follows: In Section 2 we present
a brief overview of the PyPy project. In Section 3 we explain how the PyPy
approach has been adapted in the Spy project to the implementation of a Squeak
VM. Related work is presented in Section 4. Section 5 presents the results of
various benchmarks to validate the effectiveness of the Spy implementation. We
provide remarks on future work in Section 6 and eventually present in Section 7
our conclusions. Additionally, the source code of the benchmarks is given in
Appendix A, and download and build instructions for both PyPy and Spy are
given in Appendix B.

2

PyPy in a Nutshell

Although the initial goal of the PyPy project was to implement a next-generation
interpreter for Python, the project has gradually evolved into a general-purpose
4

http://codespeak.net/pypy/

Back to the future in one week

3

tool that can be used for any number of languages. In addition to Python and
Smalltalk, (partial) interpreters have been developed for Prolog, Scheme and
JavaScript.
The goal of the PyPy project is to create an environment that makes it
easy to experiment with different virtual machine designs, but without sacrificing efficiency. This is achieved by separating the semantics of the language
being implemented, such as Python or Smalltalk, from low-level aspects of its
implementation, such as memory management or the threading model. A complete interpreter is constructed at build time by weaving together the interpreter
definition and each low-level aspect into a complete and efficient whole[9].
The project currently includes a wide variety of backends that support translations from RPython into C/Posix, LLVM[5], CLI/.NET, Java bytecodes, and
JavaScript, although only the first three are fully functional.

Prolog

Python
Interpreter

Scheme
Other
interpreters…

JavaScript

Type and Flow
Analysis

Specialize for object
oriented environment

CLI backend

pypy.net

prolog.net

JVM backend

js.net

JS backend

JPyPy

js-jvm

Specialize for lowlevel environment

pypy-c

prolog-js

C backend

LLVM backend

scheme-c

prolog-c

…-llvm

pypy-llvm

Fig. 1. The PyPy toolchain specializes high-level interpreters for different languages
into compilers or VMs for different platforms.

The translation process works by using abstract interpretation to convert the
RPython programs into flow graph form. The graphs are then used for wholeprogram type inference, which assigns a static type to all values used in the
program. The ability to perform type inference on the input programs is the

4

C.F. Bolz, et al.

key requirement for the PyPy toolchain. This means that RPython is defined,
rather imprecisely, to be the subset of Python which our tools can statically
check. In practice, RPython forbids runtime reflection and any type-inconsistent
usage of variables (e.g., assigning both integers and object references to the
same variables). Despite these restrictions, RPython is still quite expressive,
supporting exceptions, single inheritance with explicitly declared mixins (instead
of Python’s full multiple inheritance), dynamic dispatch, first class function and
class values, and runtime isinstance and type checks.
Once the flow graphs have been built and typed, they can be transformed
by any number of translation aspects[1] which implement low-level details, such
as garbage collection or a variant of the CPS-transformation. These translation
aspects give tremendous flexibility in controlling the behavior and performance
of the final interpreter and also illustrate one of the advantages of specifying the
interpreter in a higher-level language like RPython. Because RPython does not
specify low-level details such as the garbage collection strategy, the toolchain is
free to implement them however it sees fit. In contrast, changing the traditional,
C-based Python interpreter so as not to use reference counting would require
pervasive changes throughout the entire codebase.
The promise of PyPy and RPython is that it should be possible to develop
a single interpreter source which can be used via different choices of translation aspects and backends, to create a whole family of interpreters on a wide
variety of platforms, as illustrated by Figure 1. This avoids the problem that
many languages face, i.e., to keep the interpreter definition in sync across all
platforms on which it is supported, and to allow all versions to benefit from
new features and optimizations instantly. As an example, consider the Jython
project5 , which defines a Python interpreter on the JVM. Because Jython and
CPython do not share the same source, Jython lags several versions behind its
C counterpart, making it increasingly challenging to use with modern Python
programs. PyPy essentially offers a model-driven approach [10] to programming
language implementation — it transforms platform-independent models (i.e.,
high-level interpreters) into implementations for multiple platforms.
Another advantage of this approach is that since RPython is a proper subset
of Python, an RPython program can be fully tested and easily debugged by
letting it run on a Python interpreter. The program is only translated to a
different language if it is reasonably bug-free. This increases testability, eases
development, and decreases turnaround times.

3

Spy Implementation

Similar to most Smalltalk VMs, Spy consists of four main parts: a bytecode
interpreter, a set of primitives, an image loader, and an object model.
5

http://jython.org/

Back to the future in one week

3.1

5

Interpreter, Primitives, and Image Loading

The core components of a Smalltalk VM are the bytecode interpreter, primitive
methods, and the image loader. For the most part Spy does not deviate from
the traditional Smalltalk VM design [3], though in some cases we made minor
alterations. For example, Spy is not based on an object table, i.e., objects reference each other directly without a level of indirection. This is similar to Squeak’s
approach as described in Section 4.
Bytecodes in Smalltalk are generally used to implement control flow and
message sends, and to introduce constant values into the computation. Spy’s
bytecode interpreter takes a traditional form, consisting of a table of function
pointers, which is indexed by the current bytecode on every iteration.
As a performance optimization, during the translation process to C, we are
able to take advantage of the fact that the function table is immutable. This
allows us to alter the dispatch loop so that it uses a local switch to translate
bytecodes to method calls, rather than having an indirection via the global
opcode table (see Figure 2). This will not only localize lookups but it will also
use direct instead of indirect calls. Which will then allow for further optimizations
such as inlining of the actual code related to the bytecodes.
Before:
table = [method_for_opcode_0, method_for_opcode_1, ...]
while 1:
byte = get_next_byte()
method = table[byte]
method()
After:
while 1:
byte = get_next_byte()
switch on byte:
case 0:
method_for_opcode_0()
case 1:
method_for_opcode_1()
...

Fig. 2. Translation of the dispatch loop from a bytecode table to a local switch.

Compared to other virtual machines, Smalltalk contains relatively few bytecodes. For example, there are no bytecodes for low-level operations such as doing arithmetic. Instead, these operations are implemented as primitive methods,
which are methods that are implemented in the core virtual machine, either for
efficiency’s sake or because they encode a fundamental operation which isn’t
possible to express in the language itself, such as integer addition.

6

C.F. Bolz, et al.

Primitive methods are invoked as the result of normal message sends. When
an object receives a message in Smalltalk, the first thing that happens is the
lookup of the corresponding method implementation. The resulting method object contains the bytecodes to execute and, optionally, a primitive method identifier, which is just a small integer. If a primitive method identifier is supplied,
the VM uses the integer to index into its primitive method table to find a built-in
function to execute.
As in Squeak, primitive methods in Spy are implemented as a series of functions placed into a table. In Spy, however, we are able to take advantage of
several RPython features to make their implementation less tedious and errorprone. The first feature are exceptions: in the Squeak VM, when a primitive
method wants to signal failure, it does so by setting a field, primitiveFailed,
of the global interpreter object to true. This means that all following code must
be guarded to ensure that it does not execute once the primitiveFailed field
is set to true. In RPython, however, we can use a Python exception to signal a
failure condition, resulting in less cluttered code.
The second RPython feature, which proved to be very important is its capacity for meta-programming. Because primitive methods execute directly on
the VM structures, they often contain a certain amount of repetitive code that
loads method arguments from the stack, inspects their types, and finally pushes
any result back onto the stack. Using Python annotations, however, we are able
to attach a declarative tag to each primitive method stating the number of stack
arguments it expects, any preprocessing they require, and whether or not a result
is pushed back on the stack after execution. This not only makes the primitives
shorter, it helps to avoid errors. In particular, we were able to use these annotations to specify when an argument represents an array index: since array indices
are 1-based in Smalltalk, the preprocessor is not only able to confirm that the
index is an integer, but can automatically subtract 1 to convert it to a 0-based
RPython array index, leading to much cleaner code.
Figure 3 shows the definition of the primitive method for computing square
roots. The @expose_primitive annotation on the first line declares both the
primitive code, which is the symbolic constant FLOAT_SQUARE_ROOT, and the
fact that the function expects only one argument from the stack, which should
be a floating point value. Note that the object on the stack is actually a wrapped
floating point value, but the preprocessor automatically inserts code to unwrap
it and extract the RPython floating point value within. This unwrapped value is
passed to the implementation function. Within the body of the function, there
is a test that ensures that the argument is positive which raises an exception
(PrimitiveFailedError) should that not be the case. Otherwise, the square
root is computed using the standard RPython function math.sqrt, wrapped in a
Smalltalk object, and returned. Note that the return value will be automatically
pushed on the stack.
For comparison, Figure 4 shows the the same primitive method in Slang.
The key difference to RPython is that Slang does not provide object-oriented
language features. Slang is, roughly spoken, C code disguised as Smalltalk syntax.

Back to the future in one week

7

@expose_primitive(FLOAT_SQUARE_ROOT, unwrap_spec=[float])
def func(interp, f):
if f < 0.0:
raise PrimitiveFailedError
w_res = utility.wrap_float(math.sqrt(f))
return w_res

Fig. 3. The definition of the primitive square root operation in RPython. The code uses
high-level features, such as method decorators, exceptions, and object-orientation.

For example, to indicate failure, a global field of the interpreter is used rather
than throwing an exception. Pushing and popping has to be done manually.
But in particular the call to #cCode:inSmalltalk: breaks abstractions and
testability: as a first argument it is given a fragment of C code, as a second
argument a Smalltalk closure. When translating the VM down to C, the code
fragment is literally copied into the generated source code. When debugging the
VM within another Smalltalk image, the closure is evaluated. As both are not
causally connected, it might even happen that a bug in the C code does not
appear when debugging the VM and vice versa!
primitiveSquareRoot
| rcvr |
self var: #rcvr type: ’double ’.
rcvr := self popFloat.
self success: rcvr >= 0.0.
successFlag
ifTrue: [self pushFloat:
(self cCode: ’sqrt(rcvr)’ inSmalltalk: [rcvr sqrt])]
ifFalse: [self unPop: 1]

Fig. 4. The definition of the primitive square root operation in Slang. The code is,
roughly spoken, C code disguised as Smalltalk syntax. Object-oriented features are not
used, e.g., failure is signalled with flag rather than an exception.

Image loading is one area where Spy differs significantly from Squeak. Traditionally, a Squeak image is simply a dump of the core memory into a file.
Loading an image can be done by simply memory-mapping the image file, followed by some minimal pointer and integer adjustments. This technique works
well when you can guarantee that the memory layout between virtual machines
is compatible. Unfortunately, the memory layout for a RPython program is not
specified and sometimes even outside of the control of PyPy’s toolchain, if the
translation target is a high-level VM such as the JVM or .NET’s CLR. Since we
wanted to retain compatibility with Squeak’s image formats, we implemented an

8

C.F. Bolz, et al.

image loader that proceeds by parsing the Squeak image file formats, decoding
the object headers, and constructing equivalent objects in our own VM.
3.2

Object Model

The Squeak implementation uses three different addressing schemes for its objects: bytes, words, and pointers. Each object structure begins with a format
word that describes which kind of object it is. This determines how the raw
bytes that make up an object in memory are interpreted: a “bytes” object treats
them as an array of bytes, which is useful for classes like strings. “Words” objects
store words, and are useful for vectors of integers. Finally, “pointers” objects contain pointers, and are used for almost all other kinds of objects. In addition, as
is common in many VMs, small integers are represented as tagged pointers.
The Spy model is somewhat more complex. In addition to bytes, words,
and pointers objects, we have special classes for representing compiled methods,
method and block contexts (stack frames), small integers, and floating point
values. Please refer to Figure 5 for the full class hierarchy.

W_Object
size
getclass
gethash
at0
atput0
fetch
store
shadow_of_my_class
equals
W_SmallInteger

W_Float

W_AbstractObjectWithIdentityHash

value

W_AbstractObjectWithClassReference

W_PointersObject

W_BytesObject

W_CompiledMethod W_ContextPart

W_WordObject

W_BlockContext

W_MethodContext

Fig. 5. Different kinds of objects in the Spy implementation.

All these classes are subclasses of an abstract class representing a Smalltalk
object (W_Object). Therefore they all implement the same interface, which makes
them equivalent from the Smalltalk point of view, and any of them can be used
anyplace that a standard Smalltalk object is expected. The concrete implementation, however, differs from class to class. For the classes that are close to the VM

Back to the future in one week

9

internals (such as compiled methods, method and block contexts), the implementation is made as convenient for the VM as possible, whereas the implementation
of generic Smalltalk objects is less complex.
As illustrated by Figure 5, small integers are implemented by a normal class
that has exactly one field, which contains the value of the integer itself. This
deviates from the original design in the Squeak VM where they are represented
by tagged pointers. To compare results of both styles, we could easily mimic
the behavior of the Squeak VM by plugging an extra transformation into the
toolchain. With the transformation turned on, the resulting C source generated
by the toolchain would actually use tagged pointers as representation of small
integers. This is by itself already another example where the RPython code abstracts over low-level details. We can assume a consistent model everywhere and
do not need to check for tagged pointers throughout the source code, while resulting in the exact same behavior. A small bit of experimentation seemed to
indicate that using tagged pointers for small integers actually worsens performance. The necessity of checking whether a pointer is a heap pointer or a small
integer around every method call offsets all the benefits of the smaller memory
footprint that comes with tagged small integers. It is important to note, however,
how tedious it would be to experimentally introduce or remove tagged pointers
with a traditional, low-level interpreter.
The class hierarchy illustrated by Figure 5 is internal to the VM, it is not
related to Squeak’s class hierarchy. All these classes are internally used for wrapper objects, hence the W prefix, and do not denote the high-level class of objects.
Which high-level class an object has is completely under control of Squeak itself,
it is stored in the W Object.shadow of my class field. Thus, the VM’s class hierarchy can be used to run any version of Squeak. Both Squeak 2.0 and Squeak
3.9 images run with the current implementation of Spy.
3.3

Shadow Objects

As noted in the previous section, Squeak does not distinguish between objects
based on the role that they play in the system, but only based on the kind of
data that they contain (bytes, words, or objects). For example, a class object is
simply an object that is used as a class by some other object. It is not necessarily
an instance of a particular class, though the layout of the object in memory must
be compatible with what the VM expects6 . This implies that, at image loading
time, it is impossible to distinguish which objects are, or will be later, used as
classes, and so we cannot use a special subclass of W_Object to represent them.
Unfortunately, being forced to use a generic data layout for such special kinds
of objects as classes can be very inefficient. The memory layout of Smalltalk data
6

To be used as a class, a Squeak object must have at least three instance variables, of
which the first must refer to its superclass, the second must refer to a method dictionary, and the third must contain a magic number encoding the format of instances.
Any object that meets these criteria, and implements primitive #70 (primitiveNew),
can be used to create instances of itself.

10

C.F. Bolz, et al.

structures were chosen with an eye towards reducing memory consumption, and
not for ease of access. Spy could be made far more efficient if it could use
native RPython data structures instead. For example, each class has a method
dictionary that is normally stored as a native Smalltalk Dictionary instance.
If this method dictionary could be converted by the VM into a native RPython
dictionary, then Spy could take advantage of the highly optimized RPython
dictionary implementation.
To resolve this dilemma, Spy allows every Smalltalk object to have an associated “shadow” object. These shadow objects are not exposed to the Smalltalk
world. They are used by the VM as internal representation and can hold arbitrary information about the actual object. If an object has a shadow object
attached, the shadow is notified whenever the state of the actual object changes,
to keep both views of the object synchronized. One way of looking at shadows is
as a general cache mechanism. However, the approach is far more powerful, since
arbitrary meta-level operations can be triggered when the update notifications
are received7 .
In the current implementation, the shadows are used to attach nicely decoded information about classes to all objects which are used as classes. This
allows any object to be used as a class, even ones that are not instances of the
Smalltalk class Class. The shadow object is created and attached to the class
the first time the VM sees the object being used as a class. It stores all required
information about the class in a convenient, easily accessible data structure (as
opposed to the obscure bit format used at the Smalltalk level). The class shadow
contains the format and size of instances, an RPython-level dictionary containing the compiled methods, and the name of the class (if it has one). All of this
information is kept in sync with the “real” Smalltalk object that the shadow is
associated with.
At the moment8 classes are the only objects that have shadows attached
to them. It seems likely that we will change some of the objects that are now
implemented with special RPython classes to use shadows as well later. For example, Squeak allows arbitrary objects to be used as method and block contexts,
but the current Spy implementation does not. This could be resolved by using
shadow objects to contain any extra information associated with objects that
are used as a context.
3.4

State of the Implementation

The VM parts described in the previous subsections add up to the current implementation of Spy. This implementation is able to load Squeak images (tested
7

8

Shadow objects are not related to the concept of mirrors. Mirrors are a mechanism
to introduce reflection on demand. Shadow objects are an implementation artifact
of our design allowing us to benefit from native RPython data structures. They have
nothing to do with reflection per se, though they could be used for this purpose too.
This paper refers to revision 49630 of Spy on codespeak’s SVN repository, for more
information please refer to the download instructions in Appendix B.

Back to the future in one week

11

with the Squeak 2.0 mini image and more recent Squeak 3.9 images) and execute
all bytecodes and a subset of primitives. The most important missing primitives
are the graphical primitives. We do already support enough for the VM to be
able to run the tinyBenchmarks. Furthermore we are still lacking support for
threading and image saving.

4

Related work

In this section we present existing Smalltalk VMs, and discuss how their implementation relates to Spy. The VM of the Squeak dialect of Smalltalk follows
closely the specification given in the Smalltalk-80 Blue Book[3]. The Blue Book
specifies an object memory format, the bytecodes, the primitives, and the interpreter loop of a Smalltalk VM.
4.1

Squeak VM

The main difference of Squeak’s VM [4] compared to the Smalltalk-80 specification of the Blue Book [3] is the object memory format. The object memory
specified in the blue book is based on an object table. An object table introduces
a level of indirection for object references. In contrast, Squeak implements object references as direct pointers, that is, an object reference is just the address
of that object in memory. Today, this is the common approach taken by most
virtual machines.
Squeak’s object memory layout consists of a header for the class pointer,
hash bits, GC flags, size etc. and a fixed number of fields. There are four kinds
of object formats. Objects with named instance variables, indexed object fields,
indexed word size or byte size fields. Everything, including interpreter-internal
data such as execution contexts, processes, classes, and methods, is represented
as a normal object on the heap. An exception is the case of small integers, which
are represented as tagged pointers. Special objects that have to be known to the
interpreter, for example the process scheduler, are stored in a global table.
The majority of the Squeak VM is implemented in a subset of Squeak Smalltalk, named Slang. The Slang source code is then translated to C code to compile
and link with the low-level, platform-specific C code. Slang is a very restricted
subset of Smalltalk which does not support classes, exception handling, or other
object-oriented language features. Therefore, Slang does not provide a higher
level of abstraction than C.
Nevertheless, using Slang has advantages compared to writing C code manually. First, the translator applies several optimizations such as generating C
switch statements for the dispatch loop or inlining procedure calls. Second, since
Slang is a Smalltalk subset it can be run within another Squeak image, which
can be very useful for debugging. As Squeak allows for incremental compilation,
the implementation of the VM can be changed while it is running. In this way,
time consuming edit-transform-compile-run cycles can be avoided.

12

C.F. Bolz, et al.

The approach taken by PyPy is similar to that of Squeak/Slang, as the VM
implemented in RPython can also be run directly without transformation and
compilation. However, the key difference is that RPython (restricted Python)
is much less restrictive than Slang. RPython provides object-oriented language
features such as objects, class-based inheritance, exceptions, and translationtime reflection and metaprogramming [9, 8]. As discussed throughout this paper, RPython’s extended capabilities simplify the implementation of the VM in
many ways, ranging from using code generation and annotations to avoid boilerplate code, to the automation of complex, far-reaching optimizations like tagged
integers.

5

Evaluation

In this section we present a comparison of performance and codebase size of
different Smalltalk VM implementations.
5.1

Performance Benchmarks

To analyze VM performance, we use the TinyBenchmarks suite which is part of
the Squeak mini image[2]. The TinyBenchmarks tests bytecode interpretation
and message send performance. We refer the reader to Appendix A for the complete source code of the benchmark suite. For the Smalltalk platforms that do
not support direct loading of our reference image, we ported the source code
manually. All the platforms successfully run the TinyBenchmarks and produced
the following two figures:
Bytecodes per second. To compare the performance of a virtual machine, we need
to know how fast the bytecodes are processed by the VM. This is the first number
reported by TinyBenchmarks. The value is calculated from a bytecode-heavy
implementation of the “Sieve of Eratosthenes”. The result is calculated using the
runtime performance of this algorithm and the number of executed bytecodes.
The number of executed bytecodes is the number of bytecodes that Squeak
executes when running the benchmark, which makes the number meaningful
even on different implementations.
Sends per second. In Smalltalk everything happens by message sends, with the
exception of some transparently inlined control structures. Therefore an efficient
implementation of message sends is crucial. The second number reported by
TinyBenchmarks is the message sends (method lookup and method invocation)
per second. It is calculated from the the runtime performance of a send-heavy
recursive calculation of Fibonacci numbers.
In Figure 6 we present the result of running the TinyBenchmarks on various
VMs in relation to the original Squeak VM. The machine used was an Apple Mac

Back to the future in one week

Sends (normalized to Squeak)

13

Bytecodes (normalized to Squeak)

10

1

0.1

0.01

0.001

Hobbes

VisualWorks

OMeta/JS

Pepsi Compiler

Pepsi Smalltalk VM

Potato

SPy VM simulated

SPy VM on the CLI

SPy VM in C

Squeak VM, simulated

Squeak VM

0.0001

Fig. 6. Benchmark results for the TinyBenchmarks for various VMs, normalized to the
Squeak VM

14

C.F. Bolz, et al.

Pro (2 × 3 GHz Dual-Core Intel Xeon, 3 GB RAM). All the benchmarks were
run 20 times; the final numbers are the arithmetic mean of those measurements.
Squeak VM. This is the original Squeak VM, written in Slang and transformed
to C. It is heavily optimized and represents our point of comparison.
Squeak VM, simulated. The Slang code running in the Squeak VM interpreting the image is about 1000 times slower. The system is hardly usable
like this, but it is a valuable means to debug the VM with the Smalltalk
tools.
SPy VM in C. The result of our written VM after a week of intensive development is not at all bad. It runs at about a tenth the speed of the Squeak
VM. This particular version was translated to C, using PyPy’s generational
GC and profile-guided optimizations.
SPy VM on the CLI. Spy translated to CLI (.NET Common Language Infrastructure) bytecode and running that on Mono is a significant factor slower
than translating Spy to C. We assume that this is partially due to PyPy’s
CLI backend rendering some RPython constructs inefficiently.
SPy VM, simulated. This is Spy running untranslated on top of CPython9
(the normal Python interpreter). Similar to running Slang code simulated
on another Squeak, this is unusably slow but very useful for testing and
debugging.
Potato. The VM running on Java is amazingly fast. Certainly this also has
to do with the experience of the author with implementing other Smalltalk
VMs.
Pepsi Smalltalk VM. The Squeak VM written with Pepsi is rather slow. The
reason for this is that the VM is written in a highly dynamic Smalltalk-like
language, which requires a repetition of lookups and message send per single
bytecode in the interpreting VM.
Pepsi Compiler. Eliminating these lookups through the compilation of the
code down to machine language, brings a huge performance boost. Currently
this does not happen automatically through a JIT compiler, but it can be
simulated by compiling the Smalltalk code of our benchmark using the Pepsi
compiler. This removes the interpretation step from the code, but retains the
fully dynamic object model.
OMeta/JS. We ran our OMeta/JS in the Safari Web Browser, as it has one
of the fastest JavaScript engines available. We were amazed that it is in the
same league as the CLI and Pepsi VM.
VisualWorks. VisualWorks is the fastest Smalltalk VM available today. It uses
both sophisticated JIT compiler and memory management technology. The
source code is not publicly available.
Hobbes. Running the benchmark reveals that Hobbes is around 100 times
slower than the original Squeak VM. However, we have to point out that
the Smalltalk-80 user-interface is responsive and comparable in speed to the
machines of that time. A reason for that is certainly that Hobbes is running
on VisualWorks.
9

http://python.org

Back to the future in one week

5.2

15

Lines of Code

To give a rough estimate of the comparative complexity of different VM implementations, we included Table 1 with a listing of approximate size of the
respective codebase, measured in thousands of lines of code (KLOC).

Implementation
Squeak VM
Squeak VM (translation)
Spy VM
Spy VM (translation)
Spy VM (translation)
Hobbes VM
Potato VM
Pepsi VM
Pepsi VM (translation)
OMeta
VisualWorks
#Smalltalk
Little Smalltalk
Little Smalltalk

Language KLOC
Slang
8.9
C
22.8
RPython
2.6
IL
130.4
C
187.7
Smalltalk
10.0
Java
4.7
Pepsi
10.9
C
2.1
Javascript
1.4
C
174.7
Smalltalk
7.0
C
4.0
Java
1.8

Table 1. Comparison of VM implementations in KLOC.

As shown in Table 1, Spy’s RPython source is relatively compact. Spy’s
RPython source measures only 2600 lines of code, whereas the Slang source for
Squeak requires 8900, and even the relatively compact Potato VM weighs in at
4700. This provides further evidence that the higher level of abstraction afforded
by RPython is useful for keeping the implementation clean and uncluttered. As
discussed in Section 3, we took advantage of a number of RPython features,
including annotations, exceptions, and post-processing transformations, to simplify the Spy source and to improve performance. Without such features, Spy
would be significantly more complex and more difficult to maintain. Please also
note that although our implementation is fairly complete, there are still missing
parts (see Section 3.4).

6

Future Work

The section discusses future work regarding Spy. Currently Spy lacks support
for several primitives that are needed to make it a realistic replacement of the
original Squeak VM. In particular these are primitives for UI and threading.
With regard to Squeak’s plugin mechanism, we aim to find a way to reuse its
interfacing with external functions so we can avoid redoing the work to interface

16

C.F. Bolz, et al.

with third-party libraries. When this is done Spy should be a slow but usable
replacement for the original Squeak.
Afterwards, we can concentrate on speed optimizations. We plan to implement some straightforward optimizations in the VM, the most obvious example
being a method cache. The shadow approach described in Section 3 should make
this straightforward, since the shadows of classes are already kept up-to-date automatically and are thus an obvious place to put a method cache. This should
get rid of the most obvious inefficiencies in the current VM.
An area that the PyPy project is currently researching is the automatic
generation of just-in-time compilers from interpreters using partial evaluation
techniques. The language implementor needs to guide this process with a small
number of hints in the interpreter source code. This already works well for PyPy’s
Python interpreter, where speedups of up to 200 times over normal interpretation
can be achieved for simple integer arithmetic [6]. We hope to be able to apply
these techniques to Spy as well to get a high-speed VM implementation that
could eventually surpass Squeak’s performance. This would allow us to get a justin-time compiler with very little effort, while retaining our easy-to-understand
interpreter source code.
While the Spy project specifically tries to use PyPy’s toolchain to implement a Squeak VM, it would be worthwhile future project to try to apply some
of the ideas of the PyPy project to a pure Squeak setting. This would mean implementing a translation toolchain for a subset of Smalltalk that is higher-level
than Slang and then building a VM in it. Doing that would allow it to evaluate
PyPy’s concepts and to explore the design space for this sort of approach.

7

Conclusion

We have described the implementation details of the Spy project, and provided benchmark results which we believe demonstrate the potential of the Spy
project: despite the lack of fundamental optimizations such as a method cache,
and the fact that it was coded in a high-level language (complete with garbage
collection and other modern amenities), Spy delivers performance competitive
with or better than other alternative Squeak implementations.
Spy was developed partly as an experiment to see how suitable the PyPy
toolchain would be for a Smalltalk implementation. We found that PyPy is indeed a very useful tool for quickly implementing a virtual machine. The fact
that Spy was developed in only one week of development attests to the productivity boost offered by PyPy. By using a high-level language like RPython, and
in particular one with support for metaprogramming, we were able to reduce errors and eliminate boilerplate code throughout the system. Furthermore, PyPy’s
support for translation aspects enabled us to experiment with systematic, lowlevel optimizations, such as tagged integers, easily and without changes to the
interpreter source.
We are confident that with further development, Spy could join Squeak as a
realistic platform for Smalltalk development.

Back to the future in one week

17

Acknowledgments
We thank Orla Greevy for her careful review of a draft of this paper. The
members of the Software Composition Group, University of Bern, gratefully
acknowledge the financial support of the Swiss National Science Foundation for
the project “Analyzing, capturing and taming software change” (SNF Project
No. 200020-113342, Oct. 2006 - Sept. 2008), and the Hasler Foundation for the
project “Enabling the evolution of J2EE applications through reverse engineering and quality assurance”. Adrian Kuhn acknowledges the support of CHOOSE,
the Swiss Group for Object-Oriented Systems and Environments.

References
1. Carl Friedrich Bolz and Armin Rigo. Memory management and threading models
as translation aspects – solutions and challenges. Technical report, PyPy Consortium, 2005. http://codespeak.net/pypy/dist/pypy/doc/index-report.html.
2. Marcus Denker. Entwurf von optimierungen für squeak, 2002. Studienarbeit,
Universität Karlsruhe.
3. Adele Goldberg and David Robson. Smalltalk 80: the Language and its Implementation. Addison Wesley, Reading, Mass., May 1983.
4. Dan Ingalls, Ted Kaehler, John Maloney, Scott Wallace, and Alan Kay. Back to the
future: The story of Squeak, a practical Smalltalk written in itself. In Proceedings
of the 12th ACM SIGPLAN conference on Object-oriented programming, systems,
languages, and applications (OOPSLA’97), pages 318–326. ACM Press, November
1997.
5. Chris Lattner and Vikram Adve. LLVM: A Compilation Framework for Lifelong
Program Analysis & Transformation. In Proceedings of the 2004 International
Symposium on Code Generation and Optimization (CGO’04), Mar 2004.
6. Samuele Pedroni and Armin Rigo. JIT compiler architecture. Technical report, PyPy Consortium, 2007. http://codespeak.net/pypy/dist/pypy/doc/indexreport.html.
7. Ian Piumarta and Alessandro Warth. Open reusable object models. Technical
report, Viewpoints Research Institute, 2006. VPRI Research Note RN-2006-003-a.
8. Armin Rigo, Michael Hudson, and Samuele Pedroni.
Compiling dynamic language implementations. Technical report, PyPy Consortium, 2005.
http://codespeak.net/pypy/dist/pypy/doc/index-report.html.
9. Armin Rigo and Samuele Pedroni. PyPy’s approach to virtual machine construction. In Proceedings of the 2006 conference on Dynamic languages symposium,
OOPSLA ’06: Companion to the 21st ACM SIGPLAN conference on Objectoriented programming systems, languages, and applications, pages 944–953, New
York, NY, USA, 2006. ACM.
10. Douglas C. Schmidt. Guest editor’s introduction: Model-driven engineering. Computer, 39(2):25–31, 2006.
11. Alessandro Warth and Ian Piumarta. OMeta: an object-oriented language for
pattern matching. In DLS ’07: Proceedings of the 2007 symposium on Dynamic
languages, pages 11–19, New York, NY, USA, 2007. ACM.

18

A

C.F. Bolz, et al.

Source code of TinyBenchmarks

Number>>tinyBenchmarks
| t1 t2 r n1 n2 |
n1 := 1.
[ t1 := Time millisecondsToRun: [ n1 benchmarkPrimes ].
t1 < 1000 ] whileTrue: [ n1 := n1 * 2 ].
n2 := 28.
[ t2 := Time millisecondsToRun: [ r := 28 benchFibonacci ].
t2 < 1000 ] whileTrue: [ n2 := n2 + 1 ].
^ ((n1 * 500000 * 1000) // t1) printString , ’ bytecodes/sec; ’ ,
((r * 1000) // t2) printString , ’ sends/sec’
Number>>benchmarkPrimes
| size flags prime k count |
size := 8190.
1 to: self do: [ :iter |
count := 0.
flags := (Array new: size) atAllPut: true.
1 to: size do: [ :i |
(flags at: i) ifTrue: [
prime := i + 1.
k := i + prime.
[ k <= size ] whileTrue: [
flags at: k put: false.
k := k + prime ].
count := count + 1 ] ] ].
^ count
Number>>benchFibonacci
^ self < 2
ifTrue: [ 1 ]
ifFalse: [
(self - 1) benchmarkFibonacci
+ (self - 2) benchmarkFibonacci + 1 ]

B

How to download and run the Spy project

Make sure you are running Python version 2.5 or higher, and checkout the project
from subversion
> svn co http://codespeak.net/svn/pypy/dist pypy-dist
> cd pypy-dist

Back to the future in one week

19

Now, let’s generate some Squeak VMs. Switch to the translation goal folder
and run the toolchain
> cd pypy/translator/goal
> ./translate.py --gc=generation --batch targettinybenchsmalltalk.py
To run the generated executable:
> ./targettinybenchsmalltalk-c
If you browse the target’s Python file, you’ll find some fixture code together
with a function called entry point(argv). The fixture code is executed before
the toolchain takes over. It may use the full power of Python and is not restricted
to RPython. Then, the toolchain is started up, taking the entry point function
and the fixture’s result as an input, to generate the VM. Therefore, all code
eventually called by the entry point must conform to RPython.