Automating System Tests Using Declarative Virtual Machines

Sander van der Burg
Delft University of Technology
Delft, The Netherlands
s.vanderburg@tudelft.nl

Eelco Dolstra
Delft University of Technology
Delft, The Netherlands
e.dolstra@tudelft.nl

Abstract—Automated regression test suites are an essential software engineering practice: they provide developers with rapid feedback on the impact of changes to a system's source code. The inclusion of a test case in an automated test suite requires that the system's build process can automatically provide all the environmental dependencies of the test. These are external elements necessary for a test to succeed, such as shared libraries, running programs, and so on. For some tests (e.g., a compiler's), these requirements are simple to meet. However, many kinds of tests, especially at the integration or system level, have complex dependencies that are hard to provide automatically, such as running database servers, administrative privileges, services on external machines or specific network topologies. As such dependencies make tests difficult to script, they are often only performed manually, if at all. This particularly affects testing of distributed systems and system-level software. This paper shows how we can automatically instantiate the complex environments necessary for tests by creating (networks of) virtual machines on the fly from declarative specifications. Building on NixOS, a Linux distribution with a declarative configuration model, these specifications concisely model the required environmental dependencies. We also describe techniques that allow efficient instantiation of VMs. As a result, complex system tests become as easy to specify and execute as unit tests. We evaluate our approach using a number of representative problems, including automated regression testing of a Linux distribution.

I. INTRODUCTION

Automated regression test suites are an essential software engineering practice, as they provide developers with rapid feedback on the impact of changes to a system's source code. By integrating such tests in the build process of a software project, developers can quickly determine whether a change breaks some functionality. These tests are easy to realise for certain kinds of software: for instance, a regression test for a compiler simply compiles a test case, runs it, and verifies its output; similarly, with some scaffolding, a unit test typically just calls some piece of code and checks its result. However, other regression tests, particularly at the integration or system level, are significantly harder to automate because they have complex requirements on the environment in which the tests execute. For instance, they might require running database servers, administrative privileges, services on external machines or specific network topologies. This is especially a concern for distributed systems and system-level software. Consider for instance the following motivating examples used in this paper:

• OpenSSH is an implementation of the Secure Shell protocol that allows users to securely log in on remote systems. One component is the program sshd, the secure shell server daemon, which accepts connections from the SSH client program ssh, handles authentication, starts a shell under the requested user account, and so on. A test of the daemon must run with super-user privileges on Unix (i.e. as root) because the daemon must be able to change its user identity to that of the user logging in.
It also requires the existence of several user accounts.

• Quake 3 Arena is a multiplayer first-person shooter video game. An automated regression test of the multiplayer functionality must start a server and a client and verify that the client can connect to the server successfully. Thus, such a test requires multiple machines. In addition, the clients (when running on Unix) require a running X11 server for their graphical user interfaces.

• Transmission is a Bittorrent client, managing downloads and uploads of files using the peer-to-peer Bittorrent protocol. Peer-to-peer operation requires that a running Bittorrent client is reachable by remote Bittorrent clients. This is not directly possible if a client resides behind a router that performs Network Address Translation (NAT) to map internal IP addresses to a single external IP address. Transmission clients automatically attempt to enable port forwarding in routers using the UPnP-IGD (Universal Plug and Play Internet Gateway Device) protocol. A regression test for this feature thus requires a network topology consisting of an IGD-enabled router, a client "behind" the router, and a client on the outside. The test succeeds if the second client can connect to the first through the router.

Thus, each of these tests requires special privileges, system services, external machines, or network topologies. This makes them difficult to include in an automated regression test suite: as such suites are typically started from (say) a Makefile on a developer's machine prior to checking in changes, or on a continuous build system, it is important that they are self-contained. That is, the test suite should set up the complete environment that it requires. Without this property, test environments must be set up manually. For instance, we could manually set up a set of (virtual) machines to run the Transmission test suite, but this is laborious and inflexible. As a result, such tests tend to be done on an ad hoc, semi-manual basis. For example, many major Unix system packages (e.g. OpenSSH, the Linux kernel or the Apache web server) do not have automated test suites.

In this paper, we describe an approach to make such tests as easy to write and execute as conventional tests that do not have complex environmental dependencies. This opens a whole class of automated regression tests to developers. In this approach, a test declaratively specifies the environment necessary for the test, such as machines and network topologies, along with an imperative test script. From the specification of the environment, we can then automatically build and instantiate virtual machines (VMs) that implement the specification, and execute the test script in the VMs.

We achieve this goal by expanding on our previous work on NixOS, a Linux-based operating system distribution [1], which in turn builds on the purely functional package manager Nix [2]. In NixOS, the entire operating system – system packages such as the kernel, server packages such as Apache, end-user packages such as Firefox, configuration files in /etc and boot scripts, and so on – is built from source from a specification in what is in essence a purely functional "Makefile". (We give an overview of the relevant concepts in Nix and NixOS in Section II.) The fact that Nix builds from a purely functional specification means that configurations can easily be reproduced. The latter aspect forms the basis for this paper.
In a normal NixOS system, the configuration specification is used to build and activate a configuration on the local machine. In Section III, we show how these specifications can be used to produce the environment necessary for running a single-machine test. We also describe techniques that allow space and time-efficient instantiation of VMs. In Section IV, we extend this approach to networks of VMs. We discuss various aspects and applications of our work in Section V, including distributed code coverage analysis and continuous build systems. We have applied our approach to several real-world scenarios; we quantify this experience in Section VI.

II. BACKGROUND: NIX AND NIXOS

In our approach, virtual machines are specified and built using the purely functional package manager Nix, and the VM instances that we build from the specifications are instances of NixOS, a Linux distribution based on Nix. In this section we give a brief overview of Nix and NixOS.

A. Nix

For the purposes of this paper, Nix [2] (http://nixos.org/) can be seen as a purely functional "Make". That is, like Make [3] and many other build tools, it performs build actions on the basis of a declarative specification of a graph of actions and their dependencies, but unlike Make, the specification is given in a lazy, purely functional language – the Nix expression language. This allows much more powerful abstractions to be expressed. Moreover, Nix stores the results of build actions such that they cannot interfere with each other, e.g. the results of multiple invocations of a function do not overwrite each other. It stores the output of a build step, or derivation, under a unique path such as

   /nix/store/q325djkc1ivlfyzan22197dc62gbq04z-firefox-3.5

where q325djkc1ivl... is a cryptographic hash of the inputs of the derivation, such as sources, compilers, libraries and build scripts.

The fundamental operation in the Nix expression language is the built-in function derivation, which takes as argument a set of name/value pairs, or attributes:

   derivation {
     name = "foo";
     builder = "${bash}/bin/sh";
     args = [ "-c" "echo Hello $who > $out" ];
     who = "world";
   }

A derivation describes the invocation of a command (usually a shell script) that must produce output under a path in the Nix store. The derivation is built by executing a program, whose path and command-line arguments are specified in the attributes builder and args, respectively. The other attributes are passed to the builder as environment variables. Attribute values can be (lists of) strings or other derivations. The latter denote the dependencies of the current derivation. When building a derivation, its dependencies are built first. The path of each dependency's output in the Nix store is placed in the corresponding environment variable.

Strings can also contain references to other derivations, enclosed in ${...}. These are replaced by the derivation's output path in the Nix store. For instance, if we evaluate the foo derivation above, first the derivation denoted by the variable bash (not shown here) is built, resulting in a store path like /nix/store/49ndfiqrlc9b...-bash-4.0-p17. Then foo is built, with the environment variable who set to world. Nix passes the intended location of the output in the Nix store, computed by hashing the input attributes, through the environment variable out. Thus, the derivation above will write the string Hello world to a path such as /nix/store/6dsdb0j20n3b...-foo. A derivation can build anything, as long as it is pure, i.e. depends only on its explicitly defined inputs, and produces output under the path denoted by the environment variable out.
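To make this dependency mechanism concrete, the following is a minimal two-derivation sketch (an illustration of ours, not a figure from the paper). It assumes that a Nix Packages checkout is reachable through the <nixpkgs> search path, and it spells out the system attribute that the derivation primitive requires but that the example above elides:

   # deps.nix -- one derivation depending on another
   with import <nixpkgs> {};   # brings bash and coreutils into scope

   let
     greeting = derivation {
       name = "greeting";
       system = builtins.currentSystem;
       builder = "${bash}/bin/sh";
       args = [ "-c" "echo Hello > $out" ];
     };
   in derivation {
     name = "shout";
     system = builtins.currentSystem;
     builder = "${bash}/bin/sh";
     # ${greeting} is replaced by greeting's output path in the Nix
     # store, so Nix builds greeting before this derivation.
     args = [ "-c" "${coreutils}/bin/tr a-z A-Z < ${greeting} > $out" ];
   }

Running nix-build deps.nix first builds greeting and then shout, whose output contains HELLO.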
Nix is primarily intended as a deployment tool – a package manager. Thus derivations are typically large steps that build entire packages. Figure 1 shows an example of a Nix expression to build the Apache web server. The language construct rec { ... } defines a set of variable bindings that can refer to each other, e.g. httpd refers to apr (the derivation that builds the Apache runtime package). The derivation httpd shows the use of function abstractions to capture common build patterns: it calls the function stdenv.mkDerivation, which performs a build of a standard Unix-style package (namely, unpack the source, run an Autoconf configure script, run make to build, and finally make install to install the package under $out).

   rec {
     httpd = stdenv.mkDerivation {
       name = "apache-httpd-2.2.13";
       src = fetchurl {
         url = http://.../httpd-2.2.13.tar.bz2;
         md5 = "8d8d904e7342125825ec70f03c5745ef";
       };
       buildInputs = [ perl apr aprutil pcre openssl ];
       configureFlags = "--enable-mods-shared=all ...";
     };

     apr = stdenv.mkDerivation {
       name = "apr-1.3.8"; ...
     };

     stdenv.mkDerivation = args: derivation {
       builder = ... ''
         PATH=${gcc}/bin:${coreutils}/bin:...
         tar xf ${args.src}
         ./configure --prefix=$out \
           ${args.configureFlags}
         make
         make install
       ''; ...
     };
     ...
   }

   Figure 1. pkgs.nix: Nix expression to build Apache

Functions are defined using the syntax arg: body. Functions can also pattern-match on attribute sets: a function {arg1, ..., argn}: body must be called with an attribute set containing the named attributes. Ellipses can be used in the argument list to denote that additional attributes are to be ignored. We can build Apache from the command line as follows:

   $ nix-build pkgs.nix -A httpd

This builds the attribute httpd from the file pkgs.nix in Figure 1, after building its dependencies such as perl, apr and gcc. The result of building Apache in the Nix store is shown in Figure 2.

   /nix/store
     snws5xld6iyx...-apache-httpd-2.2.13
       bin
         httpd
         apachectl
     rl384gzsay47...-apr-1.3.8
       lib
         libapr-1.so.0.3.8
     nqapqr5cyk4k...-glibc-2.9
       lib
         ld-linux.so.2
         libc.so.6
     ...

   Figure 2. Result of building Apache in the Nix store

There is a large distribution of Nix expressions, the Nix Packages collection, that contains almost 2500 packages and supports a variety of operating systems.
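For illustration, packaging a typical Autoconf-style program in this style needs little more than a name and a source. The following is a sketch of ours, not from the paper; the URL is illustrative and the hash is a placeholder, which nix-build will reject while printing the expected value:

   with import <nixpkgs> {};

   stdenv.mkDerivation {
     name = "hello-2.10";
     src = fetchurl {
       url = "mirror://gnu/hello/hello-2.10.tar.gz";
       # placeholder; replace with the hash that Nix reports
       sha256 = "0000000000000000000000000000000000000000000000000000";
     };
     # the default phases described above (unpack, configure, make,
     # make install) suffice, so nothing else needs to be specified
   }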
B. NixOS

Nix has been used to build a Linux distribution, NixOS [1]. NixOS uses Nix to build the entire system from a specification in the Nix expression language – not just software packages. All static parts of the system – packages, the kernel, boot scripts, scripts to manage system services, configuration data, and so on – are built by Nix derivations. In fact, there is a single top-level derivation that, when built, causes all static parts of the system to be built as dependencies. The practical advantages of a purely functional approach to system configuration management are that upgrading the system is safe (since the old configuration in the Nix store is not overwritten) and reliable (since due to purity it does not rely on the previous state of the system), that we can always roll back to previous configurations, and that we can deterministically rebuild a configuration.

NixOS has a declarative configuration model. The Nix expressions that constitute NixOS are organised into modules that together build the system. NixOS currently consists of around 125 modules, each implementing some part of the system (e.g. building the boot scripts, the Apache configuration, or the X11 GUI environment). In addition, the end-user configuration of a NixOS machine is also specified as a module. Figure 3 shows an example of a NixOS module specifying the high-level configuration of a system. It states that the system should run Apache to serve files in the directory /www-root, have a graphical user interface running the KDE desktop environment, and provide the Firefox web browser to users.

   { config, pkgs, ... }:

   { services.httpd.enable = true;
     services.httpd.documentRoot = "/www-root";
     services.xserver.enable = true;
     services.desktopManager.kde4.enable = true;
     environment.systemPackages = [ pkgs.firefox ];
   }

   Figure 3. NixOS configuration module

Other modules compute values that depend on this configuration. For instance, the values of the attributes services.httpd.enable and services.httpd.documentRoot are used by another module – the Apache web server module – to determine whether to generate a script to start and manage Apache, as well as the contents of its configuration file httpd.conf. The basic structure of a NixOS module is:

   { config, pkgs, ... }:

   { ... configuration values ... }

That is, a module is a function that accepts at least two arguments: config, which contains the full system configuration, and pkgs, which contains the Nix Packages collection for convenience. For instance, the value pkgs.httpd is the derivation that builds the Apache web server. The system configuration config is computed by calling every NixOS module and merging the attribute sets of configuration values returned by each. The result of the merge is passed back as the config function argument to each module. (This is possible because the Nix expression language is lazy.) Thus each module contributes values to the set config and can use values defined by other modules.
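To illustrate the merging mechanism, the following sketch (an example of ours, not from the paper) shows a module that reads a value contributed by another module – the services.httpd.enable flag set in Figure 3 – and contributes a value of its own; the package lists contributed by all modules are concatenated during the merge:

   { config, pkgs, ... }:

   {
     # If some module enabled Apache, also install the httpd package
     # (providing tools such as apachectl) in the system environment.
     environment.systemPackages =
       if config.services.httpd.enable then [ pkgs.httpd ] else [];
   }

Laziness is essential here: the module inspects config even though its own return value is part of that very configuration.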
Most configuration values are system options relevant to end users, but others are "computed" values that are derived from other configuration values. For instance, the value of the attribute system.build.kernel is a derivation that builds the Linux kernel. The entire system is built by the attribute system.build.toplevel, whose value is a derivation that has all other parts of the system as dependencies. Thus, the following command builds the entire operating system, including all packages, scripts, configuration files and system services:

   $ nix-build /etc/nixos/nixos \
       -A config.system.build.toplevel

The important property here is that NixOS provides us with a way to deterministically and automatically build a complete operating system environment, with all its dependencies, from a declarative specification. As we shall see in the next section, we can define other "top-level" derivations that instantiate virtual machines from a configuration, and extend the single-machine specifications such as the one in Figure 3 to networks of machines.

III. SINGLE-MACHINE TESTS

NixOS system configurations allow developers to concisely specify the environment necessary for an integration or system test and to instantiate virtual machines from these specifications, even if such tests require special privileges or running system services. After all, inside a virtual machine, we can perform any actions that would be dangerous or not permitted on the host machine. We first address single-machine tests; in the next section, we extend this to networks of machines.

A. Specifying and running tests

Figure 4 shows an implementation of the OpenSSH regression test described in Section I. It consists of two parts: a declarative specification of the machine in which the test is to be performed (the attribute machine), and an imperative test script (testScript).

   let
     openssh = stdenv.mkDerivation { ... };
   in makeTest {

     machine = { config, pkgs, ... }: {
       users.extraUsers = [
         { name = "sshd"; home = "/var/empty"; }
         { name = "bob"; home = "/home/bob"; }
       ];
     };

     testScript = ''
       $machine->succeed(
         "${openssh}/bin/ssh-keygen " .
           "-f /etc/ssh/ssh_host_dsa_key",
         "${openssh}/sbin/sshd -f /dev/null",
         "mkdir -m 700 /root/.ssh /home/bob/.ssh",
         "${openssh}/bin/ssh-keygen " .
           "-f /root/.ssh/id_dsa",
         "cp /root/.ssh/id_dsa.pub " .
           "/home/bob/.ssh/authorized_keys");
       $machine->waitForOpenPort(22);
       $machine->succeed("${openssh}/bin/ssh " .
         "bob\@localhost 'echo \$USER'") eq "bob\n"
         or die;
     '';
   }

   Figure 4. openssh.nix: Specification of an OpenSSH regression test

The machine specification is very simple: all that we need for the test beyond a basic NixOS machine is the existence of two user accounts (sshd for the SSH daemon's privilege separation feature, and bob as a test account for logging in). The test script is a Perl script running on the host that performs operations in the virtual machine (the guest) using a number of primitives. For instance, succeed executes shell commands in the guest and aborts the test if they fail, while waitForOpenPort waits until the guest is listening on the specified TCP port. The OpenSSH test script creates an SSH host key (required by the daemon to allow clients to verify that they are connecting to the right machine), starts the daemon, creates a public/private key pair, adds the public key to Bob's list of allowed keys, waits until the daemon is ready to accept connections, and logs in as Bob using the private key. Finally, it verifies that the SSH session did indeed log in as Bob and signals failure otherwise.

The function makeTest applied to machine and testScript evaluates to two attributes: vm, a derivation that builds a script to start a NixOS VM matching the specification in machine; and test, a derivation that depends on vm, runs its script to start the VM, and then executes testScript. Thus, the following command performs the OpenSSH test:

   $ nix-build openssh.nix -A test

That is, it builds the OpenSSH package as well as a complete NixOS instance with its hundreds of dependencies. For interactive testing, a developer can also do:

   $ nix-build openssh.nix -A vm
   $ ./result/bin/run-vm

(The call to nix-build leaves a symbolic link result to the output of the vm derivation in the Nix store.) This starts the virtual machine on the user's desktop, booting a NixOS instance with the specified functionality.

B. Implementation

Two important practical advantages of our approach are that the implementation of the VM requires no root privileges and is self-contained (openssh.nix and the expressions that build NixOS completely describe how to build a VM automatically). These properties allow such tests to be included in an automated regression test suite. Virtual machines are built by the NixOS module qemu-vm.nix, which defines a configuration value system.build.vm: a derivation that builds a shell script that starts the NixOS system built by system.build.toplevel in a virtual machine. We use QEMU/KVM (http://www.linux-kvm.org/), a modified version of the open source QEMU processor emulator that uses the hardware virtualisation features of modern CPUs to run VMs at near-native speed.
An important feature of QEMU over most other VM implementations is that it allows VM instances to be easily started and controlled from the command line. This includes the fully automated starting, running and stopping of a VM in a derivation. Furthermore, QEMU provides special support for booting Linux-based operating systems: it can directly boot from a kernel and initial ramdisk image on the host filesystem, rather than requiring a full hard disk image with that kernel installed. (The initial ramdisk in Linux is a small filesystem image responsible for mounting the real root filesystem.) For instance, the system.build.vm derivation generates essentially this script:

   ${pkgs.qemu_kvm}/bin/qemu-system-x86_64 -smb / \
     -kernel ${config.boot.kernelPackages.kernel} \
     -initrd ${config.system.build.initialRamdisk} \
     -append "init=${config.system.build.bootStage2} \
       systemConfig=${config.system.build.toplevel}"

The system.build.vm derivation does not build a virtual hard disk image for the VM, as is usual. Rather, the initial ramdisk of the VM mounts the Nix store of the host through the network filesystem CIFS (the -smb / option above); QEMU automatically starts a CIFS server on the host to service requests from the guest. This is a crucial feature: the set of dependencies of a system is hundreds of megabytes in size at the least, so building such an image every time we reconfigure the VMs would be very wasteful in time and space. Thus, rebuilding a VM after a configuration change usually takes only a few seconds.

The VM start script does create an empty ext3 root filesystem for the guest at startup, to hold mutable state such as the contents of /var or the system account file /etc/passwd. Thanks to sparse allocation of blocks in the virtual disk image, image creation takes a fraction of a second. NixOS's boot process is self-initialising: it initialises all state needed to run the system. For interactive use, the filesystem is preserved across restarts of the VM, saved in the image file ./hostname.qcow2.

QEMU provides virtualised network connectivity to VMs. The VM has a network interface, eth0, that allows it to talk to the host. The guest has IP address 10.0.2.15, QEMU's virtual gateway to the host is 10.0.2.2, and the CIFS server is 10.0.2.4. This feature is implemented entirely in user space: it requires no root privileges.

The test script executes commands on the VM using a root shell running on the VM that receives commands from the host through a TCP connection between the VM and the host. (The remotely accessible root shell is provided by a NixOS module added to the machine configuration by makeTest and does not exist in normal use.) QEMU allows TCP ports in the guest network to be forwarded to Unix domain sockets [4] on the host. This is important for security: we do not want anybody other than the test script connecting to the port. Also, in continuous build environments any number of builds may execute concurrently; the use of fixed TCP ports on the host would preclude this.
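The qemu-vm.nix module also exposes virtualisation options that a machine specification can set like any other configuration value. For instance, a test that runs memory-hungry services can enlarge the guest's RAM; in the following sketch, the option name virtualisation.memorySize exists in NixOS, but the value shown is illustrative:

   machine = { config, pkgs, ... }: {
     # give the guest more RAM than the default (value in MiB)
     virtualisation.memorySize = 768;
   };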
IV. DISTRIBUTED TESTS

Many typical system tests are distributed: they require multiple machines to execute. Therefore a natural extension of the declarative machine specifications in the previous section is to specify entire networks of machines, including their topologies.

A. Specifying networks

Figure 5 shows a small automated test for Quake 3 Arena. As described in Section I, it verifies that clients can successfully join a game running on a non-graphical server. Here makeTest is called with a nodes argument instead of a single machine. This is a set specifying all the machines in the network; each attribute is a NixOS configuration module. In this case, it specifies a network of two machines: server, which automatically starts a Quake server daemon, and client, which runs an X11 graphical user interface and has the Quake client installed, but otherwise does nothing.

   makeTest {

     nodes = {

       server = { config, pkgs, ... }: {
         jobs.quake3Server = {
           startOn = "startup";
           exec = "${pkgs.quake3demo}/bin/quake3"
             + " +set dedicated 1"
             + " +set g_gametype 0"
             + " +map q3dm7 +addbot grunt"
             + " 2> /tmp/log";
         };
       };

       client = { config, pkgs, ... }: {
         services.xserver.enable = true;
         environment.systemPackages = [ pkgs.quake3demo ];
       };

     };

     testScript = ''
       startAll;
       $server->waitForJob("quake3-server");
       $client->waitForX;
       $client->succeed(
         "quake3 +set name Foo +connect server &");
       sleep 40;
       $server->succeed(
         "grep 'Foo.*entered the game' /tmp/log");
       $client->screenshot("screen.png");
     '';
   }

   Figure 5. Specification of a Quake client/server regression test

The attribute test returned by makeTest evaluates to a derivation that executes the VMs in a virtual network and runs the given test suite. The test script uses additional test primitives, such as waitForJob (which waits until the given system service has started successfully) and screenshot (which takes a screenshot of the virtual display of the VM). The test script in the example first starts all machines and waits until they are ready. This speeds up the test, as it boots the machines in parallel; otherwise they are only booted on demand, i.e., when the test script performs an action on the machine. It then executes a command on the client to start a graphical Quake client and connect to the server. After a while, we verify on the server that the client did indeed connect. The derivation will fail to build if this is not the case. Finally, we make a screenshot of the client to allow visual inspection of the end state, if desired.

GUI testing is a notoriously difficult subject [5]. The point here is not to make a contribution to GUI testing techniques per se, but to show that we can easily set up the infrastructure needed for such tests. In the test script, we can run any desired automated GUI testing tool.

The virtual machines can talk to each other because they are connected together into a virtual network. Each VM has a network interface eth1 with an IP address in the private range 192.168.1.n, assigned in sequential order by makeTest. (Recall that each VM also has a network interface eth0 to communicate with the host.) QEMU propagates any packet sent on this interface to all other VMs in the same virtual network. The machines are assigned hostnames equal to the corresponding attribute name in the model, so the machine built from the server attribute has hostname server.
B. Complex topologies

The Quake test has a trivial network topology: all machines are on the same virtual network. The test of the port forwarding feature in the Transmission Bittorrent client (described in Section I) requires a more complicated topology: an "inside" network, representing a typical home network behind a router, and an "outside" network, representing the Internet. The router should be connected to both networks and provide Network Address Translation (NAT) from the inside network to the outside. Machines on the inside should not be directly reachable from the outside. Thus, we cannot do this with a single virtual network.

To support such scenarios, makeTest can create an arbitrary number of virtual networks, and allows each machine specification to declare in the option virtualisation.vlans to what networks they should be connected. Figure 6 shows the specification of the machines for the Transmission test. It has two virtual networks, identified as 1 (the "outside" network) and 2 (the "inside"). There are four machines: router is connected to both, client1 is connected to 2, while tracker and client2 are connected to 1. (If virtualisation.vlans is omitted, it defaults to 1.) The tracker runs the Apache web server to make torrent files available to the clients. The configuration further specifies what packages should be installed on what machines, e.g., the router needs the iptables and miniupnpd packages for its NAT and UPnP-IGD functionality.

   nodes = {

     tracker = { config, pkgs, ... }: {
       environment.systemPackages =
         [ pkgs.transmission pkgs.bittorrent ];
       services.httpd.enable = true;
       services.httpd.documentRoot = "/tmp";
     };

     router = { config, pkgs, ... }: {
       environment.systemPackages =
         [ iptables miniupnpd ];
       virtualisation.vlans = [ 1 2 ];
     };

     client1 = { config, pkgs, nodes, ... }: {
       environment.systemPackages = [ transmission ];
       virtualisation.vlans = [ 2 ];
       networking.defaultGateway =
         nodes.router.config.networking.ifaces.eth2.ipAddress;
     };

     client2 = { config, pkgs, ... }: {
       environment.systemPackages = [ transmission ];
     };

   };

   Figure 6. Network specification for the Transmission regression test

The test, shown in Figure 7, proceeds as follows. We first initialise NAT on the router. We then create a torrent file on the tracker and start the tracker program, a central Bittorrent component that keeps track of the clients that are sharing a given file, on port 6969. Also on the tracker we start the initial seeder, a client that provides the initial copy of the file so that other clients can obtain it. We then start a download on the client behind the router and wait until it finishes.

   testScript = ''
     # Enable NAT on the router and start miniupnpd.
     $router->succeed(
       "iptables -t nat -F", ...
       "miniupnpd -f ${miniupnpdConf}");

     # Create the torrent and start the tracker.
     $tracker->succeed(
       "cp ${file} /tmp/test",
       "transmissioncli -n /tmp/test /tmp/test.torrent",
       "bittorrent-tracker --port 6969 &");
     $tracker->waitForOpenPort(6969);

     # Start the initial seeder.
     my $pid = $tracker->background(
       "transmissioncli /tmp/test.torrent -w /tmp");

     # Download from the first (NATted) client.
     $client1->succeed("transmissioncli " .
       "http://tracker/test.torrent -w /tmp &");
     $client1->waitForFile("/tmp/test");

     # Bring down the initial seeder.
     $tracker->succeed("kill -9 $pid");

     # Now download from the second client.
     $client2->succeed("transmissioncli " .
       "http://tracker/test.torrent -w /tmp &");
     $client2->waitForFile("/tmp/test");
   '';

   Figure 7. Test script for the Transmission regression test

If Transmission and miniupnpd work correctly in concert, the router should now have opened a port forwarding that allows the second client to connect to the first client. To verify that this is the case, we shut down the initial seeder and start a download on the second client. This download can only succeed if the first client is reachable through the NAT router.

Each virtual network is implemented as a separate QEMU virtual network; thus a VM cannot send packets to a network to which it is not connected. Machines are assigned IP addresses 192.168.n.m, where n is the number of the network and m is the number of the machine, and have Ethernet interfaces connected to the requested networks. For example, the router will have interfaces eth1 with IP address 192.168.1.3 and eth2 with address 192.168.2.3, while the first client will only have an interface eth1 with IP address 192.168.2.1. The test infrastructure provides operations to simulate events such as network outages or machine crashes.
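For example, a test script can temporarily detach a machine from its virtual networks and verify that the rest of the system copes. The fragment below is a sketch of ours; the primitive names block, unblock and fail are assumptions about the test driver, not taken from the figures above:

   testScript = ''
     startAll;
     $client2->block;      # simulate a network outage at client2
     $tracker->fail("ping -c 1 -w 2 client2");  # now unreachable
     $client2->unblock;    # restore connectivity
   '';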
V. DISCUSSION

Declarative model: To what extent do we need the properties of Nix and NixOS, in particular the fact that an entire operating system environment is built from source from a specification in a single formalism, and the purely functional nature of the Nix store? There are many tools to automate deployment of machines. For instance, Red Hat's Kickstart tool installs RPM-based Linux systems from a textual specification and can be used to create virtual machines automatically, with a single command-line invocation [6]. However, there are many limitations to such tools:

• Having a single formalism that describes the construction of an entire network from source makes hard things easy, such as building part of the system with coverage analysis. In a tool such as Kickstart, the binary software packages are a given; we cannot easily modify the build processes of those packages.

• For testing in virtual machines, it is important that VMs can be built efficiently. With Nix, this is the case because the VM can use the host's Nix store. With other package managers, that is not an option because the host filesystem may not contain the (versions of) packages that a VM needs. One would also need to be root to install packages on the host, making any such approach undesirable for automated test suites.

• For automatic testing, one needs a formalism to describe the desired configurations. In NixOS this is already given: it is what users use to describe regular system configurations. In conventional Unix systems, the configuration is the result of many "unmanaged" modifications to system configuration files (e.g. in /etc). Thus, given an existing Unix system, it is hard to distill the "logical" configuration of the system (i.e., a specification in terms of high-level requirements) from the multitude of configuration files.

Operating system generality: The network specifications described in this paper build upon NixOS: they build NixOS operating system instances. This obviously limits the generality of our current implementation: a test that must run on a Windows machine cannot be accommodated. In this sense, it shows an "ideal" situation, in which entire networks of machines can be built from a purely functional specification. Nixpkgs does contain functions to build virtual machines for Linux distributions based on the RPM or Apt package managers, such as Fedora and Ubuntu. These can be supported in network specifications, though they would be harder to configure since they are not declarative. On the other hand, platforms such as Windows that lack fine-grained package management mechanisms are difficult to support in a useful manner. Such platforms would require pre-configured virtual images, which are inflexible.

The Nix package manager itself is portable across a variety of operating systems, and the generation of virtual machines works on any Linux host machine (and probably other operating systems supported by QEMU). The fact that the guest OS is NixOS is usually fine for automated regression test suites, since many test cases do not care about the specific type of guest Linux distribution.
Test tool generality: Our approach can support any fully non-interactive test tool that can build and run on Linux. Since Nix derivations run non-interactively, tools that require user intervention (e.g., interactive GUI testing tools) are not supported. Likewise, only systems with a fully automated build process are supported.

Distributed coverage analysis: Declarative specifications of networks and associated test suites make it easy to perform distributed code coverage analysis. Again, we make no contributions to the technique of coverage analysis itself; we improve its deployability. First, the abstraction facilities of the Nix expression language make it easy to specify that parts of the dependency graph of a large system are to be compiled with coverage instrumentation (or any other form of build-time instrumentation one might want to apply). Second, by collecting coverage data from every machine in a test run of a virtual network, we get more complete coverage information.

Consider, for instance, a typical configuration of the Subversion revision control system: clients run the Subversion client software, while a server runs a Subversion module plugged into the Apache web server to provide remote access to repositories through the WebDAV protocol. These are both built from the Subversion code base. If a client performs a checkout from a server, different paths in the Subversion code will be exercised on the client than on the server. The coverage data on both machines should be combined to get a full picture.

We can add coverage instrumentation to a package using the configuration value nixpkgs.config.packageOverrides. This is a function that takes the original contents of the Nix Packages collection as an argument, and returns a set of replacement packages:

   nixpkgs.config.packageOverrides = pkgs: {
     subversion = pkgs.subversion.override {
       stdenv = pkgs.addCoverageInstrumentation
         pkgs.stdenv;
     };
   };

The original Subversion package, pkgs.subversion, contains a function, override, that allows the original dependencies of the package to be overridden. In this case, we pass a modified version of the standard build environment (stdenv) that automatically adds the flag --coverage to every invocation of the GNU C Compiler. This causes GCC to instrument object code to collect coverage data and write it to disk. Most C or C++-based packages can be instrumented in this way, including the Linux kernel.

The test script automatically collects the coverage data from each machine in the virtual network at the conclusion of the test, and writes it to $out. The function makeReport then combines the coverage data from each virtual machine and uses the lcov tool [7] to make a set of HTML pages showing a coverage report and each source file decorated with its line coverage. For example, we have built a regression test for the Subversion example with coverage instrumentation on Apache, Subversion, Apr, Apr-util and the Linux kernel. Figure 8 shows a small part of the distributed coverage analysis report resulting from the test suite run. The line and function coverage statistics combine the coverage from each of the four machines in the network.

   Figure 8. Part of the distributed code coverage analysis report for the Subversion web service
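Instrumenting several packages at once follows the same pattern. In the sketch below (ours, not from the paper), the attribute names apacheHttpd and apr, and the availability of override on each package, are assumptions:

   nixpkgs.config.packageOverrides = pkgs:
     let covStdenv = pkgs.addCoverageInstrumentation pkgs.stdenv;
     in {
       subversion  = pkgs.subversion.override  { stdenv = covStdenv; };
       apacheHttpd = pkgs.apacheHttpd.override { stdenv = covStdenv; };
       apr         = pkgs.apr.override         { stdenv = covStdenv; };
     };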
One application of distributed coverage analysis is to determine the code coverage of large systems, such as entire Linux distributions, under system-level tests (rather than unit tests at the level of individual packages). This is useful for system integrators, such as Linux distributors, as it reveals the extent to which test suites exercise system features. For instance, the full version of the coverage report in Figure 8 readily shows which kernel and Apache modules are executed by the tests, often at a very specific level: e.g., the ext2 filesystem does not get exercised at all, while ext3 is used, except for its extended attributes feature.

Continuous builds: The ability to build and execute a test with complex dependencies is very valuable for continuous integration. A continuous integration tool (e.g. CruiseControl) continuously checks out the latest source code of a project, builds it, runs tests, and produces a report [8]. A problem with the management of such tools is to ensure that all the dependencies of the build and the test are available on the continuous build system (e.g., a database server to test a web application). In the worst case, the administrator of the continuous build machines must install such dependencies manually. By contrast, the single command

   $ nix-build subversion.nix -A report

causes Nix to build or download everything needed to produce the coverage report for the Subversion web service test: the Linux kernel, QEMU, the C compiler, the C library, Apache, the coverage analysis tools, and so on. This automation makes it easy to include such tests in a continuous build system. In fact, there is a Nix-based continuous build system, Hydra (http://hydra.nixos.org), that continuously checks out Nix expressions describing build tasks from a revision control system, builds them, and makes the output available through a web interface.

VI. EVALUATION

We have created a number of tests using the virtual machine-based testing techniques described in this paper. (The outputs of these tests can be found at http://hydra.nixos.org/jobset/nixos/trunk/jobstatus; the Nix expressions are at https://svn.nixos.org/repos/nix/nixos/trunk/tests.) These are primarily used as regression tests for NixOS: every time a NixOS developer commits a change to NixOS or Nixpkgs, our continuous integration system rebuilds the tests, if necessary. The tests are the following:

• Several single-machine tests, e.g. the OpenSSH test, and a test for the KDE desktop environment that builds a NixOS machine and verifies that a user can successfully log into KDE and start several applications.

• A two-machine test of an Apache-based Subversion service, which performs HTTP requests from a client machine to create repositories and user accounts on the server through the web interface, and executes Subversion commands to check out from and commit to repositories. It is built with coverage instrumentation to perform a distributed coverage analysis.

• A four-machine test of Trac, a software project management service [9] involving a PostgreSQL database, an NFS file server, a web server and a client.

• A four-machine test of a load-balancing front-end (reverse proxy) Apache server that sits in front of two back-end Apache servers, along with a client machine. It uses test primitives that simulate network outages to verify that the proxy continues to work correctly if one of the back-ends stops responding.

• A three-machine variant of the Quake 3 test in Figure 5.

• The four-machine Transmission test in Figure 6.

• Several tests of the NixOS installation CD. An ISO 9660 image of the installation CD is generated and used to automatically install NixOS on an empty virtual hard disk.
The function that performs this test is parametrised with test script fragments that partition and format the hard disk. This allows many different installation scenarios (e.g., "XFS on top of LVM2 on top of RAID 5 with a separate /boot partition") to be expressed concisely. The installation test is a distributed test, because the NixOS installation CD is not self-contained: during installation, it downloads sources and binaries for packages selected by the user from the Internet, mostly from the NixOS distribution server at http://nixos.org/. Thus, the test configuration contains a web server that simulates nixos.org by serving the required files.

• A three-machine test of NFS file locking semantics in the Linux kernel, e.g., whether NFS locks are properly maintained across server crashes. (This test is slow because the NFS protocol requires a 90-second grace period after a server restart.)

For a continuous test to be effective, it must be timely: the interval between the commit and the completion of the test must be reasonably short. Table I shows the execution time and memory consumption for the tests listed above, averaged over five runs.

   Table I
   TEST RESOURCE CONSUMPTION

   Test           # VMs   Duration (s)   Memory (MiB)
   empty            1         45.9           166
   openssh          1         53.7           267
   kde4             1        140.4           433
   subversion       2        104.8           329
   trac             4        159.4           756
   proxy            4         65.4           477
   quake3           3         80.6           528
   transmission     4         89.5           457
   installation     2        302.7           751
   nfs              3        259.7           358

As a baseline, the test empty starts a single machine and shuts down immediately. The execution time is the elapsed wall time on an idle 4-core Intel Core i5 750 host system with 6 GiB of RAM running 64-bit NixOS. The memory consumption is the peak additional memory use compared to the idle system. (The host kernel caches were cleared before each test run by executing echo 3 > /proc/sys/vm/drop_caches.) All VMs were configured with 384 MiB of RAM, though due to KVM's para-virtualised "balloon" driver the VMs typically use less host memory than that. The host kernel was configured to use KVM's same-page merging feature, which lets it replace identical copies of memory pages with a single copy, significantly reducing host memory usage. (See e.g. [10] for a description of this approach.)

Table I shows that the tests are fast enough to execute from a continuous build system. Many optimisations are possible, however: for instance, VMs with identical kernels and initial ramdisks could be started from a shared, precomputed snapshot.

VII. RELATED WORK

Most work on deployment of distributed systems takes place in the context of system administration research. Cfengine [11] maintains systems on the basis of declarative specifications of actions to be performed on each (class of) machine. Stork [12] is a package management system used to deploy virtual machines in the PlanetLab testbed. These and most other deployment tools have convergent models [13], meaning that due to statefulness, the actual configuration of a system after an upgrade may not match the intended configuration. By contrast, NixOS's purely functional model ensures congruent behaviour: apart from mutable state, the system configuration always matches the specification.
Virtualisation does not necessarily make deployment easier; apart from simplifying hardware management, it may make it harder, since without proper deployment tools, it simply leads to more machines to be managed [14]. MLN [15], a tool for managing large networks of VMs, has a declarative language to specify arbitrary network topologies. It does not manage the contents of VMs beyond a templating mechanism.

Our VM testing approach is currently only appropriate for relatively small virtual networks. This is usually sufficient for regression testing of typical bugs, since they can generally be reproduced in a small configuration. It is not appropriate for scalability testing or network experiments involving thousands of nodes, since all VMs are executed in the same derivation and therefore on the same host. However, depending on the level of virtualisation required for a test, it is possible to use virtualisation techniques that scale to hundreds of nodes on a single machine [16].

There is a growing body of research on testing of distributed systems; see [17, Section 5.4] for an overview. However, deployment and management of test environments appear to be a somewhat neglected issue. An exception is Weevil [18], a tool for the deployment and execution of experiments in testbeds such as PlanetLab. We are not aware of tools to support the synthesis of VMs in automatic regression tests as part of the build processes of software packages.

During unit testing, environmental dependencies such as databases are often simulated using test stubs or mock objects [19]. These are sometimes used due to the difficulty of having a "real" implementation of the simulated functionality. Generally, however, stubs and mocks allow more fine-grained control over interactions than would be feasible with real implementations, e.g., when testing against I/O errors that are hard to trigger under real conditions.

VIII. CONCLUSION

In this paper, we have shown a method for synthesising virtual machines from declarative specifications to perform integration or system tests. This allows such tests to be easily automated, an essential property for regression testing. It enables developers to write integration tests for their software that would otherwise require a great deal of manual configuration, and would likely not be done at all.

Acknowledgments: This research is supported by NWO-JACQUARD project 638.001.208, PDS: Pull Deployment of Services. We wish to thank the contributors to Nixpkgs and NixOS, in particular Nicolas Pierron, who implemented NixOS's module system. We thank Armijn Hemel for his input on the Transmission example.

REFERENCES

[1] E. Dolstra and A. Löh, "NixOS: A purely functional Linux distribution," in ICFP 2008: 13th ACM SIGPLAN Intl. Conf. on Functional Programming. ACM, Sep. 2008.

[2] E. Dolstra, E. Visser, and M. de Jonge, "Imposing a memory management discipline on software deployment," in Proc. 26th Intl. Conf. on Software Engineering (ICSE 2004). IEEE Computer Society, May 2004, pp. 583–592.

[3] S. I. Feldman, "Make – a program for maintaining computer programs," Software – Practice and Experience, vol. 9, no. 4, pp. 255–265, 1979.

[4] W. R. Stevens and S. A. Rago, Advanced Programming in the UNIX Environment, 2nd ed. Addison-Wesley, Jun. 2005.

[5] M. Grechanik, Q. Xie, and C. Fu, "Maintaining and evolving GUI-directed test scripts," in ICSE '09: 31st Intl. Conf. on Software Engineering. Los Alamitos, CA, USA: IEEE Computer Society, 2009, pp. 408–418.
[6] Red Hat, Inc., Red Hat Enterprise Linux 5 Virtualization Guide, 4th ed. Red Hat, Inc., 2009.

[7] P. Larson, N. Hinds, R. Ravindran, and H. Franke, "Improving the Linux Test Project with kernel code coverage analysis," in Proceedings of the 2003 Ottawa Linux Symposium, Jul. 2003.

[8] M. Fowler and M. Foemmel, "Continuous integration," http://www.martinfowler.com/articles/continuousIntegration.html, accessed 11 August 2005.

[9] Edgewall Software, "Trac – integrated SCM & project management," http://trac.edgewall.org/, 2009.

[10] G. Miłoś, D. G. Murray, S. Hand, and M. A. Fetterman, "Satori: Enlightened page sharing," in 2009 USENIX Annual Technical Conference. Berkeley, CA, USA: USENIX, 2009, pp. 1–15.

[11] M. Burgess, "Cfengine: a site configuration engine," Computing Systems, vol. 8, no. 3, 1995.

[12] J. Cappos, S. Baker, J. Plichta, D. Nyugen, J. Hardies, M. Borgard, J. Johnston, and J. H. Hartman, "Stork: package management for distributed VM environments," in LISA '07: Proceedings of the 21st Large Installation System Administration Conference. Berkeley, CA, USA: USENIX, 2007, pp. 1–16.

[13] S. Traugott and L. Brown, "Why order matters: Turing equivalence in automated systems administration," in Proceedings of the 16th Systems Administration Conference (LISA '02). USENIX, Nov. 2002, pp. 99–120.

[14] D. Reimer, A. Thomas, G. Ammons, T. Mummert, B. Alpern, and V. Bala, "Opening black boxes: Using semantic information to combat virtual machine image sprawl," in Proceedings of the Fourth ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments. ACM, 2008, pp. 111–120.

[15] K. Begnum, "Managing large networks of virtual machines," in LISA '06: Proceedings of the 20th Large Installation System Administration Conference. Berkeley, CA, USA: USENIX, 2006, pp. 205–214.

[16] M. Hibler, R. Ricci, L. Stoller, J. Duerig, S. Guruprasad, T. Stack, K. Webb, and J. Lepreau, "Large-scale virtualization in the Emulab network testbed," in 2008 USENIX Annual Technical Conference. Berkeley, CA, USA: USENIX, 2008, pp. 113–128.

[17] M. J. Rutherford, A. Carzaniga, and A. L. Wolf, "Evaluating test suites and adequacy criteria using simulation-based models of distributed systems," IEEE Transactions on Software Engineering, vol. 34, no. 4, pp. 452–470, 2008.

[18] Y. Wang, M. J. Rutherford, A. Carzaniga, and A. L. Wolf, "Automating experimentation on distributed testbeds," in Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering. ACM, Nov. 2005, pp. 164–173.

[19] S. Freeman, T. Mackinnon, N. Pryce, and J. Walnes, "Mock roles, not objects," in Companion to the 19th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2004), J. M. Vlissides and D. C. Schmidt, Eds. ACM, 2004, pp. 236–246.