Continuations and Transducer Composition

Olin Shivers                          Matthew Might
Georgia Institute of Technology       Georgia Institute of Technology
shivers@cc.gatech.edu                 mattm@cc.gatech.edu

Abstract

On-line transducers are an important class of computational agent; we construct and compose together many software systems using them, such as stream processors, layered network protocols, DSP networks and graphics pipelines. We show an interesting use of continuations that, when taken in a CPS setting, exposes the control flow of these systems. This enables a CPS-based compiler to optimise systems composed of these transducers, using only standard, known analyses and optimisations. Critically, the analysis permits optimisation across the composition of these transducers, allowing efficient construction of systems in a hierarchical way.

Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors—Optimization; D.3.3 [Programming Languages]: Language Constructs and Features—Control structures, coroutines

General Terms Languages, performance, design

Keywords program analysis, flow analysis, language design, coroutines, fusion, stream processing, functional languages, lambda calculus, continuation-passing style (CPS), continuations

1. Optimising across composition

Abstraction is fundamental to any engineering discipline as a tool for managing complexity during synthesis; this is as true for bridge builders as it is for those who construct software. One of the most important things software researchers can do to support engineers is to help support their use of abstraction. When a software engineer composes two smaller computational entities to create a larger one, it is important that he know that his system-construction tools, such as his compiler and linker, will be able to optimise across this composition, so that he does not pay a performance penalty for composing his system together in a hierarchical, structured way.

As a simple example of this effect, consider procedural abstraction and composition. If we wish to make a new function h by hooking the output of function g up to the input of function f, we can do this very easily with the "little circle" function—that is, we simply write h = f ◦ g. The important point is that if an engineer does this, he has reason to believe that the compiler, if it can statically determine the definitions of f and g, will be able to "melt" his abstraction barriers, and optimise across his composition. So if, for example, g adds three to its argument, and f adds five, then the body of h won't consist of two nested function calls to f and g, but rather will directly add eight to its argument.

The technology for this kind of optimisation is straightforward. A simple technique that works well is just code inlining (if you come from the traditional compiler community), or β-reduction (if you come from the functional-programming community).

Providing this kind of optimisation across composition is an extremely important thing to do, as it enables engineers to build systems in modular, maintainable ways. If an engineer can't assume his compositions will be optimised, then performance considerations may prevent him from composing his system together hierarchically. Instead, he must "hand inline" all the intermediate compositions, working at the lowest level.
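To see the effect in code, here is a minimal sketch in Standard ML (our own illustrative definitions, not drawn from the paper's later library); SML's built-in infix o is exactly the "little circle" operator:

    (* Composing two increments with SML's built-in `o` operator. *)
    val g = fn x => x + 3
    val f = fn x => x + 5
    val h = f o g                 (* h x computes f (g x)         *)

    (* With f and g statically known, inlining/beta-reduction lets
     * a compiler collapse h to the equivalent of: *)
    val h' = fn x => x + 8

However, not all systems are composed together from procedures.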
2. Online transducers

An important class of systems is constructed by composing online transducers. An online transducer is a computational agent that is not written as a procedure or invoked in a call/return pattern. Rather, it is written as a little chunk of code and state that performs the following infinite loop: (1) get some input; (2) compute; (3) do some output; (4) loop. An engineer can compose two transducers together, making a new one, with the Unix "pipe" operator: g|f. Figure 1 shows two transducers hooked together to produce a new, composite one.

[Figure 1. Online transducers can be composed into pipelines.]

Many important systems are composed from primitive elements realised as online transducers, such as:

• Stream processors
• Network-protocol stacks
• DSP and other media-processing systems
• Unix pipelines
• Graphics-rendering pipelines

Unfortunately, when the computational agents that we are composing are online transducers, compilers do not fare nearly so well. If we take a transducer that adds three to its incoming stream, and compose it with one that adds five, it is quite difficult for a compiler to optimise across the two loops, reducing the composite transducer to a simple add-eight loop.

So the bad news is that, while designers of network protocol stacks can use elegant, layered architectures to explicate their designs, implementors who care about efficiency—and network-protocol implementors always do—have to throw out the modular, layered approach, and write a system that is tightly integrated by hand, producing a complex, arduously maintained mess.
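To make the add-three/add-five example concrete, here is a hypothetical sketch of the two loops in the channel-passing SML style that Section 8 develops; each is an innocent little loop, but fusing them requires a compiler to reason across two separate infinite loops:

    (* Hypothetical: get/put over channels, as defined in Section 8. *)
    fun add3 (u, d) =
      let val (x, u') = get u
      in add3 (u', put (x + 3, d)) end

    fun add5 (u, d) =
      let val (x, u') = get u
      in add5 (u', put (x + 5, d)) end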
3. CPS

Fortunately, the functional-programming community has a powerful representational tool at its disposal: continuation-passing style. CPS is a low-level, λ-calculus intermediate representation in which function calls never return, and, hence, are never nested. That is, in CPS, we never write f(g(x)), since the g(x) call will never return to the waiting f. Instead, one always passes to every function an extra argument, the continuation, which is itself a function representing the entire computation to be performed after the original function has computed its value. The original function applies its continuation to the value it computes. So we can view calling a continuation as a goto that passes values. For example, while in Scheme we might write

    (- a (* b c))

in CPS, we write

    (* b c (λ (temp) (- a temp k)))

The λ expression is *'s continuation; k is the continuation representing the entire context in which the expression sits. In brief, a continuation is a functional representation of an expression's context, and we may view it as "the rest of the computation."

The use of CPS as a compiler representation has a long history in the design of compilers for functional programming languages [Rabbit, Orbit, TC, CwC]. For our purposes, let's note for now a very important property of the CPS representation: in CPS, all transfers of control are represented as function calls. Function call, function return, loops, conditionals, sequencing, exception invocation and other non-local control transfers—these are all represented as function call. So program analyses that answer the question "which call sites call which λ-expressions" give us all the control-flow information about the program [0CFA, k-CFA].
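For readers who prefer executable code to equations, here is a small, self-contained SML transcription of the same idea (the names mulK, subK and exprK are our own):

    (* CPS rendering of  a - (b * c) : each operation takes an
     * explicit continuation and never returns to a waiting caller. *)
    fun mulK (b, c, k) = k (b * c)
    fun subK (a, t, k) = k (a - t)

    (* The whole expression, with final continuation k. *)
    fun exprK (a, b, c, k) = mulK (b, c, fn temp => subK (a, temp, k))

    val four = exprK (10, 2, 3, fn v => v)    (* 10 - (2 * 3) = 4 *)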
4. Pipelines and 3CPS

Let's consider a world of online transducers that get arranged into pipelines of data-flow networks. We'd like to write these transducers in a general-purpose programming language, such as Scheme, as infinite loops that employ special "get" and "put" operators to synchronously pass data (and transfer control) up and down the pipeline. So a simple transducer that does nothing but repeatedly send five downstream is

    (λ () (letrec ((lp (λ () (put 5) (lp))))
            (lp)))

and an only slightly less trivial transducer that doubles its input and passes the result downstream is

    (λ () (letrec ((lp (λ () (put (* 2 (get))) (lp))))
            (lp)))

Transducers may have state, which we represent in the standard manner employed in functional programming languages, as parameters in our tail-recursive loops. A simple integrator or summing transducer is written

    (λ () (letrec ((lp (λ (sum)
                         (let ((sum (+ sum (get))))
                           (put sum)
                           (lp sum)))))
            (lp 0)))

Now, a CPS-based compiler, when fed code like these examples, will first convert this "direct-style" code into a CPS representation. However, let's use a non-standard representation, which we'll call "3CPS." Instead of passing to a procedure one continuation representing the "rest of the computation," we'll pass in three:

• k will represent the rest of this pipeline stage's computation;
• u will represent all the pending computation upstream from this pipeline stage; and
• d will represent all the pending computation downstream from this pipeline stage.

We augment the denotational semantics of our CPS representation by adding new "UpCont" and "DownCont" domains. Here are the functionalities:

    c ∈ CmdCont  = UpCont → DownCont → Ans
    k ∈ ExpCont  = Value → UpCont → DownCont → Ans
    u ∈ UpCont   = DownCont → Ans
    d ∈ DownCont = Value → UpCont → Ans
    x ∈ Value

A "command continuation" c simply represents the state of a single, suspended stage in a pipeline. In order to invoke it, we must pass it its "connections," that is, its upstream and downstream continuations, by calling c u d. Once passed these values, it executes the pipeline stage, running the program forwards to completion, which is signified by the "→ Ans" function range.

An "expression continuation" k represents the evaluation context for some expression. It is invoked by calling k x u d, passing to k the value x produced by the expression, together with the current pipeline stage's upstream and downstream continuations, u and d. Then k runs the current pipeline stage forwards to completion.

A "downstream continuation" d represents the computation of the downstream pipeline stages. To transfer control downstream, we call d x u, passing to d the data value x we wish to send downstream and an upstream continuation u. This u value represents the suspended state of the entire computation that is upstream from d—that is, it represents our current pipeline stage, plus everything that is upstream from us. Since, from d's point of view, we are its upstream computation, we must wrap ourselves up as an UpCont in order to pass our state off to d.

Now, most primitive computations don't shift up or down the pipeline at all, so most of the time, the u and d values are simply passed around unused and unaltered. In fact, the CPS-conversion equations for our triple-continuation target language are exactly the same as the classic single-continuation conversion! This is because we placed the up and down continuations at the end of the functionalities, where they get η-converted away, being unreferenced.

If abstractions, applications, variable references, and constant evaluations don't touch the extra continuations, then who does? It's the special get and put functions, which cause shifts up and down the pipeline. These are the primitives that actually use the extra continuations, and, as such, we can't define them in direct style, where the continuations are "hidden," but must define them directly in our CPS language. The semantics of these two functions are:

    get x k u d = u (λ x u′ . k x u′ d)
    put x k u d = d x (λ d′ . k unit u d′)

The get function takes an (ignored) argument x, and the standard three continuations, k, u, and d. We must transfer control to the upstream continuation and run there so that it may pass us a value, so get calls u. Now, the functionality of a u is DownCont → Ans, so we must pass u a DownCont. DownConts, in turn, have functionality Value → UpCont → Ans, so the value we pass u is written (λ x u′ . . . .). Remember that this down continuation represents the state of our own pipeline stage, plus everything downstream from us—that is, everything downstream from u. When u has a value ready to send us, it will call this down continuation, passing it the value x, and a new up continuation u′ for us to use the next time we wish to perform a get. When this happens, our current get is complete—it should return value x. So we go back to running our current pipeline stage by calling k, passing it x, the new up continuation u′, and the same down continuation we had before the get call, d. Thus, the down continuation we pass to u is (λ x u′ . k x u′ d).
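As a concreteness check, the functionalities and the two combinators can be transcribed into self-contained SML. This is our own sketch (with Value fixed to int and Ans to unit), not part of the paper's compiler; the put case is walked through in detail just below:

    (* The mutually recursive 3CPS domains. *)
    type value = int
    type ans   = unit
    datatype upcont   = U of downcont -> ans           (* UpCont   *)
    and      downcont = D of value * upcont -> ans     (* DownCont *)

    (* get x k u d = u (λ x u′ . k x u′ d) *)
    fun get (_ : value) k (U u) d =
      u (D (fn (x, u') => k x u' d))

    (* put x k u d = d x (λ d′ . k unit u d′); put's continuation k
     * receives the unit value that put returns. *)
    fun put (x : value) k u (D d) =
      d (x, U (fn d' => k () u d'))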
We can walk through a definition of put in the same way, guided by the functionalities or types of the different values we manipulate. To put a value x, we must pass it to the downstream pipeline stage, which we do by applying d to x. However, when we transfer control downstream, we must also pass d an upstream continuation that packages up all the computation upstream from d—that is, the state of our own pipeline stage, and everything upstream from us. An UpCont has functionality DownCont → Ans, so it is written (λ d′ . . . .). When d wishes to transfer control back to us, it will call this up continuation, passing it a new down continuation d′ for us to use the next time we wish to perform a put. That is, d′ will package up the state of the downstream continuation when it suspends itself and transfers control back up to us. At this point, our current put is complete, and we are ready to continue running our current pipeline stage. So we should transfer control to our current pipeline stage's continuation k, passing it the unit value produced by a put, our original, unchanged up continuation u, and the new down continuation d′. Hence, the up continuation we pass to d is (λ d′ . k unit u d′).

5. Composing transducers in 3CPS

Now that we can write down transducers in our 3CPS representation, let's consider how we can compose them together. Let's define a new primitive function, (compose/pull c1 c2), that takes two transducers, represented as command continuations, and composes them together to produce a new transducer c, with the output of c1 hooked up to the input of c2. More precisely, when we invoke c, passing it its connections u and d, we wish to transfer control to the downstream half of the composition, c2, running it connected downstream to d. If it transfers control upstream, we wish to jump into c1, and run it connected upstream to u, and downstream back to c2 (see Figure 2). Thus this operator creates a "lazy" composed transducer, that "pulls" values downstream as needed.

[Figure 2. The compose/pull operator initially transfers control to C2, "pulling" values downstream on demand.]

Now, let's write down the semantics of compose/pull—which, as should be obvious from our definitions of get and put, should also serve to give us a direct definition of the operator in our 3CPS representation:

    compose/pull c1 c2 = λ u d . c2 (λ d′ . c1 u d′) d

The result of composing c1 and c2 is a command continuation, which must be invoked by passing to it the upstream and downstream connections u and d. So the composed continuation is written (λ u d . . . .). When this transducer runs, it simply transfers control to c2. When we call c2, we must pass it its connections. The downstream connection for c2 is simply d, the downstream continuation for the whole transducer. But what should we pass to c2 as its upstream continuation? An upstream continuation has functionality DownCont → Ans, so it should be something of the form (λ d′ . . . .). When c2 performs a get call, and attempts to transfer control upstream, it will invoke this continuation, passing it d′, a down continuation that will represent c2's suspended state. At this point, we should start executing transducer c1. When c1 wishes to go upstream, it should call u; when it wishes to go back downstream to c2, it should call d′.

Here are the three equations for put, get, and compose/pull, all grouped together:

    get x k u d        = u (λ x u′ . k x u′ d)
    put x k u d        = d x (λ d′ . k unit u d′)
    compose/pull c1 c2 = λ u d . c2 (λ d′ . c1 u d′) d

With three small equations we have completely specified our transducer primitives and the composition operator. Note what using CPS has done for us: it has made all the "plumbing," that is, all the interconnection structure, explicit, in terms of the d's and u's. Even better, it has done so in terms of the λ-calculus, a representational framework for which we have decades of experience designing analyses, transforms, and optimisations.

Before we proceed, note that we could also define a related operator, compose/push, that initially transferred control to c1, eagerly "pushing" values downstream. The definition of compose/push is interesting; it has been moved to an endnote, so that you may have the fun of working out its definition for yourself {Note compose/push}.
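Continuing the same illustrative SML transcription as before, compose/pull is a one-liner over a command-continuation type (again our own sketch, not the paper's compiler):

    (* compose/pull c1 c2 = λ u d . c2 (λ d′ . c1 u d′) d *)
    datatype upcont   = U of downcont -> unit
    and      downcont = D of int * upcont -> unit
    type cmdcont = upcont * downcont -> unit

    fun compose_pull (c1 : cmdcont) (c2 : cmdcont) : cmdcont =
      fn (u, d) => c2 (U (fn d' => c1 (u, d')), d)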
6. Working through a simple example

Let's work through a simple example by hand. Consider the two transducers

    (λ () (letrec ((lp1 (λ () (put 5) (lp1))))
            (lp1)))

    (λ () (letrec ((lp2 (λ () (put (* 2 (get))) (lp2))))
            (lp2)))

We would like it to be the case that if we compose them together with compose/pull, compiler analysis could reduce the pair of loops to a single "put ten" loop.

First, we CPS-convert our two loops, and inline the put and get operators, giving us

    (λ (k1 u1 d1)                           ; Put-5 transducer
      (letrec ((lp1 (λ (k1a u1a d1a)
                      (d1a 5 (λ (d1b) (lp1 k1a u1a d1b))))))
        (lp1 k1 u1 d1)))

    (λ (k2 u2 d2)                           ; Doubler
      (letrec ((lp2 (λ (k2a u2a d2a)
                      (u2a (λ (x u2b)
                             (d2a (* 2 x)
                                  (λ (d2b) (lp2 k2a u2b d2b))))))))
        (lp2 k2 u2 d2)))

We've cheated on CPS-converting the (* 2 x) operation for compactness' sake. Everywhere we saw put appear in our transducer loops, we just dropped in its equivalent CPS combinator

    (λ (x k u d) (d x (λ (d') (k u d'))))

This is precisely what we wrote for put's semantics, transliterated into our Lisp-syntax CPS language, decurried, and with the convention that unit arguments are eliminated from calls and parameter lists, so that put's continuation k is not explicitly passed the trivial unit value that put returns. We likewise replaced occurrences of get with its defining combinator.

We can simplify this code a bit. First, notice that k1, k2, u1, and u1a are never called. It's not surprising that k1 and k2 are never called, since both of our loops are infinite ones that never return. Similarly, the put-five loop never goes upstream, so u1 and u1a never get used. There is a standard analysis for spotting these situations, "useless-variable analysis" [UVE], and it will remove these variables from the calls and parameter lists in the code, leaving us with

    (λ (k1 u1 d1)                           ; Put-5 transducer
      (letrec ((lp1 (λ (d1a) (d1a 5 (λ (d1b) (lp1 d1b))))))
        (lp1 d1)))

    (λ (k2 u2 d2)                           ; Doubler
      (letrec ((lp2 (λ (u2a d2a)
                      (u2a (λ (x u2b)
                             (d2a (* 2 x)
                                  (λ (d2b) (lp2 u2b d2b))))))))
        (lp2 u2 d2)))

We can "promote" lp1 from a user-procedure to an UpCont by simply η-reducing (λ (d1b) (lp1 d1b)) to lp1. This leaves us with

    (λ (k1 u1 d1)                           ; Put-5 transducer
      (letrec ((lp1 (λ (d1a) (d1a 5 lp1))))
        (lp1 d1)))

    (λ (k2 u2 d2)                           ; Doubler
      (letrec ((lp2 (λ (u2a d2a)
                      (u2a (λ (x u2b)
                             (d2a (* 2 x)
                                  (λ (d2b) (lp2 u2b d2b))))))))
        (lp2 u2 d2)))

Now let's compose these two transducers together. This is easily done: we take the code for compose/pull

    (λ (c1 c2)
      (λ (k u d)
        (c2 k (λ (d') (c1 k u d')) d)))

and apply it to our two transducer loops—that is, we drop the code for the put-five loop where c1 appears, and the code for the doubler loop where c2 appears. The resulting expression is

    (λ (k u d)
      (letrec ((lp2 (λ (u2a d2a)
                      (u2a (λ (x u2b)
                             (d2a (* 2 x)
                                  (λ (d2b) (lp2 u2b d2b))))))))
        (lp2 (λ (d')
               (letrec ((lp1 (λ (d1a) (d1a 5 lp1))))
                 (lp1 d')))
             d)))

Compilers usually hoist letrecs out; let's do that to the lp1 binding, giving us

    (λ (k u d)
      (letrec ((lp2 (λ (u2a d2a)
                      (u2a (λ (x u2b)
                             (d2a (* 2 x)
                                  (λ (d2b) (lp2 u2b d2b)))))))
               (lp1 (λ (d1a) (d1a 5 lp1))))
        (lp2 (λ (d') (lp1 d')) d)))

Now we can η-reduce (λ (d') (lp1 d')) to lp1, reducing the code for our original pair of loops to just

    (λ (k u d)
      (letrec ((lp2 (λ (u2a d2a)
                      (u2a (λ (x u2b)
                             (d2a (* 2 x)
                                  (λ (d2b) (lp2 u2b d2b)))))))
               (lp1 (λ (d1a) (d1a 5 lp1))))
        (lp2 lp1 d)))

This is already pretty good, but CPS-based control- and data-flow analysis can take us the rest of the way to collapsing our double loop to a single loop. Tracing through the flow of control reveals that u2a and u2b are always bound to lp1; the automatic analysis that reveals this is "super-β" [k-CFA, DCFA]. We can eliminate these two variables, and replace the call to u2a with lp1. This leaves us with

    (λ (k u d)
      (letrec ((lp2 (λ (d2a)
                      (lp1 (λ (x)
                             (d2a (* 2 x)
                                  (λ (d2b) (lp2 d2b)))))))
               (lp1 (λ (d1a) (d1a 5))))
        (lp2 d)))

Now we can η-reduce (λ (d2b) (lp2 d2b)) to lp2, and inline the call to lp1, which triggers off two more rounds of β-reduction, settling out with

    (λ (k u d)
      (letrec ((lp2 (λ (d2a) (d2a (* 2 5) lp2))))
        (lp2 d)))

This is the put-ten loop we wanted. Note that every step was a standard transformation, either an application of the λ-calculus's simple η and β reductions, or a more sophisticated transform based on known data-flow analyses. The moral is that while many of these CPS-based control- and data-flow analyses were originally designed to operate on the kind of control flow we see in traditional intra-procedural settings—loops and conditionals—in CPS, all transfers of control are represented with the same control mechanism: tail-recursive procedure call. So an analysis that works in a CPS framework captures everything, including the more exotic control transfers being used in our transducer constructs.
7. Some problems

As pretty as this CPS-based explication of transducer structure may be, it is not without limitations. The biggest issue is that the linear topology of the data-flow pipeline is baked into our triple-continuation CPS transform, which is a fixed part of the compiler. If we'd like to have, say, a branching topology, we are out of luck. For example, while the Sutherland-Hodgman polygon-clipping algorithm has a data flow that perfectly fits the linear pipeline model we've developed, the slight variant of this algorithm that splits a polygon by a plane into two resulting polygons has a natural tree structure to its data flow. (We'll return to this example later.)

Note that when we consider non-linear communication topologies, we are still restricting ourselves to a serial computational model, with a single locus of control that moves between transducers when values are sent or requested. We are not tackling true concurrency, or attempting to provide CSP-like features with our get and put operators.

In a related issue, the CPS transform is a global transform that rewrites the entire program. So we can't have a modular way of defining some kind of communications resource or topology that is restricted to some limited subset of the code.
From one perspective, what we've done is taken loop state—control and otherwise—and packaged it up inside λ expressions, where λ-calculus reasoning engines can trace out where different pieces of state go. Pushing the transducer state down into the u and d continuations puts it below the level at which the user code can access it: these continuations are only manipulated by the get and put primitives, which do so in a linear fashion. This also helps out our analysers, since the critical bits of state are separated out, instead of dropped into the formless sea of the runtime store, where it is very difficult to keep items distinct.

It turns out that there is a technology that does provide this kind of mechanism in a modular fashion: monads [Moggi, Essence]. Monads have exactly the same property as CPS of "pushing" certain structures below the surface of most of the code, only allowing certain related primitives to "dip down" and access them in restricted ways (and this is not a surprise, since continuation-passing and monads are related). Hudak and others [MADT] have shown how to use monads to provide monadic, linear data types—which certainly describes the u's and d's we've been passing around. For all these reasons, monads are popular in the pure-functional, lazy-language community, receiving quite a bit of use in Haskell to model things like I/O. Unfortunately, monads have problems for general use in less exotic, applicative-order languages such as SML or Scheme. The principal issue is that they do not, in general, compose; as a result, we have been unable to figure out a way to use them as a general tool for providing transducer mechanisms. Even in the Haskell community, monads require programmers to write in a particular style, which would prove onerous to programmers who work in applicative-order languages.

A final issue to consider is: can we type our transducers and their primitives? If so, can we do it in standard type systems, e.g., SML's? Our task in the next section is to address all of these issues.

8. Removing constraints, adding types

We can remove the linear-pipeline constraint on the connection topology by the simple expedient of bringing the hidden u and d cross-connecting resources out into the open, where they can be directly manipulated by the programmer. This is a little more verbose than the neat, clean manner in which these continuations were invisibly passed around behind the scenes, but it allows us to name and use multiple communications resources. We can also take this opportunity to start introducing types into our model. Let's introduce a new type constructor, using an ML syntax:

    (α, β) Channel

A value c of type (α, β) Channel is a coroutine communications resource. We can send an α value over it, and receive a β value back. The primitive operator for doing so is switch:

    switch : α × (α, β) Channel → β × (α, β) Channel

Note that switch returns not only the β value returned from the coroutine at the other end of the channel, but also a new channel for us to use the next time we wish to communicate with that coroutine. Channel values are "affine," that is, they may not be used more than once.

Now we can write some simple examples in an ML syntax that parallel our previous, linear pipeline examples. We can define put and get in terms of switch:

    type γ DownChannel = (γ, unit) Channel
    type γ UpChannel   = (unit, γ) Channel

    fun put(x,c) = #2(switch(x,c))
    fun get c    = switch((),c)

Our put-five example is just the simple loop

    fun put_five c = put_five(put(5,c))

and our integrator is

    fun integ(u,d) =
      let fun lp(s,u,d) =
            let val (x,u') = get u
                val s = s + x
            in lp(s, u', put(s,d)) end
      in lp(0,u,d) end

However, we could instead define a general-purpose coroutine generator that folds an α × β → β function across an α stream:

    fun streamfold f zero (u,d) =
      let fun lp(s,u,d) =
            let val (x,u') = get u
                val s = f(x,s)
            in lp(s, u', put(s,d)) end
      in lp(zero,u,d) end

We can then define our integrator as:

    fun integ(u,d) = streamfold op+ 0 (u,d)
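For instance, the doubler of Section 4 becomes a two-channel loop in this style (our own example, written in the same pattern as integ above):

    (* Read a value from upstream, send its double downstream, loop. *)
    fun doubler (u, d) =
      let val (x, u') = get u
      in doubler (u', put (2 * x, d)) end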
The surprising thing about our channels is that, if we have a callcc primitive to give us access to continuations, we can define and type the switch primitive with no further extensions to SML's type system or primitives! The key observation is that if we have an (α, β) Channel, then some other coroutine must be suspended in a switch call; when we switch off to this coroutine by sending some α value over our channel, this suspended coroutine will resume. So a channel must be the continuation for the other coroutine's suspended switch call. But that suspended call is expecting eventually to return a pair of values: an α value and a (β, α) Channel value, because, while we are using an (α, β) Channel to communicate with it, it is using a mirror-image (β, α) Channel to communicate with us. Putting this all together gives us the following circular definition:

    datatype (α,β) Channel = Chan of (α * (β,α) Channel) cont

Encasing the circular definition in a datatype constructor allows us to push the declaration through the SML type system. {Note ThanksDana} The definition of switch follows from the type of Channel:

    fun switch(x, Chan k) = callcc (fn k' => throw k (x, Chan k'))

With this definition for channels and switch, we can proceed to build a whole algebra of sources, sinks, transducers and their combinators, this time in our more general, typed setting. Figure 3 shows the types of an example module we have defined. The definitions are straightforward.

    signature FUNCTIONAL_COROUTINES =
    sig
      infix 4 ‘>| >|‘
      infix 4 ‘|> |>‘
      infix 5 ‘>> >>‘

      (* Continuations. *)
      type α cont

      datatype (α,β) Channel = Chan of (α * (β,α) Channel) cont
      type α DownChannel = (α,unit) Channel
      type α UpChannel   = (unit,α) Channel

      val switch : α * (α,β) Channel -> β * (α,β) Channel
      val get : α UpChannel -> (α * α UpChannel)
      val put : α * α DownChannel -> α DownChannel

      (* Transducers, sources and sinks. *)
      type (α,β,γ) Transducer = (α UpChannel * β DownChannel) -> γ
      type (α,γ) Source = α DownChannel -> γ
      type (α,γ) Sink   = α UpChannel -> γ

      (* Infix transducer+transducer, push & pull. *)
      val ‘>> : (α,β,γ) Transducer * (β,δ,γ) Transducer -> (α,δ,γ) Transducer
      val >>‘ : (α,β,γ) Transducer * (β,δ,γ) Transducer -> (α,δ,γ) Transducer

      (* Infix source+transducer, push & pull. *)
      val ‘|> : (α,γ) Source * (α,β,γ) Transducer -> (β,γ) Source
      val |>‘ : (α,γ) Source * (α,β,γ) Transducer -> (β,γ) Source

      (* Infix source+sink, push & pull. *)
      val ‘>| : (α,γ) Source * (α,γ) Sink -> γ
      val >|‘ : (α,γ) Source * (α,γ) Sink -> γ

      (* Can now write pipelines:           *)
      (*   src |>‘ t1 >>‘ t3 >|‘ sink       *)

      (* Generic source creation. *)
      val listsource : α list -> (α option,γ) Source
      val put_n_xs : int -> α -> (α option,γ) Source

      (* Generic sinks. *)
      val firstsink : (α option, α) Sink
      val listsink : (α option, α list) Sink

      (* Generic transducer creation. *)
      val stream_fold : (α * β -> β) -> β -> (α,β,γ) Transducer
      val stream_app : (α -> β) -> (α,β,γ) Transducer
    end

Figure 3. The signature of a simple "functional coroutine" SML module.
For the infix transducer combinators, we use the convention of placing a ‘ tick on the side that drives the composite transducer: on the left for a "push" operator, and on the right for a "pull" operator. For example, here is our old friend compose/pull:

    fun op >>‘(t1,t2) (u,d) =
      callcc (fn k =>
        t1(u, #2(callcc (fn k' =>
                   throw k (t2(Chan k', d))))))

Here is the definition of stream_app, which creates a stateless transducer applying f to each item in the incoming stream:

    fun stream_app f =
      let fun lp(u, d) =
            let val (x,u') = get u
                val d' = put(f x, d)
            in lp(u',d') end
      in lp end

We can, for example, use stream_app to define a function that produces an "add-n" transducer given some integer n:

    fun addn n = stream_app (fn m => n + m)

Note that we have defined our coroutine-composition combinators as infix operators; this means we can simply construct pipelines with expressions such as

    lex |>‘ parse >>‘ translate ‘>| assemble

Writing functions to act as transducers, sources or sinks is straightforward. The definition of the combinators involves direct use of the channel continuations, and so is a bit more subtle. Besides the transducer/transducer composition operator we've already seen, here are the definitions of the pull combinators that do source/sink and source/transducer composition:

    (* Source/sink composition, driven by sink. *)
    fun op >|‘(source,sink) =
      callcc (fn k =>
        source(#2(callcc (fn u =>
                    throw k (sink (Chan u))))))

    (* S is an α source; T an α-to-β transducer.
     * Return a β source. *)
    fun op |>‘(s,t) d =
      callcc (fn k =>
        s(#2(callcc (fn k' =>
               throw k (t(Chan k', d))))))
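Assuming the FUNCTIONAL_COROUTINES structure of Figure 3 is open (so its infix declarations are in scope), a complete hypothetical pipeline then reads:

    (* Double the elements of a list and collect the results. Streams
     * are option-encoded, so the transducer maps over SOME and
     * forwards the end-of-stream NONE token unchanged. *)
    fun double_opt (u, d) =
      let val (x, u') = get u
      in double_opt (u', put (Option.map (fn n => 2 * n) x, d)) end

    val doubled : int list =
      listsource [1, 2, 3] |>‘ double_opt >|‘ listsink   (* expect [2,4,6] *)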
As a final observation before leaving these operators, note that several of the transducer functions (e.g., stream_app) never return—that is, they never make use of their implicit continuation. With this in mind, we could go back through the library and change the types and definitions so that such operators were passed one of their channel parameters encoded as the implicit continuation. This seems a bit more parsimonious of mechanism, though it complicates things to the degree of having two distinct encodings of channels (implicit continuations and explicit channels).

9. Linearity

Recall that we require programmers to use a channel no more than once. Unfortunately, SML won't give the programmer any help at all in checking to see that his program doesn't accidentally violate this rule. However, there is a whole body of type systems, based on linear logic, that provide this kind of service [Type1, LT]. So it might be a useful extension to the machinery we've described so far to add a linear-logic type system to the language to help the programmer with checking this requirement.

10. Experience

We walked through a detailed example in the 3CPS, linear-pipeline framework, working through a plausible scenario showing that standard CPS-based compiler technology is capable of "fusing" coroutines together, giving us the critical ability to optimise across composition that we want. The question remains: can we make this work in practice?

Preliminary results indicate that we can. We have constructed a simple CPS-based compiler front end in Haskell for processing programs written in a simple Scheme language. The compiler CPS-converts the source terms, then applies the kind of transformations we employed in Section 6:

• Super-β flow-based inlining of variables and λ terms
• Local β/η reductions
• Useless-variable elimination
• Dead-code elimination

Not all transforms revealed by the data-flow analyses as legal are desirable; some inlining transforms might lead to excessive code expansion. The policy used by our test-bed compiler is:

• Replace a λ-bound variable with its bound term when the variable has zero or one reference, the bound term is a variable or constant, or the bound term is itself a λ term with no free variables.
• Inline a letrec-bound λ term inside other letrec-bound λ terms whenever it flows to only one reference.
• Inline a letrec-bound λ term in the body of the letrec term if it flows to only one reference or if it has no free variables.

We can define our general-topology channel mechanism in the source language handled by our compiler. The compiler has successfully fused simple transducer compositions, such as our put-five/doubler example, into single loops. As another example, if the compiler is given a program that composes (1) a source that sends the elements of a list downstream, with (2) a transducer that doubles its values and (3) a sink that prints out every value it receives, it is able to fuse these three coroutines into a single, tight loop that iterates over the list and prints out each element doubled.

It's worth noting that our compiler has no special knowledge of coroutines. It simply performs generally useful optimisations on λ-expressions represented in CPS form. The leverage we have on the problem comes from our choice of representation—which is the main message of this paper. (We should also note one other important "lever": ∆CFA. A critical optimisation for fusing transducer loops, super-β, depends on ∆CFA analysis, which has only recently been given a solid foundation [DCFA]. However, ∆CFA and super-β are also generally useful tools, not particularly targeted at coroutines or continuations.)

For a third example, we turn to a more exotic possibility. Consider a put-fringe generator that walks a binary tree, pausing at each leaf node to send it downstream. What happens if we compose this source with a transducer that doubles the elements that flow through it, and a sink that prints the values it receives? We have now left the domain of simple iteration: our fringe generator uses the run-time procedure stack as it recursively explores into the tree. Each time the fringe generator pauses to send a leaf element downstream, the resumption continuation packages up the entire stack. However, the downstream computation doesn't require an entire stack as part of its suspension state. So the compiler is able to fuse these two computations together, reducing them to the simple recursive summation function one would write by hand. The source code for this composition is shown in Figure 4.

Our experimental compiler only provides one-variable λ expressions; multiple arguments are packaged up as cons cells. The compiler has no special knowledge of cons cells, or option types; these are all Church-encoded as λ terms. (The compiler is good enough with λ terms to sort all this out.) For reasons of space, the figure elides all the s-expression definitions of the standard coroutine elements we've defined in earlier sections. The ss-pull and st-pull operators are the "pull" versions of the source/sink and source/transformer combinators.

Once again, it's worth repeating that the compiler has no special knowledge about fusing different kinds of control structure. It simply reasons at the underlying, universal layer of the CPS form. When the computation can be reduced to a purely iterative one that has no dependency on the stack, this is done. If, however, the composite computation fundamentally requires a call stack, then this residual continuation structure becomes the central control structure of the fused computation, and the rest of the computation coalesces around this core. This all simply falls out of our CPS representation.

A CPS-based representation can't solve all problems, however. If we compose two computations that both require stacks, then even after optimisation, the coroutine management will necessarily employ "heavyweight" stack-capturing continuations at the coroutine transfer points. This is unavoidable. All we are offering is the possibility of avoiding this cost when it is possible to use cheaper control structures.
    (letrec (...
             (doubler (λ (u:d)
               (let ((get-u (get (car u:d))))
                 (match-option (car get-u)
                   (λ (n) (doubler (cons (cdr get-u)
                                         (put (cons (SOME (* 2 n)) (cdr u:d))))))
                   (λ () (doubler (cons (cdr get-u)
                                        (put (cons (NONE) (cdr u:d))))))))))
             (display-sink (λ (u)
               (let ((get-u (get u)))
                 (match-option (car get-u)
                   (λ (elt) (begin (display elt)
                                   (display-sink (cdr get-u))))
                   (λ () #f)))))
             (put-fringe (λ (tree)
               (λ (d) (put (cons (NONE) ((put-fringe/aux tree) d))))))
             (put-fringe/aux (λ (tree)
               (λ (d) (match-tree tree
                        (λ (l r) ((put-fringe/aux l) ((put-fringe/aux r) d)))
                        (λ (elt) (put (cons (SOME elt) d)))))))
             ...)
      (let ((tree (NODE (LEAF 3) (LEAF 2))))
        (ss-pull (st-pull (put-fringe tree) doubler)
                 display-sink)))

Figure 4. Composing a tree-fringe generator with a doubling transducer and a printing consumer. A CPS-based compiler is capable of fusing the three continuation-capturing coroutines into a single recursive tree walk.

11. Related work

11.1 Deforestation, fold/build and other catamorphisms

There is a wealth of work related to transducer fusion, in both the systems and programming-language communities. Because pure-functional, lazy languages such as Haskell provide the ability to directly manipulate unbounded sequences of data, there has been a fair amount of work on optimising compositions of sequence operators, such as "deforestation," "foldr/build" pairs, and various forms of intimidatingly-named morphisms (hylo, cata, ana, apo, etc.) [Deforest, Cata]. The core idea behind this approach is to find places where a list is constructed as an intermediate carrier between the producer and consumer of some sequence, and to somehow rearrange or reduce the code to connect the producer directly to the consumer.

Roughly speaking, this is usually managed by using some sort of Church encoding to represent sequences. For example, we can view the foldr and build functions [Fold] as converters between actual lists and Church-encoded "abstract lists," where an abstract list is a function from a "cons" and "nil" pair of abstract constructors to the result of assembling the elements of the list with these constructors:

    α abslist = ∀β . ((α × β → β) × β) → β

To see this, we just need to permute the arguments to foldr a bit, getting the variant

    (* α list → α abslist *)
    fun foldr' []      (kons,knil) = knil
      | foldr' (x::xs) (kons,knil) = kons(x, foldr' xs (kons,knil))

    (* α abslist → α list *)
    fun build abslis = abslis (op ::, [])

The key property of these converters, then, is that (in the absence of effects) (foldr' ◦ build) abslis = abslis. Thus if a list is produced by passing an abstract list to build, and this list is then consumed by a foldr', we can eliminate the list itself, along with the redundant conversions.
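As a tiny worked instance (ours, not drawn from the deforestation literature), an abstract list can be consumed with or without ever building the concrete list:

    (* An abstract list holding 1, 4, 9: it applies the abstract
     * constructors itself. *)
    fun squares (kons, knil) = kons (1, kons (4, kons (9, knil)))

    val xs   = build squares          (* the concrete list [1,4,9]     *)
    val sum  = foldr' xs (op +, 0)    (* 14, via the intermediate list *)
    val sum' = squares (op +, 0)      (* 14, fused: no list ever built *)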
We can view "unfoldr/destroy" fusion in a similar way [Unfold]. Suppose we represent a sequence of α elements by a (next, s) "generator" pair:

    α generator = ∃β . (β → (α × β) Option) × β

Applying next to the generator state s produces either the next element of the sequence and the next state value, or the NONE option, meaning the end of the sequence. Then we can view unfoldr as a function that converts a generator into a list; and destroy as a function that converts a generator consumer into a list consumer, that is, from something of type α generator → γ into something of type α list → γ:

    (* α generator → α list *)
    fun unfoldr (next,s) =
      case (next s)
        of NONE => []
         | SOME(elt,s') => elt :: unfoldr(next,s')

    (* (α generator → γ) → (α list → γ) *)
    fun destroy gconsumer =
      fn xs =>
        let fun next [] = NONE
              | next (x::xs) = SOME(x,xs)
        in gconsumer (next, xs) end

Again, the key optimisation is that (ignoring effects)

    (destroy gconsumer) (unfoldr (next,s)) = gconsumer (next,s)

where the left-hand side applies a list consumer to a list, and the right-hand side applies the underlying generator consumer directly to the (next,s) generator. Thus, we can again eliminate the intermediate list passed from unfoldr to destroy, and instead pass the (next,s) generator directly to the consumer, where it can likely then be inlined into a loop.

In both cases, the idea is that if the programmer is willing to write his list producers and consumers by applying these converters to abstract sequence producers and consumers, then producer/consumer compositions can be directly connected. This is a very different approach, exploiting a high-level view of stream computation. It is appealingly elegant, and works especially well in a pure-functional setting that permits unbounded lists and enables equational reasoning about the operators. In a call-by-value language that permits effects, the transformation is no longer generally safe. For example, the transforms alter the order of effects: in the original code, the elements are all produced and assembled into a list before the consumer executes; the fused code, in contrast, interleaves producing and consuming elements. Control, I/O and memory effects all can alter the meaning of the program under such shuffling.

In contrast, our approach, based on analysis of the CPS representation, is very low level—the compiler maps the program over to a low-level representation and then examines it, looking for opportunities revealed by static analysis for optimisation. We have no problem with effects, since, in essence, we focus purely on control, just optimising the control flow that is encoded by the higher-order language.
As one example, it's not clear how we could use high-level fusion techniques to encode an example such as the Sutherland-Hodgman polygon-clipping algorithm, which has a "push" control paradigm, and permits a clipping stage to send 0, 1 or 2 points downstream in response to receiving a single point of input. Even more problematic is the variant of this algorithm that splits a polygon—in this version, each transducer has two distinct downstream connections, producing a tree-like topology of transducers. We can represent these algorithms quite naturally using the model we've developed here; SML code rendering them as transducers is shown in an appendix. It's hard to say much about the relative power of these two approaches, one high level, the other low level, until we've gained more experience with the system.

11.2 Filter fusion

In the systems community, we can find similar efforts, usually motivated by the performance demands of network-protocol stacks, typically going under names like "filter fusion," "protocol integration," or "integrated layer processing." For example, Proebsting has reported on a method for fusing together what he terms "microprotocols" in a data-flow network [Fusion]. His fundamental technology is essentially a constrained partial evaluation. In order to make this work, he restricts the programmer to a simplified language for writing the microprotocol components. This is necessary, as partial evaluation is such an aggressive optimisation technology that it has the potential to "run away" and blow up on some programs. Constraining the language helps to constrain the analysis and transform.

The ESP language for writing embedded controllers [ESP] is another example of exploiting a restricted framework, this time CSP-like processes that communicate over synchronous channels, where the processes, channels and memory allocation are all statically fixed. What is particularly of note about ESP is its use of a SAT solver to check the system for errors, a powerful debugging aid.

In contrast to these restricted systems, our approach can be applied to a full-strength, general-purpose programming language, such as SML. For non-functional programmers, the important thing to note here is that these are just simple loops, written the way functional programmers naturally write them. A Scheme or ML programmer will write the integrator example we gave as quickly and easily as a C programmer would write

    s = 0;
    while(1) {
      s += get();
      put(s);
    }

That is, we've made no concessions to the pipeline or data-flow framework. It's just simple, straightforward code, using the full features of the language.

Also, note that we have not bounded the set of analyses and optimisations that could be applied to our compositions; our central message is that CPS and continuations provide a useful representational framework for compiler analyses and transformations to reason about transducers and their compositions. While we've restricted ourselves to the conservative optimisations enabled by data-flow analysis, the lessons of Proebsting's work can be applied in our framework as well. The compiler is certainly allowed to choose from a range of optimisation strategies, of varying degrees of aggressiveness, depending upon programmer hints or other sources of information.

11.3 CLU generators

The programming language CLU provides a limited form of coroutine mechanism intended solely for the purpose of defining generators to be used in iteration [CLU]. Thus, to specify an iterator over an array or hash table, the module exporting the array or hash table type provides a generator coroutine. These generators are invoked from the CLU for loop, and the iterator mechanism is carefully limited to ensure that the entire CLU program can run with a single call stack—for example, one cannot sequence across two iterators in parallel in a single for loop. A recent loop package designed by one of the authors provides a framework for expressing a similar facility [Loop].

The coroutine mechanism we discuss here generalises this capability. We allow the programmer to use continuations as a convenient and extremely general-purpose abstraction for packaging up generator state. The programmer doesn't even need to set down what the generator state is, nor does he need to explicitly "marshal" or collect this state together across element generations; this is all managed implicitly by the rules of λ calculus, lexical scope and the context captured by call/cc. This use of a λ term (in this case, one encoding a continuation) as a "carrier" of state is precisely one of the uses of λ terms that the super-β transform is intended to make efficient. Thus the λ-calculus representation and the compiler analysis dovetail neatly.
11.4 Run-time mechanisms

Much effort has been expended in the Scheme community on optimising the dynamic cost of allocating first-class continuations [RepCtl]. By sharing saved stack segments or incrementally performing the stack-copy operation associated with creating general first-class continuations, the price of an allocation can be reduced to a constant cost on the order of a general procedure call or heap-allocation operation. These dynamic mechanisms can mitigate the cost of using heavyweight, general continuations in cases where compiler analysis is unable to statically optimise them away.

12. Final thoughts and conclusion

The contributions of this work are several. First, we have structured coroutines or data transducers in a fashion that allows λ-based analyses to optimise their use. This is a fundamentally enabling step; if these computational structures cannot be adequately optimised, then they cannot be exploited by engineers in performance-critical applications. Second, we have shown that we have not yet fully exploited the power of CPS-based representations. It was this framework that exposed the control structure in a fashion that our analytic tools could manipulate.

It is common to view both continuations and coroutines as expensive control structures. We'd like to put forth the view that this does not have to be the case. The commonplace notion of a continuation is that "continuations are just stacks." This is like saying that λ expressions produce heap-allocated data structures. A more accurate statement would be that a continuation is an abstraction of processor state [ContThreads]. All sophisticated functional-language compilers employ a range of techniques to render λ expressions into machine code, depending on what is statically known about each particular expression's use and context: some λ's turn into loops; some turn into branches; some become heap-allocated closures; some just vanish completely. CPS-based compilers take the same approach to continuations, which are pervasive throughout the code.
The lesson is that when we have static information about continuations, they can be implemented with little or no cost at all. This leads us to our final remark: coroutines are the neglected control structure. Because coroutines have traditionally not been amenable to analysis and optimisation, they have been regarded as expensive, heavyweight program elements. The Computer Science community has developed a "blind spot" with respect to them. We find that once one has taken the step of viewing them as cheap, lightweight control structures that one can apply as pervasively as a λ expression, they turn out to be a surprisingly useful tool for structuring programs. They allow one to construct modular iteration mechanisms, after the fashion of CLU's for construct. They allow one to finely partition the design and implementation of a large, pipelined task. Once implemented, one can then easily divide the entire pipeline into a few large segments by inserting buffering and concurrency elements [CML] at a few of the divisions, and know that the intra-segment elements will be tightly composed together into a serial loop. Later performance measurements can be fed back to this larger structuring to balance out the computation/communication costs. We'd like to put coroutines back into the toolbox of the systems programmer. Thus our current agenda of developing the technology to render coroutines and their compositions efficient.

13. Acknowledgements

A very early version of this work was presented at the IFIP WG2.8 workshop on functional programming; we are grateful for the invitation to present these ideas and the useful feedback we received from the working group members. Greg Morrisett and Franklyn Turbak were particularly vocal in encouraging us to generalise the early, linear-pipeline model, which led to the general channel structure. Anonymous reviewers provided valuable comments that improved the final version of this paper.

References

[0CFA] Olin Shivers. Control-flow analysis in Scheme. In Proceedings of the SIGPLAN '88 Conference on Programming Language Design and Implementation, June 1988.

[Cata] John Launchbury and Tim Sheard. Warm fusion: Deriving build-catas from recursive definitions. In Proceedings of the ACM Conference on Functional Programming Languages and Computer Architecture, 1995.

[CFASem] Olin Shivers. The semantics of Scheme control-flow analysis. In Proceedings of the First ACM SIGPLAN and IFIP Symposium on Partial Evaluation and Semantics-Based Program Manipulation, June 1991. Published as SIGPLAN Notices 26(9):190–198, Association for Computing Machinery, September 1991.

[CLU] Barbara Liskov. A History of CLU. Technical Report 561, MIT Laboratory for Computer Science, April 1992.

[CML] John Reppy. CML: A higher-order concurrent language. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, ACM, June 1991.

[ContThreads] Olin Shivers. Continuations and threads: Expressing machine concurrency directly in advanced languages. In Proceedings of the Second ACM SIGPLAN Workshop on Continuations, Paris, January 1997. Also available as BRICS Notes Series NS-96-13, University of Aarhus, Denmark.

[CwC] Andrew W. Appel. Compiling with Continuations. Cambridge University Press, 1992.

[DCFA] Matthew Might and Olin Shivers. Environmental analysis via ∆CFA. In Proceedings of the 33rd Annual ACM Symposium on Principles of Programming Languages (POPL 2006), Charleston, South Carolina, January 2006.

[Deforest] Philip Wadler. Deforestation: transforming programs to eliminate trees. Theoretical Computer Science 73:231–248, 1990.

[ESP] Sanjeev Kumar et al. ESP: A language for programmable devices. In Proceedings of the 2001 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 309–320, Snowbird, Utah, June 2001.

[Essence] Philip Wadler. The essence of functional programming. In Proceedings of the 19th Symposium on Principles of Programming Languages, pages 1–14, ACM, January 1992.

[Fold] Andrew Gill, John Launchbury and Simon L. Peyton Jones. A short cut to deforestation. In Proceedings of the FPCA '93 Conference on Functional Programming Languages and Computer Architecture, pages 223–232, Copenhagen, Denmark, June 1993.

[FPMonads] Philip Wadler. Monads for functional programming. In Advanced Functional Programming, ed. J. Jeuring and E. Meijer, Springer-Verlag, LNCS.

[Fusion] Todd A. Proebsting and Scott A. Watterson. Filter fusion. In Proceedings of the 23rd Symposium on Principles of Programming Languages (POPL '96), ACM, 1996.

[k-CFA] Olin Shivers. Control-Flow Analysis of Higher-Order Languages. Ph.D. dissertation, Carnegie Mellon University, May 1991. Technical Report CMU-CS-91-145, School of Computer Science.
(Also available via anonymous ftp as URL ftp://cs.cmu.edu/afs%2Fcs.cmu.edu%2Fuser%2Fshivers%2Flib%2Fpapers/diss.ps.Z.)

[Loop] Olin Shivers. The anatomy of a loop: a story of scope and control. In Proceedings of the 10th ACM SIGPLAN International Conference on Functional Programming (ICFP 2005), pages 2–14, Tallinn, Estonia, September 2005.

[LT] Philip Wadler. Linear types can change the world. In Programming Concepts and Methods, ed. M. Broy and C. Jones, North Holland.

[MADT] Chih-ping Chen and Paul Hudak. Rolling your own mutable ADT—a connection between linear types and monads. In Proceedings of the 24th ACM Symposium on Principles of Programming Languages, Paris, France, January 1997.

[Moggi] Eugenio Moggi. Computational lambda-calculus and monads. In Symposium on Logic in Computer Science, Asilomar, California, IEEE, June 1989.

[Orbit] David Kranz. Orbit: An Optimizing Compiler for Scheme. Ph.D. dissertation, Yale University, February 1988. Research Report 632, Department of Computer Science. A conference-length version of this dissertation appears in SIGPLAN '86.

[Perlis] Alan J. Perlis. Epigrams on programming. SIGPLAN Notices 17(9), September 1982.

[PolyClip] I. E. Sutherland and G. W. Hodgman. Reentrant polygon clipping. Communications of the ACM 17(1):32–42, January 1974.

[Rabbit] Guy L. Steele Jr. RABBIT: A Compiler for SCHEME. Technical Report 474, MIT AI Lab, May 1978.

[RepCtl] Robert Hieb, R. Kent Dybvig and Carl Bruggeman. Representing control in the presence of first-class continuations. In Proceedings of the ACM SIGPLAN '90 Conference on Programming Language Design and Implementation (PLDI), pages 66–77, White Plains, New York, June 1990.

[TC] Richard Kelsey and Paul Hudak. Realistic compilation by program transformation. In Proceedings of the 16th Annual ACM Symposium on Principles of Programming Languages, January 1989.

[Type1] David N. Turner, Philip Wadler and Christian Mossin. Once upon a type. In Proceedings of the 7th International Conference on Functional Programming Languages and Computer Architecture, San Diego, California, June 1995.

[Unfold] Josef Svenningsson. Shortcut fusion for accumulating parameters & zip-like functions. In Proceedings of the Seventh ACM SIGPLAN International Conference on Functional Programming (ICFP 2002), Pittsburgh, Pennsylvania, October 2002.

[UVE] Olin Shivers. Useless-variable elimination. In Proceedings of the Workshop on Static Analysis of Equational, Functional and Logic Programs, Université Bordeaux I, LaBRI, Bordeaux, France, October 1991.

Notes

{Note compose/push} The definition of compose/push is

    compose/push c1 c2 = λ u d . c1 u (λ x u′ . c2 (λ d′ . d′ x u′) d)

Note the one-shot, "pump-priming" up-continuation λ d′ . d′ x u′, which is used to hold onto c1's first generated value x until c2 has run forward far enough to enter its first get call.

{Note ThanksDana}

    Dana Scott is the Church of the Lattice-Way Saints.
        — A. J. Perlis [Perlis]

The type of an (α, β) Channel,

    datatype (α,β) Channel = Chan of (α * (β,α) Channel) cont;

has a slightly unsettling and exotic appearance. Typically, recursive datatype definitions, such as α list, have multiple arms:

    datatype α list = cons of α × α list | nil

One arm is recursive (cons of α × α list), and one serves as the "base case" (nil). But with channels, we have no base case, just the single recursive constructor Chan! One might wonder how it would be possible to construct such a value. Things work out, of course, in the same fashion that limits are well-defined when we define the denotational semantics of loops recursively: continuous lattices come to the rescue.
A. Polygon clipping and splitting

The Sutherland-Hodgman clipping algorithm [PolyClip] is shown in Figure 5. We trace around the periphery of the polygon, pumping its vertices through the clipper. When the first vertex arrives, the clipper saves it away as pt0. On subsequent arrivals, the clipper considers the edge ⟨prev, pt⟩, where prev is the previous point. If this edge crosses the plane, then the intersection of the edge and the plane is computed and sent downstream. The clipper then checks to see if the current point is on the proper (or visible) side of the plane; if so, it is sent downstream as well. That completes one iteration; we update the previous-point prev variable to be the current point and loop back to get the next point. When we run out of points, the clipper closes the polygon by processing the edge ⟨prev, pt0⟩. The picture in the figure shows an eight-vertex polygon being clipped to a plane; the dotted lines show the resulting clipped polygon.

[Figure 5. The Sutherland-Hodgman polygon-clipping transducer. (Flow chart: get pt; the first point is saved as pt0; each edge that crosses the plane puts the intersection downstream; a point on the positive side of the plane is put downstream; prev := pt; at end of input the closing edge back to pt0 is processed.)]

Note that this algorithm doesn't have a simple one-point-in/one-point-out structure. When we send a vertex downstream into the clipper, it, in turn, may send zero, one or two vertices downstream.

Structuring a polygon clipper as a transducer works well because we sometimes wish to clip a polygon to more than one plane. In the classic 3D rendering pipeline, before the perspective transformation is performed, we clip each polygon to six planes, which define the volume of space visible to the viewer: the so-called top, bottom, left, right, hither and yon planes, which, together, bound a truncated pyramid called the "viewing cone."

The top half of Figure 6 shows the clipping algorithm rendered as a transducer, in SML. It uses primitives crosses_plane and plane_intersect to determine if a line segment crosses a plane and, if so, to compute the intersection point. The function plane_sign produces a number which is positive if the point argument is on one side of the plane; negative if on the other side; and zero if directly on the plane. The code is essentially a direct translation of the flow-chart algorithm given in the previous figure. Vertices are sent to the clipper embedded in an option type; we close the polygon by sending a NONE value. After the clipper has closed the polygon (and passed the NONE token downstream itself), it resets and prepares to clip a new polygon.

We can assemble a six-plane viewing-space polygon clipper very easily, with the pipeline constructors from Figure 3. If we wish to clip a polygon stored as a list of vertices verts, we can do so with

    val c = clipper
    (listsource verts) ‘|> (c top) ‘>> (c bottom) ‘>> (c left) ‘>>
                           (c right) ‘>> (c hither) ‘>> (c yon) ‘>| listsink

To show that we are not limited to linear pipelines, the bottom half of Figure 6 presents an SML transducer that uses a plane to split a polygon, sending the vertices of the polygon on the positive side of the plane to down-channel pos, and the vertices of the polygon on the negative side of the plane to down-channel neg. It is a simple variation of the polygon clipper.
    fun clipper plane =
      let (* If (pt1,pt2) intersects plane, send intersection downstream. *)
          fun sendcross(pt1,pt2,outc) =
            if crosses_plane plane (pt1, pt2)
            then put(SOME(plane_intersect plane (pt1,pt2)), outc)
            else outc

          (* Send pt downstream if it's "visible" -- on non-neg side of plane. *)
          fun sendvispt(pt,outc) =
            if plane_sign plane pt >= 0.0
            then put(SOME pt, outc)
            else outc

          fun start(inc,outc) =
            case (get inc)
              of (NONE, inc) => close(inc,outc)       (* Zero vertices! *)
               | (SOME pt0, inc) =>
                   let fun lp(prev,inc,outc) =
                         let val outc = sendvispt(prev,outc)
                         in case (get inc)            (* Go upstream. *)
                              of (NONE,inc) =>
                                   let val outc = sendcross(prev,pt0,outc)
                                   in close(inc, outc) end
                               | (SOME pt, inc) =>
                                   let val outc = sendcross(prev,pt,outc)
                                   in lp(pt, inc, outc) end
                         end
                   in lp(pt0,inc,outc) end

          and close(inc, outc) = start(inc, put(NONE, outc))
      in start end

    fun splitter plane =
      let fun sendpt(pt,chan) = put(SOME pt, chan)

          fun sendcross(pt1,pt2,pos,neg) =
            if crosses_plane plane (pt1, pt2)
            then let val pt = plane_intersect plane (pt1,pt2)
                 in (sendpt(pt,pos), sendpt(pt,neg)) end
            else (pos,neg)

          (* Steer pt to the pos or neg consumer, depending on which side
           * of the plane it occurs. If it lies exactly on the plane,
           * send it to both. *)
          fun steerpt(pt,pos,neg) =
            let val s = plane_sign plane pt
                val neg = if s <= 0.0 then sendpt(pt,neg) else neg
                val pos = if s >= 0.0 then sendpt(pt,pos) else pos
            in (pos,neg) end

          fun start(inc,pos,neg) =
            case (get inc)
              of (NONE, inc) => close(inc,pos,neg)    (* Zero vertices! *)
               | (SOME pt0, inc) =>
                   let fun lp(prev,inc,pos,neg) =
                         let val (pos,neg) = steerpt(prev,pos,neg)
                         in case (get inc)
                              of (NONE,inc) =>
                                   let val (pos,neg) = sendcross(prev,pt0,pos,neg)
                                   in close(inc,pos,neg) end
                               | (SOME pt, inc) =>
                                   let val (pos,neg) = sendcross(prev,pt,pos,neg)
                                   in lp(pt, inc, pos, neg) end
                         end
                   in lp(pt0, inc, pos, neg) end

          and close(inc,pos,neg) = start(inc, put(NONE, pos), put(NONE,neg))
      in start end

Figure 6. The Sutherland-Hodgman polygon clipper and splitter as online transducers, in SML. Note that the splitter transducer takes two output channels.