arXiv:math/9205211v1 [math.HO] 1 May 1992

Two Notes on Notation
by Donald E. Knuth
Computer Science Department, Stanford University
Mathematical notation evolves like all languages do. As new experiments are made, we sometimes witness the survival of the fittest, sometimes the survival of the most familiar. A healthy
conservatism keeps things from changing too rapidly; a healthy radicalism keeps things in tune with
new theoretical emphases. Our mathematical language continues to improve, just as “the d-ism of
Leibniz overtook the dotage of Newton” in past centuries [4, Chapter 4].
In 1970 I began teaching a class at Stanford University entitled Concrete Mathematics. The
students and I studied how to manipulate formulas in continuous and discrete mathematics, and
the problems we investigated were often inspired by new developments in computer science. As the
years went by we began to see that a few changes in notational traditions would greatly facilitate
our work. The notes from that class have recently been published in a book [15], and as I wrote
the final drafts of that book I learned to my surprise that two of the notations we had been using
were considerably more useful than I had previously realized. The ideas “clicked” so well, in fact,
that I’ve decided to write this article, blatantly attempting to promote these notations among the
mathematicians who have no use for [15]. I hope that within five years everybody will be able to
use these notations in published papers without needing to explain what they mean.
The notations I’m talking about are (1) Iverson’s convention for characteristic functions; and
(2) the “right” notation for Stirling numbers, at last.
1. Iverson’s convention. The first notational development I want to discuss was introduced by
Kenneth E. Iverson in the early 60s, on page 11 of the pioneering book [21] that led to his well
known APL.
“If α and β are arbitrary entities and R is any relation defined on them, the relational
statement (αRβ) is a logical variable which is true (equal to 1) if and only if α stands in
the relation R to β. For example, if x is any real number, then the function
(x > 0) − (x < 0)
(commonly called the sign function or sgn x) assumes the values 1, 0, or −1 according as
x is strictly positive, 0, or strictly negative.”
When I read that, long ago, I found it mildly interesting but not especially significant. I began
using his convention informally but infrequently, in class discussions and in private notes. I allowed
it to slip, undefined, into an obscure corner of one of my books (see page 117 of [16]). But when
I prepared the final manuscript of [15], I began to notice that Iverson’s idea led to substantial
improvements in exposition and in technique.
This research was supported in part by National Science Foundation grant CCR-86-10181.
Before I can explain why the notation now works so well for me, I need to say a few words
about the manipulation of sums and summands. I realized long ago that “boundary conditions”
on indices of summation are often a handicap and a waste of time. Instead of writing
$$(1+z)^n = \sum_{k=0}^{n} \binom{n}{k} z^k , \tag{1.1}$$

it is much better to write
$$(1+z)^n = \sum_{k} \binom{n}{k} z^k ; \tag{1.2}$$

the sum now extends over all integers k, but only finitely many terms are nonzero. The second
formula (1.2) is instantly converted to other forms:
$$(1+z)^n = \sum_{k} \binom{n}{k} z^k = \sum_{k} \binom{n}{k+1} z^{k+1} = \sum_{k} \binom{n}{\lfloor n/2\rfloor - k} z^{\lfloor n/2\rfloor - k} ; \tag{1.3}$$
by contrast, we must work harder when dealing with (1.1), because we have to think about the
limits:
$$(1+z)^n = \sum_{k=0}^{n} \binom{n}{k} z^k = \sum_{k=-1}^{n-1} \binom{n}{k+1} z^{k+1} = \sum_{k=-\lceil n/2\rceil}^{\lfloor n/2\rfloor} \binom{n}{\lfloor n/2\rfloor - k} z^{\lfloor n/2\rfloor - k} . \tag{1.4}$$

Furthermore, (1.2) and (1.3) make sense also when n is not a positive integer.
Even when limits are necessary, it is best to keep them as simple as possible. For example, it’s
almost always a mistake to write
$$\sum_{k=2}^{n-1} k(k-1)(n-k) \qquad\text{instead of}\qquad \sum_{k=0}^{n} k(k-1)(n-k) ; \tag{1.5}$$

the additional zero terms are more helpful than harmful (and the former sum is problematical when
n = 0, 1, or 2).
Finally it dawned on me that Iverson’s convention allows us to write any sum as an infinite
sum without limits: If P(k) is any property of the integer k, we have
$$\sum_{P(k)} f(k) = \sum_{k} f(k)\,[P(k)] . \tag{1.6}$$

For example, the sums in (1.5) become
$$\sum_{k} k(k-1)(n-k)\,[0\le k\le n] = \sum_{k} k(k-1)(n-k)\,[k\ge 0]\,[k\le n] . \tag{1.7}$$

(At the time I made this observation, I had forgotten that Iverson originally defined his convention only for single relational operators enclosed in parentheses; I began to put arbitrary logical
statements in square brackets, and to assume that this would produce the value 0 or 1.) In this
particular case nothing much has been gained when passing from (1.5) to (1.7), although we might
be able to make use of identities like
$$k\,[k\ge 0] = k\,[k\ge 1] . \tag{1.8}$$
But in general, the ability to manipulate “on the line” instead of “below the line” turns out to be
a great advantage.
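The bracket convention carries over directly to a programming language. The following Python sketch is only an illustration: the helper names `bracket`, `f`, `tight_limits`, `simple_limits`, and `bracketed` are hypothetical, and the sum over all integers k in (1.7) is truncated to a finite window, which is harmless here because the bracket vanishes outside 0 ≤ k ≤ n.

```python
def bracket(statement):
    """Iverson bracket: 1 if the statement is true, 0 otherwise."""
    return 1 if statement else 0

def f(k, n):
    """The summand k(k-1)(n-k) of (1.5)."""
    return k * (k - 1) * (n - k)

def tight_limits(n):
    """Left-hand sum of (1.5): k runs from 2 to n-1."""
    return sum(f(k, n) for k in range(2, n))

def simple_limits(n):
    """Right-hand sum of (1.5): k runs from 0 to n."""
    return sum(f(k, n) for k in range(0, n + 1))

def bracketed(n, window=50):
    """Sum (1.7) over a finite window of integers (assumes 0 <= n <= window)."""
    return sum(f(k, n) * bracket(0 <= k <= n)
               for k in range(-window, window + 1))

for n in range(6):
    print(n, tight_limits(n), simple_limits(n), bracketed(n))
```

All three forms agree for every n in the window; only the bracketed form lets the condition be manipulated "on the line" like any other factor.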
For example, in my first book [25] I had found it necessary to include the rule
$$\sum_{k\in A} f(k) + \sum_{k\in B} f(k) = \sum_{k\in A\cup B} f(k) + \sum_{k\in A\cap B} f(k) \tag{1.9}$$
as a separate axiom for $\sum$ manipulation. But this axiom is unnecessary in [15], because it can be
derived easily from other basic laws: The left-hand side is
$$\sum_{k\in A} f(k) + \sum_{k\in B} f(k) = \sum_{k} f(k)\,[k\in A] + \sum_{k} f(k)\,[k\in B] = \sum_{k} f(k)\,\bigl([k\in A] + [k\in B]\bigr)$$

and the right-hand side is the same, because we have
$$[k\in A] + [k\in B] = [k\in A\cup B] + [k\in A\cap B] . \tag{1.10}$$

The interchange of summation order in multiple sums also comes out simpler now. I used to
have trouble understanding and/or explaining why
$$\sum_{j=1}^{n}\sum_{k=1}^{j} f(j,k) = \sum_{k=1}^{n}\sum_{j=k}^{n} f(j,k) ; \tag{1.11}$$

but now it’s easy for me to see that the left-hand sum is
$$\sum_{j,k} f(j,k)\,[1\le j\le n]\,[1\le k\le j] = \sum_{j,k} f(j,k)\,[1\le k\le j\le n] = \sum_{j,k} f(j,k)\,[1\le k\le n]\,[k\le j\le n] ,$$

and this is the right-hand sum.
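The interchange in (1.11) can be checked numerically. This is a small sketch under stated assumptions: the function names are hypothetical, and the unrestricted double sum over j and k is truncated to the window 0..n+1, outside of which the bracket is zero anyway.

```python
def bracket(statement):
    """Iverson bracket: 1 if the statement is true, 0 otherwise."""
    return 1 if statement else 0

def rows_first(f, n):
    """Left-hand side of (1.11): outer sum on j, inner sum on k <= j."""
    return sum(f(j, k) for j in range(1, n + 1) for k in range(1, j + 1))

def columns_first(f, n):
    """Right-hand side of (1.11): outer sum on k, inner sum on j >= k."""
    return sum(f(j, k) for k in range(1, n + 1) for j in range(k, n + 1))

def bracketed(f, n):
    """Middle form: one limit-free double sum with the bracket [1 <= k <= j <= n]."""
    window = range(0, n + 2)
    return sum(f(j, k) * bracket(1 <= k <= j <= n)
               for j in window for k in window)

def sample(j, k):
    """An arbitrary, asymmetric summand used only for the check."""
    return j * j + 3 * k + j * k

print(rows_first(sample, 5), columns_first(sample, 5), bracketed(sample, 5))
```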
Here’s another example: We have
$$[k \text{ even}] = \sum_{m} [k = 2m] \qquad\text{and}\qquad [k \text{ odd}] = \sum_{m} [k = 2m+1] ; \tag{1.12}$$

therefore
$$\begin{aligned}
\sum_{k} f(k) &= \sum_{k} f(k)\,\bigl([k \text{ even}] + [k \text{ odd}]\bigr) \\
&= \sum_{k,m} f(k)\,[k = 2m] + \sum_{k,m} f(k)\,[k = 2m+1] \\
&= \sum_{m} f(2m) + \sum_{m} f(2m+1) .
\end{aligned} \tag{1.13}$$

The result in (1.13) is hardly surprising; but I like to have mechanical operations like this available
so that I can do manipulations reliably, without thinking. Then I’m less apt to make mistakes.
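Let lg stand for logarithms to base 2. Then we have
$$\begin{aligned}
\sum_{k\ge 1} \binom{n}{\lfloor \lg k\rfloor}
&= \sum_{k\ge 1} \sum_{m} \binom{n}{m}\,[m = \lfloor \lg k\rfloor] \\
&= \sum_{k,m} \binom{n}{m}\,[m \le \lg k < m+1]\,[k\ge 1] \\
&= \sum_{m,k} \binom{n}{m}\,[2^m \le k < 2^{m+1}]\,[k\ge 1] \\
&= \sum_{m} \binom{n}{m}\,(2^{m+1} - 2^m)\,[m\ge 0] \\
&= \sum_{m} \binom{n}{m}\, 2^m = 3^n .
\end{aligned} \tag{1.14}$$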
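Identity (1.14) is easy to confirm by direct computation for small nonnegative integers n. The sketch below is illustrative; the names `lg_floor` and `lg_sum` are hypothetical. It relies on the fact that the binomial coefficient vanishes once ⌊lg k⌋ exceeds n, so only k < 2^(n+1) can contribute.

```python
from math import comb

def lg_floor(k):
    """Floor of the base-2 logarithm of a positive integer k."""
    return k.bit_length() - 1

def lg_sum(n):
    """Left-hand side of (1.14), restricted to the k that can contribute."""
    return sum(comb(n, lg_floor(k)) for k in range(1, 2 ** (n + 1)))

for n in range(6):
    print(n, lg_sum(n), 3 ** n)
```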

If we are doing infinite products we can use Iversonian brackets as exponents:
$$\prod_{P(k)} f(k) = \prod_{k} f(k)^{[P(k)]} . \tag{1.15}$$

For example, the largest squarefree divisor of n is
$$\prod_{p} p^{[p \text{ prime}]\,[p \text{ divides } n]} .$$
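The product for the largest squarefree divisor translates almost literally into code, with the brackets as exponents. This is a minimal sketch with hypothetical helper names; the product over all p is cut off at p = n, since [p divides n] is zero for every larger p when n is a positive integer.

```python
from math import isqrt

def bracket(statement):
    """Iverson bracket: 1 if the statement is true, 0 otherwise."""
    return 1 if statement else 0

def is_prime(p):
    """Trial division; adequate for the small values used here."""
    return p >= 2 and all(p % d != 0 for d in range(2, isqrt(p) + 1))

def largest_squarefree_divisor(n):
    """Product over p of p ** ([p prime] * [p divides n]), for n >= 1."""
    result = 1
    for p in range(1, n + 1):
        result *= p ** (bracket(is_prime(p)) * bracket(n % p == 0))
    return result

print(largest_squarefree_divisor(360))
```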

Everybody is familiar with one special case of an Iverson-like convention, the “Kronecker delta”
symbol
$$\delta_{ik} = \begin{cases} 1 , & i = k; \\ 0 , & i \ne k. \end{cases} \tag{1.16}$$
Leopold Kronecker introduced this notation in his work on bilinear forms [30, page 276] and in his
lectures on determinants (see [31, page 316]); it soon became widespread. Many of his followers
wrote $\delta^j_k$, which is a bit more ambiguous because it conflicts with ordinary exponentiation. I now
prefer to write [j = k] instead of $\delta_{jk}$, because Iverson’s convention is much more general. Although
‘[j = k]’ involves five written characters instead of the three in ‘$\delta_{jk}$’, we lose nothing in common
cases when ‘[j = k + 1]’ takes the place of ‘$\delta_{j(k+1)}$’.
Another familiar example of a 0–1 function, this time from continuous mathematics, is Oliver
Heaviside’s unit step function [x ≥ 0]. (See [44] and [37] for expositions of Heaviside’s methods.) It
is clear that Iverson’s convention will be as useful with integration as it is with summation, perhaps
even more so. I have not yet explored this in detail, because [15] deals mostly with sums.
It’s interesting to look back into the history of mathematics and see how there was a craving
for such notations before they existed. For example, an Italian count named Guglielmo Libri
published several papers in the 1830s concerning properties of the function $0^{0^x}$. He noted [32]
that $0^x$ is either 0 (if x > 0) or 1 (if x = 0) or ∞ (if x < 0), hence
$$0^{0^x} = [x > 0] . \tag{1.17}$$

But of course he didn’t have Iverson’s convention to work with; he was pleased to discover a way
to denote the discontinuous function [x > 0] without leaving the realm of operations acceptable in
his day. He believed that “la fonction $0^{0^{x-n}}$ est d’un grand usage dans l’analyse mathématique”
(the function $0^{0^{x-n}}$ is of great use in mathematical analysis).
And he noted in [33] that his formulas “ne renferment aucune notation nouvelle. . . . Les formules
qu’on obtient de cette manière sont très simples, et rentrent dans l’algèbre ordinaire” (they contain
no new notation; the formulas obtained in this way are very simple, and fall within ordinary algebra).
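Libri wrote, for example,
$$\bigl(1 - 0^{0^{-x}}\bigr)\bigl(1 - 0^{0^{x-a}}\bigr)$$
for the function [0 ≤ x ≤ a], and he gave the integral formula
$$\frac{2}{\pi}\int_0^\infty \frac{dq\,\cos qx}{1+q^2} = e^{x}\cdot 0^{0^{-x}} + e^{-x}\bigl(1 - 0^{0^{-x}}\bigr) = \frac{e^{x}}{0^{-x}+1} + \frac{e^{-x}}{0^{x}+1} .$$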

(Of course, we would now write the value of that integral as $e^{-|x|}$, but a simple notation for
absolute value wasn’t introduced until many years later. I believe that the first appearance of ‘|z|’
for absolute value in Crelle’s journal—the journal containing Libri’s papers [32] and [33]—occurred
on page 227 of [56] in 1881. Karl Weierstrass was the inventor of this notation, which was applied
at first only to complex numbers; Weierstrass seems to have published it first in 1876 [55].)
Libri applied his $0^{0^x}$ function to number theory by exhibiting a complicated way to describe
the fact that x is a divisor of m. In essence, he gave the following recursive formulation: Let
$P_0(x) = 1$ and for k > 0 let
$$P_k(x) = -\,0^{0^{x-k}} P_0(x) - 0^{0^{x-k+1}} P_1(x) - \cdots - 0^{0^{x-1}} P_{k-1}(x) .$$
Then the quantity
$$\frac{1 - m\cdot 0^{0^{x-m}} P_0(x) - (m-1)\, 0^{0^{x-m+1}} P_1(x) - \cdots - 2\cdot 0^{0^{x-2}} P_{m-2}(x) - 0^{0^{x-1}} P_{m-1}(x)}{x}$$

turns out to equal 1 if x divides m, otherwise it is 0. (One way to prove this, Iverson-wise, is
to replace $0^{0^{x-k}}$ in Libri’s formulas by [x > k], and to show first by induction that $P_k(x)$ =
[x divides k] − [x divides k − 1] for all k > 0. Then if $a_k(x) = k\,[x > k]$, we have
$$\begin{aligned}
\sum_{k=0}^{m-1} a_{m-k}(x) P_k(x) &= \sum_{k=0}^{m-1} a_{m-k}(x)\,\bigl([x \text{ divides } k] - [x \text{ divides } k-1]\bigr) \\
&= \sum_{k=0}^{m-1} [x \text{ divides } k]\,\bigl(a_{m-k}(x) - a_{m-k-1}(x)\bigr) .
\end{aligned}$$
If the positive integer x is not a divisor of m, the terms of this new sum are zero except when
m − k = m mod x, when we have $a_{m-k}(x) - a_{m-k-1}(x) = 1$. On the other hand if x is a divisor of m,
the only nonvanishing term occurs for m − k = x, when we have $a_{m-k}(x) - a_{m-k-1}(x) = 0 - (x-1)$.
Hence the sum is 1 − x [x divides m]. Libri obtained his complicated formula by a less direct method,
applying Newton’s identities to compute the sum of the mth powers of the roots of the equation
$t^{x-1} + t^{x-2} + \cdots + 1 = 0$.)
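Libri's construction can be tested mechanically once each $0^{0^{x-k}}$ is read as the bracket [x > k]. The sketch below rests on two assumptions that should be stated loudly: every term on the right of the recursion for P_k(x) is subtracted (which is what the induction claim P_k(x) = [x divides k] − [x divides k − 1] requires), and the final quantity is one minus the weighted sum, divided by x (which is what the value 1 − x [x divides m] of the sum requires). The function names are hypothetical.

```python
def bracket(statement):
    """Iverson bracket: 1 if the statement is true, 0 otherwise."""
    return 1 if statement else 0

def libri_P(x, upto):
    """Return [P_0(x), ..., P_upto(x)], reading 0^(0^(x-k)) as [x > k]."""
    P = [1]
    for k in range(1, upto + 1):
        # P_k = -[x > k] P_0 - [x > k-1] P_1 - ... - [x > 1] P_{k-1}
        P.append(-sum(bracket(x > k - j) * P[j] for j in range(k)))
    return P

def libri_weighted_sum(x, m):
    """Sum over 0 <= k < m of a_{m-k}(x) P_k(x), where a_j(x) = j [x > j]."""
    P = libri_P(x, m - 1)
    return sum((m - k) * bracket(x > m - k) * P[k] for k in range(m))

def libri_divides(x, m):
    """Libri's quantity (1 - weighted sum) / x for positive integers x and m."""
    return (1 - libri_weighted_sum(x, m)) // x

print([libri_divides(3, m) for m in range(1, 10)])
```

For positive integers x and m the numerator is always x [x divides m], so the integer division is exact.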
Evidently Libri’s main purpose was to show that unlikely functions can be expressed in algebraic terms, somewhat as we might wish to show that some complex functions can be computed
by a Turing Machine. “Give me the function $0^{0^x}$, and I’ll give you an expression for [x divides m].”
But our goal with Iverson’s notation is, by contrast, to find a simple and natural way to express
quantities that help us solve problems. If we need a function that is 1 if and only if x divides m,
we can now write [x divides m].
Some of Libri’s papers are still well remembered, but [32] and [33] are not. I found no mention
of them in Science Citation Index, after searching through all years of that index available in our
library (1955 to date). However, the paper [33] did produce several ripples in mathematical waters
when it originally appeared, because it stirred up a controversy about whether $0^0$ is defined. Most
mathematicians agreed that $0^0 = 1$, but Cauchy [5, page 70] had listed $0^0$ together with other
expressions like 0/0 and ∞ − ∞ in a table of undefined forms. Libri’s justification for the equation
$0^0 = 1$ was far from convincing, and a commentator who signed his name simply “S” rose to
the attack [45]. August Möbius [36] defended Libri, by presenting his former professor’s reason
for believing that $0^0 = 1$ (basically a proof that $\lim_{x\to 0+} x^x = 1$). Möbius also went further and
presented a supposed proof that $\lim_{x\to 0+} f(x)^{g(x)} = 1$ whenever $\lim_{x\to 0+} f(x) = \lim_{x\to 0+} g(x) = 0$.
Of course “S” then asked [3] whether Möbius knew about functions such as $f(x) = e^{-1/x}$ and
g(x) = x. (And paper [36] was quietly omitted from the historical record when the collected works
of Möbius were ultimately published.) The debate stopped there, apparently with the conclusion
that $0^0$ should be undefined.
But no, no, ten thousand times no! Anybody who wants the binomial theorem
$$(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \tag{1.18}$$
to hold for at least one nonnegative integer n must believe that $0^0 = 1$, for we can plug in x = 0
and y = 1 to get 1 on the left and $0^0$ on the right.
The number of mappings from the empty set to the empty set is $0^0$. It has to be 1.
On the other hand, Cauchy had good reason to consider $0^0$ as an undefined limiting form, in
the sense that the limiting value of $f(x)^{g(x)}$ is not known a priori when f(x) and g(x) approach 0
independently. In this much stronger sense, the value of $0^0$ is less defined than, say, the value of
0 + 0. Both Cauchy and Libri were right, but Libri and his defenders did not understand why truth
was on their side.
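Both halves of this conclusion can be seen in a few lines of Python, offered here only as an illustration with hypothetical function names. Python's integer power follows the convention 0^0 = 1 that the binomial theorem needs, while the limiting form f(x)^g(x) with f(x) = e^(−1/x) and g(x) = x stays at 1/e, not 1, however small x becomes.

```python
import math

def binomial_rhs(x, y, n):
    """Right-hand side of the binomial theorem (1.18) for integer arguments."""
    return sum(math.comb(n, k) * x ** k * y ** (n - k) for k in range(n + 1))

def limiting_form(x):
    """f(x) ** g(x) with f(x) = exp(-1/x) and g(x) = x, for x > 0."""
    return math.exp(-1.0 / x) ** x

# The discrete convention: the k = 0 term is 0 ** 0, which Python evaluates to 1.
print(0 ** 0, binomial_rhs(0, 1, 5))

# The limiting form: both f and g tend to 0, yet the value does not tend to 1.
for x in (0.5, 0.1, 0.01):
    print(x, limiting_form(x))
```

The argument x is kept moderately small because exp(−1/x) underflows to zero in floating point for much smaller x.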
Well, it’s instructive to study mathematical history and to observe how tastes change as
progress is made. But let’s come closer to the present, to see how Iverson’s convention might
be useful nowadays. Today’s mathematical literature is, in fact, filled with instances where analogs
of Iversonian brackets are being used—but the concepts must be expressed in a roundabout way,
because his convention is not yet established. Here are two examples that I happened to notice
just before writing this paper:
(1) Hardy and Wright, in the course of proving the Staudt-Clausen theorem about the denominators of Bernoulli numbers [20, § 7.9], consider the sum
$$\sum_{(p-1) \text{ divides } k} \frac{1}{p}$$
where p runs through primes. They define $\epsilon_k(p)$ to be 1 if p − 1 divides k, otherwise $\epsilon_k(p) = 0$;
then the sum becomes
$$\sum_{p} \frac{\epsilon_k(p)}{p} .$$
They proceed to show that $\sum_{m=1}^{p-1} m^k \equiv -\epsilon_k(p)$ (mod p) whenever p is prime, and the theorem
follows with a bit more manipulation.
(2) Mark Kac, introducing the relation of ergodic theory to continued fractions [24, § 5.4],
says: “Let now $P_0 \in \Omega$ and g(P) the characteristic function of the measurable set A; i.e.,
$$g(P) = \begin{cases} 1, & p \in A, \\ 0, & p \notin A. \end{cases}$$
It is now clear that $t(\tau, P_0, A)$ is given by the formula
$$t(\tau, P_0, A) = \int_0^{\tau} g\bigl(T_t(P_0)\bigr)\, dt ,$$
and . . . ”.
I hope it is now clear why my students and I would find it quite natural to say directly that
$$t(\tau, P_0, A) = \int_0^{\tau} [T_t(P_0) \in A]\, dt .$$
Also, in the context of Hardy and Wright, we would evaluate $\bigl(\sum_{m=1}^{p-1} m^k\bigr) \bmod p$ and discover
that it is (p − 1) [p − 1 divides k].
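The bracketed form of the Hardy and Wright power sum can be checked directly for small primes. In this sketch the names `power_sum_mod` and `bracket_form` are hypothetical; the built-in three-argument `pow` performs modular exponentiation.

```python
def bracket(statement):
    """Iverson bracket: 1 if the statement is true, 0 otherwise."""
    return 1 if statement else 0

def power_sum_mod(p, k):
    """(1^k + 2^k + ... + (p-1)^k) mod p."""
    return sum(pow(m, k, p) for m in range(1, p)) % p

def bracket_form(p, k):
    """(p - 1) [p - 1 divides k]."""
    return (p - 1) * bracket(k % (p - 1) == 0)

for k in range(1, 13):
    print(k, power_sum_mod(7, k), bracket_form(7, k))
```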

If you are a typical hard-working, conscientious mathematician, interested in clear exposition
and sound reasoning—and I like to include myself as a member of that set—then your experiences
with Iverson’s convention may well go through several stages, just as mine did. First, I learned about
the idea, and it certainly seemed straightforward enough. Second, I decided to use it informally
while solving problems. At this stage it seemed too easy to write just ‘[k ≥ 0]’; my natural tendency
was to write something like ‘δ(k ≥ 0)’, giving an implicit bow to Kronecker, or ‘τ (k ≥ 0)’ where
τ stands for truth. Adriano Garsia, similarly, decided to write ‘χ(k ≥ 0)’, knowing that χ often
denotes a characteristic function; he has used χ notation effectively in dozens of papers, beginning
with [10], and quite a few other mathematicians have begun to follow his lead. (Garsia was one of
my professors in graduate school, and I recently showed him the first draft of this note. He replied,
“My definition from the very start was
$$\chi(A) = \begin{cases} 1 & \text{if } A \text{ is true} \\ 0 & \text{if } A \text{ is false} \end{cases}$$
where A is any statement whatever. But just like you, I got it by generalizing from Iverson’s APL.
. . . I don’t have to tell you the magic that the use of the χ notation can do.”)
If you go through the stages I did, however, you’ll soon tire of writing δ, τ , or χ, when you
recognize that the notation is quite unambiguous without an additional symbol. Then you will
have arrived at the philosophical position adopted by Iverson when he wrote [21]. And I had also
reached that stage when I completed the first edition of [15]; I adopted Iverson’s original suggestion
to enclose logical statements in ordinary parentheses, not square brackets.
Unfortunately, not all was well with that first edition. Students found cases where I had
parenthesized a complicated logical statement for clarity, for example when I wrote something of
the form ‘α and (β or γ)’; they pointed out that the simple act of putting parentheses around ‘β
or γ’ automatically caused it to be evaluated as either 0 or 1, according to a strict interpretation
of Iverson’s rule as I had extended it.
Worse yet, as I began to read the first edition of [15] with fresh eyes, I found that the formulas
involved too many parentheses. It was hard for me to perceive the structure of complex expressions
that involved Iversonian statements; the statements had been clear to me when I wrote them down,
but they looked confusing when I came back to them several months later. A computer could readily
parse each expression, but good notation must be engineered for human beings.
Therefore in the second and subsequent printings of [15], my co-authors and I now use square
brackets instead of parentheses, whenever we wish to transform logical statements into the values 0
or 1. This resolves both problems, and we now believe that the notation has proved itself well
enough to be thrust upon the world. Square brackets are used also for other purposes, but not in
a conflicting way, and not so often that the multiple uses become confusing.
One small glitch remains: We want to be able to write things like
$$\sum_{p} [p \text{ prime}]\,[p \le x]/p \tag{1.19}$$

to denote the sum of all reciprocals of primes ≤ x. But this summand unfortunately reduces to 0/0
when p = 0. In general, when an Iverson-bracketed statement is false, we want it to evaluate into
a “very strong 0,” namely a zero so strong that it annihilates anything it is multiplied by—even if
that other factor is undefined.

Similarly, in formulas like (1.2) it is convenient to regard $\binom{n}{k}$ as strongly zero when k is negative,
so that, for example, $\binom{n}{-10} z^{-10} = 0$ when z = 0.
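The difference between an ordinary zero and a "very strong 0" shows up immediately in code. The following is a hedged sketch with hypothetical names: `weak_zero_sum` treats the brackets of (1.19) as ordinary factors, so the summand is still evaluated at p = 0, whereas `strong_zero_sum` lets a false bracket suppress the term before 1/p is ever formed. Exact rationals from the standard `fractions` module are used so that the comparison involves no rounding.

```python
from fractions import Fraction

def bracket(statement):
    """Iverson bracket: 1 if the statement is true, 0 otherwise."""
    return 1 if statement else 0

def is_prime(p):
    """Trial division; adequate for small p."""
    return p >= 2 and all(p % d != 0 for d in range(2, p))

def weak_zero_sum(x):
    """Literal product form: raises ZeroDivisionError at p = 0."""
    return sum(bracket(is_prime(p)) * bracket(p <= x) * Fraction(1, p)
               for p in range(0, x + 1))

def strong_zero_sum(x):
    """Strong-zero form: the condition filters terms before they are evaluated."""
    return sum(Fraction(1, p) for p in range(0, x + 1)
               if is_prime(p) and p <= x)

print(strong_zero_sum(10))
try:
    weak_zero_sum(10)
except ZeroDivisionError:
    print("ordinary zero is not strong enough at p = 0")
```

Using the bracket as a filter, or equivalently short-circuiting the multiplication, is one way to realize the strong zero; it does not address the "strong 1" cases mentioned in the text.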

The strong-zero convention is enough to handle 99% of the difficult situations, but we may also
be using 1 − [P (k)] to stand for the quantity [not P (k)]; then we want [P (k)] to give a “strong 1.”
And paradoxes can still arise, whenever irresistible forces meet immovable objects. (What happens
if a strong zero appears in the denominator? And so on.)
In spite of these potential problems in extreme cases, Iverson’s convention works beautifully in
the vast majority of applications. It is, in fact, far less dangerous than most of the other notations
of mathematics, whose dark corners we have learned to avoid long ago. The safe use of Iverson’s
simple and convenient idea is quite easy to learn.
2. Stirling numbers. The second plea I wish to make for perspicuous notation concerns the
famous coefficients introduced by James Stirling at the beginning of his Methodus Differentialis in
1730 [52]. The lack of a widely accepted way to refer to these numbers has become almost scandalous. For example, Goldberg, Newman, and Haynsworth begin their chapter on Combinatorial
Analysis in the NBS Handbook [1] by remarking that notations for Stirling numbers “have never
been standardized . . . We feel that a capital S is natural for Stirling numbers of the first kind; it is
infrequently used for other notation in this context. But once it is used we have difficulty finding
a suitable symbol for Stirling numbers of the second kind. The numbers are sufficiently important
to warrant a special and easily recognizable symbol, and yet that symbol must be easy to write.
We have settled on a script capital S without any certainty that we have settled this question
permanently.”
The present predicament came about because Stirling numbers are indeed important enough
to have arisen in a wide variety of applications, yet they are not quite important enough to have
deserved a prominent place in the most influential textbooks of mathematics. Therefore they have
been rediscovered many times, and each author has chosen a notation that was optimized for one
particular application.
The great utility of Stirling numbers has become clearer and clearer with time, and mathematicians have now reached a stage where we can intelligently choose a notation that will serve us
well in the whole range of applications.
I came into the picture rather late, having never heard of Stirling numbers until after receiving my Ph.D. in mathematics. But I soon encountered them as I was beginning to analyze
the performance of algorithms and to write the manuscript for my books on The Art of Computer Programming. I quickly realized the truth of Imanuel Marx’s comment that “these numbers
have similarities with the binomial coefficients $\binom{n}{k}$; indeed, formulas similar to those known for
the binomial coefficients are easily established” [35]. In order to emphasize those similarities and
to facilitate pattern recognition when manipulating formulas, Marx recommended using bracket
symbols ${n \brack k}$ for Stirling numbers of the first kind and brace symbols ${n \brace k}$ for Stirling numbers of
the second kind. A similar proposal was being made at about the same time in Italy by Antonio
Salmeri [46].
I was strongly motivated by Charles Jordan’s book, Calculus of Finite Differences [23], which
introduced me to the important analogies between sums of factorial powers and integrals of ordinary
powers. But I kept getting mixed up when I tried to use Stirling numbers as he defined them,
because half of his “first kind” numbers were negative and the other half were positive. I had
similar problems with Marx’s suggestions in [35]; he made all Stirling numbers of the first kind
positive, but then he attached a minus sign to half the numbers of the second kind. I decided that
I’d never be able to keep my head above water unless I worked with Stirling numbers that were
entirely signless.
And I soon learned that the signless Stirling numbers have important combinatorial significance. So I decided to try a definition that combined the best qualities of the other notations
I’d seen; I defined the quantities ${n \brack k}$ and ${n \brace k}$ as follows:
${n \brack k}$ = the number of permutations of n objects having k cycles;
${n \brace k}$ = the number of partitions of n objects into k nonempty subsets.
For example, ${4 \brack 2} = 11$, because there are eleven different ways to arrange four elements into two
cycles:
[1, 2, 3] [4]    [1, 2, 4] [3]    [1, 3, 4] [2]    [2, 3, 4] [1]
[1, 3, 2] [4]    [1, 4, 2] [3]    [1, 4, 3] [2]    [2, 4, 3] [1]
[1, 2] [3, 4]    [1, 3] [2, 4]    [1, 4] [2, 3].
And ${4 \brace 2} = 7$, because the partitions of {1, 2, 3, 4} into two subsets are
{1, 2, 3}{4}    {1, 2, 4}{3}    {1, 3, 4}{2}    {2, 3, 4}{1}
{1, 2}{3, 4}    {1, 3}{2, 4}    {1, 4}{2, 3}.

Notice that this notation is mnemonic: The meaning of ${n \brace k}$ is easily remembered, because braces
{ } are commonly used to denote sets and subsets. We could also adopt the convention of writing
cycles in brackets, as in my examples above, where [1, 2, 3] = [2, 3, 1] = [3, 1, 2] is a typical three-cycle;
that would make the notation ${n \brack k}$ equally mnemonic. But I don’t insist on this.
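The two combinatorial definitions can be confirmed by brute-force enumeration. This sketch is purely illustrative: the function names are hypothetical, and the enumeration is exponential, so it is only suitable for very small n.

```python
from itertools import permutations

def count_cycles(perm):
    """Number of cycles of a permutation given as a tuple mapping i to perm[i]."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            i = start
            while i not in seen:
                seen.add(i)
                i = perm[i]
    return cycles

def cycle_count_by_enumeration(n, k):
    """Number of permutations of n objects having exactly k cycles."""
    return sum(1 for p in permutations(range(n)) if count_cycles(p) == k)

def set_partitions(items):
    """Yield every partition of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

def subset_count_by_enumeration(n, k):
    """Number of partitions of {1, ..., n} into exactly k nonempty subsets."""
    return sum(1 for part in set_partitions(list(range(1, n + 1)))
               if len(part) == k)

print(cycle_count_by_enumeration(4, 2), subset_count_by_enumeration(4, 2))
```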
 

I have never decided how to pronounce ‘${n \brack k}$’ and ‘${n \brace k}$’ when I’m reading formulas aloud in
class. Many people have begun to verbalize ‘$\binom{n}{k}$’ as “n choose k”; hence I’ve been saying “n cycle k”
for ${n \brack k}$ and “n subset k” for ${n \brace k}$. But I have also caught myself calling them “n bracket k” and
“n brace k.”
One of the advantages of these notational conventions is that binomial coefficients and Stirling
numbers can be defined by very simple recurrence relations having a nice pattern:
$$\binom{n+1}{k} = \binom{n}{k} + \binom{n}{k-1} ; \tag{2.1}$$
$${n+1 \brack k} = n {n \brack k} + {n \brack k-1} ; \tag{2.2}$$
$${n+1 \brace k} = k {n \brace k} + {n \brace k-1} . \tag{2.3}$$
Moreover—and this is extremely important—these identities hold for all integers n and k, whether
positive, negative, or zero. Therefore we can apply them in the midst of any formula (for example,
to “absorb” an n or a k that appears in the context $n {n \brack k}$ or $k {n \brace k}$), without worrying about
exceptional circumstances of any kind.
I introduced these notations in the first edition of my first book [25], and by now my students
and I have accumulated some 25 years of experience with them; the conventions have served us well.
However, such brackets and braces have still not become widely enough adopted that they could be
considered “standard.” For example, Stanley’s magnificent book on Enumerative Combinatorics
[51] uses c(n, k) for ${n \brack k}$ and S(n, k) for ${n \brace k}$. His notation conveys combinatorial significance, but
it fails to suggest the analogies to binomial coefficients that prove helpful in manipulations. Such
analogies were evidently not important enough in his mind to warrant an extravagant two-line
notation, although he does use $\left(\!\binom{n}{k}\!\right)$ to denote $\binom{n+k-1}{k} = (-1)^k \binom{-n}{k}$, the number of combinations
with repetitions permitted. In a sense, Stanley’s $\left(\!\binom{n}{k}\!\right)$ is a signless version of the numbers $\binom{-n}{k}$.
When I wrote Concrete Mathematics in 1988, I explored Stirling numbers more carefully than
I had ever done before, and I learned two things that really clinch the argument for ${n \brack k}$ and ${n \brace k}$ as
the best possible Stirling number notations. Ron Graham sent me a preview copy of a memorandum
by B. F. Logan [34], which presented a number of interesting connections between Stirling numbers
and other mathematical quantities. One of the first things that caught my attention was Logan’s
Table 1, a two-dimensional array that contained the numbers ${n \brack k}$ and ${n \brace k}$ simultaneously, implying
that there really is only one “kind” of Stirling number. Indeed, when I translated Logan’s results
into my own favorite notation, I was astonished to find that his arrangement of numbers was
equivalent to a beautiful and easily remembered law of duality,
$${n \brace k} = {-k \brack -n} . \tag{2.4}$$

Once I had this clue, it was easy to check that the recurrence relations (2.2) and (2.3) are equivalent
to each other. And the boundary conditions
$${0 \brack k} = {0 \brace k} = [k = 0] \qquad\text{and}\qquad {n \brack 0} = {n \brace 0} = [n = 0] \tag{2.5}$$

yield unique solutions to (2.2) and (2.3) for all integers k and n, when we run the recurrences
forward and backward; the “negative” region for Stirling numbers of one kind turns out to contain
precisely the numbers of the other kind. For example, the following subset of Logan’s table gives
the values of ${n \brack k}$ when |n| and |k| are at most 4:
          k = −4   k = −3   k = −2   k = −1   k = 0   k = 1   k = 2   k = 3   k = 4
n = −4       1        0        0        0       0       0       0       0       0
n = −3       6        1        0        0       0       0       0       0       0
n = −2       7        3        1        0       0       0       0       0       0
n = −1       1        1        1        1       0       0       0       0       0
n = 0        0        0        0        0       1       0       0       0       0
n = 1        0        0        0        0       0       1       0       0       0
n = 2        0        0        0        0       0       1       1       0       0
n = 3        0        0        0        0       0       2       3       1       0
n = 4        0        0        0        0       0       6      11       6       1
The reflection of this matrix about a 45° diagonal gives the values of ${n \brace k} = {-k \brack -n}$.
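The table and the duality law can be reproduced by running the recurrences in both directions. The sketch below is one possible arrangement, not necessarily how Logan's table was computed, and the names `cycle` and `subset` are hypothetical stand-ins for the bracket and brace symbols. For positive indices it uses (2.2) and (2.3) forward; for negative indices it uses the same recurrences solved for the term with lower index k; the mixed-sign region is taken to be zero, as in the table above.

```python
from functools import lru_cache

def bracket(statement):
    """Iverson bracket: 1 if the statement is true, 0 otherwise."""
    return 1 if statement else 0

@lru_cache(maxsize=None)
def cycle(n, k):
    """Stirling cycle numbers for all integers n and k."""
    if n == 0 or k == 0:
        return bracket(n == 0 and k == 0)            # boundary conditions (2.5)
    if n > 0 and k > 0:
        return (n - 1) * cycle(n - 1, k) + cycle(n - 1, k - 1)   # (2.2) forward
    if n < 0 and k < 0:
        # (2.2) with k replaced by k+1, solved for the (n, k) entry
        return cycle(n + 1, k + 1) - n * cycle(n, k + 1)
    return 0                                         # mixed signs: zero in the table

@lru_cache(maxsize=None)
def subset(n, k):
    """Stirling subset numbers for all integers n and k."""
    if n == 0 or k == 0:
        return bracket(n == 0 and k == 0)            # boundary conditions (2.5)
    if n > 0 and k > 0:
        return k * subset(n - 1, k) + subset(n - 1, k - 1)       # (2.3) forward
    if n < 0 and k < 0:
        # (2.3) with k replaced by k+1, solved for the (n, k) entry
        return subset(n + 1, k + 1) - (k + 1) * subset(n, k + 1)
    return 0                                         # mixed signs: zero in the table

for n in range(-4, 5):
    print([cycle(n, k) for k in range(-4, 5)])
```

Each function is computed from its own recurrence only, so the agreement of `subset(n, k)` with `cycle(-k, -n)` is a genuine check of the duality law and not built in by definition.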

Naturally I wondered how I could have been working with Stirling numbers for so many years
without having been aware of such a basic fact. Surely it must have been known before? After
several hours of searching in the library, I learned that identity (2.4) had indeed been known,
but largely forgotten by succeeding generations of mathematicians, primarily because previous
notations for Stirling numbers made it impossible to state the identity in such a memorable form.
These investigations also turned up several things about the history of Stirling numbers that I had
not previously realized.
During the nineteenth century, Stirling’s connection with these numbers had been almost
entirely forgotten. The numbers themselves were studied, in the role of “sums of products of
combinations of the numbers {1, 2, . . . , n} taken k at a time.” Let $C_k(n)$ and $\Gamma_k(n)$ denote those
sums, when the combinations are respectively without or with repetitions; thus, for example,
$$C_3(4) = 1\cdot2\cdot3 + 1\cdot2\cdot4 + 1\cdot3\cdot4 + 2\cdot3\cdot4 = 50 ;$$
$$\Gamma_3(3) = 1\cdot1\cdot1 + 1\cdot1\cdot2 + 1\cdot1\cdot3 + 1\cdot2\cdot2 + 1\cdot2\cdot3 + 1\cdot3\cdot3 + 2\cdot2\cdot2 + 2\cdot2\cdot3 + 2\cdot3\cdot3 + 3\cdot3\cdot3 = 90 .$$
It turns out that
$$C_k(n) = {n+1 \brack n+1-k} \qquad\text{and}\qquad \Gamma_k(n) = {n+k \brace n} . \tag{2.6}$$
Christian Kramp [28] proved near the end of the eighteenth century that
$$C_k(n) = \sum \binom{n+1}{k+l} \frac{(k+l)!}{j_1!\,2^{j_1}\, j_2!\,3^{j_2}\, j_3!\,4^{j_3} \ldots} , \tag{2.7}$$
$$\Gamma_k(n) = \sum \binom{n+k}{k+l} \frac{(k+l)!}{j_1!\,2!^{j_1}\, j_2!\,3!^{j_2}\, j_3!\,4!^{j_3} \ldots} , \tag{2.8}$$

where the sums are over all sequences of nonnegative integers $\langle j_1, j_2, j_3, \ldots\rangle$ such that we have
$j_1 + 2j_2 + 3j_3 + \cdots = k$ (i.e., over all partitions of k), and where $l = j_1 + j_2 + j_3 + \cdots$. For example,








$$C_2(n) = \binom{n+1}{4}\frac{4!}{8} + \binom{n+1}{3}\frac{3!}{3} ; \qquad \Gamma_2(n) = \binom{n+2}{4}\frac{4!}{8} + \binom{n+2}{3}\frac{3!}{6} .$$
Notice that $C_k(n)$ and $\Gamma_k(n)$ are polynomials in n, of degree 2k. The duality law (2.4) and the
notational transformations of (2.6) are equivalent to the amazing polynomial identity
$$C_k(n-1) = \Gamma_k(-n) ; \tag{2.9}$$
but hardly anybody was aware of this surprising fact, otherwise we would almost certainly find it
mentioned explicitly in the comprehensive surveys compiled in the 1890s [19, 38].
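The correspondence between the sums of products and the Stirling numbers can be verified by brute force for small arguments. This is a sketch with hypothetical names (`C`, `Gamma`, `cycle`, `subset`), restricted to nonnegative indices; it checks the two identities relating the sums to bracket and brace numbers, but not the polynomial identity (2.9), which would require extending C and Γ to negative arguments.

```python
from functools import lru_cache
from itertools import combinations, combinations_with_replacement
from math import prod

def C(k, n):
    """Sum of products of k distinct numbers chosen from {1, ..., n}."""
    return sum(prod(c) for c in combinations(range(1, n + 1), k))

def Gamma(k, n):
    """Sum of products of k numbers chosen from {1, ..., n}, repetitions allowed."""
    return sum(prod(c) for c in combinations_with_replacement(range(1, n + 1), k))

@lru_cache(maxsize=None)
def cycle(n, k):
    """Stirling cycle numbers for n, k >= 0, by recurrence (2.2)."""
    if n == 0 or k == 0:
        return 1 if n == k else 0
    return (n - 1) * cycle(n - 1, k) + cycle(n - 1, k - 1)

@lru_cache(maxsize=None)
def subset(n, k):
    """Stirling subset numbers for n, k >= 0, by recurrence (2.3)."""
    if n == 0 or k == 0:
        return 1 if n == k else 0
    return k * subset(n - 1, k) + subset(n - 1, k - 1)

print(C(3, 4), cycle(5, 2))
print(Gamma(3, 3), subset(6, 3))
```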
On the other hand, a rereading of Stirling’s original treatment [52] makes it clear that Stirling
himself would not have found the duality law (2.4) at all surprising. From the very beginning, he
thought of the numbers as two triangles hooked together in tandem. Indeed, his entire motivation
for studying them was the general identity
$$z^n = \sum_{k} {n \brace k} z^{\underline{k}} , \tag{2.10}$$
which expresses ordinary powers in terms of falling factorial powers. When n is positive, the nonzero
terms in this sum occur for positive values of k ≤ n; but when n is negative, the nonzero terms
occur for negative k ≤ n. Stirling presented his tables by displaying ${n \brace k}$ with k as the row index
and ${n \brack k}$ with k as the column index; thus, he visualized a tandem arrangement exactly as in the
matrix of numbers above, with each column containing a sequence of coefficients for (2.10).
I need to digress a bit about factorial powers. If n is a positive integer and z is a complex
number, I like to write
$$z^{\underline{n}} = z(z-1)\ldots(z-n+1) , \tag{2.11}$$
which I call “z to the n falling,” and
$$z^{\overline{n}} = z(z+1)\ldots(z+n-1) , \tag{2.12}$$
which is “z to the n rising.” More generally, if α is any complex number, factorial powers are
defined by
$$z^{\underline{\alpha}} = z!/(z-\alpha)! \qquad\text{and}\qquad z^{\overline{\alpha}} = \Gamma(z+\alpha)/\Gamma(z) , \tag{2.13}$$
unless these formulas reduce to ∞/∞ (when limiting values are used). My use of underlined and
overlined exponents is still controversial, but I cannot resist mentioning a curious fact: Many people
(e.g., specialists in hypergeometric series) have become accustomed to the notation $(z)_n$ for rising
factorial powers, while many other people (e.g., statisticians) use the same notation for falling
powers. The curious fact is that this notation is called “Pochhammer’s symbol,” but Pochhammer
himself [43] used $(z)_n$ to stand for the binomial coefficient $\binom{z}{n}$. I prefer the underline/overline
notation because it is unambiguous and mnemonic, especially when I’m doing work that involves
factorial powers of both kinds. (Moreover, I know that $z^{\underline{n}}$ and $z^{\overline{n}}$ are easy to typeset, using macros
available in the file gkpmac.tex in the standard UNIX distribution of TeX.)
In the special case n = 3, Stirling’s formula (2.10) gives
$$z^3 = {3 \brace 3} z^{\underline{3}} + {3 \brace 2} z^{\underline{2}} + {3 \brace 1} z^{\underline{1}} = z(z-1)(z-2) + 3z(z-1) + z .$$
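And in the special case n = −1, it reduces to the infinite sum
$$\begin{aligned}
\frac{1}{z} &= \sum_{k} {-1 \brace k} z^{\underline{k}} = \sum_{k} {k \brack 1} z^{\underline{-k}} \\
&= \frac{0!}{z+1} + \frac{1!}{(z+1)(z+2)} + \frac{2!}{(z+1)(z+2)(z+3)} + \cdots ,
\end{aligned} \tag{2.14}$$
because
$${n \brack 1} = (n-1)!\,[n > 0] . \tag{2.15}$$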

Stirling did not discuss convergence; he was, after all, writing in 1730. We have the partial sum
$$\frac{1}{z} = \sum_{k=1}^{n} \frac{(k-1)!}{(z+1)\ldots(z+k)} + \frac{n!}{z(z+1)\ldots(z+n)} ;$$
this is a special case of the general identity
$$\frac{1}{z} = \sum_{k=1}^{n} \frac{z_1 \ldots z_{k-1}}{(z+z_1)\ldots(z+z_k)} + \frac{z_1 \ldots z_n}{z(z+z_1)\ldots(z+z_n)} \tag{2.16}$$

discovered by François Nicole [39] a few years before Stirling’s treatise appeared. Therefore the
infinite series (2.14) converges if and only if Re(z) > 0. By induction on n, the same condition is
13

necessary and sufficient for (2.10) when n is any negative integer. See [41, § 30] for further discussion

of (2.10).
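Nicole’s identity (2.16) is purely algebraic, so it can be confirmed with exact rational arithmetic. This is a small sketch under that assumption; `nicole_rhs` is an illustrative name, and the inputs must avoid z = 0 and z = −z_k so that no denominator vanishes.

```python
from fractions import Fraction

def nicole_rhs(z, zs):
    """Right-hand side of (2.16) for z and the sequence zs = [z_1, ..., z_n]."""
    z = Fraction(z)
    total = Fraction(0)
    numer = Fraction(1)      # running product z_1 ... z_{k-1}
    denom = Fraction(1)      # running product (z + z_1) ... (z + z_k)
    for zk in zs:
        denom *= z + zk
        total += numer / denom
        numer *= zk
    # remainder term z_1 ... z_n / (z (z + z_1) ... (z + z_n))
    return total + numer / (z * denom)

print(nicole_rhs(3, [1, 2, 3, 4]))                          # 1/3
print(nicole_rhs(Fraction(5, 2), [7, -1, Fraction(1, 3)]))  # 2/5
```

Taking z_k = k recovers the partial sum of Stirling’s series (2.14), whose remainder term is n!/(z(z+1)…(z+n)).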
 
We noted above that the numbers ${m \brack k}$ can be regarded as sums of products of combinations.
The first identity in (2.6) is equivalent to the formula
$$ z^{\overline{n}} = \sum_k {n \brack k} z^k , \tag{2.17} $$
when n is a nonnegative integer, if we expand the product $z^{\overline{n}}$ and sum the coefficients of each power
of z. Similarly, we have
$$ z^{\underline{n}} = \sum_k {n \brack k} (-1)^{n-k} z^k . \tag{2.18} $$

These equations are valid also when n is a negative integer; in that case both infinite series converge
for |z| > |n|. Notice that (2.10) and (2.18) tell us how to convert back and forth between ordinary
powers and factorial powers.
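For nonnegative n, the conversions (2.17) and (2.18) can be tested directly. The sketch below is illustrative only: it generates the cycle numbers from the standard recurrence ${n \brack k} = (n-1){n-1 \brack k} + {n-1 \brack k-1}$, which is assumed here rather than quoted from this section, and the function names are hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def cycle(n, k):
    """Stirling cycle number [n brack k] for integers n, k >= 0."""
    if n == 0 or k == 0:
        return int(n == k)
    return (n - 1) * cycle(n - 1, k) + cycle(n - 1, k - 1)

def rising(z, n):
    result = 1
    for j in range(n):
        result *= z + j
    return result

def falling(z, n):
    result = 1
    for j in range(n):
        result *= z - j
    return result

def rising_from_powers(z, n):
    """Right-hand side of (2.17)."""
    return sum(cycle(n, k) * z ** k for k in range(n + 1))

def falling_from_powers(z, n):
    """Right-hand side of (2.18)."""
    return sum(cycle(n, k) * (-1) ** (n - k) * z ** k for k in range(n + 1))

print([cycle(4, k) for k in range(5)])          # [0, 6, 11, 6, 1]
print(rising_from_powers(2, 4), rising(2, 4))   # 120 120
```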
Let’s turn now to the nineteenth century. Kramp [29] decided to explore a slightly generalized
type of factorial power, for which he used the notations

$$ a^{n|r} = a(a+r)\ldots\bigl(a+(n-1)r\bigr) \tag{2.19} $$
$$ a^{-n|r} = \frac{1}{(a-r)(a-2r)\ldots(a-nr)} \tag{2.20} $$
when n is a positive integer. Then he considered the expansion
$$ a^{n|r} = a^n + n\,\mathfrak{k}\,1.\;a^{n-1} r + n\,\mathfrak{k}\,2.\;a^{n-2} r^2 + \cdots , \tag{2.21} $$
where the coefficients $n\,\mathfrak{k}\,m$ are independent of a and r [29, §§539–540]; thus, $n\,\mathfrak{k}\,m$ was his notation
for ${n \brack n-m}$. He obtained [29, § 557] a series of formulas equivalent to
$$ m {n \brack n-m} = \sum_{k=0}^{m-1} \binom{n-k}{m+1-k} {n \brack n-k} , \tag{2.22} $$
thereby giving a new proof that ${n \brack n-m}$ is a polynomial in n of degree 2m. This proof, independent
of his earlier formulas (2.7) and (2.8), works for both positive and negative values of n.
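Kramp’s relation (2.22) can be spot-checked for nonnegative integers. The following sketch is an illustration only; it assumes the standard recurrence for the cycle numbers and restricts attention to 0 ≤ m ≤ n, and the names `kramp_lhs` and `kramp_rhs` are hypothetical.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def cycle(n, k):
    """Stirling cycle number [n brack k] for integers n, k >= 0."""
    if n == 0 or k == 0:
        return int(n == k)
    return (n - 1) * cycle(n - 1, k) + cycle(n - 1, k - 1)

def kramp_lhs(n, m):
    """Left-hand side of (2.22): m [n brack n-m]."""
    return m * cycle(n, n - m)

def kramp_rhs(n, m):
    """Right-hand side of (2.22): sum for 0 <= k < m of C(n-k, m+1-k) [n brack n-k]."""
    return sum(comb(n - k, m + 1 - k) * cycle(n, n - k) for k in range(m))

print(kramp_lhs(4, 2), kramp_rhs(4, 2))   # 22 22
print(kramp_lhs(5, 3), kramp_rhs(5, 3))   # 150 150
```

Because the right-hand side involves only the coefficients with smaller k, the relation determines each ${n \brack n-m}$ from its predecessors, which is how it yields the polynomial property.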
Kramp implicitly understood the duality principle (2.4), in the sense that he regarded the
coefficients ${n \brack k}$ and ${n \brace k}$ as the positive and negative portions of a doubly infinite array of numbers.
In fact, he assumed that equation (2.21) would hold for arbitrary real values of n. He differentiated
$a^{x|r}$ with respect to x and gave formal derivations of several interesting series. However, his
expansion (2.21) is equivalent to
$$ z^{\overline{n}} = \sum_k {n \brack n-k} z^{n-k} , \tag{2.23} $$
a slight variation of (2.17), and this series is not always convergent for noninteger n. We can
show, for example, that
$$ {1/2 \brack 1/2-k} > k!/7^k \qquad\text{for infinitely many } k ; \tag{2.24} $$
hence (2.23) diverges for all z when n = 1/2. Kramp lived before the days when convergence of
infinite series was understood. (See [29, § 574], where he says that the divergent series $\sum_{k>0} B_k y^k/k$
is “très convergente pour peu que y soit une petite fraction”!)
Several other nineteenth-century authors developed the theory of factorial powers, notably
Andreas von Ettingshausen [6], Ludwig Schläfli [41, 48], and Oskar Schlömilch [49], who used the
respective notations
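$$ \overset{n}{\mathcal{F}}_m , \qquad \overset{n}{\mathcal{A}}_m , \qquad\text{and}\qquad \overset{n}{C}_m $$
for the coefficients ${n \brack n-m}$. All of these authors considered both positive and negative integers n.
Thus, for example, Ettingshausen’s notation for a Stirling number such as ${n+m \brace n} = {-n \brack -n-m}$
was $\overset{-n}{\mathcal{F}}_m$ (see [6, § 151]).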
Incidentally, these works of Kramp and Ettingshausen proved to be important in the history of
mathematical notations. Kramp’s book introduced the notation n! for factorials [29, pages V and
219], and Ettingshausen’s book introduced the notation $\binom{n}{k}$ for binomial coefficients [6, page 30].
Ettingshausen wrote his book shortly after Fourier [8] had invented $\sum$-notation for sums; Ettingshausen
tried a German variation, writing $\overset{k}{\underset{a,b}{\mathcal{S}}}$ for what has evolved into $\sum_{k=a}^{b}$. He also wrote
$(a, r)^n$ for Kramp’s $a^{n|r}$; thus, for example, Ettingshausen [6, § 153 and § 156] gave the equations
$$ (a,d)^n = \overset{w}{\underset{0}{\mathcal{S}}}\; \overset{n}{\mathcal{F}}_w\, a^{n-w} d^w \qquad\text{and}\qquad a^n = \overset{r}{\underset{0}{\mathcal{S}}}\; (-1)^r\, \overset{-n+r}{\mathcal{F}}_r\, (a,d)^{n-r} d^r $$
as equivalents of Kramp’s (2.21) and Stirling’s (2.10). He presented Kramp’s (2.22) in the form
$$ v\, \overset{n}{\mathcal{F}}_v = \overset{w}{\underset{0,v-1}{\mathcal{S}}} \binom{n-w}{v+1-w} \overset{n}{\mathcal{F}}_w , $$

and remarked [6, § 154] that this holds for both negative and positive n. Ettingshausen had
related the F coefficients to sums of products of combinations with and without repetition; thus he
implicitly confirmed (2.9).
The first person to attach Stirling’s name to the numbers we now call Stirling numbers was
Niels Nielsen in 1904 [40]; he said that this new nomenclature had been suggested to him by T. N.
Thiele. (The numbers may have been studied before Stirling’s time; for example, I once found the
values of ${n \brack k}$ for 1 ≤ n ≤ 7 in some unpublished manuscripts of Thomas Harriot, dating from about
1600, in the British Museum [26, page 241]. But Stirling almost surely deserves the credit for being
first to deduce nontrivial facts about ${n \brack k}$ and ${n \brace k}$.)
Nielsen wrote $C_n^k$ for ${n \brack n-k}$, which he called a “Stirling number of rank n”; and he wrote $\mathfrak{C}_n^k$ for
${n+k-1 \brace n-1}$, which he called a “Stirling number of rank −n.” (He should really have defined its rank
to be 1−n). In equation (41) of his paper, Nielsen obtained a rigorous proof of the duality law (2.4);
but he had to state it in a peculiar way, because he had defined $C_n^k$ and $\mathfrak{C}_n^k$ only for nonnegative n
and k. Thus, he could not write $C_n^k = \mathfrak{C}_{1-n}^k$; he had to say instead that $f_k(n) = g_k(1-n)$, where
$f_k(n)$ and $g_k(n)$ were the polynomials defined by $C_n^k$ and $\mathfrak{C}_n^k$. Tweedie [54] expressed (2.4) with
similar circumlocutions.
 

When Jordan took up Stirling numbers [22], he wrote $S_n^k$ for $(-1)^{n-k}{n \brack k}$ and $\mathfrak{S}_n^k$ for ${n \brace k}$. He
does not seem to have known the duality law (2.4), probably because he had learned about Stirling
numbers from Nielsen’s book [41], which omitted some of the details in Nielsen’s paper [40]. And
as far as I know, the duality law largely disappeared from mathematicians’ collective consciousness
during most of the twentieth century; it seems to have been mentioned explicitly only in a few
scattered places: (1) Hansraj Gupta, “working in a small township away from what was then the
only University in the Panjab” [18, page 5], rediscovered Stirling numbers and Stirling duality by
himself, in the early 1930s. This became part of his Ph.D. dissertation [17], and he included it in
a book on number theory prepared many years later [18, Chapter 5]. (2) H. W. Gould [12] was
probably the first twentieth-century mathematician to observe that we can use the polynomials
${n \brack n-k}$ and ${n \brace n-k}$ to extend the domain of Stirling numbers to negative values of n. Gould’s way of
writing (2.4) was $S_1(-n-1, k) = S_2(n, k)$; and shortly thereafter [13], he mentioned the equivalent
formula
$$ S_{-k}^{-n} = (-1)^{n-k}\, \mathfrak{S}_n^k , $$
in Jordan’s notation. (3) R. V. Parker [42], like Gupta, displayed both of Stirling’s triangles
in tandem, presenting them in a single table as Logan later did. (4) In 1976, Ira Gessel and
Richard Stanley investigated some of the deeper structure underlying the Stirling polynomials
$f_k(n) = {n+k \brace n}$ and $g_k(n) = {n \brack n-k}$. They noted in particular [11, equation (3)] that $f_k(-n) = g_k(n)$.
This fact is equivalent to the duality law (2.4).
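The relation $f_k(-n) = g_k(n)$ can be tested numerically by extending $f_k$ to negative arguments through polynomial interpolation. The sketch below is an illustration under two assumptions that are not spelled out in this passage: the standard recurrences for both kinds of Stirling numbers, and the fact that $f_k$ has degree 2k, so that its values at 0, 1, …, 2k determine it. All function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def subset(n, k):
    """Stirling subset number {n brace k} for n, k >= 0."""
    if n == 0 or k == 0:
        return int(n == k)
    return k * subset(n - 1, k) + subset(n - 1, k - 1)

@lru_cache(maxsize=None)
def cycle(n, k):
    """Stirling cycle number [n brack k] for n >= 0; zero when k < 0."""
    if n == 0 or k == 0:
        return int(n == k)
    return (n - 1) * cycle(n - 1, k) + cycle(n - 1, k - 1)

def interpolate(values, x):
    """Value at x of the polynomial of degree < len(values) through (i, values[i])."""
    total = Fraction(0)
    for i, vi in enumerate(values):
        term = Fraction(vi)
        for j in range(len(values)):
            if j != i:
                term *= Fraction(x - j, i - j)
        total += term
    return total

def f(k, x):
    """f_k(x) = {x+k brace x}, extended to any integer x from its values at 0..2k."""
    return interpolate([subset(i + k, i) for i in range(2 * k + 1)], x)

def g(k, n):
    """g_k(n) = [n brack n-k] for integers n >= 0."""
    return cycle(n, n - k)

print(f(2, -3), g(2, 3))   # 2 2
```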
Stanley had discovered a beautiful theorem in his Ph.D. thesis a few years earlier [50, Proposition 13.2(i)], now called the reciprocity theorem for order polynomials: If P is any finite partially
ordered set, let $\Omega(P, n)$ be the number of order-preserving mappings from P into the totally ordered
set {1, 2, . . . , n}; and let $\overline{\Omega}(P, n)$ be the number of such mappings that are strictly order-preserving.
Thus, if x ≺ y in P, the mappings f enumerated by $\Omega(P, n)$ must satisfy f(x) ≤ f(y), and the mappings g enumerated by $\overline{\Omega}(P, n)$ must satisfy g(x) < g(y). Stanley’s theorem states that, in general,
we have $\Omega(P, -n) = (-1)^p\, \overline{\Omega}(P, n)$, where p is the number of elements of P. For example, if P consists of
p isolated points with no order constraints whatever, we have $\Omega(P, n) = \overline{\Omega}(P, n) = n^p$. And if the
points of P are themselves totally ordered, then $\Omega(P, n)$ is $\binom{n+p-1}{p}$, the number of combinations
of n things p at a time with repetitions permitted, and $\overline{\Omega}(P, n)$ is $\binom{n}{p}$, the combinations without
repetition. In both cases we have $\Omega(P, -n) = (-1)^p\, \overline{\Omega}(P, n)$.
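The totally ordered example can be verified with a generalized binomial coefficient that accepts negative upper arguments. This is a minimal sketch; the helper `binom` and the two wrapper names are illustrative, and Python’s built-in `math.comb` is not used because it rejects negative arguments.

```python
from math import factorial

def binom(r, k):
    """Generalized binomial coefficient r(r-1)...(r-k+1)/k! for integer r and k >= 0."""
    numer = 1
    for j in range(k):
        numer *= r - j
    return numer // factorial(k)   # exact: k consecutive integers are divisible by k!

def omega_chain(p, n):
    """Order polynomial of a p-element chain: combinations with repetition."""
    return binom(n + p - 1, p)

def omega_chain_strict(p, n):
    """Strict order polynomial of a p-element chain: combinations without repetition."""
    return binom(n, p)

print(omega_chain(3, -4), (-1) ** 3 * omega_chain_strict(3, 4))   # -4 -4
```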
I showed Stanley the first draft of this note and asked him whether the Stirling duality law
(2.4) could be derived as a special case of his general reciprocity law. Sure enough, he replied that
Gessel had noticed a simple way to do exactly that, shortly after the paper [11] was written. Let
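$P_k$ be the partial order on 2k points typified by
$P_4$ = [Hasse diagram not recoverable from the extracted text; in terms of the sums that follow, a chain $x_1 \prec \cdots \prec x_k$ with one further point $y_j \prec x_j$ below each $x_j$] ;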

then
$$ \begin{aligned} \Omega(P_k, n) &= \sum_{1\le x_1,\ldots,x_k,y_1,\ldots,y_k\le n} [x_1\le\cdots\le x_k]\,[x_1\ge y_1]\ldots[x_k\ge y_k] \\ &= \sum_{1\le x_1,\ldots,x_k\le n} [x_1\le\cdots\le x_k]\; x_1\ldots x_k , \end{aligned} $$
and
$$ \begin{aligned} \overline{\Omega}(P_k, n) &= \sum_{1\le x_1,\ldots,x_k,y_1,\ldots,y_k\le n} [x_1<\cdots<x_k]\,[x_1>y_1]\ldots[x_k>y_k] \\ &= \sum_{2\le x_1,\ldots,x_k\le n} [x_1<\cdots<x_k]\,(x_1-1)\ldots(x_k-1) \\ &= \sum_{1\le x_1,\ldots,x_k\le n-1} [x_1<\cdots<x_k]\; x_1\ldots x_k . \end{aligned} $$
Thus the sums are respectively $\Gamma_k(n)$ and $C_k(n-1)$; by (2.6) we have $\Omega(P_k, n) = {n+k \brace n}$ and
$\overline{\Omega}(P_k, n) = {n \brack n-k}$, hence (2.4) is indeed an instance of Stanley’s theorem.
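For small k and n the two order polynomials of $P_k$ can be counted by brute force and compared with the Stirling numbers. The sketch below is an illustration, not Gessel’s argument itself: it enumerates all assignments of values in {1, …, n} to $x_1,\ldots,x_k,y_1,\ldots,y_k$, applies exactly the bracketed conditions of the two sums, and assumes the standard recurrences for the Stirling numbers. The function names are hypothetical.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def subset(n, k):
    """Stirling subset number {n brace k} for n, k >= 0."""
    if n == 0 or k == 0:
        return int(n == k)
    return k * subset(n - 1, k) + subset(n - 1, k - 1)

@lru_cache(maxsize=None)
def cycle(n, k):
    """Stirling cycle number [n brack k] for n >= 0; zero when k < 0."""
    if n == 0 or k == 0:
        return int(n == k)
    return (n - 1) * cycle(n - 1, k) + cycle(n - 1, k - 1)

def omega(k, n, strict=False):
    """Count maps of P_k into {1..n} that are (strictly) order-preserving."""
    count = 0
    for values in product(range(1, n + 1), repeat=2 * k):
        xs, ys = values[:k], values[k:]
        if strict:
            chain_ok = all(a < b for a, b in zip(xs, xs[1:]))
            below_ok = all(y < x for x, y in zip(xs, ys))
        else:
            chain_ok = all(a <= b for a, b in zip(xs, xs[1:]))
            below_ok = all(y <= x for x, y in zip(xs, ys))
        count += chain_ok and below_ok
    return count

print(omega(2, 3), subset(5, 3))               # 25 25
print(omega(2, 3, strict=True), cycle(3, 1))   # 2 2
```

The enumeration costs n^{2k} steps, so it is only practical as a sanity check for very small cases.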
 
Now we are ready to discuss the second reason why I became convinced that ${n \brack k}$ is the right
symbolism for these coefficients after I had translated Logan’s memo [34] into that notation: We
know that ${n \brack n-k}$ is a polynomial in n, when k is an integer; hence, as Kramp knew, we can sensibly
define the quantity ${\alpha \brack \alpha-k}$ for arbitrary complex α and integer k, using that same polynomial.
Then—and here comes the punch line—Logan noticed that the fundamental equations (2.17) and
(2.18) generalize to asymptotic formulas, valid for arbitrary exponents α: If z → ∞ and if m is any
nonnegative integer, we have

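$$ z^{\overline{\alpha}} = \sum_{k=0}^{m} {\alpha \brack \alpha-k} z^{\alpha-k} + O(z^{\alpha-m-1}) ; \tag{2.25} $$
$$ z^{\underline{\alpha}} = \sum_{k=0}^{m} {\alpha \brack \alpha-k} (-1)^k z^{\alpha-k} + O(z^{\alpha-m-1}) . \tag{2.26} $$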

(See [15, exercise 9.44]; equation (2.25) is a correct way to formulate Kramp’s divergent series (2.23).
These equations are special cases of a still more general result proved by Tricomi and Erdélyi [53, 9].)
The easily remembered expansions in (2.25) and (2.26) were quite a revelation to me. I had often
spent time laboriously calculating approximations to ratios such as $z^{\overline{1/2}} = \Gamma(z + 1/2)/\Gamma(z)$, the
hard way: I took logarithms, then used Stirling’s approximation, and then took exponentials. But
equations (2.25) and (2.26) produce the answer directly.
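As a concrete illustration of (2.25), the sketch below approximates Γ(z + 1/2)/Γ(z) for real z. It is not Logan’s or the note’s own computation: the coefficients ${\alpha \brack \alpha-k}$ are obtained by interpolating the degree-2k polynomial through its integer values at 0, 1, …, 2k (using the polynomial property stated above and the standard recurrence for the cycle numbers, both taken as assumptions), and the function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import exp, lgamma

@lru_cache(maxsize=None)
def cycle(n, k):
    """Stirling cycle number [n brack k] for n >= 0; zero when k < 0."""
    if n == 0 or k == 0:
        return int(n == k)
    return (n - 1) * cycle(n - 1, k) + cycle(n - 1, k - 1)

def cycle_poly(alpha, k):
    """[alpha brack alpha-k]: the degree-2k polynomial agreeing with cycle(n, n-k) at n = 0..2k."""
    nodes = range(2 * k + 1)
    total = Fraction(0)
    for i in nodes:
        term = Fraction(cycle(i, i - k))
        for j in nodes:
            if j != i:
                term *= Fraction(alpha - j, i - j)
        total += term
    return total

def rising_asymptotic(z, alpha, m):
    """First m+1 terms of (2.25), for real z > 0."""
    return sum(float(cycle_poly(alpha, k)) * z ** (float(alpha) - k) for k in range(m + 1))

def rising_exact(z, alpha):
    """Gamma(z+alpha)/Gamma(z) for real z, z+alpha > 0, computed through logarithms."""
    return exp(lgamma(z + float(alpha)) - lgamma(z))

half = Fraction(1, 2)
print(cycle_poly(half, 1), cycle_poly(half, 2))   # -1/8 1/128
print(rising_asymptotic(100.0, half, 2), rising_exact(100.0, half))
```

With α = 1/2 the first three terms give $z^{1/2}\bigl(1 - \frac{1}{8z} + \frac{1}{128z^2}\bigr)$, and the two printed values agree to within the $O(z^{-5/2})$ error predicted by (2.25).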
Moreover Stirling’s original identity (2.10) can be generalized in a similar way: If α is any
complex number, we have
$$ z^{\alpha} = \sum_k {\alpha \brace \alpha-k} z^{\underline{\alpha-k}} , \qquad \mathrm{Re}(z) > 0 . \tag{2.27} $$

When I wrote the first draft of this note, I knew only that the series (2.27) was convergent, and that
it was asymptotically correct as z → ∞; so I conjectured that equality might hold. Soon afterward,
B. F. Logan found the following proof (although he naturally stated it in his own notation): Suppose
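first that Re(α) < 1. Then we have the well known identity
$$ z^{\alpha-1} = \frac{1}{\Gamma(1-\alpha)} \int_0^\infty e^{-zt}\, t^{-\alpha}\, dt , \qquad \mathrm{Re}(z) > 0 , \tag{2.28} $$
and we can substitute $e^{-t} = 1 - u$ to get
$$ z^{\alpha-1} = \frac{1}{\Gamma(1-\alpha)} \int_0^1 (1-u)^{z-1}\, u^{-\alpha} \Bigl( \frac{1}{u} \ln \frac{1}{1-u} \Bigr)^{-\alpha} du . $$
Now it turns out that the powers of $\frac{1}{u} \ln \frac{1}{1-u}$ generate the Stirling numbers ${\alpha \brace \alpha-k} = {k-\alpha \brack -\alpha}$, in the
sense that
$$ \Bigl( \frac{1}{u} \ln \frac{1}{1-u} \Bigr)^{-\alpha} = \sum_k {\alpha \brace \alpha-k} \frac{u^k}{(k-\alpha)\ldots(1-\alpha)} , \tag{2.29} $$
a series that converges for |u| < 1 (see [15, equations (6.45), (6.53), (7.50)]). Therefore
$$ \begin{aligned} z^{\alpha} &= \sum_k {\alpha \brace \alpha-k} \frac{z}{\Gamma(k+1-\alpha)} \int_0^1 (1-u)^{z-1}\, u^{k-\alpha}\, du \\ &= \sum_k {\alpha \brace \alpha-k} \frac{\Gamma(z+1)}{\Gamma(z+1+k-\alpha)} = \sum_k {\alpha \brace \alpha-k} \frac{z!}{(z+k-\alpha)!} , \end{aligned} $$
and (2.27) is verified when Re(α) < 1. To complete the proof, we need only show that (2.27) holds
for α + 1 if it holds for α; but this is easy, because
$$ \begin{aligned} z^{\alpha+1} &= \sum_k {\alpha \brace \alpha-k}\, z \cdot z^{\underline{\alpha-k}} \\ &= \sum_k {\alpha \brace \alpha-k} \bigl( z^{\underline{\alpha+1-k}} + (\alpha-k)\, z^{\underline{\alpha-k}} \bigr) \\ &= \sum_k {\alpha \brace \alpha-k} z^{\underline{\alpha+1-k}} + \sum_k {\alpha \brace \alpha+1-k} (\alpha+1-k)\, z^{\underline{\alpha+1-k}} \\ &= \sum_k {\alpha+1 \brace \alpha+1-k} z^{\underline{\alpha+1-k}} \end{aligned} $$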

by the basic recurrence equation (2.3).
Notice that in all of the general identities (2.25)–(2.27), as in the original formulas (2.10),
(2.17), and (2.18) that inspired them, the lower index within the braces or brackets is the same
as the exponent of z. This makes the relations easy to remember, by analogy with the binomial
theorem
$$ (1+z)^{\alpha} = \sum_k \binom{\alpha}{k} z^k , \qquad\text{when } |z| < 1 . \tag{2.30} $$

Some readers will have been thinking, “This all looks fairly plausible, but unfortunately Knuth
is overlooking a key point that ruins the whole proposal: We can’t use the notation ${n \brack k}$ for Stirling
numbers, because it has already been used for more than a century as the standard notation for
Gauss’s generalized binomial coefficients.”
Well, there is a down side to every good idea, but this objection is not really severe. For
one thing, the standard notation for Gaussian binomial coefficients involves a hidden parameter q,
and it’s not unusual for modern researchers to make transformations that change q. Therefore
Gauss’s notation is incomplete, and Andrews (for example) has used the notation ${n \brack k}_{q^2}$ for the
Gaussian coefficient with $q^2$ as the hidden parameter [2, page 49]. Such examples suggest that it is
appropriate to denote Gaussian binomials as $\binom{n}{k}_q$, especially since they reduce to ordinary binomials
when q = 1. This notation also generalizes nicely to such things as Fibonomial coefficients $\binom{n}{k}_F$;
see [27]. We can then reserve the notation ${n \brack k}_q$ for a q-generalization of ${n \brack k}$. (The reverse strategy
was unfortunately adopted in [14].) Secondly, I do not believe that any existing mathematical
works, including books like [2] which use Gaussian coefficients extensively, would become seriously
cluttered if the Gaussian ${n \brack k}$ were changed everywhere to $\binom{n}{k}_q$. Even so, such changes are not
necessary; there is obviously no harm in beginning a mathematical paper or a book chapter or an
entire book with a statement to the effect that “${n \brack k}$ will denote a Gaussian binomial coefficient
with parameter q in what follows.” All notation can be redefined for special purposes. Therefore
Stirling number enthusiasts are not encroaching on Gaussian territory when they write ${n \brack k}$, if they
also mumble something about Stirling in order to set the context.

 
One further point is worth noting in conclusion: As soon as the notations ${n \brace k}$ and/or ${n \brack k}$ are
adopted, there will no longer be a need to speak about Stirling numbers “of the first and second
kind,” except as a concession to history. Nielsen wrote a superb book [41], but he did the world a
disservice by originating the Erster Art and Zweiter Art terminology, because that terminology has
no mnemonic value and is historically inaccurate. Stirling introduced the numbers ${n \brace k}$ first and
brought in ${n \brack k}$ second. Indeed, practical applications have always tended to involve the numbers
${n \brace k}$ much more often than their ${n \brack k}$ counterparts. It seems far better to speak of ${n \brace k}$ as a Stirling
subset number, and to call ${n \brack k}$ a Stirling cycle number. Then the names are tied to intuitive,
student-friendly concepts, not to arbitrary and offputting concepts of the kth kind.
Acknowledgments. I am extremely grateful for comments received from John Ewing, Philippe
Flajolet, Adriano Garsia, B. F. Logan, Andrew Odlyzko, Richard Stanley, and H. S. Wilf, without
which these notes would have been substantially poorer.
References
[1] Milton Abramowitz and Irene A. Stegun, editors, Handbook of Mathematical Functions (U.S.
National Bureau of Standards, 1964).
[2] George E. Andrews, The Theory of Partitions, Encyclopedia of Mathematics and its Applications, volume 2 (Reading, Mass.: Addison–Wesley, 1976).
[3] Anonymous and S . . . , “Bemerkungen zu den Aufsatze überschrieben, ‘Beweis der Gleichung
$0^0 = 1$ nach J. F. Pfaff,’ im zweiten Hefte dieses Bandes, S. 134,” Journal für die reine und
angewandte Mathematik 12 (1834), 292–294.
[4] Charles Babbage, Passages from the Life of a Philosopher (London, 1864). Reprinted in
Charles Babbage and his Calculating Engines, edited by Philip Morrison and Emily Morrison
(New York: Dover, 1961).
[5] Augustin-Louis Cauchy, Cours d’Analyse de l’Ecole Royale Polytechnique (1821). In his
Œuvres Complètes, series 2, volume 3.

[6] Andreas v. Ettingshausen, Die combinatorische Analysis (Vienna, 1826).
[7] Philippe Flajolet and Andrew Odlyzko, “Singularity analysis of generating functions,” SIAM
Journal on Discrete Mathematics 3 (1990), 216–240.
[8] J. Fourier, “Refroidissement séculaire du globe terrestre,” Bulletin des Sciences par la Société
philomathique de Paris, series 3, 7 (1820), 58–70. Reprinted in Œuvres de Fourier, volume 2,
271–288.
[9] C. L. Frenzen, “Error bounds for asymptotic expansions of the ratio of two gamma functions,”
SIAM Journal on Mathematical Analysis 18 (1987), 890–896.
[10] Adriano M. Garsia, “On the ‘maj’ and ‘inv’ q-analogues of Eulerian polynomials,” Linear and
Multilinear Algebra 8 (1979), 21–34.
[11] Ira Gessel and Richard P. Stanley, “Stirling polynomials,” Journal of Combinatorial Theory
A24 (1978), 24–33.
[12] H. W. Gould, “Stirling number representation problems,” Proceedings of the American Mathematical Society 11 (1960), 447–451. For subsequent work, see his review of [42] in Mathematical Reviews 49 (1975), 885–886.
[13] H. W. Gould, “Note on a paper of Klamkin concerning Stirling numbers,” This Monthly 68
(1961), 477–479.
[14] H. W. Gould, “The q–Stirling numbers of first and second kinds,” Duke Mathematical Journal
28 (1961), 281–289.
[15] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik, Concrete Mathematics (Reading,
Mass.: Addison–Wesley, 1989).
[16] Daniel H. Greene and Donald E. Knuth, Mathematics for the Analysis of Algorithms, second
edition (Boston: Birkhäuser, 1981). Third edition, 1990.
[17] H. Gupta, Symmetric Functions in the Theory of Integral Numbers, Lucknow University Studies 14 (Allahabad: Allahabad Law Journal Press, 1940).
[18] Hansraj Gupta, Selected Topics in Number Theory (Tunbridge Wells, England: Abacus Press,
1980).
[19] Johann G. Hagen, Synopsis der Höheren Mathematik 1 (Berlin, 1891).
[20] G. H. Hardy and E. M. Wright, An Introduction to the Theory of Numbers (Oxford, Clarendon
Press, 1938). Fifth edition, 1979.
[21] Kenneth E. Iverson, A Programming Language (New York: Wiley, 1962).
[22] Charles Jordan, “On Stirling’s Numbers,” Tôhoku Mathematical Journal 37 (1933), 254–278.
[23] Charles Jordan, Calculus of Finite Differences (Budapest, 1939). Third edition, 1965.
[24] Mark Kac, Statistical Independence in Probability, Analysis and Number Theory, Carus Mathematical Monographs 12 (Mathematical Association of America, 1959).
[25] Donald E. Knuth, Fundamental Algorithms (Reading, Mass.: Addison–Wesley, 1968).
[26] Donald E. Knuth, review of History of Binary and Other Nondecimal Numeration by Anton
Glaser, Historia Mathematica 10 (1983), 236–243.
[27] Donald E. Knuth and Herbert S. Wilf, “The power of a prime that divides a generalized
binomial coefficient,” Journal für die reine und angewandte Mathematik 396 (1989), 212–219.
[28] Christian Kramp, “Coefficient des allgemeinen Gliedes jeder willkührlichen Potenz eines Infinitinomiums; Verhalten zwischen Coefficienten der Gleichungen und Summen der Produkte
und der Potenzen ihrer Wurzeln; Transformation und Substitution der Reihen durch einander,”
in Der polynomische Lehrsatz, edited by Carl Friedrich Hindenburg (Leipzig, 1796), 91–122.
[29] C. Kramp, Élémens d’arithmétique universelle (Cologne, 1808).
[30] Leopold Kronecker, “Ueber bilineare Formen,” Journal für die reine und angewandte Mathematik 68 (1868), 273–285.
[31] Leopold Kronecker, Vorlesungen über die Theorie der Determinanten, edited by Kurt Hensel,
volume 1 (Leipzig: Teubner, 1903).
[32] Guillaume Libri, “Note sur les valeurs de la fonction $0^{0^x}$,” Journal für die reine und angewandte
Mathematik 6 (1830), 67–72.
[33] Guillaume Libri, “Mémoire sur les fonctions discontinues,” Journal für die reine und angewandte Mathematik 10 (1833), 303–316.
[34] B. F. Logan, “Polynomials related to the Stirling numbers,” AT&T Bell Labs internal technical
memorandum, August 10, 1987.
[35] Imanuel Marx, “Transformation of series by a variant of Stirling numbers,” This Monthly
69 (1962), 530–532. His ${n \brack k}$ is my ${n+1 \brack k+1}$; his ${n \brace k}$ is my $(-1)^{n-k}{n+1 \brace k+1}$.
[36] A. F. Möbius, “Beweis der Gleichung $0^0 = 1$, nach J. F. Pfaff,” Journal für die reine und
angewandte Mathematik 12 (1834), 134–136.
[37] Douglas H. Moore, Heaviside Operational Calculus: An Elementary Foundation (New York:
American Elsevier, 1971).
[38] Eugen Netto, Lehrbuch der Combinatorik (Leipzig, 1901). Second edition, with additions by
Thoralf Skolem and Viggo Brun, 1927.
[39] Nicole, “Méthode pour sommer une infinité de Suites nouvelles, dont on ne peut trouver les
Sommes par les Méthodes connuës,” Mémoires de l’Academie Royale des Sciences (Paris, 1727),
257–268.
[40] Niels Nielsen, “Recherches sur les polynomes et les nombres de Stirling,” Annali di Matematica
pura ed applicata, series 3, 10 (1904), 287–318.
[41] Niels Nielsen, Handbuch der Theorie der Gammafunktion (Leipzig: Teubner, 1906).
[42] R. V. Parker, “The complete polynomial grid,” Matematichki Vesnik 10 (25) (1973), 181–203.
[43] L. Pochhammer, “Ueber hypergeometrische Functionen nter Ordnung,” Journal für die reine
und angewandte Mathematik 71 (1870), 316–352.
[44] Hillel Poritsky, “Heaviside’s operational calculus—its applications and foundations,” This
Monthly, 43 (1936), 331–344.
[45] S . . . , “Sur la valeur de $0^0$,” Journal für die reine und angewandte Mathematik 11 (1834),
272–273.
[46] Antonio Salmeri, “Introduzione alla teoria dei coefficienti fattoriali,” Giornale di Matematiche
di Battaglini 90 (1962), 44–54. His ${n \brack k}$ is my ${n+1 \brack n+1-k}$.
[47] Schlaeffli, “Sur les coëfficients du développement du produit $1.(1+x)(1+2x)\ldots\bigl(1+(n-1)x\bigr)$
suivant les puissances ascendantes de x,” Journal für die reine und angewandte Mathematik
43 (1852), 1–22.
[48] Schläffli, “Ergänzung der Abhandlung über die Entwickelung des Products $1.(1+x)(1+2x)(1+3x)\ldots\bigl(1+(n-1)x\bigr) = \overset{n}{\Pi}(x)$ in Band XLIII dieses Journals,” Journal für die reine und
angewandte Mathematik 67 (1867), 179–182.
[49] O. Schlömilch, “Recherches sur les coefficients des facultés analytiques,” Journal für die reine
und angewandte Mathematik 44 (1852), 344–355.
[50] Richard P. Stanley, Ordered Structures and Partitions, Memoirs of the American Mathematical
Society 119 (1972).
[51] Richard P. Stanley, Enumerative Combinatorics, volume 1 (Belmont, Calif.: Wadsworth, 1986).
[52] James Stirling, Methodus Differentialis (London, 1730). English translation, The Differential
Method, 1749.
[53] F. G. Tricomi and A. Erdélyi, “The asymptotic expansion of a ratio of gamma functions,”
Pacific Journal of Mathematics 1 (1951), 133–142.
[54] Charles Tweedie, “The Stirling Numbers and Polynomials,” Proceedings of the Edinburgh
Mathematical Society 37 (1918), 2–25.
[55] Karl Weierstrass, “Zur Theorie der eindeutigen analytischen Functionen,” Mathematische Abhandlungen der Akademie der Wissenschaften zu Berlin (1876), 11–60; reprinted in his Mathematische Werke, volume 2, 77–124. (Florian Cajori, in History of Mathematical Notations 2,
cites unpublished papers of 1841 and 1859 as the first occurrences of the notation |z|; however,
those papers were not edited for publication until 1894, and they use the notation without
defining it, so their published form may differ from Weierstrass’s original.)
[56] Christian Wiener, “Geometrische und analytische Untersuchung der Weierstrassschen Function,” Journal für die reine und angewandte Mathematik 90 (1881), 221–252.


Note to printer: A few special symbols are used herein.
S is uppercase script S
C is uppercase Fraktur C
S is uppercase Fraktur S
A is uppercase script A
F is uppercase script F
R is uppercase script R
k is lowercase Fraktur k