Galois Connections and Fixed Point Calculus

Roland Backhouse
School of Computer Science and Information Technology, University of Nottingham, Nottingham NG8 1BB, England

October 15, 2001

Abstract

Fixed point calculus is about the solution of recursive equations defined by a monotonic endofunction on a partially ordered set. This tutorial presents the basic theory of fixed point calculus together with a number of applications of direct relevance to the construction of computer programs. The tutorial also presents the theory and application of Galois connections between partially ordered sets. In particular, the intimate relation between Galois connections and fixed point equations is amply demonstrated.

Contents

1 Introduction
  1.1 Fixed Point Equations
  1.2 Languages
  1.3 Functions
  1.4 Datatypes
  1.5 Galois Connections
  1.6 Basic Assumptions
  1.7 Issues and Applications

2 Galois Connections — Introductory Examples
  2.1 Simple Examples
  2.2 The Floor Function
    2.2.1 Properties of Floor
    2.2.2 Rounding Off

3 Abstract Properties of Galois Connections
  3.1 Combining Galois Connections
  3.2 Cancellation
  3.3 Pointwise Ordering
  3.4 Poset Isomorphism
  3.5 Uniqueness of Adjoints
  3.6 Universal Property
  3.7 Commutativity Properties

4 Existence of Galois Connections
  4.1 Infima and Suprema
    4.1.1 Infima and Completeness
    4.1.2 Shape Posets
    4.1.3 Suprema and Cocompleteness
  4.2 Extremum Preservation Properties
  4.3 Existence Theorem
    4.3.1 Pair Algebras
    4.3.2 Examples of Pair Algebras
    4.3.3 The Existence Theorem
    4.3.4 Existential Extremum Preservation

5 Unity of Opposites
  5.1 The Prefix Lemma
  5.2 The Theorem

6 Fixed Points
  6.1 Prefix Points
  6.2 A First Example
  6.3 Kleene Algebra
    6.3.1 The Axioms
    6.3.2 Reflexive, Transitive Closure

7 Fixed Point Calculus
  7.1 Basic Rules
  7.2 Fusion
    7.2.1 Applications
  7.3 Uniqueness
  7.4 Parameterised Prefix Points
    7.4.1 The Abstraction Rule
    7.4.2 The Beautiful Theorem
    7.4.3 Cocontinuity of Iteration
  7.5 Mutual Recursion
  7.6 An Illustration — Arithmetic Expressions

8 Further Reading

9 Solutions to Exercises

1 Introduction

1.1 Fixed Point Equations

Formulating and solving equations is a fundamental mathematical activity, and whole areas of mathematics are devoted to the methods for solving particular classes of equations: think, for example, of the differential calculus developed in order to solve differential equations. This chapter is about a class of equations called fixed point equations that is particularly important in computing.

Fixed point equations have a very simple form, namely

  x = f.x ,

where f is a given function and x is the unknown. A solution to the equation is called a "fixed point" of the function f because applying f to a solution leaves the value "fixed", i.e. unchanged.

1.2 Languages

In the literature on computing science, fixed point equations are most often called recursive equations because the unknown x "recurs" on the right side of the equation. Recursion was first used extensively in computing science in the now-classic Algol 60 report [Nau63] which defined the programming language Algol 60. The Algol 60 report introduced so-called "Backus-Naur Form" to define the syntax of the language.
Here is a small, simplified extract from the language definition.

  ⟨Expression⟩ ::= ⟨Expression⟩ + ⟨Expression⟩ | ( ⟨Expression⟩ ) | ⟨Variable⟩

This defines (arithmetic) expressions by recursion. The symbol " ::= " can be read simply as "is" or, indeed, as an equality symbol. The symbol " | " is read as "or". Terms enclosed within angle brackets (thus ⟨Expression⟩ and ⟨Variable⟩) are called "nonterminals" (these are the entities being defined) and the remaining symbols (here "+" and the two parentheses "(" and ")") are called the "terminal" symbols (these are the symbols that form words in the language being defined). In this way the definition of ⟨Expression⟩ is read as:

  An expression is either an expression followed by the symbol "+" followed by an expression, or a parenthesised expression, or a variable.

Assuming that x and y are variables, the following are examples of expressions:

  x+y ,  (x) ,  x ,  x+x+y ,  y + (x+y) .

The definition of ⟨Expression⟩ is an example of a fixed point equation in which the unknown x is ⟨Expression⟩ and the function f is a function from languages (sets of words) to languages. Language theory provides many good examples of fixed point calculus; we use it frequently as a source of examples.

1.3 Functions

Recursion is used extensively in mathematics for the definition of functions. An elementary example is the factorial function, which is specified by the equation in fac ,

(1)  fac.0 = 1
     fac.n = n ∗ fac.(n−1) , for n > 0 .

The definition of the factorial function also has the form of a fixed point equation, although this is not so easy to see as in the previous example. To verify that this is indeed the case we need to rewrite the definition in the form fac = ··· . Using the notation ⟨x: x∈Type: Exp⟩ for a function that maps a value x of type Type to the value given by expression Exp , we have:

  fac = ⟨n: n∈IN: if n = 0 then 1 else n∗fac.(n−1)⟩ .
Now, abstracting from fac on the right side of this equation, define the function F by

  F = ⟨f: f ∈ IN←IN: ⟨n: n∈IN: if n = 0 then 1 else n∗f.(n−1)⟩⟩ .

This defines F to be an endofunction¹ on functions of type IN←IN , the type of the factorial function. That is, the function F maps a function to natural numbers from natural numbers to a function to natural numbers from natural numbers. For example, applying F to the successor function ⟨n:: n+1⟩ we get

  F.⟨n:: n+1⟩ = ⟨n:: if n = 0 then 1 else n∗((n−1)+1)⟩ .

(For brevity we have omitted the type information on n .) Simplifying,

  F.⟨n:: n+1⟩ = ⟨n:: if n = 0 then 1 else n²⟩ .

The characteristic feature of the factorial function is that its definition demands that

  fac = F.fac .

That is, fac is a fixed point of F .

¹ An endofunction is a function whose target and source are the same.

1.4 Datatypes

Recursion is also used in the definition of datatypes in programming languages. A definition of, for example, lists in a functional programming language looks like the following:

  List a = Nil | Cons a (List a)

This states that a list of a 's (where a stands for an arbitrary type) is either Nil or the operator Cons applied to a value of type a and a list of a 's. The definition of List is not strictly a fixed-point equation. Rather than expressing the equality of List a and Nil | Cons a (List a) (misleadingly suggested by the equals sign) the declaration expresses an isomorphism between the type List a and the type 𝟙 + a×(List a) , the disjoint sum (" + ") of the unit type (a set containing exactly one element, here denoted by " 𝟙 ") and the cartesian product (" × ") of the type a and the type List a .

1.5 Galois Connections

Fixed point calculus is only of value to computing science if many of the things we want to compute are specified by fixed point equations. Fortunately, this is very much the case. Indeed, many specifications are directly expressed in terms of the solution of a fixed point equation.
Moreover, a large number of additional problems have specifications that involve a fixed point equation, albeit less directly. In such cases it is often possible to transform the specification into one that expresses the solution directly in terms of a fixed point equation. The key to such transformations is the notion of a Galois connection.

Galois connections are of interest in their own right, even when not related to the theory of fixed point equations. Their typical use is to define one, relatively complex, function in terms of another, relatively simple, function. A number of the examples we present are of very elementary and well-known functions; even so, readers not familiar with Galois connections may find the calculations we present delightfully refreshing!

Galois connections are also basic to the fundamental notions of infimum and supremum. Section 4.1 discusses these notions; in particular, the notion of suprema and infima of a certain "shape" is discussed in detail.

1.6 Basic Assumptions

The simple form of a fixed-point equation means that it is a very general notion which captures a very broad range of computational problems. But a theory that is too general is usually also too weak to be of practical value. In developing a useful fixed-point calculus we need to impose some non-trivial mathematical properties on the fixed point equations we consider. Here we require two properties.

The first property we demand is that the domain of solutions be a complete lattice. That is, the domain of solutions is a partially ordered set such that suprema and infima of any shape always exist. This requirement clearly holds in the case of languages. When a language is being defined it is required that the "alphabet" of the language is clearly stated. The alphabet of a language is just a finite set of symbols, and words in the language are finite sequences of symbols in the alphabet.
If Σ denotes an alphabet then Σ∗ is the notation used to denote the set of all words in that alphabet (that is, the set of all finite sequences, including the empty sequence, of symbols in the alphabet). A language is thus a subset of Σ∗ and, as is well known, the set of all subsets of a given set is a complete lattice under the subset ordering.

Although there is no ordering relation explicitly mentioned in the definition of fac (other than the equality relation, which is pretty uninteresting as an ordering relation) the theory we develop still applies, by the simple device of viewing functions as special cases of binary relations. A binary relation between two sets A and B is a subset of the cartesian product A×B and the set of all subsets of A×B forms a complete lattice. This will be our solution domain when solving a fixed-point equation that is intended to define a function to set A from set B .

The second requirement we impose is that the function f in the given fixed point equation be monotonic in the ordering on the solution domain. This requirement is also clearly met in the case of language definitions. The function

  ⟨X:: {b} ∪ {a}·X·{c}⟩

is typical of the sort of functions that are used in language definitions. (A raised infix dot denotes the concatenation operator extended to sets.) It is the composition of three functions: the function ⟨X:: {b}∪X⟩ , which adds the word b to a language, the function ⟨X:: {a}·X⟩ , which concatenates a at the beginning of every word in a given language, and ⟨X:: X·{c}⟩ , which concatenates c at the end of every word in a given language. These functions are all clearly monotonic, and thus so is their composition.

1.7 Issues and Applications

An arbitrary fixed point equation may have no solutions, it may have exactly one solution, or it may have more than one solution. An important consequence of our two assumptions is that the fixed point equations we consider always have at least one solution.
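The least such solution can moreover be approximated by iterating the defining function, starting from the least language ∅; a minimal Python sketch for the function ⟨X:: {b} ∪ {a}·X·{c}⟩ above (the least fixed point is the infinite language { aⁿbcⁿ | 0 ≤ n }, so the iteration is cut off after a few steps; all names are illustrative):

```python
def f(language):
    """The monotonic language function X |-> {b} u {a}.X.{c}."""
    return {"b"} | {"a" + word + "c" for word in language}

# Iterate from the empty language; each step contributes the next word a^n b c^n.
approximation = set()
for _ in range(4):
    approximation = f(approximation)

print(sorted(approximation, key=len))  # ['b', 'abc', 'aabcc', 'aaabccc']
```

Each approximation is contained in the next (monotonicity guarantees this), and the union of all approximations is the least solution of X = {b} ∪ {a}·X·{c} .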
Indeed, the set of solutions is itself a complete lattice. In particular, a monotonic function on a complete lattice has a least and a greatest fixed point. Moreover, it has a unique fixed point if (and only if) the least and greatest fixed points coincide.

A major element of these lecture notes is how to manipulate fixed point equations. Finding out how to express the same quantity by different equations is important to understanding. Another important element is providing general, practical, sufficient conditions for a fixed point equation to have a unique solution. These foundational elements of fixed point calculus are discussed in depth in section 7.

An example of the sort of issue tackled in section 7 is the use of recursion to define a number of values simultaneously. The terminology "mutual recursion" is used. In the definition of Algol 60, for example, this is what the definition of ⟨Expression⟩ really looks like:

  ⟨Expression⟩ ::= ⟨Expression⟩ + ⟨Term⟩ | ⟨Term⟩
  ⟨Term⟩ ::= ⟨Term⟩ × ⟨Factor⟩ | ⟨Factor⟩
  ⟨Factor⟩ ::= ( ⟨Expression⟩ ) | ⟨Variable⟩

This is still a fixed point equation, but it is an equation in the triple of languages ( ⟨Expression⟩ , ⟨Term⟩ , ⟨Factor⟩ ) . An important issue is whether a fixed point equation in a vector of values can be solved piecewise and, if so, how. This is the content of theorem 106.

Section 6.1 is introductory; section 6.3 is also introductory but, even so, considers an important application of fixed point calculus, namely Kleene algebra, which is the algebra of choice, composition and iteration, three indispensable ingredients of any programming language.

2 Galois Connections — Introductory Examples

This section begins our discussion of "Galois connections", a concept first introduced by Oystein Ore in 1944 [Ore44]. The importance of Galois connections lies in their ubiquity and their simplicity. Mathematics and, in particular, computing science abounds with instances of Galois connections.
Some simple examples are presented in subsection 2.1 whilst subsection 2.2 continues with a more detailed analysis of the floor and ceiling functions. Later sections discuss more abstract properties of Galois connections.

2.1 Simple Examples

A Galois connection involves two preordered sets² ( A , ≤ ) and ( B , ⊑ ) and two functions, F ∈ A←B and G ∈ B←A . These four components together form a Galois connection iff for all x∈B and y∈A the following holds:

(2)  F.x ≤ y ≡ x ⊑ G.y .

² The pair ( A , ≤ ) is a preordered set if ≤ is a binary relation on A that is reflexive (i.e. x ≤ x for all x ) and transitive (i.e. x ≤ y ∧ y ≤ z ⇒ x ≤ z for all x , y and z ).

This compact definition of a Galois connection was first introduced in [Sch53]. We refer to F as the lower adjoint and to G as the upper adjoint. (Many texts use the names "left" and "right" where we use "lower" and "upper". Our terminology is in line with [GHK+80] which is a standard reference on the theory presented here.)

Lots of examples of Galois connections can be given. In the first instance, examples can be given by observing that two inverse functions are Galois connected. Suppose A and B are two sets and F ∈ A←B and G ∈ B←A are inverse functions. Then their being inverse can be expressed by the equivalence, for all x∈B and y∈A ,

  F.x = y ≡ x = G.y .

This is a Galois connection in which we view A and B as ordered sets where the ordering relation is identical to the equality relation. (Two elements are ordered if and only if they are equal.)

That inverse functions are Galois connected is a useful observation: not because a study of Galois connections will tell us something we didn't already know about inverse functions, but because we can draw inspiration from our existing knowledge of properties of inverse functions to guide us in the study of Galois-connected functions.

Further examples of Galois connections are not hard to find although sometimes they are not immediately evident.
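Whether a candidate pair of functions satisfies the defining equivalence (2) can be tested mechanically on a finite fragment of the orderings. The following Python sketch checks it for the pair (doubling, halving rounded down) on a range of integers; the helper name and the choice of example are ours, purely for illustration:

```python
def is_galois(lower, upper, xs, ys):
    """Check  lower(x) <= y  ==  x <= upper(y)  on all sample points (illustrative helper)."""
    return all((lower(x) <= y) == (x <= upper(y)) for x in xs for y in ys)

rng = range(-20, 21)

# Doubling is the lower adjoint of halving rounded down:  2x <= y  ==  x <= y // 2 .
assert is_galois(lambda x: 2 * x, lambda y: y // 2, rng, rng)

# Rounding towards zero instead breaks the equivalence at negative arguments.
assert not is_galois(lambda x: 2 * x, lambda y: int(y / 2), rng, rng)
```

Such a check is of course only a sanity test on a finite sample, not a proof; the examples that follow are established by calculation.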
One is the connection between conjunction and implication in the predicate calculus:

  p ∧ q ⇒ r ≡ q ⇒ (p ⇒ r) .

Here p , q and r denote predicates and the connection is between the functions ( p ∧ ) and ( p ⇒ ). To be more precise, both sets A and B in the definition of a Galois connection are taken to be the set of predicates, and the ordering relation is implication ( ⇒ ). The above formula describes a family of Galois connections, one for each instance of the variable p .

An interesting example is provided by negation (not) in the predicate calculus. We have:

  ¬p ⇒ q ≡ p ⇐ ¬q .

The example is interesting because it involves two different orderings on the same set. Specifically, we can order predicates by implication ( ⇒ ) or by the converse ordering follows-from ( ⇐ ). Predicates ordered by implication and predicates ordered by follows-from are quite different partially ordered sets. The point is that there are four elements to the definition of a Galois connection: two ordered sets and two functions between the ordered sets. All four elements form an integral part of the definition, and mistakes in the exploitation of the properties of Galois connections may occur if one is not clear about all four.

One elementary example of a Galois connection that is not immediately evident is afforded by the binary maximum operator, max , on real numbers. Denoting it by the infix operator ↑ , we have:

  x↑y ≤ z ≡ x ≤ z ∧ y ≤ z .

At first sight this doesn't look like a Galois connection, principally on account of the conjunction on the righthand side of the equation. We can however identify it as such as follows. First note that max is a binary function, i.e. a function from the cartesian product IR×IR (the set of pairs of real numbers) to the set IR . Now IR is ordered by the at-most relation ( ≤ ), and this relation can be extended pointwise to an ordering relation on IR×IR . Specifically, denoting the relation by ≤₂ , we define

  (u, v) ≤₂ (x, y) ≡ u ≤ x ∧ v ≤ y .
Finally, we define the doubling function, denoted by Δ , by

  Δ.z = (z, z) .

Having done so we can rewrite the definition of max as follows:

  max.(x, y) ≤ z ≡ (x, y) ≤₂ Δ.z .

Thus max is a function mapping IR×IR , ordered pointwise by the relation ≤₂ , to IR , ordered by the at-most relation ( ≤ ), and is defined by a Galois connection connecting it to the doubling function.

Many predicates are themselves adjoints in a Galois connection, but the fact is rarely recognised. An example is the predicate even on integers (which is true when its argument is divisible by 2 ). Specifically, the integers are ordered by the "is-divisible-by" relation, which we denote here by " / " (so m / n should be read as " m is divisible by n " or, more precisely, as "there is an integer k such that n × k = m ") and, for all integers m and booleans b ,

  even.m ⇐ b ≡ m / (if b then 2 else 1) .

This is an instance of the general result that a predicate p on the elements of a poset ( A , ≤ ) with greatest element ⊤ is a lower adjoint in a Galois connection (between ( A , ≤ ) and ( Bool , ⇐ )) if there is some constant a such that p.x ≡ x ≤ a . Specifically, the Galois connection is, for all x∈A and b∈Bool ,

  p.x ⇐ b ≡ x ≤ if b then a else ⊤ .

It is quite surprising how many predicates fulfil this condition. The "is-empty" test on a set is an example (a set S is empty if S ⊆ φ ). Another example is the membership test on a set. Fix some value a in a universe of values and consider the predicate p that given a subset S of the universe returns the value of a∈S . Then, since a∈S ≡ {a} ⊆ S , the general result says that p is a lower adjoint in a Galois connection. Indeed,

  a∈S ⇐ b ≡ S ⊇ if b then {a} else φ .

2.2 The Floor Function

This section discusses a simple but illuminating example of a Galois connection. The floor function from reals to integers is described in words as follows: for all real x we take ⌊x⌋ (read "floor x ") to be the greatest integer that is at most x .
Formally, this is captured by a simple equivalence.

Definition 3 (Floor Function)  For all real x , ⌊x⌋ is an integer such that, for all integers n ,

  n ≤ ⌊x⌋ ≡ n ≤ x .
□

In the definition of the floor function we use the mathematical convention of not denoting the conversion from integers to reals. It is implicit in the inequality n ≤ x which seems to compare an integer with a real. In fact, what is meant is the comparison of the real value corresponding to n with the real value x . On the right side of the equivalence the at-most relation (" ≤ ") is between reals whereas on the left side it is between integers. Making explicit both conversions, temporarily adopting a Java-like casting notation, reveals the two adjoint functions in a Galois connection. We have, for all real x , (floor)x is an integer such that for all integers n ,

  n ≤ (floor)x ≡ (real)n ≤ x .

So the floor of x is defined by connecting it to the conversion from integers to reals in a simple equivalence.

The floor function is an instance of a common phenomenon, namely that many functions are defined to be adjoint to an embedding of one partially ordered set in another. The functions so defined are closure operators³. Examples include the reflexive closure, symmetric closure and transitive closure of a relation. These are adjoints of the embeddings of, respectively, all reflexive relations, all symmetric relations and all transitive relations in the ordered set of relations (of an appropriate type).

³ Formally, it is not the adjoint that is a closure operator but the composition of the adjoint with the embedding function. This said, it is useful to adopt the mathematical convention of omitting explicit mention of the embedding function and this is what we do from now on.

2.2.1 Properties of Floor

The first time that one encounters a definition like definition 3 it can be difficult to see how it is used. This section illustrates the basic techniques for exploiting Galois connections.
The first thing we can do is to try to identify some special cases that simplify the definition. Two possibilities present themselves immediately; both exploit the fact that the at-most relation is reflexive.

The equation

  n ≤ ⌊x⌋ ≡ n ≤ x

is true for all integers n and reals x . Also, ⌊x⌋ is by definition an integer. So we can instantiate n to ⌊x⌋ . We get

  ⌊x⌋ ≤ ⌊x⌋ ≡ ⌊x⌋ ≤ x .

The lefthand side that is obtained, ⌊x⌋ ≤ ⌊x⌋ , is true, and so the righthand side is also true. That is,

  ⌊x⌋ ≤ x .

This tells us that the floor function rounds down. It returns an integer that is at most the given real value.

The second possibility is to instantiate x to n . This is allowed because every integer is a real. Strictly, however, we are instantiating x to the real value obtained by converting n . We get

  n ≤ ⌊n⌋ ≡ n ≤ n .

In this case it is the right side of the equivalence that is true. So we can simplify to

  n ≤ ⌊n⌋ .

Earlier we determined that ⌊x⌋ ≤ x for all real values x . Instantiating x to n , we get ⌊n⌋ ≤ n . So, as the at-most relation is anti-symmetric, we have derived that for all integers n

  ⌊n⌋ = n .

A good understanding of the equivalence operator suggests something else we can do with the defining equation: use the rule of contraposition. The contrapositive of the definition of the floor function is, for all integers n and real x ,

  ¬(n ≤ ⌊x⌋) ≡ ¬(n ≤ x) .

But ¬(n ≤ m) ≡ m < n . So

  ⌊x⌋ < n ≡ x < n .

Equally, using that for integers m and n , m < n ≡ m+1 ≤ n ,

  ⌊x⌋+1 ≤ n ≡ x < n .

Now we can exploit reflexivity of the at-most relation again. Instantiating n with ⌊x⌋+1 and simplifying we deduce:

  x < ⌊x⌋+1 .

Recalling that ⌊x⌋ ≤ x , we have established

  ⌊x⌋ ≤ x < ⌊x⌋+1 .

In words, ⌊x⌋ is the (unique) integer such that ⌊x⌋ is at most x and x is less than ⌊x⌋+1 .

We now ask whether the floor function is monotonic. That is, we want to show that ⌊x⌋ ≤ ⌊y⌋ ⇐ x ≤ y . Here we calculate:

     ⌊x⌋ ≤ ⌊y⌋
  =    { definition 3, x,n := y,⌊x⌋ }
     ⌊x⌋ ≤ y
  ⇐    { transitivity of ≤ }
     ⌊x⌋ ≤ x ≤ y
  =    { ⌊x⌋ ≤ x }
     x ≤ y .

Thus the floor function is indeed monotonic.
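All of the properties just derived can be confirmed numerically for Python's math.floor; a small sanity check over arbitrarily chosen sample values:

```python
import math

xs = [-3.7, -1.0, -0.2, 0.0, 0.4, 1.0, 2.5, 7.99]
ns = range(-6, 10)

for x in xs:
    fx = math.floor(x)
    # the defining Galois connection:  n <= floor(x)  ==  n <= x
    assert all((n <= fx) == (n <= x) for n in ns)
    # floor rounds down, and by less than one:  floor(x) <= x < floor(x) + 1
    assert fx <= x < fx + 1

# floor is the identity on the integers, and floor is monotonic
assert all(math.floor(n) == n for n in ns)
assert all(math.floor(x) <= math.floor(y) for x in xs for y in xs if x <= y)
```

The derivations above establish these facts for all reals, of course; the check merely illustrates them on a finite sample.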
2.2.2 Rounding Off

Let us now demonstrate how to derive more complicated properties of the floor function. In the process we introduce an important technique for reasoning with Galois connections called the rule of indirect equality.

The definition of the language Java prescribes that integer division rounds towards zero. This causes difficulties in circumstances when it is required to round away from zero. Had the definition been that integer division rounds down then it would be easy to implement rounding up, towards zero or away from zero. Rounding up, for example, corresponds to negating, then rounding down, and then negating again. (See exercise 7(a).) The problem is thus how to implement rounding up integer divisions supposing that our programming language always rounds down.

In order to express the problem we need the ceiling function. The definition is a dual of the definition of the floor function.

Definition 4  For all real x , ⌈x⌉ is an integer such that, for all integers n ,

  ⌈x⌉ ≤ n ≡ x ≤ n .
□

We leave it as an exercise to the reader to derive properties of the ceiling function dual to the properties of the floor function derived in section 2.2.1.

Rounding down an integer division of positive numbers m and n is expressed by ⌊m/n⌋ , where m/n is the real division of m and n . Dually, rounding up is expressed by ⌈m/n⌉ . Implementing rounding up given an implementation of rounding down amounts to finding suitable values p and q so that

  ⌈m/n⌉ = ⌊p/q⌋ .

The values p and q should be expressed as arithmetic functions of m and n (that is, functions involving addition and multiplication, but not involving the floor or ceiling functions).

The rule of indirect equality is

  m = n ≡ ⟨∀k:: k ≤ m ≡ k ≤ n⟩

where k , m and n all range over the same poset. Using this rule, we can derive suitable expressions for p and q .
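It does no harm to check numerically, in advance, where the calculation will end up, namely at p, q := m+n−1, n; a sketch for positive integers, using Python's floor division // and the dual of exercise 7(a), ⌈x⌉ = −⌊−x⌋:

```python
def ceil_div(m, n):
    """Round m/n up, via the identity ceil(x) = -floor(-x)."""
    return -(-m // n)

# the claim to be derived:  ceil(m/n) == floor((m+n-1)/n)  for positive integers m, n
assert all(
    ceil_div(m, n) == (m + n - 1) // n
    for m in range(1, 200) for n in range(1, 25)
)
```

The derivation that follows establishes the claim once and for all, with no testing needed.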
Specifically, for arbitrary integer k , we aim to eliminate the ceiling function from the inequality

  k ≤ ⌈m/n⌉

obtaining an inequality of the form

  k ≤ e

where e is an arithmetic expression in m and n . We may then conclude that

  ⌈m/n⌉ = e .

The first step in the calculation is perhaps the most difficult. This is because the definition of the ceiling function, definition 4, provides a rule for dealing with inequalities where a ceiling value is on the lower side of an at-most relation but not when it is on the higher side (which is the case we are interested in). However, recalling our discussion of the floor function, the solution is to consider the contrapositive of the defining equation. Specifically we have, by negating both sides of 4,

(5)  n < ⌈x⌉ ≡ n < x .

We can now proceed with the derivation:

     k ≤ ⌈m/n⌉
  ≡    { integer arithmetic }
     k−1 < ⌈m/n⌉
  ≡    { contrapositive of definition of ceiling (rule (5)) }
     k−1 < m/n
  ≡    { arithmetic, n > 0 }
     n(k−1) < m
  ≡    { integer inequalities }
     n(k−1) + 1 ≤ m
  ≡    { arithmetic, n > 0 }
     k ≤ (m+n−1)/n
  ≡    { definition of floor function: (3) }
     k ≤ ⌊(m+n−1)/n⌋ .

Here k is arbitrary. So, by indirect equality, we get

(6)  ⌈m/n⌉ = ⌊(m+n−1)/n⌋ .

In Java, for example, if it is required to round up the result of dividing m by n one should compute (m+n-1)/n .

Exercise 7  Prove the following properties. (Hint: use indirect equality.)

(a)  ⌈−x⌉ = −⌊x⌋ ,
(b)  ⌊x+m⌋ = ⌊x⌋+m (assuming m is an integer),
(c)  ⌊x/m⌋ = ⌊⌊x⌋/m⌋ (assuming m is a positive integer),
(d)  ⌊√x⌋ = ⌊√⌊x⌋⌋ (assuming x ≥ 0).
□

3 Abstract Properties of Galois Connections

We now begin a study of the abstract properties of Galois connections. First, we record a number of basic properties, following which we give an alternative, equivalent definition of a Galois connection (in fact, the original definition). An exercise presents a third definition. Knowing that a concept can be defined in several different but equivalent ways is an indicator of its importance as well as helping one to recognise it in other applications.
This subsection is the one to read if you want more insight into what it means for one function to be the adjoint of another. Subsequently we look back at the examples in the preceding section in order to see how the abstract properties might have led us to anticipate those examples. The basic idea is that we seek a generalisation of the rule that inverse functions have inverse algebraic properties.

There are many more properties of Galois connections not discussed here. Some of these have been included in the exercises. Proofs of the lemmas and theorems are omitted when they present no real difficulty.

3.1 Combining Galois Connections

In what follows we take ( A , ⊑A ) and ( B , ⊑B ) to be partially ordered sets⁴ and we assume that F ∈ A←B and G ∈ B←A . For such an F and G we recall the following definition.

Definition 8 (Galois Connection)  ( F , G ) is a Galois connection between the posets ( A , ⊑A ) and ( B , ⊑B ) means that, for all x∈B and y∈A ,

  F.x ⊑A y ≡ x ⊑B G.y .
□

⁴ The difference between a partially ordered set and a preordered set is that the ordering on a partially ordered set is anti-symmetric. That is, x = y ≡ x ⊑ y ∧ y ⊑ x for all x and y . The theory of Galois connections can be developed in its entirety for preordered sets but it's occasionally simpler to assume a partial ordering. Where we do so will be apparent from the fact that we appeal to anti-symmetry.

In order to make the formulae more readable, we will often drop the subscripts from the orderings. This can be confusing, even though it can usually be deduced which ordering is meant from type considerations. Thus, occasionally we will reintroduce the subscripts. Recall also that F is referred to as the lower adjoint, since it is on the lower side of an ordering, and G as the upper adjoint, since it is on the upper side of an ordering.

The names "lower" and "upper" are somewhat arbitrary because orderings can always be turned upside down to make big small and small big.
The mathematical jargon for this is that statements about orderings can always be "dualised". In the case of Galois connections we have:

Theorem 9  ( F , G ) is a Galois connection between the posets ( A , ⊑A ) and ( B , ⊑B ) iff ( G , F ) is a Galois connection between the posets ( B , ⊒B ) and ( A , ⊒A ).

Proof  Immediate from

  (F.x ⊑A y ≡ x ⊑B G.y) ≡ (G.y ⊒B x ≡ y ⊒A F.x) .
□

A result of this is that all statements about one of the adjoints of a Galois connection have a dual statement for the other adjoint. That is, any theorem concerning a lower adjoint gives rise to a theorem about the upper adjoint, since that one is the lower adjoint when we reverse the ordering. So with one proof, we get two theorems.

A second, very basic, result is that Galois-connected functions can be composed to form new Galois connections.

Theorem 10  If ( F , G ) is a Galois connection between the posets ( A , ⊑A ) and ( B , ⊑B ) and ( H , K ) is a Galois connection between the posets ( B , ⊑B ) and ( C , ⊑C ) then ( F•H , K•G ) is a Galois connection between the posets ( A , ⊑A ) and ( C , ⊑C ). (Here and elsewhere we use the symbol " • " for function composition.)
□

A final, very basic, theorem should not be neglected.

Theorem 11  ( I , I ), where I denotes the identity function on A , is a Galois connection between the poset ( A , ⊑A ) and itself.
□

3.2 Cancellation

Looking at the definition of a Galois connection, two very obvious properties are the cancellation laws:

(12)  x ⊑ G.(F.x) for all x∈B ,
(13)  F.(G.y) ⊑ y for all y∈A .

We call these "cancellation" laws because they enable one to eliminate occurrences of the adjoints F and G from an expression. They are possibly the most frequently used of all the properties of Galois connections.

Using the cancellation laws one can infer that the adjoints F and G are monotonic. The combination of the cancellation laws and monotonicity is an equivalent way of defining the notion of a Galois connection, in fact the way that Ore [Ore44] originally defined the notion.
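Before stating that equivalence as a theorem, the cancellation laws and monotonicity can be observed on a concrete connection; a sketch using the pair (×3, ÷3 rounded down) on a finite range of integers (the multiplier 3 is an arbitrary choice for illustration):

```python
def lower(x):
    return 3 * x       # F, the lower adjoint

def upper(y):
    return y // 3      # G, the upper adjoint:  3x <= y  ==  x <= y // 3

rng = range(-30, 31)

# cancellation laws (12) and (13)
assert all(x <= upper(lower(x)) for x in rng)   # x <= G.(F.x)
assert all(lower(upper(y)) <= y for y in rng)   # F.(G.y) <= y

# both adjoints are monotonic
assert all(lower(x) <= lower(y) for x in rng for y in rng if x <= y)
assert all(upper(x) <= upper(y) for x in rng for y in rng if x <= y)
```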
Theorem 14   (F, G) is a Galois connection between the posets (A, ⊑A) and (B, ⊑B) iff the following two conditions hold.
(a) For all x∈B and y∈A , x ⊑ G.(F.x) and F.(G.y) ⊑ y .
(b) F and G are both monotonic.
□

Definition 8, proposed by J. Schmidt [Sch53], and Ore's definition, contained in theorem 14, both have their merits. Schmidt's is easy to remember since it contains only one clause, and lends itself to compact calculation. It is a form of "shunting rule": the game that one plays with it is to shunt occurrences of function F in an expression out of the way in order to expose the function's argument. After performing some manipulations on the argument, F is shunted back into the picture. (Or, of course, the other way around: function G is shunted temporarily out of the way.) It's an attractive strategy, requiring little creativity, that is particularly useful in inductive proofs.

3.3  Pointwise Ordering

Ore's definition is most useful when expressed at function level. Let us define the relation ⊑̇ on functions of the same type by

f ⊑̇ g ≡ ⟨∀x:: f.x ⊑ g.x⟩ .

Then we can eliminate the dummies x and y in the cancellation laws to obtain

(15)  IB ⊑̇ G•F  and  F•G ⊑̇ IA .

Schmidt's definition can also be lifted to function level and, in combination with (15), can be used to construct elegant theorems. Specifically, we have:

Lemma 16   (F, G) is a Galois connection between the posets (A, ⊑A) and (B, ⊑B) equivales, for all functions h and k with the same (arbitrary) domain, and ranges respectively B and A ,

F•h ⊑̇ k ≡ h ⊑̇ G•k .
□

In words,

(17)  (F, G) forms a Galois connection ≡ (F•, G•) forms a Galois connection.

A property that is somewhat weaker but in a sense dual to this one is the following:

(18)  (F, G) forms a Galois connection ⇒ (•G, •F) forms a Galois connection.

(Take care to note the switch in the order of F and G .)
Specifically, we have:

Lemma 19   If (F, G) is a Galois connection between the posets (A, ⊑A) and (B, ⊑B) then, for all monotonic functions h and k with the same range, and domains respectively B and A ,

h•G ⊑̇ k ≡ h ⊑̇ k•F .
□

3.4  Poset Isomorphism

A function f with domain (A, ⊑A) and range (B, ⊑B) is a poset monomorphism if, for all x , y in A ,

x ⊑A y ≡ f.x ⊑B f.y .

A poset isomorphism is a surjective poset monomorphism. In this section we prove the following lemma, which is an important component of the "unity-of-opposites" theorem proved in section 5.

Lemma 20   If (F, G) is a Galois connection between the posets (A, ⊑A) and (B, ⊑B) then

F ∈ F.B ← G.A is a poset isomorphism,
G ∈ G.A ← F.B is a poset isomorphism.

Moreover, within the given domains, F and G are inverses.
□

In order to prove the lemma, a more convenient, higher-order definition of a poset monomorphism is the following. Function F with domain G.A is a poset monomorphism if, for all monotonic functions h and k (with range A and the same domain),

(21)  G•h ⊑̇ G•k ≡ F•G•h ⊑̇ F•G•k .

To show that F is a poset monomorphism, we begin with an elementary calculation.

   G•F•G
⊑̇    { cancellation of F•G }
   G
⊑̇    { cancellation of G•F }
   G•F•G .

It follows, by anti-symmetry, that G•F•G = G . In words, G is the left inverse of F ∈ F.B ← G.A . Dually, F•G•F = F . (Thus, F is the left inverse of G ∈ G.A ← F.B .) Now,

   F•G•h ⊑̇ F•G•k
≡    { (F, G) is a Galois connection, lemma 16 }
   G•h ⊑̇ G•F•G•k
≡    { G•F•G = G }
   G•h ⊑̇ G•k .

So, F ∈ F.B ← G.A is a poset monomorphism. It remains to show that F ∈ F.B ← G.A is surjective. Suppose y∈B . We have to exhibit some x∈A such that F.y = F.(G.x) . The key is the property F•G•F = F . By this property, F.y = F.(G.(F.y)) so that we may take F.y for x . Dually, G ∈ G.A ← F.B is a poset isomorphism. This completes the proof.

Exercise 22   Suppose F ∈ A←B and G ∈ B←A are two functions between posets (A, ⊑A) and (B, ⊑B).
Prove that the statement "F and G are inverse poset monomorphisms" is equivalent to the conjunction of the properties:

(F, G) is a Galois connection between (A, ⊑A) and (B, ⊑B), and
(G, F) is a Galois connection between (B, ⊑B) and (A, ⊑A).
□

3.5  Uniqueness of Adjoints

There is a simple but powerful corollary of lemmas 16 and 19. Specifically, suppose (F0, G0) and (F1, G1) are both pairs of Galois-connected functions between the same partially ordered sets. Then

(23)  F0 ⊑̇ F1 ≡ G1 ⊑̇ G0 ,

since

   F0 ⊑̇ F1
≡    { lemma 16 with F,G,h,k := F0,G0,I,F1 }
   I ⊑̇ G0•F1
≡    { lemma 19 with F,G,h,k := F1,G1,I,G0 }
   G1 ⊑̇ G0 .

(Note this time the switch in the order of subscripts.) Hence, by applying (23) twice and combining the two applications with anti-symmetry,

(24)  F0 = F1 ≡ G1 = G0 .

Thus adjoints are uniquely defined. We shall see in the next section how this observation can be put to good use.

3.6  Universal Property

One of the reasons that Galois connections are not easy to spot is that standard mathematical definitions of functions like, for example, the floor function do not take the form of either Schmidt's or Ore's definition of a Galois connection. Rather, they correspond to a hybrid of the two. The hybrid definition is given in the following lemma.

Lemma 25   (F, G) is a Galois connection between the posets (A, ⊑A) and (B, ⊑B) iff the following conditions hold.
(a) G is monotonic.
(b) For all x∈B , x ⊑ G.(F.x) .
(c) For all x∈B and y∈A , x ⊑ G.y ⇒ F.x ⊑ y .
□

The hybrid definition of a Galois connection is the least elegant of the three alternatives because the symmetry between the adjoint functions is hidden. An explanation for why it is nevertheless commonly used is that Galois connections are often used to define a function, F , in terms of a known function, G . The hybrid definition focuses on the requirements on F . Indeed, we can put the hybrid definition into words as follows.
Given a monotonic function G , the lower adjoint, F , of G is defined by the requirement that, for all x , F.x is the least y such that x ⊑ G.y . (Note that requirement (b) in lemma 25 requires that F.x be a y such that x ⊑ G.y , and requirement (c) that it be the least such y .) The requirement "for all x , F.x is the least y such that x ⊑ G.y " is often referred to as the universal property of F .

Examples of this form of definition can be found in Gentzen's [Gen69] formalisation of what he called "natural deduction". Gentzen defined the logical operators (conjunction, implication, etc.) systematically by giving introduction and elimination rules for each operator. An example is his definition of disjunction. There are two introduction rules, namely:

p ⇒ p∨q  and  q ⇒ p∨q .

(Gentzen would have used a turnstile rather than an implication operator. Such subtleties will be ignored in this discussion.) There is one elimination rule for disjunction:

(p⇒r) ∧ (q⇒r) ⇒ (p∨q ⇒ r) .

To see that Gentzen's rules conform to properties (a), (b) and (c) of lemma 25 we first rewrite the elimination rule in the same way as we did for maximum above. Doing so and comparing with requirement (c), we identify G as the doubling function and F as disjunction:

((p, q) ⇒2 (r, r)) ⇒ (∨.(p, q) ⇒ r) .

(The relation ⇒2 is defined in just the same way as we defined ≤2 earlier. That is, (p, q) ⇒2 (r, s) equivales (p⇒r) ∧ (q⇒s) .) We now check that the required cancellation law ( x ⊑ G.(F.x) ) corresponds to the introduction rules. Formally, it is:

(p, q) ⇒2 (p∨q , p∨q)

which is indeed the same as the conjunction of p ⇒ p∨q and q ⇒ p∨q . Finally, it is obvious that the doubling function is monotonic.

3.7  Commutativity Properties

In our discussion of elementary examples of Galois-connected functions we observed that inverse functions are Galois connected. A vital property of inverse functions is that they have "inverse" algebraic properties.
The exponential function, for instance, has as its inverse the logarithmic function; moreover,

exp (−x) = 1 / (exp x)   whereas   −(ln x) = ln (1/x) ,

and

exp (x + y) = exp x · exp y   whereas   ln x + ln y = ln (x·y) .

In general, if θ and φ are inverse functions then, for any functions f and g of appropriate type,

⟨∀x:: θ.(f.x) = g.(θ.x)⟩ ≡ ⟨∀y:: f.(φ.y) = φ.(g.y)⟩ .

More generally, and expressed at function level, if (θ0, φ0) and (θ1, φ1) are pairs of inverse functions, then for all functions f and g of appropriate type,

(26)  θ0•f = g•θ1 ≡ f•φ1 = φ0•g .

In general, our goal is to try to predict algebraic properties of functions previously unknown to us. Many such properties have the form of "commutativity properties". Among such properties are, for example, the property −(2·x) = 2·(−x) , which expresses the fact that multiplication by 2 commutes with negation. In addition we include properties like ln (1/x) = −(ln x) , in which the order of application of the logarithmic function is "commuted" but in so doing a change occurs in the function with which it is commuted (in this case the reciprocal function becomes negation). In this way distributivity properties also become commutativity properties. For instance the property that x·(y+z) = (x·y)+(x·z) is a commutativity property of addition. Specifically, multiplication by x after addition commutes to addition after the function (y, z) ↦ (x·y , x·z) .

In general, for a given function F we are interested in discovering functions g and h for which F•g = h•F . In the case that F is defined by a Galois connection we can often answer this question most effectively by translating it into a question about the commutativity properties of its adjoint, especially in the case that the adjoint is a known function with known properties, or a "trivial" function whose algebraic properties are easily determined. The latter is the case with the floor function: the adjoint function is the function embedding integers into reals.
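That last remark can be made concrete: the floor function is the upper adjoint of the embedding of the integers into the reals, that is, n ≤ ⌊x⌋ ≡ n ≤ x for all integers n and reals x. The following Python spot-check of this connection and its cancellation laws is illustrative only; the particular sample points are arbitrary choices.

```python
# Floor as an upper adjoint: for integer n and real x,
#     n <= floor(x)  ≡  n <= x ,
# the lower adjoint being the embedding of the integers into the reals.
# The sample points below are arbitrary illustrative choices (exact
# binary fractions, so no floating-point noise enters the check).

import math

xs = [t / 8 for t in range(-100, 101)]       # some real sample points
ns = range(-15, 16)                          # some integers

# Defining equivalence of the Galois connection.
assert all((n <= math.floor(x)) == (n <= x) for n in ns for x in xs)

# Cancellation: floor(x) <= x; the other cancellation law is an
# equality here, since flooring an embedded integer returns it.
assert all(math.floor(x) <= x for x in xs)
assert all(math.floor(float(n)) == n for n in ns)
print("floor is the upper adjoint of the embedding")
```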
The rule that is the key to this strategy is the following. Suppose, for numbers m and n , 0 ≤ m and 0 ≤ n , and for all i , 0 ≤ i < m+n , (Fi, Gi) is a Galois-connected pair of functions. Then, assuming the functions are so typed that the compositions and equalities are meaningful,

(27)  F0• … •Fm−1 = Fm• … •Fm+n−1 ≡ Gm+n−1• … •Gm = Gm−1• … •G0 .

(Note that the cases m = 0 and n = 0 are included; the composition of zero functions is of course the identity function.) In particular, if for i = 0..1 , (hi, ki) is a Galois-connected pair of functions and so too is (F, G) then

(28)  h0•F = F•h1 ≡ k1•G = G•k0 .

The rule (27) captures the strategy used to discover algebraic properties of an unknown function F , namely to translate to the discovery of properties of the adjoint function. The proof of (27) is trivial: according to theorems 10 and 11, the m-fold composition Gm−1• … •G0 is an upper adjoint of F0• … •Fm−1 and the n-fold composition Gm+n−1• … •Gm is an upper adjoint of Fm• … •Fm+n−1 . Thus, (27) is the instance of (24) obtained by making the instantiations:

F0 := F0• … •Fm−1 ,  F1 := Fm• … •Fm+n−1 ,
G1 := Gm+n−1• … •Gm ,  G0 := Gm−1• … •G0 .

Note that (27) subsumes the composition and identity rules (theorems 10 and 11) and the uniqueness of adjoints (24), the three rules that contributed to its proof. Nevertheless we do not recommend that you should commit it to memory. It's better to remember the individual rules and the general idea of how these may be combined.

Exercise 29   A property that combines lemmas 16 and 19 is the following. Suppose, for i = 0, 1 , (Ai, ⊑Ai) and (Bi, ⊑Bi) are posets and (Fi ∈ Ai←Bi , Gi ∈ Bi←Ai) are Galois-connected pairs of functions. Let h ∈ B0←B1 and k ∈ A0←A1 be arbitrary monotonic functions. Then

(30)  F0•h ⊑̇ k•F1 ≡ h•G1 ⊑̇ G0•k .

Prove this rule.
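A concrete instance of this strategy, familiar from the floor function: division by a positive integer n is the upper adjoint of multiplication by n, and multiplication by n maps integers to integers, so rule (27) predicts that floor absorbs the division, ⌊x/n⌋ = ⌊⌊x⌋/n⌋. The Python spot-check below takes this law on trust for an assumed instance (n = 7 and the random sampling are arbitrary choices, not taken from the text):

```python
# Numeric check of the law floor(x/n) = floor(floor(x)/n), an instance
# of rule (27) with f.x = x/n as the upper adjoint of multiplication
# by n.  The constant n = 7 and the sampling are illustrative choices.

import math
import random

random.seed(0)                 # deterministic sampling
n = 7
f = lambda x: x / n            # upper adjoint of multiplication by n

for _ in range(1000):
    x = random.uniform(-1000, 1000)
    assert math.floor(f(x)) == math.floor(f(math.floor(x)))
print("floor commutes with division by", n)
```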
□

Exercise 31   A pattern that you may have observed in the properties of the floor function is that if f is an arbitrary function from reals to reals such that

it is the upper adjoint in a Galois connection, and
its (lower) adjoint maps integers to integers,

then f commutes with the floor function in the sense that, for all real x ,

⌊f.x⌋ = ⌊f.⌊x⌋⌋ .

Show how this is an instance of (27). In other words, state precisely how to instantiate m , n and the functions F0 , …, Fm+n−1 and G0 , …, Gm+n−1 in order to obtain the law.
□

4  Existence of Galois Connections

In this section we delve deeper into the theory of Galois connections. The goal is to establish a number of theorems that predict the existence of adjoint functions. These theorems are all intimately tied up with extremum preservation properties of functions, which in turn depend on the notions of supremum and infimum. There are three sections. Section 4.1 defines suprema and infima of a given "shape", section 4.2 then defines preservation properties and section 4.3 presents the existence theorems.

In order to better motivate what is to follow, let us begin with a brief outline. We recall that a Galois connection is a connection between two functions F and G between two posets (A, ⊑A) and (B, ⊑B) of the form

F.x ⊑A y ≡ x ⊑B G.y .

Typical accounts of the existence of Galois connections focus on the properties of these functions. For example, given a function F , one may ask whether F is a lower adjoint in a Galois connection. The question we want to ask is subtly different. Note that the statement F.x ⊑ y defines a relation between B and A . So too does x ⊑ G.y . The existence of a Galois connection states that these two relations are equal. A natural question is therefore: under which conditions does an arbitrary (binary) relation between two posets define a Galois connection between the sets? Exploring the question in more detail leads to the following question. Suppose R is a relation between posets B and A (i.e.
R ⊆ B×A ). What is a necessary and sufficient condition that there exist a function F such that

(x, y) ∈ R ≡ F.x ⊑ y ?

Such a relation is called a pair algebra. If we know the answer to this question then we can answer the question of whether a given function G has a lower adjoint, by defining R to be the relation x ⊑ G.y on x and y . We can also answer the dual question of whether there exists a function G such that

(x, y) ∈ R ≡ x ⊑ G.y .

If, for a given relation R , the answer to both these questions (the original and its dual) is positive, then the relation R clearly defines a Galois connection.

In order to simplify the question, we make an abstraction step. We are required to find, for each x , a value F.x such that (x, y) ∈ R ≡ F.x ⊑ y . Suppose we fix x and hide the dependence on x . Then the problem becomes one of determining, for a given predicate p , necessary and sufficient conditions guaranteeing that

(32)  p.y ≡ a ⊑ y  for some a .

If we can solve this simplified problem, we have also solved the original problem by defining p.y to be (x, y) ∈ R and F.x to be a .

It is easy to identify a necessary condition for (32) to hold: a must be the least y satisfying p . That is, p.a must hold (since a ⊑ a ) and, for all y such that p.y holds, a ⊑ y . The question thus becomes: when is there a least element satisfying a given predicate p , and when does this least element a (say) satisfy (32) for all y ? The least element satisfying a given property (if it exists) is characterised by two properties. First, it itself satisfies the property and, second, it is the "infimum" of all values satisfying the property. Section 4.1 defines the notion of an infimum via a Galois connection. The dual notion of "supremum" is also discussed in this section. Infima and suprema are collectively called "extrema". Among the properties of extrema that we study, the question studied in section 4.2 of whether a function preserves infima is particularly important.
The question is of importance in its own right, but we shall also see that it is vital to answering our question about the existence of Galois connections. Indeed, the basic insight is that a predicate p on a (complete) poset preserves infima if and only if there is an a such that, for all y , p.y ≡ a ⊑ y . We see in section 4.3 how this simple observation is used to prove the "fundamental theorem" (theorem 48) on the existence of Galois connections.

Infima and suprema need not exist. Section 5 is devoted to showing how the existence of a Galois connection can be used to predict the existence of extrema in a poset.

4.1  Infima and Suprema

4.1.1  Infima and Completeness

In this section we formulate the notion of an extremum (an infimum or a supremum) of a certain "shape" (for example, the infimum of a finite set, or the infimum of a countably infinite set) via a Galois connection. The notions of completeness and cocompleteness, relative to a given shape, are also defined.

Suppose (A, ⊑) and (B, ⪯) are partially ordered sets and f ∈ A←B is a monotonic function. Then an infimum of f is a solution of the equation:

(33)  x :: ⟨∀a:: a ⊑ x ≡ ⟨∀b:: a ⊑ f.b⟩⟩ .

As an example, consider the function ⟨b: c⊑b: b⟩ , where c is some given constant. This has infimum c since

⟨∀a:: a ⊑ c ≡ ⟨∀b: c⊑b: a ⊑ b⟩⟩ .

There are two ways in which this definition differs from the most common definition of an infimum. The first is that it is more common to define the infimum of a set rather than of a function. That the two notions are equivalent is not difficult to see. The infimum of a set is the infimum of the identity function on that set, and the infimum of a function can be thought of as the infimum of the range of the function. (The range of a function is a set.) Defining infima on functions is an elegant way of being able to talk about different kinds of infima, as we shall see very shortly.

The second difference is that an infimum is often defined as a greatest lower bound.
That is, x is an infimum of f if it is a lower bound:

⟨∀b:: x ⊑ f.b⟩ ,

and it is greatest among such lower bounds:

⟨∀a:: a ⊑ x ⇐ ⟨∀b:: a ⊑ f.b⟩⟩ .

This is entirely equivalent to (33), as is easily verified.

The infimum of a function f need not exist but if it does it is certainly unique. This is easily seen. Specifically, for all a∈A ,

   a ⊑ x0
≡    { x0 solves (33) }
   ⟨∀b:: a ⊑ f.b⟩
≡    { x1 solves (33) }
   a ⊑ x1 .

Hence, by indirect equality, x0 and x1 both solve (33) ⇒ x0 = x1 .

As mentioned, equation (33) need not have a solution. If it does, for a given f , we denote its solution by ⊓f . By definition, then,

(34)  ⟨∀a:: a ⊑ ⊓f ≡ ⟨∀b:: a ⊑ f.b⟩⟩ .

Suppose we fix the posets A and B and consider all functions of type A←B . Suppose that there is a function ⊓ mapping all such functions to their infima. Then we recognise (34) as a Galois connection. Specifically,

   ⟨∀b:: a ⊑ f.b⟩
≡    { define the function K ∈ (A←B)←A by (K.a).b = a }
   ⟨∀b:: (K.a).b ⊑ f.b⟩
≡    { definition of ⊑̇ (pointwise ordering on functions) }
   K.a ⊑̇ f .

Thus, our supposition becomes that there is a function ⊓ that is the upper adjoint of the so-called "constant combinator" K of type (A←B)←A defined by (K.a).b = a for all a∈A and b∈B . That is, for all a∈A and f ∈ A←B ,

(35)  a ⊑ ⊓f ≡ K.a ⊑̇ f .

If this is the case we say that the poset A is B-complete. If a poset A is B-complete for all B we say that it is complete.

4.1.2  Shape Posets

The poset B is called the shape poset. By varying B we can consider different "types" or "shapes" of infima. For instance, if we take B to be 2 , the two-point set {0,1} ordered by equality, then the set of functions to A from B is in one-to-one correspondence with pairs of elements (a0, a1) (to be precise: f ↦ (f.0, f.1) and (a0, a1) ↦ f where f.0 = a0 and f.1 = a1 ). Via this correspondence, the function K is the doubling function: K.a corresponds to the pair (a, a) , and, writing f.0 ⊓ f.1 instead of ⊓f , the Galois connection (35) simplifies to

a ⊑ x⊓y ≡ (a, a) ⊑2 (x, y) .

That is,

a ⊑ x⊓y ≡ a ⊑ x ∧ a ⊑ y .
This is the Galois connection defining the infimum of a bag of two elements (of which the Galois connection defining the minimum of two numbers is a special case).

If B is the empty poset, φ , then there is exactly one function of type A←B . The right side of (35) is vacuously true and, thus, for all a∈A and f ∈ A←φ , a ⊑ ⊓f . In words, the poset A is φ-complete equivales A has a greatest element.

If B equals A then the set of functions of type A←B includes the identity function. The infimum of the identity function is a solution of the equation

x :: ⟨∀a:: a ⊑ x ≡ ⟨∀b:: a ⊑ b⟩⟩

and thus, by instantiating a to x , a solution of

x :: ⟨∀b:: x ⊑ b⟩ .

The infimum of the identity function is thus the least element in the set A .

A final example is the case that B is (ℕ, ≥) (the natural numbers ordered by the at-least relation). Functions in A←B are then in one-to-one correspondence with so-called (descending) chains, sets of elements ai ( 0 ≤ i ) such that a0 ⊒ a1 ⊒ a2 ⊒ … . To say that A is (ℕ, ≥)-complete is equivalent to all such chains having an infimum in A .

4.1.3  Suprema and Cocompleteness

The notion dual to infimum is "supremum" and the notion dual to completeness is "cocompleteness". Suppose (A, ⊑) and (B, ⪯) are partially ordered sets and f ∈ A←B is a monotonic function. Then a supremum of f is a solution of the equation:

(36)  x :: ⟨∀a:: x ⊑ a ≡ ⟨∀b:: f.b ⊑ a⟩⟩ .

As for infima, equation (36) need not have a solution. If it does, for a given f , we denote its solution by ⊔f . By definition, then,

(37)  ⟨∀a:: ⊔f ⊑ a ≡ ⟨∀b:: f.b ⊑ a⟩⟩ .

The poset A is B-cocomplete if there is a function ⊔ that is the lower adjoint of the constant combinator of type (A←B)←A . That is, for all a∈A and f ∈ A←B ,

(38)  ⊔f ⊑ a ≡ f ⊑̇ K.a .

If a poset A is B-cocomplete for all B we say that it is cocomplete. It can be shown, using the techniques developed here, that completeness and cocompleteness of a poset are equivalent. So the term "cocomplete" (used without qualifying shape poset) is redundant.
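The defining characterisation (34) can be checked directly in a small complete poset. The sketch below is an illustrative instance, not one from the text: in the powerset of a three-element set ordered by inclusion, the infimum of a family of subsets is its intersection, and the check runs over every family, including the empty one (whose infimum is the greatest element, matching the remark about φ-completeness).

```python
# Checking characterisation (34) in a small complete poset: the powerset
# of {0,1,2} ordered by inclusion, where the infimum of a family of sets
# is its intersection.  An illustrative instance, not from the text.

from itertools import chain, combinations

U = {0, 1, 2}
subsets = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(U), r) for r in range(len(U) + 1))]

def inf(family):
    """Intersection = infimum; for the empty family it is the top element."""
    result = frozenset(U)
    for s in family:
        result &= s
    return result

families = [list(c) for r in range(len(subsets) + 1)
            for c in combinations(subsets, r)]

# (34): a ⊑ ⊓f  ≡  ⟨∀b:: a ⊑ f.b⟩, reading ⊑ as ⊆ and ⊓ as ∩.
for f in families:
    for a in subsets:
        assert (a <= inf(f)) == all(a <= b for b in f)

# The empty shape: the infimum is the greatest element of the poset,
# matching the remark that φ-completeness means having a top element.
assert inf([]) == frozenset(U)
print("characterisation (34) holds in the powerset of", set(U))
```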
The fact that extrema are defined via Galois connections immediately suggests a number of useful calculation properties. We mention just two. First, both extremum operators are monotonic. That is,

f ⊑̇ g ⇒ ⊓f ⊑ ⊓g ∧ ⊔f ⊑ ⊔g .

Second, instantiating a to ⊓f in (34) we get the cancellation property:

⟨∀b:: ⊓f ⊑ f.b⟩ .

In words, ⊓f is a lower bound on (the range of) f . Dually, ⟨∀b:: f.b ⊑ ⊔f⟩ . That is, ⊔f is an upper bound (on the range of) f .

4.2  Extremum Preservation Properties

Suppose A and B are partially ordered sets and suppose f ∈ A←B . An important consideration is whether f preserves infima (or suprema) of a certain shape. If f preserves infima of shape C we say that f is C-inf-preserving. If f preserves suprema of shape C we say that f is C-sup-preserving. The precise formulation that we use in calculations is as follows.

Definition 39   Suppose A , B and C are partially ordered sets such that B is C-complete. Then function f ∈ A←B is C-inf-preserving if, for all functions g ∈ B←C ,

(40)  ⟨∀a:: a ⊑ f.(⊓g) ≡ K.a ⊑̇ f•g⟩ .

Dually, suppose that B is C-cocomplete. Then function f ∈ A←B is said to be C-sup-preserving if, for all functions g ∈ B←C ,

(41)  ⟨∀a:: f.(⊔g) ⊑ a ≡ f•g ⊑̇ K.a⟩ .

If B is complete, we say that f is inf-preserving if it satisfies (40) for all C and all g . Sup-preserving is defined dually.
□

The definition of inf-preserving does not require that A be complete. If that is the case, and we abuse notation by using ⊓ to denote the infimum operator for both A and B , then f ∈ A←B is inf-preserving if for all functions g with range B

(42)  f.(⊓g) = ⊓(f•g) .

Although more complicated, we are obliged to use equation (40) if we want to establish properties of inf-preservation, whereas the less complicated equation (42) is the equation we will use when we want to establish properties of a function f that is known to be inf-preserving, and the posets concerned are known to be complete.

A predicate is a function with range Bool , the two-element set with elements true and false .
Ordering Bool by implication ( ⇒ ), the infimum of a (monotonic) predicate p is the universal quantification ∀p (that is, ⟨∀x:: p.x⟩ ). Also, that predicate p is C-inf-preserving means that p.(⊓g) ≡ ∀(p•g) for all functions g with range the domain of p and domain C . Formally,

   p ∈ Bool←B is C-inf-preserving
≡    { definition }
   ⟨∀g: g ∈ B←C: ⟨∀a: a∈Bool: a ⇒ p.(⊓g) ≡ ⟨∀x:: a ⇒ p.(g.x)⟩⟩⟩
≡    { simplification using (true ⇒ q) ≡ q and (false ⇒ q) ≡ true }
   ⟨∀g: g ∈ B←C: p.(⊓g) ≡ ⟨∀x:: p.(g.x)⟩⟩ .

That is, for predicate p with domain B ,

(43)  p is C-inf-preserving ≡ ⟨∀g: g ∈ B←C: p.(⊓g) ≡ ⟨∀x:: p.(g.x)⟩⟩ .

The dual of (43) is:

(44)  p is C-sup-preserving ≡ ⟨∀g: g ∈ B←C: p.(⊔g) ≡ ⟨∃x:: p.(g.x)⟩⟩ .

Just as for (co)completeness properties, various terminology exists for specific instances of C . Taking C to be (Bool, ⇒) , the booleans ordered by implication, a C-(inf or sup)-preserving function, f , is a function that maps values x and y such that x ⊑ y to values f.x and f.y such that f.x ⊑ f.y . Thus a function is (Bool, ⇒)-(inf and/or sup)-preserving if and only if it is monotonic.

If C is the empty set then f is C-sup-preserving means that f maps the least element of B to the least element of A . The terminology that is often used in the computing science literature is " f is strict". Sometimes one says f is bottom-strict, "bottom" being a common name in the computing science literature for the least element in a set. "Top-strict" then means preserving the greatest element, i.e. empty-set-inf-preserving.

If C is the set of natural numbers ordered by the usual at-most relation then C-sup-preservation of a function is often called ω-continuity. If C is a non-empty finite set then C-inf-preservation of f is equivalent to f preserving binary infima (i.e. f.(x⊓y) = f.x ⊓ f.y ). This is sometimes referred to as positive inf-preservation of f .

Several examples of finite preservation properties have already been discussed.
All of these examples are instances of a fundamental extremum preservation property of adjoint functions which we present in the next section.

4.3  Existence Theorem

In this section we derive a fundamental theorem on the existence of Galois connections. To simplify matters we often make the assumption that we are dealing with complete posets, particularly in the beginning of the discussion. The section ends with a discussion of properties of adjoint functions when this assumption is not valid.

4.3.1  Pair Algebras

Recall the motivating discussion at the beginning of section 4. There we went from the question, given a relation R , when is there a function F such that (x, y) ∈ R ≡ F.x ⊑ y , to the question, given a predicate p , when is it the case that p.y ≡ a ⊑ y . The answer to the latter question is the subject of lemma 46 below, following which we can immediately answer the former question. It is but a short step from here to the fundamental theorem on the existence of Galois connections. Along the way, we discuss several other examples of the lemmas we have formulated.

We begin with a trivial, but insightful lemma.

Lemma 45   Function f is C-inf-preserving is the same as, for all a , the predicate ⟨x:: a ⊑ f.x⟩ is C-inf-preserving.

Proof

   f is C-inf-preserving
≡    { definition of C-inf-preserving: (40), where g ranges over functions with domain C }
   ⟨∀g:: ⟨∀a:: a ⊑ f.(⊓g) ≡ ⟨∀x:: a ⊑ f.(g.x)⟩⟩⟩
≡    { nesting of quantifications }
   ⟨∀a:: ⟨∀g:: a ⊑ f.(⊓g) ≡ ⟨∀x:: a ⊑ f.(g.x)⟩⟩⟩
≡    { (43) }
   ⟨∀a:: ⟨x:: a ⊑ f.x⟩ is C-inf-preserving⟩ .
□

Noting that the identity function is inf-preserving, an immediate corollary of lemma 45 is that, for each a , the predicate ⟨x:: a ⊑ x⟩ is inf-preserving. This suggests that we explore whether the converse is true. And, indeed, it is:

Lemma 46   Suppose p is a predicate on a complete poset. Then

p is inf-preserving ≡ ⟨∃a:: p = ⟨x:: a ⊑ x⟩⟩ .

Furthermore, if p is inf-preserving, ⟨⊓x: p.x: x⟩ is the unique solution of the equation

a :: p = ⟨x:: a ⊑ x⟩ .
□

Applying lemma 46 as we indicated we would be doing, we have answered our original question about relations R :

Theorem 47 (Pair Algebras)   Suppose B is a set and (A, ⊑) is a complete poset. Suppose R ⊆ B×A is a relation between the two sets. Then the following two statements are equivalent.

The function F defined by F.x = ⟨⊓y: (x, y) ∈ R: y⟩ satisfies, for all x∈B and all y∈A ,
(x, y) ∈ R ≡ F.x ⊑ y .

For all x∈B , the predicate ⟨y:: (x, y) ∈ R⟩ is inf-preserving.
□

4.3.2  Examples of Pair Algebras

A simple example of the pair-algebra theorem is provided by the membership relation. For all sets S and all x (in a given universe of which S is a subset) we have:

x∈S ≡ {x} ⊆ S .

This statement has the form (x, S) ∈ R ≡ F.x ⊆ S where the relation R is the membership relation. We thus deduce that the predicate ( x∈ ) is inf-preserving. That is, for all bags 𝒮 of sets,

x ∈ ∩𝒮 ≡ ⟨∀S: S∈𝒮: x∈S⟩ .

A common way to exploit the pair-algebra theorem is to take a function or relation and extend it to a relation between sets in such a way that the infimum and supremum preserving properties are automatically satisfied. In fact this is the basis of the so-called "Galois correspondence" between groups and fields put forward by Évariste Galois in 1832. That example involves too much background terminology to usefully include it here. A simpler example is the following.

Consider a relation R on two sets A and B . (Thus R ⊆ A×B .) These sets need not be ordered. Define the relation R̄ on subsets X and Y of A and B , respectively, by

(X, Y) ∈ R̄ ≡ X×Y ⊆ R .

Now, the functions ⟨X:: X×Y⟩ and ⟨Y:: X×Y⟩ preserve arbitrary unions of sets, and hence so do the predicates ⟨X:: X×Y ⊆ R⟩ and ⟨Y:: X×Y ⊆ R⟩ . Thus, by the pair-algebra theorem (taking care with the direction of the orderings) there are functions FR and GR such that, for all subsets X and Y of A and B , respectively,

FR.X ⊇ Y ≡ X×Y ⊆ R ≡ X ⊆ GR.Y .

In words, FR.X is the largest set Y such that every element of X is related by R to every element of Y .
Similarly, GR.Y is the largest set X such that every element of X is related by R to every element of Y . The functions FR and GR are called polarities; FR is the left polar and GR is the right polar of relation R . This Galois connection is the basis of so-called concept lattices [DP90, GW99]. In this application area, the set A is a set of objects, and the set B is a set of attributes (or "concepts"). For example, we may take A to be the set of planets of the sun (Mars, Venus, Jupiter, etc.) and B to be attributes like "has a moon" and "is nearer to the sun than the Earth". The relation R is then "satisfies the predicate". We may then ask for all the planets that both have a moon and are nearer to the sun than the Earth.

A more substantial example of the pair-algebra theorem, one that stands out in the computing science literature, is provided by conditional correctness assertions of program statements. Suppose S is a program statement and p and q are predicates on the state space of S . Then one writes {p}S{q} if, after successful execution of statement S beginning in a state satisfying the predicate p , the resulting state will satisfy predicate q . In such a case one says that statement S is conditionally correct with respect to precondition p and postcondition q . ("Conditional" refers to the fact that satisfying predicate q is conditional on the termination of statement S .) In this way each program statement S defines a relation on state predicates. This relation is such that, for all bags of predicates P ,

{⟨∃p: p∈P: p⟩} S {q} ≡ ⟨∀p: p∈P: {p}S{q}⟩ .

That is, for each state predicate q the predicate ⟨p:: {p}S{q}⟩ preserves suprema in the poset of predicates ordered by implication (equivalently, preserves infima in the poset of predicates ordered by follows-from). Thus, by the dual of the pair-algebra theorem with implication as the ordering on predicates, for each statement S and predicate q there is a predicate wlp(S, q) satisfying

{p}S{q} ≡ p ⇒ wlp(S, q) .
The abbreviation "wlp" stands for "weakest liberal precondition". Before leaving this example, let us note that it is also the case that, for all bags of predicates Q ,

{p} S {⟨∀q: q∈Q: q⟩} ≡ ⟨∀q: q∈Q: {p}S{q}⟩ .

There is thus also a predicate slp(S, p) satisfying

{p}S{q} ≡ slp(S, p) ⇒ q .

Combining this equation with the equation for the weakest liberal precondition, we thus have the Galois connection: for all predicates p and q ,

slp(S, p) ⇒ q ≡ p ⇒ wlp(S, q) .

The abbreviation "slp" stands for "strongest liberal postcondition".

4.3.3  The Existence Theorem

Now that we have seen several concrete examples, let us state the fundamental theorem.

Theorem 48 (Fundamental Theorem)   Suppose that B is a poset and A is a complete poset. Then a monotonic function G ∈ B←A is an upper adjoint in a Galois connection equivales G is inf-preserving. Dually, a monotonic function F ∈ A←B is a lower adjoint in a Galois connection equivales F is sup-preserving.

Proof   Define the relation R by (x, y) ∈ R ≡ x ⊑ G.y and function F as in theorem 47. Then,

   G is inf-preserving
≡    { lemma 45 }
   ⟨∀x:: ⟨y:: x ⊑ G.y⟩ is inf-preserving⟩
≡    { theorem 47 with R and F defined as above }
   ⟨∀x, y:: F.x ⊑ y ≡ x ⊑ G.y⟩ .
□

We mentioned a couple of instances of theorem 48 immediately prior to its derivation. There are many other instances we could give. The predicate calculus is a source of several. We saw earlier that for all predicates p the function ( p∧ ) is a lower adjoint. Specifically,

(p ∧ q ⇒ r) ≡ (p ⇒ (q⇒r)) .

The supremum operator in the lattice of predicates ordered by implication is existential quantification since

(⟨∃x:: p.x⟩ ⇒ q) ≡ ⟨∀x:: p.x ⇒ q⟩ .

Instantiating theorem 48 we thus conclude that ( p∧ ) commutes with existential quantification. That is,

p ∧ ⟨∃x:: q.x⟩ ≡ ⟨∃x:: p ∧ q.x⟩ .

The dual theorem is that ( p⇒ ) commutes with universal quantification:

(p ⇒ ⟨∀x:: q.x⟩) ≡ ⟨∀x:: p ⇒ q.x⟩ .

A more enlightening example is afforded by negation. Recall that

(¬p ⇒ q) ≡ (p ⇐ ¬q) .
Now, in order to instantiate lemma 55 we need to be very clear about the partially ordered sets involved: instantiate A to the predicates ordered by implication and C to the predicates ordered by follows-from. Observe that the supremum operator in A is existential quantification, and in C is universal quantification. Thus by application of the lemma:

¬(∀x:: p.x) ≡ (∃x:: ¬(p.x)) .

This example illustrates why in general it is necessary to take great care with the precise definitions of the orderings on the posets A and B when applying theorem 48. The above examples all illustrate the fact that, given a Galois connection, we can infer that the lower adjoint preserves suprema (and the upper adjoint preserves infima). The converse property is most often used to ascertain that a function is a lower adjoint (or is an upper adjoint) without it being necessary to know what the corresponding upper adjoint of the function is. Indeed it is often the case that the upper adjoint has a clumsy definition that one wishes to avoid using explicitly at all costs.

Exercise 49 Consider an arbitrary set U . For each x in U the predicate ( x∈ ) maps a subset P of U to the boolean value true if x is an element of P and otherwise to false . The predicate ( x∈ ) preserves set union and set intersection. That is, for all bags S of subsets of U ,

x ∈ ∪S ≡ ∃P: P∈S: x∈P and x ∈ ∩S ≡ ∀P: P∈S: x∈P .

According to the fundamental theorem and its dual, the predicate ( x∈ ) thus has a lower and an upper adjoint. Construct a closed formula for the upper and lower adjoints of ( x∈ ). □

Exercise 50 Consider the predicate ( = φ ) (where φ denotes the empty set) on subsets of a given set U . Using lemma 46 (or otherwise) determine in what sense the predicate preserves set union. Hence construct a Galois connection in which the predicate is the lower adjoint. □

Exercise 51 Suppose ( A , ⊑A ) and ( B , ⊑B ) are complete posets.
Theorem 48 states that there is a function mapping the sup-preserving functions F ∈ A←B to the inf-preserving functions G ∈ B←A and, vice-versa, there is a function mapping the inf-preserving functions G ∈ B←A to the sup-preserving functions F ∈ A←B . Denote these functions by a superscript ♯ and a superscript ♭ , respectively, so that F♯ denotes the upper adjoint of sup-preserving function F , and G♭ denotes the lower adjoint of inf-preserving function G . In other words, for all x and y ,

F.x ⊑ y ≡ x ⊑ F♯.y and G♭.x ⊑ y ≡ x ⊑ G.y .

Show that ♯ and ♭ are inverse poset isomorphisms. (Take care with the orderings!) □

Exercise 52 Prove: a poset is complete is the same as the poset is cocomplete. □

4.3.4 Existential Extremum Preservation

In the statement of theorems 47 and 48 we have assumed that a poset is complete. Sometimes this assumption is not justified, but we would still like to know what the extremum preservation properties of the adjoints of a Galois connection are. For this purpose, we make use of (27). Suppose A , B and C are partially ordered sets and both A and C are B-cocomplete. We recall that this means that all functions of type A←B or C←B have a supremum. Equivalently, the K-combinators of type (A←B)←A and (C←B)←C each have lower adjoints. Denoting the K-combinator of type (A←B)←A by KA and the corresponding supremum operator by ⊔A , we can instantiate (27) to

(53) F0 • ⊔C = ⊔A • F1 ≡ KC • G0 = G1 • KA .

This holds for all functions F0 , G0 , F1 and G1 such that F0 ∈ A←C , G0 ∈ C←A , F1 ∈ (A←B)←(C←B) , G1 ∈ (C←B)←(A←B) , and (F0, G0) and (F1, G1) are both Galois pairs. If you ignore the subscripts on ⊔ and K (for instance, when A and C are identical) then (53) is seen to be an instance of (28). The extra generality proves to be necessary for some examples to be discussed shortly. Now, the K-combinator is a particularly simple function. It is relatively easy to explore its commutativity properties.
Then, with the aid of (53), we can translate those properties into properties of suprema. Let's do that. Suppose G ∈ C←A . It is easy to find a function h such that KC • G = h • KA . Specifically,

(54) KC • G = (G•) • KA .

Recalling that the function G• has lower adjoint F• given that G has lower adjoint F , we can now instantiate (53), obtaining the lemma:

Lemma 55 Suppose that F ∈ A←C is a lower adjoint in a Galois connection. Then F preserves suprema. That is, for each partially ordered set B such that both A and C are B-cocomplete,

F • ⊔C = ⊔A • (F•) .

Equivalently, for each function h ∈ C←B ,

F.(⊔C.h) = ⊔A.(F•h) .
□

Exercise 56 Another consequence of (54) is that, for an arbitrary monotonic function f of type A←C ,

⊔A.(f•h) ⊑ f.(⊔C.h) .

Prove this property. □

5 Unity of Opposites

In this section we want to show that adjoint functions create complete posets. Specifically, we want to show that if the pair (F, G) is a Galois connection between posets A and B , and either one of A or B is C-complete, then the images of F and G (the set of elements x that equal F.y for some y , and the set of y that equal G.x for some x ) are also C-complete. For brevity we denote the image of F by F.B and the image of G by G.A . We make the assumption that B is C-complete. We begin by showing that G.A is C-complete and then show that F.B is C-complete. The dual theorem is that both of F.B and G.A are C-complete if A is C-complete. Thus, both are C-complete if A or B is C-complete.

5.1 The Prefix Lemma

The key property that we use is that x is an element of the image of G is the same as G.(F.x) ⊑ x . Thus the image of G is the set of so-called "prefix points" of the function G•F . Prefix points will figure highly in later sections on fixed point calculus. For the moment, it suffices for us to observe the so-called "prefix lemma", which asserts that the prefix points of a monotonic endofunction on a C-complete poset form a C-complete poset.
(The prefix lemma emerges somewhat out of the blue in this development. We encounter the lemma again in section 7 where we hazard a guess as to how the lemma may have been first observed.) The first task, which we leave to the reader, is to show that

(57) x ∈ G.A ≡ G.(F.x) ⊑ x .

Property (57) is important because it identifies the image of G as the set of "prefix points" of the function G•F . In general, if h is a monotonic endofunction on some poset A then y∈A is said to be a prefix point of h if h.y ⊑ y . The reason that this is important is because of the following lemma.

Lemma 58 (Prefix lemma) Suppose B and C are posets and suppose B is C-complete. Suppose also that h is a monotonic endofunction on B . Then the set, Pre.h , of prefix points of h is also C-complete.

Proof Suppose f is any function with range Pre.h and domain C . Since B is C-complete and Pre.h is a subset of B , the function f has an infimum, ⊓B.f , in B . Letting P denote Pre.h , the required infimum of f is a solution of the equation:

(59) x: x∈P: ∀y: y∈P: y ⊑ x ≡ K.y ⊑̇ f

whereas ⊓B.f is a solution of the equation:

x: x∈B: ∀y: y∈B: y ⊑ x ≡ K.y ⊑̇ f .

By specialising the latter universal quantification to elements of P (which is a subset of B ) it is thus the case that ⊓B.f solves:

(60) x: x∈B: ∀y: y∈P: y ⊑ x ≡ K.y ⊑̇ f .

Comparing (59) with (60) it is clear that ⊓B.f solves (59) if it is an element of P . That is, we must show that ⊓B.f is a prefix point of h . This we do as follows.

⊓B.f ∈ Pre.h
≡ { definition of Pre.h }
  h.(⊓B.f) ⊑ ⊓B.f
≡ { definition of ⊓B.f , where dummy c ranges over C }
  ∀c:: h.(⊓B.f) ⊑ f.c
⇐ { range of f is Pre.h . Thus ∀c:: h.(f.c) ⊑ f.c . Transitivity of ⊑ . }
  ∀c:: h.(⊓B.f) ⊑ h.(f.c)
⇐ { h is monotonic }
  ∀c:: ⊓B.f ⊑ f.c
≡ { cancellation }
  true .
□

By instantiating h to G•F , it follows immediately from the prefix lemma that if B is C-complete, Pre.(G•F) is C-complete. Combining this with (57), we get that if B is a C-complete poset then G.A is also a C-complete poset.
5.2 The Theorem

The proof of the prefix lemma shows that the infimum operator in G.A coincides with the infimum operator in B . The supremum operators of G.A and B do not necessarily coincide. Let f be any function with range G.A and domain C . The supremum of f in G.A is a solution of the equation

X: X ∈ G.A: ∀x: x ∈ G.A: X ⊑ x ≡ ∀y:: f.y ⊑ x

whereas the supremum of f in B is a solution of the equation

Y: Y∈B: ∀x: x∈B: Y ⊑ x ≡ ∀y:: f.y ⊑ x .

Suppose Y is the supremum of f in B (that is, Y = ⊔B.f ) and suppose x ∈ G.A . Then we construct the supremum of f in G.A as follows. The goal is to construct X such that X ∈ G.A and

∀x: x ∈ G.A: X ⊑ x ≡ ∀y:: f.y ⊑ x .

Equivalently, X = G.a where a∈A and

∀x: x ∈ G.A: G.a ⊑ x ≡ ∀y:: f.y ⊑ x .

We now calculate a :

G.a ⊑ x ≡ ∀y:: f.y ⊑ x
≡ { definition of Y , x ∈ G.A ⇒ x∈B }
  G.a ⊑ x ≡ Y ⊑ x
≡ { propositional calculus }
  (G.a ⊑ x ⇐ Y ⊑ x) ∧ (G.a ⊑ x ⇒ Y ⊑ x)
≡ { x ∈ G.A ≡ G.(F.x) = x }
  (G.a ⊑ G.(F.x) ⇐ Y ⊑ x) ∧ (G.a ⊑ x ⇒ Y ⊑ x)
⇐ { properties of Galois-connected functions: G•F is monotonic and Y ⊑ G.(F.Y) }
  a = F.Y .

In summary, we have calculated that

(61) ⊔G.A.f = G.(F.(⊔B.f)) .

The final task is to show that F.B is complete. This we do by expressing the supremum and infimum operators in F.B in terms of those in B . From section 3.4 we know that the functions F and G witness a poset isomorphism between F.B and G.A . We have also just shown that G.A is complete. It follows immediately that F.B is complete. That is, all suprema and infima exist in F.B and we can use simple equational reasoning to express them in terms of the suprema and infima in B . Let f be a function with range F.B .
Then,

⊔F.B.f
= { x ∈ F.B ≡ x = (F•G).x }
  ⊔F.B.(F•G•f)
= { F ∈ F.B ← G.A is a poset isomorphism, fundamental theorem }
  F.(⊔G.A.(G•f))
= { (61) }
  (F•G•F).(⊔B.(G•f))
= { F•G•F = F }
  F.(⊔B.(G•f))

and

⊓F.B.f
= { x ∈ F.B ≡ x = (F•G).x }
  (F•G).(⊓F.B.f)
= { F ∈ F.B ← G.A is a poset isomorphism, fundamental theorem }
  F.(⊓G.A.(G•f))
= { infimum in G.A coincides with the infimum in B }
  F.(⊓B.(G•f)) .

Summarising, we have proved the following theorem.

Theorem 62 (Unity of Opposites) Suppose F ∈ A←B and G ∈ B←A are Galois-connected functions, F being the lower adjoint and G being the upper adjoint. Then F.B and G.A are isomorphic posets. Moreover, if one of A or B is C-complete, for some shape poset C , then F.B and G.A are also C-complete. Assuming that B is C-complete, the supremum and infimum operators are given by

⊓G.A.f = ⊓B.f
⊔G.A.f = G.(F.(⊔B.f))
⊔F.B.f = F.(⊔B.(G•f))
⊓F.B.f = F.(⊓B.(G•f)) .
□

Picturing the posets A and B as sets in which larger elements are above smaller elements, the unity-of-opposites theorem is itself summarised in the following diagram. The two larger lenses picture the sets A (on the left) and B (on the right); the bottom-left lens pictures F.B and the top-right lens G.A . The latter two sets are pictured as having the same size because they are isomorphic, whereas A and B are pictured as having different sizes because they will not be isomorphic in general. Note that F maps the least element of B (denoted ⊥⊥B in the diagram) to the least element of A (denoted ⊥⊥A ). Further, G maps the greatest element of A (denoted ⊤⊤A in the diagram) to the greatest element of B (denoted ⊤⊤B ). The posets F.B and G.A are "opposites" in the sense that the former contains small elements whereas the latter contains large elements. In particular, F.B includes the least element of A and G.A includes the greatest element of B . They are unified, however, by the fact that they are isomorphic.

[Diagram: two lens-shaped sets picturing A (left) and B (right), with F.B at the bottom of A and G.A at the top of B ; F maps ⊥⊥B to ⊥⊥A and G maps ⊤⊤A to ⊤⊤B .]
Exercise 63 (Closure Operators) Closure operators play an important role in many areas of mathematics. This exercise relates Galois connections to closure operators. There are two definitions of closure operators. According to the most common definition, a closure operator is an endofunction, f , on a poset (A, ⊑) that is reflexive: for all x , x ⊑ f.x , idempotent: for all x , f.x = f.(f.x) , and monotonic. According to the second definition, a closure operator is an endofunction, f , such that

x ⊑ f.y ≡ f.x ⊑ f.y

for all x and y . An example of a closure operator is the function ( a⊔ ) where a is some element of A . The identity function is also (trivially) a closure operator.

(a) Prove that the two definitions of closure operator given above are equivalent.

Expressing the second definition in point-free form, f is a closure operator if and only if

g ⊑̇ f•h ≡ f•g ⊑̇ f•h

for all monotonic functions g and h with range A and the same domain. You may find this the more useful definition in the final part of this exercise.

(b) Suppose that f ∈ (A, ⊑)←(A, ⊑) is a closure operator and suppose the pair of functions G ∈ (A, ⊑) ← (B, ≼) and H ∈ (B, ≼) ← (A, ⊑) forms a Galois connection (with G being the lower adjoint). Show that H•f•G is a closure operator. □

Exercise 64 Suppose the conditions on F and G stated in theorem 62 hold. Then the supremum operator in F.B defines an ordering ⊑ on the elements of F.B by the equation

x ⊑ y ≡ x ⊔F.B y = y .

Show that this ordering is the same as the poset ordering on elements of A . That is, show that for all x and y in F.B ,

x ⊑A y ≡ x ⊔F.B y = y .
□

6 Fixed Points

6.1 Prefix Points

We begin our study of fixed points by introducing the notion of a "prefix" point. As an instance of where the notion commonly occurs, consider the following example from language theory. The set L = {n: 0 ≤ n: aⁿbcⁿ} over the alphabet {a,b,c} is sometimes specified in the following way. The word b is in the set L .
If w is in the set L then so is the word awc . Nothing else is in the set L . Expressing the first two clauses in terms of set inclusion we see that L is required to satisfy the equation:

X:: {b} ⊆ X ∧ {a}·X·{c} ⊆ X ,

which is equivalent to the equation:

X:: {b} ∪ {a}·X·{c} ⊆ X .

Now consider the function f mapping sets of words to sets of words defined by:

f.X = {b} ∪ {a}·X·{c} .

Then the requirement is that L be a so-called "prefix point" of f , i.e. L should satisfy the equation:

X:: f.X ⊆ X .

What about the third clause: "Nothing else is in the set L "? Note that this clause is necessary to specify L since, for example, the set of all words over the alphabet {a,b,c} is a prefix point of the function f . One way to understand this clause is as the requirement that L be the least prefix point of the function f . Thus the complete specification of L is the equation

X:: f.X ⊆ X ∧ ∀Y: f.Y ⊆ Y: X ⊆ Y .

Here, the first conjunct is the requirement that X be a prefix point of f , and the second conjunct the requirement that X be at most all prefix points Y of f . In this section we define the notions of least prefix point and least fixed point in a general setting, and we establish the most basic properties of these notions. You are recommended to instantiate the properties we establish on the example just discussed as you read, in order to confirm your understanding. Suppose A = (A, ⊑) is a partially ordered set and suppose f is a monotonic endofunction on A . Then a prefix point of f is an element x of the carrier set A such that f.x ⊑ x . A least prefix point of f is a solution of the equation

x:: f.x ⊑ x ∧ ∀y: f.y ⊑ y: x ⊑ y .

A least prefix point of f is thus a prefix point of f that is smaller than all other prefix points of f . A least fixed point of f is a solution of the equation

(65) x:: f.x = x ∧ ∀y: f.y = y: x ⊑ y .

Rather than study fixed points of f we are going to study prefix points of f since the latter are mathematically more manageable than the former.
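The least prefix point of the function f above can be approximated mechanically: since f is monotonic, iterating f starting from the empty set and discarding words above a length bound stabilises on the bounded portion of the least fixed point. The following sketch is ours (the bound and helper names are assumptions for illustration, not from the text):

```python
def f(X):
    # the function whose least prefix point is L = { a^n b c^n | 0 <= n }
    return {"b"} | {"a" + w + "c" for w in X}

def lfp_bounded(f, bound):
    # iterate f from the empty set, discarding words longer than `bound`;
    # on this finite sublattice the iteration stabilises
    X = set()
    while True:
        Y = {w for w in f(X) if len(w) <= bound}
        if Y == X:
            return X
        X = Y

L7 = lfp_bounded(f, 7)
assert L7 == {"b", "abc", "aabcc", "aaabccc"}
```

Note that the set of all words over {a,b,c} of length at most 7 is also a prefix point of the bounded f, which is exactly why the "nothing else" clause, i.e. leastness, is part of the specification.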
Exercise 66 Relations are often defined by just saying which pairs are in the relation. Implicitly the relation in question is defined to be the least solution of a certain equation. (Recall that relations are sets of pairs and thus ordered by set inclusion.) An example is the following definition of the "at-most" relation on natural numbers (denoted as usual by the infix operator ≤ ). The "at-most" relation is the least relation satisfying, first, that for all natural numbers n , 0≤n and, second, for all natural numbers m and n , m+1 ≤ n+1 ⇐ m ≤ n . This defines the "at-most" relation as a least prefix point of a function from relations to relations. What is this function? □

It is easy to see that a least prefix point of f is unique. Let x and x′ be least prefix points of f . Then,

x = x′
≡ { anti-symmetry }
  x ⊑ x′ ∧ x′ ⊑ x
⇐ { ∀y: f.y ⊑ y: x ⊑ y with the instantiation y := x′ ,
    ∀y: f.y ⊑ y: x′ ⊑ y with the instantiation y := x }
  f.x′ ⊑ x′ ∧ f.x ⊑ x
≡ { x and x′ are prefix points of f }
  true .

A similar argument establishes that a least fixed point of f is unique. Nothing is lost by studying least prefix points rather than least fixed points since, by the following calculation, a least prefix point of a monotonic function f is a fixed point of f .

x = f.x
≡ { • f.x ⊑ x }
  x ⊑ f.x
⇐ { • ∀y: f.y ⊑ y: x ⊑ y , y := f.x }
  f.(f.x) ⊑ f.x
⇐ { f is monotonic }
  f.x ⊑ x
≡ { • f.x ⊑ x }
  true .

Since all fixed points are also prefix points it follows that a least prefix point of f is also a least fixed point of f . Dual to the notion of least prefix point we can define the notion of "greatest postfix point": a greatest postfix point of endofunction f ∈ (A, ⊑) ← (A, ⊑) is a least prefix point of the function f ∈ (A, ⊒) ← (A, ⊒) . Spelling this out in detail, a postfix point of f is an element x of the carrier set A such that x ⊑ f.x and a greatest postfix point of f is a solution of the equation

(67) x:: x ⊑ f.x ∧ ∀y: y ⊑ f.y: y ⊑ x .
A greatest fixed point of f is a solution of the equation

x:: f.x = x ∧ ∀y: f.y = y: y ⊑ x .

Since a least prefix point is a least fixed point, we immediately have the dual result that a greatest postfix point is also a greatest fixed point. A simple but important corollary is that function f has a unique fixed point if and only if the least prefix point of f equals the greatest postfix point of f . Let us summarise what has been learnt in this section. Introducing the notation µf for a solution of the equation (65) and νf for a solution of the equation (67) we have:

Theorem 68 (Least Prefix Point) Suppose (A, ⊑) is an ordered set and the function f of type (A, ⊑) ← (A, ⊑) is monotonic. Then f has at most one least prefix point, µf , characterised by the two properties:

(69) f.µf ⊑ µf

and, for all x∈A ,

(70) µf ⊑ x ⇐ f.x ⊑ x .

Moreover, the least prefix point of f is a fixed point of f :

(71) f.µf = µf .
□

Note that least fixed points are characterised by the combination of (69) and (70) or (71) and (70). Because (69) is weaker than (71) it is usual to use the former characterisation when it is required to establish that a particular value is a least prefix point, and the latter characterisation when it is required to exploit the fact that a particular value is a least prefix point.

Theorem 72 (Greatest Postfix Point) Suppose (A, ⊑) is an ordered set. Suppose, also, that the function f of type (A, ⊑) ← (A, ⊑) is monotonic. Then f has at most one greatest postfix point, νf , characterised by the two properties:

(73) νf ⊑ f.νf

and, for all x∈A ,

(74) x ⊑ νf ⇐ x ⊑ f.x .

Moreover, the greatest postfix point of f is a fixed point of f :

(75) f.νf = νf .
□

Use of the rules (70) or (74) corresponds to using induction. We therefore refer to their use in calculations as fixed point induction. (Strictly we should say least fixed point induction or greatest fixed point induction but it will invariably be clear which of the two rules is meant.)
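On a finite lattice these characterising properties can be checked directly: iterating a monotonic f from the least element computes µf, which then satisfies the computation rule (71) and lies below every prefix point, as the induction rule (70) requires. A sketch on the powerset lattice of a small set (the particular function f is our choice, made up for illustration):

```python
from itertools import combinations

def lfp(f, bottom):
    # compute the least fixed point by iterating f from the least element;
    # on a finite poset the chain bottom, f(bottom), ... stabilises at mu f
    x = bottom
    while f(x) != x:
        x = f(x)
    return x

U = {0, 1, 2, 3}

def f(X):
    # a monotonic endofunction on the powerset of U
    return frozenset({0}) | frozenset(n + 1 for n in X if n + 1 in U)

mu = lfp(f, frozenset())
assert f(mu) == mu                                    # computation rule (71)

subsets = [frozenset(c) for r in range(len(U) + 1)
           for c in combinations(sorted(U), r)]
assert all(mu <= X for X in subsets if f(X) <= X)     # induction rule (70)
```

Here `<=` on frozensets is subset inclusion, so the final assertion says precisely that µf is below every prefix point of f.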
Rules (71) and (75) are called computation rules because they are often used in programs as left-to-right rewrite rules, repeated application of which is used to compute a fixed point. Both induction rules can of course be written as an implication rather than a follows-from. For example, (70) can be written as

f.x ⊑ x ⇒ µf ⊑ x .

This is the way many would write the rule. We deviate from this practice because in almost all our calculations we will apply the rule from left to right in the form given in (70). This is independent of whether our goal is to verify a statement of the form µf ⊑ x or, given x , to construct an f satisfying µf ⊑ x , or, given f , to construct an x satisfying µf ⊑ x . The benefit of proceeding in this way is the replacement of complification steps by simplification steps.

Exercise 76 Show that µf ⊑ νf . □

6.2 A First Example

In this section we present a simple example of the Least Prefix Point theorem (theorem 68). The example illustrates why (70) really is an induction rule. It also illustrates the use of the computation rule (71) in a simple example of algorithm design. Let L be the least solution of the equation

X:: {a} ∪ {b}·X·X ⊆ X .

This corresponds to a grammar having two productions X ::= a and X ::= bXX . We first use (70) to prove that, for all words w in L , the number of a 's in w is one more than the number of b 's in w . Let M be the set of all words w such that the number of a 's in w is one more than the number of b 's in w . Let #a w denote the number of a 's in w , and #b w denote the number of b 's in w . We have to show that L ⊆ M .
We use fixed point induction:

L ⊆ M
⇐ { by definition L = µf where f = X:: {a} ∪ {b}·X·X , induction: (70) }
  {a} ∪ {b}·M·M ⊆ M
≡ { set theory, definition of concatenation }
  a∈M ∧ ∀x, y: x∈M ∧ y∈M: bxy ∈ M
≡ { definition of M }
  #a a = #b a + 1 ∧ ∀x, y: #a x = #b x + 1 ∧ #a y = #b y + 1: #a(bxy) = #b(bxy) + 1
≡ { definition of #a and #b }
  true ∧ ∀x, y: #a x = #b x + 1 ∧ #a y = #b y + 1: #a x + #a y = 1 + #b x + #b y + 1
≡ { arithmetic }
  true .

Note that this proof is not a proof by induction on the length of words in L . Introducing the length of words in L into the proof is spurious and only obscures the properties that are truly vital to the proof. Now let us illustrate the use of the computation rule. The problem we consider is how to write a simple loop that determines whether a given word W is in the language L . At the same time we illustrate one of the most basic elements of the mathematics of program construction, namely the use of loop invariants. For the purposes of the discussion we need two functions on words, fst and rest . The fst function returns the first symbol of a word of length at least one, and the rest function returns the remainder of the word. For example, fst.(aba) = a and rest.(aba) = ba . We also use ε to denote the empty word (the word of length zero). The problem is to determine the truth or falsity of W∈L . As L is a fixed point of its defining equation, we can expand this requirement as follows:

W ∈ L
≡ { by (71), L = {a} ∪ {b}·L·L }
  W ∈ {a} ∪ {b}·L·L
≡ { set calculus }
  W = a ∨ (fst.W = b ∧ rest.W ∈ L·L) .

The two tests W = a and fst.W = b can, of course, easily be implemented. This then leaves us with the task of implementing the test rest.W ∈ L·L . Were we to repeat the same small calculation beginning with the latter then we would end up with the task of implementing the test rest.(rest.W) ∈ L·L·L .
This clearly suggests a generalisation of the original problem, namely, given natural number n and word w , determine the truth or falsity of the statement w∈Lⁿ where, by definition, L⁰ = {ε} and Lⁿ⁺¹ = L·Lⁿ . With this generalised problem we begin our calculation again. First, by definition of L⁰ ,

(77) w∈L⁰ ≡ w = ε .

Second,

w∈Lⁿ⁺¹
≡ { Lⁿ⁺¹ = L·Lⁿ and, by (71), L = {a} ∪ {b}·L·L }
  w ∈ ({a} ∪ {b}·L·L)·Lⁿ
≡ { concatenation distributes through union, concatenation is associative }
  w ∈ {a}·Lⁿ ∪ {b}·L·L·Lⁿ
≡ { set calculus }
  (fst.w = a ∧ rest.w ∈ Lⁿ) ∨ (fst.w = b ∧ rest.w ∈ Lⁿ⁺²) .

This calculation provides enough information to construct the required loop. The loop uses two variables, a word w and a natural number k , satisfying the loop invariant

W ∈ L ≡ w ∈ Lᵏ .

This invariant is established by the assignment:

w , k := W , 1 .

Also, by the above calculation, it is maintained when k = n+1 for some number n by the assignment

if fst.w = a → w , k := rest.w , k−1
□ fst.w = b → w , k := rest.w , k+1
fi .

(Here we make the implicit assumption that the given word W is a string of a 's and b 's and contains no other characters, thus avoiding stipulating the semantics of the statement when neither of the two tests evaluates to true .) This statement also makes progress by reducing the length of w by 1 at each iteration. Finally, if we impose the termination condition w = ε ∨ k = 0 , then the conjunction of the termination condition and the invariant

(w = ε ∨ k = 0) ∧ (W ∈ L ≡ w ∈ Lᵏ)

implies, using (77) and the calculation immediately following it, that

W∈L ≡ w = ε ∧ k = 0 .

This then is the complete program:

w , k := W , 1
{ Invariant: W ∈ L ≡ w ∈ Lᵏ
  Bound function: length of w }
do ¬(w = ε ∨ k = 0) →
   if fst.w = a → w , k := rest.w , k−1
   □ fst.w = b → w , k := rest.w , k+1
   fi
od
{ (w = ε ∨ k = 0) ∧ (W ∈ L ≡ w ∈ Lᵏ) }
{ W ∈ L ≡ w = ε ∧ k = 0 }

6.3 Kleene Algebra

The purpose of this section is to provide some practice in the use of the characterising properties of least fixed points, in particular the induction rule. For this purpose we consider an algebraic structure called a "Kleene algebra" by Kozen [Koz94]. A Kleene algebra has two constants 0 and 1 , two binary operators + and · , and a unary operator ∗ . These operators satisfy a number of axioms which we shall give shortly. The name "Kleene algebra" is a tribute to S. C. Kleene [Kle56] who postulated an algebraic structure as a basis for studying the behaviour of finite state machines. He called it the "algebra of regular events", which nowadays is most commonly abbreviated to "regular algebra". Regular algebra is central to the mathematics of program construction because the three operators capture the essential properties of the three main ingredients of programming languages: choice ( + ), sequencing ( · ) and iteration ( ∗ ). The axioms of a Kleene algebra are not sufficient to capture all the properties of choice, sequencing and iteration in programming languages and we shall need to add to the axioms later on. Any Kleene algebra is thus also a regular algebra but it is not necessarily the case that any regular algebra is a Kleene algebra. See sections 7.3 and 7.4 for further discussion.

6.3.1 The Axioms

It is convenient to discuss the axioms of a Kleene algebra in two stages: first the axiomatisation of the + and · operators, and then the axiomatisation of the ∗ operator. In presenting the axioms we use variables a , b , x , y and z to range over the carrier set of the algebra. These variables are universally quantified in the usual way. The notation " + " and " · " is chosen to suggest a connection with addition and multiplication in ordinary arithmetic.
Indeed, the + operator of a Kleene algebra is required to be associative and symmetric and to have 0 as unit element:

(x+y)+z = x+(y+z) , x+y = y+x , and x+0 = x = 0+x .

Also, just as in arithmetic, multiplication is required to be associative, to distribute over addition, and to have 1 as unit and 0 as zero element:

x·(y·z) = (x·y)·z ,
x · (y+z) = (x·y) + (x·z) ,
(y+z) · x = (y·x) + (z·x) ,
x·0 = 0 = 0·x , and
1·x = x = x·1 .

Use of these rules in calculations will be referred to simply as "arithmetic". The addition and multiplication operators of a Kleene algebra deviate from the operators of normal arithmetic in two respects. First, multiplication is not assumed to be commutative. Second, addition is assumed to be idempotent. That is, x+x = x for all x . (Equivalently, 1+1 = 1 .) Because addition is idempotent, associative and symmetric, the relation ≤ defined by

x ≤ y ≡ x+y = y

is reflexive, transitive and anti-symmetric. In short, it is a partial ordering relation. Moreover, because 0 is the unit of addition, it is the least element in the ordering: 0≤x for all x . Finally, addition and multiplication are both monotonic with respect to the ≤ ordering.

Exercise 78 Verify all the claims made in the last paragraph. Prove also that + is the binary supremum operator. That is,

x+y ≤ z ≡ x ≤ z ∧ y ≤ z .
□

The ∗ operator will be the focus of our attention. The axioms for ∗ require that a∗·b be the least prefix point of the (monotonic) function mapping x to b + a·x , and that b·a∗ be the least prefix point of the (monotonic) function mapping x to b + x·a . Formally, a∗·b is a prefix point of the function mapping x to b + a·x :

(79) b + a·(a∗·b) ≤ a∗·b ,

and is the least among all such prefix points:

(80) a∗·b ≤ x ⇐ b + a·x ≤ x .

That is,

(81) a∗·b = µx:: b + a·x .

Similarly, b·a∗ is a prefix point of the function mapping x to b + x·a :

(82) b + (b·a∗)·a ≤ b·a∗ ,

and is the least among all such prefix points:

(83) b·a∗ ≤ x ⇐ b + x·a ≤ x .
That is,

(84) b·a∗ = µx:: b + x·a .

The axioms (79) and (82) are thus instances of (69), and the axioms (80) and (83) are instances of (70), the two properties characterising least prefix points. An immediate corollary is that a∗·b and b·a∗ are fixed points of the relevant functions:

(85) b + a·(a∗·b) = a∗·b ,
(86) b + (b·a∗)·a = b·a∗ .

This concludes the axiomatisation of a Kleene algebra. There are several interpretations of the operators in a Kleene algebra that fulfill the axioms. Table 1 summarises a few. The first interpretation in the table, in which the carrier consists of sets of words over a given, fixed alphabet, is the original application of Kleene algebra. Here the addition operation is set union, the multiplication operation is concatenation of sets of words, 0 is the empty set, 1 is the set whose sole element is the empty word, and a∗ is the set of words formed by concatenating together an arbitrary number of words in the set a . The second interpretation is the one we will return to most often in these lecture notes. In this interpretation the carrier set of the algebra is the set of binary relations over some state space, the addition operator is set union, multiplication is the composition operator on relations, 0 is the empty relation, 1 is the identity relation, and the iteration operator is the reflexive, transitive closure operator on relations. The remaining three interpretations all concern path-finding problems; in each case the iteration operator ( ∗ ) is uninteresting. These interpretations only become interesting when one considers graphs with edge labels drawn from these primitive Kleene algebras, the crucial theorem being that it is possible to define a Kleene algebra of graphs given that the edge labels are elements of a Kleene algebra [BC75, Koz94].
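As a small illustration of this last point, here is a sketch of the "shortest paths" interpretation lifted to matrices: + is minimum, · is addition of edge lengths, 0 is ∞ and 1 is 0, and the star of a matrix of edge weights yields all-pairs shortest distances. The Floyd-Warshall-style computation of the matrix star is our choice for the sketch, not a construction taken from the text.

```python
import math

INF = math.inf

def mat_star(M):
    # star of a square matrix over the min-plus ("shortest paths") algebra:
    # entry [i][j] of the result is the length of a shortest path i -> j
    n = len(M)
    # start from 1 + M: the 1-element (distance 0) on the diagonal
    A = [[min(M[i][j], 0.0 if i == j else INF) for j in range(n)]
         for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # + of the algebra is min, its · is addition of lengths
                A[i][j] = min(A[i][j], A[i][k] + A[k][j])
    return A

# edge weights of a three-node graph (INF = no edge)
M = [[INF, 1.0, 4.0],
     [INF, INF, 2.0],
     [INF, INF, INF]]
A = mat_star(M)
assert A[0][2] == 3.0     # via node 1: 1 + 2 beats the direct edge 4
assert A[0][0] == 0.0     # the 1-element on the diagonal
```

Reinterpreting min and + as the operators of any of the other rows of Table 1 turns the same triple loop into reachability or bottleneck computations.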
Table 1: Kleene algebras

                  carrier            +    ·    0      1     a∗    ≤
Languages         sets of words      ∪    ·    φ      {ε}   a∗    ⊆
Programming       binary relations   ∪    ◦    φ      id    a∗    ⊆
Reachability      booleans           ∨    ∧    false  true  true  ⇒
Shortest paths    nonnegative reals  min  +    ∞      0     0     ≥
Bottlenecks       nonnegative reals  max  min  0      ∞     ∞     ≤

6.3.2 Reflexive, Transitive Closure

It is common to call a∗ the reflexive, transitive closure of a . This is because a∗ is reflexive, in the sense that 1 ≤ a∗ , transitive, in the sense that a∗ = a∗·a∗ , and ∗ is a closure operator, i.e. a ≤ b∗ ≡ a∗ ≤ b∗ . In this section we shall verify, from the axioms given above, that a∗ does indeed have these and other related properties. The main tool will be fixed point induction, i.e. axioms (80) and (83). The reflexivity of a∗ is immediate from (85) or (86). Instantiating b to 1 in both these rules and using 1·x = x = x·1 we get

1 + a·a∗ = a∗ = 1 + a∗·a .

Thus, since + is the binary supremum operator, 1 ≤ a∗ . It then follows that a ≤ a∗ since

a∗ = 1 + a·a∗ = 1 + a·(1 + a·a∗) = 1 + a·1 + a·(a·a∗) = 1 + a + a·a·a∗ .

Now for the transitivity of a∗ we use a ping-pong argument:

a∗ = a∗·a∗
≡ { antisymmetry }
  a∗ ≤ a∗·a∗ ∧ a∗·a∗ ≤ a∗ .

The first inequality does not involve induction:

a∗ ≤ a∗·a∗
⇐ { transitivity }
  a∗ ≤ 1·a∗ ∧ 1·a∗ ≤ a∗·a∗
≡ { arithmetic for the first inequality; 1 ≤ a∗ and multiplication is monotonic for the second }
  true .

The second inequality is where we need to use induction:

a∗·a∗ ≤ a∗
⇐ { (80) with a,b,x := a , a∗ , a∗ }
  a∗ + a·a∗ ≤ a∗
≡ { + is the binary supremum operator }
  a∗ ≤ a∗ ∧ a·a∗ ≤ a∗
≡ { reflexivity of ≤ ; a∗ = 1 + a·a∗ }
  true .

Thus we have established that a∗ = a∗·a∗ . We establish that ∗ is a closure operator by the following ping-pong argument.

a∗ ≤ b∗
⇐ { (80) with a,b,x := a , a , b∗ }
  1 + a·b∗ ≤ b∗
≡ { + is the binary supremum operator }
  1 ≤ b∗ ∧ a·b∗ ≤ b∗
⇐ { b∗ = 1 + b·b∗ , b∗ = b∗·b∗ , · is monotonic }
  a ≤ b∗
⇐ { a ≤ a∗ }
  a∗ ≤ b∗ .
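In the binary-relation interpretation these properties say that a∗ is a reflexive and transitive relation containing a. A quick sanity check on a small relation (the helper functions star and compose below are ours, written for this illustration):

```python
def compose(R, S):
    # relational composition: the · of the binary-relation Kleene algebra
    return {(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2}

def star(R, states):
    # reflexive-transitive closure of relation R (a set of pairs) over `states`:
    # start from 1 + R and close under composition until stable
    S = {(x, x) for x in states} | set(R)
    while True:
        T = S | compose(S, S)
        if T == S:
            return S
        S = T

states = {0, 1, 2, 3}
R = {(0, 1), (1, 2)}
Rs = star(R, states)
I = {(x, x) for x in states}

assert I <= Rs                     # 1 <= a*  (reflexivity)
assert set(R) <= Rs                # a <= a*
assert compose(Rs, Rs) == Rs       # a* = a*·a*  (transitivity)
```

The closure-operator property can be checked in the same style: star(R, states) is contained in star(S, states) exactly when R is contained in star(S, states).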
Exercise 87 Prove the following properties:

(a) a · b∗ ≤ c∗ · a ⇐ a·b ≤ c·a
(b) c∗ · a ≤ a · b∗ ⇐ c·a ≤ a·b
(c) a · (b·a)∗ = (a·b)∗ · a
(d) (a+b)∗ = b∗ · (a · b∗)∗ = (b∗ · a)∗ · b∗
(e) (a∗)∗ = a∗

Properties (a) and (b) are called leapfrog rules (because a "leapfrogs" from one side of a star term to the other). Both have the immediate corollary that ∗ is monotonic (by taking a to be 1 ). Properties (c) and (d) are called the mirror rule and star decomposition rule, respectively. Property (e) states that ∗ is idempotent. □

Exercise 88 The goal of this exercise is to show that a∗ is the least reflexive and transitive element of the algebra that is at least a . Let a⁺ denote a · a∗ . We call a⁺ the transitive closure of a . Prove the following properties of a⁺ .

(a) a ≤ a⁺
(b) a⁺ = a · a∗ = a∗ · a
(c) a⁺ · a⁺ ≤ a⁺
(d) a⁺ ≤ x ⇐ a + x·x ≤ x

Note that the combination of (a), (c) and (d) expresses the fact that a⁺ is the least prefix point, and hence the least fixed point, of the function x:: a + x·x . That is, a⁺ is the least transitive element that is at least a .

(e) Show that a∗ is the least fixed point of the function x:: 1 + a + x·x . □

7 Fixed Point Calculus

The least fixed point of a monotonic function is, as we have seen in theorem 68, characterised by two properties. It is a fixed point, and it is least among all prefix points of the function. This gives us two calculational rules for reasoning about the least fixed point µf of monotonic function f : the computation rule

µf = f.µf

and the induction rule: for all x ,

µf ⊑ x ⇐ f.x ⊑ x .

In principle these are the only rules one needs to know about least fixed points. Every calculation can be reduced to one involving just these two rules (together with the rules pertaining to the specific problem in hand). This indeed is commonly what happens. Many postulates are verified by induction. But this is often not the most effective way of reasoning about fixed points.
The problem is that this basic characterisation of least fixed points typically invites proof by mutual inclusion. That is, given two expressions, E and F, involving fixed points (or prefix points), the obvious strategy to determine conditions under which they are equal is to determine conditions under which the two inclusions E ⊑ F and F ⊑ E hold. This can lead to long and clumsy calculations. To avoid this we state a number of equational properties of least fixed points. Their proofs, which are mostly straightforward, are left as exercises. We conclude the section with the mutual recursion theorem, which expresses the solution of mutually recursive fixed point equations in terms of the successive solution of individual equations. The proof, which we do include, is a fine illustration of the earlier properties.

Unless otherwise stated, we assume throughout this section that f and g are monotonic endofunctions on the complete lattice A = (A, ⊑). (The assumption of completeness can be avoided but is convenient and practical.) The set of such functions also forms a complete lattice under the pointwise ordering ⊑̇ defined by

   f ⊑̇ g ≡ ∀⟨x:: f.x ⊑ g.x⟩ .

The set of prefix points of f is denoted by Pre.f and its set of postfix points is denoted by Post.f. The set Pre.f is also a complete lattice, as is Post.f.

7.1 Basic Rules

The most basic fixed point rule is that the least-fixed-point operator is monotonic:

(89) µf ⊑ µg ⇐ f ⊑̇ g .

The second rule is called the rolling rule:

(90) g.µ(f•g) = µ(g•f)

(where f ∈ (A, ⊑) ← (B, ≼) and g ∈ (B, ≼) ← (A, ⊑) for some posets (A, ⊑) and (B, ≼)). Note that the rolling rule subsumes the rule that a least prefix point is also a fixed point. Just instantiate f to the identity function. The proof we gave of this fact in the previous section can be adapted to give a proof of the rolling rule.

An immediate corollary of the rolling rule is that, for a monotonic endofunction f,

   µ(f²) = f.µ(f²)

where f² denotes f•f.
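The rolling rule and its corollary are easy to check mechanically on a small finite poset. The following sketch (illustrative only, not from the text) computes least fixed points by iteration from the bottom of the chain 0 ≤ 1 ≤ 2 ≤ 3:

```python
# Check the rolling rule  g.mu(f•g) = mu(g•f)  and the corollary
# mu(f²) = f.mu(f²)  for sample monotonic functions on the chain 0..3.
def mu(f):
    x = 0                       # bottom of the chain
    while f(x) != x:            # iterate x, f.x, f.(f.x), ... up to the lfp
        x = f(x)
    return x

f = lambda x: min(x + 1, 2)     # monotonic, with fixed point 2
g = lambda x: min(x + 1, 3)     # monotonic, with fixed point 3

print(mu(lambda x: g(f(x))) == g(mu(lambda x: f(g(x)))))   # True: rolling rule
f2 = lambda x: f(f(x))
print(mu(f2) == f(mu(f2)))                                 # True: mu(f²) is a fixed point of f
```

The iteration in `mu` terminates here because the chain is finite and both functions are inflationary at the bottom; in general the rules hold for any monotonic functions on complete lattices.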
This suggests that µ(f²) and µf are equal. This is indeed the case, and is called the square rule:

(91) µf = µ(f²) .

Consider now a monotonic, binary function ⊕ on the poset A. Then, for each x, the function (x⊕) (which maps y to x⊕y) is monotonic and so has a least fixed point µ(x⊕). Abstracting from x, the function ⟨x:: µ(x⊕)⟩ is a monotonic endofunction on A and its least fixed point is µ⟨x:: µ⟨y:: x⊕y⟩⟩. The diagonal rule removes the nesting in this expression:

(92) µ⟨x:: x⊕x⟩ = µ⟨x:: µ⟨y:: x⊕y⟩⟩ .

The diagonal rule is a very important rule because it is the basis of methods for finding fixed points step by step. Suppose one wants to solve a fixed point equation ⟨x:: x = x⊕x⟩ where "x⊕x" is a large expression involving several occurrences of the identifier x. The diagonal rule allows one to eliminate the x's one-by-one. This is what happens when one solves a number of equations defining a set of values by mutual recursion (sometimes called a set of "simultaneous equations"). Such a set of equations can be viewed as one equation of the form x = f.x where x is a vector of values. Each occurrence of an element xᵢ in the set of equations is in fact an occurrence of Πᵢ.x, where Πᵢ denotes the function that "projects" the vector x onto its i-th component. It is thus an occurrence of "x" itself and the standard practice is to eliminate such occurrences individually. Mutual recursion is discussed further in section 7.5.

Exercise 93 For arbitrary elements a and b of a Kleene algebra, prove the following properties. Use only equational reasoning. (That is, do not prove any of the properties by mutual inclusion.) (Hint: take advantage of exercises 87 and 88.)

(a) µ⟨X:: a·X∗⟩ = a⁺ ,
(b) µ⟨X:: (a+X)∗⟩ = µ⟨X:: (a·X)∗⟩ = µ⟨X:: a + X∗⟩ = a∗ ,
(c) µ⟨X:: a + X·b·X⟩ = a·(b·a)∗ ,
(d) µ⟨X:: 1 + a·X·b·X + b·X·a·X⟩ = µ⟨X:: 1 + a·X·b + b·X·a + X·X⟩ .
2

Exercise 94 Verify each of the rules given in this subsection.
2

7.2 Fusion

This section is about perhaps the most important fixed point rule of all, the fusion rule. In words, the fusion rule provides a condition under which the application of a function to a least fixed point can be expressed as a least fixed point. The rule is important because many computational problems are initially specified as such a function application, but are solved by computing the (least or greatest) solution to an appropriate fixed point equation.

Several examples arise as path-finding problems. Given a graph with "weighted" edges (formally, a function from the edges to the real numbers), the length of a path is defined to be the sum of the edge weights, and the width of a path is defined to be the minimum of the edge weights. The shortest distance between two nodes is the minimum, over all paths between the nodes, of the length of the path whilst the largest gap between two nodes is the maximum, over all paths between the nodes, of the width of the path. Thus both the shortest distance and the largest gap are defined as functions applied to the set of all paths between two given nodes; in the case of shortest distance, the function is "minimize the length", and, in the case of the largest gap, the function is "maximize the width". However, both these problems are typically solved not by first enumerating all paths (an impossible task in the general case because there are infinitely many of them) but by directly solving recursive equations defining the shortest distance and largest gap functions. The formal justification for this process relies on the fusion theorem below.

The more powerful of the two rules characterising least prefix points is the induction rule. Its power is, however, somewhat limited because it only allows one to calculate with orderings in which the µ operator is the principal operator on the lower side of the ordering (i.e. orderings of the form µf ⊑ · · ·).
Formally the fusion rule overcomes this restriction on the induction rule by combining the calculational properties of Galois connections with those of fixed points.

Theorem 95 (µ-fusion) Suppose f ∈ A ← B is the lower adjoint in a Galois connection between the posets (A, ⊑) and (B, ≼). Suppose also that g ∈ (B, ≼) ← (B, ≼) and h ∈ (A, ⊑) ← (A, ⊑) are monotonic functions. Then

(a) f.µg ⊑ µh ⇐ f•g ⊑̇ h•f ,
(b) f.µg = µh ⇐ f•g = h•f .

Indeed, if the condition f•g = h•f holds, f is the lower adjoint in a Galois connection between the posets (Pre.h, ⊑) and (Pre.g, ≼).
2

We call this theorem µ-"fusion" because it states when application of a function, f, can be "fused" with a fixed point, µg, to form a fixed point, µh. (The rule is also used, of course, to "defuse" a fixed point into the application of a function to another fixed point.) Another reason for giving the rule this name is that it is the basis of so-called "loop fusion" techniques in programming: the combination of two loops, one executed after the other, into a single loop. The rule is also called the transfer lemma because it states when function f "transfers" the computation of one fixed point (µg) into the computation of another fixed point (µh).

Exercise 96 Prove the fusion rule. (Hint: first show that a monotonic function k ∈ B ← A maps prefix points of h into prefix points of g if g•k ≼̇ k•h. Use this to show that f is the lower adjoint in a Galois connection between the posets (Pre.h, ⊑) and (Pre.g, ≼).)
2

Note that in order to apply µ-fusion we do not need to know the upper adjoint of function f; we only need to know that it exists. The fundamental theorem of Galois connections is the most commonly used tool to establish the existence of an upper adjoint.

As an aid to memorising the µ-fusion theorem note that the order of f, g, ⊑ or =, and h is the same in the consequent and the antecedent, except that in the antecedent the lower adjoint occurs twice, in different argument positions of •.
The conclusion of µ-fusion, µh = f.µg, involves two premises: that f be a lower adjoint, and that f•g = h•f. The rule is nevertheless very versatile since being a lower adjoint is far from being uncommon, and many algebraic properties take the form f•g = h•f for some functions f, g and h.

The next lemma also combines fixed points with Galois connections. We call it the exchange rule because it is used to exchange one lower adjoint for another. Its proof is left as an exercise. (Hint: for part (a) use induction followed by the rolling rule.)

Lemma 97 (Exchange Rule) Suppose f has type (A, ⊑) ← (B, ≼), and g and h both have type (B, ≼) ← (A, ⊑). Then, if g is a lower adjoint in a Galois connection,

(a) µ(g•f) ≼ µ(h•f) ∧ µ(f•g) ⊑ µ(f•h) ⇐ g•f•h ≼̇ h•f•g .

Furthermore, if both g and h are lower adjoints, then

(b) µ(g•f) = µ(h•f) ∧ µ(f•g) = µ(f•h) ⇐ g•f•h = h•f•g .
2

The two conclusions of (b), µ(g•f) = µ(h•f) and µ(f•g) = µ(f•h), say that g and h can be exchanged within a µ term in which they are composed after or before an arbitrary function f.

Exercise 98 Prove the exchange rule. The two conjuncts in the consequent of 97(b) can be combined into one equation. The resulting lemma states when lower adjoint g can be replaced by lower adjoint h in an arbitrary context. The lemma has four variables rather than three. Find the lemma and prove its validity.
2

7.2.1 Applications

We mentioned path-finding problems earlier as an example of the use of fusion. Space does not allow us to expand on this in detail. Instead, this section illustrates the use of the rule by some examples involving languages.

Shortest Word As explained earlier, a context-free grammar defines a function from languages to languages. The language defined by the grammar is the least fixed point of this function. There are a number of computations that we may wish to perform on the grammar.
These include testing whether the language generated is nonempty or not, and determining the length of a shortest word in the language. Let us consider the latter problem. (In fact, the problem is intimately related to the shortest path problem on graphs.) For concreteness we will demonstrate how to solve the problem using the grammar with productions

   S ::= aS | SS | ε .

The language defined by the grammar is thus

   µ⟨X:: {a}·X ∪ X·X ∪ {ε}⟩ .

(Of course the problem is very easily solved for a specific example as simple as this one. Try to imagine however that we are dealing with a grammar with a large number of nonterminals and a large number of productions.)

Given a language L defined in this way, the general problem is to find #L where the function # is defined by extending the length function on words to a shortest length function on languages. Specifically, letting length.w denote the length of word w (the number of symbols in the word),

   #L = ⇓⟨w: w∈L: length.w⟩

(where ⇓ denotes the minimum operator; the minimum over an empty set is defined to be infinity). Now, because # is the infimum of the length function it is the lower adjoint in a Galois connection. Indeed,

   #L ≥ k ≡ L ⊆ Σ≥k

where Σ≥k is the set of all words (in the alphabet Σ) whose length is at least k. Noting the direction of the ordering on the left-hand side of this Galois connection, we get that

   #(µ⊆ f) = µ≥ g ⇐ #•f = g•# .

Applying this to our example grammar, we fill in f and calculate g:

      # • ⟨X:: {a}·X ∪ X·X ∪ {ε}⟩ = g • #
   ≡    { definition of composition }
      ∀⟨X:: #({a}·X ∪ X·X ∪ {ε}) = g.(#X)⟩
   ≡    { # is a lower adjoint and so distributes over ∪ ; definition of # }
      ∀⟨X:: #({a}·X) ↓ #(X·X) ↓ #{ε} = g.(#X)⟩
   ≡    { #(Y·Z) = #Y + #Z , #{a} = 1 , #{ε} = 0 }
      ∀⟨X:: (1 + #X) ↓ (#X + #X) ↓ 0 = g.(#X)⟩
   ⇐    { instantiation }
      ∀⟨k:: (1+k) ↓ (k+k) ↓ 0 = g.k⟩ .

In this way we have determined that, for the language L defined by the above grammar,

   #L = µ≥ ⟨k:: (1+k) ↓ (k+k) ↓ 0⟩ .
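The fixed point µ≥⟨k:: (1+k)↓(k+k)↓0⟩ can be computed by iterating from ∞, the least element of the ordering ≥, and the same recipe works for any grammar: replace alternation by minimum and concatenation by addition. A small sketch, with a hypothetical grammar encoding of our own (a dictionary from nonterminals to lists of productions, `'eps'` standing for ε):

```python
import math

# Shortest-word lengths of a context-free grammar, as the least fixed point
# (under >=) of the equations obtained by replacing alternation with min
# and concatenation with +.  Terminals contribute length 1, 'eps' length 0.
def shortest_lengths(grammar):
    length = {n: math.inf for n in grammar}   # infinity: the 0 of the (min,+) algebra
    changed = True
    while changed:
        changed = False
        for n, prods in grammar.items():
            new = min((sum(length.get(s, 0 if s == 'eps' else 1) for s in p)
                       for p in prods), default=math.inf)
            if new < length[n]:
                length[n] = new
                changed = True
    return length

# S ::= aS | SS | ε  (the grammar from the text)
g = {'S': [['a', 'S'], ['S', 'S'], ['eps']]}
print(shortest_lengths(g)['S'])   # 0: the empty word is in the language
```

For a grammar such as S ::= aS | b the answer would be 1, and for S ::= aS (which generates no words at all) the iteration leaves the length at infinity.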
That is, the length of a shortest word in the language defined by the grammar with productions

   S ::= aS | SS | ε

is the greatest value k satisfying

   k = (1+k) ↓ (k+k) ↓ 0 .

Note how the structure of the equation for k closely follows the structure of the original grammar: the alternation operator has been replaced by minimum, and concatenation of languages has been replaced by addition.

What the example illustrates is that the length of the shortest word in the language generated by a context-free grammar can always be computed by determining the greatest fixed point (in the usual ≤ ordering on numbers, that is, the least fixed point if numbers are ordered by ≥) of an equation (or system of simultaneous equations) obtained by replacing all occurrences of alternation in the grammar by minimum and all occurrences of concatenation by addition. Moreover, the key to the proof of this theorem is the fusion rule.

Membership of a language To illustrate the use of fusion yet further we consider two examples. The first is an example of how the fusion theorem is applied; the second illustrates how the fusion theorem need not be directly applicable. We discuss briefly how the second example is generalised in such a way that the fusion theorem does become applicable.

Both examples are concerned with membership of a set. So, let us consider an arbitrary set U. For each x in U the predicate (x∈) maps a subset P of U to the boolean value true if x is an element of P and otherwise to false. The predicate (x∈) preserves set union. That is, for all bags S of subsets of U,

   x ∈ ∪S ≡ ∃⟨P: P∈S: x∈P⟩ .

According to the fundamental theorem, the predicate (x∈) thus has an upper adjoint. Indeed, we have, for all booleans b,

   x∈S ⇒ b ≡ S ⊆ if b → U ☐ ¬b → U\{x} fi .

Now suppose f is a monotonic function on sets. Let µf denote its least fixed point.
The fact that (x∈) is a lower adjoint means that we may be able to apply the fusion theorem to reduce a test for membership in µf to solving a recursive equation. Specifically,

   (x ∈ µf ≡ µg) ⇐ ∀⟨S:: x ∈ f.S ≡ g.(x ∈ S)⟩ .

That is, the recursive equation with underlying endofunction f is replaced by the equation with underlying endofunction g (mapping booleans to booleans) if we can establish the property ∀⟨S:: x ∈ f.S ≡ g.(x ∈ S)⟩.

An example of where this is always possible is testing whether the empty word is in the language defined by a context-free grammar. For concreteness, consider again the grammar with just one nonterminal S and productions

   S ::= aS | SS | ε .

Then the function f maps set X to {a}·X ∪ X·X ∪ {ε}. We compute the function g as follows:

      ε ∈ f.S
   =    { definition of f }
      ε ∈ ({a}·S ∪ S·S ∪ {ε})
   =    { membership distributes through set union }
      ε ∈ {a}·S ∨ ε ∈ S·S ∨ ε ∈ {ε}
   =    { ε ∈ X·Y ≡ ε ∈ X ∧ ε ∈ Y }
      (ε ∈ {a} ∧ ε ∈ S) ∨ (ε ∈ S ∧ ε ∈ S) ∨ ε ∈ {ε}
   =    { • g.b = (ε ∈ {a} ∧ b) ∨ (b ∧ b) ∨ ε ∈ {ε} ; see below for why the rhs has not been simplified further }
      g.(ε ∈ S) .

We have thus derived that

   ε ∈ µ⟨X:: {a}·X ∪ X·X ∪ {ε}⟩ ≡ µ⟨b:: (ε ∈ {a} ∧ b) ∨ (b ∧ b) ∨ ε ∈ {ε}⟩ .

Note how the definition of g has the same structure as the definition of f. Effectively set union has been replaced by disjunction and concatenation has been replaced by conjunction. Of course, g can be simplified further (to the constant function true) but that would miss the point of the example.

Now suppose that instead of taking x to be the empty word we consider any word other than the empty word. Then, repeating the above calculation with "ε ∈" replaced everywhere by "x ∈", the calculation breaks down at the second step. This is because the empty word is the only word x that satisfies the property

   x ∈ X·Y ≡ x ∈ X ∧ x ∈ Y

for all X and Y. Indeed, taking x to be a for illustration purposes, we have

   a ∈ µ⟨X:: {a}·X ∪ X·X ∪ {ε}⟩ ≡ true

but

   µ⟨b:: (a ∈ {a} ∧ b) ∨ (b ∧ b) ∨ a ∈ {ε}⟩ ≡ false .
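The boolean fixed point computation just derived is the familiar "nullability" test for grammars. As a sketch (again using our own hypothetical grammar encoding, with productions as lists of symbols and `'eps'` for ε), union becomes disjunction, concatenation becomes conjunction, and the least fixed point is reached by iterating up from false:

```python
# Test whether the empty word is generated by each nonterminal, as the
# least fixed point over the boolean lattice: a production derives the
# empty word iff every symbol in it does.
def nullable(grammar):
    null = {n: False for n in grammar}        # bottom of the boolean lattice
    changed = True
    while changed:
        changed = False
        for n, prods in grammar.items():
            # terminal symbols are never nullable; 'eps' always is
            new = any(all(null.get(s, s == 'eps') for s in p) for p in prods)
            if new and not null[n]:
                null[n] = True
                changed = True
    return null

print(nullable({'S': [['a', 'S'], ['S', 'S'], ['eps']]})['S'])   # True
```

As the text observes, for this grammar the function being iterated is constantly true, so the fixed point is reached in one step; for S ::= aS | b the answer is false.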
This second example emphasises that the conclusion of µ-fusion, µh = f.µg, demands two properties of f, g and h, namely that f be a lower adjoint, and that f•g = h•f. The rule is very versatile, but in cases where it is not immediately applicable we have to seek generalisations of f and/or g that do satisfy both properties. For the membership problem above, this is achieved by generalising from the single function (x∈), applied to the language, to the matrix of functions (u∈) where u is a segment of x. This is the basis of the Cocke-Younger-Kasami algorithm for general context-free language recognition.

7.3 Uniqueness

An important issue when confronted with a fixed point equation is whether or not the equation has a unique solution. Uniqueness is a combination of there being at least one solution and at most one solution. In a complete lattice every monotonic function has a least and a greatest fixed point, so the question is whether the least and greatest fixed points are equal. A very important special case is when the equation

   ⟨x:: x = a + b·x⟩

has a unique solution in a Kleene algebra.

In order to give a general answer to this question, it is necessary to make a further assumption about the partial ordering relation on elements of the algebra. The assumption we make is that the elements of the algebra form a complete lattice under the partial ordering relation and, furthermore, this ordering is such that, for each a, the section (a+) is the upper adjoint in a Galois connection. This important property is one that distinguishes a regular algebra from a Kleene algebra. The other additional property of a regular algebra (which we do not need to consider until later) is that each section (b·) is a lower adjoint in a Galois connection.
An ordering relation that is complete and is such that sections of the binary supremum operator are upper adjoints in a Galois connection is called a complete, completely distributive lattice. (It is called completely distributive because being an upper adjoint is equivalent to being universally distributive over all infima.) A regular algebra is thus a Kleene algebra whose underlying poset is a complete, completely distributive lattice and such that each section (b·) is a lower adjoint in a Galois connection. The assumption of being completely distributive clearly holds for languages and relations, since in both cases the ordering relation is the subset relation and addition is set union, and we have the shunting rule:

   b ⊆ a∪c ≡ ¬a ∩ b ⊆ c .

The assumption that (a+) is the upper adjoint in a Galois connection allows us to use fusion as follows.

Theorem 99 If (y+) is an upper adjoint, then we have, for all a and b,

   ν⟨x:: a + x·b⟩ = y + ν⟨x:: x·b⟩ ⇐ y = a + y·b .

Proof

      ν⟨x:: a + x·b⟩ = y + ν⟨x:: x·b⟩
   ⇐    { (y+) is upper adjoint: ν-fusion }
      ∀⟨x:: a + (y+x)·b = y + x·b⟩
   ⇐    { (·b) distributes over + , associativity of + }
      a + y·b = y .
2

As a consequence, in a regular algebra, the largest solution of the equation ⟨x:: x = a + x·b⟩ is the sum (i.e. supremum) of an arbitrary solution and the largest solution of the equation ⟨x:: x = x·b⟩. Note that a special choice for y in theorem 99 is y = a·b∗.

An immediate corollary of theorem 99 is that if ν⟨x:: x·b⟩ = 0, the function ⟨x:: a + x·b⟩ has a unique fixed point. This is the rule we call the unique extension property (UEP) of regular algebra.

Theorem 100 (The unique extension property (UEP)) Suppose b is an element of a regular algebra. If ν⟨x:: x·b⟩ = 0, then, for all a and x,

   a + x·b = x ≡ x = a·b∗ .
2

The UEP draws attention to the importance of the property ν⟨x:: x·b⟩ = 0.
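The hypothesis ν⟨x:: x·b⟩ = 0 of the UEP can be checked directly in a finite instance. The sketch below (our own illustration) works in the binary-relations row of table 1, with relations on a finite set as boolean matrices: the greatest fixed point of ⟨x:: x·b⟩ is computed by iterating down from the top element, the universal relation.

```python
# Decide whether nu<x:: x.b> = 0 in the Kleene algebra of boolean matrices.
# The greatest fixed point is reached by descending iteration from the
# universal relation; it is the empty relation exactly when the graph b
# has no infinite (hence, on a finite set, no cyclic) b-paths.
def nu_xb_is_zero(b):
    n = len(b)
    x = [[True] * n for _ in range(n)]        # top of the lattice
    while True:
        # x.b : boolean matrix composition
        y = [[any(x[i][k] and b[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        if y == x:
            break
        x = y
    return all(not x[i][j] for i in range(n) for j in range(n))

acyclic = [[False, True], [False, False]]     # single edge 0 -> 1
cyclic  = [[False, True], [True, False]]      # edges 0 -> 1 and 1 -> 0
print(nu_xb_is_zero(acyclic), nu_xb_is_zero(cyclic))   # True False
```

The descending iteration terminates because the lattice of boolean matrices is finite; the two sample graphs illustrate the two possible outcomes.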
In language theory it is equivalent to ε ∉ b, since if, on the contrary, x is a non-empty set such that x = x·b then the length of the shortest word in x must be equal to the length of the shortest word in b plus the length of the shortest word in x. That is, the length of the shortest word in b is zero, i.e. ε ∈ b. The terminology that is often used is "b does not possess the empty-word property". In relation algebra we say "b is well-founded": the property expresses that there are no infinite sequences of b-related elements (thus, if relation b represents a finite directed graph, ν⟨x:: x·b⟩ = 0 means that the graph is acyclic).

Exercise 101 Consider the final three instances of a Kleene algebra shown in table 1 (section 6.3.1). In each of these instances, suppose b is a square matrix with entries drawn from the carrier set of the relevant instance. (Equivalently, suppose b is a graph with edge labels drawn from the carrier set. If the (i,j)th matrix entry is 0 then there is no edge from i to j in the graph.) What is the interpretation of ν⟨x:: x·b⟩ = 0 in each case?
2

7.4 Parameterised Prefix Points

Often fixed point equations are parameterised, sometimes explicitly, sometimes implicitly. An example of implicit parameters is the definition of the star operator discussed in section 6.3. In the defining equations given there (see for example equation (79)) a and b are parameters; the equations do not depend on how a and b are instantiated. The equations define the function mapping a to a∗. The rules we present in this section are concerned with such parameterised fixed point equations.

The first rule we present, the abstraction rule, is rarely given explicitly although it is very fundamental. The second rule, called the "beautiful theorem" because of its power and generality, is well-known in one particular form (its application to proving continuity properties of functions defined by fixed point equations) but is less well known at the level of generality given here.
7.4.1 The Abstraction Rule

The definition of a∗ in section 6.3 is a very good example of the "abstraction rule". Recall that a∗ is defined to be the least prefix point of the function mapping x to 1 + a·x. Thus a is a parameter in the definition of the star operator. The remarkable, and very fundamental, property is that the star operator is itself a least prefix point. Indeed, suppose we consider the function

   ⟨g:: ⟨a:: 1 + a · g.a⟩⟩ .

(Note that this function maps an endofunction g on the Kleene algebra to an endofunction on the Kleene algebra and so is a (higher order) endofunction.) Then the star operator is the least fixed point of this function. That is,

   ⟨a:: a∗⟩ = µ⟨g:: ⟨a:: 1 + a · g.a⟩⟩ .

We leave the verification of this fact as an exercise. (We prove a more general theorem shortly.)

The general theorem that this illustrates is obtained in the following way. We begin by making explicit the fact that a is a parameter in the definition of the star operator. This we do by considering the binary operator ⊕ defined by

   a⊕x = 1 + a·x .

Then, by definition, a∗ = µ⟨x:: 1 + a·x⟩. So

   ⟨a:: a∗⟩ = ⟨a:: µ⟨x:: a⊕x⟩⟩ .

This is just a repetition of the definition in section 6.3 but slightly more roundabout. But now we consider the function F from functions to functions defined by

   F = ⟨g:: ⟨a:: a ⊕ g.a⟩⟩ .

(That is, (F.g).a = a ⊕ g.a.) So for the particular definition of ⊕ given above,

   F = ⟨g:: ⟨a:: 1 + a · g.a⟩⟩ .

Then the theorem is that ⟨a:: µ⟨x:: a⊕x⟩⟩ = µF for all (monotonic) binary operators ⊕ and all F defined as above. That is, for all monotonic binary operators ⊕,

(102) ⟨a:: µ⟨x:: a⊕x⟩⟩ = µ⟨g:: ⟨a:: a ⊕ g.a⟩⟩ .

This rule we call the abstraction rule.

In order to verify the validity of the rule we have to verify that ⟨a:: µ⟨x:: a⊕x⟩⟩ is a prefix point of the function F, and that it is at most any prefix point of F. This turns out to be straightforward.
First, it is a prefix point by the following argument:

      F.⟨a:: µ⟨x:: a⊕x⟩⟩ ⊑̇ ⟨a:: µ⟨x:: a⊕x⟩⟩
   ≡    { pointwise ordering, definition of (F.g).a }
      ∀⟨a:: a ⊕ µ⟨x:: a⊕x⟩ ⊑ µ⟨x:: a⊕x⟩⟩
   ≡    { µ⟨x:: a⊕x⟩ is a prefix point of (a⊕) }
      true .

Second, it is at most any prefix point. Suppose g is a prefix point of F, i.e. F.g ⊑̇ g. Then

      ⟨a:: µ⟨x:: a⊕x⟩⟩ ⊑̇ g
   ≡    { pointwise ordering }
      ∀⟨a:: µ⟨x:: a⊕x⟩ ⊑ g.a⟩
   ⇐    { fixed point induction: (70) }
      ∀⟨a:: a ⊕ g.a ⊑ g.a⟩
   ≡    { pointwise ordering, definition of (F.g).a }
      F.g ⊑̇ g
   ≡    { g is a prefix point of F }
      true .

This concludes the proof.

Exercise 103 Suppose the operator ⊕ is defined by x⊕y = f.y. What property does one then obtain by instantiating the abstraction rule?
2

7.4.2 The Beautiful Theorem

A very important application of the abstraction rule in combination with fixed point fusion is a theorem that has been dubbed "beautiful" by Dijkstra and Scholten [DS90, p. 159]. As observed above, a parameterised least-fixed-point equation defines a function in terms of a binary operator. The beautiful theorem is that this function enjoys any kind of supremum-preserving property enjoyed by the binary operator. Thus supremum-preserving properties are preserved in the process of constructing least fixed points.

Dijkstra and Scholten formulate the "beautiful theorem" in the context of the predicate calculus. In terms of poset theory the theorem is stated as follows. Suppose that A = (A, ⊑) and B = (B, ≼) are ordered sets and suppose ⊕ ∈ (A←A)←B. (Thus ⊕ is a monotonic function mapping elements of the set B into monotonic endofunctions on the set A.) As in section 7.4.1 denote application of ⊕ to x by (x⊕). Assume µ(x⊕) exists for each x and consider the function ⟨x:: µ(x⊕)⟩, which we denote by †. The "beautiful theorem" is that † enjoys any type of supremum-preserving property that is enjoyed by the (uncurried binary) function ⊕.
More precisely,

   ∀⟨f:: ⊔(†•f) = †(⊔f)⟩ ⇐ ∀⟨f, g:: ⊔(f ⊕̇ g) = (⊔f) ⊕ (⊔g)⟩ ,

where, by definition, f ⊕̇ g is the function mapping x to f.x ⊕ g.x, • denotes function composition and ⊔ denotes the supremum operator on functions of the shape of f and g.

The supremum operator, if it exists for a class of functions of a certain shape, is the lower adjoint in a Galois connection. This is the key to the proof of the beautiful theorem. In order to make this explicit we formulate the theorem yet more abstractly. The theorem is that for an arbitrary function G that is the lower adjoint in a Galois connection, and an arbitrary function H (G and H having of course appropriate types),

   ∀⟨f:: G.(†•f) = †(H.f) ⇐ ∀⟨g:: G.(f ⊕̇ g) = H.f ⊕ G.g⟩⟩ .

The proof is very short. We first note that (†•f).x = †(f.x) = µ((f.x)⊕), by definition of †. Thus, by the abstraction rule, †•f = µ(f⊕̇). This is just what we need in order to apply the fusion theorem.

      G.(†•f) = †(H.f)
   ≡    { above }
      G.µ(f⊕̇) = µ((H.f)⊕)
   ⇐    { by assumption, G is a lower adjoint; fusion: theorem 95 }
      ∀⟨g:: G.(f ⊕̇ g) = H.f ⊕ G.g⟩ .

7.4.3 Cocontinuity of Iteration

We conclude the discussion of parameterised fixed points by considering the implications of the beautiful theorem for the iteration operator. Specifically, we show that in a regular algebra the iteration operator is cocontinuous⁵. This is the property that, for an ascending sequence f.0 ≤ f.1 ≤ ... ≤ f.n ≤ ...,

   (Σ⟨n: 0≤n: f.n⟩)∗ = Σ⟨n: 0≤n: (f.n)∗⟩ .

A property of a regular algebra that we need to complete the proof is that, for all such sequences f and for all z,

   z · Σ⟨n: 0≤n: f.n⟩ = Σ⟨n: 0≤n: z · f.n⟩  and  Σ⟨n: 0≤n: f.n⟩ · z = Σ⟨n: 0≤n: f.n · z⟩ .

⁵ There is a disturbing confusion between the standard terminology in the computing science literature and the literature on category theory in this respect. What is called a "continuity" property in the computing science literature is called a "cocontinuity" property in the category theory literature.
We use the category theory terminology here.

The property of iteration is thus that in a Kleene algebra in which the sections (·z) and (z·) are cocontinuous, for all z, the iteration operator ∗ is also cocontinuous. (This proviso is satisfied in a regular algebra and thus in all Kleene algebras of interest to us.)

Recalling the discussion in the introduction to this section, the instantiation of the beautiful theorem we wish to consider is as follows: the function † is the operator ∗ (i.e. the function ⟨a:: a∗⟩); correspondingly the binary operator ⊕ is defined by a⊕x = 1 + a·x. The functions G and H are both taken to be the supremum operator on ascending functions from the natural numbers to the carrier of a regular algebra, which we denote by Σ. The instantiation of the beautiful theorem (with G,H,† := Σ,Σ,∗) gives us, for all ascending sequences f:

   Σ⟨n:: (f.n)∗⟩ = (Σf)∗ ⇐ ∀⟨f, g:: Σ⟨n:: 1 + f.n · g.n⟩ = 1 + (Σf)·(Σg)⟩

where the dummies n range over the natural numbers, and the dummies f and g range over ascending sequences.

We try to simplify the antecedent of this theorem. First, we have:

      Σ⟨n:: 1 + f.n · g.n⟩ ≤ 1 + (Σf)·(Σg)
   ≡    { definition of supremum }
      ∀⟨n:: 1 + f.n · g.n ≤ 1 + (Σf)·(Σg)⟩
   ⇐    { + is the binary supremum operator }
      ∀⟨n:: f.n · g.n ≤ (Σf)·(Σg)⟩
   ≡    { property of suprema }
      true .

So it is the opposite inclusion that is crucial. Now,

      (Σf)·(Σg)
   =    { • assume, for all z, (·z) is cocontinuous }
      Σ⟨n:: f.n · (Σg)⟩
   =    { • assume, for all z, (z·) is cocontinuous }
      Σ⟨n:: Σ⟨m:: f.n · g.m⟩⟩ .
So,

      Σ⟨n:: 1 + f.n · g.n⟩ ≥ 1 + (Σf)·(Σg)
   ≡    { + is the binary supremum operator }
      Σ⟨n:: 1 + f.n · g.n⟩ ≥ 1 ∧ Σ⟨n:: 1 + f.n · g.n⟩ ≥ (Σf)·(Σg)
   ⇐    { 1 + c ≥ 1, for all c, and 1 + f.n · g.n ≥ f.n · g.n }
      Σ⟨n:: f.n · g.n⟩ ≥ (Σf)·(Σg)
   ≡    { above }
      Σ⟨n:: f.n · g.n⟩ ≥ Σ⟨n:: Σ⟨m:: f.n · g.m⟩⟩
   ≡    { suprema, dummy change }
      ∀⟨n, m:: Σ⟨p:: f.p · g.p⟩ ≥ f.n · g.m⟩
   ⇐    { Σ⟨p:: f.p · g.p⟩ ≥ f.(n↑m) · g.(n↑m), where n↑m denotes the maximum of m and n }
      ∀⟨n, m:: f.(n↑m) · g.(n↑m) ≥ f.n · g.m⟩
   ≡    { f and g are ascending sequences, product is monotonic }
      true .

This completes the proof.

7.5 Mutual Recursion

In all but the simplest cases, recursive functions are defined by mutual recursion. For example, the BNF definition of the syntax of a programming language uses mutual recursion to simultaneously define the syntax of expressions, statements, declarations etc. A definition by mutual recursion can be seen as a single fixed point equation where there is one unknown (a vector of values) and the function defining the unknown maps a vector to a vector. On the other hand, one also wishes to view a definition by mutual recursion as a collection of individual equations, defining several unknown values, which may be solved on an individual basis. In this section we show that these two views are compatible with each other. We present one theorem to the effect that least prefix points can be calculated by solving individual equations and back-substituting.

Without loss of generality we can confine ourselves to fixed point equations involving two unknowns: a fixed point equation with n+1 unknowns can always be viewed as an equation in two unknowns, one of which is a vector of length n. Any fixed point of a function mapping pairs of values to pairs of values is, of course, a pair. We show how to calculate the individual components of the least fixed point of such a function, given that it is possible to calculate least fixed points of functions mapping single elements to single elements.
The fixed-point equation to be dealt with is the following:

(104) ⟨x, y:: x = x⊕y ∧ y = x⊗y⟩ .

We first consider two special cases in which ⊕ and ⊗ depend on one of their arguments only.

Lemma 105
(a) (µf, µg) = µ⟨x, y:: (f.x, g.y)⟩
(b) (µ(f•g), µ(g•f)) = µ⟨x, y:: (f.y, g.x)⟩ .

Proof of (a). This is an instance of the abstraction rule. To see this, rename the functions to f₀ and f₁ and their arguments to x₀ and x₁. Then (102) says that

   ⟨i:: µ⟨x:: fᵢ.x⟩⟩ = µ⟨g:: ⟨i:: fᵢ.(g.i)⟩⟩ .

Identifying the pair (x₀, x₁) with the function ⟨i:: x.i⟩,

   µ⟨g:: ⟨i:: fᵢ.(g.i)⟩⟩ = µ⟨x₀, x₁:: (f₀.x₀, f₁.x₁)⟩ .

So

   ⟨i:: µ⟨x:: fᵢ.x⟩⟩ = µ⟨x₀, x₁:: (f₀.x₀, f₁.x₁)⟩

as required.

Proof of (b).

      (µ(f•g), µ(g•f)) = µ⟨x, y:: (f.y, g.x)⟩
   ⇐    { (a) on lhs }
      µ⟨x, y:: (f.(g.x), g.(f.y))⟩ = µ⟨x, y:: (f.y, g.x)⟩
   ≡    { define φ by φ.(x,y) = (f.y, g.x) }
      µ(φ•φ) = µφ
   ≡    { square rule }
      true .
2

With the aid of lemma 105, we now compute the least solution of (104), viz. we prove

Theorem 106

   (µ⟨x:: x ⊕ p.x⟩, µ⟨y:: q.y ⊗ y⟩) = µ⟨x, y:: (x⊕y, x⊗y)⟩

where p.x = µ⟨v:: x⊗v⟩ and q.y = µ⟨u:: u⊕y⟩, i.e. p.x and q.y are the least fixed points of the individual equations.

Proof

      µ⟨x, y:: (x⊕y, x⊗y)⟩
   =    { diagonal rule (92) }
      µ⟨x, y:: µ⟨u, v:: (u⊕y, x⊗v)⟩⟩
   =    { lemma 105(a) and definition of p and q }
      µ⟨x, y:: (q.y, p.x)⟩
   =    { lemma 105(b) }
      (µ⟨x:: q.(p.x)⟩, µ⟨y:: p.(q.y)⟩)
   =    { definition of q, p }
      (µ⟨x:: µ⟨u:: u ⊕ p.x⟩⟩, µ⟨y:: µ⟨v:: q.y ⊗ v⟩⟩)
   =    { diagonal rule (92) twice: u := x, v := y }
      (µ⟨x:: x ⊕ p.x⟩, µ⟨y:: q.y ⊗ y⟩) .
2

This concludes the presentation of the calculus. The corresponding calculus of greatest fixed points is obtained by the interchanges µ ↔ ν, ⊑ ↔ ⊒, and lower adjoint ↔ upper adjoint.

Exercise 107 Suppose f is a (unary) function and ⊗ is a binary operator. Prove the following:

(a) µ⟨x, y:: (f.x, x⊗y)⟩ = (µf, µ⟨y:: µf ⊗ y⟩)
(b) µ⟨x, y:: (f.y, x⊗y)⟩ = µ⟨x, y:: (f.(x⊗y), x⊗y)⟩ .

Hint: use theorem 106 for (a). Use the diagonal rule and lemma 105 for (b).
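Theorem 106 can be tested exhaustively on a small lattice. The following sketch (our own illustration, not from the text) computes the joint least fixed point of ⟨x, y:: (x⊕y, x⊗y)⟩ over the two-point boolean lattice and compares it with the componentwise solution the theorem prescribes, for several monotonic choices of ⊕ and ⊗:

```python
# Compare the joint least fixed point of (x,y) -> (x ⊕ y, x ⊗ y) with the
# back-substituted individual solutions of theorem 106, over the booleans
# (False below True).  The sample operators are arbitrary monotonic choices.
def lfp(f, bottom=False):
    x = bottom
    while f(x) != x:
        x = f(x)
    return x

def check(oplus, otimes):
    # joint least fixed point, by iteration from the bottom pair
    pair = (False, False)
    while (oplus(*pair), otimes(*pair)) != pair:
        pair = (oplus(*pair), otimes(*pair))
    # individual solutions: p.x = mu<v:: x ⊗ v>,  q.y = mu<u:: u ⊕ y>
    p = lambda x: lfp(lambda v: otimes(x, v))
    q = lambda y: lfp(lambda u: oplus(u, y))
    sep = (lfp(lambda x: oplus(x, p(x))), lfp(lambda y: otimes(q(y), y)))
    return pair == sep

ops = [lambda x, y: x or y, lambda x, y: x and y,
       lambda x, y: True, lambda x, y: False]
print(all(check(f, g) for f in ops for g in ops))   # True
```

All sixteen combinations of the four monotonic operators agree, as the theorem guarantees; the point of the theorem, of course, is that the agreement holds in any complete lattice.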
2

7.6 An Illustration — Arithmetic Expressions

The Algol 60 definition of arithmetic expressions, restricted to just multiplication and addition, has the following form. The nonterminals Term and Factor serve to introduce a precedence of multiplication over addition, and the use of left recursion (for example, "Expression" is repeated as the leftmost symbol on the right of the production Expression ::= Expression + Term ) serves to define a left-to-right evaluation order on repeated occurrences of the same operator.

Expression ::= Expression + Term | Term
Term ::= Term × Factor | Factor
Factor ::= ( Expression ) | Variable

A much simpler grammar for such arithmetic expressions takes the following form.

Expression ::= Expression + Expression | Expression × Expression | ( Expression ) | Variable

This grammar imposes no precedence of multiplication over addition and is ambivalent about the order in which repeated additions or multiplications are evaluated. (The former is clearly undesirable but the latter is of no consequence because addition and multiplication are associative.)

We want to prove that the languages defined by Expression in the two grammars above are equal. The proof is a combination of the fixed point calculus presented in this section and Kleene algebra presented in section 6.3.

To avoid confusion, let us rename the nonterminal and terminal symbols in the grammars. (We need to rename the nonterminals in order not to confuse the two instances of Expression ; we need to rename the terminal symbols because of the use of "+" and the parentheses "(" and ")" both in regular expressions (the meta language) and in arithmetic expressions (the object language).) Choosing letters a , b , c , d and e for "+", "×", "(", ")" and Variable , and renaming Expression , Term and Factor as E , T and F , respectively, the first grammar is transformed to the following.
E ::= E a T | T
T ::= T b F | F
F ::= c E d | e

Using the same renaming for the terminal symbols but renaming Expression as D , we get the following for the second grammar.

D ::= D a D | D b D | c D d | e

The task is now to prove that the languages generated by E and D are equal. Note that the proof does not rely on a , b , c , d and e being terminal symbols. They may denote arbitrary languages.

We begin with an informal calculation. Then we show how the informal calculation is justified using the rules of the fixed point calculus. The informal calculation treats the grammars as systems of simultaneous equations. We have four equations:

(108) E = E·a·T + T ,
(109) T = T·b·F + F ,
(110) F = c·E·d + e ,
(111) D = D·a·D + D·b·D + c·D·d + e .

The first step is to "solve" the equations for E and T . That is, we eliminate E and T from, respectively, the right sides of equations (108) and (109). We get:

(112) E = T·(a·T)∗

and

(113) T = F·(b·F)∗ .

Now we substitute the right side of the equation for T in the right side of the equation for E . We get:

E = F·(b·F)∗·(a·F·(b·F)∗)∗

which can be simplified using star decomposition (exercise 87(d)) to

E = F·(b·F + a·F)∗ .

This can in turn be simplified, using the "arithmetic" rules of Kleene algebra, to

E = F·((a+b)·F)∗ .

So, applying exercise 93(c),

E = µX:: F + X·(a+b)·X .

Now we substitute the right side of equation (110) into this last equation, in order to eliminate F . We obtain

E = µX:: c·E·d + e + X·(a+b)·X .

"Solving" this equation by eliminating E on the right side:

E = µY:: µX:: c·Y·d + e + X·(a+b)·X .

The last step is to apply the diagonal rule, together with a little "arithmetic":

E = µX:: X·a·X + X·b·X + c·X·d + e .

That is, E solves the equation (111) for D .

Let us now conduct this calculation formally using the fixed point calculus. The first two steps combine the use of the diagonal rule with exercise 107(a) to justify the derivation of equations (112) and (113).
The third step uses exercise 107(b) to justify the substitution of F·(b·F)∗ for T in the informal proof.

µE, T, F:: (E·a·T + T , T·b·F + F , c·E·d + e)
= { diagonal rule }
µE, T, F:: µX, Y, Z:: (X·a·T + T , Y·b·F + F , c·E·d + e)
= { exercise 107(a) and definition of ∗ (twice) }
µE, T, F:: (T·(a·T)∗ , F·(b·F)∗ , c·E·d + e)
= { exercise 107(b) }
µE, T, F:: (F·(b·F)∗·(a·F·(b·F)∗)∗ , F·(b·F)∗ , c·E·d + e) .

The remaining steps in the calculation do not alter the subexpressions "F·(b·F)∗" and "c·E·d + e". We therefore introduce abbreviations for these subexpressions. Note how, formally, all elements of the system of equations being solved have to be carried along, whereas it is only the equation for E that is being manipulated, just as in the informal calculation.

µE, T, F:: (F·(b·F)∗·(a·F·(b·F)∗)∗ , F·(b·F)∗ , c·E·d + e)
= { introduce the abbreviations Φ for F·(b·F)∗ and Ψ for c·E·d + e }
µE, T, F:: (F·(b·F)∗·(a·F·(b·F)∗)∗ , Φ , Ψ)
= { star decomposition (exercise 87(d)), arithmetic }
µE, T, F:: (F·((a+b)·F)∗ , Φ , Ψ)
= { exercise 93(c) }
µE, T, F:: (µX:: F + X·(a+b)·X , Φ , Ψ)
= { exercise 107(b), definition of Ψ }
µE, T, F:: (µX:: c·E·d + e + X·(a+b)·X , Φ , Ψ) .

In the final part of the calculation, we again use exercise 107(a) in combination with the diagonal rule.

µE, T, F:: (µX:: c·E·d + e + X·(a+b)·X , Φ , Ψ)
= { exercise 107(a) }
µE, T, F:: µX, Y, Z:: (X·a·X + X·b·X + c·E·d + e , Φ , Ψ)
= { diagonal rule }
µE, T, F:: (E·a·E + E·b·E + c·E·d + e , Φ , Ψ)
= { exercise 107(a), where Θ = µE:: E·a·E + E·b·E + c·E·d + e }
(µE:: E·a·E + E·b·E + c·E·d + e , µT, F:: (F·(b·F)∗ , c·Θ·d + e)) .

No use has been made of the mutual recursion rule, theorem 106, in this calculation although it is applicable to several steps. This is often the case. Theorem 106 is a combination of the diagonal rule and lemma (105), and exercise 107 is another combination.
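The equality of the languages generated by E and D can also be checked empirically. The following Python sketch (an illustration only, not part of the formal development) computes both languages restricted to words of length at most MAX by Kleene iteration of the grammar equations; truncating at MAX is harmless here because every subtree of a parse tree yields a subword of the final word.

```python
# Compare the two grammars on all words of length <= MAX.

MAX = 7

def cat(xs, ys):
    """Concatenation of truncated languages, re-truncated to MAX."""
    return {x + y for x in xs for y in ys if len(x + y) <= MAX}

def lfp(step, bottom):
    cur = bottom
    while True:
        nxt = step(cur)
        if nxt == cur:
            return cur
        cur = nxt

a, b, c, d, e = ({t} for t in "abcde")

# First grammar:  E = E·a·T + T ,  T = T·b·F + F ,  F = c·E·d + e
def step1(v):
    E, T, F = v
    return (cat(cat(E, a), T) | T,
            cat(cat(T, b), F) | F,
            cat(cat(c, E), d) | e)

E, T, F = lfp(step1, (set(), set(), set()))

# Second grammar:  D = D·a·D + D·b·D + c·D·d + e
def step2(D):
    return cat(cat(D, a), D) | cat(cat(D, b), D) | cat(cat(c, D), d) | e

D = lfp(step2, set())

print(E == D)   # True
```

Raising MAX only increases the running time; by the theorem just proved, the two truncated languages agree at every bound.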
Often it is easier to use the simpler rules, or some straightforward combination of them, than it is to apply the more complicated rule.

8 Further Reading

In spite of the name, the general concept of a "Galois connection" was not invented by the famous mathematician Évariste Galois, but an instance of a "Galois connection" is the "Galois correspondence" between groups and extension fields to which Galois owes his fame. The term "adjoint function" used here (and in category theory texts) is derived from the original terminology used by Galois. (Extending a field involves "adjoining" new values to the field.) Detailed discussion of the Galois correspondence can be found in [Ste89, Fen91]. Feijen [FB90] has written a delightful article on the Galois connection defining the maximum of two numbers. The source of several of the properties of the floor and ceiling functions discussed in section 2 is [GKP89]. The Galois connections defining the floor and ceiling functions are mentioned in [GKP89] but never used because the authors "can never remember which inequality to use"! Further elementary examples of Galois connections can be found in [Lam94].

The idea of extracting a Galois connection from a relation would appear to be due to Hartmanis and Stearns [HS64, HS66]. The same idea appears in [Lam81]. In that paper Lambek gives several mathematical examples of polarities. Further examples in the areas of geometry and the theory of rings and groups can be found in Birkhoff's classic text [Bir67]. The notion of concept analysis is discussed in detail in [GW99]. Conditional correctness assertions were studied by Hoare in [Hoa69] and are called Hoare triples.

The conclusion of section 5 appears in [MSS92]; the solution presented here is based on Aarts [Aar92]. The use of the prefix lemma has been strongly influenced by Lambek's article on so-called subequalizers [Lam70].
Lambek [Lam81] summarises the philosophy of dialectical contradictions as "the view [due to Heraclitus] of the universe as a sort of divine debating society in which opposing ideas are forever struggling to produce a synthesis". He attributes the phrase "unity of opposites" to Mao Tse-tung, and the idea that dialectical contradictions are illustrated by adjoint functions⁶ to discussions he had with Bill Lawvere in 1965–6. Lambek and Scott [LS86] describe the unity of opposites theorem as "the most interesting consequence of a Galois correspondence". Our own use of the name "unity-of-opposites" theorem in this text is not intended to suggest some deep philosophical significance but simply to help make the theorem stick in one's memory.

⁶ Actually functors, not functions. Monotonic functions are a particular case of the notion in category theory of a functor, and a Galois connection is a particular case of the notion of an adjunction. Many of the examples in Lambek's paper are however Galois connections and can be understood without a knowledge of category theory.

References

[Aar92] C.J. Aarts. Galois connections presented calculationally. Afstudeerverslag (Graduating Dissertation), Department of Computing Science, Eindhoven University of Technology, 1992.

[BC75] R.C. Backhouse and B.A. Carré. Regular algebra applied to path-finding problems. Journal of the Institute of Mathematics and its Applications, 15:161–186, 1975.

[Bir67] Garrett Birkhoff. Lattice Theory, volume 25 of American Mathematical Society Colloquium Publications. American Mathematical Society, Providence, Rhode Island, 3rd edition, 1967.

[DP90] B.A. Davey and H.A. Priestley. Introduction to Lattices and Order. Cambridge Mathematical Textbooks. Cambridge University Press, first edition, 1990.

[DS90] E.W. Dijkstra and C.S. Scholten. Predicate Calculus and Program Semantics. Springer-Verlag, Berlin, 1990.

[FB90] W.H.J. Feijen and Lex Bijlsma. Exercises in formula manipulation. In E.W.
Dijkstra, editor, Formal Development of Programs and Proofs, pages 139–158. Addison-Wesley Publ. Co., 1990.

[Fen91] Maureen H. Fenrick. Introduction to the Galois Correspondence. Birkhäuser Boston, 1991.

[Gen69] G. Gentzen. Investigations into logical deduction. In M.E. Szabo, editor, The Collected Papers of Gerhard Gentzen, pages 68–213. North-Holland, Amsterdam, 1969.

[GHK+80] G. Gierz, K.H. Hofmann, K. Keimel, J.D. Lawson, M. Mislove, and D.S. Scott. A Compendium of Continuous Lattices. Springer-Verlag, 1980.

[GKP89] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics: a Foundation for Computer Science. Addison-Wesley Publishing Company, 1989.

[GW99] Bernhard Ganter and Rudolf Wille. Formal Concept Analysis. Mathematical Foundations. Springer-Verlag Berlin Heidelberg, 1999. Translated from the German by Cornelia Franzke. Title of the original German edition: Formale Begriffsanalyse – Mathematische Grundlagen.

[Hoa69] C.A.R. Hoare. An axiomatic basis for computer programming. Communications of the ACM, 12(10):576–580, 1969.

[HS64] J. Hartmanis and R.E. Stearns. Pair algebras and their application to automata theory. Information and Control, 7(4):485–507, 1964.

[HS66] J. Hartmanis and R.E. Stearns. Algebraic Structure Theory of Sequential Machines. Prentice-Hall, 1966.

[Kle56] S.C. Kleene. Representation of events in nerve nets and finite automata. In Shannon and McCarthy, editors, Automata Studies, pages 3–41. Princeton Univ. Press, 1956.

[Koz94] Dexter Kozen. A completeness theorem for Kleene algebras and the algebra of regular events. Information and Computation, 110(2):366–390, 1994.

[Lam70] Joachim Lambek. Subequalizers. Canadian Mathematical Bulletin, 13(3):337–349, 1970.

[Lam81] J. Lambek. The influence of Heraclitus on modern mathematics. In J. Agassi and R.S. Cohen, editors, Scientific Philosophy Today, pages 111–121. D. Reidel Publishing Co., 1981.

[Lam94] J. Lambek. Some Galois connections in elementary number theory. J. of Number Theory, 47(3):371–377, June 1994.

[LS86] J. Lambek and P.J. Scott. Introduction to Higher Order Categorical Logic, volume 7 of Studies in Advanced Mathematics. Cambridge University Press, 1986.

[MSS92] Austin Melton, Bernd S.W. Schröder, and George E. Strecker. Connections. In S. Brookes, M. Main, A. Melton, M. Mislove, and D. Schmidt, editors, Mathematical Foundations of Programming Semantics, 7th International Conference, Pittsburgh, March 1991, volume 598 of LNCS, pages 25–28. Springer-Verlag, 1992.

[Nau63] P. Naur (Ed.). Revised report on the algorithmic language ALGOL 60. Comm. ACM, 6:1–20, 1963. Also in The Computer Journal, 5:349–67 (1963); Numerische Mathematik, 4:420–52 (1963).

[Ore44] Oystein Ore. Galois connexions. Transactions of the American Mathematical Society, 55:493–513, 1944.

[Sch53] J. Schmidt. Beiträge für Filtertheorie. II. Math. Nachr., 10:197–232, 1953.

[Ste89] Ian Stewart. Galois Theory. Chapman and Hall, 2nd edition, 1989.

9 Solutions to Exercises

7 (a) We have, for all n ,

n ≥ −⌊x⌋
≡ { negation }
−n ≤ ⌊x⌋
≡ { definition of floor }
−n ≤ x
≡ { negation }
n ≥ −x
≡ { definition of ceiling }
n ≥ ⌈−x⌉ .

Thus, by indirect equality, the function f is the ceiling function.

(b) We have, for all n ,

n ≤ ⌊x+m⌋
≡ { definition of floor }
n ≤ x+m
≡ { arithmetic }
n−m ≤ x
≡ { definition of floor }
n−m ≤ ⌊x⌋
≡ { arithmetic }
n ≤ ⌊x⌋+m .

The result follows by indirect equality.

(c) We have, for all n ,

n ≤ ⌊x/m⌋
≡ { definition of floor }
n ≤ x/m
≡ { arithmetic, m is positive }
n×m ≤ x
≡ { definition of floor }
n×m ≤ ⌊x⌋
≡ { arithmetic, m is positive }
n ≤ ⌊x⌋/m
≡ { definition of floor }
n ≤ ⌊⌊x⌋/m⌋ .

The result follows by indirect equality.

(d) We have, for all n ,

n ≤ ⌊√x⌋
≡ { n is an integer, definition 3 }
n ≤ √x
≡ { arithmetic }
n² ≤ x ∨ n < 0
≡ { n² is an integer, definition 3 }
n² ≤ ⌊x⌋ ∨ n < 0
≡ { arithmetic }
n ≤ √⌊x⌋
≡ { n is an integer, definition 3 }
n ≤ ⌊√⌊x⌋⌋ .

The result follows by indirect equality.
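The identities established in solution 7 are easy to test mechanically. The following Python sketch checks them on a grid of rationals, using exact arithmetic so that floating-point rounding cannot blur the boundary cases; part (d) is checked via `math.isqrt`, which computes the floor of the square root of a natural number.

```python
# Spot checks of solution 7's floor/ceiling identities.
import math
from fractions import Fraction

xs = [Fraction(p, q) for p in range(-50, 50) for q in (1, 2, 3, 7)]

for x in xs:
    # (a)  the ceiling is obtained from the floor:  -floor(x) = ceil(-x)
    assert -math.floor(x) == math.ceil(-x)
    for m in (1, 2, 3, 10):
        # (b)  floor(x + m) = floor(x) + m   for integer m
        assert math.floor(x + m) == math.floor(x) + m
        # (c)  floor(x / m) = floor(floor(x) / m)   for positive integer m
        assert math.floor(x / m) == math.floor(Fraction(math.floor(x), m))
    if x >= 0:
        # (d)  floor(sqrt(x)) = floor(sqrt(floor(x)))
        assert math.floor(math.sqrt(x)) == math.isqrt(math.floor(x))
print("floor/ceiling identities confirmed on the sample")
```

The sample of rationals and the choice of denominators are arbitrary; any finite grid would do, since each identity holds for all real x.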
2

22 Suppose first that F and G are inverse poset monomorphisms. Then,

F.x ⊑ y
≡ { F is the inverse of G ; thus F.(G.y) = y }
F.x ⊑ F.(G.y)
⇐ { F is a poset monomorphism }
x ⊑ G.y .

Thus, F.x ⊑ y ⇐ x ⊑ G.y . Symmetrically, x ⊑ G.y ⇐ F.x ⊑ y . Combining these two properties we get that ( F , G ) is a Galois connection between ( A , ⊑A ) and ( B , ⊑B ). Replacing F by G and G by F , we also get that ( G , F ) is a Galois connection between ( B , ⊑B ) and ( A , ⊑A ).

Suppose now that ( F , G ) and ( G , F ) are both Galois connections. Then, by theorem 14, F.(G.y) ⊑ y , because ( F , G ) is a Galois connection, and y ⊑ F.(G.y) , because ( G , F ) is a Galois connection. Thus F.(G.y) = y . Symmetrically, G.(F.x) = x . That is, F and G are inverses (and thus surjective). Moreover, both F and G are monotonic. Thus,

x ⊑ y
⇒ { F is monotonic }
F.x ⊑ F.y
⇒ { G is monotonic }
G.(F.x) ⊑ G.(F.y)
⇒ { F and G are inverses }
x ⊑ y .

That is, F is a poset monomorphism. Symmetrically, G is a poset monomorphism.
2

29 The proof is by mutual implication but only one implication is given since the other is entirely dual.

F0•h ⊑̇ k•F1
≡ { (16) }
h ⊑̇ G0•k•F1
⇒ { monotonicity }
h•G1 ⊑̇ G0•k•F1•G1
⇒ { cancellation: (15), monotonicity and transitivity of ⊑̇ }
h•G1 ⊑̇ G0•k .

(The hints "monotonicity" refer to the fact that the function •f is monotonic for all functions f .)
2

31 Take m = 2 and n = 4 . Let G0 , G2 and G5 be the floor function; also let G1 and G3 be the function f and G4 be the function, U , embedding the integers in the reals. Let F0 , F2 and F5 be U ; also let F1 and F3 be the lower adjoint of the function f and F4 be the ceiling function. The right side of (27) can then be read as the requirement that the lower adjoint of f maps integers to integers.
2

49 The relevant Galois connections take the form:

Fx.b ⊆ P ≡ (b ⇒ x∈P)    and    (x∈P ⇒ b) ≡ P ⊆ Gx.b

for some functions F and G . In each case the construction of the closed formula involves a case analysis on b .
The simplest cases are Fx.false = φ and Gx.true = U . The other two cases are Fx.true = {x} and Gx.false = U/{x} (where U/{x} denotes the set of all elements of U different from x ). Thus,

Fx.b = if b then {x} else φ , and
Gx.b = if b then U else U/{x} .
2

50 The negation of ( ≠ φ ) is the predicate ( = φ ), which is equivalent to the predicate ( ⊆ φ ). By the dual of 46, the latter is universally ∪-junctive, assuming booleans are ordered by follows-from. That is, for all sets, P , of subsets of U ,

∪P = φ ≡ ∀S: S∈P: S = φ .

Thus, since ¬(∀p) ≡ ∃(¬p) ,

∪P ≠ φ ≡ ∃S: S∈P: S ≠ φ .

This is the sense in which ( ≠ φ ) preserves set union: the predicate is a function from subsets of U ordered by set inclusion to Bool ordered by implication. The upper adjoint is the function mapping boolean b to if b then U else φ . That is, for all sets S and all Booleans b ,

(S ≠ φ ⇒ b) ≡ S ⊆ (if b then U else φ) .
2

51 Applying (23), we have

F♭ ⊑̇ G ≡ G♯ ⊑̇ F    and    F♯ ⊑̇ G ≡ G♭ ⊑̇ F .

Thus, by exercise 22, ♭ and ♯ are inverse poset isomorphisms between the set of inf-preserving functions ordered by the usual pointwise ordering ⊑̇ and the set of sup-preserving functions ordered by ⊒̇ , the converse of the pointwise ordering ⊑̇ .
2

52 Suppose A is a complete poset. Let ⊓ denote the infimum operator for A . We use theorem 47 to show that A is cocomplete. Consider the relation R between the set of functions with range A and A defined by

(f, a) ∈ R ≡ ∀b:: f.b ⊑ a .

Then, applying theorem 47, there is a function ⊔ satisfying (35) if and only if the predicate a:: (f, a) ∈ R is inf-preserving. We verify that this is indeed the case as follows:

a:: (f, a) ∈ R is inf-preserving
≡ { definition of inf-preserving: (43) with p.x ≡ ∀b:: f.b ⊑ x }
∀g:: (∀b:: f.b ⊑ ⊓g) ≡ (∀x:: ∀b:: f.b ⊑ g.x)
≡ { definition of infimum: (33) }
∀g:: (∀b:: ∀x:: f.b ⊑ g.x) ≡ (∀x:: ∀b:: f.b ⊑ g.x)
≡ { predicate calculus }
true .
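The adjunction constructed in solution 50 is small enough to verify by brute force. The following Python sketch (with an arbitrarily chosen three-element universe) checks the Galois connection and the sense in which non-emptiness preserves set union:

```python
# Brute-force check of solution 50 on the powerset of a 3-element set.
from itertools import combinations

U = frozenset({0, 1, 2})
subsets = [frozenset(c) for r in range(len(U) + 1)
           for c in combinations(U, r)]

implies = lambda p, q: (not p) or q
upper = lambda b: U if b else frozenset()   # the upper adjoint

# Galois connection:  (S /= empty  =>  b)  ==  S <= upper(b)
assert all(implies(len(S) > 0, b) == (S <= upper(b))
           for S in subsets for b in (False, True))

# The lower adjoint (non-emptiness) turns union into existential
# quantification, as derived in the solution.
for P in (subsets[:3], subsets[3:], subsets):
    union = frozenset().union(*P)
    assert (len(union) > 0) == any(len(S) > 0 for S in P)
print("solution 50 verified on the 3-element universe")
```

Bool is ordered here by implication, exactly as in the solution; the subset order on the powerset plays the role of ⊆.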
2

56

⊓A.(f•h) ⊒ f.(⊓C.h)
≡ { definition of ⊓A }
KA.(f.(⊓C.h)) ⊑̇ f•h
≡ { (54) }
f • KC.(⊓C.h) ⊑̇ f•h
⇐ { f is monotonic }
KC.(⊓C.h) ⊑̇ h
≡ { cancellation }
true .
2

63 (a) Suppose first that f is reflexive, idempotent and monotonic. Then

x ⊑ f.y
⇒ { f is monotonic }
f.x ⊑ f.(f.y)
≡ { f is idempotent }
f.x ⊑ f.y
⇒ { f is reflexive }
x ⊑ f.y .

Now suppose that x ⊑ f.y ≡ f.x ⊑ f.y for all x and y . Instantiating y to x we get that f is reflexive. Now monotonicity follows:

f.x ⊑ f.y
≡ { assumption }
x ⊑ f.y
⇐ { f is reflexive, transitivity }
x ⊑ y .

Finally, idempotency is proved by a mutual inclusion argument. The inclusion f ⊑̇ f•f is immediate from reflexivity. The other inclusion is the instance of the assumption obtained by instantiating x to f.x and y to x .

(b)

H•f•G•g ⊑̇ H•f•G•h
⇐ { monotonicity of H• }
f•G•g ⊑̇ f•G•h
≡ { f is a closure operator }
G•g ⊑̇ f•G•h
≡ { Galois connection }
g ⊑̇ H•f•G•h
⇐ { I ⊑̇ H•f•G , transitivity of ⊑̇ and monotonicity }
H•f•G•g ⊑̇ H•f•G•h .

The hint in the last step is justified as follows:

I ⊑̇ H•f•G
⇐ { f is reflexive, monotonicity }
I ⊑̇ H•G
≡ { cancellation }
true .

Comparing the first, seventh and last lines, we have thus shown that

H•f•G•g ⊑̇ H•f•G•h ≡ g ⊑̇ H•f•G•h

as required.
2

64 Assume x and y are elements of F.B . Note that this is equivalent to x = F.(G.x) and y = F.(G.y) .

x ⊔F.B y = y
≡ { x ⊔F.B y = F.(G.x ⊔B G.y) , y = F.(G.y) }
F.(G.x ⊔B G.y) = F.(G.y)
≡ { unity-of-opposites theorem ( F is an isomorphism) }
G.x ⊔B G.y = G.y
≡ { suprema }
G.x ⊑B G.y
≡ { unity-of-opposites theorem ( G is an isomorphism) }
x ⊑A y .
2

66 The "at-most" relation is the least prefix point of the function f where, for any relation R ,

f.R = {n: n∈IN: (0, n)} ∪ {m, n: (m, n)∈R: (m+1 , n+1)} .
2

76

µf ⊑ νf
⇐ { induction }
f.νf ⊑ νf
≡ { computation: f.νf = νf }
true .
2

78

≤ is reflexive
≡ { definition of reflexive }
∀x:: x ≤ x
≡ { definition of ≤ }
∀x:: x+x = x
≡ { definition of idempotent }
+ is idempotent.
≤ is transitive
≡ { definition of transitive }
∀x, y, z:: x ≤ y ∧ y ≤ z ⇒ x ≤ z
≡ { definition of ≤ }
∀x, y, z:: x+y = y ∧ y+z = z ⇒ x+z = z .

Now suppose x+y = y and y+z = z . Then,

x+z
= { y+z = z }
x+(y+z)
= { + is associative }
(x+y)+z
= { x+y = y }
y+z
= { y+z = z }
z .

Thus ≤ is transitive. Now for the antisymmetry of ≤ .

x ≤ y ∧ y ≤ x
≡ { definition of ≤ }
x+y = y ∧ y+x = x
≡ { + is symmetric }
x+y = y ∧ x+y = x
⇒ { substitution }
x = y .

We now show that + is monotonic.

x+y ≤ x+z
≡ { definition of ≤ }
(x+y)+(x+z) = x+z
≡ { arithmetic, + is idempotent }
x+y+z = x+z
⇐ { Leibniz }
y+z = z
≡ { definition of ≤ }
y ≤ z .

To show that · is monotonic we have to show that it is monotonic in both its arguments. We only show that it is monotonic in its right argument. The other case is symmetrical.

x·y ≤ x·z
≡ { definition of ≤ }
x·y + x·z = x·z
≡ { distributivity }
x·(y+z) = x·z
⇐ { Leibniz }
y+z = z
≡ { definition of ≤ }
y ≤ z .

To show that + is the binary supremum operator we have to show that x+y ≤ z ≡ x ≤ z ∧ y ≤ z . This we do as follows.

x+y ≤ z
≡ { definition of ≤ }
(x+y)+z = z
≡ { substitution }
(x+y)+z = z ∧ x+z = x+((x+y)+z)
≡ { + is associative and idempotent }
(x+y)+z = z ∧ x+z = (x+y)+z
⇒ { substitution }
x+z = z
≡ { definition of ≤ }
x ≤ z .

Symmetrically, x+y ≤ z ⇒ y ≤ z . Finally, x ≤ z ∧ y ≤ z ⇒ x+y ≤ z by the monotonicity of + .
2

87 (a)

a · b∗ ≤ c∗ · a
⇐ { (83) }
a + c∗ · a · b ≤ c∗ · a
⇐ { c∗ · a = a + c∗ · c · a , monotonicity }
a·b ≤ c·a .

(b)

c∗ · a ≤ a · b∗
⇐ { (80) }
a + c · a · b∗ ≤ a · b∗
⇐ { a · b∗ = a + a · b · b∗ , monotonicity }
c·a ≤ a·b .

(c) By combining (a) and (b) using the anti-symmetry of the ordering relation we obtain

a · b∗ = c∗ · a ⇐ a·b = c·a .

Thus, with b,c := b·a , a·b , we have:

a · (b·a)∗ = (a·b)∗ · a ⇐ a·(b·a) = (a·b)·a .

The property now follows from the associativity of product.
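The order-theoretic facts just established for an idempotent + hold in any such algebra; one concrete model (chosen only for illustration) is the (max, +) semiring on the integers, where the induced order coincides with the usual one. A quick exhaustive check on a small range:

```python
# Solution 78's claims checked in the (max, +) semiring:
# semiring addition is max (idempotent), multiplication is integer +,
# and x <= y is defined by  x + y = y  (i.e. max(x, y) == y).

R = range(-5, 6)
plus = max                       # idempotent, symmetric, associative
times = lambda x, y: x + y       # distributes over max
leq = lambda x, y: plus(x, y) == y

for x in R:
    assert leq(x, x)                                   # reflexive
    for y in R:
        if leq(x, y) and leq(y, x):
            assert x == y                              # antisymmetric
        for z in R:
            if leq(x, y) and leq(y, z):
                assert leq(x, z)                       # transitive
            if leq(y, z):
                assert leq(plus(x, y), plus(x, z))     # + is monotonic
                assert leq(times(x, y), times(x, z))   # · is monotonic
            # + is the binary supremum
            assert leq(plus(x, y), z) == (leq(x, z) and leq(y, z))
print("the (max, +) order coincides with the usual <=:",
      all(leq(x, y) == (x <= y) for x in R for y in R))
```

The same script works unchanged for any other idempotent semiring, e.g. sets under union and concatenation.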
(d) First we use induction:

(a+b)∗ ≤ b∗ · (a · b∗)∗
⇐ { (80) }
1 + (a+b) · b∗ · (a · b∗)∗ ≤ b∗ · (a · b∗)∗ .

Now we show that 1 + (a+b) · b∗ · (a · b∗)∗ = b∗ · (a · b∗)∗ .

1 + (a+b) · b∗ · (a · b∗)∗
= { arithmetic }
1 + a · b∗ · (a · b∗)∗ + b · b∗ · (a · b∗)∗
= { 1 + x · x∗ = x∗ with x := a · b∗ }
(a · b∗)∗ + b · b∗ · (a · b∗)∗
= { arithmetic }
(1 + b · b∗) · (a · b∗)∗
= { 1 + x · x∗ = x∗ with x := b }
b∗ · (a · b∗)∗ .

The opposite inequality is proved as follows:

b∗ · (a · b∗)∗
≤ { monotonicity }
(a+b)∗ · ((a+b) · (a+b)∗)∗
≤ { x · x∗ ≤ x∗ with x := a+b , monotonicity }
(a+b)∗ · ((a+b)∗)∗
= { x∗ = x∗ · x∗ = (x∗)∗ with x := a+b }
(a+b)∗ .

Finally, b∗ · (a · b∗)∗ = (b∗ · a)∗ · b∗ is an instance of (c).

(e)

(a∗)∗ = a∗
≡ { anti-symmetry }
(a∗)∗ ≤ a∗ ∧ a∗ ≤ (a∗)∗
⇐ { ∗ is a closure operator and ∗ is monotonic }
a∗ ≤ a∗ ∧ a ≤ a∗
≡ { reflexivity and a∗ = 1 + a·a∗ = 1 + a + a·a·a∗ }
true .
2

88 (a)

a+
= { definition }
a · a∗
≥ { a∗ ≥ 1 }
a·1
= a .

(b) This is an application of the mirror rule, 87(c), with b instantiated to 1 .

(c)

a+ · a+
= { definition, (b) }
a · a∗ · a∗ · a
= { a∗ = a∗ · a∗ }
a · a∗ · a
≤ { a∗ = 1 + a∗·a }
a · a∗
= { definition }
a+ .

(d)

a + x·x ≤ x
⇒ { induction (83), with a,b,x := x,a,x }
a·x∗ ≤ x
≡ { x∗ = 1 + x·x∗ , arithmetic }
a + a·x·x∗ ≤ x
⇒ { 1 ≤ x∗ , addition and multiplication are monotonic }
a + a·x ≤ x
⇒ { induction (80), with a,b,x := a,a,x }
a∗·a ≤ x
≡ { (a) }
a+ ≤ x .

(e) First, we have that a∗ is a prefix point of the function x:: 1 + a + x·x since 1 ≤ a∗ , a ≤ a∗ and a∗ · a∗ ≤ a∗ . We now show that it is the least prefix point:

1 + a + x·x ≤ x
≡ { definition of suprema }
1 ≤ x ∧ a + x·x ≤ x
⇒ { (d) }
1 ≤ x ∧ a+ ≤ x
≡ { definition of suprema }
1 + a+ ≤ x
≡ { a∗ = 1 + a·a∗ = 1 + a+ }
a∗ ≤ x .
2

93 (a)

µX:: a·X∗
= { definition of a·X∗ }
µX:: µY:: a + Y·X
= { diagonal rule }
µX:: a + X·X
= { exercise 88 }
a+ .
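All of the star laws in exercises 87 and 88 hold in any Kleene algebra; a convenient concrete model in which to spot-check them is the algebra of n×n boolean matrices (addition is pointwise disjunction, multiplication is boolean matrix product, and star is reflexive-transitive closure). Random testing is no substitute for the calculational proofs, but it is a useful safety net:

```python
# Spot checks of the star laws in the Kleene algebra of boolean matrices.
import random

N = 4
I = [[i == j for j in range(N)] for i in range(N)]

def plus(A, B):
    return [[A[i][j] or B[i][j] for j in range(N)] for i in range(N)]

def times(A, B):
    return [[any(A[i][k] and B[k][j] for k in range(N))
             for j in range(N)] for i in range(N)]

def star(A):
    # reflexive-transitive closure: the least X with X = I + A·X
    X = I
    while True:
        Y = plus(I, times(A, X))
        if Y == X:
            return X
        X = Y

random.seed(0)
rand = lambda: [[random.random() < 0.3 for _ in range(N)] for _ in range(N)]

for _ in range(100):
    a, b = rand(), rand()
    # 87(d), star decomposition: (a+b)* = b*·(a·b*)* = (b*·a)*·b*
    assert star(plus(a, b)) == times(star(b), star(times(a, star(b))))
    assert star(plus(a, b)) == times(star(times(star(b), a)), star(b))
    # 87(e): (a*)* = a*
    assert star(star(a)) == star(a)
    # 88: a+ = a·a*  satisfies  a+·a+ <= a+  (x <= y means x + y = y)
    ap = times(a, star(a))
    assert plus(times(ap, ap), ap) == ap
print("star laws confirmed on random boolean matrices")
```

Matrix size, edge density and the number of trials are arbitrary choices; increasing them only strengthens the evidence.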
(b)

µX:: (a+X)∗
= { definition of ∗ }
µX:: µY:: 1 + (a+X)·Y
= { diagonal rule }
µX:: 1 + (a+X)·X
= { distributivity of concatenation over sum }
µX:: 1 + a·X + X·X
= { diagonal rule }
µX:: µY:: 1 + a·X + Y·Y
= { exercise 88(e) }
µX:: (a·X)∗
= { rolling rule }
(µX:: a·X∗)∗
= { (a) }
(a+)∗
= { exercise 87(e) }
a∗ .

From this calculation we conclude that µX:: (a+X)∗ = µX:: (a·X)∗ = a∗ .

The final piece is then

µX:: a + X∗
= { rolling rule }
a + µX:: (a+X)∗
= { above }
a + a∗
= { a ≤ a∗ }
a∗ .

(c)

µX:: a + X·b·X
= { diagonal rule }
µX:: µY:: a + Y·b·X
= { definition of ∗ }
µX:: a·(b·X)∗
= { rolling rule }
a · µX:: (b·a·X)∗
= { (b) }
a·(b·a)∗ .

(d)

µX:: 1 + a·X·b·X + b·X·a·X
= { diagonal rule (92) }
µX:: µY:: 1 + a·X·b·Y + b·X·a·Y
= { distributivity }
µX:: µY:: 1 + (a·X·b + b·X·a)·Y
= { definition of ∗ }
µX:: (a·X·b + b·X·a)∗
= { exercise 88(e) }
µX:: µY:: 1 + a·X·b + b·X·a + Y·Y
= { diagonal rule (92) }
µX:: 1 + a·X·b + b·X·a + X·X .
2

94 Rolling rule: First,

µ(g•f) ⊑ g.µ(f•g)
⇐ { induction }
(g•f).(g.µ(f•g)) ⊑ g.µ(f•g)
≡ { definition of ( g•f ) }
g.(f.(g.µ(f•g))) ⊑ g.µ(f•g)
⇐ { g is monotonic }
f.(g.µ(f•g)) ⊑ µ(f•g)
≡ { definition of ( f•g ) }
(f•g).µ(f•g) ⊑ µ(f•g)
≡ { computation rule }
true .

Second, we suppose that y is an arbitrary prefix point of g•f and calculate as follows:

g.µ(f•g) ⊑ y
⇐ { (g•f).y ⊑ y , transitivity }
g.µ(f•g) ⊑ (g•f).y
⇐ { definition of g•f , monotonicity of g }
µ(f•g) ⊑ f.y
⇐ { induction }
(f•g).(f.y) ⊑ f.y
⇐ { (f•g).(f.y) = f.((g•f).y) , monotonicity of f }
(g•f).y ⊑ y
≡ { y is a prefix point of g•f }
true .

Square rule:

µf = µ(f²)
≡ { antisymmetry }
µf ⊑ µ(f²) ∧ µ(f²) ⊑ µf
⇐ { induction }
f.µ(f²) ⊑ µ(f²) ∧ f².µf ⊑ µf
≡ { rolling rule and computation rule (applied twice) }
true .

Diagonal rule: First suppose that µy:: x⊕y exists, for all x , as does µx:: µy:: x⊕y . We prove that the latter satisfies the definition of the least prefix point of the function x:: x⊕x . For brevity let us denote µy:: x⊕y by m.x and µx:: µy:: x⊕y by M .
First, M is a fixed point of the function:

M = M⊕M
≡ { fixed point: M = m.M = M ⊕ m.M }
M ⊕ m.M = M⊕M
≡ { fixed point: M = m.M }
true .

Second, M is at most every prefix point of the function:

M ⊑ x
⇐ { induction: M = µm }
m.x ⊑ x
⇐ { induction: m.x = µy:: x⊕y }
x⊕x ⊑ x .

Now, with the same abbreviations as above, we assume that m.x exists for all x . We also introduce the abbreviation N for µx:: x⊕x , which we assume to exist. We have to show that N is the least prefix point of the function m . First it is a prefix point of m .

m.N ⊑ N
⇐ { induction: m.N = µy:: N⊕y }
N⊕N ⊑ N
≡ { prefix point: N = µx:: x⊕x }
true .

Second, N is at most every prefix point of m :

N ⊑ x
⇐ { induction: N = µx:: x⊕x }
x⊕x ⊑ x
⇐ { transitivity }
x⊕x ⊑ x ⊕ m.x ∧ x ⊕ m.x ⊑ x
≡ { fixed point: m.x = x ⊕ m.x }
x⊕x ⊑ x ⊕ m.x ∧ m.x ⊑ x
≡ { ⊕ is monotonic }
m.x ⊑ x .
2

96

k.x is a prefix point of g
≡ { definition }
g.(k.x) ⊑ k.x
⇐ { h.x ⊑ x (i.e. x is a prefix point of h ) and k is monotonic, thus k.(h.x) ⊑ k.x }
g.(k.x) ⊑ k.(h.x) .

Abstracting from the variable x and denoting the set of prefix points of h by Pre.h we have thus established the lemma:

(114) ∀x: x∈Pre.h: k.x ∈ Pre.g ⇐ g•k ⊑̇ k•h .

In particular, assuming the existence of µh , k.µh is a prefix point of g . Thus, since µg is the least prefix point of g ,

(115) µg ⊑ k.µh ⇐ g•k ⊑̇ k•h .

Now, if k is an upper adjoint in a Galois connection with lower adjoint f then

g•k ⊑̇ k•h ≡ f•g ⊑̇ h•f .

So, in the case that k is an upper adjoint with lower adjoint f , (114) is equivalent to

(116) ∀x: x∈Pre.h: k.x ∈ Pre.g ⇐ f•g ⊑̇ h•f .

Also, (115) is equivalent to:

f.µg ⊑ µh ⇐ f•g ⊑̇ h•f .

But also, by making the substitutions k,g,h := f,h,g in (114) we obtain immediately:

(117) ∀y: y∈Pre.g: f.y ∈ Pre.h ⇐ h•f ⊑̇ f•g .

It is thus evident that, if h•f and f•g are equal, we have both

∀x: x∈Pre.h: k.x ∈ Pre.g and ∀y: y∈Pre.g: f.y ∈ Pre.h .
Thus the pair of functions f and k forms a Galois connection between the set of prefix points of h and the set of prefix points of g . In particular, since lower adjoints map least elements to least elements, we have derived:

f.µg = µh ⇐ f•g = h•f .
2

98 (a) We have:

µ(g•f) ⊑ µ(h•f)
⇐ { induction }
(g•f).µ(h•f) ⊑ µ(h•f)
≡ { composition and rolling rule }
g.µ(f•h) ⊑ µ(h•f)
⇐ { µ-fusion, g is a lower adjoint }
g•f•h ⊑̇ h•f•g .

Further,

µ(f•g) ⊑ µ(f•h)
≡ { rolling rule }
f.µ(g•f) ⊑ f.µ(h•f)
⇐ { monotonicity of f }
µ(g•f) ⊑ µ(h•f)
⇐ { above }
g•f•h ⊑̇ h•f•g .

(b) This part is a straightforward corollary of (a) and the antisymmetry of the ordering relations.

As remarked above, the exchange rule says that g and h can be exchanged within a µ term in which they are composed after or before an arbitrary function f . The generalisation we seek is that they can be exchanged anywhere in a µ term. That is, we seek a condition such that µ(f0•g•f1) = µ(f0•h•f1) . We derive such a condition as follows:

µ(f0•g•f1) = µ(f0•h•f1)
≡ { rolling rule }
f0.µ(g•f1•f0) = f0.µ(h•f1•f0)
⇐ { exchange rule }
g•f1•f0•h = h•f1•f0•g .

We don't advise you to remember this rule. It is overly complicated and is so easily recovered from the basic rule.
2

101 In the case of reachability, graph b is acyclic. In the case of shortest paths, the sum of the edge lengths on any cycle of edges is greater than zero. In the case of bottlenecks, there is no cycle of edges each of weight ∞ . That is, there is no subset of nodes connected by a cycle of edges with each edge offering no bottleneck.
2

103

µ(f•)
= { definition of ( f• ) }
µg:: a:: f.(g.a)
= { definition of ⊕ }
µg:: a:: a ⊕ g.a
= { abstraction rule }
a:: µx:: a⊕x
= { definition of ⊕ }
a:: µx:: f.x
= { η-reduction }
a:: µf .

Thus the rule is: µ(f•) = K.µf where K is the constant combinator.
2

107 (a)

µx, y:: (f.x , x⊗y)
= { theorem 106 with x⊕y := f.x (thus q.y := µu:: f.u ) }
( µx:: f.x , µy:: (µu:: f.u) ⊗ y )
= { u:: f.u = f }
( µf , µy:: µf ⊗ y ) .
(b)

µx, y:: (f.y , x⊗y)
= { diagonal rule }
µx, y:: µu, v:: (f.v , x⊗y)
= { lemma 105(b) with g.u := x⊗y }
µx, y:: ( µu:: f.(x⊗y) , µv:: x⊗y )
= { µu:: c = c for any c independent of u (as µf = f.µf ) }
µx, y:: ( f.(x⊗y) , x⊗y ) .
2
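Solution 96's fusion rule, f.µg = µh ⇐ f•g = h•f for a lower adjoint f, can be watched in action on powerset lattices. In the Python sketch below (graphs and node map invented for the illustration), f is the image function of a node map; image functions preserve unions and are therefore lower adjoints of the corresponding preimage functions.

```python
# Fusion on powerset lattices: reachability in a 4-cycle fused onto
# reachability in its 2-node quotient graph.

def lfp(f, bottom=frozenset()):
    x = bottom
    while f(x) != x:
        x = f(x)
    return x

phi = lambda x: x % 2                        # node map: 4 nodes onto 2
f = lambda S: frozenset(phi(x) for x in S)   # image function: a lower adjoint

# g: "start at node 0, then follow edges of the cycle 0->1->2->3->0"
g = lambda S: frozenset({0}) | frozenset((x + 1) % 4 for x in S)
# h: the corresponding step function on the quotient graph 0<->1
h = lambda T: frozenset({0}) | frozenset((t + 1) % 2 for t in T)

# The fusion condition f•g = h•f, checked on all 16 subsets of the nodes.
from itertools import combinations
all_S = [frozenset(c) for r in range(5) for c in combinations(range(4), r)]
assert all(f(g(S)) == h(f(S)) for S in all_S)

# Hence the conclusion of the fusion rule: f.(mu g) = mu h.
assert f(lfp(g)) == lfp(h) == frozenset({0, 1})
print("fusion rule confirmed")
```

Here µg is the set of nodes reachable from node 0 and µh its image in the quotient; the fusion rule says the image of the reachable set can be computed entirely in the (smaller) quotient graph.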