Linear Programming and the Simplex Method David Gale T his exposition of linear programming and the simplex method is intended as a companion piece to the article in this issue on the life and work of George B. Dantzig in which the impact and significance of this particular achievement are described. It is now nearly sixty years since Dantzig’s original discovery [3] opened up this whole new area of mathematics. The subject is now widely taught throughout the world at the level of an advanced undergraduate course. The pages to follow are an attempt at a capsule presentation of the material that might be covered in three or four lectures in such a course. Linear Programming The subject of linear programming can be defined quite concisely. It is concerned with the problem of maximizing or minimizing a linear function whose variables are required to satisfy a system of linear constraints, a constraint being a linear equation or inequality. The subject might more appropriately be called linear optimization. Problems of this sort come up in a natural and quite elementary way in many contexts but especially in problems of economic planning. Here are two popular examples. The Diet Problem A list of foods is given and the object is to prescribe amounts of each food so as to provide a meal that has preassigned amounts of various nutrients such as calories, vitamins, proteins, starch, David Gale is professor of mathematics at the University of California, Berkeley. His email address is gale@math.berkeley.edu. 364 etc. Further, each food has a cost, so among all such adequate meals the problem is to find one that is least costly. The Transportation Problem A certain good, say steel, is available in given amounts at a set of m origins and is demanded in specified amounts at a set of n destinations. Corresponding to each origin i and destination j there is the cost of shipping one unit of steel from i to j. Find a shipping schedule that satisfies the given demands from the given supplies at minimum cost. Almost all linear programming applications, including these examples, can be motivated in the following way. There is a given set of goods. For the diet problem these are the various nutrients. For the transportation problem there are m + n goods, these being steel at each of the origins and destinations. There is also a set of processes or activities which are specified by amounts of the goods consumed as inputs or produced as outputs. In the diet problem, for example, the carrot-consuming activity c has an output of c1 units of calories, c2 units of vitamin A, etc. For the transportation problem the transportation activity tij has as input one unit of steel at origin i and as output one unit of steel at destination j. In general, an activity a is column vector whose positive entries are outputs, negative entries inputs. These vectors will be denoted by boldface letters. It is assumed that activity aj may be carried out at any nonnegative level xj . The constraints are given by another activity vector b, the right-hand side, which specifies the amounts of the different goods that are to be produced or consumed. For the diet problem it is the list of amounts of the prescribed Notices of the AMS Volume 54, Number 3 nutrients. For the transportation problem it is the given supplies and demands at the origins and destinations. Finally, associated with each activity aj is a cost cj . Given m goods and n activities aj the linear programming problem (LP) is then to find activity levels xj that satisfy the constraints P and minimize the total cost j cj xj . Alternatively, c may be thought of as the profit generated by activity a, in which case the problem is to maximize P rather than minimize j cj xj . The simplex method is an algorithm that finds solutions of LPs or shows that none exist. In the exposition to follow we will treat only the special case where the constraints are equations and the variables are nonnegative, but the more general cases are easily reduced to this case. The Simplex Method In the following paragraphs we describe the simplex algorithm by showing how it can be thought of as a substantial generalization of standard Gauss-Jordan elimination of ordinary linear algebra. To gain intuition as to why the algorithm works, we will refer to the linear activity model of the previous section. Finally we will mention some interesting discoveries in the analysis of the algorithm, including a tantalizing unsolved problem. 1 b b2 .. . bi .. . bm March 2007 a2 x12 x22 .. . xi2 .. . xm2 ··· ··· ··· .. . ··· .. . ··· aj x1j x2j .. . xij .. . xmj ··· ··· ··· .. . ··· .. . ··· an x1n x2n .. . xin .. . xmn where xij is the coefficient of bi in the expression for aj as a linear combination of the bi . In matrix terms, X is the (unique) matrix satisfying (1) BX = A. (Symbols A, B are used ambiguously to stand either for a matrix or the set of its columns.) It will often be useful to include the unit vectors ei in the tableau in which case it appears as shown below. 1 b b2 .. . bi .. . bm Pivots and Tableaus The basic computational step in the simplex algorithm is the same as that in most of elementary linear algebra, the so-called pivot operation. This is the operation on matrices used to solve systems of linear equations, to put matrices in echelon form, to evaluate determinants, etc. Given a matrix A one chooses a nonzero pivot entry aij and adds multiples of row i to the other rows so as to obtain zeros in the jth column. The ith row is then normalized by dividing it by aij . For solving linear equations a pivot element can be any nonzero entry. By contrast, the simplex method restricts the choice of pivot entry and is completely described by giving a pair of simple rules, the entrance rule that determines the pivot column j and the exit rule that determines the pivot row i (in theory a third rule may be needed to take care of degenerate cases). By following these rules starting from the initial data the algorithm arrives at the solution of the linear program in a finite number of pivots. Our purpose here is to present these rules and show why they work. We shall need one other concept. Definition. The tableau X of a set of vectors A = {a1 , a2 , . . . , an } with respect to a basis 2 m B = {b1 , b , . . . , b } is the m × n matrix a1 x11 x21 .. . xi1 .. . xm1 .. . .. . e1 y11 y21 .. . yi1 .. . ym1 a1 x11 x21 .. . xi1 .. . xm1 e2 y12 y22 .. . yi2 .. . ym2 ··· ··· ··· .. . ··· .. . ··· a2 x12 x22 .. . xi2 .. . xm2 ··· ··· ··· .. . ··· .. . ··· ej y1j y2j .. . y ij .. . ymj aj x1j x2j .. . xij .. . xmj ··· ··· ··· .. . ··· .. . ··· ··· ··· ··· .. . ··· .. . ··· em y1m y2m .. . yim .. . ymm an x1n x2n .. . xin .. . xmn Note that from (1), Y = {yij } is the solution of BY = I so Y = B −1 . Multiplying (1) on the left by Y gives the important equation (2) Y A = X. It is easy to verify that if one pivots on entry xij , one obtains a new matrix X 0 that is the tableau with respect to the new basis in which aj has replaced bi . b1 b2 .. . aj .. . bm a1 x011 x021 .. . x0i1 .. . x0m1 Notices of the AMS a2 x012 x022 .. . x0i2 .. . x0m2 ··· ··· ··· .. . ··· .. . ··· aj 0 0 .. . 1 .. . 0 ··· ··· ··· .. . ··· .. . ··· an x01n x02n .. . x0in .. . x0mn 365 The simplex method solves linear programs by a sequence of pivots in successive tableaus, or, equivalently, by finding a sequence of bases, where each basis differs from its predecessor by a single vector. Solving Linear Equations We start by showing how to solve systems of linear equations using the language of pivots and tableaus. Our fundamental result is the following: Theorem 1. Exactly one of the following systems has a solution: (3) Ax = b or (4) yT A = 0T , yT b = 1. (Again, boldface letters are column vectors and are converted to row vectors using the superscript T .) Note that rather than giving complicated conditions for (3) not to be solvable this formulation puts the solvable and unsolvable case on an equal footing. This duality is an essential element of the theory, as will be seen. ui , is the value of a variable xji whose corresponding column aji = bi . Every xj whose corresponding column is not in the current basis has the value zero. Case II. In trying to replace ek there is no available pivot because xkj = 0 for all j. (i) If in addition uk = 0, then leave ek in the basis and go on to replace ek+1 . (ii) If uk ≠ 0, then the kth row yk of Y gives a h solution ih iof (4). h To see i this, note that from (2) Y A b = X u . Since the kth row of X is zero, we get yk A = 0, whereas yk b = uk ≠ 0, so yk /uk solves (4). The algebra becomes considerably more complicated if one requires the solutions of (3) to be nonnegative. In this case we have the following existence theorem (known as Farkas’s Lemma): Theorem 2. Exactly one of the following systems has a solution. (5) or (6) Geometric Interpretation If (3) has no solution this means that b does not lie in the linear subspace generated by the aj . In that case there is a vector y orthogonal to the aj but not to b. We shall continue to present such geometric pictures as aids to intuition but the mathematical arguments will not depend on these “pictures”. It is immediate that (3) and (4) cannot both hold since multiplying (3) by yT and (4) by x would imply 0 = 1. The nontrivial part is to show that either (3) or (4) must hold. We do this by giving an algorithm that in at most m pivots finds a solution to either (3) or (4). The initial tableau consists of the matrix A together with the right-hand side b all preceded by the identity matrix I. It is represented schematically in the form below. i {b } {ei } {aj } {b} [I A b] (The symbols like ei , {aj } in curly brackets are the row and column headings of the tableau.) We proceed to try replacing the ei in order by some aj . The tableau at any stage has the form  i {b } {ei } {aj } {b} [Y X u] There are now two possibilities. Case I. All of the unit vectors ei can be replaced by some of the aj . Then a solution of (3) can be read off from the vector u in the b-column of the tableau above. Namely, each component of u, say 366 Ax = b, x ≥ 0 yT A ≤ 0T , yT b > 0. Again it is not possible for both systems to have solutions since multiplying (5) by yT gives yT Ax = yT b > 0 whereas multiplying (6) by x gives yT Ax ≤ 0. Geometric Interpretation Equation (5) says that b lies in the convex cone generated by the columns aj of A. If b does not lie in this cone, then there is a “separating hyperplane” whose normal makes a non-acute angle with the aj and an acute angle with the vector b. The simplex method finds a solution of either (5) or (6) in a finite number of pivots. However, there is no useful upper bound on the number of pivots that may be needed as we shall see shortly. We may assume to start out that the vector b is nonnegative (if not, change the signs of some of the equations). Definition. A basis B will be called feasible (strongly feasible) if b is a nonnegative (positive) linear combination of vectors of B. This is equivalent to the condition that the b column u of the tableau is nonnegative (positive). Imitating the algorithm of the previous section we start from the feasible basis {ei } of unit vectors and try to bring in the {aj } by a sequence of pivots that maintain feasibility, but to do this the pivot rule of the previous section must be restricted. Suppose aj is to be brought into a new feasible basis. Then in order for the new basis to be feasible the basis vector it replaces must satisfy the following easily derived condition. Notices of the AMS Volume 54, Number 3 Exit Pivot Rule The pivot element xij must satisfy (i) xij > 0, (ii) ui /xij ≤ uk /xkj for all k with xkj > 0. Because of the above restriction it is no longer possible to simply replace the basis vectors ei one at a time as in the previous section. Some further rule for choosing the pivot column must be given. We will postpone the proof of Theorem 2 as it will be shown to be a special case of a much more general theorem, which we now describe. The Standard Minimum Problem We consider the following important special case of a linear programming problem. Given an m-vector b, an n-vector cT and an m × n matrix A, find an n-vector x so as to (7) minimize subject to cT x Ax = b x ≥ 0. Corresponding to this problem (and using exactly the same data) is a dual problem, find an mvector yT so as to maximize yT b (8) e0 e0 h 1 {ei } 0 subject to yT A ≤ cT . Terminology The linear function cT x and yT b are called the objective functions of (7) and (8), respectively. The vectors x, y that solve these problems are called optimal solutions; the numbers cT x and yb are optimal values. Let X and Y be the sets (possibly empty) of all vectors that satisfy the constraints of (7) and (8), respectively. Such vectors are said to be feasible. Key Observation If x ∈ X, yT ∈ Y, then yT b ≤ cT x. Proof. Multiply Ax = b by yT and yT A ≤ cT by x. An important consequence of this observation is Corollary. If x ∈ X and y ∈ Y satisfy yT b = cT x, then x and y are optimal solutions, and the numbers yT b=cT x are optimal values for their respective problems. The converse of this fact makes up the Fundamental Duality Theorem. The dual problems above have optimal solutions x, yT if and only if X and Y are nonempty in which case the optimal values cT xand yT b are equal. Historical Note The first explicit statement of the duality theorem is due to von Neumann [7] in a manuscript that was privately circulated but never formally published in his lifetime. Moreover it has been hard to March 2007 verify the validity of the von Neumann proof. The first formally published proof is due to Gale, Kuh, and Tucker [4]. To see how Theorem 2 is a special case of the duality theorem we take [I A b] as the given initial tableau of a standard minimum problem and the vector cT = (1, .., 1, 0, ..., 0) whose first m entries are 1, the rest 0, as the objective function. The set X is nonempty since it contains the nonnegative vector (b1 , .., bm , 0, .., 0), and the set Y is nonempty since it contains the zero vector. If the optimal value is zero, then a subvector of the optimal solution x solves (5). If the optimal value is positive, then the dual optimal vector y solves (6). We will now see how the simplex method solves problems (7), (8) and gives a constructive proof of the duality theorem. As a first step we incorporate the objective function cT x as the 0th row of the tableau. That is, bj to be we define the augmented column vector a (−cj , aj ), and we introduce the 0th unit vector e0 . The augmented initial tableau is then, {ei } {b aj } b b 0 I −cT A 0 b i We may assume that we have found an initial feasible basis, by applying the simplex method to the nonnegative solution problem (5), (6) as described. Then with respect to the general augmented basis b 1 , . . . ,b b m } the tableau has the form below. {e0 ,b e0 0 e h 1 bi } 0 {b {ei } {b aj } yT Y zT X b b w i u P bi Note that by definition of the tableau, w e0 + i ui b P b = b so for P the 0th entry we have w − i ui ci = 0, so w = i ui ci , which is the value of the objective i function for the basic feasible basis P {b }. Also from the tableau zj = i xij ci − cj . This number compares the cost associated with activity vector aj to that of the corresponding linear combination of the basis vectors bi . If zj is positive it means that one could reduce the total cost by bringing the vector aj into the next basis. This leads to the following two important observations: I. If the row zT is nonpositive, then u can be extended to a solution x of (7) and yT solves (8). Namely, using (2) again we have " #" # " # 1 yT −cT 0 zT w = 0 Y A b X u so −cj + yT aj = zj ≤ 0. Hence yT A ≤ cT so yT satisfies the dual constraints. Similarly we have yT b = w = cT x, so from the corollary above, the Notices of the AMS 367 Duality Theorem is verified. Bland’s Pivot Selection Rule II. If for some j we have zj > 0 and xij ≤ 0 for all i, then problem (7) has P no minimum, for from the tableau we have aj − i xij bi = 0; and since the xij ≤ 0, we have λ(b aj − X bi ) + xij b i X bi = b xi b i giving a new feasible (nonnegative) solution for any λ > 0. The corresponding change in the objecP tive function is λ(cj − i xij ci ) = −λzj ; so since zj > 0 the function has no minimum. In view of I above, one is looking for a feasible solution with z nonpositive, so the following pivot rule suggests itself. Entrance Pivot Rule bj where zj is posBring into the basis any column a itive. Note that as described the second pivot rule may, in general, require making a choice among many possible positive zj . A natural choice would be for example to choose the column with the largest value of zj , a choice originally suggested by Dantzig. The two pivot rules completely describe the simplex algorithm. If for some j, zj is positive and some xij is positive, then bring aj into the basis (by a pivot). It remains to show that eventually either I or II will occur. The argument is simple if we make the following Nondegeneracy Assumption Every feasible basis is strongly feasible. Note if this were not the case, then b would be a positive linear combination of fewer than m of the vectors aj , a degenerate situation. Although the degenerate case would seem (on mathematical grounds) to be rare, this is not so in practice. Suppose xij is the (positive) pivot. Then after pivoting w 0 = w − zj ui /xij < w since ui > 0 from nondegeneracy, so the value of the objective function decreases with each pivot. This means that no basis can recur (since the basis determines the objective value), and since there are only finitely many bases, the algorithm must terminate in either state I or II. If the nondegeneracy assumption is not satisfied some further argument is necessary. Indeed, examples have been constructed by Hoffman [5] (for one) using the Dantzig pivot choice rule which can lead to “cycling,” meaning the sequence of feasible bases recurs indefinitely. It turns out, however, that the following simple rule due to Bland [1] guarantees that no basis will recur. 368 Among eligible entering vectors choose the one with lowest index and let it replace the eligible basis vector with lowest index. While the condition is simple to state, the proof that it avoids cycling is quite subtle. Moreover, it is not as efficient as the customary (Dantzig) pivot choice rule. Finally, there is a natural economic interpretation of the dual problem. In the model of goods and activities, the vector yT = (y1, .., ym ) can be thought of as a price vector where yi is the price of one unit of the ith good. Then yT aj is the revenue generated by operating activity aj at unit level. The condition of dual feasibility, yT aj − cj ≤ 0 states that profit (revenue minus cost) cannot be positive. This is an economically natural requirement, for if an activity generated positive profits producers would want to operate it at arbitrarily high levels and this would clearly not be feasible. Duality then says that prices are such that the price vector yT maximizes the value yT b of the right-hand side activity b subject to the condition of no positive profits. Some Properties of the Simplex Method Since its creation by Dantzig in 1947, there has been a huge amount of literature on variations and extensions of the simplex method in many directions, several devised by Dantzig himself. We will mention here only one of these, revolving around the question of the number of pivot steps required to solve an LP in the worst case. Based on a vast amount of empirical experience, it seemed that an m × n program was typically solved in roughly 3m/2 pivots. However, in 1972, Klee and Minty [6] gave an example of an m × 2m standard problem that using the Dantzig choice rule required 2m − 1 pivots. To appreciate the example, first consider the matrix A consisting of the m unit vectors ei and m vectors aj = ej where the right-hand side is e, the vector all of whose entries are one. By inspection, (see for example the tableau below) the set of feasible solutions X decomposes into the direct product of m unit intervals, thus, it is a unit m-cube. {aj } {ei }  1   0 0 0 1 0 0 0 1 1 0 0 0 1 0 b   0 1   0  1  1 1 It is easy to see that every feasible basis, thus every vertex of the cube, must contain either ai or ei , but not both, for all i. This property will be preserved if the ai are slightly perturbed. By a clever choice of this perturbation and a suitable objective function, the authors show that the Dantzig pivot rule causes the algorithm to visit all of the Notices of the AMS Volume 54, Number 3 2m vertices following a well known Hamiltonian path on the m-cube. 4.................................................. ....................... ........ ... ...................... ......... . .. ........ . . .. . ....... . . . . . ... ....... ..... 5 . ......... ....... 7 . ... .... ..... .. ......... ... ....... ........ .. ......... ... .............. .......... . .. . ... . . . . .... ................................... ..... . ... .. . ..................... . . . . ... ... .................. .. ... ...................... ... ... ... .. ... ... . ....... ... ... ... .. ... ... ... ... .. ... ... ... .. ... ... ... .. ... ... ... . ... ... ... . . . . . . . . . . . . . . . ... . ... ... ... ... .......... ... .. .. . . . . . . . ... . .. ... ... ... ... . .. .. ... . . . . . . . . . . . . . ... . ..... ... . . . . . . . . . . . . . ... . .. .. ..... . . . . . . . . . . . . ... ... ..... . . . . . .... ... . ... ... ... . . .............. ... ................... .................. ........ ... .................... ....... .. . .................... ....... .................................. ... 6 3 0 2 1 Figure 1. A Hamiltonian path on a 3-cube. This means, for example, that the vector a1 enters the basis on the first pivot, stays in the basis for one pivot, then exits and stays out for one pivot, then reenters and stays, then exits and stays out, etc. throughout the computation. So we know that in the worst case the simplex method may require an exponential number of pivots, although, as mentioned earlier, no naturally occurring problem has ever exhibited such behavior. There are also results on the expected number of pivots of a “random” LP. Since the Klee-Minty example shows that it is possible for the simplex algorithm to behave badly, it is natural to ask whether there may be some other pivoting algorithms that are guaranteed to find an optimal solution in some number of pivots bounded, say, by a polynomial in n. Let us first consider a lower bound on the number of pivots. If the initial feasible basis is A = {a1 , . . . , am } and the optimal basis is a disjoint set m B = {b1 , .., b } then clearly it will require at least m pivots to get from A to B. The very surprising fact is that if the set X of feasible solutions is bounded, then in all known examples it is possible to go from any feasible basis to any other in at most m pivots. Recall that the set X is an mdimensional convex polytope whose vertices are the feasible bases. The above observation is equivalent to the statement that any two vertices of this polytope are connected by an edge path of at most m edges. The conjecture originally posed by Warren Hirsch has been proved by Klee and Walkup through dimension 5 but remains unresolved in higher dimensions. For Dantzig this “Hirsch conjecture” was especially intriguing since a constructive proof of the conjecture would have meant that there might be an algorithm that solves linear programs in at most m pivots. However, even if such an algorithm existed, it might be that the amount of calculation involved in selecting the sequence of pivots would make it far less computationally efficient than the simplex method. Indeed this remarkable algorithm and its many refinements remains to this March 2007 day the method of choice for solving most linear programming problems. References [1] R. G. Bland New finite pivoting rules for the simplex method, Mathematics of Operations Research 2 (1977) 103–107. [2] J. B. J. Fourier, Solution d’une question particulière du calcul des inégalités (1826), and extracts from Histoire de l’Academie (1823, 1824), Oeuvres II, French Academy of Sciences, pp. 317–328. [3] G. B. Dantzig, Maximization of a linear function subject to linear inequalities, in T. C. Koopmans (ed.), Activity Analysis of Production and Allocation, John Wiley & Sons, New York, 1951, pp. 339–347. [4] D. Gale, H. W. Kuhn, and A. W. Tucker, Linear programming and the theory of games, in T. C. Koopmans (ed.), Activity Analysis of Production and Allocation, John Wiley & Sons, New York, 1951, pp. 317–329. [5] A. J. Hoffman, Cycling in the simplex algorithm, National Bureau of Standards, Report No. 2974, Washington, D.C., December 16, 1953. [6] V. Klee and G. J. Minty, How good is the simplex algorithm? in (O. Shisha, ed.) Inequalities III, New York, Academic Press, 1972, 159–175. [7] J. von Neumann, Discussion of a maximum problem, working paper dated November 15–16, 1947, reproduced in A. H. Taub (ed.), John von Neumann: Collected Works, Vol. VI , Pergamon Press, Oxford, 1963, pp. 89–95. Notices of the AMS 369