2-D ANALOGUES OF ALLEN INTERVAL ALGEBRA FOR IMAGE ANALYSIS: TOWARDS JUSTIFICATION SCOTT A. STARKS, DIMA IOURINSKI, and VLADIK KREINOVICH NASA Pan-American Center for Earth and Environmental Studies University of Texas, El Paso, TX 79968, USA, vladik@cs.utep.edu Abstract 1.2 In reasoning about time and duration, researchers often use Allen’s Interval Algebra. This algebra describes possible relations between 1-D intervals. An interval can precede the other one, follow the other one, start the other one, etc. This algebra describes the relationship between different intervals in terms of words from natural language. To give a natural language description of 2D images, it is desirable to develop a similar approach for describing the relationship between 2-D objects in a picture. In their recent papers, Jim Keller and his collaborators proposed a new approach based on a simulation of a “force” between these objects. In this paper, we show that their force formula is theoretically optimal. Keywords Image analysis, relation between images. 1 Introduction 1.1 How to describe relation between 2-D objects: informal description of the problem In reasoning about time and duration, researchers often use Allen’s Interval Algebra; see, e.g., [2–5]. This algebra describes possible relations between 1-D intervals. An interval can precede the other one, follow the other one, start the other one, etc. This algebra describes the relationship between different intervals in terms of words from natural language. To give a natural language description of 2D images, it is desirable to develop a similar approach for describing the relationship between 2-D objects in a picture. Force histogram method of describing the relation between 2-D objects A new method of describing the relation between the 2-D objects, a method that seems to be in a very good accordance with the expert reasoning, was recently described in [6], [7], [9]. Namely, to describe the relation between the two sets A and B , this “force histogram” method proposed, crudely speaking, to do the following:  take all possible pairs a 2 A and b 2 B ,  compute, for each pair, the “force” whose direction is from a to b and whose value is r?2 , and then,  add up these force vectors. The orientation of the resulting vector F~ describe the intuitive understanding of where the sets are in relation to each other, and the length of this vector describes how close these sets are to each other. 1.3 This method is somewhat ad hoc In the above description of the force method, the idea that we should:   take all possible pairs compute, for each pair of points a 2 A and b 2 B , the “force” whose direction is from a to b and whose value f (r) depends on the distance between the points, and then, add up these force vectors is very natural. What is not very natural is the exact form of the dependence on f (r) on r. It is therefore desirable to consider all possible dependencies and come up with the optimal (best possible) function f (r). Proceedings of the 2001 IEEE Systems, Man, and Cybernetics Conference Copyright c 2001 1.4 What we are planning to do In order to solve this problem, we must describe, in precise terms, which functions are possible and which functions are the best. In this paper, we propose such a description, and show that this description leads to a new symmetry-based justification for the above formula (actually, for a slightly more general formula r? ). This new justification is in line with the general symmetry-based approach which has been shown, in [8], to explain similar heuristic formulas in fuzzy, neural, genetic, and other approaches, and to explain a similar heuristic force formula in robotic control. 2 First Idea: Only Decreasing Functions f (r) Make Sense The function f (r) describes the “weight” with which directions between different pairs (a; b) influence the resulting sum. Intuitively, when we decide which of the two objects is, say, to the left and which is to right, we pay more attention to close points and less attention to points which are far away from each other. Therefore, it is reasonable to require that the weight f (r) should be larger for closer points and smaller for more distant points. In other words, it is reasonable to require that the function f (r) be strictly decreasing. 3 Second Idea: We Must Choose a Family of Functions, Not a Single Function Once we select a force function f (r), for every two sets A and B , the orientation of the resulting vector F~ describe the intuitive understanding of where the sets are in relation to each def other, and the length ` = = jF~ j of this vector describes how close these sets are to each other. Our main goal is to describe the orientation, i.e., the typical angle between the two sets. We also get some information from the value of the “closeness” `, but this information comes only from the comparison of different pairs of sets. If we simply say that the closeness between two sets A and B is, say, 1.5, we did not gain any knowledge about how close they are. If, however, for some other pair of sets A0 and B 0 , the closeness is 4.5, we can already make some meaningful conclusions: namely, we can say that A is closer to B than A0 is close to B 0 . We can even say that A is three times closer to B than A0 is close to B 0 . From this viewpoint, the absolute values of F do not carry much information, what carries information is the relation between the values F corresponding to different pairs of sets. In particular, if we multiply all the values of the original function f (r) by a constant, i.e., if we replace the function f (r) with a new funcdef tion fe(r) = C  f (r) for some constant C , then all the values of the forces get multiplied by C and hence, all the resulting vectors F~ are multiplied by the same constant C . When we multiply a vector by a constant, it orientation does not change, and its length gets multiplied by the same constant. Thus, if we replace the function f (r) by a new function C  f (r), for every pair of sets, we get the exact same orientation as before, and for every two pairs, we get the exact same ratio between their “closenesses”. So, intuitively, there is no big difference between using the original function f (r) and the new function C  f (r). Hence, we cannot select a unique function f (r) and claim it to be the best, because for every function f (r), the function C  f (r) leads to exactly the same results. In view of this, instead of formulating a problem of choosing the best force function, it is more natural to formulate a problem of choosing the best family fC  f (r)gC of force functions. 4 Which Family Is the Best? We May Need Non-Numerical Optimality Criteria Among all the families fC f (r)gC , we want to choose the best one. In mathematical optimization problems, numerical criteria are most frequently used, when to every alternative (in our case, to each family) we assign some value expressing its performance, and we choose an alternative (in our case, a family) for which this value is the largest. In our problem, as such a numerical criterion, we can select, e.g., the average approximation error A, measured as the mean square deviation between the orientation generated by the corresponding force method and the orientation marked by an expert. Proceedings of the 2001 IEEE Systems, Man, and Cybernetics Conference Copyright c 2001 However, it is not necessary to restrict ourselves to such numerical criteria only. For example, if we have several different families that have the same average approximation error A, we can choose between them the one that has the minimal computation time T . In this case, the actual criterion that we use to compare two families is not numerical, but more complicated: a family F1 is better than the family F2 if and only if either A(F1 ) < A(F2 ), or A(F1 ) = A(F2 ) and T (F1 ) < T (F2 ). A criterion can be even more complicated. What a criterion must do is to allow us for every pair of families to tell whether the first family is better with respect to this criterion (we’ll denote it by F1  F2 ), or the second is better (F1  F2 ) or these families have the same quality in the sense of this criterion (we’ll denote it by F1  F2 ). Of course, it is necessary to demand that these choices be consistent, e.g., if F1  F2 and F2  F3 then F1  F3 . 5 The Optimality Criterion Must Select a Unique Optimal Family Another natural demand is that this criterion must choose a unique optimal family (i.e., a family that is better with respect to this criterion than any other family). The reason for this demand is very simple. If a criterion does not choose any family at all, then it is of no use. If several different families are “the best” according to this criterion, then we still have a problem to choose among those “best”. Therefore, we need some additional criterion for that choice. For example, if several families turn out to have the same average approximation error, we can choose among them a family with the minimal computation time. So what we actually do in this case is abandon that criterion for which there were several “best” families, and consider a new “composite” criterion instead: F1 is better than F2 according to this new criterion if either it was better according to the old criterion or according to the old criterion they had the same quality and F1 is better than F2 according to the additional criterion. In other words, if a criterion does not allow us to choose a unique best family it means that this criterion is not final. We have to modify it until we come to a final criterion that will have that property. 6 The Optimality Criterion Must Be Scale-Invariant The next natural condition that the criterion must satisfy is connected with the fact that the numerical value of the distance r depends on the choice of the unit for measuring distance. If we replace the original unit of length by a new unit which is  times larger (i.e., replace feet by meters), then numerical values change from r to re = r=. How will the force function look in the new units? Let us assume that in the new units, the distance between the two points equals re. Then, the same distance in the old units is equal to r =   re. Thus, the force between the two points is equal to f (r) = f (  e). Thus, if we know the distance re in the new units, we can compute the corresponding force value as fe(re), where fe(z ) denotes f (  z ). So, the same force function which, in the old units, had the form f (r), in the new units, has a new form f (  r). Since this change is simply a change in a unit of length, it is reasonable to require that going from f (r) from f (  r) should not change the relative quality of the force functions, i.e., if a family fC  f (x)gC is better that the family fC  g(r)gC , then for every  > 0, the family fC  f (  r)gC must be still better than the family fC  g (  r)gC . So, we arrive at the following definitions. 7 Definitions and the Main Result Definition 1.  By a force function we mean a strictly decreasing function from non-negative real numbers to non-negative real numbers.  By a family of functions we mean the family fC f (r)gC , where f (r) is a given force function and C runs over arbitrary positive real numbers.  A pair of relations (; ) is called consistent [8] if it satisfies the following conditions: (1) if a  b and b  c then a  c; (2) a  a; (3) if a  b then b  a; (4) if a  b and b  c then a  c; (5) if a  b and b  c then a  c; (6) if a  b and b  c then a  c; (7) if a  b, then b  a or a  b are impossible. Proceedings of the 2001 IEEE Systems, Man, and Cybernetics Conference Copyright c 2001 Definition 2.  Assume a set A is given. Its elements will be called alternatives. By an optimality criterion we mean a consistent pair (; ) of relations on the set A of all alternatives. If b  a, we say that a is better than b; if a  b, we say that the alternatives a and b are equivalent with respect to this criterion.  We say that an alternative a is optimal (or best) with respect to a criterion (; ) if for every other alternative b either b  a or a  b.  We say that a criterion is final if there exists an optimal alternative, and this optimal alternative is unique.  Let  > 0 be a real number. By the -rescaling R (f ) of a function f (r) we def mean a function (R (f ))(r) = f (  r).  By the -rescaling R (F ) of a family F , we mean the set of the functions that are obtained from f 2 F by -rescaling. In this paper, we consider optimality criteria on the set F of all families. Definition 3. We say that an optimality criterion on F is scale-invariant if for every two families F and G and for every number  > 0, the following two conditions are true: i) if F is better than G in the sense of this criterion (i.e., G  F ), then ii) R (G)  R (F ); if F is equivalent to G in the sense of this criterion (i.e., F  G), then R (F )  R (G): As we have already remarked, the demands that the optimality criterion is final and scaleinvariant are quite reasonable. The only problem with them is that at first glance they may seem rather weak. However, they are not, as the following Theorem shows: Theorem. If a family F is optimal in the sense of some optimality criterion that is final and scale-invariant, then every force function f (r) from this optimal family F which has the form f (r) = A  r? for some real numbers A and . Thus, our theorem says, in effect, that the force functions used by J. Keller (or, to be more precise, a slightly more general class of force functions) are indeed optimal. 8 Proof 1. Let us first prove that the optimal family Fopt is scale-invariant, i.e., that R (Fopt ) = Fopt for all  and . Indeed, let  > 0 be a positive real number. By definition of optimality, for every family F 2 F , we have F  Fopt . In particular, we can conclude that R1= (F )  Fopt . From the scale-invariance of the optimality criterion, we can now conclude that R (R1= (F ))  R (Fopt ), i.e., that F  R (Fopt ). This is true for every family F 2 F and therefore, the family R (Fopt ) is optimal. But since the criterion is final, there is only one optimal family; hence, R (Fopt ) = Fopt . So, the optimal family is indeed scale-invariant. 2. Scale-invariance R (Fopt ) = Fopt means, in particular, that for every function f (r) from the optimal family Fopt , we have R (f ) 2 Fopt . By definition, R (f ) means f (  r), and Fopt consists of all the functions of the type C  f (r). Thus, we conclude that for every  > 0, there exists a C > 0 (depending on ) for which, for every r, we have f (  r) = C ()  f (r): (1) All decreasing solutions of this functional equations are known (see, e.g., [1]); these solutions are f (r) = A  r? . Thus, the theorem is proven. For readers’ convenience, let us describe how this can be proven when f (r) is a differentiable function. In this case, f (  r) is a differentiable function as well, hence their ratio C () = f (  r)=f (r) is also differentiable. Thus, both sides of the equality (1) are differentiable with respect to . Differentiating both sides relative to  and substituting  = 1, we get the following equation: r  dfdr(r) = C0  f (r); (2) where C0 denotes the derivative of the function C () at  = 1. Multiplying both sides of the equation (2) by dr=(r  f ), we conclude that: Proceedings of the 2001 IEEE Systems, Man, and Cybernetics Conference Copyright c 2001 C0  drr : Integrating, we get ln(f ) = C0  ln(r) + C1 . df f = Hence, f = exp(C  ln(r) + C1 ) = exp(C1 )  rC0 ; i.e., the desired formula for A = exp(C1 ) and = ?C0 . Acknowledgments This work was supported in part by NASA grants NCC5-209 and NCC 2-1232, by the Air Force Office of Scientific Research grant number F49620-00-1-0365, by Grant No. W-00016 from the U.S.-Czech Science and Technology Joint Fund, and by Grant NSF 9710940 Mexico/Conacyt. The authors are thankful to Jim Keller for useful discussions. References [1] J. Aczel, Lectures on functional equations and their applications, Academic Press, N.Y.– London, 1966. [2] J. Allen, Maintaining knowledge about temporal intervals”, Communications of the ACM, 1983, Vol. 26, pp. 832–843 (reprinted in [10]). [3] J. Allen, Towards a general theory of action and time, Artificial Intelligence, 1984, Vol. 23, pp. 123–154. [4] J. Allen and P. Hayes, A common-sense theory of time, Proceedings of the 9th International Joint Conference on Artificial Intelligence, Los Angeles, 1985, pp. 528–531. [5] J. F. Allen and H. A. Kautz, A model of naive temporal reasoning, In: J. R. Hobbs and R. C. Moore (eds), Formal theories of the commonsense world, Ablex, Norwood, NJ, 1985, pp. 251–268. [6] J. M. Keller and P. Matsakis, Aspects of hugh-level computer vision using fuzzy sets, Proceedings FUZZ-IEEE’99, Seoul, Korea, 1999, pp. 847–852. [7] P. Matsakis and L. Wendling, A new way to represent the relative position between areal objects, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999, Vol. 21, No. 7, pp. 634–643. [8] H. T. Nguyen and V. Kreinovich, Applications of continuous mathematics to computer science, Kluwer, Dordrecht, 1997. [9] O. Sjahputera, J. M. Keller, P. Mastakis, P. Gader, and J. Marjamaa, Histogram-based matching measures, Proceedings of the 19th International Conference of the North American Fuzzy Information Society NAFIPS’2000, Atlanta, Georgia, July 13–15, 2000, pp. 392–396. [10] D. Weld and J. de Kleer, Qualitative reasoning about physical systems, Morgan Kaufmann, San Mateo, CA, 1989. Proceedings of the 2001 IEEE Systems, Man, and Cybernetics Conference Copyright c 2001