CENTRAL CIRCULATION BOOKSTACKS The person charging this material is re- sponsible for its renewal or its return to the library from which it was borrowed on or before the Latest Date stamped below. You may be charged a minimum fee of $75.00 for each lost book. Theft, mutilation/ and underlining of book* are reasons for disciplinary action and may result in dismissal from the University. TO RENEW CALL TELEPHONE CENTER, 333-8400 UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN MA V DEC 14 1998 JAN 1 (i 2Q( 01 ls% When renewing by phone, write new due date below previous due date. L162 Digitized by the Internet Archive in 2013 http://archive.org/details/wellfoundedorder932plai f C.2f UIUCDCS-R-78-932 JyuJ&t! UILU-ENG 78 1724 Well -Founded Orderings for Proving Termination of Systems of Rewrite Rules by July 1978 David A. Plaisted DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN • URBANA, ILLINOIS UIUCDCS-R-78-932 Well -Founded Orderings for Proving Termination of Systems of Rewrite Rules by David A. Plaisted Department of Computer Science University of Illinois at Urbana-Champaign Urbana, Illinois 61801 July 1978 This research was supported in part by the National Science Foundation under Grant MCS 77-22830. Well -Founded Orderings for Proving Termination of Systems of Rewrite Rules ABSTRACT Well-founded partial orderings can be used to prove the termination of programs, and can also be used for algebraic simplification. A new class of well-founded orderings is presented which can be used to prove the termination of programs expressed as sets of rewrite rules. The orderings are syntactically defined in terms of a lexicographic ordering and an ordering on multisets and require an ordering on the function and constant symbols to be specified. This technique of proving termination appears to be of practical importance, because it is able to handle re- write rules that arise from typical recursive programs. Several efficient algorithms are presented which allow the termination of a set of rewrite rules to be verified in linear time, in cases to which this method applies. These results can be viewed in terms of an incomplete system of logic in which short termination proofs exist. The well-founded orderings may also be useful for proofs by mathematical induction in various areas of mathematics. A general characterization of a class of rewrite rules is presented, for which termination can be proved using these orderings. 1. Introduction Many programs can conveniently be expressed as sets of term rewriting rules. For example, a program to compute the factorial function can be expressed as follows: fact(s(x)) ■* s(x) * fact(x) fact (0) ■+ 1 The append function in LISP can be expressed this way: append (cons(x,y) ,z) -»■ cons(x,append(y,z)) append (NIL,z) -* z We are interested in efficient, general methods for proving that such sets of rewrite rules always terminate. Although this is in general an un- decidable problem [6], we have found a method that works well in many cases of practical interest. For related work, see [4], [5], [7], [10], [11] Suppose that s l ■*■*! s + t n n is a set of rewrite rules. We have found a class of partial orderings on terms with the following properties: 1. If t- < s. in the ordering for 1 < i < n, then the set ii J — — of rewrite rules is guaranteed to always terminate. 2. Given s^ and t. , there is an efficient procedure for deciding if t. < s. in the ordering. This paper presents an efficient, simple method for proving termination ol sets of rewrite rules in many cases of practical interest. Common cases that our method cannot deal with are also presented. -3- Partial orderings of the kind we investigate are useful in devising general methods for algebraic simplification. Some of the most powerful theorem provers currently in use rely heavily on general sim- plification procedures ([1], [2], [3], [12], [13]). Also Lankford [9] has described a complete theorem proving strategy for the first-order predicate calculus which is based on the concept of simplification. See [8] for another application of such orderings. Our approach is related to that of Dershowitz [4]. He presents many specialized methods, some of which can provide termina- tion proofs that our approach cannot provide. We present a, single., general method which can handle cases not covered by Dershowitz's techniques. Huet [5] is primarily concerned with showing that term re- writing systems are "confluent", and not with exhibiting particular well- founded orderings. Iturriaga's work [7] is hard to understand and not generally available. Lipton and Snyder [10] give another method for proving termination. Of all the above approaches, only Dershowitz's approach and the approach presented here are suitable for proving termination of rewrite rules of the kind obtained when writing recursive programs. Dershowitz's nested multiset ordering is directly applicable only to one function symbol at a time, whereas the ordering presented here is directly applicable to many function symbols at the same time. Also, we give explicit programs for computing the ordering and analyze their time complexity. In addition, we give a general characterization of a class of rewrite rules all of which can be handled by our ordering. Both of these techniques seem to be new in this area, although Dershowitz may use the latter. Perhaps Iturriaga's ordering can be adapted to our purposes, but he gives neither an explicit program nor a general characterization for his ordering. -4- 2. Simplification Orderings Definition A term is an expression formed from function symbols, constant symbols, and variables, properly combined, Thus f(x,g(c,y)) is a term. We consider constant symbols as function symbols having no arguments. A term without variables in it is a ground term. We use the usual notion of substitution. A substitution is considered as a multiple replacement, simultaneously replacing all variables of a term by terms. Thus {x <- f(x), y -e g(c)} is a substitution. We assume all function and constant symbols are taken from some set F of symbols. Definition : We say an ordering "<" is well-founded if there is no infinite sequence t-, , t~, t 3 , ... such that t- > t- + , for all i _> 1. (Such a sequence t, , t~, t 3 , ... is called an infinite descending sequence . ) Definition : A partial ordering "<" on terms is a simplification ordering if it has the following four properties: 1. It is a total ordering on ground terms. 2. It is a well-founded ordering. (That is, there are no infinite descending sequences.) 3. (Consistency with respect to substitution) If t-, and tp are terms and t-, < t~ in the ordering, then for all substitutions e, t,0 t. if there is some substitution e with the following properties: s.e is a subterm of s, and t is obtained from s by replacing one occurrence of s.e in s by t.8. Note that if t- < s. in some partial simplification ordering then t.e < s.e also by consistency with respect to substitution. Hence t < s by consistency with respect to subterm replacement. ■6- We say that a set s, -*■ t, , s« •* to, ..., s -*■ t of rewrite rules fails to terminate on input u, if there is an infinite sequence u, , u ? , u~, ... such that for all i >_ 1 , u. + , is obtained from Li- using some replacement in the set. If no such infinite sequence exists, we say the set of rewrite rules terminates on input u, . Theorem 2.3. Suppose s, -*• t, , . . . , s -*■ t is a set of rewrite rules. Suppose there is a partial simplification ordering "<" such that t. < s- in the ordering for 1 < 1 < n. Then the set of rewrite rules terminates on all inputs. Proof : Assume the rules fail to terminate on input u, . Let u, , u ? , u 3 , ... be an infinite sequence of terms such that for all 1 L 1 » u i + -i i s obtained from u. by using some replacement in the set of rewrite rules. We showed above that u., n < u. in the ordering, for all i. Hence u-, , u^, u^, ... is an infinite descending sequence in the ordering. But this is impossible, because a partial simplification ordering is well-founded. Thus the given set of rewrite rules is guaranteed to terminate on all inputs. 3. Multisets A multiset is a set in which an element can occur more than once. We write multiset union as * or usually u. The number of occurrences of an element in a multiset A u B is the sum of the number of its occurrences in A and B. Definition : A sequence of function symbols (or sequence ) is a sequence of function and constant symbols from F. -7- Definition : If term t is of the form f (t, , ... , t ) then f is the top-level function symbol of t. Also, {t-, , ... , t } are top-level subterms of t. Definition : If t is a ground term of form f(t, , ... , t ) then a path in t is a sequence beginning with f and followed by a path from some top-level subterm of t. Definition : If t is a ground term then Paths (t) is the multi- set of paths of t. Thus n Paths(f(t 1 , ... , t)) = {f} w Paths(t-). 1 n i=l 1 Paths(c) = {c} for constant symbols c. Thus Paths (f (c,g(a,b) )) = {fc, fga.fgb}. Definition : If a is a path then Subseq(a) is the multiset of subsequences of a. Thus Subseq(fa) = {f,A} Subseq(a) = {f} Subseq(a) tw Subseq(a) where A denotes the empty sequence and concatenation denotes subsequence concatenation. Hence Subseq(fg) = {f,g,fg,A}. Definition : If S is a finite multiset of elements ordered by some total order relation, let List (S) be a list {s-, , s ? , ... , s } of the elements of S in non-increasing order. Each element must appear the same number of times in List (S) as in S. Given a total ordering on some set S, we define a total ordering on finite multisets of elements of S as follows: Suppose U and V are multisets of elements of S. Suppose U f V. Let {u,, u 2 , ... u,} be List (U) and let {v-, , v-, ... , v £ } be List(V). -8- If List (U) is a proper prefix of List(V), we say U < V. If List (V) is a proper prefix of List (U), we say V < U. Otherwise, let j be min{i: u, f v.} Then we say U < V if u. < v., U > V if u, > v.. Note that if the ordering on S is well-founded, so is the order- ing on multisets of elements of S. We assume that there is a total ordering on the function and con- stant symbols F appearing in terms t that we are dealing with. If g > f in this ordering, we intuitively think of g as being more complicated than f, and we say g is larger than f. We order sequences of symbols lexicographically. Suppose F = (f^, f ^ , ... f^) and G = ( g Q , g-j , ... , g ) are two such sequences of symbols. Suppose F | G. If F is a prefix of G, we say that F < Gin the lexicographic ordering. If G is a prefix of F, we say G < F. Other- wise, let j be min{i: f . ± g.}. We say F < G if f. < g. , F > G otherwise. Although this is not a well-founded ordering on sequences, useful well- founded orderings can be obtained from it. 4. A Subsequence Ordering on Paths We order multisets of sequences by extending the sequence ordering to multisets as indicated above. If a and 3 are paths, we say a < $ in the subsequence ordering if Subseq(a) < Subseq(p) in the multiset ordering for subsequences. Note that the subsequence ordering on paths is a total ordering. -9- Theorem 4.1 . For all paths a, 3 and for all function symbols f, Subseq(a) < Subseq(3) iff Subseq(fa) < Subseq(fg). The proof is quite simple. Definition : If a is a path and g is a function symbol, let # (a) be the number of occurrences of g in a. Definition : If a is a path, let maxf(a) be the maximal function symbol in a. Let maxf(a,3) be the maximal function symbol in a or 3, if a and 3 are paths. Let R(a,3) be the following relation on paths a and 3: Suppose g is maxf(a,3). Suppose a is ougapg. . .a ga ., and 3 is 3-,g3 2 g. . .3 g3 +1 where # (a)=m and # (3)=n. Then R(a,3) is true if m g. Then List(a) begins with f,m*gf, (l^ggf, and List(e) begins with f ,n*gf ,(|J)*ggf ,. . . Hence if mg, Subseq(cf){f} < Subseq(3){f }. Suppose # q (a) = # g (3). Suppose f>g. Then [_ist(Subseq(a){f }) and List(Subseq(3){f>) will both begin with f m*gf , (o)*99' ,r ' • • • >( m -i ^s" 1 " f 'g mf - After this, they will agree on all sequences having j or more g's at the beginning. Also, they will agree 12- on all sequences having j-1 g's not all of them chosen from the first j-1 g's. Hence the ordering on a and 6 will be determined by the ordering on Subseq(a l .gy){f} and Subseq(3 -gy Hf) for suitable y. As before, Subseq(a.gy ){f}= Subseq(a,y ){f } * Subseq(a .){g} Subseq(y){f} and J J \J Subseq(3 i 9y){f} = Subseqfa .y){f } * Subseq(3 -){g} Subseq(y )(f }. Using J J J induction and an argument as above, we obtain that Subseq(a){f} < Subseq(3)(f} Suppose # a (a) = # a (3) and f < g. In this case, Subseq(a){f} y y and Subseq(3){f) agree on all subsequences beginning with j or more g's. They also agree on all subsequences beginning with j-1 g's not all chosen from among the first j-1 g's. Hence the ordering on Subseq(a){f} and Subseq(3){f) is determined by the ordering on Subseq(a.gy ){f} and Subseq(js.gy ){f}, as above. Using induction and an argument as above, we obtain that Subseq(a){f} < Subseq(3)(f }. This completes the proof. Corollary 1: For all paths a, 3 and all function symbols f, Subseq(a) < Subseq(3) iff Subseq(af) < Subseq(3f). Proof : Subseq(af) = Subseq(a) w Subseq(a){f} and Subseq(3f) = Subseq(3) * Subseq(3){f }. Also, Subseq(a){f} < Subseq(3){f> iff Subseq(a) < Subseq(3) by the above theorem. Corollary 2: The subsequence ordering on paths is a well- founded ordering. Proof : By the above theorem, Subseq(a) < Subseq(3) iff R(a,3). But it is easy to see that the relation R is a well-founded partial ordering on paths. This result is slightly surprising, since the multiset ordering on arbitrary multisets of sequences is not well-founded. However, if we restrict ourselves to multisets of sequences obtained from paths, this ordering is well-founded. -13- Definitio n: If a is a path, let Desc(a) be the multiset of non-increasing subsequences of a. Theorem 4.3. For all paths a and 3, Subseq(a) < Subseq($) iff Desc(a) < Desc (3). Proof : By induction on naxf(a-$). Let a be a,ga g. . .a ga r , + - ) and let 3 be 3-igBo9---B 93 +1 , where g = maxf(a.e) and # (a) = m and # g (e) = n. We show Subseq(a) < Subseq(3) implies Desc(a) < Desc(3). It is clear that Subseq(a) = Subseq(3) implies Desc(a) = Desc (3), and Subseq(a) > Subseq(B) implies Desc(a) > Desc(3) by symmetry. Assume that Subseq(a) < Subseq(3). If m Subseq(a.)< Subseq(3j). Hence Subseq(a.a , +] . . .a m+1 ) < Subseq(0.3j +1 • • -B m+1 ) • We can assume inductively, therefore, that Desc(a.a. + -, . . .a + , ) < Desc(3 j 3 j+r ..3 m+1 ). Hence g J_1 Desc( aj a j+1 . . .a m+] ) < g J " 1 Desc(3 j 3 j+] • . -3 m+1 ) and the theorem is proven. -14- Note that the lexicographic ordering on non-increasing sequences of function symbols is a well-founded ordering. Hence the induced multiset ordering on multisets of such sequences is also well- founded. This gives us another way to show that the subsequence ordering on paths is well-founded. We now derive some results which are related to an efficient algorithm for computing the subsequence ordering on paths. Theorem 4.4. Suppose x is the maximal subsequence of a path a. Then x is a non-increasing sequence and the last symbol of x is the same as the last symbol of a. Theorem 4.5. Suppose x and y are the maximal subsequences of a and 3, respectively. Suppose the last symbols of a and 3 are dif- ferent. Then x and y differ, and so Subseq(a) < Subseq(3) iff x < y. Theorem 4.6. Suppose a and 3 are paths, and f and g are symbols. Suppose that a < 3g in the path ordering, and f < g in the symbol ordering. Suppose that the last symbol of a is not g. Then af < 3g in the path ordering. Proof : Let f,f ...f be the maximal subsequence of a, and let g,g,,...g be the maximal subsequence of 3g. Note that both subsequences will be non-increasing sequences. Since the last symbols of a and Bg differ, the sequences f-,fo'" f m ancl 9i99' ' " 9 (i ' 1 '^ er » so f^f 2 ---f m < g-j 92 * • • 9 n in the lexicographic ordering. Since g is g and 9]92'-'9 n is a non-increasing sequence, g. ^g for all i, l<1 Vz.-fjf since g^.-.g,, > V 2 -»V -15- Suppose g, g 2 - . . g - is identical to f-|f 2 ...f.. Then j < n, because j = n implies a > eg in the path ordering. Since f < g, we have f < g. + , and so f-|fp...f-f < g-igo- • •9-j9-j + i • • -9 n - Thus a f < 3g in the path ordering. Theorem 4.7. Suppose that a and 3 are paths. Let a = a'y and 3 = 3'y where y is the longest common suffix of a and 3. Let x and y be the maximal elements of Desc(a') and Desc($'), respectively, in the lexicographic ordering on subsequences. Then Subseq(a) < Subseq(3) iff x < y in the lexicographic ordering on subsequences. (In fact, x and y are the maximal elements of Subseq(a') and Subseq(s'), respectively.) 5. A Path Ordering on Terms Given ground terms t-, and to, we order them by the multiset order- ing on their paths. That is, we order paths a and 3 by the subsequence ordering, i.e., a < 3 iff Subseq(a) < Subseq{3). We then order t-, and t 2 by the multiset ordering on Paths(t,) and Paths(to). Thus, if List(Paths (t-j)) s {<»]• a 2 i ••• > a m > and List(Paths(t 2 ) ) = {3i» B 2 » ••• » 3 n >, we say t-, < t 2 if List (Paths(t 1 )) is a proper prefix of List(Paths(t 2 )) or if Subseq(a-) < Subseq(B-) where j = min{i: a. f g. }. Hence we are really using two levels of multisets to order t, and t ? : multisets of paths , each path considered as a multiset of sequences . Note that this gives a well-founded ordering on ground terms, since we are ordering multisets of paths and the ordering on paths is a well-founded ordering. Definition : The path ordering on terms ^s ::o fined as follows: a) If t, and t 2 are ground terms, t, < t 2 if Paths(t-,) < Paths(t 2 ) in the above ordering. 16- b) If t 1 and to are non-ground terms, t ] < to if for all e such that t,e and t 2 e are ground terms, Paths(t ] o) < Paths(t 2 e) in the above ordering. Theorem 5. The path ordering on terms is a partial simplification ordering. Note that we may have Paths(t,)= Paths(t 2 ) even if t, and to are not identical. For example, f(a,b) and f(b,a) have the same multiset of paths. If t, and t 2 are ground terms and have unequal multisets of paths, then either t, < t 2 or t 2 < t, in the path ordering on tei ;rms 6. Computing the Orderings Given paths a and 3, we present an algorithm to compute whether Subseq(a) < Subseq(3). Let |a| be the length of a, i.e., the number of occurrences of symbols in a. Let a[i] be the i symbol in a, and assume that the first symbol is a[l]. procedure order(a,3); m •*- |a | ; n «- |e| ; while a[m] = 3[n] and m > and n >0 do (m «- m - 1; n -<- n - 1); j_f m = and n = then return ("a = 3"); i <- 1 ; j +■ 1 wh i 1 e i <_ m and j < n do ( if a[i] > e[j] then j <- j + 1 else if a[i] < 3[j] then i •*■ i + 1 else (1 «- 1 + 1; j «- j + 1);); j_f i ' rn then return ("3 > a") ejje_ return ("a > 3"); end ordf'r; -17- This method requires at most |a| + 1 3 1 - 1 comparisons to compute the ordering on a and 3. It is easy to show that any comparison-ba^ed method of computing this ordering requires at least |a| comparisons on in- puts a, 3 such that a < 3 in the subsequence ordering on paths. Wo now show that the procedure "order" is correct. Let y be the longest common suffix of a and 3. Suppose a=a'y and 3=3'y . At the end of the first while statement, m=|a'| and n=|e'|. If a=3 then m=n=o and the procedure will return "a=3". We show inductively that the second while statement, started on strings a 1 and 3' which do not have the same last character, will work correctly. If a'=A and 3'^A or if a'^A and 3'=A, it is clear that the program works correctly. Suppose a'^A and 3 ' j^A Let a' be a-.g-.a2 and let 3' be 3i 9p e 2 ' wnere 9i 1S maxf(a' ) and # (a-.)=o. Also, g 2 is maxf(3') and # q (3,)=o. Thus we have specified the first occurrences of g, and g 2 in a 1 and 3', respectively. Suppose g-, > g 2 . Then i will stop scanning at |a-,| +1 but j will continue all the way through 3' and so eventually j > n and the algorithm will return "a>3", which is correct. Similarly, if g 2 > g-i the algorithm will return "3>a", which is correct. Suppose g-| = g 2 - Tnen a>3 iff a 2 > B 2 » by previous results and the fact that a 2 t 3 2 - We eventually have i = |a-, | + 1 and j = |b,| + 1. This causes the statement ("M+l; W+l) to execute. From then on, the while statement behaves as if it were started on a 2 and Bp. We can assume inductively, therefore, that the program will output "a > 3" if a 2 > 3 2 , and "a < 3" if a 2 < 3 2 - But a 2 > 3 2 iff a > 3 and a 2 < 3 2 iff a < 3, so the program is correct. This completes the proof. -18- The following algorithm orders two paths a and B in the subsequence ordering on paths. It scans a and B from back to front rather than from front to back. Notation is as in the previous algorithm. procedure orderb(a,B); i«-|a| ; while a[i] = B[j] and_ i>0 and j>0 do ("M-l; J+-J-1); if i=0 and j=0 th_en return("a=B") ; if i=0 and j>0 then return ( "a0 anjd j=0 then return ( "a>B" ) ; LI: if i>0 then (1+1-1; if a [i]<3[j] then goto LI e lse goto L2) e lse return ("a<3"); L2: if j>o then (W-lJ if a[i]>B[j] then goto L2 else goto LI) else return ("a>B"); end orderb; The reason this algorithm works is the following: Let N(y-|»Yo) be the following relation on paths y, and y «: Y, < Y„ in the subsequence ordering, but Y, > (y« with the first symbol removed) in the subsequence ordering. Then for all paths Y,, Y 2 we have that N(y,, grJand f < g implies N(f Yl , gy 2 )- N( Y -,, 9y 2 ) and f > g implies N(gy 2 ,f Yl ). In the above algorithm, let y be the longest common suffix of a and p. Suppose a s a'Y and B=B'y- Let a, be a[i ]a[i+l], . .a[m] and let 3, be -19- 3[jMj+l]...e[n], where m=|a'| and n=|e'|. Then at LI, N^, 3-|) is always true and at L2, N(3 r a-,) is always true. Given ground terms t, and t 2 , we compute the path ordering on t, and to as follows: Let SI be Paths^) and let S2 be Paths(t 2 ). If SI = S2 then t, = t~ in the ordering. If SI f S2 and SI c S2 then t, < t 2 in the ordering. If SI =f S2 and S2 c SI then to < t, in the ordering. Otherwise, sort SI and S2 in non-increasing order using the above procedure to compute the subse- quence ordering on paths. We thus obtain Li st (SI ) = {a-,, a ? , ... , a } and Lis t (S2 ) = (3i, 3 2 , ■•■ ' e n^' Let J be mi ' n ^' 1 ' : a i t &j}- Then t i < t ? in the path ordering on terms if a, < 3. in the subsequence ordering on paths. Otherwise t-, > to. Let Size(t) be the number of occurrences of function and constant symbols in a term t. Suppose Size(t,) = q, and Size(to) = qo- Then SI has at most q, elements. Also, each element of SI is of length at most q,. Hence each comparison done while sorting SI requires at most 2q-, - 1 comparisons of symbols, and we can sort SI in 0((q, log q,)(2q-, - 1)) or 2 2 (q-. log q-. ) comparisons. Similarly we can sort S2 in 0(q 2 log q 2 ) compari- 2 2 sons. Finding j requires at most min(q,, q 2 ) comparisons and comparing a. and 3. requires at most q-, + q 2 - 1 comparisons. So the entire algorithm 2 2 is of complexity 0(q, log q-, + q 2 log q 2 ) comparisons. It is oossible to do bettor thon this, l-.'c present a method which can sort n paths in time 0(L min(logk,logn)) where L is the sum of the lengths of the paths and k is the number of distinct function symbols. We present a related method for computing the path ordering on ground -20- terms t, and t«. The time required is 0(Lmin(logk,logL) ) where L is size (t^) + size (t ? ) and K is the number of distinct function symbols. Hence this is a linear time method if the set of function symbols is fixed. Given in paths, the following algorithm will output them in order, smallest first, according to the subsequence ordering on paths. The time required is 0(|_ min(logk, logn)) where L is the sum of the lengths of the paths and k is the number of distinct function symbols. The algorithm considers v(j) to be the j path, for 1 <_ j < n, and v(j) is considered to be a stack. Thus f*=v(j) removes the first function symbol from v(j) and assigns it to f. The function symbols are identified with the integers {l,2,...,k}. The function symbols are ordered by the usual ordering on the integers. Thus 1<2, 2<3 et cetera. The algorithm makes use of k queues Q, , Q ? , ..., Q. together with another queue T. The empty queue (or stack) is denoted by A. Stack and queue operations are indicated in the usual way by <= . Thus j<=T means remove an element from the queue and assign it to j. Statements such as T«-A and T-eQ. are not queue operations but ordinary assignment statements. Thus T«-a sets T to be the empty queue, and T+-Q. sets T to be the current configuration of the queue Q.. The algorithm outputs a list a-j, a,,, ..., a of integers such that the initial values of v(a,), v(a~), ..., v(a ) constitute the list of paths in nondecreasing order. At any time in the execution of the algorithm, the list Q,Q 2 ...Q k of paths is in increasing order, where each queue is listed from front to back and where we only consider the subsequence ordering on the portion of the path already seen. -21- It would be interesting to know if there is a similar algorithm which outputs the paths largest first, procedure sort(v,n); flag-^true; TVA; for j*-l to n do T^j ; for j-*-1 to_ k do Q.-<-A; while flag dp_ (while TYA do (J-T; if v(j)=A then output(j) else (f-v(j); Q f <=j);); lf(3J)(QjM) then (i-Hnin{j:Q.^A}'» [Use a priority queue for this] T^; Q^A) else flag^-false; ) ; end sort; To show that procedure "sort" works, it suffices to show that on any pair v(i), v(j) of input strings, it behaves the same as procedure "order". For strings with differing last characters, this is easy to do. For strings with common nontrivial suffixes, this is more difficult but still can be done by considering the relative position of v(i) and v(j) within a queue, if they are in the same queue. -22- The following algorithm also sorts n paths v(l), v(2), ..., v(n) and outputs the integers {1, 2, ..., n} in an order corresponding to a listing of the paths in non-decreasing order. Notation is as in procedure sortj The difference is that the paths are scanned from back to front rather than from front to back . That is, f*=v(j) means "Let f be the last symbol in path v(j) and delete this symbol from the end of v(j)." However, the queue operations j *■ Q. et cetera are as before. This algorithm is more useful than the preceding one in certain situations. procedure sortb(v,n); for j=l to_ k do_ Q.-Hl; for j=l to n do if v(j)=A then output(j) else (f-v(j); Q f *-J); while (3j)(QjM) do (iVmin{j:Q./A}; [Use a priority queue for this] if v(j)=A then output(j) else (f-v(j); Q f -j);); end sortb; We can verify that "sortb" works by showing that it treats input strings v(i), v(j) the same way that "orderb" does. Common nontrivial suffixes do not present a problem because the strings are scanned from back to front. ■23- We present an algorithm which will, given a set {t, ,...,t } of n ground terms, sort the multiset .*, Paths (t- ) in time 0(Lmin(logk, logL)) n where L = .£, Size(t-) and k is the number of distinct function symbols in the terms t . . For purposes of this algorithm, we associate subterms of a term t with prefixes of paths in t. If the same subterm occurs more than once in t, we associate a prefix of a path with each occurrence of the subterm in t. Suppose t is of the form f(v, ,...,v ). With t itself we associate the path consisting of the single function symbol f. Suppose w is a subterm of v. for sum i, l . . .< a , b > where b, is arbitrary and b. = 1 if a. and a. -, are identical, b- = o otherwise. If path a. occurs in more than one term, we are assuming that it is counted the right number of times as coming from each term. By scanning this list in reverse, we can easily find the largest term in {t-,, t 2 , ..., t } in linear time. It is not difficult to modify the procedure sortt so that it computes the bits b. correctly. Define Paths(t) for a non-ground term t as follows: Paths (x) = {x} for variables x Paths (c) = (c) for constant symbols c Paths(f(t r ... , t n )) = {f} w Paths(t i ) Thus Paths(f(g(x, ,c) , x 2 )) = {fgx, , fgc ,fXp}. Define Size(t) for a non-ground term t to be the number of occurrences of function and constant symbols and variables in it. Given not necessarily ground terms t, and t ? , we compute the path ordering on t-. and to as follows: Let Paths (t^ be s] u S^ u . . . u sj u R and let Paths (t 2 ) be S^ u s 2 u ... u S^ u R 2 , where S 1 and R. are all multisets. Also, {x ] , ... , x. } is the set of variables appearing in t-, or t 2 , and S. = {a: ax. e Paths (t.)}. In addition, R. is the set of paths of t. ending with a constant symbol Thus, if t ] = f(g(x r c),x 2 ) then s] = {fg}, S 2 = {f}, R ] = {fgc}. -26- Theorem 6.1 . t-, < t 2 in the path ordering on terms iff both of the following are true: ,1 of paths. 1 p a) For all j, 1 <_ j <_ k, S ! <_ s . in the ordering on multisets b) t-|9 < t ? 6 in the ordering on terms, where e is the substitu- tion replacing all variables by the minimal constant symbol of F. The direct method of applying this theorem yields an algorithm whose worst 2 2 case complexity is 0(qj log q-j + q 2 log q„) where q. = Size(t-). Using techniques similar to those mentioned above, we can compute the ordering on non-ground terms in linear time, assuming that the set of function symbols is fixed. rt ! e now develop some results leading up to a nroof of the above theorem, and present an efficient algorithm for computing the path ordering on non-ground terms. Theorem 6.2. Suppose S, * T, > S ? w T ? in a multiset ordering, where S, , T, , S ? , and T ? are multisets of elements from some totally ordered set. Suppose T, >_ T ? in the multiset ordering. Suppose a is a mapping on elements of T, and T ? which is 1-1, order preserving, and monotone increasing on T, w T ? . Then S, w (T,a) > S ? * (T^a) in the multiset ordering. Proof : It suffices to consider the case in which S, n S^ = and T-j n T„ = since elements which occur the same number of times in S^ and S^ or T, and Tp don't affect the argument. Assume then that S 1 n S« = and T, n T = 0. Let s,, Sp, t-, , and t~ be maximal elements in S, , S~, T, , and T„, respectively. Then t-, > tp since T, > T ? . Also, t-,a > t^a -27- since a is 1-1 and order preserving. In addition, t,a > t, and t ? a > t ? since a is monotone increasing. Furthermore, t,a and t«a are the maximal elements of T, a and T2a, respectively. Suppose that s, >_ t, . Then s, > s 2 since max(s,, t, ) >_ max(s ? , t ? ) and S, n Sp = 0. Also, t-,a > t 2 a as noted above. Hence max(s,, t,a) > max(s 2 , tpa) so S, * T,a > S 2 w T^a. Suppose that s, < t, . Then t, > s 2 since max(s, , t-. ) >_ max(s 2 , t 2 ). Hence t-.a > s~ since a is monotone increasing. Also, t-,a > t~a since a is 1-1 and order preserving. Hence t-,a > max(s ? , t ? a) so S-j * T^ > S 2 W T 2 a. Corollary: Suppose S, itfV, m ... i«V >S,*W, w ... mW J vv 1 1 n 2 1 n in a multiset ordering as above. Suppose V. ^W. for j = 1, 2, ..., n. J J Suppose the mappings a. are 1-1, order preserving, monotone increasing mappings on V. * W, for j = 1 , 2, . . . , n. Then S ] * v^, * V 2 a 2 * ... * V n a n 2 11 2 2 n n Proof : By repeated application of the above theorem. Proof of Theorem 6.1. : Assume t-, < t 2 . Then by definition of the path ordering on non-ground terms, t-,6 < t ? e where e is as in the theorem. So b) is true. 1 2 Suppose that for some j, S. > S. in the ordering on multisets of paths. Choose substitution a to replace x. by a very large term and x- by wery small terms, for i f j. Then t-, a > t ? a. Hence a) is true if t-, < t 2 . Assume a) and b) are true. We show that t-, < t 2 in the path ordering on terms. Let e' be a substitution replacing all variables -28- x-,, ..., x, by ground terms v-j , ..., v k , respectively. We show that t,e' < t-e' in the path ordering on terms. Suppose c is the minimal constant symbol in F. We know by b) that Paths(t 1 )lc) Paths(t 2 ){c} , where Paths(t ] ){c) denotes {xc:x e Paths(t ] ) and similarly for Paths(t 2 ){c}. By a), S.{c} < S 2 {c} for 1 < j < k. 1 2 Hence n,*S.{c} 5 n.*S.{c} for 1 < j < k, where n, is the number of paths J J J J J in Paths(v-). (Here n,*S.{c} means n. copies of S.{c} unioned together, J J J J J and S.{c} means {xc : XeS.}.) J J k -i ^ 2 Thus .*, n.*S.{c} w R < .w,n.*S.{c} * R Q by the above remark J=l J J 1 J=l J J 2 and b). Thus .^ si Paths (v.) * R ] < * S 2 . Paths (v.) * R 2 by the above corollary. (The mappings we are using are those that replace a path of form 3c by a path of form 3y for some y e Paths (v^), y f {c}. Such a mapping is 1-1, order-preserving, and monotone increasing, as required.) k 1 Note that Paths(t,e ') = .*,S'. Paths(v-) w R and Paths(t ? e ) = k .*,S. Paths(v.) w R . Hence tne 1 < t«e ' in the path ordering on terms. J=l J J 2 12 Since this is true for all e 1 such that t-,6 1 and t 2 ' are ground terms, t, < t 2 in the path ordering on terms. This completes the proof. If t-i and t 2 are not necessarily ground terms, then we can still compute the path ordering on t-, and t ? in linear time using the above theorem, assuming that the number of function and constant symbols occurring in t-, and t ? is bounded. We perform a simple modification to the algorithm for ground terms, as follows: Let x., S., and e be as in the above theorem. We modify the algorithm so that it prints out two lists: 1. The paths of Paths(t-,e) * Paths(t 2 e) in non-decreasing order, each path identified as coming from t, or t ? . -29- The paths inj^sj) *[jVJ] in 2. The paths i n [.,*■, S .; | *| H ^-, S H | in non-decreasing order, where each path is identified as belonging to a particular set S.. That is, the values of i and j are also given. Both of these lists can be computed during the same scan of t, and t ? . Also, by reading these lists in reverse order, it can easily be determined in linear time whether the conditions of the theorem are true. Hence we can compute the ordering in linear time, assuming that the number of function symbols is bounded. A slight subtlety in this and the preceding algorithm is that when scanning the output in reverse, we don't really need to know the whole path for each path on the list. All we need to know is whether the path is identical to the preceding element of the list. Hence the lists can be printed and scanned in linear time, since we only need a constant amount of information per path (assuming that the relevant integer indices don't get too big). The replacements that the path ordering can handle are very similar to those which Dershowitz's nested multiset ordering can deal with [4]. However, the path ordering can deal with more than one function symbol being "pushed in" at the same time. 7. Examples The following replacements are all simplifications in the path ordering on terms. Assume function symbols are ordered alphabetically. Thus a < b, b < c et cetera. -30- g(f(x,y)) - f(g(x),g(y)) e(d(x,y)) ■* d(e(x),e(y)) g(f(x,y)) •> ff(g(e(x)),g(e(y))) g(f(x,y)) - f(f(g(x),g(y)),f(g(x),g(y))) g(e(x)) + f(x,g(x)) e(c(x)) - d(x,e(x)) Note that the same ordering can handle aVj_ of these replacements at the same time. 8. General Characterization We now present a method of obtaining a fairly general class of simplifications in the path ordering on terms. Definitio n: If s and s are ground terms, then the replacement ? , -» s ? is a strong simplification in the path ordering if the maximal element of Paths ( s ? ) is less than the maximal element of Paths (s-i) in the ordering on paths. Defini t ion : If s, and s ? are not necessarily ground terms, then the replacement s -, -* So is a strong simplification in the path ordering if for all substitutions o such that s e and so are ground terns, s.o -» s ? o is a strong simplification in the path ordering. Example s: The following replacements are all strong simplifications in the path ordering on terms, assuming a < b, b < c, et cetera: f(x,y) -» x e(d(x,y)) - e(x) e(d(x,y)) ■* e(c(x)) In general, if s is a subterm of s then s -»■ s is a strong simplification in the path ordering. Also, the preceding six examples of simplifications are all strong simplifications in the path ordering. -31- Theorem 8.1 . Let notation be as in the theorem characterizing the path ordering on non-ground terms. Then t 2 -*• t^ is a strong simplification in the path ordering iff both of the following conditions are true: a) For all j, l count(x) + count(y) h(g(x),g(y)) -» h(x,y) h(g(x),c(y)) - f(h(x,y),h(x,y)) h(g(x),g(y)) -> f(h(e(x),e(y)),n(e(x),e(y))) ■32- We still cannot deal with replacements in which only one argument of h gets simpler, however. Here is an example of such a replacement: h(g(x),g(y)) - e(h(g(x),y)) If we could find a general ordering in which such replacements were simplifications, then we could probably handle the distributive replace- ments of multiplication over addition. The following result helps us to prove the above theorem: Theorem 8.3. Suppose a and 3 are paths such that Subseq (a) < Subseq (3). Let f be a function symbol which is less than the first function symbol of 3 in the symbol ordering. Then Subseq (fa) < Subseq (3) Coro l lar y: Suppose a and 3 are paths such that Subseq (a) < Subseq (3). Suppose 3 is a path composed entirely of function symbols which are less than the first symbol of 3 in the symbol ordering. Then Subseq (got) < Subseq (3) . We still cannot deal with the following two useful replacements using this ordering: (x+y) + z - x + (y+z) x * (y+z) ^x*y+x*z We are developing other methods which can handle these replacements in addition to replacements such as those discussed above. In fact, we have recently developed methods which can handle the second of the above replacements, in addition to replacements similar to those in- cluded in the above characterization. -33- 9. Summary and Conclusions We have described a simple class of well-founded partial orderings which can be used to obtain short proofs of termination of systems of rewrite rules, in many cases which appear to be of practical interest. These "path of subterm" orderings are based on multisets of sequences of function symbols from the terms we are working with. We have given explicit programs which can decide whether s < t in such an ordering, given terms s and t. These programs run in linear time, assuming that the set of function symbols is fixed. Also, we have described a general class of replacements, all of which are simplifi- cations in the ordering presented. We have presented some limitations of this method of partial ordering. In future work we plan to discuss recent results extending these ideas to overcome some of the limitations of the path of subterms ordering, at the expense of some simplicity. ■34- References [I] Ballantyne, A. M. , and Bledsoe, W. W. , Automatic proofs of theorems in analysis using nonstandard techniques. J. ACM 24:3 (1977), pp. 353-374. [2] Bledsoe, W. W., Non-resolution theorem proving. Artificial Intelligence 9:1 (1977), pp. 1-35. [3] Boyer, Robert S., and Moore, Strother J., A lemma driven automatic theorem prover for recursive function theory. Proceedings of the 5th International Joint Conference on Artificial Intelligence, Massachusetts Institute of Technology (1977). [4] Dershowitz, N., personal communication. [5] Huet, Gerard, Confluent reductions: abstract properties and applications to term rewriting systems. Proceedings of the 18th Annual Symposium on Foundations of Computer Science (1977). [6] Huet, Gerard and Lankford, Dallas, On the uniform halting problem for term rewriting systems. Report No. 283, IRIA (March 1978). [7] Iturriaga, R., Contributions to mechanical mathematics. Ph.D. Thesis, Carnegie-Mellon University (1967). [8] Knuth, D. E., and Bendix, P. B., Simple Word Problems in Universal Algebras. Computational Problems in Abstract Algebra, Leech, Ed., Pergamon Press (1970), pp. 263-297. [9] Lankford, Dallas S., Canonical algebraic simplification in compu- tational logic. Report No. ATP-25, Southwestern University, Department of Mathematics, Georgetown, Texas (May 1975). [10] Lipton, R. J., and Snyder, L. , On the halting of tree replacement systems. Proceedings of a Conference on Theoretical Computer Science, University of Waterloo, Canada (1977). [II] Manna, Z., and Ness, S., On the termination of Markov algorithms. Proceedings of the Third Hawaii International Conference on System Sciences (1970). [12] Suzuki, Norihisa, Automatic program verification II: verifying programs by algebraic and logical reduction. Report No. STAN-CS- 74-473, Stanford University (1974). [13] Weyhrauch, R. W. , A users manual for FOL. Stanford Artificial Intelligence Laboratory Memo AIM-235.1 (1977). BLIOGRAPHIC DATA EET 1. Report No. UIUCDCS-R-78-932 Tide and Subt itle "Well -Founded Orderings for Proving Termination of Systems of Rewrite Rules" 3. Recipient's Accession No. 5. Report Date July 1978 Author(s) David A. Plaisted 8. Performing Organization Rept. No. Performing Organization Name and Address Department of Computer Science University of Illinois at Urbana-Champaign Urbana, Illinois 61801 10. Project/Task/Work Unit No. 11. Contract/Grant No. MCS 77-22830 Sponsoring Organization Name and Address National Science Foundation Washington, D.C. 13. Type of Report & Period Covered 14. Abstracts: Well-founded partial orderings can be used to prove the termination of programs, and can also be used for algebraic simplification. A new class of well- founded orderings is presented which can be used to prove the termination of programs expressed as sets of rewrite rules. The orderings are syntactically defined in terms of a lexicographic ordering and an ordering on multisets and require an ordering on the function and constant symbols to be specified. This technique of proving termination appears to be of practical importance, because it is able to handle rewrite rules that arise from typical recursive programs. Several efficient algorithms are presented which allow the termination of a set of rewrite rules to be verified in linear time, in cases to which this method applies. These results can be viewed in terms of an incomplete system of logic in which short termination proofs exist. The well-founded orderings may also be useful for proofs by mathema- tical induction in various areas of mathematics. A general characterization of a class of rewrite rules is presented, for which termination can be proved using these orderings. Key Words and Document Analysis. 17a. Descriptors proof of termination, total correctness, well-founded sets, rewrite rules, term rewriting systems, tree replacement systems, equational systems, reductions, mathematical induction, sorting, algebraic simplification, theorem proving, recursive programs, recursive schemata, partial orderings. >• Identifiers/Open-Ended Terms COSATI Field/Group Availability Statement 19. Security Class (This Report) IINri.A.S.SlFlED 20. Security Class (This Page UNCLASSIFIED 21. No. of Pages 22. Price NTIS-35 ( 10-70) USCOMM-DC 40329-P7 1 2i m ■ r r~,