UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN Sn* i efor e tte u2L?K borrowed ffsp f DEC 7 2(j|0 WrtJSTS!,*' Ph °* ,e ' "*■ "• *« *„ below U62 no. 981 MaU UIUCDCS-R-79-981 Canonical Simplification of Finite Objects Well Quasi -Ordered by Tree Embedding by Thomas C. Brown, Jr. August 1979 UILU-ENG 79 17 27 Canonical Simplification of Finite Objects Well Quasi -Ordered by Tree Embedding THOMAS C. BROWN, JR. ++ Department of Computer Science University of Illinois at Urbana-Champaign Urbana, Illinois March 1979 This work was supported in part by the National Science Foundation under grant NSF MCS 77-22830. Author's mailing address: Computer Science Department, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1 . ABSTRACT A finite object space can be viewed as the set of finite and infinite (edge-) ordered trees representable as finite ordered vertex- labeled digraphs. We show that the order-preserving homeomorphic tree- embedding relation on finite objects over a well quasi -ordered (wqo) label alphabet is wqo. Canonical simplifiers require well-founded object- orderings to orient and ensure finite termination of replacement rules. We characterize these orderings as homomorphic refinements of the wqo tree- embedding relation, and we show this relation's time complexity to be O(mxn) where m and n are the edge counts of the digraph representations. Digitized by the Internet Archive in 2013 http://archive.org/details/canonicalsimplif981brow -1- 0. Introduction Conceptually infinite objects arise naturally in A-calculus and related programming formalisms familiar to computer scientists [1]. Many of these objects - e.g., effective procedures and the objects they process - have finite representations as (edge-) ordered digraphs with vertex labels in some previously defined finite object space. Canonical simplifiers and other procedures use well-founded orderings on such spaces to orient their replacement rules and ensure termination of their computations. This note establishes a fundamental connection between these orderings and a homeomorphic embedding relation on the finite or infinite ordered tree representations of finite objects. A quadratic time-bounded embedding procedure and a A-calculus application are included to illustrate the practical significance of this connection. The concepts involved are illustrated by the following pair of labeled digraphs: ^ These objects may represent recursive program schemas with conditional branches, or terms in a A-applicative syntax with fixed point operators [53] Incomparable under the (NP-complete [10]) graph-embedding relation, they are equivalent under the induced order of homeomorphic embedding on their infinite ordered tree representations [§1.13]: (1) -2- - C/ v r/\b a^d dT*Sfe»c/w{ •••-.. The indicated vertex correspondence shows the initial part of an embedding h: u < v as defined precisely in §1. It preserves the label ordering (identity in this case) on corresponding vertices, and it embeds distinct branches descending from a vertex in distinct branches descending from the corresponding vertex, in accordance with the orderings on their outgoing edge sets. Homeomorphic embedding and equivalence are decided in 0( x ) elementary operations by the procedure described in §2, where is the number of edges in the finite digraph representation of u. The basic result of §1 is that finite objects over a well quasi-ordered (wqo [13]) vertex label alphabet are wqo by the tree embedding relation. A quasi-ordered set (Q, < ) (where < is reflexive and transitive) is wqo if (i) it is well founded: q. > q k+ -i(keu>) implies q^ < CL+^k rL n ) for some number n e u>; and (ii) it admits no infinite an ti chains A c Q, where u j? v(u, v e A) (iii) every infinite sequence g:oo -*■ Q has an infinite subsequence (q :kew)(j. < j. ,) where q. < q. (k eu). J k K K ' J k J k+1 Thus every infinite sequence of finite digraph-representable trees over a finite (or more generally, wqo) alphabet has an infinite homeomorphic embedding subsequence. -3- This result extends Kruskal's tree theorem [17] from finite trees to finitely representable trees by a straightforward extention of Nash-Williams' simplified finite-tree argument [26]. Thus it strengthens the general maxim that finite object spaces constructed reasonably from wqo building blocks are wqo. For transfinite objects (infinite sets, sequences, or trees) the maxim fails [32]. On the basis of Rado's and other counterexamples due to Kruskal , Nash-Williams identified a restricted class of wqosets which are preserved under many transfinite object space constructions [27, 28]. These better quasi-ordered sets (bqosets) include all finite quosets and all of the finite object spaces over bqosets con- sidered below. Laver [21, 22, 23] has extended Nash-Williams' results to transfinite w-level bounded (and even deeper) trees over a bqoset (without the edge-order preservation constraint), and has since refined these argu- ments to the class of embeddings which respect a bqo-labeled linear ordering (of a quite general type [21]) on the edges or branches leaving each vertex. The wqo preservation argument for finite object spaces in §1 is of independent interest for its relative simplicity, its corollary complexity bounds, and its applications to canonical simplification. Specifically, we show in §3 that an equationally generated reducibility relation on a class of A-applicative terms is well founded iff the replacement rules refine a quasi- simplification ordering (qso) on these terms. Well foundedness of qso's on terms over a wqo (sub-)vocabulary follows by the main result of §1. These results hold for the initial "rationally closed" algebra [8] with cyclic terms. They complement previously investigated methods for reasoning about and processing finite (possibly cyclic) objects [ 9, 24, 25, 38], and they facilitate construction and recognition of canonical (rule based) simpli- fies with the Church-Rosser property [ 2, 15, 16, 19]. -4- 1 . Well Quasi -Ordering by Tree Embedding By a space we normally mean a decidable (countably) infinite set of finite objects; such spaces are recursively isomorphic to the natural numbers [35]. Specifically, given a quoset z_ = (z, <) where z c some space, we are interested in the space R consisting of a representative (up to isomorphism) of each finite rooted edge-ordered digraph with vertex labels in z. Tree embedding is most naturally defined below on the set R of labeled trees representing R ; the induced relation on R is easily computed [§2]. R (or a "well formed subspace" [§3]) is a suitable carrier for the initial rationally closed algebra []1,37] or rational object space [8] over I. Conventions . Metavariables over R E :r, s, t, u, v; over z:a, b, c, d. Occasionally we identify c in z with its edgeless singleton digraph (s c R ). > = converse of <; x = x, < > = null sequence. Metavariables over w* : a, 3, y, 6. The following "data type algebra" provides a convenient formal basis for reasoning about R : -5- 1.1 Definitions . (R^ v {j.}, z, 6, A, w) is a two-sorted algebra where (i) 6(u) = out-degree of root vertex of u; (ii) x(u) = label of root vertex of u (in z); and equivalently /...n =/ k -th (immediate) constituent (in R ) of u, if k e <5(u); ). As a technical convenience we also define (iv) p (u) = root vertex of u. While not explicitly concerned with vertices, we find it convenient to assume them to be natural numbers in §2. We extend (iii) inductively to u)*: (V) Vk = (U a\k)' 1f U a £R E ; (vi) D(u) = {a e a)*: U e R v }. a Z D(u) is a finitary tree domain (u e R ): 1 .2 Definition . A tree domain is a set D c go* such that (i) < > e D; and (ii) a.k e D implies a.k , a e D. It is finitary if {k: a.k e D} is finite (a e D). Given u e R , u:D(u) ■*■ z is the labeled tree defined by u(a) = x(u ). R = {u:u e R }. 1.3 Definitions. A vertex p(u ) in u is said to be cyclic if u . = u for some 3 t < >; otherwise acyclic . We may speak of a. B as a cycle in u with root a. The edge represented by a path a.k in u is the k-th directed arc, which connects u and u k. It is said to cyclic if a a. K — * ■ — u . = u for some B and some prefix y of a; otherwise acyclic . (A cyclic edge or vertex may be indexed several times by the prefix set of a cycle a.B in u. ) -6- 1.4 Definitions. Distinct constituents u , u„ (or their roots) are a 3 said to be mutually connected if u ., = u„ and u„ , = u for some ■ a. 3 3 3. a a a',3', in which event a. 3'. a' and 3. a 1 . 3' are cycles with respective roots a, 3. The (unique maximal strongly connected [30]) component of u consists of u (p(u)) and all constituents (vertices) mutually connected with u (with associated edges). SC(u) = .f {3 e D(u): u is mutually connected with u} thus {SC(u ): a e D(u)} partitions D(u) into tree-address sets of distinct components. 1.5 Corollary [30]. Each object in R has a unique root component, from which ewery other component is accessible by a path through some acyclic edge leaving (a member of) that component. 1.6 Definition . Let a a 3 = longest common prefix of a and 3 in w*. A mapping h: D(u) -*• D(v) is said to be a (homeomorphic) embedding of u in v (h: v <^ v) if (i) h(a a 3) = h(a) a h(3)(a.3 e D(u)); (ii) a.j e D(u) and < i < j imply existence of i' < j 1 , 3, y such that h(a.i) = h(a).i'.3 and h(a.j) = h(a).j'.Y; and (iii) x(u o ) < ^(u h(a) )(a e D(u)). We define u < v iff h:u < v for some embedding h. Note that a/. } a j by this definition (unless a < d); b*^ b J\ the more general embeddings are admitted if (i) is replaced by "h(a a 3) is a prefix of h(a) a h(3)". (That R is wqo by these is an immediate consequence of Theorem 1.7.) 7- Now we can state the main result of this section more precisely: 1.7 Theorem , (z, <) wqo implies (R , <) wqo. The following lemma summarizes some well-known properties of wqo sets. These follow from the definitions by sequence-refinement arguments. An infinite sequence g over Q is said to be good if q- < q. for some i < j (otherwise bad [26]), and ubiquitous if is has an infinite < - ordered subsequence (q. < , j. < j. +1 , k e u). j \y q.; K K ' K J k+1 1 .8 Lemma . (a) Q is wqo iff each infinite sequence over Q is good iff each infinite sequence over Q is ubiquitous. (b) (Q n uQ <) is wqo iff Q~ and Q, are wqo. (c) If (Q-, <.: ) is wqo (i = 0, 1) then Q Q x Q-, is wqo with the product ordering: (q Q , q-. ) < (q '> q-i ' ) iff q i ^ q^ (i = 0, 1). (d) If h: Q ■* R is a qoset homomorphism (x < y implies h(x) < R h(y) and h[Q] = R then Q wqo implies R wqo. Now to prove Theorem 1.7 we use the first of several applications of Nash-Williams 1 "minimal bad sequence" argument [260- Suppose E^ wqo and R not wqo. Let u^ be an irreducible < - minimal element which begins some bad sequence over R„. Let u, , , be an irreducible < - minimal M z k+1 object which extends (jUq, ..., u.) to a prefix of some bad sequence over R . Let W be the set of (maximal.) non-root component constituents of elements of the bad sequence y = (u. : k e <*>). -8- 1.9 Lemma . (W, <) is wqo. Proof . If W is finite this is obvious. Otherwise refine some bad sequence over W to obtain a bad t = (t. : k e w) such that (i) t is a (non-root component) constituent of u.; and (ii) t k ? u. (i = 0, ..., u.; k > 0). Then (u Q , ..., u-, t Q , t-, , ...) is bad because u- < t. implies u- < u for some u^ where n > j (contra choice of u ). However, t n < u- contradicts n — n j the choice of u. in u ■ Remarks . Irreducible objects were not essential here; they are required in subsequent "minimal bad sequence" arguments which split components. The lemma suggests that we view each element of u as a root component whose outgoing acyclic edges (if any) terminate in a new wqo vocabulary (W, <). Formally, transform u, into an object u, ' over z u &/ with cyclic root-component labels in I and adjacent acyclic vertices labels in W. Now observe that u.' < u. ' implies u- < u. : indeed, there exists h: u. ■> u. where h embeds SC(u.) in SC(u, ). 1.10 Definitions . The component height CH(u) is defined inductively by CH(u) = Sup{CH(u ): a e D(u) - SC(u)}. a S = {u e R : CH(u) < 1 and all cyclic vertices Li Lt of u are in its root component} S contains all strongly connected u e R (CH(u) = 0), and S wu contains all elements u k ' (Remark). Now observe that the alleged bad sequence is refuted (proving Theorem 1.6) by the following (with z u W for e): -9- 1.11 Theorem . _E wqo implies (S , <) wqo, Indeed, from this result and the ubiquity property of wqosets [Lemma 1.8(a)] we infer the following strengthening of Theorem 1.8 by another "minimal bad sequence" argument: 1.12 Corollary . l_ wqo implies that for each infinite sequence u over R, there is an infinite subsequence (u. : k e w) wherein u. < u. by J k J k J k+1 an embedding h: u. < u. which maps SC(u. ) into SC(u. ) - i.e., h J k J k+1 k+1 "embeds the root component of u . in the root component of u k+1 Proof . Suppose not; let u be a < - minimal sequence for which the alleged embedding subsequence does not exist, and obtain W (wqo by_ Theorem 1.7) as in Lemma 1.9. By the preceding remark applied to an infinite embedding subsequence of (u k ': k e u>) over S u w we obtain the desired subsequence of u." The remaining "minimal bad sequence" argument for Theorem 1.11 involves analysis and synthesis of components. It may be helpful to observe at the outset that embeddings of parts yield embeddings of wholes in S„ but not in R : 2 — I [rs] O 1 2 O o \3 c \J (2) That r < t, s < t, and [rs] } t has no bearing on arguments below because t it S (even though CH(t) = 1). The invariance of < under analysis and synthesis operations on S is clarified by the following representation Li for (5 E , <). -10- 1.13 Definitions . A cell over E is a word cw e e x (z u {t})* where t is a new symbol, t < t, and t is unrelated to elements of z by <. cw is said to be cyclic if w contains ; otherwise acyclic . Let S be the space of finite nonempty cell sets, each either a singleton or a set of cyclic cells. Define u in S for u in S by U(u)}, if 6(u) = 0; «a(u) a(u q ) ... x(u w u \- )>}» if u is acyclic, 6(u) > 0; -> : ken}, otherwise, u = ( 0). Define u < v iff to each cw in u there corresponds c'w 1 in v such that c < c' and if |w| > then for some new* where <_ Tin < • • . < n i w i- f. |W | we have (for all k e |w|) either w. < w' or w. < a where w' = t and a k n k k n k occurs in some member of v. Examples . For u, v in (1) we have u = {aTT, bTdd, cdT} = v. However, S is not a canonical representation for (S , <): (3) Here we have r = {ctt, CTd} - s = {cdT, ctt} > t = {bTd, ctt} - u = {cTd, bid, ctt} - v = {cdT, bdT, ctt}. For example s < t because both cdT and ctt match ctt in t due to the presence of d in bTd. -11- 1.14 Definition . u(u) is said to be reducible if {x} < u - {x} for some cell x in u; otherwise irreducible . Thus u and v in (3) both reduce to t. Reduction preserves -; however, an object can have distinct irreducible reducts: ^-C A*^ & ^ £^> Here we have f ={ cdT, ctt, CTd} - s = {ctt, CTd} - t = {ctt, cd-r}. 1.15 Lemma. (3L, <) represents (S„, <), and embeddings of cyclic pairs imply embeddings of their joins in 5L (S ) : (a) u < v iff u < v; (b) if s and t are cyclic and (s, t) < (s 1 , t') then s u t < s' u t' . Consequently, u - u' for some irreducible u', and t s u 1 implies t g u (t, U E S z ). Proof . If u or v is acyclic, then (a) is obvious. Suppose u, v cyclic with respective root component sets {u : k e m}, {v Q : ken}. a k \ Let {x : k e m} be the cells corresponding to {u : k e m}, and {y. : k c n} K Oti, K the cells corresponding to (v p : k e n) A. -v. ^ Suppose h : u < v. Then v./ \ = v R for some j, establishing a functional embedding M(x., y.) of u into v satisfying Definition 1.13. -12- Conversely, if M : u -»■ v is such an embedding, then we can define h : u ■* v by induction; u = u for some i, whence h<> = 3- where m(x., y.). C-j J i j Given h(a) where u = a. we must have v. / \ = v„ for some j by definition of M, whence the extension of h below a based on Definition 1.13. (Consider the three cases for k z 5(u ), x. = cw, y. = c'w', n as stipulated: (i) w, < w' c Z; (ii) w. = t = w' ; and (iii) w. < a where w' = j and a occurs k n k k n k k n k in some cell of v. Definition of h(a.k) is straightforward in each case - e.g., in (iii) let h(a.k) = h{a).r]..y where X(v Q n ) = a.). K Pj*V Y (b) Given s < s 1 and t < t' where s and t are cyclic, it follows by (a) that s 1 and t' are cyclic and s u t, s'u t' are sets of cyclic cells. Moreover, matchings M : s -> t' and M, : t -»■ t' define a matching M u M. : s t a s t s u t ■> s' u t' . Now suppose {x} < u - {x} where x z u. Then u and x are both cyclic, whence u = {x} u (u - {x}) < u - {x} by (b). Thus u - u - {x}, whence u - u' for some irreducible u 1 . If tu(x) c Q' where x t t then t < u' - u, whence t < u' - u by irreducibility of u'.i We conclude the proof of Theorem 1.11 by supposing Z_ wqo (wpo) and u an irreducible < - minimal bad sequence over S . Let {u,:k z w} = li u 1/ u W where U = {u k : 5(u k ) = 0}; 1/ = {u k : 6(u k ) > a |u k | = 1} W = {u k : |u k | > 1}. By Lemma 1.8 (a,b) it suffices to show that (J, I/, and W are wqo. For U c Z this is immediate. -13- Let V % = {u e 1/ : 6(u) = 1}, IT = {u e (/ : 6(u) > 1}. I/' consists of "monadic loops" c. x ) elementary assignment, indexing, and scalar comparison operations, where = number of edges in u. The algorithm operates by associating with each (strongly connected) component of v a record of all components of u which can be embedded within or below that component. It is similar to bottom-to-top occurrence-finding procedures [14]. Its correctness proof is based in part on the following "strong em- bedding" refinement of < [§1.6, §1.10]. 2.1 Definition . An embedding h : u ■* v is strong if h[SC(u )]_£. SC(v,/ \) (a e D(u)). Remark . In other words, h "embeds components within components." Not all embeddings are strong: (4) Evidently, h does not map SC(s) = {< >, 0, 0.0, 0.0.0, 1,...} into SC(t) = {< >, 1,...}. However, such an embedding can always be constructed from a given embedding. -15- 2.2 Lemma . If u < v, then there exists a strong embedding h': u < v. Proof . Consider u and v as trees of strongly connected components (with cyclic edges), joined by acyclic edges. The argument is by induction on CH(u). Suppose h[SC(u)] i_ SC(v h<> ). Then SC(u) is infinite; there must be some S 1 e Dom (v) such that h[SC(u)] n SC(v ol ) is infinite. Choose p a e SC(u) such that h(a) e SC(v r1 ). There is 3 in SC(u) such that u R = u. It follows that h(a.3.y) e SC(v , ) for all y e SC(u). (h[SC(u)] cannot have infinite intersections with two distinct SC-classes in Dom (v).) Thus, let h' < > = h(a.6), and let h'(y) = h(a.B.y) (y e SC(u)). Now consider a path 6.k in u where 5 e SC(u) and p(u~ . ) is not connected to p(u). Then, h embeds u r . = u n r , in v u/ „ r , \ (where K 6.k a.3.o.k h(a.$.6.k) h(a.3.6.k) may or may not be in SC(v Rl )), and CH(u. . %) < CH(u). It follows by the induction hypothesis that we can extend h' to embed u~ . strongly in v, , „ o i \ for each such path 6.k.i h(a.3-<5.k) Given the sufficiency of a strong embedding test and the apparent effectiveness of (S„, <) as a representation for (S , <), it is only natural to consider an algorithm for (u < v) in R„ which reduces this problem to simpler decisions (s < t) in S yl . The algorithm for R y is based on the foil ow- ing "component representation," essentially an extension of ~ from S to R y 2.3 Definitions . Given u in R , u is defined by induction on CH(u) {X(u)>, 6(u) = 0; { X(u) u ...u s ,y }, 6(u) > and SC(u) = {<>}; u = k k { X(u )e ...e«/ \- : k e n}, otherwise, k a k where u has root component {u ,..., u } and a o a n" k -16- Q ,•> if a. .j represents an acyclic edge of u [§1.3]; a k .j k t, otherwise. The components of u consist of u (of height CH(u)) and components of cell elements e. = u . (of height CH(u .)). J a k .j b a k* J Convention . In the following procedure, u is represented as a list U of components represented in order of increasing component height (i.e., u corresponds to the last entry). The k-th component U(k) is a set of one or more cells , each consisting of a label and a list of indices j < k; the index k in U(k) represents a cyclic edge (t) in u. Similarly for v and V. 2.4 Procedure . SE(u, v) where u, v e R , returns (u < v). # 1 Let U, V be the indexed arrays representing u, v. 2 Variable A : [|U| x |V| -> Bool], initially A(i,j) = False; [A(i,j) = True iff U(i) appears (has been embedded) in V(j)] 3 For k «- U to | V |" : 4 [ [Propogate appearances from proper components of V(k)]: 5 For cV in V(k): 6 For j *■ to |w' |~: 7 A(*. k) ♦■ A(*. k) v a(*. wj); [Scan U for new components embeddable in V(k)]: 8 For j +- to |U|": 9 [rf_ ~A(j,k), then [test for new component embedding] 10 If Embeds (A, U, j, V, k) then 11 [A(j, k) +■ True; 12 If j = |U|" then Return (True) [u embedded]; ] -17- 13 Return (False) [u never embedded in v] end SE Remarks 1. The propogation step #6 requires 0(|U|) elementary operations for each j, k, value, or 0( x ) operations in all. 2. The embedding test #10 essentially implements < on S„, treating acyclic edges leaving U(j) as pointers into "leaves" previously embedded (as recorded by A) in V(k) and its components. 3. Step 1 requires 0( + ) operations, given adjacency struc- tures for u and v [36]. -18- 2.5 Procedure . Embeds (A, U, j, V, k): Boolean; £ 1 Variable match: Boolean «- False; 2 For cw in_ U(j) : 3 [ For c'w' in_ V(k) until match: 4 If c < c* then 5 If | w | > then 6 If |w| < |w'| then 7 [ For p, q «- while [p < |w| a [p = |w|~vq < |w'|"]: 8 [q <- q until [L(w , w') v q = |w' |]; 9 If q < |w'| then [p +■ p ; q «- q + ]]; 10 If p = | w [ then match «- True] 11 else match «- False 12 else match <- True 13 else match «- False [end inner loop]; 14 If ~ match then Return (False)]; 15 Return (True) [each cell matched]; 16 L(m,n) = [ lf_ m = j then [cyclic edge in U(j)] 17 [n = k] [cyclic edge in V(k) required] 18 else [ m < j, acyclic edge] 19 A(m,n) [even if n = k, after propogation]] end Embeds Remarks 1. |U(j)|<_ number of cyclic edges in j-tji component +1 ; similarly for |V(k)|. -19- 2. Matching algorithm #4.. 13 [#7..9] requires 0(|w| + |w'|) < ( | w | x | w ' | ) elementary operations to find n of Definition 1.13 (successive q values at #9 if p = |w| (complete match) at end). 3. Embeds (A, U, j, V, k) is executed <_ once for each j e |u|, k e | V|, requiring 0(e. x e,) elementary operations each execution (e. = E(|w| for cw in u(j)), e, = Z(|w'| for c'w' in V(k)). 2.6 Theorem . SE(u, v) = True iff u < v, andSE(u, v) returns True or False within 0( x ) elementary operations. Indication of proof . The time bounds are established by the remarks following Procedures 2.4 and 2.5. Correctness is established by induction on CH(u), in effect replacing U by some prefix U 1 and showing that SE(u', v) correctly records in A each embedding of each component of U 1 in a component of V. For the induction base, verify that if U' represents a singleton object c in I then Procedure 2.4 (with j restricted to <_ |U| ) sets A(0, k) = True iff c < c' where c 1 occurs as a cell label in some com- ponent of V(k). Embeds detects these appearances even if V(k) = {c 1 } of com- ponent height 0. The bottom-to-top order of components in U is critical for this argument: in Steps #8.. 12 several nested components of U may be embedded in a single component V(k); the lower level embeddings must be recorded in A(*, k) so as to satisfy the L-predicate of Embeds when the higher level embeddings are tested.! -20- 3. Canonical Simplification This section applies the wqo preservation result of §1 to the design of canonical (rewrite-rule based) simplifiers for a typed X- applicative syntax A . Z consists of a vocabulary £ = C u V (constants and variables) and an assignment * = A + t* of types in a lower semi- lattice (t*, £ v , i) with minimal element i the null ("undefined") type. An abstraction Xx.u in A„ is interpreted (details omitted) as a function defined only on objects of type T y(x) (a subdomain of the interpretation's structure). Type free X-calculus results from inclusion of universal -typed variables, and a first-order language results from allowing only individual sorted variables in ^. C„ is assumed to include the logical constant "=", and an equation ((=u)v) in A is abbreviated [u = v]. A canonical simplifier for A uses a finite set E of equations as term rewriting or replacement rules in addition to the appropriate X- conversion schemas. Often it is necessary to decide whether the reducibil- ity relation _> f is well founded in the sense that u >_ f v for some >_ p-minimal term v(u e A„). v is said to be > p-minimal if v >_ F v' implies v 1 ^ F v. (In practice the set of > F -minimal terms should be decidable.) The principal result of this section is that >ris well founded iff ^ F £ > for some quasi -simplification ordering (qso) ^. These orderings formalize both syntactical and semantical concepts of complexity. The smallest qso on A is the tree-embedding relation , u(Yu) (u £ Aj.) where X„ is the set of instances of the A-conversion laws for a given com- pactly typed algebraic signature £. In the A-appl icative syntax A^ the term Y is (5) where the bound variables are of "universal" type t . A natural computer representation for (5) would use four types of tagged pointers (atom, -22- bound variable, application, abstraction): A t A f B r* iJ^r f 4 dV - OC I $ — . ' $ T $ T In the rationally closed syntax A" [ 8 ] the factorial function is represented by (YAf.An:Nat.=>[n = 0] 1 [n x fn~])~: Nat (6) where => is the "conditional" operation constant. Note the distinction between bound variable references ($) and the recursive invocation (*) of "factorial." In A" each occurrence of (YAf.u) in A is replaced by u 1 wherein each occurrence of f within u is replaced by a cyclic reference to the root of u'. We mention (but do not formalize) A~ because it provides a natural basis for reasoning about recursive program schemas [5], where distinct "unfoldings" of a recursive schema have isomorphic infinite tree representations. The concepts and results stated below for A are easily extended to A" on the basis of §1. -23- 3.1 Definitions . A compactly typed signature (cts) is a system 2 = (Z, T v> % x ) wnere T v :A x ^ t t( t z :1 "*" T * in Particular) and (x*, c^,, i ) is a lower semilattice with minimal element j. , staisfying (a) - (d): (a) If fi^S = i then f^S' = l for some finite S' S(Sc T * ). (b) If V U k } Vz (v k } (k = °' ]) then T z ( Vi } -z T z ( Vi } - (c) If x z (x)EjT z (y) and t z (u)c^t z (v) (x, y e l^; u,v e A z where y not free in v) then t (Xx.u) c t (Xy.[y/x]v). (d) Z = C u 1/ (drawn from two disjoint classes of "atomic" symbols), and iy.Tyiy) = x 7 (x)} is infinite for each variable x in 1/ . A is the subspace of R,„ r *-, + x [§1] defined inductively by Z (Zu{<*, $) ut*) J (e) E£ A (identifying elements of Z with their singleton graphs); (f) if u, v e A- then A y contains (u v): A U V (g) if x e U , u e A , and 1/ contains variables of functional type [x (x) ■> t (u)], then A contains Xx.u: T,.(X) u » where X(Xx.u) = t„(x) and u' is the result of replacing each edge terminating at (a vertex labeled by) x by a unary vertex labeled with $ and outgoing edge terminating at p(Xx.u). Note that X(Xx.u) = label (t (x)) of p(Xx.u) as defined in §1; the two uses of X are unrelated and contextually unambiguous. -24- Notation . We employ the usual abbreviations: (t t, ...t ) for (...(^t^.-.tn), Xx Q x 1 ...x n .t for Xx Q . (Xx-, . . .x n .t) . Remarks 1. The definition itself does not specify how t is extended from E to A . Normally, t£ consists of sorts (subtypes of i„, the class of individuals), functional types [a ■*■ $] and subtypes of these, and generic types (such as the universal type t ) which properly include func- tional types. Condition (g) limits X-terms to types for which there are variables, resulting in a more or less standard first-order applicative syntax with no abstractions when only individual sorted variables are in V . 2. The lattice meet operation n-obtained from Sy should be effective; otherwise, a first-order (typed) unification algorithm cannot be defined [4 ]. 3. We may (but need not below) assume that the greatest lower boundO z S is defined for all Sslt* Without condition (a) we would not have a first-order language: let S = {a, :k e w} and consider the equations E~ = {[x, = x, + ,]:k e w} where T y(x.) = a. . Iff) S = -k, then E^ cannot be satisfied by any I_-structure . If fl S 1 f i. for ewery finite S' E S then the corresponding subsystems E^, are satisfiable, contrary to the first- order compactness principle. First-order adequacy is all that can be required of an effective calculus, even if a standard second-order model [34] provides the intuitively natural denotational semantics [12]. 4. Something should be said about types of combinations ((Xx.u)t) where t (t) £ ^ (x). One could simply forbid such combinations when T _(t) n t (x) = i , as in finite simple type theory. More satisfactory (in general) is the convention that x y ((Xx.u)t) is a type of objects which includes T „(u), for interpretations wherein t has type t (x), and the type of i ("undefined"), -25- for interpretations wherein t does not. From a cts I we obtain a quasi-ordering < on A„. 3.2 Definition . Let < be the identity relation onZu{$, <*}, extended to t* by a_ ^ u[6t/a] for each [s = t] in 5, in E y , a in D(u) such that u = es. u - r v iff u > r v > j- u. a — t — t — E Remarks 1. Of particular relevance is the X-conversion relation > where \ consists of all instances in A of the 3-conversion and (type-free) n-conversion schemas: -26- 6 E : [(Xx.u)t = [t/x]u] (x E (t)c x E (x)); [(Xx.u)t = i] (T z (t)n z T z (x) = i ); r\ : [Ax.(ux) = u] (x not free in u, t (x) = i ). The case wherein (t)n t^(x) fi andi y (t) <£ t „(x) is analogous to situations which require "run-time type checking" in strongly typed programming languages with (polymorphic) type inclusions [29, 33]. In this event (Xx.u t) should perhaps reduce to a conditional expression - e.g., (=>(x_(x)t)([t/x]u)±) where x y (x) is used as a monadic predicate, =>t xy = x, and =>± xy = y. 2. Recall that 2l F is said to be well founded iff u > E v where v >. r v ' implies v > f v' (u e A ). Well foundedness (rather than finite termination) is appropriate when E contains such equations as [x+y = y+x]. 3.7 Definition . A quasi-simpl ification ordering (qso) on A„ is a quasi - ordering < such that (i) u a < u(a e D(u)); (ii) if u < u' and v < v' then (uv) < (u'v 1 ); (iii) if t„(x)c x„(y) and u < v where y does not occur in v, then Ax.u < Ay.[y/x] v, and (iv) if u < v then 0u < 9v (0 e E ). < is simplification ordering if, in addition, it totally orders the set of closed (ground, variable-free) terms in A (A~). Similar orderings have been investigated previously for standard first-order term languages [ 2» 6, 31]. The condition (iii) preserves semantic inclusions of functions denoted by X-terms. Normally, these orderings are required to be well founded (at least modulo z.) --often a tedious -27- property to verify [16]. As we shall see, this additional assumption is usually superfluous [§§3.9, 3.10]. 3.8 Lemma . ^„is the smallest qso on A . Proof . Suppose < a qso. We argue by induction on terms that i_ c < and <^is a qso. Clearly < satisfies (i), (ii), (iv); if u < v by these rules and the fact that<, is a quasi-ordering, then u < v. Suppose t (x)c x (y) Li Li {_, L-, and u = >n > . > is generated by the set Z -E E ~E of instances [9s = et] of equations in E such that 0s > 6t. Define u (v v iff u >. v ^- u. 3.10 Theorem . 3 Suppose (E - X) finite and [s = t] e E implies that each free variable of t occurs in s. Then >r is well founded iff there exists a qso -28- < such that >r = ^p. Proof . The theorem holds for both A„ and A" ; in the latter case we assume \ v contains the reductions [YXx.u = (YXx.u)"]. Note that if u >. v then each label appearing in v also appears in u. Also, the set of all equations [(Xx.s)t = [t/x ]s] in X whose left-hand side occurs in u is finite. Thus the set of instances [9s = 0t] of equations in E where 9s occurs in u is finite, and by the assumption on variables of t no new variables are introduced. Now suppose >£ = >t where < is a qso. It suffices to show that (T ,r is well founded. Let <> be the reflexive and transitive closure of (v = cv.i Remarks 1. The condition on variables in equations of E is typically satisfied. Here, as in [15], it can be removed under certain circumstances. Also, if C is finite and ( T * , c„, i ) is a wqo set then the finiteness con- ditions on E - L can be removed. 2. Given a qso <, equations in E may be reordered so as to obtain an equivalent system E 1 such that t ^ s for each equation [s = t] in E' . Effective methods have been developed for transforming E into an equivalent system E 1 such that dvi is confluent (on ground terms u. ): u = E u, implies u, > E , v (k = 0, 1) for some v. These methods are useful in mechanized reasoning with equality [ 3, 19, 20], -29- 4. Conclusions It has been shown that finite ordered digraphs over a wqo vertex label alphabet are wqo by the homeomorphic embedding relation on their infinite tree representations. This relation can be computed in quadratic time; it is the smallest of a useful class of (quasi) simplification orderings used by term replacement systems. These results provide a practical basis for comparing structural complexity of program schemas and other naturally cyclic finite objects. Theorem 3.10 might be extended in several directions. For example, when is the set of > F -minimal terms decidable, and when is wdecidable on this set? What can be said about effectiveness of a qso < for which >_ F = > F ? Acknowledgement . A connection between simplification orderings and finite tree embedding was brought to the author's attention by D. Plaisted and N. Dershowitz. The latter contributed numerous helpful discussions and criticisms culminating in the argument of §1. -30- NOTES Personal communication. These conventions are based on a two-valued truth-value lattice wherein represents "undefined", "null type", and "false", and represents both "universal type" and "true". This lattice is related to the more usual three and four-valued truth-value domains [34] in [4], 3 This argument generalizes the argument for finite acyclic objects due to N. Dershowitz [ 7 ]. Note that if > P = >v and there exists no infinite descending chain (u.:k e w) where u. > u. + , (k e w) then >, has the finite termination property (ftp). Conversely, if > f satisfies ftp, then > E =^n where < is a qso which admits no infinite descending chain. •31 1. BShm, C. (ed.), A-Cal cuius and Computer Science Theory, Lecture Notes in Computer Science 37 (edited by G. Goos and J. Hartmanis); Springer-Verlag, New York, 1975. 2. Brown, T. C, A Structured Design Method for Specialized Proof Procedures (Ph.D. Thesis), California Institute of Technology, 1974 (Uni- versity Microfilms, Anne Arbor, Michigan). 3. Brown, T. "Equational reasoning with derived retractions." Research Note, University of Illinois at Urbana-Champaign, April 1979. 4. Brown, T., "Reduction systems with compactly typed operator signatures," in preparation. 5. Burstall , R. M., and Darlington, J. "A transformation system for developing recursive programs," JACM 24 (1977), 44-67 6. Dershowitz, N., and Manna, Zohar, Proving Termination with Multiset Orderings, Standford AI Lab., Memo AIM-310 (March 1978). 7. Dershowitz, N., A note on simplification orderings. UIUC , March 1979. 8. Ehrich, H. D. "Outline of an algebraic theory of structured objects. Automata , Languages, and Programming (edited by S. Michelson and R. Milner, Edinburgh University Press (1976), 508-530. 9. Ehrig, H., H. J. Kreowiski, A. Maggiolo-Schettini , B. K. Rosen, and J. Winkowski, "Deriving structures from structures," Math. Found. Comp. Sci . 1978 (J. Winkowski, ed.), Lecture Notes in Comp. Sci. 64 , Springer-Verlag. 10. Garey, M. R., and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness . W. H. Freeman and Company, San Francisco, 1979. 11. Goguen, J. A., J. W. Thatcher, E. G. Wagner, and J. B. Wright, "Initial algebra semantics and continuous algebras," JACM 24 (1977), 68-95. -32- 12. Harel ,D. Arithmetical completeness in logics of programs, in 5th Automata , Languages, and Programming Symposium, Springer-Verlag, 1978. 13. Hindley, R. , B. Lercher, and J. P. Seldin, Introduction to Combinatory Logic (Cambridge: University Press, 1972). 14. Hoffman, C. M. , and M. J. O'Donnell, "An interpreter generator using tree pattern matching," 6th Annual ACM Symposium on Principles of Programming Languages (San Antonio, Texas, 1979), pp. 169-179. 15. Huet, G., "Confluent reductions: abstract properties and applications to term rewriting systems," Proc. 18th Annual IEEE Symposium on Foundations of Comp. Sci . 16. Knuth, D., and P. Bendix, "Simple word problems in universal algebras," Computational Problems in Abstract Algebra , J. Leech, Ed., Pergammon Press, 1970. 17. Kruskal, J. , "Well-quasi-ordering, the tree theorem, and Vazsonyi's conjecture." Trans. Amer. Math Soc. 95 (1960), 210-225. 18. Kruskal, J., "The theory of well-quasi-ordering: A frequently discovered concept," J. Combinatorial Theory Ser. A 13 (1972), 297-305. 19. Lankford, D. , "Canonical inference," Report ATP-32, Math Dept., University of Texas, Austin, 1975. 20. Lankford, D., and Ballentyne, A. M. Decision Procedures for Simple Equa- tional Theories with Commutative Axioms, 21. Laver, R. , "On Fraisse's order type conjecture," Ann, of Math. 93 (1971), 89-111. 22. Laver, R. , "An order type decomposition theorem," Ann, of Math. 98 (1973), 96-119. 23. Laver, R. , "Better quasi-orderings and a class of trees," Studies in Foundations and Combinatories Advances in Mathematics Supplementary Studies, Vol. 1 (reprint). 24. Levy, M. R. , Data Types with Sharing and Circularity (Ph.D. Thesis), Technical Report CS-78-26, Dept. Computer Science, University of Waterloo, Waterloo, Ontario, Canada, 1978. -33- 25. Levy, M. R. , and T. S. E. Maibaum, "Continuous data types," Research Report CS-79-08, Dept. Computer Science, University of Waterloo, Waterloo, Ontario, Canada, 1979. 26. Nash-Williams, C, St. J. A., "On Well-quasi-ordering finite trees," Proc . Cambridge Philos. Soc. 59 (1963), pp. 833-835. 27. Nash-Williams, C. St. J. A., "On well-quasi-ordering infinite trees," Proc . Camb. Phil. Soc. 61 (1965), 697-720. 28. Nash-Williams, C. St. J. A., "On better quasi-ordering transfinite sequences," Proc. Camb. Phil. Soc. 64 (1968), 273-290. 29. O'Donnell ,M.J. , "A programming language theorem which is independent of Peano Arithmetic," Proc . 11 th Annual ACM Symposium on Theory of Computing , 1979 30. Ore, Oystein, Theory of Graphs , American Mathematical Society Colloquium Publi- cations, Vol. 38 (1962). 31. Plaisted, D. , Well-founded Orderings for Proving Termination of Systems of Rewrite Rules, Report UIUCDCS-R-78-932, Dept. Comp. Science, University of Illinois at Urbana-Champaign (July 1978). 32. Rado, R., Partial well -orderings of sets of vectors, Mathematika 1 (1954), 89-95. 33. Reynolds, J. C. "Towards a theory of type structure," Colloquium on Programming , Paris, 1974. 34. Scott, D., "Data types as lattices," SIAM J. Comput. 5 (1976), 522-585. 35. Shonfield, J., Degrees of Unsolvability . North Holland, Amsterdam, 1971. 36. Tarjan, R. E., "Depth first search and linear graph algorithms," SIAM J. Comput . 1 (1972), 146-160. 37. Wagner, E. G., J. W. Thatcher, J. B. Wright, "Programming languages as mathematical objects," Mathematical Foundations of Computer Science 1978 (J. Winkowski, Ed.), Lect. Notes in Comput. Science 64 , pp. 84-101. 38. Wegbreit, Ben, "Proving properties of complex data structures," JACM 23 (1976) pp. 389-396. BIBLIOGRAPHIC DATA SHEET 1. Report No. UIUCDCS-R-79-981 2. 3. Recipient's Accession No. 4. Title and Subtitle Canonical Simplification of Finite Objects Well Quasi -Ordered by Tree Embedding 5. Report Date Auqust 1979 6. 7. Authors) Thomas C. Brown, Jr. 8* Performing Organization Rept. No. 9. Performing Organization Name and Address Dept. of Computer Science University of Illinois Urbana, Illinois 61801 10. Project/Task/Work Unit No. 11. Contract/Grant No. NSF-MCS-77-22830 12. Sponsoring Organization Name and Address National Science Foundation Washington, D.C. 13. Type of Report & Period Covered 14. 15. Supplementary Notes 16. Abstracts A finite object space can be viewed as the set of finite and infinite (edge-) ordered trees representable as finite ordered vertex-labeled digraphs. We show that the order-preserving homeomorphic tree-embedding relation on finite objects over a well quasi -ordered (wqo) label alphabet is wqo. Canonical simplifiers require well-founded object-orderings to orient and ensure finite termination of replacement rules. We characterize these orderings as homomorphic refinements of the wqo tree-embedding relation, and we show this relation's time complexity to be 0(mxn) where m and n are the edge counts of the digraph representations. 17. Key Words and Document Analysis. 17a. Descriptors Well quasi-ordering, simplification ordering, rational (regular) trees, canonical simplification, homeomorphic tree-embedding. 17b. Identif lers /Open-Ended Terms 17e. COSATI Field/Group 18. Availability Statement 19. Security Class (This Report) UNCLASSIFIED 21. No. of Pages 20. Security Class (This Page UNCLASSIFIED 22. Price FORM NTIS-3S (10-70) USCOMM-DC 40329-P7t JUH 12 FEB 2