Bflfijs E2ifbflfiy BWBBw mm* n tw LIBRARY OF THE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 510.84 I£6r no. 782-787 cop. 2 The person charging this material is re- sponsible for its return to the library from which it was withdrawn on or before the Latest Date stamped below. Theft, mutilation, and underlining of books are reasons for disciplinary action and may result in dismissal from the University. UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN . 9REC1 L161 — O-1096 Digitized by the Internet Archive in 2013 http://archive.org/details/dynamicstorageal785nair s/c/.sy like UIUCDCS-R-76-785 ,c . 7SS W V///V///A — Figure 2.U After Request 3 a+b a+b+c ^ y////////// A Figure 2.5 After Request k It is clear that in spite of the fact that enough space exists in memory to accommodate 'D', we are prevented from doing so because the space is not available as a contiguous piece. Extending this concept to a large memory and to a large number of requests for allocation and deallocation, a similar situation may often be reached. All the used spaces in core may be so distributed in memory that not even one of the intervening free spaces can accommodate the next request. And yet, the total amount of noncontiguous free space may be many times the requested size. The degree of fragmentation is dependent on i) the size of the memory ii) the size, distribution and lifetime of the requests, and iii) the allocation policy adopted. Thus, for the example shown in Figures 2.1 through 2.5, we could have satisfied the request 5 by i) increasing LtoL'>a+b+c+d ii) breaking D into two parts of sizes d T < b and d" < L - (a+b+c) where d = d' + d" iii) allocating 'C' at (L-c) instead of at (a+b). While physical limitations prevent constant use of solution (i), unpredictability prevents the "trial-and-error" approach of solution (iii). olution (ii) may sometimes be possible if a link is maintained between the two parts of the whole. This turns out to be inconvenient when 'D 1 is a load module written in position-dependent code. The possibility of memory compaction is ruled out because of system limitations to dynamic relocation, as mentioned earlier. 7 Another suggestion (Clifton [2]) claims that it may be possible to postpone allocation of a critical request until some deallocations reduce the checkerboarding to the required extent. This solution tends to be impractical in a monoprogramming environment with a number of serial job steps, where allocation implies the immediate need of the space for computational or I/O buffering purposes. 2.2 Notation The following mathematical quantities to be used for notational convenience, are being defined here. The status of a memory M after the k request has been satisfied can be defined by M^ = [L,N f ,N u ,F,U], where L = the total size of the memory in some convenient units (like bytes for the PDP-11) N = the number of free blocks N = the number of used blocks (two contiguous used blocks will not count as 1 block) F = configuration of free spaces U = configuration of used spaces. F is an N,,- tuple and U is an N -tuple : P= {F 1 ,F 2 ,...,F^} U = [U^U,,,...,!^ ) u where F. = (A. ,i.) and U. = (A.,i.), 'A' representing the low address of ill 3 3 3 the block and X V representing its length. If F. = (k.,H.) then £(F. ) will indicate the length vector of F. 1111 1 and A(F. ) will indicate the low address vector of F. . X(U.) and A(U.) 1 -^ o J have similar significance. The situations in Figures 2.1 through 2.5 can then be represented symbolically as : M = [ L, 1, 0, { 0, L) , { \} ] , X denoting empty M 1 = [L,l,l,{(a,L-a)},{(0,a)}] Mg = [L,l,2,{(a+b,L-a-b)},{(0,a), (a,b)}] M^ = [L,l,3,{(a+b+c,L-a-b-c)},{(0,a), (a,b),(a+b,c)}] M^ = [L,2,2,{(a,b), (a+b+c,L-a-b-c)},{(0,a),(a+b,c)}] For any system, the following relations are always true: i) N u (i) - N f (i) > - 1 ii) |N u (i+l) - N u (i)| = 1 iii) |N f (i+l) - N f (i)| < 2 2.3 Figures of Merit The following performance criteria were selected for investigation in each of the algorithms simulated. 1. Average time per allocation (t). This is important because the cost of using a certain algorithm is proportional to the time taken for satisfying the requests, which should be minimized. In certain cases, in the absence of any limit on the time, blocks may be moved physically in core to achieve compaction. The technique utilized to instrument t involved the average number of com]/ is (between the required size and the size of a free block under observation) made by the system before finding a suitable block of core for an allocation request. 2. The number of requests (i) satisfied before fragmentation shuts out an allocation request: A particular simulation run is stopped as soon as this situation arises. If N«(l) indicates the value of N_ for the memory configuration M_ then I is the smallest positive integer such that the request size IL. , satisfies the relation N f Z Z(F .) > R_ , > i(F. ), i = 1,2,...,N„(I) 0=1 J Evidently, the nature of the distribution of the request sizes will be one of the factors influencing the value of I. 3. The least size (S) of the biggest area of free core given by S = min ( max £(F.(l))) all i l d, then the request 5 could be fulfilled and the area f D' could be allocated. On reducing the request size b to below that of d the fragmentation causes the system to run out of core. However, such cases occur rarely, and in most practical situations with large numbers of allocations and deallocations, an increase in the size of a request or a number of requests worsens the situation. t The random number generator was obtained by the technique of combining random number generators to obtain a "more random" sequence. Chi -square statistics from run tests indicated that a 96% randomness can be easily obtained. The outline of the design of the generator, attributed to Marsaglia and MacLaren [8], was obtained from Knuth [5]. 12 The above figures of merit have been found to be quite useful for comparing algorithms. While each figure changes with the set of requests used, a good idea of the relative merit of various algorithms may be obtained by comparing the values obtained for a well-chosen fixed set of requests. 13 3. SYSTEM SIMUIATION AND MONITORING This chapter will outline certain simulation techniques which have been found to be suitable for the study of storage allocation algorithms. Techniques convenient for monitoring the system behavior will also be described. 3.1 Simulation Model For the purposes of this simulation, the system is simply that portion of the PASCAL run-time package which receives commands for allocation and deallocation of space in core, along with the memory space and its contents. The relationship that exists between the system and its environment can be represented as shown in Figure 3.1. The blocks within the dotted lines represent the system to be simulated while the rest of the blocks indicate the environment. A simpler picture would be as shown in Figure 3.2. This evidently is a simplistic model, which becomes more intricate on the introduction of a feedback loop, whereby the system environment can take remedial action in case of inability of the system to fulfill a request. This is typically the case in a demand paging system, where the remedial action involves the swapping out of certain chosen pages from core. The current study is limited to situations where no distinction is made between the name space of a job- step and the physical space associated with it. 11+ ALLOCATION PROGRAM BEING EXECUTED CORE REQUEST/ DEALLOCATION ROUTINE TO SEARCH FOR CONVENIENT FREE BLOCK ROUTINE TO SEARCH FOR THE BLOCK TO BE DEALLOCATED (OPTIONAL) ROUTINE TO LOAD AT ^PROPER POINT FROM SEC. STORAGE ERROR ROUTINE ROUTINE TO UNLOAD INFORMATION BACK TO SEC. STORAGE (OPTIONAL) Figure 3.1 System Model REQUEST FOR ALLOCATION OF RELEASE OF CORE SPACE CORE MEMORY ALLOCATION POLICIES SYSTEM RETURNING COMMAND * INDICATING ACTION TAKEN ON REQUEST SYSTEM ENVIRONMENT (includes operating system and user programs ) Figure 3.2 Simplified System Model 15 3.2 Simulation Data It is generally the practice of researchers to try their algorithms out using certain standard request distributions like uniform, erlang, exponential, hyperexponential or combinations therefrom. In these cases, the pseudo- random number generator must be made with great care, particularly in simulation programs working under stringent space restrictions. However, standard distributions do not represent, in most cases, the actual distribution of request sizes or of the lifetime of the requests. (The lifetime of a request is defined as the time interval between the allocation of a request and the subsequent deallocation of the corresponding block. ) Also, an algorithm suitable for a system under certain conditions may not be suitable for a different system or under different conditions. It may hence prove to be useful to study the actual request set generated by the system. Thereafter, the set of requests used as data for the simulation study may be chosen from among the "typical" sets, as the author did, or may be designed as a pseudo-random set conforming to the distributions of the system-generated set. The deterministic set could then be perturbed to note the effect on the system of changes in the sizes of procedures, arrays, etc. More details may be obtained from the paper (Nair [9]). 3.3 System Monitoring The PASCAL run-time package is capable of generating a system log which monitors all requests for allocation (GET) and deallocation (FREE). A section of a sample log is shown below: 16 03-GCT-75*15 03 06 < 886) 2 PASCAL : PTWO FREE 49728 560 03-0CT-75*15 03 06 < 888) 2 PASCAL : PTWO FREE 48614 1110 03-GCT-75*15 03 06 ( 890 ) 2 PASCAL: PTWO FREE 55188 30 03-GCT-75*15 03 06 ( 892) 2 PASCAL : PTWO FREE 48538 72 03-0CT-75* 1 5 03 06 ( 894) *y PASCAL : PTWO FREE 48494 40 03-GCT-75*15 03 06 ( 896) 2 PASCAL: PTWO FREE 48450 40 03-GCT-75*15 03 06 < 898 ) 2 PASCAL: PTWO FREE 48406 40 03-0CT-75*15 03 06 ( 900 ) *? PASCAL: PTWO FREE 25504 560 03-0CT-75*15 03 06 < 902 ) 2 PASCAL: PTWO FREE 24940 560 03-GCT-75*15 03 06 ( 904 ) 2 PASCAL : PTWO EOP 03-GCT-75*15 03 06 ( 905) 2 PASCAL : PTWO FREE 26068 16922 03-0CT-75*15 03 06 ( 907) 2 PASCAL: PTWO FREE 50856 66 03-0CT-75*15 03 06 < 909) 2 PASCAL : PASS34 GET 38440 4550 03-0CT-75*15 03 06 < 911) •y PASCAL: PASS34 GET 49812 1110 03-GCT-75*15 03 06 ( 913) 2 PASCAL : PTHREE GET 49742 66 03-0CT-75*15 03 06 ( 915) 'y PASCAL : PTHREE GET 49706 32 03-0CT-75*15 03 06 < 917) 2 PASCAL : PTHREE GET 49190 512 03-0CT-75* 1 5 03 06 < 919) 2 PASCAL : PTHREE GET 48674 512 03-0CT-75*15 03 07 ( 921) •7/ PASCAL : PTHREE GET 48638 32 03-0CT-75*15 03 07 ( 923) •~? PASCAL: PTHREE FREE 48674 512 The following useful information can be extracted from the log: i) the histogram indicating the frequency of allocation requests of different sizes ii) the distribution of the lifetime of requests iii) the maximum number of active core spaces of a particular size that co-exist in memory. This is useful in the implementation of boundary algorithms to be described in Chapter 5. It must be noted here that the initial configuration of the system is explicitly one in which there is no free core and that the system frees all the available core at the outset. It is, therefore, possible to determine most of the system characteristics in a time independent manner and with no initial assumptions. ;bably useful to mention here that over 80f of all jobs PCh and education.-, I < n\rLronments on the PDP-11 here are 17 compilation jobs. It has also been noticed, as mentioned earlier, that these PASCAL compilation jobs are among the most critical where core requirements are concerned. Figures 3.3 and 3.^ show the frequency histogram and the lifetime distribution respectively for a typical set of requests produced by the compiler. Figures 3.5 and 3.6 are the corresponding graphs for a noncompiler job. It is interesting to examine the characteristics of the distributions : 1. The sizes 512 and 32 merit special attention. In the particular PASCAL system under study, the file buffer size is 1000 octal bytes or 512 decimal bytes. It is clear that input, output and intermediate file buffers will be a part of every program and hence it is not surprising that there is a concentration of requests at the 512 bytes mark. Similarly, the association of a header of 32 bytes with every file buffer explains the unusual concentration in the region of 32 bytes. These figures (512 and 32) are characteristic of the implementing system; yet it is reasonable to expect such a set of figures with every system. 2. Other than the requests of the type described above, the rest of the chart in Figures 3.3 and 3-5 show a large number of requests of small size (< 150 bytes), some requests of very large size (> 10,000 bytes) and only a few requests of intermediate size. The actual numbers clearly depend on the program generating the requests. We may conclude intuitively that the request sizes which cause peaking in the frequency histogram, along with the very large Figure 3.3a 18 o OJ o o CD 4- o CO"? J— ^ en LU OJ Li_'-' CD CC LU CD o CD a CD O-- FREQUENCY HISTOGRAM FOR VARIOUS REQUEST SIZES ( COMPILATION JOB ) ^.0 OL^ £L + Jl 1.0 2.0 3.0 4.0 REQUEST SIZE IN BYTES (X10 2 ) 5.0 6.0 o o OJ o CD + o coin h- o~> UJ ID C3 OC OC UJ CD 'o • en o CO o en 19 Figure 3. 3b FREQUENCY HISTOGRAM FOR VARIOUS REQUEST SIZES ( COMPILATION UOB ) ^.0 pLfci ■— P- + J~I~I r=L_|_ n-i. J" 1.0 2.0 3.0 4.0 REQUEST SIZE IN BYTES (X10 3 ) 5.0 6.0 20 o cu o »-< CO o CD O CO"? I— "* CO LU ZD a LU ^o CU Li-- 1 O QC LU CD o en CD o 4- co ^1.0 Figure 3.U LIFETIME HISTOGRAM ( COMPILATION JOB ) w~l 1 >v Jii + -O. + £L_ r 10.0 20.0 30.0 40.0 REQUEST LIFETIME IN SECS 50.0 60.0 21 o CD O I- 10 CO UJ Z> o LU CE o CC LU CD -co o Tj.o Figure 3.5 FREQUENCY HISTOGRAM FOR VRRIQUS REQUEST SIZES ( NON-CQMPILflTIQN UQB ) H + + + n XL 1.0 2.0 3.0 U.O REQUEST SIZE IN BYTES (X10 2 ) 5.0 6.0 22 o- o Figure 3.6 LIFETIME HISTOGRAM ( N0N-COMPILRTION JOB ) o CO CO UJ ID o LU az o O cc LU OD o o o -- ^.0 =1 — c + 3.0 6.0 9.0 12.0 REQUEST LIFETIME IN SECS 15.0 —I 18.1 23 sized requests will probably govern the performance of various algorithms. 3. The lifetime distribution indicates a function decaying in approximately an exponential manner. The rate of decay appears definitely greater than unity. To determine the behavior of the individual sizes, the lifetime distribution was determined for sizes 512 and 32 and are shown in Figures 3.7 and 3.8 respectively for the typical compilation job. A similar behavior is noticed again. The physical implication is that most allocated spaces in core are short-lived and this is particularly true for the smaller sized requests. The smaller sizes correspond to the array allocations and dynamically allocated PASCAL "records, " while the large requests correspond to large program segments like PASCAL external procedures. It seems reasonable again to infer that the lifetime distributions for other PASCAL- like systems will be similar. 3.^- Stopping Rules and Validation One common notion is that a simulation run should continue until the allocation algorithm being tested is unable to satisfy a request (e.g., Knuth [h] and Shore [10]). This technique hence presupposes the existence of a perennial source of data, like that obtained from some pseudo-random generator. One drawback to this technique is that the sequence of requests generated may not be practical. For instance, there always exists a probability, however small, that a large sized request, say 25K, will appear even though another similar sized block has not yet been 2k Figure 3.7 cu o + o OP co£ CO LU zd LU oc o Ll_CD O ac LU CD O tO-- LIFETIME HISTOGRAM EOR REQUESTS 0E SIZE 512 I COMPILATION JOB ) HI '-..- 10.0 REQUEST 1 30.0 I. IFETIME IL h 40. IN SECS 50.0 60.0 o \-^ en LU a LU DC o Q_CD CC LU CD z: is o en U)-- 25 Figure 3.8 ^ LIFETIME HISTOGRAM FOR REQUESTS OF SIZE 32 b'1 ( COMPILATION JOB ) ZL, ,__□ 1 , — no 1 p ^.0 10.0 20.0 30.0 LJO.O 50.0 60.0 REQUEST LIFETIME IN SECS 26 deallocated in a hOK memory, when a random number generator is used. (The overlay structure may prevent this from happening in the actual system. ) It is not impossible to take all such special factors into account while designing the pseudo-random number generator — it is merely inconvenient. A more pragmatic approach would involve tailoring a typical set of requests actually provided by the system so as to achieve a certain degree of generality. This is also suitable particularly when the performance of an algorithm is required to be known only in comparison to another, rather than on an absolute scale. The various steps in the author's simulation, which utilized the principles just mentioned, can be summarized as follows : 1. A random program was compiled and the set of requests obtained was arbitrarily termed the "standard" set S. 2. This set was studied to note the existence of peaks in the frequency histogram like at sizes 512 and 32. 3. The values for the various performance criteria of Chapter 2 were determined using the set S, and the existing first-fit algorithm. (Sizes showing peaks were not perturbed while performing perturbability tests. ) h. Step 3 was repeated for other algorithms. 5. Results obtained were then compared with those for the existing algorithm. An interesting advantage of this technique is that i) the use of the system generated request log automatically ensures the validity of the at ion model, and 27 ii ) the check with the existing system in step 3 above ensures the validity of the simulation program. (Validation of the algorithm coding itself is however a strenuous task and involves checking by manual simulation. ) 28 k. STANDARD ALGORITHMS AND VARIATIONS This chapter describes some of the commonly used algorithms and suggests modifications to improve their performance. k.l Best-Fit Algorithm Let M. = [L,N f ,N u ,F,U] , F = (P 1 ,F 2 ,...,F K } , and u = {\J lt u 2 ,...,v } , u where the symbols have the significance indicated in Chapter 2. Let a be the size of core requested. In the best-fit algorithm an F„ is to be found, 1 < x < N„, such that x — — i i) i(F v ) > a, and ii) i(F„) < i(F. ) V i such that £(F. ) > a x 1 1 A search for the required block would usually take us through all elements of the set F. (in some cases an exact match with the required size may be found, in which case the search may be terminated. ) A PASCAL procedure for simulating this would be: 29 CONST TYPE SOMELARGENUMBER-32767; " FOR PDP-11 NODE=RECORD ADDRESS: INTEGER LENGTH: INTEGER INFO: INTEGER LPTR: ~NODE END; VAR ACTIVE, FREE: *NODE; "LOW ADDRESS OF BLOCK" "LENGTH OF BLOCK" "IDENTIFYING INFORMATION" "LINKED LIST POINTER" "LIST HEADS FOR CIRCULAR LISTS" PROCEDURE FINDBESTBLOCK< A: INTEGER; VAR P:~NODE); -THIS PROCEDURE RETURNS A POINTER P TO THE SMALLEST FREE BLOCK WITH" "LENGTH GREATER THAN OR EQUAL TO A" VAR BEGIN PTR, FOLLOWPTR: MINDIFF. DIFF: •^NODE) INTEGER; INTEGER; "FOLLOWPTR ALWAYS FOLLOWS PTR" "MIN. DIFFERENCE IN SIZE FROM A" "SIZE OF BLOCK POINTED AT BY" "PTR- A" END; MINDIFF: =SOMELARGENUMBER; P:=NIL; PTR: =( FOLLOWPTR: =FREE)~. LPTR; WHILE =0)MDIFF<=MINDIFF) THEN BEGIN MINDIFF: =DIFF; P: =PTR; END; "IF DIFF>=0. ..." PTR: =< FOLLOWPTR: =tPTR)"\ LPTR} END; "WHILE " "FINDBESTBLOCK" A reduction in the search time could be achieved if the chain is linked according to increasing length of the free blocks, £(F. ), instead of in address order (increasing A(F. ) values). The average number of comparisons would then be approximately N-/2 instead of N , where N is the number of elements in the free chain. However, another NV2 comparison will have to be made, on an average, to i) return a used block to the free list, and to ii) reposition a free block formed out of fragmentation by an allocation request. 30 Use of a modified structure may lead to an average of the order of % p N for the number of comparisons, but this is only at the expense of inefficiency in coalescing of adjacent inactive blocks. When N is not very large, and this is often the case, such a structure may not prove to be worthwhile. Ease of coalescing is a great advantage and hence the ordering by A(F. ) is preferred over the ordering by i(F. ). k.2 First-Fit Algorithm Given memory configuration M. and request size a as before, this algorithm finds F , 1 < x < N such that i) i(F ) > a, and ii) i(F.) < a V j < x Thus the search stops as soon as a block bigger than the requested size is found. No attempt is made to minimize the difference between £(F ) and a as in the best-fit algorithm. The PASCAL implementation for this algorithm would be as shown below. (The global declarations are not repeated here. ) PROCEDURE FINDFIRSTBLOCK=A' VAR PTR, FOLLOWPTR: 'NODE; BEGIN P: =NIL; PTR: =( FOLLOWPTR: =FREE ) \ LPTR; WHILE a, and ii) i(F. ) < a V i such that c. < i < x, if x > c, or y l l — —I i(F. ) < a V i such that (l < i < x or c. < i < N f ) if x < c. 2. it sets c. ,, *■ c. if £(F ) > a, and 1+1 1 x 7 c. , *- c.+l if i(F ) = a & c. 4 N_, or 1+1 i x 1 ' f c. . «- 1 if i(F ) = a & c. = N- 1+1 x 1 f to Evidently if c. = N = 1, the memory has no free spaces left and hence c. , is not defined. But this situation almost never occurs i+l in practice. For a deallocation request, c. keeps pointing to the same free block. Hence if U is the block to be deallocated, then x c. . - y such that A(F ) < A(F ) < A(F ) + l(p ) i+l J y - z ± y y This takes into account the possibility that coalescing may occur on either side of the freed block. A PASCAL implementation of this algorithm is shown below. VAR CYCLEPTR: ~NODE; "GLOBALLY DECLARED POINTER' PROCEDURE FINDCYCLEBLGCMA: INTEGER; VAR P:~NODE); VAR PTR. FOLLOWPTR: ^NODE; EEGIN P: =NIL; PTR: =FOLLOWPTR: =CYCLEPTR; WHILE (FOLLOWPTR^. LPTR\=CYCLEPTR) MPTR"\ LENGTH; "P POINTS TO THE BLOCK ASSOCIATED WITH " "INFORMATION I" COALESCEIFPOSSIBLE=P-\ ADDRESS )& < CYCLEPTR \ ADDRESS a, and ii) i(F. ) < a, Vi such that r. < i < x, if x > r or i(F. ) < a, Vi such that (1 < i < x or r. < i < N f ) if x < t ± k2 2. It sets r. n *- r. if ^(F ) > a, and 1+1 1 x r ±+1 - r ± + 1 if i(F x ) = a & r . ^ N f , or r i+1 - 1 if i(F x ) = a & r ± = N f . Thus this portion is similar to the corresponding portion for the cycle algorithm. During deallocation, however, the pointer is always reset to the point where the deallocation took place. Again, if U is the block of active core which has just been deallocated, r. . - y such that A(F__) < A(U ) < A(F ) + i(F ) i+-L y *- y y This algorithm would be written in PASCAL as : VAR ROVERPTR: "NODE; PROCEDURE FINDRGVERBLOCMA: INTEGER; VAR P: NODE); "THIS PROCEDURE IS THE SAME AS FOR F I NDC YCLEBLOCK EXCEPT FOR THE" "USE OF ROVERPTR INSTEAD OF CYCLEPTR" END, "FINDROVERBLGCK" PROCEDURE RO VERDEALLOCATE (1:1 NTEGER ) ; VAR P: 'NODE, BEGIN FIND( I , P) J COALESCEIFPOSSIBLE(P); "P HAS THE SAME SIGNIFICANCE" "MENTIONED BEFORE" ROVERPTR =P; "NOTE THE DIFFERENCE HERE" I NSERT I NTOFREEL I ST ( P ) ; END) "RGVERDEALLOCATE" 1*3 4.6 Comparison of Performances (Cycle with Rover) In the case of the rover algorithm, the pointer "roves" back and forth always pointing to the latest point of allocation or deallocation. In the cycle algorithm, the pointer keeps moving forward and around the circularly linked list of free nodes in a "cyclic" manner. The degree of freedom allowed hence is greater for the pointer in the rover algorithm than in the cycle algorithm. Farther, an allocation request is often encountered, whose size is the same as the size of the block just freed. These two points help in reducing the degree of fragmentation for the rover algorithm in comparison to the cycle algorithm. Results from simulation are presented in Table 4.2. Simulation with several data samples show that the set of input data samples that were successfully allocated by the cycle algorithm forms a subset of the set for the new rover algorithm. Table 4.2 Comparison of Cycle Algorithm with Rover Algorithm Cycle Rover Least size of biggest free (S) - - Total core at that point (C ) - - Mean size of biggest (W) 16978 18832 Critically tolerant size (T) 45700 43600 Perturbability (P) afo 0% Average time per alloc, (t) 1.051 1.084 kh An increase in the average time per allocation (about 3%) is noticed for the rover algorithm in comparison to the cycle algorithm. This difference can be explained by the fact that "until some significantly large request for allocation comes along, the "cycle pointer" keeps pointing to the front of an unventured area of memory. Thus for a greater number of requests than in the rover algorithm, the required block will be obtained after the very first comparison. The perturb ability is seen to be zero for both algorithms. This means that the algorithms were unable to satisfy the standard set of requests obtained from the PASCAL compiler. Both S and C are hence meaningless when a request set is not satisfied by an algorithm. Thus there are situations, like the PASCAL system under investigation, where neither of the modifications (cycle or rover) is suitable because of the particular nature of the requests. An example will now be cited to aid in the analysis of this failure. Consider the memory configuration M. - [L,N_,N ,F,U] with 1 f u F = { (a+b,c), (a+b+c+d, L-a-b-c-d)) and U = { (0, a), (a,b), (a+b+c,d)}, where L - (a+b+c+d) > a + c This is pictured in Figure k.7. m m w, a a+b atb+c a+b+c+d v//A x iBed free Figure k.7 Configuration After Request i ^5 Now suppose the requests following are: Request i+1: Deallocate - from location Request i+2 : Allocate - size f e', e > a, e > c Request i+3: Allocate - size 'f, f < a Request i+k: Allocate - size 'g', g < c Request i+5: Allocate - size 'h', L - (a+b+c+d+e) > h > L - (a+b+c+d+e+f+g) Irrespective of the initial position of the roving pointer or the cyclic pointer, it is clear that this set of requests will not be satisfied by the rover or cycle algorithms but will be satisfied by both the best-fit as well as the first- fit algorithms. It appears then that these modified algorithms are not suitable when a large request for allocation follows a number of small requests, with no deallocations from a region behind the pointer and towards the front end of memory. The simulation study lends credence to this surmise. A comparison of the four algorithms with different memory sizes is provided in Figures k.Q through 4.10. k.7 Combination of Best-Fit and Rover Algorithms It has just been mentioned that failure in the cycle and rover modifications to the first- fit algorithm occur usually when a large request for allocation follows a number of small requests (with certain restrictions on the region of deallocation). The failure in the actual simulation occurred for a request size of 21K bytes, which is seen to be comparable to the total memory size, L, of k2K bytes. 46 Figure k.Q COMPARISON OF ALGORITHMS 2.5 5.0 7.5 10.0 12.5 ADDITION TO EXISTING TOTRL MEMORY SIZE (X10 3 ) 15.0 hi Figure k.9 COMPARISON OF ALGORITHMS q FIRST-FIT © BEST-FIT A CYCLE <*> ROVER ^.0 2.5 5.0 7.5 10.0 12.5 15.0 ADDITION TO EXISTING TOTAL MEMORY SIZE(X10 3 ) hb o OD O r- Figure U.10 COMPARISON OF ALGORITHMS m FIRST-FIT © BEST-FIT A CYCLE <+, ROVER ^.0 2.5 5.0 7.5 10.0 12.5 15.0 ADDITION TO EXISTING TOTAL MEMORY SIZE (X10 3 ) h9 This observation has led the author to discover an allocation algorithm, the principle behind which is expected to have tremendous use not only to PASCAL- like systems but to dynamic allocation algorithms in general. The algorithm uses a technique which switches dynamically between algorithms to obtain better performance. When the largest area of free memory reduces to a level comparable with that of the bigger allocation requests, it is necessary to make more efficient utilization of the free core. Failure in the cycle and rover algorithms was brought about by the inability of the algorithms to detect such situations. At the same time, the best-fit algorithm essentially detects such a situation on every occasion. (This over-meticulousness of the best-fit algorithm results in the production of a conglomeration of tiny inactive pieces which increases the degree of fragmentation. ) It then seems appropriate to use the rover algorithm at all times except when the value of the total free memory, F , (or almost equivalently, the size of the biggest free block) exceeds a certain cutoff value, C, at which time better utilization of the available space can be made using the best-fit algorithm. The algorithm will thus consist of the following basic steps: Al: Check if the next request is one for allocation or for deallocation. In the latter case go to A2, in the former case go to A3. A2 : Perform the deallocation by the standard technique, coalescing adjacent free blocks if possible, then reset the "roverptr" to the point of deallocation, and update the value of total free core available. Go to Al. 50 A3: If the total free memory space (F ) is less than the cutoff value (C) employ the Rover algorithm to look for the required space, else employ the best-fit algorithm. In either case, got to AU if search is successful and A5, if not. Ak: Reset the "roverptr" appropriately to the point of allocation, update the total available memory space (F ) and go back to Al for the next request. A5: Indicate that no core is available, and terminate. (in multiprogramming, continue with the other jobs, if there are any. ) A possible PASCAL implementation for this algorithm is shown below : PROCEDURE FINDBLOCK( A: INTEGER; VAR P:~N0DE>; PROCEDURE FINDBESTBLOCK< A: INTEGER; VAR Pr^NODE); VAR PTR, FOLLOWPTR: "NODE; MINDIFF, DIFF: INTEGER; BEGIN MINDIFF: =SOMELARGENUMBER; P. =NIL; PTR: =< FOLLOWPTR: =PTR)^. LPTR; WHILE (PTR\=FREE)&(MINDIFF\=0) DO BEGIN DIFF:=PTR~ LENGTH-A; IF =0)&=0. . . " PTR: =( FOLLOWPTR: =PTR)"\ LPTR; END; "WHILE" END; "FINDBESTBLGCK" PROCEDURE FINDROVERBLOCK; VAR PTR. FOLLOWPTR: ~NODE; BEGIN P =NIL; PTR: =FOLLOWPTR =ROVERPTR; WHILE (FOLLOWPTR^. LPTR\=ROVERPTR ) &( PTR~ LENGTHCUTOFF VALUE THEN F I NDBESTBLOCK < A. P ) ELSE F I NDROVERBLOCK ( A, P ) ; IF . ADDRESS; IF P*\ LENGTH\=A THEN BEGIN P~. ADDRESS: =P~. ADDRESS+A; P-\ LENGTH: =P~. LENGTH-A; END "IF P-\ LENGTH\=A" ELSE REMOVE < P. FREE); "REMOVE FROM FREE LIST" MOVE < Q, USED ) ; "INSERT Q AT THE PROPER PLACE " "IN THE USED CHAIN" TOT ALAVA I L ABLESPACE : =TOT AL AVA I LABLESPACE-A; END; "ELSE BEGIN" END* "ALLOCATE" PROCEDURE DEALLOCATE (INFO: INTEGER); VAR P, Q '"NODE; BEGIN FINDCINFO. Q); "Q IS THE NODE WITH THE REQUIRED INFO" TOTALAVA I L ABLESP ACE: =TOTALAVAILABLESPACE+Q /N . LENGTH; COALESCEIFPOSSIBLE(P); "P RETURNS A POINTER TO THE "LOW ADDRESS OF THE FREED BLOCK" "AFTER ANY POSSIBLE COALESCING " ROVERPTR: =P; I NSERT I NTOFREEL I ST < P ) ; END* -DEALLOCATE" 52 It seems reasonable to expect some of the characteristics of each of the algorithms when two algorithms are combined in the indicated manner. The degree of dominance of one algorithm over the other is determined by the cutoff value. Thus the sets of requests satisfied by this algorithm should include some, if not all, of those sets satisfied by one (best-fit, in this case) but not by the other (rover). Further, the average time taken per allocation is expected to be intermediate to the time taken by the fast rover algorithm and that taken by the slow best- fit algorithm. The above conjectures were verified by actual simulation of the algorithm. Various cutoff values were used, a value of corresponding to the absolute best-fit algorithm and a value of U2100 corresponding to the rover algorithm. The results are summarized in Table k.3 and graphically presented in Figures U.ll through k. 1^. It is noticed that while the combination manages to salvage the good points of both algorithms, it inherits some of the bad traits too. Thus a deterioration is noticed in the perturbability (about 2$% for a cutoff of 25000) from the best-fit case. This probably is offset by the fact that the time taken per allocation has reduced in the corresponding case by about 60%. (The actual time taken will be a little more because of the extra overhead involved in maintaining pointers and checking the total amount of free core remaining. ) 53 o -p O si -p •H CD o o ch Sh 0) Ph o C3 o •H -P <3 •H in od > on -=i* CD H o O * -H/ o o CO H 1 1 VO O 3 no -4- H o 8 O o * CO H 1 1 & ON oo -d- i-i o 8 o Bn 3 VO o o IA H H CO H OJ 0O H -=T OJ* o CO OJ O ^ OJ o o no o J- OJ o CO OJ m rH LTN LT\ H CO o • OJ ■ H -5 OJ o o CO o VO 3! 8 Jt 81 o CO po rH VO o rH CO o • CO H -H- oo o CO CO o ^ oo o O CO o -Hr vo o CO ON m r-\ H ir\ H CO o • rH H -H- -=f- O -H- OJ o ^5. LTN o co o o -4" o o CO o 00 rH CO o H ON o rH H -5 -Hf' o VO OJ 8 ^ CO o CM 3 LTN VO LTN VO OJ rH oo t- OJ ON o • rH -d- LTN o VO CO o ^ c- o o ® 3 o LTN CO H LTN H m OJ ON ON • r-\ 0O VO o VO VO 8 "^ OJ o $ rH ON OJ LT\ J" ITN rH OJ OJ OJ ON ON H 00 VO* o VO 3 O O H OJ OJ OJ OJ CT\ ON H OO VO* s o -d d a •p •H o K -P •H - oo CO o • rH -3- oo O vo OJ o ^9. VO o VO LTN o t>- rH 1 o H oo o H LP* OO ON ON • H oo -* o OJ VO o -fcft H o CO CO o F- O LP VO oo o H H OJ OO o\ ON • rH oo o -eft J" H $ VO CO o CO O o VO oo LTN H H OO ON CO • H oo -=t .h through 5.7. An immediate revelation is that, with a threshold of 0, the search time is very small. This can be explained by the fact that, at least in the initial stages, the required size is found at the very first attempt and the allocation is performed always on the far side of the free space. A side effect to this is the continual erosion of the first free space, which in the initial stages happens to be the biggest also. After a stage there is no contiguous core left to allocate even medium sized requests, let alone large sized requests. This results in the poor figures 6h Vs ^ zd Figure ^>.l Memory Configuration on Arrival of a Large- sized Request ^ 21 Figure 5.2 Memory Configuration after Satisfying Request by First-fit Algorithm //// m $, m i w%. Figure 5.3 Memory Configuration after Satisfying Request by New Threshold Algorithm 65 -3 o OJ vO 8 ^ H o CO CO CO O H VO oo LA H rH SI CO ON rH CO OO J"' o OJ vo O ^ rH Q CO OO O 00 O O vo oo LT\ rH H O on ON CO « -4" H oo -=f o OJ VO O ^5. H 8 CO CO O CO O vo OO LTN H rH ir\ oo ON OO • OO H OO -4" o OJ VO O ^. H 8 00 CO o CO O VO oo LTN rH H o oo ON CO • oo rH OO -* o OJ VO 8 ^. H o CO CO LS- O o VO oo LTN rH H IA on ON CO • OJ H OO -=f 8 OJ VO O ^5- -4 CO J3F o Ls- VO o VO & LTN rH O o m CO • OJ look for an inactive 512-byte sized preallocated space. If none are available go to the general area and look for a big enough space by the first-fit technique, ii) For all other requests use the first- fit technique in the general area, iii) For deallocation of a block of size 512, coalesce with neighboring blocks, as in standard deallocation, only if the block is not a preallocated one. iv) For deallocation of a block of size 512 in the preallocated area, just switch the tag in the 2 -dimensional array to "inactive. " This algorithm was simulated for various numbers of reserved blocks of size 512 and most of the surmises above were found true. The results are tabulated in Table 5-2 and depicted in the graphs of Figures 5.8 through 5.12. Figures 5.8 through 5.H show a steady deterioration in the perturbability characteristics as the number of preallocated spaces is increased from 2 to 12. The reason for this, as mentioned before, is the low utilization of the reserved space. Also, the average time taken per allocation is seen to decrease gradually as the number increases and at i reserved spaces, the time taken per allocation and the least size of biggest block are both down by 33$. Table 5.2 Variation of Performance with Number of Preallocated. Spaces of Size 512 73 # of preallocated spaces of size 512 2 U 6 8 10 12 S 3682 3508 3^62 21+76 lWt 588 - W 19386 1U988 13352 11528 10728 10152 - T 38500 38600 38700 39700 1+0700 U1700 I428OO P 18$ IB* ui 17$ 12$ k% 0$ t 1+.101 3.968 3.381 2.768 2.508 2.393 2.005 Ik Figure 5.8 FIRST-FIT ALGORITHM WITH PREALLOCflTION OF SPACES FOR REQUESTS OF SIZE 512 2 4 6 8 10 NUMBER OF PRERLLOCflTED SPACES (EACH 512 BYTES) 12 75 CD- S' Figure 5-9 FIRST-FIT ALGORITHM WITH PREALL0CATION OF SPACES FOR REQUESTS OF SIZE 512 _lo CLcd cc en O * O- + -+ + + 2 U 6 8 10 NUMBER OF PREflLLCCflTED SPRCES (EACH 512 BYTES) 12 76 3*' Figure 5 • 10 FIRST-FIT ALGORITHM WITH PREALLOCATION OF SPACES FOR REQUESTS OF SIZE 512 4 6 8 10 NUMBER OF PREALLQCflTED SPACES (EACH 512 BYTES) 77 o o ■s> o ;t Figure 5.11 FIRST-FIT ALGORITHM WITH PREALLOCATION OF SPACES FOR REQUESTS OF SIZE 512 m CO UJ o cr QCo ljjLO Q LU S o o ■+ ■+ + + H 2 4 6 8 10 12 NUMBER OF PRERLLOCRTED SPACES (EACH 512 BYTES) 78 UJ CO cod UJOJ o o Li_o CO CO LU figure 5-12 COMPARISON OF ALGORITHMS o FIRST-FIT ALGORITHM FIRST-FIT WITH 2 PREALLQCATED SPACES FOR REQUESTS OF SIZE 512 ^J.O 4.0 8.0 12.0 16.0 20.0 PERCENTAGE DEGREE OF PERTURBATION 24. 79 More interesting, however, are the plots on figure 5.12. The potentiality of the algorithm is clearly seen. With 2 reserved spaces, the algorithm seemingly starts inferior to the first-fit algorithm but picks up and surpasses the first-fit algorithm as the degree of perturbation increases and as the core situation gets tighter. 5.1+ Preallocation of 32-byte Sized Spaces The small degree of success obtained by use of the algorithm just described leads to the notion that it is quite worthwhile to treat very frequently occurring sizes as special cases. A glance again at Figure 3.3 shows that the size 32 is a good candidate for experimentation. Monitoring of the size 32 indicated that the maximum number of allocated blocks of that size at any given time is 10. It does not seem necessary to determine a time weighted average in this case because the total size of preallocated core is only 320 bytes. The first-fit algorithm was then simulated after reserving 10 spaces for 32-byte sized requests and a varying number of spaces for 512-byte sized requests. The results are tabulated in Table 5.3- Comparing these figures with those shown in Table 5.2 it is seen that the degree of fragmentation reduces almost imperceptibly by preallocating spaces for size 32. The reduction in time taken per allocation is, however, immediately obvious. To prove that preallocation of a large number of spaces of small size does not lead to excessive fragmentation (unlike the preallocation of large- sized sizes), the simulation was repeated for 30 reserved spaces of 80 Table 5.3 Performance for 10 Reserved Spaces of Size 32 # of preallocated spaces of size 512 i 2 k 6 8 10 12 S 3W 327^ 327^ 22>42 1210 588 - W 15188 1070^ 9136 7258 6678 6320 - T 38700 38900 38900 39900 U0900 ^2000 1+3000 P l6f 16$ 16% 16% 10$ H QP/o t 3.^53 2.706 2.193 1.680 1.1+38 1.337 I.096 Table 5.4 Performance for 30 Reserved Spaces of Size 32 81 # of preallocated spaces of size 512 2 4 6 8 10 S 2758 2554 255^ 1522 588 - W 15700 11424 9598 8094 7524 - T 39^00 39600 39600 40600 1+1700 42700 P ltyt M M 12$ % Of t 3.362 2.664 2.149 1.655 1.422 1.227 82 size 32. (It may be remembered that 10 is the maximum number of blocks of size 32 required by most programs. ) The negligible change in the time taken per allocation is because most (probably all) of the 32 -byte sized requests are satisfied from the preallocated area with only 10 spaces. The extra 6^0 bytes in this area are never used. It can be seen from the table and from graphs in Figures 5-13 through 5.16 that in spite of the tripling of the reserved space, the increase in the degree of fragmentation is not too significant. 83 Figure 5.13 FIRST-FIT ALGORITHM WITH PREALL0CATION OF SPACES FOR REQUESTS OF SIZES 512 * 32 a U 6 8 tO 12 NUMBER OF PREflLLQCRTED SPRCES (EACH 512 BYTES) 8U CD-p 3* LO-- 3* zr en O Sc M" i — i CO accu _i CD Figure 5.lU FIRST-FIT ALGORITHM HITH PREALLOCflTION OF SPACES FOR REQUESTS OF SIZES 512 $ 32 -I H 4 SPACES FOR SIZE 32 10 SPACES FOR SIZE 32 30 SPACES FOR SIZE 32 4 -» 2 4 6 8 10 NUMBER OF PRERLLQCRTED SPRCES fERCH 512 BYTES) 12 85 o 3T o * Figure 5.15 FIRST-FIT ALGORITHM WITH PREflLLOCATION OF SPACES FOR REQUESTS OF SIZES 512 &, 32 UJ Q_ CO o a SPACES FOR © 10 SPACES FOR A 30 SPRCES FOR 2 H 6 8 10 NUMBER OF PREflLLQCRTED SPACES (EACH 512 BYTES) 86 o o T o LO Figure 5.l6 FIRST-FIT ALGORITHM WITH PREALL0CATION OF SPACES FOR REQUESTS OF SIZES 512 * 32 ^0 n SPACES FOR SIZE 10 SPACES FOR SIZE 32 A 30 SPACES FOR SIZE 32 2 U 6 8 10 NUMBER OF PREflLLGCflTED SPACES (EACH 512 BYTES) 87 6. CONCLUSION It is surprising that, in spite of the fact that the first-fit algorithm appears at first glance to be "just another algorithm, " an algorithm without any apparent structure or organization, it is one of the most robust of all allocation algorithms. Other algorithms which perform nearly as good or a little better than the first-fit algorithm are i) combination of best-fit and rover algorithms ii) modified first- fit using preallocated spaces for request sizes 512 and 32. Many other algorithms described in this thesis cut down tremendously on the time taken per allocation, but only at the expense of increased fragmentation of memory. If memory size is not a limitation, the cycle or the rover algorithm may be employed, the latter having an edge over the former. For any system, algorithms may be found, which are modifications of the commonly used algorithms, but which perform better than the common algorithms. It is emphasized though, that the specific modifications are dependent on the system or the type of system. Based on the results presented in this thesis, it is suggested that, in spite of the fact that a few better algorithms exist, the existing first-fit algorithm be retained for the PASCAL system. It appears that the request sets not satisfied by the first-fit algorithms, but satisfied by some of the other mentioned algorithms, are so few that they do not 88 warrant a change in the allocation strategy. Further, the first-fit possesses the advantage of unbelievable simplicity (which incidentally leads to smaller core requirements for the allocation routine). Only half the importance of this thesis is due to the results just mentioned. The other half should go to the tools utilized in arriving at these results—specifically, the simulation techniques, the monitoring techniques and the performance criteria. Most of the performance criteria were found to be useful indications of the suitability of an algorithm. Outstanding among those used to measure the degree of fragmentation were i) the least size of the biggest area of free core, ii) the mean size of the biggest free block, and iii) the critically tolerant size. It is suggested that the new algorithms mentioned in this thesis be tried out in other nonPASCAL-like environments. 89 LIST OF REFERENCES [1] Brinch Hansen, P., O perating System Principles, Prentice-Hall, Englewood Cliffs, New Jersey, 1973. [2] Clifton, M. H. , "Randell's Method and Dynamic Storage Allocation Systems," M.S. Thesis, UIUCDCS-R-75-73 1 *, Department of Computer Science, University of Illinois, June, 1975. [3] Fishman, G. S. , Concepts and Methods in Discrete Event Digital Simulation , J. Wiley & Sons, New York, 1973. [k] Knuth, D. E., The Art of Computer Programming, Vol. 1, "Fundamental Algorithms, " Addison-Wesley, Reading, Massachusetts, I969. [5] Knuth, D. E., The Art of Computer Programming , Vol. 2, "Seminumerical Algorithms, " Addison-Wesley, Reading, Massachusetts, 1969. [6] Krishnaswamy, J., "Extensions to the Language PASCAL," Department of Computer Science Report, University of Illinois, to appear. [7] Madnick, S. E. and Donovan, J. J., Operating Systems , McGraw-Hill, New York, I97I+. [8] Marsaglia, G. and MacLaren, M. D., "Uniform Random Number Generators," J. ACM, Vol. 12, No. 1, January, 1965, pp. 83-89. [9] Nair, R., "Simulation Guidelines for the Study of Dynamic Storage Allocation Algorithms, " 9th Hawaii International Conference on System Sciences, January, 1976. [10] Shore, J. E. , "On the External Storage Fragmentation Produced by First-Fit and Best-Fit Allocation Strategies, " CACM , Vol. 18, No. 8, August, 1975, P- ^33. [11] Stocks, I., "PASCAL- 11: An Implementation of PASCAL in a Minicomputer Environment," Ph.D. Thesis, Department of Computer Science, University of Illinois, to appear May, 1976. L5CRAPHIC DATA 1. Report No. UIUCDCS-R-76-785 r ind Subtitle DYNAMIC STORAGE ALLOCATION - SIMULATION TECHNIQUES AND EFFICIENT ALGORITHMS 3. Recipient's Accession No. 5. Report D»te February, 1976 6. Ravlndra Kumar Nair 8. Performing Organization Rept. No. forming Organization Name and Address Department of Computer Science University of Illinois Urbana, Illinois 10. Project/Taslc/Work Unit No. 11. Contract /Grant No. NSF DCR 72 -037^0 A01 nsoring Organization Name and Address National Science Foundation Washington, DC 13. Type of Report & Period Covered 14. 'plementary Notes ■tracts To investigate the performance of the PDP-11 PASCAL system to various storage :>cation algorithms, a simulation program was written in PASCAL. Statistics :. eating the performance of the system with the existing first-fit algorithm, ■ gathered and compared with corresponding figures obtained from other common iirithms like best- fit algorithm and Knuth's modification to the first-fit ;>rithm. Results indicate that the existing system performs as good as, if not ' *'ords and Document Analysis. 17a. Descriptors core allocation, simulation, PASCAL- 11 entifiers/Open-Ended Terms 0SAT1 Field/Group ^lability Statement 19. Security Class (This Report) SIEJ WNr . l . ASSIFI E D cunty Class ( I hi 20. Security Class (ims Page UNCLASSIFIED 21. No. of Pages 22. Price 'TlS-35 ( 10-70) USCOMM-DC 40329-P71 V & .'•• , B* UNIVERSITY OF ILLINOIS-URBANA 510.84 IL6R no C002 no 782 787(1976 Design o! Irredundant MOS Networks a p 3 0112 088402588 ■ 'h ■■■1 m MB SBCn UP 1