
the average residency time. The adjusted hit ratios are obtained by applying the
proportionality relationship expressed by (1.19), with θ = .25; for example,
the miss ratio for the ERP application is projected as .36 × (262/197)^(−.25) = .34.
Based upon the projected residency times and hit ratios, we may then compute
the cache storage requirements, as already discussed for the previous table.
In this way, we obtain an objective of 2256 megabytes for the cache size of
the target system. If these requirements could be met exactly, then we would
project an aggregate hit ratio, for the three applications, of 72 percent.
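As a quick numerical check, the residency-time rescaling of (1.19) can be sketched in a few lines of Python. The figures are those quoted above; the function name is ours:

```python
# Project a workload's miss ratio to a new average residency time using
# the proportionality relationship of (1.19): miss ratio ~ T ** (-theta).
def project_miss_ratio(m_current, t_current, t_target, theta=0.25):
    return m_current * (t_target / t_current) ** -theta

# ERP example from the text: current miss ratio .36, average residency
# times 197 (current system) and 262 (target system).
projected = project_miss_ratio(0.36, 197.0, 262.0)
print(round(projected, 2))  # -> 0.34
```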
THE FRACTAL STRUCTURE OF DATA REFERENCE
As a final step, we must also consider how to round off the computed
cache memory requirement of 2256 megabytes. Since this requirement is very
close to 2 gigabytes, we might choose, in this case, to round down rather
than up. Alternatively, it would also be reasonable to round up to, say, 3
gigabytes, on the grounds that the additional cache can be used for growth
in the workload. To account for the rounding off of cache memory, we can
apply the proportionality expressed by (1.23). Thus, after rounding down to 2
gigabytes, we would expect the aggregate miss ratio of the target system to be
.28 × (2048/2256)^(−.25/.75) = .29, identical to the current aggregate miss ratio
of the three applications.
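The effect of rounding can be checked the same way. The following sketch applies the size rescaling of (1.23), under which the miss ratio varies as the cache size raised to the power −θ/(1 − θ):

```python
# Adjust a projected miss ratio for a rounded-off cache size using (1.23):
# miss ratio ~ (cache size) ** (-theta / (1 - theta)).
def adjust_for_cache_size(m, size_planned, size_actual, theta=0.25):
    return m * (size_actual / size_planned) ** (-theta / (1.0 - theta))

# Rounding the 2256-megabyte objective down to 2 gigabytes (2048 megabytes):
adjusted = adjust_for_cache_size(0.28, 2256.0, 2048.0)
print(round(adjusted, 2))  # -> 0.29
```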
If the available performance reporting tools are sufficiently complete, it is
possible to refine the methods presented in the preceding example. In the
figures of the example, the stage size was assumed to be equal to .04 megabytes
(a reasonable approximation for most
OS/390 workloads). The capability to
support direct measurements of this quantity has recently been incorporated


into some storage controls; if such measurements are supported, they can be
found in the System Measurement Facility (
SMF) record type 74, subtype 5.
Also, in the example, we used the total miss ratio as our measure of the
percentage of
I/O’s that require more cache memory to be allocated. A loophole
exists in this technique, however, due to the capability of most current storage
controls to accept write requests without needing to wait for a stage to occur.
In a storage control of this type, virtually all write requests will typically be
reported as “hits,” even though some of them may require allocation of memory.
For database
I/O, this potential source of error is usually not important, since
write requests tend to be updates of data already in cache. If, however, it is
desired to account for any write “hits” that may nevertheless require allocation
of cache memory, counts of these can also be found in
SMF record type 74,
subtype 5 (they are called write promotions).
Finally, we assumed the guesstimate θ = .25. If measurements of the single-reference
residency time are available, then θ can be quantified more precisely
using (1.16).
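For instance, if (1.16) is taken to relate the single-reference and average residency times as T = τ/(1 − θ) (our reading; the equation itself is not reproduced in this excerpt), then θ can be backed out of the two measured times:

```python
# Estimate theta from measured residency times, assuming (1.16) has the
# form T = tau / (1 - theta), so that theta = 1 - tau / T.  Here tau is
# the single-reference residency time and T the average residency time.
def estimate_theta(tau, t_avg):
    return 1.0 - tau / t_avg

# Hypothetical measurements, in seconds:
print(estimate_theta(90.0, 120.0))  # -> 0.25
```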
Use of Memory by Multiple Workloads
2. ANALYSIS OF THE WORKING HYPOTHESIS
It is beyond the scope of the present chapter to analyze rigorously every
potential source of error in a capacity planning exercise of the type just presented
in the previous section, nor does a “back of the envelope” approximation
method require this. Instead, we now focus on the following claim, central
to applications of the working hypothesis: that it makes very little difference
in the estimated hit ratio of the cache as a whole, whether the individual

workloads within the cache are modeled with their correct average residency
times, or whether they all are modeled assuming a common average residency
time reflecting the conditions for the cache as a whole.
Obviously, such a statement cannot hold in all cases. Instead, it is a statement
about the realistic impact of typical variations between workloads. As the data
presented in Chapter 1 suggests, the values of the parameter θ, for distinct
workloads within a given environment, often vary over a fairly narrow range.
This gives the proposed hypothesis an important head start, since the hypothesis
would be exactly correct for a cache in which several workloads share the same
value of the parameter θ. In that case, the common value of θ, together with
the fact that all the workloads must share a common single-reference residency
time τ, would then imply, by (1.12), that the workloads must also share the
same average residency time.
Consider, now, a cache whose activity can be described by the multiple
workload hierarchical reuse model; that is, the cache provides service to n
individual workloads, i = 1, 2, . . . , n, each of which can be described by the
hierarchical reuse model. The true miss ratio of the cache as a whole is the
weighted average of the individual workload miss ratios, weighted by I/O rate:

$$m = \frac{1}{r} \sum_{i=1}^{n} r_i m_i \qquad (3.2)$$

where r_i is the I/O rate of workload i, m_i its miss ratio, and r = r_1 + · · · + r_n
the total I/O rate.
We must now consider the error that results from replacing the correct miss
ratio of each workload by the corresponding estimate m̂_i, calculated using the
average residency time of the cache as a whole. Using the proportionality
relationship expressed by (1.19), the values m̂_i can be written as

$$\hat{m}_i = m_i \left( \frac{T}{T_i} \right)^{-\theta_i} \qquad (3.3)$$

where T_i is the average residency time of workload i and T that of the cache
as a whole.
Thus, the working hypothesis implies an overall miss ratio of

$$\hat{m} = \frac{1}{r} \sum_{i=1}^{n} r_i m_i \left( \frac{T}{T_i} \right)^{-\theta_i} \qquad (3.4)$$
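The construction of (3.2) through (3.4) can be made concrete with a small numerical sketch (the workload figures below are invented for illustration):

```python
# Compare the true aggregate miss ratio m of (3.2) with the working-
# hypothesis estimate m_hat of (3.4), which rescales each workload's miss
# ratio to the aggregate average residency time via (1.19).
rates  = [500.0, 300.0, 200.0]   # r_i: I/O rates
misses = [0.30, 0.25, 0.40]      # m_i: per-workload miss ratios
thetas = [0.20, 0.25, 0.30]      # theta_i
t_res  = [180.0, 220.0, 160.0]   # T_i: average residency times

r = sum(rates)
miss_io = sum(ri * mi for ri, mi in zip(rates, misses))
m = miss_io / r                                            # (3.2)

# The aggregate residency time is computed over misses (each miss is a visit).
t_agg = sum(ri * mi * ti for ri, mi, ti in zip(rates, misses, t_res)) / miss_io

m_hat = sum(ri * mi * (t_agg / ti) ** -th                  # (3.3) and (3.4)
            for ri, mi, th, ti in zip(rates, misses, thetas, t_res)) / r

print(round(m, 3))      # -> 0.305
print(round(m_hat, 3))  # close to m: the two differ only in second order
```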
To investigate the errors implied by this calculation, we write it in the
alternative form

$$\hat{m} = \frac{1}{r} \sum_{i=1}^{n} r_i m_i (1 + \zeta_i)^{\theta_i}$$

where we define

$$\zeta_i = \frac{T_i - T}{T} \qquad (3.5)$$

This expression for m̂ can be expanded by applying the binomial theorem:

$$\hat{m} = \frac{1}{r} \sum_{i=1}^{n} r_i m_i \left[ 1 + \theta_i \zeta_i + \frac{\theta_i (\theta_i - 1)}{2} \zeta_i^2 + o(\zeta_i^2) \right] \qquad (3.6)$$

where the “little-o” notation indicates terms higher than second order.
Using (1.16), we define

$$\theta = 1 - \frac{\tau}{T}$$

to be the aggregate value of θ for the cache as a whole. Note, as a result, that
in addition to the definition already given, ζ_i also has the equivalent definition

$$\zeta_i = \frac{\theta_i - \theta}{1 - \theta_i} \qquad (3.7)$$

where we have applied (1.12) and taken advantage of the fact that each workload
must share the same, common value of τ.
By applying (1.16), we may rewrite the first-order terms of (3.6) as follows:

$$\frac{1}{r} \sum_{i=1}^{n} r_i m_i \theta_i \zeta_i = \frac{1}{r} \sum_{i=1}^{n} r_i m_i \left( 1 - \frac{\tau}{T_i} \right) \zeta_i \qquad (3.8)$$

But since each miss corresponds to a cache visit, the aggregate residency
time is computed over misses; that is,

$$T = \frac{1}{rm} \sum_{i=1}^{n} r_i m_i T_i \qquad (3.9)$$
so

$$\frac{1}{rm} \sum_{i=1}^{n} r_i m_i \zeta_i = \frac{1}{rm} \sum_{i=1}^{n} r_i m_i \, \frac{T_i - T}{T} = 0$$

and (3.8) reduces to

$$\frac{1}{r} \sum_{i=1}^{n} r_i m_i \theta_i \zeta_i = (1 - \theta) \, \frac{1}{r} \sum_{i=1}^{n} r_i m_i \zeta_i^2 + o(\zeta_i^2) \qquad (3.10)$$

Combining (3.2), (3.6), and (3.10), we now have

$$\hat{m} = m \left[ 1 + \frac{1}{rm} \sum_{i=1}^{n} r_i m_i \left( (1 - \theta) + \frac{\theta_i (\theta_i - 1)}{2} \right) \zeta_i^2 + o(\zeta_i^2) \right] \qquad (3.11)$$

Thus, m̂ = m except for second-order and higher terms.
In a region sufficiently close to θ_1 = θ_2 = · · · = θ_n = θ (or equivalently,
T_1 = T_2 = · · · = T_n = T), the second-order and higher terms of (3.11) can be
approximated as uniformly zero. The region where these second-order terms
have at most a minor impact is that in which |ζ_i| << 1 for i = 1, 2, . . . , n. This
requirement permits wide variations in the workloads sharing the cache.
For example, suppose that there are two workloads i = 1, 2, with values θ_i
equal to .1 and .3 respectively; and suppose that these workloads share a cache
in which, overall, we have θ = .2. Then the absolute value of ζ_i is no greater
than .1/.7 = .14 for either workload. As a result, the absolute value of either of
the second-order summation terms of (3.11), calculated without the summation
weights r_i m_i / (rm), does not exceed .02. But the summation of these terms,
multiplied by the weights r_i m_i / (rm), is merely a weighted average; so in the
case of the example, the quantity just stated is the largest relative error, in either
direction, that can be made by neglecting the second-order terms (i.e. the error
can be no larger than 2 percent of m). Since the second-order terms are so
relatively insignificant, we may conclude that the third-order and higher terms,
shown as o(ζ_i²), must be vanishingly small.
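The figures in this example are easy to verify. A short check, taking (3.7) in the form ζ_i = (θ_i − θ)/(1 − θ_i) (our reading, which reproduces the .1/.7 bound quoted above):

```python
# Two workloads with theta values .1 and .3 sharing a cache whose
# aggregate theta is .2.  Compute zeta_i = (theta_i - theta) / (1 - theta_i)
# and the worst-case second-order magnitude zeta_i ** 2.
theta = 0.2
zetas = [(th - theta) / (1.0 - th) for th in (0.1, 0.3)]
print([round(abs(z), 2) for z in zetas])    # -> [0.11, 0.14]
print(round(max(z * z for z in zetas), 2))  # -> 0.02
```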
This chapter’s working hypothesis has also proved itself in actual empirical
use, without recourse to formal error analysis [22]. Its practical success confirms
that the first-order approximation just obtained remains accurate within
a wide enough range of conditions to make it an important practical tool.
