Scheduling resources for continuous media databases5845279Abstract Various systems and methods of scheduling media segments of varying display rate, length and/or periodicity on at least one clustered, vertically-striped or horizontally-striped continuous media database volume. With respect to the at least one horizontally-striped database volume, one method includes the steps of: (1) associating a display value with each of the media segments, (2) sorting the media segments in a non-increasing order of value density to obtain an ordered list thereof and (3) building a scheduling tree of the media segments, the scheduling tree having a structure that increases a total display value of the media segments. Claims What is claimed is: Description TECHNICAL FIELD OF THE INVENTION
TABLE 1
______________________________________
"PACKSEGMENTS"
______________________________________
Input: A collection of CM segments C = {C.sub.1, . . . ,C.sub.N } and a
number
of disks n.sub.disk.
Output: C' .OR right. C and a packing of C' in n.sub.disk unit capacity
bins.
(Goal: Maximize .SIGMA.C.sub.i .epsilon. C' value (C.sub.i).)
1. Sort the segments in C in non-increasing order of value to
obtain a list L =< C.sub.1, . . . ,C.sub.N > where p.sub.i .gtoreq.
p.sub.i+1. Initialize
load (B.sub.j) = value(B.sub.j) = 0, B.sub.j) = .0., for each bin
(i.e., disk) B.sub.j, j=1, . . . ,N.
2. For each segment C.sub.i in L (in that order):
2.1 Let B.sub.j be the first bin (i.e., disk) such that load (B.sub.j) +
size(C.sub.i).ltoreq. 1.
2.2 Set load(B.sub.j) = load(B.sub.j) + size(C.sub.i), value(B.sub.j) =
value(B.sub.j) +
value(C.sub.i), B.sub.j = B.sub.j .orgate.{C.sub.i }, and L = L-{C.sub.i
}.
3. Let B.sub.<i>, I=1, . . . ,n.sub.disk be the bins corresponding to the
n.sub.disk
largest values in the final packing. Return C' = .orgate..sup.n.sbsp.disk
.sub.i=1 B.sub.<i>. (The
packing of C' is defined by the B.sub.<i> 's).
______________________________________
Turning now to FIGS. 4A and 4B, illustrated are schematic diagrams of vertical striping and horizontal striping. In vertical striping (FIG. 5A), each column 410, 420, 430 of a given segment matrix is declustered across all disks 400A, 400B, 400C of a given CMOD server. This scheme is similar to fine-grained striping or RAID-3 data organization, since each column of each segment has to be retrieved in parallel from all disks (as a unit). "PACKSEGMENTS" is able to operate with vertical striping. In this case, the size vector for each media segment is one-dimensional and consists of the normalized bandwidth requirement (or consumption) for the media segment. In horizontal striping (FIG. 4B) the columns 450, 460, 470, 480, 490 of a given segment matrix are mapped to individual disks 440A, 440B, 440C in a round-robin manner. Consequently, the retrieval of data for a transmission of C.sub.i proceeds in a round-robin fashion along the disk array. During each round, a single disk is used to read a column of C.sub.i and consecutive rounds employ consecutive disks. Consider the periodic retrieval of C.sub.i from a specific disk. By virtue of the round-robin placement during each transmission of C.sub.i, a column of C.sub.i must be retrieved from that disk periodically at intervals of n.sub.disk rounds. Furthermore, to support EPPV service, the transmissions of C.sub.i are themselves periodic, with a period T.sub.i =n.sub.i .multidot.T. Thus, the retrieval of C.sub.i from a specific disk is a collection of periodic real time tasks with period T.sub.i (i.e., the media segment's transmission), where each task consists of a collection of subtasks that are n.sub.disk .multidot.T time units apart (i.e., column retrievals within a transmission). A simplified version of this problem occurs when, for each media segment C.sub.i, l.sub.i .ltoreq.n.sub.disk .multidot.T holds. In this case, periodic retrieval of a media segment consists of a simple periodic task. Turning now to FIGS. 5A and 5B, illustrated are a generalized scheduling tree structure 500 for simple periodic tasks, where this task model is applicable to media segments for which l.sub.i .ltoreq.n.sub.disk .multidot.T holds, according to the present invention and a particular scheduling tree structure for an exemplary set of tasks for which l.sub.i .ltoreq.n.sub.disk .multidot.T (containing nodes 540, 550, 560, 570, 580, 590 and edges 542, 544, 552, 554, 562, 582). FIGS. 5A and 5B are presented primarily for the purpose of providing an overview of the scheduling tree structure concept of the present invention. A scheduling tree (of the present invention and as described below) determines a "conflict-free" schedule for the periodic retrieval of media segments that are part of the scheduling tree. That is, the retrieval of these media segments will not collide in a round. One fundamental concept of the present invention is that all tasks in a subtree rooted at some edge 512, 522 emanating from node n (such as a node 510) at level 1 uses time slot numbers that are congruent to I(mod .pi..sub.1 (n)) where I is a unique number between 0 and .pi..sub.1 (n)-1. Satisfying this invariant recursively at every internal node 520, 530 ensures the avoidance of conflicts. An internal node n at level 1 is candidate for period n.sub.i (n.sub.i =T.sub.i /T) if and only if .pi..sub.l-1 (n).vertline.n.sub.i and gcd ##EQU3## A period n, can be scheduled under any candidate node n in a scheduling tree. Two possible cases exist: If .pi..sub.1 (n).vertline.n.sub.i then the condition above guarantees that n (in a tree having a node 600 and edges 610a, 610b, 610c) has at least one free edge 610d at which n.sub.i can be placed. If .pi..sub.1 (n) n.sub.i then, to accommodate n.sub.i under node n (in a tree having a node 600 and edges 620a, 620b, 620c, 620d, 630a, 630b, 630c), n must be split so that the defining properties of the scheduling tree structure are kept intact. This may be done as follows. Let ##EQU4## Node n is split into a parent node with weight d and child nodes with weight ##EQU5## with the original children of n divided among the new child nodes; that is, the first batch of ##EQU6## children of n are placed under the first child node, and so on. It is apparent that this splitting maintains the properties of the structure. Furthermore, the condition set forth above guarantees that the newly created parent node will have at least one free edge for scheduling n.sub.i. The set of candidate nodes for each period to be scheduled can be maintained efficiently, in an incremental manner. The observation here is that when a new period n.sub.i is scheduled, all remaining periods advantageously only have to check a maximum of three nodes, namely the two closest ancestors of the leaf for n.sub.i and, if a split occurred, the last child node created in the split, for possible inclusion or exclusion from their candidate sets. As above, each task is assumed to be associated with a value and that improving the cumulative value of a schedule is the objective. The basic idea of the heuristic of one aspect of the present invention (termed BUILDTREE) is to build the scheduling tree incrementally in a greedy fashion, scanning the tasks in non-increasing order of value and placing each period n.sub.i in that candidate node M that implies the minimum value loss among all possible candidates. This loss is calculated as the total value of all periods whose candidate sets become empty after the placement of n.sub.i under M. Ties are always broken in favor of those candidate nodes that are located at higher levels (i.e., closer to the leaves), while ties at the same level are broken using the postorder node numbers (i.e., left-to-right order). When a period is scheduled in .GAMMA., the candidate node sets for all remaining periods are undated (in an incremental fashion) and the method continues with the next task/period (with at least one candidate in 1'). FIG. 6 illustrates a flow diagram of a method 600 of building a scheduling tree for a limited segment model wherein l.sub.i .ltoreq.T. The method 600 may be embodied as a procedure termed "BUILDTREE" set forth in Table II below:
TABLE II
______________________________________
"BUILDTREE"
______________________________________
1. Input: A set of simple periodic tasks C = {C.sub.1, . . . ,C.sub.N }
and 1.sub.i .ltoreq.
n.sub.disk. T with corresponding periods P = {n.sub.1, . . . ,n.sub.N },
and a
value () function assigning a value to each C.sub.i.
Output: A scheduling tree .GAMMA. for a subset C' of C. (Goal: Maximize
.SIGMA.C.sub.i .epsilon. C' value (C.sub.i).)
1. Sort the tasks in C in non-increasing order of value to obtain
a list L = <C.sub.1,C.sub.2, . . . ,C.sub.N >, where value (C.sub.i)
.gtoreq. value (C.sub.i+1).
Initially, .GAMMA. consists of a root node with a weight equal to
n.sub.1.
2. For each periodic task C.sub.i in L (in that order):
2.1 Let cand(n.sub.i, .GAMMA.) be the set of candidate nodes for n.sub.i,
in
. (Note that this set is maintained incrementally as the tree is built.)
2.2 For each n .epsilon. cand(n.sub.i, .GAMMA.), let .GAMMA..orgate.{n.sub
.i }.sub.n denote the tree that
results when n.sub.i is placed under node n in 64 . Let loss(n) -
{C.sub.j, .epsilon. L-{C.sub.i }.vertline. cand(.GAMMA..orgate.{n.sub.i
}.sub.n) = .0.}
and value loss(n)) =
.SIGMA.c.sub.j.epsilon.loss(n) value(C.sub.j).
2.3 Place n.sub.i under the candidate node M such that value
(loss (M)) = min.sub.n.epsilon.cand (ni, r) {value(loss (n)) }. (Ties are
broken in
favor of nodes at higher levels.) If necessary, node M is split.
2.4 Set .GAMMA. = .GAMMA..orgate.{n.sub.i }.sub.M, L=L-loss(M).
2.5 For each task C.sub.j, .epsilon. L, update the candidate node set
cand(n.sub.j, .GAMMA.).
______________________________________
With reference to FIG. 6, the method begins in a step 610 wherein media segments (tasks) are sorted in a non-increasing order of value to obtain a list L=<C.sub.1, C.sub.2, . . . , C.sub.N >. Next, for each periodic task in order, a candidate set of nodes is developed (in a step 620, a tree is built iteratively (in a step 630), where n.sub.i is placed under a selected candidate node (in a step 640) and candidate nodes are updated for remaining periods (in a step 650). Let N be the number of tasks in C. The number of internal nodes in a scheduling tree is always going to be O(N). To see this, note that an internal node will always have at least two children, with the only possible exception being the rightmost one or two new nodes created during the insertion of a new period. Since the number of insertions is at most N, it follows that the number of internal nodes is O(N). Based on this fact, it is easy to show that BUILDTREE runs in time O(N.sup.3). Example 2: Consider the list of periods<n.sub.1 =2, n.sub.2 =12, n.sub.3 =30>(sorted in non-increasing order of value). Turning now to FIGS. 7A through 7D, illustrated is the step-by-step construction of the scheduling tree (comprising nodes 700, 710, 720, 710a, 710b, 730a, 730b) using BUILDTREE. Note that period n.sub.3 splits the node with weight 6 into two nodes with weights 3 and 2 (the node 720 splits into nodes 720a, 720b). In the general case, when the lengths of the media segments are not restricted, periodic media segment retrieval under horizontal striping was defined above as a periodic real-time task C.sub.i with period ##EQU7## (in rounds) that consists of a collection of ##EQU8## subtasks (c.sub.i being the number of columns in the matrix for media segment C.sub.i) that need to be scheduled n.sub.disk rounds apart. The basic observation here is that all the subtasks of C.sub.i are themselves periodic with period n.sub.i, so the techniques of the previous section can be used for each individual subtask. However, the scheduling method also needs to ensure that all the subtasks are scheduled together, using time slots (i.e., rounds) placed regularly at intervals of n.sub.disk. Heuristic methods for building a scheduling tree in this generalized setting will now be set forth in detail. An important requirement of this more general task model is that the insertion of new periods cannot be allowed to distort the relative placement of subtasks already in the tree. The splitting mechanism described in the previous section for simple periodic tasks does not satisfy this requirement, since it can alter the starting time slots for all subtasks located under the split node. Instead, the present invention employs a different method for "batching" the children of the node being split, so that the starting time slots for all leaf nodes remain unchanged. This new splitting rule is as follows: if the node n is split to give a new parent node with weight d, then place at edge I of the new node (I=0, . . . , d-1) all the children of the old node n whose parent edge weight was congruent to I(mod d). Turning now to FIGS. 8A and 8B, illustrated are an exemplary scheduling tree (having nodes 800, 810a, 810b) before and after a split therein using the above-described splitting rule of the present invention (and adding a node 810c). Example 3: FIG. 8A illustrates a scheduling tree with two tasks with periods n.sub.1 =6, n.sub.2 =6 assigned to slots 0 and 1. FIG. 8B depicts the scheduling tree after a third task with period n.sub.3 =15 is inserted. Although there is enough capacity for both n.sub.1 and n.sub.2 in the subtree connected to the root with edge 0, the splitting rule of the present invention forces n.sub.2 to be placed in the subtree connected to the root with edge 1. In this setting, the notion of a candidate node is defined as follows: an internal node n at level 1 is candidate for period n.sub.i if and only if .pi..sub.l-1 (n).vertline.n.sub.i and there exists an I .epsilon..vertline.{0, . . . , d-1} such that all edges of n with weights congruent to I (mod d) are free, where ##EQU9## However, under the generalized model of periodic tasks of the present invention, a candidate node for n.sub.i can only accommodate a subtask of C.sub.i. This is clearly not sufficient for the entire task. The temporal dependency among the subtasks of C.sub.i means that the scheduling tree method of the present invention should make sure that all the subtasks of C.sub.i are placed in the tree at distances of n.sub.disk. One way to deal with this situation is to maintain candidate nodes for subtasks and use a simple predicate based on the equation: ##EQU10## for checking the availability of specific time slots in the scheduling tree. The scheduling of C.sub.i can then be handled as follows. Select a candidate node for n.sub.i and a time slot u.sub.i for n.sub.i under this candidate. Place the first subtask of C.sub.i in u.sub.i and call the predicate repeatedly to check if n.sub.i can be scheduled in slot u.sub.i ##EQU11## If the predicate succeeds for all j, then C.sub.i is scheduled starting at u.sub.i. Otherwise, the method can try another potential starting slot u.sub.i. A problem with the approach outline above is that even if the number of starting slots tried for C.sub.i is restricted to a constant, scheduling each subtask individually yields pseudo-polynomial time complexity. This is because the number of scheduling operations in a trial will be ##EQU12## where ##EQU13## is part of the problem input. The present invention provides a polynomial time heuristic method for the problem. To simplify the presentation, it is assumed that every period n.sub.i is a multiple of n.sub.disk. Although it is possible to extend the heuristic described herein to handle general periods, it is believed that this assumption is not very restrictive in practice. This is because round lengths T are typically expected to be in the area of a few seconds and periods T.sub.i are typically multiples of some number of minutes (e.g., 5, 10, 30 or 60 minutes). Therefore, it is realistic to assume the smallest period in the system can be selected to be a multiple of n.sub.disk. The objective is to devise a method that ensures that if the first subtask of a task C.sub.i does not collide with the first subtask of any other task in the tree, then no other combination of subtasks can cause a collision to occur. This means that once the first subtask of C.sub.i is placed in the scheduling tree there is no need to check the rest of C.sub.i 's subtasks individually. The method of the present invention sets the weight of the root of the scheduling tree to n.sub.disk. (This is possible since the n.sub.i 's are multiples of ndisk.) This implies that consecutive subtasks of a task will require consecutive edges emanating from nodes at the first level (direct descendents of the root), which are first-level ancestors of the leaf nodes where the subtasks are placed. When the first subtask of a task is placed at a leaf node, at least some of the consecutive edges of the first-level ancestor node of that leaf are disabled, so that the slots under those edges cannot be used by the first subtask of any future task. By the previous observation, ##EQU14## consecutive edges of the first-level ancestor of the leaf for n.sub.i must be disabled, starting with the right neighbor of the edge under which that leaf resides. (s.sub.i is the number of subtasks of C.sub.i) This "edge disabling" is implemented by maintaining an integer distance for each edge e emanating from a first-level node that is equal to the number of consecutive neighbors of edge e that have been disabled. The placement method of the present invention should maintain two invariants. First, the distance of an edge e of a first-level node is always equal to max.sub.C.sbsb.1 {s.sub.i }-1, where the max is taken over all tasks placed under e in the tree. Second, the sum of the weight of an edge e of a first-level node n and its distance is always less than the weight of n (so that the defining properties of the tree are maintained). Based on the above method, the notion of a candidate node can be defined as follows: let n be an internal node at level l. Let n.sub.i be a period and define ##EQU15## Node n is candidate for period n.sub.i if and only if .pi..sub.l-1 (n).vertline.n.sub.i and the following conditions hold: 1. If n is the root node, n has a free edge. 2. If level(n)=1, there exists an I .epsilon.{0, . . . , d-1} such that all (non-disabled) edges of n whose sum of weight plus distance is congruent to (I+j) (mod d), for 0.ltoreq.j<s.sub.i, are free. 3. If level(n).gtoreq.2, 3.1 there exists an I .epsilon.{0, . . . , d-1} such that all edges of n with weight congruent to I (mod d) are free; and, 3.2 s.sub.i -1--ancestor.sub.-- edge.sub.2 (n)<ancestor-node.sub.1 (n) and s.sub.i +ancestor.sub.-- edge.sub.2 (n) is less than or equal to the weight of the (non-disabled) edge following ancestor.sub.-- edge.sub.2 (n), if there is such an edge. Note that clause 2 ensures that edge distances are maintained when the first-level nodes are split. Turning now to FIG. 9, illustrated is a flow diagram of a method 900 of building a scheduling tree for periodic tasks having equidistant subtasks. The method 900 may be embodied as a procedure termed "BUILDEQUIDTREE" set forth in Table III below:
TABLE III
______________________________________
"BUILDEQUIDTREE"
______________________________________
Input: A set of periodic tasks C = {C.sub.1, . . . , C.sub.N } with
corresponding periods P = {n.sub.1, . . . , n.sub.N } and a value ()
function
assigning a value to each C.sub.1. Each task consists of subtasks
placed at intervals of n.sub.disk.
Output: A scheduling tree .GAMMA. for a subset C' of C. (Goal: Maximize
.SIGMA..sub.c,.epsilon. C' value (C.sub.i).)
1. Sort the tasks in C in non-increasing order of value to obtain
a list L = < C.sub.1, C.sub.2, . . . ,C.sub.N >, where value (C.sub.i)
.gtoreq.
value (C.sub.i+1). Initially, .GAMMA. consists of a root node with a
weight equal to
n.sub.disk.
2. For each task C.sub.i in L (in that order):
2.1 Select a candidate node n for n, in .GAMMA.. (Ties are broken
in favor of nodes at higher levels).
2.2 If w(n) .vertline./ni, split n.
2.3 Schedule the first subtask of C.sub.i under n. (Ties are
broken in favor of edges with smaller weights).
2.4 Let d be the distance of the ancestor edge at the first
level of the leaf corresponding to n.sub.i. Set the distance of
this edge to max{d, s.sub.i - 1}.
______________________________________
BuildEquidTree can be used to construct a scheduling tree in polynomial time. With reference to FIG. 9, the method 900 begins in a step 910 wherein the tasks are sorted in a non-increasing order of value to obtain a list L=<C.sub.1, C.sub.2, . . . , C.sub.N >. Next, for each periodic task in order, a candidate node n is selected (in a step 920), n is split if w(n) n.sub.i (in a step 930), the first subtask of the task is scheduled under n (in a step 940) and edge distances are set (in a step 950). Turning now to FIGS. 10A through 10C, illustrated are an exemplary scheduling tree (variations of which are designated 1000, 1010, 1020) with equidistant subtasks being built according to the method illustrated in FIG. 10. Example 4: Consider three tasks C.sub.1, C.sub.2, C.sub.3 with s.sub.1, s.sub.2, s.sub.3 =2, 1, 3 and n.sub.1, n.sub.2, n.sub.3 =12, 18, 10 and n.sub.disk =2. FIGS. 10A through 10C illustrate the three states of scheduling tree after placement of C.sub.1, C.sub.2 and C.sub.3, respectively. An interesting property of the scheduling tree formulation is that it can easily be extended to handle time slots that can fit more than one subtask (i.e., can allow for some tasks to collide). As set forth above, this is exactly the case for the rounds of EPPV retrieval under horizontal striping. The subtasks of C.sub.i can be thought of as items of size (C.sub.i).ltoreq.1 (i.e., the fraction of disk bandwidth required for retrieving one column of media segment C.sub.1) that are placed in unit capacity time slots. In this more general setting, a time slot can accommodate multiple tasks as long as their total size does not exceed one. The problem can be visualized as a collection of unit capacity bins (i.e., time slots) located at the leaves of a scheduling tree, whose structure determines the eligible bins for each task's subtasks (based on their period). With respect to the previous model of tasks, the main difference is that since slots can now accommodate multiple retrievals it is possible for a leaf node that is already occupied to be a candidate for a period. Hence, the basic idea for extending the methods of the present invention to this case is to keep track of the available slot space at each leaf node and allow leaf nodes to be shared by tasks. Thus, the notion of candidate nodes can simply be extended as follows: let n be a leaf node of a scheduling tree r corresponding to period p. Also let S(n) denote the collection of tasks (with period p) mapped to n. The load of leaf n is defined as: load(n)=.SIGMA..sub.C.sbsb.i.sub..epsilon.S(n) size(C.sub.i). A node n at level l is candidate for a task of C.sub.i (with period n.sub.i) if and only if: 1. n is internal, conditions in the previous definition of candidate node hold, or 2. n is external (leaf node) corresponding to n.sub.i (i.e., .pi..sub.1 (n)=n.sub.i), and load(n)+size (C.sub.i).ltoreq.1. With these extensions, it is easy to see that the BuildEquidTree method can be used without modification to produce a scheduling tree for the multi-task capacity case. To construct forests of multiple non-colliding scheduling trees, trees already built can be used to restrict task placement in the tree under, construction. By the Generalized Chinese Remainder Theorem, the scheduling method needs to ensure that each subtask of task C.sub.i is assigned a slot u.sub.i such that u.sub.i .notident.u.sub.j (mod gcd (n.sub.i, n.sub.j)) for any subtask of any task C.sub.j that is scheduled in slot u.sub.j in a previous tree within the same forest. A general packing-based method set forth below can be used for combining independently built scheduling forests. Of course, a forest can always consist of a single tree. The objective is to improve to the utilization of scheduling slots that can accommodate multiple tasks. Given a collection of tasks, scheduling forests are constructed until each task is assigned a time slot. No pair of tasks within a forest will collide at any slot except for tasks with the same period that are assigned to the same leaf node as described in Section 5.3. A simple conservative approach is to assume a worst-case collision across forests. That is, the size of a forest F.sub.i is defined as size (F.sub.i)=max.sub.n.sbsb..sub.68 Fi (load (n.sub.j)) where n.sub.j is any leaf node in F.sub.i, and the load of a leaf node is as given above. Further, a forest F.sub.i has a value: value (F.sub.i)=.SIGMA..sub.C.sbsb.j.sub..epsilon.Fi value (C).sub.j. Thus, under the assumption of a worst-case collision, the problem of maximizing the total scheduled value for a collection of forests is a traditional 0/1 knapsack optimization problem. A packing-based heuristic as PACKSEGMENTS can be used to provide an approximate solution. In some cases, the worst-case collision assumption across forests may be unnecessarily restrictive. For example, consider two scheduling trees .GAMMA..sub.1 and .GAMMA..sub.2 that are constructed to be independently. Let e.sub.1 be an edge emanating from the root node n.sub.1 of .GAMMA..sub.1 and e.sub.2 be an edge emanating from the root node n.sub.2 of .GAMMA..sub.2. If e.sub.1 mod (gcd (n.sub.1, n.sub.2)).noteq.e.sub.2 mod (gcd (n, n.sub.2)) holds, then the tasks scheduled in the subtrees rooted at e.sub.1 and e.sub.2 can never collide. Using such observations, more sophisticated packing-based methods for combining forests can be constructed. Preliminary performance experimentation has been undertaken to compare the average performance of the methods introduced in by the present invention for supporting EPPV service. For the experiments, two basic workload components were employed, modeling typical scenarios encountered in today's pay-per-view CMOD media segment servers. Workload #1 consisted of relatively long MPEG-1 compressed media segments with a duration between 90 and 120 minutes (e.g., movie features). The display rate for all these media segments was equal to r.sub.i =1.5 Mbps. To model differences in media segment popularity, the workload comprised two distinct regions: a "hot region" with retrieval periods between 40 and 60 minutes and a "cold region" with periods between 150 and 180 minutes. Workload #2 consisted of small media segment segments with lengths between 2 and 10 minutes (e.g., commercials or music media segments). The display rates for these media segments varied between 2 and 4 Mbps (i.e., MPEG-1 and 2 compression. Again, segments were divided between a "hot region" with periods between 20 and 30 minutes and a "cold region" with periods between 40 and 60 minutes. Each component was executed in isolation and mixed workloads consisting of mixtures of type #1 and type #2 workloads were also investigated. The basic performance metric was the effectively scheduled disk bandwidth (in Mbps) for each of the resource scheduling methods introduced by the present invention. Scaleup experiments in which the offered load (i.e., number of segments to be scheduled) was proportionate to the system size (i.e., number of disks in the server) were concentrated upon. Further, in all cases, the expected storage requirements of the offered load were insured to be approximately equal to the total disk capacity. This allowed the storage capacity constraint for the striping-based methods to be ignored. The results of the experiments, with type #1 workloads with hot regions of 30% (a graph 1100) and 10% (a graph 1110) are shown in FIGS. 11A and 11B, respectively. Clearly, the horizontal striping-based method outperforms both clustering and vertical striping over the entire range of values for the number of disks. Observe that for type #1 workloads, the maximum number of segments that can be scheduled is limited by the aggregate disk storage. Specifically, it is easy to see that the maximum number of segments that can fit in a disk is 3.95 the average number of concurrent streams for a segment is (0.3.multidot.3+0.7.multidot.1)=1.6. Thus the maximum bandwidth that can be utilized on a single disk for this mix of accesses is 1.6.multidot.3.95.multidot.1.5=9.48 Mbps. This explains the low scheduled bandwidth output shown in FIGS. 11A and 11B. Note that, in most cases, the scheduling tree heuristics of the present invention were able to schedule the entire offered workload of segments. On the other hand, the performance of vertical striping methods quickly deteriorates as the size of the disk array increases. The performance of the clustering method of the present invention under Workload #1 suffers from the disk storage fragmentation due to the large segment sizes. A deterioration can also be observed in the performance of clustering as the access skew increases (i.e., the size of the hot region becomes smaller). This can be explained as follows: PACKSEGMENTS first tries to organize the segments that give the highest profit (i.e., the popular segments). Thus when the hot region becomes smaller the relative value of the scheduled subset (as compared to the total workload value) decreases. The relative performance of the three methods for a type #2 workload with a 50% hot region is depicted in FIG. 12A (a graph 1200). Again, the horizontal striping-based method outperforms both clustering and vertical striping over the entire range of n.sub.disk. Note that, compared to type #1 workloads, the relative performance of clustering and vertical striping methods under this workload of short segments is significantly worse. This is because both these methods, being unaware of the periodic nature of segment retrieval, reserve a specific amount of bandwidth for every segment C.sub.i during every round of length T. However, for segments whose length is relatively small compared to their period, this bandwidth is actually needed only for small fraction of rounds. FIG. 12A clearly demonstrates the devastating effects of this bandwidth wastage and the need for periodic scheduling methods. Finally, FIG. 12B depicts (in a graph 1210) the results obtained for a mixed workload consisting of 30% type #1 segments and 70% type #2 segments. Horizontal striping is once again consistently better than vertical striping and clustering over the entire range of disk array sizes. Compared to pure type #1 or #2 workloads, the clustering-based method is able to exploit the non-uniformities in the mixed workload to produce much better packings. This gives clustering a clear win over vertical striping. Still, its wastefulness of disk bandwidth for short segments does not allow it to perform at the level of horizontal striping. In general, the period T.sub.i of a media segment C.sub.i may be greater than its length l.sub.i. The methods presented above for clustering and vertical striping can be used to schedule such media segments, however, they may be unnecessarily restrictive. If T.sub.i >l.sub.i, then under clustering and vertical striping, the retrieval of a media segment C.sub.i can be modeled as a collection of periodic real-time tasks with period T.sub.i =n.sub.i .multidot.T, where each task consists of a collection of C.sub.i subtasks that are T time units apart and have a computation time equal to the column retrieval time. (C.sub.i is the number of columns in C.sub.i.) Note that the only difference between this task model and the one defined above is that the distance between consecutive subtasks is only one time slot (rather than n.sub.disk). The scheduling tree methods and packing-based methods of the present invention for combining forests and trees can easily be modified to deal with this case. It has been assumed to this point that segments are stored on disks using a matrix-based layout scheme. That is, each column of a segment matrix is stored contiguously. A column is nothing more than the total amount of data that needs to be retrieved in a round for all concurrent display times. Thus, the matrix-based layout provides the advantageous property of reducing the disk latency overhead within a round for all the concurrent phases to a single t.sub.lat. On the other hand, the scheduling and organizing methods of the present invention can also handle conventional data layout methods that do not exploit the knowledge of retrieval periods during data layout. In addition to supporting EPPV service, the tree-based scheduling methods of the present invention can offer support for the Random Access service model described above, which places resource reservations to allocate independent physical channels to each individual CMOD client. Under the Random Access service model, the maximum number of streams that can be concurrently retrieved and, therefore, the maximum number of concurrent clients that can be supported is limited by the available resources. From the above, it is apparent that the present invention provides various systems and methods of scheduling media segments of varying display rate, length and/or retrieval period on at least one clustered, vertically-striped or horizontally-striped CM database volume. With respect to the at least one clustered database volume or the at least one vertically-striped database volume, one method includes the steps of: (1) associating a display value and a normalized bandwidth requirement with each of the media segments, (2) sorting the media segments in a non-increasing order of value density to obtain an ordered list thereof and (3) organizing the media segments into the at least one database volume in an order that increases a total display value of the media segments. With respect to the at least one horizontally-striped database volume, one method includes the steps of: (1) associating a display value with each of the media segments, (2) sorting the media segments in a non-increasing order of value density to obtain an ordered list thereof and (3) building a scheduling tree of the media segments, the scheduling tree having a structure that increases a total display value of the media segments. Although the present invention has been described in detail, those skilled in the art should understand that they can make various changes, substitutions and alterations herein without departing from the spirit and scope of the invention in its broadest form.
|
Same subclass Same class Consider this |
||||||||||
