Allocating storage for aligned data6507901Abstract A method and computer program for allocating storage for a header and one or more data elements in a data storage facility are disclosed. The method and computer program include computing a hole size B that is a portion of a word that would be unallocated if storage were allocated to the header and to the data elements in a preferred order. The method and computer program include finding a subset of data elements S={F.sub.i1, F.sub.i2, . . . , F.sub.in } that satisfy the equation (SizeModN(F.sub.i1)+SizeModN(F.sub.i2)+ . . . +SizeModN(F.sub.in))mod N=B where N is the largest alignment requirement associated with any data element. The method and computer program include allocating storage to data elements in S first and allocating storage to the remaining elements in the preferred order. A lookup table and a method for building the lookup table are also disclosed. Claims What is claimed is: Description BACKGROUND
TABLE 1
k i .multidot. k i .multidot. k mod N
1 3 3
2 6 2
3 9 1
4 12 0
5 15 3
6 18 2
It may be observed that M=3, since (i.multidot.k)mod N.noteq.0, for 1.ltoreq.k.ltoreq.3, and (i.multidot.(M+1))mod N=0. The significance of M can be explained in the context of equation (1). Consider a subset of data elements S={F.sub.i1, F.sub.i2, F.sub.i3, F.sub.i4, F.sub.i5, F.sub.i6 } that satisfy equation (1) such that SizeModN(F.sub.i1)=3, SizeModN(F.sub.i2)=3, SizeModN(F.sub.i3)=3, SizeModN(F.sub.i4)=3, SizeModN(F.sub.i5)=3, SizeModN(F.sub.i6)=2. Equation (1) can be written as (3+3+3+3+3+2)mod 4=B, or 17 mod 4=B, or 1=B. There are k (=5) data elements for which SizeModN is equal to 3. The contribution of these data elements towards the sum in the left hand side of equation(1) is (3.multidot.5)mod 4=(3.multidot.1)mod 4=3. From the table above, k=1 results in (3.multidot.1)mod 4=3. Hence, it is possible to find a smaller subset with only 1 data element, F.sub.i1, whose contribution to equation (1) will be identical to that of the set {F.sub.i1, F.sub.i2, F.sub.i3, F.sub.i4, F.sub.i5 }. In this example, a smaller subset that will satisfy equation (1) is {F.sub.i1, F.sub.i6 }. The value of (i.multidot.k) mod N is a distinct value for all values of k.ltoreq.M, after which the values simply repeat with a periodicity of (M+1). Hence, when N=4, it is sufficient to consider at most 3 data elements for which SizeModN=3 for inclusion in a solution to satisfy equation (1). Table 2 below shows the values of M for different values of i and N.
TABLE 2
N i M
2 1 1
4 1 3
4 2 1
4 3 3
8 1 7
8 2 3
8 3 7
8 4 1
8 5 7
8 6 3
8 7 7
Let M.sub.sum,N be the sum of the values of M for a given N. N=2.fwdarw.M.sub.sum,2 =1. N=4.fwdarw.M.sub.sum,4 =3+1+3=7. N=8.fwdarw.M.sub.sum,8 =7+3+7+1+7+3+7=35. Let the tuple <v,f> denote a set (that allows duplicates) where the element v occurs f times. Define sets S.sub.2, S.sub.4 and S.sub.8 as: ##EQU1## It is sufficient to consider all the subsets of S.sub.N (N=2, 4, 8) to find all possible solutions for equation (1). It should be clear that S.sub.N contains all possible distinct elements and increasing the frequency of any element will not increase the number of solutions. Hence, the problem has been reduced from computation on all subsets of a set with maximum number of data elements to computation on all subsets of a set with at most 35 elements. This is still a very expensive operation, since it requires examination of 2.sup.35 subsets. The second key observation, which is described next, will drastically reduce the number of subsets to be examined. Upper Limit on Number of Distinct Elements The second observation is relevant for N=4 and N=8. It will put an upper limit on the number of distinct elements, D.sub.N, that may be present in a solution for equation (1). It will be shown that any solution that has more than D distinct elements can be reduced or is equivalent to a solution with less than or at most D distinct elements. The result to be shown is that: N=4.fwdarw.D.sub.4 =2 N=8.fwdarw.D.sub.8 =3 Proof: Case N=4 Consider a subset Sub.sub.4 ={<1, f.sub.1 >, <2, f.sub.2 >, <3, f.sub.3 >}, with 3 (>D.sub.4) distinct elements. From equation (1), (1.multidot.f.sub.1 +2.multidot.f.sub.2 +3.multidot.f.sub.3)mod 4=B. Assume that f.sub.1.ltoreq.f.sub.3 without loss of generality. Hence, B=(1.multidot.f.sub.1 +2.multidot.f.sub.2 +3.multidot.(f.sub.3 -f.sub.1 +f.sub.1))mod 4=(2.multidot.f.sub.2 +4.multidot.f.sub.1 +3.multidot.(f.sub.3 -f.sub.1))mod 4=(2*f.sub.2 +3*(f.sub.3 -f.sub.1))mod 4. Hence, an equivalent solution has been found using a smaller set {<2,f.sub.2 >, <3, f.sub.3 -f.sub.1 >}. The key observation is that the sum of two of the distinct elements, 1 and 3, is equal to N. This reduction will be possible in general when the sum of two or more distinct elements is equal to N or a multiple of N. This principle also applies when N=8. Case N=8 There are 7 distinct elements which may be represented by the set {1, 2, 3, 4, 5, 6, 7}. Now, 1+7=8, 2+6=8 and 3+5=8. Hence, any subset that contains both 1 and 7 as distinct elements can be reduced to a subset with one less distinct element that contains either 1 or 7, but not both. Similarly, any subset that contains both 2 and 6 as distinct elements can be reduced to a smaller subset that contains either 2 or 6, but not both. Similarly, any set that contains both 3 and 5 as distinct elements can be reduced to a set that contains either 3 or 5, but not both. Thus, the maximum number of distinct elements that need to be considered has been reduced from 7 to 4. The 4 distinct elements will be 1. Either 1 or 7. 2. Either 2 or 6. 3. Either 3 or 5. 4. 4 Hence, there will be 8 subsets with 4 distinct elements: 1. {1, 2, 3, 4} 2. {1, 2, 5, 4} 3. {1, 6, 3, 4} 4. {1, 6, 5, 4} 5. {7, 2, 3, 4} 6. {7, 2, 5, 4} 7. {7, 6, 3, 4} 8. {7, 6, 5, 4} In each case, it is quite easy to verify that there is one combination that adds up to 8 or 16. The combination that adds up to 8 or 16 is shown in bold above. Consider the first case--where there are 4 distinct elements in this subset, the 4 distinct elements being {1, 2, 3, 4}. Now, let the frequencies of these elements be f1, f2, f3 and f4. That is, we are really considering a subset where there are f1 elements for which SizeModN(size)=1, f2 elements for which SizeModN(size)=2, f3 elements for which SizeModN(size)=3, f4 elements for which SizeModN(size)=4 Hence, by equation (1) we have (f1.multidot.1+f2.multidot.2+f3.multidot.3+f4.multidot.4)mod 8=B, or f1.multidot.1+f2.multidot.2+f3.multidot.3+f4.multidot.4=8Q+B, where Q is some positive integer. Let us assume without loss of generality that the smallest value among f1, f3 and f4 is f3. Note that in this example 1+3+4=8, so we are interested in finding the smallest frequencies among f1, f3 and f4. Then, we have f3.multidot.1+(f1-f3).multidot.1+f2.multidot.2+f3.multidot.3+f3.multidot.4+ (f4-f3).multidot.4=8Q+B, rearranging f3.multidot.(1+3+4)+(f1-f3).multidot.1+f2.multidot.2+(f4-f3).multidot.4=8Q+ B, or (f1-f3).multidot.1+f2.multidot.2+(f4-f3).multidot.4=8(Q-f3)+B Thus, we can see that we have found a smaller sub-set which contains (f1-f3) elements where SizeModN(size)=1 f2 elements where SizeModN(size)=2 (f4-f3) elements where SizeModN(size)=4 ps This smaller sub-set can also satisfy the same equation. Thus, instead of requiring 4 distinct elements to satisfy this equation, we need only 3. We can similarly show for the remaining 7 cases (where there is some combination of elements which add to 8 or 16), that we need only 3 distinct elements. This completes the proof that N=8.fwdarw.D=3. Further, this proves that all possible solutions can be computed in polynomial time. There is an upper limit on the number of distinct elements, and there is an upper limit on the frequency for each distinct element. These values will be used to compute all the valid solutions. Computation of Valid Solutions The valid solutions can be computed by hand for N=2 and N=4 because the numbers are small. The valid solutions for N=8 can be computed programmatically. Valid Solutions for N=2 When N=2, the only hole size that is possible is B=1. The only valid solution when N=2 and B=1 is <1,1>, which indicates that a hole one byte in size can be filled as long as there is one odd-sized data element. Valid Solutions for N=4 When N=4, the hole size can be 1, 2 or 3. When B=1, the following solutions are valid: 1. {<1,1>} 2. {<3,3>} 3. {<2,1>,<3,1>} Solution <1,1> indicates that a valid solution for a hole size of 1 is one data element whose size modulo 4 is 1. Solution <3,3> indicates that a valid solution for a hole size of 1 is three data elements, each having a size modulo 4 of 3. Solution {<2,1>,<3,1>} indicates that a valid solution for a hole size of 1 is one data element having a size modulo 4 of 2 and one data element having a size modulo 4 of 3. When B=2, the following solutions are valid: 1. {<1,2>} 2. {<2,1>} 3. {<3,2>} When B=3, the following solutions are valid: 1. {<1,3>} 2. {<3,1>} 3. {<1,1>,<2,1>} Valid Solutions for N=8 When N=8, the hole size B can be any of {1, 2, 3, 4, 5, 6, 7}. A valid solution to equation (1) may include 1, 2, or 3 distinct data elements, since D.sub.8 =3, as shown above. A valid solution will consist only of combinations of selected data elements as defined below: There are 7 ways of choosing 1 distinct element from 7 elements. There are 21 ways of choosing 2 distinct data elements from 7 elements. The combinations 1 and 7, 2 and 6, 3 and 5 can be ignored. There are 35 ways of choosing 3 distinct elements from 7 elements, out of which only 16 need to be considered. Five subsets that contain both 1 and 7, five subsets that contain both 2 and 6, five subsets that contain both 3 and 5, and the subsets {1,2,5}, {1,3,4}, {7,6,3} and {7,5,4} need not be considered, since they can be replaced with equivalent sets that contain only 2 distinct elements. A listing of an example of program code to calculate the valid solutions when N=8 is shown below:
/* max freq of i where 1 <= i <= 7 */
int maxfreq[8] = { 0, 7, 3, 7, 1, 7, 3, 7 };
int howmany[8] [4];
int MaxSum;
typedef struct _freqcount {
unsigned short sizemod8;
unsigned short freq;
} freqcount;
typedef struct _freqcountentry {
unsigned short count;
freqcount freqlist[3];
} freqcountentry;
typedef struct _freqcountlist {
unsigned short count;
freqcountentry entry[150];
} freqcountlist;
freqcountlist sizemod8[8];
int ones, twos, threes;
int foo(int i, int j, int k)
{
int l, m, n;
int summod8;
int p, q, r;
int t;
int done;
#if 0
printf("*******************************************************************
*******
***************.backslash.n");
printf(".backslash.ti.backslash.tfreq(i).backslash.tj.backslash.tfreq(j).b
ackslash.tk.backslash.tfreq(k).backslash.tsum mod 8.backslash.n");
printf("*******************************************************************
*******
***************.backslash.n");
#endif
for (l = 1; l <= maxfreq[i]; l++)
for ( m = 1; m <= maxfreq[j]; m++ )
for ( n = 1; n <= maxfreq[k]; n++ )
{
summod8 = ( i * l + j * m + k * n ) & 7;
/* Ignore if summod8 == 0 */
if ( summod8 == 0 )
continue;
/* Ignore cases where summod8 can be obtained with a combination
of (i, j, k ) where the frequencies of i, j and k are less
than l, m and n.
*/
for(p = 0; p <= l ; p++)
for(q = 0; q <= m ; q++)
for(r = 0; r <= n ; r++)
{
if ( ((i * p + j * q + k * r) & 7) ==
summod8 )
goto foolabel;
}
foolabel:
if ( ! ( p == l && q == m && r == n ))
continue;
#if 0
printf(".backslash.t%d.backslash.t%d.backslash.t%d.backslash.t%d.backslash.
t%d.backslash.t%d.backslash.t%d.backslash.n",i,l,j,m,k,n,summod8);
#endif
howmany[summod8][3]++;
threes++;
if ( MaxSum < (i*l + j*m + k*n) )
MaxSum = i*l + j*m + k*n;
t = sizemod8[summod8].count;
sizemod8[summod8].count++;
sizemod8[summod8].entry[t].count = 3;
sizemod8[summod8].entry[t].freqlist[0].sizemod8 = i;
sizemod8[summod8].entry[t].freqlist[0].freq = l;
sizemod8[summod8].entry[t].freqlist[1].sizemod8 = j;
sizemod8[summod8].entry[t].freqlist[1].freq = m;
sizemod8[summod8].entry[t].freqlist[2].sizemod8 = k;
sizemod8[summod8].entry[t].freqlist[2].freq = n;
}
}
bar(int i, int j)
{
int l, m, n;
int summod8;
int p, q;
int t;
int done;
#if 0
printf("*******************************************************************
*******
***************.backslash.n");
printf(".backslash.ti.backslash.tfreq(i).backslash.tj.backslash.tfreq(j).b
ackslash.tsum mod 8.backslash.n");
printf("*******************************************************************
*******
***************.backslash.n");
#endif
for (l = 1; l <= maxfreq[i]; l++)
for ( m = 1; m <= maxfreq[j]; m++ )
{
summod8 = ( i * l + j * m ) & 7;
/* Ignore if summod8 == 0 */
if ( summod8 == 0 )
continue;
for(p = 0; p <= l ; p++)
for(q = 0; q <= m ; q++)
{
if ( ((i * p + j * q ) & 7) == summod8 )
goto barlabel;
}
barlabel:
if (!( p == l && q == m ))
continue;
#if 0
printf(".backslash.t%d.backslash.t%d.backslash.t%d.backslash.t%d.backslash
.t%d.backslash.n",i,l,j,m,summod8);
#endif
twos++;
if ( MaxSum < ( i*l + j*m ) )
MaxSum = i*l + j*m;
howmany[summod8][2]++;
t = sizemod8[summod8].count;
sizemod8[summod8].count++;
sizemod8[summod8].entry[t].count = 2;
sizemod8[summod8].entry[t].freqlist[0].sizemod8 = i;
sizemod8[summod8].entry[t].freqlist[0].freq = l;
sizemod8[summod8].entry[t].freqlist[1].sizemod8 = j;
sizemod8[summod8].entry[t].freqlist[1].freq = m;
}
}
sim(int i)
{
int l, m, n;
int summod8;
int t;
#if 0
printf("*******************************************************************
*******
***************.backslash.n");
printf(".backslash.ti.backslash.tfreq(i).backslash.tsum mod
8.backslash.n");
printf("*******************************************************************
*******
***************.backslash.n");
#endif
for (l = 1; l <= maxfreq[i]; l++)
{
summod8 = ( i * l ) & 7;
/* Ignore if summod8 == 0 */
if (summod8 == 0)
continue;
#if 0
printf(".backslash.t%d.backslash.t%d.backslash.t%d.backslash.n",i,l,summod
8);
#endif
ones++;
if ( MaxSum < i*l )
MaxSum = i*l;
howmany[summod8][1]++;
t = sizemod8[summod8].count;
sizemod8[summod8].count++;
sizemod8[summod8].entry[t].count = 1;
sizemod8[summod8].entry[t].freqlist[0].sizemod8 = i;
sizemod8[summod8].entry[t].freqlist[0].freq = l;
}
}
main()
{
int i,j,k;
/* All the cases where 3 distinct numbers are chosen from 1 . . . 7
*/
foo(1,2,3);
foo(1,2,4);
foo(1,3,6);
foo(1,4,5);
foo(1,4,6);
foo(1,5,6);
foo(2,3,4);
foo(2,3,7);
foo(2,4,5);
foo(2,4,7);
foo(2,5,7);
foo(3,4,6);
foo(3,4,7);
foo(4,5,6);
foo(4,6,7);
foo(5,6,7);
/* All the cases where 2 distinct numbers are chosen from 1 . . . 7
*/
bar(1,2);
bar(1,3);
bar(1,4);
bar(1,5);
bar(1,6);
bar(2,3);
bar(2,4);
bar(2,5);
bar(2,7);
bar(3,4);
bar(3,6);
bar(3,7);
bar(4,5);
bar(4,6);
bar(4,7);
bar(5,6);
bar(5,7);
bar(6,7);
sim(1);
sim(2);
sim(3);
sim(4);
sim(5);
sim(6);
sim(7);
printf("Number of ones = %d.backslash.n", ones);
printf("Number of twos = %d.backslash.n", twos);
printf("Number of threes = %d.backslash.n", threes);
printf("MaxSum = %d.backslash.n", MaxSum);
printf(".backslash.tMod
8.backslash.tS(1).backslash.tS(2).backslash.tS(3).backslash.tTotal.backsla
sh.n");
for( i = 1; i <= 7; i++)
{
int sum = howmany[i][1] + howmany[i][2] + howmany[i][3];
printf(".backslash.t%d.backslash.t%d.backslash.t%d.backslash.t%d.backslash
.t%d.backslash.n", i, howmany[i][1],
howmany[i][2], howmany[i][3], sum );
}
for(i = 1; i <= 7; i++)
{
printf("*******************************************************************
*******
*.backslash.n");
printf("Number of entries for %d=.backslash.t%d.backslash.n",i,
sizemod8[i].count);
printf("*******************************************************************
*******
*.backslash.n");
for(j = 0; j < sizemod8[i].count; j++)
{
freqcount *freqp;
k = sizemod8[i].entry[j].count;
freqp = sizemod8[i].entry[j].freqlist;
if ( k == 1)
printf(".backslash.t%d.backslash.t%d.backslash.n",freqp[0].sizemod8,
freqp[0].freq);
else if (k == 2)
printf(".backslash.t%d.backslash.t%d.backslash.t%d.backslash.t%d.backslash
.n",freqp[0].sizemod8,
freqp[0].freq, freqp[1].sizemod8,
freqp[1].freq);
else if (k == 3)
printf(".backslash.t%d.backslash.t%d.backslash.t%d.backslash.t%d.backslash
.t%d.backslash.t%d.backslash.n",freqp[0].sizemod8,
freqp[0].freq, freqp[1].sizemod8,
freqp[1].freq, freqp[2].sizemod8,
freqp[2].freq);
}
}
}
The above example program determines the valid solutions by performing the following processing, as shown in FIG. 6, for s from 1 to t (block 600): for selected s-tuples (x.sub.1, x.sub.2, . . . x.sub.s) (block 605) and for B ranging from 1 to p (block 610) find the smallest frequencies (f.sub.1, f.sub.2, . . . f.sub.s) such that (x.sub.1.multidot.f.sub.1 +x.sub.2.multidot.f.sub.2 + . . . +x.sub.s.multidot.f.sub.s)mod N=B (block 615), where N is the largest alignment requirement associated with a data element, t is an upper limit on the number of distinct elements in a valid solution, as discussed above, and p is N-1. The s-tuples and frequencies are then stored in the look up table indexed by B (block 620). In the above example program, t=3 and p=7. The s-tuples are selected such the sums of two or more elements of the selected s-tuples do not equal 2.sup.t or a multiple of 2.sup.t. In the example program above, for which t is 3, the s-tuples are selected such that the sum of two or more elements of the selected s-tuples do not equal 8 or a multiple of 8. In the above example program, the selected s-tuples include: (1,2,3), (1,2,4), (1,3,6), (1,4,5), (1,4,6), (1,5,6), (2,3,4), (2,3,7), (2,4,5), (2,4,7), (2,5,7), (3,4,6), (3,4,7), (4,5,6), (4,6,7), and (5,6,7). In the above example program, the selected 2-tuples include: (1,2), (1,3), (1,4), (1,5), (1,6), (2,3), (2,4), (2,5), (2,7), (3,4), (3,6), (3,7), (4,5), (4,6), (4,7), (5,6), (5,7), and (6,7). In the above example program, the selected 1-tuples include: 1, 2, 3, 4, 5, 6, and 7. After the above program runs, the s-tuples and respective frequencies are stored (block 700) in a look up table 705, as shown in FIG. 7. When it is time to allocate storage for a header and data elements, the look up table 705 is accessed (block 710). The data structure for the N=8 portion of the look up table 805, called freqcountlist in the program listing above, is repeated below for discussion purposes:
typedef struct _freqcount {
unsigned short sizemod8;
unsigned short freq;
} freqcount;
typedef struct _freqcountentry {
unsigned short count;
freqcount freqlist[3];
} freqcountentry;
typedef struct _freqcountlist {
unsigned short count;
freqcountentry entry[22];
} freqcountlist;
The "freqcount" data structure contains a data element modulus object D.sub.n (sizemod8), which contains sizemodN (D.sub.n) where N is the largest alignment requirement associated with any data element, and a frequency object F.sub.n (freq), which contains the number of respective data elements D.sub.n required to be included in a valid solution to equation (1). In the example shown above N=8. For example, if a valid solution requires 3 data elements whose length modulo 8 is 1, then freq=3 and sizemod8 =1 for that entry in the table. The "freqcountentry" data structure contains the number of distinct data elements (count) and the type and frequency of each distinct data element (freqlist) to solve equation (1). For example, if a valid solution requires 2 data elements whose length modulo 8 is 1, 1 data element whose length modulo 8 is 2, and 1 data element whose length modulo 8 is 3, a freqcountentry data element for that valid solution would contain the following:
count = 3 indicates "how many" distinct elements
freqlist[0].sizemod8 = 1 indicates sizemod8 of first distinct element
freqlist[0].freq = 2 indicates frequency of first distinct element
freqlist[1].sizemod8 = 2 indicates value of second distinct element
freqlist[1].freq = 1 indicates frequency of second distinct element
freqlist[2].sizemod8 = 3 indicates sizemod8 of third distinct element
freqlist[2].freq = 1 indicates frequency of third distinct element
The "freqcountlist" data structure contains a hole size data object, which is all of the valid solutions for N=8 and a particular hole size B. The freqcountlist data structure has 22 entries because the maximum number of solutions for any hole size is 22, as shown in Table 3 below, which is one of the byproducts of the program code shown above:
TABLE 3
Number of Valid Solutions
Hole Size N = 2 N = 4 N = 8
1 1 3 22
2 3 20
3 3 22
4 17
5 22
6 20
7 22
The sizemod8[8] array, which is an alignment data object, contains all of the solutions for N=8 and all hole sizes B from 1 to 7. A similar data structure for N=4 is shown below:
typedef struct _freqcount {
unsigned short sizemod4;
unsigned short freq;
} freqcount;
typedef struct _freqcountentry {
unsigned short count;
freqcount freqlist[2];
} freqcountentry;
typedef struct _freqcountlist {
unsigned short count;
freqcountentry entry[3];
} freqcountlist;
freqcountlist sizemod4[3];
Note that freqcount has been limited to 2 freqlist elements because the valid solutions for N=4 never include data elements having more than two sizemod4 values. The freqcountentry has been limited to 3 entry elements because there are never more than 3 valid solutions for a particular hole size, as shown in the table above. Finally, the sizemod4 array has been limited to 3 entries because there are only 3 relevant hole sizes, as shown in the table above. The data stored in these data structures, plus a single entry for N=1, form a lookup table which can be used to allocate storage for aligned data. The lookup table is indexed using N and B. A program would allocate storage space to a header and a set of data elements by accessing the lookup table using N and B, which were calculated earlier, as indexes to retrieve a first set of one or more modulo values M.sub.1,1 . . . R and a frequency F.sub.1,1 . . . R for each modulo value (block 800). The program would then search for a subset T={A.sub.i1, A.sub.i2, . . . , A.sub.in } of data elements from those retrieved such that for every p from 1 to R there is a set of F.sub.1,p data elements such that the size of those data elements modulo N is M.sub.1,p (block 805). If such a subset is found, the subset is the valid solution for the allocation problem and the processing is completed (block 810). If, on the other hand, the search for a subset T fails using the first set of one or more modulo values M.sub.1,1 . . . R and the frequency F.sub.1,1 . . . R for each modulo value, the program would access the lookup table again using N and B as indexes to retrieve a second set of one or more modulo values M.sub.2,1 . . . R and a frequency F.sub.2,1 . . . R for each modulo value (blocks 815 and 800). The program would then search for subset T using M.sub.2,1 . . . R and frequency F.sub.2,1 . . . R (block 805). This cycle would repeat until all of the valid solutions for the calculated N and B have been exhausted. If the search for subset T of data elements fails using the calculated N and B, the program continues by setting B=B-1 (block 820). If B=0 (block 825), the program will stop looking for valid solutions (block 830). Otherwise, the search will continue (block 800). This method will ensure that the total storage allocated for the header and the data elements is optimal. The total size allocated using the very simple approach is H+B+FL, where H is the size of the header, B is the size of the hole and FL is the total size of all the data elements. Let FL=Q*N+R, where Q is a positive number and R is equal to FL modulo N. There is at least one subset of data elements whose sum of the SizeModN function will be equal to R. The above algorithm will identify this sub-set and will move the data elements appropriately, if R.ltoreq.B. If R>B, then the algorithm will try to identify another sub-set (if one exists) such that sum of SizeModN function will yield B, B-1, B-2 and so on. All possible valid solutions have been pre-computed and stored in the look up table. So, this method will find an optimal solution. The following is an additional application of the above result: Consider an example where the header, and TYPE1 data elements are followed by TYPE2 data elements. The TYPE2 data elements may have an alignment requirement of 8, 4, 2 or 1. There is an additional constraint that all the TYPE2 data elements must be stored according to decreasing order of alignment requirement. So data elements that need an alignment of 8 will precede data elements that need an alignment of 4, which in turn will precede data elements that need an alignment of 2 and 1. In this case, it is possible to compute the size of a hole C, which will immediately follow the last TYPE1 data element and precede the first TYPE2 data element. Let the maximum alignment of a TYPE2 data element be CA. The length of C will depend on CA, and may be as high as (CA-1). The value of B depends on N, and may be as high as (N-1), prior to the application of the storage allocation method for TYPE1 data elements. Hence, the combined hole size (B+C) will be (N-1)+(CA-1). Since the maximum values of N and CA may be 8, the combined hole size may be as high as 14. After application of the storage allocation method described here for TYPE1 data elements, the combined hole size (B+C) will be limited by Maximum(N-1, CA-1). Thus, the combined hole size will reduce from 14 to 7 in the worst case. The proof for this will be in 3 cases: (a) N=CA (b) N>CA and (c) N<CA Let the total size of TYPE1 data elements be FL=QN+R, Q>0 and R=FL modulo N. Let the start of TYPE1 and TYPE2 data elements be FS and CS respectively. In all cases, the value of (B+C) will be reduced by CA after applying the storage allocation method for storing TYPE1 data elements. Proof for case (a): N=CA CS=FS+FL+C, implies C=(CA-R), since CS is computed by finding the nearest multiple of (FS+FL) with respect to CA, and CS=N. Suppose B+C>N, then, B+(CA-R)>CA (since C=(CA-R) and N=CA) or, B-R>0, or R<B The storage allocation algorithm will find a subset whose sum of SizeModN will be R, which exists since the total length of TYPE1 data elements is FL=QN+R. Since R is less than B, this subset will be moved to the space earlier allocated to the hole B. Hence, the new value of B will be equal to (B-R), and the new value of C will become 0. The new value of (B+C) has reduced by CA. Proof for case b: N>CA There are 3 sub-cases to consider: In each case, it will be observed that the value of B will exceed the value of R=FL modulo N. Hence, the above method will move these data elements and reduce B by R. The value of C will become 0. The value of (B+C) will be reduced by at least CA. (N=4, CA=2): The values of B, C for which the sum will greater than or equal to N are: (B,C)=(3,1): FL will be either 4Q+1 or 4Q+3. (N=8, CA=2): The values of B, C for which the sum will greater than or equal to N are: (B,C)=(7,1). FL will be either 8Q+1, 8Q+3, 8Q+5 or 8Q+7. (N=8, CA=4): The (B,C) values for which B+C will be greater than or equal to N are: (B,C)=(7,1): FL=8l+3, or 8l+7. (B,C)=(6,2): FL=8l+2, or 8l+6. (B,C)=(7,2): FL=8l+2, or 8l+6. (B,C)=(5,3): FL=8l+1, or 8l+5. (B,C)=(6,3): FL=8l+1, or 8l+5. (B,C)=(7,3): FL=8l+1, or 8l+5. Proof for case c: N<CA There are 3 sub-cases to consider: (N=2, CA=8), (N=4, CA=8) and (N=2, CA=4) There are 3 cases where N<CA. For each case, the combinations of (B,C) are shown below for which B+C will be greater than or equal to Maximum(N,CA)=CA. In each case it will be observed that B>=FL mod FA. Hence the above method will be able to move the subset of data elements into the hole B, so that the value of B will reduce to (B-R). Assume k and l are positive integers below. Let FS indicate the start of the TYPE1 data elements, and FL the total size of TYPE1 data elements. N=2, CA=4. (B, C) combinations are (1, 3). (B,C)=(1, 3): Then FS=4k and FL=4l+1, or FS=4k+2 and FL=4l+3 N=2, CA=8. (B, C) combinations are (1, 7). (B,C)=(1, 7): Then FS=8k and FL=8l+1, or FS=8k+2 and FL=8l+7, or FS=8k+4 and FL=8l+5, or FS=8k+6 and FL=8l+3. FA=4, CA=8. (B, C) combinations are (1, 7), (2, 6), (2, 7), (3, 5), (3, 6), (3, 7). (B,C)=(1, 7): Then FS=8k and FL=8l+1, or FS=8k+4 and FL=8l+5. (B,C)=(2, 6): Then FS=8k and FL=8l+2, or FS=8k+4 and FL=8l+6. (B,C)=(2 ,7): Then FS=8k and FL=8l+1, or FS=8k+4 and FL=8l+5. (B,C)=(3, 5): Then FS=8k and FL=8l+3, or FS=8k+4 and FL=8l+7. (B,C)=(3, 6): Then FS=8k and FL=8l+2, or FS=8k+4 and FL=8l+6. (B,C)=(3, 7): Then FS=8k and FL=8l+1, or FS=8k+4 and FL=8l+5. There is another application of the storage allocation method described here. Originally, it was assumed that TYPE1 data elements have a size that is a multiple of their alignment requirement. Consider an application where this assumption is relaxed. For example, a data element may require alignment of 4, but it may require only 10 bytes of storage. In this case, the simple method of storage allocation (where data elements are stored in descending order of alignment requirement) will create multiple holes, in addition to the hole B. Basically, a hole will be created with each data element whose size is not a multiple of the alignment requirement. The method of storage allocation described here can be applied in an iterative manner to fill up the hole B, and also the other holes associated with a data element whose size is not a multiple of its alignment. This method may be applied to the largest holes first and then to the smaller ones. The storage allocation method has been described where the maximum alignment of a data element is 8. It is possible to generalize this method where the maximum alignment of a data element (N) is higher, say 16 or 32. This will require the following steps: Let S.sub.N be the set {1, 2, 3, . . . , (N-2), (N-1)} Determine an upper limit on the maximum frequency for each element in S.sub.N. This can be done using the method described earlier ("Upper Limit on Number of Data Elements"). Determine an upper limit on the number of distinct elements from S.sub.N that need to be considered in any solution set. Let this upper limit be denoted by "n". Compute the set of valid solutions using a program. The methodology for this program will be similar to the one used to compute valid solutions for N=8. The following is a brief description on computing an upper limit on the number of distinct elements for larger N (N>8). Let f(N) denote the upper limit on the number of distinct elements for N. Suppose we want to find if K distinct elements can be present in a subset of S.sub.N. This will involve the computation done below by CheckDistinct(N, K). It will basically check whether K distinct elements are permitted and return true or false accordingly.
CheckDistinct(N, K)
begin
do /* For each subset S.sub.K = { a.sub.1, a.sub.2, . . . a.sub.K }
containing K elements from S.sub.N */
do /* For each non-empty subset S.sub.L of S.sub.K ( 1 <= L <= K
) where SL = {
b.sub.1, . . . b.sub.L } */
Found = false
SumModN = (b1 + b2 + . . . + bL) mod N
if (SumModN == 0)
Found = true
done /* For each non-empty subset S.sub.L of S.sub.K ( 1 <= L <=
K ) *
if ( found == false )
return true /* There is 1 subset with K distinct elements such that
the sum of 2 or more elements does not add up to N or a
multiple of N */
done /* For each subset S.sub.K = { a.sub.1, a.sub.2, . . . a.sub.K }
containing K elements from S.sub.N */
return false; /* Every subset with K distinct elements can be reduced to
one with
fewer distinct elements, since the sum of 2 or more elements adds
to N or a multiple of N */
end /* CheckDistinct */
Thus, f(N)=n, where CheckDistinct(N, n) returns true and CheckDistinct(N, n+1) returns false. Intuitively, and on the basis of the results found for most small values of N, f(N) will satisfy the following relation: f(N)=maxK where K*(K+1)/2-1<=N Example: N=9. Here S.sub.N ={1, 2, 3, 4, 5, 6, 7, 8} It is easy to see that any subset may contain 1 or 8, but not both. Similarly it may contain 2 or 7, but not both, 3 or 6 but not both and 4 or 5 but not both. Hence, the maximum number of distinct elements will not exceed 4. Two sub-sets with 4 distinct elements are {1, 3, 4, 7} and {8, 6, 5, 2}. Here K=4 and it satisfies the condition that K*(K+1)/2-1<=N. This equation for f(N) provides useful information for small values of N (6<=N<=20). It is believed that the equation will also provide useful information for some larger values of N. The various implementations of the invention are realized in electronic hardware, computer software, or combinations of these technologies. Most implementations include one or more computer programs executed by a programmable computer. In general, the computer includes one or more processors, one or more data-storage components (e.g. volatile and nonvolatile memory modules and persistent optical and magnetic storage devices, such as hard and floppy disk drives, CD-ROM drives, and magnetic tape drives), one or more input devices (e.g., mice and keyboards), and one or more output devices (e.g., display consoles and printers). The computer programs include executable code that is usually stored in a persistent storage medium and then copies into memory at run-time. The processor executes the code by retrieving program instructions from memory in a prescribed order. When executing the program code, the computer receives data from the input and/or storage devices, performs operations on the data, and then delivers the resulting data to the output and/or storage devices. The text above described one or more specific embodiments of a broader invention. The invention also is carried out in a variety of alternative embodiments and thus is not limited to those described here. For example, while the invention has been described in relation to a system employing a 64 bit processor, systems using other types of processors would benefit from the invention. Many other embodiments are also within the scope of the following claims.
|
Same subclass Same class Consider this |
||||||||||
