Method for interacting with a test subject with respect to knowledge and functionality (6301571)
Abstract
The invention is a method for interacting with a test subject with respect to knowledge or functionality characterized by a plurality of states in one or more domains. A domain is a set of facts, a set of values, or a combination of a set of facts and a set of values. The set of facts for a knowledge domain is any set of facts. The set of facts for a functionality domain is a set of facts relating to the functionality of a test subject. A state is denoted as a fact state, a value state, or a combination state, a fact state being characterized by a subset of facts, a value state being characterized by a subset of values, and a combination state being characterized by a combination of a subset of facts and a subset of values. The method consists of specifying one or more domains, specifying a domain pool for each domain comprising a plurality of test item blocks consisting of one or more test items, specifying a class conditional density for each test item in each test item block for each state in each domain, selecting one or more test item blocks from the one or more domain pools to be administered to a test subject, and processing the responses of the test subject to the one or more test item blocks administered to the test subject.
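The higher-lower-neither relationship among fact states can be illustrated with a minimal sketch. It assumes each fact state is represented as a set of fact labels and applies the inclusion rule of claim 1: a state is higher than another when its subset of facts includes the other's. Value states and combination states are not covered, and the fact labels and function names are hypothetical.

```python
from itertools import combinations


def relation(first: frozenset, second: frozenset) -> str:
    """Classify `first` relative to `second` for two fact states.

    A fact state is higher than or equal to another when its subset of
    facts includes the other state's subset of facts.
    """
    if first == second:
        return "equal"
    if first > second:  # proper superset: first includes every fact of second
        return "higher"
    if first < second:  # proper subset
        return "lower"
    return "neither"


def relationship_table(states: dict) -> dict:
    """Return the higher/lower/neither relation for every ordered pair of states."""
    table = {}
    for a, b in combinations(states, 2):
        table[(a, b)] = relation(states[a], states[b])
        table[(b, a)] = relation(states[b], states[a])
    return table


# Hypothetical domain: facts "add" and "sub" are each built on "count".
states = {
    "s3": frozenset({"count"}),
    "s1": frozenset({"count", "add"}),
    "s2": frozenset({"count", "sub"}),
}
table = relationship_table(states)
print(table[("s1", "s3")], table[("s1", "s2")])  # higher neither
```

The three hypothetical states reproduce case (1) of step (a) in claim 1: s1 and s2 are each higher than s3, and s1 is neither higher nor lower than s2.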
Claims
What is claimed is:
1. A method for interacting with a test subject with respect to knowledge or functionality characterized by a plurality of states in one or more domains, a domain being a set of facts, a set of values, or a combination of a set of facts and a set of values, the set of facts for a knowledge domain being any set of facts, the set of facts for a functionality domain being a set of facts relating to the functionality of a test subject, a state being denoted as a fact state, a value state or a combination state, a fact state being characterized by a subset of facts, a value state being characterized by a subset of values, a combination state being characterized by a combination of a subset of facts and a subset of values, a first state being higher than or equal to a second state and a second state being lower than or equal to a first state if (1) the subset of facts or a subset of values associated with the first state respectively includes the subset of facts or is greater than or equal to the subset of values associated with the second state or (2) the subset of facts and the subset of values associated with the first state respectively includes the subset of facts and is greater than or equal to the subset of values associated with the second state, the method comprising the steps:
(a) specifying one or more domains where each domain comprises a plurality of states and determining the higher-lower-neither relationships for each state in each domain, the higher-lower-neither relationships for a state being a specification of which states are higher, which states are lower, and which states are neither higher nor lower, the plurality of states for at least one domain including a first, second, and third fact state characterized by subsets of facts wherein (1) the first and second fact states are higher than the third fact state and the first fact state is neither higher nor lower than the second fact state or (2) the first fact state is higher than the second and third fact states and the second fact state is neither higher nor lower than the third fact state;
(b) specifying a domain pool for each domain comprising a plurality of test item blocks, a test item block consisting of one or more test items, a test item administered to a test subject resulting in one of a plurality of possible responses;
(c) specifying a class conditional density $f_{ibd}(x \mid s)$ for each test item i in test item block b for domain d for each state s in each domain, a class conditional density being a specification of the probability of a test subject in state s of domain d providing a response x to the test item i in the test item block b, each test item partitioning one or more domains into a plurality of partitions according to the class conditional densities associated with the test item, a partition being a subset of states for which the class conditional densities are the same or the union of such subsets;
(d) selecting one or more test item blocks from the one or more domain pools to be administered to a test subject;
(e) processing the responses of the test subject to the one or more test item blocks administered to the test subject, the relationship of the test subject to domains being representable by a state probability set (SPS); and
(z) repeating the method from step (d) until the method termination criteria are satisfied.
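Steps (c) and (e) of claim 1 can be illustrated with a minimal sketch that treats the state probability set (SPS) as a discrete probability distribution over states and updates it with Bayes' rule, using the class conditional densities f_ibd(x | s) of a single item. Bayes' rule is an assumption here: the claims refer to posterior probabilities but do not spell out the update formula. The item table and function names are hypothetical.

```python
def update_sps(sps: dict, density: dict, response) -> dict:
    """Return the posterior SPS after observing `response` to one test item.

    sps: prior probability of each state.
    density: density[s][x] = probability that a subject in state s gives
             response x (the class conditional density for this item).
    """
    unnormalized = {s: p * density[s][response] for s, p in sps.items()}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("response has zero probability under every state")
    return {s: v / total for s, v in unnormalized.items()}


# Hypothetical item: subjects in s1 give the correct response (x = 1) with
# probability 0.9, subjects in s2 or s3 with probability 0.2 (guessing).
item = {
    "s1": {1: 0.9, 0: 0.1},
    "s2": {1: 0.2, 0: 0.8},
    "s3": {1: 0.2, 0: 0.8},
}
prior = {"s1": 1 / 3, "s2": 1 / 3, "s3": 1 / 3}
posterior = update_sps(prior, item, response=1)
```

Because s2 and s3 share the same class conditional density, this item partitions the domain into {s1} and {s2, s3}, and the update can never change the ratio of their probabilities.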
2. The method of claim 1 wherein step (a) comprises the step:
(a1) determining the intersections of the partitions of states by one or more hypothetical test item blocks with hypothetical partitions.
3. The method of claim 1 wherein step (a) comprises the steps:
(a1) determining the intersections of the partitions of states by the test item blocks in the domain pool; and
(a2) replacing a first domain configuration with a second domain configuration, the second domain configuration states being the intersections of the partitions of the first domain configuration states by the test item blocks, the higher-lower-neither relationships of the second domain configuration states being derived from the higher-lower-neither relationships of the first domain configuration states.
4. The method of claim 3 wherein step (b) further comprises the step:
(b2) adding new types of test item blocks to the test item pool to increase the number of intersections of the partitions.
5. The method of claim 1 wherein in step (a) a state is removed from a domain if the number of test subjects in a specified population satisfying a condition is less than a specified number, the condition being that a test subject's posterior probability for the state is less than a specified threshold.
6. The method of claim 1 wherein step (b) comprises the step:
(b1) determining the intersections of the partitions of states by one or more test item blocks in a domain pool.
7. The method of claim 1 wherein step (b) comprises the steps:
(b1) determining the partition of states by test item block 1 in a domain pool; and
(b2) determining the intersections of the partition of states by test item block n in a domain pool with the intersections of the partitions of states by test item blocks 1 through n-1 in the domain pool, n taking on successive values of 2 through N, N being an integer.
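Claims 6 and 7 can be sketched as follows, assuming each test item block's class conditional densities are given as a mapping from state to a hashable density description. States with identical densities fall in the same partition cell, following the definition of a partition in step (c) of claim 1, and the running intersection is the common refinement of all the block partitions. The pool contents and function names are hypothetical.

```python
from collections import defaultdict


def partition(block_densities: dict) -> list:
    """Group states whose class conditional densities for the block are identical."""
    cells = defaultdict(set)
    for state, dens in block_densities.items():
        cells[dens].add(state)
    return [frozenset(c) for c in cells.values()]


def intersect(p: list, q: list) -> list:
    """All non-empty pairwise intersections of the cells of two partitions."""
    return [a & b for a in p for b in q if a & b]


def refine_over_pool(pool: list) -> list:
    """Intersect the partition by block 1 with those of blocks 2..N in turn."""
    result = partition(pool[0])
    for block in pool[1:]:
        result = intersect(result, partition(block))
    return result


# Hypothetical pool of two single-item blocks over four states; each density
# is written as (P(correct), P(incorrect)).
pool = [
    {"s1": (0.9, 0.1), "s2": (0.9, 0.1), "s3": (0.2, 0.8), "s4": (0.2, 0.8)},
    {"s1": (0.9, 0.1), "s2": (0.2, 0.8), "s3": (0.9, 0.1), "s4": (0.2, 0.8)},
]
cells = refine_over_pool(pool)
```

Neither block alone separates all four states, but their intersections do. States that still share a cell after the whole pool is processed cannot be told apart by any block in the pool, which is the situation claim 4 addresses by adding new types of test item blocks.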
8. The method of claim 1 wherein step (b) comprises the steps:
(b1) determining the sharpness of a test item block from a domain pool, sharpness being a measure of the capability of a test item block to discriminate between test subjects in different states, sharpness being measured by use of one or more discrepancy measures; and
(b2) removing the test item block from the domain pool if its sharpness does not satisfy a predetermined criterion.
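Claim 8 leaves the discrepancy measure open. The sketch below assumes the total variation distance between class conditional densities as the discrepancy measure, takes a block's discrepancy for a pair of states as the largest item-level discrepancy, and defines sharpness as the smallest nonzero discrepancy over all pairs of states. These choices, the threshold, and the function names are illustrative assumptions only.

```python
from itertools import combinations


def total_variation(p: dict, q: dict) -> float:
    """Total variation distance between two discrete response distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys)


def block_sharpness(block: list) -> float:
    """Smallest nonzero discrepancy the block achieves between two states.

    block: list of items; item[s][x] = class conditional probability of
    response x in state s. Returns 0.0 if the block separates no states.
    """
    states = list(block[0])
    separated = []
    for s, t in combinations(states, 2):
        d = max(total_variation(item[s], item[t]) for item in block)
        if d > 0:
            separated.append(d)
    return min(separated) if separated else 0.0


def prune_pool(pool: dict, threshold: float) -> dict:
    """Keep only the blocks whose sharpness meets the threshold."""
    return {name: blk for name, blk in pool.items() if block_sharpness(blk) >= threshold}


sharp = [{"s1": {1: 0.9, 0: 0.1}, "s2": {1: 0.2, 0: 0.8}}]
dull = [{"s1": {1: 0.55, 0: 0.45}, "s2": {1: 0.5, 0: 0.5}}]
kept = prune_pool({"sharp": sharp, "dull": dull}, threshold=0.3)
```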
9. The method of claim 1 wherein step (b) comprises the step:
(b1) hypothetically administering hypothetical test item blocks with hypothetical partitions and hypothetical class conditional densities.
10. The method of claim 1 wherein step (c) comprises the steps:
(c1) specifying one or more prior parameter distribution functions for each of a collection of test items, the class conditional densities for the test items being determinable from the parameter distribution functions;
(c2) obtaining a sequence of responses to a sequence of test item blocks from the domain pool by each of a plurality of training-sample test subjects;
(c3) updating the SPS of each of one or more of the plurality of training-sample test subjects based on a sequence of responses using an initial SPS and the class conditional densities;
(c4) determining each training-sample test subject's tentative classification in at least one domain;
(c5) updating the parameter distribution functions utilizing the one or more training-sample test subjects' tentative classifications to obtain the current parameter distribution functions; and
(c6) repeating steps (c3), (c4), (c5), and (c6) for active parameter distribution functions, an active parameter distribution function being a parameter distribution function for which a repeat termination rule has not been satisfied, random sampling from an SPS being used at least once in determining a training-sample test subject's tentative classification while repeating steps (c3), (c4), (c5), and (c6).
11. The method of claim 1 wherein step (c) comprises the steps:
(c1) identifying test items having questionable class conditional densities, a questionable class conditional density being indicated by a sharpness criterion not being satisfied; and
(c2) changing a class conditional probability density of one or more test items to achieve greater sharpness.
12. The method of claim 1 wherein in step (c) class conditional densities are dependent on test subject-related factors in addition to a test subject's knowledge or functionality.
13. The method of claim 1 wherein step (e) further comprises the step:
(e1) specifying an initial SPS for the test subject with respect to a domain.
14. The method of claim 1 wherein a domain pool includes a multi-item test item block consisting of a plurality of test items.
15. The method of claim 14 wherein the total number of multi-item test item blocks administered for one domain or a combination of two or more domains equals a predetermined number.
16. The method of claim 1 wherein in step (c) the class conditional density for a test item is a function of a difficulty parameter which is a measure of the difficulty that a test subject will have in providing the best response to the test item, the probability of a test subject providing the best response to the test item decreasing as the difficulty parameter varies in the direction of greater difficulty of the test item.
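One way to make a class conditional density depend on a difficulty parameter, as in claim 16, is a logistic curve in which the probability of the best response falls as the difficulty rises. The logistic form (the same shape as the two-parameter logistic model of item response theory), the per-state ability values, and the parameter names below are assumptions for illustration; the claim only requires the monotone decrease.

```python
import math


def p_best_response(ability: float, difficulty: float, slope: float = 1.0) -> float:
    """Probability of the best response; decreases as `difficulty` increases."""
    return 1.0 / (1.0 + math.exp(-slope * (ability - difficulty)))


def class_conditional_density(state_ability: dict, difficulty: float) -> dict:
    """Build f(x | s) for a binary item: x = 1 (best response) or x = 0."""
    density = {}
    for s, a in state_ability.items():
        p = p_best_response(a, difficulty)
        density[s] = {1: p, 0: 1.0 - p}
    return density


# Hypothetical ability values attached to three states.
abilities = {"low": -1.0, "mid": 0.0, "high": 1.5}
easy = class_conditional_density(abilities, difficulty=-0.5)
hard = class_conditional_density(abilities, difficulty=1.0)
```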
17. The method of claim 1 wherein in step (d) the selection of a test item block is in accordance with a test item block sequence generated in accordance with specified sequence generation rules.
18. The method of claim 1 wherein one or more strategy trees are defined for each of one or more domains, a strategy tree comprising a plurality of paths with each path beginning with the first test item block to be administered, continuing through a sequence alternating between a particular response to the last test item block and the specification of the next test item block, and ending with a particular response to the final test item block in the path, step (d) comprising the steps:
(d1) selecting a strategy tree based on a comparative evaluation of a plurality of the defined strategy trees utilizing one or more item objective functions, an item objective function providing a measure of effectiveness of a test item in classifying a test subject in a domain; and
(d2) selecting the test item block by consulting the strategy tree selected in step (d1).
19. The method of claim 1 wherein one or more strategy trees are defined for each of one or more domains, a strategy tree comprising a plurality of paths with each path beginning with the first test item block to be administered, continuing through a sequence alternating between a particular response to the last test item block and the specification of the next test item block, and ending with a particular response to the final test item block in the path, step (d) comprising the steps:
(d1) selecting in a random manner a strategy tree from a plurality of the defined strategy trees; and
(d2) selecting the test item block by consulting the strategy tree selected in step (d1).
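A strategy tree as described in claims 18 and 19 can be represented as nested nodes, each naming a test item block and mapping each possible response to the next node. The sketch below shows step (d2), consulting the tree, and step (d1) of claim 19, picking a tree at random. The `Node` class, block names, and response codes are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    block: str  # test item block to administer at this point in the path
    children: dict = field(default_factory=dict)  # response -> next Node


def next_block(tree: Node, responses: list) -> Optional[str]:
    """Follow the path given by the responses so far; None means the path has ended."""
    node = tree
    for r in responses:
        node = node.children.get(r)
        if node is None:
            return None
    return node.block


# Hypothetical trees: a correct response (1) or an incorrect one (0) leads to
# a different next block.
tree_a = Node("B1", {1: Node("B3"), 0: Node("B2")})
tree_b = Node("B2", {1: Node("B1"), 0: Node("B4")})

rng = random.Random(0)
chosen = rng.choice([tree_a, tree_b])  # claim 19: random choice among the defined trees
```

Claim 18 would replace the random choice with a comparison of the trees under one or more item objective functions.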
20. The method of claim 1 wherein in step (d) the domain pool from which a test item block is to be selected is chosen from the group consisting of (1) the domain pool associated with the next domain in a specified domain sequence, (2) the domain pool associated with a domain chosen randomly, (3) the domain pool associated with a domain chosen on the basis of one or more uncertainty measures, (4) the domain pool associated with a domain chosen on the basis of one or more ranking measures, (5) the domain pool associated with a domain chosen on the basis of the values of one or more loss functions, (6) the domain pool associated with a domain chosen on the basis of the values of one or more SPSs, and (7) the domain pool associated with a domain chosen by a process dependent on the prior satisfaction of one or more stopping rules.
21. The method of claim 1 wherein in step (d) the selection of a test item block is based on an objective function that is a function of one or more objective functions.
22. The method of claim 1 wherein step (d) comprises the steps:
(d1) selecting the test item block by consulting a strategy tree if a strategy tree is available, a strategy tree comprising a plurality of paths with each path beginning with the first test item block to be administered, continuing through a sequence alternating between a particular response to the last test item block and the specification of the next test item block, and ending with a particular response to the final test item block in the path, the specification of each test item block in a strategy tree being based on a comparative evaluation of specified collections of test item blocks in one or more domain pools; otherwise,
(d2) performing a comparative evaluation of specified collections of test item blocks in one or more domain pools; and
(d3) selecting the test item block based on the results of the comparative evaluation of step (d2).
23. The method of claim 22 wherein in step (d2) multi-item test item blocks are compared, a multi-item test item block consisting of a plurality of test items.
24. The method of claim 22 wherein the specified collection for a domain consists of those test item blocks that have not yet been selected for administration to the test subject.
25. The method of claim 22 further comprising the steps:
(d4) determining for a test item block in a domain pool the weighted frequency and/or the probability of being selected; and
(d5) removing a test item block from the domain pool if the weighted frequency and/or the probability of being selected is less than a predetermined value.
26. The method of claim 22 wherein a truncated strategy tree is obtained by removing one or more test item blocks at the path ends of a specified strategy tree if the weighted loss in administering test items for the truncated strategy tree is less than the weighted loss for the specified strategy tree, the weighted loss for a strategy tree being obtained by weighting a loss function over paths in the strategy tree and test subject states, the loss function being a measure of the loss associated with administering the test items in a path of the strategy tree.
27. The method of claim 26 wherein the loss function is a function of (1) the state of a domain, (2) a classification decision action that specifies a state, and (3) the number of test item blocks administered.
28. The method of claim 26 wherein the loss function consists of two additive components, the first component being a measure of the loss associated with the classification of the test subject after administering one or more additional test item blocks, the loss associated with an incorrect classification being higher than the loss associated with a correct classification, the second component being the cost of administering the one or more additional test item blocks.
29. The method of claim 28 wherein the first component of the loss function is (1) a constant $A_1(s)$ if the test subject would be classified correctly after administering the one or more additional test item blocks and (2) a constant $A_2(s)$ if the test subject would be classified incorrectly after administering the one or more additional test item blocks, the constants $A_1(s)$ and $A_2(s)$ having a possible dependence on the state s, the second component of the loss function being the sum of the individual costs of administering the one or more additional test item blocks.
30. The method of claim 22 wherein there are a plurality of domains and the comparative evaluation utilizes a domain objective function, the domain objective function being a function of one or more block objective functions, a block objective function being a function of one or more item objective functions, a second function being a function of a first function includes the second function being identical to the first function, an item objective function providing a measure of effectiveness of a test item in classifying a test subject in a domain, a block objective function providing a measure of effectiveness of a test item block in classifying a test subject in a domain, a domain objective function providing a measure of effectiveness of a test item block in classifying a test subject in a plurality of domains.
31. The method of claim 30 wherein at least one of the item objective functions is a weighted loss function given the hypothetical administration of a sequence of k test items, k being an integer, a loss function being a function of (1) a state in the domain, (2) a classification decision action that specifies a state, and (3) the number k of test items to be administered.
32. The method of claim 30 wherein at least one of the item objective functions is a function of a test item difficulty parameter and a state, the difficulty parameter being a measure of the difficulty that a test subject will have in providing the best response to a test item.
33. The method of claim 30 wherein at least one of the item objective functions is a weighted uncertainty measure, an uncertainty measure gauging the uncertainty as to which of the test item's partitions the test subject is in, an uncertainty measure being smallest when all but one of the partition probabilities are near 0, a partition probability being the probability of the test subject being in the partition.
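Shannon entropy is one uncertainty measure with the property claim 33 requires: it is zero when one partition probability is 1 and all the others are 0. In the sketch, partition probabilities are obtained by summing SPS probabilities over the states in each cell of an item's partition. Entropy is an assumed choice, since the claim does not name a specific measure, and the state and cell values are hypothetical.

```python
import math


def partition_probabilities(sps: dict, cells: list) -> list:
    """Probability that the subject lies in each cell of an item's partition."""
    return [sum(sps[s] for s in cell) for cell in cells]


def entropy(probs: list) -> float:
    """Shannon entropy in bits; 0 when one probability is 1 and the rest are 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


sps = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
item_cells = [frozenset({"s1"}), frozenset({"s2", "s3"})]
h = entropy(partition_probabilities(sps, item_cells))
```

Here the subject is equally likely to be in either cell, so the uncertainty is one full bit.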
34. The method of claim 30 wherein at least one of the item objective functions is a weighted uncertainty measure, an uncertainty measure being smallest and the test item being best when all but one of the SPS probability density values are near 0.
35. The method of claim 30 wherein at least one of the item objective functions is a weighted uncertainty measure, an uncertainty measure being smallest and the first test item in a sequence of test items being most effective when all but one of the probability density values are near 0 after the hypothetical administration of the sequence of test items.
36. The method of claim 30 wherein at least one of the item objective functions is a weighted distance measure between the SPS after a hypothetical administration of a sequence of test items and the SPS prior to the hypothetical administration of the sequence of test items, the distance measure being a measure of the differences in the two SPSs.
37. The method of claim 30 wherein at least one of the item objective functions is a weighted discrepancy measure summed over pairs of states, a discrepancy measure for a test item given two states being a measure of the distance between the class conditional densities for the test item and the two states.
38. The method of claim 30 wherein at least one of the item objective functions is a two-valued function $\Phi$, the function $\Phi$ being a function of (1) a test item and (2) a first state and a second state, $\Phi$ having a first value if the test item separates the first and second states, $\Phi$ having a second value if the test item does not separate the first and second states.
39. The method of claim 38 wherein $\Phi$ has a first value for a plurality of the test items for a specified first state and a specified second state, the test item being selected in a random manner from the plurality of test items.
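Claims 38 and 39 can be sketched with a function that returns 1 when an item separates two states (their class conditional densities differ, so they fall in different cells of the item's partition) and 0 otherwise; an item is then drawn at random among those that separate the states. Encoding the two values as 1 and 0, and the item table and names, are assumptions for illustration.

```python
import random


def phi(item: dict, first: str, second: str) -> int:
    """Two-valued function: 1 if the item separates the two states, 0 otherwise."""
    return 1 if item[first] != item[second] else 0


def pick_separating_item(items: dict, first: str, second: str, rng: random.Random):
    """Choose at random among the items for which phi takes its first value (1)."""
    candidates = [name for name, item in items.items() if phi(item, first, second) == 1]
    return rng.choice(candidates) if candidates else None


# Hypothetical items; each density is written as (P(correct), P(incorrect)).
items = {
    "i1": {"s1": (0.9, 0.1), "s2": (0.2, 0.8)},
    "i2": {"s1": (0.9, 0.1), "s2": (0.9, 0.1)},
    "i3": {"s1": (0.8, 0.2), "s2": (0.3, 0.7)},
}
choice = pick_separating_item(items, "s1", "s2", random.Random(1))
```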
40. The method of claim 30 wherein at least one of the item objective functions is the sum of $\pi(j)\pi(k)d_{jk}(i)$ over all states j and k in the domains for which an SPS is specified, $\pi(j)$ denoting the members of the SPS, $d_{jk}(i)$ denoting a measure of the degree of discrimination between states j and k provided by test item i as measured by a discrepancy measure on the corresponding class conditional densities.
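The item objective function of claim 40 weights the discrepancy between every pair of states by the product of their SPS probabilities. The sketch sums over all ordered pairs of states as the claim states (pairs with j = k contribute 0) and assumes total variation distance as the discrepancy measure d_jk(i). Taking the item with the largest value as the most discriminating one is an illustrative reading; the item tables are hypothetical.

```python
def total_variation(p: dict, q: dict) -> float:
    """Discrepancy between two class conditional densities (assumed measure)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys)


def pairwise_discrimination(sps: dict, item: dict) -> float:
    """Sum of pi(j) * pi(k) * d_jk(i) over all states j and k."""
    return sum(
        sps[j] * sps[k] * total_variation(item[j], item[k])
        for j in sps
        for k in sps
    )


sps = {"s1": 0.5, "s2": 0.5}
good = {"s1": {1: 0.9, 0: 0.1}, "s2": {1: 0.1, 0: 0.9}}
weak = {"s1": {1: 0.6, 0: 0.4}, "s2": {1: 0.4, 0: 0.6}}
candidates = {"good": good, "weak": weak}
best = max(candidates, key=lambda name: pairwise_discrimination(sps, candidates[name]))
```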
41. The method of claim 30 wherein at least one of the item objective functions is a weighted loss function for k=1, a loss function being a function of (1) a state in a domain, (2) a classification decision action that specifies a state, and (3) the number k of test items to be administered.
42. The method of claim 30 wherein at least one of the item objective functions is a loss function consisting of two additive components, the first component being a measure of the loss associated with the classification of the test subject after administering k test items, the loss associated with an incorrect classification being higher than the loss associated with a correct classification, the second component being the cost of administering the k test items.
43. The method of claim 42 wherein the first component of the loss function is (1) a constant $A_1(s)$ if the test subject would be classified correctly after administering k additional test items and (2) a constant $A_2(s)$ if the test subject would be classified incorrectly after administering k additional test items, the constants $A_1(s)$ and $A_2(s)$ having a possible dependence on the state s, the second component of the loss function being the sum of the individual costs of administering the k additional test items.
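For k = 1 (claim 41), the weighted loss with the two additive components of claims 42 and 43 can be computed by hypothetically administering a single item. For each possible response the subject is classified into the state with the largest posterior probability, and the classification loss is averaged over states (weighted by the SPS) and responses (weighted by the class conditional densities); the item's cost is then added. Classifying by the posterior mode, holding the constants A_1 and A_2 fixed across states, and the cost value are all assumptions for illustration.

```python
def expected_loss_one_item(sps, item, cost, a1=0.0, a2=1.0):
    """Weighted loss of administering one item, then classifying (k = 1).

    sps: probability of each state before the item.
    item: item[s][x] = class conditional probability of response x in state s.
    cost: cost of administering the item (second loss component).
    a1, a2: loss for a correct / incorrect classification (first component),
            here taken as constant across states.
    """
    responses = {x for dens in item.values() for x in dens}
    loss = 0.0
    for x in responses:
        # Joint probability of each state and response x (proportional to the posterior).
        joint = {s: sps[s] * item[s].get(x, 0.0) for s in sps}
        if sum(joint.values()) == 0:
            continue
        decision = max(joint, key=joint.get)  # classify into the posterior mode
        for s, p in joint.items():
            loss += p * (a1 if s == decision else a2)
    return loss + cost


sps = {"s1": 0.5, "s2": 0.5}
sharp_item = {"s1": {1: 0.9, 0: 0.1}, "s2": {1: 0.1, 0: 0.9}}
flat_item = {"s1": {1: 0.5, 0: 0.5}, "s2": {1: 0.5, 0: 0.5}}
```

With equal costs, the sharp item yields the smaller weighted loss because its responses rarely lead to a misclassification.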
44. The method of claim 30 wherein at least one of the item objective functions is based on the Fisher information function.
45. The method of claim 30 wherein at least one of the item objective functions is a precision function.
46. The method of claim 30 wherein the domain objective function changes when one or more domain-objective-function criteria are satisfied.
47. The method of claim 46 wherein at least one of the domain-objective-function criteria is based on an uncertainty measure.
48. The method of claim 46 wherein at least one of the domain-objective-function criteria is based on one or more stopping rules.
49. The method of claim 22 wherein there is only one domain and the comparative evaluation utilizes a block objective function, the block objective function being a function of one or more item objective functions, a second function being a function of a first function includes the second function being identical to the first function, an item objective function providing a measure of effectiveness of a test item in classifying a test subject in a domain, a block objective function providing a measure of effectiveness of a test item block in classifying a test subject in a domain.
50. The method of claim 49 wherein at least one of the item objective functions is a weighted loss function given the hypothetical administration of a sequence of k test items, k being an integer, a loss function being a function of (1) a state in the domain, (2) a classification decision action that specifies a state, and (3) the number k of test items to be administered.
51. The method of claim 49 wherein at least one of the item objective functions is a function of a test item difficulty parameter and a state, the difficulty parameter being a measure of the difficulty that a test subject will have in providing the best response to a test item.
52. The method of claim 49 wherein at least one of the item objective functions is a weighted uncertainty measure, an uncertainty measure being a measure of the uncertainty as to which of the test item's partitions the test subject is in after the administration of a test item, an uncertainty measure being smallest and the test item being most effective when all but one of the partition probabilities are near 0, a partition probability being the probability of the test subject being in the partition.
53. The method of claim 49 wherein at least one of the item objective functions is a weighted uncertainty measure, an uncertainty measure being smallest and the test item being best when all but one of the SPS probability density values are near 0 after the hypothetical administration of the test item.
54. The method of claim 49 wherein at least one of the item objective functions is a weighted uncertainty measure, an uncertainty measure being smallest and the first test item in a sequence of test items being most effective when all but one of the SPS probability density values are near 0 after the hypothetical administration of the sequence of test items.
55. The method of claim 49 wherein at least one of the item objective functions is a weighted distance measure between the SPS after a hypothetical administration of a test item and the SPS prior to the hypothetical administration of the test item, the distance measure being a measure of the differences in the two SPSs.
56. The method of claim 49 wherein at least one of the item objective functions is a weighted distance measure between the SPS after a hypothetical administration of a sequence of test items and the SPS prior to the hypothetical administration of the sequence of test items, the distance measure being a measure of the differences in the two SPSs.
57. The method of claim 49 wherein at least one of the item objective functions is a weighted discrepancy measure summed over pairs of states, a discrepancy measure for a test item given two states being a measure of the distance between the class conditional densities for the test item and the two states.
58. The method of claim 49 wherein at least one of the item objective functions is a two-valued function $\Phi$, the function $\Phi$ being a function of (1) a test item and (2) a first state and a second state, $\Phi$ having a first value if the test item separates the first and second states, $\Phi$ having a second value if the test item does not separate the first and second states.
59. The method of claim 58 wherein $\Phi$ has a first value for a plurality of the test items for a specified first state and a specified second state, the test item being selected in a random manner from the plurality of test items.
60. The method of claim 49 wherein at least one of the item objective functions is the sum of $\pi(j)\pi(k)d_{jk}(i)$ over all states j and k in the domains for which an SPS is specified, $\pi(j)$ denoting the members of the SPS, $d_{jk}(i)$ denoting a measure of the degree of discrimination between states j and k provided by test item i as measured by a discrepancy measure on the corresponding class conditional densities.
61. The method of claim 49 wherein at least one of the item objective functions is a weighted loss function for k=1, a loss function being a function of (1) a state in a domain, (2) a classification decision action that specifies a state, and (3) the number k of test items to be administered.
62. The method of claim 49 wherein at least one of the item objective functions is a loss function consisting of two additive components, the first component being a measure of the loss associated with the classification of the test subject after administering k test items, the loss associated with an incorrect classification being higher than the loss associated with a correct classification, the second component being the cost of administering the k test items.
63. The method of claim 62 wherein the first component of the loss function is (1) a constant $A_1(s)$ if the test subject would be classified correctly after administering k additional test items and (2) a constant $A_2(s)$ if the test subject would be classified incorrectly after administering k additional test items, the constants $A_1(s)$ and $A_2(s)$ having a possible dependence on the state s, the second component of the loss function being the sum of the individual costs of administering the k additional test items.
64. The method of claim 49 wherein at least one of the item objective functions is based on the Fisher information function.
65. The method of claim 49 wherein at least one of the item objective functions is a precision function.
66. The method of claim 1 wherein in step (d) a test item block is tentatively selected using a predetermined selection rule, a random decision being made either to confirm the selection of the tentatively-selected test item block or to select another test item block.
67. The method of claim 66 wherein the test item blocks are ordered according to an effectiveness criterion associated with the predetermined selection rule, the tentatively-selected test item block being the most effective test item block, a plurality of the next-in-order test item blocks being denoted as the better test item blocks, one of the better test item blocks being selected for administration if the decision is made to select a test item block other than the tentatively-selected test item block.
68. The method of claim 67 wherein the selection of one of the better test item blocks is randomly made, the random selection being biased in accordance with the order of the better test item blocks.
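Claims 66 through 68 can be sketched as follows, assuming the test item blocks have already been ranked from most to least effective under some predetermined selection rule. The top block is confirmed with a fixed probability; otherwise one of the next-in-order ("better") blocks is drawn with weights that favor higher-ranked blocks. The confirmation probability, the number of better blocks, the linear rank weights, and the block names are hypothetical.

```python
import random


def randomized_select(blocks_by_effectiveness, p_confirm, n_better, rng):
    """Confirm the tentatively selected (most effective) block with probability
    p_confirm; otherwise choose among the next n_better blocks, with the random
    choice biased toward blocks earlier in the order."""
    best = blocks_by_effectiveness[0]
    better = blocks_by_effectiveness[1:1 + n_better]
    if not better or rng.random() < p_confirm:
        return best
    # Rank-biased weights n, n-1, ..., 1 for the better blocks.
    weights = [len(better) - r for r in range(len(better))]
    return rng.choices(better, weights=weights, k=1)[0]


ranked = ["B7", "B2", "B9", "B4"]
rng = random.Random(42)
picks = [randomized_select(ranked, p_confirm=0.7, n_better=2, rng=rng) for _ in range(1000)]
```

Randomizing in this way trades a little effectiveness per selection for less predictable, more evenly spread use of the pool.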
69. The method of claim 1 wherein in step (d) each of a plurality of test item block selection rules produces a candidate test item block, the test item block selected for administration being a random selection from the plurality of candidate test item blocks.
70. The method of claim 1 wherein in step (d) the selected test item block is the test item block that maximizes a weighted relative ranking measure based on a plurality of test item block selection rules, a weighted relative ranking measure being a weighted function of the relative rankings of effectiveness for each test item block with respect to a plurality of item selection rules.
71. The method of claim 1 wherein step (d) comprises the steps:
(d1) selecting a test item block on the basis of specified rules of selection; and
(d2) rejecting the test item block with a probability based on an estimate of the exposure rate of the test item block, the exposure rate being a function of one or more state-specific exposure rates, a rejection of a test item block being followed by repeating steps (d1) and (d2); otherwise, confirming the selection of the test item block for administration to a test subject.
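A minimal sketch of the rejection step of claim 71 follows. It assumes that the block's exposure-rate estimate is the SPS-weighted average of its state-specific exposure rates and that the rejection probability grows with how far that estimate exceeds a target rate; the claim does not fix either formula, so both are assumptions. Blocks are tried in the order given by the selection rule of step (d1), and the rates, target, and names are hypothetical.

```python
import random


def exposure_estimate(sps, state_rates):
    """Block exposure rate as an SPS-weighted average of state-specific rates."""
    return sum(sps[s] * state_rates[s] for s in sps)


def reject_probability(exposure, target):
    """Zero at or below the target rate; rises toward 1 as exposure grows."""
    if exposure <= target:
        return 0.0
    return 1.0 - target / exposure


def select_with_exposure_control(ranked_blocks, sps, rates, target, rng):
    """Take blocks in rule order (d1); reject each with an exposure-based
    probability (d2); return the first confirmed block, or None if all are rejected."""
    for block in ranked_blocks:
        p_reject = reject_probability(exposure_estimate(sps, rates[block]), target)
        if rng.random() >= p_reject:
            return block  # selection confirmed
    return None


sps = {"s1": 0.6, "s2": 0.4}
rates = {
    "B1": {"s1": 0.5, "s2": 0.3},   # heavily exposed block
    "B2": {"s1": 0.05, "s2": 0.1},  # rarely exposed block
}
chosen = select_with_exposure_control(["B1", "B2"], sps, rates, target=0.2, rng=random.Random(3))
```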
72. The method of claim 1 wherein in step (d) a plurality of test-item-block sequences are generated, the test item blocks being selected from one of the plurality of test-item-block sequences based on a test-item-block sequence selection rule, the test-item-block sequence selection rule being based on a comparative evaluation of the test-item-block sequences based on one or more item objective functions.
73. The method of claim 1 wherein in step (d) the selection is made from one or more active domain pools, an active domain pool being associated with a domain for which one or more domain stopping rules have not been satisfied.
74. The method of claim 73 wherein a domain stopping rule is based on the SPS associated with one of a plurality of domains.
75. The method of claim 74 wherein the selection of the SPS is based on an uncertainty measure.
76. The method of claim 73 wherein at least one of the domain stopping rules is one of the group consisting of (1) that the marginal posterior value for a state in a domain is greater than a specified value, (2) that the posterior variance of an SPS is less than a specified value, (3) that a weighted uncertainty measure with respect to an SPS is less than a specified value, (4) that a weighted distance measure between an initial SPS and an SPS after administration of k test item blocks, k being an integer equal to or greater than one, exceeds a specified value, (5) that a weighted loss function is less than a specified value, (6) that the largest value of an SPS exceeds a specified value, (7) that responses to a predetermined number of test item blocks have been processed, (8) that responses to a predetermined number of test item blocks from a domain pool have been processed, (9) that given the hypothetical selection and administration of one or more sequences of k test item blocks, a weighted loss function is greater than a specified value, k being an integer equal to or greater than one, k for each sequence being the same or different from the k for any other sequence, (10) that given the hypothetical selection and administration of one or more sequences of k test item blocks, a weighted uncertainty measure decreases by less than a specified value, the specified value being expressed either in absolute terms or relative to the value of the weighted uncertainty measure prior to the hypothetical selection and administration of the one or more sequences of k test item blocks, k being an integer equal to or greater than one, k for each sequence being the same or different from the k for any other sequence, (11) that given the hypothetical selection and administration of one or more sequences of k test item blocks, a weighted distance measure increases by less than a specified value, the specified value being expressed either in absolute terms or relative to the value of the weighted distance measure prior to the hypothetical selection and administration of the one or more sequences of k test item blocks, k being an integer equal to or greater than one, k for each sequence being the same or different from the k for any other sequence, and (12) that the variance of an estimate of a value for the test subject is less than a specified value.
77. The method of claim 1 wherein a test subject's relationship to a domain is represented by an SPS, the SPS being updated during each execution of step (e).
78. The method of claim 1 further comprising the steps:
(h) repeating steps (d), (e), and (z) for a plurality of test subjects;
(i) deleting superfluous states from the one or more domains; and
(j) adding missing states to the one or more domains.
79. The method of claim 1 further comprising the steps:
(h) repeating steps (d), (e), and (z) for a plurality of test subjects;
(i) determining the weighted frequency of administration for a test item block in a domain pool associated with a domain; and
(j) deleting the test item block from the domain pool if the weighted frequency of administration is less than a predetermined value.
80. The method of claim 1 further comprising the steps:
(h) repeating steps (d), (e), and (z) for a plurality of test subjects;
(i) determining the ideal response pattern for a test subject classified in each of one or more domain states for each administered sequence of test item blocks using the class conditional densities associated with each test item in each test item block, an ideal response pattern being a value or a set of values; and
(j) deleting a state from a domain if its corresponding ideal response pattern does not satisfy a specified criterion with respect to a specified number of test subject patterns, the specified criterion being expressed in terms of one or more distance measures, a distance measure being a measure of the differences between a test subject response pattern and an ideal response pattern.
81. The method of claim 1 further comprising the steps:
(h) repeating steps (d), (e), and (z) for a plurality of test subjects;
(i) determining the ideal response pattern for a test subject classified in each of one or more domain states for each administered sequence of test item blocks using the class conditional densities associated with each test item in each test item block, an ideal response pattern being a value or a set of values; and
(j) adding a state to a domain if a specified number of ideal response patterns do not satisfy a specified criterion with respect to one or more test subject response patterns, the specified criterion being expressed in terms of one or more distance measures, a distance measure being a measure of the differences between a test subject response pattern and an ideal response pattern.
82. The method of claim 1 further comprising the step:
(f) classifying the test subject in one or more domains in accordance with one or more decision rules if one or more stopping rules are satisfied, step (f) being performed after step (e) and before step (z).
83. The method of claim 82 wherein in step (f) a test subject is classified to a combination or value state, step (f) including the step:
(f1) transforming the one or more values associated with the state into one or more other values.
84. The method of claim 82 wherein a decision rule in step (f) is to classify to a state selected from the group consisting of (1) the state associated with the highest value in the SPS, (2) the state associated with the smallest value for a weighted loss function, (3) the state that has the greatest likelihood of being the true state of the test subject, (4) the state of a second domain that is equivalent to the state in which the test subject has been classified in a first domain, and (5) a state of a second domain based on a function of an SPS of a first domain.
85. The method of claim 82 wherein in step (f) a score value is based on a function of values corresponding to observed responses to test item blocks.
86. The method of claim 82 wherein a decision rule for a domain in step (f) is a function of the SPS corresponding to the domain.
87. The method of claim 82 wherein a state in a second domain is equivalent to a state in a first domain, the state in the second domain being expressed as a function of an ideal response pattern associated with the state in the first domain.
88. The method of claim 82 wherein classification to a state in a second domain is based on functions of ideal response patterns associated with one or more states in a first domain.
89. The method of claim 82 wherein an attribute is a subset of facts from one or more domains, the probability that a test subject possesses an attribute being called an attribute probability, an attribute probability being determined from one or more SPS's.
90. The method of claim 89 wherein a determination as to whether or not an attribute is possessed by the test subject is based on the attribute probability.
91. The method of claim 89 wherein in step (d) the domain pool from which a test item block is to be selected is the domain pool associated with a domain chosen on the basis of one or more attribute probabilities.
92. The method of claim 82 further comprising the step:
(h) remediating the test subject, step (h) being performed after step (f) and before or after step (z).
93. The method of claim 92 being repeated one or more times.
94. The method of claim 93 wherein the test subject's progress in remediation is expressed in terms of a change in classification.
95. The method of claim 92 wherein in step (h) a remediation program for a test subject classified to state X is a compilation of facts associated with one or more other states in the domain and a procedure for teaching the facts in the compilation to a test subject, the compilation not including facts associated with state X.
96. The method of claim 92 wherein in step (h) a criterion for selecting among domains on which to base remediation is that a dominant posterior probability value in a domain SPS exceeds a certain threshold level.
97. The method of claim 92 wherein in step (h) the specification of a remediation program for a state depends on an associated SPS.
98. The method of claim 92 wherein an attribute is a subset of facts from one or more domains, the probability that a test subject possesses an attribute being called an attribute probability, an attribute probability being calculated from one or more SPS's, the specification of a remediation program in step (h) being based on one or more attribute probabilities of a test subject.
99. The method of claim 92 wherein step (h) comprises the steps:
(ha) compiling a collection of one or more topics, a topic being a set of facts, a set of values, or a combination of a set of facts and a set of values that characterize knowledge and/or functionality, the set of facts that characterize knowledge being any set of facts, the set of facts that characterize functionality being a set of facts relating to the functionality of a test subject;
(hb) compiling a collection of one or more treatments for each topic, a treatment comprising materials intended to teach a test subject;
(hc) specifying a plurality of question blocks for each of the one or more treatments of step (hb), a question block consisting of one or more questions, a response distribution being assigned to at least one of the questions in at least one of the question blocks;
(hd) selecting one or more topics from those in the collection of step (ha) for remediation;
(he) selecting one or more treatments from those specified in step (hb) for the topics selected in step (hd);
(hf) obtaining responses to one or more question blocks associated with the treatments selected in step (he) from a test subject after exposure to the one or more treatments of step (he); and
(hg) obtaining a measure of the effectiveness of the treatments of step (he) utilizing one or more of the response distributions assigned in step (hc).
100. The method of claim 99 wherein in step (hd) a topic is selected based on an SPS.
101. The method of claim 99 wherein the treatments specified in step (hb) can be classified as to treatment type, step (he) comprising the steps:
(he1) selecting one or more treatment types from a treatment-type pool for a topic selected in step (hd), the number of treatment types in the treatment-type pool being limited to one if a treatment-type selection process (TSP) stopping rule is satisfied; and
(he2) selecting one or more treatments from each treatment type selected in step (he1).
102. The method of claim 101 wherein in step (he1) the selection process is based on a weighted improvement measure, an improvement measure being a measure of the difference between a first and second knowledge representation associated respectively with a first and second state in a domain.
103. The method of claim 101 wherein in step (hg) the value of a treatment parameter is a measure of effectiveness, a probability distribution being associated with the treatment parameter, the selection process of step (he1) utilizing the probability distributions associated with one or more treatment parameters.
104. The method of claim 103 wherein the probability distribution associated with a treatment parameter is a function of the test subject's SPS.
105. The method of claim 101 wherein step (hg) includes the step:
(hg1) estimating the value of a treatment parameter associated with a treatment type utilizing one or more responses to question blocks, a treatment parameter being a measure of effectiveness.
106. The method of claim 101 wherein in step (he1) the selection of a treatment type is based on one of a group consisting of (1) a weighted response function, (2) a weighted reward function, and (3) a weighted treatment loss function.
107. The method of claim 101 wherein in step (he1) selection of treatment type is based on one or more response distributions for questions, the response distributions being functions of one or more treatments or a treatment type.
108. The method of claim 101 wherein in step (he1) selection of treatment type is based on a weighted objective function, the weighting being done with respect to one or more response distributions for questions, a response distribution being a function of one or more treatments or a treatment type.
109. The method of claim 101 wherein in step (he1) a treatment type is selected by a process selected from the group consisting of (1) a random process and (2) a process selected randomly from a plurality of treatment selection processes.
110. The method of claim 101 wherein step (he1) includes the steps:
(he1-1) creating a plurality of remediation strategies, a remediation strategy being representable by one or more remediation strategy trees;
(he1-2) selecting a best remediation strategy based on a comparative evaluation of the remediation strategies utilizing one or more objective functions; and
(he1-3) selecting a treatment type from the best remediation strategy.
111. The method of claim 101 wherein in step (he1) selection of a treatment type is based on an SPS.
112. The method of claim 101 wherein in step (he1) selection of a treatment type is based on one or more item objective functions.
113. The method of claim 99 further comprising the steps:
(h) repeating the method from step (he) for one or more active topics, an active topic being a topic for which one or more treatment stopping rules have not been satisfied, a treatment stopping rule being one of the group consisting of (1) based on a function of one or more responses to question blocks, (2) that one of one or more predetermined sets of responses to question blocks has been obtained, (3) that a predetermined number of responses to question blocks have been obtained, (4) that a weighted treatment loss function value exceeds a predetermined value after hypothetical or actual administration of one or more treatments, (5) that a weighted treatment loss function value exceeds a predetermined value after hypothetical or actual administration of one or more questions, (6) that a weighted treatment loss function value exceeds a predetermined value after hypothetical or actual administration of one or more topics, (7) the combination of one or more treatment stopping rules, (8) based on one or more responses, (9) based on one or more response function values, (10) that a predetermined number of treatment types have been administered, and (11) that a predetermined number of treatments have been administered; otherwise:
(i) repeating the method from step (d) unless a method termination rule is satisfied.
114. The method of claim 113 wherein a treatment loss function incorporates one or more of the group consisting of (1) a cost of administered treatment types, (2) a cost of administered treatments, (3) a cost of administered questions, (4) a cost of administered topics, (5) response function values, and (6) a function of a state in a domain.
115. The method of claim 99 wherein there is a one-to-one correspondence between a plurality of test items and a plurality of questions, a response distribution for a test item being the same as a response distribution for a corresponding question.
116. The method of claim 99 further comprising the steps:
(h) repeating the method from step (e) for one or more active topics, an active topic being a topic for which one or more treatment stopping rules have not been satisfied; otherwise:
(i) repeating the method from step (d) unless a method termination rule is satisfied.
117. The method of claim 1 wherein in step (e) an SPS is assigned to a first domain based on the SPS obtained for a second domain.
Description
BACKGROUND OF THE INVENTION
This invention relates generally to methods and systems for testing humans and systems and the subsequent classification of humans into knowledge states and systems (including human systems) into functionality states. More specifically, the invention relates to computer-implemented testing and classification systems.
The process of testing and classification requires meaningful and accurate representations of the subject domains in terms of domain states. The domain state of a test subject is determined by sequentially administering to the test subject test items involving different aspects of the subject domain. The responses of the test subject to the test items determine the state of the test subject in the subject domain.
The implementation of such a testing and classification process by means of a computer has the potential of providing an efficient and effective means for identifying the remedial actions required to bring the test subject to a higher level of knowledge or functionality.
The partially ordered set ("poset") is a natural model for the cognitive and functionality domains. Two states i and j in a poset model S may be related to each other in the following manner. If a test subject in state i can respond positively to all the test items to which a test subject in state j can, but a test subject in state j may not be able to respond positively to all the test items to which a test subject in state i can, we say that i contains j and denote this by the expression $i \geq j$. Note that a positive response on any item should provide at least as much evidence for the test subject being in state i as in state j. Thus, the domain states are partially ordered by the binary "i contains j" relation. Note that the cognitive level or the functionality level of a test subject in state i is equal to or higher than that of a test subject in state j. Similarly, the cognitive level or the functionality level of a test subject in state j is equal to or lower than that of a test subject in state i. Accordingly, state i is said to be equal to or higher than state j and state j is said to be equal to or lower than state i.
Poset models in an educational context have been proposed before. However, they have either been Boolean lattices or posets closed under union, in the sense that the union of any two members of the poset is also in the poset. This restriction is undesirable in that it leads to models that can be quite large. For example, allowing the number of test items to define the model can lead to models with as many as $2^N$ possible states, where N is equal to the number of test items. With this approach the responses to the test items permit immediate classification with very little analysis. However, such overly large models ultimately result in poor classification performance.
When sequential item selection rules have been used to classify test subjects into the states of a poset, the selection has not been carried out in a decision-theoretic framework. Consequently, there has been no assurance that the classification process would converge rapidly or, in fact, that it would converge at all.
There is a need for a testing and classification system which is based on sound scientific and mathematical principles and which, as a result, can accurately and efficiently determine the domain states of humans and systems. It is reasonable to base such a system on poset models, but it should be possible to use general, even non-finite posets rather than the specialized posets that are typical of present-day systems. It is important that model selection and fitting for any particular domain be based on appropriate analysis rather than simply a result of the choice of test items. Similarly, the selection of test items should be based on appropriate analysis with reference to the domain model rather than being a more-or-less ad hoc process that ultimately gives birth to its own domain model.
BRIEF SUMMARY OF THE INVENTION
The invention is a method for interacting with a test subject with respect to knowledge or functionality characterized by a plurality of states in one or more domains. A domain is a set of facts, a set of values, or a combination of a set of facts and a set of values. The set of facts for a knowledge domain is any set of facts. The set of facts for a functionality domain is a set of facts relating to the functionality of a test subject. A state is denoted as a fact state, a value state, or a combination state, a fact state being characterized by a subset of facts, a value state being characterized by a subset of values, and a combination state being characterized by a combination of a subset of facts and a subset of values.
A first state is higher than or equal to a second state and a second state is lower than or equal to a first state if (1) the subset of facts or a subset of values associated with the first state respectively includes the subset of facts or is greater than or equal to the subset of values associated with the second state or (2) the subset of facts and the subset of values associated with the first state respectively includes the subset of facts and is greater than or equal to the subset of values associated with the second state.
The method comprises steps (a), (b), (c), (d), (e), and (z). Step (a) consists of specifying one or more domains, where each domain comprises a plurality of states, and determining the higher-lower-neither relationships for each state in each domain, the higher-lower-neither relationships for a state being a specification of which states are higher, which states are lower, and which states are neither higher nor lower.
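A minimal sketch of the higher-lower-neither determination in step (a), restricted to fact states: it assumes each state is represented by the set of facts it contains, so that one state is higher than or equal to another when its facts include the other's. Value and combination states are not modelled, and the function name `relationships` and the four-state example are hypothetical.

```python
# Fact states are represented as frozensets of facts (an illustrative assumption).
# State X is "higher than or equal to" state Y when X's facts include Y's facts.
def relationships(states):
    """Return {state: {"higher": set, "lower": set, "neither": set}} for each state."""
    result = {}
    for name, facts in states.items():
        rel = {"higher": set(), "lower": set(), "neither": set()}
        for other, other_facts in states.items():
            if other == name:
                continue
            if other_facts >= facts:      # other includes all of this state's facts
                rel["higher"].add(other)
            elif other_facts <= facts:    # this state includes all of other's facts
                rel["lower"].add(other)
            else:                         # incomparable under set inclusion
                rel["neither"].add(other)
        result[name] = rel
    return result


# Hypothetical domain with two facts A and B.
STATES = {
    "0": frozenset(),
    "A": frozenset({"A"}),
    "B": frozenset({"B"}),
    "AB": frozenset({"A", "B"}),
}

rel = relationships(STATES)
print(sorted(rel["A"]["higher"]), sorted(rel["A"]["lower"]), sorted(rel["A"]["neither"]))
```

For state A this reports AB as higher, 0 as lower, and B as neither higher nor lower.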
Step (b) consists of specifying a domain pool for each domain comprising a plurality of test item blocks. A test item block consists of one or more test items where a test item administered to a test subject results in one of a plurality of possible responses.
Step (c) consists of specifying a class conditional density $f_{ibd}(x \mid s)$ for each test item i in test item block b for domain d for each state s in each domain. A class conditional density is a specification of the probability of a test subject in state s of domain d providing a response x to test item i in test item block b. Each test item partitions one or more domains into a plurality of partitions according to the class conditional densities associated with the test item. A partition is a subset of states for which the class conditional densities are the same, or the union of such subsets.
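The partitions induced by a single test item can be computed by grouping states whose class conditional densities coincide. The sketch below assumes each density is given as a tuple of (response, probability) pairs; the helper `item_partition` and the probabilities are hypothetical. Because densities are compared by exact equality, states must be assigned literally identical densities to fall in the same partition.

```python
from collections import defaultdict


def item_partition(density_by_state):
    """Group states whose class conditional densities for one item are identical.

    density_by_state maps a state name to a hashable description of f(x | s),
    here a tuple of (response, probability) pairs.
    """
    groups = defaultdict(set)
    for state, density in density_by_state.items():
        groups[density].add(state)
    # Sort for a stable, readable output order.
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


# Hypothetical Bernoulli item: P(X = 1 | s) = 0.9 for states A and AB, 0.2 otherwise.
HIGH = ((0, 0.1), (1, 0.9))
LOW = ((0, 0.8), (1, 0.2))
DENSITIES = {"0": LOW, "A": HIGH, "B": LOW, "AB": HIGH}

for part in item_partition(DENSITIES):
    print(sorted(part))
```

Here the item splits the four states into {0, B} and {A, AB}, a two-element partition.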
Step (d) consists of selecting one or more test item blocks from the one or more domain pools to be administered to a test subject, and step (e) consists of processing the responses of the test subject to the one or more test item blocks administered to the test subject. The relationship of the test subject to domains is representable by a state probability set (SPS).
Step (z) consists of repeating the method from step (d) until method termination criteria are satisfied.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 depicts an example of a simple poset model as a Hasse diagram.
FIG. 2 shows a flow diagram for the process executed by a computer in classifying a test subject and providing remediation in the case of a human test subject or remediation guidance in the case of a system test subject.
FIG. 3 shows the flow diagram associated with one embodiment of the classification step shown in FIG. 2.
FIG. 4 shows a portion of a strategy tree embodiment of the classification step shown in FIG. 2.
FIG. 5 shows the relationship between the loss function and a strategy tree.
FIG. 6 depicts a poset model as a Hasse diagram.
FIG. 7 shows the image of the mapping of a test item pool on the poset model of FIG. 6.
FIG. 8 depicts a more complicated poset model as a Hasse diagram.
FIG. 9 depicts the poset model of FIG. 8 with a missing state.
FIG. 10 depicts the poset model of FIG. 8 with many states missing.
FIG. 11 shows an example of the sequence of steps performed in executing the remediation process.
DETAILED DESCRIPTION OF THE INVENTION
The first objective of the present invention is to provide meaningful and accurate representations of a cognitive domain in the case of humans and a functionality domain in the case of systems (where "systems" includes humans considered as systems) through the use of partially ordered sets (posets). The second objective of the present invention is to provide a method for efficiently and accurately testing and classifying humans into cognitive domain states and systems into functionality domain states. The third objective is to provide a remediation program keyed to a domain state and designed to bring a human or a system to a higher domain state.
The classification process consists of administering a sequence of response-generating test items to the test subject. The responses to the test items provide the means for classifying the test subject into a particular domain state. In the case of humans, the test items might be questions or problems relating to particular subject matter such as arithmetic, chemistry, or language. In the case of a system, a test item might consist of (1) causing a system to be in a particular operating state and (2) causing certain inputs to be entered into the system, the response to be observed being the operating state of the system after the specified inputs have been entered into the system. The operating state and the functionality state of a system are two different concepts. For example, if the system is an aircraft, the operating state could be defined as its position and attitude and the time rate of change of its position and attitude. Its functionality state is a measure of the degree to which each of the functions that must be performed within the aircraft system to maintain it in an operating state are being performed.
The classification capability of the system provides the means for prescribing remedial programs aimed at propelling humans and systems into higher and higher states in their respective cognitive and functionality domains.
The poset models which provide the basis for the present invention may be either finite or non-finite. In many educational applications, it can be assumed that the poset is finite with top and bottom states denoted by 1 and 0 respectively, 1 denoting complete knowledge or functionality in a particular domain, and 0 denoting essentially no knowledge or functionality. However, this invention applies just as generally to poset models without a top and/or bottom state. For much of the material to follow, it will be assumed that the underlying model is a discrete poset. Later, a non-discrete poset model will be described.
A formal definition of a partially ordered set and a partial order is as follows. Let P be a set with a binary relation $\leq$. P is said to be a partially ordered set, and the binary relation $\leq$ a partial order, if for all elements i, j, and k in P the following conditions hold: $i \leq i$; $i \leq j$ and $j \leq i$ imply $i = j$; and $i \leq j$ and $j \leq k$ imply $i \leq k$. An example of a poset, depicted as a Hasse diagram, is shown in FIG. 1. Note that $A \leq 1$, $B \leq 1$, $0 \leq A$, $0 \leq 1$, etc. If neither $i \leq j$ nor $j \leq i$, then i and j are said to be incomparable, and strict inequality $i < j$ is said to hold on P when $i \leq j$ and $i \neq j$.
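The three defining conditions (reflexivity, antisymmetry, transitivity) can be checked mechanically when the relation is given as a finite set of ordered pairs. The sketch assumes a four-state diamond 0, A, B, 1 consistent with the relations quoted for FIG. 1 (the figure itself is not reproduced here, so its exact shape is an assumption); `is_partial_order` and `incomparable` are hypothetical helpers.

```python
from itertools import product


def is_partial_order(elements, leq):
    """Check reflexivity, antisymmetry and transitivity of leq, a set of (i, j) pairs meaning i <= j."""
    reflexive = all((i, i) in leq for i in elements)
    antisymmetric = all(
        i == j
        for i, j in product(elements, repeat=2)
        if (i, j) in leq and (j, i) in leq
    )
    transitive = all(
        (i, k) in leq
        for i, j, k in product(elements, repeat=3)
        if (i, j) in leq and (j, k) in leq
    )
    return reflexive and antisymmetric and transitive


def incomparable(i, j, leq):
    """Neither i <= j nor j <= i."""
    return (i, j) not in leq and (j, i) not in leq


# Assumed diamond: 0 <= A <= 1 and 0 <= B <= 1, with A and B incomparable.
ELEMENTS = ["0", "A", "B", "1"]
LEQ = {(x, x) for x in ELEMENTS} | {("0", "A"), ("0", "B"), ("0", "1"), ("A", "1"), ("B", "1")}

print(is_partial_order(ELEMENTS, LEQ), incomparable("A", "B", LEQ))
```

Dropping the pair ("0", "1") from `LEQ` makes the check fail, since $0 \leq A$ and $A \leq 1$ would no longer imply $0 \leq 1$.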
Associated with each test item and each domain state of a test subject is a class conditional density $f_i(x \mid s)$, which is the probability of a state-s test subject providing the response x to test item i. For simplicity, test items are assumed to be conditionally independent, in the sense that, given the test subject's state, responses to previously administered items do not affect the response to a presently administered item. It should be noted, however, that this assumption can be relaxed, and all the techniques described below can be applied to the case where the test items are not conditionally independent. Moreover, the response x can be viewed as multi-dimensional without loss of generality.
For a properly structured poset model S and properly designed test items, a test item will partition S into subsets according to the class conditional densities associated with the test item. It will be assumed for now that a test item partition consists of the subsets of states which share the same class conditional density. In practice, an expert must initially specify the partitions according to how he believes the response distributions should be structured and possibly shared among states. Specification and/or estimation of the class conditional densities can then be conducted. (The estimation process will be described below.) Modification of the partitions is possible after feedback from data analysis, as will also be described below.
One of the subsets may be a principal dual order ideal (PDOI) generated by a state in S. The PDOI generated by a state s is the set of states $\{j \in S : s \leq j\}$, where $\leq$ denotes the partial order relation. A common partition for an item has two elements, one of which is a PDOI generated, say, by s in S. Under such circumstances, the test item is said to be of type s or associated with s. A test subject in the PDOI of state s is more likely than a test subject outside it to give a response to a test item of type s that is consistent with the knowledge or functionality of state s.
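A sketch of the PDOI and of the two-element partition induced by an item of type s. It assumes, for illustration, that states are subsets of three areas A, B, and C ordered by inclusion, so that $s \leq j$ means the facts of s are contained in those of j. The names `STATES`, `pdoi`, and `item_type_partition` are hypothetical.

```python
from itertools import combinations

# All subsets of the areas {A, B, C}, named "0", "A", ..., "ABC" (illustrative assumption).
STATES = {
    "".join(c) or "0": frozenset(c)
    for r in range(4)
    for c in combinations("ABC", r)
}


def pdoi(s, states=STATES):
    """Principal dual order ideal of s: every state j with s <= j (set inclusion here)."""
    return {name for name, facts in states.items() if states[s] <= facts}


def item_type_partition(s, states=STATES):
    """Two-element partition for an item of type s: the PDOI of s and its complement."""
    inside = pdoi(s, states)
    return inside, set(states) - inside


inside, outside = item_type_partition("A")
print(sorted(inside), sorted(outside))
```

A subset given as a union of PDOIs, for example `pdoi("AB") | pdoi("C")`, is formed the same way and can represent several routes to a positive response.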
More generally, reference to an item's type refers to its partition. Note that one of the partitions could be the union of PDOIs.
The system works best when the response distributions reflect the underlying order structure on the model S. This can be achieved, for instance, by imposing order constraints on item parameters associated with corresponding item partitions. For example, with Bernoulli response distributions, it may be natural to assume $f_i(X=1 \mid s_1) \leq f_i(X=1 \mid s_2)$ for item i if $s_1 \leq s_2$ in S, where X = 1 denotes a positive outcome, the term "positive" meaning that the outcome is more consistent with the knowledge or functionality of state $s_2$ than with that of state $s_1$. The system works most efficiently in applications where such order constraints on the class conditional response distributions are natural for the underlying poset model.
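The order constraint $f_i(X=1 \mid s_1) \leq f_i(X=1 \mid s_2)$ whenever $s_1 \leq s_2$ can be checked directly against a table of Bernoulli parameters. The sketch below reports every comparable pair that violates it. The diamond poset, the parameter values, and the helper `violations` are hypothetical.

```python
def violations(p_positive, leq):
    """Return comparable pairs (s1, s2), s1 <= s2, with P(X=1 | s1) > P(X=1 | s2)."""
    return sorted((s1, s2) for s1, s2 in leq if p_positive[s1] > p_positive[s2])


# Assumed diamond poset 0 <= A, B <= 1 (pairs mean "first <= second").
LEQ = {(x, x) for x in ("0", "A", "B", "1")} | {
    ("0", "A"), ("0", "B"), ("0", "1"), ("A", "1"), ("B", "1")
}

# Item of type A: high positive-response probability on the PDOI of A, i.e. {A, 1}.
P_TYPE_A = {"0": 0.2, "A": 0.9, "B": 0.2, "1": 0.9}
# Parameters that break the constraint: state 1 is above A but less likely to respond positively.
P_BAD = {"0": 0.2, "A": 0.9, "B": 0.2, "1": 0.5}

print(violations(P_TYPE_A, LEQ), violations(P_BAD, LEQ))
```

In an estimation setting, such a check could flag fitted parameters that should be re-estimated under the order constraints.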
Clearly, each state may have its own response distribution for an item. However, in practice, this may present a difficult estimation problem such as having too many parameters. Hence, using the minimum number of natural partitions for an item is desirable for simplifying the density estimation process.
In the educational application with Bernoulli responses, a natural two-element partition for an item consists of the subset of states in S whose test subjects have the knowledge or functionality to provide a positive response to the item, together with the complement of that subset. It is natural to assume that the probability of a positive response is greater for test subjects in this subset of states than for test subjects in the complement. Further, specifying one of the subsets as a union of PDOIs can reflect the existence of multiple strategies for obtaining a positive response.
The partially ordered set structure of domain states permits great efficiency in the classification of test subjects. For example, suppose the set S consists of the states 0, A, B, C, AB, AC, BC, and ABC (= 1), where the cognitive or functionality domain is divided into three areas A, B, and C. The symbol 0 denotes no knowledge or functionality. The symbols AB, AC, and BC denote knowledge or functionality equivalent to the unions of individual area pairs, and the symbol ABC denotes knowledge or functionality equivalent to the union of all of the individual areas. Assume the item distributions to be Bernoulli, with the probability of a positive response equal to 1 if the test subject has the knowledge or functionality to provide a positive response and 0 if he does not. Administering a test item of type A partitions the set S into the PDOI of A (i.e., A, AB, AC, ABC) and its complement (i.e., 0, B, C, BC). If the test subject gives a positive response, a test item of type B is administered next; it partitions S into the PDOI of B (i.e., B, AB, BC, ABC) and its complement (i.e., 0, A, C, AC). If the test subject again gives a positive response, the possible states are narrowed to the intersection of the PDOI of A and the PDOI of B, namely AB and ABC. If the test subject now gives a negative response (i.e., not a positive response) to a test item of type ABC, the test subject should be classified in state AB. Thus, by administering only three test items, we have uniquely classified the test subject into one of 8 possible states. In general, the classification process becomes more complex as the response distributions become more complex, in the sense that there may be a variety of possible responses, not all of which are statistically consistent with the true state.
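The three-item example can be replayed in a few lines. With deterministic Bernoulli items (positive-response probability 1 inside the PDOI of the item type and 0 outside), each response simply intersects the candidate set with the PDOI or with its complement. This is a sketch of that special case only; the helpers `pdoi`, `narrow`, and `classify` are hypothetical, and noisy response distributions require the probabilistic SPS machinery described later.

```python
from itertools import combinations

# The eight states of the example: subsets of {A, B, C} ordered by inclusion.
STATES = {"".join(c) or "0": frozenset(c) for r in range(4) for c in combinations("ABC", r)}


def pdoi(s):
    """Principal dual order ideal of s: all states containing s."""
    return {j for j in STATES if STATES[s] <= STATES[j]}


def narrow(candidates, item_type, positive):
    """Keep states consistent with a deterministic response to an item of item_type."""
    ideal = pdoi(item_type)
    return candidates & ideal if positive else candidates - ideal


def classify(responses):
    """Apply a sequence of (item_type, positive) observations, printing each step."""
    candidates = set(STATES)
    for item_type, positive in responses:
        candidates = narrow(candidates, item_type, positive)
        print(item_type, "positive" if positive else "negative", sorted(candidates))
    return candidates


classify([("A", True), ("B", True), ("ABC", False)])
# A positive ['A', 'AB', 'ABC', 'AC']
# B positive ['AB', 'ABC']
# ABC negative ['AB']
```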
The basis for the computer-implemented testing and classification process is a poset model S and a test item pool I. The statistical framework used to classify test subject responses is decision-theoretic. This entails selection of a loss function to gauge classification performance. In general, a loss function should incorporate a cost of misclassification and a cost of observation. For a given test subject, an initial state probability set (SPS at stage 0) is assigned as well, denoted $\pi_0$. The SPS at stage 0 consists of prior probabilities concerning the test subject's state membership in S: for each state s in S, the set contains a prior probability value $\pi_0(s)$. The decision-theoretic objective in classification is to minimize an integrated risk function.
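Once responses arrive, the SPS is revised from $\pi_0$ by Bayes' rule. The sketch below assumes Bernoulli items and the conditional independence stated earlier, so that one step is $\pi_n(s) \propto \pi_{n-1}(s)\, f_i(x \mid s)$. The function `update_sps`, the four-state prior, and the item parameters are hypothetical illustrations, not the patent's prescribed procedure.

```python
def update_sps(sps, p_positive, response):
    """One Bayes step for a Bernoulli item: pi_n(s) is proportional to pi_{n-1}(s) * f_i(x | s)."""
    unnormalised = {
        s: prior * (p_positive[s] if response == 1 else 1.0 - p_positive[s])
        for s, prior in sps.items()
    }
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("response has zero probability under every state")
    return {s: w / total for s, w in unnormalised.items()}


# Uniform initial SPS over an assumed diamond poset 0 <= A, B <= 1.
PRIOR = {"0": 0.25, "A": 0.25, "B": 0.25, "1": 0.25}
P_TYPE_A = {"0": 0.2, "A": 0.9, "B": 0.2, "1": 0.9}
P_TYPE_B = {"0": 0.2, "A": 0.2, "B": 0.9, "1": 0.9}

sps = update_sps(PRIOR, P_TYPE_A, 1)  # positive response to an item of type A
sps = update_sps(sps, P_TYPE_B, 0)    # negative response to an item of type B
print({s: round(p, 3) for s, p in sps.items()})
```

After these two responses most of the probability mass sits on state A, which is the state a "highest SPS value" decision rule would select.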
There are three main issues in classification: item selection, deciding when to stop the item administration process, and making a classification decision once stopping is invoked. We define a strategy $\delta$ to be the combination of an item selection rule, a stopping rule, and a decision rule. What is desired is to find strategies that minimize the integrated risk function $R(\pi_0, \delta)$, which will be defined later. For a description of the framework when S is finite, see J. Berger, Statistical Decision Theory and Bayesian Analysis, Second Edition, Springer-Verlag, New York, 1985, p. 357.
As mentioned earlier, loss functions should incorporate a cost of misclassification and a cost of observation. Whether a decision rule misclassifies depends on which state is true. Hence, the system assumes that a loss function depends on the true state s in S, on a decision rule $d(x_n)$ that is a function of the response path $x_n$ of length n, and on n, the number of observations. Note that $d(x_n)$ can be viewed as a function of the initial SPS $\pi_0$ and of the final SPS reached through $x_n$. A loss function may be denoted by L(s,d,n), where d is the action taken by the decision rule $d(x_n)$. Because L depends on n, it also covers the case where each test item carries its own cost of observation.
In order for a loss function to encourage reasonable classification, it will further be assumed that for fixed s in S and fixed number of observations n, when the decision rule takes an action that results in a misclassification, the value of a loss function will be greater than or equal to the value if the classification decision was correct. Similarly, for fixed s in S and fixed classification decision, the value of a loss function will be non-decreasing in n, the number of observations.
Besides serving as objective functions to measure the performance of the classification process, such loss functions are used in stopping rules, generating decision rules, and item selection.
Given a loss function and initial SPS, it is desired to find a strategy $\delta$ that minimizes

$$R(\pi_0,\delta)=\sum_{s\in S}\pi_0(s)\sum_{x_N}L\big(s,d(x_N),N\big)\,f(x_N\mid s) \tag{1}$$
where $f(x_N\mid s)$ is the response distribution of a possible response path $x_N$ of length N given state s in S, N is random and depends on the response path, and the item sequence, the stopping rule, and the classification decision rule $d(x_N)$ are all given by $\delta$. This quantity is known as the integrated risk of $\delta$ given the initial SPS, and it is the criterion by which the performance of strategies in the classification process is judged. If the possible responses are continuous, one integrates rather than sums over all possible response paths. When N=0, equation (1) gives the average loss with respect to the initial SPS.
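A linear version of a loss function is given by the equation

$$L(s,d,N)=\begin{cases}A_1(s)+\sum_{n=1}^{N}C(i_n), & d=s,\\ A_2(s)+\sum_{n=1}^{N}C(i_n), & d\neq s,\end{cases} \tag{2}$$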
where L(s,d,N) is the loss associated with classifying in state d a test subject whose true state is s after administering test items $i_1, i_2, \ldots, i_N$. The constants $A_1(s)$ and $A_2(s)$ are the losses associated with correct and incorrect classifications, respectively, when s is the true state. Assume $A_1(s)\le A_2(s)$, so that the loss associated with a correct assignment never exceeds the loss associated with an incorrect one. The notation $C(i_n)$ indicates that the cost of administering a test item may depend on the item; for example, items may vary in complexity, and the cost of developing and administering them may vary as a result. For simplicity, the cost of administering a test item can be assumed to be the same for all test items.
For purposes of discussion, assume that $C(i_n)=0.1$ (a constant) for all test items and that $A_1(s)=0$ and $A_2(s)=1$ for all states s in S. Given a response path and given that stopping has been invoked, the decision rule that minimizes the integrated risk for this loss function is to classify the test subject in the state with the largest probability value in the final SPS. A decision rule that minimizes the integrated risk is referred to as the Bayes decision rule. Suppose that at stage n the largest posterior value in the SPS is 0.91. Stopping now and applying the Bayes decision rule incurs an expected misclassification loss of 1 - 0.91 = 0.09, so no further observations can reduce the average misclassification cost by more than 0.09, which is less than the cost 0.1 of taking another observation; it is therefore not worth continuing. If $C(i_n)$ were equal to 0 for all test items, it would presumably be worth continuing to administer test items indefinitely in order to obtain ever higher probabilities of a correct assignment, since the cost of an incorrect assignment would outweigh the nonexistent cost of administering test items. This example shows how the cost of observation enters the decision of when to stop, and how the costs of misclassification and observation must be balanced.
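The stopping argument of this example reduces to a small calculation, sketched below in Python. It assumes the same state-independent costs ($A_1=0$, $A_2=1$, $C=0.1$); the four-state SPS and the helper names are made up for illustration.

```python
def expected_stop_loss(sps, a1=0.0, a2=1.0):
    """Expected misclassification loss of stopping now and classifying to the state
    with the largest probability (the Bayes decision when A1 <= A2 are constants)."""
    best = max(sps, key=sps.get)
    return sum(p * (a1 if s == best else a2) for s, p in sps.items())


def may_be_worth_continuing(sps, cost):
    """With A1 = 0 and A2 = 1, further items can remove at most the current expected
    misclassification loss, so continuing can pay off only if that loss exceeds the cost."""
    return expected_stop_loss(sps) > cost


sps = {"0": 0.03, "A": 0.03, "B": 0.03, "AB": 0.91}   # hypothetical SPS at stage n
print(round(expected_stop_loss(sps), 2))              # 0.09
print(may_be_worth_continuing(sps, cost=0.1))         # False
```

A result of True only says that another item could pay off; whether it does depends on the items available, which is what the look-ahead rules described later evaluate.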
The basis for the computer-implemented testing and classification process is a poset model S and a test item pool I that is stored in the computer memory.
Consider again the poset model in FIG. 1. For the cognitive domain of arithmetic, state A might represent a mastery of addition and subtraction, state B might represent a mastery of multiplication and division, and state 1 might represent a mastery of arithmetic, the union of states A and B. For the functionality domain of an AND gate, state A might represent the proper functioning of a NAND gate, state B might represent the proper functioning of an inverter which inverts the output of the NAND gate, and state 1 might represent the proper functioning of both the NAND gate and the inverter (i.e. the AND gate).
The flow diagram 1 for the process executed by the computer in classifying a test subject and providing remediation in the case of a human test subject or remediation guidance in the case of a system test subject is shown in FIG. 2. The process begins with the initialization step 3 whereby the poset model, the test item pool, and the test subject's initial state probability set are stored in computer memory. The poset model is defined by a list of states, the PDOI for each state, information about the costs of correct and incorrect classification for each state (given that the state is the true state of the test subject), and a forwarding address to a remediation program for each state.
A test item pool is a collection of test items and is always linked to a particular poset model. Associated with each test item in the test item pool are class conditional densities. The expression $f_i(x_n\mid s)$ denotes the class conditional density associated with the n'th administered test item i, where $x_n$ is one of the possible responses to the n'th test item and s is the true state of the test subject.
The test subject's initial state probability set (SPS) has one member for each state in the poset model and is denoted by $\pi_0$. The notation $\pi_0(s)$ denotes the member of $\pi_0$ giving the system's prior belief that the test subject belongs to state s, where s represents any one of the states. There are a number of possible choices for the test subject's initial SPS. One possibility is a non-informative initial SPS, which does not take subjective information about the test subject into account and thus treats all test subjects the same. An example of such an initial SPS is a uniform set in which all of the probabilities are equal; this choice is attractive in that no prior information about the test subject is needed. Another example of a non-informative initial SPS is one whose probabilities are derived from the distribution of previous test subjects among the poset states.
Ideally, the initial SPS should be tailored to the test subject. An initial SPS which heavily weights the true state of the test subject will lead to fast and accurate classification. Individualized initial SPSs can be constructed by using prior information concerning the test subject's domain state. In the case of humans, performance on examinations, homework, and class recitations can provide guidance. In the case of systems, previous operating performance would provide useful information for tailoring an initial SPS.
After the initialization step 3 has been performed, the classification step 5 takes place. The information needed to classify a test subject into a domain state is obtained by successively calculating the test subject's SPS at stage n, denoted by $\pi_n$, after the test subject's response to each of a sequence of N administered test items, n taking values from 1 to N. The probability at stage n that the test subject belongs to state s is denoted $\pi_n(s)$, for any s in S. The value of N is determined by the stopping rule; N is random, and its value depends on the current SPS and on the remaining available item pool at each stage. The determination of the value of N is described later.
The test subject's posterior probability $\pi_{n+1}(s\mid X_{n+1}=x,\ It_{n+1}=i)$ of membership in state s at stage n+1 is obtained from the equation

$$\pi_{n+1}(s\mid X_{n+1}=x,\ It_{n+1}=i)=\frac{\pi_n(s)\,f_i(x\mid s)}{\sum_{s'\in S}\pi_n(s')\,f_i(x\mid s')} \tag{3}$$
where $X_{n+1}=x$ denotes that the test subject's observed response at stage n+1 is x, $It_{n+1}=i$ denotes that item i is the (n+1)th administered item, and $f_i(x\mid s)$ is the class conditional density associated with state s, evaluated at x, for item i. The symbol $f_i(x\mid s)$ denotes a class conditional density associated with either a discrete or a continuous random variable X (see e.g. Steven F. Arnold, MATHEMATICAL STATISTICS, Prentice-Hall, Englewood Cliffs, N.J., 1990, pp. 44-46).
The updating rule represented by the above equation is known as Bayes rule, and it will be the assumed updating rule. Note that it applies generally, including when the class conditional densities are joint densities and/or conditional densities that depend on previous responses. The system may also use other updating rules for obtaining $\pi_{n+1}(s\mid X_{n+1}=x,\ It_{n+1}=i)$ from $\pi_n(s)$. For such alternative rules, it is assumed that the updated posterior probability of a state is a function of the SPS at stage n and of the class conditional densities of all the states in S evaluated at the observed response x. They should also have the property that, for an observed response $X_{n+1}=x$ to any item i and fixed class conditional density values for all states other than s, $\pi_{n+1}(s\mid X_{n+1}=x,\ It_{n+1}=i)$ is non-decreasing in $f_i(x\mid s)$. This should hold for all s in S and all possible responses x. Bayes rule is, of course, an example of such an updating rule.
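Equation (3) for a finite set of responses can be sketched in Python as follows. The two-partition item with response probabilities 0.9 and 0.2 is a hypothetical choice for illustration, and states are represented as sets of mastered areas so that the order relation between an item type and a state becomes a subset test.

```python
def bayes_update(sps, density, x):
    """Equation (3): pi_{n+1}(s) is proportional to pi_n(s) * f_i(x | s).

    sps:     dict mapping each state to its probability at stage n
    density: callable (x, state) -> f_i(x | state) for the administered item i
    x:       the observed response
    """
    joint = {s: p * density(x, s) for s, p in sps.items()}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("the response has zero probability under every state")
    return {s: v / total for s, v in joint.items()}


def bernoulli_item(item_type, p_master=0.9, p_other=0.2):
    """Hypothetical two-partition item: P(positive) = p_master if item_type <= state."""
    def density(x, state):
        p = p_master if item_type <= state else p_other
        return p if x == 1 else 1 - p
    return density


states = [frozenset(t) for t in ("", "A", "B", "AB")]
sps0 = {s: 1 / len(states) for s in states}          # uniform, non-informative initial SPS
sps1 = bayes_update(sps0, bernoulli_item(frozenset("A")), x=1)
print({"".join(sorted(s)) or "0": round(p, 3) for s, p in sps1.items()})
# {'0': 0.091, 'A': 0.409, 'B': 0.091, 'AB': 0.409}
```

A positive response to the type-A item shifts mass toward A and AB, the states in the PDOI of A, as the non-decreasing property above requires.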
After N test items have been administered to the test subject, the test subject is classified. After a test subject is classified, the remediation step 7 can take place by providing the human test subject with the knowledge he does not have or by providing a technician the necessary information to repair at least some of the functional defects existing in the test subject.
The flow diagram associated with one embodiment 8 of the classification step 5 is shown in FIG. 3. The first step 9 is to clear the test item counter which keeps track of the number of test items administered to the test subject. In step 11, the test item to be administered to the test subject is selected. A test item is selected from the test item pool by applying an item selection rule.
A useful approach to developing item selection rules is to employ an objective function that measures the "performance" or "attractiveness" of an item in the classification process. In practice, this objective function may depend on the item's characteristics (such as how it partitions the poset model and what the corresponding response distributions are within the partitions), on an SPS, and/or on the observed item response. Clearly, the probability values in the SPS and the item responses can vary. The objective function can therefore be weighted, usually by a class conditional density, and the weighted objective function summed or integrated over all possible values of its inputs. In this way one obtains an "average" or "expected" value of the objective function which can, for instance, systematically take into account the variation in the SPS and/or the variation in the possible item responses.
This is done by summing/integrating over all possible input values the product of the objective function and the corresponding weighting function. Examples are given below. For the examples, it will be assumed that the system is at stage n, and that the current SPS is .pi..sub.n.
An important class of objective functions is the uncertainty measures on an SPS. These are defined to be functions on an SPS whose minimum value is attained when all but one of the values in the SPS are zero. This minimum need not be unique, in that other SPS configurations may attain it as well.
A subset of item selection procedures which employ uncertainty measures as an objective function are those that gauge the uncertainty among the mass on an item's partitions with respect to an SPS. For such procedures, it is desirable for the item partitions to have a high level of (weighted) uncertainty. The idea is that the more the mass is spread across an item's partitions, the more efficiently the item can discriminate between states that have significant mass in the SPS ("the more birds that can be killed by one stone"). This is important because in order to build a dominant posterior probability value in the SPS, item sequences must discriminate between or separate all states with significant mass. Conversely, note that if all the mass is on one partition, there will be no change in the SPS if the updating rule is Bayes rule. The motivation of these procedures is to avoid this scenario as much as possible as measured by an uncertainty measure. Assuming that all items have a partition corresponding to a PDOI generated by a state in S, consider the simple example below, which selects item i in the available item pool that minimizes
$$h(\pi_n,i)=\lvert m_n(i)-0.5\rvert \tag{4}$$
where

$$m_n(i)=\sum_{s\in S:\ e(i)\le s}\pi_n(s) \tag{5}$$
and e(i) is the type of test item i. For this criterion, as for all the others, ties between items can be broken at random. Note that $m_n(i)$ is the mass on one of the partitions of item i at stage n, and the objective function $\lvert m_n(i)-0.5\rvert$ measures uncertainty among the item partitions with respect to the SPS at stage n, $\pi_n$. Strictly, to satisfy the earlier definition of an uncertainty measure, the objective function must be multiplied by (-1).
This rule is based on a very simple criterion which is an advantage in terms of computational complexity. However, the rule is not very sophisticated. It does not take into account the response distributions of each test item. Also, the rule may not perform well when the test items have more than two partitions.
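The half-split rule of equation (4) can be sketched in a few lines of Python. The sketch assumes that each item's positive partition is the PDOI of its type, represents states as sets of areas, and uses a made-up four-state SPS.

```python
import math
import random


def mass_on_pdoi(sps, item_type):
    """m_n(i): total SPS mass on the states s with e(i) <= s (a subset test here)."""
    return sum(p for s, p in sps.items() if item_type <= s)


def select_half_split(sps, available_types, rng=random):
    """Return the item type minimizing h = |m_n(i) - 0.5|, breaking ties at random."""
    scores = {t: abs(mass_on_pdoi(sps, t) - 0.5) for t in available_types}
    best = min(scores.values())
    ties = [t for t, h in scores.items() if math.isclose(h, best, abs_tol=1e-12)]
    return rng.choice(ties)


sps = {frozenset(""): 0.4, frozenset("A"): 0.2, frozenset("B"): 0.1, frozenset("AB"): 0.3}
choice = select_half_split(sps, [frozenset("A"), frozenset("B"), frozenset("AB")])
print("".join(sorted(choice)))   # A  (m = 0.5 for A, 0.4 for B, 0.3 for AB)
```

Only the SPS masses enter the score, which is why the rule is cheap and also why it ignores how informative each item's response distributions are.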
A further consideration in classification is that it is desirable for the SPS to have its mass concentrated on or around one element. Uncertainty measures with the defining property above should, on average, encourage the selection of items that lead toward this ideal SPS configuration of all mass on one point.
An important example of an uncertainty measure is Shannon's entropy function $En(\pi_n)$, where

$$En(\pi_n)=-\sum_{s\in S}\pi_n(s)\log\pi_n(s) \tag{6}$$

with $0\log 0$ taken as 0.
Note that the minimum is indeed attained when one element of the poset model has value 1 in the SPS. A weighted version of this objective criterion is $sh_1(\pi_n,i)$, where
$$sh_1(\pi_n,i)=\int En\big(\pi_{n+1}\mid X_{n+1}=x,\ It_{n+1}=i\big)\,P\big(X_{n+1}=x\mid\pi_n,\ It_{n+1}=i\big)\,dx \tag{7}$$
where i now denotes any test item in the test item pool that has not yet been administered to the test subject. The symbol $En(\pi_{n+1}\mid X_{n+1}=x,\ It_{n+1}=i)$ denotes En evaluated on the SPS calculated after the test subject responds to the (n+1)'th administered test item, given that the response $X_{n+1}$ equals x and that the (n+1)'th administered test item $It_{n+1}$ is i. The symbol $P(X_{n+1}=x\mid\pi_n,\ It_{n+1}=i)$ denotes the mixed probability that $X_{n+1}$ equals x given $\pi_n$ and given that item i is chosen as the (n+1)'th administered item.
Note that the equation is based on $\pi_{n+1}\mid X_{n+1}=x,\ It_{n+1}=i$, which denotes the SPS at stage n+1 given that the observed response to the item administered at stage n+1 is x and that the item selected for stage n+1 is item i. This criterion selects the item i in the available item pool that minimizes the right-hand side of the equation. Note that the weighting function in this case is $P(X_{n+1}=x\mid\pi_n,\ It_{n+1}=i)$, which is given by

$$P(X_{n+1}=x\mid\pi_n,\ It_{n+1}=i)=\sum_{s\in S}\pi_n(s)\,f_i(x\mid s) \tag{8}$$
It is the sum, over the states in S, of the SPS value of each state multiplied by the density value of the corresponding response distribution for that state. Indeed, it is a mixed probability distribution on the space of possible response values given $It_{n+1}=i$ and $\pi_n$, and on the poset model S.
If the class conditional density is $f_i(x\mid s)=f_i(x)$ for the response of a state-s test subject to a test item of type e(i) when $e(i)\le s$, and $f_i(x\mid s)=g_i(x)$ otherwise, then $sh_1(\pi_n,i)$ is given by the following equation
$$sh_1(\pi_n,i)=m_n(i)\int En\big(\pi_{n+1}\mid X_{n+1}=x,\ It_{n+1}=i\big)\,f_i(x)\,dx+\big(1-m_n(i)\big)\int En\big(\pi_{n+1}\mid X_{n+1}=x,\ It_{n+1}=i\big)\,g_i(x)\,dx \tag{9}$$
An alternative to $sh_1(\pi_n,i)$ is $sh_1'(\pi_n,i)$:
$$sh_1'(\pi_n,i)=sh_1(\pi_n,i)-En(\pi_n) \tag{10}$$
Since $En(\pi_n)$ does not depend on i, minimizing $sh_1'(\pi_n,i)$ with respect to i is equivalent to minimizing $sh_1(\pi_n,i)$.
The alternative formulation sh' can reduce computational complexity since, in the two-partition case, it can be viewed as a convex function of $m_n(i)$. Computationally simple item selection rules make it more feasible to employ large poset models and k-step extensions (see below).
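For a finite response set, equations (7) and (10) can be sketched in Python as below. The sketch assumes Bayes rule for the update and natural logarithms, and the deterministic item is a hypothetical example.

```python
import math


def entropy(sps):
    """Shannon entropy En(pi) = -sum pi(s) log pi(s), taking 0 log 0 as 0."""
    return -sum(p * math.log(p) for p in sps.values() if p > 0)


def sh1(sps, density, responses=(0, 1)):
    """Equation (7) with a sum over a finite response set: the expected entropy of the
    next SPS, weighted by P(X = x | pi_n, i) = sum_s pi_n(s) f_i(x | s)."""
    total = 0.0
    for x in responses:
        joint = {s: p * density(x, s) for s, p in sps.items()}
        px = sum(joint.values())                             # the weighting function
        if px > 0:
            posterior = {s: v / px for s, v in joint.items()}   # Bayes rule (3)
            total += px * entropy(posterior)
    return total


def sh1_prime(sps, density, responses=(0, 1)):
    """Equation (10); En(pi_n) does not depend on the item, so the minimizer is unchanged."""
    return sh1(sps, density, responses) - entropy(sps)


def deterministic_item(item_type):
    """Hypothetical item answered positively exactly by the states at or above its type."""
    return lambda x, s: 1.0 if (x == 1) == (item_type <= s) else 0.0


uniform = {frozenset(t): 0.25 for t in ("", "A", "B", "AB")}
print(sh1_prime(uniform, deterministic_item(frozenset("A"))))   # about -0.693 (= -log 2)
```

Under these assumptions $sh_1'$ is minus the mutual information between the state and the next response, so it is never positive and equals 0 for an item whose response does not depend on the state.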
A generalization of this class of selection rules is one that selects the test item to be administered to a test subject that minimizes the expected value of an SPS function after taking into account the possible responses to the next k administered test items, k being an integer. Item selection rules that look ahead k steps are attractive in that they are better able to exploit the potential of the items remaining in the test item pool.
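The expected value $sh_k(\pi_n,i)$ of the entropy of the SPS after administering k more test items can be calculated in a straightforward manner using the recursive formula

$$sh_k(\pi_n,i)=\int\Big[\min_j\ sh_{k-1}\big((\pi_{n+1}\mid X_{n+1}=x,\ It_{n+1}=i),\ j\big)\Big]\,P\big(X_{n+1}=x\mid\pi_n,\ It_{n+1}=i\big)\,dx$$

with $sh_1$ given by equation (7),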
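where "min over j" means the value of the quantity in brackets for the item j, from the preceding available item pool, that minimizes it. The version of the equation in which P can be represented by $f_i$ and $g_i$ is

$$sh_k(\pi_n,i)=m_n(i)\int\Big[\min_j\ sh_{k-1}\big((\pi_{n+1}\mid X_{n+1}=x,\ It_{n+1}=i),\ j\big)\Big]f_i(x)\,dx+\big(1-m_n(i)\big)\int\Big[\min_j\ sh_{k-1}\big((\pi_{n+1}\mid X_{n+1}=x,\ It_{n+1}=i),\ j\big)\Big]g_i(x)\,dx$$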
where e(i) is the type of test item i.
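The recursion for $sh_k$ can be sketched in Python for a finite response set and a small item pool. The sketch assumes Bayes rule for the update and, as a labeled assumption not stated in the text, falls back to the entropy of the current posterior if the pool runs out before k items are used; the deterministic items are hypothetical.

```python
import math


def entropy(sps):
    """Shannon entropy, taking 0 log 0 as 0."""
    return -sum(p * math.log(p) for p in sps.values() if p > 0)


def update(sps, density, x):
    """Return P(X = x | pi_n, i) and the Bayes-updated SPS (equation (3))."""
    joint = {s: p * density(x, s) for s, p in sps.items()}
    px = sum(joint.values())
    return px, ({s: v / px for s, v in joint.items()} if px > 0 else None)


def sh_k(sps, i, pool, k, responses=(0, 1)):
    """Expected entropy after item i followed, for k > 1, by the best remaining
    item at each later step ("min over j"). pool maps item -> density(x, state)."""
    remaining = {j: f for j, f in pool.items() if j != i}
    total = 0.0
    for x in responses:
        px, post = update(sps, pool[i], x)
        if px == 0:
            continue
        if k == 1 or not remaining:                  # assumption: stop if the pool runs out
            value = entropy(post)
        else:
            value = min(sh_k(post, j, remaining, k - 1, responses) for j in remaining)
        total += px * value
    return total


def deterministic_item(item_type):
    """Hypothetical item answered positively exactly by the states at or above its type."""
    return lambda x, s: 1.0 if (x == 1) == (item_type <= s) else 0.0


types = [frozenset(t) for t in ("A", "B", "AB")]
pool = {t: deterministic_item(t) for t in types}
uniform = {frozenset(t): 0.25 for t in ("", "A", "B", "AB")}
for t in types:
    print("".join(sorted(t)), round(sh_k(uniform, t, pool, k=2), 3))
```

With two steps, starting with the type-A (or type-B) item can drive the expected entropy to zero, while starting with the type-AB item cannot, which a one-step view of the AB item alone would not reveal.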
The same framework for constructing item selection rules applies to distance measures between two different SPSs, for instance $\pi_n$ and $\pi_{n+1}\mid X_{n+1}=x,\ It_{n+1}=i$. Let a distance measure between two SPSs be a function of SPSs a and b that, for given a, attains its minimum when a=b; this minimum need not be unique. The motivation for such a measure is that an item that leads to no change between successive SPSs is undesirable. An example of such a distance function is the sum, over all of the states, of the absolute differences between the corresponding SPS elements. Consider the item selection rule based on this objective function that selects the item i in the available item pool that maximizes $Kg(\pi_n,i)$, where

$$Kg(\pi_n,i)=\sum_{s\in S}\pi_n(s)\int\big\lvert\pi_{n+1}(s\mid X_{n+1}=x,\ It_{n+1}=i)-\pi_n(s)\big\rvert\,f_i(x\mid s)\,dx$$
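The version of this equation obtained when item i has two partitions represented by $f_i$ and $g_i$ and is associated with type e(i) is

$$Kg(\pi_n,i)=\sum_{s\in S:\ e(i)\le s}\pi_n(s)\int\big\lvert\pi_{n+1}(s\mid X_{n+1}=x,\ It_{n+1}=i)-\pi_n(s)\big\rvert f_i(x)\,dx+\sum_{s\in S:\ e(i)\not\le s}\pi_n(s)\int\big\lvert\pi_{n+1}(s\mid X_{n+1}=x,\ It_{n+1}=i)-\pi_n(s)\big\rvert g_i(x)\,dx$$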
Note that each term in the sum comprising the distance function on the SPSs is weighted by the corresponding weighting function $\pi_n(s)f_i(x\mid s)$, for each s in S and each possible response x, given $It_{n+1}=i$ and $\pi_n$.
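The weighting just described can be sketched in Python for a finite set of responses. The sketch assumes Bayes rule as the updating rule and the sum of absolute differences as the SPS distance, and weights each state's term by $\pi_n(s)f_i(x\mid s)$ as stated in the text; the function and item names are hypothetical.

```python
def kg(sps, density, responses=(0, 1)):
    """Kg(pi_n, i) for a finite response set: sum over x and s of
    pi_n(s) f_i(x | s) * |pi_{n+1}(s | x, i) - pi_n(s)|, with pi_{n+1} from Bayes rule."""
    total = 0.0
    for x in responses:
        joint = {s: p * density(x, s) for s, p in sps.items()}   # pi_n(s) f_i(x | s)
        px = sum(joint.values())
        if px == 0:
            continue
        for s, p in sps.items():
            total += joint[s] * abs(joint[s] / px - p)           # joint[s] / px = pi_{n+1}(s)
    return total


def deterministic_item(item_type):
    """Hypothetical item answered positively exactly by the states at or above its type."""
    return lambda x, s: 1.0 if (x == 1) == (item_type <= s) else 0.0


def uninformative_item(x, s):
    """Hypothetical item whose response distribution is the same for every state."""
    return 0.5


uniform = {frozenset(t): 0.25 for t in ("", "A", "B", "AB")}
print(kg(uniform, deterministic_item(frozenset("A"))), kg(uniform, uninformative_item))
```

The uninformative item scores 0 because it leaves the SPS unchanged, which is exactly the situation the distance-based criterion is meant to penalize.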
The k-step version of the above equation is ##EQU11##
where "min over j" means the value of the quantity in brackets for an item j from the preceding available item pool which minimizes the value.
Yet another important class of item selection rules is based on objective functions that measure the "distance" or "discrepancy" between the class conditional densities associated with the various states in S. The term "distance" or "discrepancy" is to be interpreted as a measure of the discrimination between the class conditional densities. Formally, a discrepancy measure is assumed to be a function of two class conditional densities such that, for class conditional densities c and d, it takes on its minimum given c when c=d. This minimum need not be unique. The motivation for adopting such objective functions is that an item becomes more desirable for classification as the discrepancy between its class conditional densities increases. Conversely, if the class conditional densities of two states are equivalent, then statistically there will be no relative discrimination between those states in the subsequent SPS.
Examples of item selection rules based on such objective functions include those that select the item i in the available item pool that maximizes the weighted discrepancy $wd(\pi_n,i)$, where

$$wd(\pi_n,i)=\sum_{j\in S}\sum_{k\in S}\pi_n(j)\,\pi_n(k)\,d_{jk}(i)$$
where $d_{jk}(i)$ is a discrepancy measure between the class conditional densities of states j and k for item i. Note that the discrepancy between each pair of states is weighted by the product of the corresponding probability values in the current SPS. A particularly simple $d_{jk}(i)$ is the one that equals 0 if $f_i(x\mid j)$ equals $f_i(x\mid k)$ and 1 otherwise.
As an illustration, suppose item i partitions the set of states into two subsets, with item type denoted by e(i), and suppose $f_i(x\mid j)$ equals $f_i$ when $e(i)\le j$ and equals $g_i$ otherwise. Examples of discrepancy measures for $f_i$ and $g_i$ include the Kullback-Leibler distance, given by

$$KL(f_i,g_i)=\int f_i(x)\log\frac{f_i(x)}{g_i(x)}\,dx \tag{18}$$
and the Hellinger distance, given by

$$H(f_i,g_i)=\left[\int\Big(\sqrt{f_i(x)}-\sqrt{g_i(x)}\Big)^2dx\right]^{1/2} \tag{19}$$
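Both discrepancy measures and the weighted discrepancy $wd$ can be sketched in Python for discrete responses, where the integrals become sums. The response probabilities below are hypothetical, and the 0-1 discrepancy is the simple $d_{jk}(i)$ mentioned in the text.

```python
import math


def kullback_leibler(f, g):
    """Discrete Kullback-Leibler distance: sum_x f(x) log(f(x) / g(x)); infinite if g(x) = 0 < f(x)."""
    total = 0.0
    for x, fx in f.items():
        if fx > 0:
            gx = g.get(x, 0.0)
            if gx == 0:
                return math.inf
            total += fx * math.log(fx / gx)
    return total


def hellinger(f, g):
    """Discrete Hellinger distance: square root of sum_x (sqrt f(x) - sqrt g(x))^2."""
    support = set(f) | set(g)
    return math.sqrt(sum((math.sqrt(f.get(x, 0.0)) - math.sqrt(g.get(x, 0.0))) ** 2
                         for x in support))


def weighted_discrepancy(sps, densities, d):
    """wd(pi_n, i): sum over ordered state pairs of pi_n(j) pi_n(k) d_jk(i).
    densities maps each state to item i's response distribution for that state."""
    return sum(sps[j] * sps[k] * d(densities[j], densities[k]) for j in sps for k in sps)


def simple_discrepancy(a, b):
    """The simple d_jk(i): 0 if the two densities are equal, 1 otherwise."""
    return 0.0 if a == b else 1.0


f_i = {1: 0.9, 0: 0.1}      # hypothetical density for states at or above e(i)
g_i = {1: 0.2, 0: 0.8}      # hypothetical density for the remaining states
states = [frozenset(t) for t in ("", "A", "B", "AB")]
densities = {s: f_i if frozenset("A") <= s else g_i for s in states}
uniform = {s: 0.25 for s in states}
print(kullback_leibler(f_i, g_i), hellinger(f_i, g_i))
print(weighted_discrepancy(uniform, densities, simple_discrepancy))   # 0.5
```

Note that the Kullback-Leibler distance is not symmetric in $f_i$ and $g_i$, while the Hellinger distance is; either satisfies the minimum-at-equality property required of a discrepancy measure.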
Still another class of item selection rules is the k-step look-ahead rules. These rules employ loss functions, such as those described earlier, as objective functions, and again the objective functions will usually be weighted over the possible input values. The motivation behind such criteria is to reduce the average cost of misclassification while balancing it against the average cost of observation. A variety of loss functions may be used. Importantly, the loss function used in item selection may differ from the one used in the integrated risk determination (see above). If the same loss function is used, the k-step look-ahead rule selects the k-step strategy that leads to the greatest reduction in the integrated risk within a k-step horizon. Note that fewer than k items may be administered in a k-step strategy.
A one-step look-ahead rule can be based on the expected loss $LA_1$ defined by the equation

$$LA_1(\pi_n,i)=\sum_{s\in S}\pi_n(s)\int L\big(s,d(x),1\big)\,f_i(x\mid s)\,dx$$
where L(s,d(x),1) is the loss function and item i is selected from the available test item pool. Of the items in the test item pool not yet administered, the one with the smallest value of $LA_1$ is chosen as the (n+1)'th item to be administered. It may be assumed that d(x) is the Bayes decision rule after response x is observed.
If the class conditional density is $f_i(x\mid s)=f_i(x)$ for the response of a state-s test subject to a test item i of type e(i) when $e(i)\le s$, and $f_i(x\mid s)=g_i(x)$ otherwise, then $LA_1(\pi_n,i)$ is given by the following equation

$$LA_1(\pi_n,i)=\sum_{s\in S:\ e(i)\le s}\pi_n(s)\int L\big(s,d(x),1\big)f_i(x)\,dx+\sum_{s\in S:\ e(i)\not\le s}\pi_n(s)\int L\big(s,d(x),1\big)g_i(x)\,dx$$
where L(s,d(x),1) can be viewed as a function of $\pi_{n+1}\mid X_{n+1}=x,\ It_{n+1}=i$. If the loss function has a constant cost of observation and a 0-1 misclassification cost, this criterion reduces to choosing the item that gives the largest expected value of the maximum posterior probability in $\pi_{n+1}$.
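A Python sketch of $LA_1$ for discrete responses, assuming a loss with $A_1=0$, $A_2=1$ and a constant item cost, and taking d(x) to be the Bayes decision; the items are hypothetical.

```python
def la1_zero_one(sps, density, cost, responses=(0, 1)):
    """One-step look-ahead expected loss LA_1 for a loss with A1 = 0, A2 = 1 and a constant
    item cost: sum_s pi_n(s) sum_x f_i(x | s) L(s, d(x), 1), with d(x) the Bayes decision."""
    total = 0.0
    for x in responses:
        joint = {s: p * density(x, s) for s, p in sps.items()}   # pi_n(s) f_i(x | s)
        if sum(joint.values()) == 0:
            continue
        decision = max(joint, key=joint.get)     # largest posterior = largest joint value
        for s, w in joint.items():
            total += w * (cost + (0.0 if s == decision else 1.0))
    return total


def deterministic_item(item_type):
    """Hypothetical item answered positively exactly by the states at or above its type."""
    return lambda x, s: 1.0 if (x == 1) == (item_type <= s) else 0.0


def uninformative_item(x, s):
    """Hypothetical item whose response does not depend on the state."""
    return 0.5


uniform = {frozenset(t): 0.25 for t in ("", "A", "B", "AB")}
for label, item in [("type A", deterministic_item(frozenset("A"))),
                    ("uninformative", uninformative_item)]:
    print(label, round(la1_zero_one(uniform, item, cost=0.1), 3))
```

Summing the joint weights shows that, under these assumptions, $LA_1$ equals the cost plus one minus the expected maximum posterior, which is the reduction stated above; here the type-A item scores 0.6 and the uninformative item 0.85.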
A k-step look-ahead rule uses the expected loss $LA_k$ of administering the next k test items. The quantity $LA_k$ is defined recursively by the equation ##EQU17##
where "min over j" means the value of the quantity in brackets for an item j from the preceding available item pool which minimizes the value. The version of the equation when item i has two partitions represented by $f_i$ and $g_i$ and is associated with type e(i) is ##EQU18##
Not all reasonable item selection rules need be based directly on objective functions. We begin with the definition of an important concept in item selection. An item i is said to separate the states $s_1$ and $s_2$ in S if the absolute difference $\lvert f_i(x\mid s_1)-f_i(x\mid s_2)\rvert$ of the two class conditional densities, integrated (or summed) over all possible responses x with respect to $f_i(x\mid s_1)$ and/or $f_i(x\mid s_2)$, is greater than zero. In other words, states $s_1$ and $s_2$ are separated if, with positive probability under at least one of the two densities, the two class conditional densities differ. This definition can be generalized: an item is said to separate two states if a discrepancy measure, such as those in equations (18) or (19), applied to the corresponding class conditional densities exceeds a predetermined value. The class of discrepancy measures utilized in the invention coincides with those utilized in item selection rules based on weighted discrepancy measures. Indeed, the separation criterion can be generalized further by considering a plurality of discrepancy measures and deeming it satisfied if, for instance, two or more measures exceed predetermined values, all the measures exceed a predetermined value, or some other such condition holds.
Let us now introduce the function $\Phi$ which, given a separation criterion and two states in S, determines whether an item from a given item pool separates the two states. The outcome "yes" can be assigned the value 1 and "no" the value 0. One application of $\Phi$ is to generate, for two states, the subset of items in the available item pool that separates them.
As an illustration, suppose the poset in FIG. 6 is the underlying model S. Further, let the corresponding item pool contain four items, each with two partitions, of types C, AC, BC, and 1 respectively, whose class conditional densities satisfy the given separation criterion. Then, for states AC and BC, for instance, $\Phi$ can be used to generate the subset of items in this pool that separate them, namely the items of types AC and BC. This simply involves grouping together all items whose $\Phi$-values equal 1.
The function $\Phi$ can also be used to generate reasonable item selection rules. One procedure is as follows:
1. Find the two states in S with the largest values in the current SPS at stage n;
2. Use $\Phi$ to identify the items in the available item pool that separate these states;
3. Select the item in the resulting subset provided by $\Phi$ whose class conditional densities for the two states have the largest discrepancy value with respect to a discrepancy measure such as those in equations (18) and (19), or let the system randomize selection among those items, thus avoiding the use of an objective function altogether. The class of discrepancy measures that can be used in this item selection procedure is the same as the class that can be used in the item selection rules based on discrepancy measures on class conditional densities.
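The function $\Phi$ and the three-step procedure above can be sketched in Python. The sketch is hedged in several ways: it stands in the three-area poset of the earlier example for FIG. 6 (which is not reproduced here, with ABC playing the role of 1), it uses the Hellinger distance with threshold 0 as the separation criterion, and the response probabilities and helper names are hypothetical.

```python
import math


def hellinger(f, g):
    """Hellinger distance between two discrete response distributions."""
    support = set(f) | set(g)
    return math.sqrt(sum((math.sqrt(f.get(x, 0.0)) - math.sqrt(g.get(x, 0.0))) ** 2
                         for x in support))


def phi(item, s1, s2, measure=hellinger, threshold=0.0):
    """Phi: 1 if the item's response distributions for s1 and s2 differ by more than
    `threshold` under `measure`, else 0. `item` maps a state to a distribution."""
    return 1 if measure(item(s1), item(s2)) > threshold else 0


def separating_items(items, s1, s2, measure=hellinger, threshold=0.0):
    """All items in the pool whose Phi-value for the pair (s1, s2) equals 1."""
    return [name for name, item in items.items()
            if phi(item, s1, s2, measure, threshold) == 1]


def select_by_separation(sps, items, measure=hellinger):
    """Steps 1-3: take the two most probable states, keep the items that separate them,
    and return the one with the largest discrepancy (None if no item separates them)."""
    s1, s2 = sorted(sps, key=sps.get, reverse=True)[:2]
    candidates = separating_items(items, s1, s2, measure)
    if not candidates:
        return None
    return max(candidates, key=lambda name: measure(items[name](s1), items[name](s2)))


def two_partition_item(item_type, p_hi=0.9, p_lo=0.2):
    """Hypothetical item: P(positive) is p_hi for states at or above item_type."""
    def response_distribution(state):
        p = p_hi if item_type <= state else p_lo
        return {1: p, 0: 1 - p}
    return response_distribution


items = {name: two_partition_item(frozenset(name)) for name in ("C", "AC", "BC", "ABC")}
print(separating_items(items, frozenset("AC"), frozenset("BC")))   # ['AC', 'BC']
sps = {frozenset("AC"): 0.45, frozenset("BC"): 0.35, frozenset("ABC"): 0.2}
print(select_by_separation(sps, items))
```

The items of types C and ABC are dropped because both states respond to them with the same distribution, matching the illustration for states AC and BC.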
All the rules discussed above can be randomized. This involves introducing the possibility that the item selected by a given rule at a given stage may, with positive probability, be exchanged for another item. This randomization may be weighted by the relative attractiveness of the items with respect to the item selection criterion.
One technique which implements such randomization of item selection is simulated annealing (see S. Kirkpatrick et al., "Optimization by Simulated Annealing", Science, 220, pp. 671-679). The inclination to "jump" to another item is regulated by a "temperature" (i.e. the probability distribution associated with the randomization process is controlled by a "temperature" parameter). The higher the temperature, the more likely a jump will occur. An item selection rule used in conjunction with simulated annealing can be run at various temperatures, with each run referred to as an annealing.
Specifically, one implementation of simulated annealing would be to regulate jumping with a Bernoulli trial, with the probability of jumping a function of the temperature parameter. The higher the temperature, the higher the probability that a jump will indeed occur. Once a jump has been decided upon, the probability distribution associated with alternative items could for instance be proportional to the respective relative attractiveness of the items with respect to the item selection criterion in question.
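One way to realize the Bernoulli-trial jump described above is sketched in Python below. The jump probability $1-e^{-T}$ for temperature T is a hypothetical schedule chosen only because it rises from 0 toward 1 as T grows; the attractiveness scores are made up and must be positive.

```python
import math
import random


def annealed_choice(scores, temperature, rng=random):
    """Randomized item selection in the spirit of simulated annealing.

    scores: item -> attractiveness under the base selection rule (positive, larger is better).
    A Bernoulli trial with jump probability 1 - exp(-temperature) (a hypothetical schedule)
    decides whether to leave the top-scoring item; if so, another item is drawn with
    probability proportional to its attractiveness.
    """
    best = max(scores, key=scores.get)
    others = [item for item in scores if item != best]
    p_jump = 1.0 - math.exp(-temperature)
    if others and rng.random() < p_jump:
        weights = [scores[item] for item in others]
        return rng.choices(others, weights=weights, k=1)[0]
    return best


scores = {"a": 3.0, "b": 2.0, "c": 1.0}
rng = random.Random(0)
picks = [annealed_choice(scores, temperature=0.5, rng=rng) for _ in range(1000)]
print({item: picks.count(item) for item in scores})
```

At temperature 0 the base rule is reproduced exactly; repeated runs at several temperatures correspond to the separate annealings mentioned above.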
The motivation for employing such a modification to an item selection rule is that sometimes myopic optimization may not necessarily lead to item sequences with good overall performance as measured for instance by the integrated risk. Several annealings can be run and the corresponding strategies analyzed to see if improvement is possible.
Another technique for possibly improving upon a collection of item selection rules is to hybridize them within a k-step horizon. This procedure develops new item selection rules from collections of other rules. For each rule in a given collection, a k-step strategy is constructed at each stage of the classification process. The hybridized rule selects the item that was used first in the best of these k-step strategies, as judged by a criterion such as the integrated risk with respect to the current SPS. (The k-step strategies may be judged with a loss function different from the one used for the general classification process.) Hence, the hybridized rule employs whichever item selection rule is "best" at each particular stage in terms of a k-step horizon, so overall performance should improve over using any one item selection procedure alone.
Other hybridizing techniques are possible. For example, given a plurality of item selection rules, an item can be selected at random from the rules' selections. Alternatively, each test item in the available test item pool can be assigned a rank of attractiveness with respect to each selection rule, for instance "1" for the most attractive, "2" for the second most attractive, and so on, and the test item with the best average rank (the smallest average rank number) among the selection rules is selected. Clearly, the ranking values can also be based on the relative values of weighted objective functions. In general, criteria based on weighted relative rankings of attractiveness with respect to a plurality of item selection rules will be referred to as relative ranking measures, with a higher weighted relative ranking indicating a more attractive item.
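The average-rank hybrid can be sketched in Python as follows. It assumes each rule supplies a score per available item in which larger means more attractive (a rule that minimizes, such as one based on $sh_1$, would supply negated values); the scores are made up.

```python
def average_rank_selection(rule_scores):
    """rule_scores: one dict per selection rule mapping item -> score, larger = more
    attractive under that rule. Rank 1 is the most attractive; the item with the
    smallest average rank is returned (ties go to the earlier item)."""
    items = list(rule_scores[0])
    totals = {item: 0 for item in items}
    for scores in rule_scores:
        ordered = sorted(items, key=lambda item: scores[item], reverse=True)
        for rank, item in enumerate(ordered, start=1):
            totals[item] += rank
    return min(items, key=lambda item: totals[item] / len(rule_scores))


rules = [
    {"a": 3.0, "b": 2.0, "c": 1.0},   # hypothetical scores under rule 1
    {"a": 1.0, "b": 3.0, "c": 2.0},   # rule 2
    {"a": 2.0, "b": 3.0, "c": 1.0},   # rule 3
]
print(average_rank_selection(rules))   # b
```

Item b is never worse than second under any rule, so its average rank (4/3) beats item a's (2), even though a is first under rule 1.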
After selecting the next test item in step 11 of the classification process 5 shown in FIG. 3, the selected test item is flagged which indicates that the selected test item is not available for future selection. The selected test item is then administered to the test subject and the test subject's response is recorded in step 13. The test subject's SPS is then updated in step 15 in accordance with equation (3) and the test item counter is incremented in step 17.
The decision is made in step 19 as to whether at this point the administering of test items should be stopped and the test subject classified. The simplest criterion for making this decision is whether or not any of the members of the test subject's SPS exceeds a classification threshold. If any of the members of the test subject's SPS does exceed the threshold, the test subject is classified in step 21 in the state associated with the member of the test subject's SPS having the greatest value and the remediation process 7 begins.
If no member of the test subject's SPS exceeds the classification threshold, the test item count recorded in the test item counter is compared with a test item limit in step 23. If the count is less than the limit, the classification process returns to the item selection step 11 and continues. If the count reaches or exceeds the limit, it is concluded that the classification process is not succeeding, and the process is terminated in step 25. The classification process may fail for a variety of reasons. For example, if the test subject's responses are inconsistent with every state in S (i.e. no dominant posterior probability emerges in the SPS), the test subject cannot be properly classified.
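The loop of FIG. 3 (steps 11 through 25) can be sketched as follows for Bernoulli responses. This is an illustration under stated assumptions, not the patented procedure: the SPS update assumes a Bayes-rule form for equation (3), and step 11 uses a simple stand-in rule (pick the item whose predicted probability of a positive response is nearest 0.5) in place of the selection criteria described earlier. All names are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple


def classify(
    prior: Dict[str, float],
    p_positive: Dict[str, Dict[str, float]],  # item -> state -> P(positive response)
    respond: Callable[[str], int],            # administers an item, returns 1 or 0
    threshold: float = 0.95,
    item_limit: int = 20,
) -> Tuple[Optional[str], Dict[str, float], int]:
    sps = dict(prior)
    available = sorted(p_positive)            # test items not yet administered
    count = 0
    while available:
        # Step 11 (stand-in rule): item whose predicted P(positive) is nearest 0.5.
        item = min(available,
                   key=lambda i: abs(sum(sps[s] * p_positive[i][s] for s in sps) - 0.5))
        available.remove(item)                # flag: no longer available
        x = respond(item)                     # step 13: administer, record response
        lik = {s: p_positive[item][s] if x == 1 else 1.0 - p_positive[item][s] for s in sps}
        z = sum(sps[s] * lik[s] for s in sps)
        sps = {s: sps[s] * lik[s] / z for s in sps}   # step 15: update the SPS
        count += 1                            # step 17
        best = max(sps, key=sps.get)
        if sps[best] > threshold:             # step 19 -> step 21: classify
            return best, sps, count
        if count >= item_limit:               # step 23 -> step 25: terminate
            return None, sps, count
    return None, sps, count                   # item pool exhausted


pool = {i: {"lo": 0.1, "hi": 0.9} for i in ("a", "b", "c")}
state, final_sps, n = classify({"lo": 0.5, "hi": 0.5}, pool, lambda item: 1)
print(state, n)  # hi 2
```

Here every item separates the two states equally well, so the stand-in rule simply takes them in alphabetical order. With `item_limit=1` the same responses end in termination (step 25), because a posterior of 0.9 does not exceed the 0.95 threshold.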
Another possible stopping rule is the k-step look-ahead stopping rule. It involves the same calculations as a k-step look-ahead item selection rule and results in a k-step strategy $\delta_k$ with respect to the classification decision-theoretic loss function.
Given a current SPS, the system must decide whether to continue or stop. The k-step look-ahead stopping rule favors stopping if $R(\pi_n, \delta_k) \ge R(\pi_n, \delta_0)$, where $\delta_0$ is the strategy that stops at the current position. The strategy $\delta_k$ may be represented by a strategy tree (see below). Of course, item selection criteria other than the one given by the equation for $LA_k$ above can be used to construct $\delta_k$. Additionally, the loss function used in the k-step look-ahead stopping criterion may differ from those used in other contexts.
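For k = 1 the comparison $R(\pi_n, \delta_1) \ge R(\pi_n, \delta_0)$ can be written out directly. The sketch assumes Bernoulli items and a loss of the form `mis_cost` for a wrong classification plus `obs_cost` per administered item; both strategies end by classifying to the posterior mode, and $\delta_1$ uses the available item with the smallest one-step risk. The cost of the n items already given is common to both sides and is dropped. Function and parameter names are hypothetical.

```python
from typing import Dict


def one_step_lookahead_stop(
    sps: Dict[str, float],
    p_positive: Dict[str, Dict[str, float]],  # available item -> state -> P(positive)
    obs_cost: float,
    mis_cost: float = 1.0,
) -> bool:
    """True if stopping now is at least as good as the best 1-step strategy."""
    # Risk of delta_0: stop and classify to the posterior mode.
    risk_stop = mis_cost * (1.0 - max(sps.values()))
    best_continue = float("inf")
    for p in p_positive.values():
        # P(correct decision after one more response) = sum_x max_s pi(s) f(x | s).
        correct = (max(sps[s] * p[s] for s in sps)
                   + max(sps[s] * (1.0 - p[s]) for s in sps))
        best_continue = min(best_continue, obs_cost + mis_cost * (1.0 - correct))
    return best_continue >= risk_stop


sps = {"lo": 0.5, "hi": 0.5}
pool = {"a": {"lo": 0.1, "hi": 0.9}}
print(one_step_lookahead_stop(sps, pool, obs_cost=0.01))  # False: keep testing
print(one_step_lookahead_stop(sps, pool, obs_cost=0.5))   # True: stop now
```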
The k-step look-ahead stopping rules can also be based on weighted objective criteria other than a loss function. Consider the uncertainty and distance measures on SPS vectors. After $\delta_k$ is constructed at a given stage, stopping may be invoked if the weighted (expected) reduction in an uncertainty measure is less than a predetermined value, or if the increase in the distance between the weighted (expected) SPS at stage n+k and the current SPS is not greater than a specified value.
Stopping rules do not necessarily have to look ahead k steps; a stopping rule may simply be a function of the current SPS. For instance, stopping can be invoked if a weighted uncertainty measure on the current SPS is less than a predetermined value. Similarly, stopping can be invoked if a weighted distance measure, for instance between the initial SPS and the current one, is larger than a predetermined value. Using loss functions, a stopping rule could depend on whether or not a weighted loss is less than a predetermined value. Weighting for these stopping rule criteria could, for instance, be with respect to the class conditional density values corresponding to the responses to the test items administered up to the current stage and the initial SPS.
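As one concrete instance of stopping rules that depend only on the current SPS, the sketch below uses Shannon entropy as the uncertainty measure and total variation distance between the initial and current SPS as the distance measure. These particular measures and the thresholds `max_entropy` and `min_distance` are illustrative choices, not ones specified in the text, and no weighting is applied.

```python
import math
from typing import Dict


def entropy(sps: Dict[str, float]) -> float:
    """Shannon entropy (nats) of the SPS, used here as the uncertainty measure."""
    return -sum(p * math.log(p) for p in sps.values() if p > 0.0)


def total_variation(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Total variation distance between two SPS vectors."""
    states = set(a) | set(b)
    return 0.5 * sum(abs(a.get(s, 0.0) - b.get(s, 0.0)) for s in states)


def should_stop(initial: Dict[str, float], current: Dict[str, float],
                max_entropy: float = 0.2, min_distance: float = 0.45) -> bool:
    # Stop if the current SPS is concentrated enough, or has moved far from the start.
    return entropy(current) < max_entropy or total_variation(initial, current) > min_distance


initial = {"a": 0.5, "b": 0.5}
print(should_stop(initial, {"a": 0.98, "b": 0.02}))  # True (entropy about 0.098)
print(should_stop(initial, initial))                 # False
```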
Consider the following examples. Suppose a loss function has a cost of observation of 0 until n>10, after which it becomes 1, with no misclassification cost. The corresponding stopping rule will invoke stopping if and only if the number of observations reaches 10 (cf. FIG. 3, step 23). Note that this loss function belongs to the class of loss functions described earlier. Also note that this loss function is tailored for developing reasonable stopping rules and may not correspond to the loss function used in the integrated risk function.
Consider now the uncertainty measure which calculates the quantity (1 minus the largest posterior probability in the SPS). The corresponding stopping rule could then be the simple one described above, which stops if the largest posterior probability in the current SPS exceeds a threshold value. Note that the two examples described above can be used in conjunction to develop a stopping criterion, such as invoking stopping if and only if one or both of the rules calls for stopping. An alternative would be to invoke stopping if and only if both rules call for stopping. Clearly, with a plurality of stopping rules, various such combinations can be used in constructing a new stopping rule.
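The two example rules and their "any"/"all" combinations can be expressed as small composable predicates on the current SPS and the number of items administered. A minimal sketch; the names `item_count_rule`, `max_posterior_rule` and `combine` are hypothetical.

```python
from typing import Callable, Dict, Sequence

StopRule = Callable[[Dict[str, float], int], bool]  # (current SPS, items given) -> stop?


def item_count_rule(limit: int) -> StopRule:
    return lambda sps, n: n >= limit


def max_posterior_rule(threshold: float) -> StopRule:
    # Same as: uncertainty (1 - largest posterior) is below 1 - threshold.
    return lambda sps, n: max(sps.values()) > threshold


def combine(rules: Sequence[StopRule], mode: str = "any") -> StopRule:
    """Stop if any rule (mode="any") or every rule (mode="all") calls for stopping."""
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    agg = any if mode == "any" else all
    return lambda sps, n: agg(rule(sps, n) for rule in rules)


stop = combine([item_count_rule(10), max_posterior_rule(0.9)], mode="any")
print(stop({"a": 0.6, "b": 0.4}, 10))  # True: the item count has reached 10
```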
Recall that the decision rule which minimizes the integrated risk with respect to a loss function and initial SPS is called the Bayes decision rule. The decision rule is a function of the observed response path and its corresponding response distributions. Because of computational difficulty, it may sometimes be easier to use a Bayes decision rule from a different context (i.e. a different initial SPS and a different loss function). For example, if misclassification costs vary among the states in S, selecting the state with the largest posterior probability in the final SPS is not necessarily the Bayes decision rule, yet it may still be attractive to do so.
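The point about unequal misclassification costs can be checked with a two-state example: the Bayes decision minimizes the posterior expected loss $\sum_s \pi(s) L(s, d)$, which need not be the posterior mode. The costs below are made-up numbers for illustration, and `bayes_decision` is a hypothetical helper.

```python
from typing import Dict


def bayes_decision(sps: Dict[str, float], loss: Dict[str, Dict[str, float]]) -> str:
    """Return the decision d minimizing expected loss sum_s pi(s) * loss[s][d].

    loss[s][d] is the misclassification cost of deciding d when the true state is s.
    """
    decisions = list(next(iter(loss.values())))
    return min(decisions, key=lambda d: sum(p * loss[s][d] for s, p in sps.items()))


sps = {"A": 0.6, "B": 0.4}
loss = {"A": {"A": 0.0, "B": 1.0},   # true A classified as B costs 1
        "B": {"A": 5.0, "B": 0.0}}   # true B classified as A costs 5
print(max(sps, key=sps.get), bayes_decision(sps, loss))  # A B
```

Deciding A has expected loss 0.4 x 5 = 2.0, while deciding B has 0.6 x 1 = 0.6, so the Bayes decision is B even though A is the posterior mode.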
Moreover, when the underlying poset model has an infinite number of states, it is possible for purposes of deriving a decision rule to let the initial SPS have infinite mass. The best decision rules in terms of minimizing the integrated risk with respect to such initial SPS prior distributions are called generalized Bayes rules. These rules also may be useful. Once again, note that the loss functions used in the decision process may differ from those used in the classification process (integrated risk criterion) and those used in item selection and/or stopping. As in item selection, when using stopping or classification decision criteria, ties between decisions can be randomized. For emphasis, it should be noted that item selection and/or stopping rules can vary from stage to stage and decision rules from test subject to test subject.
A portion of a strategy tree embodiment 31 of the classification step 5 is shown in FIG. 4. A strategy tree specifies the first test item to be administered together with all subsequent test items to be administered. The choice of each test item after the first depends on the test subject's response to the last test item administered and on the updated SPS. Strategy trees are representations of strategies. A strategy tree is a plurality of paths, each path beginning with the first test item to be administered, continuing through a sequence alternating between a particular response to the last test item and the specification of the next test item, and ending with a particular response to the final test item in the path. The classification of the test subject, based on the final updated SPS for each path, is specified for each path of the strategy tree. Note that strategy trees can be used when the response distributions are continuous, provided there are a finite number of possible response intervals associated with each item choice. Also, multiple branches emanating from a node in the tree indicate multiple possible response outcomes.
Thus, the identity of the next test item in the strategy tree can be determined by referencing a memory location keyed to the identity of the last test item administered and the response given by the test subject to the last test item. The last test item to be administered for each path in the strategy tree is identified as such in the memory, and each response to that last test item is associated in memory with the appropriate classification of the test subject who has followed the path that includes that particular response. Directions for remediation are also stored in memory for each path of the strategy tree.
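A sketch of the memory lookup described above, using a dictionary keyed by (last item, response). It assumes, as the keying scheme implies, that each item appears at most once in the tree; if the same item could occur on different branches with different successors, the key would have to identify the tree node (or the path so far) instead. The items `q1`, `q2`, the classifications and the remediation strings are hypothetical and do not correspond to FIG. 4.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    classification: str
    remediation: str


Next = Union[str, Leaf]  # the next item's identifier, or the end of a path


def run_strategy_tree(
    first_item: str,
    table: Dict[Tuple[str, int], Next],
    respond: Callable[[str], int],
) -> Tuple[Leaf, List[Tuple[str, int]]]:
    """Follow the tree by looking up (last item, response) until a leaf is reached."""
    item = first_item
    path: List[Tuple[str, int]] = []
    while True:
        x = respond(item)
        path.append((item, x))
        nxt = table[(item, x)]
        if isinstance(nxt, Leaf):
            return nxt, path
        item = nxt


table = {
    ("q1", 1): "q2",
    ("q1", 0): Leaf("low", "review prerequisite material"),
    ("q2", 1): Leaf("high", "none"),
    ("q2", 0): Leaf("middle", "practice the skill tested by q2"),
}
answers = {"q1": 1, "q2": 0}
leaf, path = run_strategy_tree("q1", table, answers.__getitem__)
print(leaf.classification, path)  # middle [('q1', 1), ('q2', 0)]
```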
It is assumed in FIG. 4 that the test subject's response to a test item can be either positive or negative. The first item to be administered is specified by the strategy tree to be item-3 and is administered in step 33. The response is analyzed in step 35. If the response to item-3 is positive, the next item to be administered is specified by the strategy tree to be item-4 which is administered in step 37. The response to item-4 is analyzed in step 39. If the response to item-4 is positive, the administering of test items ceases, classification of the test subject occurs in step 40, and the test subject transitions to the remediation step 7 (FIG. 2). If the response to item-4 is negative, item-7 is administered in step 41 in accordance with the strategy tree specification. The process continues in a similar manner after step 41 until a stopping point is reached and classification occurs.
If the response to item-3 is determined in step 35 to be negative, item-1 is administered in step 43 as specified by the strategy tree and analyzed in step 45. If the response to item-1 is positive, the administering of test items ceases, classification of the test subject occurs in step 46, and the test subject transitions to the remediation step 7 (FIG. 2). If the response to item-1 is negative, either item-9, item-2, . . . , item-5 is administered in steps 47, 49, . . . , 51, as specified by the strategy tree. The process continues after these steps until a stopping point is reached and classification occurs.
Such strategy trees are developed starting with the initial SPS and the test item pool, and determining the sequence of test items to be administered using the test item selection procedures described above. A strategy tree branches with each administration of a test item until stopping is invoked by a stopping rule.
It may be possible to create a more efficient strategy tree from an existing one by evaluating a weighted loss function one or more test items back from the final test item in a path and determining whether the additional test items in the strategy tree are justified by a reduction in the weighted loss function.
The relationship between the loss function and a strategy tree is illustrated in FIG. 5. A circle 61 denotes a test item and line segments 63 and 65 denote the possible responses to the test item. Test item 67, the first test item in the strategy tree, is the beginning of all paths in the strategy tree. Each path terminates with a line segment such as line segment 63 which does not connect to another test item. The loss function L(s,d,n) for each path can be determined after classification occurs at the end of each path, assuming the test subject's true classification is s, as indicated in FIG. 5.
The loss function cannot be used directly in refining a strategy tree since one never knows with absolute certainty the true classification of a test subject. Instead, the weighted loss function (i.e. integrated risk) R(.pi..sub.0, .delta.) is used for this purpose.
As mentioned above, a strategy tree can be refined by using the weighted loss function. Suppose the weighted loss function of the strategy tree $\delta_1$ of FIG. 5 is $R(\pi_0, \delta_1)$. Now eliminate test item 61 and call this revised tree $\delta_2$, with weighted loss function $R(\pi_0, \delta_2)$. If $R(\pi_0, \delta_2)$ is less than $R(\pi_0, \delta_1)$, the reduced weighted loss function suggests that strategy tree $\delta_2$ is preferable to the original strategy tree $\delta_1$.
Rather than eliminating only one test item, one might choose to eliminate test items 61 and 63, thereby obtaining strategy tree $\delta_3$. Again, if $R(\pi_0, \delta_3)$ is less than $R(\pi_0, \delta_1)$, the reduced weighted loss function suggests that strategy tree $\delta_3$ is preferable to the original strategy tree $\delta_1$. There are obviously many possible modifications of the original strategy tree that might be investigated using the weighted loss function as the criterion of goodness. A systematic approach is a "peel-back" approach. This entails first "growing" the tree with a computationally simple stopping rule, such as one that stops when one of the states in S has a posterior probability exceeding a threshold value or when the number of observations exceeds a threshold. The system can then "peel back" the tree and refine the stopping rule in terms of the weighted loss function by applying a k-step look-ahead stopping rule only to the sub-trees at the end of the tree that branch at most k steps from termination ($k \ge 1$). This approach becomes attractive when applying the k-step look-ahead stopping rule at every stage of the strategy tree is computationally expensive.
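The integrated risk $R(\pi_0, \delta)$ of a strategy tree can be computed by walking every path and weighting each terminal loss by $\pi_0(s)$ times the probability of that path's responses given s. The sketch assumes Bernoulli items and the loss $L(s, d, n)$ = `mis_cost` if $d \ne s$ (else 0) plus `obs_cost` times n. It compares a tree with a pruned version, in the spirit of the comparison of $\delta_1$ and $\delta_2$, but the trees are hypothetical and are not those of FIG. 5. Each node must list a child for every possible response, otherwise probability mass is lost.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    decision: str                      # classification at the end of a path


@dataclass
class Node:
    item: str
    children: Dict[int, "Tree"]        # response (1 or 0) -> subtree


Tree = Union[Leaf, Node]


def integrated_risk(tree: Tree, prior: Dict[str, float],
                    p_positive: Dict[str, Dict[str, float]],
                    mis_cost: float, obs_cost: float) -> float:
    """R(pi_0, delta) for the loss L(s, d, n) = mis_cost * [d != s] + obs_cost * n."""

    def walk(t: Tree, weight: Dict[str, float], n: int) -> float:
        # weight[s] = pi_0(s) * P(responses along this path | s)
        if isinstance(t, Leaf):
            return sum(w * (mis_cost * (t.decision != s) + obs_cost * n)
                       for s, w in weight.items())
        p = p_positive[t.item]
        total = 0.0
        for x, child in t.children.items():
            lik = {s: p[s] if x == 1 else 1.0 - p[s] for s in weight}
            total += walk(child, {s: weight[s] * lik[s] for s in weight}, n + 1)
        return total

    return walk(tree, dict(prior), 0)


prior = {"lo": 0.5, "hi": 0.5}
items = {"a": {"lo": 0.1, "hi": 0.9}, "b": {"lo": 0.1, "hi": 0.9}}
delta1 = Node("a", {1: Node("b", {1: Leaf("hi"), 0: Leaf("hi")}), 0: Leaf("lo")})
delta2 = Node("a", {1: Leaf("hi"), 0: Leaf("lo")})   # item b removed
r1 = integrated_risk(delta1, prior, items, mis_cost=1.0, obs_cost=0.05)
r2 = integrated_risk(delta2, prior, items, mis_cost=1.0, obs_cost=0.05)
print(round(r1, 3), round(r2, 3))  # 0.175 0.15
```

Item `b` never changes the decision in `delta1`, so removing it saves observation cost without adding misclassification risk, and the pruned tree has the smaller integrated risk.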
An important application of the technology used to generate sequential test sequences is in the development of fixed sequence tests. A fixed sequence test (fixed test) is a sequence of items that are to be administered to all test subjects, with no sequential selection involved. A test length may be predetermined or can be determined during design given a decision-theoretic framework as used in the sequential setting. Indeed, the same classification framework can be used in the fixed test context as well (use of loss functions with costs of misclassification and observation, integrated risk functions, an initial SPS, etc.). The objective for this problem then is to choose the fixed sequence from an item pool which minimizes the integrated risk for a given loss function and initial SPS. Note that choosing the test length (i.e. deciding when to stop) may be an issue since the loss function may include a cost of observation. Also, note that during actual administration of a given fixed test, it is possible to allow test subjects to stop before completing all of the test items in the fixed sequence, using stopping rules as described earlier. Decision rules are analogous in the fixed test context in that their objective is to make a classification decision which minimizes a weighted loss function.
All the previous item selection rules such as those based on weighted objective functions can be adapted to this application as well, along with the techniques of extending them for k-step horizons, hybridizing a collection of them, and introducing randomization to the selection process. As an example, items can be selected iteratively via the sh-criterion by choosing at stage n+1 the item i from the remaining available item pool which minimizes ##EQU19##
where $i_1, i_2, \ldots, i_n$ are the items previously selected at stages 1 through n respectively, and ##EQU20##
The function f is the joint class conditional density for responses $x_1, \ldots, x_n, x_{n+1}$ given state s and item sequence $i_1, \ldots, i_n, i$. In addition, the probability of a test subject being in a particular test item partition can be calculated, for instance, by weighting the probability values given by the possible SPSs that could result from administering the fixed test items up to stage n. Recall that the probabilities of a test subject being in a test item's partitions are quantities used by certain item selection rules.
Item selection criteria based on the function $\Phi$ can also be used in this context. First, list all pairs of states that need to be separated, optionally giving more weight to certain separations (e.g. requiring that a certain separation be done twice). The objective in selecting a fixed sequence is then to conduct as many of the desired separations as possible, using, for a given pair of states and a given separation criterion, the function $\Phi$ to determine whether an item separates them. An item selection criterion would be to choose the item that achieves as many of the remaining desired separations as possible. Once an item is administered, the list of remaining desired separations is updated by removing the separations it achieves.
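A greedy sketch of the separation-based criterion for building a fixed sequence. The function $\Phi$ is not defined in this section, so the code substitutes a simple stand-in: an item separates two states when they fall in different partitions of that item. A pair listed twice must be separated twice. `greedy_fixed_test` and the example partitions are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def greedy_fixed_test(
    partitions: Dict[str, Dict[str, int]],   # item -> state -> partition label
    desired: Sequence[Tuple[str, str]],      # pairs to separate; repeat a pair to require it twice
    max_length: int,
) -> List[str]:
    remaining = Counter(frozenset(pair) for pair in desired)
    pool = sorted(partitions)
    sequence: List[str] = []

    def separated(item: str) -> List[frozenset]:
        labels = partitions[item]
        return [pair for pair in remaining if len({labels[s] for s in pair}) == 2]

    while pool and remaining and len(sequence) < max_length:
        best = max(pool, key=lambda item: len(separated(item)))
        hits = separated(best)
        if not hits:
            break                          # no remaining item achieves a desired separation
        for pair in hits:                  # each administration removes one occurrence
            remaining[pair] -= 1
            if remaining[pair] == 0:
                del remaining[pair]
        pool.remove(best)
        sequence.append(best)
    return sequence


parts = {
    "iA": {"0": 0, "A": 1, "B": 0, "AB": 1},
    "iB": {"0": 0, "A": 0, "B": 1, "AB": 1},
    "iAB": {"0": 0, "A": 0, "B": 0, "AB": 1},
}
pairs = list(combinations(["0", "A", "B", "AB"], 2))
print(greedy_fixed_test(parts, pairs, max_length=3))  # ['iA', 'iB']
```

Ties are broken by alphabetical item order, which is why `iA` is chosen before the equally good `iB`.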
In the strategy tree context, the restriction that the same item sequence be administered to all test subjects is equivalent to requiring all branches in a tree to be equivalent. In general, one can view the process of selecting a fixed test as a special case of the general sequential analytic problem. At each stage n of the tree-building process, n>=1, instead of allowing each node to be associated with its own item, developing a fixed test is equivalent to requiring that all nodes at the same stage n of the test share the same item selection. Note that the "peel-back" approach to constructing a stopping rule can still be applied.
Conversely, developing fixed test sequences has application in sequential testing. Recall k-step look-ahead item selection and stopping rules, which require development of a k-step horizon strategy at each stage. This can be computationally costly if k is large and the poset model and item pool are complex. As an alternative, one can instead calculate a fixed test sequence within a k-step horizon in place of a k-step strategy. Criteria for item selection and stopping based on using a k-step horizon fixed test are analogous.
For both the sequential and fixed test settings, the above techniques can be used to design the item pool (fixed test) in terms of what type of items should be constructed. To gain insight, classification is conducted on hypothetical items with hypothetical item types and item response distributions. Since the classification process is being simulated, an infinite number of each of the item types of interest within a range of class conditional densities that reflect what is to be expected in practice can be assumed. From the hypothetical item pool, strategy trees or fixed sequences can be constructed for various initial SPS configurations. The composition of these constructions in terms of the hypothetical item types selected gives guidance as to how to develop the actual item pool or fixed sequence. Hypothetical item types that appear most frequently on average and/or have high probability of administration for instance with respect to SPSs and class conditional densities are candidates to be constructed. Analyzing the item composition of a number of simulated classification processes is an alternative approach to gaining insight into item pool design. Note that these approaches can be applied to actual test item pools as well. Actual test items that are not administered with high frequency on average and/or do not have high probability of administration, for instance with respect to SPSs and class conditional densities, are candidates for removal.
An important consideration in the implementation of the invention is the development of a model of the domain of interest and the associated test item pool. Concerns in developing the model include whether the model has too many or too few states. Desirable properties of the test item pool include having accurately specified items which strongly discriminate between states and having a sufficient assortment of item types to allow for effective partitioning of the states.
A model is too large when some of its states are superfluous and can be removed without adversely affecting classification performance. A model is too small when some important states are missing. An advantage of a parsimonious model is that, for test subjects in states included in the model, it requires on average fewer test items than a larger model containing it to reach the classification stage and to classify with a given probability of error. The disadvantage is that test subjects in states that are not present in the model cannot be appropriately classified.
A good model gives useful information concerning the remediation of test subjects. Each state should be meaningful in assessing the knowledgeability or functionality of the test subject. Moreover, the model should be complex enough to be a good representation of all the relevant knowledge or functionality states in a given subject domain. Hence, balancing parsimony while accurately representing the subject domain is the primary challenge of model development.
The selection of items for the test item pool entails determining how effective a test item is in distinguishing between subsets of states. The effectiveness of a test item is determined by the degree of discrimination provided by the response distributions associated with the test item and the subsets of states. The degree of discrimination provided by response distributions can be measured in a variety of ways. Two possibilities are illustrated by equations (18) and (19), with larger values indicating a larger degree of discrimination. In general, discrepancy measures from the same class as employed in item selection can be used.
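Equations (18) and (19) are not reproduced here. As one example of a discrepancy measure between the response distributions of two subsets of states (a swapped-in choice, not necessarily either of those equations), the symmetrized Kullback-Leibler divergence between two Bernoulli response distributions is zero for identical distributions and grows as they move apart. The function names are hypothetical.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence D(Bern(p) || Bern(q)), for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def symmetric_discrimination(p: float, q: float) -> float:
    """Symmetrized KL (Jeffreys divergence): larger means better discrimination."""
    return bernoulli_kl(p, q) + bernoulli_kl(q, p)


# P(positive) in one subset of states vs. another
print(symmetric_discrimination(0.9, 0.1))  # strongly discriminating item
print(symmetric_discrimination(0.6, 0.5))  # weakly discriminating item
```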
The starting point for the development of a domain model and its associated test item pool is the postulating of model candidates by experts in the field of the domain and the generation of test item candidates of specified types for the test item pool. Within each model, the experts may have an idea as to which states may be superfluous and where there may be missing states. Further, the experts may have an idea as to which items do not discriminate well between subsets of states or whose association with the domain states may be vague and need to be investigated. These prior suspicions are helpful in that they allow the user to experiment through design of a training sample of test subjects in order to gain information necessary to make decisions about item performance and model structure.
With respect to the relationship between domain models and the test item pool, it is of considerable importance that the item pool can discriminate among all of the states. Whether this is true or not can be determined by a mapping on the poset model, given a test item pool with fixed item partitions. In general, the item partitions may be specified such that they do not necessarily correspond to the subsets with shared class conditional response distributions, and in fact can be specified without taking into consideration actual estimated class conditional response densities. Moreover, separation criteria can be used for specifying alternative partitions, such as grouping together states whose class conditional density discrepancies are small. These alternative partitions can be used below and in item selection rules. Hypothetical items with hypothetical partitions can be used as well. The mapping consists of the following sequence of operations: partitioning the domain set of states by means of a first item in the test item pool into its corresponding partition, partitioning each of the subsequent subsets in the same manner by means of a second item, resulting in the further possible partitioning of each partition of the first item; continuing the partitioning of the resultant subsets at each stage of this process by means of a third, fourth, . . . , nth type of item until either there are no more items left in the item pool or until each state in the original poset is by itself in a subset. The latter situation implies that the item pool can discriminate between all of the states in the domain in relation to the fixed item partitions. If the final collection of subsets contains one subset that has more than one member, the implication is that the test item pool cannot discriminate between those states in that subset, again in relation to the fixed item partitions. 
The image of this mapping can be partially ordered, with the partial order induced in the sense that $x' \le y'$ for $x'$ and $y'$ in the image if there exist x and y in the original poset such that $x \le y$ and the images of x and y are $x'$ and $y'$ respectively.
FIGS. 6 and 7 illustrate this mapping. Suppose the item pool contains 4 items, each with two partitions and associated respectively with states {C, AC, BC, 1}. FIG. 6 shows the original poset model, and FIG. 7 is the image of the mapping on the poset model of FIG. 6. The image shown in FIG. 7 indicates that the test item pool was unable to separate states 0, A, B, and AB.
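The mapping can be sketched as iterative refinement of a list of subsets of states. The sketch assumes that FIG. 6 is the lattice of all subsets of the skills {A, B, C} (with "0" the empty set and "1" the full set) and that an item associated with state q splits states into those that include q and those that do not; under those assumptions it reproduces the statement that states 0, A, B and AB remain together. `image_partition` is a hypothetical name.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence

State = FrozenSet[str]   # a state is modeled as the set of skills it contains


def image_partition(states: Sequence[State], items: Sequence[State]) -> List[FrozenSet[State]]:
    """Refine the set of states item by item; each item is assumed to split a
    subset into states that include its associated state and states that do not."""
    blocks: List[List[State]] = [list(states)]
    for q in items:
        if all(len(b) == 1 for b in blocks):
            break                          # every state is already by itself
        refined: List[List[State]] = []
        for b in blocks:
            inside = [s for s in b if q <= s]
            outside = [s for s in b if not q <= s]
            refined.extend(part for part in (inside, outside) if part)
        blocks = refined
    return [frozenset(b) for b in blocks]


# FIG. 6 example (assumed): the eight subsets of skills {A, B, C}; "1" is {A, B, C}.
states = [frozenset(c) for r in range(4) for c in combinations("ABC", r)]
items = [frozenset("C"), frozenset("AC"), frozenset("BC"), frozenset("ABC")]
image = image_partition(states, items)
print(len(image))  # 5
print(sorted("".join(sorted(s)) or "0" for s in max(image, key=len)))  # ['0', 'A', 'AB', 'B']
```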
A resultant image poset can be viewed as the effective working model for classification in relation to the item pool and given item partitions and is a reduction from the original poset model. In practice, this reduction in the number of states can be substantial, depending on the item pool. States that are not discriminated by the mapping are effectively viewed as one state in the image. Also, if classification is to be conducted on the image poset, note that an item's partition must be updated in relation to the new model. If the partial order on the image is induced as above, then an item's partition in the image is just the image of the partition, and the system can automatically update the item type specification.
The unavailability of item types is a natural constraint for the model. Certain item types may be awkward, such as an item requiring exactly one skill. Item construction constraints are a factor in the type of models that can be used in classification. Thus, the mapping described above gives important information about the type of models that can be constructed and whether the item pool needs to be augmented in order to better separate states. It can be used to analyze the performance of a fixed test, to see which states may not be separated by the fixed test.
The mapping on the domain model should be performed immediately after candidate domain models and candidate test item pools have been defined in order to provide insight as to possible constraints imposed on the model and possible flaws in the process for selecting candidate items for the test item pool. The mapping should be repeated for any modifications of the candidate models and test item pools.
Sometimes it is of interest to generate ideal response patterns. Consider the following example from the educational application. Given a poset model, suppose that the response distributions for the items are Bernoulli and that each item has two partitions. Then, for each item, it can be determined whether a test subject in a specified state in the poset model has the knowledge or functionality to give a positive response, in which case the response is denoted by a "1"; otherwise it is denoted by a "0". The resulting sequence of 1s and 0s, ordered in the same way as the test items, is referred to as the ideal response pattern. In this case, the ideal response pattern for a test subject in the specified state is the sequence of responses that would be observed if the test subject's responses perfectly reflected the test subject's state.
In general, an ideal response for an item given a specified state can be any representative value of the class conditional density for the item; in the continuous response case, this could be the mean of the density. Further, instead of an ideal response value, the ideal response can be represented by an ideal set of values, possibly including ideal intervals of values (depending on whether the class conditional density is discrete or continuous). An example of a possibly useful ideal response interval for an item given a specified state is the set of values within a specified distance of the class conditional density mean for that state. When the response is multi-dimensional, an ideal response could be a value or set of values in the multi-dimensional space of possible responses. Ideal response patterns contain information about each item in a given item sequence.
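Both kinds of ideal response can be sketched briefly. The pattern function assumes, as in the educational example, Bernoulli items with two partitions where a state yields a positive response exactly when it contains the skills the item requires; the interval function uses the class conditional mean plus or minus a chosen radius. Names and the example means are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple


def ideal_pattern(state: FrozenSet[str], items: Sequence[FrozenSet[str]]) -> List[int]:
    """Bernoulli, two-partition case: 1 if the state has every skill the item requires."""
    return [1 if required <= state else 0 for required in items]


def ideal_interval(means: Dict[str, float], state: str, radius: float) -> Tuple[float, float]:
    """Continuous case: values within `radius` of the class conditional mean for `state`."""
    m = means[state]
    return (m - radius, m + radius)


items = [frozenset("A"), frozenset("B"), frozenset("C"), frozenset("AC")]
print(ideal_pattern(frozenset("AB"), items))                # [1, 1, 0, 0]
print(ideal_interval({"AB": 12.0, "A": 8.0}, "AB", 1.5))    # (10.5, 13.5)
```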
Given a test subject response pattern g and an ideal pattern h, distance measures on the patterns can be used to gauge whether the ideal pattern is "close" to the test subject pattern. For the example above, a reasonable distance measure would be to count the number of discrepancies between patterns, with an ideal pattern said to be "close" to a test subject pattern if that number is less than a certain specified number, or, equivalently, if the percentage of discrepancies is l |