Wednesday, 23 January 2013

Item Response Theory (IRT)


Presented By:

o   Fifin Naili Riskiyah
o   Justsinta Sindi Alivi
o   Sitti Fatimah Saleng
o   Uliya Nafida

1.      Differences between Classical Test Theory (CTT) and Item Response Theory (IRT)
| Area | CTT | IRT |
|---|---|---|
| Model | Linear | Non-linear |
| Level | Test | Item |
| Assumptions | Weak (i.e., easy to meet with test data) | Strong (i.e., more difficult to meet with test data) |
| Item-ability relationship | Not specified | Item characteristic functions |
| Ability | Test scores or estimated true scores are reported on the test score scale (or a transformed test score scale) | Ability scores are reported on the scale -∞ to +∞ (or a transformed scale) |
| Invariance of item and person statistics | No – item and person parameters are sample dependent | Yes – item and person parameters are sample independent, if the model fits the test data |
| Item statistics | p, r | b, a, and c (for the three-parameter model) plus corresponding item information functions |
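The b, a, and c statistics in the last row are the parameters of an item characteristic function. As a minimal sketch, assuming the common three-parameter logistic form (this post does not give the formula, and some texts add a scaling constant of 1.7 that is omitted here), it could be written as follows. The function name `icc_3pl` and the sample values are illustrative only.

```python
import math

def icc_3pl(theta, a, b, c):
    """Probability of a correct response under a three-parameter logistic model.

    theta: examinee ability
    a: item discrimination
    b: item difficulty
    c: pseudo-guessing parameter (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# When ability equals difficulty, the probability sits halfway between c and 1.
print(round(icc_3pl(theta=0.0, a=1.0, b=0.0, c=0.2), 2))
```

Setting c to 0 and a to 1 reduces this to a one-parameter curve in which only the difficulty b varies between items.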

2.      Advantages and disadvantages of IRT
The advantages of IRT:
o   Item statistics in IRT are not group dependent
o   The scores that describe student proficiency are not test dependent
o   IRT does not need parallel tests to determine test reliability
o   An IRT model provides a measure of precision for each ability level
o   Item analysis accommodates matching test items to examinee knowledge levels

The disadvantages of IRT:
o   The assumptions underlying the use of IRT models are more stringent than those required by classical test theory
o   IRT models tend to be more complex, and the model output is more difficult to understand, particularly for non-technically oriented audiences
o   IRT models require large samples to obtain accurate and stable parameter estimates


3.      Item Calibration and Ability Estimation
A teacher wants to give her students a math test to assess their skill level at the beginning of the school year. She develops two hard questions, one average question and two easy questions. After she gives the test to 5 students, she uses Item Response Theory (IRT) to determine different characteristics of each question.
Math test:
1.       105,100/ 167 =
2.       673 x 465 =
3.       6 x 7 =
4.       5 – 3 =
5.       2 + 2 =


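|  | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Average |
|---|---|---|---|---|---|---|
| Person 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Person 2 | 0 | 1 | 1 | 1 | 1 | 0.8 |
| Person 3 | 0 | 0 | 1 | 1 | 1 | 0.6 |
| Person 4 | 0 | 0 | 0 | 1 | 1 | 0.4 |
| Person 5 | 0 | 0 | 0 | 0 | 1 | 0.2 |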

In this example, student 1, who answered all five items correctly, is tentatively considered to possess 100% proficiency. Student 2 has 80% proficiency, student 3 has 60%, student 4 has 40%, and student 5 has 20%.
These percentage scores are considered tentative for two reasons. First, IRT uses a different set of terminology for proficiency.
Second, we cannot judge a student’s ability based only on the number of correct items. The item attributes, such as difficulty level, should also be taken into account.
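The tentative proficiency figures above are simply row averages of the response table. A minimal sketch of that arithmetic follows; the names `responses` and `tentative_proficiency` are illustrative and not part of any IRT package.

```python
# Rows are persons 1-5, columns are items 1-5 (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
]

def tentative_proficiency(matrix):
    """Proportion of items that each person answered correctly."""
    return [sum(row) / len(row) for row in matrix]

print(tentative_proficiency(responses))  # [1.0, 0.8, 0.6, 0.4, 0.2]
```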
In the preceding example, no students have the same scores. But what would happen if there is a student, say student 6, whose score is the same as that of student 4?
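|  | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Average |
|---|---|---|---|---|---|---|
| Person 4 | 0 | 0 | 0 | 1 | 1 | 0.4 |
| Person 5 | 0 | 0 | 0 | 0 | 1 | 0.2 |
| Person 6 | 1 | 1 | 0 | 0 | 0 | 0.4 |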

In this case we cannot draw a firm conclusion that they have the same level of proficiency because student 4 answered two easy items correctly, whereas student 6 answered two hard questions correctly.
The original five-student example is an ideal case in which more proficient students answer all items correctly, while less proficient students answer the easier items correctly and fail the harder ones. Such results rarely occur in reality and are “too good to be true”. This ideal case is known as the Guttman pattern.
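One way to check whether a response table follows this ideal pattern is to order the items from easiest to hardest and confirm that no person passes a harder item after failing an easier one. The sketch below assumes that definition; `is_guttman` is a hypothetical helper, and with tied items the result can depend on how the ties are ordered.

```python
def is_guttman(matrix):
    """Return True if every row is a run of 1s followed by 0s
    once the items are sorted from easiest to hardest."""
    n_items = len(matrix[0])
    pass_counts = [sum(row[j] for row in matrix) for j in range(n_items)]
    # Easiest item (most passes) first.
    order = sorted(range(n_items), key=lambda j: pass_counts[j], reverse=True)
    for row in matrix:
        ordered = [row[j] for j in order]
        # A 0 followed later by a 1 breaks the pattern.
        if any(earlier < later for earlier, later in zip(ordered, ordered[1:])):
            return False
    return True

five_students = [
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
]
student_6 = [1, 1, 0, 0, 0]  # passes the two hard items, fails the rest

print(is_guttman(five_students))                # True
print(is_guttman(five_students + [student_6]))  # False
```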




We can also make a tentative assessment of the item attribute based on this example.

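|  | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Average |
|---|---|---|---|---|---|---|
| Person 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Person 2 | 0 | 1 | 1 | 1 | 1 | 0.8 |
| Person 3 | 0 | 0 | 1 | 1 | 1 | 0.6 |
| Person 4 | 0 | 0 | 0 | 1 | 1 | 0.4 |
| Person 5 | 0 | 0 | 0 | 0 | 1 | 0.2 |
| Failure rate | 0.8 | 0.6 | 0.4 | 0.2 | 0 |  |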






Item 1 seems to be the most difficult because only one person out of five answered it correctly.
It is tentatively asserted that the difficulty level of item 1, in terms of the failure rate, is 0.8, meaning 80% of students were unable to answer the item correctly. In other words, the item is so difficult that it can “beat” 80% of students. Please note that for person proficiency we count the number of successful answers, but for item difficulty we count the number of failures.
Item 5 is the easiest because all of the students answered it correctly, so its difficulty level is 0. However, as you might expect, the issue becomes more complicated when some items have the same failure rate but are answered correctly by students of different skill levels.
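The tentative difficulty described here is a column-wise failure rate. A minimal sketch of the calculation, using illustrative names (`responses`, `tentative_difficulty`):

```python
# Rows are persons 1-5, columns are items 1-5 (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
]

def tentative_difficulty(matrix):
    """Proportion of persons who failed each item."""
    n_persons = len(matrix)
    n_items = len(matrix[0])
    failures = [n_persons - sum(row[j] for row in matrix) for j in range(n_items)]
    return [f / n_persons for f in failures]

print(tentative_difficulty(responses))  # [0.8, 0.6, 0.4, 0.2, 0.0]
```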
In the following example, item 1 and item 6 have the same difficulty level (0.8).

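|  | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Item 6 | Average |
|---|---|---|---|---|---|---|---|
| Person 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0.83 |
| Person 2 | 0 | 1 | 1 | 1 | 1 | 0 | 0.67 |
| Person 3 | 0 | 0 | 1 | 1 | 1 | 0 | 0.50 |
| Person 4 | 0 | 0 | 0 | 1 | 1 | 0 | 0.33 |
| Person 5 | 0 | 0 | 0 | 0 | 1 | 1 | 0.33 |
| Failure rate | 0.8 | 0.6 | 0.4 | 0.2 | 0 | 0.8 |  |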


However, item 1 was answered correctly by a person with high proficiency (83%), whereas item 6 was not. The only person who answered item 6 correctly has 33% proficiency. It is possible that the wording of item 6 tends to confuse good students. Therefore, the item attribute of item 6 is not clear-cut.
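The contrast between item 1 and item 6 can be made concrete by looking at who solved each item. The sketch below computes the mean tentative proficiency of the persons who answered a given item correctly; `solver_proficiency` is a hypothetical helper and assumes at least one person solved the item.

```python
# Rows are persons 1-5, columns are items 1-6 (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
]
proficiency = [sum(row) / len(row) for row in responses]

def solver_proficiency(matrix, scores, item):
    """Mean tentative proficiency of the persons who answered `item` correctly."""
    solvers = [scores[i] for i, row in enumerate(matrix) if row[item] == 1]
    return sum(solvers) / len(solvers)

print(round(solver_proficiency(responses, proficiency, 0), 2))  # item 1: 0.83
print(round(solver_proficiency(responses, proficiency, 5), 2))  # item 6: 0.33
```

Both items have the same failure rate, yet they were solved by people at opposite ends of the proficiency range, which is why the failure rate alone does not describe item 6 well.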
We call the proportion of correct answers for each person the tentative student proficiency (TSP). We call the failure rate for each item the tentative item difficulty (TID).
Given the tentative information we obtained about item difficulty and student proficiency, we can predict the probability of answering a particular item correctly given the proficiency level of a student.

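|  | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | TSP |
|---|---|---|---|---|---|---|
| Person 1 | 0.55 | 0.60 | 0.65 | 0.69 | 0.73 | 1 |
| Person 2 | 0.50 | 0.55 | 0.60 | 0.65 | 0.69 | 0.8 |
| Person 3 | 0.45 | 0.50 | 0.55 | 0.60 | 0.65 | 0.6 |
| Person 4 | 0.40 | 0.45 | 0.50 | 0.55 | 0.60 | 0.4 |
| Person 5 | 0.35 | 0.40 | 0.45 | 0.50 | 0.55 | 0.2 |
| TID | 0.80 | 0.60 | 0.40 | 0.20 | 0.00 |  |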


For example, the probability that person 1 can answer item 5 correctly is 0.73. This is no surprise: person 1 has a tentative proficiency of 1, but the tentative difficulty of item 5 is 0. Person 1 is definitely “smarter” or “better” than item 5. The probability that person 2 can answer item 1 correctly is 0.5, because the tentative student proficiency is 0.8 and the tentative item difficulty is also 0.8.
In other words, the person’s ability “matches” the item difficulty. When the student has a 50% chance of answering the item correctly, the student has no advantage over the item and vice versa.
When you move your eyes along the diagonal from upper left to lower right, you will see a “match” (0.5) between a person and an item several times.

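|  | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | TSP |
|---|---|---|---|---|---|---|
| Person 1 | 0.55 | 0.60 | 0.65 | 0.69 | 0.73 | 1 |
| Person 2 | 0.50 | 0.55 | 0.60 | 0.65 | 0.69 | 0.8 |
| Person 3 | 0.45 | 0.50 | 0.55 | 0.60 | 0.65 | 0.6 |
| Person 4 | 0.40 | 0.45 | 0.50 | 0.55 | 0.60 | 0.4 |
| Person 5 | 0.35 | 0.40 | 0.45 | 0.50 | 0.55 | 0.2 |
| TID | 0.80 | 0.60 | 0.40 | 0.20 | 0.00 |  |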
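This post does not state how the probabilities in the table were computed. The figures are, however, consistent with a logistic function of the difference between TSP and TID, so the sketch below assumes that form. The names `tsp`, `tid`, and `p_correct` are illustrative.

```python
import math

tsp = [1.0, 0.8, 0.6, 0.4, 0.2]  # tentative proficiency, persons 1-5
tid = [0.8, 0.6, 0.4, 0.2, 0.0]  # tentative difficulty, items 1-5

def p_correct(proficiency, difficulty):
    """Assumed logistic link: 0.5 when proficiency equals difficulty."""
    return 1.0 / (1.0 + math.exp(-(proficiency - difficulty)))

table = [[round(p_correct(t, d), 2) for d in tid] for t in tsp]
for row in table:
    print(row)
```

Each printed row reproduces the corresponding row of the probability table, with 0.5 appearing wherever a person's TSP equals an item's TID.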


However, if we put the table of probability and the table of raw scores together, we will find something strange.

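|  | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Average |
|---|---|---|---|---|---|---|
| Person 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Person 2 | 0 | 1 | 1 | 1 | 1 | 0.8 |
| Person 3 | 0 | 0 | 1 | 1 | 1 | 0.6 |
| Person 4 | 0 | 0 | 0 | 1 | 1 | 0.4 |
| Person 5 | 0 | 0 | 0 | 0 | 1 | 0.2 |
| Failure rate | 0.80 | 0.60 | 0.40 | 0.20 | 0.00 |  |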


According to the probability table, the probability of person 5 answering items 1 to 4 correctly ranges from 0.35 to 0.5. But in fact the student failed all four items. As you can see, the data and the probability model do not necessarily fit together. In IRT, this discrepancy can be used to further calibrate the estimation until the data and the model converge.
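The discrepancy can be expressed as a residual: the observed response minus the predicted probability. The sketch below assumes the same logistic form that reproduces the probability table (an assumption, since the post does not give the formula) and prints the residuals for person 5.

```python
import math

responses = [
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
]
tsp = [1.0, 0.8, 0.6, 0.4, 0.2]  # tentative proficiency, persons 1-5
tid = [0.8, 0.6, 0.4, 0.2, 0.0]  # tentative difficulty, items 1-5

def p_correct(proficiency, difficulty):
    """Assumed logistic link between proficiency and difficulty."""
    return 1.0 / (1.0 + math.exp(-(proficiency - difficulty)))

# Residual = observed response minus predicted probability.
residuals = [
    [x - p_correct(t, d) for x, d in zip(row, tid)]
    for row, t in zip(responses, tsp)
]

# Person 5: negative on items 1-4 (the model expected more success than
# was observed), positive on item 5.
print([round(r, 2) for r in residuals[4]])  # [-0.35, -0.4, -0.45, -0.5, 0.45]
```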
In short, both item attributes and student proficiency should be taken into consideration in order to conduct item calibration and proficiency estimation. The data give us the tentative student proficiency and item difficulty, which are used to fit the model, and then the model is used to predict the data. Needless to say, there will be some differences between the model and the data in the initial steps. It takes many cycles to reach convergence.
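The cycle of fitting and predicting can be illustrated with a deliberately simple loop. This is only a sketch under stated assumptions, not the procedure used by any particular IRT program: it assumes a one-parameter logistic model, updates abilities and difficulties by small gradient steps, and adds a small `shrink` penalty (an assumption of this sketch) so that the perfect score of person 1 and the all-correct item 5 do not push the estimates toward infinity.

```python
import math

responses = [
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
]

def p_correct(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_likelihood(matrix, thetas, bs):
    """How well the model predicts the observed 0/1 data (higher is better)."""
    total = 0.0
    for row, theta in zip(matrix, thetas):
        for x, b in zip(row, bs):
            p = p_correct(theta, b)
            total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

def calibrate(matrix, cycles=200, step=0.1, shrink=0.1):
    """Alternate between updating person abilities and item difficulties."""
    thetas = [0.0] * len(matrix)
    bs = [0.0] * len(matrix[0])
    for _ in range(cycles):
        # Ability step: raise theta when the person did better than predicted.
        for i, row in enumerate(matrix):
            grad = sum(x - p_correct(thetas[i], b) for x, b in zip(row, bs))
            thetas[i] += step * (grad - shrink * thetas[i])
        # Difficulty step: raise b when the item was failed more than predicted.
        for j in range(len(bs)):
            grad = sum(p_correct(thetas[i], bs[j]) - matrix[i][j]
                       for i in range(len(matrix)))
            bs[j] += step * (grad - shrink * bs[j])
    return thetas, bs

start = log_likelihood(responses, [0.0] * 5, [0.0] * 5)
thetas, bs = calibrate(responses)
print("abilities:   ", [round(t, 2) for t in thetas])
print("difficulties:", [round(b, 2) for b in bs])
print("fit improved:", log_likelihood(responses, thetas, bs) > start)
```

The resulting estimates are on a logit scale rather than the 0 to 1 percentage scale of TSP and TID, but they keep the same ordering: person 1 is the most able and item 1 is the hardest.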
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
