Psychometrics Week 4

Exercise 1

We plot the IIC for every item:

set.seed(1)
require("ltm")
fit2PL <- ltm(LSAT ~ z1)  # the LSAT data ships with the ltm package
plot(fit2PL, type = "IIC", zrange = c(-6, 6))

[Figure: item information curves for all five LSAT items]

Exercise 2

# First fit the linear model, using code from last week:
require("OpenMx")
fit <- fit2PL
n <- dim(fit$X)[2]                # number of items
p <- plot(fit, items = 1, type = "ICC", plot = FALSE)
r <- length(p[, 2])               # number of grid points on the ability axis
X <- p[, 1]                       # the ability grid
iccs <- matrix(NA, ncol = n, nrow = r)
for (i in 1:n) iccs[, i] <- plot(fit, items = i, type = "ICC", plot = FALSE)[, 2]
Y <- apply(iccs, MARGIN = 1, sum) # 2PL test characteristic curve (sum of ICCs)
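# (Aside: a quick, optional check that X and Y indeed trace the 2PL test
# characteristic curve; this plot is not part of the exercise.)
plot(X, Y, type = "l", xlab = "Ability", ylab = "Expected test score")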
#Create an MxModel object
oneFactorModel <- mxModel("Common Factor Model Path Specification",
                          type="RAM",
                          mxData(observed=LSAT,type="raw"),
                          manifestVars=c("Item 1", "Item 2", "Item 3", "Item 4", "Item 5"),
                          latentVars="Factor",
                          # residual variances
                          mxPath(from=c("Item 1", "Item 2", "Item 3", "Item 4", "Item 5"),
                                 arrows=2,
                                 free=TRUE,
                                 values=c(1,1,1,1,1),
                                 labels=c("e1","e2","e3","e4","e5")
                          ),
                          # latent variance
                          mxPath(from="Factor",
                                 arrows=2,
                                 free=FALSE,
                                 values=1,
                                 labels ="varFactor"
                          ),
                          # factor loadings
                          mxPath(from="Factor",
                                 to=c("Item 1", "Item 2", "Item 3", "Item 4", "Item 5"),
                                 arrows=1,
                                 free=c(TRUE,TRUE,TRUE,TRUE,TRUE),
                                 values=c(1,1,1,1,1),
                                 labels =c("l1","l2","l3","l4","l5")
                          ),
                          # means
                          mxPath(from="one",
                                 to=c("Item 1", "Item 2", "Item 3", "Item 4", "Item 5","Factor"),
                                 arrows=1,
                                 free=c(TRUE,TRUE,TRUE,TRUE,TRUE,FALSE),
                                 values=c(1,1,1,1,1,0),
                                 labels =c("meanx1","meanx2","meanx3","meanx4","meanx5",NA)
                          )
) # close model
oneFactorFit <- mxRun(oneFactorModel)
## Running Common Factor Model Path Specification
estimates <- oneFactorFit@output$estimate

# Now make the plot; the red line marks the constant item information
# implied by the linear model, l3^2 / e3:
plot(fit2PL, type = "IIC", items = c(3), zrange = c(-6, 6))
abline(estimates["l3"]^2 / estimates["e3"], 0, col = "red")

[Figure: IIC for item 3 under the 2PL model, with the linear model's constant item information in red]

We can conclude:

It is hard to compare the two plots directly: we are not comparing IICs of different items, but of the same item, fitted to the same responses, under two different models. This means that the estimated ability (per individual) also differs between the models. At most we can conclude that, in some cases, the third item can give more information under the 2PL model than under the linear model.

(Relative Efficiency for different subsets of the test, as in exercise 8, is a valid way to directly compare information measures.)

[Note that the possible ability estimates for the 2PL model fall between -2 and 0.7 (according to factor.scores), and on that interval the 2PL model gives almost consistently more information.]
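To check the quoted range, a minimal sketch using ltm's factor.scores over the observed response patterns (the exact endpoints depend on the fit):

fs <- factor.scores(fit2PL)  # scores every observed response pattern
range(fs$score.dat$z1)       # roughly -2 to 0.7 for these data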

Exercise 3

plot(fit2PL, type = "IIC", items = 0, zrange = c(-6, 6))

[Figure: total test information curve for the five-item test]

We see that the test is best suited to distinguishing between students with ability between roughly -2.5 and -0.5. For abilities outside that range the questions give little information, that is, they do not distinguish between test takers nearly as well. This is especially the case for higher abilities.
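As a sanity check, the total information curve can be retrieved from plot() and its maximum located; a minimal sketch:

tic <- plot(fit2PL, type = "IIC", items = 0, zrange = c(-6, 6), plot = FALSE)
tic[which.max(tic[, 2]), 1]  # the ability at which the test information peaks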

Exercise 4

plot(fit2PL, type = "IIC", items = 0, zrange = c(-6, 6))
abline(estimates["l1"]^2/estimates["e1"] + estimates["l2"]^2/estimates["e2"] + 
    estimates["l3"]^2/estimates["e3"] + estimates["l4"]^2/estimates["e4"] + 
    estimates["l5"]^2/estimates["e5"], 0, col = "red")

[Figure: total test information curve (2PL), with the linear model's constant total information in red]

We have the same problem as in exercise 2: the TCC of the linear model is linear, and its information therefore constant, purely by the structure of the model. Choosing that model instead of the 2PL one does not mean that the test is suddenly strong at distinguishing between +6 and +7 students.

What it does mean is that under the linear model we always get ~0.44 units of information about a test taker, while under the 2PL model it depends on the answers: for some we get slightly more information, and for others a little, or a lot, less.

In practice this means that for the linear model the error variance will be the same regardless of (estimated) ability, and for the 2PL model it will be lower for some, and (much) higher for others.
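For reference, the constant total information of the linear model (the height of the red line) can be computed directly from the OpenMx estimates; a minimal sketch:

# Sum of l_i^2 / e_i over the five items; should come out around 0.44.
sum(estimates[paste0("l", 1:5)]^2 / estimates[paste0("e", 1:5)])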

[As noted in Exercise 2, the possible ability estimates for the 2PL model fall between -2 and 0.7 (according to factor.scores), and on that interval the 2PL model gives almost consistently more information.]

Exercise 5

The standard error of the estimate is the square root of the error variance, which is 1/I(f), where I is the information function. So, we make the same plots again, replacing each Y value by sqrt(1/Y):

inf <- plot(fit2PL, type = "IIC", items = 0, zrange = c(-6, 6), plot = FALSE)
plot(inf[, 1], sqrt(1/inf[, 2]), type = "l", ylab = "std.err of estimate", main = "Total std.err", 
    xlab = "Ability")
abline(sqrt(1/(estimates["l1"]^2/estimates["e1"] + estimates["l2"]^2/estimates["e2"] + 
    estimates["l3"]^2/estimates["e3"] + estimates["l4"]^2/estimates["e4"] + 
    estimates["l5"]^2/estimates["e5"])), 0, col = "red")

[Figure: standard error of the ability estimate against ability, 2PL curve with the linear model's constant standard error in red]

Exercise 6

score <- factor.scores(fit2PL, matrix(c(1, 1, 0, 0, 0), 1, 5))
# 90% confidence interval, assuming normality:
conf <- c(score$score.dat$z1 - score$score.dat$se.z1 * qnorm(0.95), score$score.dat$z1 + 
    score$score.dat$se.z1 * qnorm(0.95))
conf
## [1] -2.2307  0.4084

We can conclude that according to this model the 90% confidence interval is from -2.23 to 0.41. Because this range is so wide, we cannot conclude much more from this result.

It seems that getting just the first two questions right is not a great result, but it also seems that our 5-question test is not a very powerful measure.

Exercise 7

We simply scale by the SD, 15, and then add the mean, 100:

conf * 15 + 100
## [1]  66.54 106.13

Exercise 8

We define test A to be the first three items, and test B to be the last two items. We use the range [-2, +2]. For the 2PL model, we want to calculate the relative efficiency, that is, I_A(f) / I_B(f), or equivalently Var(e_B)/Var(e_A).

information.a <- information(fit2PL, range = c(-2, 2), items = 1:3)
information.b <- information(fit2PL, range = c(-2, 2), items = 4:5)
information.a$InfoRange/information.b$InfoRange
## [1] 2.384

Exercise 9

The same as exercise 8, except now for the linear factor model and range [-1,+1].

The range does not actually matter here: the information is constant over the ability scale, so integrating it over any interval simply multiplies it by the interval's length, and that factor cancels because A and B use the same interval. So, we simply divide the total information coefficients:

information.a.lin <- estimates["l1"]^2/estimates["e1"] + estimates["l2"]^2/estimates["e2"] + 
    estimates["l3"]^2/estimates["e3"]
information.b.lin <- estimates["l4"]^2/estimates["e4"] + estimates["l5"]^2/estimates["e5"]
(information.a.lin/information.b.lin)[[1]]
## [1] 2.299

Comparing this result to Exercise 8, we see that the relative efficiency is similar, which is not an unexpected result, considering that both tests are estimated to do decently on the given ranges (as seen in Exercises 1-5).

Exercise 10

We load the lordif package, and find the Anxiety data as described in R's ?lordif help page, and produce the requested plots. Commentary per plot is at the bottom.

require("lordif")
data(Anxiety)
resp.data <- Anxiety[paste("R", 1:29, sep = "")]
gender <- Anxiety$gender
gender.DIF <- lordif(resp.data, gender)
print(gender.DIF)
plot(gender.DIF)

[Figures: trait distributions by group; item-level DIF plots for each flagged item; impact of accounting for DIF on the theta estimates]

In this analysis we search for items that function differently as estimators for men than for women (differential item functioning). The lordif function finds such items and “flags” them. Here, those are items 6, 7 and 19.
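The flagged items can also be read straight off the fitted object; a minimal sketch, assuming the flag component documented for lordif fits:

which(gender.DIF$flag)  # should give 6, 7 and 19, matching the printed summary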

The plots:

  1. The first plot, Trait Distributions, shows the density curves for the reference group (gender 0, male) versus the focal group (female). Here, theta is the IRT ability estimate, which for this questionnaire measures anxiety. We see that women are more likely to have a high estimated theta.

  2. The second [type of] plot consists of four subplots:

    1. The Item True Score Function shows the ICC (i.e. the expected item score as a function of theta) for men (black) and women (red).
    2. The Differences in Item True Score Functions is simply the absolute difference between the two curves in the previous graph.
    3. The Item Response Functions shows the item response functions per group, for the possible answers for that question.
    4. The Impact (Weighted by Density) plot shows the difference between the two groups in the previous plot; the differences are summed, weighted by the female group density.

    These four subplots are produced for each of the three flagged items.

  3. The final plot group has two subplots:

    1. Boxplot of the difference between theta estimates with and without taking group membership into account.
    2. Scatterplot with the initial (group-blind) theta on the x-axis and the theta adjustment after taking group into account on the y-axis, with colors indicating subgroup. Note that negative y-values mean the estimated theta went up after accounting for groups.

    The boxplot is not very interesting here (besides looking symmetric); the scatterplot shows that the estimated theta scores for most women went up when accounting for gender, especially for low initial theta. For men the effect is smaller, but reversed.
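To reproduce the boxplot by hand, a minimal sketch; it assumes the calib and calib.sparse components of a lordif fit (each holding a theta vector per respondent, as documented in ?lordif), and the sign convention may differ from the built-in plot:

# Group-blind minus DIF-adjusted theta estimates, split by gender.
theta.diff <- gender.DIF$calib$theta - gender.DIF$calib.sparse$theta
boxplot(theta.diff ~ gender, xlab = "gender (0 = male, 1 = female)",
        ylab = "theta difference")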