Notes on MMLE Conditioning Variables and Facets

1. Conditioning Variables for Marginal Maximum Likelihood Estimation

Under Marginal Maximum Likelihood Estimation, we have the additional assumption that ability thetas are normally distributed. Where the population of interest is split into subgroups, for example males and females, the assumption of normality is applied by assuming that each of the two groups has its own underlying normal distribution.

1.1 Categorical conditioning variables

It is obviously important that we get the population assumption right, otherwise our estimates of the ability of the population will be incorrect. Therefore, in our data, we need to correctly categorise sub-population groups.

When this is done, the population mean becomes the grand mean, and \(\alpha\) becomes the coefficient for the associated dummy-coded regressor.

When these specifications are applied to the model, plausible values drawn from it will account for the different prior population distribution patterns.

We can use simulation to illustrate two different distributional patterns for male and female subpopulations.

####################### Generate response data with male and female distributions #######################
rm(list = ls())                                               # clears global environment
library(TAM)                                                  # loads IRT package
set.seed(6778)                                                # sets seed so all this can be replicated
N <- 3000                                                     # sets number of students
theta <- c(rnorm(N/2,mean=0,sd=1.5),rnorm(N/2,mean=.5,sd=1))  # defines real male-female ability distributions
I <- 20                                                       # defines number of items
delta <- seq( -2 , 2 , len=I )                                # defines item difficulty vector
p1 <- plogis( outer( theta , delta , "-" ) )                  # creates probability (P) matrix
resp <- 1 * ( p1 > matrix( runif( N*I ) , nrow=N , ncol=I ) ) # generates actual item response data
colnames(resp) <- paste("I" , 1:I, sep="")                    # adds column names
gender <- rep(1:2,each=N/2)                                   # generates categorical variables

#################### Account for the different male-female ability distributions ########################
mod1 <- tam.mml(resp, group=gender)   # group is gender (it could include a large number of categories)
summary(mod1)
PV <- tam.pv(mod1)
pv1 <- PV$pv$PV1.Dim1
G1 <- pv1[gender==1]  # male
G2 <- pv1[gender==2]  # female
mean(G1)
mean(G2)
sd(G1)
sd(G2)

1.1.1 Results for conditioning variables when specifying categoric group

The results for the model (PVs) that includes the specification of the conditioning variables is as follows:

————————————————————
EAP Reliability
[1] 0.845
————————————————————
Covariances and Variances
Group1 Group2
2.350 1.058
————————————————————
Correlations and Standard Deviations (in the diagonal)
Group1 Group2
1.533 1.029
————————————————————
Regression Coefficients
[,1] [1,] 0.00000
[2,] 0.52928
*————————————————————

The main results (summary) outputted above, pertaining to the population of interest, note the standard deviation for males (Group 1) as SD= 1.533, and the standard deviation for females (Group 2) as SD=1.029. These values being almost identical to the real mean and standard deviations. Also, the regression coefficient is 0.529 for the second group, illustrating that the correct difference in mean theta values between males and females has been correctly captured and specified in the model.

When we focus on the plausible values, we also get similar results with the male group mean and standard deviation as M = 0.01, and SD = 0.56, and the females as M = 0.53 and SD = 1.04.

1.1.2 Results for conditioning variables when NOT specifying categoric group

Using the same response data above [resp], we can run the model again, but this time we do not specify the two groups of interest.

mod1 <- tam.mml(resp, group=gender)

Results are as follows:

————————————————————
EAP Reliability
[1] 0.84
————————————————————
Covariances and Variances
[,1] [1,] 1.747
————————————————————
Correlations and Standard Deviations (in the diagonal)
[,1] [1,] 1.322
————————————————————
Regression Coefficients
[,1] [1,] 0
*————————————————————

Obviously, the summary produces no no group-related output as the two gender groups are not specified in the model. To note, though, the overall SD is 1.322 (between the male and female SD of 1.5 and 1.0, respectively).
In terms of results for the plausible values, we get the following. For males, the results are M = -0.22 and SD = 1.47. For females, M = 0.23 and SD = 1.15. These estimates do not correctly capture the prior known ability distributions of these two groups.

1.2 Continuous conditioning variables

When the conditioning variables are categoric (e.g., male & female), both the mean and standard deviations of the respective subgroups can be accounted for and modelled. However, it is possible to model continuous regressors like student age, for example.

We can also model binary variables (such as gender) as continuous regressors. We may do this as follows:

####################### Generate response data with male and female distributions #######################
rm(list = ls())                                               # clears global environment
library(TAM)                                                  # loads IRT package
set.seed(6778)                                                # sets seed so all this can be replicated
N <- 3000                                                     # sets number of students
theta <- c(rnorm(N/2,mean=0,sd=1.5),rnorm(N/2,mean=.5,sd=1))  # defines real male-female ability distributions
I <- 20                                                       # defines number of items
delta <- seq( -2 , 2 , len=I )                                # defines item difficulty vector
p1 <- plogis( outer( theta , delta , "-" ) )                  # creates probability (P) matrix
resp <- 1 * ( p1 > matrix( runif( N*I ) , nrow=N , ncol=I ) ) # generates actual item response data
colnames(resp) <- paste("I" , 1:I, sep="")                    # adds column names
gender <- rep(1:2,each=N/2)                                   # generates categorical variables

########### Account for the different male-female ability distributions (means  not variance) #############
mod1R <- tam.mml(resp, Y=gender)  # Y is the gender variable (could be a matrix of many variables)
summary(mod1R)
PV <- tam.pv(mod1R)
pv1 <- PV$pv$PV1.Dim1
G1R <- pv1[gender==1]
G2R <- pv1[gender==2]
mean(G1R) # male
mean(G2R) # female
sd(G1R)   # male
sd(G2R)   # female

The results are as follows:

————————————————————
EAP Reliability
[1] 0.842
————————————————————
Covariances and Variances
[,1] [1,] 1.67
————————————————————
Correlations and Standard Deviations (in the diagonal)
[,1] [1,] 1.292
————————————————————
Regression Coefficients
[,1] [1,] 0.0000
[2,] 0.5423
*————————————————————
Standardized Coefficients
parm dim est StdYX StdX StdY
1 Intercept 1 0.0000 NA NA NA
2 Y1 1 0.5423 0.2054 0.2712 0.4107

** Explained Variance R^2
[1] 0.0422
** SD Theta
[1] 1.3205
** SD Predictors
Intercept Y1
0.0000 0.5001

In this instance, only one variance coefficient is returned. In this case the standard deviation was 1.292, almost half way between the two genders’ true standard deviation. Although, the regression coefficient for females (group 2) was 0.5423, in accordance with the true difference. This means that one unit shift in gender (from 0 to 1) is associated with a 0.5423 unit shift in ability thetas.

In this instance, the plausible values for males were well aligned at M = 0.55 and SD = 1.08, and for the females at M = 1.46 and SD = 1.14.

2. Facets, variables related to items

Facets pertain to factors associated with the items. For example, it may be that a paper-and-pencil format may have been administered to half the students whilst an online format may have been administered to the other half of the students. Also, when unique marker identity is known and specified for each student, facets may also be specified. In this instance, a variable allocating a unique number to the each marker should be specified as a facet.

2.1 Facets analysis pertaining to items and delivery mode

In the following example, we group 1 receives the test online and doesn’t do so well (items more difficult, more positive), and group 2 receives the test offline and does better overall (items easier, more negative). This may be because the pen-and-paper format may have helped students through mathematical working.

################## Generate response data with off and online ability distributions #######################
rm(list = ls())                                               # clears global environment
library(TAM)                                                  # loads IRT package
set.seed(6778)                                                # sets seed so all this can be replicated
N <- 3000                                                     # sets number of students
theta <- c(rnorm(N/2,mean=0,sd=1.5),rnorm(N/2,mean=.5,sd=1.5))# defines on- and off-line ability distributions
I <- 20                                                       # defines number of items
delta <- seq( -2 , 2 , len=I )                                # defines item difficulty vector
p1 <- plogis( outer( theta , delta , "-" ) )                  # creates probability (P) matrix
resp <- 1 * ( p1 > matrix( runif( N*I ) , nrow=N , ncol=I ) ) # generates actual item response data
colnames(resp) <- paste("I" , 1:I, sep="")                    # adds column names
mode<- rep(1:2,each=N/2)                                      # generates categorical variable of mode 1 & 2

#################### Assess the effect of different test formats: on- and off-line  ######################
formulaA <- ~ item+mode                         # this is the formula for putting gender on the facet side 
mod2 <- TAM::tam.mml.mfr( resp= resp, facets= as.data.frame(mode), formulaA = formulaA )
summary(mod2)
PV <- tam.pv(mod2)
pv1 <- PV$pv$PV1.Dim1
F1 <- pv1[mode==1]
F2 <- pv1[mode==2]
mean(F1)  # format 1 online Mean
mean(F2)  # format 2 offline Mean
sd(F1)    # format 1 online SD
sd(F2)    # format 2 offline SD

The output in this instance provides multiple results. First the delivery is entered as an item and its relative difficulty (around zero) is estimated.

Item Facet Parameters Xsi
parameter facet xsi se.xsi
1 I1 item -2.216 0.056
2 I2 item -2.106 0.055
3 I3 item -1.822 0.052
4 I4 item -1.631 0.050
5 I5 item -1.414 0.048
6 I6 item -1.163 0.047
7 I7 item -0.966 0.046
8 I8 item -0.868 0.045
9 I9 item -0.588 0.044
10 I10 item -0.374 0.044
11 I11 item -0.182 0.044
12 I12 item 0.024 0.044
13 I13 item 0.219 0.044
14 I14 item 0.527 0.044
15 I15 item 0.739 0.045
16 I16 item 0.958 0.046
17 I17 item 1.013 0.046
18 I18 item 1.352 0.048
19 I19 item 1.545 0.049
20 I20 item 1.776 0.051
21 mode1 mode 0.269 0.011
22 mode2 mode -0.269 0.011

In this instance, delivery mode 1 (online) is associated with an increase in item difficulty of 0.269 (small standard error of 0.011 given high n). This makes sense given that the first group who experienced that delivery mode did not perform so well on that item.

Because “item+mode” was specified as the formula, the following additional output was also provided,

Item Parameters -A*Xsi
item N M xsi.item AXsi_.Cat1 B.Cat1.Dim1
I1-mode1 I1-mode1 1500 0.803 -1.947 -1.947 1
I1-mode2 I1-mode2 1500 0.858 -2.485 -2.485 1
I10-mode1 I10-mode1 1500 0.520 -0.105 -0.105 1
I10-mode2 I10-mode2 1500 0.609 -0.644 -0.644 1
I11-mode1 I11-mode1 1500 0.491 0.088 0.088 1
I11-mode2 I11-mode2 1500 0.571 -0.451 -0.451 1
I12-mode1 I12-mode1 1500 0.451 0.293 0.293 1
I12-mode2 I12-mode2 1500 0.539 -0.246 -0.246 1
I13-mode1 I13-mode1 1500 0.413 0.489 0.489 1
I13-mode2 I13-mode2 1500 0.510 -0.050 -0.050 1
I14-mode1 I14-mode1 1500 0.357 0.796 0.796 1
I14-mode2 I14-mode2 1500 0.461 0.258 0.258 1
I15-mode1 I15-mode1 1500 0.325 1.008 1.008 1
I15-mode2 I15-mode2 1500 0.422 0.470 0.470 1
I16-mode1 I16-mode1 1500 0.298 1.227 1.227 1
I16-mode2 I16-mode2 1500 0.378 0.689 0.689 1
I17-mode1 I17-mode1 1500 0.286 1.283 1.283 1
I17-mode2 I17-mode2 1500 0.373 0.744 0.744 1
I18-mode1 I18-mode1 1500 0.235 1.622 1.622 1
I18-mode2 I18-mode2 1500 0.321 1.083 1.083 1
I19-mode1 I19-mode1 1500 0.202 1.814 1.814 1
I19-mode2 I19-mode2 1500 0.301 1.276 1.276 1
I2-mode1 I2-mode1 1500 0.787 -1.837 -1.837 1
I2-mode2 I2-mode2 1500 0.850 -2.375 -2.375 1
I20-mode1 I20-mode1 1500 0.191 2.045 2.045 1
I20-mode2 I20-mode2 1500 0.251 1.507 1.507 1
I3-mode1 I3-mode1 1500 0.755 -1.553 -1.553 1
I3-mode2 I3-mode2 1500 0.815 -2.091 -2.091 1
I4-mode1 I4-mode1 1500 0.733 -1.362 -1.362 1
I4-mode2 I4-mode2 1500 0.787 -1.900 -1.900 1
I5-mode1 I5-mode1 1500 0.683 -1.145 -1.145 1
I5-mode2 I5-mode2 1500 0.777 -1.683 -1.683 1
I6-mode1 I6-mode1 1500 0.645 -0.894 -0.894 1
I6-mode2 I6-mode2 1500 0.742 -1.432 -1.432 1
I7-mode1 I7-mode1 1500 0.629 -0.697 -0.697 1
I7-mode2 I7-mode2 1500 0.696 -1.235 -1.235 1
I8-mode1 I8-mode1 1500 0.593 -0.599 -0.599 1
I8-mode2 I8-mode2 1500 0.701 -1.138 -1.138 1
I9-mode1 I9-mode1 1500 0.577 -0.319 -0.319 1
I9-mode2 I9-mode2 1500 0.624 -0.857 -0.857 1

Results above suggest that the easiest item was item 1 delivered by way of pen-and-paper format. This item had an item difficulty of -2.485. The most difficult item was Item 20 delivered online which had an item difficulty of 2.045.

Also, the following regular output was also generated:

————————————————————
EAP Reliability
[1] 0.866
————————————————————
Covariances and Variances
[,1] [1,] 2.334
————————————————————
Correlations and Standard Deviations (in the diagonal)
[,1] [1,] 1.528
————————————————————
Regression Coefficients
[,1] [1,] 0
*————————————————————

2.2 Facets analysis pertaining to items, delivery mode, and delivery mode x item interaction

In this analysis, we look a little further into the effects of delivery mode by estimating the effect of delivery mode and items. It may be that certain items delivered in a certain way may be operating particularly differently.

################## Generate response data with off and online ability distributions #######################
rm(list = ls())                                               # clears global environment
library(TAM)                                                  # loads IRT package
set.seed(6778)                                                # sets seed so all this can be replicated
N <- 3000                                                     # sets number of students
theta <- c(rnorm(N/2,mean=0,sd=1.5),rnorm(N/2,mean=.5,sd=1.5))# defines on- and off-line ability distributions
I <- 20                                                       # defines number of items
delta <- seq( -2 , 2 , len=I )                                # defines item difficulty vector
p1 <- plogis( outer( theta , delta , "-" ) )                  # creates probability (P) matrix
resp <- 1 * ( p1 > matrix( runif( N*I ) , nrow=N , ncol=I ) ) # generates actual item response data
colnames(resp) <- paste("I" , 1:I, sep="")                    # adds column names
mode<- rep(1:2,each=N/2)                                      # generates categorical variable of mode 1 & 2

############ Assess the effect of different test formats and interaction: on- and off-line  ###############
# Use facet, have interaction terms for DIF
formulaA <- ~ item+mode+mode*item
mod3 <- TAM::tam.mml.mfr( resp= resp , facets= as.data.frame(mode) , formulaA = formulaA )
summary(mod3)
PV <- tam.pv(mod3)
pv1 <- PV$pv$PV1.Dim1
H1 <- pv1[mode==1]
H2 <- pv1[mode==2]
mean(H1)
mean(H2)
sd(H1)
sd(H2)

For this output we note the interaction term between the item and the mode. If we divide the interaction term by the standard error, we can ascertain the level of statistical signiifcance of the interaction. The results for this run are presented below:

Item Facet Parameters Xsi
parameter facet xsi se.xsi
1 I1 item -2.288 0.058
2 I2 item -2.092 0.055
3 I3 item -1.793 0.051
4 I4 item -1.629 0.050
5 I5 item -1.432 0.048
6 I6 item -1.185 0.046
7 I7 item -0.979 0.045
8 I8 item -0.865 0.044
9 I9 item -0.592 0.043
10 I10 item -0.374 0.043
11 I11 item -0.188 0.042
12 I12 item 0.003 0.042
13 I13 item 0.205 0.042
14 I14 item 0.533 0.043
15 I15 item 0.722 0.043
16 I16 item 0.951 0.044
17 I17 item 1.053 0.045
18 I18 item 1.370 0.047
19 I19 item 1.553 0.048
20 I20 item 1.755 0.050
21 mode1 mode 0.274 0.010
22 mode2 mode -0.274 0.010
23 I1:mode1 item:mode 0.108 0.038, 2.84, p = 0.004482
24 I2:mode1 item:mode 0.040 0.037, 1.08, p = 0.280142
25 I3:mode1 item:mode -0.021 0.036
26 I4:mode1 item:mode -0.031 0.035
27 I5:mode1 item:mode 0.098 0.035
28 I6:mode1 item:mode 0.087 0.034
29 I7:mode1 item:mode -0.027 0.033
30 I8:mode1 item:mode 0.067 0.033
31 I9:mode1 item:mode -0.115 0.033, 3.48, p = 0.000493 32 I10:mode1 item:mode -0.011 0.033
33 I11:mode1 item:mode -0.034 0.032
34 I12:mode1 item:mode -0.006 0.032
35 I13:mode1 item:mode 0.011 0.032
36 I14:mode1 item:mode 0.009 0.033
37 I15:mode1 item:mode 0.014 0.033
38 I16:mode1 item:mode -0.046 0.033
39 I17:mode1 item:mode -0.069 0.034
40 I18:mode1 item:mode -0.035 0.034
41 I19:mode1 item:mode 0.038 0.035
42 I20:mode1 item:mode -0.076 0.148
43 I1:mode2 item:mode -0.108 0.038
44 I2:mode2 item:mode -0.040 0.037
45 I3:mode2 item:mode 0.021 0.036
46 I4:mode2 item:mode 0.031 0.035
47 I5:mode2 item:mode -0.098 0.035
48 I6:mode2 item:mode -0.087 0.034
49 I7:mode2 item:mode 0.027 0.033
50 I8:mode2 item:mode -0.067 0.033
51 I9:mode2 item:mode 0.115 0.033
52 I10:mode2 item:mode 0.011 0.033
53 I11:mode2 item:mode 0.034 0.032
54 I12:mode2 item:mode 0.006 0.032
55 I13:mode2 item:mode -0.011 0.032
56 I14:mode2 item:mode -0.009 0.033
57 I15:mode2 item:mode -0.014 0.033
58 I16:mode2 item:mode 0.046 0.033
59 I17:mode2 item:mode 0.069 0.034
60 I18:mode2 item:mode 0.035 0.034
61 I19:mode2 item:mode -0.038 0.035
62 I20:mode2 item:mode 0.076 0.148

For example, to determine the interaction effect between delivery and mode, a total 40 hypothesis tests are run from lines 23 to 62. This includes tests across the 2 delivery modes for each of the 20 items.

We can generate a z-score by dividing the estimate, Xsi, by its standard error, se.xsi. I have done this for item 1 mode 1 and item 2 mode 1, also item 9, mode 1.

There appears to be an interaction effect between item 1 and mode 1, however some Bonferroni adjustment is in order. The p value .05 divided by 40 is .00125, therefore only the interaction effect between item 9 and mode 1 (online) is significant. The combination makes the item easier than it should be.

If one had to speculate as to why (yes, I know that this is a simulation, but let’s have fun anyway), it may be because the computer interface provides some direction (or unintentional hint) toward the answer that guides students. More investigation is necessary (I know that this is a chance artefact anyway!).

Summary

To summarise, we use the following models for the following purposes:

TAM::tam.mml(resp, group=gender):
To account for different mean and SDs of subgroups when population parameters are of research interest (categoric regressors).
TAM::tam.mml(resp, Y=gender):
To account for the linear effect of covariates such as SES and age. Gender is modelled here as a binary covariate.
formulaA <- ~ item+mode
TAM::mod2 <- TAM::tam.mml.mfr( resp= resp, facets= as.data.frame(mode), formulaA = formulaA ):
To assess the influence of the effect of factors related to items such as test delivery mode and marker identity. In this instance the item and its specific item facet is modelled as an item and a difficulty estimate is provided for each respective item-facet.
formulaA <- ~ item+mode+mode*item
TAM::mod3 <- TAM::tam.mml.mfr( resp= resp , facets= as.data.frame(mode) , formulaA = formulaA ):
To assess the interaction effect between items and facets. It may be that a particular questino delivered in a particular way may be causing some bias. In this instance, z-scores are generated and associated p values a ascertained. Bonferroni correction probably should be applied.