Read Jaeger 2008, “Categorical Data Analysis” carefully, working through each step of the analysis. Then, respond to the following questions. You may also find it helpful to read Arnon 2010 for context.
Categorical outcomes are traditionally analyzed using ANOVA over proportions/percentages. Take binary outcomes (A vs B) for example, if the proportion of A is p, then the proportion of B is 1-p. The traditional approach uses ANOVA to infer/predict p, which essentially assumes that p is represented by a linear combination of predictors (plus some noise). Jaeger is against this view, because of two main reasons. First, p is by definition within between 0 and 1, but ANOVA, which is essentially a special case of linear regression, could assign non-zero probabilities beyond [0, 1], which makes no sense. Second, if we assume the outcome is binomially distributed, the variance of the proportion p depends on p (the closer p is to 0 or 1, the greater the changes of variance will be caused by changes of p), which violates ANOVA’s assumption of homogeneous variances.
Jaeger instead recommends using logistic regression, which transforms probabilities into logit space (in the form of log-odds), and does linear regression afterwards. This overcomes the previous two problems. First, [0, 1] corresponds the whole real numbers in the logit space, thus the result of the linear regression in the logit space is always meaningful. Second, logit transformatio captures the fact that differences in probabilities around 0.5 do not matter as much as the same changes close to 0 or 1.
Traditionally, people use ANOVA instead of ordinary logit models because the latter does not model random subject or item effects. Mixed logit models, which are a type of Generalized Linear Mixed Model, address this issue by representing the logit of the outcome probability as the linear combination of fixed effects and conditional random effects (subjects, items, etc.).
The standard errors in different conditions (with different probabilities) are actually different, which violates the basic assumption of ANOVA.
To load the data into R, use
d <- data.frame(read.delim("inbal.tab"))
You’ll want to restrict attention to the comprehension studies, and remove null data so that the ANOVAs will run. (As Jaeger mentions, the ability to handle missing and unbalanced data is another key advantage of mixed-effects models over ANOVA.)
d <- subset(d, modality == 1) # comprehension study only
d <- subset(d, Correct != "#NULL!" & !is.na(Extraction) & !is.na(NPType))
Finally, you’ll need to recode some numerically coded categorical variables as factors with informative names. Here’s a key:
NPType: 1 <=> lexical, 2 <=> pronoun.Extraction): 1 <=> subject, 2 <=> object.Correct: 1 <=> correct, 0 <=> incorrect.d$NPType <- ifelse(d$NPType==1, "lexical", "pronoun")
d$NPType <- as.factor(d$NPType)
d$Extraction <- ifelse(d$Extraction==1, "subject", "object")
d$Extraction <- as.factor(d$Extraction)
d$Correct <- ifelse(d$Correct==1, "correct", "incorrect")
d$Correct <- factor(d$Correct, levels=c("incorrect", "correct"))
d$child <- as.factor(d$child)
d$itemby4 <- as.factor(d$itemby4)
CorrectN is the numeric encoding of Correct: 1 <=> correct, 0 <=> incorrect.
d$CorrectN <- as.numeric(d$Correct) - 1 # factor representation starts from 1
tapply(d$CorrectN, list(factor(d$Extraction, levels=c("subject", "object")), d$NPType), mean)
## lexical pronoun
## subject 0.8956044 0.9565217
## object 0.6890244 0.8433735
StandardError <- function(v){
return(sd(v)/sqrt(length(v)))
}
tapply(d$CorrectN, list(factor(d$Extraction, levels=c("subject", "object")), d$NPType), StandardError)
## lexical pronoun
## subject 0.02272793 0.01507502
## object 0.03625657 0.02829441
The results are almost the same as Table 2 in the paper, except that the mean of the LexicalNP-SubjectRC condition is 0.896 here instead of 0.897 in the paper. Either it was a typo or the datasets are slightly different.
I will first replicate the results of logistic regressions and extensions. The results of ANOVA come afterwards.
The design package has been replaced by the rms package in CRAN, so I will use the latter instead.
library(rms)
lrm(Correct ~ Extraction * NPType, data = d)
##
## Logistic Regression Model
##
## lrm(formula = Correct ~ Extraction * NPType, data = d)
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 696 LR chi2 51.98 R2 0.126 C 0.707
## incorrect 104 d.f. 3 g 0.920 Dxy 0.414
## correct 592 Pr(> chi2) <0.0001 gr 2.509 gamma 0.535
## max |deriv| 3e-10 gp 0.105 tau-a 0.105
## Brier 0.117
##
## Coef S.E. Wald Z Pr(>|Z|)
## Intercept 0.7956 0.1687 4.72 <0.0001
## Extraction=subject 1.3537 0.2953 4.58 <0.0001
## NPType=pronoun 0.8880 0.2721 3.26 0.0011
## Extraction=subject * NPType=pronoun 0.0537 0.5133 0.10 0.9166
The result is almost identical to Jaeger’s, except that some SEs are slightly different, and the p-value for the NPType main effect is 0.0011, not <0.001 as reported by Jaeger.
library(lme4)
Below is the model specification from (15) in Jaeger’s paper.
glm.inter <- glmer(Correct ~ 1 + Extraction * NPType + (1 + Extraction * NPType | child),
data = d , family = "binomial")
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.258724
## (tol = 0.001, component 9)
summary(glm.inter)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: Correct ~ 1 + Extraction * NPType + (1 + Extraction * NPType |
## child)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 540.2 603.9 -256.1 512.2 682
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -6.7940 0.1060 0.2632 0.4735 1.0044
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## child (Intercept) 0.25295 0.5029
## Extractionsubject 0.67634 0.8224 0.68
## NPTypepronoun 0.12657 0.3558 0.95 0.60
## Extractionsubject:NPTypepronoun 0.09069 0.3011 0.61 0.43 0.82
## Number of obs: 696, groups: child, 24
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.8360 0.2053 4.073 4.65e-05 ***
## Extractionsubject 1.8271 0.5046 3.621 0.000294 ***
## NPTypepronoun 1.0804 0.3323 3.252 0.001148 **
## Extractionsubject:NPTypepronoun 0.4647 1.0983 0.423 0.672210
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Extrct NPTypp
## Extrctnsbjc -0.189
## NPTypepronn -0.346 0.230
## Extrctn:NPT 0.173 -0.394 -0.297
The results are similar to those reported in Table 5 and 6 in the paper, but not exactly the same. For example, while all other coefficients are close to those reported, the coefficient for the interaction term is 0.4647 here, as opposed to 0.59. Random subject effects and correlations are also not exactly the same as reported. This might be due to either differences in the datasets, or different versions of the package (since there is no analytic solution, only various approximation methods. Perhaps the approximation method changed in later versions of the lme4 package.).
glm.nointer <- glmer(Correct ~ 1 + Extraction * NPType + (1 + Extraction + NPType | child),
data = d , family = "binomial")
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00768967
## (tol = 0.001, component 6)
Next, the model specification from (16) in Jaeger’s paper is as follows:
glm.e <- glmer(Correct ~ 1 + Extraction * NPType + (1 + Extraction | child),
data = d , family = "binomial")
summary(glm.e)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: Correct ~ 1 + Extraction * NPType + (1 + Extraction | child)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 527.6 559.4 -256.8 513.6 689
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -6.0604 0.1170 0.2792 0.4447 1.0804
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## child (Intercept) 0.3946 0.6282
## Extractionsubject 0.7617 0.8727 0.64
## Number of obs: 696, groups: child, 24
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.85958 0.22115 3.887 0.000102 ***
## Extractionsubject 1.91571 0.48400 3.958 7.55e-05 ***
## NPTypepronoun 0.95645 0.28288 3.381 0.000722 ***
## Extractionsubject:NPTypepronoun 0.09542 0.53940 0.177 0.859589
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Extrct NPTypp
## Extrctnsbjc -0.160
## NPTypepronn -0.468 0.212
## Extrctn:NPT 0.245 -0.314 -0.524
The results are close to those reported in Table 7 and 8 in the paper. Again, there are some differences, e.g. SE estimates, but in general they are comparable.
anova(glm.inter, glm.nointer, glm.e)
## Data: d
## Models:
## glm.e: Correct ~ 1 + Extraction * NPType + (1 + Extraction | child)
## glm.nointer: Correct ~ 1 + Extraction * NPType + (1 + Extraction + NPType |
## glm.nointer: child)
## glm.inter: Correct ~ 1 + Extraction * NPType + (1 + Extraction * NPType |
## glm.inter: child)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## glm.e 7 527.55 559.37 -256.78 513.55
## glm.nointer 10 532.32 577.77 -256.16 512.32 1.2348 3 0.7447
## glm.inter 14 540.24 603.87 -256.12 512.24 0.0804 4 0.9992
The model specification from (17) in Jaeger’s paper is as follows:
summary(glmer(Correct ~ 1 + Extraction * NPType + (1 + Extraction | child) + (1 | itemby4),
data = d , family = "binomial"))
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: Correct ~ 1 + Extraction * NPType + (1 + Extraction | child) +
## (1 | itemby4)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 527.9 564.3 -256.0 511.9 688
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -6.3939 0.1164 0.2602 0.4404 1.1441
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## child (Intercept) 0.42052 0.6485
## Extractionsubject 0.76766 0.8762 0.62
## itemby4 (Intercept) 0.08672 0.2945
## Number of obs: 696, groups: child, 24; itemby4, 8
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.85050 0.24997 3.402 0.000668 ***
## Extractionsubject 1.96117 0.48873 4.013 6e-05 ***
## NPTypepronoun 0.99162 0.29144 3.402 0.000668 ***
## Extractionsubject:NPTypepronoun 0.06943 0.54516 0.127 0.898661
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Extrct NPTypp
## Extrctnsbjc -0.155
## NPTypepronn -0.429 0.230
## Extrctn:NPT 0.230 -0.322 -0.533
The results are close to those reported in Table 9 and 10 in the paper. Again, not perfect replication, but comparable.
Further complications are not necessary because all further random effects are highly correlated with the random intercept (rs > 0.8) and hence unnecessary.
summary(glmer(Correct ~ 1 + Extraction * NPType + (1 + Extraction | child) + (1 + Extraction| itemby4),
data = d , family = "binomial"))
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: Correct ~ 1 + Extraction * NPType + (1 + Extraction | child) +
## (1 + Extraction | itemby4)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 530.1 575.5 -255.0 510.1 686
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.3149 0.1154 0.2648 0.4321 1.2667
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## child (Intercept) 0.4477 0.6691
## Extractionsubject 0.7890 0.8882 0.54
## itemby4 (Intercept) 0.2055 0.4533
## Extractionsubject 0.3437 0.5862 -0.90
## Number of obs: 696, groups: child, 24; itemby4, 8
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.834049 0.283775 2.939 0.003291 **
## Extractionsubject 1.968000 0.539628 3.647 0.000265 ***
## NPTypepronoun 1.057549 0.307077 3.444 0.000573 ***
## Extractionsubject:NPTypepronoun 0.003758 0.554393 0.007 0.994591
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Extrct NPTypp
## Extrctnsbjc -0.339
## NPTypepronn -0.394 0.206
## Extrctn:NPT 0.218 -0.284 -0.554
summary(glmer(Correct ~ 1 + Extraction * NPType + (1 + Extraction | child) + (1 + Extraction* NPType| itemby4),
data = d , family = "binomial"))
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: Correct ~ 1 + Extraction * NPType + (1 + Extraction | child) +
## (1 + Extraction * NPType | itemby4)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 539.3 616.6 -252.7 505.3 679
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.4403 0.0914 0.2330 0.4317 1.1896
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## child (Intercept) 0.4550 0.6746
## Extractionsubject 0.8468 0.9202 0.55
## itemby4 (Intercept) 0.1113 0.3336
## Extractionsubject 0.1404 0.3746 -0.50
## NPTypepronoun 0.1517 0.3895 0.95 -0.76
## Extractionsubject:NPTypepronoun 2.8540 1.6894 -0.72 -0.24
##
##
##
##
##
##
## -0.45
## Number of obs: 696, groups: child, 24; itemby4, 8
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.8433 0.2593 3.252 0.001145 **
## Extractionsubject 2.0157 0.5257 3.834 0.000126 ***
## NPTypepronoun 1.1208 0.3638 3.081 0.002065 **
## Extractionsubject:NPTypepronoun 0.4186 0.9802 0.427 0.669354
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Extrct NPTypp
## Extrctnsbjc -0.198
## NPTypepronn -0.176 0.098
## Extrctn:NPT -0.072 -0.196 -0.419
For ANOVAs, conduct the analyses using the aov() function;
bonus points if you can also figure out how to do the same thing using a regression with lmer() or glmer(). (Note: for this question you’ll have to work out how to include random effects structure in an ANOVA in R, using the aov() function. This has a different syntax from lmer(). This is something that we didn’t cover in class, but using your Googling skills to hack it out is good practice navigating the R jungle.)
First, Jaeger mentioned that he discarded 2 items in the ANOVA analyses. We can see from below that items 3 and 5 are systematically missing.
d.item <- aggregate(CorrectN ~ itemby4 + Extraction + NPType, data=d, mean)
Hence we discard items 3 and 5.
df <- d[d$itemby4!=3 & d$itemby4!=5,]
df.subj <- aggregate(CorrectN ~ child + Extraction + NPType, data=df, mean)
df.item <- aggregate(CorrectN ~ itemby4 + Extraction + NPType, data=df, mean)
Using within-subject and within-items ANOVA, respectively
summary(aov(CorrectN ~ 1 + Extraction * NPType + Error(child / (Extraction * NPType)), data=df.subj))
##
## Error: child
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 23 1.169 0.05084
##
## Error: child:Extraction
## Df Sum Sq Mean Sq F value Pr(>F)
## Extraction 1 0.5056 0.5056 24.24 5.64e-05 ***
## Residuals 23 0.4797 0.0209
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: child:NPType
## Df Sum Sq Mean Sq F value Pr(>F)
## NPType 1 0.3000 0.30001 16.14 0.000538 ***
## Residuals 23 0.4275 0.01859
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: child:Extraction:NPType
## Df Sum Sq Mean Sq F value Pr(>F)
## Extraction:NPType 1 0.09584 0.09584 9.75 0.00479 **
## Residuals 23 0.22610 0.00983
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(CorrectN ~ 1 + Extraction * NPType + Error(itemby4 / (Extraction * NPType)), data=df.item))
##
## Error: itemby4
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 5 0.05633 0.01127
##
## Error: itemby4:Extraction
## Df Sum Sq Mean Sq F value Pr(>F)
## Extraction 1 0.12832 0.12832 10.21 0.0241 *
## Residuals 5 0.06285 0.01257
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: itemby4:NPType
## Df Sum Sq Mean Sq F value Pr(>F)
## NPType 1 0.07176 0.07176 19.67 0.00679 **
## Residuals 5 0.01824 0.00365
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: itemby4:Extraction:NPType
## Df Sum Sq Mean Sq F value Pr(>F)
## Extraction:NPType 1 0.02569 0.025689 12.59 0.0164 *
## Residuals 5 0.01020 0.002041
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
And for the combined analysis, the F-score \(F'=F_1 \times F_2 / (F_1 + F_2)\). For example, for the interaction term in Table 3, its F-value is
9.7 * 12.6 / (9.7 + 12.6)
## [1] 5.480717
Using within-subject ANOVA after arcsine-square-root transformations
trans.subj <- df.subj
trans.subj$CorrectN <- asin(sqrt(df.subj$CorrectN))
summary(aov(CorrectN ~ 1 + Extraction * NPType + Error(child / (Extraction * NPType)), data=trans.subj))
##
## Error: child
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 23 3.151 0.137
##
## Error: child:Extraction
## Df Sum Sq Mean Sq F value Pr(>F)
## Extraction 1 1.682 1.682 28.48 2.03e-05 ***
## Residuals 23 1.358 0.059
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: child:NPType
## Df Sum Sq Mean Sq F value Pr(>F)
## NPType 1 0.8416 0.8416 17.28 0.000381 ***
## Residuals 23 1.1202 0.0487
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: child:Extraction:NPType
## Df Sum Sq Mean Sq F value Pr(>F)
## Extraction:NPType 1 0.2575 0.25752 8.513 0.00775 **
## Residuals 23 0.6958 0.03025
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The spurious interaction persists.
We can do this by applying anova to the corresponding linear mixed-effects models.
anova(lmer(CorrectN ~ Extraction*NPType + (1|child) + (1|Extraction:child) + (1|NPType:child), data=df.subj) )
## Analysis of Variance Table
## Df Sum Sq Mean Sq F value
## Extraction 1 0.238286 0.238286 24.2397
## NPType 1 0.158677 0.158677 16.1414
## Extraction:NPType 1 0.095845 0.095845 9.7498
anova(lmer(CorrectN ~ Extraction*NPType + (1|itemby4) + (1|Extraction:itemby4) + (1|NPType:itemby4), data=df.item) )
## Analysis of Variance Table
## Df Sum Sq Mean Sq F value
## Extraction 1 0.023515 0.023515 11.298
## NPType 1 0.042304 0.042304 20.325
## Extraction:NPType 1 0.025689 0.025689 12.342
anova(lmer(CorrectN ~ Extraction*NPType + (1|child) + (1|Extraction:child) + (1|NPType:child), data=trans.subj) )
## Analysis of Variance Table
## Df Sum Sq Mean Sq F value
## Extraction 1 0.86174 0.86174 28.4851
## NPType 1 0.52280 0.52280 17.2814
## Extraction:NPType 1 0.25752 0.25752 8.5125