1 Background

Hebrew-speaking children with autism were asked to read aloud sentences containing heterophonic homographs: ambiguous words whose meaning and pronunciation depend on the context in which they are read. We used mixed-effects logistic regression analyses to model reading accuracy as a function of children's age, autism severity, and language and reading abilities.

Brock, J., Sukenik, N., & Friedmann, N. (in press). Individual differences in autistic children’s homograph reading: Evidence from Hebrew. Autism and Developmental Language Impairments.

2 Session set-up

Load libraries

library(lme4)          # mixed-effects models (glmer)
library(influence.ME)  # influence diagnostics (Cook's distance)
library(plyr)          # ddply for by-participant summaries
library(ggplot2)       # plotting
library(knitr)         # kable tables
library(grid)          # grid.draw for multi-panel figures
library(ReporteRs)     # exporting tables to Word
library(dplyr)         # rename, select (loaded after plyr to avoid masking)
library(corrplot)      # correlation matrix plots

sessionInfo()
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS Sierra 10.12.2
## 
## locale:
## [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] corrplot_0.77       dplyr_0.5.0         ReporteRs_0.8.8    
##  [4] ReporteRsjars_0.0.2 knitr_1.15.1        ggplot2_2.2.1      
##  [7] plyr_1.8.4          influence.ME_0.9-8  lme4_1.1-12        
## [10] Matrix_1.2-7.1     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.8       nloptr_1.0.4      R.methodsS3_1.7.1
##  [4] R.utils_2.5.0     tools_3.3.2       digest_0.6.11    
##  [7] evaluate_0.10     tibble_1.2        gtable_0.2.0     
## [10] nlme_3.1-128      lattice_0.20-34   png_0.1-7        
## [13] DBI_0.6-1         shiny_1.0.0       yaml_2.1.14      
## [16] rJava_0.9-8       stringr_1.1.0     xml2_1.1.1       
## [19] gdtools_0.1.3     rprojroot_1.1     R6_2.2.0         
## [22] rvg_0.1.2         rmarkdown_1.3     minqa_1.2.4      
## [25] magrittr_1.5      backports_1.0.4   scales_0.4.1     
## [28] htmltools_0.3.5   MASS_7.3-45       splines_3.3.2    
## [31] assertthat_0.1    xtable_1.8-2      mime_0.5         
## [34] colorspace_1.3-2  httpuv_1.3.3      stringi_1.1.2    
## [37] lazyeval_0.2.0    munsell_0.4.3     R.oo_1.21.0

Import data

HomographData <- read.csv("data/HebrewHomographsData.csv")
HomographData$SentenceID <- factor(paste0("Sentence", HomographData$SentenceID))
HomographData <- HomographData %>%
  rename(WordReading = TILTANWords,
         WordAssoc = WordWordAssociation,
         PictureNaming = ShemeshPictureNaming)

str(HomographData)
## 'data.frame':    468 obs. of  19 variables:
##  $ ChildID             : Factor w/ 18 levels "Child1","Child10",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ SentenceID          : Factor w/ 26 levels "Sentence1","Sentence10",..: 1 12 20 21 22 23 24 25 26 2 ...
##  $ OrthographicForm    : Factor w/ 13 levels "?ONA","HBIRA",..: 7 10 3 6 4 2 8 5 1 12 ...
##  $ Meaning             : Factor w/ 26 levels "barber","beard",..: 18 5 1 11 22 3 15 14 20 2 ...
##  $ CorrectPronunciation: Factor w/ 26 levels "habiRA","haBIra",..: 8 20 14 6 5 2 12 21 11 23 ...
##  $ Bias                : num  -0.0152 0.3182 -0.2273 -0.4394 0.3182 ...
##  $ Dominance           : Factor w/ 2 levels "Dominant","Subordinate": 2 1 2 2 1 1 2 1 2 2 ...
##  $ InitialCorrect      : int  1 0 1 1 1 1 1 1 1 1 ...
##  $ FinalCorrect        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ ErrorTypeIni        : Factor w/ 4 levels "Correct","HomographError",..: 1 2 1 1 1 1 1 1 1 1 ...
##  $ ErrorTypeFinal      : Factor w/ 4 levels "Correct","HomographError",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ HesitationBefore    : int  0 1 0 1 0 1 0 0 1 0 ...
##  $ HesitationAfter     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Age                 : num  11.3 11.3 11.3 11.3 11.3 11.3 11.3 11.3 11.3 11.3 ...
##  $ Gender              : Factor w/ 2 levels "f","m": 2 2 2 2 2 2 2 2 2 2 ...
##  $ CARS                : num  39 39 39 39 39 39 39 39 39 39 ...
##  $ WordReading         : num  0.971 0.971 0.971 0.971 0.971 ...
##  $ PictureNaming       : num  0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96 ...
##  $ WordAssoc           : num  1 1 1 1 1 1 1 1 1 1 ...

3 Error analysis

Our analyses focused on correct responses. Here we provide a breakdown of error types for each participant.

HomographData$HomographError <- ifelse(HomographData$ErrorTypeIni=="HomographError",1,0)
HomographData$SurfaceError <- ifelse(HomographData$ErrorTypeIni=="SurfaceError",1,0)
HomographData$OtherError <- ifelse(HomographData$ErrorTypeIni=="OtherError",1,0)
ErrorData <- ddply(HomographData, .(ChildID), summarise, CorrectResponse=sum(InitialCorrect), HomographError=sum(HomographError), SurfaceError=sum(SurfaceError), OtherError=sum(OtherError))
kable(ErrorData, caption="Error analysis", digits=2)
| ChildID | CorrectResponse | HomographError | SurfaceError | OtherError |
|---------|----------------:|---------------:|-------------:|-----------:|
| Child1  | 25 | 1 | 0 | 0 |
| Child10 | 16 | 4 | 5 | 1 |
| Child11 | 22 | 2 | 1 | 1 |
| Child12 | 25 | 1 | 0 | 0 |
| Child13 | 25 | 0 | 0 | 1 |
| Child14 | 24 | 1 | 0 | 1 |
| Child15 | 24 | 2 | 0 | 0 |
| Child16 | 19 | 5 | 0 | 2 |
| Child17 | 18 | 5 | 1 | 2 |
| Child18 | 24 | 2 | 0 | 0 |
| Child2  | 25 | 1 | 0 | 0 |
| Child3  | 15 | 8 | 1 | 2 |
| Child4  | 25 | 0 | 1 | 0 |
| Child5  | 22 | 2 | 1 | 1 |
| Child6  | 20 | 5 | 0 | 1 |
| Child7  | 23 | 2 | 1 | 0 |
| Child8  | 22 | 4 | 0 | 0 |
| Child9  | 25 | 1 | 0 | 0 |

4 Logistic regression analyses

We analysed homograph reading accuracy using mixed-effects logistic regression. Rather than looking at the proportion of correct trials for each participant, this approach models the probability of a correct response on each individual trial by each participant.

We began with a series of models that each included the child’s age and one other predictor as fixed factors and the identity of the participant and the item as random effects.
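In equation form, each of these models is

$$\operatorname{logit}\Pr(\text{InitialCorrect}_{ij} = 1) = \beta_0 + \beta_1 X_j + \beta_2\,\text{Age}_j + u_j + w_i,$$

where $X_j$ is child $j$'s score on the predictor, and $u_j \sim N(0, \sigma^2_{\text{ChildID}})$ and $w_i \sim N(0, \sigma^2_{\text{SentenceID}})$ are random intercepts for child $j$ and sentence $i$.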

Model1a <- glmer(InitialCorrect ~ CARS + Age + (1|SentenceID) + (1|ChildID), HomographData, family="binomial")
summary(Model1a)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: InitialCorrect ~ CARS + Age + (1 | SentenceID) + (1 | ChildID)
##    Data: HomographData
## 
##      AIC      BIC   logLik deviance df.resid 
##    360.7    381.4   -175.3    350.7      463 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.0517  0.1847  0.2649  0.3843  1.4404 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  SentenceID (Intercept) 0.8403   0.9167  
##  ChildID    (Intercept) 0.3319   0.5761  
## Number of obs: 468, groups:  SentenceID, 26; ChildID, 18
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  2.68779    1.95361   1.376   0.1689  
## CARS        -0.07078    0.03913  -1.809   0.0705 .
## Age          0.14113    0.07585   1.861   0.0628 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr) CARS  
## CARS -0.880       
## Age  -0.767  0.403
Model1b <- glmer(InitialCorrect ~ WordReading + Age + (1|SentenceID) + (1|ChildID), HomographData, family="binomial")
summary(Model1b)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: InitialCorrect ~ WordReading + Age + (1 | SentenceID) + (1 |  
##     ChildID)
##    Data: HomographData
## 
##      AIC      BIC   logLik deviance df.resid 
##    361.7    382.4   -175.8    351.7      463 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.2911  0.1852  0.2682  0.4045  1.3605 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  SentenceID (Intercept) 0.8451   0.9193  
##  ChildID    (Intercept) 0.4302   0.6559  
## Number of obs: 468, groups:  SentenceID, 26; ChildID, 18
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.36631    1.20270  -1.136    0.256
## WordReading  3.03424    2.18760   1.387    0.165
## Age          0.08749    0.10925   0.801    0.423
## 
## Correlation of Fixed Effects:
##             (Intr) WrdRdn
## WordReading -0.563       
## Age         -0.119 -0.735
Model1c <- glmer(InitialCorrect ~ WordAssoc + Age + (1|SentenceID) + (1|ChildID), HomographData, family="binomial")
summary(Model1c)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## InitialCorrect ~ WordAssoc + Age + (1 | SentenceID) + (1 | ChildID)
##    Data: HomographData
## 
##      AIC      BIC   logLik deviance df.resid 
##    360.7    381.5   -175.4    350.7      463 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.5982  0.1843  0.2647  0.3936  1.3703 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  SentenceID (Intercept) 0.8482   0.9210  
##  ChildID    (Intercept) 0.3667   0.6056  
## Number of obs: 468, groups:  SentenceID, 26; ChildID, 18
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -3.01340    1.76045  -1.712   0.0869 .
## WordAssoc    3.82250    2.20565   1.733   0.0831 .
## Age          0.14169    0.07801   1.816   0.0693 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) WrdAss
## WordAssoc -0.842       
## Age       -0.110 -0.420
Model1d <- glmer(InitialCorrect ~ PictureNaming + Age + (1|SentenceID) + (1|ChildID), HomographData, family="binomial")
summary(Model1d)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: InitialCorrect ~ PictureNaming + Age + (1 | SentenceID) + (1 |  
##     ChildID)
##    Data: HomographData
## 
##      AIC      BIC   logLik deviance df.resid 
##    356.8    377.5   -173.4    346.8      463 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.0060  0.1828  0.2578  0.3853  1.7775 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  SentenceID (Intercept) 0.8548   0.9246  
##  ChildID    (Intercept) 0.2215   0.4706  
## Number of obs: 468, groups:  SentenceID, 26; ChildID, 18
## 
## Fixed effects:
##                Estimate Std. Error z value Pr(>|z|)   
## (Intercept)   -8.556363   3.053977  -2.802  0.00508 **
## PictureNaming 11.963905   4.372151   2.736  0.00621 **
## Age           -0.006888   0.097760  -0.070  0.94383   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) PctrNm
## PictureNmng -0.961       
## Age          0.566 -0.764

To evaluate the contribution of each predictor, X, we asked whether removing it significantly reduced model fit; that is, whether X accounted for significant variation beyond Age. This was only true of Picture Naming.

We then removed the random effect of participant and asked whether this significantly reduced model fit, which would indicate reliable individual variation in performance unaccounted for by Age and X. All predictors except Picture Naming left reliable individual variation unaccounted for.
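Each of these comparisons is a likelihood-ratio test: the Chisq value reported by anova() is the difference in deviance between the two models,

$$\chi^2 = \text{deviance}_{\text{reduced}} - \text{deviance}_{\text{full}} = -2\,(\ell_{\text{reduced}} - \ell_{\text{full}}),$$

evaluated on one degree of freedom (the single parameter removed from the full model).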

ANOVAs.DF <- data.frame(Predictor=c("CARS", "WordReading", "WordAssoc", "PictureNaming"),
                       Chisq_Fixed=numeric(4),
                       p_Fixed=numeric(4),
                       Chisq_Random=numeric(4),
                       p_Random=numeric(4),
                       stringsAsFactors=FALSE)

for (i in 1:4) {
  Predictor <- ANOVAs.DF$Predictor[i]
  HomographData$X <- HomographData[[Predictor]]
  
  Model1 <- glmer(InitialCorrect ~ X + Age + (1|SentenceID) + (1|ChildID), HomographData, family="binomial") # full model
  Model2 <- glmer(InitialCorrect ~ Age + (1|SentenceID) + (1|ChildID), HomographData, family="binomial") # fixed effect of X removed
  Model3 <- glmer(InitialCorrect ~ X + Age + (1|SentenceID), HomographData, family="binomial") # random effect of participant removed

  Fixed.ANOVA <- anova(Model1, Model2)
  ANOVAs.DF$Chisq_Fixed[i] <- Fixed.ANOVA$Chisq[2] # second row of ANOVA output
  ANOVAs.DF$p_Fixed[i] <- Fixed.ANOVA$'Pr(>Chisq)'[2]
  
  Random.ANOVA <- anova(Model1, Model3)
  ANOVAs.DF$Chisq_Random[i] <- Random.ANOVA$Chisq[2]
  ANOVAs.DF$p_Random[i] <- Random.ANOVA$'Pr(>Chisq)'[2]
}
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge with max|grad| = 0.0384645 (tol =
## 0.001, component 1)
kable(ANOVAs.DF, digits=3)  
| Predictor | Chisq_Fixed | p_Fixed | Chisq_Random | p_Random |
|-----------|------------:|--------:|-------------:|---------:|
| CARS | 2.966 | 0.085 | 4.330 | 0.037 |
| WordReading | 1.953 | 0.162 | 7.471 | 0.006 |
| WordAssoc | 2.876 | 0.090 | 5.298 | 0.021 |
| PictureNaming | 6.860 | 0.009 | 2.170 | 0.141 |

We also found that Picture Naming accounted for variation beyond each of the other predictors. Conversely, none of the other predictors accounted for significant variation beyond Picture Naming.

ANOVAs2.DF <- data.frame(Predictor=c("Age", "CARS", "WordReading", "WordAssoc"),
                       Chisq_4v5=numeric(4),
                       p_4v5=numeric(4),
                       Chisq_4v6=numeric(4),
                       p_4v6=numeric(4),
                       stringsAsFactors=FALSE)

for (i in 1:4) {
  Predictor <- ANOVAs2.DF$Predictor[i]
  HomographData$X <- HomographData[[Predictor]]
  
  Model4 <- glmer(InitialCorrect ~ X + PictureNaming + (1|SentenceID) + (1|ChildID), HomographData, family="binomial") # full model
  Model5 <- glmer(InitialCorrect ~ X + (1|SentenceID) + (1|ChildID), HomographData, family="binomial") # PictureNaming removed
  Model6 <- glmer(InitialCorrect ~ PictureNaming + (1|SentenceID) + (1|ChildID), HomographData, family="binomial") # X removed

  ANOVA_4v5  <- anova(Model4, Model5)
  ANOVAs2.DF$Chisq_4v5[i] <- ANOVA_4v5$Chisq[2] # second row of ANOVA output
  ANOVAs2.DF$p_4v5[i] <- ANOVA_4v5$'Pr(>Chisq)'[2]
  
  ANOVA_4v6  <- anova(Model4, Model6)
  ANOVAs2.DF$Chisq_4v6[i] <- ANOVA_4v6$Chisq[2] # second row of ANOVA output
  ANOVAs2.DF$p_4v6[i] <- ANOVA_4v6$'Pr(>Chisq)'[2]
}

kable(ANOVAs2.DF, digits=3)  
| Predictor | Chisq_4v5 | p_4v5 | Chisq_4v6 | p_4v6 |
|-----------|----------:|------:|----------:|------:|
| Age | 6.860 | 0.009 | 0.005 | 0.944 |
| CARS | 9.120 | 0.003 | 2.119 | 0.145 |
| WordReading | 6.977 | 0.008 | 1.466 | 0.226 |
| WordAssoc | 10.105 | 0.001 | 3.155 | 0.076 |

Export ANOVA tables as Word document.

doc <- docx()

is.num <- sapply(ANOVAs.DF, is.numeric)
ANOVAs.DF[is.num] <- lapply(ANOVAs.DF[is.num], round, 3)
doc <- addTitle(doc, "Table 2", level=2)
doc <- addFlexTable( doc, vanilla.table(ANOVAs.DF))

is.num <- sapply(ANOVAs2.DF, is.numeric)
ANOVAs2.DF[is.num] <- lapply(ANOVAs2.DF[is.num], round, 3)
doc <- addTitle(doc, "Table 3", level=2)
doc <- addFlexTable( doc, vanilla.table(ANOVAs2.DF))

writeDoc(doc, file = "output/doc/ANOVA_Table.docx")

5 Plots

First, we define a function, drawFitPlot, which plots homograph reading accuracy against a single predictor. Each plot contains three elements: the proportion correct for each participant, the function derived from the logistic regression analysis, and the predicted values for each participant (with prediction intervals).

drawFitPlot <- function(Predictor, Data = HomographData, XLabel = "Default", XJitter = 0) {
  
# Add predictor to the dataframe as a new column, X (this will be on the X-axis in the figure)
  Data$X <- Data[,Predictor]
  
# Using ddply, create a new dataframe, PlotData, with the predictor, X, and the observed proportion correct for each participant
  PlotData <- ddply(Data, .(ChildID, X), summarize, Correct=mean(InitialCorrect))

# Fit the model using lme4 with X as the fixed factor. We centre Age so that it's effectively controlled for in the fitting of X.
  Data$AgeC <- Data$Age - mean(Data$Age)
  fit <- glmer(InitialCorrect ~ X + AgeC + (1|SentenceID) + (1|ChildID), Data, family="binomial")
  
# Create a second dataframe (df) with the intercept and prediction interval for the random effect of ChildID. This is based on the code for creating caterpillar plots in ggplot2:
# http://stackoverflow.com/questions/13847936/in-r-plotting-random-effects-from-lmer-lme4-package-using-qqmath-or-dotplot
  randoms <- ranef(fit, condVar = TRUE)
  qq <- attr(randoms[[2]], "postVar") # ChildID is the second random effect in the model, hence [[2]]
  rand.interc <- randoms$ChildID
  df <- data.frame(RandIntercepts = randoms$ChildID[,1],
                   sd.interc = 2*sqrt(qq[,,1:length(qq)]),
                   lev.names = rownames(rand.interc))
  
# Add this to the PlotData dataframe.
  PlotData <- cbind(PlotData, df)
  
# Extract the intercept (beta1) and fixed effect of X (beta2) from the model
  beta1 <- summary(fit)$coefficients[[1]]
  beta2 <- summary(fit)$coefficients[[2]]
  
# Next, calculate the predicted value for each child based on the model. This involves the usual process of multiplying the predictor (X) by the slope (beta2) and adding the intercept (beta1). However, we also add the random intercepts for each participant. Then, because this is logistic data, we transform to make it a probability.
  PlotData$Pred <- 1 / (1+exp(-1*(beta1 + beta2*PlotData$X + PlotData$RandIntercepts)))
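  # (This inverse-logit step is equivalent to plogis(beta1 + beta2*PlotData$X + PlotData$RandIntercepts).)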
  
# We can do the same thing to create prediction intervals for each participant
  PlotData$YMin <- 1 / (1+exp(-1*(beta1 + beta2*PlotData$X + PlotData$RandIntercepts - PlotData$sd.interc)))
  PlotData$YMax <- 1 / (1+exp(-1*(beta1 + beta2*PlotData$X + PlotData$RandIntercepts + PlotData$sd.interc)))
  
# Add jitter to the x-value for plotting (default is 0 jitter). Note that this is done after calculating the predicted values, which are based on the true X-values.
  PlotData$Jitter <- runif(nrow(PlotData), min=-1*XJitter, max=XJitter)
  PlotData$X <- PlotData$X + PlotData$Jitter
  
# Before plotting the figure, determine the label for the x-axis. XLabel is optional when calling this function. If it's left at its default, use the predictor's name.
  if (XLabel == "Default") {XLabel <- Predictor}  
  
  ggplot() +
    # Plot model fit
    stat_function(data=PlotData, fun=function(x) 1/(1+exp(-1*(beta1+x*beta2))), geom="line", color="orange") +
    # Plot predicted values based on the model
    geom_point(data=PlotData, aes(x=X, y=Pred), colour="orange", size=3) +
    # Plot error bars
    geom_errorbar(data=PlotData, aes(x=X, ymin=YMin, ymax=YMax), width=0,color="orange") +
    # Overlay the observed data
    geom_point(data=PlotData, aes(x=X, y=Correct), colour="black", size=3) + 
    # Add theme, axis scales, labels, remove legend etc
    xlab(XLabel) + ylab("Proportion Correct") +
    scale_y_continuous(breaks = seq(0, 1, 0.1)) +
    coord_cartesian(ylim = c(0.4, 1)) +
    theme_bw() +
    theme(axis.title=element_text(size=rel(1.3)),
          panel.grid.minor=element_blank(),
          panel.grid.major=element_blank(),
          legend.position = "none")
}

We apply the drawFitPlot function to each predictor, create a multi-panel figure, and export this as a PNG file.

CARS_Plot <- drawFitPlot("CARS", XLabel = "CARS Total Score", XJitter=1)
WordReading_Plot <- drawFitPlot("WordReading", XLabel = "Word Reading", XJitter=0.01) +
  theme(axis.title.y=element_blank(), axis.text.y=element_blank())
WordAssoc_Plot <- drawFitPlot("WordAssoc", XLabel = "Word Association", XJitter=0.01) +
  theme(axis.title.y=element_blank(), axis.text.y=element_blank())
PictureNaming_Plot <- drawFitPlot("PictureNaming", XLabel = "Picture Naming", XJitter=0.01) +
  theme(axis.title.y=element_blank(), axis.text.y=element_blank())

plots <- list(CARS_Plot, WordReading_Plot, WordAssoc_Plot, PictureNaming_Plot)
grobs <- lapply(plots, ggplotGrob)
g <- do.call(gridExtra::cbind.gtable, grobs)

png(filename="output/figures/Figure1.png", width=1000, height=400)
grid.draw(g)
dev.off() 
## quartz_off_screen 
##                 2
knitr::include_graphics("output/figures/Figure1.png")

6 Supplementary analyses

6.1 Cook’s distance

Given the small sample size, it is important to ensure that the strong association between Homograph Reading and Picture Naming is not driven by a small number of data points with excessive leverage. We identified and excluded participants whose Cook's distance was more than three times the sample mean.
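For reference, the group-deletion form of Cook's distance that we take influence.ME to be computing here (this description is ours, not the package documentation's) measures the standardised change in the fixed-effect estimates when participant $j$'s data are removed:

$$D_j = \frac{1}{p}\,\big(\hat{\beta} - \hat{\beta}_{(-j)}\big)^{\top}\,\widehat{\Sigma}^{-1}\,\big(\hat{\beta} - \hat{\beta}_{(-j)}\big),$$

where $\hat{\beta}_{(-j)}$ are the estimates with participant $j$ deleted, $\widehat{\Sigma}$ is the estimated covariance matrix of $\hat{\beta}$, and $p$ is the number of fixed-effect parameters.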

PictureNaming.lmer <- glmer(InitialCorrect ~ PictureNaming + Age + (1|SentenceID) + (1|ChildID), HomographData, family="binomial")
CooksDistance <- as.data.frame(cooks.distance(influence(PictureNaming.lmer, group="ChildID")))
CooksDistance <- cbind(ChildID = rownames(CooksDistance), CooksDistance)
CooksDistance$Outlier <- ifelse(CooksDistance$V1 > 3 * mean(CooksDistance$V1),1,0)

HomographData$Outlier <- CooksDistance$Outlier[match(HomographData$ChildID, CooksDistance$ChildID)]
Outliers <- HomographData[(HomographData$Outlier==1),]
NotOutliers <- HomographData[(HomographData$Outlier==0),]

We then re-ran the analyses, excluding the “outliers”. Picture Naming remained the strongest predictor of homograph reading.

for (i in 1:4) {
  Predictor <- ANOVAs.DF$Predictor[i]
  NotOutliers$X <- NotOutliers[[Predictor]]
  
  Model1 <- glmer(InitialCorrect ~ X + Age + (1|SentenceID) + (1|ChildID), NotOutliers, family="binomial")
  Model2 <- glmer(InitialCorrect ~ Age + (1|SentenceID) + (1|ChildID), NotOutliers, family="binomial")
  Model3 <- glmer(InitialCorrect ~ X + Age + (1|SentenceID), NotOutliers, family="binomial")

  Fixed.ANOVA <- anova(Model1, Model2)
  ANOVAs.DF$Chisq_Fixed[i] <- Fixed.ANOVA$Chisq[2] # second row of ANOVA output
  ANOVAs.DF$p_Fixed[i] <- Fixed.ANOVA$'Pr(>Chisq)'[2]
  
  Random.ANOVA <- anova(Model1, Model3)
  ANOVAs.DF$Chisq_Random[i] <- Random.ANOVA$Chisq[2]
  ANOVAs.DF$p_Random[i] <- Random.ANOVA$'Pr(>Chisq)'[2]
}

kable(ANOVAs.DF, digits=3)  
| Predictor | Chisq_Fixed | p_Fixed | Chisq_Random | p_Random |
|-----------|------------:|--------:|-------------:|---------:|
| CARS | 0.448 | 0.503 | 6.120 | 0.013 |
| WordReading | 1.509 | 0.219 | 6.436 | 0.011 |
| WordAssoc | 3.674 | 0.055 | 3.449 | 0.063 |
| PictureNaming | 11.539 | 0.001 | 0.032 | 0.857 |

We also replotted the model fit for Picture Naming. Excluded participants (those with excess leverage) are shown as unfilled black circles. Comparison with the original plot (first panel) shows that these “outliers” had, if anything, been reducing the effect of Picture Naming rather than driving it.

PictureNaming_Plot <- drawFitPlot("PictureNaming", XLabel = "Picture Naming", XJitter=0.01) +
  scale_x_continuous(breaks = seq(0.7, 1, 0.1), limits = c(0.7, 1.02))

OutlierPlot <- drawFitPlot("PictureNaming", Data = NotOutliers, XLabel = "Picture Naming", XJitter=0.01) +
  geom_point(data=ddply(Outliers, .(ChildID, PictureNaming), summarize, Correct=mean(InitialCorrect)),
             aes(x=PictureNaming, y=Correct), colour="black", shape=1, size=3) +
  scale_x_continuous(breaks = seq(0.7, 1, 0.1), limits = c(0.7, 1.02)) +
  theme(axis.title.y=element_blank(), axis.text.y=element_blank())

plots <- list(PictureNaming_Plot, OutlierPlot)
grobs <- lapply(plots, ggplotGrob)
g <- do.call(gridExtra::cbind.gtable, grobs)

png(filename="output/figures/OutlierPlot.png", width=450, height=400)
grid.draw(g)
dev.off() 
## quartz_off_screen 
##                 2
knitr::include_graphics("output/figures/OutlierPlot.png")

6.2 Age as a binary variable

Participants were recruited from two school grades, resulting in a bimodal distribution of age. We therefore repeated the analyses, treating age as a binary (older vs younger) variable. Results were comparable to the analyses reported in the paper. Picture Naming was again the strongest predictor of performance and left no variation unaccounted for. The other three predictors were either significant or close to significant, but all left considerable variation unaccounted for.

# Effect-code age group: older than 13 years = 0.5, younger = -0.5
HomographData$AgeGroup <- ifelse(HomographData$Age > 13, 0.5, -0.5)

for (i in 1:4) {
  Predictor <- ANOVAs.DF$Predictor[i]
  HomographData$X <- HomographData[[Predictor]]
  
  Model1 <- glmer(InitialCorrect ~ X + AgeGroup + (1|SentenceID) + (1|ChildID), HomographData, family="binomial")
  Model2 <- glmer(InitialCorrect ~ AgeGroup + (1|SentenceID) + (1|ChildID), HomographData, family="binomial")
  Model3 <- glmer(InitialCorrect ~ X + AgeGroup + (1|SentenceID), HomographData, family="binomial")

  Fixed.ANOVA <- anova(Model1, Model2)
  ANOVAs.DF$Chisq_Fixed[i] <- Fixed.ANOVA$Chisq[2] # second row of ANOVA output
  ANOVAs.DF$p_Fixed[i] <- Fixed.ANOVA$'Pr(>Chisq)'[2]
  
  Random.ANOVA <- anova(Model1, Model3)
  ANOVAs.DF$Chisq_Random[i] <- Random.ANOVA$Chisq[2]
  ANOVAs.DF$p_Random[i] <- Random.ANOVA$'Pr(>Chisq)'[2]
}
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge with max|grad| = 0.0384647 (tol =
## 0.001, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge with max|grad| = 0.0386013 (tol =
## 0.001, component 1)
kable(ANOVAs.DF, digits=3)  
| Predictor | Chisq_Fixed | p_Fixed | Chisq_Random | p_Random |
|-----------|------------:|--------:|-------------:|---------:|
| CARS | 3.789 | 0.052 | 6.840 | 0.009 |
| WordReading | 4.271 | 0.039 | 9.761 | 0.002 |
| WordAssoc | 3.942 | 0.047 | 7.796 | 0.005 |
| PictureNaming | 10.180 | 0.001 | 2.269 | 0.132 |

6.3 Pairwise correlations

Pairwise correlations again showed that homograph reading was most strongly associated with Picture Naming.

SubjectData <- ddply(HomographData, .(ChildID, Age, CARS, WordReading, WordAssoc, PictureNaming), summarise, CorrectResponse=sum(InitialCorrect))

CorrDF <- select(SubjectData, -ChildID)
CorrDF <- CorrDF %>% rename(Hom = CorrectResponse, WR = WordReading, WA = WordAssoc, PN = PictureNaming)
corrplot.mixed(cor(CorrDF))

6.4 Linear regression

We also performed a conventional linear regression on the logit-transformed proportion of correct responses on the homograph reading task. We considered whether the predictor, X, accounted for significant variation beyond that accounted for by age. Consistent with the logistic regression analyses, this was only true of Picture Naming.
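The logit transform applied below is the standard one,

$$\operatorname{logit}(p) = \ln\!\frac{p}{1-p},$$

which is finite here because no child scored at ceiling (the maximum was 25 of 26 correct; see the error analysis above).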

SubjectData$ProportionCorrect <- SubjectData$CorrectResponse / 26 # 26 homograph sentences per child
SubjectData$LogitCorrect <- log(SubjectData$ProportionCorrect / (1 - SubjectData$ProportionCorrect))

Regression.DF <- data.frame(Predictor=c("CARS", "WordReading", "WordAssoc", "PictureNaming"),
                       F=numeric(4),
                       p=numeric(4),
                       stringsAsFactors=FALSE)

for (i in 1:4) {
  Predictor <- Regression.DF$Predictor[i]
  SubjectData$X <- SubjectData[[Predictor]]
  
  ModelA.lm <- lm(LogitCorrect ~ X + Age, data=SubjectData)
  ModelB.lm <- lm(LogitCorrect ~ Age, data=SubjectData)

  Fixed.ANOVA <- anova(ModelA.lm, ModelB.lm)
  Regression.DF$F[i] <- Fixed.ANOVA$F[2] # second row of ANOVA output
  Regression.DF$p[i] <- Fixed.ANOVA$'Pr(>F)'[2]
}

kable(Regression.DF, digits=3) 
| Predictor | F | p |
|-----------|--:|--:|
| CARS | 2.160 | 0.162 |
| WordReading | 2.752 | 0.118 |
| WordAssoc | 3.156 | 0.096 |
| PictureNaming | 8.581 | 0.010 |