I’m trying out this newly acquired technique–Linear Mixed Effects Modeling on my Master’s Thesis data. Previously I used an Anlysis of Variance (ANOVA), ANOVA compares aggregated condition means to determine significance, we may look information during this aggregation. LME on the other hand takes into consideration all entries of data, and fixed effects are determined from entire dataset, where random effects are determined from groupings. Regular multiple regression model cannot be applied to linguistic data due to non-independence occurring at the lowest level, but LME takes care of these non-independence through creating grouping variables at a higher level, and looks at how much variance in outcome variable can be explained by variation between groups, this is called random effects.

This experiement intends to examine Reaction Time difference between in lexical decision task conditions where the word to respond to is followed by a verbal prompt that is a common idiom, but with the last word removed. The experiemental condition word is the idiomatic ending, the control condition word is a matched control word to the idiomatic ending word. We had English idioms and Mandarin Chinese idioms which are translated word-for-word to English, the entire experiement was in English. Both Mandarin-English bilinguals and Native English speakers participated in the study. We hypothesized that Native English speakers would respond faster to English idiomatic endings than control endings, but show no difference to Mandarin Chinese idiom conditions. And it’s exploratory when it comes to Bilingual’s performance on translated Chinese idioms, showing RT difference or not would both be interesting for the current investigation.

Data Preprocessing

library(dplyr)
library(ggplot2)
library(Hmisc)
library(lme4)
library(afex)
library(knitr)
library(data.table)
#install.packages('zoo')
library(zoo) #replace missing with col mean

Convert categorical variables numeric coding to factors

df<-read.csv('revised_alldata.csv')

cols<-c('participant','prime','end','gender','idiom_lang','nat_LAN','phrase_condition','list')
df[cols]<-lapply(df[cols],factor)

#replacing missing numerical values with col mean
cols1<-c('EN_rating','CN_rating','freq_HAL','age')
df[cols1] <- sapply(na.aggregate(df[cols1]),as.numeric)

head(df)

Visualize mean and distribution of RT using Boxplot

1 is English Monolingual, 2 is Mandarin-English Bilingual

ggplot(df, aes(idiom_lang, exp_resp.rt, color = phrase_condition))+
  geom_boxplot()+
  ylab('Reaction time')+
  ylim(0,2000)+
  facet_wrap(~nat_LAN)

Both Bilingual and Monolinguals RT are heavily right skewed

Density distribution of RT after Standardization

#be EXTRA cautious when using 'levels' without specifying name
levels(df$nat_LAN)[levels(df$nat_LAN)=="2"] <- 'Bilinguals'
levels(df$nat_LAN)[levels(df$nat_LAN)=="1"] <- 'English Monolinguals'

df2 <- df %>% 
            group_by(nat_LAN) %>% 
            mutate(exp_resp.rt = scale(exp_resp.rt))
  
p <- ggplot(df2, aes(x = exp_resp.rt, colour = phrase_condition)) +
            xlab("Reaction Time") + 
            geom_density() + 
            facet_wrap(~nat_LAN)

p

Another representation of reaction time distribution in standard unit

Outlier Removal

Trim RT that are 2 sd away from the Native Language Group Mean

df3 <- df %>% 
            group_by(nat_LAN) %>% 
            filter(!(exp_resp.rt - median(exp_resp.rt)) > abs(2*sd(exp_resp.rt)))

#Get count of n's in the original compared to trimmed dataframe
1-nrow(df3)/nrow(df)
## [1] 0.0691818

about 7% of data was removed as outliers

Overview of grouped mean using Trimmed data

grouped_mean<-df3 %>% 
  group_by(nat_LAN, phrase_condition,idiom_lang) %>% 
  summarise(mean(exp_resp.rt),n=n())

kable(grouped_mean)
nat_LAN phrase_condition idiom_lang mean(exp_resp.rt) n
English Monolinguals Control CN 638.0277 1289
English Monolinguals Control EN 647.0982 1204
English Monolinguals Experimental CN 610.0051 1349
English Monolinguals Experimental EN 580.8312 1363
Bilinguals Control CN 727.0775 1149
Bilinguals Control EN 731.8735 1153
Bilinguals Experimental CN 697.6380 1218
Bilinguals Experimental EN 701.7999 1218

Visualization of Trimmed data

ggplot(df3, aes(idiom_lang, exp_resp.rt, color = phrase_condition))+
  geom_boxplot()+
  ylab('Reaction time')+
  ylim(0,2000)+
  facet_wrap(~nat_LAN)

Dummy code Binary categorical variables

#preserve df3 as the trimmed dataset with original variable levels
df4<-df3

levels(df4$nat_LAN)<-c(0,1)
colnames(df4)[colnames(df4)=="nat_LAN"] <- "IsBilingual"

levels(df4$idiom_lang)<-c(1,0)
colnames(df4)[colnames(df4)=="idiom_lang"] <- "IsChineseIdiom"

levels(df4$phrase_condition)<-c(0,1)
colnames(df4)[colnames(df4)=="phrase_condition"] <- "IsExperimental"

#can only dummy code factor variables
levels(df4$list)<-c(1,0)
colnames(df4)[colnames(df4)=="list"] <- "IsList1"

First using traditional ANOVA approach

Subject Analysis

m_aov1<-aov_ez('participant','exp_resp.rt',df4 ,between=('IsBilingual'),
            within=c('IsChineseIdiom','IsExperimental'))
## Warning: More than one observation per cell, aggregating the data using
## mean (i.e, fun_aggregate = mean)!
## Contrasts set to contr.sum for the following variables: IsBilingual
summary(m_aov1)
## 
## Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
## 
##                                              Sum Sq num Df Error SS den Df
## (Intercept)                               221144173      1  2724937    120
## IsBilingual                                 1122171      1  2724937    120
## IsChineseIdiom                                  913      1   117900    120
## IsBilingual:IsChineseIdiom                     9145      1   117900    120
## IsExperimental                               182414      1   147671    120
## IsBilingual:IsExperimental                     8995      1   147671    120
## IsChineseIdiom:IsExperimental                  9870      1   209531    120
## IsBilingual:IsChineseIdiom:IsExperimental     10585      1   209531    120
##                                             F value    Pr(>F)    
## (Intercept)                               9738.6841 < 2.2e-16 ***
## IsBilingual                                 49.4178 1.362e-10 ***
## IsChineseIdiom                               0.9295  0.336944    
## IsBilingual:IsChineseIdiom                   9.3083  0.002809 ** 
## IsExperimental                             148.2327 < 2.2e-16 ***
## IsBilingual:IsExperimental                   7.3092  0.007857 ** 
## IsChineseIdiom:IsExperimental                5.6527  0.019008 *  
## IsBilingual:IsChineseIdiom:IsExperimental    6.0619  0.015234 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Three-way interaction is significant at .05 level when aggregared across items.

Item Analysis

df6<-df4[!df4$end=="gold",]
df6<-df6[!df6$end=="water",]

m_aov2<-aov_ez('end','exp_resp.rt',df6,
               between=c('IsChineseIdiom','IsExperimental'),
               within=('IsBilingual'))
## Warning: More than one observation per cell, aggregating the data using
## mean (i.e, fun_aggregate = mean)!
## Contrasts set to contr.sum for the following variables: IsChineseIdiom, IsExperimental
summary(m_aov2)
## 
## Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
## 
##                                              Sum Sq num Df Error SS den Df
## (Intercept)                               167835774      1   951559    181
## IsChineseIdiom                                  317      1   951559    181
## IsExperimental                               128142      1   951559    181
## IsChineseIdiom:IsExperimental                  9218      1   951559    181
## IsBilingual                                  967676      1   278861    181
## IsChineseIdiom:IsBilingual                     6878      1   278861    181
## IsExperimental:IsBilingual                    11889      1   278861    181
## IsChineseIdiom:IsExperimental:IsBilingual      9866      1   278861    181
##                                              F value    Pr(>F)    
## (Intercept)                               31924.7448 < 2.2e-16 ***
## IsChineseIdiom                                0.0603  0.806262    
## IsExperimental                               24.3744 1.793e-06 ***
## IsChineseIdiom:IsExperimental                 1.7533  0.187130    
## IsBilingual                                 628.0879 < 2.2e-16 ***
## IsChineseIdiom:IsBilingual                    4.4644  0.035978 *  
## IsExperimental:IsBilingual                    7.7170  0.006047 ** 
## IsChineseIdiom:IsExperimental:IsBilingual     6.4039  0.012239 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Significant 3-way interaction when aggregated across subjects.

LME analysis

Now let’s go ahead and confirm ANOVA results using LME analysis, and combine subject analysis and item analysis into one model. LME also gives us regression coefficients on how each predictor affects the dependent variable, and by doing likelihood ratio tests, we will know whether specific parameter in the model is significant or not. LME is also more flexible in terms of being able to incorporate predictors at various group level, all defined in the same model.

Grand mean centering

Unconditional model-aggregate by Subject

lmer(exp_resp.rt~1+(1|participant), df4, REML=FALSE) 
## Linear mixed model fit by maximum likelihood  ['lmerModLmerTest']
## Formula: exp_resp.rt ~ 1 + (1 | participant)
##    Data: df4
##       AIC       BIC    logLik  deviance  df.resid 
## 126032.97 126054.58 -63013.48 126026.97      9940 
## Random effects:
##  Groups      Name        Std.Dev.
##  participant (Intercept)  86.7   
##  Residual                133.9   
## Number of obs: 9943, groups:  participant, 122
## Fixed Effects:
## (Intercept)  
##       670.7

Intra-Class Correlation (ICC) for between subjects

86.7/(86.7+133.9)
## [1] 0.393019

About 39% of the overall variances in RT can be attributed to the variances between subjects. Higher ICC indicate more similarity within cluster.

Unconditional model-aggregate by Prime

lmer(exp_resp.rt~1+(1|end), df4, REML=FALSE)
## Linear mixed model fit by maximum likelihood  ['lmerModLmerTest']
## Formula: exp_resp.rt ~ 1 + (1 | end)
##    Data: df4
##       AIC       BIC    logLik  deviance  df.resid 
## 128320.12 128341.74 -64157.06 128314.12      9940 
## Random effects:
##  Groups   Name        Std.Dev.
##  end      (Intercept)  44.53  
##  Residual             151.03  
## Number of obs: 9943, groups:  end, 187
## Fixed Effects:
## (Intercept)  
##       668.8

Intra-Class Correlation (ICC) for between-idioms

44.53/(44.53+151.03)
## [1] 0.2277051

About 22% of the overall variances in RT can be attributed to the variances between idioms.

All Subjects

#Experimental condition in my design is analogous to 'list' in split-plot design
m_full <-lmer(exp_resp.rt~IsExperimental*IsChineseIdiom*IsBilingual+
          (1|participant) + 
          (1|end), df4, REML=FALSE)

m_reduced<-lmer(exp_resp.rt~IsBilingual+IsExperimental+IsChineseIdiom+
          (1|participant) + 
          (1|end), df4, REML=FALSE)

#Likelihood ratio test
anova(m_full,m_reduced)

Full model is significantly better than the reduced model, therefore the three-way interaction between Native language, Idiom language, Experimental condition explains a significant proportion of variances in RT.

To get regression coefficients

summary(m_full)
## Linear mixed model fit by maximum likelihood . t-tests use
##   Satterthwaite's method [lmerModLmerTest]
## Formula: exp_resp.rt ~ IsExperimental * IsChineseIdiom * IsBilingual +  
##     (1 | participant) + (1 | end)
##    Data: df4
## 
##      AIC      BIC   logLik deviance df.resid 
## 124943.3 125022.6 -62460.7 124921.3     9932 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.6327 -0.6799 -0.1472  0.5324  4.4867 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  end         (Intercept)  2177     46.66  
##  participant (Intercept)  5588     74.76  
##  Residual                15438    124.25  
## Number of obs: 9943, groups:  end, 187; participant, 122
## 
## Fixed effects:
##                                              Estimate Std. Error       df
## (Intercept)                                   648.266     11.959  245.966
## IsExperimental1                               -26.842      9.900  345.096
## IsChineseIdiom0                                12.366     10.621  241.725
## IsBilingual1                                   91.556     14.468  144.040
## IsExperimental1:IsChineseIdiom0               -45.178     14.257  309.955
## IsExperimental1:IsBilingual1                   -1.956      7.067 9646.993
## IsChineseIdiom0:IsBilingual1                   -6.100      7.219 9652.117
## IsExperimental1:IsChineseIdiom0:IsBilingual1   42.910     10.035 9651.268
##                                              t value Pr(>|t|)    
## (Intercept)                                   54.207  < 2e-16 ***
## IsExperimental1                               -2.711  0.00704 ** 
## IsChineseIdiom0                                1.164  0.24547    
## IsBilingual1                                   6.328 2.95e-09 ***
## IsExperimental1:IsChineseIdiom0               -3.169  0.00168 ** 
## IsExperimental1:IsBilingual1                  -0.277  0.78198    
## IsChineseIdiom0:IsBilingual1                  -0.845  0.39816    
## IsExperimental1:IsChineseIdiom0:IsBilingual1   4.276 1.92e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) IsExp1 IsChI0 IsBln1 IsE1:ICI0 IE1:IB ICI0:I
## IsExprmntl1 -0.424                                             
## IsChinsIdm0 -0.426  0.477                                      
## IsBilingul1 -0.583  0.084  0.079                               
## IsExp1:ICI0  0.294 -0.667 -0.719 -0.058                        
## IsExpr1:IB1  0.143 -0.336 -0.160 -0.251  0.234                 
## IsChnI0:IB1  0.140 -0.168 -0.326 -0.246  0.243     0.503       
## IE1:ICI0:IB -0.100  0.237  0.235  0.177 -0.335    -0.704 -0.720

Random effects section, SD tells us how much variability in RT is due to subjects and item (or ending in our case). You can see that subject has much more variability than item. Residual is the remaining unexplained error variance, there are the variances outside the control of my experiment.

When it comes to interpreting fixed effects, the intercept of fixed effects represents the mean RT of Chinese idiom with control ending and being responded by an English speaker, we can see through regression coefficients how each categorical parameter affects RT if it was the other value.

Regression coefficient tells us the slope for the categorical effects of our predictors. For example, isexperimental1 with estimate of –26.842 tells us, the RT would reduce by –26.842 if the word participants respond to is idiomatic rather than control.

LME analysis of Monolingual English speakers

#full model 
m_en_f <-lmer(exp_resp.rt~IsExperimental*IsChineseIdiom+
          (1|participant) + 
          (1|end), df4[df4$IsBilingual=='0',], REML=FALSE)

#reduced model
m_en_r <-lmer(exp_resp.rt~IsExperimental+IsChineseIdiom+
          (1|participant) + 
          (1|end), df4[df4$IsBilingual=='0',], REML=FALSE)

anova(m_en_f,m_en_r)

Interaction between experimental condition and idiom language was significant.

#add level-2 word frequency
m_en_f1 <-lmer(exp_resp.rt~IsExperimental*IsChineseIdiom+scale(freq_HAL)+
          (1|participant) + 
          (1|end), df4[df4$IsBilingual=='0',], REML=FALSE)

anova(m_en_f, m_en_f1)

m_en_f1 is a better model.

summary(m_en_f1)
## Linear mixed model fit by maximum likelihood . t-tests use
##   Satterthwaite's method [lmerModLmerTest]
## Formula: 
## exp_resp.rt ~ IsExperimental * IsChineseIdiom + scale(freq_HAL) +  
##     (1 | participant) + (1 | end)
##    Data: df4[df4$IsBilingual == "0", ]
## 
##      AIC      BIC   logLik deviance df.resid 
##  64062.6  64115.0 -32023.3  64046.6     5197 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9718 -0.6659 -0.1221  0.5416  3.9226 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  end         (Intercept)  1537     39.21  
##  participant (Intercept)  4765     69.03  
##  Residual                11733    108.32  
## Number of obs: 5205, groups:  end, 187; participant, 63
## 
## Fixed effects:
##                                 Estimate Std. Error      df t value
## (Intercept)                      648.482     10.757 121.538  60.287
## IsExperimental1                  -27.878      8.705 223.052  -3.202
## IsChineseIdiom0                    9.971      9.078 187.488   1.098
## scale(freq_HAL)                   -7.234      3.177 177.071  -2.277
## IsExperimental1:IsChineseIdiom0  -43.266     12.444 212.557  -3.477
##                                 Pr(>|t|)    
## (Intercept)                      < 2e-16 ***
## IsExperimental1                 0.001562 ** 
## IsChineseIdiom0                 0.273463    
## scale(freq_HAL)                 0.024002 *  
## IsExperimental1:IsChineseIdiom0 0.000615 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) IsExp1 IsChI0 s(_HAL
## IsExprmntl1 -0.407                     
## IsChinsIdm0 -0.409  0.482              
## scl(fr_HAL) -0.001 -0.010  0.014       
## IsExp1:ICI0  0.285 -0.683 -0.713  0.003

For Monolingual English speakers:
Full model is significantly better than reduced model. Interaction between Experimental condition and Idiom language explains a significantly proportion of variances in RT.

LME analysis of Bilingual Mandarin-English speakers

#full model
m_ch_f <-lmer(exp_resp.rt~IsExperimental*IsChineseIdiom+
          (1|participant) + 
          (1|end), df4[df4$IsBilingual=='1',], REML=FALSE)

#reduced model
m_ch_r <-lmer(exp_resp.rt~IsExperimental+IsChineseIdiom+ 
          (1|participant) + 
          (1|end), df4[df4$IsBilingual=='1',], REML=FALSE)

#likelihood ratio test
anova(m_ch_f, m_ch_r) 

Interaction between experimental condition and idiom language was not significant through Likelihood ratio test of full vs. reduced model.

#Does word-level frequency explains a significant proportion of RT variances?
m_ch_r1 <-lmer(exp_resp.rt~IsExperimental+IsChineseIdiom+scale(freq_HAL)+
          (1|participant) + 
          (1|end), df4[df4$IsBilingual=='1',], REML=FALSE)

anova(m_ch_r, m_ch_r1)

Word frequency was a significant predictor of RT.

summary(m_ch_r1)
## Linear mixed model fit by maximum likelihood . t-tests use
##   Satterthwaite's method [lmerModLmerTest]
## Formula: 
## exp_resp.rt ~ IsExperimental + IsChineseIdiom + scale(freq_HAL) +  
##     (1 | participant) + (1 | end)
##    Data: df4[df4$IsBilingual == "1", ]
## 
##      AIC      BIC   logLik deviance df.resid 
##  60593.2  60638.5 -30289.6  60579.2     4731 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9332 -0.6810 -0.1744  0.5286  4.1245 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  end         (Intercept)  3297     57.42  
##  participant (Intercept)  6456     80.35  
##  Residual                18791    137.08  
## Number of obs: 4738, groups:  end, 187; participant, 59
## 
## Fixed effects:
##                 Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)      740.322     13.136 122.238  56.360  < 2e-16 ***
## IsExperimental1  -29.350      9.108 191.393  -3.222  0.00149 ** 
## IsChineseIdiom0    5.546      9.101 191.888   0.609  0.54300    
## scale(freq_HAL)  -17.998      4.763 164.425  -3.778  0.00022 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) IsExp1 IsChI0
## IsExprmntl1 -0.341              
## IsChinsIdm0 -0.349 -0.003       
## scl(fr_HAL)  0.008 -0.016  0.026

For Bilingual speakers:
Interaction between experimental condition and idiom language did not significantly explain the variances in RT.

References:
http://www.sfs.uni-tuebingen.de/~hbaayen/publications/baayenDavidsonBates.pdf