DEM 7473 Homework 2

We are making a random-intercept model with the same data set from the previous homework.

To help improve the quality of the model, we are first going to test our parameters for collinearity. It’s possible that having several parameters that were correlated with each other (e.g. multiple measures of math fluency) split up the explained variance too much, resulting in false negatives in the t-tests in the last homework.

library(car)

## Loading required package: carData

library(knitr)
load("D:/Behavioral_Data/N400.Rdata")
N400$RT<-N400$RT/1000 #convert RTs from ms to seconds
linfit<-lm(RT~Syllables+Age+LQ+PicVoc_SS+OralComp_SS+NumRev_SS+IncWord_SS+Add_SS+Sub_SS+Mult_SS+female+digit_first+rightfinger+N400ListA+correct_trial+accurate_resp,data=N400)
inflate1<-vif(linfit)
kable(inflate1)

	x
Syllables	1.000158
Age	4.147259
LQ	11.388999
PicVoc_SS	19.196909
OralComp_SS	4.308954
NumRev_SS	10.644917
IncWord_SS	2.440158
Add_SS	17.159958
Sub_SS	10.077422
Mult_SS	44.937911
female	17.741999
digit_first	4.879995
rightfinger	6.999612
N400ListA	5.117086
correct_trial	1.004277
accurate_resp	1.008896

I’m looking for any parameter with a variance inflation factor (VIF) greater than 10. According to the rule of thumb, a VIF greater than 10 indicates that the parameter is too collinear with the other parameters.
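To make the inflation factor concrete, here is a minimal sketch of how it can be computed by hand for a single predictor: regress that predictor on all the others and take 1 / (1 - R^2). The data are simulated purely for illustration, and the names x1, x2, x3, vif_x1 and vif_x3 are hypothetical and not part of the N400 data set.

```r
# Simulated predictors: x2 is nearly a copy of x1, x3 is unrelated to both
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)
x3 <- rnorm(n)

# VIF for x1: regress x1 on the remaining predictors, then 1 / (1 - R^2)
r2_x1  <- summary(lm(x1 ~ x2 + x3))$r.squared
vif_x1 <- 1 / (1 - r2_x1)

# VIF for x3: the same recipe applied to the unrelated predictor
r2_x3  <- summary(lm(x3 ~ x1 + x2))$r.squared
vif_x3 <- 1 / (1 - r2_x3)

# The collinear predictor should be well above 10, the unrelated one near 1
print(c(x1 = vif_x1, x3 = vif_x3))
```

For models with only single-coefficient terms, this is the same quantity that vif() from the car package reports.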

The following factors will remain in the second linear model used to test inflation. Sub_SS remains even though it exceeded the inflation factor threshold of 10, because the other two math-proficiency parameters (Add_SS and Mult_SS) exceeded the threshold by far more, and keeping Sub_SS on its own might account for variance independently.

Syllables
Age
OralComp_SS
IncWord_SS
Sub_SS
digit_first
rightfinger
N400ListA
correct_trial
accurate_resp

linfit2<-lm(RT~Syllables+Age+OralComp_SS+IncWord_SS+Sub_SS+digit_first+rightfinger+N400ListA+correct_trial+accurate_resp,data=N400)
inflate2<-vif(linfit2)
kable(inflate2)

	x
Syllables	1.000158
Age	1.332450
OralComp_SS	1.299128
IncWord_SS	1.789697
Sub_SS	1.858291
digit_first	1.331525
rightfinger	1.650004
N400ListA	1.806680
correct_trial	1.004273
accurate_resp	1.007859

Indeed, when running the model again, the inflation factors drop dramatically (including Sub_SS). The random-intercepts and -slope models will use only these parameters.

Below are the steps to estimate the random-intercept model. With this model, we can ask the question:

“Do people answer more slowly (higher RT) when the trial condition is incorrect, regardless of other trial or demographic factors?”

library(lme4)

## Loading required package: Matrix

fit1<-lmer(RT~Syllables+Age+OralComp_SS+IncWord_SS+Sub_SS+digit_first+rightfinger+N400ListA+correct_trial+accurate_resp+(1|Subject),data=N400, subset=complete.cases(N400))
summary(fit1)

## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## RT ~ Syllables + Age + OralComp_SS + IncWord_SS + Sub_SS + digit_first +  
##     rightfinger + N400ListA + correct_trial + accurate_resp +  
##     (1 | Subject)
##    Data: N400
##  Subset: complete.cases(N400)
## 
## REML criterion at convergence: -81
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2301 -0.4437 -0.1233  0.2569 11.0519 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Subject  (Intercept) 0.008345 0.09135 
##  Residual             0.050609 0.22496 
## Number of obs: 1280, groups:  Subject, 16
## 
## Fixed effects:
##                 Estimate Std. Error t value
## (Intercept)    8.708e-01  4.593e-01   1.896
## Syllables     -1.237e-02  9.691e-03  -1.276
## Age            4.958e-03  1.523e-02   0.325
## OralComp_SS   -2.420e-03  3.037e-03  -0.797
## IncWord_SS     6.249e-05  2.769e-03   0.023
## Sub_SS         1.491e-03  2.568e-03   0.581
## digit_first   -2.344e-01  5.897e-02  -3.975
## rightfinger   -1.010e-01  6.281e-02  -1.609
## N400ListA      3.725e-02  6.413e-02   0.581
## correct_trial -1.719e-02  1.260e-02  -1.364
## accurate_resp -4.207e-02  3.507e-02  -1.199
## 
## Correlation of Fixed Effects:
##             (Intr) Syllbl Age    OrC_SS InW_SS Sub_SS dgt_fr rghtfn N400LA
## Syllables   -0.029                                                        
## Age         -0.603  0.000                                                 
## OralComp_SS -0.557  0.000  0.224                                          
## IncWord_SS  -0.216  0.000 -0.068 -0.386                                   
## Sub_SS      -0.063  0.000 -0.397 -0.060 -0.222                            
## digit_first  0.027  0.000  0.163 -0.137  0.114 -0.378                     
## rightfinger -0.234  0.000  0.257 -0.182  0.473 -0.384  0.333              
## N400ListA   -0.076  0.000 -0.137  0.191 -0.439  0.541 -0.362 -0.269       
## correct_trl -0.018 -0.003  0.000  0.000 -0.001  0.000  0.000 -0.001  0.001
## accurat_rsp -0.072 -0.012 -0.001  0.006 -0.013  0.005 -0.001 -0.010  0.011
##             crrct_
## Syllables         
## Age               
## OralComp_SS       
## IncWord_SS        
## Sub_SS            
## digit_first       
## rightfinger       
## N400ListA         
## correct_trl       
## accurat_rsp  0.065

In the model describing participants’ reaction times to judge a multiplication problem’s correctness, the Subject variance was 0.008 (in squared seconds), which is very small. The intraclass correlation coefficient (ICC) for Subject is 0.141, which means most of the variance (85.9%) is explained by factors other than between-subject differences.
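As a rough check, the ICC can be recomputed from the variance components printed in the summary above. This is a minimal sketch that uses the rounded values from the output, so the result may differ from the reported figure in the third decimal place. The variable names are illustrative.

```r
# Variance components copied from the random-effects table of fit1
var_subject  <- 0.008345  # between-subject (intercept) variance
var_residual <- 0.050609  # within-subject (residual) variance

# ICC: share of the total variance attributable to Subject
icc <- var_subject / (var_subject + var_residual)
print(icc)

# Share of the variance left at the residual (trial) level
print(1 - icc)
```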

The fixed effects have the following estimates and t-values:

Parameter	Estimate	t-value
Intercept	8.708e-01	1.896
Syllables	-1.237e-02	-1.276
Age	4.958e-03	0.325
OralComp_SS	-2.420e-03	-0.797
IncWord_SS	6.249e-05	0.023
Sub_SS	1.491e-03	0.581
digit_first	-2.344e-01	-3.975
rightfinger	-1.010e-01	-1.609
N400ListA	3.725e-02	0.581
correct_trial	-1.719e-02	-1.364
accurate_resp	-4.207e-02	-1.199

For a two-tailed Student’s t-test with 15 degrees of freedom, the critical t-value is +/- 2.13 at an alpha of 0.05. The only parameter that exceeded this threshold was digit_first, with a t-value of -3.975, suggesting that participants who did the math verification task first were significantly faster on this picture verification task, presumably because of practice.
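The critical value quoted above can be reproduced with base R’s qt(). This sketch assumes, as the text does, 15 degrees of freedom (16 subjects minus 1). lmer does not report degrees of freedom itself, so that choice is an approximation, not something read off the model output. The names alpha, df_approx and t_crit are illustrative.

```r
alpha     <- 0.05
df_approx <- 15  # assumption: number of subjects (16) minus 1

# Two-tailed critical value: upper alpha/2 quantile of the t distribution
t_crit <- qt(1 - alpha / 2, df = df_approx)
print(t_crit)

# Compare the digit_first t-value from the summary against the threshold
t_digit_first <- -3.975
print(abs(t_digit_first) > t_crit)
```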

Next, I will model the data using a mixed model with random intercepts and a random slope for the factor of digit_first, addressing the question:

Does the order of experiments affect reaction times equally between participants?

fit2<-lmer(RT~Syllables+Age+OralComp_SS+IncWord_SS+Sub_SS+digit_first+rightfinger+N400ListA+correct_trial+accurate_resp+(1+digit_first|Subject), data = N400)
summary(fit2)

## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## RT ~ Syllables + Age + OralComp_SS + IncWord_SS + Sub_SS + digit_first +  
##     rightfinger + N400ListA + correct_trial + accurate_resp +  
##     (1 + digit_first | Subject)
##    Data: N400
## 
## REML criterion at convergence: -84.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2400 -0.4314 -0.1296  0.2525 11.1051 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  Subject  (Intercept) 0.02063  0.1436        
##           digit_first 0.01898  0.1378   -0.95
##  Residual             0.05061  0.2250        
## Number of obs: 1280, groups:  Subject, 16
## 
## Fixed effects:
##                 Estimate Std. Error t value
## (Intercept)    1.0087971  0.3829967   2.634
## Syllables     -0.0123659  0.0096907  -1.276
## Age            0.0098558  0.0112179   0.879
## OralComp_SS   -0.0026388  0.0019437  -1.358
## IncWord_SS     0.0008689  0.0015478   0.561
## Sub_SS        -0.0018436  0.0016303  -1.131
## digit_first   -0.2010228  0.0693259  -2.900
## rightfinger   -0.0603858  0.0391323  -1.543
## N400ListA     -0.0114651  0.0411727  -0.278
## correct_trial -0.0171973  0.0126028  -1.365
## accurate_resp -0.0422475  0.0350737  -1.205
## 
## Correlation of Fixed Effects:
##             (Intr) Syllbl Age    OrC_SS InW_SS Sub_SS dgt_fr rghtfn N400LA
## Syllables   -0.035                                                        
## Age         -0.770  0.000                                                 
## OralComp_SS -0.689  0.000  0.503                                          
## IncWord_SS  -0.054  0.000 -0.111 -0.396                                   
## Sub_SS      -0.204  0.000 -0.187  0.057 -0.270                            
## digit_first -0.079  0.000  0.024 -0.085  0.066 -0.204                     
## rightfinger -0.283  0.000  0.293 -0.080  0.471 -0.312  0.137              
## N400ListA   -0.309  0.000  0.075  0.251 -0.330  0.571 -0.188 -0.034       
## correct_trl -0.022 -0.003  0.000  0.001 -0.001  0.001  0.000 -0.001  0.002
## accurat_rsp -0.089 -0.012 -0.002  0.009 -0.021  0.012 -0.002 -0.011  0.027
##             crrct_
## Syllables         
## Age               
## OralComp_SS       
## IncWord_SS        
## Sub_SS            
## digit_first       
## rightfinger       
## N400ListA         
## correct_trl       
## accurat_rsp  0.065

With this model, the first measurement that stood out is the very high correlation of -0.95 between the random intercepts and the random digit_first slopes.

Below, we can compare the Subject-level variance between the models with and without the random slope.

Subject	RI Only	RI + RS
Variance	0.008345	0.02063
ICC	0.141	0.289

digit_first
Variance	-	0.01898
ICC	-	-

Subject accounted for more variance in the random intercept (RI) + random slope (RS) model than in the RI-only model, and the ICC was correspondingly higher.

The following are the statistics for the fixed effects from both model types:

Parameter	Estimate (RI Only)	t-value (RI Only)	Estimate (RI + RS)	t-value (RI + RS)
(Intercept)	8.708e-01	1.896	1.009	2.634
Syllables	-1.237e-02	-1.276	-0.012	-1.276
Age	4.958e-03	0.325	0.010	0.879
OralComp_SS	-2.420e-03	-0.797	-0.003	-1.358
IncWord_SS	6.249e-05	0.023	0.0009	0.561
Sub_SS	1.491e-03	0.581	-0.002	-1.131
digit_first	-2.344e-01	-3.975	-0.201	-2.900
rightfinger	-1.010e-01	-1.609	-0.060	-1.543
N400ListA	3.725e-02	0.581	-0.011	-0.278
correct_trial	-1.719e-02	-1.364	-0.017	-1.365
accurate_resp	-4.207e-02	-1.199	-0.042	-1.205

In this new model, the fixed effects are less likely to explain variance in participants’ reaction times, even though each subject’s experiment order was modelled with a random slope. The factor digit_first lost some explanatory power, as seen in its reduced t-value in the RI + RS model (-3.975 -> -2.900). The ICC also increased in the RI + RS model (0.141 -> 0.289). I would conclude that the first RI model was a better fit to the data than the RI + RS model.
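One way to back a conclusion like this with a formal comparison is a likelihood-ratio test between the two nested models. The sketch below uses the sleepstudy data shipped with lme4 purely as a stand-in for the N400 data, and m_ri, m_rirs and cmp are hypothetical names. It is an illustration of the technique, not a result for this homework.

```r
library(lme4)

# Stand-in data: sleepstudy ships with lme4 (Reaction, Days, Subject)
m_ri   <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
m_rirs <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# refit = FALSE keeps the REML fits; that is reasonable here because the
# two models share the same fixed effects and differ only in random effects
cmp <- anova(m_ri, m_rirs, refit = FALSE)
print(cmp)
```

With the models in this homework, the analogous call would be anova(fit1, fit2, refit = FALSE), since fit1 and fit2 share the same fixed effects and the same 1280 observations.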

I leave you with a plot of my random-intercept and random-slope model for adult response times to a picture-verification task.

rancoefs<-ranef(fit2)$Subject #per-subject deviations from the fixed effects
nsub<-nrow(rancoefs)
plot(NULL, ylim=c(0,2),xlim=c(0,1),ylab="Reaction Time (s)", xlab="Task Order") #RT was converted to seconds above
title(main="Regression Lines for each participant from Random Slope and Intercept Model")
cols<-sample(rainbow(n=16),size=nsub,replace=TRUE)
for (i in 1:nsub){
  #subject-specific intercept and digit_first slope
  abline(a=fixef(fit2)["(Intercept)"]+rancoefs[i,"(Intercept)"],b=fixef(fit2)["digit_first"]+rancoefs[i,"digit_first"],col=cols[i],lwd=.5)
}
#average line: fixed intercept and fixed digit_first slope (selected by name, not position)
abline(a=fixef(fit2)["(Intercept)"],b=fixef(fit2)["digit_first"],col=1,lwd=3)
legend("topright",col=1,lwd=3,legend="Average Effect of Task Order")

Matthew Wood

September 23, 2018