A group of researchers at the University of Sheffield, self-named “Practitioners”, investigate how people develop skills and knowledge throughout their lifetime. They are interested in understanding skill acquisition and focus on collecting and analysing data that can reveal the underlying factors that influence this process. In a set of studies, Practitioners focused on the acquisition of language skills in toddlers. In particular, they investigated how language exposure impacts later linguistic skills, cognitive abilities, and academic achievement. In this knowledge assessment, you are part of the Practitioners team. The goal is to build several models that quantify and test the postulated theoretical assumptions. Previous studies in language acquisition have shown that language skills depend on the richness of the environment as well as a number of other factors. In this exam, we will focus on the question of how people learn language and whether this influences other outcomes, such as university enrollment.
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(ggplot2)
library(tidyr)
require('tidySEM')
require(lavaan)
library(statmod)
library(semPlot)
# Checking summary of dataframes
summary(LexicalDec)
ID item_id WordType MotherVocab
1 : 50 1 : 100 cognates :2500 Min. : 63.49
2 : 50 2 : 100 non-cognates:2500 1st Qu.: 88.28
3 : 50 3 : 100 Median :100.55
4 : 50 4 : 100 Mean :101.02
5 : 50 5 : 100 3rd Qu.:111.80
6 : 50 6 : 100 Max. :149.71
(Other):4700 (Other):4400
FatherVocab ReadingTime ToddlerVocab RT
Min. : 55.43 Min. : 48.24 Min. : 26.14 Min. : 324.7
1st Qu.: 88.65 1st Qu.: 87.44 1st Qu.: 53.11 1st Qu.:2396.7
Median : 99.82 Median :100.96 Median : 65.96 Median :2917.5
Mean :101.67 Mean :100.04 Mean : 65.60 Mean :2881.2
3rd Qu.:114.23 3rd Qu.:113.19 3rd Qu.: 76.42 3rd Qu.:3349.0
Max. :151.56 Max. :142.65 Max. :109.20 Max. :4587.6
summary(Lexicon)
ID SES MotherVocab FatherVocab
1 : 1 Low :163 Min. : 39.49 Min. : 36.71
2 : 1 High:187 1st Qu.: 86.56 1st Qu.: 86.40
3 : 1 Median : 99.92 Median : 97.79
4 : 1 Mean : 99.56 Mean : 98.35
5 : 1 3rd Qu.:113.23 3rd Qu.:111.59
6 : 1 Max. :149.71 Max. :190.50
(Other):344
ReadingTime ToddlerVocab Enrollment VerbalAbility
Min. : 40.06 Min. : 12.24 Min. :0.00 Min. : 2.80
1st Qu.: 83.25 1st Qu.: 52.63 1st Qu.:0.00 1st Qu.: 80.06
Median : 99.60 Median : 64.94 Median :0.00 Median :100.19
Mean : 97.73 Mean : 64.38 Mean :0.44 Mean :100.37
3rd Qu.:113.05 3rd Qu.: 75.90 3rd Qu.:1.00 3rd Qu.:119.54
Max. :148.35 Max. :112.19 Max. :1.00 Max. :197.73
ReadingComprehension Orthography Morphology
Min. : 48.73 Min. : 42.81 Min. : 46.33
1st Qu.: 77.04 1st Qu.: 84.58 1st Qu.: 85.27
Median : 87.05 Median : 99.92 Median : 99.97
Mean : 86.56 Mean : 99.33 Mean :100.03
3rd Qu.: 96.12 3rd Qu.:114.06 3rd Qu.:114.80
Max. :126.68 Max. :179.51 Max. :170.14
Phonetics Syntax Semantics Pragmatics
Min. : 42.03 Min. : 38.06 Min. : 50.87 Min. : 40.47
1st Qu.: 87.09 1st Qu.: 84.12 1st Qu.: 86.59 1st Qu.: 86.91
Median :100.06 Median : 99.60 Median :100.64 Median :100.80
Mean : 99.84 Mean : 98.91 Mean : 99.91 Mean :100.15
3rd Qu.:111.78 3rd Qu.:112.86 3rd Qu.:112.08 3rd Qu.:113.22
Max. :157.73 Max. :156.90 Max. :167.64 Max. :153.01
In the case of the first study, Practitioners would like to re-analyse previously collected data. This data set contains measures of vocabulary size (an approximation of how many English words each toddler knows), the socioeconomic status of the family in which each toddler is growing up (high or low), the father's and mother's vocabulary sizes, and the time parents spend reading to the toddlers. They formed several hypothetical assumptions:
Specify and estimate a linear model that tests for the research questions postulated by the “Practitioners” team.
model1 <- lm(ToddlerVocab ~ MotherVocab, data = Lexicon)
summary(model1)
Call:
lm(formula = ToddlerVocab ~ MotherVocab, data = Lexicon)
Residuals:
Min 1Q Median 3Q Max
-46.181 -9.943 -0.007 10.730 39.840
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.76300 4.05976 7.085 7.77e-12 ***
MotherVocab 0.35776 0.03996 8.952 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 15.1 on 348 degrees of freedom
Multiple R-squared: 0.1872, Adjusted R-squared: 0.1848
F-statistic: 80.14 on 1 and 348 DF, p-value: < 2.2e-16
model2 <- lm(ToddlerVocab ~ FatherVocab, data = Lexicon)
summary(model2)
Call:
lm(formula = ToddlerVocab ~ FatherVocab, data = Lexicon)
Residuals:
Min 1Q Median 3Q Max
-46.784 -12.026 0.079 11.532 49.977
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 52.88488 4.40094 12.017 < 2e-16 ***
FatherVocab 0.11688 0.04383 2.667 0.00802 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 16.58 on 348 degrees of freedom
Multiple R-squared: 0.02002, Adjusted R-squared: 0.01721
F-statistic: 7.111 on 1 and 348 DF, p-value: 0.008019
#First model
model3 <- lm(ToddlerVocab ~ MotherVocab*FatherVocab, data = Lexicon)
summary(model3)
Call:
lm(formula = ToddlerVocab ~ MotherVocab * FatherVocab, data = Lexicon)
Residuals:
Min 1Q Median 3Q Max
-43.881 -9.992 0.194 10.493 39.954
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.017364 18.723989 0.695 0.48738
MotherVocab 0.484281 0.185924 2.605 0.00959 **
FatherVocab 0.170537 0.191124 0.892 0.37286
MotherVocab:FatherVocab -0.001377 0.001859 -0.741 0.45921
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 15.12 on 346 degrees of freedom
Multiple R-squared: 0.1899, Adjusted R-squared: 0.1829
F-statistic: 27.04 on 3 and 346 DF, p-value: 9.871e-16
#Second model
modelvoc <- lm(ToddlerVocab ~ ReadingTime, data = Lexicon)
summary(modelvoc)
Call:
lm(formula = ToddlerVocab ~ ReadingTime, data = Lexicon)
Residuals:
Min 1Q Median 3Q Max
-54.76 -10.87 -0.28 10.28 53.79
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 44.23382 4.11002 10.762 < 2e-16 ***
ReadingTime 0.20615 0.04112 5.014 8.51e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 16.17 on 348 degrees of freedom
Multiple R-squared: 0.06737, Adjusted R-squared: 0.06469
F-statistic: 25.14 on 1 and 348 DF, p-value: 8.51e-07
#Third model
model_ses <- lm(ToddlerVocab ~ ReadingTime:SES, data= Lexicon)
summary(model_ses)
Call:
lm(formula = ToddlerVocab ~ ReadingTime:SES, data = Lexicon)
Residuals:
Min 1Q Median 3Q Max
-59.444 -10.152 0.214 10.867 57.182
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.89958 3.95364 11.104 < 2e-16 ***
ReadingTime:SESLow 0.16163 0.04040 4.001 7.71e-05 ***
ReadingTime:SESHigh 0.25166 0.04043 6.224 1.40e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 15.56 on 347 degrees of freedom
Multiple R-squared: 0.1397, Adjusted R-squared: 0.1347
F-statistic: 28.17 on 2 and 347 DF, p-value: 4.606e-12
# Checking model differences
anova(modelvoc,model_ses)
Analysis of Variance Table
Model 1: ToddlerVocab ~ ReadingTime
Model 2: ToddlerVocab ~ ReadingTime:SES
Res.Df RSS Df Sum of Sq F Pr(>F)
1 348 91042
2 347 83983 1 7058.7 29.165 1.236e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Checking model differences
anova(model3,modelvoc,model_ses)
Analysis of Variance Table
Model 1: ToddlerVocab ~ MotherVocab * FatherVocab
Model 2: ToddlerVocab ~ ReadingTime
Model 3: ToddlerVocab ~ ReadingTime:SES
Res.Df RSS Df Sum of Sq F Pr(>F)
1 346 79081
2 348 91042 -2 -11960.6 26.165 2.619e-11 ***
3 347 83983 1 7058.7 30.883 5.479e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Final model
model_ses_final <- lm(ToddlerVocab ~ MotherVocab + FatherVocab + ReadingTime*SES, data=Lexicon)
summary(model_ses_final)
Call:
lm(formula = ToddlerVocab ~ MotherVocab + FatherVocab + ReadingTime *
SES, data = Lexicon)
Residuals:
Min 1Q Median 3Q Max
-53.065 -9.140 -0.265 10.086 46.396
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.938659 6.429529 3.568 0.000411 ***
MotherVocab 0.319085 0.039544 8.069 1.19e-14 ***
FatherVocab 0.005765 0.039737 0.145 0.884726
ReadingTime 0.051234 0.054690 0.937 0.349511
SESHigh -9.428745 7.291130 -1.293 0.196816
ReadingTime:SESHigh 0.175596 0.072844 2.411 0.016452 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 14.26 on 344 degrees of freedom
Multiple R-squared: 0.283, Adjusted R-squared: 0.2726
F-statistic: 27.16 on 5 and 344 DF, p-value: < 2.2e-16
plot(model_ses_final)
A linear model was suitable here because the hypotheses suggested a linear relationship between the predictors and the response. I took an iterative approach, adding variables model by model and comparing significance, keeping the terms that explain a significant amount of variance.
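Because the anova() F-tests above are only strictly valid for nested models, an information-criterion comparison provides a complementary check; a minimal sketch, assuming the models above have been fitted:

# AIC comparison of the candidate models (lower values indicate a better fit/parsimony trade-off)
AIC(model1, model2, model3, modelvoc, model_ses, model_ses_final)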
Model1
model3 <- lm(ToddlerVocab ~ MotherVocab*FatherVocab, data = Lexicon)
I decided to include MotherVocab and FatherVocab because the hypothesis stated that toddlers with larger vocabularies will also have parents with stronger linguistic abilities. The main effects plus the interaction explain a significant amount of variance.
Model2
modelvoc <- lm(ToddlerVocab ~ ReadingTime, data = Lexicon)
As toddlers who spend more time being read to may have a larger vocabulary, I checked the main effect of ReadingTime.
Model 3
model_ses <- lm(ToddlerVocab ~ ReadingTime:SES, data= Lexicon)
As the hypothesis stated that the effect of reading time would depend on the SES level of the toddlers' families, I checked the interaction term. Here, the effect of reading time (ReadingTime) is expected to be moderated by socioeconomic status (SES), with toddlers in more affluent families benefiting more from reading time.
Final Model
model_ses_final <- lm(ToddlerVocab ~ MotherVocab + FatherVocab + ReadingTime*SES, data=Lexicon)
Hence, I decided to include the main effects of MotherVocab and FatherVocab, as well as the main and interaction effects of ReadingTime and SES, in line with the hypothesised relationships and the model comparisons above. The interaction effect lets me check whether the relationship between ReadingTime and ToddlerVocab changes with the level of SES, which in turn tests the hypothesis that this relationship is stronger for toddlers from high-SES families.
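To visualise the hypothesised moderation, the relationship between ReadingTime and ToddlerVocab can be plotted separately for each SES group; a minimal sketch, assuming Lexicon is loaded:

# Separate regression lines per SES group; diverging slopes indicate moderation
ggplot(Lexicon, aes(x = ReadingTime, y = ToddlerVocab, colour = SES)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = TRUE)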
The Practitioners team collected additional information for the next three research questions. The toddlers from the original study are now young adults, and Practitioners contacted them to collect additional information, such as measures of verbal ability (verbal intelligence), reading comprehension performance, and, finally, whether they enrolled at university. For the second research project, Practitioners would like to focus on testing predictions from two theoretical frameworks that argue for nature or nurture mechanistic pathways in learning verbal behaviour. The Verbal nurturing theory states that practice is the most important factor: reading time influences vocabulary size, which results in better reading comprehension outcomes (see Figure 1). The claims of the Verbal nurturing theory are illustrated by a direct pathway c (from Reading time to Reading comprehension) and indirect pathways a (from Reading time to Vocabulary size) and b (from Vocabulary size to Reading comprehension).
The Verbal nature theory assumes individual differences in people's verbal capabilities. It argues that verbal talent (Verbal ability) is the main factor that influences Vocabulary size and consequently results in better reading comprehension outcomes (see Figure 1). The claims of the Verbal nature theory are illustrated by a direct pathway e (from Verbal ability to Reading comprehension) and indirect pathways d (from Verbal ability to Vocabulary size) and b (from Vocabulary size to Reading comprehension). The Verbal nature theory also assumes that Verbal ability directly influences Reading time, but it does not expect any indirect effects via Reading time. This is illustrated as pathway k.
Practitioners would like to specify a path model that tests the pathways proposed by the two theories. To do so, you need to specify two indirect effects (indirect effect 1 for the Verbal nurturing theory and indirect effect 2 for the Verbal nature theory), as well as two total effects, that is, combinations of the relevant direct and indirect effects.
Specify and estimate a path model that tests the pathways postulated by the “Practitioners” team.
library(tidyverse)
library(ggplot2)
library(tidyr)
require('tidySEM')
require(lavaan)
library(statmod)
library(semPlot)
path_model1 ="
ToddlerVocab ~ a*ReadingTime
ReadingComprehension ~ b*ToddlerVocab
ReadingComprehension ~ c*ReadingTime
ToddlerVocab ~ d*VerbalAbility
ReadingComprehension ~ e*VerbalAbility
ReadingTime ~ k*VerbalAbility
ind1 := a*b
tot1 := c+(a*b)
ind2 := d*b
tot2 := e+(d*b)
"
fit_path_model= sem(path_model1, data = Lexicon)
summary(fit_path_model, fit.measures=T, standardized=T)
lavaan 0.6-8 ended normally after 54 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 9
Number of observations 350
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Model Test Baseline Model:
Test statistic 1888.649
Degrees of freedom 6
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.000
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -3514.250
Loglikelihood unrestricted model (H1) -3514.250
Akaike (AIC) 7046.500
Bayesian (BIC) 7081.221
Sample-size adjusted Bayesian (BIC) 7052.670
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.000
P-value RMSEA <= 0.05 NA
Standardized Root Mean Square Residual:
SRMR 0.000
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv
ToddlerVocab ~
ReadingTim (a) 0.209 0.041 5.092 0.000 0.209
ReadingComprehension ~
ToddlerVcb (b) 0.502 0.003 157.332 0.000 0.502
ReadingTim (c) 0.297 0.003 116.957 0.000 0.297
ToddlerVocab ~
VerblAblty (d) 0.028 0.029 0.948 0.343 0.028
ReadingComprehension ~
VerblAblty (e) 0.249 0.002 143.143 0.000 0.249
ReadingTime ~
VerblAblty (k) -0.054 0.038 -1.428 0.153 -0.054
Std.all
0.263
0.611
0.455
0.049
0.538
-0.076
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv
.ToddlerVocab 259.454 19.613 13.229 0.000 259.454
.ReadngCmprhnsn 0.926 0.070 13.229 0.000 0.926
.ReadingTime 439.607 33.231 13.229 0.000 439.607
Std.all
0.930
0.005
0.994
Defined Parameters:
Estimate Std.Err z-value P(>|z|) Std.lv
ind1 0.105 0.021 5.090 0.000 0.105
tot1 0.402 0.021 19.369 0.000 0.402
ind2 0.014 0.015 0.948 0.343 0.014
tot2 0.263 0.015 17.865 0.000 0.263
Std.all
0.161
0.616
0.030
0.568
#plotting path model
graph_sem(fit_path_model, variance_diameter=.2)
# Note: identical() compares the two summary objects (default vs. with fit measures), not the fitted models; it returns FALSE because the extra fit measures differ
identical(summary(fit_path_model),summary(fit_path_model, fit.measures=T, standardized=T))
lavaan 0.6-8 ended normally after 54 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 9
Number of observations 350
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|)
ToddlerVocab ~
ReadingTim (a) 0.209 0.041 5.092 0.000
ReadingComprehension ~
ToddlerVcb (b) 0.502 0.003 157.332 0.000
ReadingTim (c) 0.297 0.003 116.957 0.000
ToddlerVocab ~
VerblAblty (d) 0.028 0.029 0.948 0.343
ReadingComprehension ~
VerblAblty (e) 0.249 0.002 143.143 0.000
ReadingTime ~
VerblAblty (k) -0.054 0.038 -1.428 0.153
Variances:
Estimate Std.Err z-value P(>|z|)
.ToddlerVocab 259.454 19.613 13.229 0.000
.ReadngCmprhnsn 0.926 0.070 13.229 0.000
.ReadingTime 439.607 33.231 13.229 0.000
Defined Parameters:
Estimate Std.Err z-value P(>|z|)
ind1 0.105 0.021 5.090 0.000
tot1 0.402 0.021 19.369 0.000
ind2 0.014 0.015 0.948 0.343
tot2 0.263 0.015 17.865 0.000
[1] FALSE
# Making path model 2
path_model2 ="
ToddlerVocab ~ a*ReadingTime
ReadingComprehension ~ b*ToddlerVocab
ReadingComprehension ~ c*ReadingTime
ind1 := a*b
tot1 := c+(a*b)
"
fit_path_model2= sem(path_model2, data = Lexicon)
graph_sem(fit_path_model2, variance_diameter=.2)
# Checking summary of model 2
summary(fit_path_model2, fit.measures=T, standardized=T)
lavaan 0.6-8 ended normally after 16 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 5
Number of observations 350
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Model Test Baseline Model:
Test statistic 455.377
Degrees of freedom 3
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.000
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -2668.212
Loglikelihood unrestricted model (H1) -2668.212
Akaike (AIC) 5346.425
Bayesian (BIC) 5365.714
Sample-size adjusted Bayesian (BIC) 5349.853
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.000
P-value RMSEA <= 0.05 NA
Standardized Root Mean Square Residual:
SRMR 0.000
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv
ToddlerVocab ~
ReadingTim (a) 0.206 0.041 5.028 0.000 0.206
ReadingComprehension ~
ToddlerVcb (b) 0.526 0.025 21.355 0.000 0.526
ReadingTim (c) 0.266 0.020 13.601 0.000 0.266
Std.all
0.260
0.639
0.407
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv
.ToddlerVocab 260.120 19.663 13.229 0.000 260.120
.ReadngCmprhnsn 55.149 4.169 13.229 0.000 55.149
Std.all
0.933
0.292
Defined Parameters:
Estimate Std.Err z-value P(>|z|) Std.lv
ind1 0.108 0.022 4.895 0.000 0.108
tot1 0.374 0.029 13.063 0.000 0.374
Std.all
0.166
0.572
#install.packages("statmod")
library(statmod)
#install.packages("semPlot")
library(semPlot)
semPaths(fit_path_model2, 'model','est', edge.label.cex = 1.1)
# Creating path model 3
path_model3 ="
ToddlerVocab ~ a*ReadingTime
ReadingComprehension ~ b*ToddlerVocab
ReadingComprehension ~ c*ReadingTime
ToddlerVocab ~ 0*VerbalAbility
ReadingComprehension ~ e*VerbalAbility
ReadingTime ~ k*VerbalAbility
ind1 := a*b
tot1 := c+(a*b)
ind2 := 0*b
tot2 := e+(0*b)
"
fit_path_model3= sem(path_model3, data = Lexicon)
summary(fit_path_model3, fit.measures=T, standardized=T)
lavaan 0.6-8 ended normally after 48 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 8
Number of observations 350
Model Test User Model:
Test statistic 0.897
Degrees of freedom 1
P-value (Chi-square) 0.344
Model Test Baseline Model:
Test statistic 1888.649
Degrees of freedom 6
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.000
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -3514.698
Loglikelihood unrestricted model (H1) -3514.250
Akaike (AIC) 7045.397
Bayesian (BIC) 7076.260
Sample-size adjusted Bayesian (BIC) 7050.881
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.138
P-value RMSEA <= 0.05 0.525
Standardized Root Mean Square Residual:
SRMR 0.022
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv
ToddlerVocab ~
ReadingTim (a) 0.206 0.041 5.028 0.000 0.206
ReadingComprehension ~
ToddlerVcb (b) 0.502 0.003 157.533 0.000 0.502
ReadingTim (c) 0.297 0.003 117.080 0.000 0.297
ToddlerVocab ~
VerblAblty 0.000 0.000
ReadingComprehension ~
VerblAblty (e) 0.249 0.002 143.327 0.000 0.249
ReadingTime ~
VerblAblty (k) -0.054 0.038 -1.428 0.153 -0.054
Std.all
0.260
0.621
0.462
0.000
0.547
-0.076
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv
.ToddlerVocab 260.120 19.663 13.229 0.000 260.120
.ReadngCmprhnsn 0.926 0.070 13.229 0.000 0.926
.ReadingTime 439.607 33.231 13.229 0.000 439.607
Std.all
0.933
0.005
0.994
Defined Parameters:
Estimate Std.Err z-value P(>|z|) Std.lv
ind1 0.104 0.021 5.026 0.000 0.104
tot1 0.401 0.021 19.328 0.000 0.401
ind2 0.000 0.000
tot2 0.249 0.002 143.327 0.000 0.249
Std.all
0.161
0.623
0.000
0.547
semPaths(fit_path_model3, 'model','est', edge.label.cex = 1.1)
# Comparing models
require(semTools)
diff<-compareFit(fit_path_model,fit_path_model3)
summary(diff)
################### Nested Model Comparison #########################
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fit_path_model 0 7046.5 7081.2 0.0000
fit_path_model3 1 7045.4 7076.3 0.8967 0.89668 1 0.3437
####################### Model Fit Indices ###########################
################## Differences in Fit Indices #######################
I used a structural equation (path) model because it can estimate several interconnected regression equations simultaneously and directly test the defined indirect and total effects.
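Because the sampling distribution of a product of coefficients such as a*b is generally non-normal, the delta-method z-tests reported above can be complemented with bootstrap confidence intervals; a minimal sketch, assuming path_model1 as specified above:

# Refit with bootstrapped standard errors and request percentile CIs
# for the defined indirect and total effects
fit_boot <- sem(path_model1, data = Lexicon, se = "bootstrap", bootstrap = 1000)
parameterEstimates(fit_boot, boot.ci.type = "perc")

Percentile intervals for ind1 and ind2 are then read from the ci.lower and ci.upper columns.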
For the third research project, Practitioners would like to focus on the information behind university enrollment. For each participant, they recorded whether they enrolled at university (1 – yes, 0 – no). Practitioners would like to build a model that tests which variables are predictive of enrollment status. They aim to use the initially collected variables from when the participants were toddlers, such as the mothers', fathers' and toddlers' vocabulary sizes, as well as reading time and socioeconomic status. If this type of model predicts distal outcomes such as university enrollment well, this information can be used to encourage the behaviours that lead to more positive outcomes (enrolling at university).
Specify the statistical model that tests which factors are predictive of University enrollment.
# Removing FatherVocab outliers using the 1.5 * IQR rule
Q <- quantile(Lexicon$FatherVocab, probs = c(.25,.75), na.rm = FALSE)
iqr <- IQR(Lexicon$FatherVocab)
up <- Q[2]+1.5*iqr   # upper fence
down <- Q[1]-1.5*iqr # lower fence
Lexi_normal <- subset(Lexicon, Lexicon$FatherVocab > down & Lexicon$FatherVocab < up)
# Define full and null models and do step procedure
model.null = glm(Enrollment ~ 1,
data=Lexi_normal,
family = binomial(link="logit")
)
model.full = glm(Enrollment~SES + FatherVocab + MotherVocab + ToddlerVocab + ReadingTime + VerbalAbility + ReadingComprehension + Orthography + Morphology + Phonetics + Syntax + Pragmatics,
data=Lexi_normal,
family = binomial(link="logit")
)
step(model.null,
scope = list(upper=model.full),
direction="both",
test="Chisq",
data=Lexi_normal)
Start: AIC=474.9
Enrollment ~ 1
Df Deviance AIC LRT Pr(>Chi)
+ SES 1 395.51 399.51 77.391 < 2.2e-16 ***
+ ToddlerVocab 1 436.68 440.68 36.215 1.767e-09 ***
+ ReadingComprehension 1 442.03 446.03 30.873 2.755e-08 ***
+ ReadingTime 1 453.12 457.12 19.779 8.691e-06 ***
+ MotherVocab 1 467.48 471.48 5.414 0.01998 *
+ Pragmatics 1 469.90 473.90 2.996 0.08345 .
<none> 472.90 474.90
+ Syntax 1 471.10 475.10 1.796 0.18023
+ FatherVocab 1 471.38 475.38 1.517 0.21811
+ Morphology 1 471.46 475.46 1.437 0.23055
+ Orthography 1 472.25 476.25 0.643 0.42255
+ Phonetics 1 472.77 476.77 0.124 0.72424
+ VerbalAbility 1 472.82 476.82 0.077 0.78205
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Step: AIC=399.51
Enrollment ~ SES
Df Deviance AIC LRT Pr(>Chi)
+ ReadingTime 1 368.04 374.04 27.468 1.597e-07 ***
+ ReadingComprehension 1 376.07 382.07 19.434 1.042e-05 ***
+ ToddlerVocab 1 376.88 382.88 18.625 1.591e-05 ***
+ MotherVocab 1 391.56 397.56 3.948 0.04694 *
+ Pragmatics 1 393.50 399.50 2.010 0.15629
<none> 395.51 399.51
+ FatherVocab 1 393.56 399.56 1.945 0.16315
+ VerbalAbility 1 394.40 400.40 1.111 0.29193
+ Syntax 1 394.62 400.62 0.884 0.34697
+ Morphology 1 394.87 400.87 0.633 0.42610
+ Orthography 1 395.16 401.16 0.350 0.55407
+ Phonetics 1 395.51 401.51 0.000 0.98489
- SES 1 472.90 474.90 77.391 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Step: AIC=374.04
Enrollment ~ SES + ReadingTime
Df Deviance AIC LRT Pr(>Chi)
+ ToddlerVocab 1 358.89 366.89 9.152 0.002484 **
+ ReadingComprehension 1 365.35 373.35 2.684 0.101338
<none> 368.04 374.04
+ Pragmatics 1 366.79 374.79 1.250 0.263617
+ MotherVocab 1 366.82 374.82 1.222 0.269032
+ Syntax 1 367.11 375.11 0.925 0.336157
+ VerbalAbility 1 367.36 375.36 0.675 0.411443
+ Morphology 1 367.60 375.60 0.442 0.506382
+ Orthography 1 367.62 375.62 0.420 0.517016
+ FatherVocab 1 367.92 375.92 0.119 0.730168
+ Phonetics 1 368.03 376.03 0.009 0.925965
- ReadingTime 1 395.51 399.51 27.468 1.597e-07 ***
- SES 1 453.12 457.12 85.080 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Step: AIC=366.89
Enrollment ~ SES + ReadingTime + ToddlerVocab
Df Deviance AIC LRT Pr(>Chi)
<none> 358.89 366.89
+ VerbalAbility 1 358.09 368.09 0.794 0.373008
+ Syntax 1 358.15 368.15 0.734 0.391505
+ ReadingComprehension 1 358.17 368.17 0.715 0.397836
+ Pragmatics 1 358.27 368.27 0.618 0.431884
+ Morphology 1 358.41 368.41 0.480 0.488382
+ Orthography 1 358.64 368.64 0.250 0.616983
+ Phonetics 1 358.87 368.87 0.020 0.888581
+ MotherVocab 1 358.87 368.87 0.013 0.908738
+ FatherVocab 1 358.89 368.89 0.001 0.972086
- ToddlerVocab 1 368.04 374.04 9.152 0.002484 **
- ReadingTime 1 376.88 382.88 17.996 2.214e-05 ***
- SES 1 427.16 433.16 68.277 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Call: glm(formula = Enrollment ~ SES + ReadingTime + ToddlerVocab,
family = binomial(link = "logit"), data = Lexi_normal)
Coefficients:
(Intercept) SESHigh ReadingTime ToddlerVocab
-5.83233 2.12945 0.02784 0.02520
Degrees of Freedom: 344 Total (i.e. Null); 341 Residual
Null Deviance: 472.9
Residual Deviance: 358.9 AIC: 366.9
#Final model
model.final =glm(formula = Enrollment ~ SES + ReadingTime + ToddlerVocab,
family = binomial(link = "logit"), data = Lexi_normal)
summary(model.final)
Call:
glm(formula = Enrollment ~ SES + ReadingTime + ToddlerVocab,
family = binomial(link = "logit"), data = Lexi_normal)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.0935 -0.7605 -0.3821 0.8825 2.3950
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.832333 0.840261 -6.941 3.89e-12 ***
SESHigh 2.129449 0.279534 7.618 2.58e-14 ***
ReadingTime 0.027840 0.006848 4.065 4.80e-05 ***
ToddlerVocab 0.025196 0.008460 2.978 0.0029 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 472.90 on 344 degrees of freedom
Residual deviance: 358.89 on 341 degrees of freedom
AIC: 366.89
Number of Fisher Scoring iterations: 4
#Checking for correlation
df_glm <- Lexi_normal[,c("ToddlerVocab", "ReadingTime" )]
library("PerformanceAnalytics")
chart.Correlation(df_glm, histogram=TRUE, pch=19)
#Analysis of variance for individual terms
#install.packages("car")
library(car)
Anova(model.final, type="II", test="Wald")
Analysis of Deviance Table (Type II tests)
Response: Enrollment
Df Chisq Pr(>Chisq)
SES 1 58.0316 2.579e-14 ***
ReadingTime 1 16.5272 4.796e-05 ***
ToddlerVocab 1 8.8697 0.002899 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Pseudo-R-squared
#install.packages("rcompanion")
library(rcompanion)
nagelkerke(model.final)
$Models
Model: "glm, Enrollment ~ SES + ReadingTime + ToddlerVocab, binomial(link = \"logit\"), Lexi_normal"
Null: "glm, Enrollment ~ 1, binomial(link = \"logit\"), Lexi_normal"
$Pseudo.R.squared.for.model.vs.null
Pseudo.R.squared
McFadden 0.241091
Cox and Snell (ML) 0.281413
Nagelkerke (Cragg and Uhler) 0.377191
$Likelihood.ratio.test
Df.diff LogLik.diff Chisq p.value
-3 -57.006 114.01 1.5026e-24
$Number.of.observations
Model: 345
Null: 345
$Messages
[1] "Note: For models fit with REML, these statistics are based on refitting with ML"
$Warnings
[1] "None"
#odds ratio
#install.packages("broom")
library(broom)
tidy(model.final, exponentiate = T, conf.int = T)
The odds ratios were all > 1, so higher SES, ReadingTime, and ToddlerVocab increased the odds of enrollment. The confidence intervals for SES, ReadingTime, and ToddlerVocab do not include 1.0, so the effects are statistically significant.
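To translate the odds ratios into something more tangible, the fitted model can produce a predicted enrollment probability for a hypothetical participant; the covariate values below are illustrative, not taken from the data:

# Predicted probability of enrollment for a hypothetical high-SES participant
# (ReadingTime = 100 and ToddlerVocab = 65 are made-up illustrative values)
new_obs <- data.frame(SES = factor("High", levels = c("Low", "High")),
                      ReadingTime = 100, ToddlerVocab = 65)
predict(model.final, newdata = new_obs, type = "response")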
Lexi_normal$probability = predict(model.final, newdata = NULL, type = 'response')
library(psych) # pairs.panels() comes from the psych package
pairs.panels(Lexi_normal)
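# converting predicted probabilities to 0/1 class labels at a 0.5 threshold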
Lexi_normal$probability <- ifelse(Lexi_normal$probability>0.5,1,0)
# making a confusion matrix
cm<-table(col=Lexi_normal$Enrollment,Lexi_normal$probability)
print(cm)
col 0 1
0 150 44
1 45 106
sum(diag(cm))/sum(cm) # overall accuracy
[1] 0.742029
#install.packages("caret")
library(caret)
# checking precision n f1-score
confusionMatrix(cm, mode = "prec_recall")
Confusion Matrix and Statistics
col 0 1
0 150 44
1 45 106
Accuracy : 0.742
95% CI : (0.6925, 0.7874)
No Information Rate : 0.5652
P-Value [Acc > NIR] : 7.278e-12
Kappa : 0.4755
Mcnemar's Test P-Value : 1
Precision : 0.7732
Recall : 0.7692
F1 : 0.7712
Prevalence : 0.5652
Detection Rate : 0.4348
Detection Prevalence : 0.5623
Balanced Accuracy : 0.7379
'Positive' Class : 0
#confidence intervals
exp(confint(model.final))
Waiting for profiling to be done...
2.5 % 97.5 %
(Intercept) 0.0005252926 0.01427845
SESHigh 4.9306587127 14.79015580
ReadingTime 1.0148298671 1.04251953
ToddlerVocab 1.0088537972 1.04300408
I used a generalized linear model with a binomial family (logistic regression) because the dependent variable (Enrollment) is binary; no normality assumption is needed for the predictors. I used a stepwise procedure via the step function, which selects the model that minimises AIC, to arrive at this specification. As the retained continuous predictors are only weakly correlated (r = 0.26), the order of entry gives the same result.
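A more formal multicollinearity check than the pairwise correlation is the variance inflation factor; a minimal sketch using the car package loaded above:

# VIFs close to 1 indicate negligible collinearity among the retained predictors
vif(model.final)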
In their previous work, Practitioners developed a questionnaire to investigate levels of language expertise. The questionnaire collects six measures, each focusing on a specific level of language processing: orthography, phonetics, morphology, syntax, semantics, and pragmatics. When investigating the latent structure of this questionnaire, Practitioners proposed a two-factor solution. The first factor was described as Rule automatization, with orthography, phonetics, and morphology loading highly on it. The second factor was described as Fine expertise, with syntax, semantics, and pragmatics loading highly on it.
In the case of this research project, Practitioners used our participants to resample indices of language expertise. In other words, they collected additional measures of language performance using their questionnaire. In this fourth study, they aimed to validate the proposed two-factor solution for the questionnaire and planned to run a confirmatory factor analysis on this data. Can you help them with this goal?
Specify and estimate a confirmatory factor model that tests the proposed latent structure. NOTE: scale the variance of each latent factor to 1.
library(lavaan)
library(psych)
scatter.hist(x=Lexicon$Orthography, y=Lexicon$Semantics, density=T, ellipse=T)
fa1 ='
RuleAutomization=~ NA*Orthography + Phonetics + Morphology
FineExpertise=~ NA*Syntax + Semantics + Pragmatics
RuleAutomization ~~ 1*RuleAutomization
FineExpertise ~~ 1*FineExpertise
'
cfa1 = cfa(fa1, data=Lexicon)
summary(cfa1, standardize=TRUE, fit.measures=TRUE)
lavaan 0.6-8 ended normally after 23 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 13
Number of observations 350
Model Test User Model:
Test statistic 9.113
Degrees of freedom 8
P-value (Chi-square) 0.333
Model Test Baseline Model:
Test statistic 734.539
Degrees of freedom 15
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.998
Tucker-Lewis Index (TLI) 0.997
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -8975.372
Loglikelihood unrestricted model (H1) -8970.815
Akaike (AIC) 17976.743
Bayesian (BIC) 18026.896
Sample-size adjusted Bayesian (BIC) 17985.656
Root Mean Square Error of Approximation:
RMSEA 0.020
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.068
P-value RMSEA <= 0.05 0.810
Standardized Root Mean Square Residual:
SRMR 0.034
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv
RuleAutomization =~
Orthography 17.720 1.081 16.387 0.000 17.720
Phonetics 11.616 1.041 11.163 0.000 11.616
Morphology 16.438 1.120 14.680 0.000 16.438
FineExpertise =~
Syntax 16.953 1.099 15.429 0.000 16.953
Semantics 17.099 0.992 17.238 0.000 17.099
Pragmatics 15.039 1.084 13.867 0.000 15.039
Std.all
0.870
0.595
0.781
0.777
0.856
0.706
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv
RuleAutomization ~~
FineExpertise 0.080 0.063 1.267 0.205 0.080
Std.all
0.080
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv
RuleAutomizatn 1.000 1.000
FineExpertise 1.000 1.000
.Orthography 100.384 24.546 4.090 0.000 100.384
.Phonetics 245.826 21.133 11.632 0.000 245.826
.Morphology 172.963 23.975 7.214 0.000 172.963
.Syntax 188.499 22.357 8.431 0.000 188.499
.Semantics 106.331 19.273 5.517 0.000 106.331
.Pragmatics 227.307 21.895 10.382 0.000 227.307
Std.all
1.000
1.000
0.242
0.646
0.390
0.396
0.267
0.501
lavPredict(cfa1)
RlAtmz FnExpr
[1,] 0.179 0.438
[2,] 0.177 -0.160
[3,] -0.450 0.897
[4,] -0.372 -1.784
[5,] 0.292 0.270
[6,] -0.812 -0.567
 ... (factor scores for rows 7 to 350 omitted)
fitMIM = cfa(fa1, data=Lexicon, group = 'SES', group.equal='loadings')
summary(fitMIM)
lavaan 0.6-8 ended normally after 32 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 38
Number of equality constraints 6
Number of observations per group:
Low 163
High 187
Model Test User Model:
Test statistic 21.783
Degrees of freedom 22
P-value (Chi-square) 0.473
Test statistic for each group:
Low 9.891
High 11.892
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Group 1 [Low]:
Latent Variables:
Estimate Std.Err z-value P(>|z|)
RuleAutomization =~
Orthgrp (.p1.) 17.759 1.081 16.433 0.000
Phontcs (.p2.) 11.595 1.038 11.169 0.000
Mrphlgy (.p3.) 16.412 1.117 14.698 0.000
FineExpertise =~
Syntax (.p4.) 16.845 1.096 15.373 0.000
Semntcs (.p5.) 17.095 0.992 17.235 0.000
Prgmtcs (.p6.) 14.951 1.084 13.798 0.000
Covariances:
Estimate Std.Err z-value P(>|z|)
RuleAutomization ~~
FineExpertise 0.004 0.093 0.048 0.962
Intercepts:
Estimate Std.Err z-value P(>|z|)
.Orthography 100.030 1.579 63.348 0.000
.Phonetics 100.922 1.540 65.526 0.000
.Morphology 101.343 1.659 61.084 0.000
.Syntax 100.414 1.676 59.930 0.000
.Semantics 100.367 1.579 63.576 0.000
.Pragmatics 101.420 1.650 61.475 0.000
RuleAutomizatn 0.000
FineExpertise 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
RuleAutomizatn 1.000
FineExpertise 1.000
.Orthography 91.050 29.789 3.056 0.002
.Phonetics 252.226 31.066 8.119 0.000
.Morphology 179.310 31.505 5.691 0.000
.Syntax 173.836 29.129 5.968 0.000
.Semantics 113.997 25.388 4.490 0.000
.Pragmatics 220.105 30.278 7.270 0.000
Group 2 [High]:
Latent Variables:
Estimate Std.Err z-value P(>|z|)
RuleAutomization =~
Orthgrp (.p1.) 17.759 1.081 16.433 0.000
Phontcs (.p2.) 11.595 1.038 11.169 0.000
Mrphlgy (.p3.) 16.412 1.117 14.698 0.000
FineExpertise =~
Syntax (.p4.) 16.845 1.096 15.373 0.000
Semntcs (.p5.) 17.095 0.992 17.235 0.000
Prgmtcs (.p6.) 14.951 1.084 13.798 0.000
Covariances:
Estimate Std.Err z-value P(>|z|)
RuleAutomization ~~
FineExpertise 0.143 0.086 1.670 0.095
Intercepts:
Estimate Std.Err z-value P(>|z|)
.Orthography 98.718 1.504 65.637 0.000
.Phonetics 98.901 1.413 70.004 0.000
.Morphology 98.885 1.528 64.731 0.000
.Syntax 97.602 1.611 60.574 0.000
.Semantics 99.508 1.444 68.897 0.000
.Pragmatics 99.042 1.566 63.234 0.000
RuleAutomizatn 0.000
FineExpertise 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
RuleAutomizatn 1.000
FineExpertise 1.000
.Orthography 107.629 29.019 3.709 0.000
.Phonetics 238.810 27.792 8.593 0.000
.Morphology 167.044 28.954 5.769 0.000
.Syntax 201.741 29.807 6.768 0.000
.Semantics 97.838 23.595 4.147 0.000
.Pragmatics 235.204 29.901 7.866 0.000
fitMIC <- cfa(fa1, data = Lexicon, group = 'SES')
summary(fitMIC)
lavaan 0.6-8 ended normally after 36 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 38
Number of observations per group:
Low 163
High 187
Model Test User Model:
Test statistic 13.914
Degrees of freedom 16
P-value (Chi-square) 0.605
Test statistic for each group:
Low 5.743
High 8.171
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Group 1 [Low]:
Latent Variables:
Estimate Std.Err z-value P(>|z|)
RuleAutomization =~
Orthography 19.872 1.616 12.298 0.000
Phonetics 12.066 1.561 7.728 0.000
Morphology 15.645 1.643 9.525 0.000
FineExpertise =~
Syntax 15.403 1.563 9.856 0.000
Semantics 18.165 1.511 12.023 0.000
Pragmatics 13.920 1.562 8.914 0.000
Covariances:
Estimate Std.Err z-value P(>|z|)
RuleAutomization ~~
FineExpertise 0.027 0.089 0.307 0.758
Intercepts:
Estimate Std.Err z-value P(>|z|)
.Orthography 100.030 1.650 60.626 0.000
.Phonetics 100.922 1.574 64.109 0.000
.Morphology 101.343 1.663 60.926 0.000
.Syntax 100.414 1.623 61.869 0.000
.Semantics 100.367 1.597 62.860 0.000
.Pragmatics 101.420 1.611 62.959 0.000
RuleAutomizatn 0.000
FineExpertise 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
RuleAutomizatn 1.000
FineExpertise 1.000
.Orthography 48.847 42.036 1.162 0.245
.Phonetics 258.364 32.486 7.953 0.000
.Morphology 206.223 34.489 5.979 0.000
.Syntax 192.101 31.012 6.194 0.000
.Semantics 85.586 32.773 2.611 0.009
.Pragmatics 229.227 31.372 7.307 0.000
Group 2 [High]:
Latent Variables:
Estimate Std.Err z-value P(>|z|)
RuleAutomization =~
Orthography 15.840 1.457 10.874 0.000
Phonetics 11.065 1.396 7.928 0.000
Morphology 17.226 1.541 11.180 0.000
FineExpertise =~
Syntax 18.148 1.534 11.833 0.000
Semantics 16.385 1.317 12.446 0.000
Pragmatics 15.852 1.502 10.551 0.000
Covariances:
Estimate Std.Err z-value P(>|z|)
RuleAutomization ~~
FineExpertise 0.132 0.087 1.521 0.128
Intercepts:
Estimate Std.Err z-value P(>|z|)
.Orthography 98.718 1.440 68.536 0.000
.Phonetics 98.901 1.385 71.416 0.000
.Morphology 98.885 1.523 64.941 0.000
.Syntax 97.602 1.656 58.941 0.000
.Semantics 99.508 1.432 69.472 0.000
.Pragmatics 99.042 1.598 61.986 0.000
RuleAutomizatn 0.000
FineExpertise 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
RuleAutomizatn 1.000
FineExpertise 1.000
.Orthography 137.060 30.360 4.514 0.000
.Phonetics 236.201 27.782 8.502 0.000
.Morphology 136.837 34.755 3.937 0.000
.Syntax 183.432 31.722 5.782 0.000
.Semantics 115.175 23.894 4.820 0.000
.Pragmatics 226.120 30.414 7.435 0.000
library(semTools)
summary(compareFit(fitMIM,fitMIC))
################### Nested Model Comparison #########################
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fitMIC 16 18013 18159 13.914
fitMIM 22 18009 18132 21.783 7.8691 6 0.2478
####################### Model Fit Indices ###########################
################## Differences in Fit Indices #######################
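Beyond the chi-squared difference test, changes in approximate fit indices (a common rule of thumb is ΔCFI < .01) are often used to judge metric invariance; a minimal sketch comparing the configural (fitMIC) and metric (fitMIM) models:

# Approximate fit indices for the configural and metric-invariance models
fitMeasures(fitMIC, c("cfi", "rmsea", "srmr"))
fitMeasures(fitMIM, c("cfi", "rmsea", "srmr"))

If the CFI drop from the configural to the metric model is below .01, the equal-loadings constraint is usually considered tenable.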
In the last step of data collection, Practitioners created an experimental study to investigate the cognitive processing of different words. In particular, they focused on the processing of cognates versus non-cognates. Cognates are words that share a common etymological origin with words in other languages (e.g., English: starve and German: sterben). Practitioners randomly chose 100 participants from their original sample and presented them with a lexical decision task. In this task, participants are presented with a string of letters and are asked to indicate, as quickly and as accurately as possible, whether this string of letters is a real word (e.g., starve) or a non-word (e.g., snepd). Each participant was presented with 100 strings, half of which were words. More importantly, Practitioners manipulated cognate status within those 50 words: 25 words were cognates and 25 were non-cognates. For this analysis, Practitioners would like you to focus only on the speed of processing of the real words, measured as reaction time in milliseconds. Smaller reaction times indicate faster processing, which potentially reflects a special cognitive status of cognates. Can you analyse this experimental data and inform Practitioners whether cognates elicit faster processing times in comparison to non-cognate words?
In the second dataset (LexicalDec), you can find all the collected information. Column ID indicates repeated measurements for each subject, item_id indicates repeated measurements for each item, WordType shows whether words were cognates or non-cognates, and RT (reaction time) shows how fast participants answered the question of whether the presented string of letters was a word or a non-word. Practitioners also added other information to this dataset, such as the mother's and father's vocabulary size, time spent reading, and the estimated vocabulary size from when our participants were toddlers. Using this data, build a model that accounts for the repeated measurements (clustered observations) and tests the effect of word type. You can also use the additional measures to control for the effects of vocabulary size and time spent reading.
Specify the statistical model that tests the effect of word type on reaction time.
library(lme4)
#install.packages("lmerTest")
library(lmerTest)
#install.packages("sjPlot")
require(sjPlot)
#install.packages("glmmTMB")
library(glmmTMB)
#install.packages("merTools")
library('merTools')
#install.packages("jtools")
library(jtools)
ggplot(data = LexicalDec, aes(item_id, RT, group=ID))+geom_line()+geom_point()
# Null model
m_0 = lmer(RT~1 + (1|ID) + (1|item_id), REML= FALSE, data=LexicalDec)
summary(m_0) #checking summary
Linear mixed model fit by maximum likelihood . t-tests use
Satterthwaite's method [lmerModLmerTest]
Formula: RT ~ 1 + (1 | ID) + (1 | item_id)
Data: LexicalDec
AIC BIC logLik deviance df.resid
67966.1 67992.2 -33979.1 67958.1 4996
Scaled residuals:
Min 1Q Median 3Q Max
-3.7334 -0.6734 -0.0033 0.6576 3.4361
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 403733 635.40
item_id (Intercept) 5531 74.37
Residual 40255 200.64
Number of obs: 5000, groups: ID, 100; item_id, 50
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2881.20 64.47 105.36 44.69 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
confint(m_0) #checking confidence intervals
Computing profile confidence intervals ...
2.5 % 97.5 %
.sig01 556.3987 734.9580
.sig02 60.7480 93.0611
.sigma 196.7086 204.6952
(Intercept) 2753.6867 3008.7185
summ(m_0) #checking lmer summary
MODEL INFO:
Observations: 5000
Dependent Variable: RT
Type: Mixed effects linear regression
MODEL FIT:
AIC = 67966.11, BIC = 67992.18
Pseudo-R² (fixed effects) = 0.00
Pseudo-R² (total) = 0.91
FIXED EFFECTS:
------------------------------------------------------------
Est. S.E. t val. d.f. p
----------------- --------- ------- -------- -------- ------
(Intercept) 2881.20 64.47 44.69 105.36 0.00
------------------------------------------------------------
p values calculated using Satterthwaite d.f.
RANDOM EFFECTS:
------------------------------------
Group Parameter Std. Dev.
---------- ------------- -----------
ID (Intercept) 635.40
item_id (Intercept) 74.37
Residual 200.64
------------------------------------
Grouping variables:
---------------------------
Group # groups ICC
--------- ---------- ------
ID 100 0.90
item_id 50 0.01
---------------------------
ranova(m_0) #implementing ranova
ANOVA-like table for random-effects: Single term deletions
Model:
RT ~ (1 | ID) + (1 | item_id)
npar logLik AIC LRT Df Pr(>Chisq)
<none> 4 -33979 67966
(1 | ID) 3 -39632 79271 11306.8 1 < 2.2e-16 ***
(1 | item_id) 3 -34229 68463 499.3 1 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Model 1
m_1 = lmer(RT~1 + WordType + (1|ID) + (1|item_id), REML= FALSE, data=LexicalDec)
summary(m_1)
Linear mixed model fit by maximum likelihood . t-tests use
Satterthwaite's method [lmerModLmerTest]
Formula: RT ~ 1 + WordType + (1 | ID) + (1 | item_id)
Data: LexicalDec
AIC BIC logLik deviance df.resid
67960.6 67993.1 -33975.3 67950.6 4995
Scaled residuals:
Min 1Q Median 3Q Max
-3.7225 -0.6691 -0.0032 0.6608 3.4429
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 403722 635.39
item_id (Intercept) 4684 68.44
Residual 40255 200.64
Number of obs: 5000, groups: ID, 100; item_id, 50
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2852.38 65.12 109.33 43.802 < 2e-16 ***
WordTypenon-cognates 57.64 20.17 49.05 2.857 0.00626 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
WrdTypnn-cg -0.155
confint(m_1)
Computing profile confidence intervals ...
2.5 % 97.5 %
.sig01 556.38816 734.94310
.sig02 55.73392 85.83507
.sigma 196.70861 204.69519
(Intercept) 2723.61925 2981.15034
WordTypenon-cognates 17.31108 97.96025
summ(m_1)
MODEL INFO:
Observations: 5000
Dependent Variable: RT
Type: Mixed effects linear regression
MODEL FIT:
AIC = 67960.56, BIC = 67993.15
Pseudo-R² (fixed effects) = 0.00
Pseudo-R² (total) = 0.91
FIXED EFFECTS:
---------------------------------------------------------------------
Est. S.E. t val. d.f. p
-------------------------- --------- ------- -------- -------- ------
(Intercept) 2852.38 65.12 43.80 109.33 0.00
WordTypenon-cognates 57.64 20.17 2.86 49.05 0.01
---------------------------------------------------------------------
p values calculated using Satterthwaite d.f.
RANDOM EFFECTS:
------------------------------------
Group Parameter Std. Dev.
---------- ------------- -----------
ID (Intercept) 635.39
item_id (Intercept) 68.44
Residual 200.64
------------------------------------
Grouping variables:
---------------------------
Group # groups ICC
--------- ---------- ------
ID 100 0.90
item_id 50 0.01
---------------------------
ranova(m_1)
ANOVA-like table for random-effects: Single term deletions
Model:
RT ~ WordType + (1 | ID) + (1 | item_id)
npar logLik AIC LRT Df Pr(>Chisq)
<none> 5 -33975 67961
(1 | ID) 4 -39629 79265 11306.6 1 < 2.2e-16 ***
(1 | item_id) 4 -34183 68374 415.3 1 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# ICC: proportion of variance attributable to the grouping factors
ICC(outcome = "RT", group = c("ID", "item_id"), data = LexicalDec)
[1] 0.8990089
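As a sanity check, the participant-level ICC can be recomputed by hand from the variance components of m_1; a minimal sketch, assuming the model objects fitted above:
vc <- as.data.frame(VarCorr(m_1)) # variance components as a data frame
v_id <- vc$vcov[vc$grp == "ID"] # by-participant intercept variance
v_item <- vc$vcov[vc$grp == "item_id"] # by-item intercept variance
v_res <- vc$vcov[vc$grp == "Residual"] # residual variance
v_id / (v_id + v_item + v_res) # ~0.90, in line with ICC() above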
# Model 2
m_2 = lmer(RT~1 + WordType + ReadingTime + (1|ID) + (1|item_id),REML= FALSE, data=LexicalDec)
summary(m_2)
Linear mixed model fit by maximum likelihood . t-tests use
Satterthwaite's method [lmerModLmerTest]
Formula: RT ~ 1 + WordType + ReadingTime + (1 | ID) + (1 | item_id)
Data: LexicalDec
AIC BIC logLik deviance df.resid
67958.3 67997.4 -33973.2 67946.3 4994
Scaled residuals:
Min 1Q Median 3Q Max
-3.7231 -0.6698 -0.0031 0.6610 3.4434
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 386888 622.00
item_id (Intercept) 4684 68.44
Residual 40255 200.64
Number of obs: 5000, groups: ID, 100; item_id, 50
Fixed effects:
                     Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)          2175.653    330.970  100.356   6.574 2.23e-09 ***
WordTypenon-cognates   57.636     20.172   49.050   2.857  0.00625 ** 
ReadingTime             6.765      3.246   99.999   2.084  0.03973 *  
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) WrdTy-
WrdTypnn-cg -0.030
ReadingTime -0.981 0.000
confint(m_2)
Computing profile confidence intervals ...
2.5 % 97.5 %
.sig01 544.6573875 719.46486
.sig02 55.7334700 85.83360
.sigma 196.7086156 204.69519
(Intercept) 1520.7055704 2830.59964
WordTypenon-cognates 17.3115821 97.95975
ReadingTime 0.3404082 13.18931
summ(m_2)
MODEL INFO:
Observations: 5000
Dependent Variable: RT
Type: Mixed effects linear regression
MODEL FIT:
AIC = 67958.31, BIC = 67997.41
Pseudo-R² (fixed effects) = 0.04
Pseudo-R² (total) = 0.91
FIXED EFFECTS:
----------------------------------------------------------------------
Est. S.E. t val. d.f. p
-------------------------- --------- -------- -------- -------- ------
(Intercept) 2175.65 330.97 6.57 100.36 0.00
WordTypenon-cognates 57.64 20.17 2.86 49.05 0.01
ReadingTime 6.76 3.25 2.08 100.00 0.04
----------------------------------------------------------------------
p values calculated using Satterthwaite d.f.
RANDOM EFFECTS:
------------------------------------
Group Parameter Std. Dev.
---------- ------------- -----------
ID (Intercept) 622.00
item_id (Intercept) 68.44
Residual 200.64
------------------------------------
Grouping variables:
---------------------------
Group # groups ICC
--------- ---------- ------
ID 100 0.90
item_id 50 0.01
---------------------------
ranova(m_2)
ANOVA-like table for random-effects: Single term deletions
Model:
RT ~ WordType + ReadingTime + (1 | ID) + (1 | item_id)
npar logLik AIC LRT Df Pr(>Chisq)
<none> 6 -33973 67958
(1 | ID) 5 -39533 79076 11119.3 1 < 2.2e-16 ***
(1 | item_id) 5 -34181 68372 415.3 1 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Model 3
m_3 = lmer(RT~1 + WordType + ReadingTime + ToddlerVocab + (1|ID) + (1|item_id),REML= FALSE, data=LexicalDec)
summary(m_3)
Linear mixed model fit by maximum likelihood . t-tests use
Satterthwaite's method [lmerModLmerTest]
Formula:
RT ~ 1 + WordType + ReadingTime + ToddlerVocab + (1 | ID) + (1 |
item_id)
Data: LexicalDec
AIC BIC logLik deviance df.resid
67606.5 67652.1 -33796.2 67592.5 4993
Scaled residuals:
Min 1Q Median 3Q Max
-3.6367 -0.6657 0.0035 0.6595 3.5166
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 10447 102.21
item_id (Intercept) 4636 68.09
Residual 40255 200.64
Number of obs: 5000, groups: ID, 100; item_id, 50
Fixed effects:
                      Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)          3953.7210    65.6668 108.8113  60.209   <2e-16 ***
WordTypenon-cognates   57.6357    20.0774  49.7399   2.871    0.006 ** 
ReadingTime            14.3486     0.5684  99.7945  25.244   <2e-16 ***
ToddlerVocab          -38.6674     0.6686  99.7945 -57.838   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) WrdTy- RdngTm
WrdTypnn-cg -0.153
ReadingTime -0.712 0.000
ToddlerVocb -0.468 0.000 -0.231
confint(m_3)
Computing profile confidence intervals ...
2.5 % 97.5 %
.sig01 88.46066 119.35554
.sig02 55.50986 85.25948
.sigma 196.70934 204.69601
(Intercept) 3823.87012 4083.57184
WordTypenon-cognates 17.51276 97.75857
ReadingTime 13.22373 15.47341
ToddlerVocab -39.99049 -37.34439
# Model 4
m_4 = lmer(RT~1 + WordType + ReadingTime + ToddlerVocab + (1|ID) + (1|item_id) + MotherVocab,REML= FALSE, data=LexicalDec)
summary(m_4)
Linear mixed model fit by maximum likelihood . t-tests use
Satterthwaite's method [lmerModLmerTest]
Formula:
RT ~ 1 + WordType + ReadingTime + ToddlerVocab + (1 | ID) + (1 |
item_id) + MotherVocab
Data: LexicalDec
AIC BIC logLik deviance df.resid
67607.0 67659.1 -33795.5 67591.0 4992
Scaled residuals:
Min 1Q Median 3Q Max
-3.6327 -0.6654 0.0029 0.6605 3.5196
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 10279 101.38
item_id (Intercept) 4636 68.09
Residual 40255 200.64
Number of obs: 5000, groups: ID, 100; item_id, 50
Fixed effects:
                      Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)          3910.1041    74.2082 106.8387  52.691   <2e-16 ***
WordTypenon-cognates   57.6357    20.0766  49.7442   2.871    0.006 ** 
ReadingTime            14.1666     0.5832  99.7910  24.292   <2e-16 ***
ToddlerVocab          -38.9630     0.7057  99.7910 -55.213   <2e-16 ***
MotherVocab             0.8039     0.6533  99.7910   1.231    0.221    
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) WrdTy- RdngTm TddlrV
WrdTypnn-cg -0.135
ReadingTime -0.484 0.000
ToddlerVocb -0.224 0.000 -0.124
MotherVocab -0.478 0.000 -0.254 -0.340
anova(m_3, m_4)
Data: LexicalDec
Models:
m_3: RT ~ 1 + WordType + ReadingTime + ToddlerVocab + (1 | ID) + (1 |
m_3: item_id)
m_4: RT ~ 1 + WordType + ReadingTime + ToddlerVocab + (1 | ID) + (1 |
m_4: item_id) + MotherVocab
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
m_3 7 67606 67652 -33796 67592
m_4 8 67607 67659 -33795 67591 1.5029 1 0.2202
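The anova() comparison above is a likelihood-ratio test, and it can be reproduced by hand; a minimal sketch, assuming both models were fit with REML = FALSE as above:
lrt <- as.numeric(2 * (logLik(m_4) - logLik(m_3))) # chi-squared statistic
pchisq(lrt, df = 1, lower.tail = FALSE) # ~0.22, so MotherVocab is not retained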
summ(m_4)
MODEL INFO:
Observations: 5000
Dependent Variable: RT
Type: Mixed effects linear regression
MODEL FIT:
AIC = 67606.99, BIC = 67659.13
Pseudo-R² (fixed effects) = 0.88
Pseudo-R² (total) = 0.91
FIXED EFFECTS:
---------------------------------------------------------------------
Est. S.E. t val. d.f. p
-------------------------- --------- ------- -------- -------- ------
(Intercept) 3910.10 74.21 52.69 106.84 0.00
WordTypenon-cognates 57.64 20.08 2.87 49.74 0.01
ReadingTime 14.17 0.58 24.29 99.79 0.00
ToddlerVocab -38.96 0.71 -55.21 99.79 0.00
MotherVocab 0.80 0.65 1.23 99.79 0.22
---------------------------------------------------------------------
p values calculated using Satterthwaite d.f.
RANDOM EFFECTS:
------------------------------------
Group Parameter Std. Dev.
---------- ------------- -----------
ID (Intercept) 101.38
item_id (Intercept) 68.09
Residual 200.64
------------------------------------
Grouping variables:
---------------------------
Group # groups ICC
--------- ---------- ------
ID 100 0.19
item_id 50 0.08
---------------------------
ranova(m_4)
ANOVA-like table for random-effects: Single term deletions
Model:
RT ~ WordType + ReadingTime + ToddlerVocab + MotherVocab + (1 | ID) + (1 | item_id)
npar logLik AIC LRT Df Pr(>Chisq)
<none> 8 -33795 67607
(1 | ID) 7 -34231 68476 870.78 1 < 2.2e-16 ***
(1 | item_id) 7 -34003 68020 414.72 1 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Model 5
m_5 = lmer(RT~1 + WordType + ReadingTime + ToddlerVocab + (1|ID) + (1|item_id) + MotherVocab + FatherVocab ,REML= FALSE, data=LexicalDec)
summary(m_5)
Linear mixed model fit by maximum likelihood . t-tests use
Satterthwaite's method [lmerModLmerTest]
Formula:
RT ~ 1 + WordType + ReadingTime + ToddlerVocab + (1 | ID) + (1 |
item_id) + MotherVocab + FatherVocab
Data: LexicalDec
AIC BIC logLik deviance df.resid
67607.4 67666.1 -33794.7 67589.4 4991
Scaled residuals:
Min 1Q Median 3Q Max
-3.6320 -0.6665 0.0038 0.6583 3.5294
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 10104 100.52
item_id (Intercept) 4634 68.07
Residual 40255 200.64
Number of obs: 5000, groups: ID, 100; item_id, 50
Fixed effects:
                      Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)          3956.4526    82.2228 105.5126  48.119  < 2e-16 ***
WordTypenon-cognates   57.6357    20.0733  49.7659   2.871  0.00599 ** 
ReadingTime            14.2745     0.5848  99.7827  24.410  < 2e-16 ***
ToddlerVocab          -38.9139     0.7012  99.7827 -55.499  < 2e-16 ***
MotherVocab             0.9275     0.6554  99.7827   1.415  0.16012    
FatherVocab            -0.7166     0.5655  99.7827  -1.267  0.20803    
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) WrdTy- RdngTm TddlrV MthrVc
WrdTypnn-cg -0.122
ReadingTime -0.364 0.000
ToddlerVocb -0.176 0.000 -0.114
MotherVocab -0.357 0.000 -0.226 -0.328
FatherVocab -0.445 0.000 -0.146 -0.055 -0.149
summ(m_5)
MODEL INFO:
Observations: 5000
Dependent Variable: RT
Type: Mixed effects linear regression
MODEL FIT:
AIC = 67607.40, BIC = 67666.05
Pseudo-R² (fixed effects) = 0.88
Pseudo-R² (total) = 0.91
FIXED EFFECTS:
---------------------------------------------------------------------
Est. S.E. t val. d.f. p
-------------------------- --------- ------- -------- -------- ------
(Intercept) 3956.45 82.22 48.12 105.51 0.00
WordTypenon-cognates 57.64 20.07 2.87 49.77 0.01
ReadingTime 14.27 0.58 24.41 99.78 0.00
ToddlerVocab -38.91 0.70 -55.50 99.78 0.00
MotherVocab 0.93 0.66 1.42 99.78 0.16
FatherVocab -0.72 0.57 -1.27 99.78 0.21
---------------------------------------------------------------------
p values calculated using Satterthwaite d.f.
RANDOM EFFECTS:
------------------------------------
Group Parameter Std. Dev.
---------- ------------- -----------
ID (Intercept) 100.52
item_id (Intercept) 68.07
Residual 200.64
------------------------------------
Grouping variables:
---------------------------
Group # groups ICC
--------- ---------- ------
ID 100 0.18
item_id 50 0.08
---------------------------
ranova(m_5)
ANOVA-like table for random-effects: Single term deletions
Model:
RT ~ WordType + ReadingTime + ToddlerVocab + MotherVocab + FatherVocab + (1 | ID) + (1 | item_id)
npar logLik AIC LRT Df Pr(>Chisq)
<none> 9 -33795 67607
(1 | ID) 8 -34222 68460 855.03 1 < 2.2e-16 ***
(1 | item_id) 8 -34002 68020 414.72 1 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Selecting m_3 as final model
library("ggplot2")
plot_model(m_3, type='re', sort.est = 'sort.all', grid = FALSE)
(Two caterpillar plots: [[1]] the sorted random intercepts for ID, [[2]] the sorted random intercepts for item_id.)
# checking summary for model 3
summ(m_3)
MODEL INFO:
Observations: 5000
Dependent Variable: RT
Type: Mixed effects linear regression
MODEL FIT:
AIC = 67606.49, BIC = 67652.11
Pseudo-R² (fixed effects) = 0.88
Pseudo-R² (total) = 0.91
FIXED EFFECTS:
---------------------------------------------------------------------
Est. S.E. t val. d.f. p
-------------------------- --------- ------- -------- -------- ------
(Intercept) 3953.72 65.67 60.21 108.81 0.00
WordTypenon-cognates 57.64 20.08 2.87 49.74 0.01
ReadingTime 14.35 0.57 25.24 99.79 0.00
ToddlerVocab -38.67 0.67 -57.84 99.79 0.00
---------------------------------------------------------------------
p values calculated using Satterthwaite d.f.
RANDOM EFFECTS:
------------------------------------
Group Parameter Std. Dev.
---------- ------------- -----------
ID (Intercept) 102.21
item_id (Intercept) 68.09
Residual 200.64
------------------------------------
Grouping variables:
---------------------------
Group # groups ICC
--------- ---------- ------
ID 100 0.19
item_id 50 0.08
---------------------------
ranova(m_3)
ANOVA-like table for random-effects: Single term deletions
Model:
RT ~ WordType + ReadingTime + ToddlerVocab + (1 | ID) + (1 | item_id)
npar logLik AIC LRT Df Pr(>Chisq)
<none> 7 -33796 67606
(1 | ID) 6 -34239 68490 885.84 1 < 2.2e-16 ***
(1 | item_id) 6 -34004 68019 414.73 1 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# checking the fixed-effect coefficient table
coef(summary(m_3))
                       Estimate Std. Error        df    t value     Pr(>|t|)
(Intercept)          3953.72098 65.6668271 108.81135  60.208802 2.236411e-85
WordTypenon-cognates   57.63567 20.0774070  49.73991   2.870673 6.001178e-03
ReadingTime            14.34857  0.5683937  99.79450  25.244069 3.996799e-45
ToddlerVocab          -38.66744  0.6685519  99.79450 -57.837603 1.450679e-78
VarCorr(m_3)
Groups Name Std.Dev.
ID (Intercept) 102.21
item_id (Intercept) 68.09
Residual 200.64
# inspecting the residuals' structure before plotting
str(resid(m_3))
Named num [1:5000] 372.1 -28.3 -131.8 -118.2 383.6 ...
- attr(*, "names")= chr [1:5000] "1" "2" "3" "4" ...
plot(fitted(m_3), resid(m_3, type = "pearson")) # fitted values vs. Pearson residuals
abline(0, 0, col = "red") # reference line at zero
qqnorm(resid(m_3)) # Q-Q plot of the residuals
qqline(resid(m_3), col = "red") # line a perfectly normal sample would follow
# lattice plot
require(lattice)
dotplot(ranef(m_3, condVar=TRUE))
(Dotplots of the conditional modes, one panel for ID and one for item_id.)
print(dotplot(ranef(m_3, condVar = TRUE))$ID) # show only the participant panel
# Checking the random-effect estimates (conditional modes)
ranef(m_3)
$ID
(Intercept)
1 87.328278
2 -208.904903
3 -161.237560
4 -46.901925
5 -111.934462
6 7.856438
7 -138.784779
8 38.104224
9 -2.368296
10 89.499938
11 -13.754364
12 20.388692
13 -115.725859
14 90.686380
15 -51.269624
16 -96.077004
17 68.752790
18 -80.415214
19 -47.463353
20 -101.902420
21 150.960177
22 -117.644438
23 -62.626163
24 -41.394581
25 -70.216147
26 -88.693603
27 -34.023706
28 148.729911
29 51.649228
30 -31.129323
31 -5.383212
32 -76.169461
33 54.497236
34 -140.451563
35 -90.479793
36 -44.071895
37 75.675836
38 -35.497309
39 202.528232
40 60.824253
41 -75.609437
42 -167.259251
43 -99.027144
44 -87.502683
45 -107.197561
46 135.413083
47 -28.446581
48 -20.503406
49 147.998023
50 -5.103535
51 95.952306
52 31.591750
53 -125.699439
54 13.029189
55 47.698899
56 34.670904
57 112.540171
58 178.926736
59 -43.193228
60 -115.798347
61 33.716077
62 63.917410
63 27.264064
64 102.563368
65 31.070773
66 126.417584
67 -73.400204
68 -92.183050
69 -99.868778
70 -220.165740
71 34.680973
72 125.130561
73 54.924391
74 64.899875
75 39.784875
76 96.318040
77 -23.716069
78 -20.452357
79 159.209795
80 20.454574
81 106.802145
82 -83.055083
83 -107.702704
84 17.598829
85 23.859126
86 191.101601
87 -105.389097
88 -121.795204
89 -58.518641
90 -92.531620
91 56.290353
92 53.002060
93 18.288573
94 216.366569
95 123.309459
96 -96.321065
97 92.384472
98 188.794983
99 -73.197391
100 144.705369
$item_id
(Intercept)
1 3.610568
2 123.707088
3 16.738904
4 -74.435123
5 169.304426
6 -46.181027
7 36.842658
8 -149.985396
9 -118.650710
10 -72.443667
11 -6.916457
12 41.732867
13 -47.557196
14 -35.173920
15 69.156742
16 37.897748
17 21.562942
18 -23.225126
19 -35.934260
20 -98.094144
21 80.107167
22 -26.220287
23 18.994541
24 16.783913
25 98.377748
26 -113.228475
27 9.297202
28 6.785464
29 -52.200052
30 60.659136
31 67.480371
32 78.703512
33 40.913624
34 -38.158938
35 -2.661941
36 -7.538214
37 -71.870956
38 -22.230771
39 -74.280879
40 -34.590424
41 -22.405845
42 7.115940
43 80.500682
44 7.836681
45 104.969458
46 -57.977259
47 67.626071
48 27.020291
49 -10.910254
50 -50.854423
with conditional variances for “ID” “item_id”
#install.packages("MuMIn")
require(MuMIn)
r.squaredGLMM(m_3)
R2m R2c
[1,] 0.8768848 0.9104415
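The marginal and conditional R² above follow the Nakagawa–Schielzeth variance decomposition. A rough by-hand version, a sketch rather than the exact MuMIn implementation:
var_fix <- var(predict(m_3, re.form = NA)) # variance of the fixed-effect predictions
vc <- as.data.frame(VarCorr(m_3))
var_ran <- sum(vc$vcov[vc$grp != "Residual"]) # ID + item_id intercept variances
var_res <- vc$vcov[vc$grp == "Residual"] # residual variance
var_fix / (var_fix + var_ran + var_res) # ≈ R2m (marginal)
(var_fix + var_ran) / (var_fix + var_ran + var_res) # ≈ R2c (conditional)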
I used a mixed-effects model with partial pooling, as the data were multilevel and the ICC gave substantial evidence of clustering (roughly 0.90 at the participant level). The model has three fixed-effect predictors (WordType, ReadingTime, and ToddlerVocab) and two random intercepts (participant ID and item_id). Because response times varied substantially across participants, I assumed that participant ID is the main source of random variation, and I added the random effect of item_id as well. Through an iterative model-comparison process I then selected the model that controls for ReadingTime and ToddlerVocab when estimating the effect of WordType.
For each term, I evaluated the models against the following hypotheses:
H0: the coefficient is zero (the predictor makes no difference to RT).
Ha: the coefficient is non-zero (the predictor makes a difference).
Final model formula:
mult1 = lmer(RT ~ WordType + ReadingTime + ToddlerVocab + (1 | ID) + (1 | item_id), REML = FALSE, data = LexicalDec)
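If this were reported as the final model, a common next step would be to refit it with REML for less biased variance-component estimates; this is a sketch of that practice, not part of the original analysis:
m_final <- lmer(RT ~ 1 + WordType + ReadingTime + ToddlerVocab +
                  (1 | ID) + (1 | item_id),
                REML = TRUE, data = LexicalDec) # REML once fixed effects are settled
summary(m_final) # variance components under REML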