Correlation matrix of child-level variables (overall performance)

propIncorrect_choseAssoc is the proportion of incorrect trials in which 1-2 semantically related words were available and the child chose one of them.
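
As a minimal sketch of how such a proportion could be computed, the snippet below takes one reading of the definition: among a child's incorrect trials that offered at least one associate, the share on which the child picked an associate. The toy data and the column names `numAssociates` and `choseAssociate` are hypothetical and are not taken from the original dataset.

```r
# Toy trial-level data; numAssociates and choseAssociate are hypothetical column names
trials <- data.frame(
  subjID         = c(1, 1, 1, 1, 2, 2, 2),
  is.Correct     = c(0, 0, 1, 0, 0, 0, 1),
  numAssociates  = c(1, 2, 1, 0, 2, 1, 0),
  choseAssociate = c(1, 0, 0, 0, 1, 1, 0)
)

# Keep incorrect trials that offered at least one semantically related option
eligible <- subset(trials, is.Correct == 0 & numAssociates >= 1)

# Per child: share of those trials on which an associate was chosen
prop_by_child <- aggregate(choseAssociate ~ subjID, data = eligible, FUN = mean)
names(prop_by_child)[2] <- "propIncorrect_choseAssoc"
prop_by_child
```

If the intended denominator is all incorrect trials rather than only those with an associate available, the `numAssociates >= 1` condition would be dropped.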

child_level_vars <- select(child_item_vars, subjID, Age_years, totalAttempted, totalCorrect, totalIncorrect,
                           num_incorrect_choseAssociate, propPosAssn_correct, propNoAssn_correct, propCorrect,
                           propIncorrect, propIncorrect_choseAssoc) %>% 
  distinct()

child_level_corr_data <- select(child_level_vars, propCorrect, propPosAssn_correct, propNoAssn_correct,
                                propIncorrect_choseAssoc, Age_years)
child_level_corrs <- cor(child_level_corr_data, use="pairwise.complete.obs", method="pearson")
# cor.mtest() expects the raw data (rows = children), not the correlation matrix
p.mat_child <- cor.mtest(child_level_corr_data)
pMatrix_child <- p.mat_child$p

corrplot(child_level_corrs, method = 'color', type='lower', diag = TRUE, addCoef.col = "grey",
         tl.col = "black", number.font=2, number.cex=.8, p.mat=pMatrix_child, sig.level = 0.05, insig = "blank")

How does associative strength affect the likelihood of a trial being correct, controlling for age and word difficulty (using AoA as a proxy)?
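
Written out, the model fitted below can be sketched as a random-intercept logistic regression. The indices (i for child, j for trial) and the symbols for the coefficients and the random intercept are notation introduced here for illustration.

```latex
\operatorname{logit}\,\Pr(\text{is.Correct}_{ij} = 1)
  = \beta_0 + \beta_1\,\text{Age\_years}_i + \beta_2\,\text{KupermanAOA}_j
  + \beta_3\,\text{sumStrength}_j + u_i,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2)
```
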
summary(glmer(is.Correct ~ Age_years + KupermanAOA + sumStrength + (1|subjID), child_item_vars, family="binomial"))
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: is.Correct ~ Age_years + KupermanAOA + sumStrength + (1 | subjID)
##    Data: child_item_vars
## 
##      AIC      BIC   logLik deviance df.resid 
##   8087.8   8123.1  -4038.9   8077.8     8591 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -8.3874  0.1957  0.3646  0.5299  1.8111 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  subjID (Intercept) 0.3768   0.6139  
## Number of obs: 8596, groups:  subjID, 193
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.92002    0.15625  12.288  < 2e-16 ***
## Age_years    0.55305    0.03441  16.073  < 2e-16 ***
## KupermanAOA -0.52089    0.02094 -24.871  < 2e-16 ***
## sumStrength -3.05610    0.58874  -5.191 2.09e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) Ag_yrs KprAOA
## Age_years   -0.610              
## KupermanAOA -0.377 -0.432       
## sumStrength  0.043 -0.060 -0.087
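
Because the coefficients above are on the log-odds scale, exponentiating them gives odds ratios, which can be easier to read. The sketch below simply copies the fixed-effect estimates from the summary; the 0.1-unit step for sumStrength is an illustrative choice, not something specified in the analysis.

```r
# Fixed-effect estimates copied from the glmer summary (log-odds scale)
est <- c(Age_years = 0.55305, KupermanAOA = -0.52089, sumStrength = -3.05610)

# Odds ratios for a one-unit increase in each predictor
odds_ratios <- exp(est)
round(odds_ratios, 3)

# If sumStrength spans less than one unit, a smaller step may be easier
# to interpret; 0.1 is an assumed, illustrative step size
or_strength_step <- exp(0.1 * est[["sumStrength"]])
round(or_strength_step, 3)
```

On this reading, each additional year of age multiplies the odds of a correct response by roughly 1.7, while higher AoA and higher summed associative strength both lower the odds.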

Correlation matrix of item-level variables

Formula used to get t-value for each item: totalCorrect ~ is.Correct + Age_years + totalAttempted
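
One way such a per-item t-value could be obtained is to fit the formula above separately for each item and pull out the t statistic for is.Correct. This is a sketch on simulated data; taking the is.Correct coefficient as the statistic of interest is an assumption, since the text does not say which term tval_item comes from.

```r
set.seed(1)

# Simulated stand-in for the child-level data (all values are made up)
n_child <- 40
children <- data.frame(
  subjID         = seq_len(n_child),
  Age_years      = runif(n_child, 3, 8),
  totalAttempted = sample(30:60, n_child, replace = TRUE)
)
children$totalCorrect <- round(children$totalAttempted * runif(n_child, 0.4, 0.9))

# Cross every child with every item (by = NULL gives the Cartesian product)
sim <- merge(children, data.frame(item = c("penguin", "canoe", "vest")), by = NULL)
sim$is.Correct <- rbinom(nrow(sim), 1, 0.6)

# Fit the formula within each item and keep the t statistic for is.Correct
tval_by_item <- sapply(split(sim, sim$item), function(d) {
  fit <- lm(totalCorrect ~ is.Correct + Age_years + totalAttempted, data = d)
  coef(summary(fit))["is.Correct", "t value"]
})
tval_by_item
```

A large positive value would then mark an item on which children who answer correctly tend to have higher overall scores, given age and number of items attempted.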

item_level_vars <- select(child_item_vars, item, BlockNumber, WordNumber, sumStrength, numNonZero,
                          hyper_z, KupermanAOA, itemMeanCorrect, n_responded, tval_item) %>% 
  distinct()

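item_level_corr_data <- select(item_level_vars, sumStrength, numNonZero, hyper_z,
                               KupermanAOA, itemMeanCorrect, tval_item)
item_level_corrs <- cor(item_level_corr_data, use="pairwise.complete.obs", method="pearson")
# cor.mtest() expects the raw data (rows = items), not the correlation matrix
p.mat_item <- cor.mtest(item_level_corr_data)
pMatrix_item <- p.mat_item$p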

corrplot(item_level_corrs, method = 'color', type='lower', diag = TRUE, addCoef.col = "grey",
         tl.col = "black", number.font=2, number.cex=.8, p.mat=pMatrix_item, sig.level = 0.05, insig = "blank")

No correlation between t-value and sumStrength, but are there individual words that are high on both?

ggplot(filter(item_level_vars, sumStrength > 0), aes(sumStrength, tval_item))+
  geom_point()+
  geom_label_repel(aes(label=ifelse((tval_item>2)|(sumStrength>.3), as.character(item),'')),
                   box.padding=.35,
                   point.padding=.2)+
  theme_classic()
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_label_repel).

posAssn_highT_words <- c("penguin","clarinet","canoe","clamp","fly","castle","boulder",
                         "group","aquarium","vine","flamingo","carpenter","pastry",
                         "wrench","cactus","empty","vest","tusk","arrow","interior")

associations <- read.csv("associativeStrength_byTrial.csv") %>% 
  select(-X) %>% 
  rename(item=TargetWord) %>% 
  filter(item %in% posAssn_highT_words) %>% 
  arrange(desc(numNonZero))
DT::datatable(associations)

These words are from a range of blocks (2-13). ‘Interior’ probably shouldn’t count here.

How does PPVT performance predict W-J Science?

child_vars_DAS <- select(child_item_vars_DAS, subjID, totalAttempted, totalCorrect, propCorrect, Age_years,
                         propPosAssn_correct, propNoAssn_correct, num_incorrect_choseAssociate,
                         propIncorrect_choseAssoc, WJNumCorrect, PPVTRawScore, PPVTStandardScore, PPVTPercentile) %>% 
  distinct()

summary(lm(WJNumCorrect ~ Age_years + PPVTPercentile + propPosAssn_correct, child_vars_DAS))
## 
## Call:
## lm(formula = WJNumCorrect ~ Age_years + PPVTPercentile + propPosAssn_correct, 
##     data = child_vars_DAS)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.2905 -1.8229  0.0659  1.4067  7.6815 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -1.85078    2.94400  -0.629  0.53200    
## Age_years            1.71710    0.19392   8.855    2e-12 ***
## PPVTPercentile       0.05563    0.01731   3.214  0.00212 ** 
## propPosAssn_correct  2.49732    3.47633   0.718  0.47536    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.655 on 59 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.6243, Adjusted R-squared:  0.6052 
## F-statistic: 32.69 on 3 and 59 DF,  p-value: 1.416e-12
summary(lm(WJNumCorrect ~ Age_years + PPVTPercentile + propIncorrect_choseAssoc, child_vars_DAS))
## 
## Call:
## lm(formula = WJNumCorrect ~ Age_years + PPVTPercentile + propIncorrect_choseAssoc, 
##     data = child_vars_DAS)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.0766 -1.9960  0.0346  1.4919  6.7947 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               0.86849    2.48092   0.350   0.7277    
## Age_years                 1.63242    0.21273   7.674 4.19e-10 ***
## PPVTPercentile            0.05088    0.02040   2.494   0.0159 *  
## propIncorrect_choseAssoc  3.27901    6.88471   0.476   0.6359    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.683 on 52 degrees of freedom
##   (11 observations deleted due to missingness)
## Multiple R-squared:  0.5494, Adjusted R-squared:  0.5234 
## F-statistic: 21.14 on 3 and 52 DF,  p-value: 4.367e-09
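
The two models above drop different numbers of rows (4 vs. 11), so they are fitted to different sets of children. If they are to be compared directly, one option is to refit both on the rows that are complete for every variable involved. The sketch below uses simulated data with the same column names; the values and the missingness pattern are made up.

```r
set.seed(2)
n <- 67
toy <- data.frame(
  Age_years                = runif(n, 3, 8),
  PPVTPercentile           = runif(n, 1, 99),
  propPosAssn_correct      = runif(n, 0.5, 1),
  propIncorrect_choseAssoc = runif(n, 0, 0.5)
)
toy$WJNumCorrect <- round(2 + 1.7 * toy$Age_years + 0.05 * toy$PPVTPercentile +
                            rnorm(n, sd = 2.5))

# Inject missing values so the two predictors have different missingness
toy$propPosAssn_correct[1:4] <- NA
toy$propIncorrect_choseAssoc[5:15] <- NA

# Keep only rows complete on every variable used by either model
common <- toy[complete.cases(toy), ]

m_pos   <- lm(WJNumCorrect ~ Age_years + PPVTPercentile + propPosAssn_correct, common)
m_assoc <- lm(WJNumCorrect ~ Age_years + PPVTPercentile + propIncorrect_choseAssoc, common)

# Both fits now use the same rows, so their AIC values are comparable
c(n_pos = nobs(m_pos), n_assoc = nobs(m_assoc))
AIC(m_pos, m_assoc)
```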