Children’s vocabulary as a function of caregiver statement type

Mother-child interactions were studied, and each of the mother’s comments was classified as either a statement or a question. The child’s vocabulary was evaluated and scored. The mother’s statement type was then compared with the child’s vocabulary score.

foi <- "/Users/mqbxgjk2/Desktop/Lily_Project/data.txt"
dat <- read.table(foi, sep = "\t", header = TRUE)
head(dat)
##   ID MOT_questions MOT_statements CHI_VoCD
## 1 45            53             31    11.95
## 2 47            38             31    39.37
## 3 49            81             44    55.53
## 4 55            39             57    29.64
## 5 25            36             47    58.90
## 6 57            43             90    44.24

The columns in the data are the pair identification number (‘ID’), the mother’s comments posed as questions (‘MOT_questions’), her comments posed as statements (‘MOT_statements’), and the child’s vocabulary score (‘CHI_VoCD’).

Next we examine the distribution of each variable using box-and-whisker plots (boxplots). These are read as follows: the box spans the interquartile range (IQR), the heavy solid line inside the box marks the median, and each whisker extends to the most extreme data point that lies within 1.5 times the IQR of the box edge. Points beyond the whiskers are drawn individually as outliers.
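The whisker rule described above can be checked numerically. The following is a minimal sketch on a made-up vector x (hypothetical values, not taken from data.txt). It assumes R's default settings, under which boxplot() draws the box edges at Tukey's hinges from fivenum(); these are close to, but not always identical with, the quartiles returned by quantile().

```r
# Hypothetical values, for illustration only (not from data.txt)
x <- c(6, 22, 25, 31, 40, 44, 45, 50, 52, 58, 125)

hinges  <- fivenum(x)[c(2, 4)]   # lower and upper hinge (the box edges)
box_len <- diff(hinges)          # hinge spread, approximately the IQR

# Fences sit 1.5 box lengths beyond each box edge
fence_lo <- hinges[1] - 1.5 * box_len
fence_hi <- hinges[2] + 1.5 * box_len

# Whiskers stop at the most extreme observations inside the fences
whisker_lo <- min(x[x >= fence_lo])
whisker_hi <- max(x[x <= fence_hi])
outliers   <- x[x < fence_lo | x > fence_hi]

# boxplot.stats() is the helper that boxplot() uses for the same quantities
bs <- boxplot.stats(x)
print(c(whisker_lo, whisker_hi))
print(outliers)
```

Comparing whisker_lo, whisker_hi and outliers with bs$stats[c(1, 5)] and bs$out shows that the hand calculation and the built-in helper agree.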

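titles <- NULL  # readable axis labels, reused by the later plots

par(las = 1, bty = 'n', mfrow = c(3, 1))
for (ii in 2:4) {
    # Turn the column name into a label, e.g. "MOT questions"
    tmp <- gsub("_", " ", colnames(dat)[ii])
    titles <- c(titles, tmp)
    boxplot(dat[, ii], horizontal = TRUE, xlab = tmp, width = 1.5)
    print(summary(dat[, ii]))
}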
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    18.0    31.0    43.0    51.7    70.0    99.0
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6.00   25.75   44.50   44.70   52.50  125.00

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   11.95   33.90   44.63   44.90   57.05   78.19

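First we regress the child’s vocabulary score on the mother’s questions, and then on her statements. In both fits the slope does not differ significantly from zero (questions: p = 0.636; statements: p = 0.162).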

par(las =1, bty ='n', mfrow=c(1,2))

plot(dat$MOT_questions, dat$CHI_VoCD, xlab = titles[1], ylab = titles[3], xlim=c(0, 130))
mod1 <- lm(dat$CHI_VoCD~dat$MOT_questions)
abline(mod1)
print(titles[1])
## [1] "MOT questions"
summary(mod1)
## 
## Call:
## lm(formula = dat$CHI_VoCD ~ dat$MOT_questions)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -33.051 -11.369  -1.565   9.762  33.962 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       40.90201    9.21555   4.438 0.000317 ***
## dat$MOT_questions  0.07734    0.16066   0.481 0.636036    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.85 on 18 degrees of freedom
## Multiple R-squared:  0.01271,    Adjusted R-squared:  -0.04214 
## F-statistic: 0.2317 on 1 and 18 DF,  p-value: 0.636
plot(dat$MOT_statements, dat$CHI_VoCD,xlab = titles[2], ylab = titles[3], xlim=c(0, 130))
mod2 <- lm(dat$CHI_VoCD~dat$MOT_statements)
abline(mod2)

print(titles[2])
## [1] "MOT statements"
summary(mod2)
## 
## Call:
## lm(formula = dat$CHI_VoCD ~ dat$MOT_statements)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.195  -9.930  -2.560   6.466  38.257 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         35.9114     7.2463   4.956 0.000102 ***
## dat$MOT_statements   0.2011     0.1380   1.457 0.162373    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.99 on 18 degrees of freedom
## Multiple R-squared:  0.1055, Adjusted R-squared:  0.05578 
## F-statistic: 2.123 on 1 and 18 DF,  p-value: 0.1624
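
The two fits above pass the columns as dat$...; an alternative, sketched here, gives lm() a formula plus a data argument, which keeps the coefficient names short and lets predict() work on new values. To stay self-contained, the sketch uses only the six rows printed by head(dat), stored in a hypothetical data frame dat6, so its numbers will differ from the 20-pair results above.

```r
# The six rows shown by head(dat); dat6 is a hypothetical name for this subset
dat6 <- data.frame(
  ID             = c(45, 47, 49, 55, 25, 57),
  MOT_questions  = c(53, 38, 81, 39, 36, 43),
  MOT_statements = c(31, 31, 44, 57, 47, 90),
  CHI_VoCD       = c(11.95, 39.37, 55.53, 29.64, 58.90, 44.24)
)

# Same model form as mod1, written with a data argument
fit_q <- lm(CHI_VoCD ~ MOT_questions, data = dat6)

print(coef(fit_q))      # intercept and slope
print(confint(fit_q))   # 95% confidence intervals for both coefficients

# Predicted vocabulary score at an arbitrary value of the predictor
pred <- predict(fit_q, newdata = data.frame(MOT_questions = 50))
print(pred)
```

With only six pairs the intervals are very wide; the point of the sketch is the calling convention, not the estimates.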

Then we compare the two predictors with each other by regressing the mother’s questions on her statements.

par(las =1, bty ='n', mfrow=c(1,1))
plot(dat$MOT_statements, dat$MOT_questions,xlab = titles[2], ylab = titles[1],xlim=c(0, 130),ylim=c(0, 130))
mod3 <- lm(dat$MOT_questions~dat$MOT_statements)
summary(mod3)
## 
## Call:
## lm(formula = dat$MOT_questions ~ dat$MOT_statements)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.164 -16.664  -8.089  17.175  47.193 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)   
## (Intercept)         35.7408    10.2578   3.484  0.00265 **
## dat$MOT_statements   0.3570     0.1954   1.827  0.08430 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24.06 on 18 degrees of freedom
## Multiple R-squared:  0.1565, Adjusted R-squared:  0.1096 
## F-statistic: 3.339 on 1 and 18 DF,  p-value: 0.0843
abline(mod3)

Then we fit a multiple linear regression of the vocabulary score on both predictors.

mod <- lm(dat[,4] ~dat$MOT_statements+dat$MOT_questions)
summary(mod)
## 
## Call:
## lm(formula = dat[, 4] ~ dat$MOT_statements + dat$MOT_questions)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.116 -10.216  -2.103   6.830  38.258 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)   
## (Intercept)        36.36835    9.64697   3.770  0.00153 **
## dat$MOT_statements  0.20566    0.15462   1.330  0.20105   
## dat$MOT_questions  -0.01278    0.17130  -0.075  0.94138   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.48 on 17 degrees of freedom
## Multiple R-squared:  0.1058, Adjusted R-squared:  0.000569 
## F-statistic: 1.005 on 2 and 17 DF,  p-value: 0.3866
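
One way to ask whether questions add anything once statements are already in the model is a nested-model F test with anova(). This is a minimal sketch, again using only the six head(dat) rows (hypothetical data frame dat6), so the values are illustrative and will not match the output above. For a single added predictor, the F test p-value equals the t test p-value of that coefficient in the larger model.

```r
# The six rows shown by head(dat); dat6 is a hypothetical name for this subset
dat6 <- data.frame(
  MOT_questions  = c(53, 38, 81, 39, 36, 43),
  MOT_statements = c(31, 31, 44, 57, 47, 90),
  CHI_VoCD       = c(11.95, 39.37, 55.53, 29.64, 58.90, 44.24)
)

fit_s  <- lm(CHI_VoCD ~ MOT_statements, data = dat6)                  # smaller model
fit_sq <- lm(CHI_VoCD ~ MOT_statements + MOT_questions, data = dat6)  # adds questions

# F test of the larger model against the smaller one
cmp <- anova(fit_s, fit_sq)
print(cmp)

# The two p-values below test the same hypothesis and should coincide
p_f <- cmp[["Pr(>F)"]][2]
p_t <- summary(fit_sq)$coefficients["MOT_questions", "Pr(>|t|)"]
print(all.equal(p_f, p_t))
```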

Correlations

Interpretation of Correlation Coefficients from cor.test in R

  • Strong correlation: \(|r| \geq 0.7\)
    • Indicates a strong association where the variables move together either in the same direction (positive correlation) or in opposite directions (negative correlation) consistently.
  • Moderate correlation: \(0.3 \leq |r| < 0.7\)
    • Suggests a moderate level of association, where the variables have a noticeable relationship, but there are other factors also influencing the outcomes.
  • Weak correlation: \(|r| < 0.3\)
    • Indicates a weak association, suggesting that there is little linear relationship between the variables.

Additional Considerations

  • These thresholds are not rigid and can vary slightly depending on the field of study or specific context.
  • It’s important to consider the significance of the correlation coefficient, usually indicated by a p-value in the output of cor.test, which tells you whether the observed correlation is statistically significant.
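
The thresholds listed above can be written as a small helper. This is a sketch only: classify_r is a hypothetical function name, not part of base R or of cor.test, and the cut-offs are the conventional ones quoted in the list rather than fixed rules.

```r
# Label a correlation coefficient using the |r| thresholds listed above
classify_r <- function(r) {
  a <- abs(r)
  ifelse(a >= 0.7, "strong",
         ifelse(a >= 0.3, "moderate", "weak"))
}

# Applied to the three coefficients reported by cor.test in this section
labels <- classify_r(c(0.1127425, 0.3247757, 0.3955456))
print(labels)
# [1] "weak"     "moderate" "moderate"
```

The label describes the size of r only; whether the correlation is statistically distinguishable from zero is a separate question answered by the p-value.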
cor.test(dat$MOT_questions, dat$CHI_VoCD)
## 
##  Pearson's product-moment correlation
## 
## data:  dat$MOT_questions and dat$CHI_VoCD
## t = 0.4814, df = 18, p-value = 0.636
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3470951  0.5288772
## sample estimates:
##       cor 
## 0.1127425
print("WEAK")
## [1] "WEAK"
cor.test(dat$MOT_statements, dat$CHI_VoCD)
## 
##  Pearson's product-moment correlation
## 
## data:  dat$MOT_statements and dat$CHI_VoCD
## t = 1.4569, df = 18, p-value = 0.1624
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1375077  0.6708779
## sample estimates:
##       cor 
## 0.3247757
print("MODERATE to WEAK")
## [1] "MODERATE to WEAK"
cor.test(dat$MOT_questions, dat$MOT_statements)
## 
##  Pearson's product-moment correlation
## 
## data:  dat$MOT_questions and dat$MOT_statements
## t = 1.8272, df = 18, p-value = 0.0843
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.05694214  0.71322546
## sample estimates:
##       cor 
## 0.3955456
print("MODERATE")
## [1] "MODERATE"
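
The correlation tests and the simple regressions above report the same information in two forms: with a single predictor, r squared equals the Multiple R-squared, and the t statistic is r * sqrt(df) / sqrt(1 - r^2). A short check using only the values printed above (the vector names are labels chosen here for readability):

```r
# Correlations reported by the three cor.test calls
r <- c(questions_vocab      = 0.1127425,
       statements_vocab     = 0.3247757,
       questions_statements = 0.3955456)
df <- 18  # residual degrees of freedom reported for each test

# Should match the Multiple R-squared of mod1, mod2 and mod3
r2 <- r^2
print(round(r2, 4))

# Should match the t values reported by cor.test and by the lm slopes
t_stat <- r * sqrt(df) / sqrt(1 - r^2)
print(round(t_stat, 4))
```

Up to rounding, these reproduce the R-squared values 0.01271, 0.1055 and 0.1565 and the t statistics 0.4814, 1.4569 and 1.8272 shown earlier.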