As usual, we’d like you to use R (studio, markdown, etc.) to complete this homework. Please submit all code and your relevant output as you have been doing. For those who enjoy working on RMarkdown (or dislike it but want to learn), this assignment has some bare-bones code for you to work with. Feel free to write an entirely novel code file if you’d prefer.
This week, you’ll once again be working with the nursery dataset.
knitr::opts_knit$set(root.dir = "C:/Users/Spencer/Desktop/R Directory")
source('http://psych.colorado.edu/~jclab/R/mcSummaryLm.R')
library(lmSupport)
## Warning: package 'lmSupport' was built under R version 3.6.3
## Registered S3 methods overwritten by 'lme4':
## method from
## cooks.distance.influence.merMod car
## influence.merMod car
## dfbeta.influence.merMod car
## dfbetas.influence.merMod car
library(tinytex)
library(knitr)
opts_chunk$set(tidy.opts=list(width.cutoff=70),tidy=TRUE)
You may have noticed another variable in the nursery.csv dataset. This new variable, kinder, is each child’s kindergarten score on the same Peabody test. Using these data, redo an ANOVA similar to the one you did in homework 13, examining the effects of nursery school versus no nursery school and of parents’ college education on first grade Peabody scores. (As before, you will need to create contrast codes that code the nursery school variable, parents’ education, and their interaction–this time, plan your contrasts so you first compare children of parents with no college education to those with one or two college educated parents, and then compare whether having one or two college educated parents is better. Provide a clear set of codes either as R code, in a contrast code table, or both.)
myNursery = read.csv("nursery.csv")
source("http://psych.colorado.edu/~jclab/R/mcSummaryLm.R")
# Define contrast codes: nursCode: -1/2 if 'no', +1/2 if 'yes' - tests
# for main effect of nursery on PEABODY scores collegeNone: 2/3 if
# none, -1/3 if one or both - tests for significant mean difference
# between PEABODY scores of no-college kids vs. one/both-college kids
# collegeOneorTwo: -1/2 if one, +1/2 if both - tests for significant
# mean difference between PEABODY scores of one-college vs. two-college
# kids
myNursery$nursCode = -1/2 * (myNursery$NURSERY == "no") + 1/2 * (myNursery$NURSERY ==
"yes")
myNursery$collegeNone = 2/3 * (myNursery$COLLEGE == "none") - 1/3 * (myNursery$COLLEGE ==
"one") - 1/3 * (myNursery$COLLEGE == "both")
myNursery$collegeOneorTwo = 0 * (myNursery$COLLEGE == "none") + 1/2 * (myNursery$COLLEGE ==
"one") - 1/2 * (myNursery$COLLEGE == "both")
# nursANOVA = aov(myNursery$PEABODY ~
# myNursery$nursCode*(myNursery$collegeNone +
# myNursery$collegeOneorTwo))
# print(model.tables(nursANOVA,'means'),digits=4) #recover means of
# each group and subgroup
nursModel = lm(myNursery$PEABODY ~ myNursery$nursCode * (myNursery$collegeNone +
myNursery$collegeOneorTwo))
mcSummary(nursModel) #gives results for 1-df tests
## Loading required package: car
## Warning: package 'car' was built under R version 3.6.3
## Loading required package: carData
## Warning: package 'carData' was built under R version 3.6.3
## lm(formula = myNursery$PEABODY ~ myNursery$nursCode * (myNursery$collegeNone +
## myNursery$collegeOneorTwo))
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 998.205 5 199.641 0.224 2.535 0.042
## Error 3464.775 44 78.745
## Corr Total 4462.980 49 91.081
##
## RMSE AdjEtaSq
## 8.874 0.135
##
## Coefficients
## Est StErr t
## (Intercept) 75.932 1.626 46.706
## myNursery$nursCode 8.114 3.251 2.495
## myNursery$collegeNone -5.929 3.004 -1.974
## myNursery$collegeOneorTwo -5.683 4.437 -1.281
## myNursery$nursCode:myNursery$collegeNone 2.392 6.008 0.398
## myNursery$nursCode:myNursery$collegeOneorTwo -10.367 8.874 -1.168
## SSR(3) EtaSq tol
## (Intercept) 171776.910 0.980 NA
## myNursery$nursCode 490.359 0.124 0.621
## myNursery$collegeNone 306.807 0.081 0.727
## myNursery$collegeOneorTwo 129.201 0.036 0.727
## myNursery$nursCode:myNursery$collegeNone 12.480 0.004 0.715
## myNursery$nursCode:myNursery$collegeOneorTwo 107.468 0.030 0.539
## CI_2.5 CI_97.5 p
## (Intercept) 72.655 79.208 0.000
## myNursery$nursCode 1.561 14.667 0.016
## myNursery$collegeNone -11.983 0.125 0.055
## myNursery$collegeOneorTwo -14.625 3.259 0.207
## myNursery$nursCode:myNursery$collegeNone -9.716 14.499 0.692
## myNursery$nursCode:myNursery$collegeOneorTwo -28.251 7.517 0.249
fullModelA = lm(myNursery$PEABODY ~ myNursery$nursCode * (myNursery$collegeNone +
myNursery$collegeOneorTwo))
college_2DF_c = lm(myNursery$PEABODY ~ myNursery$nursCode + myNursery$nursCode:myNursery$collegeNone +
myNursery$nursCode:myNursery$collegeOneorTwo)
interaction_2DF_c = lm(myNursery$PEABODY ~ myNursery$nursCode + myNursery$collegeNone +
myNursery$collegeOneorTwo)
modelCompare(ModelA = fullModelA, ModelC = college_2DF_c) #conduct college 2-df test
## SSE (Compact) = 3781.51
## SSE (Augmented) = 3464.775
## Delta R-Squared = 0.0709694
## Partial Eta-Squared (PRE) = 0.08375887
## F(2,44) = 2.011146, p = 0.1459537
modelCompare(ModelA = fullModelA, ModelC = interaction_2DF_c)
## SSE (Compact) = 3670.71
## SSE (Augmented) = 3464.775
## Delta R-Squared = 0.04614294
## Partial Eta-Squared (PRE) = 0.05610223
## F(2,44) = 1.307609, p = 0.280769
Now, perform an analysis of covariance, controlling for kindergarten Peabody scores. In other words, we now want to know whether nursery school and parents’ education are useful in predicting first grade scores when we control for kids’ kindergarten scores. If they are, then that would mean that those effects somehow get activated during the first grade. If they are not, then that would suggest that the effects of nursery school and parents’ education have already been captured by kindergarten performance.
myNursery = read.csv("nursery.csv")
source("http://psych.colorado.edu/~jclab/R/mcSummaryLm.R")
# Define contrast codes: nursCode: -1/2 if 'no', +1/2 if 'yes' - tests
# for main effect of nursery on PEABODY scores collegeNone: 2/3 if
# none, -1/3 if one or both - tests for significant mean difference
# between PEABODY scores of no-college kids vs. one/both-college kids
# collegeOneorTwo: -1/2 if one, +1/2 if both - tests for significant
# mean difference between PEABODY scores of one-college vs. two-college
# kids
myNursery$nursCode = -1/2 * (myNursery$NURSERY == "no") + 1/2 * (myNursery$NURSERY ==
"yes")
myNursery$collegeNone = 2/3 * (myNursery$COLLEGE == "none") - 1/3 * (myNursery$COLLEGE ==
"one" | myNursery$COLLEGE == "both")
myNursery$collegeOneorTwo = 0 * (myNursery$COLLEGE == "none") + 1/2 * (myNursery$COLLEGE ==
"one") - 1/2 * (myNursery$COLLEGE == "both")
# nursANOVA = aov(myNursery$PEABODY ~
# myNursery$nursCode*(myNursery$collegeNone +
# myNursery$collegeOneorTwo) + myNursery$KINDER)
# print(model.tables(nursANOVA,'means'),digits=4) #recover means of
# each group and subgroup
nursModel = lm(myNursery$PEABODY ~ myNursery$nursCode * (myNursery$collegeNone +
myNursery$collegeOneorTwo) + myNursery$KINDER)
mcSummary(nursModel) #gives results for 1-df tests
## lm(formula = myNursery$PEABODY ~ myNursery$nursCode * (myNursery$collegeNone +
## myNursery$collegeOneorTwo) + myNursery$KINDER)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 3578.325 6 596.388 0.802 28.988 0
## Error 884.655 43 20.573
## Corr Total 4462.980 49 91.081
##
## RMSE AdjEtaSq
## 4.536 0.774
##
## Coefficients
## Est StErr t SSR(3)
## (Intercept) 14.158 5.578 2.538 132.517
## myNursery$nursCode 3.316 1.716 1.932 76.792
## myNursery$collegeNone -2.331 1.569 -1.486 45.418
## myNursery$collegeOneorTwo -0.286 2.319 -0.123 0.312
## myNursery$KINDER 0.900 0.080 11.199 2580.120
## myNursery$nursCode:myNursery$collegeNone -1.207 3.088 -0.391 3.143
## myNursery$nursCode:myNursery$collegeOneorTwo -3.170 4.581 -0.692 9.849
## EtaSq tol CI_2.5 CI_97.5
## (Intercept) 0.130 NA 2.908 25.408
## myNursery$nursCode 0.080 0.582 -0.145 6.777
## myNursery$collegeNone 0.049 0.697 -5.494 0.833
## myNursery$collegeOneorTwo 0.000 0.696 -4.961 4.390
## myNursery$KINDER 0.745 0.842 0.738 1.062
## myNursery$nursCode:myNursery$collegeNone 0.004 0.707 -7.433 5.020
## myNursery$nursCode:myNursery$collegeOneorTwo 0.011 0.528 -12.408 6.069
## p
## (Intercept) 0.015
## myNursery$nursCode 0.060
## myNursery$collegeNone 0.145
## myNursery$collegeOneorTwo 0.903
## myNursery$KINDER 0.000
## myNursery$nursCode:myNursery$collegeNone 0.698
## myNursery$nursCode:myNursery$collegeOneorTwo 0.493
fullModelA = lm(myNursery$PEABODY ~ myNursery$nursCode * (myNursery$collegeNone +
myNursery$collegeOneorTwo) + myNursery$KINDER)
college_2DF_cCov = lm(myNursery$PEABODY ~ myNursery$nursCode + myNursery$nursCode:myNursery$collegeNone +
myNursery$nursCode:myNursery$collegeOneorTwo + myNursery$KINDER)
interaction_2DF_cCov = lm(myNursery$PEABODY ~ myNursery$nursCode + myNursery$collegeNone +
myNursery$collegeOneorTwo + myNursery$KINDER)
modelCompare(ModelA = fullModelA, ModelC = college_2DF_cCov) #conduct college 2-df test
## SSE (Compact) = 941.542
## SSE (Augmented) = 884.6545
## Delta R-Squared = 0.01274653
## Partial Eta-Squared (PRE) = 0.06041949
## F(2,43) = 1.382552, p = 0.2618671
modelCompare(ModelA = fullModelA, ModelC = interaction_2DF_cCov) #conduct interaction 2-df test
## SSE (Compact) = 894.617
## SSE (Augmented) = 884.6545
## Delta R-Squared = 0.002232249
## Partial Eta-Squared (PRE) = 0.01113603
## F(2,43) = 0.2421209, p = 0.7860242
Present source tables for the two analyses (including the 2-df tests of education and the interaction) and discuss the differences between the ANCOVA results and those you reached in the ANOVA analysis when kindergarten scores were not controlled.
Without KINDER covariate:
| b | df | SS | MS | F* | p | PRE | |
|---|---|---|---|---|---|---|---|
| Model | NA | 5 | 998.2 | 199.6 | 2.535 | p<.05 | 0.223 |
| Nursery | |||||||
| nursCode | 8.114 | 1 | 490.33 | 490.3 | 6.22 | p<.05 | 0.124 |
| College | NA | 2 | 317.35 | 158.36 | 2.01 | p>.05 | 0.083 |
| collegeNone | -5.92 | 1 | 306.8 | 306.8 | 3.80 | p>.055 | 0.081 |
| collegeOneorTwo | -5.683 | 1 | 129.2 | 129.2 | 1.64 | p>.05 | 0.036 |
| Interaction | NA | 2 | 205.93 | 102.9 | 1.30 | p>.05 | 0.056 |
| nurs*None | 2.39 | 1 | 12.48 | 12.48 | .158 | p>.05 | 0.004 |
| nurs*OneorTwo | -10.3 | 1 | 107.4 | 107.4 | 1.36 | p>.05 | 0.030 |
| Error | NA | 44 | 3464.775 | 78.74 | NA | NA | NA |
| Total | NA | 49 | 4462.98 | NA | NA | NA | NA |
With KINDER covariate:
| b | df | SS | MS | F* | p | PRE | |
|---|---|---|---|---|---|---|---|
| Model | NA | 6 | 3578.3 | 596.3 | 28.9 | p<<.05 | 0.802 |
| Nursery | |||||||
| nursCode | 3.316 | 1 | 76.79 | 76.79 | 3.73 | p>.05 | 0.08 |
| College | NA | 2 | 56.88 | 28.44 | 1.38 | p>.05 | 0.060 |
| collegeNone | -2.33 | 1 | 45.41 | 45.41 | 2.20 | p>.05 | 0.049 |
| collegeOneorTwo | -.286 | 1 | 0.312 | 0.312 | .0151 | p>.05 | 0.00 |
| Interaction | NA | 2 | 9.95 | 4.97 | .242 | p>.05 | .011 |
| nurs*None | -1.207 | 1 | 3.143 | 3.143 | .152 | p>.05 | 0.004 |
| nurs*OneorTwo | -3.17 | 1 | 9.849 | 9.849 | .478 | p>.05 | 0.011 |
| KINDER | .900 | 1 | 2580.1 | 2580.1 | 125.4 | p<<.05 | 0.745 |
| Error | NA | 43 | 884.6 | 20.57 | NA | NA | NA |
| Total | NA | 49 | 4462.98 | NA | NA | NA | NA |
As can be seen from the tables above, inclusion of the KINDER covariate caused several important changes to the model results. For one, the effects of nursery were rendered non-significant, suggesting that the reason this variable significantly predicted PEABODY scores in the ANOVA model was due to its covariance with KINDER scores. Moreover, although non-significant, the beta estimates of the other predictors all decreased in the ANCOVA model, implying their their effects are inflated for similar reasons.
Derive the adjusted means and discuss the differences between these and the raw means. Interpret the slope of the nursery school contrast in the context of the original ANOVA model and now in the context of the ANCOVA model.
# compute subgroup KINDER means
k = 1
thisMean_KINDER = vector(mode = "numeric", length = 6)
for (thisNursery in c("no", "yes")) {
for (thisCollege in c("both", "none", "one")) {
thisMean_KINDER[k] = mean(myNursery$KINDER[myNursery$COLLEGE ==
thisCollege & myNursery$NURSERY == thisNursery])
k = k + 1
}
}
grandMean_KINDER = mean(thisMean_KINDER) #compute KINDER grand mean of means
# compute subgroup PEABODY means
k = 1
thisMean_PEA = vector(mode = "numeric", length = 6)
for (thisNursery in c("no", "yes")) {
for (thisCollege in c("both", "none", "one")) {
thisMean_PEA[k] = mean(myNursery$PEABODY[myNursery$COLLEGE == thisCollege &
myNursery$NURSERY == thisNursery])
k = k + 1
}
}
grandMean_PEA = mean(thisMean_PEA) #compute PEABODY grand mean of means
adjustedMeans = thisMean_PEA - 0.9 * (thisMean_KINDER - grandMean_KINDER)
# compute adjusted PEABODY means
print("adjusted means")
## [1] "adjusted means"
adjustedMeans
## [1] 74.20000 73.12500 75.50000 79.50000 75.63333 77.63333
print("unadjusted means")
## [1] "unadjusted means"
thisMean_PEA
## [1] 74.50000 67.12500 74.00000 87.00000 76.83333 76.13333
# as a check that the means have been adjusted correctly...
mean(adjustedMeans[4:6]) - mean(adjustedMeans[1:3]) #recovers the nursCode ANCOVA beta
## [1] 3.313889
mean(adjustedMeans[c(3, 6)]) - mean(adjustedMeans[c(1, 4)]) #recovers the collegeOneorTwo ANCOVA beta
## [1] -0.2833333
The unadjusted and adjusted means for the six PEABODY subgroups are different in that the latter control for the confounding effects of a child’s KINDER score. That is, KINDER scores, which explain much of the variation in later PEABODY results, may not be uniformly distributed across the six groups. If a particular subgroup attains a higher score, then, the effects of that group’s defining characteristics (here parents’ degree status and child’s nursery school attendance) will be inflated in an ANOVA design. The adjusted means control for this effect by subtracting the term \(b_{KINDER}(\overline{W}_k-\overline{W})\) from the unadjusted mean. Here the unadjusted means are corrected to the extent that that group’s mean attains higher KINDER scores and to the extent that such scores matter in predicting PEABODY results.
As a result, the interpretation of the nursCode variable changes slightly. In the case of unadjusted means, this code registers the uncontrolled main effect of a child’s nursery school attendance on PEABODY scores; mathematically, the beta estimate for the nursCode variable is given by the difference in the marginal means of the “yes” and “no” nursery subgroups. That is, the nursCode beta is the average difference between the two nursery subgroups that may be due to the effects of nursery attendance or the correlated effects of KINDER scores. For the adjusted means of the ANCOVA design, the nursCode variable records the controlled main effect of nursery school attendance that cannot be explained by the correlated effects of KINDER scores.
Test the homogeneity of regression assumption: do we need to worry that the effects of KINDER on PEABODY vary across the cells of our 3x2 design?
testHR_modelA = lm(myNursery$PEABODY ~ myNursery$nursCode * (myNursery$collegeNone +
myNursery$collegeOneorTwo) + myNursery$KINDER + myNursery$KINDER *
(myNursery$collegeNone + myNursery$collegeOneorTwo + myNursery$nursCode) +
myNursery$KINDER * myNursery$collegeNone * myNursery$nursCode + myNursery$KINDER *
myNursery$collegeOneorTwo * myNursery$nursCode)
testHR_modelC = lm(myNursery$PEABODY ~ myNursery$nursCode * (myNursery$collegeNone +
myNursery$collegeOneorTwo) + myNursery$KINDER)
modelCompare(ModelA = testHR_modelA, ModelC = testHR_modelC)
## SSE (Compact) = 884.6545
## SSE (Augmented) = 813.3027
## Delta R-Squared = 0.0159875
## Partial Eta-Squared (PRE) = 0.08065508
## F(5,38) = 0.6667558, p = 0.6509399
No, we do not need to worry about the homogeneity of regression assumption because a model comparison involving the various KINDER-by-predictor two- and three-way interaction terms fails to yields a significant result.