Introductory Instructions

As usual, we’d like you to use R (RStudio, R Markdown, etc.) to complete this homework. Please submit all code and your relevant output as you have been doing. For those who enjoy working in R Markdown (or dislike it but want to learn), this assignment includes some bare-bones code for you to work with. Feel free to write an entirely new code file if you’d prefer.

This week, you’ll once again be working with the nursery dataset.

knitr::opts_knit$set(root.dir = "C:/Users/Spencer/Desktop/R Directory")
source('http://psych.colorado.edu/~jclab/R/mcSummaryLm.R')
library(lmSupport)
## Warning: package 'lmSupport' was built under R version 3.6.3
## Registered S3 methods overwritten by 'lme4':
##   method                          from
##   cooks.distance.influence.merMod car 
##   influence.merMod                car 
##   dfbeta.influence.merMod         car 
##   dfbetas.influence.merMod        car
library(tinytex)
library(knitr)
opts_chunk$set(tidy.opts=list(width.cutoff=70),tidy=TRUE)

Question 1

You may have noticed another variable in the nursery.csv dataset. This new variable, kinder, is each child’s kindergarten score on the same Peabody test. Using these data, redo an ANOVA similar to the one you did in homework 13, examining the effects of nursery school versus no nursery school and of parents’ college education on first grade Peabody scores. (As before, you will need to create contrast codes that code the nursery school variable, parents’ education, and their interaction–this time, plan your contrasts so you first compare children of parents with no college education to those with one or two college educated parents, and then compare whether having one or two college educated parents is better. Provide a clear set of codes either as R code, in a contrast code table, or both.)

myNursery = read.csv("nursery.csv")
source("http://psych.colorado.edu/~jclab/R/mcSummaryLm.R")

# Define contrast codes:
#   nursCode:        -1/2 if NURSERY == 'no', +1/2 if 'yes' -- tests the main
#                    effect of nursery school on PEABODY scores
#   collegeNone:     +2/3 if COLLEGE == 'none', -1/3 if 'one' or 'both' -- tests
#                    the mean difference in PEABODY scores between children of
#                    no-college parents and children with one or two
#                    college-educated parents
#   collegeOneorTwo: +1/2 if COLLEGE == 'one', -1/2 if 'both' -- tests the mean
#                    difference in PEABODY scores between one-college and
#                    two-college children

myNursery$nursCode = -1/2 * (myNursery$NURSERY == "no") + 1/2 * (myNursery$NURSERY == 
    "yes")
myNursery$collegeNone = 2/3 * (myNursery$COLLEGE == "none") - 1/3 * (myNursery$COLLEGE == 
    "one") - 1/3 * (myNursery$COLLEGE == "both")
myNursery$collegeOneorTwo = 0 * (myNursery$COLLEGE == "none") + 1/2 * (myNursery$COLLEGE == 
    "one") - 1/2 * (myNursery$COLLEGE == "both")

# nursANOVA = aov(myNursery$PEABODY ~
# myNursery$nursCode*(myNursery$collegeNone +
# myNursery$collegeOneorTwo))
# print(model.tables(nursANOVA,'means'),digits=4) #recover means of
# each group and subgroup

nursModel = lm(myNursery$PEABODY ~ myNursery$nursCode * (myNursery$collegeNone + 
    myNursery$collegeOneorTwo))
mcSummary(nursModel)  #gives results for 1-df tests
## Loading required package: car
## Warning: package 'car' was built under R version 3.6.3
## Loading required package: carData
## Warning: package 'carData' was built under R version 3.6.3
## lm(formula = myNursery$PEABODY ~ myNursery$nursCode * (myNursery$collegeNone + 
##     myNursery$collegeOneorTwo))
## 
## Omnibus ANOVA
##                  SS df      MS EtaSq     F     p
## Model       998.205  5 199.641 0.224 2.535 0.042
## Error      3464.775 44  78.745                  
## Corr Total 4462.980 49  91.081                  
## 
##   RMSE AdjEtaSq
##  8.874    0.135
## 
## Coefficients
##                                                  Est StErr      t
## (Intercept)                                   75.932 1.626 46.706
## myNursery$nursCode                             8.114 3.251  2.495
## myNursery$collegeNone                         -5.929 3.004 -1.974
## myNursery$collegeOneorTwo                     -5.683 4.437 -1.281
## myNursery$nursCode:myNursery$collegeNone       2.392 6.008  0.398
## myNursery$nursCode:myNursery$collegeOneorTwo -10.367 8.874 -1.168
##                                                  SSR(3) EtaSq   tol
## (Intercept)                                  171776.910 0.980    NA
## myNursery$nursCode                              490.359 0.124 0.621
## myNursery$collegeNone                           306.807 0.081 0.727
## myNursery$collegeOneorTwo                       129.201 0.036 0.727
## myNursery$nursCode:myNursery$collegeNone         12.480 0.004 0.715
## myNursery$nursCode:myNursery$collegeOneorTwo    107.468 0.030 0.539
##                                               CI_2.5 CI_97.5     p
## (Intercept)                                   72.655  79.208 0.000
## myNursery$nursCode                             1.561  14.667 0.016
## myNursery$collegeNone                        -11.983   0.125 0.055
## myNursery$collegeOneorTwo                    -14.625   3.259 0.207
## myNursery$nursCode:myNursery$collegeNone      -9.716  14.499 0.692
## myNursery$nursCode:myNursery$collegeOneorTwo -28.251   7.517 0.249
# Augmented model: nursery, both college codes, and their interactions (all five terms)
fullModelA = lm(myNursery$PEABODY ~ myNursery$nursCode * (myNursery$collegeNone + 
    myNursery$collegeOneorTwo))
# Compact model for the college 2-df test: drops the two college main-effect terms
college_2DF_c = lm(myNursery$PEABODY ~ myNursery$nursCode + myNursery$nursCode:myNursery$collegeNone + 
    myNursery$nursCode:myNursery$collegeOneorTwo)
# Compact model for the interaction 2-df test: drops the two interaction terms
interaction_2DF_c = lm(myNursery$PEABODY ~ myNursery$nursCode + myNursery$collegeNone + 
    myNursery$collegeOneorTwo)


modelCompare(ModelA = fullModelA, ModelC = college_2DF_c)  #conduct college 2-df test
## SSE (Compact) =  3781.51 
## SSE (Augmented) =  3464.775 
## Delta R-Squared =  0.0709694 
## Partial Eta-Squared (PRE) =  0.08375887 
## F(2,44) = 2.011146, p = 0.1459537
modelCompare(ModelA = fullModelA, ModelC = interaction_2DF_c)  #conduct interaction 2-df test
## SSE (Compact) =  3670.71 
## SSE (Augmented) =  3464.775 
## Delta R-Squared =  0.04614294 
## Partial Eta-Squared (PRE) =  0.05610223 
## F(2,44) = 1.307609, p = 0.280769

Question 2

Now, perform an analysis of covariance, controlling for kindergarten Peabody scores. In other words, we now want to know whether nursery school and parents’ education are useful in predicting first grade scores when we control for kids’ kindergarten scores. If they are, then that would mean that those effects somehow get activated during the first grade. If they are not, then that would suggest that the effects of nursery school and parents’ education have already been captured by kindergarten performance.

myNursery = read.csv("nursery.csv")
source("http://psych.colorado.edu/~jclab/R/mcSummaryLm.R")

# Define contrast codes:
#   nursCode:        -1/2 if NURSERY == 'no', +1/2 if 'yes' -- tests the main
#                    effect of nursery school on PEABODY scores
#   collegeNone:     +2/3 if COLLEGE == 'none', -1/3 if 'one' or 'both' -- tests
#                    the mean difference in PEABODY scores between children of
#                    no-college parents and children with one or two
#                    college-educated parents
#   collegeOneorTwo: +1/2 if COLLEGE == 'one', -1/2 if 'both' -- tests the mean
#                    difference in PEABODY scores between one-college and
#                    two-college children

myNursery$nursCode = -1/2 * (myNursery$NURSERY == "no") + 1/2 * (myNursery$NURSERY == 
    "yes")
myNursery$collegeNone = 2/3 * (myNursery$COLLEGE == "none") - 1/3 * (myNursery$COLLEGE == 
    "one" | myNursery$COLLEGE == "both")
myNursery$collegeOneorTwo = 0 * (myNursery$COLLEGE == "none") + 1/2 * (myNursery$COLLEGE == 
    "one") - 1/2 * (myNursery$COLLEGE == "both")

# nursANOVA = aov(myNursery$PEABODY ~
# myNursery$nursCode*(myNursery$collegeNone +
# myNursery$collegeOneorTwo) + myNursery$KINDER)
# print(model.tables(nursANOVA,'means'),digits=4) #recover means of
# each group and subgroup

nursModel = lm(myNursery$PEABODY ~ myNursery$nursCode * (myNursery$collegeNone + 
    myNursery$collegeOneorTwo) + myNursery$KINDER)
mcSummary(nursModel)  #gives results for 1-df tests
## lm(formula = myNursery$PEABODY ~ myNursery$nursCode * (myNursery$collegeNone + 
##     myNursery$collegeOneorTwo) + myNursery$KINDER)
## 
## Omnibus ANOVA
##                  SS df      MS EtaSq      F p
## Model      3578.325  6 596.388 0.802 28.988 0
## Error       884.655 43  20.573               
## Corr Total 4462.980 49  91.081               
## 
##   RMSE AdjEtaSq
##  4.536    0.774
## 
## Coefficients
##                                                 Est StErr      t   SSR(3)
## (Intercept)                                  14.158 5.578  2.538  132.517
## myNursery$nursCode                            3.316 1.716  1.932   76.792
## myNursery$collegeNone                        -2.331 1.569 -1.486   45.418
## myNursery$collegeOneorTwo                    -0.286 2.319 -0.123    0.312
## myNursery$KINDER                              0.900 0.080 11.199 2580.120
## myNursery$nursCode:myNursery$collegeNone     -1.207 3.088 -0.391    3.143
## myNursery$nursCode:myNursery$collegeOneorTwo -3.170 4.581 -0.692    9.849
##                                              EtaSq   tol  CI_2.5 CI_97.5
## (Intercept)                                  0.130    NA   2.908  25.408
## myNursery$nursCode                           0.080 0.582  -0.145   6.777
## myNursery$collegeNone                        0.049 0.697  -5.494   0.833
## myNursery$collegeOneorTwo                    0.000 0.696  -4.961   4.390
## myNursery$KINDER                             0.745 0.842   0.738   1.062
## myNursery$nursCode:myNursery$collegeNone     0.004 0.707  -7.433   5.020
## myNursery$nursCode:myNursery$collegeOneorTwo 0.011 0.528 -12.408   6.069
##                                                  p
## (Intercept)                                  0.015
## myNursery$nursCode                           0.060
## myNursery$collegeNone                        0.145
## myNursery$collegeOneorTwo                    0.903
## myNursery$KINDER                             0.000
## myNursery$nursCode:myNursery$collegeNone     0.698
## myNursery$nursCode:myNursery$collegeOneorTwo 0.493
# Augmented model: the full ANCOVA (all five design terms plus the KINDER covariate)
fullModelA = lm(myNursery$PEABODY ~ myNursery$nursCode * (myNursery$collegeNone + 
    myNursery$collegeOneorTwo) + myNursery$KINDER)
# Compact model for the college 2-df test: drops the two college main-effect terms
college_2DF_cCov = lm(myNursery$PEABODY ~ myNursery$nursCode + myNursery$nursCode:myNursery$collegeNone + 
    myNursery$nursCode:myNursery$collegeOneorTwo + myNursery$KINDER)
# Compact model for the interaction 2-df test: drops the two interaction terms
interaction_2DF_cCov = lm(myNursery$PEABODY ~ myNursery$nursCode + myNursery$collegeNone + 
    myNursery$collegeOneorTwo + myNursery$KINDER)

modelCompare(ModelA = fullModelA, ModelC = college_2DF_cCov)  #conduct college 2-df test
## SSE (Compact) =  941.542 
## SSE (Augmented) =  884.6545 
## Delta R-Squared =  0.01274653 
## Partial Eta-Squared (PRE) =  0.06041949 
## F(2,43) = 1.382552, p = 0.2618671
modelCompare(ModelA = fullModelA, ModelC = interaction_2DF_cCov)  #conduct interaction 2-df test
## SSE (Compact) =  894.617 
## SSE (Augmented) =  884.6545 
## Delta R-Squared =  0.002232249 
## Partial Eta-Squared (PRE) =  0.01113603 
## F(2,43) = 0.2421209, p = 0.7860242
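
Not required by the assignment, but the question posed in the prompt (do the design variables predict first-grade scores at all once KINDER is controlled?) can also be framed as a single 5-df model comparison against a covariate-only compact model; a sketch using the same modelCompare() call as above (kinderOnly is an illustrative name):

kinderOnly = lm(myNursery$PEABODY ~ myNursery$KINDER)  # compact model: covariate only
modelCompare(ModelA = fullModelA, ModelC = kinderOnly)  # joint 5-df test of all design terms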

Question 3

Present source tables for the two analyses (including the 2-df tests of education and the interaction) and discuss the differences between the ANCOVA results and those you reached in the ANOVA analysis when kindergarten scores were not controlled.

Without KINDER covariate:

Source             b        df  SS        MS       F*      p      PRE
Model              NA       5   998.21    199.64   2.54    .042   0.224
Nursery
  nursCode         8.114    1   490.36    490.36   6.23    .016   0.124
College            NA       2   316.74    158.37   2.01    .146   0.084
  collegeNone      -5.929   1   306.81    306.81   3.90    .055   0.081
  collegeOneorTwo  -5.683   1   129.20    129.20   1.64    .207   0.036
Interaction        NA       2   205.94    102.97   1.31    .281   0.056
  nurs*None        2.392    1   12.48     12.48    0.16    .692   0.004
  nurs*OneorTwo    -10.367  1   107.47    107.47   1.36    .249   0.030
Error              NA       44  3464.78   78.74    NA      NA     NA
Total              NA       49  4462.98   NA       NA      NA     NA

With KINDER covariate:

Source             b        df  SS        MS        F*       p      PRE
Model              NA       6   3578.33   596.39    28.99    <.001  0.802
Nursery
  nursCode         3.316    1   76.79     76.79     3.73     .060   0.080
College            NA       2   56.89     28.44     1.38     .262   0.060
  collegeNone      -2.331   1   45.42     45.42     2.21     .145   0.049
  collegeOneorTwo  -0.286   1   0.31      0.31      0.02     .903   0.000
Interaction        NA       2   9.96      4.98      0.24     .786   0.011
  nurs*None        -1.207   1   3.14      3.14      0.15     .698   0.004
  nurs*OneorTwo    -3.170   1   9.85      9.85      0.48     .493   0.011
KINDER             0.900    1   2580.12   2580.12   125.41   <.001  0.745
Error              NA       43  884.65    20.57     NA       NA     NA
Total              NA       49  4462.98   NA        NA       NA     NA

As can be seen from the tables above, including the KINDER covariate changed the model results in several important ways. For one, the nursery effect was rendered non-significant (F = 6.23, p = .016 without the covariate vs. F = 3.73, p = .060 with it), suggesting that much of its apparent effect on PEABODY scores in the ANOVA was carried by its covariance with KINDER scores. Moreover, although they were non-significant in both analyses, the beta estimates of the remaining predictors all shrank in absolute value in the ANCOVA model, implying that their effects were inflated in the ANOVA for similar reasons.
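
One quick way to see this covariance argument (an optional check, not part of the required output): if children who attended nursery school also tend to have higher kindergarten scores, part of the raw nursery effect in the ANOVA is being carried by the covariate.

tapply(myNursery$KINDER, myNursery$NURSERY, mean)  # mean KINDER score by nursery group
cor(myNursery$KINDER, myNursery$nursCode)  # correlation of the covariate with the nursery contrast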

Question 4

Derive the adjusted means and discuss the differences between these and the raw means. Interpret the slope of the nursery school contrast in the context of the original ANOVA model and now in the context of the ANCOVA model.

# compute subgroup KINDER means (cell order: no/both, no/none, no/one, yes/both, yes/none, yes/one)
k = 1
thisMean_KINDER = vector(mode = "numeric", length = 6)
for (thisNursery in c("no", "yes")) {
    for (thisCollege in c("both", "none", "one")) {
        thisMean_KINDER[k] = mean(myNursery$KINDER[myNursery$COLLEGE == 
            thisCollege & myNursery$NURSERY == thisNursery])
        k = k + 1
    }
}
grandMean_KINDER = mean(thisMean_KINDER)  #compute KINDER grand mean of means

# compute subgroup PEABODY means (same cell order as above)
k = 1
thisMean_PEA = vector(mode = "numeric", length = 6)
for (thisNursery in c("no", "yes")) {
    for (thisCollege in c("both", "none", "one")) {
        thisMean_PEA[k] = mean(myNursery$PEABODY[myNursery$COLLEGE == thisCollege & 
            myNursery$NURSERY == thisNursery])
        k = k + 1
    }
}
grandMean_PEA = mean(thisMean_PEA)  #compute PEABODY grand mean of means

adjustedMeans = thisMean_PEA - 0.9 * (thisMean_KINDER - grandMean_KINDER)
# compute adjusted PEABODY means; 0.9 is b_KINDER (0.900) from the ANCOVA model in Question 2

print("adjusted means")
## [1] "adjusted means"
adjustedMeans
## [1] 74.20000 73.12500 75.50000 79.50000 75.63333 77.63333
print("unadjusted means")
## [1] "unadjusted means"
thisMean_PEA
## [1] 74.50000 67.12500 74.00000 87.00000 76.83333 76.13333
# as a check that the means have been adjusted correctly...
mean(adjustedMeans[4:6]) - mean(adjustedMeans[1:3])  #recovers the nursCode ANCOVA beta
## [1] 3.313889
mean(adjustedMeans[c(3, 6)]) - mean(adjustedMeans[c(1, 4)])  #recovers the collegeOneorTwo ANCOVA beta
## [1] -0.2833333
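
A less hand-coded alternative (a sketch; it assumes nursModel still holds the ANCOVA fit from Question 2, and bKinder and adjustedMeans2 are illustrative names) is to pull the KINDER slope out of the fitted model rather than typing 0.9:

bKinder = coef(nursModel)["myNursery$KINDER"]  # b_KINDER, approximately 0.900
adjustedMeans2 = thisMean_PEA - bKinder * (thisMean_KINDER - grandMean_KINDER)  # same adjustment with the unrounded slope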

The unadjusted and adjusted means for the six PEABODY subgroups differ in that the latter control for the confounding effect of children’s KINDER scores. KINDER scores, which explain much of the variation in later PEABODY results, are not necessarily distributed uniformly across the six cells. If a particular subgroup happens to have higher kindergarten scores, the apparent effects of that group’s defining characteristics (here, parents’ college education and the child’s nursery school attendance) will be inflated in an ANOVA design. The adjusted means control for this by subtracting the term \(b_{KINDER}(\overline{W}_k-\overline{W})\) from each cell’s unadjusted mean: a cell is corrected downward to the extent that its average KINDER score exceeds the grand mean and to the extent that KINDER scores matter in predicting PEABODY results.
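
For example, using the means printed above for the yes-nursery/both-college cell, the unadjusted mean of 87.00 is corrected to \(87.00 - 0.9(\overline{W}_k-\overline{W}) = 79.50\), implying that this cell’s mean kindergarten score lies roughly \(7.5/0.9 \approx 8.3\) points above the kindergarten grand mean.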

As a result, the interpretation of the nursCode slope changes slightly. In the original ANOVA, the code registers the uncontrolled main effect of nursery school attendance on PEABODY scores: the nursCode beta equals the difference between the (unweighted) marginal means of the “yes” and “no” nursery subgroups, so it captures whatever separates the two groups on average, whether that stems from nursery attendance itself or from the correlated effects of KINDER scores. In the ANCOVA, the same slope equals the difference between the adjusted marginal means, and therefore records the controlled main effect of nursery school attendance, i.e., the portion of the nursery difference that cannot be explained by KINDER scores.
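
As a check on this interpretation (a sketch using the vectors already computed above; the second line repeats the check from the previous chunk):

mean(thisMean_PEA[4:6]) - mean(thisMean_PEA[1:3])  # ~8.11, the nursCode slope from the ANOVA
mean(adjustedMeans[4:6]) - mean(adjustedMeans[1:3])  # ~3.31, the nursCode slope from the ANCOVA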

Question 5

Test the homogeneity of regression assumption: do we need to worry that the effects of KINDER on PEABODY vary across the cells of our 3x2 design?

# Augmented model: the ANCOVA terms plus every KINDER-by-design interaction
# (KINDER x nursCode, KINDER x collegeNone, KINDER x collegeOneorTwo, and the two
# KINDER x nursery x college three-way terms), allowing the KINDER slope to vary across cells
testHR_modelA = lm(myNursery$PEABODY ~ myNursery$nursCode * (myNursery$collegeNone + 
    myNursery$collegeOneorTwo) + myNursery$KINDER + myNursery$KINDER * 
    (myNursery$collegeNone + myNursery$collegeOneorTwo + myNursery$nursCode) + 
    myNursery$KINDER * myNursery$collegeNone * myNursery$nursCode + myNursery$KINDER * 
    myNursery$collegeOneorTwo * myNursery$nursCode)

# Compact model: the original ANCOVA, with a single common KINDER slope
testHR_modelC = lm(myNursery$PEABODY ~ myNursery$nursCode * (myNursery$collegeNone + 
    myNursery$collegeOneorTwo) + myNursery$KINDER)

modelCompare(ModelA = testHR_modelA, ModelC = testHR_modelC)
## SSE (Compact) =  884.6545 
## SSE (Augmented) =  813.3027 
## Delta R-Squared =  0.0159875 
## Partial Eta-Squared (PRE) =  0.08065508 
## F(5,38) = 0.6667558, p = 0.6509399

No, we do not need to worry about violating the homogeneity of regression assumption: the model comparison adding the KINDER-by-predictor two- and three-way interaction terms fails to yield a significant result, F(5, 38) = 0.67, p = .65, so there is no evidence that the effect of KINDER on PEABODY differs across the cells of the 3x2 design.
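
The same conclusion can be reached with base R’s anova() on the two nested models, which reproduces the F(5, 38) test above; a minimal sketch:

anova(testHR_modelC, testHR_modelA)  # nested-model F test of the five KINDER-by-design interaction terms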