Question 1
The dataset NURSERY contains scores for first-grade children on the Peabody Picture Vocabulary Test, which is designed primarily to measure a child’s receptive (hearing) vocabulary for Standard American English. We want to know whether children of college-educated parents begin school with larger vocabularies than their peers. The variable COLLEGE indicates whether “none”, “one”, or “both” of the child’s parents are college-educated. The variable PEABODY contains the child’s score. Make up contrast codes that make sense to you and conduct a one-way analysis of variance, asking whether the different COLLEGE groups have different mean PEABODY scores.
# Load the data; mcSummary() is assumed to be available from whichever package provides it
nursery <- read.csv("nursery.csv", header = TRUE)
View(nursery)
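# Complete set of contrast codes for COLLEGE
# x1: "both" versus "one"
nursery$x1 <- 0*(nursery$COLLEGE == 'none') - 1/2*(nursery$COLLEGE == 'one') + 1/2*(nursery$COLLEGE == 'both')
# x2: "none" versus the average of "one" and "both"; exact fractions keep the code summing to zero
nursery$x2 <- 2/3*(nursery$COLLEGE == 'none') - 1/3*(nursery$COLLEGE == 'one') - 1/3*(nursery$COLLEGE == 'both')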
m1 <- lm(PEABODY ~ nursery$x1 + nursery$x2, data=nursery)
mcSummary(m1)
## lm(formula = PEABODY ~ nursery$x1 + nursery$x2, data = nursery)
##
## Omnibus ANOVA
##                  SS df      MS EtaSq     F     p
## Model        330.99  2 165.495 0.074 1.882 0.164
## Error       4131.99 47  87.915
## Corr Total  4462.98 49  91.081
##
##   RMSE AdjEtaSq
##  9.376    0.035
##
## Coefficients
##                Est StErr      t     SSR(3) EtaSq   tol  CI_2.5 CI_97.5     p
## (Intercept) 76.750 1.684 45.564 182515.328 0.978    NA  73.361  80.138 0.000
## nursery$x1   6.720 4.593  1.463    188.160 0.044 0.758  -2.521  15.961 0.150
## nursery$x2  -5.671 3.099 -1.830    294.328 0.066 0.758 -11.906   0.564 0.074
In the context of this overall one-way analysis of variance, test the specific contrast that PEABODY scores are significantly different in the families where one or both parents went to college compared to families where neither did.
#See code chunk above
Finally, test the specific contrast that PEABODY scores are significantly different in the “none” families than the “both” families. (Be careful here - is this contrast orthogonal to the one you just tested in the above paragraph? Can they both be tested as simultaneous predictors in the same model A?)
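The orthogonality question in this prompt can be checked numerically. The sketch below is only an illustration: the vector names are hypothetical, and the weights are listed in the order none, one, both. It computes the sum of cross-products for the two requested contrasts, and then for the none-versus-both contrast paired with a code that compares “one” with the average of the other two groups.

```r
# Contrast weights in the order none, one, both (vector names are illustrative)
college_vs_none <- c(none = 2/3,  one = -1/3, both = -1/3)  # "one or both" vs. "none"
none_vs_both    <- c(none = -1/2, one = 0,    both = 1/2)   # "none" vs. "both"
partner         <- c(none = -1/3, one = 2/3,  both = -1/3)  # "one" vs. average of the others

# Two codes are orthogonal when the sum of their cross-products is zero
dot_requested <- sum(college_vs_none * none_vs_both)
dot_partner   <- sum(none_vs_both * partner)

dot_requested  # -0.5, so the two requested contrasts are not orthogonal
dot_partner    # 0, so this pair forms an orthogonal set
```

Because the two requested contrasts are not orthogonal, entering them together would turn each coefficient into a partial effect that no longer matches the intended comparison. One reasonable approach, under that reading, is to test each in its own model with an orthogonal partner.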
# x3: "both" versus "none"
nursery$x3 <- -1/2*(nursery$COLLEGE == 'none') + 0*(nursery$COLLEGE == 'one') + 1/2*(nursery$COLLEGE == 'both')
# x4: "one" versus the average of "none" and "both" (orthogonal partner of x3)
nursery$x4 <- -1/3*(nursery$COLLEGE == 'none') + 2/3*(nursery$COLLEGE == 'one') - 1/3*(nursery$COLLEGE == 'both')
m2 <- lm(PEABODY ~ nursery$x3 + nursery$x4, data=nursery)
mcSummary(m2)
## lm(formula = PEABODY ~ nursery$x3 + nursery$x4, data = nursery)
##
## Omnibus ANOVA
##                  SS df      MS EtaSq     F     p
## Model        330.99  2 165.495 0.074 1.882 0.164
## Error       4131.99 47  87.915
## Corr Total  4462.98 49  91.081
##
##   RMSE AdjEtaSq
##  9.376    0.035
##
## Coefficients
##                Est StErr      t     SSR(3) EtaSq  tol CI_2.5 CI_97.5     p
## (Intercept) 76.743 1.683 45.597 182778.803 0.978   NA 73.357  80.129 0.000
## nursery$x3   9.050 4.688  1.930    327.610 0.073 0.78 -0.381  18.481 0.060
## nursery$x4  -2.195 3.002 -0.731     47.005 0.011 0.78 -8.234   3.844 0.468
Present a write-up of your results, including the two-degree of freedom test of whether there are any mean differences and the results of the two specific contrasts discussed in the previous paragraphs.
Write Up: A one-way analysis of variance of children’s PEABODY vocabulary scores across the three COLLEGE groups showed that the group means were not significantly different, F(2, 47) = 1.88, η² = 0.074, p = 0.164. Turning to the single-degree-of-freedom contrasts, the mean for children with no college-educated parent (none) was not significantly different from the average of the “one” and “both” means, b = -5.67, F(1, 47) = 3.35, η²_p = 0.066, p = 0.074, 95% CI: [-11.91, 0.56]. The difference between the “none” and “both” means was marginal but likewise not significant, b = 9.05, F(1, 47) = 3.72, η²_p = 0.073, p = 0.060, 95% CI: [-0.38, 18.48]. These two contrasts are not orthogonal to each other, so each was tested in a separate model alongside its own orthogonal partner. The partner of the none-versus-both contrast, which compares the “one” mean with the average of the “none” and “both” means, was also not significant, b = -2.20, F(1, 47) = 0.53, η²_p = 0.011, p = 0.468, 95% CI: [-8.23, 3.84]. Hence, these data do not provide reliable evidence that children’s PEABODY scores differ by parental college attendance.
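For a single-degree-of-freedom contrast, the F statistic is the square of the t statistic in the coefficient table, with 1 numerator degree of freedom and the error degrees of freedom (47 here) in the denominator. The sketch below, which assumes only base R and copies the rounded t values from the output above, shows the conversion; the object names are illustrative.

```r
# t statistics copied from the mcSummary() coefficient tables (rounded to 3 decimals)
t_vals   <- c(x2 = -1.830, x3 = 1.930, x4 = -0.731)
df_error <- 47

# A 1-df F is the squared t, so it can never be negative
F_vals <- t_vals^2

# Upper-tail p-values from the F distribution with (1, 47) degrees of freedom
p_vals <- pf(F_vals, df1 = 1, df2 = df_error, lower.tail = FALSE)

round(cbind(F = F_vals, p = p_vals), 3)
```

The p-values recovered this way should agree with the p column of the coefficient tables up to rounding, since the two-sided t test and the 1-df F test are the same test.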
Question 2
Now, create a dummy coding scheme for COLLEGE such that you can run a single model that simultaneously tests whether there is a difference in PEABODY scores between “none” and “one” families, and whether there is a difference between “none” and “both” families. Run the model and briefly summarize the results.
nursery$one <- ifelse(nursery$COLLEGE == "one", 1, 0)
nursery$both <- ifelse(nursery$COLLEGE == "both", 1, 0)
m3 <- lm(PEABODY ~ one + both, data = nursery)
mcSummary(m3)
## lm(formula = PEABODY ~ one + both, data = nursery)
##
## Omnibus ANOVA
##                  SS df      MS EtaSq     F     p
## Model        330.99  2 165.495 0.074 1.882 0.164
## Error       4131.99 47  87.915
## Corr Total  4462.98 49  91.081
##
##   RMSE AdjEtaSq
##  9.376    0.035
##
## Coefficients
##               Est StErr      t     SSR(3) EtaSq   tol CI_2.5 CI_97.5     p
## (Intercept) 72.95 2.097 34.794 106434.050 0.963    NA 68.732  77.168 0.000
## one          2.33 2.813  0.828     60.321 0.014 0.889 -3.329   7.989 0.412
## both         9.05 4.688  1.930    327.610 0.073 0.889 -0.381  18.481 0.060
Summary: The omnibus test is the same under dummy coding: the three group means are not significantly different, F(2, 47) = 1.88, η² = 0.074, p = 0.164. With “none” as the reference group, the mean for children with one college-educated parent (one) was not significantly different from the mean for children with no college-educated parent (none), b = 2.33, F(1, 47) = 0.69, η²_p = 0.014, p = 0.412, 95% CI: [-3.33, 7.99]. The difference between the “none” and “both” means was marginal but not significant, b = 9.05, F(1, 47) = 3.72, η²_p = 0.073, p = 0.060, 95% CI: [-0.38, 18.48]. In this coding the intercept is the mean PEABODY score of the reference group (72.95 for “none”), and each slope is a difference from that mean, which gives means of 75.28 for children with one college-educated parent and 82.00 for children with two. Overall, these data do not provide reliable evidence that children’s PEABODY scores differ by parental college attendance.
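The group means quoted in the summary follow directly from the dummy-coded coefficients. The sketch below copies the estimates from the output above into illustrative object names, rebuilds the three means, and then recovers the contrast-coded estimates from them as a consistency check.

```r
# Estimates copied from the dummy-coded model output (reference group: "none")
b <- c(intercept = 72.95, one = 2.33, both = 9.05)

# Intercept = reference-group mean; each slope = difference from that mean
group_means <- c(
  none = b[["intercept"]],
  one  = b[["intercept"]] + b[["one"]],
  both = b[["intercept"]] + b[["both"]]
)
group_means

# The same means reproduce the contrast-coded estimates
both_minus_none  <- group_means[["both"]] - group_means[["none"]]
one_minus_others <- group_means[["one"]] -
  (group_means[["none"]] + group_means[["both"]]) / 2
unweighted_mean  <- mean(group_means)

c(both_minus_none, one_minus_others, unweighted_mean)
```

The three recovered values line up with the none-versus-both estimate, the estimate for “one” versus the average of the other two groups, and the intercept of the contrast-coded model, which illustrates that the coding schemes describe the same fitted means.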
Question 3
Each of the following asks you to generate sets of orthogonal contrast codes in particular contexts.
A. An advertising director tests the effectiveness of three types of ads: those with color pictures, those with black and white photos, and those with no pictures. Subjects rate each type of ad. Specify a contrast code to test picture vs. no picture and specify the other orthogonal contrast code. What does the other contrast code test?
\(\lambda_1\): Rating of ad = 0 * (NoPics) - 1/2 * (B&W) + 1/2 * (Color)
\(\lambda_2\): Rating of ad = 2/3 * (NoPics) - 1/3 * (B&W) - 1/3 * (Color)
To compare ad ratings across the color, black and white, and no-picture ads, I set the contrast codes listed above. The picture versus no-picture question is tested by \(\lambda_2\), which assigns 2/3 to the NoPics group and -1/3 to each of the B&W and Color groups, so it compares the NoPics mean with the average of the B&W and Color means. The other code, \(\lambda_1\), assigns 0 to NoPics, -1/2 to B&W, and 1/2 to Color, so it tests whether ads with color pictures are rated differently from ads with black and white photos. The two codes are orthogonal because each sums to zero and their cross-products sum to zero: (0)(2/3) + (-1/2)(-1/3) + (1/2)(-1/3) = 0. When interpreting the output, the estimate for \(\lambda_2\) is the NoPics mean minus the average of the B&W and Color means, and the estimate for \(\lambda_1\) is the Color mean minus the B&W mean.
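To make the interpretation of these codes concrete, the sketch below fits them to a small made-up dataset. The ratings and column names are hypothetical, and the three ad types are treated as independent groups for simplicity. The group means are 3, 5, and 7 by construction, so the coefficients can be checked by hand.

```r
# Hypothetical ratings: group means are 3 (NoPics), 5 (BW), and 7 (Color)
ads <- data.frame(
  type   = rep(c("NoPics", "BW", "Color"), each = 3),
  rating = c(2, 3, 4,  4, 5, 6,  6, 7, 8)
)

# Look up each row's contrast weight by ad type
type_chr <- as.character(ads$type)
ads$color_vs_bw    <- unname(c(NoPics = 0,   BW = -1/2, Color = 1/2)[type_chr])
ads$nopics_vs_pics <- unname(c(NoPics = 2/3, BW = -1/3, Color = -1/3)[type_chr])

fit <- lm(rating ~ color_vs_bw + nopics_vs_pics, data = ads)
coef(fit)
# (Intercept)     = 5   mean of the three group means
# color_vs_bw     = 2   Color mean minus BW mean (7 - 5)
# nopics_vs_pics  = -3  NoPics mean minus average of BW and Color (3 - 6)
```

With weights scaled this way (a one-unit spread between the two sides of each comparison), each coefficient reads directly as a difference between means.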
B. Four different groups of subjects are asked to study a set of materials describing automobiles. One group (MEMORY) is told to prepare for a memory test after the study period, another group (CHOOSE BEST) is told to prepare for choosing the best alternative after the study period, yet another group (CHOOSE WORST) is told to prepare for choosing the worst alternative, and a final group (RATE) is told to prepare for rating the desirability of each alternative. In fact, all groups are given a memory test after the study period. Generate an appropriate set of contrast codes for analyzing these data. Indicate the question asked by each code you generate.
\(\lambda_1\): Memory Test Score = 3/4 * (CHOOSE BEST) - 1/4 * (CHOOSE WORST) - 1/4 * (MEMORY) - 1/4 * (RATE)
\(\lambda_2\): Memory Test Score = 0 * (CHOOSE BEST) + 2/3 * (CHOOSE WORST) - 1/3 * (MEMORY) - 1/3 * (RATE)
\(\lambda_3\): Memory Test Score = 0 * (CHOOSE BEST) + 0 * (CHOOSE WORST) + 1/2 * (MEMORY) - 1/2 * (RATE)
\(\lambda_1\) Question: Are there differences in memory test score means between CHOOSE BEST and the group mean of CHOOSE WORST, MEMORY, and RATE?
\(\lambda_2\) Question: Are there differences in memory test score means between CHOOSE WORST and the group mean of MEMORY and RATE?
\(\lambda_3\) Question: Are there differences in memory test score means between MEMORY and RATE?
C. No statistics course would be complete without an example from the field (pardon the pun) which generated so much of the early work on statistical methods. This example is from the classic textbook by Snedecor and Cochran. The field is agriculture…. An experiment on sugar beets compared times and methods of applying mixed artificial fertilizers. Yields were measured for the following conditions: no artificials; artificials applied in January by plowing; artificials applied in January with broadcast spreaders; and artificials applied in April with broadcast spreaders. Generate contrast codes to test the following questions.
- Do the artificials have an effect?
\(\lambda_1\): Sugar_Beet_Yields = 3/4 * (No_Artificials) - 1/4 * (Artificials_J_P) - 1/4 * (Artificials_J_BS) - 1/4 * (Artificials_A_BS)
- Are January applications better than April?
\(\lambda_2\): Sugar_Beet_Yields = 0 * (No_Artificials) - 1/3 * (Artificials_J_P) - 1/3 * (Artificials_J_BS) + 2/3 * (Artificials_A_BS)
- Given that fertilizer is applied in January, does method of application make a difference? Show that the codes for these questions generate a complete set of orthogonal contrast codes.
\(\lambda_3\): Sugar_Beet_Yields = 0 * (No_Artificials) - 1/2 * (Artificials_J_P) + 1/2 * (Artificials_J_BS) + 0 * (Artificials_A_BS)
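The request to show that these codes form a complete orthogonal set can be addressed with a short numeric check. The sketch below uses illustrative row and column names, and takes the April weight in the January-versus-April code as +2/3 so that the code sums to zero. Each code is stored as a row, and the checks confirm that every row sums to zero and that every pair of rows has a zero sum of cross-products.

```r
# One row per contrast code; columns are the four conditions
codes <- rbind(
  artificials_effect  = c(3/4, -1/4, -1/4, -1/4),
  january_vs_april    = c(0,   -1/3, -1/3,  2/3),
  plow_vs_broadcast   = c(0,   -1/2,  1/2,  0)
)
colnames(codes) <- c("No_Artificials", "Jan_Plow", "Jan_Broadcast", "Apr_Broadcast")

# Each code must sum to zero across conditions
row_sums <- rowSums(codes)

# Entry [i, j] is the sum of cross-products of codes i and j;
# orthogonality requires every off-diagonal entry to be zero
cross_products <- tcrossprod(codes)

zapsmall(row_sums)
zapsmall(cross_products)
```

Three mutually orthogonal codes for four conditions is the maximum possible (one fewer than the number of groups), which is what makes the set complete.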
Question 4 - Data application
Using your own dataset, generate a set of orthogonal contrast codes to test a hypothesis you have about your data. State your hypothesis in one sentence, specify your coding scheme, run the model, and briefly state your results.
co2emissions <- read.csv("co2emissions.csv", header = TRUE)
# x1: HOHGender 1 versus 2 (male versus female heads of household, per the write-up)
co2emissions$x1 <- 1/2*(co2emissions$HOHGender == '1') - 1/2*(co2emissions$HOHGender == '2') + 0*(co2emissions$HOHGender == '3')
# x2: HOHGender 3 (nonbinary) versus the average of groups 1 and 2
co2emissions$x2 <- -1/3*(co2emissions$HOHGender == '1') - 1/3*(co2emissions$HOHGender == '2') + 2/3*(co2emissions$HOHGender == '3')
m4 <- lm(EmissionsTons ~ x1 + x2, data= co2emissions)
mcSummary(m4)
## lm(formula = EmissionsTons ~ x1 + x2, data = co2emissions)
##
## Omnibus ANOVA
##                  SS  df    MS EtaSq     F     p
## Model         6.265   2 3.132 0.059 7.694 0.001
## Error       100.555 247 0.407
## Corr Total  106.820 249 0.429
##
##   RMSE AdjEtaSq
##  0.638    0.051
##
## Coefficients
##                Est StErr       t    SSR(3) EtaSq   tol CI_2.5 CI_97.5     p
## (Intercept)  7.594 0.045 167.093 11366.351 0.991    NA  7.504   7.683 0.000
## x1          -0.144 0.094  -1.528     0.950 0.009 0.978 -0.330   0.042 0.128
## x2           0.365 0.109   3.344     4.553 0.043 0.978  0.150   0.580 0.001
Hypothesis: There will be no difference in average annual CO2 emissions between male and female heads of household, taken together, and nonbinary heads of household.
Results: A one-way analysis of variance of annual household CO2 emissions across the three head-of-household gender groups showed that the group means were significantly different, F(2, 247) = 7.69, η² = 0.059, p = 0.001. Tests of the single-degree-of-freedom contrasts indicate that the mean for nonbinary heads of household was significantly different from the average of the male and female means, b = 0.37, F(1, 247) = 11.18, η²_p = 0.043, p = 0.001, 95% CI: [0.15, 0.58]. Because the nonbinary group carries the positive code, the positive estimate indicates higher emissions in that group. However, the means for male and female heads of household were not significantly different, b = -0.14, F(1, 247) = 2.33, η²_p = 0.009, p = 0.128, 95% CI: [-0.33, 0.04]. Hence, the hypothesis of no difference was not supported: head of household gender identity explained a small share (about 6%) of the variance in annual household CO2 emissions.