Preliminaries -----------------------------------------------------------

1) Use the “install.packages” function to install the “wec” and “psych” packages.

2) Use the “library” function to load the “psych” and “wec” packages.

3) Use the “data” function to load the “bfi” and “BMI” datasets into your workspace.

4) Source the “studentFunctions.R” file to initialize the summary.cellMeans() function.
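
The four setup steps above can be sketched as follows. This is a minimal sketch, not the original script: it assumes that "studentFunctions.R" sits in the current working directory and that both datasets load under the names given in the exercise.

```r
# One-time installation (skip if the packages are already installed).
install.packages(c("wec", "psych"))

# Load the packages for this session.
library(psych)
library(wec)

# Load the two datasets into the workspace.
data(bfi)
data(BMI)

# Assumption: the file is in the current working directory.
source("studentFunctions.R")
```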

Factors -----------------------------------------------------------------

Use the “bfi” data to complete the following:

You may ignore any missing data for the purposes of these exercises (although you should never do so in a real data analysis).

1) Refer to the help file of the “bfi” dataset to find the correct levels for the “gender” and “education” variables.

2) Create factors for the “gender” and “education” variables with sensible sets of labels for the levels.

3) How many women in this sample have graduate degrees?

266 women in this sample have graduate degrees.

4) What is the most frequently reported level of educational attainment among men in this sample?

"some college" is the most frequently reported level of educational attainment among men in this sample.

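The code behind the answers in this section is not shown, so the following is a sketch of one way to do it. It uses a small toy data frame with invented values (not the bfi data) that follows the integer coding described in the bfi help file. The label strings are my own choice of "sensible" labels.

```r
# Toy data with invented values; in the exercise this would be bfi.
dat <- data.frame(
  gender    = c(1, 2, 2, 1, 2, 2, 1, 2),
  education = c(3, 5, 3, 1, 5, 4, 3, 2)
)

# Convert the integer codes to labelled factors.
dat$gender <- factor(dat$gender, levels = 1:2, labels = c("male", "female"))
dat$education <- factor(
  dat$education,
  levels = 1:5,
  labels = c("some high school", "high school", "some college",
             "college", "graduate degree")
)

# Cross-tabulate gender by education; table() drops NAs by default.
tab <- table(gender = dat$gender, education = dat$education)
print(tab)

# Count for a single cell (the analogue of question 3).
n_female_grad <- tab["female", "graduate degree"]

# Most frequent education level among men (the analogue of question 4).
modal_male <- names(which.max(tab["male", ]))
```

Reading both answers from one cross-tabulation keeps the counts consistent with each other.
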
Dummy Codes -------------------------------------------------------------

Use the “BMI” data to complete the following:

1) How many levels does the “education” factor have?

The "education" factor has 3 levels.

2a) What is the reference level of the “sex” factor?

"male" is the reference level of the "sex" factor.

2b) What is the reference level of the “education” factor?

"lowest" is the reference level of the "education" factor.

3a) Run a linear regression model wherein “BMI” is predicted by dummy-coded “sex” and “education”.

  • Set the reference group to “male” for the “sex” factor
  • Set the reference group to “highest” for the “education” factor

Call:
lm(formula = BMI ~ sex + education, data = BMI)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.2780 -2.6173 -0.4563  1.9877 15.5884 

Coefficients:
                Estimate Std. Error t value             Pr(>|t|)    
(Intercept)      24.5456     0.1263 194.379 < 0.0000000000000002 ***
sexfemale        -0.4986     0.1305  -3.820             0.000136 ***
educationlowest   1.8564     0.1783  10.412 < 0.0000000000000002 ***
educationmiddle   0.7139     0.1472   4.851           0.00000129 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.744 on 3310 degrees of freedom
Multiple R-squared:  0.03565,   Adjusted R-squared:  0.03477 
F-statistic: 40.78 on 3 and 3310 DF,  p-value: < 0.00000000000000022

3b) Is there a significant effect (at alpha = 0.05) of “sex” on “BMI” after controlling for “education”?

There is a significant effect of "sex" on "BMI" after controlling for "education" [estimate = -0.4986, t-statistic = -3.820, p-value = 0.0001] (at alpha = 0.05).

3c) What is the expected BMI for males in the highest education group?

24.5456 is the expected BMI for males in the highest education group.
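
The steps in this section (inspect the levels, change the reference group, fit the model, read off the test) can be sketched as below. The data frame is a toy example with invented values standing in for the BMI data, and the names dat, y, and edu are hypothetical.

```r
# Toy data with invented values; in the exercise this would be BMI.
dat <- data.frame(
  y   = c(24, 26, 23, 27, 25, 22, 28, 24, 26, 23, 25, 27),
  sex = factor(rep(c("male", "female"), times = 6),
               levels = c("male", "female")),
  edu = factor(rep(c("lowest", "middle", "highest"), each = 4),
               levels = c("lowest", "middle", "highest"))
)

nlevels(dat$edu)      # number of levels
levels(dat$sex)[1]    # the reference level is the first level
levels(dat$edu)[1]

# Make "highest" the reference group; the other levels keep their order.
dat$edu <- relevel(dat$edu, ref = "highest")

# Factors are dummy coded by default (treatment contrasts).
fit <- lm(y ~ sex + edu, data = dat)
tab <- coef(summary(fit))

# Estimate, t statistic, and p-value are separate columns of the table.
tab["sexfemale", c("Estimate", "t value", "Pr(>|t|)")]
```

Pulling the row out of the coefficient table by name makes it harder to confuse the estimate with the t statistic when reporting the result.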

Cell-Means Codes --------------------------------------------------------

Use the “BMI” data to complete the following:

1) Create a new variable by centering “BMI” on 25.

2a) Regress the centered BMI from (1) onto the set of cell-means codes for “education”.


Call:
lm(formula = centered_BMI ~ education - 1, data = BMI)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2366 -0.6861 -0.1107  0.5337  4.0253 

Coefficients:
                    Estimate  Std. Error t value             Pr(>|t|)    
educationhighest -0.17887404  0.02845492  -6.286 0.000000000367796303 ***
educationlowest   0.30682881  0.03726291   8.234 0.000000000000000257 ***
educationmiddle  -0.00003825  0.02613445  -0.001                0.999    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9845 on 3311 degrees of freedom
Multiple R-squared:  0.03139,   Adjusted R-squared:  0.03081 
F-statistic: 53.66 on 2 and 3311 DF,  p-value: < 0.00000000000000022

2b) Is there a significant effect of education on BMI, at the alpha = 0.05 level?

There is a significant effect of "education" on "BMI" [F-statistic = 53.6590, p-value < 0.0001] (at alpha = 0.05).

2c) What is the value of the test statistic that you used to answer (2b)?

F-statistic = 53.6590

2d) Is the mean BMI level in the “lowest” education group significantly different from 25, at an alpha = 0.05 level?

The mean BMI level in the "lowest" education group is significantly different from 25 [estimate = 0.3068, t-statistic = 8.2342, p-value < 0.0001] (at alpha = 0.05).

2e) Is the mean BMI level in the “middle” education group significantly different from 25, at an alpha = 0.05 level?

The mean BMI level in the "middle" education group is NOT significantly different from 25 [estimate = -0.00004, t-statistic = -0.0015, p-value = 0.9988] (at alpha = 0.05).

2f) Is the mean BMI level in the “highest” education group significantly different from 25, at an alpha = 0.05 level?

The mean BMI level in the "highest" education group is significantly different from 25 [estimate = -0.1789, t-statistic = -6.2862, p-value < 0.0001] (at alpha = 0.05).
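
Why the t-tests in (2d) to (2f) test each group mean against 25 can be seen in a small sketch. It uses toy data with invented values, and the names dat, y, y_c, and edu are hypothetical. Centering is done by plain subtraction here, which is an assumption about how the step was meant to be carried out.

```r
# Toy data with invented values; in the exercise this would be BMI.
dat <- data.frame(
  y   = c(24, 26, 23, 27, 25, 22, 28, 24, 26),
  edu = factor(rep(c("lowest", "middle", "highest"), each = 3))
)

# Center by subtraction so that estimates stay in the outcome's units.
dat$y_c <- dat$y - 25

# Dropping the intercept gives one coefficient per group (cell-means codes).
fit <- lm(y_c ~ edu - 1, data = dat)

# Each coefficient is that group's mean minus 25, so its t-test
# tests H0: group mean = 25.
coef(summary(fit))
group_means <- tapply(dat$y, dat$edu, mean)

# Overall test of group differences: compare with the intercept-only model.
anova(lm(y_c ~ 1, data = dat), fit)
```

Two details are worth checking against the output above. First, scale(x, center = 25) with its default scale = TRUE also divides by a scale factor, which leaves the t statistics unchanged but takes the estimates out of BMI units. The residual standard error of 0.9845 above, against roughly 3.7 in the other BMI models, suggests the centered variable may also have been rescaled. Second, summary() on a no-intercept fit reports R-squared and F against a zero-mean baseline, which is presumably why the exercise supplies summary.cellMeans(). The anova() comparison is a base-R alternative for the same question.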

Unweighted Effects Codes ------------------------------------------------

Use the “BMI” data to complete the following:

1) Regress “BMI” onto an unweighted effects-coded representation of “education” and a dummy-coded representation of “childless”. Adjust the contrasts attribute of the “education” factor to implement the unweighted effects coding.

    yes
no    0
yes   1
        highest lowest
highest       1      0
lowest        0      1
middle       -1     -1

Call:
lm(formula = BMI ~ education_uwc + childless, data = BMI)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.7947 -2.5500 -0.4678  1.9613 16.2403 

Coefficients:
                     Estimate Std. Error t value             Pr(>|t|)    
(Intercept)          25.57683    0.07909 323.395 < 0.0000000000000002 ***
education_uwchighest -0.73465    0.09170  -8.012  0.00000000000000155 ***
education_uwclowest   0.84868    0.10603   8.004  0.00000000000000165 ***
childlessyes         -1.44709    0.13901 -10.410 < 0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.692 on 3310 degrees of freedom
Multiple R-squared:  0.0621,    Adjusted R-squared:  0.06125 
F-statistic: 73.05 on 3 and 3310 DF,  p-value: < 0.00000000000000022

2) Change the reference group (i.e., the omitted group) for the unweighted effects codes that you implemented in (1) and rerun the model regressing “BMI” onto “education” and “childless”.

        middle highest
middle       1       0
highest      0       1
lowest      -1      -1

Call:
lm(formula = BMI ~ education_uwc2 + childless, data = BMI)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.7947 -2.5500 -0.4678  1.9613 16.2403 

Coefficients:
                      Estimate Std. Error t value             Pr(>|t|)    
(Intercept)           25.57683    0.07909 323.395 < 0.0000000000000002 ***
education_uwc2middle  -0.11403    0.08790  -1.297                0.195    
education_uwc2highest -0.73465    0.09170  -8.012  0.00000000000000155 ***
childlessyes          -1.44709    0.13901 -10.410 < 0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.692 on 3310 degrees of freedom
Multiple R-squared:  0.0621,    Adjusted R-squared:  0.06125 
F-statistic: 73.05 on 3 and 3310 DF,  p-value: < 0.00000000000000022

3a) What is the expected BMI (averaged across education groups) for people with children?

25.5768 is the expected BMI (averaged across education groups) for people with children. This is the intercept, because "no" is the reference level of "childless".

3b) What is the expected difference in BMI between the most highly educated group and the average BMI across education groups, after controlling for childlessness?

-0.7347 is the expected difference in BMI between the most highly educated group and the average BMI across education groups, after controlling for childlessness.

3c) Is the difference you reported in (3b) significantly different from zero, at the alpha = 0.05 level?

The difference reported in (3b) is significantly different from zero [estimated slope = -0.7347, t-statistic = -8.0117, p-value < 0.0001] (at alpha = 0.05).

3d) What is the expected difference in BMI between the middle education group and the average BMI across education groups, after controlling for childlessness?

-0.1140 is the expected difference in BMI between the middle education group and the average BMI across education groups, after controlling for childlessness.

3e) Is the difference you reported in (3d) significantly different from zero, at the alpha = 0.05 level?

The difference reported in (3d) is NOT significantly different from zero [estimated slope = -0.1140, t-statistic = -1.2973, p-value = 0.1946] (at alpha = 0.05).
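
A sketch of how the unweighted effects codes in this section could be set up with base R's contr.sum(). The data are a toy example with invented values, and the names dat, y, edu, and edu2 are hypothetical. The exercise's own code may differ.

```r
# Toy data with invented values and unequal group sizes.
dat <- data.frame(
  y   = c(24, 26, 23, 27, 25, 22, 28, 24, 26, 30),
  edu = factor(c(rep("lowest", 3), rep("middle", 3), rep("highest", 4)),
               levels = c("highest", "lowest", "middle"))
)

# contr.sum() omits the LAST level, so "middle" is the omitted group here.
codes <- contr.sum(levels(dat$edu))
colnames(codes) <- levels(dat$edu)[-nlevels(dat$edu)]
contrasts(dat$edu) <- codes
fit1 <- lm(y ~ edu, data = dat)

# To omit a different group, reorder the levels so that group comes last.
dat$edu2 <- factor(dat$edu, levels = c("middle", "highest", "lowest"))
codes2 <- contr.sum(levels(dat$edu2))
colnames(codes2) <- levels(dat$edu2)[-nlevels(dat$edu2)]
contrasts(dat$edu2) <- codes2
fit2 <- lm(y ~ edu2, data = dat)

# With only this factor in the model, the intercept is the unweighted
# mean of the group means, whatever the group sizes.
group_means <- tapply(dat$y, dat$edu, mean)
coef(fit1)
coef(fit2)
```

Changing the omitted group only changes which two deviations are printed. The intercept and the coefficient for a group that appears in both fits stay the same, which matches the two summaries above.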

Weighted Effects Codes --------------------------------------------------

Use the “BMI” data to complete the following:

1) Regress “BMI” onto a weighted effects-coded representation of “education” and a dummy-coded representation of “sex”. Adjust the contrasts attribute of the “education” factor to implement the weighted effects coding.

       female
male        0
female      1
            lowest    middle
highest -0.5831245 -1.185464
lowest   1.0000000  0.000000
middle   0.0000000  1.000000

Call:
lm(formula = BMI ~ sex + education_wc, data = BMI)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.2780 -2.6173 -0.4563  1.9877 15.5884 

Coefficients:
                   Estimate Std. Error t value             Pr(>|t|)    
(Intercept)        25.24231    0.09485 266.140 < 0.0000000000000002 ***
sexfemale          -0.49864    0.13052  -3.820             0.000136 ***
education_wclowest  1.15972    0.12592   9.210 < 0.0000000000000002 ***
education_wcmiddle  0.01721    0.07529   0.229             0.819185    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.744 on 3310 degrees of freedom
Multiple R-squared:  0.03565,   Adjusted R-squared:  0.03477 
F-statistic: 40.78 on 3 and 3310 DF,  p-value: < 0.00000000000000022

2) Change the reference group (i.e., the omitted group) for the weighted effects codes that you implemented in (1) and rerun the model regressing “BMI” onto “education” and “sex”.

           highest     lowest
highest  1.0000000  0.0000000
lowest   0.0000000  1.0000000
middle  -0.8435518 -0.4918957

Call:
lm(formula = BMI ~ sex + education_wc2, data = BMI)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.2780 -2.6173 -0.4563  1.9877 15.5884 

Coefficients:
                     Estimate Std. Error t value             Pr(>|t|)    
(Intercept)          25.24231    0.09485 266.140 < 0.0000000000000002 ***
sexfemale            -0.49864    0.13052  -3.820             0.000136 ***
education_wc2highest -0.69666    0.08657  -8.047  0.00000000000000117 ***
education_wc2lowest   1.15972    0.12592   9.210 < 0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.744 on 3310 degrees of freedom
Multiple R-squared:  0.03565,   Adjusted R-squared:  0.03477 
F-statistic: 40.78 on 3 and 3310 DF,  p-value: < 0.00000000000000022

3a) What is the average BMI for females?

24.7437 is the average BMI for females (the intercept plus the "sexfemale" coefficient: 25.24231 - 0.49864).

3b) What is the expected difference in BMI between the least educated group and the average BMI, after controlling for sex?

1.1597 is the expected difference in BMI between the least educated group and the average BMI, after controlling for sex.

3c) Is the difference you reported in (3b) significantly different from zero, at the alpha = 0.01 level?

The difference reported in (3b) is significantly different from zero [estimated slope = 1.1597, t-statistic = 9.2100, p-value < 0.0001] (at alpha = 0.01).

3d) What is the expected difference in BMI between the most highly educated group and the average BMI, after controlling for sex?

-0.6967 is the expected difference in BMI between the most highly educated group and the average BMI, after controlling for sex.

3e) Is the difference you reported in (3d) significantly different from zero, at the alpha = 0.01 level?

The difference reported in (3d) is significantly different from zero [estimated slope = -0.6967, t-statistic = -8.0471, p-value < 0.0001] (at alpha = 0.01).
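
The contrast matrices printed above have a 1 for each retained group and a negative ratio of group sizes in the omitted group's row. The sketch below builds such a matrix by hand to show the construction. weighted_codes() is a hypothetical helper written for this illustration, and the data are invented. The wec package loaded in the preliminaries ships its own function, contr.wec(), which is presumably what the exercise intends.

```r
# Hypothetical helper (not part of wec): weighted effects codes for a factor.
weighted_codes <- function(x, omitted) {
  n    <- table(x)
  keep <- setdiff(levels(x), omitted)
  codes <- matrix(0, nrow = nlevels(x), ncol = length(keep),
                  dimnames = list(levels(x), keep))
  for (k in keep) codes[k, k] <- 1   # each retained group gets a 1 in its own column
  # The omitted group's row holds minus the ratio of group sizes.
  codes[omitted, ] <- -as.numeric(n[keep]) / as.numeric(n[omitted])
  codes
}

# Toy data with invented values and unequal group sizes.
dat <- data.frame(
  y   = c(24, 26, 23, 27, 25, 22, 28, 24, 26, 30),
  edu = factor(c(rep("lowest", 3), rep("middle", 3), rep("highest", 4)))
)

contrasts(dat$edu) <- weighted_codes(dat$edu, omitted = "highest")
contrasts(dat$edu)

# With only this factor in the model, the intercept is the sample mean
# of y, and each slope is a group mean minus that sample mean.
fit <- lm(y ~ edu, data = dat)
coef(fit)
```

The size ratios are what make the intercept the weighted (sample) mean rather than the unweighted mean of the group means, which is the only difference from the unweighted codes in the previous section.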

4a) Does education level explain a significant proportion of variance in BMI, above and beyond sex?


Call:
lm(formula = BMI ~ sex, data = BMI)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.9293 -2.7112 -0.4974  1.9697 15.0702 

Coefficients:
            Estimate Std. Error t value             Pr(>|t|)    
(Intercept) 25.23544    0.09626 262.149 < 0.0000000000000002 ***
sexfemale   -0.48566    0.13236  -3.669             0.000247 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.803 on 3312 degrees of freedom
Multiple R-squared:  0.004049,  Adjusted R-squared:  0.003748 
F-statistic: 13.46 on 1 and 3312 DF,  p-value: 0.000247

------------------------------------------------------------------------
Analysis of Variance Table

Model 1: BMI ~ sex
Model 2: BMI ~ sex + education_wc2
  Res.Df   RSS Df Sum of Sq      F                Pr(>F)    
1   3312 47909                                              
2   3310 46389  2      1520 54.229 < 0.00000000000000022 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] 54.22865

------------------------------------------------------------------------
Education level DOES explain a significant proportion of variance in BMI, above and beyond sex [F-statistic = 54.2286, p-value < 0.0001] (at alpha = 0.05).

4b) What is the value of the test statistic that you used to answer (4a)?

F-statistic = 54.2286
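
The F statistic in (4a) comes from comparing the residual sums of squares of the two nested models. With the rounded values in the table above, ((47909 - 46389) / 2) / (46389 / 3310) is approximately 54.23. The sketch below reproduces the same calculation on toy data with invented values. The names dat, y, and edu are hypothetical.

```r
# Toy data with invented values; in the exercise this would be BMI.
dat <- data.frame(
  y   = c(24, 26, 23, 27, 25, 22, 28, 24, 26, 23, 25, 27),
  sex = factor(rep(c("male", "female"), times = 6)),
  edu = factor(rep(c("lowest", "middle", "highest"), each = 4))
)

fit0 <- lm(y ~ sex, data = dat)         # restricted model
fit1 <- lm(y ~ sex + edu, data = dat)   # full model

# Nested-model F-test for the added education terms.
cmp <- anova(fit0, fit1)
f_anova <- cmp$F[2]

# The same statistic computed from the residual sums of squares.
rss0 <- deviance(fit0)
rss1 <- deviance(fit1)
df_num <- df.residual(fit0) - df.residual(fit1)
f_manual <- ((rss0 - rss1) / df_num) / (rss1 / df.residual(fit1))
```

The coding scheme used for "education" does not affect this test, because dummy, unweighted, and weighted effects codes all span the same set of fitted values.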