Exercise 1

Part A

Boxplot for Infant Birth Weight (Weight)

boxplot(bweight$Weight,
        main = "Infant Birth Weight",
        ylab = "Weight (grams)")

Based on the boxplot, the data looks roughly symmetric, which is consistent with a normal distribution. There are three outliers below the lower whisker and one above the upper whisker.

QQ-Plot of Weight

qqnorm(bweight$Weight)
qqline(bweight$Weight)

The Q-Q plot is also consistent with a normal distribution.

Shapiro-Wilk Test for Weight

shapiro.test(bweight$Weight)
## 
##  Shapiro-Wilk normality test
## 
## data:  bweight$Weight
## W = 0.99206, p-value = 0.1153

The p-value (0.1153) is above the 0.05 significance level, so we cannot reject the null hypothesis that Weight is normally distributed.

Part B

Boxplot of Weight by MomSmoke

boxplot(Weight ~ MomSmoke, bweight,
        main = "Infant Birth Weight and Smoking",
        ylab = "Weight (grams)",
        xlab = "Smoking")

The boxplot shows a difference in infant weights between mothers who smoke (1) and those who don't (0). Infants of mothers who smoke have a smaller range of birth weights. Both groups look roughly symmetric, which is consistent with a normal distribution.

Part C

Shapiro-Wilk Test for Each MomSmoke

shapiro.test(bweight[bweight$MomSmoke == "0", "Weight"])
## 
##  Shapiro-Wilk normality test
## 
## data:  bweight[bweight$MomSmoke == "0", "Weight"]
## W = 0.99362, p-value = 0.3549

The p-value for mothers who don't smoke (0) is 0.3549, which is higher than 0.05, so we cannot reject normality for this group.

shapiro.test(bweight[bweight$MomSmoke == "1", "Weight"])
## 
##  Shapiro-Wilk normality test
## 
## data:  bweight[bweight$MomSmoke == "1", "Weight"]
## W = 0.96299, p-value = 0.2

The p-value for mothers who do smoke (1) is 0.2, which is also higher than 0.05, so we cannot reject normality for this group either.
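The two group-wise tests above can also be run in one step. The following is a minimal sketch on simulated data, not on bweight itself: the data frame `sim` and its values are made up for illustration, and only the column names Weight and MomSmoke are taken from the report.

```r
# Simulated stand-in for bweight; the values are invented for illustration only.
set.seed(1)
sim <- data.frame(
  Weight   = c(rnorm(254, mean = 3400, sd = 500),
               rnorm(41,  mean = 3150, sd = 480)),
  MomSmoke = factor(rep(c("0", "1"), times = c(254, 41)))
)

# Run Shapiro-Wilk within each level of MomSmoke and keep only the p-values.
p_by_group <- tapply(sim$Weight, sim$MomSmoke,
                     function(w) shapiro.test(w)$p.value)
print(p_by_group)

# TRUE where normality is not rejected at the 0.05 level.
print(p_by_group > 0.05)
```

Collecting the p-values in one named vector makes it easy to check every group against the same significance level.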

Exercise 2

Test Choice

Based on the results from Exercise 1, there are two samples, and normality is not rejected for either one. This means a two-sample t-test is a suitable choice.

Null Hypothesis (Ho): mean weight for mothers who smoke is equal to mean weight for mothers who do not smoke

Alternative Hypothesis (Ha): mean weight for mothers who smoke is not equal to mean weight for mothers who do not smoke

A variance equality check will show which version of the test to run.
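The decision rule described above (check variances first, then pick the t-test version) can be sketched as follows. This is an illustration on simulated data; `sim`, `equal_var`, and the simulated values are assumptions and not part of the original analysis.

```r
# Simulated stand-in for bweight; the values are invented for illustration only.
set.seed(2)
sim <- data.frame(
  Weight   = c(rnorm(254, mean = 3400, sd = 500),
               rnorm(41,  mean = 3150, sd = 480)),
  MomSmoke = factor(rep(c("0", "1"), times = c(254, 41)))
)

# Step 1: F test for equality of the two group variances.
f_res <- var.test(Weight ~ MomSmoke, data = sim)
equal_var <- f_res$p.value > 0.05

# Step 2: pooled t-test if equal variances are not rejected, Welch t-test otherwise.
t_res <- t.test(Weight ~ MomSmoke, data = sim, var.equal = equal_var)
print(t_res$method)
print(t_res$p.value)
```

Passing the result of the variance check directly to `var.equal` keeps the two steps tied together, so the test that is run always matches the check that was made.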

Variance Equality Test

var.test(Weight ~ MomSmoke, bweight, alternative = "two.sided")
## 
##  F test to compare two variances
## 
## data:  Weight by MomSmoke
## F = 1.0786, num df = 253, denom df = 40, p-value = 0.8009
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.6421109 1.6671729
## sample estimates:
## ratio of variances 
##           1.078555

Because the p-value (0.8009) is above 0.05, we cannot reject the hypothesis of equal variances, so a pooled t-test is the appropriate option.

Pooled T-Test

t.test(Weight ~ MomSmoke, bweight, alternative = "two.sided", var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  Weight by MomSmoke
## t = 3.071, df = 293, p-value = 0.002334
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##   93.37931 426.65488
## sample estimates:
## mean in group 0 mean in group 1 
##        3422.724        3162.707

The pooled t-test gives a p-value of 0.002334, which is below 0.05, so we reject the null hypothesis (Ho). The mean weight for mothers who smoke is not equal to the mean weight for mothers who do not smoke.
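To make the pooled test more concrete, the sketch below computes the pooled variance, the t statistic, and the degrees of freedom by hand and compares them with `t.test()`. The data are simulated with the same group sizes as the report (254 and 41), so the degrees of freedom come out as 293, but the weights themselves are invented.

```r
# Simulated weights for the two groups; the values are invented for illustration only.
set.seed(3)
x0 <- rnorm(254, mean = 3400, sd = 500)  # stand-in for MomSmoke = 0
x1 <- rnorm(41,  mean = 3150, sd = 480)  # stand-in for MomSmoke = 1
n0 <- length(x0)
n1 <- length(x1)

# Pooled variance: weighted average of the two sample variances.
sp2 <- ((n0 - 1) * var(x0) + (n1 - 1) * var(x1)) / (n0 + n1 - 2)

# t statistic, degrees of freedom, and two-sided p-value.
t_manual  <- (mean(x0) - mean(x1)) / sqrt(sp2 * (1 / n0 + 1 / n1))
df_manual <- n0 + n1 - 2
p_manual  <- 2 * pt(-abs(t_manual), df = df_manual)

# Built-in pooled t-test for comparison.
t_builtin <- t.test(x0, x1, var.equal = TRUE)
print(c(manual = t_manual, builtin = unname(t_builtin$statistic)))
print(c(manual = p_manual, builtin = t_builtin$p.value))
```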

Exercise 3

Part A

One-Way ANOVA

aov.weight = aov(Weight ~ MomSmoke, bweight)
summary(aov.weight)
##              Df   Sum Sq Mean Sq F value  Pr(>F)   
## MomSmoke      1  2386708 2386708   9.431 0.00233 **
## Residuals   293 74151291  253076                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Homogeneity of Variance

library(DescTools)  # provides LeveneTest() and ScheffeTest()
LeveneTest(aov.weight)

Based on the Levene Test, the p-value is greater than 0.05, so we cannot reject the null hypothesis of equal variances. With the equal-variance assumption satisfied, the ANOVA results above can be interpreted.
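For intuition about what the Levene Test measures, here is a minimal base-R sketch of the median-centered variant (often called Brown-Forsythe): it is a one-way ANOVA on the absolute deviations of each observation from its group median. The data are simulated, and this is an illustration of the idea rather than a reproduction of the LeveneTest() output.

```r
# Simulated stand-in for bweight; the values are invented for illustration only.
set.seed(4)
sim <- data.frame(
  Weight   = c(rnorm(254, mean = 3400, sd = 500),
               rnorm(41,  mean = 3150, sd = 480)),
  MomSmoke = factor(rep(c("0", "1"), times = c(254, 41)))
)

# Absolute deviation of each observation from its own group median.
group_median <- ave(sim$Weight, sim$MomSmoke, FUN = median)
sim$abs_dev  <- abs(sim$Weight - group_median)

# One-way ANOVA on the deviations; its F test is the Levene-type test.
levene_tab <- anova(lm(abs_dev ~ MomSmoke, data = sim))
print(levene_tab)

levene_p <- levene_tab[["Pr(>F)"]][1]
print(levene_p > 0.05)  # TRUE means equal variances are not rejected
```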

Part B

Scheffe Test for Conclusion

ScheffeTest(aov.weight)
## 
##   Posthoc multiple comparisons of means: Scheffe Test 
##     95% family-wise confidence level
## 
## $MomSmoke
##          diff    lwr.ci    upr.ci   pval    
## 1-0 -260.0171 -426.6549 -93.37931 0.0023 ** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on the Scheffe Test, the p-value (0.0023) is lower than 0.05, so the two MomSmoke groups differ significantly in mean Weight. Mothers who do not smoke have a higher mean infant weight than those who do smoke, by about 260 grams.
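With only two groups, the one-way ANOVA and the pooled t-test are the same test: the F statistic is the square of the t statistic, which is why both report p = 0.0023. The sketch below checks this on simulated data (the data frame `sim` is invented) and then with the statistics printed in this report.

```r
# Simulated stand-in for bweight; the values are invented for illustration only.
set.seed(5)
sim <- data.frame(
  Weight   = c(rnorm(254, mean = 3400, sd = 500),
               rnorm(41,  mean = 3150, sd = 480)),
  MomSmoke = factor(rep(c("0", "1"), times = c(254, 41)))
)

# Pooled t statistic.
t_stat <- unname(t.test(Weight ~ MomSmoke, data = sim, var.equal = TRUE)$statistic)

# F statistic from the one-way model (lm() fits the same model as aov()).
f_stat <- anova(lm(Weight ~ MomSmoke, data = sim))[["F value"]][1]

print(c(t_squared = t_stat^2, F = f_stat))

# Same check using the values reported above: t = 3.071 and F = 9.431.
print(3.071^2)
```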

Exercise 4

Part A

Steps in Backwards Selection

  1. Run an ANOVA with all independent variables (Black, Married, Boy, MomSmoke, and Ed)

  2. Remove the variable with the highest p-value - remove Ed (0.8626)

  3. Run the ANOVA again with the remaining variables (Black, Married, Boy, MomSmoke)

  4. Remove the variable with the highest p-value - remove Married (0.6157)

  5. Run the ANOVA again with the remaining variables (Black, Boy, MomSmoke)

  6. Remove the variable with the highest p-value - remove Boy (0.3888)

  7. The remaining two variables (Black and MomSmoke) both have p-values below 0.05, meaning they are the only significant variables. We now need to check their interaction to get the final model, so we run another ANOVA with only Black, MomSmoke, and their interaction.

  8. We can conclude that the interaction will not be included because its p-value is not significant.
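The steps above can be sketched as a loop. This is a hedged illustration on simulated data, not the code used for the results above: the data frame `sim` is invented, Ed is simulated as a two-level factor purely for convenience, and the p-values come from `drop1()` (marginal F tests), which is a swapped-in technique that may not match the p-values quoted in the steps.

```r
# Simulated data; every value here is invented for illustration only.
set.seed(6)
n <- 295
sim <- data.frame(
  Black    = factor(rbinom(n, 1, 0.17)),
  Married  = factor(rbinom(n, 1, 0.70)),
  Boy      = factor(rbinom(n, 1, 0.50)),
  MomSmoke = factor(rbinom(n, 1, 0.14)),
  Ed       = factor(rbinom(n, 1, 0.50))   # assumption: two levels
)
sim$Weight <- 3450 - 300 * (sim$Black == "1") -
  270 * (sim$MomSmoke == "1") + rnorm(n, sd = 490)

terms_left <- c("Black", "Married", "Boy", "MomSmoke", "Ed")

repeat {
  # Refit the main-effects model with the terms still in play.
  fit   <- lm(reformulate(terms_left, response = "Weight"), data = sim)
  tests <- drop1(fit, test = "F")
  # First row of drop1() is "<none>", so skip it.
  p_vals <- setNames(tests[["Pr(>F)"]][-1], rownames(tests)[-1])

  # Stop when every remaining term is significant or only one term is left.
  if (max(p_vals) < 0.05 || length(terms_left) == 1) break

  worst <- names(which.max(p_vals))
  cat("Removing", worst, "with p =", round(p_vals[worst], 4), "\n")
  terms_left <- setdiff(terms_left, worst)
}
print(terms_left)

# If two terms remain, test their interaction as in steps 7 and 8.
if (length(terms_left) == 2) {
  inter_term <- paste(terms_left, collapse = ":")
  inter_fit  <- lm(reformulate(c(terms_left, inter_term), response = "Weight"),
                   data = sim)
  print(anova(inter_fit))
}
```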

Part B

Final Model

aov.weight2 = aov(Weight ~ Black + MomSmoke, bweight)
summary(aov.weight2, type = 3)
##              Df   Sum Sq Mean Sq F value  Pr(>F)    
## Black         1  3530450 3530450   14.62 0.00016 ***
## MomSmoke      1  2513301 2513301   10.41 0.00140 ** 
## Residuals   292 70494249  241419                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
table(bweight$Black)
## 
##   0   1 
## 246  49
table(bweight$MomSmoke)
## 
##   0   1 
## 254  41

Based on the frequency tables, both variables are unbalanced: Black has 246 observations in group 0 and 49 in group 1, and MomSmoke has 254 in group 0 and 41 in group 1.
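Unbalanced data matters because sequential (Type I) sums of squares depend on the order in which terms enter the model. In base R, summary() on an aov fit reports sequential sums of squares, and the type argument does not appear to change that. One way to get order-independent tests for a main-effects model is `drop1()`, which tests each term after all the others. The sketch below shows the difference on simulated data; `sim` and its values are invented.

```r
# Simulated unbalanced data; every value here is invented for illustration only.
set.seed(7)
n <- 295
sim <- data.frame(
  Black    = factor(rbinom(n, 1, 0.17)),
  MomSmoke = factor(rbinom(n, 1, 0.14))
)
sim$Weight <- 3450 - 300 * (sim$Black == "1") -
  270 * (sim$MomSmoke == "1") + rnorm(n, sd = 490)

# Same model, two term orders.
fit_ab <- lm(Weight ~ Black + MomSmoke, data = sim)
fit_ba <- lm(Weight ~ MomSmoke + Black, data = sim)

# Sequential sums of squares: the value for Black can change with the order.
seq_ab <- anova(fit_ab)
seq_ba <- anova(fit_ba)
print(c(first = seq_ab["Black", "Sum Sq"], last = seq_ba["Black", "Sum Sq"]))

# Marginal sums of squares from drop1(): the same for either order.
marg_ab <- drop1(fit_ab, test = "F")
marg_ba <- drop1(fit_ba, test = "F")
print(c(order_ab = marg_ab["Black", "Sum of Sq"],
        order_ba = marg_ba["Black", "Sum of Sq"]))
```

For a model without interactions, the marginal test for a term is the same as the sequential test obtained when that term is entered last.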

Variation

lm.res = lm(Weight ~ Black + MomSmoke, bweight)
summary(lm.res)$r.squared
## [1] 0.07896405

The R-squared value shows that about 7.9% of the variation in Weight can be explained by the final model.
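The same figure can be recovered from the ANOVA table in the Final Model section, since R-squared is the model sum of squares divided by the total sum of squares. The sketch below uses the sums of squares printed in that table; the variable names are chosen here for illustration.

```r
# Sums of squares copied from the ANOVA table for Weight ~ Black + MomSmoke.
ss_black    <- 3530450
ss_momsmoke <- 2513301
ss_resid    <- 70494249

# R-squared = explained sum of squares / total sum of squares.
ss_model  <- ss_black + ss_momsmoke
ss_total  <- ss_model + ss_resid
r_squared <- ss_model / ss_total
print(r_squared)
```

The result agrees with the value from summary(lm.res)$r.squared up to rounding of the printed sums of squares.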

Diagnostic Plots for Normality Assumptions

plot(aov.weight2, 2)

The Q-Q plot of the residuals suggests they are approximately normally distributed, so the normality assumption appears to hold.

Part C

Scheffe Test for Mean Values

ScheffeTest(aov.weight2)
## 
##   Posthoc multiple comparisons of means: Scheffe Test 
##     95% family-wise confidence level
## 
## $Black
##          diff    lwr.ci    upr.ci  pval    
## 1-0 -293.9412 -483.0575 -104.8249 8e-04 ***
## 
## $MomSmoke
##         diff    lwr.ci    upr.ci   pval    
## 1-0 -266.763 -470.2261 -63.29987 0.0060 ** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusions

Black: The African American group (1) has a lower mean weight than the Caucasian group (0), by about 294 grams. The effect on Weight is significant (p = 0.0008).

MomSmoke: Mothers who smoke (1) have a lower mean infant weight than mothers who do not smoke (0), by about 267 grams. The effect on Weight is significant (p = 0.0060).