Solutions to Day 28 Homework

1. MODIFICATION TO ORIGINAL QUESTION: Do tests at alpha = 0.02 level

Get data and transform:

setwd("/Users/traves/Dropbox/SM339/Day28")
Blood1 = read.csv("Blood1.csv")
head(Blood1)
##   SystolicBP Smoke Overwt
## 1        133     0      2
## 2        115     1      0
## 3        140     1      1
## 4        132     0      2
## 5        133     0      1
## 6        138     0      1
attach(Blood1)
logSys = log(SystolicBP)
mod.1 = lm(logSys ~ 1 + as.factor(Overwt))
summary(mod.1)
## 
## Call:
## lm(formula = logSys ~ 1 + as.factor(Overwt))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6889 -0.1073 -0.0185  0.1306  0.4391 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          4.8936     0.0138  353.72  < 2e-16 ***
## as.factor(Overwt)1   0.0646     0.0228    2.83   0.0048 ** 
## as.factor(Overwt)2   0.1221     0.0192    6.38  4.1e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.189 on 497 degrees of freedom
## Multiple R-squared:  0.0756, Adjusted R-squared:  0.0719 
## F-statistic: 20.3 on 2 and 497 DF,  p-value: 3.24e-09
anova(mod.1)
## Analysis of Variance Table
## 
## Response: logSys
##                    Df Sum Sq Mean Sq F value  Pr(>F)    
## as.factor(Overwt)   2   1.46   0.728    20.3 3.2e-09 ***
## Residuals         497  17.79   0.036                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
MSE = 0.03579

FISHER'S LSD TEST

# Overwt = 0 vs Overwt = 1:
coeff = mod.1$coeff[2]
counts = tapply(Overwt, Overwt, length)
test.stat = coeff/sqrt(MSE * ((1/counts[1]) + (1/counts[2])))
test.stat
## as.factor(Overwt)1 
##              2.835
totalcount = sum(counts)
p = 2 * (1 - pt(test.stat, totalcount - 3))
p  #  0.004771152
## as.factor(Overwt)1 
##           0.004771
# reject null: group means are different


# Overwt = 0 vs Overwt = 2:
coeff = mod.1$coeff[3]
test.stat = coeff/sqrt(MSE * ((1/counts[1]) + (1/counts[3])))
test.stat
## as.factor(Overwt)2 
##              6.377
p = 2 * (1 - pt(test.stat, totalcount - 3))
p  # 4.12586e-10 
## as.factor(Overwt)2 
##          4.126e-10
# reject null: group means are different

# Overwt = 1 vs Overwt = 2:
coeff = mod.1$coeff[3] - mod.1$coeff[2]
test.stat = coeff/sqrt(MSE * ((1/counts[2]) + (1/counts[3])))
test.stat
## as.factor(Overwt)2 
##              2.562
p = 2 * (1 - pt(test.stat, totalcount - 3))
p  # 0.01068754
## as.factor(Overwt)2 
##            0.01069
# reject null: group means are different at alpha = 0.02

BONFERRONI TEST: Do same tests but now alpha is 0.02/3

# Overwt = 0 vs Overwt = 1:
coeff = mod.1$coeff[2]
counts = tapply(Overwt, Overwt, length)
test.stat = coeff/sqrt(MSE * ((1/counts[1]) + (1/counts[2])))
test.stat
## as.factor(Overwt)1 
##              2.835
totalcount = sum(counts)
p = 2 * (1 - pt(test.stat, totalcount - 3))
p  #  0.004771152
## as.factor(Overwt)1 
##           0.004771
# reject null: group means are different


# Overwt = 0 vs Overwt = 2:
coeff = mod.1$coeff[3]
test.stat = coeff/sqrt(MSE * ((1/counts[1]) + (1/counts[3])))
test.stat
## as.factor(Overwt)2 
##              6.377
p = 2 * (1 - pt(test.stat, totalcount - 3))
p  # 4.12586e-10 
## as.factor(Overwt)2 
##          4.126e-10
# reject null: group means are different

# Overwt = 1 vs Overwt = 2:
coeff = mod.1$coeff[3] - mod.1$coeff[2]
test.stat = coeff/sqrt(MSE * ((1/counts[2]) + (1/counts[3])))
test.stat
## as.factor(Overwt)2 
##              2.562
p = 2 * (1 - pt(test.stat, totalcount - 3))
p  # 0.01068754
## as.factor(Overwt)2 
##            0.01069
# reject null: group means are not significantly different (p > 0.02/3)

The difference is that Fisher's LSD test finds a significant difference between groups 1 and 2 and the Bonferroni method does not.

2. STAT2 book, Exercise 5.22:

The ANOVA table seems to be for a regression of child poverty rate against the type of county (small, medium or large). The p-value is high, suggesting that the type of county is not a good predictor of the child poverty rate (that is, the coefficients of the treatment types (type of county) can't be reliably distinguished from each other).

The dotplot also raises concerns since the spread of the child poverty rates among small counties is very small but the spread in the other two categories is quite large. This seems to violate the requirement that the standard deviations of the errors terms in our model are independent of the county type.