STAT 700: Homework 5 Problems

Francis Alvarez & Gonzalo Urrutia

Sept. 30, 2014.

1. We will use the R dataset chickwts. We are interested in the effects of different types of feed on the growth rate of chicken. The help file for the data set gives more details of the experiment.

a) Make boxplots of the data. What do the boxplots suggest about the different types of feed?

boxplot(weight~feed,data=chickwts, main="Chicken Weights",
        ylab="Weight (grams)", xlab="Feed type")

plot of chunk unnamed-chunk-1

The boxlots suggest that the feeds do vary in their effectiveness on the weight. Some feeds led to a higher weight when just looking at the data between the first and third quartile (Horsebean vs. Casein). The feeds also have a large range (take horsebean vs. meatmeal) which may be an indication that they differ in variances as well.

Some of the boxplots appear to be of different sizes, so it is possible that the variances are not the same from group to group, suggesting that the feeds are associated with weight.

To check normality, we can use a histogram of the residual.

residuals <- resid(lm(weight ~ feed, chickwts))
hist(residuals)

plot of chunk unnamed-chunk-2

The normal distribution assumption looks very reasonable.

b) Determine if there are differences in the weights of chicken according to their feed. Write down the model that you are using and state the null and alternative hypotheses. State your conclusion. Give diagnostic plots of the residuals.

lmfit <- lm(weight~feed, chickwts)
summary(lmfit)

## 
## Call:
## lm(formula = weight ~ feed, data = chickwts)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -123.91  -34.41    1.57   38.17  103.09 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     323.58      15.83   20.44  < 2e-16 ***
## feedhorsebean  -163.38      23.49   -6.96  2.1e-09 ***
## feedlinseed    -104.83      22.39   -4.68  1.5e-05 ***
## feedmeatmeal    -46.67      22.90   -2.04  0.04557 *  
## feedsoybean     -77.15      21.58   -3.58  0.00067 ***
## feedsunflower     5.33      22.39    0.24  0.81249    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 54.9 on 65 degrees of freedom
## Multiple R-squared:  0.542,  Adjusted R-squared:  0.506 
## F-statistic: 15.4 on 5 and 65 DF,  p-value: 5.94e-10

H0: All means are equal to each other. vs. H1 = At least one mean differs from the others.

Model

weight^ = 323.58 - 163.38*feedhorsebean - 104.83*feedlinseed - 46.67*feedmeatmeal - 77.15*feedsoybean + 5.33*feedsunflower

Where the intercept is equal to the mean effect of casein feed.

The F-test yields F= 15.36 with 5 and 65 degrees of freedom, so H0 (all means are equal) is strongly rejected.

There is a significant difference in the chickens’ weight according to diet (p=5.936x10^-10).

anova(lmfit)

## Analysis of Variance Table
## 
## Response: weight
##           Df Sum Sq Mean Sq F value  Pr(>F)    
## feed       5 231129   46226    15.4 5.9e-10 ***
## Residuals 65 195556    3009                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

According to the above analysis, the p-value for testing whether there are any differences among the treatment means is extremely small. Therefore, we have strong evidence that at least one mean differs from the others.

Diagnostic Plots of Residuals

par(mfrow=c(2,2))
plot(lmfit)

plot of chunk unnamed-chunk-5

*c) To test all possible two-group comparisons, use the R function pairwise.t.test with the Bonferroni adjustment. *

pairwise.t.test(chickwts$weight, chickwts$feed, p.adj="bonferroni")

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  chickwts$weight and chickwts$feed 
## 
##           casein  horsebean linseed meatmeal soybean
## horsebean 3.1e-08 -         -       -        -      
## linseed   0.00022 0.22833   -       -        -      
## meatmeal  0.68350 0.00011   0.20218 -        -      
## soybean   0.00998 0.00487   1.00000 1.00000  -      
## sunflower 1.00000 1.2e-08   9.3e-05 0.39653  0.00447
## 
## P value adjustment method: bonferroni

What do you conclude?

We conclude that the comparisons between these following pairs below have significant evidence against their means being equal:

linseed vs. casein
meatmeal vs. horsebean
soybean vs. casein
soybean vs. horsebean
sunflower vs. soybean

The following below have strong significant evidence:

casein vs. horsebean
horsebean vs. sunflower
sunflower vs. linseed

There were 15 pairwise comparisons done in this problem.

d) In part c) we used the Bonferroni adjustment. What is the Bonferroni adjustment and what does it do? (see the R function p.adjust).

The bonferroni adjustment divides the p-values by the number of comparisons in order to control the familywise error rate (probablity of making type I errors).

The Bonferroni adjustment simply divides the Type I error rate (alpha = .05) by the number of tests (in this case, 6 typed of feed). Hence, this method is often considered overly conservative.

These results are more conservative than with no adjustment.

e) What are the other methods available to adjust the p-value?

Less conservative corrections include Hochberg and Hommel. Hocheberg is similar to Bonferroni’s except it adjusts each p-value with a different number. The lower p-values get compared to lower numbers while larger p-values get compared with higher numbers.

The Tukey Honest Significant Difference (HSD) method controls for the Type I error rate across multiple comparisons and is generally considered an acceptable technique.

The Holm adjustment sequentially compares the lowest p-value with a Type I error rate that is reduced for each consecutive test.

2. Please read Lab2 Two-Way ANOVA Section. The survival times (in hours) for animals in an experiment whose design consisted of three poisons, four treatments, and four observations per cell (Ref: Rice, 1995)

We will use data available off the class web page:

http://www.rohan.sdsu.edu/~babailey/stat700/poison.dat

You can use the header information already in the file.

(a) Plot the data using strip charts and interaction plots. Describe any differences or interactions that you see.

poison = read.table("http://www.rohan.sdsu.edu/~babailey/stat700/poison.dat", header = T,)
attach(poison)
par(mfrow=c(2,2))
stripchart(Survival~Treatment, data=poison, xlab ="Survival Time (hrs)", ylab = "Treatment Options")
stripchart(Survival~Poison, data=poison, xlab ="Survival Time (hrs)", ylab = "Poison Type")
interaction.plot(Treatment ,Poison, Survival)
interaction.plot(Poison, Treatment, Survival)

plot of chunk unnamed-chunk-7

Looking at the strip charts you notice that in treatment options you see significant difference between option 1 and options 2,3. Option 1 is more pretty consistent in the sense that it is clustered together below 5 hours. The other two vary greatly between around 3 to 9 hours. This may be possible to each treatment reacting specifically to each poison. Now looking at the second chart you can see that Poison Type 3 has the least survival time. Poison Type 2 has some high variability as well as type 3. Looking at the interaction plot there were a lot of similarites as far as overall shape. Each treatmeant for the most part was less effective as the poison type increased in number. Treatment 2 was also the most effective in terms of survival time.

b) Conduct a two-way ANOVA to test the effects of the two main factors and their interactions. Give the R ANOVA table and state your conclusion. Include diagnostic plots of the residuals. Use the Bonferroni method for multiple comparisons to determine if there are significant pairwise difference among poisons and among treatments. (Make sure that a factor is a factor!)

fit = lm(Survival~Poison+Treatment+Poison*Treatment)
anova(fit)

## Analysis of Variance Table
## 
## Response: Survival
##                  Df Sum Sq Mean Sq F value Pr(>F)    
## Poison            1   93.2    93.2   20.81  4e-05 ***
## Treatment         1    8.3     8.3    1.84   0.18    
## Poison:Treatment  1    0.0     0.0    0.00   0.96    
## Residuals        44  197.0     4.5                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

When looking at the ANOVA table, Poison has an F value of 20.8118 and P-value of 4.035e-05. This leads us to argue that the poison is statistically significant and is an important factor in terms of Survival Time. As for Treatment Options, it doesn’t seem siginificant enough to lead us to believe it has an affect on survival time (since its P-value=0.1815 > 0.05). The interaction between Poison and Treatment yields a P-value of 0.9585 which enforces that there are not significant interactions between these two factors.

Diagnostic plot of the residuals

par(mfrow=c(2,2))
plot(fit)

plot of chunk unnamed-chunk-9

Bonferroni for Treatments

pairwise.t.test(Survival, Treatment, p.adjust = "bonferroni")

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  Survival and Treatment 
## 
##   1      2      3     
## 2 0.0011 -      -     
## 3 1.0000 0.0147 -     
## 4 0.1051 0.6613 0.7235
## 
## P value adjustment method: bonferroni

After Bonferroni comparisons there is evidence supporting differences in the treatment pairs below:

Treatment 1 vs. Treatment 2
Treatment 3 vs. Treatment 2

Bonferroni for Poison

pairwise.t.test(Survival, Poison, p.adjust = "bonferroni")

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  Survival and Poison 
## 
##   1       2     
## 2 0.9542  -     
## 3 9.3e-05 0.0022
## 
## P value adjustment method: bonferroni

After Bonferroni comparisons there is strong evidence supporting differences in the poison pairs below:

    Poison 3 vs. Poison 1
    Poison 3 vs. Poison 2

c) Box and Cox (1964) analyzed the reciprocals of the survival data, pointing out that the reciprocal of survival time can also be interpreted as rate of death. Conduct a two-way ANOVA, and compare your results to part (b)

death = poison
death$Survival=1/(death$Survival)
colnames(death)[3]= "Death.Rate"
fit2 = lm(Death.Rate~Poison+Treatment+Poison*Treatment , data=death)
anova(fit2)

## Analysis of Variance Table
## 
## Response: Death.Rate
##                  Df Sum Sq Mean Sq F value  Pr(>F)    
## Poison            1  0.319   0.319   50.60 7.8e-09 ***
## Treatment         1  0.053   0.053    8.48  0.0056 ** 
## Poison:Treatment  1  0.005   0.005    0.77  0.3846    
## Residuals        44  0.277   0.006                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

When looking at the P-value of both Treatment and Poison we have evidence to say that they both are significant (since their P-values are less than 0.05), therefore both factors have an impact on the death rate. As for the interaction between the two, the p-value of 0.38464 supports the two factors having no influence among each other in terms of affecting the rate of death. The results are similar to that of the survival time with the exception of Treatment being signific nt as well.