PROBLEM 1

library(nlme)
dye <- read.table("http://www.rohan.sdsu.edu/~babailey/stat700/dye.dat", header=T)
dye$Batch <- as.factor(dye$Batch)
#dye

(a) Make strip chart of Strength by Batch. What do you notice?

stripchart(Strength ~ Batch, vertical=T,method="stack",xlab="Batch",ylab="Strength",data=dye,main="Dyestuff Data Stripchart")

plot of chunk unnamed-chunk-2

One notices that batches 2,3 and 5 are on the lower quartiles of strength with roughly similar standard deviations. While batches 4 and 6 are in the higher quartiles of strength. Again the standard deviations of those five batches seem plausible to all be the same although the means may be different. As for batch one there appears to be a higher standard devation.

  1. To test the hypothesis
fit = lm(Strength ~ Batch , data=dye)
summary(fit)
## 
## Call:
## lm(formula = Strength ~ Batch, data = dye)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.000 -1.250  0.125  1.500  4.500 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   92.000      0.863  106.59  < 2e-16 ***
## Batch2        -1.500      1.221   -1.23     0.23    
## Batch3        -0.750      1.221   -0.61     0.54    
## Batch4         6.250      1.221    5.12  1.7e-05 ***
## Batch5        -0.833      1.221   -0.68     0.50    
## Batch6         6.500      1.221    5.33  9.3e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.11 on 30 degrees of freedom
## Multiple R-squared:  0.756,  Adjusted R-squared:  0.715 
## F-statistic: 18.6 on 5 and 30 DF,  p-value: 2.16e-08
anova(fit)
## Analysis of Variance Table
## 
## Response: Strength
##           Df Sum Sq Mean Sq F value  Pr(>F)    
## Batch      5    415    83.1    18.6 2.2e-08 ***
## Residuals 30    134     4.5                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

After looking at the Anova table the p-value of the batch is 2.157e-08 which is signicantly small, therefore we reject the null hypothesis H0: = 0

Problem 2 (a) We can use the R function lme to fit

fit2 = lme(Strength~1, data=dye, random=~1 | Batch)
summary(fit2)
## Linear mixed-effects model fit by REML
##  Data: dye 
##     AIC   BIC logLik
##   175.9 180.6 -84.96
## 
## Random effects:
##  Formula: ~1 | Batch
##         (Intercept) Residual
## StdDev:        3.62    2.114
## 
## Fixed effects: Strength ~ 1 
##             Value Std.Error DF t-value p-value
## (Intercept) 93.61     1.519 30   61.62       0
## 
## Standardized Within-Group Residuals:
##      Min       Q1      Med       Q3      Max 
## -1.93304 -0.65770  0.07683  0.81223  2.08757 
## 
## Number of Observations: 36
## Number of Groups: 6

For the intercept /simga^2 we get a StdDev = 3.619968 and a Residual Std Deviation simga^2 a = 2.114106.

  1. The plot function will give you a diagnostic plot for your fit. From this plot, how well does the model fit the data?
plot(fit2)

plot of chunk unnamed-chunk-5

fit
## 
## Call:
## lm(formula = Strength ~ Batch, data = dye)
## 
## Coefficients:
## (Intercept)       Batch2       Batch3       Batch4       Batch5  
##      92.000       -1.500       -0.750        6.250       -0.833  
##      Batch6  
##       6.500

The plot now has a gap in values from 93 to 98 but overall there is no specific pattern in the “Standardized residuals vs Fitted values” plot.