One-Way ANOVA Procedure

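Data set: PlantGrowth (R's built-in data set of dried plant weights under a control and two treatment conditions, 10 plants per group)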

library(ggplot2)


# PlantGrowth ships with R; its group levels ctrl, trt1, trt2 are relabelled for readability
plant.df=PlantGrowth
plant.df$group=factor(plant.df$group,labels = c("Control","Treatment1","Treatment2"))

Visualizing the groups (boxplots reordered by median weight):

attach(plant.df)
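# boxes ordered left to right by median weight; notch = FALSE draws plain (un-notched) boxes
ggplot(plant.df, aes(x = reorder(group, weight, FUN = median), y = weight, fill = group)) +
  geom_boxplot(color = "blue", notch = FALSE) + labs(x = "group")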

Fitting the ANOVA model:

anovafit=aov(weight~ group,data=plant.df)
summary(anovafit)

##             Df Sum Sq Mean Sq F value Pr(>F)  
## group        2  3.766  1.8832   4.846 0.0159 *
## Residuals   27 10.492  0.3886                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
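The F value in this table is the ratio of the group mean square to the residual mean square, and Pr(>F) is the upper tail of an F distribution with 2 and 27 degrees of freedom. A minimal sketch that recomputes both by hand; the variable names are illustrative, and it reads the built-in PlantGrowth data directly (original level names ctrl, trt1, trt2) so that it runs on its own:

```r
# One-way ANOVA F test computed by hand (illustrative sketch).
# Uses the built-in PlantGrowth data directly (levels: ctrl, trt1, trt2).
data(PlantGrowth)
y <- PlantGrowth$weight
g <- PlantGrowth$group

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)    # one mean per level
n_i         <- tapply(y, g, length)  # plants per level (10 each)

ss_group <- sum(n_i * (group_means - grand_mean)^2)  # between-group sum of squares
ss_resid <- sum((y - group_means[g])^2)              # within-group sum of squares
                                                     # (group_means[g] indexes by factor code)
df_group <- nlevels(g) - 1          # k - 1 = 2
df_resid <- length(y) - nlevels(g)  # N - k = 27

f_stat  <- (ss_group / df_group) / (ss_resid / df_resid)  # MS_group / MS_resid
p_value <- pf(f_stat, df_group, df_resid, lower.tail = FALSE)

round(c(SS_group = ss_group, SS_resid = ss_resid, F = f_stat, p = p_value), 4)
```

The printed values should agree with the Sum Sq, F value and Pr(>F) columns of the table above.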

The ANOVA table gives a p-value of 0.0159, so at the 5% level we reject the null hypothesis H0 that all three group means are equal. The F test, however, does not tell us which groups differ or by how much; for that we need pairwise comparisons. Pairwise t-tests (pooled SD, Bonferroni adjustment):

pairwise.t.test(plant.df$weight, plant.df$group, p.adjust.method = "bonferroni")

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  weight and group 
## 
##            Control Treatment1
## Treatment1 0.583   -         
## Treatment2 0.263   0.013     
## 
## P value adjustment method: bonferroni
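The Bonferroni method multiplies each unadjusted p-value by the number of comparisons (here 3) and caps the result at 1. A short check of that rule against pairwise.t.test() itself; the variable names are illustrative, and the sketch uses the built-in PlantGrowth data, whose original level names are ctrl, trt1 and trt2:

```r
# Bonferroni adjustment reproduced by hand (illustrative sketch).
data(PlantGrowth)
raw <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                       p.adjust.method = "none")$p.value        # unadjusted p-values
adj <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                       p.adjust.method = "bonferroni")$p.value  # adjusted p-values

# multiply by the number of comparisons and cap at 1; pmin keeps the matrix layout
manual <- pmin(raw * 3, 1)
all.equal(manual, adj)
```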

Tukey's Honest Significant Difference (HSD) test:

TukeyHSD(anovafit,conf.level=0.95)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = weight ~ group, data = plant.df)
## 
## $group
##                         diff        lwr       upr     p adj
## Treatment1-Control    -0.371 -1.0622161 0.3202161 0.3908711
## Treatment2-Control     0.494 -0.1972161 1.1852161 0.1979960
## Treatment2-Treatment1  0.865  0.1737839 1.5562161 0.0120064

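The Tukey test shows that the Treatment1 and Treatment2 means differ significantly (adjusted p = 0.012; the 95% interval 0.174 to 1.556 excludes 0), whereas neither treatment differs significantly from the Control group (adjusted p = 0.391 and 0.198; both intervals include 0).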
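Tukey's intervals use the studentized range distribution instead of the t distribution. For groups i and j the interval is (mean_j - mean_i) +/- q(0.95; k, df) * sqrt(MSE/2 * (1/n_i + 1/n_j)), with k = 3 groups and df = 27 residual degrees of freedom. A sketch reproducing the Treatment2-Treatment1 row with qtukey() and ptukey(); variable names are illustrative and the built-in level names trt1 and trt2 are used:

```r
# Reproduce the trt2-trt1 row of TukeyHSD() by hand (illustrative sketch).
data(PlantGrowth)
fit <- aov(weight ~ group, data = PlantGrowth)

mse   <- deviance(fit) / df.residual(fit)  # residual mean square
k     <- nlevels(PlantGrowth$group)        # number of groups (3)
means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
n     <- tapply(PlantGrowth$weight, PlantGrowth$group, length)

mean_diff <- means[["trt2"]] - means[["trt1"]]
scale     <- sqrt(mse / 2 * (1 / n[["trt1"]] + 1 / n[["trt2"]]))  # studentized-range scale
q_crit    <- qtukey(0.95, nmeans = k, df = df.residual(fit))

half_width <- q_crit * scale
p_adj <- ptukey(abs(mean_diff) / scale, nmeans = k, df = df.residual(fit),
                lower.tail = FALSE)

c(diff = mean_diff, lwr = mean_diff - half_width, upr = mean_diff + half_width, p_adj = p_adj)
```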

ANOVA using the lm() function:

lmfit=lm(weight~group,data=plant.df)
summary(lmfit)

## 
## Call:
## lm(formula = weight ~ group, data = plant.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0710 -0.4180 -0.0060  0.2627  1.3690 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       5.0320     0.1971  25.527   <2e-16 ***
## groupTreatment1  -0.3710     0.2788  -1.331   0.1944    
## groupTreatment2   0.4940     0.2788   1.772   0.0877 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6234 on 27 degrees of freedom
## Multiple R-squared:  0.2641, Adjusted R-squared:  0.2096 
## F-statistic: 4.846 on 2 and 27 DF,  p-value: 0.01591
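The coefficients in this summary are not the group means themselves. Under R's default treatment contrasts, (Intercept) estimates the Control mean and each groupTreatment coefficient estimates that treatment's mean minus the Control mean. A sketch checking this against the raw group means; it uses the built-in PlantGrowth data (levels ctrl, trt1, trt2) so it runs on its own, and the variable names are illustrative:

```r
# Check that lm() coefficients equal group-mean differences (treatment contrasts).
data(PlantGrowth)
lm_fit <- lm(weight ~ group, data = PlantGrowth)
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)

from_means <- c(group_means[["ctrl"]],                          # intercept = reference mean
                group_means[["trt1"]] - group_means[["ctrl"]],  # trt1 minus ctrl
                group_means[["trt2"]] - group_means[["ctrl"]])  # trt2 minus ctrl

cbind(coefficient = coef(lm_fit), from_means = from_means)
```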

anova(lmfit)

## Analysis of Variance Table
## 
## Response: weight
##           Df  Sum Sq Mean Sq F value  Pr(>F)  
## group      2  3.7663  1.8832  4.8461 0.01591 *
## Residuals 27 10.4921  0.3886                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
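The same F statistic can be pulled programmatically from each of the three outputs shown so far, which makes the agreement easy to confirm. A short sketch on the built-in PlantGrowth data; the variable names are illustrative:

```r
# Extract the F statistic three ways and confirm they agree (illustrative sketch).
data(PlantGrowth)
aov_fit <- aov(weight ~ group, data = PlantGrowth)
lm_fit  <- lm(weight ~ group, data = PlantGrowth)

f_aov     <- summary(aov_fit)[[1]][["F value"]][1]        # from summary(aov)
f_anova   <- anova(lm_fit)[["F value"]][1]                # from anova(lm)
f_summary <- unname(summary(lm_fit)$fstatistic["value"])  # from summary(lm)

c(aov = f_aov, anova_lm = f_anova, summary_lm = f_summary)
```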

Both approaches give the same F statistic (4.846 on 2 and 27 degrees of freedom, p = 0.0159), because aov() fits the model through lm(). Calculating 95% confidence intervals for the coefficients:

confint(lmfit,level=.95)

##                       2.5 %    97.5 %
## (Intercept)      4.62752600 5.4364740
## groupTreatment1 -0.94301261 0.2010126
## groupTreatment2 -0.07801261 1.0660126
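Each interval is estimate +/- t(0.975; 27) * standard error, using the estimates and standard errors from summary(lmfit). These are individual (per-coefficient) 95% intervals, not family-wise ones, which is why the groupTreatment1 interval here (-0.943 to 0.201) is narrower than Tukey's Treatment1-Control interval (-1.062 to 0.320). A sketch reproducing confint() by hand on the built-in PlantGrowth data; the variable names are illustrative:

```r
# Reproduce confint() for a linear model by hand (illustrative sketch).
data(PlantGrowth)
lm_fit <- lm(weight ~ group, data = PlantGrowth)

est    <- coef(summary(lm_fit))               # columns: Estimate, Std. Error, t value, Pr(>|t|)
t_crit <- qt(0.975, df = df.residual(lm_fit)) # two-sided 95% critical value, 27 df

manual_ci <- cbind(lower = est[, "Estimate"] - t_crit * est[, "Std. Error"],
                   upper = est[, "Estimate"] + t_crit * est[, "Std. Error"])

all.equal(unname(manual_ci), unname(confint(lm_fit, level = 0.95)))
```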

Residuals vs. fitted values plot:

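# one point per plant: its fitted value (the group mean) against its residual
plant.mod=data.frame(fitted=fitted(lmfit),residuals=resid(lmfit),treatment=plant.df$group)
ggplot(plant.mod,aes(fitted,residuals,color=treatment))+geom_point()+geom_hline(yintercept=0,linetype="dashed")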
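Because the model has a single factor, the fitted values take only three distinct values (the three group means), so the points fall in three vertical stripes. A short check of that, plus the per-group residual spread the plot is meant to compare (roughly equal spreads are consistent with ANOVA's equal-variance assumption). It uses the built-in PlantGrowth data and illustrative variable names:

```r
# In a one-way model, each fitted value is the mean of that plant's group
# (illustrative sketch using the built-in PlantGrowth data).
data(PlantGrowth)
lm_fit <- lm(weight ~ group, data = PlantGrowth)

group_mean_per_plant <- ave(PlantGrowth$weight, PlantGrowth$group)  # default FUN = mean
all.equal(unname(fitted(lm_fit)), group_mean_per_plant)

# Residuals average zero within every group; their spread is what the plot compares
tapply(resid(lm_fit), PlantGrowth$group, sd)
```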