Bio Stat Example with ANOVA

First, load the data and call it “dataoneway”.

dataoneway <- read.table("onewayanova.txt", header = TRUE)

What are the names of the variables (columns)? How many groups are there?

names(dataoneway)

## [1] "Group"  "Length"

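There are two variables, “Group” and “Length”. Note that names() only lists the columns; the Group variable itself contains three groups (the three lizard species labeled in the next step).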
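Because names() reports columns rather than groups, it helps to count the groups directly. The following is a minimal sketch on a small made-up data frame (`demo` and its values are assumptions for illustration, not the lizard data):

```r
# Hypothetical stand-in for dataoneway: 3 groups, 4 measurements each
demo <- data.frame(
  Group  = rep(1:3, each = 4),
  Length = c(18.1, 18.9, 18.4, 18.6,
             17.5, 17.9, 17.6, 18.0,
             18.2, 18.5, 18.3, 18.7)
)

names(demo)                     # column names only
str(demo)                       # column types and a preview of the values
group_counts <- table(demo$Group)   # observations per group
n_groups <- length(unique(demo$Group))
print(group_counts)
print(n_groups)
```

Running the same calls on dataoneway should show how many distinct values “Group” takes and how many rows fall in each.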

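Convert “Group” to a factor and label its levels.

# Convert Group to a factor, then relabel its levels.
# The labels are matched to the existing levels in their sorted order.
dataoneway$Group <- as.factor(dataoneway$Group)
dataoneway$Group <- factor(dataoneway$Group, labels = c("Wall lizard", "Viviparous lizard", "Snake-eyed lizard"))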

Next, create Group1, Group2, and Group3 as three subsets of the data, one for each level of “Group”. The general form is:

Group1 <- subset(<name of the data>, Group == "<category>")

Group1 <- subset(dataoneway, Group == "Wall lizard")
Group2 <- subset(dataoneway, Group == "Viviparous lizard")
Group3 <- subset(dataoneway, Group == "Snake-eyed lizard")

Draw a normal quantile (Q-Q) plot for each group and check whether any group has major outliers.

qqnorm(Group1$Length)
qqline(Group1$Length)

qqnorm(Group2$Length)
qqline(Group2$Length)

qqnorm(Group3$Length)
qqline(Group3$Length)
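The three plots can also be drawn side by side in a loop, which makes the groups easier to compare. This is a sketch on simulated data (`demo`, the group names, and the random values are assumptions, not the lizard measurements). The outlier count uses the boxplot 1.5 × IQR rule, which is a swapped-in numeric check and not part of the original exercise:

```r
set.seed(1)  # simulated data for illustration only
demo <- data.frame(
  Group  = factor(rep(c("A", "B", "C"), each = 35)),
  Length = rnorm(105, mean = 18, sd = 0.9)
)

# One numeric vector of lengths per group
by_group <- split(demo$Length, demo$Group)

# Draw the three Q-Q plots in one row
old_par <- par(mfrow = c(1, 3))
for (g in names(by_group)) {
  qqnorm(by_group[[g]], main = g)
  qqline(by_group[[g]])
}
par(old_par)

# Count points flagged by the boxplot rule in each group
n_outliers <- sapply(by_group, function(x) length(boxplot.stats(x)$out))
print(n_outliers)
```

Points that fall far from the reference line at either end of a Q-Q plot are the ones to inspect as possible outliers.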

Before doing ANOVA, check the homogeneity of variance.

bartlett.test(Length ~ Group, data = <name of the data>)

bartlett.test(Length ~ Group, data = dataoneway)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  Length by Group
## Bartlett's K-squared = 0.43292, df = 2, p-value = 0.8054

What is the p-value from bartlett.test? Is it > 0.05? What does it mean?

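p = 0.8054. Since p > 0.05, we fail to reject the null hypothesis of equal variances: there is no evidence that the variances of the three groups differ, so the homogeneity-of-variance assumption for ANOVA is reasonable.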
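To see what Bartlett's test is comparing, it can help to print the per-group variances next to the test result. A minimal sketch on simulated data (`demo` and its values are assumptions for illustration, not the lizard data):

```r
set.seed(1)  # simulated data for illustration only
demo <- data.frame(
  Group  = factor(rep(c("A", "B", "C"), each = 35)),
  Length = rnorm(105, mean = 18, sd = 0.9)
)

# Sample variance of Length within each group
group_var <- tapply(demo$Length, demo$Group, var)
print(group_var)

# Bartlett's test returns a list; the p-value can be extracted directly
bt <- bartlett.test(Length ~ Group, data = demo)
print(bt$p.value)
equal_var_plausible <- bt$p.value > 0.05
```

With three groups the test has 2 degrees of freedom, matching the df = 2 in the output above.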

For the ANOVA, fit a linear model of Length against Group and call it model1. Then run the ANOVA on it:

model1 <- lm(Length ~ Group, data = dataoneway)
model1

## 
## Call:
## lm(formula = Length ~ Group, data = dataoneway)
## 
## Coefficients:
##            (Intercept)  GroupViviparous lizard  GroupSnake-eyed lizard  
##                18.4657                 -0.7200                 -0.1029
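With R's default treatment contrasts for an unordered factor, the intercept is the mean of the reference level (here Wall lizard) and each remaining coefficient is that group's mean minus the reference mean. A small sketch that recovers the three group means from the coefficients printed above (the variable names are illustrative):

```r
# Coefficients copied from the lm() output above
intercept  <- 18.4657   # mean Length of the reference level, Wall lizard
viviparous <- -0.7200   # Viviparous lizard mean minus Wall lizard mean
snake_eyed <- -0.1029   # Snake-eyed lizard mean minus Wall lizard mean

group_means <- c(
  "Wall lizard"       = intercept,
  "Viviparous lizard" = intercept + viviparous,
  "Snake-eyed lizard" = intercept + snake_eyed
)
print(group_means)
```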

anova(model1)

## Analysis of Variance Table
## 
## Response: Length
##            Df Sum Sq Mean Sq F value Pr(>F)   
## Group       2 10.615  5.3074  7.0982 0.0013 **
## Residuals 102 76.267  0.7477                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Report the p-value. What can you conclude about the null hypothesis?

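The p-value is 0.0013. Since 0.0013 < 0.05, reject the null hypothesis that the mean lengths of the three groups are all equal: at least one group mean differs from the others.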
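The F value and p-value can be reproduced by hand from the sums of squares and degrees of freedom in the table, which shows where the numbers come from. A short sketch using the values reported above (the variable names are illustrative):

```r
# Values copied from the ANOVA table above
ss_group <- 10.615; df_group <- 2     # between-group sum of squares and df
ss_resid <- 76.267; df_resid <- 102   # residual sum of squares and df

ms_group <- ss_group / df_group       # mean square between groups
ms_resid <- ss_resid / df_resid       # mean square within groups
f_value  <- ms_group / ms_resid       # F statistic

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, df_group, df_resid, lower.tail = FALSE)

print(round(f_value, 3))
print(round(p_value, 4))
```

Up to rounding of the table entries, this should agree with the F value and Pr(>F) columns.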

Follow up with the Tukey HSD post-hoc test to see which pairs of groups differ.

TukeyHSD(aov(model1))

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = model1)
## 
## $Group
##                                           diff        lwr        upr
## Viviparous lizard-Wall lizard       -0.7200000 -1.2116284 -0.2283716
## Snake-eyed lizard-Wall lizard       -0.1028571 -0.5944855  0.3887713
## Snake-eyed lizard-Viviparous lizard  0.6171429  0.1255145  1.1087713
##                                         p adj
## Viviparous lizard-Wall lizard       0.0020955
## Snake-eyed lizard-Wall lizard       0.8726158
## Snake-eyed lizard-Viviparous lizard 0.0098353

What can you say from the p-values?

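Each adjusted p-value tests the null hypothesis that the two groups in that pair have equal mean lengths. For the Viviparous and Wall lizards (p = 0.0020955) and for the Snake-eyed and Viviparous lizards (p = 0.0098353), both p-values are below 0.05, so you reject the null hypothesis for both pairs: their mean lengths differ. For the Snake-eyed and Wall lizards, p = 0.8726158 > 0.05, so you fail to reject the null hypothesis: there is no evidence that their mean lengths differ. The confidence intervals agree: only the Snake-eyed/Wall lizard interval (-0.5944855 to 0.3887713) contains 0.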
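The same reading can be done programmatically by filtering the Tukey table on its adjusted p-values. This is a sketch on simulated data (`demo`, the group names, and the chosen group means are assumptions for illustration, not the lizard measurements):

```r
set.seed(1)  # simulated data for illustration only
demo <- data.frame(
  Group  = factor(rep(c("A", "B", "C"), each = 35)),
  Length = rnorm(105, mean = rep(c(18.5, 17.7, 18.4), each = 35), sd = 0.9)
)

# TukeyHSD returns a list with one matrix per factor in the model
tukey <- TukeyHSD(aov(Length ~ Group, data = demo))$Group

# Keep only the pairs whose adjusted p-value is below 0.05
significant <- tukey[tukey[, "p adj"] < 0.05, , drop = FALSE]
print(tukey)
print(rownames(significant))
```

Each row of the matrix is one pairwise comparison, with the columns diff, lwr, upr, and p adj shown in the output above.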

Visualize the data with ggplot2.

ggplot(<name of the data>, aes(x = Group, y = Length)) + geom_boxplot(fill = "grey80", col = "black") + scale_x_discrete() + xlab("Treatment Group") + ylab("Length (cm)")

library("ggplot2")

ggplot(dataoneway, aes(x = Group, y = Length)) +
  geom_boxplot(fill = "grey80", col = "black") +
  scale_x_discrete() +
  xlab("Treatment Group") +
  ylab("Length (cm)")

Olivia Visaggio

2/25/2019