Bio Stat Example with ANOVA

R Markdown

First, load the data and call it “dataoneway”.

dataoneway <- read.table("onewayanova.txt", header = TRUE)
attach(dataoneway)

What are the names of the columns (variables)?

names(dataoneway)

## [1] "Group"  "Length"

Convert “Group” to a factor and label its levels.

dataoneway$Group <- as.factor(dataoneway$Group)
dataoneway$Group <- factor(dataoneway$Group,
                           labels = c("Wall Lizard", "Viviparus lizard", "Snake eyed lizard"))

Check if you have done the classification correctly.

class(dataoneway$Group)

## [1] "factor"

A. First, create Group1, Group2, and Group3 as the three subsets of “Group”.

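# The strings must match the factor labels assigned above exactly,
# otherwise subset() silently returns zero rows.
Group1 <- subset(dataoneway, Group == "Wall Lizard")
Group2 <- subset(dataoneway, Group == "Viviparus lizard")
Group3 <- subset(dataoneway, Group == "Snake eyed lizard")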
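Because `subset()` compares level names as exact strings, a small spelling or capitalization difference produces an empty group without any error. The sketch below shows one way to check the level names first and to split the data without retyping any labels. It uses a small made-up data frame (`toy` and its values are hypothetical, not the lizard data).

```r
# Toy data standing in for dataoneway (values invented for illustration)
toy <- data.frame(
  Group  = factor(rep(c("A", "B", "C"), each = 3)),
  Length = c(5.1, 5.3, 4.9, 4.2, 4.4, 4.0, 5.0, 5.2, 4.8)
)

# Inspect the exact level names and group sizes before subsetting
levels(toy$Group)
table(toy$Group)

# split() returns one vector of lengths per level, so no label is retyped
by_group <- split(toy$Length, toy$Group)

# A label that matches no level gives an empty subset, not an error
typo <- subset(toy, Group == "a")
nrow(typo)
```

Checking `nrow()` on each subset right after creating it is a cheap way to catch a mismatched label before it reaches the plots.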

B. Draw a normal quantile plot for each group and check whether there are any major outliers in any single group.

qqnorm(Group1$Length)
qqline(Group1$Length)

qqnorm(Group2$Length)
qqline(Group2$Length)

qqnorm(Group3$Length)
qqline(Group3$Length)
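The three pairs of calls above can also be written as a loop that puts the plots side by side, which makes the groups easier to compare. This is only a sketch on simulated data (`toy` is a hypothetical stand-in for the lizard data, generated with `rnorm()`).

```r
set.seed(1)  # make the simulated data reproducible

# Simulated stand-in for dataoneway: three groups of 20 lengths each
toy <- data.frame(
  Group  = factor(rep(c("A", "B", "C"), each = 20)),
  Length = rnorm(60, mean = 5, sd = 1)
)
by_group <- split(toy$Length, toy$Group)

old_par <- par(mfrow = c(1, 3))  # three panels in one row
for (g in names(by_group)) {
  qqnorm(by_group[[g]], main = g)  # one normal quantile plot per group
  qqline(by_group[[g]])            # reference line through the quartiles
}
par(old_par)  # restore the previous plot layout
```

Points that fall far from the reference line at either end of a panel are the candidates for outliers in that group.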

Before doing ANOVA, check the homogeneity of variance. This is assumption 4 of ANOVA: the groups have equal variances.

bartlett.test(Length~Group, data=dataoneway)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  Length by Group
## Bartlett's K-squared = 0.43292, df = 2, p-value = 0.8054
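The values printed above can also be read from the object that `bartlett.test()` returns, which is convenient when the decision has to be reported or reused. A minimal sketch on simulated data follows (`toy` is hypothetical; the 0.05 cutoff is the conventional level used in this example).

```r
set.seed(42)  # make the simulated data reproducible

# Simulated stand-in for dataoneway: three groups with the same true variance
toy <- data.frame(
  Group  = factor(rep(c("A", "B", "C"), each = 30)),
  Length = rnorm(90, mean = 5, sd = 1)
)

bt <- bartlett.test(Length ~ Group, data = toy)

bt$statistic  # Bartlett's K-squared
bt$parameter  # degrees of freedom: number of groups minus 1
bt$p.value    # p-value of the test

# TRUE means the test gives no evidence against equal variances at the 5% level
equal_var_plausible <- bt$p.value > 0.05
```

A large p-value does not prove the variances are equal; it only means the test found no evidence that they differ.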

What is the p-value from bartlett.test? What does it mean? The p-value is 0.8054. Because it is greater than .05, we fail to reject the null hypothesis of equal variances, so the homogeneity-of-variance assumption is reasonable.

For the ANOVA test, create a linear model of Length versus Group and call it model1. Then run the ANOVA.

model1 = with(dataoneway, lm(Length ~ Group)) 
anova(model1)

## Analysis of Variance Table
## 
## Response: Length
##            Df Sum Sq Mean Sq F value Pr(>F)   
## Group       2 10.615  5.3074  7.0982 0.0013 **
## Residuals 102 76.267  0.7477                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
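The F value and p-value in the table can be reproduced by hand from the other columns, which shows how they relate. The sketch below copies the mean squares and degrees of freedom from the table above; the variable names are made up for this example.

```r
# Values copied from the ANOVA table above
ms_group <- 5.3074  # Mean Sq for Group
ms_resid <- 0.7477  # Mean Sq for Residuals
df_group <- 2       # 3 groups - 1
df_resid <- 102     # observations - number of groups

# F is the ratio of between-group to within-group mean squares
f_value <- ms_group / ms_resid

# p-value: upper tail of the F distribution with (2, 102) degrees of freedom
p_value <- pf(f_value, df_group, df_resid, lower.tail = FALSE)

f_value
p_value
```

Both results should agree with the table up to rounding of the copied inputs. The residual degrees of freedom also imply 105 observations in total (102 + 3 groups).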

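Report the p-value. What can you conclude about the null hypothesis? The p-value is 0.0013, which is less than .05, so we reject the null hypothesis that the mean length is the same in all three groups.

We do not yet know which species differ from the others. We will check this with a post-hoc test (Tukey's HSD).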

TukeyHSD(aov(model1))

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = model1)
## 
## $Group
##                                          diff        lwr        upr
## Viviparus lizard-Wall Lizard       -0.7200000 -1.2116284 -0.2283716
## Snake eyed lizard-Wall Lizard      -0.1028571 -0.5944855  0.3887713
## Snake eyed lizard-Viviparus lizard  0.6171429  0.1255145  1.1087713
##                                        p adj
## Viviparus lizard-Wall Lizard       0.0020955
## Snake eyed lizard-Wall Lizard      0.8726158
## Snake eyed lizard-Viviparus lizard 0.0098353

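The 1st and 3rd adjusted p-values are below .05 (0.0021 and 0.0098), while the 2nd is far above it (0.8726). The Viviparus lizard is therefore significantly shorter than both the Wall Lizard (by about 0.72 cm) and the Snake eyed lizard (by about 0.62 cm), whereas the Wall Lizard and the Snake eyed lizard do not differ significantly.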
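The Tukey results can be turned into a data frame so that the significant pairs are flagged automatically instead of read off by eye. This is a sketch on simulated data (`toy` is hypothetical, with group B given a lower mean on purpose; the `significant` column is added for illustration).

```r
set.seed(7)  # make the simulated data reproducible

# Simulated stand-in for dataoneway: group B has a lower true mean
toy <- data.frame(
  Group  = factor(rep(c("A", "B", "C"), each = 30)),
  Length = c(rnorm(30, mean = 5), rnorm(30, mean = 4), rnorm(30, mean = 5))
)

tk <- TukeyHSD(aov(Length ~ Group, data = toy))

# tk$Group is a matrix with columns diff, lwr, upr and "p adj"
res <- as.data.frame(tk$Group)

# Flag pairs whose adjusted p-value is below the 5% level
res$significant <- res[["p adj"]] < 0.05
res
```

A pair is flagged exactly when its 95% interval (`lwr` to `upr`) excludes zero, which is the same pattern visible in the lizard output above.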

Visualize the data with ggplot2

library(ggplot2)

ggplot(dataoneway, aes(x = Group, y = Length)) +
  geom_boxplot(fill = "grey80", colour = "black") +
  scale_x_discrete() +
  xlab("Treatment Group") +
  ylab("Length (cm)")


Cade Corcoran

February 25, 2019
