One-way ANOVA with three species of European lizard. Data were collected on three species of lizard, and the length of each animal was recorded. With the ANOVA test, we want to see whether mean length differs between the species, i.e. whether one kind of lizard is longer than the others.
Assumption 1: All observations are independent, and they fall into more than two independent categorical groups (here, three species).
\(H_0\): There is no difference in mean length between the three species, i.e. \(\mu_1 = \mu_2 = \mu_3\).
\(H_a\): At least one species has a mean length different from the others.
We will carry out the analysis in the following steps:
1. Load the data and call it “dataoneway”.
dataoneway <- read.delim("C:\\Users\\torresk\\Desktop\\onewayanova.txt", header = TRUE)  # tab-delimited text file with a header row
2. What are the names of the columns (variables)?
names(dataoneway)
## [1] "Group" "Length"
How many groups are there? The first rows show that Group is stored as a numeric code rather than as a species name:
head(dataoneway)
##   Group Length
## 1     1   19.0
## 2     1   18.6
## 3     1   18.3
## 4     1   18.0
## 5     1   18.2
## 6     1   18.6
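head() only shows the first six rows, which all belong to group 1, so it cannot tell us how many groups there are. A minimal sketch that counts them directly with base R's table(); count_groups is a hypothetical helper name, and it assumes the Group column seen above:

```r
# Hypothetical helper: number of distinct Group codes and the number of rows in each.
# Usage (assumes dataoneway from step 1): count_groups(dataoneway)
count_groups <- function(df) {
  counts <- table(df$Group)          # one entry per distinct Group code
  list(n_groups  = length(counts),   # how many groups there are
       per_group = counts)           # sample size of each group
}
```

For these data it should report three groups, one for each species labelled in step 3.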
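3. Convert “Group” to a factor and label its levels with the species names.
dataoneway$Group <- as.factor(dataoneway$Group)
# Labels are matched to the levels in sorted order: the smallest code gets the first label, and so on
dataoneway$Group <- factor(dataoneway$Group, labels = c("Wall Lizard", "Viviparous Lizard", "Snake-eyed Lizard"))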
4. Check that the conversion worked.
class(dataoneway$Group)
## [1] "factor"
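With Group now a factor, a per-species summary gives a first look at the question before any test is run. A sketch using base R's tapply(); group_stats is a hypothetical name, and it assumes the Length and Group columns used throughout:

```r
# Hypothetical helper: size, mean and standard deviation of Length for each level of Group.
# Usage (assumes dataoneway after step 3): group_stats(dataoneway)
group_stats <- function(df) {
  g <- as.factor(df$Group)
  data.frame(
    n    = as.vector(tapply(df$Length, g, length)),  # group size
    mean = as.vector(tapply(df$Length, g, mean)),    # group mean length
    sd   = as.vector(tapply(df$Length, g, sd)),      # group standard deviation
    row.names = levels(g)                            # one row per species
  )
}
```

The rows come out in the level order set in step 3.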
5. In this experiment the dependent variable, length, is continuous, so assumption 2 for ANOVA (a continuous dependent variable) is met.
Next we have to check assumption 3: no major outliers. You can check this in R.
(a) First create group1, group2, and group3 as three subsets of the data, one for each level of “Group”.
group1 <- subset(dataoneway, Group=="Wall Lizard")
group2 <- subset(dataoneway, Group=="Viviparous Lizard")
group3 <- subset(dataoneway, Group=="Snake-eyed Lizard")
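(b) Draw the normal quantile (Q-Q) plot for each group and look for major outliers, i.e. points lying far from the reference line.
qqnorm(group1$Length, main = "Wall Lizard"); qqline(group1$Length)
qqnorm(group2$Length, main = "Viviparous Lizard"); qqline(group2$Length)
qqnorm(group3$Length, main = "Snake-eyed Lizard"); qqline(group3$Length)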
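The Q-Q plot is a visual check. As a numeric cross-check, one common rule (the one boxplot() uses to draw points individually; it is an addition here, not part of the original steps) flags values lying more than 1.5 times the hinge spread beyond the lower or upper hinge. A minimal sketch; flag_outliers is a hypothetical name:

```r
# Hypothetical helper: per group, the Length values outside the boxplot whiskers
# (more than 1.5 x the hinge spread beyond the hinges), via grDevices::boxplot.stats().
# Usage (assumes dataoneway after step 3): flag_outliers(dataoneway)
flag_outliers <- function(df) {
  lapply(split(df$Length, df$Group),          # one numeric vector per group
         function(x) boxplot.stats(x)$out)    # values beyond the whiskers
}
```

An empty result for every species would support assumption 3.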
6. Before running the ANOVA, check assumption 4: homogeneity of variance. Bartlett's test has the null hypothesis that all group variances are equal.
bartlett.test(Length~Group, data=dataoneway)
##
## Bartlett test of homogeneity of variances
##
## data: Length by Group
## Bartlett's K-squared = 0.43292, df = 2, p-value = 0.8054
7. What is the p-value from bartlett.test()? Is it > .05? What does it mean?
The p-value is 0.8054. It is greater than .05, so we cannot reject the null hypothesis that the group variances are equal, and the homogeneity-of-variance assumption looks reasonable.
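Bartlett's K-squared compares the pooled variance with the individual group variances: it is near 0 when the group variances are similar and grows as they diverge. A sketch of the standard formula, written to match what bartlett.test() computes; bartlett_k2 is a hypothetical name:

```r
# Hypothetical helper: Bartlett's K-squared, its df and p-value, computed by hand.
# K2 = [(N - k) log(sp2) - sum((n_i - 1) log(s_i^2))] /
#      [1 + (sum(1 / (n_i - 1)) - 1 / (N - k)) / (3 (k - 1))]
bartlett_k2 <- function(x, g) {
  g   <- as.factor(g)
  n   <- tapply(x, g, length)               # group sizes n_i
  v   <- tapply(x, g, var)                  # group variances s_i^2
  k   <- nlevels(g)                         # number of groups
  N   <- sum(n)                             # total sample size
  sp2 <- sum((n - 1) * v) / (N - k)         # pooled variance
  num <- (N - k) * log(sp2) - sum((n - 1) * log(v))
  den <- 1 + (sum(1 / (n - 1)) - 1 / (N - k)) / (3 * (k - 1))
  k2  <- num / den
  c(K2 = k2, df = k - 1, p = pchisq(k2, df = k - 1, lower.tail = FALSE))
}
```

Applied as bartlett_k2(dataoneway$Length, dataoneway$Group), it should reproduce the K-squared, df and p-value printed above.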
8. For the ANOVA test, fit a linear model of Length versus Group and call it model1.
model1 <- lm(Length~Group, data=dataoneway)
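Under R's default treatment contrasts, lm() on a factor estimates the first level's mean (here Wall Lizard) as the intercept and each remaining coefficient as that level's mean minus the first level's mean. A small check of that claim, assuming options(contrasts) has not been changed; check_treatment_coding is a hypothetical name:

```r
# Hypothetical helper: TRUE if lm(y ~ g) returns the first group mean as intercept and
# (group mean - first group mean) for the other levels (default treatment contrasts assumed).
check_treatment_coding <- function(y, g) {
  g <- as.factor(g)
  fit <- lm(y ~ g)
  means <- as.vector(tapply(y, g, mean))           # group means in level order
  expected <- c(means[1], means[-1] - means[1])    # intercept, then differences
  isTRUE(all.equal(unname(coef(fit)), expected))
}
```

If check_treatment_coding(dataoneway$Length, dataoneway$Group) returns TRUE, coef(model1) can be read as differences in mean length from the Wall Lizard.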
Produce the ANOVA table:
anova(model1)
## Analysis of Variance Table
##
## Response: Length
##            Df Sum Sq Mean Sq F value Pr(>F)
## Group       2 10.615  5.3074  7.0982 0.0013 **
## Residuals 102 76.267  0.7477
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
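Each Mean Sq is the Sum Sq divided by its Df, the F value is the ratio of the Group and Residuals mean squares, and Pr(>F) is the upper tail of the F distribution with (2, 102) degrees of freedom. A sketch that recomputes F and the p-value from the table's own numbers; f_test_from_ss is a hypothetical name:

```r
# Hypothetical helper: F statistic and p-value from the sums of squares in an ANOVA table.
f_test_from_ss <- function(ss_between, df_between, ss_within, df_within) {
  ms_between <- ss_between / df_between   # "Mean Sq" for Group
  ms_within  <- ss_within / df_within     # "Mean Sq" for Residuals
  f <- ms_between / ms_within             # "F value"
  c(F = f, p = pf(f, df_between, df_within, lower.tail = FALSE))  # "Pr(>F)"
}

f_test_from_ss(10.615, 2, 76.267, 102)   # F close to 7.098, p close to 0.0013
```

The small gap from the printed 7.0982 comes from the rounded sums of squares.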
9. Report the p-value. What can you conclude about the null hypothesis?
The p-value is 0.0013, which is below .05, so we reject the null hypothesis: at least one species differs from the others in mean length.
10. The ANOVA does not tell us which species differ from which. We check this with Tukey's HSD post-hoc test (TukeyHSD).
TukeyHSD(aov(model1))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = model1)
##
## $Group
## diff lwr upr
## Viviparous Lizard-Wall Lizard -0.7200000 -1.2116284 -0.2283716
## Snake-eyed Lizard-Wall Lizard -0.1028571 -0.5944855 0.3887713
## Snake-eyed Lizard-Viviparous Lizard 0.6171429 0.1255145 1.1087713
## p adj
## Viviparous Lizard-Wall Lizard 0.0020955
## Snake-eyed Lizard-Wall Lizard 0.8726158
## Snake-eyed Lizard-Viviparous Lizard 0.0098353
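Each interval above has the Tukey-Kramer form diff ± q · sqrt(MSE/2 · (1/n_i + 1/n_j)), where q is the 95% studentized range quantile for k = 3 means and the residual df, and p adj is the upper tail of the studentized range distribution. A sketch for a single pair, mirroring that calculation; tukey_pair is a hypothetical name, and the group sizes must be supplied by the user:

```r
# Hypothetical helper: Tukey-Kramer interval and adjusted p-value for one pair of means.
# diff = mean_i - mean_j, n_i / n_j = group sizes, mse = residual Mean Sq,
# df_resid = residual Df, k = number of groups in the ANOVA.
tukey_pair <- function(diff, n_i, n_j, mse, df_resid, k, conf = 0.95) {
  se <- sqrt(mse / 2 * (1 / n_i + 1 / n_j))        # standard error on the studentized-range scale
  q  <- qtukey(conf, nmeans = k, df = df_resid)     # critical studentized range
  p  <- ptukey(abs(diff) / se, nmeans = k, df = df_resid, lower.tail = FALSE)
  c(diff = diff, lwr = diff - q * se, upr = diff + q * se, p_adj = p)
}
```

The group sizes are not printed in this section. Assuming 35 lizards per species (consistent with 102 residual df plus 3 groups giving 105 observations, and with the equal interval widths above), tukey_pair(-0.72, 35, 35, 0.7477, 102, 3) should land close to the first row; small differences come from the rounded MSE.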
What can you say from the p-values? The comparison between the Snake-eyed Lizard and the Wall Lizard is not statistically significant (p adj = 0.873 > .05), while the other two comparisons are (p adj < .05): on average the Viviparous Lizard is shorter than both the Wall Lizard (by 0.72) and the Snake-eyed Lizard (by about 0.62).
11. Visualize the data with ggplot2:
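The reading above (compare each p adj with .05) can also be done in code. A sketch; significant_pairs is a hypothetical name, and it takes the matrix that TukeyHSD() returns for one term:

```r
# Hypothetical helper: names of the comparisons whose adjusted p-value is below alpha.
# Usage (assumes model1 from step 8): significant_pairs(TukeyHSD(aov(model1))$Group)
significant_pairs <- function(tukey_mat, alpha = 0.05) {
  rownames(tukey_mat)[tukey_mat[, "p adj"] < alpha]   # keep rows with p adj < alpha
}
```

For the output above it should return the Viviparous Lizard-Wall Lizard and Snake-eyed Lizard-Viviparous Lizard comparisons.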
#install.packages("ggplot2")   # run once if ggplot2 is not installed
library(ggplot2)
ggplot(dataoneway, aes(x = Group, y = Length)) +
  geom_boxplot(fill = "grey80", colour = "black") +
  scale_x_discrete() + xlab("Treatment Group") +
  ylab("Length (cm)")