# Loads the Iris Dataset! Answer the questions below.
data("iris") # Loads "iris" data
head(iris) # Views iris data
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
# To run a box plot of categorical data, you can use this code
# boxplot(dependent variable ~ independent variable)
boxplot(iris$Sepal.Length ~ iris$Species)
Figure 1: Boxplot of Sepal Length (cm) vs iris species in the iris dataset.
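Raw data are the untransformed measurements as recorded. Transformed data have had a function applied to them, typically to bring them closer to a normal distribution.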
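As a minimal sketch of the raw versus transformed distinction, the code below draws histograms of petal width before and after a natural-log transform. The choice of `Petal.Width` and of `log()` is an assumption made for illustration only, and the sketch does not claim that the transform makes this variable normal.

```r
data("iris")

raw    <- iris$Petal.Width   # raw (untransformed) measurements
logged <- log(raw)           # natural log; valid here because every value is positive

# Draw the two histograms side by side for a visual comparison
par(mfrow = c(1, 2))
hist(raw,    main = "Raw",             xlab = "Petal width (cm)")
hist(logged, main = "Log-transformed", xlab = "log(petal width)")
par(mfrow = c(1, 1))         # restore the single-panel layout
```

Comparing the two panels shows how a transformation changes the shape of a distribution; a Q-Q plot is still needed to judge normality.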
Yes, these data points are independent. Independence is one of the three assumptions of ANOVA, along with normality and homogeneity of variances.
Data that fall along the 1:1 line in a Q-Q plot are approximately normally distributed, whereas data that deviate from it are not. Sepal length and sepal width lie closest to the 1:1 line.
qqnorm(iris$Petal.Length) #q-q plot for petal length
qqline(iris$Petal.Length)
qqnorm(iris$Sepal.Length) #q-q plot for sepal length
qqline(iris$Sepal.Length)
qqnorm(iris$Petal.Width) #q-q plot for petal width
qqline(iris$Petal.Width)
qqnorm(iris$Sepal.Width) #q-q plot for sepal width
qqline(iris$Sepal.Width)
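The Q-Q plots above are judged by eye. As a possible numerical complement, the sketch below runs a Shapiro-Wilk test on each of the four measurements using base R's `shapiro.test()`. The names `measurements` and `shapiro_p` are introduced here for illustration, and the test is an addition to the visual check, not part of the original assignment.

```r
data("iris")

# The four numeric columns of iris (Species is left out)
measurements <- iris[, c("Sepal.Length", "Sepal.Width",
                         "Petal.Length", "Petal.Width")]

# Shapiro-Wilk p-value for each variable; a small p-value suggests
# a departure from normality
shapiro_p <- sapply(measurements, function(x) shapiro.test(x)$p.value)
round(shapiro_p, 4)
```

Larger p-values are consistent with the variables whose points sit closest to the line in the Q-Q plots.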
ANOVA has three assumptions. The first is independence: the observations are independent of one another, with no correlation between them. The second is normality: the data are assumed to be normally distributed, and a histogram generally gives a sense of whether they are. The third is homoscedasticity (equal variance): each category should have equal variability in the continuous variable. These data qualify for ANOVA because they are independent, approximately normal, and have roughly equal variance.
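The normality and equal-variance assumptions can also be checked with formal tests. The sketch below is one way to do so for sepal length by species, assuming Shapiro-Wilk on the model residuals and Bartlett's test for equal variances are acceptable choices; the object names `fit`, `res`, `normality`, and `variance` are hypothetical.

```r
data("iris")

# Fit the one-way ANOVA model so its residuals can be inspected
fit <- aov(Sepal.Length ~ Species, data = iris)
res <- residuals(fit)

# Normality: Q-Q plot and Shapiro-Wilk test on the residuals
qqnorm(res)
qqline(res)
normality <- shapiro.test(res)

# Homoscedasticity: Bartlett's test of equal variances across species
variance <- bartlett.test(Sepal.Length ~ Species, data = iris)

normality$p.value
variance$p.value
```

A small p-value from either test would suggest that the corresponding assumption is doubtful. Independence cannot be tested this way, because it depends on how the data were collected.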
boxplot(iris$Sepal.Width ~ iris$Species)
hist(iris$Sepal.Width)
qqnorm(iris$Sepal.Width)
qqline(iris$Sepal.Width)
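Once the plots above look acceptable, the ANOVA itself could be run as sketched below. This is a minimal example of a one-way ANOVA of sepal width by species using base R's `aov()`; the names `width_fit` and `width_table` are introduced here for illustration.

```r
data("iris")

# One-way ANOVA: does mean sepal width differ among the three species?
width_fit <- aov(Sepal.Width ~ Species, data = iris)

# summary() returns a list; its first element is the ANOVA table
width_table <- summary(width_fit)[[1]]
width_table
```

The table lists the degrees of freedom, sums of squares, F value, and p-value for the `Species` term and for the residuals.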
Please turn in your homework via Sakai by saving and submitting an R Markdown PDF or HTML file from RPubs!