# Loads the Iris Dataset! Answer the questions below.
data("iris") # Loads "iris" data
head(iris) # Views iris data
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
# To draw a box plot of a numeric variable grouped by a categorical variable, you can use this code
# boxplot(dependent variable ~ independent variable)
boxplot(iris$Sepal.Length ~ iris$Species)
Figure 1: Boxplot of Sepal Length (cm) vs iris species in the iris dataset.
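Raw data are the values as originally measured. When raw data are not normally distributed, that can affect the validity of a statistical test. Transformed data are raw data that have been mathematically transformed (for example, by taking the log) into something closer to normal.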
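The difference between raw and transformed data can be sketched in a few lines of R. The choice of Petal.Length and of a log transformation is an assumption made for illustration, not something the assignment specifies, and a transformation is not guaranteed to help: compare the two Q-Q plots before deciding.

```r
# Sketch: compare a raw variable with a log-transformed copy.
# Petal.Length and log() are illustrative choices, not part of the assignment.
data("iris")

raw <- iris$Petal.Length
transformed <- log(raw)  # natural log; valid because all values are positive

# Side-by-side Q-Q plots: points that hug the line suggest a more normal shape
op <- par(mfrow = c(1, 2))
qqnorm(raw, main = "Raw Petal.Length")
qqline(raw)
qqnorm(transformed, main = "log(Petal.Length)")
qqline(transformed)
par(op)  # restore the previous plotting layout
```

If the transformed plot is no closer to the line than the raw one, a different transformation (or a test that does not assume normality) may be a better fit.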
It is not independent. If you run an ANOVA on data that are not normally distributed, the results may be misleading or completely invalid.
# The variable closest to normal is Sepal.Width because its points fall along the Q-Q reference line. The least normal is Petal.Length because its points fall off the reference line at both ends.
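A visual reading of Q-Q plots can be backed up with a numeric summary. The sketch below uses the Shapiro-Wilk test from base R as one possible check; the assignment itself only asks for plots, so treat this as an optional complement. The names `numeric_vars` and `normality` are made up for this example.

```r
# Sketch: rank the four measurements by the Shapiro-Wilk W statistic.
# W closer to 1 is more consistent with a normal distribution.
data("iris")

numeric_vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Run the test once per column and keep the statistic and p-value
w_stats <- sapply(iris[numeric_vars], function(x) unname(shapiro.test(x)$statistic))
p_vals  <- sapply(iris[numeric_vars], function(x) shapiro.test(x)$p.value)

normality <- data.frame(W = w_stats, p.value = p_vals)

# Most normal-looking variable first
print(normality[order(-normality$W), ])
```

A small p-value is evidence against normality, but with 150 observations the test can flag modest departures, so it is worth reading alongside the plots rather than instead of them.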
hist(iris$Sepal.Length)
qqnorm(iris$Sepal.Length)
qqline(iris$Sepal.Length)
hist(iris$Sepal.Width)
qqnorm(iris$Sepal.Width)
qqline(iris$Sepal.Width)
hist(iris$Petal.Length)
qqnorm(iris$Petal.Length)
qqline(iris$Petal.Length)
hist(iris$Petal.Width)
qqnorm(iris$Petal.Width)
qqline(iris$Petal.Width)
# The three assumptions of the ANOVA are: Independence: the data are independent. Normality: the residuals have a normal distribution, or are at least symmetric. Homogeneity of variances: data from multiple groups have the same variance.
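Two of the three assumptions listed above can be examined directly in R. The sketch below assumes a one-way ANOVA of Sepal.Width by Species, and it uses Shapiro-Wilk and Bartlett's test as example checks; other tests would serve equally well. Independence depends on how the data were collected and is not checked here.

```r
# Sketch: checking ANOVA assumptions for Sepal.Width ~ Species.
# The model and the choice of tests are assumptions made for illustration.
data("iris")

fit <- aov(Sepal.Width ~ Species, data = iris)
res <- residuals(fit)

# Normality: inspect the residuals, not the raw response
qqnorm(res, main = "Q-Q plot of ANOVA residuals")
qqline(res)
shapiro_res <- shapiro.test(res)
print(shapiro_res)

# Homogeneity of variances: compare the per-species variances
group_var <- tapply(iris$Sepal.Width, iris$Species, var)
print(group_var)
bartlett_res <- bartlett.test(Sepal.Width ~ Species, data = iris)
print(bartlett_res)
```

Bartlett's test is itself sensitive to non-normality, so when the residuals look clearly non-normal the per-group variances and the box plot may be more trustworthy guides.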
boxplot(iris$Sepal.Width ~ iris$Species)
hist(iris$Sepal.Width)
qqnorm(iris$Sepal.Width)
qqline(iris$Sepal.Width)
Please turn in your homework via Sakai by saving and submitting an R Markdown PDF or HTML file from R Pubs!