# Load the iris dataset and answer the questions below.

data("iris") # Loads "iris" data
head(iris) # Views iris data
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
# To draw a box plot of a numeric variable grouped by a categorical variable, you can use this code
# boxplot(dependent variable ~ independent variable)

boxplot(iris$Sepal.Length ~ iris$Species)
Figure 1: Boxplot of Sepal Length (cm) vs iris species in the iris dataset.
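
The formula interface described above also accepts a `data` argument, which avoids repeating `iris$` and makes it easy to label the axes. This is only a sketch of an equivalent call; the axis label text is an assumption chosen for illustration.

```r
data("iris")

# Same box plot as above, written with the data argument.
# The axis labels are illustrative choices, not part of the assignment.
bp <- boxplot(Sepal.Length ~ Species, data = iris,
              xlab = "Iris species",
              ylab = "Sepal length (cm)")

# boxplot() invisibly returns the plotted statistics, one column per group
bp$names
```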

Q1: What is the difference between raw data and transformed data?

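Raw data are the measurements exactly as they were collected, before any mathematical manipulation. Raw data are not necessarily non-normal, but when they are, the validity of a statistical test that assumes normality can be affected. Transformed data are raw data to which the same mathematical function (for example a log or square root) has been applied to every value, usually to bring the distribution closer to normal.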
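
As a minimal sketch of what a transformation looks like in practice, the code below applies a natural log to one iris variable and draws Q-Q plots of the raw and transformed values side by side. The choice of `Petal.Width` and of the log function are assumptions made for illustration; whether a given transformation actually helps has to be judged from the plots.

```r
data("iris")

raw <- iris$Petal.Width      # raw data, exactly as recorded
transformed <- log(raw)      # transformed data: natural log of every value

# Compare the two versions with side-by-side Q-Q plots
par(mfrow = c(1, 2))
qqnorm(raw, main = "Raw Petal.Width")
qqline(raw)
qqnorm(transformed, main = "log(Petal.Width)")
qqline(transformed)
par(mfrow = c(1, 1))         # reset the plotting layout
```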

Q2: A scientist is setting up an experiment in a temperate marsh to measure the growth of an invasive species. In the design, there are 10 control plots with normal water level and 10 experiment plots drained of water. Plant growth measurements are taken monthly for one year. Are these data points independent? What does this mean for running an ANOVA? Explain your answer.

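No, the data points are not independent. Each plot is measured every month for a year, so the 12 measurements from a single plot are repeated measures that are correlated with one another. A standard ANOVA assumes independent observations, so treating every monthly measurement as a separate replicate would violate that assumption and the results may be misleading or completely invalid. The analysis has to account for the repeated measurements, for example with a repeated-measures ANOVA.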
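
The sketch below illustrates the independence problem with simulated numbers. Everything in it is hypothetical: the data frame `marsh`, the growth values, and the effect sizes are made up only to mirror the design (20 plots, 12 monthly measurements each). Averaging to one value per plot is one simple way to get independent data points, not necessarily the approach the course expects.

```r
set.seed(1)

# Hypothetical design: 20 plots x 12 months = 240 measurements
marsh <- expand.grid(month = 1:12, plot = factor(1:20))
marsh$treatment <- factor(ifelse(as.integer(marsh$plot) <= 10, "control", "drained"))

# Each plot gets its own baseline, which is what makes its 12 measurements correlated
plot_effect <- rnorm(20, sd = 2)
marsh$growth <- 10 + 0.5 * marsh$month +
  plot_effect[as.integer(marsh$plot)] + rnorm(nrow(marsh))

# Collapse to one mean per plot so each data point comes from a different plot
plot_means <- aggregate(growth ~ plot + treatment, data = marsh, FUN = mean)
nrow(plot_means)

# One-way ANOVA on the 20 independent plot means
summary(aov(growth ~ treatment, data = plot_means))
```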

Q3: Run Q–Q plots for the four measurement variables in the iris dataset. Which iris variable most closely resembles normality? Which iris variable least resembles a normal distribution? How can you tell? Explain your answer. Present your commented code and graphs with descriptive figure captions below.

#The variable closest to normal is Sepal.Width because its points fall along the Q-Q reference line. The least normal is Petal.Length because its points fall away from the reference line at both ends.
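# Sepal.Length: histogram, then normal Q-Q plot with its reference line
hist(iris$Sepal.Length)

qqnorm(iris$Sepal.Length)
qqline(iris$Sepal.Length)

# Sepal.Width: histogram and Q-Q plot
hist(iris$Sepal.Width)

qqnorm(iris$Sepal.Width)
qqline(iris$Sepal.Width)

# Petal.Length: histogram and Q-Q plot
hist(iris$Petal.Length)

qqnorm(iris$Petal.Length)
qqline(iris$Petal.Length)

# Petal.Width: histogram and Q-Q plot
hist(iris$Petal.Width)

qqnorm(iris$Petal.Width)
qqline(iris$Petal.Width)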
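
Reading Q-Q plots by eye is subjective, so a numeric check can back up the visual ranking. The sketch below is an optional addition, not part of the assignment: it computes the Shapiro-Wilk W statistic for each measurement variable, where values closer to 1 indicate a distribution closer to normal. The names `measure_vars` and `w_stats` are introduced here for illustration.

```r
data("iris")

# The four measurement variables (everything except Species)
measure_vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Shapiro-Wilk W statistic for each variable; W closer to 1 means closer to normal
w_stats <- vapply(iris[measure_vars],
                  function(x) unname(shapiro.test(x)$statistic),
                  numeric(1))

# Rank the variables from most to least normal-looking
sort(w_stats, decreasing = TRUE)
```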

Q4: What are the three assumptions of the ANOVA? Explain each assumption in your own words. Say we want to run an ANOVA to detect differences in Sepal.Width for the three iris Species in the iris dataset. Does this relationship qualify for an ANOVA? Explain your answer. Present your commented code testing these assumptions.

#The three assumptions of the ANOVA are: (1) Independence: each observation is independent of the others. (2) Normality: the residuals have a normal distribution, or at least a symmetric one. (3) Homogeneity of variances: the groups being compared have the same variance.
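# Compare the spread of Sepal.Width across species (homogeneity of variances)
boxplot(iris$Sepal.Width ~ iris$Species)

# Check the shape of the Sepal.Width distribution (normality)
hist(iris$Sepal.Width)

qqnorm(iris$Sepal.Width)
qqline(iris$Sepal.Width)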
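
Because the normality assumption applies to the residuals rather than to the raw variable, one possible way to test the assumptions more directly is to fit the model first and then inspect its residuals. The sketch below assumes that approach; the use of `shapiro.test()` and `bartlett.test()` is a choice made here, and other tests would also be reasonable.

```r
data("iris")

# Fit the one-way ANOVA so its residuals can be examined
fit <- aov(Sepal.Width ~ Species, data = iris)
res <- residuals(fit)

# Normality: Q-Q plot and Shapiro-Wilk test of the residuals
qqnorm(res)
qqline(res)
shapiro_res <- shapiro.test(res)

# Homogeneity of variances: per-species variances and Bartlett's test
group_vars <- tapply(iris$Sepal.Width, iris$Species, var)
bartlett_res <- bartlett.test(Sepal.Width ~ Species, data = iris)

shapiro_res
group_vars
bartlett_res
```

In both tests, a large p-value means there is no evidence against the assumption. Independence cannot be tested this way; it has to be judged from how the data were collected.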

Please turn in your homework via Sakai by saving and submitting an R Markdown PDF or HTML file from R Pubs!