# Load the iris dataset and answer the questions below.

data("iris") # Loads "iris" data
head(iris) # Views iris data
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
# To draw a box plot of a numeric variable grouped by a categorical variable, use this code:
# boxplot(dependent variable ~ independent variable)

boxplot(iris$Sepal.Length ~ iris$Species)
Figure 1: Boxplot of Sepal Length (cm) vs iris species in the iris dataset.

Q1: What is the difference between raw data and transformed data?

Raw data are the values we collect, before anything is done to them. Transformed data are the raw values after a mathematical function, such as a log or square root, has been applied to every observation. We usually transform data so that they come closer to a normal distribution, which shows up as points falling closer to the line on a Q-Q plot.
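As a rough illustration of the difference, the sketch below simulates a right-skewed variable and compares the raw values with their log-transformed version on Q-Q plots. The data are made up for this example and are not part of the assignment, and the names `raw` and `transformed` are hypothetical.

```r
# Simulated right-skewed data (hypothetical values, for illustration only)
set.seed(42)
raw <- rlnorm(100, meanlog = 0, sdlog = 1)

# Apply a log transformation to every raw observation
transformed <- log(raw)

# Compare the two versions side by side on normal Q-Q plots
op <- par(mfrow = c(1, 2))
qqnorm(raw, main = "Raw data")
qqline(raw)
qqnorm(transformed, main = "Log-transformed data")
qqline(transformed)
par(op) # Restore the previous plot layout
```

The same pattern works with `sqrt()` in place of `log()` when a square root transformation is more appropriate.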

Q2: A scientist is setting up an experiment in a temperate marsh to measure the growth of an invasive species. In the design, there are 10 control plots with normal water level and 10 experiment plots drained of water. Plant growth measurements are taken monthly for one year. Are these data points independent? What does this mean for running an ANOVA? Explain your answer.

The data points are not independent, because the measurements are repeated on the same plots every month, so each measurement is correlated with the earlier ones from that plot. A standard ANOVA assumes independent observations, so it should not be run on these data as they are. A more complex approach that accounts for repeated measures is needed to get the answers.
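One such more complex option is a repeated-measures ANOVA, which base R can fit by adding an `Error()` term to `aov()`. The sketch below is only an illustration under stated assumptions: the data are simulated, and the names `design`, `plot_effect`, and `growth` are hypothetical. A mixed-effects model would be another reasonable choice.

```r
# Simulated data (hypothetical values): 20 plots measured monthly for 12 months
set.seed(1)
design <- expand.grid(month = 1:12, plot = factor(1:20))

# Plots 1-10 are controls, plots 11-20 are drained
design$treatment <- factor(
  ifelse(as.integer(design$plot) <= 10, "control", "drained")
)

# Each plot gets its own random offset, so repeated measurements
# on the same plot are correlated with one another
plot_effect <- rnorm(20, sd = 2)
design$growth <- 10 + 0.5 * design$month +
  ifelse(design$treatment == "drained", 3, 0) +
  plot_effect[as.integer(design$plot)] +
  rnorm(nrow(design), sd = 1)

# Repeated-measures ANOVA: Error(plot) tells aov() that the
# monthly measurements are grouped within plots
fit <- aov(growth ~ treatment * factor(month) + Error(plot), data = design)
summary(fit)
```

In this layout the treatment effect is tested against the variation between plots, while the month effect and the interaction are tested against the variation within plots.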

Q3: Run Q–Q plots for the four measurement variables in the iris dataset. Which iris variable most closely resembles normality? Which iris variable least resembles a normal distribution? How can you tell? Explain your answer. Present your commented code and graphs with descriptive figure captions below.

I can tell by looking at how closely the points follow the Q-Q line: the more points that fall on or near the line, the closer the variable is to a normal distribution. In the first graph, Sepal Length has nearly all of its points on the line, so it most closely resembles normality. In the second graph, Sepal Width is fairly close, but many of its points sit slightly off the line. The third graph, Petal Length, is far off because its points form an “S” shape, so it least resembles a normal distribution. The fourth graph, Petal Width, has a similar shape, but its points are closer to the line than in the third graph.
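A visual reading of Q-Q plots is subjective, so a formal test can serve as a cross-check. The sketch below is one possible way to do this with the Shapiro-Wilk test from base R. It was not part of the original answer, and the names `measurements` and `w_stats` are hypothetical.

```r
data("iris")

# The four numeric measurement columns
measurements <- iris[, c("Sepal.Length", "Sepal.Width",
                         "Petal.Length", "Petal.Width")]

# Shapiro-Wilk W statistic for each variable; values closer to 1
# indicate closer agreement with a normal distribution
w_stats <- sapply(measurements, function(x) unname(shapiro.test(x)$statistic))

# Rank the variables from most to least normal-looking
sort(w_stats, decreasing = TRUE)
```

If the ranking disagrees with the visual impression from the plots, that is worth noting in the written answer.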

qqnorm(iris$Sepal.Length) # This one is the closest to a normal distribution.
qqline(iris$Sepal.Length)
Figure 1: Normal Q-Q plot of iris sepal length (cm).

qqnorm(iris$Sepal.Width)
qqline(iris$Sepal.Width)
Figure 2: Normal Q-Q plot of iris sepal width (cm).

qqnorm(iris$Petal.Length) # This one is the farthest from a normal distribution.
qqline(iris$Petal.Length)
Figure 3: Normal Q-Q plot of iris petal length (cm).

qqnorm(iris$Petal.Width)
qqline(iris$Petal.Width)
Figure 4: Normal Q-Q plot of iris petal width (cm).

Q4: What are the three assumptions of the ANOVA? Explain each assumption in your own words. Say we want to run an ANOVA to detect differences in Sepal.Width for the three iris Species in the iris dataset. Does this relationship qualify for an ANOVA? Explain your answer. Present your commented code testing these assumptions.

The first assumption is independence, which means that each observation should be unrelated to the others: the way the experiment is set up should not create correlations between data points. The second assumption is normality, which means that the data are distributed normally around the mean. We check this by running a Q-Q plot to see how close the points are to the Q-Q line, and we transform the data if they are far off. The last assumption is homoscedasticity, which means that the variances of the groups are equal, so each group has a similar spread of data. I would say yes, this relationship qualifies for an ANOVA, because it passes all three assumptions: each flower is measured once, the Q-Q plot of Sepal.Width follows the line fairly closely, and the boxplot shows a similar spread for the three species.
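The normality and homoscedasticity checks can also be backed up with formal tests. The sketch below is one possible approach and an assumption on my part rather than a required method: it uses the Shapiro-Wilk test on the ANOVA residuals and Bartlett's test for equal variances, both from base R. Independence cannot be tested this way, because it depends on how the data were collected. The names `fit`, `res`, `normality`, `variances`, and `homogeneity` are hypothetical.

```r
data("iris")

# Fit the one-way ANOVA so that its residuals can be examined
fit <- aov(Sepal.Width ~ Species, data = iris)
res <- residuals(fit)

# Normality: Q-Q plot and Shapiro-Wilk test on the residuals
qqnorm(res)
qqline(res)
normality <- shapiro.test(res)

# Homoscedasticity: compare the variance of each species directly,
# then test for equal variances with Bartlett's test
variances <- tapply(iris$Sepal.Width, iris$Species, var)
homogeneity <- bartlett.test(Sepal.Width ~ Species, data = iris)

# Print the results
normality
variances
homogeneity
```

For both tests, a large p-value means there is no evidence against the assumption, which is not the same as proof that the assumption holds.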

hist(iris$Sepal.Width) # Check the overall shape of the distribution
Figure 5: Histogram of iris sepal width (cm).

boxplot(iris$Sepal.Width ~ iris$Species) # Compare the spread across species (homoscedasticity)
Figure 6: Boxplot of sepal width (cm) by iris species.

qqnorm(iris$Sepal.Width) # Test for normality
qqline(iris$Sepal.Width) # Add the reference line
Figure 7: Normal Q-Q plot of iris sepal width (cm).

Please turn in your homework via Sakai by saving and submitting an R Markdown PDF or HTML file from R Pubs!