R Homework 5

# Loads the Iris Dataset! Answer the questions below.

data("iris") # Loads "iris" data
head(iris) # Views iris data

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

# To draw a box plot of a continuous variable grouped by a categorical variable, use:
# boxplot(dependent variable ~ independent variable)

boxplot(iris$Sepal.Length ~ iris$Species)

Figure 1: Boxplot of Sepal Length (cm) vs iris species in the iris dataset.

Q1: What is the difference between raw data and transformed data?

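Raw data are the values exactly as they were measured or recorded. Transformed data are the raw values after the same mathematical function (for example a log, square-root, or arcsine transformation) has been applied to every observation, usually to help the data better meet statistical assumptions such as normality or equal variance. A transformation does not guarantee that the data become normally distributed.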
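
The difference can be seen by plotting one variable before and after a transformation. This is a minimal sketch that uses petal length and a natural-log transformation purely as an illustration; the choice of variable and of log() is an assumption, not part of the assignment.

```r
data("iris")

# Raw data: petal length exactly as measured (cm)
raw <- iris$Petal.Length

# Transformed data: the same 150 values after a natural-log transformation
logged <- log(raw)

# Compare the two shapes side by side; the log changes the scale of every value
# but does not by itself guarantee a normal distribution
old_par <- par(mfrow = c(1, 2))
hist(raw, main = "Raw Petal.Length", xlab = "Petal length (cm)")
hist(logged, main = "log(Petal.Length)", xlab = "log of petal length")
par(old_par)  # restore the previous plotting layout
```

Petal length is a useful example because it stays clearly two-peaked after the log transformation, which shows that transforming data is not the same as making it normal.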

Q2: A scientist is setting up an experiment in a temperate marsh to measure the growth of an invasive species. In the design, there are 10 control plots with normal water level and 10 experimental plots drained of water. Plant growth measurements are taken monthly for one year. Are these data points independent? What does this mean for running an ANOVA? Explain your answer.

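No, these data points are not independent. Each plot is measured 12 times, so the monthly measurements from the same plot are correlated with one another (a plot that grows quickly one month is likely to grow quickly the next). Independence is one of the three ANOVA assumptions, along with normality and homogeneity of variances, so a standard one-way ANOVA that treats all 240 measurements as independent replicates would be pseudoreplicated and would overstate the evidence for a treatment effect. The analysis should treat the plot as the experimental unit, for example by averaging each plot's measurements or by using a repeated-measures ANOVA.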
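
The gap between the number of measurements and the number of independent units can be made concrete. This sketch is hypothetical throughout: the column names (plot, month, treatment, growth) are invented for illustration, and the growth values are simulated with rnorm(), not real data. Averaging per plot is only one simple remedy; a repeated-measures ANOVA or a mixed-effects model with plot as a random effect are common alternatives.

```r
# Hypothetical layout of the marsh experiment: 20 plots x 12 monthly visits
design <- expand.grid(month = 1:12, plot = 1:20)
design$treatment <- ifelse(design$plot <= 10, "control", "drained")

# Simulated (made-up) growth values, for illustration only
set.seed(1)
design$growth <- rnorm(nrow(design),
                       mean = ifelse(design$treatment == "drained", 12, 10),
                       sd = 2)

nrow(design)                 # 240 measurements...
length(unique(design$plot))  # ...but only 20 independent plots

# One simple remedy: average each plot over the year, giving one value per plot
plot_means <- aggregate(growth ~ plot + treatment, data = design, FUN = mean)

# One-way ANOVA on the 20 plot means (the true experimental units)
summary(aov(growth ~ treatment, data = plot_means))
```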

Q3: Run Q–Q plots for the four measurement variables in the iris dataset. Which iris variable most closely resembles normality? Which iris variable least resembles a normal distribution? How can you tell? Explain your answer. Present your commented code and graphs with descriptive figure captions below.

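In a Q-Q plot, points that fall close to the straight reference line (drawn by qqline(), which passes through the first and third quartiles) indicate an approximately normal distribution, whereas systematic curves or S-shapes indicate departures from normality. Sepal width and sepal length fall closest to the line, with sepal width the closest, so sepal width most resembles a normal distribution. Petal length least resembles a normal distribution (petal width is similar): its points form a stepped S-shape because the small petals of setosa form a cluster separate from the other two species, making the distribution two-peaked.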

qqnorm(iris$Petal.Length, main = "Petal.Length")  # Q-Q plot for petal length
qqline(iris$Petal.Length)                         # reference line through the 1st and 3rd quartiles

qqnorm(iris$Sepal.Length, main = "Sepal.Length")  # Q-Q plot for sepal length
qqline(iris$Sepal.Length)

qqnorm(iris$Petal.Width, main = "Petal.Width")    # Q-Q plot for petal width
qqline(iris$Petal.Width)

qqnorm(iris$Sepal.Width, main = "Sepal.Width")    # Q-Q plot for sepal width
qqline(iris$Sepal.Width)
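
The four plots can also be drawn in one loop and paired with a numeric check. This is a sketch that adds the Shapiro-Wilk test (shapiro.test() in base R) as a companion to the visual comparison; the test is not part of the original assignment, and a higher p-value only means less evidence against normality.

```r
data("iris")
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# 2 x 2 panel of Q-Q plots, one per measurement variable
old_par <- par(mfrow = c(2, 2))
for (v in vars) {
  qqnorm(iris[[v]], main = paste("Normal Q-Q plot:", v))
  qqline(iris[[v]])  # reference line through the 1st and 3rd quartiles
}
par(old_par)  # restore the previous plotting layout

# Shapiro-Wilk p-value for each variable, named by variable
p_values <- sapply(vars, function(v) shapiro.test(iris[[v]])$p.value)
print(signif(p_values, 3))
```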

Q4: What are the three assumptions of the ANOVA? Explain each assumption in your own words. Say we want to run an ANOVA to detect differences in Sepal.Width for the three iris Species in the iris dataset. Does this relationship qualify for an ANOVA? Explain your answer. Present your commented code testing these assumptions.

The three assumptions of ANOVA are: (1) independence, meaning each observation is collected separately, so the value of one observation gives no information about another; (2) normality, meaning the continuous variable within each group (equivalently, the residuals) follows an approximately normal distribution, which we can check with a histogram or a Q-Q plot; and (3) homoscedasticity, or homogeneity of variances, meaning each category should have roughly equal variability in the continuous variable. The relationship between Sepal.Width and Species qualifies for an ANOVA: each row is a different flower, so the observations can reasonably be treated as independent; the Q-Q plot of sepal width falls close to the reference line; and the boxplot shows similar spread across the three species.

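# Boxplot: compare the spread of sepal width among species (equal variance)
boxplot(Sepal.Width ~ Species, data = iris)

# Histogram: overall shape of sepal width (normality)
hist(iris$Sepal.Width)

# Q-Q plot: points near the reference line suggest approximate normality
qqnorm(iris$Sepal.Width)
qqline(iris$Sepal.Width)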
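
The visual checks can be backed up with formal tests on the fitted model. This sketch is one possible approach, not the assigned method: it checks normality on the ANOVA residuals with shapiro.test() and equal variances with bartlett.test(), both in base R. Bartlett's test is sensitive to non-normality; Levene's test (leveneTest() in the car package) is a common alternative. Independence cannot be tested from the data alone; it comes from how the flowers were sampled.

```r
data("iris")

# Fit the one-way ANOVA: sepal width explained by species
fit <- aov(Sepal.Width ~ Species, data = iris)

# Normality: check the residuals (deviations from each group mean)
qqnorm(residuals(fit), main = "Q-Q plot of ANOVA residuals")
qqline(residuals(fit))
shapiro_p <- shapiro.test(residuals(fit))$p.value

# Homogeneity of variances: Bartlett's test (null hypothesis = equal variances)
bartlett_p <- bartlett.test(Sepal.Width ~ Species, data = iris)$p.value

# Group standard deviations as a quick descriptive check
group_sd <- tapply(iris$Sepal.Width, iris$Species, sd)

print(c(shapiro_p = shapiro_p, bartlett_p = bartlett_p))
print(round(group_sd, 3))
summary(fit)
```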

Please turn in your homework via Sakai by saving and submitting an R Markdown PDF or HTML file from RPubs!


Gabby Krochmal

Ecology Lab - Summer 2017

Q1: What is the difference between raw data and transformed data?