The experiment and data come from Example 6 on a test about fish living in two kinds of habitat at each location: the pool or the riffle.
library(ggplot2)
fish2 <- read.csv("fish2.csv")
fish2
##    location pool riffle
## 1         1    6      3
## 2         2    6      3
## 3         3    3      3
## 4         4    8      4
## 5         5    5      2
## 6         6    2      2
## 7         7    6      2
## 8         8    7      2
## 9         9    1      2
## 10       10    3      2
## 11       11    4      3
## 12       12    5      1
## 13       13    4      3
## 14       14    6      2
## 15       15    4      3
There are fifteen rows and three columns. Each row is one of fifteen different locations. The first variable, location, is a label for the location. The other two variables, pool and riffle, are the number of fish the researchers counted in the pool and the number of fish they counted in the riffle at that location.
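As a minimal sketch, the data frame can be rebuilt by hand so the shape described above can be checked without fish2.csv. The counts here are retyped from the printed data, which is an assumption that the printout matches the file.

```r
# Counts retyped from the printed fish2 data (assumed to match fish2.csv)
fish2 <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

dim(fish2)  # number of rows, then number of columns
str(fish2)  # variable names and types
```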
The purpose of this study is to find which habitat better supports fish. The researchers want to figure out if pools or riffles are better environments for fish.
There is no reason to plot location. Location is just a label, and there is no reason to think that these labels are anything but arbitrary. Nothing about location answers our question, which is “in each location are there more fish in the pools or in the riffles”. We could plot histograms for each variable, pool and riffle, but the trouble is we are interested in how they relate to each other. What does it mean that there are 4 fish in the pool? It means one thing if there are 2 fish in the riffle in the same location, and it means an entirely different thing if there are 8 fish in the riffle.
I suggest a scatterplot with pool and riffle. To interpret the plot, superimpose the line where pool=riffle or y=x (slope 1, intercept 0). On one side of the line there are more fish in the pool and on the other there are more fish in the riffle.
Here is how you do it:
library(ggplot2)
# Scatterplot of pool against riffle with the reference line pool = riffle
ggplot(data=fish2, mapping=aes(x=riffle, y=pool)) +
  geom_point() +
  geom_abline(slope=1, intercept=0) +
  annotate("text", x=1.25, y=4, label="More in Pool") +
  annotate("text", x=3.5, y=2, label="More in Riffle")
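The plot shows that in most of the locations there are more fish in the pool than in the riffle. In the data, the pool has more fish in 12 of the 15 locations, two locations are tied, and only one location has more fish in the riffle.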
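The impression from the plot can be checked by counting how many locations fall on each side of the line pool = riffle. This is a small sketch, with the counts retyped from the printed data so that it runs without fish2.csv. The names d and counts are only illustrative.

```r
# Counts retyped from the printed fish2 data
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

# Difference at each location: positive means more fish in the pool
d <- pool - riffle

counts <- c(more_in_pool   = sum(d > 0),
            tied           = sum(d == 0),
            more_in_riffle = sum(d < 0))
counts
```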
The Null Hypothesis: The fish live in the pool and the riffle equally. The mean number of fish in the pools is equal to the mean number of fish in the riffles.
The Alternative Hypothesis: The means are different. On average there are more fish in one habitat than in the other.
The T test is for testing hypotheses about population means of a quantitative variable. In this project, we want to use a paired T test.
We have two sets of counts, one for the pools and one for the riffles. Because both counts come from the same location, the two samples are paired rather than independent.
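The Normal quantile plots below show that the points for each variable are close enough to a straight line to be considered Normal.
gg <- ggplot(data=fish2)
# Normal quantile plot of the pool counts, with a reference line
gg + geom_qq(mapping=aes(sample=pool)) + geom_qq_line(mapping=aes(sample=pool))
# Normal quantile plot of the riffle counts, with a reference line
gg + geom_qq(mapping=aes(sample=riffle)) + geom_qq_line(mapping=aes(sample=riffle))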
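Since a paired t test works on the differences between the paired counts, it may also be worth looking at a Normal quantile plot of the differences themselves. The sketch below is one possible check, not part of the original analysis. It uses base R graphics so it runs without extra packages, and the counts are retyped from the printed data.

```r
# Counts retyped from the printed fish2 data
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

# Pool minus riffle at each location
d <- pool - riffle

# Normal quantile plot of the differences with a reference line
qqnorm(d, main = "Normal Q-Q plot of pool - riffle")
qqline(d)
```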
The level of significance will be 0.05.
This is a paired problem. Each pool count is paired with the riffle count from the same location. This was not the case with the sleep data set, where there was no pairing between the Unrest subjects and the Deprived subjects. Because of this we do the t.test a little differently. Specifically, we do a paired t.test by setting paired=TRUE.
t.test(fish2$pool, fish2$riffle, paired=TRUE)
##
## Paired t-test
##
## data: fish2$pool and fish2$riffle
## t = 4.5826, df = 14, p-value = 0.0004264
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.170332 3.229668
## sample estimates:
## mean of the differences
## 2.2
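One way to see what pairing does: a paired t test gives the same result as a one-sample t test on the differences at each location. The sketch below illustrates this with the counts retyped from the printed data. The names paired and one_sample are only illustrative.

```r
# Counts retyped from the printed fish2 data
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

# Paired test on the two columns
paired <- t.test(pool, riffle, paired = TRUE)

# One-sample test on the differences, null value 0
one_sample <- t.test(pool - riffle, mu = 0)

# Both tests give the same t statistic and p value
c(paired = unname(paired$statistic), one_sample = unname(one_sample$statistic))
c(paired = paired$p.value, one_sample = one_sample$p.value)
```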
The p value of 0.0004264 is less than the level of significance of 0.05, so we reject the null hypothesis.
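For readers who want to see where the t statistic and p value come from, here is a sketch that computes them by hand from the differences. It uses the usual formulas for a one-sample t statistic on the differences. The variable names are illustrative and the counts are retyped from the printed data.

```r
# Counts retyped from the printed fish2 data
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

d  <- pool - riffle        # difference at each location
n  <- length(d)            # number of pairs
se <- sd(d) / sqrt(n)      # standard error of the mean difference

t_stat  <- mean(d) / se                       # t statistic for a null difference of 0
p_value <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p value

c(t = t_stat, df = n - 1, p = p_value)
```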
The confidence interval gives the range of plausible values for the difference in means (pool minus riffle). The 95 percent confidence interval goes from 1.170332 to 3.229668, so it does not contain zero. This means that the means are different. The farther the interval is from zero, the more different they are. The estimate of the mean difference is 2.2.
The sample estimate says that there are on average 2.2 more fish in the pool than in the riffle.
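The interval can also be reproduced by hand as the mean difference plus or minus a t critical value times the standard error. This is a sketch under the same assumptions as the test: 14 degrees of freedom and a 95 percent level. The counts are retyped from the printed data and the variable names are illustrative.

```r
# Counts retyped from the printed fish2 data
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

d  <- pool - riffle
n  <- length(d)
se <- sd(d) / sqrt(n)

# Critical value for a 95 percent interval with n - 1 degrees of freedom
t_crit <- qt(0.975, df = n - 1)

# Mean difference minus and plus the margin of error
ci <- mean(d) + c(-1, 1) * t_crit * se
ci
```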
In conclusion, the data suggest that fish prefer to live in pools rather than in riffles. We reject the null hypothesis in favor of the alternative hypothesis that the mean numbers of fish in the two habitats are unequal. On average, more fish live in pools than in riffles.