fish <- read.csv("fish.csv")
fish
##    location pool riffle
## 1         1    6      3
## 2         2    6      3
## 3         3    3      3
## 4         4    8      4
## 5         5    5      2
## 6         6    2      2
## 7         7    6      2
## 8         8    7      2
## 9         9    1      2
## 10       10    3      2
## 11       11    4      3
## 12       12    5      1
## 13       13    4      3
## 14       14    6      2
## 15       15    4      3
The data set has 15 rows and three columns. Each row is one location along a river and records the number of different fish species captured there, and each column is a variable. The first column identifies the location, the second column gives the number of species collected in a section that can be described as a pool, and the third column gives the number of species collected in a section that can be described as a riffle. Pools are the deep, slow-moving parts of a stream, and riffles are the shallow, fast-moving parts.
This study measures species diversity and asks whether the two habitats (pools and riffles) support equal numbers of species.
library(ggplot2)
ggplot(data=fish, mapping=aes(x=riffle, y=pool)) +
  geom_point() +
  geom_abline(slope=1, intercept=0) +
  annotate("text", x=1.25, y=4, label="More in Pool") +
  annotate("text", x=3.5, y=2, label="More in Riffle")
The plot suggests that pools hold more fish species than riffles: most points lie above the line, and in the data 12 of the 15 locations have a higher count in the pool. That said, one location has a higher count in the riffle, and two locations have equal counts. We need a p-value and a confidence interval to judge whether this pattern could plausibly be due to chance. A small p-value means that a difference as large as the one observed would be unlikely if the population means for the two habitats were equal.
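The reading of the plot can be checked directly against the data. The sketch below rebuilds the data frame from the values printed above (so it does not depend on fish.csv) and counts how many locations fall above, on, and below the line; the names `d`, `counts`, and `distinct_points` are illustrative only. It also counts distinct (riffle, pool) pairs, because locations with identical counts are drawn on top of each other.

```r
# Data copied from the printed fish data frame above
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Pool minus riffle at each location: positive means the point is above the line
d <- fish$pool - fish$riffle

counts <- c(
  more_in_pool   = sum(d > 0),   # points above the line
  tie            = sum(d == 0),  # points on the line
  more_in_riffle = sum(d < 0)    # points below the line
)
print(counts)

# Number of distinct dots actually visible on the scatterplot
distinct_points <- nrow(unique(fish[, c("riffle", "pool")]))
print(distinct_points)
```

Since the 15 locations produce only 11 distinct dots, the plot slightly understates how many locations sit above the line.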
The population for this project is all fish in the river. The null hypothesis is that the population mean number of species found in pools equals the population mean number found in riffles, that is, the difference between the two means equals 0.
Although the plot above suggests that there are more species in pools than in riffles, it is best to make the more conservative choice for the alternative hypothesis. Therefore, the alternative hypothesis is two-sided: the two population means are unequal.
The correct choice for this case is a t-test because the hypotheses concern population means. The variable (the number of species found in a pool or riffle) is quantitative, which is the reason for choosing a t-test as opposed to a proportions test.
Two sets of measurements are being compared (counts from pools and counts from riffles), so this calls for a two-sample comparison rather than a one-sample test against a fixed value. Because both counts are taken at the same location, the two samples are not independent of each other.
Check Normality
# Refer to columns by bare name inside aes(); ggplot looks them up in `data`
gg <- ggplot(data=fish)
gg + geom_qq(mapping=aes(sample=pool))
gg + geom_qq(mapping=aes(sample=riffle))
Based on the QQ plots, the pool counts look fairly normal, and the riffle counts, though less so, are close enough to normal for a t-test to be reasonable.
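For a paired analysis, the normality assumption applies to the within-location differences rather than to each column separately, so it may be worth looking at those as well. The following is a minimal sketch, assuming the data values printed above; the column name `diff` and the object `qq_diff` are introduced here for illustration and are not part of the original data.

```r
library(ggplot2)

# Data copied from the printed fish data frame above
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Pool minus riffle at each location (hypothetical helper column)
fish$diff <- fish$pool - fish$riffle

# QQ plot of the differences, with a reference line for comparison
qq_diff <- ggplot(data = fish, mapping = aes(sample = diff)) +
  geom_qq() +
  geom_qq_line()
print(qq_diff)
```

With only 15 small integer counts the points will look stepped, so the plot is a rough check at best.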
The level of significance for this case will be the fairly traditional value of 0.05.
This is a paired problem: each pool count is paired with the riffle count from the same location. This was not the case with the sleep data set, where there was no pairing between the Unrest subjects and the Deprived subjects. Because of the pairing, we run the t-test a little differently. Specifically, we use a paired t-test, which analyzes the 15 within-location differences.
t.test(fish$pool, fish$riffle, paired=TRUE)
##
##  Paired t-test
##
## data:  fish$pool and fish$riffle
## t = 4.5826, df = 14, p-value = 0.0004264
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.170332 3.229668
## sample estimates:
## mean of the differences
##                     2.2
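One way to see what `paired=TRUE` does is to compare it with a one-sample t-test on the differences, which should give the same statistic and p-value. The sketch below assumes the data values printed earlier; the object names `paired`, `one_sample`, and `unpaired` are illustrative. The unpaired (Welch) call is included only for contrast, since it ignores the pairing and uses different degrees of freedom.

```r
# Data copied from the printed fish data frame above
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Paired test, as run in the report
paired <- t.test(fish$pool, fish$riffle, paired = TRUE)

# Equivalent one-sample test on the within-location differences
one_sample <- t.test(fish$pool - fish$riffle, mu = 0)

# Unpaired Welch test, shown only for contrast
unpaired <- t.test(fish$pool, fish$riffle)

# Compare test statistics and degrees of freedom side by side
print(c(paired = unname(paired$statistic),
        one_sample = unname(one_sample$statistic),
        unpaired = unname(unpaired$statistic)))
print(c(paired = unname(paired$parameter),
        one_sample = unname(one_sample$parameter),
        unpaired = unname(unpaired$parameter)))
```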
The p-value (0.0004264) is much less than the 0.05 level of significance, so we reject the null hypothesis.
Because 0 is not inside the 95 percent confidence interval (1.17 to 3.23), this result is consistent with the result of STEP 14. It also means that it is not plausible that the two population means are the same.
The sample estimate of the mean of the differences is 2.2. In other words, in this sample the pool held an average of 2.2 more species than the riffle at the same location.
STEPs 14 and 15 indicate that the mean numbers of species found in pools and riffles are unequal, and STEP 16 shows the direction of the difference, so we can reasonably conclude that pools support more fish species than riffles on average.
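The test statistic, p-value, and confidence interval reported by `t.test` can be reproduced by hand from the differences, which makes the link between the p-value and the interval explicit. This is a sketch under the assumption that the data are the values printed earlier; all variable names are illustrative.

```r
# Data copied from the printed fish data frame above
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

d  <- pool - riffle       # within-location differences
n  <- length(d)           # number of pairs
se <- sd(d) / sqrt(n)     # standard error of the mean difference

# Test statistic for H0: mean difference = 0
t_stat <- mean(d) / se

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# 95 percent confidence interval for the mean difference
t_crit <- qt(0.975, df = n - 1)
ci <- mean(d) + c(-1, 1) * t_crit * se

print(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]))
```

The interval excludes 0 exactly when the two-sided p-value is below 0.05, which is why the two steps always agree.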