This step is already completed for me.
fish <- read.csv("fish.csv")
fish
##    location pool riffle
## 1         1    6      3
## 2         2    6      3
## 3         3    3      3
## 4         4    8      4
## 5         5    5      2
## 6         6    2      2
## 7         7    6      2
## 8         8    7      2
## 9         9    1      2
## 10       10    3      2
## 11       11    4      3
## 12       12    5      1
## 13       13    4      3
## 14       14    6      2
## 15       15    4      3
I have a data set with 15 rows and 3 columns. The rows represent the 15 different locations along a river where researchers captured fish for the study. The columns represent the variables in the study: the location along the river, the number of fish captured in a pool, and the number of fish captured in an adjacent riffle.
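As a quick check of the description above, the shape of the data frame can be confirmed directly. This is only a sketch: it rebuilds `fish` from the values printed above rather than reading `fish.csv`, so that it runs on its own.

```r
# Rebuild the data frame from the printed output (stand-in for read.csv("fish.csv"))
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Number of rows (locations) and columns (variables)
dim(fish)

# Names and types of the columns
str(fish)
```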
The purpose of the study is to measure species diversity to investigate whether or not two habitats, pools and riffles, support equal numbers of fish species. In streams, pools are the deep and slow-moving areas, while riffles represent the shallow and fast-moving areas. Are fish found in equal numbers in both areas? Is one region a better habitat for fish?
Here is a scatterplot with pools and riffles. There is a line where pool=riffle or y=x (slope 1, intercept 0). On one side of the line there are more fish in the pool and on the other there are more fish in the riffle.
library(ggplot2)
ggplot(data=fish, mapping=aes(x=riffle, y=pool)) +
  geom_point() +
  geom_abline(slope=1, intercept=0) +
  annotate("text", x=1.25, y=4, label="More in Pool") +
  annotate("text", x=3.5, y=2, label="More in Riffle")
The plot suggests that there is more species diversity in pools than in riffles. Fish were found in greater quantity in pools than in riffles. There were between 1 and 8 fish in each pool, while there were only between 1 and 4 fish in each riffle sampled. These data suggest that pools are better habitats for fish, but this is only an impression from the plot, and it needs to be substantiated with a p-value.
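The impression from the plot can also be put into numbers. The sketch below rebuilds `fish` from the printed values so that it is self-contained, then reports the range of each column and counts the locations on each side of the pool = riffle line.

```r
# Rebuild the data frame from the printed output (stand-in for read.csv("fish.csv"))
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Smallest and largest count in each habitat
range(fish$pool)
range(fish$riffle)

# Locations above, on, and below the line pool = riffle
more_in_pool   <- sum(fish$pool > fish$riffle)
equal_counts   <- sum(fish$pool == fish$riffle)
more_in_riffle <- sum(fish$pool < fish$riffle)
c(pool = more_in_pool, equal = equal_counts, riffle = more_in_riffle)
```

With these data, 12 of the 15 locations have more fish in the pool, 2 are tied, and 1 has more in the riffle.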
Null hypothesis: The two populations are fish found in pools and fish found in riffles, and the samples are drawn from these populations. The parameter is the mean. The null hypothesis is that the population mean for fish found in pools is equal to the population mean for fish found in riffles, so the difference in means is zero.
Alternative hypothesis: There is a difference between the population means for fish in pools and fish in riffles.
Type of test: A t-test will be conducted, because we are comparing two population means of a quantitative variable.
There are two samples, one for fish in pools and one for fish in riffles. Because each pool is adjacent to a riffle at the same location, the two samples are matched, so the paired version of the test is appropriate.
The major assumption of the t-test is that the data come from a normal distribution, which has a bell shape. The following plots check the normality assumption of the t-test.
gg <- ggplot(data=fish)
gg + geom_qq(mapping=aes(sample=pool))
gg + geom_qq(mapping=aes(sample=riffle))
If the data are normal, the points will lie close to a straight line. These graphs show that the points lie more or less on a line, and thus the data are approximately normal.
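For a paired test, the normality assumption strictly applies to the pool minus riffle differences rather than to each column separately, so it may be worth checking those as well. The following is a sketch only: the data frame `diffs` and its column `d` are names made up for this example, and `geom_qq_line()` assumes ggplot2 version 3.0.0 or later.

```r
library(ggplot2)

# Rebuild the data frame from the printed output (stand-in for read.csv("fish.csv"))
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# One difference per location: pool count minus riffle count
diffs <- data.frame(d = fish$pool - fish$riffle)

# Normal quantile plot of the differences, with a reference line added
qq_plot <- ggplot(data=diffs, mapping=aes(sample=d)) +
  geom_qq() +
  geom_qq_line()
qq_plot
```

Points that stay close to the reference line would support using the paired t-test on these differences.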
The level of significance of the test is 0.05.
I will perform a paired t-test, where each pool count is paired with the riffle count from the same location.
t.test(fish$pool, fish$riffle, paired=TRUE)
##
## Paired t-test
##
## data: fish$pool and fish$riffle
## t = 4.5826, df = 14, p-value = 0.0004264
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.170332 3.229668
## sample estimates:
## mean of the differences
## 2.2
Since the p-value is 0.0004264, which is less than the level of significance (0.05), I reject the null hypothesis that the means are equal, in favor of the alternative hypothesis that the true difference in means is not equal to 0.
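To see where the test statistic and p-value come from, the paired t-test can be reproduced by hand from the differences. This is a sketch that rebuilds `fish` from the printed values; the variable names (`d`, `se`, `t_stat`, `p_value`) are chosen for this example only.

```r
# Rebuild the data frame from the printed output (stand-in for read.csv("fish.csv"))
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# A paired t-test is a one-sample t-test on the differences
d  <- fish$pool - fish$riffle
n  <- length(d)
se <- sd(d) / sqrt(n)            # standard error of the mean difference

t_stat  <- mean(d) / se          # test statistic, n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value

c(t = t_stat, df = n - 1, p = p_value)
```

The values should agree with the `t.test()` output above (t = 4.5826 on 14 degrees of freedom).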
The 95 percent confidence interval for the mean difference (pool minus riffle) runs from 1.170332 to 3.229668. Since zero is not in this interval, zero is not a plausible value for the difference in means, so it is not plausible that the means are the same. The result of step 15 is consistent with the result of step 14.
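The interval can likewise be rebuilt from the mean difference, its standard error, and the t critical value with 14 degrees of freedom. Again a sketch with `fish` rebuilt from the printed values; `margin` and `ci` are names chosen for this example.

```r
# Rebuild the data frame from the printed output (stand-in for read.csv("fish.csv"))
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

d  <- fish$pool - fish$riffle
n  <- length(d)
se <- sd(d) / sqrt(n)

# 95% interval: mean difference plus or minus (critical value times standard error)
margin <- qt(0.975, df = n - 1) * se
ci     <- mean(d) + c(-1, 1) * margin
ci
```

The two endpoints should match the interval reported by `t.test()`.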
I have concluded that the means are not equal, but I really want to know: Are more fish found in pools or riffles? Knowing that the means are unequal, I can answer this question by looking at the sample estimate, which states that on average there are 2.2 more fish in pools than in riffles. These findings support the conclusions of both step 14 and step 15.
I have evidence that pools and riffles do not support equal numbers of fish species. More fish are found in pools than in riffles, and therefore, pools can be hypothesized to be better habitats for fish than riffles are.