Step 2: Load the data
fish <- read.csv("fish.csv")
fish
##    location pool riffle
## 1         1    6      3
## 2         2    6      3
## 3         3    3      3
## 4         4    8      4
## 5         5    5      2
## 6         6    2      2
## 7         7    6      2
## 8         8    7      2
## 9         9    1      2
## 10       10    3      2
## 11       11    4      3
## 12       12    5      1
## 13       13    4      3
## 14       14    6      2
## 15       15    4      3
Step 3: Describe Data
Most fish are found in pools, the deep, slow-moving parts of streams. Other fish prefer riffles, which are shallow, fast-moving regions. The data set has 15 rows, one per location. The pool and riffle columns give the count recorded in each habitat at that location.
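A quick numerical summary can back up this description. The sketch below re-enters the values from the printout in Step 2 so that it runs without fish.csv; the names n_locations, habitat_means and habitat_ranges are introduced here for illustration only.

```r
# Data re-entered from the printout in Step 2 (assumption: the printout is complete)
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Number of locations (rows)
n_locations <- nrow(fish)

# Mean count per habitat
habitat_means <- colMeans(fish[, c("pool", "riffle")])

# Smallest and largest count per habitat (row 1 = min, row 2 = max)
habitat_ranges <- sapply(fish[, c("pool", "riffle")], range)

print(n_locations)
print(habitat_means)
print(habitat_ranges)
```

The means printed here should agree with the sample estimates reported by the test in Step 13.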
Step 4: The Purpose of the Study
The purpose of the study is to assess whether these two habitats support equal numbers of species, which is a measure of species diversity.
Step 5: Visualize Data
library(ggplot2)
ggplot(data=fish, mapping=aes(x=riffle, y=pool)) +
  geom_point() +
  geom_abline(slope=1, intercept=0) +
  annotate("text", x=1.25, y=4, label="More in Pool") +
  annotate("text", x=3.5, y=2, label="More in Riffle")
Step 6: Interpret the Plot
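The line marks locations where the pool and riffle counts are equal. Points above the line are locations with more fish in the pool, and points below it are locations with more fish in the riffle. Most points lie above the line.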
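The reading of the plot can be checked by counting locations on each side of the line. This is a minimal sketch using the values printed in Step 2; diffs, side and side_counts are illustrative names, not part of the original analysis.

```r
# Data re-entered from the printout in Step 2
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Positive difference = point above the line y = x
diffs <- fish$pool - fish$riffle
side <- ifelse(diffs > 0, "more in pool",
               ifelse(diffs < 0, "more in riffle", "equal"))
side_counts <- table(side)
print(side_counts)  # 12 more in pool, 2 equal, 1 more in riffle

# Locations that share the same (riffle, pool) pair overlap in the scatterplot
n_overlapping <- sum(duplicated(fish[, c("pool", "riffle")]))
print(n_overlapping)  # 4, so only 11 distinct points are visible
```

Because some locations overlap, the plot shows fewer than 15 points, so the counts are a more reliable summary than the picture alone.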
Step 7: Formulate the null hypothesis
The mean number of fish in pools is equal to the mean number of fish in riffles.
Step 8: Identify the alternative hypothesis
The mean number of fish in pools is greater than the mean number of fish in riffles.
Step 9: Decide on type of test
The test that will be used is a paired t-test, because the hypothesis involves population means of a quantitative variable (number of fish) and because each pool count is paired with the riffle count from the same location.
Step 10: Choose one sample or two sample
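There are two measurements per location, one for the pool and one for the riffle. Because they are paired, the test is carried out on the 15 pool-minus-riffle differences.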
Step 11: Check assumptions of the test
gg <- ggplot(data=fish)
gg + geom_qq(mapping=aes(sample=pool))
gg + geom_qq(mapping=aes(sample=riffle))
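The pool Q-Q plot appears roughly linear, which is consistent with a normal distribution, although with only 15 locations the check is rough. The riffle counts take only four distinct values (1 to 4), so the points form steps, but they still follow a diagonal trend. For a paired test, the normality assumption applies to the pool-minus-riffle differences rather than to each column separately.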
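Since a paired test works on the differences, a Q-Q plot of the differences is the more direct check. The sketch below uses base R graphics (qqnorm and qqline) so that it has no package dependencies; geom_qq on a column of differences would serve the same purpose. The data are re-entered from Step 2 and diffs is an illustrative name.

```r
# Data re-entered from the printout in Step 2
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# One difference per location: pool count minus riffle count
diffs <- fish$pool - fish$riffle

# Normal Q-Q plot of the differences with a reference line
qqnorm(diffs, main = "Q-Q plot of pool minus riffle differences")
qqline(diffs)
```

Points that stay close to the reference line support the normality assumption for the differences.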
Step 12: Decide on level of significance of the test
The level of significance is 0.05.
Step 13: Perform the test
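The test is run with t.test. Because the call does not set paired=TRUE, R performs a Welch two-sample t-test rather than the paired test chosen in Step 9, as the output header shows.
t.test(fish$pool, fish$riffle)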
##
## Welch Two Sample t-test
##
## data: fish$pool and fish$riffle
## t = 4.1482, df = 18.125, p-value = 0.0005961
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.086327 3.313673
## sample estimates:
## mean of x mean of y
## 4.666667 2.466667
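The output above is headed "Welch Two Sample t-test" and reports a two-sided alternative, so it does not match the paired, one-sided test described in Steps 8 and 9. A minimal sketch of the call that would match those steps is given below, using the paired and alternative arguments of t.test. The data are re-entered from Step 2 and paired_result is an illustrative name.

```r
# Data re-entered from the printout in Step 2
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Paired test on pool minus riffle, one-sided to match the alternative in Step 8
paired_result <- t.test(fish$pool, fish$riffle,
                        paired = TRUE, alternative = "greater")
print(paired_result)

# The paired test has n - 1 = 14 degrees of freedom and a mean difference of 2.2
print(paired_result$parameter)
print(paired_result$estimate)
```

The mean difference is the same 2.2 as in the Welch output, but the standard error and degrees of freedom differ because the paired test uses the variability of the differences.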
Step 14: Interpret the p-value
Since the p-value (0.0005961) is less than the level of significance (0.05), we reject the null hypothesis that the mean number of fish in pools is equal to the mean number of fish in riffles.
Step 15: Interpret the confidence interval
It is not plausible that the means are the same, because 0 is not in the 95 percent confidence interval for the difference in means (1.09 to 3.31). This is consistent with Step 14, which rejects the null hypothesis that the mean numbers of fish in pools and riffles are equal.
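To see where the interval in the Step 13 output comes from, it can be rebuilt by hand with the Welch formulas: the standard error combines the two sample variances, and the degrees of freedom come from the Welch-Satterthwaite approximation. This is a sketch with the data re-entered from Step 2; v1, v2, se, df and ci are illustrative names.

```r
# Data re-entered from the printout in Step 2
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

n1 <- length(fish$pool)
n2 <- length(fish$riffle)

# Variance of each sample mean
v1 <- var(fish$pool) / n1
v2 <- var(fish$riffle) / n2

# Welch standard error and Welch-Satterthwaite degrees of freedom
se <- sqrt(v1 + v2)
df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# 95 percent confidence interval for the difference in means
mean_diff <- mean(fish$pool) - mean(fish$riffle)
ci <- mean_diff + c(-1, 1) * qt(0.975, df) * se
print(df)  # about 18.125, as in the Step 13 output
print(ci)  # about 1.086 and 3.314, as in the Step 13 output
```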
Step 16: Interpret the sample estimates
On average, pools have 2.2 more fish than riffles (4.67 versus 2.47).
Step 17: Conclusion
We have evidence that pools are more diverse than riffles because, on average, a pool contains more fish than the riffle at the same location. Fish tend to prefer pools.