fish <- read.csv("fish.cvs.csv")
fish
##    location pool riffle
## 1         1    6      3
## 2         2    6      3
## 3         3    3      3
## 4         4    8      4
## 5         5    5      2
## 6         6    2      2
## 7         7    6      2
## 8         8    7      2
## 9         9    1      2
## 10       10    3      2
## 11       11    4      3
## 12       12    5      1
## 13       13    4      3
## 14       14    6      2
## 15       15    4      3
Description of the data
The data set records the number of fish counted in a pool and in a riffle at each of 15 locations.
The null hypothesis
The null hypothesis is that the mean number of fish is the same in pools and in riffles, so the mean of the paired differences (pool minus riffle) is 0.
The alternative hypothesis
The alternative hypothesis is that the mean number of fish is larger in pools than in riffles.
What is this project for?
This project performs a paired t-test to determine whether the data support rejecting the null hypothesis in favor of the alternative.
Purpose of the project
The purpose of this study is to determine which habitat holds more fish, and to test the hypothesis that pools tend to hold more fish than riffles.
Data in graphs
library(ggplot2)
ggplot(data=fish, mapping=aes(x=location, y=pool)) + geom_point()
library(ggplot2)
ggplot(data=fish, mapping=aes(x=location, y=riffle)) + geom_point()
library(ggplot2)
# Points above the line pool = riffle are locations with more fish in the pool
ggplot(data=fish, mapping=aes(x=riffle, y=pool)) +
  geom_point() +
  geom_abline(slope=1, intercept=0) +
  annotate("text", x=1.25, y=4, label="More in Pool") +
  annotate("text", x=3.5, y=2, label="More in Riffle")
Understanding of the graphs
In the last graph, most points lie above the line pool = riffle, so at most locations more fish were counted in the pool than in the riffle. In the data, 12 of the 15 locations have more fish in the pool, two have equal counts, and one has more fish in the riffle.
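The visual impression can be checked by counting the locations on each side of the line. This is a minimal sketch: the data frame is rebuilt from the values printed above so that it runs without the CSV file, and the names `d`, `above`, `on_line`, and `below` are introduced here only for illustration.

```r
# Data rebuilt from the printed table (assumption: values copied as shown above)
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Pool-minus-riffle difference at each location
d <- fish$pool - fish$riffle

# Locations above, on, and below the line pool = riffle
above   <- sum(d > 0)
on_line <- sum(d == 0)
below   <- sum(d < 0)
c(above = above, on_line = on_line, below = below)
```

Several locations share the same pair of counts, so the scatterplot shows fewer distinct points than there are locations.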
t.test(fish$pool, fish$riffle, paired=TRUE)
##
## Paired t-test
##
## data: fish$pool and fish$riffle
## t = 4.5826, df = 14, p-value = 0.0004264
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.170332 3.229668
## sample estimates:
## mean of the differences
## 2.2
Check Assumptions
# Q-Q plot of each sample; columns are referred to by name inside aes()
gg <- ggplot(data=fish)
gg + geom_qq(mapping=aes(sample=pool))
gg + geom_qq(mapping=aes(sample=riffle))
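The data are treated as approximately normal because the points in each Q-Q plot fall roughly along a straight line, which is the pattern expected from normally distributed data. Strictly, the paired t-test requires only that the pool-minus-riffle differences be approximately normal.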
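Because the test is paired, the normality check can also be applied to the differences themselves. The following is a sketch under stated assumptions: the data frame is rebuilt from the printed values, the column name `difference` and the object name `qq_diff` are hypothetical, and `geom_qq_line()` assumes ggplot2 version 3.0.0 or later.

```r
library(ggplot2)

# Data rebuilt from the printed table (assumption: values copied as shown above)
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Paired differences: the quantity the paired t-test actually analyzes
fish$difference <- fish$pool - fish$riffle

# Q-Q plot of the differences with a reference line;
# points close to the line suggest approximate normality
qq_diff <- ggplot(data = fish, mapping = aes(sample = difference)) +
  geom_qq() +
  geom_qq_line()
qq_diff
```

With only 15 integer-valued differences the points form steps, so the plot is a rough check and not a formal test.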
Level of significance
The level of significance is set to 0.05.
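Interpret P-Value
The level of significance was set to 0.05, and the p-value is 0.0004264. Since the p-value is less than the level of significance, the null hypothesis is rejected. The p-value reported by t.test() is for the two-sided alternative shown in the output. The conclusion is the same for the one-sided alternative stated earlier, because the observed difference is in the hypothesized direction and the one-sided p-value is half as large.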
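The alternative hypothesis stated earlier is one-sided (more fish in the pool), while `t.test()` defaults to a two-sided test. A minimal sketch of how the one-sided version could be run with the `alternative` argument of `t.test()` follows; the data frame is rebuilt from the printed values, and the object names `two_sided` and `one_sided` are chosen here for illustration.

```r
# Data rebuilt from the printed table (assumption: values copied as shown above)
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Two-sided test, as run in this report
two_sided <- t.test(fish$pool, fish$riffle, paired = TRUE)

# One-sided test: alternative is that the mean difference (pool - riffle) is > 0
one_sided <- t.test(fish$pool, fish$riffle, paired = TRUE,
                    alternative = "greater")

two_sided$p.value
one_sided$p.value
```

Because the t statistic is positive, the one-sided p-value is half the two-sided one, so the decision to reject the null hypothesis does not change.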
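Interpret Confidence Interval
The 95% confidence interval for the mean difference (pool minus riffle) is (1.170332, 3.229668). Because the interval does not contain 0, the mean difference is significantly different from 0 at the 0.05 level. We are 95% confident that pools hold between about 1.2 and 3.2 more fish on average than riffles.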
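The t statistic and the confidence interval can be reproduced by hand from the paired differences, which shows where the numbers in the output come from. This is a sketch: the data frame is rebuilt from the printed values, and the names `d`, `se`, `t_stat`, `t_crit`, and `ci` are introduced only for illustration.

```r
# Data rebuilt from the printed table (assumption: values copied as shown above)
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

d  <- fish$pool - fish$riffle   # paired differences
n  <- length(d)                 # number of pairs
se <- sd(d) / sqrt(n)           # standard error of the mean difference

# t statistic for H0: mean difference = 0, with n - 1 degrees of freedom
t_stat <- mean(d) / se

# 95% confidence interval: mean difference +/- critical value * standard error
t_crit <- qt(0.975, df = n - 1)
ci <- mean(d) + c(-1, 1) * t_crit * se

t_stat
ci
```

These values should match the t.test() output above (t = 4.5826, interval 1.170332 to 3.229668).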
Interpret the Sample Estimates
The sample estimate is the mean of the differences, 2.2. On average, 2.2 more fish were counted in the pool than in the riffle at each location.
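The mean of the paired differences equals the difference between the two sample means, which can be checked directly. This is a short sketch with the data frame rebuilt from the printed values; the names `mean_of_differences` and `difference_of_means` are hypothetical.

```r
# Data rebuilt from the printed table (assumption: values copied as shown above)
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Mean of the pool-minus-riffle differences, as reported by t.test()
mean_of_differences <- mean(fish$pool - fish$riffle)

# Difference between the pool mean and the riffle mean
difference_of_means <- mean(fish$pool) - mean(fish$riffle)

mean_of_differences
difference_of_means
```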
Conclusion
The null hypothesis is rejected because the p-value (0.0004264) is below the 0.05 level of significance. The mean difference of 2.2 fish supports the conclusion that there are more fish in pools than in riffles.