fish <- read.csv("fish.csv")
fish
##    location pool riffle
## 1         1    6      3
## 2         2    6      3
## 3         3    3      3
## 4         4    8      4
## 5         5    5      2
## 6         6    2      2
## 7         7    6      2
## 8         8    7      2
## 9         9    1      2
## 10       10    3      2
## 11       11    4      3
## 12       12    5      1
## 13       13    4      3
## 14       14    6      2
## 15       15    4      3
There are 15 observations, one per sampling location. Each row is a location, and the columns pool and riffle give how many fish were counted in the pool and in the riffle at that location.
The purpose of this data is to compare where more fish are found, the pool or the riffle. Across the 15 locations sampled, the count was higher in the pool at 12 locations, equal at 2, and higher in the riffle at only 1.
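The comparison described above can be checked directly from the per-location differences. The following is a minimal sketch, not part of the original analysis. It rebuilds the data frame from the printed values so that it runs without fish.csv, and the names d, n_pool_higher, n_equal and n_riffle_higher are introduced here for illustration only.

```r
# Counts copied from the printed data above, so the sketch runs without fish.csv
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Per-location difference: a positive value means more in the pool
d <- fish$pool - fish$riffle

# Number of locations in each case
n_pool_higher   <- sum(d > 0)
n_equal         <- sum(d == 0)
n_riffle_higher <- sum(d < 0)
c(pool_higher = n_pool_higher, equal = n_equal, riffle_higher = n_riffle_higher)

# Average difference (pool minus riffle) across the 15 locations
mean(d)
```

Working with the differences, rather than with the two columns separately, keeps each location's pool count matched with its own riffle count.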
library(ggplot2)
ggplot(data=fish, mapping=aes(x=riffle, y=pool)) +
  geom_point() +
  geom_abline(slope=1, intercept=0) +
  annotate("text", x=1.25, y=4, label="More in Pool") +
  annotate("text", x=3.5, y=2, label="More in Riffle")
The graph shows the line y = x (slope 1, intercept 0). A location with the same count in the pool and in the riffle falls exactly on this line, and only two points do. If the mean difference between pool and riffle were zero, the points would be scattered evenly on both sides of the line. They are not: most points lie above the line, in the "More in Pool" region, so counts tend to be higher in the pool than in the riffle. Some locations share the same pair of counts, so fewer than 15 distinct points are visible.
Null hypothesis: The population mean is the same for the riffle as for the pool. There is no difference in the number of species between the pool and the riffle, so the mean difference is zero. We are trying to assess whether our data could have come about from random sampling alone.
Alternate hypothesis: The population mean number of species in the pool differs from that in the riffle, so the mean difference is not zero (a two-sided alternative).
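The two hypotheses can be written compactly in terms of the mean difference. The symbol \mu_d is notation introduced here rather than taken from the original analysis. It stands for the population mean of pool minus riffle.

```latex
\[
H_0:\ \mu_d = 0
\qquad\text{versus}\qquad
H_A:\ \mu_d \neq 0,
\qquad\text{where } \mu_d = \mu_{\text{pool}} - \mu_{\text{riffle}}.
\]
```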
It is a paired t-test. There is one sample of locations, and two variables are measured on each member of that sample: riffle and pool.
Each location is a single sampling unit with two measurements, so the observations are paired by location.
library(ggplot2)
gg <- ggplot(data=fish)
gg + geom_qq(mapping=aes(sample=pool))
gg + geom_qq(mapping=aes(sample=riffle))
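For a paired t-test, the normality assumption concerns the differences rather than each column on its own, so a Q-Q plot of the differences is a useful companion to the two plots above. The following is a sketch under stated assumptions and is not part of the original analysis. It uses base R qqnorm() and qqline() so that it needs no packages, it rebuilds the data from the printed values, and the column name difference is introduced here for illustration.

```r
# Counts copied from the printed data above, so the sketch runs without fish.csv
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Paired difference for each location (pool minus riffle)
fish$difference <- fish$pool - fish$riffle

# Normal Q-Q plot of the differences, with a reference line
qqnorm(fish$difference, main = "Q-Q plot of pool minus riffle")
qqline(fish$difference)
```

Points that follow the reference line reasonably closely suggest that the normality assumption is not badly violated. With only 15 small integer counts, the plot will look stepped.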
The level of significance is 5% (0.05).
library(ggplot2)
t.test(fish$pool, fish$riffle, paired=TRUE)
##
## Paired t-test
##
## data: fish$pool and fish$riffle
## t = 4.5826, df = 14, p-value = 0.0004264
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.170332 3.229668
## sample estimates:
## mean of the differences
## 2.2
Because the p-value (0.0004264) is lower than the 0.05 level of significance, we reject the null hypothesis.
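The test statistic and p-value reported by t.test() can be reproduced by hand from the differences, which shows where the numbers come from. The following is a minimal sketch that rebuilds the data from the printed values. The names d, n, se, t_stat, df and p_value are introduced here for illustration.

```r
# Counts copied from the printed data above
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

d  <- pool - riffle        # paired differences
n  <- length(d)            # number of pairs
se <- sd(d) / sqrt(n)      # standard error of the mean difference

t_stat  <- mean(d) / se    # t statistic under H0: mean difference = 0
df      <- n - 1           # degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df)   # two-sided p-value

c(t = t_stat, df = df, p = p_value)
```

These values should agree with the t, df and p-value lines in the t.test() output above, because a paired t-test is a one-sample t-test on the differences.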
The confidence interval is the range of plausible values for the difference in means. Zero is not in this interval. Therefore 0 is not a plausible value for the difference in means, so it is not plausible that the means are the same. The result of STEP 15 is consistent with the result of STEP 14.
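The 95 percent confidence interval can also be rebuilt from the differences, as the mean difference plus or minus a t critical value times the standard error. The following is a sketch that rebuilds the data from the printed values. The names d, n, margin and ci are introduced here for illustration.

```r
# Counts copied from the printed data above
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

d <- pool - riffle   # paired differences
n <- length(d)       # number of pairs

# Margin of error: t critical value (97.5th percentile, n - 1 df) times standard error
margin <- qt(0.975, df = n - 1) * sd(d) / sqrt(n)

# Interval centered on the mean difference
ci <- mean(d) + c(-1, 1) * margin
ci

# Is zero inside the interval?
ci[1] <= 0 && 0 <= ci[2]
```

The interval should match the one printed by t.test(). The final line checks whether zero falls inside it, which is the same question the p-value answers at the 5% level.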
The sample mean of the differences (pool minus riffle) is 2.2, as reported by the test.
It can be concluded that, on average, more fish are found in the pool than in the riffle.
The difference in means is around 0.9. Non-primed shoppers had a mean of 2, while primed shoppers had a mean of 2.97.
I conclude that, for this data set, the null hypothesis is rejected in favor of the alternative hypothesis: primed shoppers have a higher preference.