The data for this project was given to us. In the next steps, the data will be described and the hypotheses will be stated. Graphs will be used alongside the p-value to help decide whether to reject or fail to reject the null hypothesis.
fish<-read.csv("fish.csv")
fish
## location pool riffle
## 1 1 6 3
## 2 2 6 3
## 3 3 3 3
## 4 4 8 4
## 5 5 5 2
## 6 6 2 2
## 7 7 6 2
## 8 8 7 2
## 9 9 1 2
## 10 10 3 2
## 11 11 4 3
## 12 12 5 1
## 13 13 4 3
## 14 14 6 2
## 15 15 4 3
There are 15 rows and 3 columns in this data set. The rows correspond to subjects (the 15 locations) and the columns correspond to variables. The location column identifies where the data were recorded, the pool column gives the number of fish seen in pools, and the riffle column gives the number of fish seen in riffles.
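As a quick check of the structure described above, the sketch below rebuilds the table from the printed listing (an assumption made so the snippet runs without `fish.csv`; the values are copied from the output above) and confirms that each row is one location and each column one variable.

```r
# Rebuild the fish data from the printed listing above
# (assumption for self-containment; the report itself reads fish.csv).
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

dim(fish)   # 15 rows (locations), 3 columns (variables)
str(fish)   # column names and types
```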
The purpose of the study is to determine whether the evidence suggests that more fish are found in pools or in riffles. Specifically, the study focused on whether there were more fish in pools than in riffles.
It is always a good idea to visualize your data as soon as you load it. Technically this should be done after you design the study, but here we are going to switch the order around.
Load the graphics library, then plot:
library(ggplot2)
ggplot(data=fish, mapping=aes(x=riffle, y=pool)) + geom_point()
The plot suggests that the pool group may have more fish than the riffle group. To test this, we must find a p-value: the probability, assuming the null hypothesis is true, of observing a difference between the Pool and Riffle groups at least as large as the one in the sample. If the p-value is below the chosen significance level, we reject the null hypothesis, meaning the data provide evidence of a difference between the two means. The plot helps show whether the test result is consistent with the data.
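To make this definition concrete, here is a minimal sketch that computes the p-value by hand from the paired differences (values copied from the listing above; the names `d`, `t_stat`, `p_two_sided`, and `p_greater` are illustrative). Under the null hypothesis the paired t statistic follows a t distribution with n - 1 degrees of freedom, and the p-value is the tail area beyond the observed statistic.

```r
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

d <- pool - riffle                     # one difference per location
n <- length(d)
t_stat <- mean(d) / (sd(d) / sqrt(n))  # paired t statistic

# Two-sided p-value: chance, assuming the true mean difference is 0,
# of a t statistic at least as far from 0 as the observed one
p_two_sided <- 2 * pt(-abs(t_stat), df = n - 1)

# One-sided p-value for H1: mean(pool) > mean(riffle) (upper tail only)
p_greater <- pt(t_stat, df = n - 1, lower.tail = FALSE)

round(c(t = t_stat, two_sided = p_two_sided, greater = p_greater), 7)
```

These values should agree with the `t` and `p-value` that `t.test()` reports later in the report.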
Null hypothesis: The mean number of fish in the population of pools is the same as the mean in the population of riffles.
Alternative hypothesis: The mean number of fish in the population of pools is greater than the mean in the population of riffles.
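Writing $d_i$ for the pool count minus the riffle count at location $i$, and $\mu_d$ for the population mean of these differences, the same hypotheses can be stated in terms of a single mean, since $\mu_d = \mu_{\text{pool}} - \mu_{\text{riffle}}$:

$$
H_0:\ \mu_{\text{pool}} = \mu_{\text{riffle}} \iff \mu_d = 0,
\qquad
H_1:\ \mu_{\text{pool}} > \mu_{\text{riffle}} \iff \mu_d > 0.
$$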
A paired t-test is used because it tests a hypothesis about population means of a quantitative variable (here, fish counts), and because the Pool and Riffle counts are paired: each of the 15 locations contributes both a pool count and a riffle count, so the two measurements from the same location are not independent.
It is effectively a one-sample test: the paired t-test reduces each location to a single difference (pool minus riffle) and tests whether the mean of those 15 differences is zero.
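This equivalence can be checked directly. In the sketch below (values copied from the listing above), a paired t-test on `pool` and `riffle` and a one-sample t-test of `pool - riffle` against `mu = 0` should report the same t statistic, degrees of freedom, and p-value.

```r
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

paired     <- t.test(pool, riffle, paired = TRUE)  # paired t-test
one_sample <- t.test(pool - riffle, mu = 0)         # one-sample test on the differences

# Side-by-side: t statistic, degrees of freedom, p-value
rbind(
  paired     = c(paired$statistic, paired$parameter, p = paired$p.value),
  one_sample = c(one_sample$statistic, one_sample$parameter, p = one_sample$p.value)
)
```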
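The t-test assumes that the data (for a paired test, the differences) come from an approximately normal distribution. The larger the sample size, the less sensitive the t-test is to departures from normality.
A common way to judge this is with a normal quantile-quantile (Q-Q) plot: points that fall close to a straight line suggest approximate normality.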
ggplot(data=fish, mapping=aes(sample=riffle)) + geom_qq() + geom_qq_line()
Here is the corresponding Q-Q plot for the pool counts.
gg <- ggplot(data=fish, mapping=aes(sample=pool))
gg + geom_qq() + geom_qq_line()
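For a paired t-test, the normality assumption applies to the differences (pool minus riffle) rather than to each column separately, so a Q-Q plot of those differences may be the more relevant check. A minimal sketch using ggplot2's `geom_qq()` and `geom_qq_line()`, with the data re-entered inline from the listing above (the `diff` column is an illustrative addition):

```r
library(ggplot2)

fish <- data.frame(
  pool   = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)
fish$diff <- fish$pool - fish$riffle  # one paired difference per location

# Normal Q-Q plot of the differences; points near the line suggest approximate normality
p <- ggplot(data = fish, mapping = aes(sample = diff)) +
  geom_qq() +
  geom_qq_line() +
  labs(x = "Theoretical quantiles", y = "Pool - riffle (sample quantiles)")
print(p)
```

Because the counts are small whole numbers, expect some stair-stepping from ties; the question is whether the points stay close to the line overall.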
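Here is a scatter plot comparing Riffle and Pool. Points above the line y = x are locations where more fish were seen in the pool than in the riffle; points below it are locations with more fish in the riffle.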
library(ggplot2)
ggplot(data=fish, mapping=aes(x=riffle, y=pool)) +
  geom_point() +
  geom_abline(slope=1, intercept=0) +
  annotate("text", x=1.25, y=4, label="More in Pool") +
  annotate("text", x=3.5, y=2, label="More in Riffle")
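Several locations share the same pair of counts (for example, locations 11, 13, and 15 all have 4 fish in the pool and 3 in the riffle), so `geom_point()` draws them on top of each other. One option, sketched below with the data re-entered inline, is ggplot2's `geom_count()`, which sizes each point by the number of locations at that position.

```r
library(ggplot2)

fish <- data.frame(
  pool   = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# geom_count() sizes each point by how many locations share that (riffle, pool) pair
p <- ggplot(data = fish, mapping = aes(x = riffle, y = pool)) +
  geom_count() +
  geom_abline(slope = 1, intercept = 0) +  # y = x: equal counts in pool and riffle
  annotate("text", x = 1.25, y = 4, label = "More in Pool") +
  annotate("text", x = 3.5, y = 2, label = "More in Riffle")
print(p)
```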
Based on the Q-Q plots, the data appear approximately normal.
The level of significance for the test is 0.05.
t.test(fish$pool, fish$riffle, paired=TRUE)
##
## Paired t-test
##
## data: fish$pool and fish$riffle
## t = 4.5826, df = 14, p-value = 0.0004264
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.170332 3.229668
## sample estimates:
## mean of the differences
## 2.2
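The p-value (0.0004264) is lower than the significance level (0.05), so the null hypothesis is REJECTED: the data provide evidence that the mean numbers of fish in pools and riffles are not equal. The test reported above is two-sided; because the observed mean difference (2.2) is positive, the one-sided p-value for the stated alternative (pool mean greater than riffle mean) is half of the two-sided value, about 0.0002, so the alternative hypothesis is supported.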
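Since the stated alternative is one-sided, the test can also be run directly with the `alternative = "greater"` argument of base R's `t.test()`. A sketch with the data re-entered inline from the listing above:

```r
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

# One-sided paired test of H1: mean(pool - riffle) > 0
one_sided <- t.test(pool, riffle, paired = TRUE, alternative = "greater")

one_sided$statistic  # same t as the two-sided test
one_sided$p.value    # half of the two-sided p-value, since t > 0
one_sided$conf.int   # one-sided 95% interval: a lower bound, upper bound Inf
```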
The 95% confidence interval gives a range of plausible values for the true mean difference (pool minus riffle). Here the interval is [1.170332, 3.229668], which lies entirely above zero, so zero is not a plausible value for the difference at the 95% confidence level. This agrees with the p-value from the previous step and reaffirms that the null hypothesis should be rejected.
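The interval can be reproduced by hand as the mean difference plus or minus a t critical value times its standard error. A minimal sketch, with the data re-entered inline from the listing above:

```r
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

d  <- pool - riffle
n  <- length(d)
se <- sd(d) / sqrt(n)             # standard error of the mean difference
t_crit <- qt(0.975, df = n - 1)   # two-sided 95% critical value, 14 df

ci <- mean(d) + c(-1, 1) * t_crit * se
ci                                # lower and upper bounds of the 95% CI
```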
There is strong evidence that the population mean numbers of fish in pools and riffles are not equal.
summary(fish)
## location pool riffle
## Min. : 1.0 Min. :1.000 Min. :1.000
## 1st Qu.: 4.5 1st Qu.:3.500 1st Qu.:2.000
## Median : 8.0 Median :5.000 Median :2.000
## Mean : 8.0 Mean :4.667 Mean :2.467
## 3rd Qu.:11.5 3rd Qu.:6.000 3rd Qu.:3.000
## Max. :15.0 Max. :8.000 Max. :4.000
There is evidence that fish are more abundant in pools than in riffles. The summary statistics support this: the mean number of fish in riffles (2.467) is notably lower than in pools (4.667), and so is the median (2 versus 5). In the data listing, 12 of the 15 locations had more fish in the pool than in the riffle.
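The location-by-location comparison can be tabulated from the sign of each difference. A short sketch, with the data re-entered inline and an illustrative `comparison` factor:

```r
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

d <- pool - riffle
# Classify each location by which habitat had more fish
comparison <- factor(sign(d), levels = c(-1, 0, 1),
                     labels = c("more in riffle", "equal", "more in pool"))
table(comparison)
```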