These are the steps needed to determine, with supporting evidence, whether more fish are found on average in the riffles or in the pools at sampled locations in a river.
Step 1 is to think through the experiment: how the data will be gathered and how it will be analyzed. In this case, we were already given data from a river and the two habitats within it where fish are usually found, the riffle and the pool. The data cover several locations in the river that each have both a riffle and a pool, along with how many fish were found in each part. The point of this study is to understand whether fish prefer to spend their time in the pool or in the riffle.
library(ggplot2)
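fish <- read.csv("fish.csv", fileEncoding="UTF-8-BOM")  # "UTF-8-BOM" drops the byte-order mark that garbles the first column name
fish  # print the data
##    location pool riffle
## 1         1    6      3
## 2         2    6      3
## 3         3    3      3
## 4         4    8      4
## 5         5    5      2
## 6         6    2      2
## 7         7    6      2
## 8         8    7      2
## 9         9    1      2
## 10       10    3      2
## 11       11    4      3
## 12       12    5      1
## 13       13    4      3
## 14       14    6      2
## 15       15    4      3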
Our data set contains 15 rows and 3 columns. Each row is a subject, here a location in the river, labeled 1-15. The columns are the location label and the two variables, pool and riffle, which give the number of fish counted in the pool and in the riffle at that location.
The purpose of this study is to give evidence on whether, on average across locations, more fish are found in the pools or in the riffles.
Below is a visual representation of the number of fish in the riffle and in the pool at each location.
# Scatterplot of pool count against riffle count; the line y = x marks equal counts
ggplot(data=fish, mapping=aes(x=riffle, y=pool)) +
  geom_point() +
  geom_abline(slope=1, intercept=0) +
  annotate("text", x=1.25, y=4, label="More in Pool") +
  annotate("text", x=3.5, y=2, label="More in Riffle")
The plot above suggests that fish prefer to spend their time in the pool rather than the riffle. Each point represents one location in the river (locations with identical counts overlap): the x axis gives the number of fish in the riffle and the y axis gives the number of fish in the pool. The superimposed line is y = x, where the two counts are equal, so points above the line are locations with more fish in the pool and points below it are locations with more fish in the riffle. At 12 of the 15 locations the pool count is higher, at 2 the counts are equal, and at only 1 the riffle count is higher, so the majority of the points fall on the pool side of the line.
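As a rough numerical check on this reading of the plot, the sketch below tallies the locations on each side of the line. It assumes the counts are typed in by hand from the printed table rather than read from fish.csv; the names `pool`, `riffle`, and `side` are illustrative only.

```r
# Counts copied from the printed data (assumption: entered by hand, not read from fish.csv)
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

# Classify each location by which side of the line y = x it falls on
side <- ifelse(pool > riffle, "More in Pool",
        ifelse(pool < riffle, "More in Riffle", "Equal"))

# Tally the locations in each category
table(side)
```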
First, the population and its parameters must be defined. The population is the fish in the river, and the parameters of interest are the mean number of fish per pool and the mean number of fish per riffle. The sample consists of the fish counted in the pool and in the riffle at each of the 15 locations. Null Hypothesis: the population mean for pools is the same as the population mean for riffles.
The alternative hypothesis is less specific than the null hypothesis. The null hypothesis can be true in only one way (the means are exactly equal), while the alternative can be true in many ways. Alternative Hypothesis: the population mean for pools is greater than the population mean for riffles.
These data are quantitative (counts of fish), and the hypotheses compare the mean count in the pools with the mean count in the riffles. Therefore, a t-test for means, rather than a proportions test, is most appropriate here.
For the data, I will use a two-sample procedure because there are two sets of counts from the river, one from the pools and one from the riffles. Because each location contributes both a pool count and a riffle count, the two samples are paired.
Check Normality:
# Normal Q-Q plots of the pool counts and the riffle counts
gg <- ggplot(data=fish)
gg + geom_qq(mapping=aes(sample=pool))
gg + geom_qq(mapping=aes(sample=riffle))
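Because the test used later is paired, the quantity that needs to be roughly normal is the set of pool-minus-riffle differences rather than each column on its own. A minimal sketch of a numerical complement to the Q-Q plots, assuming the counts are typed in from the printed table and using the base-R `shapiro.test` function (a swapped-in technique, not part of the original analysis):

```r
# Counts copied from the printed data (assumption: entered by hand, not read from fish.csv)
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

# Paired differences, one per location
d <- pool - riffle

# Shapiro-Wilk test: a small p-value would suggest the differences are not normal
sw <- shapiro.test(d)
sw
```

With only 15 locations this test has little power, so it is best read alongside the plots rather than instead of them.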
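It is best to use the traditional level of significance, which is 0.05. This means that if the null hypothesis is true, the probability of a type 1 error (rejecting it anyway) is 5%.
Each x is paired with a single y (x being the number of fish in the riffle at a location, y being the number of fish in the pool at the same location), so a paired t-test is used.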
t.test(fish$pool, fish$riffle, paired=TRUE)
##
## Paired t-test
##
## data: fish$pool and fish$riffle
## t = 4.5826, df = 14, p-value = 0.0004264
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.170332 3.229668
## sample estimates:
## mean of the differences
## 2.2
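To show where the numbers in this output come from, the sketch below recomputes the t statistic and the two-sided p-value by hand from the paired differences. It assumes the counts are typed in from the printed table; the variable names are illustrative.

```r
# Counts copied from the printed data (assumption: entered by hand, not read from fish.csv)
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

d  <- pool - riffle        # paired differences, one per location
n  <- length(d)            # number of pairs
se <- sd(d) / sqrt(n)      # standard error of the mean difference

t_stat      <- mean(d) / se                       # t = mean difference / standard error
p_two_sided <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value with n - 1 degrees of freedom

c(t = t_stat, df = n - 1, p = p_two_sided)
```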
p-value = 0.0004264
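The p-value is very small: if the null hypothesis were true, a difference this large would be very unlikely to occur by chance. Since the p-value is less than the level of significance, 0.05, the null hypothesis is rejected in favor of the alternative hypothesis. The test above is two-sided; because the t statistic is positive, the p-value for the one-sided alternative (pool greater than riffle) is half of this value, which is also below 0.05.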
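Since the alternative hypothesis stated earlier is one-sided, the test can also be run with `alternative = "greater"`. The sketch below is one way to do that, assuming the counts are typed in from the printed table; it is a variation on the call used above, not a replacement for the reported output.

```r
# Counts copied from the printed data (assumption: entered by hand, not read from fish.csv)
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

# One-sided paired t-test: is the mean of (pool - riffle) greater than 0?
res <- t.test(pool, riffle, paired = TRUE, alternative = "greater")

res$statistic   # same t statistic as the two-sided test
res$p.value     # half of the two-sided p-value, because t is positive
```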
95 percent confidence interval: 1.170332 3.229668
Because this interval does not contain 0, there is evidence that the means are different. Both endpoints are positive, which further supports the alternative hypothesis that the pool mean is greater.
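The interval itself can be rebuilt from the mean difference, its standard error, and a t quantile. A minimal sketch, assuming the counts are typed in from the printed table and using illustrative variable names:

```r
# Counts copied from the printed data (assumption: entered by hand, not read from fish.csv)
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

d <- pool - riffle   # paired differences, one per location
n <- length(d)       # number of pairs

# Margin of error: t quantile for 95% confidence times the standard error
margin <- qt(0.975, df = n - 1) * sd(d) / sqrt(n)

# Interval centered on the mean difference
ci <- mean(d) + c(-1, 1) * margin
ci

# TRUE when the whole interval lies above 0
all(ci > 0)
```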
Interpret the sample estimate by examining the mean (the average) of the differences. The mean of the differences is 2.2. This means that, on average, there were 2.2 more fish in the pool than in the riffle at a location. This sample estimate is the center of the confidence interval.
There are more fish, on average, in the pools of the river than in the riffles. The evidence for this is the very small p-value, the 95% confidence interval, and the sample estimate.