These are the steps needed to gather evidence on whether, on average, more fish are found in the riffles or in the pools at a set of locations along a river.

STEP 1: Design the Experiment

Step 1 consists of thinking through the experiment: how to gather the data and how to analyze it. In this case, we were given data on a river and the two habitats within it where fish are usually found, the riffle and the pool. The data cover several locations along the river, each with both a riffle and a pool, and record how many fish were found in each part. The point of this study is to understand whether fish prefer to spend their time in the pool or in the riffle.

STEP 2: Load Data

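library(ggplot2)
# fileEncoding = "UTF-8-BOM" drops the byte-order mark that otherwise garbles the first column name
fish <- read.csv("fish.csv", fileEncoding = "UTF-8-BOM")
fish
##    location pool riffle
## 1         1    6      3
## 2         2    6      3
## 3         3    3      3
## 4         4    8      4
## 5         5    5      2
## 6         6    2      2
## 7         7    6      2
## 8         8    7      2
## 9         9    1      2
## 10       10    3      2
## 11       11    4      3
## 12       12    5      1
## 13       13    4      3
## 14       14    6      2
## 15       15    4      3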

STEP 3: Describe the Data

Our data set contains 15 rows and 3 columns. Each row represents one subject, a location along the river, labeled 1-15. The first column identifies the location, and the other two columns are the variables: the number of fish counted in the pool and in the riffle at that location.
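The description above can be checked directly in R. The sketch below rebuilds the data frame by hand from the printed values in Step 2 (so that it runs without fish.csv) and inspects its shape; the name `col_means` is only illustrative.

```r
# Counts copied from the printed data in Step 2
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

dim(fish)   # number of rows, then number of columns
str(fish)   # column names and types

# Average fish count per habitat across the 15 locations
col_means <- colMeans(fish[, c("pool", "riffle")])
col_means
```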

STEP 4: Identify the Purpose of the Study

The purpose of this study is to give evidence on whether, on average, more fish are found in the pools or in the riffles at locations along the river.

STEP 5: Visualize the Data

Below is a scatterplot of the number of fish found in the riffle and in the pool at each location.

# Pool count against riffle count; the line y = x marks locations with equal counts
ggplot(data = fish, mapping = aes(x = riffle, y = pool)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0) +
  annotate("text", x = 1.25, y = 4, label = "More in Pool") +
  annotate("text", x = 3.5, y = 2, label = "More in Riffle")

STEP 6: Interpret the Plot

The plot above suggests that fish prefer the pool to the riffle. Each point represents one location on the river: the x axis gives the number of fish in its riffle and the y axis gives the number of fish in its pool. The superimposed line is y = x, where the two counts are equal. Points above the line are locations with more fish in the pool, and points below it are locations with more fish in the riffle. In the data, 12 of the 15 locations fall above the line, 2 fall on it, and only 1 falls below it. Some locations share the same pair of counts, so fewer than 15 distinct points are visible.
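The reading of the plot can be backed up by counting the locations on each side of the line. This is a small sketch using the counts printed in Step 2; the vector names `pool`, `riffle` and `d` are chosen here for illustration.

```r
# Counts copied from the printed data in Step 2
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

# Positive difference = point above the line y = x (more fish in the pool)
d <- pool - riffle

above <- sum(d > 0)    # locations with more fish in the pool
on    <- sum(d == 0)   # locations with equal counts
below <- sum(d < 0)    # locations with more fish in the riffle

c(above = above, on = on, below = below)
```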

STEP 7: Formulate the Null Hypothesis

First, the population and its parameters must be defined. The population is the fish at locations along the river. Two samples are drawn from it: the number of fish in the pool and the number of fish in the riffle at each of the 15 sampled locations. Null Hypothesis: the population mean number of fish in a pool is the same as the population mean number of fish in a riffle.

STEP 8: Identify the Alternative Hypothesis

The alternative hypothesis is not as specific as the null hypothesis. The null hypothesis names a single value for the difference in means (zero), while the alternative covers a whole range of values. Alternative Hypothesis: the population mean number of fish in a pool is greater than the population mean number of fish in a riffle.
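In symbols, the two hypotheses can be written as follows. The notation is an assumption made here for illustration: \mu_{\text{pool}} and \mu_{\text{riffle}} stand for the population mean fish counts, and \mu_d for their difference. The \text command assumes the amsmath package.

```latex
\[
H_0:\ \mu_d = 0
\qquad \text{versus} \qquad
H_a:\ \mu_d > 0,
\qquad \text{where } \mu_d = \mu_{\text{pool}} - \mu_{\text{riffle}}
\]
```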

STEP 9: Decide on Type of Test

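These data are quantitative: each value is a count of fish, and the hypotheses compare the mean count in pools with the mean count in riffles. The data do not record the proportion of cases falling in a category, so a proportions test does not apply. A t-test for a difference in means is the appropriate choice here.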

STEP 10: Choose One Sample or Two

I will use a two-sample test because there are two samples from the river, one from the pools and one from the riffles. Since each location contributes one pool count and one riffle count, the two samples are paired by location.

STEP 11: Check Assumptions

Check Normality:

gg <- ggplot(data = fish)
# Name the columns directly inside aes(); ggplot looks them up in `fish`
gg + geom_qq(mapping = aes(sample = pool))
gg + geom_qq(mapping = aes(sample = riffle))
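For a paired test, the quantity that should look roughly normal is the set of pool-minus-riffle differences, so it may also help to check those directly. The sketch below is one possible way to do it, not part of the original analysis: it uses base R graphics rather than ggplot2 so that it runs without extra packages, and adds a Shapiro-Wilk test as a numeric companion to the plot. Points lying close to the reference line are consistent with normality.

```r
# Counts copied from the printed data in Step 2
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

# Paired differences, one per location
d <- pool - riffle

# Normal Q-Q plot of the differences with a reference line
qqnorm(d, main = "Normal Q-Q plot of pool - riffle differences")
qqline(d)

# Shapiro-Wilk test: a small p-value would cast doubt on normality
sw <- shapiro.test(d)
sw
```

With only 15 locations and integer counts, both the plot and the test give a rough check rather than a firm verdict.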

STEP 12: Decide on a Level of Significance for the Test

We use the traditional level of significance, 0.05. This means that, if the null hypothesis is true, the probability of a Type I error (rejecting it anyway) is held to 5%.
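To see what a 5% level of significance means in practice, one can simulate many data sets in which the null hypothesis is true and count how often the test rejects it. The sketch below is purely illustrative: the normal data, the seed and the number of simulations are assumptions, not part of the fish study.

```r
set.seed(1)        # arbitrary seed so the sketch is reproducible
n_sims <- 5000     # number of simulated studies (assumed)
n      <- 15       # locations per study, as in the fish data
alpha  <- 0.05     # level of significance

# Each simulated study has no true difference between x and y
reject <- replicate(n_sims, {
  x <- rnorm(n)
  y <- rnorm(n)
  t.test(x, y, paired = TRUE)$p.value < alpha
})

# Share of studies that wrongly reject the true null hypothesis
type1_rate <- mean(reject)
type1_rate
```

The resulting rate should land close to 0.05, which is what the level of significance promises.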

STEP 13: Perform the Test

Each location provides one x (the number of fish in its riffle) and one y (the number of fish in its pool), so every x is paired with a single y and a paired t-test is used.

t.test(fish$pool, fish$riffle, paired=TRUE)
## 
##  Paired t-test
## 
## data:  fish$pool and fish$riffle
## t = 4.5826, df = 14, p-value = 0.0004264
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.170332 3.229668
## sample estimates:
## mean of the differences 
##                     2.2
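The numbers in the paired t-test output can be reproduced by hand, which shows what the test actually computes: the mean of the differences divided by its standard error. This is a sketch using the counts printed in Step 2; the variable names are illustrative.

```r
# Counts copied from the printed data in Step 2
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

d  <- pool - riffle          # paired differences
n  <- length(d)              # number of locations
se <- sd(d) / sqrt(n)        # standard error of the mean difference

t_stat <- mean(d) / se       # test statistic
df     <- n - 1              # degrees of freedom
p_two  <- 2 * pt(-abs(t_stat), df)   # two-sided p-value

c(t = t_stat, df = df, p.value = p_two)
```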

STEP 14: Interpret the P-Value

p-value = 0.0004264

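The p-value is very small: if the null hypothesis were true, a difference this large would be very unlikely to arise by chance. Since the p-value is less than the level of significance, 0.05, the null hypothesis is rejected in favor of the alternative hypothesis. Note that the test above is two-sided, while the alternative hypothesis in Step 8 is one-sided. Because the observed difference is in the hypothesized direction (more fish in the pools), the one-sided p-value is half the two-sided one, about 0.0002, so the conclusion is the same.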
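Since the alternative hypothesis in Step 8 is one-sided, the test could also be run with `alternative = "greater"`, a standard argument of `t.test`. The sketch below compares the two versions on the counts printed in Step 2; it is offered as a possible variant rather than the analysis that was actually run.

```r
# Counts copied from the printed data in Step 2
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

# Two-sided test, as in Step 13
two_sided <- t.test(pool, riffle, paired = TRUE)

# One-sided test: is the mean of (pool - riffle) greater than 0?
one_sided <- t.test(pool, riffle, paired = TRUE, alternative = "greater")

c(two_sided = two_sided$p.value, one_sided = one_sided$p.value)
```

Both p-values are far below 0.05, so the choice between the two versions does not change the decision here.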

STEP 15: Interpret the Confidence Interval

95 percent confidence interval: 1.170332 3.229668

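This interval does not contain 0, so a mean difference of 0 is not plausible at the 95% confidence level, and there is evidence that the means are different. Both endpoints are positive, which supports the alternative hypothesis that the mean is greater in the pools: we are 95% confident that pools hold between about 1.17 and 3.23 more fish than riffles on average.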
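The interval itself comes from the mean difference plus or minus a margin of error. The sketch below rebuilds it from the counts printed in Step 2, using the t critical value for 14 degrees of freedom; the variable names are illustrative.

```r
# Counts copied from the printed data in Step 2
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

d  <- pool - riffle          # paired differences
n  <- length(d)
se <- sd(d) / sqrt(n)        # standard error of the mean difference

# Critical value leaving 2.5% in each tail of the t distribution
t_crit <- qt(0.975, df = n - 1)

# 95% confidence interval: estimate +/- margin of error
ci <- mean(d) + c(-1, 1) * t_crit * se
ci

# TRUE would mean 0 is a plausible value for the mean difference
ci[1] <= 0 && 0 <= ci[2]
```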

STEP 16: Interpret the Sample Estimates

Interpret the sample estimates by examining the mean (the average) of the differences. The mean of the differences is 2.2. This means that, in the sample, a pool holds on average 2.2 more fish than the riffle at the same location. This sample estimate is the center of the confidence interval.

STEP 17: State Your Conclusion

There are more fish in the pools of the river than in the riffles. The evidence is the very small p-value, the 95% confidence interval that lies entirely above 0, and the sample estimate of 2.2 more fish per pool.