Step 1: Design the experiment

Some stream fishes are most often found in pools, the deep, slow-moving parts of a stream. Others prefer riffles, the shallow, fast-moving regions. To investigate whether these two habitats support equal numbers of species (a measure of species diversity), researchers captured fishes at 15 locations along a river. At each location, they recorded the number of species captured in a riffle and the number captured in an adjacent pool.

Taken from Sample.Exam1.pdf written by Professor Carver

Step 2: Collect Data

# fileEncoding drops the byte-order mark that otherwise garbles the first column name
fish <- read.csv("fish.csv", fileEncoding = "UTF-8-BOM")
fish
##    location pool riffle
## 1         1    6      3
## 2         2    6      3
## 3         3    3      3
## 4         4    8      4
## 5         5    5      2
## 6         6    2      2
## 7         7    6      2
## 8         8    7      2
## 9         9    1      2
## 10       10    3      2
## 11       11    4      3
## 12       12    5      1
## 13       13    4      3
## 14       14    6      2
## 15       15    4      3

Step 3: Describe the Data

In the fish data set, we have 15 rows and 3 columns. Each row of the data set represents one sampling location along the river. The columns represent different variables. The first variable is labeled “location”. Location identifies where along the river the fishes were captured. The second variable is labeled “pool”. Pool represents the number of species captured at that location inside a pool (a deep, slow-moving part of the river). The third variable is labeled “riffle”. Riffle represents the number of species captured at that location inside a riffle (a shallow, fast-moving part of the river). Summed over all 15 locations, the species counts total 70 in pools and 37 in riffles. These totals are sums of per-location counts, so a species found at several locations is counted more than once.
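The dimensions and column totals quoted above can be checked directly. The following is a minimal sketch that rebuilds the data frame by hand from the printed table, so that it runs without fish.csv; with the file available, the data frame read in Step 2 could be used instead.

```r
# Data typed in from the table printed in Step 2
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

dim(fish)                                      # 15 rows, 3 columns
totals <- colSums(fish[, c("pool", "riffle")]) # per-habitat sums of species counts
totals                                         # pool 70, riffle 37
```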

Step 4: Purpose of the Study

The purpose of this study was to examine whether riffles and pools support the same number of fish species.

Step 5: Visualize data

library(ggplot2)
# One point per location; the line y = x marks equal counts in both habitats
ggplot(data=fish, mapping=aes(x=riffle, y=pool)) +
  geom_point() +
  geom_abline(slope=1, intercept=0) +
  annotate("text", x=1.25, y=4, label="More in Pool") +
  annotate("text", x=3.5, y=2, label="More in Riffle")

Step 6: Interpret the plot

Each dot represents one location: its horizontal position is the number of species captured in the riffle, and its vertical position is the number captured in the adjacent pool. Dots above the diagonal line are locations where the pool held more species than the riffle, and dots below it are locations where the riffle held more. According to the graph, there are more dots on the pool side of the line than on the riffle side. This suggests that pools tend to hold more species than riffles.
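Locations with identical (riffle, pool) counts are drawn on top of each other, so counting dots by eye can undercount. A minimal sketch, using the counts typed in from the Step 2 table, tallies the locations on each side of the line directly:

```r
# Species counts per location, typed in from the Step 2 table
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

d <- pool - riffle        # positive: point lies above the line y = x

more_in_pool   <- sum(d > 0)   # 12 locations
equal_counts   <- sum(d == 0)  # 2 locations (on the line)
more_in_riffle <- sum(d < 0)   # 1 location
c(more_in_pool, equal_counts, more_in_riffle)
```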

Step 7: Formulate the null hypothesis

The null hypothesis is that the mean number of species found in riffles is identical to the mean number of species found in pools; that is, the mean difference between the paired counts is 0.

Step 8: Formulate the alternative hypothesis

With our null hypothesis in mind, we can formulate our alternative hypothesis. Our alternative hypothesis is that the mean number of species found in riffles is not identical to the mean number of species found in pools.

Step 9: The type of test

We can choose between two types of tests: proportion tests and T-tests. Proportion tests test hypotheses about proportions of categorical variables, whereas T-tests test hypotheses about means of quantitative variables. Since the number of species is a quantitative variable, the T-test is the better option.

Step 10: Choose the number of samples

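Although we have two sets of counts, one for pools and one for riffles, they are not independent samples: each riffle count is matched with the count from the adjacent pool at the same location. We should therefore choose a paired T-test, which is equivalent to a one-sample T-test on the 15 within-location differences (pool minus riffle).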
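Because each riffle count is matched with a pool count from the same location, the test run in Step 13 uses paired=TRUE. A paired T-test gives the same result as a one-sample T-test on the per-location differences, which the sketch below illustrates with the counts typed in from the Step 2 table. The variable names are chosen here for illustration only.

```r
# Species counts per location, typed in from the Step 2 table
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

paired_test     <- t.test(pool, riffle, paired = TRUE)  # pairs matched by location
one_sample_test <- t.test(pool - riffle, mu = 0)        # same test on the differences

# Both calls agree on the test statistic and the p-value
same_t <- isTRUE(all.equal(unname(paired_test$statistic),
                           unname(one_sample_test$statistic)))
same_p <- isTRUE(all.equal(paired_test$p.value, one_sample_test$p.value))
c(same_t, same_p)
```

Treating the two columns as independent samples, for example with t.test(pool, riffle), would ignore the matching by location and use a different standard error and degrees of freedom.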

Step 11: Check assumptions of the test

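A T-test assumes that the data are approximately normally distributed. In the normal quantile-quantile (Q-Q) plots below, the points fall roughly along a straight line, so this assumption appears reasonable.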

gg <- ggplot(data=fish)
# Normal Q-Q plot of the pool counts
gg + geom_qq(mapping=aes(sample=pool))

# Normal Q-Q plot of the riffle counts
gg + geom_qq(mapping=aes(sample=riffle))
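For a paired T-test, the normality assumption applies to the per-location differences rather than to each column separately. The sketch below is one possible supplementary check, not part of the original analysis: it draws a base-R Q-Q plot of the differences and runs a Shapiro-Wilk test, using the counts typed in from the Step 2 table. With only 15 locations and many tied values, either check should be read as a rough guide.

```r
# Species counts per location, typed in from the Step 2 table
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

d <- pool - riffle     # within-location differences used by the paired test

qqnorm(d)              # points near a straight line suggest approximate normality
qqline(d)              # reference line through the first and third quartiles

sw <- shapiro.test(d)  # a small p-value would indicate departure from normality
sw$p.value
```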

Step 12: Decide on a level of significance of the test

For this test, we will use the traditional level of significance of 0.05.

Step 13: Perform the test

t.test(fish$pool, fish$riffle, paired=TRUE)
## 
##  Paired t-test
## 
## data:  fish$pool and fish$riffle
## t = 4.5826, df = 14, p-value = 0.0004264
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.170332 3.229668
## sample estimates:
## mean of the differences 
##                     2.2

Step 14: Interpret the P-Value

As we can see from the calculations above, the P-value is 0.0004264. Since the P-value is less than the level of significance of 0.05, we reject our null hypothesis, which states that the mean number of species in pools is equal to the mean number of species in riffles.
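The test statistic and P-value reported by t.test can be reproduced by hand from the differences. This is a minimal sketch using the counts typed in from the Step 2 table; the variable names are illustrative.

```r
# Species counts per location, typed in from the Step 2 table
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

d  <- pool - riffle
n  <- length(d)
se <- sd(d) / sqrt(n)                      # standard error of the mean difference

t_stat  <- mean(d) / se                    # about 4.5826, as in the t.test output
p_value <- 2 * pt(-abs(t_stat), df = n - 1) # two-sided, about 0.0004264

alpha  <- 0.05
reject <- p_value < alpha                  # TRUE: reject the null hypothesis
```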

Step 15: Interpret the confidence interval

According to the calculations, the 95% confidence interval is 1.170332 to 3.229668. Since the 95% confidence interval does not contain 0, 0 is not a plausible value for the mean difference. This suggests that there IS a difference in the means, which is consistent with rejecting the null hypothesis.
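The interval comes from the mean difference plus or minus a critical t value times the standard error. A minimal sketch, again using the counts typed in from the Step 2 table, reproduces the bounds:

```r
# Species counts per location, typed in from the Step 2 table
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

d      <- pool - riffle
n      <- length(d)
se     <- sd(d) / sqrt(n)          # standard error of the mean difference
t_crit <- qt(0.975, df = n - 1)    # critical value for a 95% two-sided interval

ci <- mean(d) + c(-1, 1) * t_crit * se
ci                                 # about 1.170332 and 3.229668
```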

Step 16: Interpret the sample estimates

According to the calculations, the mean of the differences is 2.2. This means that, on average, a pool held about 2.2 more species than the adjacent riffle at the same location.

Step 17: State your conclusion

In conclusion, the mean number of species found in riffles is not identical to the mean number found in pools. In this sample, pools held about 2.2 more species per location on average.