Fish

STEP 1: Design the experiment

This step is already completed for me.

STEP 2: Collect (or load) data

fish <- read.csv("fish.csv")
fish

##    location pool riffle
## 1         1    6      3
## 2         2    6      3
## 3         3    3      3
## 4         4    8      4
## 5         5    5      2
## 6         6    2      2
## 7         7    6      2
## 8         8    7      2
## 9         9    1      2
## 10       10    3      2
## 11       11    4      3
## 12       12    5      1
## 13       13    4      3
## 14       14    6      2
## 15       15    4      3

STEP 3: Describe data

I have a data set with 15 rows and 3 columns. Each row represents one of the 15 locations along a river where researchers captured fish for the study. The columns are the variables in the study: the location along the river, the number of fish captured in a pool, and the number of fish captured in an adjacent riffle.
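As a quick check on this description, here is a minimal sketch that confirms the dimensions and column names. It assumes the values printed in step 2; the data frame is re-entered by hand so that the sketch runs without fish.csv.

```r
# Data re-entered from the table printed in step 2 (assumption: no fish.csv on hand)
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

dim(fish)    # number of rows, then number of columns
names(fish)  # column names
str(fish)    # type of each column
```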

STEP 4: Identify the purpose of the study

The purpose of the study is to measure species diversity to investigate whether or not two habitats, pools and riffles, support equal numbers of fish species. In streams, pools are the deep and slow-moving areas, while riffles represent the shallow and fast-moving areas. Are fish found in equal numbers in both areas? Is one region a better habitat for fish?

STEP 5: Visualize data

Here is a scatterplot of the pool counts against the riffle counts. The line marks where pool = riffle, that is, y = x (slope 1, intercept 0). Points above the line are locations with more fish in the pool, and points below the line are locations with more fish in the riffle.

library(ggplot2)
ggplot(data=fish, mapping=aes(x=riffle, y=pool)) +
  geom_point() +
  geom_abline(slope=1, intercept=0) +
  annotate("text", x=1.25, y=4, label="More in Pool") +
  annotate("text", x=3.5, y=2, label="More in Riffle")

STEP 6: Interpret the plot

The plot suggests that there is more species diversity in pools than in riffles. Fish were found in greater quantity in pools than in riffles. There were between 1 and 8 fish in every pool, while there were only between 1 and 4 fish in each riffle sampled. These data suggest that pools are better habitats for fish, but this is an informal reading of the plot that needs to be substantiated with a formal test and a p-value.
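The reading of the plot can be backed up with a simple tally. This is a minimal sketch, again assuming the values printed in step 2; the names more_in_pool, more_in_riffle, and tied are my own and only for illustration.

```r
# Data re-entered from the table printed in step 2
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Compare each pool with the riffle at the same location
more_in_pool   <- sum(fish$pool > fish$riffle)   # points above the y = x line
more_in_riffle <- sum(fish$pool < fish$riffle)   # points below the y = x line
tied           <- sum(fish$pool == fish$riffle)  # points on the line
c(pool = more_in_pool, riffle = more_in_riffle, tied = tied)

# Smallest and largest count in each habitat
range(fish$pool)
range(fish$riffle)
```

For the printed data, 12 of the 15 locations have more in the pool, 1 has more in the riffle, and 2 are tied.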

STEP 7: Formulate the null hypothesis

Null hypothesis: The two populations are fish found in pools and fish found in riffles, and the samples are drawn from these populations. The parameter is the mean. The null hypothesis is that the population mean for fish found in pools is equal to the population mean for fish found in riffles, that is, the difference in means is zero.

STEP 8: Identify the alternative hypothesis

Alternative hypothesis: There is a difference between the population means for fish in pools and fish in riffles, that is, the difference in means is not zero.

STEP 9: Decide on type of test

Type of test: A t-test will be conducted, because we are comparing two population means of a quantitative variable.

STEP 10: Choose one sample or two

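A two-sample test will be conducted, since we have one sample of fish counts in pools and one sample of fish counts in riffles. Because each pool was sampled next to a riffle at the same location, the two samples are paired rather than independent, so the paired version of the test is the appropriate one.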
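To see what pairing means in practice, here is a minimal sketch, assuming the values printed in step 2. It shows that a paired t-test is the same as a one-sample t-test on the pool minus riffle differences, and that treating the samples as independent (R's default Welch two-sample test) gives a different statistic. The object names are my own.

```r
# Data re-entered from the table printed in step 2
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# One difference per location: pool count minus riffle count
d <- fish$pool - fish$riffle

paired     <- t.test(fish$pool, fish$riffle, paired = TRUE)  # paired test
one_sample <- t.test(d, mu = 0)                              # one-sample test on differences
welch      <- t.test(fish$pool, fish$riffle)                 # ignores the pairing

# The paired and one-sample statistics agree; the Welch statistic does not
all.equal(unname(paired$statistic), unname(one_sample$statistic))
c(paired = unname(paired$statistic), welch = unname(welch$statistic))
```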

STEP 11: Check assumptions of the test

The major assumption of the t-test is that the data come from a normal distribution, which has a bell shape. For a paired test, this assumption strictly applies to the pool minus riffle differences; as an approximation, the following plots check the normality of each sample.

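gg <- ggplot(data=fish)
# Q-Q plot of the pool counts, with a reference line
gg + geom_qq(mapping=aes(sample=pool)) + geom_qq_line(mapping=aes(sample=pool))

# Q-Q plot of the riffle counts, with a reference line
gg + geom_qq(mapping=aes(sample=riffle)) + geom_qq_line(mapping=aes(sample=riffle))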

If the data are normal, the points will lie close to a straight line. These plots show that the points lie more or less on a line, so it is reasonable to treat the data as approximately normal.
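Since the test in step 13 is paired, the quantity whose normality matters most is the set of pool minus riffle differences. As a supplement to the plots, here is a minimal sketch that applies the Shapiro-Wilk test to those differences. This test is not part of the original analysis; I am adding it as one possible numerical check, and with only 15 small integer counts it is a rough guide at best.

```r
# Data re-entered from the table printed in step 2
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Differences used by the paired test
d <- fish$pool - fish$riffle

# Shapiro-Wilk test (stats package); a small p-value would cast doubt on normality
sw <- shapiro.test(d)
sw$p.value
```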

STEP 12: Decide on a level of significance of the test

The level of significance of the test is 0.05.

STEP 13: Perform the test

I will perform a paired t-test, in which the pool count at each location is paired with the riffle count at the same location.

t.test(fish$pool, fish$riffle, paired=TRUE)

## 
##  Paired t-test
## 
## data:  fish$pool and fish$riffle
## t = 4.5826, df = 14, p-value = 0.0004264
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.170332 3.229668
## sample estimates:
## mean of the differences 
##                     2.2
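To see where the numbers in this output come from, here is a minimal sketch that computes the t statistic and the two-sided p-value by hand from the differences. It assumes the values printed in step 2; the names d, se, t_stat, and p_val are my own.

```r
# Data re-entered from the table printed in step 2
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

d  <- fish$pool - fish$riffle  # one difference per location
n  <- length(d)                # number of pairs
se <- sd(d) / sqrt(n)          # standard error of the mean difference

t_stat <- mean(d) / se                      # test statistic under H0: mean difference = 0
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value from the t distribution

c(t = t_stat, df = n - 1, p = p_val)
```

The values agree with the t.test output: t = 4.5826 with 14 degrees of freedom and a p-value of 0.0004264.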

STEP 14: Interpret the p-value

Since the p-value is 0.0004264, which is less than the level of significance (0.05), I reject the null hypothesis that the means are equal. The data support the alternative hypothesis that the true difference in means is not equal to 0.

STEP 15: Interpret the confidence interval

The 95 percent confidence interval is (1.170332, 3.229668). Since zero is not in this interval, zero is not a plausible value for the difference in means, so it is not plausible that the means are the same. The result of step 15 is consistent with the result of step 14.
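The interval can be rebuilt by hand as the mean difference plus or minus a critical t value times the standard error. This is a minimal sketch under the same assumption as before (data taken from the table in step 2); the names t_crit and ci are my own.

```r
# Data re-entered from the table printed in step 2
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

d  <- fish$pool - fish$riffle  # one difference per location
n  <- length(d)
se <- sd(d) / sqrt(n)          # standard error of the mean difference

t_crit <- qt(0.975, df = n - 1)             # critical value for a 95 percent interval
ci     <- mean(d) + c(-1, 1) * t_crit * se  # lower and upper limits
ci

# Zero lies outside the interval when both limits are on the same side of zero
ci[1] > 0
```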

STEP 16: Interpret the sample estimates

I have concluded that the means are not equal, but I really want to know: Are more fish found in pools or riffles? Knowing that the means are unequal, I can answer this question by looking at the sample estimate, which states that on average there are 2.2 more fish in pools than there are in riffles. This finding supports the conclusions of both step 14 and step 15.
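The estimate of 2.2 can be traced back to the two sample means. Here is a minimal sketch, assuming the values printed in step 2, which shows that the mean of the differences equals the difference of the means. The object names are my own.

```r
# Data re-entered from the table printed in step 2
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

mean_pool   <- mean(fish$pool)                # average count in pools
mean_riffle <- mean(fish$riffle)              # average count in riffles
mean_diff   <- mean(fish$pool - fish$riffle)  # average of the paired differences

c(pool = mean_pool, riffle = mean_riffle, difference = mean_diff)

# The mean of the differences is the same as the difference of the means
all.equal(mean_diff, mean_pool - mean_riffle)
```

The positive sign of the difference (pool minus riffle) is what indicates that the larger mean belongs to the pools.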

STEP 17: State your conclusion

I have evidence that pools and riffles do not support equal numbers of fish species. More fish are found in pools than in riffles, and therefore, pools can be hypothesized to be better habitats for fish than riffles are.