STEP 1: DESIGN THE EXPERIMENT

This experiment uses statistics to compare the number of fish species captured in two habitats at each of 15 locations along a river. Our goal is to determine whether, on average, more species are found in the deep pool or in the shallow riffle.

STEP 2: COLLECT (OR LOAD) DATA

fish <- read.csv("fish.csv")
fish
##    location pool riffle
## 1         1    6      3
## 2         2    6      3
## 3         3    3      3
## 4         4    8      4
## 5         5    5      2
## 6         6    2      2
## 7         7    6      2
## 8         8    7      2
## 9         9    1      2
## 10       10    3      2
## 11       11    4      3
## 12       12    5      1
## 13       13    4      3
## 14       14    6      2
## 15       15    4      3
library(ggplot2)

STEP 3: DESCRIBE DATA

We have a data set with 15 rows and 3 columns. Each row represents one sampling location along the river. The first column, location, is an identifier for the sampling location. The second column, pool, is the number of fish species captured in the pool at that location. The third column, riffle, is the number of fish species captured in the adjacent riffle.
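
The structure of the data can be checked directly in R. The sketch below is only an illustration: it rebuilds the data frame from the table printed in Step 2 (an assumption made so the snippet runs without fish.csv) and then reports its dimensions, column types, and a summary of the two count columns.

```r
# Rebuild the data frame from the table printed in Step 2
# (assumption: these values match the contents of fish.csv).
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

dim(fish)                              # number of rows and columns
str(fish)                              # column names and types
summary(fish[, c("pool", "riffle")])   # five-number summary and mean per habitat
```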

STEP 4: IDENTIFY THE PURPOSE OF THE STUDY

Some stream fishes are most often found in pools, the deep, slow-moving parts of a stream. Others prefer riffles, the shallow, fast-moving regions. To investigate whether these two habitats support equal numbers of species (a measure of species diversity), researchers captured fish at 15 locations along a river. At each location, they recorded the number of species captured in a riffle and the number captured in an adjacent pool. We will determine whether, on average, more species are found in the deep pools or in the shallow riffles across the 15 locations.

STEP 5: VISUALIZE DATA

ggplot(data=fish, mapping=aes(x=riffle, y=pool)) + geom_point() + geom_abline(slope=1, intercept=0) + annotate("text", x=1.25, y=4, label="More in Pool") + annotate("text", x=3.5, y=2, label="More in Riffle")

STEP 6: INTERPRET THE PLOT

Most points lie above the line y = x, so at most locations more fish species were captured in the pool than in the riffle. This suggests that fish prefer the pool to the riffle. We will test this formally and use a p-value to quantify the strength of the evidence.
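
The visual impression can be checked with a quick count. This is a minimal sketch, assuming the data frame is rebuilt from the table printed in Step 2; the names diffs and more_in_pool are introduced here only for illustration.

```r
# Rebuild the data frame from the table printed in Step 2
# (assumption: these values match the contents of fish.csv).
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Difference at each location: positive means more species in the pool.
diffs <- fish$pool - fish$riffle

# Tabulate the sign: -1 = more in riffle, 0 = tie, 1 = more in pool.
table(sign(diffs))

more_in_pool <- sum(diffs > 0)
more_in_pool
```

For the table above, this gives 12 locations with more species in the pool, 2 ties, and 1 location with more species in the riffle.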

STEP 7: FORMULATE THE NULL HYPOTHESIS

The null hypothesis states that, on average, the number of fish species found in the pool is equal to the number found in the riffle. In other words, the mean for the pool equals the mean for the riffle, so the mean difference is 0.

STEP 8: IDENTIFY THE ALTERNATIVE HYPOTHESIS

The alternative hypothesis states that, on average, the number of fish species found in the pool is not equal to the number found in the riffle. In other words, there will be a difference in means between pools and riffles. The plot leads us to expect that the pool mean is the larger one, but the test is run as two-sided.

STEP 9: DECIDE ON TYPE OF TEST

I am going to run a t-test on this data, which means I will be testing my hypotheses (null and alternative) about population means of a quantitative variable.

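STEP 10: CHOOSE ONE SAMPLE OR TWO

There are two sets of measurements, one for the pool and one for the riffle. Because both counts were taken at the same location, the two samples are paired rather than independent. The appropriate analysis is therefore a paired t-test, which is equivalent to a one-sample t-test on the pool-minus-riffle differences.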

STEP 11: CHECK ASSUMPTIONS OF THE TEST

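The paired t-test assumes that the pool-minus-riffle differences are approximately Normal. As a rough check, the QQ plots below show each variable against a Normal reference line. The points lie reasonably close to the line, so the Normality assumption appears acceptable. With only 15 locations and whole-number counts, this check is approximate.

gg <- ggplot(data=fish)
gg + geom_qq(mapping=aes(sample=pool)) + geom_qq_line(mapping=aes(sample=pool))

gg + geom_qq(mapping=aes(sample=riffle)) + geom_qq_line(mapping=aes(sample=riffle))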
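
Because the paired test operates on the differences, a QQ plot of the differences themselves is a more direct check of the assumption. The sketch below is one way to draw it, assuming the data frame is rebuilt from the table in Step 2; the column name diff and the object p are introduced here only for illustration.

```r
library(ggplot2)

# Rebuild the data frame from the table printed in Step 2
# (assumption: these values match the contents of fish.csv).
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Hypothetical helper column: pool minus riffle at each location.
fish$diff <- fish$pool - fish$riffle

# QQ plot of the differences with a Normal reference line.
p <- ggplot(data = fish, mapping = aes(sample = diff)) +
  geom_qq() +
  geom_qq_line()
p
```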

STEP 12: DECIDE ON A LEVEL OF SIGNIFICANCE OF THE TEST

I will use the traditional level of significance, 0.05.

STEP 13: PERFORM THE TEST

This is a paired problem. Each pool count is paired with the riffle count from the same location, so we set paired=TRUE.

t.test(fish$pool, fish$riffle, paired=TRUE)
## 
##  Paired t-test
## 
## data:  fish$pool and fish$riffle
## t = 4.5826, df = 14, p-value = 0.0004264
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.170332 3.229668
## sample estimates:
## mean of the differences 
##                     2.2
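
To see where the numbers in this output come from, the test statistic can be reproduced by hand from the differences. This is a sketch under the assumption that the data frame matches the table in Step 2; the names d, n, se, t_stat, p_value and same are illustrative only.

```r
# Rebuild the data frame from the table printed in Step 2
# (assumption: these values match the contents of fish.csv).
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

d  <- fish$pool - fish$riffle   # paired differences
n  <- length(d)                 # number of pairs
se <- sd(d) / sqrt(n)           # standard error of the mean difference

t_stat  <- mean(d) / se                        # t statistic with n - 1 df
p_value <- 2 * pt(-abs(t_stat), df = n - 1)    # two-sided p-value

# The same result as a one-sample t-test on the differences.
same <- t.test(d, mu = 0)

t_stat
p_value
```

The values of t_stat and p_value should agree with the t and p-value lines of the paired t-test output above.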

STEP 14: INTERPRET THE P-VALUE

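The p-value is 0.0004264, which is less than the significance level of 0.05, so we reject the null hypothesis. If the mean difference were really 0, a difference at least this large would be very unlikely to arise by chance.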

STEP 15: INTERPRET THE CONFIDENCE INTERVAL

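The 95 percent confidence interval for the mean difference (pool minus riffle) runs from about 1.17 to 3.23 species. The interval does not contain 0 and is entirely positive, so we are confident that more species are found in the pool on average, and we can reject the null hypothesis.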
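
The interval reported by t.test can be reconstructed as the mean difference plus or minus a t critical value times the standard error. The sketch below assumes the data frame matches the table in Step 2; the names d, n, margin and ci are illustrative only.

```r
# Rebuild the data frame from the table printed in Step 2
# (assumption: these values match the contents of fish.csv).
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

d <- fish$pool - fish$riffle   # paired differences
n <- length(d)                 # number of pairs

# Margin of error: t critical value (97.5th percentile, n - 1 df) times standard error.
margin <- qt(0.975, df = n - 1) * sd(d) / sqrt(n)

# 95 percent confidence interval for the mean difference.
ci <- mean(d) + c(-1, 1) * margin
ci
```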

STEP 16: INTERPRET THE SAMPLE ESTIMATES

On average, 2.2 more fish species were captured in the pool than in the riffle at each location. This is a count, not a percentage, and because it is based on a sample it is an estimate of the true mean difference.

STEP 17: STATE YOUR CONCLUSION

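I conclude that fish tend to prefer pools to riffles. In this sample, pools held on average 2.2 more species than the adjacent riffles (95 percent confidence interval 1.17 to 3.23), and this difference is statistically significant at the 0.05 level.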