fish <- read.csv("fish.csv")
fish
##    location pool riffle
## 1         1    6      3
## 2         2    6      3
## 3         3    3      3
## 4         4    8      4
## 5         5    5      2
## 6         6    2      2
## 7         7    6      2
## 8         8    7      2
## 9         9    1      2
## 10       10    3      2
## 11       11    4      3
## 12       12    5      1
## 13       13    4      3
## 14       14    6      2
## 15       15    4      3

Describe Data

The data frame has 15 rows and three columns. Each row is one sampling location, and the columns are the variables. The first column, location, is the location number. The second and third columns, pool and riffle, give the number of fish species captured in the pool and in the adjacent riffle at that location.

Identify the purpose of the study

The purpose of the study is to determine whether riffles and pools support equal numbers of species.

Visualize Data

library(ggplot2)
ggplot(data=fish, mapping=aes(x=riffle, y=pool)) + geom_point() +
  geom_abline(slope=1, intercept=0) +
  annotate("text", x=1.25, y=4, label="More in Pool") + annotate("text", x=3.5, y=2, label="More in Riffle")
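Because the counts are small integers, several locations share the same (riffle, pool) pair, and geom_point() draws their points on top of each other, so the 15 locations appear as fewer dots. The sketch below is one way to make this visible. It rebuilds the data frame from the table printed above so it runs without fish.csv (assumption: the file holds exactly these values), and `pair_counts` and `p` are illustrative names. It swaps geom_point() for ggplot2's geom_count(), which sizes each point by the number of overlapping observations.

```r
library(ggplot2)

# Data rebuilt from the printed table (assumption: identical to fish.csv)
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Count how many locations share each (riffle, pool) pair
pair_counts <- aggregate(location ~ riffle + pool, data = fish, FUN = length)
names(pair_counts)[names(pair_counts) == "location"] <- "n"
print(pair_counts[pair_counts$n > 1, ])  # pairs that geom_point() draws as one dot

# geom_count() scales point size by the number of overlapping locations
p <- ggplot(data = fish, mapping = aes(x = riffle, y = pool)) +
  geom_count() +
  geom_abline(slope = 1, intercept = 0) +
  annotate("text", x = 1.25, y = 4, label = "More in Pool") +
  annotate("text", x = 3.5, y = 2, label = "More in Riffle")
print(p)
```

With these data, three pairs are repeated and together they cover 7 of the 15 locations, so only 11 distinct points exist.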

Interpret the plot

Most points lie above the line y = x: at 12 of the 15 locations the pool has more species than the riffle. At two locations (3 and 6) the counts are equal, so those points fall on the line, and at one location (9) the riffle has more species than the pool.
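These counts can be checked directly from the data. A short sketch, again rebuilding the printed table inline (assumption: it matches fish.csv); `d`, `direction`, and `tab` are illustrative names for the per-location difference pool minus riffle, its sign, and the resulting tally.

```r
# Data rebuilt from the printed table (assumption: identical to fish.csv)
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

d <- fish$pool - fish$riffle  # per-location difference, pool minus riffle

# Classify each location by the sign of its difference
direction <- factor(sign(d), levels = c(1, 0, -1),
                    labels = c("more in pool", "equal", "more in riffle"))
tab <- table(direction)
print(tab)

# Locations on the y = x line, and locations where the riffle has more species
fish$location[d == 0]
fish$location[d < 0]
```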

Formulate the null hypothesis

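The mean number of species in pools is the same as the mean number in riffles. Equivalently, the mean of the per-location differences (pool minus riffle) is zero.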

Identify the alternative hypothesis

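The means are unequal: the mean of the per-location differences (pool minus riffle) is not zero. This is a two-sided alternative, which is what the t.test() call below tests by default.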
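Because each location contributes one pool count and one riffle count, the two-sided hypotheses can be written in terms of the per-location differences. The notation d_i and μ_d is introduced here for illustration; since μ_d = μ_pool − μ_riffle, testing μ_d = 0 is the same as testing equal means.

```latex
\begin{aligned}
d_i &= \text{pool}_i - \text{riffle}_i, \qquad i = 1, \dots, 15,\\
H_0 &: \mu_d = 0 \quad (\text{equivalently } \mu_{\text{pool}} = \mu_{\text{riffle}}),\\
H_a &: \mu_d \neq 0.
\end{aligned}
```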

Decide on type of test

We will use a t-test to compare the mean numbers of species.

Choose one sample or two

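Two samples, one from pools and one from riffles, but they are paired: each location contributes both a pool count and a riffle count. We therefore use a paired t-test, which is equivalent to a one-sample t-test on the differences (pool minus riffle).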
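The equivalence between a paired t-test and a one-sample t-test on the differences can be checked numerically. The sketch below rebuilds the data inline (assumption: it matches fish.csv) and, for contrast, also runs R's default unpaired Welch test, which treats the two columns as independent samples and ignores that they come from the same locations. The object names are illustrative.

```r
# Data rebuilt from the printed table (assumption: identical to fish.csv)
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)
d <- fish$pool - fish$riffle

paired   <- t.test(fish$pool, fish$riffle, paired = TRUE)  # paired test, as in the report
one_samp <- t.test(d, mu = 0)                               # one-sample test on the differences
unpaired <- t.test(fish$pool, fish$riffle)                  # Welch two-sample test, ignores pairing

c(paired     = unname(paired$statistic),
  one_sample = unname(one_samp$statistic),
  unpaired   = unname(unpaired$statistic))
```

The paired and one-sample statistics are identical. With these data the unpaired statistic is smaller, because its standard error also absorbs the variation between locations that pairing removes.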

Check assumptions of the test

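# Normal Q-Q plots, with reference lines, for each habitat's counts
ggplot(data=fish, mapping=aes(sample=pool)) + geom_qq() + geom_qq_line()

ggplot(data=fish, mapping=aes(sample=riffle)) + geom_qq() + geom_qq_line()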
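For a paired t-test, the normality assumption applies to the differences (pool minus riffle), not to each column separately. A minimal sketch of that check follows. It uses base R graphics for brevity (ggplot2's geom_qq() and geom_qq_line() would work just as well) and rebuilds the data inline (assumption: it matches fish.csv). With only 15 small integer differences, expect a stepped pattern, and treat both the plot and the Shapiro-Wilk p-value as rough guides rather than definitive tests.

```r
# Data rebuilt from the printed table (assumption: identical to fish.csv)
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)
d <- fish$pool - fish$riffle

# Normal Q-Q plot of the differences with a reference line
qqnorm(d, main = "Normal Q-Q plot: pool - riffle")
qqline(d)

# Shapiro-Wilk test as a numeric companion to the plot
sw <- shapiro.test(d)
print(sw)
```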

Decide on a level of significance of the test

We use a significance level of α = 0.05.

Perform the test

t.test(fish$pool, fish$riffle, paired=TRUE)
## 
##  Paired t-test
## 
## data:  fish$pool and fish$riffle
## t = 4.5826, df = 14, p-value = 0.0004264
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.170332 3.229668
## sample estimates:
## mean of the differences 
##                     2.2
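Each number in this output follows from a few formulas applied to the differences d = pool minus riffle. The statistic is t = mean(d) / (sd(d) / sqrt(n)) with n − 1 degrees of freedom. The two-sided p-value is 2 · P(T ≤ −|t|), and the interval is mean(d) ± qt(0.975, n − 1) · sd(d) / sqrt(n). The sketch below reproduces them by hand, assuming the inline data match fish.csv; the variable names are illustrative.

```r
# Data rebuilt from the printed table (assumption: identical to fish.csv)
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

d     <- fish$pool - fish$riffle
n     <- length(d)
d_bar <- mean(d)               # mean of the differences
se    <- sd(d) / sqrt(n)       # standard error of d_bar
dof   <- n - 1                 # degrees of freedom

t_stat <- d_bar / se                      # t statistic under H0: mu_d = 0
p_val  <- 2 * pt(-abs(t_stat), dof)       # two-sided p-value
t_crit <- qt(0.975, dof)                  # 97.5th percentile of t with dof df
ci     <- d_bar + c(-1, 1) * t_crit * se  # 95% confidence interval

print(c(t = t_stat, df = dof, p_value = p_val, lower = ci[1], upper = ci[2]))
```

The printed values should agree with the t.test() output above.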

Interpret the p-value

The p-value (0.0004264) is less than the significance level of 0.05, so we reject the null hypothesis.
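An equivalent way to reach the same decision for a two-sided test is to compare |t| with the critical value qt(1 − α/2, df). A short sketch, rebuilding the data inline (assumption: it matches fish.csv); `reject_by_p` and `reject_by_t` are illustrative names for the two decision rules.

```r
# Data rebuilt from the printed table (assumption: identical to fish.csv)
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)
alpha <- 0.05
res   <- t.test(fish$pool, fish$riffle, paired = TRUE)

# Critical value for a two-sided test at level alpha
t_crit <- qt(1 - alpha / 2, df = unname(res$parameter))

# Two equivalent decision rules
reject_by_p <- res$p.value < alpha
reject_by_t <- unname(abs(res$statistic)) > t_crit

cat("critical value:", round(t_crit, 3), "\n")
cat("reject by p-value:", reject_by_p, "| reject by |t| > critical value:", reject_by_t, "\n")
```

Here the critical value is about 2.145, and t = 4.58 exceeds it, so both rules reject.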

Interpret the confidence interval

A confidence interval is a range of plausible values for the mean difference in the number of species (pool minus riffle). The 95% confidence interval runs from 1.170332 to 3.229668. Because zero is not in this interval, equal means are not plausible, which agrees with rejecting the null hypothesis.
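The interval endpoints can be recovered from the usual formula. The sample standard deviation of the differences, s_d ≈ 1.859, is computed from the data rather than printed by t.test(), and t with subscript 0.975, 14 ≈ 2.145 is the 97.5th percentile of the t distribution with 14 degrees of freedom.

```latex
\begin{aligned}
\bar{d} \pm t_{0.975,\,14}\,\frac{s_d}{\sqrt{n}}
  &= 2.2 \pm 2.145 \times \frac{1.859}{\sqrt{15}} \\
  &\approx 2.2 \pm 1.030 \\
  &= (1.170,\ 3.230)
\end{aligned}
```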

Interpret the sample estimates

On average, the pool at a location supports 2.2 more species than the adjacent riffle.
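For paired data, the mean of the differences equals the difference of the two habitat means, so the 2.2 can also be read off the column averages. A quick sketch, rebuilding the data inline (assumption: it matches fish.csv); `means`, `diff_of_means`, and `mean_of_diffs` are illustrative names.

```r
# Data rebuilt from the printed table (assumption: identical to fish.csv)
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

means <- colMeans(fish[, c("pool", "riffle")])
print(means)  # average number of species per habitat

diff_of_means <- unname(means["pool"] - means["riffle"])
mean_of_diffs <- mean(fish$pool - fish$riffle)
c(difference_of_means = diff_of_means, mean_of_differences = mean_of_diffs)
```

The pool average is about 4.67 species and the riffle average about 2.47, and both routes give 2.2.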

State your conclusion

In conclusion, pools support more fish species than adjacent riffles: on average about 2.2 more species per location, with a 95% confidence interval of roughly 1.2 to 3.2 species.