knitr::opts_chunk$set(echo = TRUE)

Step 1: Load the data

Usually, we would have to design our experiment and plan how we would collect the data. In this case, our data has already been collected.

fish <- read.csv("fish.csv")
fish
##    location pool riffle
## 1         1    6      3
## 2         2    6      3
## 3         3    3      3
## 4         4    8      4
## 5         5    5      2
## 6         6    2      2
## 7         7    6      2
## 8         8    7      2
## 9         9    1      2
## 10       10    3      2
## 11       11    4      3
## 12       12    5      1
## 13       13    4      3
## 14       14    6      2
## 15       15    4      3

Step 2: Describe the data

We have a data set with three columns and 15 rows, one row per sampled location. The header line of the file simply labels the columns and is not a row of data. The “location” column is an identifier for the specifically marked area of stream that was sampled. The “pool” and “riffle” columns are quantitative, not categorical: at each location, scientists recorded the number of fish counted in the pool and the number counted in the riffle. Because both counts come from the same location, the observations are paired.
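As a quick check on this description, the structure of the data frame can be inspected directly. The sketch below rebuilds the printed table by hand, which is an assumption made only so that it runs without fish.csv; with the real file, read.csv("fish.csv") should give the same values.

```r
# Rebuild the table printed in Step 1 so the example runs without fish.csv
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

dim(fish)                             # number of rows, number of columns
str(fish)                             # type of each column
summary(fish[, c("pool", "riffle")])  # range and center of the two counts
```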

Step 3: Identify the purpose of the study

The purpose of this study is to investigate whether deep, slow-moving pools or shallow riffles are better at supporting fish populations.

Step 4: Visualize Data

library(ggplot2)
ggplot(data=fish, mapping=aes(x=riffle, y=pool)) +
  geom_point() +
  geom_abline(slope=1, intercept=0) +  # line y = x: equal counts in pool and riffle
  annotate("text", x=1.25, y=4, label="More in pool") +
  annotate("text", x=3.5, y=2, label="More in riffle")

# The counts are whole numbers, so use one bin per count
ggplot(data=fish, mapping=aes(x=riffle)) + geom_histogram(binwidth=1, fill="darkgreen")

# The counts are whole numbers, so use one bin per count
ggplot(data=fish, mapping=aes(x=pool)) + geom_histogram(binwidth=1, fill="navy")

Step 5: Interpret the plot

The plots suggest that fish are more plentiful in the deeper, slow-moving pools. In the scatter plot, 12 of the 15 locations lie above the line of equal counts (more fish in the pool), 2 lie on it, and only 1 lies below it. The histograms tell the same story: the riffles never held more than four fish at a time, while the pools held up to eight.
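The visual impression can be backed up with a simple tally of which habitat held more fish at each location. This is a minimal sketch; the data frame is rebuilt from the printed table so that it runs without fish.csv, and the name more_in is introduced here only for illustration.

```r
# Rebuild the table printed in Step 1 so the example runs without fish.csv
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Label each location by the habitat with the larger count
more_in <- ifelse(fish$pool > fish$riffle, "pool",
           ifelse(fish$pool < fish$riffle, "riffle", "tie"))
table(more_in)

range(fish$pool)    # smallest and largest pool count
range(fish$riffle)  # smallest and largest riffle count
```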

Step 6: Formulate the null hypothesis

The null hypothesis for this data is: there is no difference in the mean number of fish between pools and riffles.

Step 7: Identify the alternative hypothesis

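The alternative hypothesis for this data is: there are more fish, on average, in the pools than in the riffles. Note that the test in Step 12 uses the default two-sided alternative of t.test (a difference in either direction), which is the more conservative choice.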
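If a one-sided alternative is wanted, t.test accepts an alternative argument. The sketch below compares the default two-sided call with alternative = "greater", which tests whether the mean of pool minus riffle is greater than zero. The data frame is rebuilt from the printed table (an assumption, so that the block runs without fish.csv), and the names two_sided and one_sided are illustrative.

```r
# Rebuild the table printed in Step 1 so the example runs without fish.csv
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Default: two-sided alternative (mean difference not equal to 0)
two_sided <- t.test(fish$pool, fish$riffle, paired = TRUE)

# One-sided alternative: mean of (pool - riffle) is greater than 0
one_sided <- t.test(fish$pool, fish$riffle, paired = TRUE,
                    alternative = "greater")

two_sided$p.value
one_sided$p.value  # half the two-sided value, because the observed t is positive
```

The direction of a one-sided test should be chosen before looking at the data, which is one reason to prefer the two-sided default here.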

Step 8: Decide on Type of Test

We will be conducting a t-test.

Step 9: Choose one sample or two

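We have two sets of measurements, one for riffles and one for pools, but they are not independent: each location contributes one pool count and one riffle count. The correct choice is therefore a paired test, which is equivalent to a one-sample t-test on the pool-minus-riffle difference at each location.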
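Because Step 12 calls t.test with paired=TRUE, it may help to see what that option does. The sketch below shows that the paired test gives the same statistic and p-value as a one-sample t-test on the per-location differences. The data frame is rebuilt from the printed table (an assumption, so that the block runs without fish.csv); d, paired, and one_sample are names chosen for illustration.

```r
# Rebuild the table printed in Step 1 so the example runs without fish.csv
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Difference at each location: pool count minus riffle count
d <- fish$pool - fish$riffle

paired     <- t.test(fish$pool, fish$riffle, paired = TRUE)
one_sample <- t.test(d, mu = 0)

# Both comparisons should return TRUE
all.equal(unname(paired$statistic), unname(one_sample$statistic))
all.equal(paired$p.value, one_sample$p.value)
```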

Step 10: Check the assumptions of the test

A t-test assumes that the data lie close to a normal distribution. To check this, we will make a QQ plot.

# QQ plot of the pool counts against a normal distribution
ggplot(data=fish) + geom_qq(mapping=aes(sample=pool))

The points fall roughly along a straight line, which suggests that the data are approximately normal.
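For a paired test, the quantity that should be roughly normal is the set of per-location differences, so it can be worth checking those as well. The sketch below uses base R graphics and the Shapiro-Wilk test as a rough guide only: the differences are whole numbers with ties and there are just 15 of them, so neither check is decisive. The data frame is rebuilt from the printed table (an assumption, so that the block runs without fish.csv).

```r
# Rebuild the table printed in Step 1 so the example runs without fish.csv
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Difference at each location: pool count minus riffle count
d <- fish$pool - fish$riffle

# QQ plot of the differences with a reference line
qqnorm(d, main = "QQ plot of pool - riffle differences")
qqline(d)

# Shapiro-Wilk test: a small p-value would suggest departure from normality
sw <- shapiro.test(d)
sw
```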

Step 11: Decide on a level of significance of the test

Our significance level will be 0.05.

Step 12: Perform the test

t.test(fish$pool, fish$riffle, paired=TRUE)
## 
##  Paired t-test
## 
## data:  fish$pool and fish$riffle
## t = 4.5826, df = 14, p-value = 0.0004264
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.170332 3.229668
## sample estimates:
## mean of the differences 
##                     2.2

Step 13: Interpret the p-value

Since the p-value (0.0004264) is less than the level of significance (0.05), we reject the null hypothesis that there is no difference in fish presence between pools and riffles.
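To see where the t statistic and p-value come from, they can be reproduced by hand: t is the mean difference divided by its standard error, and the two-sided p-value is twice the tail area beyond |t| in a t distribution with n - 1 degrees of freedom. This is a minimal sketch; the data frame is rebuilt from the printed table (an assumption, so that it runs without fish.csv), and the variable names are illustrative.

```r
# Rebuild the table printed in Step 1 so the example runs without fish.csv
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

d  <- fish$pool - fish$riffle  # per-location differences
n  <- length(d)                # number of pairs
se <- sd(d) / sqrt(n)          # standard error of the mean difference

t_stat  <- mean(d) / se                        # test statistic
p_value <- 2 * pt(-abs(t_stat), df = n - 1)    # two-sided p-value

t_stat   # compare with t in the t.test output
p_value  # compare with p-value in the t.test output
```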

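Step 14: Interpret the confidence interval

The 95 percent confidence interval for the mean difference (pool minus riffle) runs from about 1.17 to 3.23. Zero is not in the interval, so zero is not a plausible value for the difference in means at this confidence level. This agrees with rejecting the null hypothesis.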
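The interval itself can be reconstructed as the mean difference plus or minus a margin of error, where the margin is the 97.5th percentile of the t distribution (14 degrees of freedom) times the standard error. The sketch below rebuilds the data frame from the printed table (an assumption, so that it runs without fish.csv); margin and ci are illustrative names.

```r
# Rebuild the table printed in Step 1 so the example runs without fish.csv
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

d <- fish$pool - fish$riffle  # per-location differences
n <- length(d)

# Margin of error for a 95 percent interval
margin <- qt(0.975, df = n - 1) * sd(d) / sqrt(n)

# Lower and upper limits; compare with the interval in the t.test output
ci <- mean(d) + c(-1, 1) * margin
ci
```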

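Step 15: Interpret the sample estimates

The sample estimate is the mean of the differences, 2.2. Because t.test(fish$pool, fish$riffle, paired=TRUE) subtracts the second argument from the first, this means that, on average, 2.2 more fish were counted in the pool than in the riffle at each location. Combined with the evidence that the means are unequal, this answers the question of whether there are more fish in pools or riffles: the pools.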

Step 16: State your conclusion

We have evidence that fish congregate more in pools than in riffles. This conclusion is supported by the three plots and by the results of the paired t-test.