Step 1: Design the Experiment

Normally this would be the first step in a project; however, this experiment has already been designed for us. We still need to decide what tests to use and what hypotheses to test. Often you make these decisions before you collect data. However, because we are handed a data set, we will put these decisions off until after we have seen the data.

Step 2: Load Data

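# Assumes fish.csv was saved as UTF-8 with a byte-order mark;
# fileEncoding keeps that mark out of the first column name.
fish <- read.csv("fish.csv", fileEncoding="UTF-8-BOM")
fish
##    location pool riffle
## 1         1    6      3
## 2         2    6      3
## 3         3    3      3
## 4         4    8      4
## 5         5    5      2
## 6         6    2      2
## 7         7    6      2
## 8         8    7      2
## 9         9    1      2
## 10       10    3      2
## 11       11    4      3
## 12       12    5      1
## 13       13    4      3
## 14       14    6      2
## 15       15    4      3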

Step 3: Description

Some stream fishes are most often found in pools, the deep, slow-moving parts of a stream. Others prefer riffles, the shallow, fast-moving regions. To investigate whether these two habitats support equal numbers of species (a measure of species diversity), researchers captured fish at 15 locations along a river. At each location, they recorded the number of species captured in a riffle and the number captured in an adjacent pool.

Step 4: Purpose

The purpose of this experiment is to decide whether or not the pools and the riffles hold an equal number of fish species.

Step 5: Data Visualized

library(ggplot2)
ggplot(data=fish, mapping=aes(x=riffle, y=pool)) +
  geom_point() +
  geom_abline(slope=1, intercept=0) +
  annotate("text", x=1.25, y=4, label="More in Pool") +
  annotate("text", x=3.5, y=2, label="More in Riffle")

Here we are looking for a way to visualize the data that best shows whether more fish species are found in the pools or in the riffles. We could create separate histograms for “pool” and “riffle”; however, this would not let us compare the two counts location by location. The graph above shows the number of species in the pool and in the riffle for every location on the same plot.

Step 6: Interpret the Plot

Looking at this plot, it appears that more fish species live in pools than in riffles, because most points lie above the line. However, given our small sample size and the variability of the data, we need a test of significance to check whether the pattern could be due to chance. We back the plot up with a p-value, which is the probability of seeing a difference at least as large as the one observed if the two habitats really were equal. If the p-value is low, we interpret that result as evidence against the statement that pools and riffles are equal. The plot suggests an answer to our question; the p-value tells us how strong the evidence for that answer is.
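
As a quick numerical companion to the plot, the sketch below counts how many locations fall on each side of the diagonal line. It re-enters the 15 rows printed in Step 2 by hand so that it runs without fish.csv; the names diffs, more_in_pool, tied, and more_in_riffle are introduced here for illustration and are not part of the original analysis.

```r
# Data re-entered from the table printed in Step 2
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Pool minus riffle at each location (illustrative name)
diffs <- fish$pool - fish$riffle

# Locations above the line, on the line, and below the line
more_in_pool   <- sum(diffs > 0)
tied           <- sum(diffs == 0)
more_in_riffle <- sum(diffs < 0)
c(pool = more_in_pool, tie = tied, riffle = more_in_riffle)
```

With these data, 12 of the 15 locations have more species in the pool, 2 are ties, and 1 has more in the riffle. Several locations share the same pair of counts, so the plot shows fewer than 15 distinct points.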

Step 7: Null Hypothesis

First we identify the two samples drawn from the population of all fish living in the stream:

  1. The number of fish species captured in the pools
  2. The number of fish species captured in the riffles

The null hypothesis for this experiment would be:

“The null hypothesis is that the population mean for the pool group is equal to the population mean for the riffle group. This is typically expressed as: the difference in means is zero.”

Step 8: Alternative Hypothesis

This is the statement that the population means are different:

“The population mean for the pool group is not equal to the population mean for the riffle group. This is typically expressed as: the difference in means is not zero.”

Step 9: Choose a Test

We can choose between two different tests: a t-test and a proportions test. A t-test is for testing hypotheses about population means of a quantitative variable, and a proportions test is for categorical variables, such as “yes” or “no”. The correct choice for this experiment is a t-test, because the number of species is a quantitative variable.

Step 10: Choose between one sample and two sample test

For this experiment we will use a paired two-sample test. We have two distinct samples, and each pool count is matched with the riffle count from the same location:

  1. Fish species captured in pools
  2. Fish species captured in riffles
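
One way to see what the pairing does: a paired t-test is the same calculation as a one-sample t-test on the per-location differences. The sketch below checks this, re-entering the Step 2 table by hand so that it runs without fish.csv. The names paired_result and diff_result are illustrative.

```r
# Data re-entered from the table printed in Step 2
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Paired two-sample test on the two columns
paired_result <- t.test(fish$pool, fish$riffle, paired = TRUE)

# One-sample test on the pool-minus-riffle differences
diff_result <- t.test(fish$pool - fish$riffle, mu = 0)

# Both comparisons should return TRUE
all.equal(unname(paired_result$statistic), unname(diff_result$statistic))
all.equal(paired_result$p.value, diff_result$p.value)
```

Because the two counts at a location share the same stretch of river, comparing them within each location removes location-to-location variation from the test.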

Step 11: Check assumptions of the test

gg <- ggplot(data=fish)
gg + geom_qq(mapping=aes(sample=pool))

gg + geom_qq(mapping=aes(sample=riffle))

Above we decided that a t-test is the best choice for this example. The main assumption of the t-test is that the data lie close enough to a Normal (bell-shaped) distribution. In a Normal quantile plot, Normal data fall close to a straight line. For a paired test the assumption strictly concerns the pool-minus-riffle differences, so checking each variable separately is only a rough guide. Both plots are roughly linear, so we treat the data as close enough to Normal.
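
Since the paired test works on the differences, it may also be worth looking at a Normal quantile plot of the differences themselves. The sketch below is one way to do that. It uses base R's qqnorm() and qqline() rather than ggplot2 only to keep the example free of package dependencies, and it re-enters the Step 2 table by hand; diffs is an illustrative name.

```r
# Data re-entered from the table printed in Step 2
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Pool minus riffle at each location (illustrative name)
diffs <- fish$pool - fish$riffle

# Normal quantile plot of the differences, with a reference line
qqnorm(diffs, main = "Normal Q-Q plot of pool minus riffle")
qqline(diffs)
```

If the points stay near the reference line, the Normal assumption for the paired test looks reasonable. With only 15 whole-number differences, some stair-stepping in the plot is expected.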

Step 12: Decide on a level of significance

It is commonly accepted among statisticians that a level of 0.05 is an appropriate level of significance. This is the level we will use for this experiment.

Step 13: Perform the test

t.test(fish$pool, fish$riffle, paired=TRUE)
## 
##  Paired t-test
## 
## data:  fish$pool and fish$riffle
## t = 4.5826, df = 14, p-value = 0.0004264
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.170332 3.229668
## sample estimates:
## mean of the differences 
##                     2.2

Step 14: Interpret the P-value

Since the p-value is less than the level of significance (0.0004264 < 0.05), we REJECT the null hypothesis that the means are equal.
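
To see where the t statistic and p-value come from, the sketch below recomputes them by hand from the per-location differences. It re-enters the Step 2 table so that it runs without fish.csv; the names diffs, std_error, t_stat, and p_value are illustrative.

```r
# Data re-entered from the table printed in Step 2
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

diffs <- fish$pool - fish$riffle
n     <- length(diffs)

# Standard error of the mean difference
std_error <- sd(diffs) / sqrt(n)

# t statistic: how many standard errors the mean difference is from zero
t_stat <- mean(diffs) / std_error

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

round(t_stat, 4)    # 4.5826, matching the t.test output in Step 13
signif(p_value, 4)  # 0.0004264, matching the t.test output in Step 13
```

The p-value is doubled because the alternative hypothesis is two-sided: a large difference in either direction would count as evidence against the null hypothesis.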

Step 15: Interpret the Confidence Interval

The confidence interval is the range of plausible values for the difference in means. In this experiment the 95% confidence interval runs from 1.170332 to 3.229668, which does not include zero. Zero is not a plausible value for the difference in means, so it is not plausible that the means are the same. The result of Step 15 is consistent with the result of Step 14: we ACCEPT the alternative hypothesis.
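
The interval can be rebuilt by hand as the mean difference plus or minus a margin of error. The sketch below does this with the Step 2 table re-entered so that it runs without fish.csv; the names diffs, std_error, margin, and ci are illustrative.

```r
# Data re-entered from the table printed in Step 2
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

diffs     <- fish$pool - fish$riffle
n         <- length(diffs)
std_error <- sd(diffs) / sqrt(n)

# Margin of error: critical t value for 95% confidence times the standard error
margin <- qt(0.975, df = n - 1) * std_error

# Lower and upper limits of the 95% confidence interval
ci <- mean(diffs) + c(-1, 1) * margin
round(ci, 6)  # 1.170332 3.229668, matching the t.test output in Step 13
```

The 0.975 quantile is used because a 95% interval leaves 2.5% in each tail, which is why this interval and the two-sided test at the 0.05 level always agree.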

Step 16: Interpret the Sample Estimates

Now that we have determined that the population means for the pools and the riffles are not the same, we can look at the sample estimates to determine which habitat holds more species. According to the data, the mean of the differences is 2.2. This shows that, on average, 2.2 more fish species were captured in a pool than in the adjacent riffle.
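
The mean of the differences is the same number as the difference between the two sample means, which can make the estimate easier to read. The sketch below checks this with the Step 2 table re-entered by hand; the names pool_mean, riffle_mean, and mean_of_diffs are illustrative.

```r
# Data re-entered from the table printed in Step 2
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Average number of species per location in each habitat
pool_mean   <- mean(fish$pool)    # 70 species counted over 15 pools
riffle_mean <- mean(fish$riffle)  # 37 species counted over 15 riffles

# Mean of the per-location differences, as reported by t.test
mean_of_diffs <- mean(fish$pool - fish$riffle)

# The two ways of computing the estimate agree
all.equal(pool_mean - riffle_mean, mean_of_diffs)
```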

Step 17: Conclusion

Using all the data we have collected, we can conclude that there is enough evidence to show that more fish species live in the pools than in the riffles. Biologists and environmentalists can now use this conclusion to form new hypotheses about how fish use different stream habitats.