Inference Project

Step 2: Load the data

fish <- read.csv("fish.csv")
fish

##    location pool riffle
## 1         1    6      3
## 2         2    6      3
## 3         3    3      3
## 4         4    8      4
## 5         5    5      2
## 6         6    2      2
## 7         7    6      2
## 8         8    7      2
## 9         9    1      2
## 10       10    3      2
## 11       11    4      3
## 12       12    5      1
## 13       13    4      3
## 14       14    6      2
## 15       15    4      3

Step 3: Describe Data

Most fish are found in pools, the deep, slow-moving parts of streams. Other fish prefer riffles, which are shallow, fast-moving regions. The data have 15 rows, one per location, and each row gives the number of fish species found in the pool and in the riffle at that location.
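A short numeric summary can accompany this description. The sketch below rebuilds the table printed in Step 2 inline, so it runs without fish.csv, and computes the mean and standard deviation of the species counts in each habitat using base R only.

```r
# Data from the table printed in Step 2 (rebuilt so the snippet runs without fish.csv)
fish <- data.frame(
  location = 1:15,
  pool   = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Mean and standard deviation of the species counts in each habitat
habitat_summary <- sapply(fish[, c("pool", "riffle")],
                          function(x) c(mean = mean(x), sd = sd(x)))
print(round(habitat_summary, 3))
```

The two means, about 4.67 for pools and 2.47 for riffles, are the same sample estimates that appear in the t-test output in Step 13.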

Step 4: The Purpose of the Study

The purpose of the study is to assess whether these two habitats support equal numbers of species, which is a measure of species diversity.

Step 5: Visualize Data

library(ggplot2)
# One point per location; the y = x line marks equal counts in pool and riffle
ggplot(data = fish, mapping = aes(x = riffle, y = pool)) + geom_point() + geom_abline(slope = 1, intercept = 0) +
  annotate("text", x = 1.25, y = 4, label = "More in Pool") + annotate("text", x = 3.5, y = 2, label = "More in Riffle")

Step 6: Interpret the Plot

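The line y = x marks locations where the pool and the riffle have the same number of species. Points above the line are locations with more species in the pool, and points below it are locations with more species in the riffle. Most points (12 of 15) lie above the line, 2 lie on it, and only 1 lies below it, which suggests that pools tend to support more species.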
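The plot can also be summarised by counting how many locations fall above, on, or below the line y = x. The base R sketch below does this from the per-location differences, rebuilding the Step 2 data inline. It also tabulates the (riffle, pool) pairs, because several locations share the same pair and geom_point() draws them on top of one another.

```r
# Data from the table printed in Step 2 (rebuilt so the snippet runs without fish.csv)
fish <- data.frame(
  location = 1:15,
  pool   = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Difference pool - riffle: positive means the point lies above the y = x line
d <- fish$pool - fish$riffle
position <- factor(sign(d), levels = c(1, 0, -1),
                   labels = c("above (more in pool)", "on the line", "below (more in riffle)"))
counts <- table(position)
print(counts)

# (riffle, pool) pairs shared by more than one location overlap in the scatterplot
pair_counts <- table(paste(fish$riffle, fish$pool, sep = ","))
print(pair_counts[pair_counts > 1])
```

The pair (riffle 3, pool 4) occurs at three locations, so the scatterplot shows fewer distinct points than there are locations. ggplot2's geom_count() or geom_jitter() in place of geom_point() would make these overlaps visible.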

Step 7: Formulate the null hypothesis

The mean number of fish species in pools equals the mean number in riffles; that is, the mean difference (pool - riffle) is 0.

Step 8: Identify the alternative hypothesis

The mean number of fish species in pools is greater than the mean number in riffles; that is, the mean difference (pool - riffle) is greater than 0.
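Written in symbols, with d_i the difference between the pool and riffle counts at location i and μ_d the population mean of that difference, the two hypotheses are:

```latex
\begin{aligned}
d_i &= \text{pool}_i - \text{riffle}_i, \qquad i = 1, \dots, 15 \\
H_0 &: \mu_d = 0 \\
H_a &: \mu_d > 0
\end{aligned}
```

Because H_a points in one direction, the matching call in R is a one-sided test, which t.test() selects with alternative = "greater".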

Step 9: Decide on type of test

A paired t-test is used because the hypotheses concern the mean of a quantitative variable (the number of fish species) and because each pool count is paired with the riffle count from the same location, so the two measurements are not independent.

Step 10: Choose one sample or two sample

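There are two samples, the pool counts and the riffle counts, but they are dependent because both come from the same 15 locations. A paired t-test therefore works with the 15 per-location differences (pool - riffle), which makes it equivalent to a one-sample t-test on those differences.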
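The equivalence between a paired t-test and a one-sample t-test on the differences can be checked directly. The sketch below rebuilds the Step 2 data inline and runs both forms with base R's t.test(); they should report the same t statistic, degrees of freedom, and p-value.

```r
# Data from the table printed in Step 2 (rebuilt so the snippet runs without fish.csv)
fish <- data.frame(
  location = 1:15,
  pool   = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Paired t-test on the two columns
paired_test <- t.test(fish$pool, fish$riffle, paired = TRUE)

# One-sample t-test of the per-location differences against 0
one_sample_test <- t.test(fish$pool - fish$riffle, mu = 0)

print(c(paired = unname(paired_test$statistic),
        one_sample = unname(one_sample_test$statistic)))
print(c(paired = paired_test$p.value, one_sample = one_sample_test$p.value))
```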

Step 11: Check assumptions of the test

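gg <- ggplot(data = fish)
# Normal QQ plots: points close to the reference line suggest approximate normality
gg + geom_qq(mapping = aes(sample = pool)) + geom_qq_line(mapping = aes(sample = pool))

gg + geom_qq(mapping = aes(sample = riffle)) + geom_qq_line(mapping = aes(sample = riffle))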

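The pool QQ plot is roughly linear, which suggests an approximately normal distribution. This does not follow from the sample size, since 15 locations is a small sample. The riffle counts take only the values 1 to 4, so their QQ plot forms flat steps with little spread, but the steps still rise along a diagonal, so approximate normality is a reasonable working assumption. Strictly, the paired t-test assumes that the differences (pool - riffle) are approximately normal, so those differences are the most relevant quantity to check.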
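Since a paired t-test rests on the differences, a check of those differences is a natural complement to the two plots. The sketch below, which rebuilds the Step 2 data inline, applies the Shapiro-Wilk test from base R; this test is a swapped-in addition that the original analysis did not use, and with small, tied count data it is only a rough guide.

```r
# Data from the table printed in Step 2 (rebuilt so the snippet runs without fish.csv)
fish <- data.frame(
  location = 1:15,
  pool   = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# The paired t-test's normality assumption applies to these differences
d <- fish$pool - fish$riffle
print(sort(d))

# Shapiro-Wilk test: a small p-value would signal departure from normality
sw <- shapiro.test(d)
print(sw)
```

The same ggplot2 QQ plot can be drawn for the differences, for example ggplot(data.frame(d = fish$pool - fish$riffle), aes(sample = d)) + geom_qq() + geom_qq_line().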

Step 12: Decide on level of significance of the test

The significance level for the test is α = 0.05.
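With α = 0.05 and the one-sided alternative from Step 8, a paired t-test on 15 locations (14 degrees of freedom) rejects the null hypothesis when the t statistic exceeds the upper 5% point of the t distribution. A minimal base R sketch of that critical value:

```r
alpha <- 0.05
n_pairs <- 15  # one difference per location

# Upper-tail critical value for a one-sided test with n_pairs - 1 degrees of freedom
t_crit <- qt(1 - alpha, df = n_pairs - 1)
print(t_crit)
```

The critical value is about 1.761, so a paired t statistic above it leads to rejecting the null hypothesis at the 0.05 level.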

Step 13: Perform the test

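A paired t-test was intended, but the call below does not set paired = TRUE, so R runs a Welch two-sample (unpaired) t-test, as the first line of the output shows. It also uses the default two-sided alternative rather than the one-sided alternative from Step 8, and the data = fish argument has no effect because the two columns are passed directly as vectors.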

t.test(fish$pool, fish$riffle, data=fish)

## 
##  Welch Two Sample t-test
## 
## data:  fish$pool and fish$riffle
## t = 4.1482, df = 18.125, p-value = 0.0005961
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.086327 3.313673
## sample estimates:
## mean of x mean of y 
##  4.666667  2.466667
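The call above does not match the paired, one-sided design chosen in Steps 8 and 9. The sketch below shows a call that does, assuming the same data (rebuilt inline from the Step 2 table so the snippet runs on its own).

```r
# Data from the table printed in Step 2 (rebuilt so the snippet runs without fish.csv)
fish <- data.frame(
  location = 1:15,
  pool   = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Paired test on the differences pool - riffle, with the one-sided
# alternative that pools have more species than riffles
res <- t.test(fish$pool, fish$riffle, paired = TRUE, alternative = "greater")
print(res)
```

Worked by hand from the 15 differences, the mean difference is 2.2 with a standard deviation of about 1.86, giving t = √21 ≈ 4.58 on 14 degrees of freedom. This is larger than the Welch statistic because pairing removes the location-to-location variation, and the decision to reject the null hypothesis does not change.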

Step 14: Interpret the p-value

Because the p-value (0.0005961) is less than the significance level of 0.05, we reject the null hypothesis that pools and riffles have the same mean number of fish species.
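The p-value in the output comes from a two-sided test, while the alternative in Step 8 is one-sided. When the t statistic falls in the hypothesized direction, the one-sided p-value is half the two-sided one, so the decision here is unaffected. A base R sketch comparing the two for the paired test, with the Step 2 data rebuilt inline:

```r
# Data from the table printed in Step 2 (rebuilt so the snippet runs without fish.csv)
fish <- data.frame(
  location = 1:15,
  pool   = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

two_sided <- t.test(fish$pool, fish$riffle, paired = TRUE)
one_sided <- t.test(fish$pool, fish$riffle, paired = TRUE, alternative = "greater")

# With a positive t statistic, the one-sided p-value is half the two-sided one
print(c(two_sided = two_sided$p.value, one_sided = one_sided$p.value))

# Decision rule from Step 12
alpha <- 0.05
print(one_sided$p.value < alpha)
```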

Step 15: Interpret the confidence interval

The 95% confidence interval for the difference in means, 1.086 to 3.314, does not contain 0, so it is not plausible that the two means are equal. This agrees with Step 14, which rejected the null hypothesis of equal means.
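The interval in the output is a Welch interval. For the paired design, the 95% interval is the mean difference plus or minus a t critical value times the standard error of the differences, mean(d) ± t(0.975, n - 1) · sd(d) / √n. A base R sketch that builds it by hand, with the Step 2 data rebuilt inline, and checks it against t.test(..., paired = TRUE):

```r
# Data from the table printed in Step 2 (rebuilt so the snippet runs without fish.csv)
fish <- data.frame(
  location = 1:15,
  pool   = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

d <- fish$pool - fish$riffle
n <- length(d)
se <- sd(d) / sqrt(n)                 # standard error of the mean difference
margin <- qt(0.975, df = n - 1) * se  # two-sided 95% margin of error
ci <- mean(d) + c(-1, 1) * margin
print(ci)

# The same interval reported by t.test() for the paired design
ci_ttest <- as.numeric(t.test(fish$pool, fish$riffle, paired = TRUE)$conf.int)
print(ci_ttest)
```

For these data the paired interval runs from about 1.17 to 3.23, so it also excludes 0.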

Step 16: Interpret the sample estimates

On average, pools had 2.2 more fish species than riffles (sample means of 4.67 and 2.47).
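The 2.2 figure is both the difference between the two sample means and the mean of the 15 per-location differences, which is why the same estimate appears whether the data are analysed as two samples or as pairs. A quick base R check with the Step 2 data rebuilt inline:

```r
# Data from the table printed in Step 2 (rebuilt so the snippet runs without fish.csv)
fish <- data.frame(
  location = 1:15,
  pool   = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

diff_of_means <- mean(fish$pool) - mean(fish$riffle)  # two-sample view
mean_of_diffs <- mean(fish$pool - fish$riffle)        # paired view
print(c(diff_of_means = diff_of_means, mean_of_diffs = mean_of_diffs))
```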

Step 17: Conclusion

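The data provide evidence that pools are more diverse than riffles: on average, pools supported more fish species than riffles. This held at most locations but not all of them (at location 9 the riffle had more species, and at locations 3 and 6 the counts were equal). Overall, pools appear to be the preferred habitat for a wider range of fish species.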

Cheldina Jean

12/1/2017