fish <- read.csv("fish file.csv")
fish
##    location pool riffle
## 1         1    6      3
## 2         2    6      3
## 3         3    3      3
## 4         4    8      4
## 5         5    5      2
## 6         6    2      2
## 7         7    6      2
## 8         8    7      2
## 9         9    1      2
## 10       10    3      2
## 11       11    4      3
## 12       12    5      1
## 13       13    4      3
## 14       14    6      2
## 15       15    4      3
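If fish file.csv is not available, the same data frame can be rebuilt from the printed values above, which makes the later steps reproducible. This is a sketch that assumes the printed output shows the whole file. read.csv() would store the counts as integers, whereas c() stores them as doubles; that does not change any of the results below.

```r
# Rebuild the fish data from the printed table (assumed to match "fish file.csv").
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)
str(fish)
```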

Describe the data

There are three columns. The first, “location”, numbers the 15 sampling locations. The “pool” column gives the number of fish species recorded in the slow-moving, deep areas of water, and the “riffle” column gives the number recorded in the fast, shallow areas. Each location therefore contributes one pool count and one riffle count.

Identify the purpose of the study

The purpose of this study was to test whether the two habitats (pools and riffles) support equal numbers of fish species, that is, whether species richness, one component of species diversity, differs between them.

Visualize the data

library(ggplot2)
ggplot(data = fish, mapping = aes(x = riffle, y = pool)) +
  geom_point() + geom_abline(slope = 1, intercept = 0) +
  annotate("text", x = 1.25, y = 4, label = "More in Pool") + annotate("text", x = 3.5, y = 2, label = "More in Riffle")

Interpret the plot

The plot suggests that pools support more fish species than riffles: most points lie above the 1:1 line, and at 12 of the 15 locations more species were recorded in the pool than in the riffle. In other words, fish species appear more likely to be found in slow-moving, deep water.
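The visual impression can be checked by counting locations on each side of the 1:1 line. The sketch below rebuilds the data from the printed table (assumed to match the CSV) and also counts distinct (riffle, pool) positions. Because several locations share identical counts, geom_point() draws some of them on top of one another, so the plot shows fewer than 15 dots; ggplot2's geom_count() is one way to make such overlaps visible. The variable names are illustrative.

```r
# Data rebuilt from the printed table (assumed to match "fish file.csv").
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

n_above <- sum(fish$pool > fish$riffle)   # above the line: more species in the pool
n_on    <- sum(fish$pool == fish$riffle)  # on the line: equal counts
n_below <- sum(fish$pool < fish$riffle)   # below the line: more species in the riffle
c(above = n_above, on = n_on, below = n_below)  # 12, 2, 1

# Number of distinct plotted positions; fewer than 15 means overlapping points.
n_distinct <- nrow(unique(fish[, c("riffle", "pool")]))
n_distinct  # 11
```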

Formulate the null hypothesis

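The mean number of fish species is the same in pool and riffle habitats (μ_pool = μ_riffle); in other words, water depth and speed make no difference to how many species a habitat supports.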

Formulate the alternative hypothesis

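The population means are unequal (μ_pool ≠ μ_riffle). The plot suggests that the pool mean is the higher one, but the test performed below is two-sided, so it checks for a difference in either direction.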
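If the alternative is taken to be directional (pool mean higher than riffle mean), R's t.test() can run a one-sided Welch test through its alternative argument; by default it is two-sided. A one-sided test is only justified if that direction was fixed before the data were examined. This sketch rebuilds the data from the printed table, and the object names are illustrative.

```r
# Data rebuilt from the printed table (assumed to match "fish file.csv").
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Two-sided Welch test (R's default), with pool as x so the difference is pool - riffle.
two_sided <- t.test(fish$pool, fish$riffle)

# One-sided Welch test of H1: mean(pool) > mean(riffle).
one_sided <- t.test(fish$pool, fish$riffle, alternative = "greater")

# With a positive t statistic, the one-sided p-value is half the two-sided one.
c(two_sided = two_sided$p.value, one_sided = one_sided$p.value)
```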

Decide on a type of test

t-test

Choose one sample or two

Two samples: pool and riffle.
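Because the pool and riffle counts come from the same 15 locations, the two samples are paired rather than independent. A paired t-test, which analyzes the within-location differences, is a common alternative to the Welch two-sample test used in this analysis. The sketch below is offered as a comparison, not as the test reported here; it rebuilds the data from the printed table, and paired_res and one_sample_res are illustrative names.

```r
# Data rebuilt from the printed table (assumed to match "fish file.csv").
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Paired t-test on the 15 within-location differences (pool - riffle).
paired_res <- t.test(fish$pool, fish$riffle, paired = TRUE)
paired_res

# The same test written as a one-sample t-test on the differences.
diffs <- fish$pool - fish$riffle
one_sample_res <- t.test(diffs, mu = 0)
```

The paired test has 14 degrees of freedom (15 locations minus one) and uses the spread of the differences rather than the spread of each sample separately.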

Check assumptions of the test

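ggplot(data = fish) + geom_qq(mapping = aes(sample = riffle)) + geom_qq_line(mapping = aes(sample = riffle))
ggplot(data = fish) + geom_qq(mapping = aes(sample = pool)) + geom_qq_line(mapping = aes(sample = pool))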
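Q-Q plots can be complemented by a formal normality test. The sketch below applies base R's shapiro.test() to each sample and to the pool minus riffle differences (the relevant quantity if the data are treated as paired). It also compares the two sample variances, since a ratio far from 1 is one reason to prefer Welch's test over the equal-variance t-test. With only 15 small integer counts per habitat these checks have limited power, so they are best read alongside the plots. The data are rebuilt from the printed table and the object names are illustrative.

```r
# Data rebuilt from the printed table (assumed to match "fish file.csv").
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

# Shapiro-Wilk tests: a small p-value is evidence against normality.
sw_pool   <- shapiro.test(fish$pool)
sw_riffle <- shapiro.test(fish$riffle)
sw_diff   <- shapiro.test(fish$pool - fish$riffle)
c(pool = sw_pool$p.value, riffle = sw_riffle$p.value, difference = sw_diff$p.value)

# Ratio of sample variances; values far from 1 favor Welch's unequal-variance test.
var_ratio <- var(fish$pool) / var(fish$riffle)
var_ratio  # about 6.6
```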

Choose the level of significance

The significance level is α = 0.05.

Perform the test

t.test(fish$riffle, fish$pool)
## 
##  Welch Two Sample t-test
## 
## data:  fish$riffle and fish$pool
## t = -4.1482, df = 18.125, p-value = 0.0005961
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.313673 -1.086327
## sample estimates:
## mean of x mean of y 
##  2.466667  4.666667
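As a check on the output above, the Welch statistic, its degrees of freedom, and the interval can be reproduced by hand. The sample variances used here (about 0.552 for riffles and 3.667 for pools) are computed from the 15 printed counts per habitat; they do not appear in the t.test() output itself, and the critical value 2.10 is the 0.975 quantile of a t distribution with about 18.1 degrees of freedom.

```latex
\begin{aligned}
t &= \frac{\bar{x}_{\mathrm{riffle}} - \bar{x}_{\mathrm{pool}}}
          {\sqrt{s_{\mathrm{riffle}}^2/n_{\mathrm{riffle}} + s_{\mathrm{pool}}^2/n_{\mathrm{pool}}}}
   = \frac{2.467 - 4.667}{\sqrt{0.552/15 + 3.667/15}}
   \approx \frac{-2.2}{0.530} \approx -4.15 \\[6pt]
\nu &= \frac{\left(s_{\mathrm{riffle}}^2/n_{\mathrm{riffle}} + s_{\mathrm{pool}}^2/n_{\mathrm{pool}}\right)^2}
            {\dfrac{\left(s_{\mathrm{riffle}}^2/n_{\mathrm{riffle}}\right)^2}{n_{\mathrm{riffle}} - 1}
             + \dfrac{\left(s_{\mathrm{pool}}^2/n_{\mathrm{pool}}\right)^2}{n_{\mathrm{pool}} - 1}}
     \approx 18.1 \\[6pt]
\text{95\% CI} &= -2.2 \pm t_{0.975,\,\nu} \times 0.530 \approx -2.2 \pm 2.10 \times 0.530 \approx (-3.31,\ -1.09)
\end{aligned}
```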

Interpret the p value

The p-value (0.0006) is less than the significance level (0.05), so we reject the null hypothesis that the mean numbers of species in pools and riffles are equal.
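The decision rule can be written directly in R by pulling the p-value out of the object that t.test() returns. This sketch rebuilds the data from the printed table and reruns the same Welch test; alpha and res are illustrative names.

```r
# Data rebuilt from the printed table (assumed to match "fish file.csv").
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

alpha <- 0.05                            # significance level chosen above
res <- t.test(fish$riffle, fish$pool)    # same Welch two-sample test as above
res$p.value                              # about 0.0006
res$p.value < alpha                      # TRUE, so H0 is rejected at the 5% level
```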

Interpret the confidence interval

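The 95% confidence interval for the difference in means (riffle minus pool) runs from about -3.31 to -1.09 species. Because the interval does not contain 0, zero is not a plausible value for the difference in means, so it is not plausible that the population means are equal. Because the entire interval is negative, the riffle mean is plausibly between about 1.1 and 3.3 species lower than the pool mean.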
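The interval and the check for zero can likewise be read off the returned object: conf.int holds the bounds, with the confidence level stored as an attribute. As before, the data are rebuilt from the printed table and the names ci and contains_zero are illustrative.

```r
# Data rebuilt from the printed table (assumed to match "fish file.csv").
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

res <- t.test(fish$riffle, fish$pool)    # Welch test; difference is riffle - pool
ci <- res$conf.int
ci                                       # about -3.31 to -1.09, conf.level 0.95
contains_zero <- ci[1] <= 0 && 0 <= ci[2]
contains_zero                            # FALSE: 0 is not a plausible difference
```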

Interpret the sample estimates

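The sample mean for riffles (x) is 2.47 species per location and the sample mean for pools (y) is 4.67, so pools averaged about 2.2 more species than riffles.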
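The difference between the two sample estimates is the point estimate of the difference in means, and the Welch interval is centered on it. A sketch, again rebuilding the data from the printed table, with diff_means as an illustrative name:

```r
# Data rebuilt from the printed table (assumed to match "fish file.csv").
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

res <- t.test(fish$riffle, fish$pool)
res$estimate                             # mean of x (riffle), mean of y (pool)
diff_means <- unname(res$estimate[1] - res$estimate[2])
diff_means                               # -2.2: riffles average 2.2 fewer species
mean(res$conf.int)                       # midpoint of the 95% CI, equal to diff_means up to rounding
```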

State your conclusion

After studying the data above, I conclude that pool habitats support more fish species on average than riffle habitats (Welch t-test, p ≈ 0.0006). This suggests that fish species favor slow-moving, deep water over shallow, fast-moving water.