STEP 1: Design the experiment

The data for this project was given to us. In the next steps, the data will be described and the hypotheses will be formed. Graphs will support the evidence from the p-value and help one decide whether to reject or fail to reject the null hypothesis.

STEP 2: Collect (or load) data

fish<-read.csv("fish.csv")
fish
##    location pool riffle
## 1         1    6      3
## 2         2    6      3
## 3         3    3      3
## 4         4    8      4
## 5         5    5      2
## 6         6    2      2
## 7         7    6      2
## 8         8    7      2
## 9         9    1      2
## 10       10    3      2
## 11       11    4      3
## 12       12    5      1
## 13       13    4      3
## 14       14    6      2
## 15       15    4      3

STEP 3: Describe data

There are 15 rows and 3 columns in this data set. The rows correspond to the subjects (the 15 locations) and the columns correspond to the variables. The location column identifies the location at which the data was recorded. The pool column is the number of fish seen in the pool at that location, while the riffle column is the number of fish seen in the riffle.
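
A quick structural check can confirm this description. The sketch below rebuilds the data frame from the table printed in Step 2, so that it runs without fish.csv, and then inspects its shape. In the report itself, the same calls would be applied to the fish object loaded in Step 2.

```r
# Rebuild the data shown in Step 2 so the example is self-contained
fish <- data.frame(
  location = 1:15,
  pool     = c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4),
  riffle   = c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)
)

dim(fish)    # 15 rows (locations) and 3 columns (variables)
names(fish)  # "location" "pool" "riffle"
str(fish)    # storage type of each column
```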

STEP 4: Identify the purpose of the study

The purpose of the study is to decide whether the evidence suggests that more fish are found in one type of habitat, pools or riffles, than in the other. Specifically, the study focused on whether there were more fish in pools than in riffles.

STEP 5: Visualize data

It is always a good idea to visualize your data as soon as you load it. Technically this should be done after you design the study, but here the order is switched around.

Load graphics library, then plot:

library(ggplot2)
ggplot(data=fish, mapping=aes(x=riffle, y=pool)) + geom_point() 

STEP 6: Interpret the plot

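The plot suggests that the pool group may have more fish: at most locations the pool count is at least as large as the riffle count. To judge whether this pattern could be due to chance, we must find a p-value: the probability, assuming the null hypothesis is true, of observing a difference at least as extreme as the one in the sample. If the p-value is below the chosen level of significance, one rejects the null hypothesis and concludes that there is evidence of a difference between the two means. The plot does not measure the accuracy of the p-value, but it gives a visual check on whether the result of the test is plausible.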

STEP 7: Formulate the null hypothesis.

Null Hypothesis: The mean number of fish in the population of pools is the same as the mean in the population of riffles.

STEP 8: Identify the alternative hypothesis.

Alternative: The mean number of fish in the population of pools is greater than the mean in the population of riffles.

STEP 9: Decide on type of test.

A paired t-test, because it is used to test hypotheses about population means of quantitative variables when the observations come in matched pairs. Here Pool and Riffle are paired by location: each of the 15 locations contributes one pool count and one riffle count, so the two measurements in a row are not independent of each other.

STEP 10: Choose one sample or two.

One sample, because a paired t-test is equivalent to a one-sample t-test on the differences between Pool and Riffle at each location.
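
The equivalence between a paired test and a one-sample test on the differences can be checked directly. The following is a minimal sketch using the counts printed in Step 2; the names d and t_manual are introduced here for illustration only.

```r
# Counts copied from the table in Step 2
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

# Paired test on the two columns
paired <- t.test(pool, riffle, paired = TRUE)

# One-sample test on the within-location differences
d <- pool - riffle
one_sample <- t.test(d, mu = 0)

# The same statistic by hand: mean difference divided by its standard error
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

# All three values agree (about 4.5826)
c(paired     = unname(paired$statistic),
  one_sample = unname(one_sample$statistic),
  manual     = t_manual)
```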

STEP 11: Check assumptions of the test

T-tests assume that the data are approximately normally distributed; for a paired test, this applies to the differences between the paired measurements. The greater the sample size, the less susceptible the t-test is to non-normality.

The best way of judging this is with a qq-plot.

# QQ-plot of the riffle counts
ggplot(data=fish) + geom_qq(mapping=aes(sample=riffle))

Here is the corresponding QQ-plot for the pool counts.

gg<-ggplot(data=fish)
gg+geom_qq(mapping=aes(sample=pool))

Here is a scatter plot comparing Riffle and Pool.

library(ggplot2)
ggplot(data=fish, mapping=aes(x=riffle, y=pool)) +
  geom_point() +
  geom_abline(slope=1, intercept=0) +
  annotate("text", x=1.25, y=4, label="More in Pool") +
  annotate("text", x=3.5, y=2, label="More in Riffle")

The data appear to be approximately normal.
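
Because the test is paired, the assumption that matters most is normality of the within-location differences rather than of each column separately. As a supplementary check that is not part of the original analysis, the sketch below computes the differences and applies the Shapiro-Wilk test from base R. The names d and sw are introduced here for illustration. With only 15 integer-valued counts, the result is best read alongside the QQ-plots rather than on its own.

```r
# Counts copied from the table in Step 2
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

# Differences that the paired t-test actually analyzes
d <- pool - riffle

# Shapiro-Wilk test: the null hypothesis is that d comes from a normal distribution
sw <- shapiro.test(d)

# A small p-value (for example, below 0.05) would cast doubt on normality
sw$p.value
```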

STEP 12: Decide on a level of significance of the test

The level of significance for the test is 0.05.

STEP 13: Perform the test

t.test(fish$pool, fish$riffle, paired=TRUE)
## 
##  Paired t-test
## 
## data:  fish$pool and fish$riffle
## t = 4.5826, df = 14, p-value = 0.0004264
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.170332 3.229668
## sample estimates:
## mean of the differences 
##                     2.2
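
The call above runs a two-sided test, while Step 8 states a one-sided alternative (pool greater than riffle). A possible one-sided version, sketched here with the counts from Step 2, passes the alternative argument of t.test. Its output is not reproduced here; the only claim made is the relationship between the two p-values.

```r
# Counts copied from the table in Step 2
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

# Two-sided test, as in the step above
two_sided <- t.test(pool, riffle, paired = TRUE)

# One-sided test matching the alternative hypothesis in Step 8
one_sided <- t.test(pool, riffle, paired = TRUE, alternative = "greater")

two_sided$p.value  # about 0.0004264, as in the output above
one_sided$p.value  # half of the two-sided value, because t is positive
```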

STEP 14: Interpret the p-value

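The p-value (0.0004264) is lower than the level of significance (0.05), so the null hypothesis is REJECTED. There is evidence that the means of Pool and Riffle are not equal. Note that the test above is two-sided, while the alternative hypothesis in Step 8 is one-sided. Because the t statistic is positive (pool above riffle), the one-sided p-value is half the two-sided value, which is also well below 0.05, so the data support the alternative hypothesis.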
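
To make the decision rule concrete, the p-value can be recovered from the t statistic and its degrees of freedom and then compared with the level of significance. This is a minimal sketch using the counts from Step 2; t_stat, p_two_sided, and alpha are names chosen here for illustration.

```r
# Counts copied from the table in Step 2
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

d  <- pool - riffle
n  <- length(d)
df <- n - 1                               # 14 degrees of freedom

# t statistic: mean difference divided by its standard error
t_stat <- mean(d) / (sd(d) / sqrt(n))

# Two-sided p-value: probability of a |t| at least this large under the null
p_two_sided <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

alpha <- 0.05                             # level of significance from Step 12
reject_null <- p_two_sided < alpha        # TRUE
reject_null
```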

STEP 15: Interpret the confidence interval

The 95 percent confidence interval gives a range of plausible values for the true mean difference (pool minus riffle). Here the interval is [1.170332, 3.229668], which does not contain zero, so a true difference of zero is not plausible at the 5% level. This reaffirms that the null hypothesis should be rejected and is consistent with the previous step.
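
The interval reported by t.test can be reproduced by hand as the mean difference plus or minus a critical t value times the standard error. The sketch below does this with the counts from Step 2; se, t_crit, and ci are illustrative names.

```r
# Counts copied from the table in Step 2
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

d  <- pool - riffle
n  <- length(d)
se <- sd(d) / sqrt(n)              # standard error of the mean difference

# Critical value for a 95 percent two-sided interval with n - 1 degrees of freedom
t_crit <- qt(0.975, df = n - 1)

# Lower and upper limits of the interval
ci <- mean(d) + c(-1, 1) * t_crit * se
ci                                 # about 1.170332 3.229668
```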

STEP 16: Interpret the sample estimates

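The sample estimate of the mean difference is 2.2: on average, 2.2 more fish were counted in the pool than in the riffle at the same location. This is evidence, though not proof, that the means are not equal in the entire population.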

STEP 17: State your conclusion

summary(fish)
##     location         pool           riffle     
##  Min.   : 1.0   Min.   :1.000   Min.   :1.000  
##  1st Qu.: 4.5   1st Qu.:3.500   1st Qu.:2.000  
##  Median : 8.0   Median :5.000   Median :2.000  
##  Mean   : 8.0   Mean   :4.667   Mean   :2.467  
##  3rd Qu.:11.5   3rd Qu.:6.000   3rd Qu.:3.000  
##  Max.   :15.0   Max.   :8.000   Max.   :4.000

There is evidence that fish are more abundant in pools than in riffles. At most locations, more fish were found in the pool than in the riffle. In the summary statistics, the mean of Riffle (2.467) is notably lower than that of Pool (4.667).
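
The claim about most locations can be checked by counting how often each habitat had the larger count. This is a small sketch using the counts from Step 2; the name counts is introduced here for illustration.

```r
# Counts copied from the table in Step 2
pool   <- c(6, 6, 3, 8, 5, 2, 6, 7, 1, 3, 4, 5, 4, 6, 4)
riffle <- c(3, 3, 3, 4, 2, 2, 2, 2, 2, 2, 3, 1, 3, 2, 3)

# Number of locations in each category
counts <- c(more_in_pool   = sum(pool > riffle),
            equal          = sum(pool == riffle),
            more_in_riffle = sum(pool < riffle))
counts  # 12 locations favor the pool, 2 are tied, 1 favors the riffle
```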