Markdown Author: Jessie Bell, 2023
Libraries Used: ggplot, car
Answers: orange
A group of scientists from the University of Plymouth in the UK conducted a field study to investigate the impact of implementing a “whole-site” Marine Protected Area (MPA) on fish abundance in the temperate waters of Lyme Bay. They recorded fish abundance by species before the protections went into effect and again after 11 years of protection. The Lyme Bay MPA was the UK’s first and largest example of an ambitious, whole-site approach to marine protection, designed to manage, recover, and protect reef biodiversity by considering the whole ecosystem.
data <- read.csv("midterm2.csv")
Test: Paired Samples t-test
The assumptions are:
Normality of differences (you can check this using a normality test or visual inspection of a histogram or Q-Q plot).
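Independence of the pairs (each before/after pair is independent of the other pairs). Homogeneity of variances is not required here, because the test is run on the within-pair differences rather than on two separate groups.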
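The normality check on the paired differences can be sketched as follows. The data here are simulated stand-ins (36 made-up pairs, not the Lyme Bay values), and the names `pre`, `post`, `diffs`, and `sw` are hypothetical.

```r
# Simulated stand-in data: 36 hypothetical pairs (NOT the study data)
set.seed(42)
pre  <- rnorm(36, mean = 10, sd = 2)
post <- pre + rnorm(36, mean = 3, sd = 1.5)

# The paired t-test operates on the per-pair differences
diffs <- post - pre

# Shapiro-Wilk test: H0 is that the differences come from a normal distribution
sw <- shapiro.test(diffs)
sw$p.value   # a p-value above 0.05 gives no evidence against normality

# Visual check: points should fall close to the reference line
qqnorm(diffs, main = "Q-Q plot of paired differences")
qqline(diffs)
```

With the real data, `diffs` would be `data$Post.MPA - data$Pre.MPA`.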
# Boxplot to visualize the data
boxplot(data$Pre.MPA, data$Post.MPA, names = c("Pre-MPA", "Post-MPA"), col = c("cyan", "blue"), main = "Fish Species Pre and Post MPA")
\(H_0\): \(\mu_d\) = 0
\(H_a\): \(\mu_d\) \(\neq\) 0
d.bar <- data$Post.MPA - data$Pre.MPA
mean.diff <- mean(d.bar)
mean.diff
## [1] 3.284249
sd.diff <- sd(d.bar)
variance <- sd.diff^2
n <- length(data$Pre.MPA) #aka 36
denominator <- sqrt(variance/n)
denominator
## [1] 0.2838918
t.calc <- mean.diff/denominator
t.calc
## [1] 11.56867
t.critical <- qt(0.975, 35)
t.calc #greater than t-critical
## [1] 11.56867
t.critical
## [1] 2.030108
Since t.calc (11.57) is greater than t.critical (2.03), we reject the null hypothesis and conclude that there is a significant difference in fish biodiversity before and after the MPA: on average, the post-MPA value is about 3.28 higher.
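The manual calculation uses \(t = \bar{d} / (s_d / \sqrt{n})\). As a rough sketch, the same reported numbers can also be turned into a two-sided p-value and a 95% confidence interval. The names `se.diff`, `dof`, `p.value`, and `ci` are introduced here for illustration; `se.diff` is the quantity the code above calls `denominator`.

```r
# Values reported in the output above
mean.diff <- 3.284249    # mean of the paired differences (Post - Pre)
se.diff   <- 0.2838918   # standard error of the mean difference, sd.diff / sqrt(n)
n         <- 36
dof       <- n - 1       # degrees of freedom

t.calc <- mean.diff / se.diff

# Two-sided p-value: probability of a |t| at least this large under H0
p.value <- 2 * pt(-abs(t.calc), dof)

# 95% confidence interval for the mean difference
t.critical <- qt(0.975, dof)
ci <- mean.diff + c(-1, 1) * t.critical * se.diff

t.calc
p.value
ci
```

The interval is for Post minus Pre, so it is positive; a test run on Pre minus Post would report the same bounds with the signs reversed.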
t.test(data$Pre.MPA, data$Post.MPA, paired = TRUE)
##
## Paired t-test
##
## data: data$Pre.MPA and data$Post.MPA
## t = -11.569, df = 35, p-value = 1.64e-13
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -3.860580 -2.707918
## sample estimates:
## mean difference
## -3.284249
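Since the p-value (1.64e-13) is below 0.05, we reject the null hypothesis and conclude that there is a statistically significant difference in fish biodiversity before and after the MPA. The t statistic is negative here only because t.test() subtracts in the order given (Pre minus Post); its magnitude matches the manual calculation.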
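A small sketch of how argument order affects the paired test: swapping the two vectors flips the sign of t and of the estimated mean difference, but leaves the p-value unchanged. The data are simulated stand-ins, and `pre`, `post`, `pre.first`, and `post.first` are hypothetical names.

```r
# Simulated stand-in data: 36 hypothetical pairs (NOT the study data)
set.seed(1)
pre  <- rnorm(36, mean = 10, sd = 2)
post <- pre + rnorm(36, mean = 3, sd = 1.5)

pre.first  <- t.test(pre, post, paired = TRUE)   # differences are pre - post
post.first <- t.test(post, pre, paired = TRUE)   # differences are post - pre

# Same magnitude, opposite sign
pre.first$statistic
post.first$statistic

# Identical two-sided p-values
pre.first$p.value
post.first$p.value
```

Putting the post-treatment vector first makes the reported difference read directly as the change after protection.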
data2 <- read.csv("midterm2b.csv")
One-way Analysis of Variance (ANOVA)
result_habitat <- aov(Post.MPA ~ substrate, data = data2)
The assumptions are:
hist(result_habitat$residuals) # not clearly normal, so double-check with Shapiro-Wilk
shapiro.test(residuals(result_habitat)) # p > 0.05: no evidence against normality of the residuals
##
## Shapiro-Wilk normality test
##
## data: residuals(result_habitat)
## W = 0.94771, p-value = 0.08856
# Levene's test
leveneTest(Post.MPA ~ substrate, data = data2) # p > 0.05: no evidence of unequal variances
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 1.1281 0.3358
## 33
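To make the Levene output less of a black box, here is a base-R sketch of what the median-centred version of the test computes: a one-way ANOVA on each observation's absolute deviation from its group median. The data frame `toy` is simulated (not midterm2b.csv), and `group.median`, `abs.dev`, and `levene.by.hand` are hypothetical names. Storing `substrate` as a factor up front is also what avoids the "group coerced to factor" warning.

```r
# Simulated stand-in data: 3 hypothetical substrate groups of 12 (NOT the study data)
set.seed(7)
toy <- data.frame(
  substrate = factor(rep(c("bedrock", "boulders", "cobbles"), each = 12)),
  Post.MPA  = rnorm(36, mean = 13, sd = 1.1)
)

# Absolute deviation of each observation from its own group's median
group.median <- ave(toy$Post.MPA, toy$substrate, FUN = median)
toy$abs.dev  <- abs(toy$Post.MPA - group.median)

# Median-centred Levene's test = one-way ANOVA on those deviations
levene.by.hand <- anova(lm(abs.dev ~ substrate, data = toy))
levene.by.hand
```

A large F (small p-value) would mean the typical spread differs between groups; a small F, as in the output above, gives no evidence of unequal variances.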
# Boxplot to visualize the data
boxplot(Post.MPA ~ substrate, data = data2, col = c("purple", "red", "cyan"), main = "Fish Species by Substrate Type in MPA")
\(H_0\): \(\mu_1\) = \(\mu_2\) = \(\mu_3\)
\(H_a\): At least 1 mean is not equal to the others.
result_habitat <- aov(Post.MPA ~ substrate, data = data2)
# Display the ANOVA table
summary(result_habitat)
## Df Sum Sq Mean Sq F value Pr(>F)
## substrate 2 1.49 0.7451 0.607 0.551
## Residuals 33 40.49 1.2269
Since the p-value (0.551) is greater than 0.05, we fail to reject the null hypothesis: there is no statistically significant difference in the mean number of fish species among substrate types within the MPA.
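As a quick check on the ANOVA table, the F statistic and its p-value can be recomputed from the reported mean squares and degrees of freedom. This is only a sketch; the variable names are introduced here for illustration.

```r
# Values reported in the ANOVA table above
ms.substrate <- 0.7451   # Mean Sq for substrate (between groups)
ms.residual  <- 1.2269   # Mean Sq for residuals (within groups)
df.substrate <- 2
df.residual  <- 33

# F is the ratio of between-group to within-group mean squares
f.calc <- ms.substrate / ms.residual

# Upper-tail probability of an F at least this large under H0
p.value <- pf(f.calc, df.substrate, df.residual, lower.tail = FALSE)

# Critical value at alpha = 0.05 for comparison
f.critical <- qf(0.95, df.substrate, df.residual)

f.calc
p.value
f.calc < f.critical   # TRUE means we fail to reject H0
```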
The post-hoc test below shows the same thing: none of the pairwise comparisons is significant (see the p adj column), so there is no statistical evidence that substrate matters, at least within this dataset.
# Tukey's HSD for post-hoc analysis
TukeyHSD(result_habitat)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Post.MPA ~ substrate, data = data2)
##
## $substrate
## diff lwr upr p adj
## boulders-bedrock 0.39703621 -0.7905438 1.5846162 0.6932255
## cobbles-bedrock -0.08648996 -1.1654225 0.9924426 0.9789093
## cobbles-boulders -0.48352617 -1.5931454 0.6260930 0.5394916
Answers will vary, but here are some possible answers:
\(H_0\): There is no difference in the mean number of fish species between the MPA and Open Control (OC) groups.
\(H_a\): The mean number of fish species differs between the MPA and OC groups.
Steps:
Use a two-sample t-test to compare the mean number of fish species between the MPA and OC groups.
Function in R: t.test(Pre.MPA, OC, paired = FALSE).
Interpret the p-value; if it’s less than the significance level (e.g., 0.05), you may reject the null hypothesis.
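The steps above might look like the following sketch. Everything here is assumed for illustration: the data frame `fish`, its columns `treatment` and `n.species`, and the simulated values are hypothetical and not taken from the study.

```r
# Simulated stand-in data: 20 hypothetical sites per group (NOT the study data)
set.seed(11)
fish <- data.frame(
  treatment = factor(rep(c("MPA", "OC"), each = 20)),
  n.species = c(rnorm(20, mean = 13, sd = 2), rnorm(20, mean = 11, sd = 2))
)

# Welch two-sample t-test (R's default): does not assume equal variances
welch <- t.test(n.species ~ treatment, data = fish)

# Pooled-variance t-test: only appropriate if the variances look similar
pooled <- t.test(n.species ~ treatment, data = fish, var.equal = TRUE)

welch$p.value
pooled$p.value
```

The Welch version is a reasonable default when group variances have not been checked, since it remains valid whether or not they are equal.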
\(H_0\): There is no correlation between the number of fish species and the number of invertebrate taxa.
\(H_a\): There is a correlation between the number of fish species and the number of invertebrate taxa.
Steps:
Use a correlation test to assess the relationship between the number of fish species and the number of invertebrate taxa (e.g., Pearson correlation or Spearman correlation).
Function in R:
cor.test(data$Pre.MPA, data$Invertebrate_Taxa, method = "pearson").
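A sketch of both correlation options on simulated counts. The vectors `fish.species` and `invert.taxa` are hypothetical stand-ins, not columns from the study data.

```r
# Simulated stand-in data: 36 hypothetical sites (NOT the study data)
set.seed(3)
fish.species <- rpois(36, lambda = 12)
invert.taxa  <- round(20 + 0.8 * fish.species + rnorm(36, sd = 3))

# Pearson: tests for a linear association
pearson <- cor.test(fish.species, invert.taxa, method = "pearson")

# Spearman: rank-based; exact = FALSE because count data usually contain ties
spearman <- cor.test(fish.species, invert.taxa, method = "spearman", exact = FALSE)

pearson$estimate    # correlation coefficient r
pearson$p.value
spearman$estimate   # rank correlation rho
spearman$p.value
```

Spearman is the safer choice when a scatter plot suggests a monotonic but non-linear relationship or when outliers are present.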
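Always check assumptions before running the tests: for example, normality and homogeneity of variance for the two-sample t-test, and a roughly linear relationship between approximately normal variables for the Pearson correlation (Spearman is the rank-based alternative when these fail).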
Visualize the data using appropriate plots (e.g., scatter plots, boxplots) to get a better understanding of the distributions and relationships.
These proposed tests will provide insights into the differences in fish species between the MPA and OC groups and the potential relationship between fish and invertebrate taxa across all sites.