Markdown Author: Jessie Bell, 2023

Libraries Used: ggplot2, car

Answers: orange

Part 1

A group of scientists from the University of Plymouth in the UK conducted a field study to investigate the impact of implementing a “whole-site” Marine Protected Area (MPA) on the abundance of fish in the temperate waters of Lyme Bay. They recorded fish abundance by species before the protections went into effect and again after 11 years of protection. The Lyme Bay MPA was the UK’s first and largest example of an ambitious, whole-site approach to marine protection, designed to manage, recover, and protect reef biodiversity by considering the whole ecosystem.

data <- read.csv("midterm2.csv")

a

Test: Paired Samples t-test

b

The assumptions are:

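Normality of the paired differences (you can check this using a normality test such as Shapiro-Wilk, or visual inspection of a histogram or Q-Q plot of the differences).
Independence of the pairs (each before/after pair should be unrelated to every other pair). Homogeneity of variances between the two samples is not required here, because the paired test works on the differences rather than on two separate groups.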
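A minimal sketch of how the normality check on the differences might look. The vectors pre and post are simulated stand-ins, not the midterm data; with the real file they would presumably be data$Pre.MPA and data$Post.MPA.

```r
# Simulated stand-in data (hypothetical; NOT the values in midterm2.csv)
set.seed(1)
pre  <- rnorm(36, mean = 10, sd = 2)
post <- pre + rnorm(36, mean = 3, sd = 1.5)

# The paired t-test works on the differences, so normality is checked on them
diffs <- post - pre

# Visual checks: histogram and normal Q-Q plot of the differences
hist(diffs, main = "Paired differences (Post - Pre)")
qqnorm(diffs)
qqline(diffs)

# Formal check: Shapiro-Wilk test (H0: the differences are normally distributed)
sw <- shapiro.test(diffs)
sw$p.value
```

A p-value above 0.05 would mean there is no evidence against normality of the differences.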

c

# Boxplot to visualize the data
boxplot(data$Pre.MPA, data$Post.MPA, names = c("Pre-MPA", "Post-MPA"), col = c("cyan", "blue"), main = "Fish Species Pre and Post MPA")

d

\(H_0\): \(\mu_d\) = 0

\(H_a\): \(\mu_d\) \(\neq\) 0

e

d.bar <- data$Post.MPA - data$Pre.MPA # paired differences (Post - Pre); despite the name, this is the whole vector, not its mean

mean.diff <- mean(d.bar)
mean.diff

## [1] 3.284249

sd.diff <- sd(d.bar)

variance <- sd.diff^2

n <- length(data$Pre.MPA) #aka 36

denominator <- sqrt(variance/n)

denominator

## [1] 0.2838918

t.calc <- mean.diff/denominator

t.calc

## [1] 11.56867
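
For reference, the calculation above can be written as the paired t statistic. The symbols \(\bar{d}\) (mean of the differences), \(s_d\) (standard deviation of the differences), and \(n\) (number of pairs) are shorthand introduced here; in the code they correspond to mean.diff, sd.diff, and n.

\[ t = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{3.284249}{0.2838918} \approx 11.57 \]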

f

t.critical <- qt(0.975, 35)

t.calc #greater than t-critical

## [1] 11.56867

t.critical

## [1] 2.030108
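
As a rough cross-check, the same decision can be reached from the p-value instead of the critical value. The sketch below hard-codes the t statistic printed above rather than reading the CSV; the name deg.free is introduced here for illustration.

```r
# Value copied from the output above
t.calc   <- 11.56867
deg.free <- 35            # n - 1 = 36 - 1

# Two-sided critical value at alpha = 0.05
t.critical <- qt(0.975, deg.free)

# Two-sided p-value: probability of a |t| at least this large under H0
p.value <- 2 * pt(abs(t.calc), deg.free, lower.tail = FALSE)

abs(t.calc) > t.critical   # TRUE means reject H0 at the 5% level
p.value
```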

g

Since the absolute value of t.calc (11.57) is greater than t.critical (2.03), we reject the null hypothesis and conclude that there is a significant difference in fish biodiversity before and after the MPA.

h

t.test(data$Pre.MPA, data$Post.MPA, paired = TRUE)

## 
##  Paired t-test
## 
## data:  data$Pre.MPA and data$Post.MPA
## t = -11.569, df = 35, p-value = 1.64e-13
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -3.860580 -2.707918
## sample estimates:
## mean difference 
##       -3.284249

i

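Since the p-value (1.64e-13) is < 0.05, we reject the null hypothesis and conclude that there is a statistically significant difference in fish biodiversity before and after the MPA. The t statistic is negative here only because t.test() was given Pre.MPA first and so computes Pre - Post, whereas part e computed Post - Pre; the magnitude (11.57) and the conclusion are the same.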
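
The confidence interval reported by t.test() can also be reproduced by hand from the numbers in parts e and f. This is a sketch using the printed values only; the names std.error, ci.post.minus.pre, and ci.pre.minus.post are introduced here for illustration.

```r
# Values copied from parts e and f above
mean.diff  <- 3.284249    # mean of (Post - Pre)
std.error  <- 0.2838918   # sd of the differences / sqrt(n)
t.critical <- qt(0.975, 35)

# 95% confidence interval for the mean difference (Post - Pre)
ci.post.minus.pre <- mean.diff + c(-1, 1) * t.critical * std.error
ci.post.minus.pre

# t.test(Pre, Post) works with Pre - Post, so its interval is the mirror image
ci.pre.minus.post <- -rev(ci.post.minus.pre)
ci.pre.minus.post
```

Neither interval contains 0, which agrees with rejecting the null hypothesis.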

Part 2

data2 <- read.csv("midterm2b.csv")

a

One-way Analysis of Variance (ANOVA)

result_habitat <- aov(Post.MPA ~ substrate, data = data2)

b

The assumptions are:

Normality of residuals

hist(result_habitat$residuals) # not perfectly bell-shaped, so let's double-check with Shapiro-Wilk

shapiro.test(residuals(result_habitat)) # p > 0.05, so we fail to reject normality of the residuals; close enough!

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(result_habitat)
## W = 0.94771, p-value = 0.08856

Homogeneity of variances

# Levene's test (leveneTest() comes from the car package)
library(car)
leveneTest(Post.MPA ~ substrate, data = data2) # p > 0.05, so the variances are equal enough

## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2  1.1281 0.3358
##       33
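
To show what the median-centered Levene's test is doing, here is a base-R sketch that computes it by hand as a one-way ANOVA on absolute deviations from the group medians. The data frame sim is simulated and purely illustrative; its equal group sizes and its values are assumptions, not the contents of midterm2b.csv.

```r
# Simulated stand-in data (hypothetical; NOT the values in midterm2b.csv)
set.seed(2)
sim <- data.frame(
  substrate = factor(rep(c("bedrock", "boulders", "cobbles"), each = 12)),
  Post.MPA  = rnorm(36, mean = 12, sd = 1)
)

# Absolute deviation of each observation from its own group's median
group.median <- ave(sim$Post.MPA, sim$substrate, FUN = median)
sim$abs.dev  <- abs(sim$Post.MPA - group.median)

# One-way ANOVA on those deviations: its F test is Levene's test (center = median)
levene.by.hand <- summary(aov(abs.dev ~ substrate, data = sim))
levene.by.hand
```

Storing substrate as a factor, as in this sketch, is also the usual way to avoid the "group coerced to factor" warning shown above.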

c

# Boxplot to visualize the data
boxplot(Post.MPA ~ substrate, data = data2, col = c("purple", "red", "cyan"), main = "Fish Species by Substrate Type in MPA")

d

\(H_0\): \(\mu_1\) = \(\mu_2\) = \(\mu_3\)

\(H_a\): At least 1 mean is not equal to the others.

e

result_habitat <- aov(Post.MPA ~ substrate, data = data2)

# Display the ANOVA table
summary(result_habitat)

##             Df Sum Sq Mean Sq F value Pr(>F)
## substrate    2   1.49  0.7451   0.607  0.551
## Residuals   33  40.49  1.2269
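
The F value and p-value in the table can be recovered from the two mean squares. The sketch below uses only the rounded numbers printed above, so its results match the table to about three decimal places; the variable names are introduced here for illustration.

```r
# Values copied from the ANOVA table above
ms.substrate <- 0.7451   # Mean Sq for substrate (between groups)
ms.residual  <- 1.2269   # Mean Sq for residuals (within groups)

# F is the ratio of the between-group to the within-group mean square
f.calc <- ms.substrate / ms.residual
f.calc

# Upper-tail p-value on (2, 33) degrees of freedom
p.value <- pf(f.calc, df1 = 2, df2 = 33, lower.tail = FALSE)
p.value

# Critical F at alpha = 0.05 for comparison
f.critical <- qf(0.95, df1 = 2, df2 = 33)
f.calc > f.critical   # FALSE means fail to reject H0
```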

f

Since the p-value (0.551) is > 0.05, we fail to reject the null hypothesis and conclude that there is no statistically significant difference in mean post-MPA fish biodiversity among the substrate types within the MPA.

g

The post-hoc test below shows the same thing: every pairwise adjusted p-value (p adj column) is well above 0.05, so there is no statistical evidence to suggest that the substrate matters, at least within this dataset.

# Tukey's HSD for post-hoc analysis
TukeyHSD(result_habitat)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Post.MPA ~ substrate, data = data2)
## 
## $substrate
##                         diff        lwr       upr     p adj
## boulders-bedrock  0.39703621 -0.7905438 1.5846162 0.6932255
## cobbles-bedrock  -0.08648996 -1.1654225 0.9924426 0.9789093
## cobbles-boulders -0.48352617 -1.5931454 0.6260930 0.5394916

Part 3

All answers will vary, but here are some possible answers:

Test 1: Comparison of Fish Species between MPA and Open Control (OC)

\(H_0:\) There is no difference in the mean number of fish species between the MPA and Open Control (OC) groups.

\(H_a:\) There is a difference in the mean number of fish species between the MPA and OC groups.

Steps:

Use a two-sample t-test to compare the mean number of fish species between the MPA and OC groups.
Function in R: t.test(Pre.MPA, OC, paired = FALSE).
Interpret the p-value; if it’s less than the significance level (e.g., 0.05), you may reject the null hypothesis.
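
The steps above could be sketched as follows. The vectors mpa.species and oc.species are simulated placeholders (drawn as continuous values for simplicity), since the actual column names and data for this comparison are not shown here.

```r
# Simulated stand-in data (hypothetical; real column names and values may differ)
set.seed(3)
mpa.species <- rnorm(20, mean = 14, sd = 2)   # number of species at MPA sites
oc.species  <- rnorm(20, mean = 12, sd = 2)   # number of species at Open Control sites

# Independent two-sample t-test; R's default (var.equal = FALSE) is Welch's version
result <- t.test(mpa.species, oc.species, paired = FALSE)
result$statistic
result$p.value

# Decision at the 5% significance level
if (result$p.value < 0.05) "reject H0" else "fail to reject H0"
```

Because t.test() defaults to Welch's test, equal variances between the two groups are not assumed; setting var.equal = TRUE would give the classic pooled-variance version instead.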

Test 2: Relationship between Fish and Invertebrate Taxa

\(H_0:\) There is no correlation between the number of fish species and the number of invertebrate taxa.

\(H_a:\) There is a correlation (positive or negative) between the number of fish species and the number of invertebrate taxa.
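
One possible way to carry out Test 2 is a Pearson correlation test with cor.test(). The sketch below uses simulated vectors named fish.species and invert.taxa; both the names and the values are assumptions for illustration only.

```r
# Simulated stand-in data (hypothetical; real column names and values may differ)
set.seed(4)
fish.species <- rnorm(30, mean = 12, sd = 2)
invert.taxa  <- 5 + 1.5 * fish.species + rnorm(30, sd = 3)

# Scatterplot to inspect the relationship before testing it
plot(fish.species, invert.taxa,
     xlab = "Number of fish species", ylab = "Number of invertebrate taxa")

# Pearson correlation test (H0: the true correlation is 0)
result <- cor.test(fish.species, invert.taxa, method = "pearson")
result$estimate   # sample correlation coefficient r
result$p.value
```

As with Test 1, a p-value below the chosen significance level (e.g., 0.05) would lead to rejecting the null hypothesis. If the scatterplot suggested a non-linear but monotonic relationship, method = "spearman" would be a reasonable alternative.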