Problem 1: Climate change in the coastal ocean

benguela <- read.csv("http://faraway.neu.edu/biostats/assn3_benguela.csv")
california <- read.csv("http://faraway.neu.edu/biostats/assn3_california.csv")
canary <- read.csv("http://faraway.neu.edu/biostats/assn3_canary.csv")
humboldt <- read.csv("http://faraway.neu.edu/biostats/assn3_humboldt.csv")

1.1

california$period <- ifelse(california$year<2025, "before", "after")
benguela$period <- ifelse(benguela$year<2025, "before", "after")
canary$period <- ifelse(canary$year<2025, "before", "after")
humboldt$period <- ifelse(humboldt$year<2025, "before", "after")

1.2

# Multimodel mean for California
california.mean <- data.frame(year=california$year, period=california$period, upwelling=rowMeans(california[,1:22]))
# Multimodel mean for Canary
canary.mean <- data.frame(year=canary$year, period=canary$period, upwelling=rowMeans(canary[,1:22]))
# Multimodel mean for Humboldt
humboldt.mean <- data.frame(year=humboldt$year, period=humboldt$period, upwelling=rowMeans(humboldt[,1:22]))
# Multimodel mean for Benguela
benguela.mean <- data.frame(year=benguela$year, period=benguela$period, upwelling=rowMeans(benguela[,1:22]))

1.3

H0: There is no significant difference between the California EBCS’ distribution and a normal distribution. HA: There is a significant difference between the California EBCS’ distribution and a normal distribution.

H0: There is no significant difference between the Canary EBCS’ distribution and a normal distribution. HA: There is a significant difference between the Canary EBCS’ distribution and a normal distribution.

H0: There is no significant difference between the Humboldt EBCS’ distribution and a normal distribution. HA: There is a significant difference between the Humboldt EBCS’ distribution and a normal distribution.

H0: There is no significant difference between the Benguela EBCS’ distribution and a normal distribution. HA: There is a significant difference between the Benguela EBCS’ distribution and a normal distribution.

# do shapiro-wilk test for normality on all 4 EBCS
shapiro.test(california.mean$upwelling)
## 
##  Shapiro-Wilk normality test
## 
## data:  california.mean$upwelling
## W = 0.99364, p-value = 0.7527
shapiro.test(canary.mean$upwelling)
## 
##  Shapiro-Wilk normality test
## 
## data:  canary.mean$upwelling
## W = 0.9555, p-value = 9.646e-05
shapiro.test(humboldt.mean$upwelling)
## 
##  Shapiro-Wilk normality test
## 
## data:  humboldt.mean$upwelling
## W = 0.97777, p-value = 0.01559
shapiro.test(benguela.mean$upwelling)
## 
##  Shapiro-Wilk normality test
## 
## data:  benguela.mean$upwelling
## W = 0.98166, p-value = 0.04288

4 tests need to be done, one for each EBCS. After the Shapiro-Wilk test, we must reject normality at an alpha of 0.05 in of the 4 EBCS. This means non paremetric tests have be used. One potential downside of using non-parametric tests is the fact that they are less powerful, and therefore less efficient as parametric tests, because they are distribution free. This means that the results may not provide an accurate answer. The results must be more extreme to reject the null hypothesis, leading to a higher chance of a type 2 error.

1.4

H0: The median of the population of differences before and after 2024 in the California EBCS is zero. HA: The median of the population of differences before and after 2024 in the California EBCS is significantly different than zero.

# wilcox test for california region
cal.before = subset(california.mean, period == "before", upwelling)
cal.after = subset(california.mean, period == "after", upwelling)
wilcox.test(cal.before$upwelling, cal.after$upwelling, paired = TRUE)
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  cal.before$upwelling and cal.after$upwelling
## V = 707, p-value = 0.0001514
## alternative hypothesis: true location shift is not equal to 0

Since p is 0.0001514, we must reject the null hypothesis at an alpha of 0.05. There is a significent difference in the emedian of the population before and after 2024 in the California EBCS.

H0: The median of the population of differences before and after 2024 in the Canary EBCS is zero. HA: The median of the population of differences before and after 2024 in the Canary EBCS is significantly different than zero.

# wilcox test for canary region
can.before = subset(canary.mean, period == "before", upwelling)
can.after = subset(canary.mean, period == "after", upwelling)
wilcox.test(can.before$upwelling, can.after$upwelling, paired = TRUE)
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  can.before$upwelling and can.after$upwelling
## V = 2, p-value = 5.841e-14
## alternative hypothesis: true location shift is not equal to 0

Since p is 5.841e-14, we must reject the null hypothesis at an alpha of 0.05. There is a significent difference in the meedian of the population before and after 2024 in the Canary EBCS.

H0: The median of the population of differences before and after 2024 in the Humboldt EBCS is zero. HA: The median of the population of differences before and after 2024 in the Humboldt EBCS is significantly different than zero.

# wilcox test for humboldt region
hmb.before = subset(humboldt.mean, period == "before", upwelling)
hmb.after = subset(humboldt.mean, period == "after", upwelling)
wilcox.test(hmb.before$upwelling, hmb.after$upwelling, paired = TRUE)
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  hmb.before$upwelling and hmb.after$upwelling
## V = 1, p-value = 5.61e-14
## alternative hypothesis: true location shift is not equal to 0

Since p is 5.61e-14, we must reject the null hypothesis at an alpha of 0.05. There is a significent difference in the meedian of the population before and after 2024 in the Humboldt EBCS.

H0: The median of the population of differences before and after 2024 in the Benguela EBCS is zero. HA: The median of the population of differences before and after 2024 in the Benguela EBCS is significantly different than zero.

# wilcox test for benguela region
ben.before = subset(benguela.mean, period == "before", upwelling)
ben.after = subset(benguela.mean, period == "after", upwelling)
wilcox.test(ben.before$upwelling, ben.after$upwelling, paired = TRUE)
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  ben.before$upwelling and ben.after$upwelling
## V = 0, p-value = 5.388e-14
## alternative hypothesis: true location shift is not equal to 0

Since p is 5.388e-14, we must reject the null hypothesis at an alpha of 0.05. There is a significent difference in the meedian of the population before and after 2024 in the Benguela EBCS.

1.5

# Compute group means and stderrs for California
mean.upwelling.california <- aggregate(upwelling ~ period, FUN=mean, data=california.mean)
sderr.upwelling.california <- aggregate(upwelling ~ period, FUN=function (x) sd(x)/sqrt(length(x)), data=california.mean)
# Compute group means and stderrs for Canary
mean.upwelling.canary <- aggregate(upwelling ~ period, FUN=mean, data=canary.mean)
sderr.upwelling.canary <- aggregate(upwelling ~ period, FUN=function (x) sd(x)/sqrt(length(x)), data=canary.mean)
# Compute group means and stderrs for Humboldt
mean.upwelling.humboldt <- aggregate(upwelling ~ period, FUN=mean, data=humboldt.mean)
sderr.upwelling.humboldt <- aggregate(upwelling ~ period, FUN=function (x) sd(x)/sqrt(length(x)), data=humboldt.mean)
# Compute group means and stderrs for Benguela
mean.upwelling.benguela <- aggregate(upwelling ~ period, FUN=mean, data=benguela.mean)
sderr.upwelling.benguela <- aggregate(upwelling ~ period, FUN=function (x) sd(x)/sqrt(length(x)), data=benguela.mean)

1.6

# Combine data into a single means dataframe
upwelling.mean <- data.frame(region=c("Canary", "California", "Benguela", "Humboldt"),
  after=c(mean.upwelling.canary$upwelling[2],mean.upwelling.california$upwelling[2],
        mean.upwelling.benguela$upwelling[2],mean.upwelling.humboldt$upwelling[2]),
  before=c(mean.upwelling.canary$upwelling[1],mean.upwelling.california$upwelling[1],
         mean.upwelling.benguela$upwelling[1],mean.upwelling.humboldt$upwelling[1]))
# Combine data into a single stderrs dataframe
upwelling.sderr <- data.frame(region=c("Canary", "California", "Benguela", "Humboldt"),
  after=c(sderr.upwelling.canary$upwelling[2], sderr.upwelling.california$upwelling[2],
        sderr.upwelling.benguela$upwelling[2],sderr.upwelling.humboldt$upwelling[2]),
  before=c(sderr.upwelling.canary$upwelling[1],sderr.upwelling.california$upwelling[1],
         sderr.upwelling.benguela$upwelling[1],sderr.upwelling.humboldt$upwelling[1]))

# plot the barplot of the before and after for each region
bp <- barplot(t(upwelling.mean[, 2:3]), beside = T, names = c("Canary", "California", "Benguela", "Humboldt"),
              col=c("blue", "red"),ylim=c(0,1))

legend(x = "topright", legend = c("After", "Before"), fill = c("blue", "red")) 

# plot the error bars 
arrows(x0 = bp, x1 = bp, y0 = t(upwelling.mean[, 2:3]) - t(upwelling.sderr[, 2:3]), y1 = t(upwelling.mean[, 2:3]) + 
         t(upwelling.sderr[, 2:3]), angle = 90, len = 0.1, code = 3)

1.7

Upwelling tends to decrease, and the mean upwelling does decrease, in each of the four regions tested in today’s data. The mean Upwelling drop seems significant as well, with a significant visual drop in three of the four regions, and a slight one in California.

1.8

One test that could be used is Levene’s test, which compares the variances of multiple groups and calculates the statistic W. It does not assume or require the normality of the data. It assumes that both samples are random and independent. Another test that could be used in the F test. It calculates the ratio between the variances of 2 groups and compares this value to 1, and follows the F distribution. It assumes that data are normally distributed and that samples are independent from one another.

1.9

H0: There is no significant difference in variance before and after 2024 in the California EBCS. HA: There is a significant difference in variance before and after 2024 in the California EBCS.

# levene test for california region
leveneTest(upwelling ~ period, california.mean)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value  Pr(>F)  
## group   1  5.7206 0.01802 *
##       148                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There is a significant different in variance before and after 2024 in the California EBCS at an alpha of 0.5, as the p-value is 0.01802.

H0: There is no significant difference in variance before and after 2024 in the Canary EBCS. HA: There is a significant difference in variance before and after 2024 in the Canary EBCS.

# levene test for canary region
leveneTest(upwelling ~ period, canary.mean)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value  Pr(>F)  
## group   1  6.6044 0.01116 *
##       148                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There is a significant different in variance before and after 2024 in the Canary EBCS at an alpha of 0.5, as the p-value is 0.01116.

H0: There is no significant difference in variance before and after 2024 in the Humboldt EBCS. HA: There is a significant difference in variance before and after 2024 in the Humboldt EBCS.

# levene test for humboldt region
leveneTest(upwelling ~ period, humboldt.mean)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value  Pr(>F)  
## group   1  5.3757 0.02179 *
##       148                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There is a significant different in variance before and after 2024 in the Humboldt EBCS at an alpha of 0.5, as the p-value is 0.02179.

H0: There is no significant difference in variance before and after 2024 in the Benguela EBCS. HA: There is a significant difference in variance before and after 2024 in the Benguela EBCS.

# levene test for benguela region
leveneTest(upwelling ~ period, benguela.mean)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   1  0.5294  0.468
##       148

There is no significant different in variance before and after 2024 in the Benguela EBCS at an alpha of 0.5, as the p-value is 0.468.

Problem 2: Quantifying the rate of change in upwelling

2.1

The proper analysis in this situation is linear regression. Since we are measuring upwelling vs time, we have many individual data points that we are trying to determine the trend of.

2.2

The assumptions of linear regression are: 1. At each value of X, there is a population of possible Y-values whose mean lies on the true regression line 2. At each value of X, the distribution of Y is normal 3. The variance of Y values is the same at each value of X 4. At each value of X, the Y measurements are a random sample from the population of possible Y-values

H0: There is no significant linear regression model for the relation between upwelling and years in the California EBSC. HA: There is a significant linear regression model for the relation between upwelling and years in the California EBSC.

# lin regression test for california region
cal.lm = lm(upwelling ~ year, data=california.mean)
summary(cal.lm)
## 
## Call:
## lm(formula = upwelling ~ year, data = california.mean)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.038587 -0.009693 -0.000679  0.008362  0.040669 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.040e-01  5.509e-02   1.887   0.0611 .  
## year        1.198e-04  2.721e-05   4.404 2.02e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01443 on 148 degrees of freedom
## Multiple R-squared:  0.1159, Adjusted R-squared:  0.1099 
## F-statistic:  19.4 on 1 and 148 DF,  p-value: 2.023e-05

At an alpha of 0.05, we can reject the null hypothesis, as the p value is 2.02e-05, less than 0.05. There is a significant linear regression model for the relation between upwelling and years in the California EBSC.

H0: There is no significant linear regression model for the relation between upwelling and years in the Canary EBSC. HA: There is a significant linear regression model for the relation between upwelling and years in the Canary EBSC.

can.lm = lm(upwelling ~ year, data=canary.mean)
summary(can.lm)
## 
## Call:
## lm(formula = upwelling ~ year, data = canary.mean)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.066392 -0.016333 -0.001149  0.016434  0.065723 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.9472689  0.0935523  -20.82   <2e-16 ***
## year         0.0012185  0.0000462   26.38   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0245 on 148 degrees of freedom
## Multiple R-squared:  0.8246, Adjusted R-squared:  0.8234 
## F-statistic: 695.6 on 1 and 148 DF,  p-value: < 2.2e-16

At an alpha of 0.05, we can reject the null hypothesis, as the p value is less than 2e-16, less than 0.05. There is a significant linear regression model for the relation between upwelling and years in the Canary EBSC.

H0: There is no significant linear regression model for the relation between upwelling and years in the Humboldt EBSC. HA: There is a significant linear regression model for the relation between upwelling and years in the Humboldt EBSC.

hmb.lm = lm(upwelling ~ year, data=humboldt.mean)
summary(hmb.lm)
## 
## Call:
## lm(formula = upwelling ~ year, data = humboldt.mean)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.101277 -0.014199  0.000978  0.017080  0.075162 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.472e+00  1.015e-01  -24.35   <2e-16 ***
## year         1.430e-03  5.013e-05   28.52   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02659 on 148 degrees of freedom
## Multiple R-squared:  0.846,  Adjusted R-squared:  0.845 
## F-statistic: 813.2 on 1 and 148 DF,  p-value: < 2.2e-16

At an alpha of 0.05, we can reject the null hypothesis, as the p value is less than 2e-16, less than 0.05. There is a significant linear regression model for the relation between upwelling and years in the Humboldt EBSC.

H0: There is no significant linear regression model for the relation between upwelling and years in the Humboldt EBSC. HA: There is a significant linear regression model for the relation between upwelling and years in the Humboldt EBSC.

ben.lm = lm(upwelling ~ year, data=benguela.mean)
summary(ben.lm)
## 
## Call:
## lm(formula = upwelling ~ year, data = benguela.mean)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.053242 -0.019020  0.001129  0.017508  0.083227 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.796e+00  1.031e-01  -17.41   <2e-16 ***
## year         1.316e-03  5.093e-05   25.84   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02701 on 148 degrees of freedom
## Multiple R-squared:  0.8185, Adjusted R-squared:  0.8173 
## F-statistic: 667.6 on 1 and 148 DF,  p-value: < 2.2e-16

At an alpha of 0.05, we can reject the null hypothesis, as the p value is less than 2e-16, less than 0.05. There is a significant linear regression model for the relation between upwelling and years in the Benguela EBSC.

2.3

# plot year vs upwelling
plot(california.mean$upwelling ~ california.mean$year, xlab = "Year", ylab = "Upwelling", main="Upwelling vs Year", col="red", ylim=c(0.2,1))
abline(cal.lm, col="red")

points(canary.mean$upwelling ~ canary.mean$year, col="orange")
abline(can.lm, col="orange")

points(humboldt.mean$upwelling ~ humboldt.mean$year, col="blue")
abline(hmb.lm, col="blue")

points(benguela.mean$upwelling ~ benguela.mean$year, col="purple")
abline(ben.lm, col="purple")

#plot the legend
legend(x = "topright", legend = c("California", "Canary", "Humboldt", "Benguela"), fill = c("red", "orange", "blue", "purple")) 

Three of the EBCS (Canary, Humboldt, Benguela) have a very noticable linear increase in upwelling over the 150 year period. There is a relatively clear positive linear trend between those three regions. California has a calculated positive linear trend, and while the graph shows a visual linear trend, it is difficult to make out whether is it positive or not, and it seems close to 0. This is clear when looking at the regression summary data, as the slope is very, very small. While the slopes for the other three conditions are all small, they are over 10 times larger than California’s slope.