library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.5.1
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(marginaleffects)
Warning: package 'marginaleffects' was built under R version 4.5.1
library(gglm)
Warning: package 'gglm' was built under R version 4.5.1
Results
Climatology comparison
The climatologies of the Index and the Mean Monthly Strength were significantly correlated (Pearson correlation = 0.673, p = 0.0165).
Climatology comparison between EAC CCI and Mean Monthly Strength
Scatterplot of the climatologies
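As a rough check, the p-value of a Pearson correlation can be recovered from the coefficient and the number of paired values through the t distribution, which is what `cor.test()` does internally. The sketch below assumes that each climatology has one value per calendar month (n = 12); that sample size is an assumption, not something stated in the report, and `pearson_p` is a helper name introduced here for illustration.

```r
# Two-sided p-value for a Pearson correlation r based on n paired values,
# using the t distribution with n - 2 degrees of freedom.
pearson_p <- function(r, n) {
  df <- n - 2
  t_stat <- r * sqrt(df) / sqrt(1 - r^2)
  2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
}

# r is the reported coefficient; n = 12 is an ASSUMPTION (one value per month).
p_clim <- pearson_p(r = 0.673, n = 12)
round(p_clim, 4)
```

Under that assumption the result agrees with the reported p = 0.0165 to within rounding of the coefficient.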
Relationship between Index values and Mean Monthly Strength
The raw values of the two were also compared and showed a statistically significant correlation (Pearson = 0.497, p < 0.001).
Scatterplot of Index values against Mean Monthly Velocity values
To further explore the relationship between the output of the Index and the calculated mean monthly strength values, a series of linear regression models was fitted.
Raw: EAC strength vs. copepod index
# data wrangling
# Extract data sources
raw_data <- readRDS(file.path("var", "raw_data_list.rds"))
raw_strength_data <- raw_data$raw_strength_data
raw_index_data <- raw_data$raw_index_data
str_clim <- raw_data$str_clim

# combine raw data values
comb_raw_data <- raw_strength_data %>%
  full_join(raw_index_data, by = "date") %>%
  mutate(
    month = month(date) # extract month value
  ) %>%
  rename(mean_strength = mean_vel) %>%
  filter(!is.na(mean_strength) & !is.na(eac_cci))

comb_raw_data <- comb_raw_data %>%
  inner_join(str_clim, by = "month")

head(comb_raw_data)
There are no samples taken in January, so the data only covers the period from February to December.
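A gap like this can be confirmed directly from the sampling dates. The following is a minimal sketch on hypothetical dates (not the study data); with the real data, the `date` column of `comb_raw_data` would take the place of `sample_dates`.

```r
# Hypothetical sampling dates (NOT the study data): one per month, February to December.
sample_dates <- seq(as.Date("2015-02-01"), as.Date("2015-12-01"), by = "month")

# Calendar months that appear at least once in the data
sampled_months <- as.integer(format(sample_dates, "%m"))

# Calendar months with no samples at all
missing_months <- setdiff(1:12, sampled_months)
month.name[missing_months]
```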
A simple linear regression was conducted to examine the relationship between the Index values and the mean monthly strength.
# create the model
cci_str_model <- lm(eac_cci ~ mean_strength, data = comb_raw_data)
summary(cci_str_model)
Call:
lm(formula = eac_cci ~ mean_strength, data = comb_raw_data)
Residuals:
Min 1Q Median 3Q Max
-0.242219 -0.060886 -0.001213 0.051507 0.297174
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.19661 0.04973 -3.953 0.000219 ***
mean_strength 0.77412 0.18080 4.282 7.34e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1071 on 56 degrees of freedom
Multiple R-squared: 0.2466, Adjusted R-squared: 0.2332
F-statistic: 18.33 on 1 and 56 DF, p-value: 7.337e-05
The overall model was statistically significant, F(1, 56) = 18.33, p < 0.001, explaining approximately 23% of the variance in the dependent variable (R² = 0.247, adjusted R² = 0.233).
Residuals were symmetrically distributed with a standard error of 0.107 (df = 56), ranging from -0.242 to 0.297, suggesting that the model does not under- or overfit.
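The gap between R² and adjusted R² follows from the usual penalty for the number of predictors. The sketch below recomputes the adjusted value from the rounded R² in the summary above; `adj_r2` is a helper name introduced here, and n = 58 is inferred from the 56 residual degrees of freedom plus the two estimated coefficients.

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# n = 56 residual df + 2 coefficients (intercept and slope) = 58
adj_model1 <- adj_r2(r2 = 0.2466, n = 58, p = 1)
round(adj_model1, 3)
```

Because the input R² is already rounded, the result can differ from the printed 0.2332 in the fourth decimal place.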
A second regression was conducted to understand the seasonal variation.
# Index against strength climatology
# month_clim_str is the strength climatology value.
cci_str_model2 <- lm(eac_cci ~ month_clim_str, data = comb_raw_data)
summary(cci_str_model2)
Call:
lm(formula = eac_cci ~ month_clim_str, data = comb_raw_data)
Residuals:
Min 1Q Median 3Q Max
-0.237738 -0.099558 -0.004869 0.076313 0.277945
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.1538 0.0852 -1.805 0.0764 .
month_clim_str 0.6152 0.3191 1.928 0.0590 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1195 on 56 degrees of freedom
Multiple R-squared: 0.06223, Adjusted R-squared: 0.04549
F-statistic: 3.716 on 1 and 56 DF, p-value: 0.05896
This model was not statistically significant at the 0.05 level, F(1, 56) = 3.716, p = 0.059, explaining approximately 4.5% of the variance in the dependent variable (R² = 0.062, adjusted R² = 0.045).
Residuals were symmetrically distributed with a standard error of 0.12 (df = 56), ranging from -0.238 to 0.278.
Finally, we calculate the total variation in CCI that is explained by the seasonal and total variation in current. We calculate the non-seasonal component as the difference between the variance explained by the nested models.
# Index against strength climatology and monthly mean strength
cci_str_model3 <- lm(eac_cci ~ month_clim_str + mean_strength, data = comb_raw_data)
summary(cci_str_model3)
Call:
lm(formula = eac_cci ~ month_clim_str + mean_strength, data = comb_raw_data)
Residuals:
Min 1Q Median 3Q Max
-0.243860 -0.061231 -0.001909 0.052632 0.293831
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.21530 0.07878 -2.733 0.008425 **
month_clim_str 0.09862 0.32065 0.308 0.759582
mean_strength 0.74687 0.20267 3.685 0.000524 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1079 on 55 degrees of freedom
Multiple R-squared: 0.2479, Adjusted R-squared: 0.2206
F-statistic: 9.066 on 2 and 55 DF, p-value: 0.0003955
# reverse regression: mean strength against the Index
cci_str_model4 <- lm(mean_strength ~ eac_cci, data = comb_raw_data)
summary(cci_str_model4)
Call:
lm(formula = mean_strength ~ eac_cci, data = comb_raw_data)
Residuals:
Min 1Q Median 3Q Max
-0.118074 -0.048626 -0.008999 0.056730 0.149406
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.261425 0.009037 28.929 < 2e-16 ***
eac_cci 0.318600 0.074409 4.282 7.34e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.06869 on 56 degrees of freedom
Multiple R-squared: 0.2466, Adjusted R-squared: 0.2332
F-statistic: 18.33 on 1 and 56 DF, p-value: 7.337e-05
The combined model (Model 3, eac_cci ~ month_clim_str + mean_strength) was statistically significant, F(2, 55) = 9.066, p < 0.001, explaining approximately 22% of the variance in the dependent variable (R² = 0.248, adjusted R² = 0.221).
Residuals were symmetrically distributed with a standard error of 0.108 (df = 55), ranging from -0.244 to 0.294, suggesting that the model does not under- or overfit.
Model comparison
We test the significance of the non-seasonal component with ANOVA of the nested models.
# compare the two models
anova(cci_str_model2, cci_str_model3, test = "F")
Analysis of Variance Table
Model 1: eac_cci ~ month_clim_str
Model 2: eac_cci ~ month_clim_str + mean_strength
Res.Df RSS Df Sum of Sq F Pr(>F)
1 56 0.79905
2 55 0.64082 1 0.15823 13.58 0.0005237 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The comparison revealed that Model 3 provided a better fit to the data than Model 2, F(1, 55) = 13.58, p < 0.001.
This suggests that the monthly mean strength value explains more of the variance in the EAC CCI than the strength climatology alone.
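The F statistic in the ANOVA table can be reproduced by hand from the residual sums of squares of the two nested models. The sketch below uses the rounded values printed in the table, so it only approximates what `anova()` computes from the fitted objects. It also shows that, when a single term is added, F equals the square of that term's t statistic in the larger model.

```r
# Values copied from the ANOVA table above (rounded as printed)
rss_reduced <- 0.79905 # eac_cci ~ month_clim_str
df_reduced  <- 56
rss_full    <- 0.64082 # eac_cci ~ month_clim_str + mean_strength
df_full     <- 55

# Partial F test: drop in RSS per added parameter, relative to the
# residual variance of the larger model
df_diff <- df_reduced - df_full
f_stat  <- ((rss_reduced - rss_full) / df_diff) / (rss_full / df_full)
p_value <- pf(f_stat, df1 = df_diff, df2 = df_full, lower.tail = FALSE)

# t value of mean_strength in the Model 3 summary; its square matches F
t_mean_strength <- 3.685

c(F = f_stat, t_squared = t_mean_strength^2, p = p_value)
```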
Model diagnostics
Diagnostics of Model 3 show that the residuals have a fairly normal distribution.
# look at model 3
gglm(cci_str_model3)
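A numeric check can complement the diagnostic plots. The sketch below fits a model to simulated data (hypothetical values standing in for `comb_raw_data`, chosen only to resemble the scale of the estimates above) and then inspects the residual quantiles and a Shapiro-Wilk test. Using `shapiro.test()` is an addition here and not part of the original analysis.

```r
set.seed(42)

# Simulated stand-in for comb_raw_data (58 rows); NOT the study data
sim <- data.frame(mean_strength = runif(58, min = 0.1, max = 0.45))
sim$eac_cci <- -0.2 + 0.77 * sim$mean_strength + rnorm(58, sd = 0.107)

fit <- lm(eac_cci ~ mean_strength, data = sim)
res <- residuals(fit)

# Five-number summary, the same quantities summary(fit) prints under "Residuals"
quantile(res)

# Shapiro-Wilk test of normality on the residuals
sw <- shapiro.test(res)
sw$p.value
```

A p-value above the chosen threshold gives no evidence against normality. It does not prove that the residuals are normal, so the plots remain the primary diagnostic.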
A plot showing the model’s predicted values for different mean strength values, overlaid with actual raw data points to visualize the model’s fit to the observed data.
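One way such a plot could be built with base R is sketched below. The data are simulated (hypothetical, not the study data), and holding the climatology term at its mean while varying mean strength is an assumption about how the predictions were generated.

```r
set.seed(7)

# Simulated stand-in for comb_raw_data; NOT the study data
sim <- data.frame(
  month_clim_str = runif(58, min = 0.2, max = 0.35),
  mean_strength  = runif(58, min = 0.1, max = 0.45)
)
sim$eac_cci <- -0.2 + 0.1 * sim$month_clim_str +
  0.75 * sim$mean_strength + rnorm(58, sd = 0.1)

fit <- lm(eac_cci ~ month_clim_str + mean_strength, data = sim)

# Prediction grid: vary mean_strength, hold the climatology at its mean
grid <- data.frame(
  mean_strength  = seq(min(sim$mean_strength), max(sim$mean_strength),
                       length.out = 50),
  month_clim_str = mean(sim$month_clim_str)
)
pred <- cbind(grid, predict(fit, newdata = grid, interval = "confidence"))

# Raw points with the fitted line and 95% confidence band
plot(sim$mean_strength, sim$eac_cci,
     xlab = "Mean strength", ylab = "Index value")
lines(pred$mean_strength, pred$fit)
lines(pred$mean_strength, pred$lwr, lty = 2)
lines(pred$mean_strength, pred$upr, lty = 2)
```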
Climatologies of the Index and the observational data from the mooring array were compared, but there was no significant correlation (Pearson correlation = 0.204, p = 0.526).
Comparison of climatologies from the mooring array data and the EAC CCI
Scatterplot of climatologies
The velocity data were plotted first against the full index values, and then against just the index values from the Northern region of the EAC. However, while the Pearson correlation value changed, the significance did not (p = 0.08).
Scatterplot of Mean velocity against the full Index
Scatterplot of mean velocity against just the Northern region of the Index
Linear regressions were run on the data from the mooring array against the results of the full index.
# A tibble: 6 × 5
date eac_cci mean_vel vel_clim month
<date> <dbl> <dbl> <dbl> <dbl>
1 2010-08-01 -0.124 NA NA 8
2 2010-09-01 -0.144 NA NA 9
3 2010-10-01 -0.169 NA NA 10
4 2010-12-01 -0.0828 NA NA 12
5 2011-03-01 -0.0519 NA NA 3
6 2011-05-01 -0.110 NA NA 5
A linear regression was conducted to examine the relationship between the Index values and the observed mean monthly velocity.
# create the model
cci_moor_model <- lm(eac_cci ~ mean_vel, data = comb_moor_data)
summary(cci_moor_model)
Call:
lm(formula = eac_cci ~ mean_vel, data = comb_moor_data)
Residuals:
Min 1Q Median 3Q Max
-0.16788 -0.07334 -0.01364 0.07946 0.22983
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.01424 0.06042 0.236 0.815
mean_vel -0.01889 0.23138 -0.082 0.935
Residual standard error: 0.1048 on 33 degrees of freedom
(96 observations deleted due to missingness)
Multiple R-squared: 0.0002019, Adjusted R-squared: -0.0301
F-statistic: 0.006664 on 1 and 33 DF, p-value: 0.9354
The overall model was not statistically significant, F(1, 33) = 0.007, p = 0.935, explaining very little of the variance in the dependent variable (R² = 0.0002, adjusted R² = -0.03).
Residuals were symmetrically distributed with a standard error of 0.1048 (df = 33), ranging from -0.168 to 0.230, suggesting that the model does not under- or overfit.
A second regression was conducted to understand the seasonal variation.
# Index against velocity climatology
# vel_clim is the climatology value.
cci_moor_model2 <- lm(eac_cci ~ vel_clim, data = comb_moor_data)
summary(cci_moor_model2)
Call:
lm(formula = eac_cci ~ vel_clim, data = comb_moor_data)
Residuals:
Min 1Q Median 3Q Max
-0.164764 -0.083281 -0.007283 0.077675 0.198586
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.1058 0.1063 -0.995 0.327
vel_clim 0.3085 0.2806 1.099 0.280
Residual standard error: 0.1029 on 33 degrees of freedom
(96 observations deleted due to missingness)
Multiple R-squared: 0.03533, Adjusted R-squared: 0.006097
F-statistic: 1.209 on 1 and 33 DF, p-value: 0.2796
This model was also not statistically significant, F(1, 33) = 1.209, p = 0.2796, explaining less than 1% of the variance in the dependent variable (R² = 0.035, adjusted R² = 0.006).
Residuals were symmetrically distributed with a standard error of 0.1029 (df = 33), ranging from -0.165 to 0.199, suggesting that the model does not under- or overfit.
Finally, we calculate the total variation in CCI that is explained by the seasonal and total variation in current. We calculate the non-seasonal component as the difference between the variance explained by the nested models.
# Index against velocity and climatology
cci_moor_model3 <- lm(eac_cci ~ vel_clim + mean_vel, data = comb_moor_data)
summary(cci_moor_model3)
Call:
lm(formula = eac_cci ~ vel_clim + mean_vel, data = comb_moor_data)
Residuals:
Min 1Q Median 3Q Max
-0.162752 -0.089102 -0.002646 0.077500 0.197560
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.09545 0.10942 -0.872 0.390
vel_clim 0.36527 0.30464 1.199 0.239
mean_vel -0.12650 0.24676 -0.513 0.612
Residual standard error: 0.1041 on 32 degrees of freedom
(96 observations deleted due to missingness)
Multiple R-squared: 0.04319, Adjusted R-squared: -0.01661
F-statistic: 0.7222 on 2 and 32 DF, p-value: 0.4934
This model was not statistically significant, F(2, 32) = 0.722, p = 0.493, explaining very little of the variance in the dependent variable (R² = 0.043, adjusted R² = -0.017).
Residuals were symmetrically distributed with a standard error of 0.1041 (df = 32), ranging from -0.163 to 0.198, suggesting that the model does not under- or overfit.
The results of the regression against the full index were not significant, possibly because the index covers the whole of the EAC whereas the mooring array is located only at 27°S.
Observed Velocity against the EAC CCI from the Northern part of the EAC
The regressions were then run again, using only index values from the northern part of the EAC.
# load northern Index values
cci_north_data <- readRDS(file.path("var", "eac_cci_north_list.rds"))
cci_north <- cci_north_data$cci_north

comb_moor_data_north <- comb_moor_data %>%
  full_join(cci_north, by = "date")

head(comb_moor_data_north)
# A tibble: 6 × 6
date eac_cci mean_vel vel_clim month cci_north
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2010-08-01 00:00:00 -0.124 NA NA 8 -0.125
2 2010-09-01 00:00:00 -0.144 NA NA 9 NA
3 2010-10-01 00:00:00 -0.169 NA NA 10 -0.174
4 2010-12-01 00:00:00 -0.0828 NA NA 12 NA
5 2011-03-01 00:00:00 -0.0519 NA NA 3 0.0186
6 2011-05-01 00:00:00 -0.110 NA NA 5 -0.145
First, a linear regression of the northern index against mean velocity was fitted.
# initial model
cci_moor_model_n <- lm(cci_north ~ mean_vel, data = comb_moor_data_north)
summary(cci_moor_model_n)
Call:
lm(formula = cci_north ~ mean_vel, data = comb_moor_data_north)
Residuals:
Min 1Q Median 3Q Max
-0.20088 -0.06455 0.03359 0.07295 0.16910
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.01369 0.08896 -0.154 0.879
mean_vel 0.11794 0.32733 0.360 0.722
Residual standard error: 0.11 on 23 degrees of freedom
(106 observations deleted due to missingness)
Multiple R-squared: 0.005613, Adjusted R-squared: -0.03762
F-statistic: 0.1298 on 1 and 23 DF, p-value: 0.7219
This model was not statistically significant, F(1, 23) = 0.13, p = 0.722, explaining very little of the variance in the dependent variable (R² = 0.006, adjusted R² = -0.038).
Residuals were symmetrically distributed with a standard error of 0.110 (df = 23), ranging from -0.201 to 0.169, suggesting that the model does not under- or overfit.
A second regression was conducted to understand the seasonal variation.
# Index against velocity climatology
# vel_clim is the climatology value.
cci_moor_model_n2 <- lm(cci_north ~ vel_clim, data = comb_moor_data_north)
summary(cci_moor_model_n2)
Call:
lm(formula = cci_north ~ vel_clim, data = comb_moor_data_north)
Residuals:
Min 1Q Median 3Q Max
-0.19206 -0.05308 0.00889 0.06336 0.14602
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.3195 0.1129 -2.829 0.00952 **
vel_clim 0.8992 0.2973 3.024 0.00604 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.09332 on 23 degrees of freedom
(106 observations deleted due to missingness)
Multiple R-squared: 0.2845, Adjusted R-squared: 0.2534
F-statistic: 9.145 on 1 and 23 DF, p-value: 0.006038
This model was statistically significant, F(1, 23) = 9.145, p = 0.006, explaining approximately 25% of the variance in the dependent variable (R² = 0.285, adjusted R² = 0.253).
Residuals were symmetrically distributed with a standard error of 0.0933 (df = 23), ranging from -0.192 to 0.146, suggesting that the model does not under- or overfit.
Finally, we calculate the total variation in CCI that is explained by the seasonal and total variation in current. We calculate the non-seasonal component as the difference between the variance explained by the nested models.
# Index against velocity and climatology
cci_moor_model_n3 <- lm(cci_north ~ vel_clim + mean_vel, data = comb_moor_data_north)
summary(cci_moor_model_n3)
Call:
lm(formula = cci_north ~ vel_clim + mean_vel, data = comb_moor_data_north)
Residuals:
Min 1Q Median 3Q Max
-0.19330 -0.04991 0.01429 0.07210 0.14349
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.2983 0.1223 -2.439 0.02326 *
vel_clim 0.9473 0.3171 2.988 0.00678 **
mean_vel -0.1489 0.2961 -0.503 0.62003
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.09488 on 22 degrees of freedom
(106 observations deleted due to missingness)
Multiple R-squared: 0.2926, Adjusted R-squared: 0.2283
F-statistic: 4.551 on 2 and 22 DF, p-value: 0.02219
This model was statistically significant, F(2, 22) = 4.551, p = 0.0222, explaining approximately 22.8% of the variance in the dependent variable (R² = 0.293, adjusted R² = 0.228).
Residuals were symmetrically distributed with a standard error of 0.0949 (df = 22), ranging from -0.193 to 0.143, suggesting that the model does not under- or overfit.
Model comparison
We test the significance of the non-seasonal component with ANOVA of the nested models.
# compare the two models
anova(cci_moor_model_n3, cci_moor_model_n2, test = "F")
Analysis of Variance Table
Model 1: cci_north ~ vel_clim + mean_vel
Model 2: cci_north ~ vel_clim
Res.Df RSS Df Sum of Sq F Pr(>F)
1 22 0.19803
2 23 0.20031 -1 -0.0022765 0.2529 0.62
This comparison shows that adding the mean velocity value to the model does not significantly improve the fit beyond what is explained by the climatology alone (F(1, 22) = 0.253, p = 0.62).
Model diagnostics
Diagnostics of Model 2 show that the values have a fairly normal distribution.
# look at model 2
gglm(cci_moor_model_n2)
A plot showing the model’s predicted values for different mean strength values, overlaid with actual raw data points to visualize the model’s fit to the observed data.
Warning: Removed 106 rows containing missing values or values outside the scale range
(`geom_point()`).
North vs South
Index values north and south of the separation zone were plotted against the climatology for the whole index.
# load north and south data
cci_data_n <- readRDS(file.path("var", "eac_cci_north.rds"))
index_n <- cci_data_n$month_data %>% # monthly average Index value
  mutate(date = as.Date(trip_month))

cci_data_s <- readRDS(file.path("var", "eac_cci_south.rds"))
index_s <- cci_data_s$month_data %>% # monthly average Index value
  mutate(date = as.Date(trip_month))

# Extract Velocity data for North and South
# Extract data sources
raw_data_n <- readRDS(file.path("var", "raw_data_list_n.rds"))
raw_strength_data_n <- raw_data_n$raw_strength_data_n
str_clim_n <- raw_data_n$str_clim_n

# combine raw data values
comb_raw_data_n <- raw_strength_data_n %>%
  full_join(index_n, by = "date") %>%
  mutate(
    month = month(date) # extract month value
  ) %>%
  rename(mean_strength = mean_vel) %>%
  filter(!is.na(mean_strength) & !is.na(eac_cci))

comb_raw_data_n <- comb_raw_data_n %>%
  inner_join(str_clim_n, by = "month")

head(comb_raw_data_n)
To further understand the relationship between the EAC Velocity and the EAC Index, linear regressions were conducted against the two regions north and south of the separation zone.
North of the Separation Zone
The northern region was examined first.
# create the model
cci_str_model_n <- lm(eac_cci ~ mean_strength, data = comb_raw_data_n)
summary(cci_str_model_n)
Call:
lm(formula = eac_cci ~ mean_strength, data = comb_raw_data_n)
Residuals:
Min 1Q Median 3Q Max
-0.213102 -0.067327 0.008573 0.073997 0.163753
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.22130 0.04762 -4.648 3.61e-05 ***
mean_strength 0.51525 0.10362 4.973 1.29e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.09564 on 40 degrees of freedom
Multiple R-squared: 0.382, Adjusted R-squared: 0.3666
F-statistic: 24.73 on 1 and 40 DF, p-value: 1.292e-05
The overall model was statistically significant, F(1, 40) = 24.73, p < 0.001, explaining approximately 36.7% of the variance in the dependent variable (R² = 0.382, adjusted R² = 0.367).
Residuals were symmetrically distributed with a standard error of 0.0956 (df = 40), ranging from -0.213 to 0.164.
A second regression was conducted to understand the seasonal variation.
# Index against velocity climatology
cci_str_model_n2 <- lm(eac_cci ~ month_clim_str, data = comb_raw_data_n)
summary(cci_str_model_n2)
Call:
lm(formula = eac_cci ~ month_clim_str, data = comb_raw_data_n)
Residuals:
Min 1Q Median 3Q Max
-0.288982 -0.072166 0.002886 0.077524 0.214175
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.2953 0.0923 -3.199 0.00270 **
month_clim_str 0.6754 0.2050 3.295 0.00207 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1079 on 40 degrees of freedom
Multiple R-squared: 0.2134, Adjusted R-squared: 0.1938
F-statistic: 10.85 on 1 and 40 DF, p-value: 0.002069
This model was also statistically significant, F(1, 40) = 10.85, p = 0.002, explaining approximately 19.4% of the variance in the dependent variable (R² = 0.213, adjusted R² = 0.194).
Residuals were symmetrically distributed with a standard error of 0.108 (df = 40), ranging from -0.289 to 0.214.
Finally, we calculate the total variation in CCI that is explained by the seasonal and total variation in current. We calculate the non-seasonal component as the difference between the variance explained by the nested models.
cci_str_model_n3 <- lm(eac_cci ~ month_clim_str + mean_strength, data = comb_raw_data_n)
summary(cci_str_model_n3)
Call:
lm(formula = eac_cci ~ month_clim_str + mean_strength, data = comb_raw_data_n)
Residuals:
Min 1Q Median 3Q Max
-0.237955 -0.059713 -0.000473 0.066587 0.157075
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.36248 0.08009 -4.526 5.52e-05 ***
month_clim_str 0.40046 0.18674 2.144 0.038286 *
mean_strength 0.43246 0.10649 4.061 0.000228 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.09161 on 39 degrees of freedom
Multiple R-squared: 0.4472, Adjusted R-squared: 0.4189
F-statistic: 15.78 on 2 and 39 DF, p-value: 9.55e-06
This model was also statistically significant, F(2, 39) = 15.78, p < 0.001, explaining approximately 42% of the variance in the dependent variable (R² = 0.447, adjusted R² = 0.419).
Residuals were symmetrically distributed with a standard error of 0.092 (df = 39), ranging from -0.238 to 0.157.
Model comparison
We test the significance of the non-seasonal component with ANOVA of the nested models.
# compare the models
anova(cci_str_model_n2, cci_str_model_n3, test = "F")
Analysis of Variance Table
Model 1: eac_cci ~ month_clim_str
Model 2: eac_cci ~ month_clim_str + mean_strength
Res.Df RSS Df Sum of Sq F Pr(>F)
1 40 0.46571
2 39 0.32731 1 0.13841 16.492 0.0002281 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
All three models were statistically significant and explained some of the variance in the Index values, but the model including both the mean velocity and the climatology values explained significantly more variance than the climatology alone (F(1, 39) = 16.49, p = 0.00023).
Model diagnostics
Diagnostics of Model 3 show that the values have a fairly normal distribution.
# look at model 3
gglm(cci_str_model_n3)
A plot showing the model’s predicted values for different mean strength values, overlaid with actual raw data points to visualize the model’s fit to the observed data.
The analysis was then repeated for the southern region.
# create the initial model
cci_str_model_s <- lm(eac_cci ~ mean_strength, data = comb_raw_data_s)
summary(cci_str_model_s)
Call:
lm(formula = eac_cci ~ mean_strength, data = comb_raw_data_s)
Residuals:
Min 1Q Median 3Q Max
-0.24566 -0.09180 -0.01042 0.07893 0.30184
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.15132 0.05243 -2.886 0.00553 **
mean_strength 0.75092 0.22903 3.279 0.00180 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1344 on 56 degrees of freedom
Multiple R-squared: 0.161, Adjusted R-squared: 0.1461
F-statistic: 10.75 on 1 and 56 DF, p-value: 0.001796
The initial model was statistically significant, F(1, 56) = 10.75, p = 0.0018, explaining approximately 14.6% of the variance in the dependent variable (R² = 0.161, adjusted R² = 0.146).
Residuals were symmetrically distributed with a standard error of 0.134 (df = 56), ranging from -0.246 to 0.302.
A second regression was conducted to understand the seasonal variation.
# Index against velocity climatology
cci_str_model_s2 <- lm(eac_cci ~ month_clim_str, data = comb_raw_data_s)
summary(cci_str_model_s2)
Call:
lm(formula = eac_cci ~ month_clim_str, data = comb_raw_data_s)
Residuals:
Min 1Q Median 3Q Max
-0.2432 -0.1096 -0.0283 0.1083 0.2786
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.1383 0.1012 -1.366 0.177
month_clim_str 0.6979 0.4664 1.496 0.140
Residual standard error: 0.1438 on 56 degrees of freedom
Multiple R-squared: 0.03845, Adjusted R-squared: 0.02128
F-statistic: 2.239 on 1 and 56 DF, p-value: 0.1401
This model was not statistically significant, F(1, 56) = 2.239, p = 0.1401, explaining approximately 2% of the variance in the dependent variable (R² = 0.0385, adjusted R² = 0.0213).
Residuals were symmetrically distributed with a standard error of 0.1438 (df = 56), ranging from -0.243 to 0.279.
Finally, we calculate the total variation in CCI that is explained by the seasonal and total variation in current. We calculate the non-seasonal component as the difference between the variance explained by the nested models.
cci_str_model_s3 <- lm(eac_cci ~ month_clim_str + mean_strength, data = comb_raw_data_s)
summary(cci_str_model_s3)
Call:
lm(formula = eac_cci ~ month_clim_str + mean_strength, data = comb_raw_data_s)
Residuals:
Min 1Q Median 3Q Max
-0.246275 -0.096545 -0.007547 0.082988 0.295954
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.17907 0.09638 -1.858 0.0685 .
month_clim_str 0.16433 0.47715 0.344 0.7319
mean_strength 0.71712 0.25085 2.859 0.0060 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1354 on 55 degrees of freedom
Multiple R-squared: 0.1628, Adjusted R-squared: 0.1324
F-statistic: 5.349 on 2 and 55 DF, p-value: 0.007536
This model was statistically significant, F(2, 55) = 5.349, p = 0.0075, explaining approximately 13% of the variance in the dependent variable (R² = 0.163, adjusted R² = 0.132).
Residuals were symmetrically distributed with a standard error of 0.135 (df = 55), ranging from -0.246 to 0.296.
Model comparison
We test the significance of the non-seasonal component with ANOVA of the nested models.
# compare the models
anova(cci_str_model_s2, cci_str_model_s3, test = "F")
Analysis of Variance Table
Model 1: eac_cci ~ month_clim_str
Model 2: eac_cci ~ month_clim_str + mean_strength
Res.Df RSS Df Sum of Sq F Pr(>F)
1 56 1.1585
2 55 1.0086 1 0.14988 8.1727 0.005996 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Including the mean velocity values alongside the climatology values explains a significantly larger portion of the variance than the climatology alone (F(1, 55) = 8.173, p = 0.0060).
Model diagnostics
Diagnostics of Model 3 show that the values have a fairly normal distribution.
# look at model 3
gglm(cci_str_model_s3)
A plot showing the model’s predicted values for different mean strength values, overlaid with actual raw data points to visualize the model’s fit to the observed data.
While both north and south showed a clear relationship to the mean velocity of the EAC, this relationship was stronger in the North, with the model explaining 42% of the variance in the index values in the North compared to the South where the model only explained 13% of the variance.
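The seasonal and non-seasonal contributions described earlier can be tabulated from the R² values printed in the model summaries. The sketch below copies those rounded values by hand. The region labels, including "whole EAC" for the first set of strength models, are labels introduced here for illustration.

```r
# R-squared values copied from the model summaries reported above
r2 <- data.frame(
  region      = c("whole EAC", "north", "south"),
  r2_seasonal = c(0.06223, 0.2134, 0.03845), # eac_cci ~ month_clim_str
  r2_total    = c(0.2479, 0.4472, 0.1628)    # eac_cci ~ month_clim_str + mean_strength
)

# Non-seasonal component: extra variance explained once mean_strength is added
r2$r2_non_seasonal <- r2$r2_total - r2$r2_seasonal
r2
```

On these figures, both the seasonal and the non-seasonal components are larger in the north than in the south, consistent with the comparison above.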