Index comparisons

# load libraries  
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(marginaleffects)

library(gglm)

Results

Climatology comparison

The climatologies of the Index and the Mean Monthly Strength were significantly correlated (Pearson correlation = 0.673, p = 0.0165).

Relationship between Index values and Mean Monthly Strength

The raw values of the two were also compared and showed a statistically significant correlation (Pearson correlation = 0.497, p < 0.001).
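
As a minimal sketch of how a correlation like this can be computed, the block below runs `cor.test()` on simulated stand-in data (the variable names `strength` and `index` and all simulated values are illustrative assumptions, not the study data). It also shows that, for a single predictor, the squared Pearson correlation equals the regression R², which is why 0.497² ≈ 0.247 matches the R² of the first regression model reported further down.

```r
# Illustrative data only: simulated stand-ins for the strength and index series.
set.seed(1)
strength <- runif(58, min = 0.1, max = 0.5)
index <- -0.2 + 0.8 * strength + rnorm(58, sd = 0.1)

# Pearson correlation with its two-sided p-value
ct <- cor.test(index, strength, method = "pearson")
r_value <- unname(ct$estimate)
p_value <- ct$p.value

# For a single predictor, the squared correlation equals the regression R-squared
r_squared <- summary(lm(index ~ strength))$r.squared
all.equal(r_value^2, r_squared)
```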

Scatterplot of Index values against Mean Monthly Velocity values

To further explore the relationship between the output of the Index and the calculated mean monthly strength values, several linear regression models were fitted.

Raw: EAC strength vs. copepod index

# data wrangling

# Extract data sources
raw_data <- readRDS(file.path("var", "raw_data_list.rds"))

raw_strength_data <- raw_data$raw_strength_data
raw_index_data <- raw_data$raw_index_data
str_clim <- raw_data$str_clim

# combine raw data values

comb_raw_data <- raw_strength_data %>% 
  full_join(raw_index_data, by = "date") %>%  
  mutate(
    month = month(date) # extract month value
  ) %>% 
  rename(mean_strength = mean_vel) %>% 
  filter(!is.na(mean_strength) & !is.na(eac_cci))

comb_raw_data <- comb_raw_data %>% 
  inner_join(str_clim, by = "month")


head(comb_raw_data)

# A tibble: 6 × 6
  date       mean_strength eac_cci  year month month_clim_str
  <date>             <dbl>   <dbl> <dbl> <dbl>          <dbl>
1 2011-03-01         0.375 -0.0519  2011     3          0.297
2 2011-05-01         0.195 -0.110   2011     5          0.227
3 2011-07-01         0.198 -0.0420  2011     7          0.202
4 2011-12-01         0.207 -0.0323  2011    12          0.345
5 2012-04-01         0.264  0.0490  2012     4          0.260
6 2012-09-01         0.151 -0.150   2012     9          0.249

No samples were taken in January, so the data cover only the period from February to December.
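
A quick way to check which calendar months are represented is sketched below. It uses only the six dates shown in the `head()` output above, so the list of missing months it produces is longer than for the full data set; with the full `comb_raw_data$date` column the same steps would apply.

```r
# The six dates printed by head(comb_raw_data); the full data set has more rows.
dates <- as.Date(c("2011-03-01", "2011-05-01", "2011-07-01",
                   "2011-12-01", "2012-04-01", "2012-09-01"))

# Calendar months present in the sample, and those with no observations
months_present <- sort(unique(as.integer(format(dates, "%m"))))
months_missing <- setdiff(1:12, months_present)

months_present
months_missing
```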

A simple linear regression was conducted to examine the relationship between the Index values and the mean monthly strength.

# create the model

cci_str_model <- lm(eac_cci ~ mean_strength, data = comb_raw_data)

summary(cci_str_model)


Call:
lm(formula = eac_cci ~ mean_strength, data = comb_raw_data)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.242219 -0.060886 -0.001213  0.051507  0.297174 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -0.19661    0.04973  -3.953 0.000219 ***
mean_strength  0.77412    0.18080   4.282 7.34e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1071 on 56 degrees of freedom
Multiple R-squared:  0.2466,    Adjusted R-squared:  0.2332 
F-statistic: 18.33 on 1 and 56 DF,  p-value: 7.337e-05

The overall model was statistically significant, F(1, 56) = 18.33, p < 0.001, explaining approximately 23% of the variance in the dependent variable (R² = 0.247, adjusted R² = 0.233).
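
The adjusted R² quoted here can be reproduced from the multiple R², the number of observations and the number of predictors. The sketch below uses the standard formula; the helper name `adj_r2` is hypothetical, and n = 58 is inferred from the 56 residual degrees of freedom plus two estimated coefficients.

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Model 1: eac_cci ~ mean_strength (R-squared taken from the summary above)
adj_model1 <- adj_r2(0.2466, n = 58, p = 1)
round(adj_model1, 3)
```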

Residuals were symmetrically distributed with a standard error of 0.107 (df = 56), ranging from -0.242 to 0.297, suggesting that the model does not under- or overfit.

A second regression was conducted to understand the seasonal variation.

# Index against strength climatology
cci_str_model2 <- lm(eac_cci ~ month_clim_str, data = comb_raw_data)
# month_clim_str is the strength climatology value.

summary(cci_str_model2)


Call:
lm(formula = eac_cci ~ month_clim_str, data = comb_raw_data)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.237738 -0.099558 -0.004869  0.076313  0.277945 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)  
(Intercept)     -0.1538     0.0852  -1.805   0.0764 .
month_clim_str   0.6152     0.3191   1.928   0.0590 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1195 on 56 degrees of freedom
Multiple R-squared:  0.06223,   Adjusted R-squared:  0.04549 
F-statistic: 3.716 on 1 and 56 DF,  p-value: 0.05896

This model was not statistically significant at the 0.05 level, F(1, 56) = 3.716, p = 0.059, explaining approximately 4.6% of the variance in the dependent variable (R² = 0.062, adjusted R² = 0.046).

Residuals were symmetrically distributed with a standard error of 0.12 (df = 56), ranging from -0.238 to 0.278.

Finally, we calculate the total variation in CCI that is explained by the seasonal and total variation in current. We calculate the non-seasonal component as the difference between the variance explained by the nested models.

# Index against strength climatology and monthly mean strength
cci_str_model3 <- lm(eac_cci ~ month_clim_str + mean_strength, data = comb_raw_data)

summary(cci_str_model3)


Call:
lm(formula = eac_cci ~ month_clim_str + mean_strength, data = comb_raw_data)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.243860 -0.061231 -0.001909  0.052632  0.293831 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -0.21530    0.07878  -2.733 0.008425 ** 
month_clim_str  0.09862    0.32065   0.308 0.759582    
mean_strength   0.74687    0.20267   3.685 0.000524 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1079 on 55 degrees of freedom
Multiple R-squared:  0.2479,    Adjusted R-squared:  0.2206 
F-statistic: 9.066 on 2 and 55 DF,  p-value: 0.0003955

A fourth model reverses the first, with mean strength as the response and the Index as the predictor. As expected for a single-predictor regression, R² and the F-statistic are unchanged.

cci_str_model4 <- lm(mean_strength ~ eac_cci, data = comb_raw_data)

summary(cci_str_model4)


Call:
lm(formula = mean_strength ~ eac_cci, data = comb_raw_data)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.118074 -0.048626 -0.008999  0.056730  0.149406 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.261425   0.009037  28.929  < 2e-16 ***
eac_cci     0.318600   0.074409   4.282 7.34e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.06869 on 56 degrees of freedom
Multiple R-squared:  0.2466,    Adjusted R-squared:  0.2332 
F-statistic: 18.33 on 1 and 56 DF,  p-value: 7.337e-05

The combined model (Model 3) was statistically significant, F(2, 55) = 9.066, p < 0.001, explaining approximately 22% of the variance in the dependent variable (R² = 0.248, adjusted R² = 0.221).

Residuals were symmetrically distributed with a standard error of 0.108 (df = 55), ranging from -0.244 to 0.294, suggesting that the model does not under- or overfit.

Model comparison

We test the significance of the non-seasonal component with ANOVA of the nested models.

# compare the two models 

anova(cci_str_model2, cci_str_model3, test = "F")

Analysis of Variance Table

Model 1: eac_cci ~ month_clim_str
Model 2: eac_cci ~ month_clim_str + mean_strength
  Res.Df     RSS Df Sum of Sq     F    Pr(>F)    
1     56 0.79905                                 
2     55 0.64082  1   0.15823 13.58 0.0005237 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The comparison revealed that Model 3 provided a better fit to the data than Model 2, F(1, 55) = 13.58, p < 0.001.

This suggests that the monthly mean strength value explains more of the variance in the EAC CCI than the strength climatology alone.
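
The non-seasonal component described earlier (the difference in variance explained by the nested models) and the ANOVA F-statistic can both be recovered from the reported R² values. The sketch below uses the rounded figures from the two model summaries, so it matches the ANOVA table only to rounding precision; the variable names are illustrative.

```r
# R-squared values copied from the model summaries above
r2_reduced <- 0.06223   # Model 2: eac_cci ~ month_clim_str
r2_full    <- 0.2479    # Model 3: eac_cci ~ month_clim_str + mean_strength

df_added      <- 1      # one extra predictor (mean_strength)
df_resid_full <- 55     # residual degrees of freedom of Model 3

# Share of variance attributed to the non-seasonal component
delta_r2 <- r2_full - r2_reduced

# Partial F-test for the added predictor, equivalent to the nested-model ANOVA
f_stat  <- (delta_r2 / df_added) / ((1 - r2_full) / df_resid_full)
p_value <- pf(f_stat, df_added, df_resid_full, lower.tail = FALSE)

round(c(delta_r2 = delta_r2, F = f_stat, p = p_value), 4)
```

On these figures the non-seasonal component accounts for roughly 19% of the variance in the Index, against about 6% for the seasonal climatology.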

Model diagnostics

Diagnostics of Model 3 show that the residuals have a fairly normal distribution.

# look at model 3

gglm(cci_str_model3)
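
The diagnostic plots can be complemented with a numerical check of residual normality. The sketch below is one possible approach using base R only, run on simulated data (the variables `x` and `y` are illustrative assumptions, not the study data); the Shapiro-Wilk test is a swapped-in technique, not something the analysis above reports.

```r
# Illustrative data only: a simulated single-predictor regression.
set.seed(7)
x <- runif(58, min = 0.1, max = 0.5)
y <- -0.2 + 0.75 * x + rnorm(58, sd = 0.1)
fit <- lm(y ~ x)

res <- residuals(fit)

# Shapiro-Wilk test: a small p-value would indicate departure from normality
sw <- shapiro.test(res)

# Correlation between theoretical and sample quantiles (close to 1 if normal)
qq <- qqnorm(res, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)

c(shapiro_p = sw$p.value, qq_correlation = qq_cor)
```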

The plots below show the model’s predicted values across the range of each predictor, overlaid with the raw data points to visualize the model’s fit to the observed data.

plot_predictions(cci_str_model3, condition = "mean_strength") +
  geom_point(data = comb_raw_data,
             aes(x = mean_strength, y = eac_cci))

plot_predictions(cci_str_model3, condition = "month_clim_str") +
  geom_point(data = comb_raw_data,
             aes(x = month_clim_str, y = eac_cci))

plot_predictions(cci_str_model3, condition = "mean_strength", points = 0.25)

Mooring Array

Climatologies of the Index and the observational data from the mooring array were compared; however, there was no significant correlation (Pearson correlation = 0.204, p = 0.526).

The velocity data were plotted first against the full index values, and then against only the index values from the northern region of the EAC. While the Pearson correlation value changed, the significance did not (p = 0.08).

Scatterplot of Mean velocity against the full Index

Scatterplot of mean velocity against just the Northern region of the Index

Linear regressions were run on the data from the mooring array against the results of the full index.

# load mooring array data

mooring_data <- readRDS(file.path("var", "mooring_data_list.rds"))

mooring_vel_data <- mooring_data$mooring_vel_data 
moor_clim <- mooring_data$moor_clim

comb_moor_data <- mooring_vel_data %>% 
  full_join(moor_clim, by = "date") %>% 
  mutate(
    month = month(date)
  )

head(comb_moor_data)

# A tibble: 6 × 5
  date       eac_cci mean_vel vel_clim month
  <date>       <dbl>    <dbl>    <dbl> <dbl>
1 2010-08-01 -0.124        NA       NA     8
2 2010-09-01 -0.144        NA       NA     9
3 2010-10-01 -0.169        NA       NA    10
4 2010-12-01 -0.0828       NA       NA    12
5 2011-03-01 -0.0519       NA       NA     3
6 2011-05-01 -0.110        NA       NA     5

A linear regression was conducted to examine the relationship between the Index values and the observed mean monthly velocity.

# create the model

cci_moor_model <- lm(eac_cci ~ mean_vel, data = comb_moor_data)

summary(cci_moor_model)


Call:
lm(formula = eac_cci ~ mean_vel, data = comb_moor_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.16788 -0.07334 -0.01364  0.07946  0.22983 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.01424    0.06042   0.236    0.815
mean_vel    -0.01889    0.23138  -0.082    0.935

Residual standard error: 0.1048 on 33 degrees of freedom
  (96 observations deleted due to missingness)
Multiple R-squared:  0.0002019, Adjusted R-squared:  -0.0301 
F-statistic: 0.006664 on 1 and 33 DF,  p-value: 0.9354

The overall model was not statistically significant, F(1, 33) = 0.007, p = 0.935, explaining very little of the variance in the dependent variable (R² = 0.0002, adjusted R² = -0.03).

Residuals were symmetrically distributed with a standard error of 0.1048 (df = 33), ranging from -0.168 to 0.230, suggesting that the model does not under- or overfit.
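
The summary above notes that 96 observations were deleted due to missingness, so only 35 rows (33 residual degrees of freedom plus two coefficients) entered the fit. The sketch below shows one way to check this before modelling. It uses a small hypothetical data frame `toy` with the same column names, since the full joined data are not reproduced here.

```r
# Hypothetical toy data mimicking the NA pattern produced by the full join
toy <- data.frame(
  eac_cci  = c(-0.12, -0.14, NA, 0.05, 0.02, -0.08),
  mean_vel = c(NA, 0.25, 0.30, 0.21, NA, 0.27)
)

# Rows with both variables present
n_complete <- sum(complete.cases(toy[, c("eac_cci", "mean_vel")]))

# lm() silently drops incomplete rows (default na.action = na.omit)
fit <- lm(eac_cci ~ mean_vel, data = toy)
n_used <- nobs(fit)
n_dropped <- nrow(toy) - n_used

c(complete = n_complete, used = n_used, dropped = n_dropped)
```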

A second regression was conducted to understand the seasonal variation.

# Index against velocity climatology 
cci_moor_model2 <- lm(eac_cci ~ vel_clim, data = comb_moor_data) # vel_clim is the  climatology value.  

summary(cci_moor_model2)


Call:
lm(formula = eac_cci ~ vel_clim, data = comb_moor_data)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.164764 -0.083281 -0.007283  0.077675  0.198586 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.1058     0.1063  -0.995    0.327
vel_clim      0.3085     0.2806   1.099    0.280

Residual standard error: 0.1029 on 33 degrees of freedom
  (96 observations deleted due to missingness)
Multiple R-squared:  0.03533,   Adjusted R-squared:  0.006097 
F-statistic: 1.209 on 1 and 33 DF,  p-value: 0.2796

This model was also not statistically significant, F(1, 33) = 1.209, p = 0.2796, explaining less than 1% of the variance in the dependent variable (R² = 0.035, adjusted R² = 0.006).

Residuals were symmetrically distributed with a standard error of 0.1029 (df = 33), ranging from -0.165 to 0.199, suggesting that the model does not under- or overfit.

# Index against velocity and climatology 
cci_moor_model3 <- lm(eac_cci ~ vel_clim + mean_vel, data = comb_moor_data)  

summary(cci_moor_model3)


Call:
lm(formula = eac_cci ~ vel_clim + mean_vel, data = comb_moor_data)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.162752 -0.089102 -0.002646  0.077500  0.197560 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.09545    0.10942  -0.872    0.390
vel_clim     0.36527    0.30464   1.199    0.239
mean_vel    -0.12650    0.24676  -0.513    0.612

Residual standard error: 0.1041 on 32 degrees of freedom
  (96 observations deleted due to missingness)
Multiple R-squared:  0.04319,   Adjusted R-squared:  -0.01661 
F-statistic: 0.7222 on 2 and 32 DF,  p-value: 0.4934

This model was not statistically significant, F(2, 32) = 0.722, p = 0.493, explaining very little of the variance in the dependent variable (R² = 0.043, adjusted R² = -0.017).

Residuals were symmetrically distributed with a standard error of 0.1041 (df = 32), ranging from -0.163 to 0.198, suggesting that the model does not under- or overfit.

The results of the regressions against the full index were not significant, possibly because the index covers the whole of the EAC whereas the mooring array is located only at 27°S.

Observed Velocity against the EAC CCI from the Northern part of the EAC

The regressions were then run again, using only index values from the northern part of the EAC.

# load northern Index values 
cci_north_data <- readRDS(file.path("var", "eac_cci_north_list.rds"))

cci_north <- cci_north_data$cci_north

comb_moor_data_north <- comb_moor_data %>%  
  full_join(cci_north, by = "date")

head(comb_moor_data_north)

# A tibble: 6 × 6
  date                eac_cci mean_vel vel_clim month cci_north
  <dttm>                <dbl>    <dbl>    <dbl> <dbl>     <dbl>
1 2010-08-01 00:00:00 -0.124        NA       NA     8   -0.125 
2 2010-09-01 00:00:00 -0.144        NA       NA     9   NA     
3 2010-10-01 00:00:00 -0.169        NA       NA    10   -0.174 
4 2010-12-01 00:00:00 -0.0828       NA       NA    12   NA     
5 2011-03-01 00:00:00 -0.0519       NA       NA     3    0.0186
6 2011-05-01 00:00:00 -0.110        NA       NA     5   -0.145

First, the northern index was regressed against mean velocity.

# initial model 

cci_moor_model_n <- lm(cci_north ~ mean_vel, data = comb_moor_data_north)

summary(cci_moor_model_n)


Call:
lm(formula = cci_north ~ mean_vel, data = comb_moor_data_north)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.20088 -0.06455  0.03359  0.07295  0.16910 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.01369    0.08896  -0.154    0.879
mean_vel     0.11794    0.32733   0.360    0.722

Residual standard error: 0.11 on 23 degrees of freedom
  (106 observations deleted due to missingness)
Multiple R-squared:  0.005613,  Adjusted R-squared:  -0.03762 
F-statistic: 0.1298 on 1 and 23 DF,  p-value: 0.7219

This model was not statistically significant, F(1, 23) = 0.13, p = 0.722, explaining very little of the variance in the dependent variable (R² = 0.006, adjusted R² = -0.038).

Residuals were symmetrically distributed with a standard error of 0.11 (df = 23), ranging from -0.201 to 0.169, suggesting that the model does not under- or overfit.

A second regression was conducted to understand the seasonal variation.

# Index against velocity climatology  
cci_moor_model_n2 <- lm(cci_north ~ vel_clim, data = comb_moor_data_north) # vel_clim is the  climatology value.    

summary(cci_moor_model_n2)


Call:
lm(formula = cci_north ~ vel_clim, data = comb_moor_data_north)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.19206 -0.05308  0.00889  0.06336  0.14602 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -0.3195     0.1129  -2.829  0.00952 **
vel_clim      0.8992     0.2973   3.024  0.00604 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.09332 on 23 degrees of freedom
  (106 observations deleted due to missingness)
Multiple R-squared:  0.2845,    Adjusted R-squared:  0.2534 
F-statistic: 9.145 on 1 and 23 DF,  p-value: 0.006038

This model was statistically significant, F(1, 23) = 9.145, p = 0.006, explaining approximately 25% of the variance in the dependent variable (R² = 0.285, adjusted R² = 0.253).

Residuals were symmetrically distributed with a standard error of 0.0933 (df = 23), ranging from -0.192 to 0.146, suggesting that the model does not under- or overfit.

# Index against velocity and climatology  
cci_moor_model_n3 <- lm(cci_north ~ vel_clim + mean_vel, data = comb_moor_data_north)    
summary(cci_moor_model_n3)


Call:
lm(formula = cci_north ~ vel_clim + mean_vel, data = comb_moor_data_north)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.19330 -0.04991  0.01429  0.07210  0.14349 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -0.2983     0.1223  -2.439  0.02326 * 
vel_clim      0.9473     0.3171   2.988  0.00678 **
mean_vel     -0.1489     0.2961  -0.503  0.62003   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.09488 on 22 degrees of freedom
  (106 observations deleted due to missingness)
Multiple R-squared:  0.2926,    Adjusted R-squared:  0.2283 
F-statistic: 4.551 on 2 and 22 DF,  p-value: 0.02219

This model was statistically significant, F(2, 22) = 4.551, p = 0.0222, explaining approximately 22.8% of the variance in the dependent variable (R² = 0.293, adjusted R² = 0.228).

Residuals were symmetrically distributed with a standard error of 0.0949 (df = 22), ranging from -0.193 to 0.143, suggesting that the model does not under- or overfit.

Model comparison

We test the significance of the non-seasonal component with ANOVA of the nested models.

# compare the two models   
anova(cci_moor_model_n3, cci_moor_model_n2, test = "F")

Analysis of Variance Table

Model 1: cci_north ~ vel_clim + mean_vel
Model 2: cci_north ~ vel_clim
  Res.Df     RSS Df  Sum of Sq      F Pr(>F)
1     22 0.19803                            
2     23 0.20031 -1 -0.0022765 0.2529   0.62

This comparison shows that adding the mean velocity value to the model does not explain significantly more variance than the climatology alone (F(1, 22) = 0.253, p = 0.62).
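
In the call above the larger model is passed first, which is why the table reports a negative `Df` and a negative `Sum of Sq`. The sketch below, on simulated data (all variable names and values are illustrative assumptions), shows that the F-statistic and p-value are the same in either order, so only the signs differ.

```r
# Illustrative data only: simulated climatology, velocity and index values.
set.seed(42)
n <- 25
clim  <- runif(n, min = 0.2, max = 0.5)
vel   <- clim + rnorm(n, sd = 0.1)
index <- -0.3 + 0.9 * clim + rnorm(n, sd = 0.09)

m_reduced <- lm(index ~ clim)
m_full    <- lm(index ~ clim + vel)

# Reduced model first: positive Df and Sum of Sq
a_fwd <- anova(m_reduced, m_full, test = "F")
# Full model first: signs flip, F and p-value do not
a_rev <- anova(m_full, m_reduced, test = "F")

c(F_forward = a_fwd$F[2], F_reversed = a_rev$F[2])
```

Passing the smaller model first keeps the table consistent with the other comparisons in this report.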

Model diagnostics

Diagnostics of Model 2 show that the residuals have a fairly normal distribution.

# look at model 2
gglm(cci_moor_model_n2)

A plot showing the model’s predicted values for different velocity climatology values, overlaid with actual raw data points to visualize the model’s fit to the observed data.

plot_predictions(cci_moor_model_n2, condition = "vel_clim") +
  geom_point(data = comb_moor_data_north,
             aes(x = vel_clim, y = cci_north))

Warning: Removed 106 rows containing missing values or values outside the scale range
(`geom_point()`).

North vs South

Index values north and south of the separation zone were plotted against the climatology for the whole index.

# load north and south data
cci_data_n <- readRDS(file.path("var", "eac_cci_north.rds"))


index_n <- cci_data_n$month_data %>% # monthly average Index value
  mutate(date = as.Date(trip_month))

cci_data_s <- readRDS(file.path("var", "eac_cci_south.rds"))


index_s <- cci_data_s$month_data %>% # monthly average Index value
  mutate(date = as.Date(trip_month))

# Extract Velocity data for North and South

# Extract data sources
raw_data_n <- readRDS(file.path("var", "raw_data_list_n.rds"))

raw_strength_data_n <- raw_data_n$raw_strength_data_n
str_clim_n <- raw_data_n$str_clim_n

# combine raw data values

comb_raw_data_n <- raw_strength_data_n %>% 
  full_join(index_n, by = "date") %>%  
  mutate(
    month = month(date) # extract month value
  ) %>% 
  rename(mean_strength = mean_vel) %>% 
  filter(!is.na(mean_strength) & !is.na(eac_cci))

comb_raw_data_n <- comb_raw_data_n %>% 
  inner_join(str_clim_n, by = "month")


head(comb_raw_data_n)

# A tibble: 6 × 7
  date       mean_strength trip_month          eac_cci num_samples month
  <date>             <dbl> <dttm>                <dbl>       <int> <dbl>
1 2011-03-01         0.444 2011-03-01 00:00:00  0.0186          16     3
2 2011-05-01         0.313 2011-05-01 00:00:00 -0.145           10     5
3 2011-12-01         0.332 2011-12-01 00:00:00 -0.0260          12    12
4 2012-04-01         0.500 2012-04-01 00:00:00  0.128           17     4
5 2012-09-01         0.212 2012-09-01 00:00:00 -0.173           16     9
6 2014-03-01         0.577 2014-03-01 00:00:00  0.169            9     3
# ℹ 1 more variable: month_clim_str <dbl>

# Extract data sources
raw_data_s <- readRDS(file.path("var", "raw_data_list_s.rds"))

raw_strength_data_s <- raw_data_s$raw_strength_data_s
str_clim_s <- raw_data_s$str_clim_s

# combine raw data values

comb_raw_data_s <- raw_strength_data_s %>% 
  full_join(index_s, by = "date") %>%  
  mutate(
    month = month(date) # extract month value
  ) %>% 
  rename(mean_strength = mean_vel) %>% 
  filter(!is.na(mean_strength) & !is.na(eac_cci))

comb_raw_data_s <- comb_raw_data_s %>% 
  inner_join(str_clim_s, by = "month")


head(comb_raw_data_s)

# A tibble: 6 × 7
  date       mean_strength trip_month          eac_cci num_samples month
  <date>             <dbl> <dttm>                <dbl>       <int> <dbl>
1 2011-03-01         0.353 2011-03-01 00:00:00 -0.115           18     3
2 2011-05-01         0.163 2011-05-01 00:00:00 -0.0784          11     5
3 2011-07-01         0.155 2011-07-01 00:00:00 -0.0420          10     7
4 2011-12-01         0.171 2011-12-01 00:00:00 -0.108            1    12
5 2012-04-01         0.198 2012-04-01 00:00:00 -0.0546          13     4
6 2012-09-01         0.133 2012-09-01 00:00:00 -0.122           13     9
# ℹ 1 more variable: month_clim_str <dbl>

To further understand the relationship between the EAC Velocity and the EAC Index, linear regressions were conducted against the two regions north and south of the separation zone.

North of the Separation Zone

The northern region was examined first.

# create the model

cci_str_model_n <- lm(eac_cci ~ mean_strength, data = comb_raw_data_n)

summary(cci_str_model_n)


Call:
lm(formula = eac_cci ~ mean_strength, data = comb_raw_data_n)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.213102 -0.067327  0.008573  0.073997  0.163753 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -0.22130    0.04762  -4.648 3.61e-05 ***
mean_strength  0.51525    0.10362   4.973 1.29e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.09564 on 40 degrees of freedom
Multiple R-squared:  0.382, Adjusted R-squared:  0.3666 
F-statistic: 24.73 on 1 and 40 DF,  p-value: 1.292e-05

The overall model was statistically significant, F(1, 40) = 24.73, p < 0.001, explaining approximately 36.7% of the variance in the dependent variable (R² = 0.382, adjusted R² = 0.367).

Residuals were symmetrically distributed with a standard error of 0.0956 (df = 40), ranging from -0.213 to 0.164.

A second regression was conducted to understand the seasonal variation.

# Index against velocity climatology 
cci_str_model_n2 <- lm(eac_cci ~ month_clim_str, data = comb_raw_data_n)

summary(cci_str_model_n2)


Call:
lm(formula = eac_cci ~ month_clim_str, data = comb_raw_data_n)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.288982 -0.072166  0.002886  0.077524  0.214175 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)   
(Intercept)     -0.2953     0.0923  -3.199  0.00270 **
month_clim_str   0.6754     0.2050   3.295  0.00207 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1079 on 40 degrees of freedom
Multiple R-squared:  0.2134,    Adjusted R-squared:  0.1938 
F-statistic: 10.85 on 1 and 40 DF,  p-value: 0.002069

This model was also statistically significant, F(1, 40) = 10.85, p = 0.002, explaining approximately 19.4% of the variance in the dependent variable (R² = 0.213, adjusted R² = 0.194).

Residuals were symmetrically distributed with a standard error of 0.108 (df = 40), ranging from -0.289 to 0.214.

cci_str_model_n3 <- lm(eac_cci ~ month_clim_str + mean_strength, data = comb_raw_data_n)

summary(cci_str_model_n3)


Call:
lm(formula = eac_cci ~ month_clim_str + mean_strength, data = comb_raw_data_n)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.237955 -0.059713 -0.000473  0.066587  0.157075 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -0.36248    0.08009  -4.526 5.52e-05 ***
month_clim_str  0.40046    0.18674   2.144 0.038286 *  
mean_strength   0.43246    0.10649   4.061 0.000228 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.09161 on 39 degrees of freedom
Multiple R-squared:  0.4472,    Adjusted R-squared:  0.4189 
F-statistic: 15.78 on 2 and 39 DF,  p-value: 9.55e-06

This model was also statistically significant, F(2, 39) = 15.78, p < 0.001, explaining approximately 42% of the variance in the dependent variable (R² = 0.447, adjusted R² = 0.419).

Residuals were symmetrically distributed with a standard error of 0.092 (df = 39), ranging from -0.238 to 0.157.

Model comparison

We test the significance of the non-seasonal component with ANOVA of the nested models.

# compare the models 

anova(cci_str_model_n2, cci_str_model_n3, test = "F")

Analysis of Variance Table

Model 1: eac_cci ~ month_clim_str
Model 2: eac_cci ~ month_clim_str + mean_strength
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
1     40 0.46571                                  
2     39 0.32731  1   0.13841 16.492 0.0002281 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

All three models were statistically significant and explained some of the variance in the Index values, but the model including both the mean velocity values and the climatology values explained significantly more variance than the climatology alone (F(1, 39) = 16.49, p = 0.00023).

Model diagnostics

Diagnostics of Model 3 show that the residuals have a fairly normal distribution.

# look at model 3  
gglm(cci_str_model_n3)

A plot showing the model’s predicted values for different mean strength values, overlaid with actual raw data points to visualize the model’s fit to the observed data.

plot_predictions(cci_str_model_n3, condition = "mean_strength") +
  geom_point(data = comb_raw_data_n,
             aes(x = mean_strength, y = eac_cci))

South of the separation zone

The analysis was then repeated for the southern region.

# create the initial model  
cci_str_model_s <- lm(eac_cci ~ mean_strength, data = comb_raw_data_s)  
summary(cci_str_model_s)


Call:
lm(formula = eac_cci ~ mean_strength, data = comb_raw_data_s)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.24566 -0.09180 -0.01042  0.07893  0.30184 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept)   -0.15132    0.05243  -2.886  0.00553 **
mean_strength  0.75092    0.22903   3.279  0.00180 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1344 on 56 degrees of freedom
Multiple R-squared:  0.161, Adjusted R-squared:  0.1461 
F-statistic: 10.75 on 1 and 56 DF,  p-value: 0.001796

The initial model was statistically significant, F(1, 56) = 10.75, p = 0.0018, explaining approximately 14.6% of the variance in the dependent variable (R² = 0.161, adjusted R² = 0.146).

Residuals were symmetrically distributed with a standard error of 0.134 (df = 56), ranging from -0.246 to 0.302.

A second regression was conducted to understand the seasonal variation.

# Index against velocity climatology  
cci_str_model_s2 <- lm(eac_cci ~ month_clim_str, data = comb_raw_data_s)  
summary(cci_str_model_s2)


Call:
lm(formula = eac_cci ~ month_clim_str, data = comb_raw_data_s)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.2432 -0.1096 -0.0283  0.1083  0.2786 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)     -0.1383     0.1012  -1.366    0.177
month_clim_str   0.6979     0.4664   1.496    0.140

Residual standard error: 0.1438 on 56 degrees of freedom
Multiple R-squared:  0.03845,   Adjusted R-squared:  0.02128 
F-statistic: 2.239 on 1 and 56 DF,  p-value: 0.1401

This model was not statistically significant, F(1, 56) = 2.239, p = 0.1401, explaining approximately 2% of the variance in the dependent variable (R² = 0.0385, adjusted R² = 0.0213).

Residuals were symmetrically distributed with a standard error of 0.1438 (df = 56), ranging from -0.243 to 0.279.

cci_str_model_s3 <- lm(eac_cci ~ month_clim_str + mean_strength, data = comb_raw_data_s)  
summary(cci_str_model_s3)


Call:
lm(formula = eac_cci ~ month_clim_str + mean_strength, data = comb_raw_data_s)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.246275 -0.096545 -0.007547  0.082988  0.295954 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)   
(Intercept)    -0.17907    0.09638  -1.858   0.0685 . 
month_clim_str  0.16433    0.47715   0.344   0.7319   
mean_strength   0.71712    0.25085   2.859   0.0060 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1354 on 55 degrees of freedom
Multiple R-squared:  0.1628,    Adjusted R-squared:  0.1324 
F-statistic: 5.349 on 2 and 55 DF,  p-value: 0.007536

This model was statistically significant, F(2, 55) = 5.349, p = 0.0075, explaining approximately 13% of the variance in the dependent variable (R² = 0.163, adjusted R² = 0.132).

Residuals were symmetrically distributed with a standard error of 0.135 (df = 55), ranging from -0.246 to 0.296.

Model comparison

We test the significance of the non-seasonal component with ANOVA of the nested models.

# compare the models   
anova(cci_str_model_s2, cci_str_model_s3, test = "F")

Analysis of Variance Table

Model 1: eac_cci ~ month_clim_str
Model 2: eac_cci ~ month_clim_str + mean_strength
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
1     56 1.1585                                
2     55 1.0086  1   0.14988 8.1727 0.005996 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Including both the mean velocity values and the climatology values explains a larger portion of the variance than the climatology alone (F(1, 55) = 8.173, p = 0.006).

Model diagnostics

Diagnostics of Model 3 show that the residuals have a fairly normal distribution.

# look at model 3   
gglm(cci_str_model_s3)

A plot showing the model’s predicted values for different mean strength values, overlaid with actual raw data points to visualize the model’s fit to the observed data.

plot_predictions(cci_str_model_s3, condition = "mean_strength") +   
  geom_point(data = comb_raw_data_s,
             aes(x = mean_strength, y = eac_cci))

North and South comparison

While both the north and the south showed a clear relationship with the mean velocity of the EAC, the relationship was stronger in the north: the combined model explained 42% of the variance in the index values there, compared with only 13% in the south.
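
To make the comparison easier to scan, the fit statistics of the combined model (Model 3) for each data set can be gathered in one table. The sketch below simply collects the values printed in the summaries and ANOVA tables above; the data frame name `fits` and its column names are illustrative.

```r
# Values copied from the Model 3 summaries and nested-model ANOVA tables above
fits <- data.frame(
  region  = c("Full", "North", "South"),
  r2      = c(0.2479, 0.4472, 0.1628),
  adj_r2  = c(0.2206, 0.4189, 0.1324),
  anova_f = c(13.58, 16.492, 8.1727)   # F for adding mean_strength to climatology
)

# Order regions by adjusted R-squared, strongest relationship first
fits_sorted <- fits[order(-fits$adj_r2), ]
best_region <- fits$region[which.max(fits$adj_r2)]

fits_sorted
```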