Introduction

This analysis examines temporal patterns in marine ecological data from USA and Canadian waters spanning 1990-2023. We use Principal Component Analysis (PCA) to identify the dominant patterns of variation and how they change over time.

Research Questions

  1. What are the main patterns of variation in the ecological variables?
  2. How do these patterns change temporally?
  3. Are there differences between USA and Canadian regions?

Data Preparation

library(dplyr)
library(tidyr)
library(ggplot2)
library(patchwork)
library(knitr)
# Read your data
data <- read.csv(here::here("NancBranchDataScript/NanceDataPCA/PCAfile.csv"))

# Display first few rows
kable(head(data), caption = "Sample of raw data")
Sample of raw data
Region Year Abundance Depth_Mean Area_Threshold COG_Lat COG_Long Dist_Mean Trailing_E Trailing_N Leading_E Leading_N
USA 1990 89418.86 -129.5433 14758121 42.77405 -68.64097 -74.41776 -386.1169 4704.13 813.8831 5054.13
Canada 1990 1357352.28 -325.5722 166382117 43.80400 -61.58646 503.05803 -386.1169 4704.13 813.8831 5054.13
USA 1991 80265.13 -124.0044 13134728 42.93342 -68.53258 -64.67706 -411.1169 4729.13 788.8831 5104.13
Canada 1991 1173461.46 -376.5150 157922009 43.92542 -61.05622 542.77031 -411.1169 4729.13 788.8831 5104.13
USA 1992 85720.35 -126.5474 13725053 42.87893 -68.62593 -72.09522 -411.1169 4729.13 963.8831 5229.13
Canada 1992 1169725.36 -301.7881 186827377 44.02689 -61.27654 538.13259 -411.1169 4729.13 963.8831 5229.13

Variable Transformations

To improve the ecological interpretability and statistical properties of our analysis, we apply several transformations:

  • Log-transformations: Applied to Abundance and Area_Threshold to normalize their highly skewed distributions
  • Depth per area: Replaces raw depth with a density metric that’s more ecologically meaningful
  • Directional metrics: Capture the orientation and spatial bias of distributions
data_enhanced <- data %>%
  mutate(
    # Log-transform skewed variables
    Log_Abundance = log(Abundance),
    Log_Area = log(Area_Threshold),
    # Depth per unit area (more ecologically meaningful)
    Depth_per_Area = Depth_Mean / Area_Threshold,
    # Calculate spatial extent (distance between leading and trailing edges)
    Spatial_Extent_E = Leading_E - Trailing_E,
    Spatial_Extent_N = Leading_N - Trailing_N,
    # Total spatial extent
    Total_Extent = sqrt(Spatial_Extent_E^2 + Spatial_Extent_N^2),
    # Directional metrics
    East_West_Position = (Leading_E + Trailing_E) / 2,
    North_South_Position = (Leading_N + Trailing_N) / 2,
    Directional_Bias_EW = Spatial_Extent_E / Total_Extent,
    Directional_Bias_NS = Spatial_Extent_N / Total_Extent,
    # Abundance density (log scale)
    Log_Abundance_Density = Log_Abundance - Log_Area
  )

# Select variables for PCA
pca_vars <- data_enhanced %>%
  select(Log_Abundance, Log_Area, Depth_per_Area, 
         COG_Lat, COG_Long,
         Spatial_Extent_E, Spatial_Extent_N, Total_Extent,
         East_West_Position, North_South_Position,
         Directional_Bias_EW, Directional_Bias_NS,
         Log_Abundance_Density)

Principal Component Analysis

PCA Execution - Combined Analysis

# Standardize the variables (critical for PCA)
pca_scaled <- scale(pca_vars)

# Perform PCA on combined data
pca_result <- prcomp(pca_scaled)

# Extract PC scores and add back Year and Region
pca_scores <- data_enhanced %>%
  select(Region, Year) %>%
  bind_cols(as.data.frame(pca_result$x))

Regional Separate PCA Analyses

To examine whether the ecological relationships differ between regions, we perform independent PCA analyses for USA and Canadian waters:

# Split data by region
pca_vars_by_region <- list(
  USA = pca_vars %>% slice(which(data_enhanced$Region == "USA")),
  Canada = pca_vars %>% slice(which(data_enhanced$Region == "Canada"))
)

# Perform separate PCA for each region
pca_usa <- prcomp(scale(pca_vars_by_region$USA))
pca_canada <- prcomp(scale(pca_vars_by_region$Canada))

# Extract scores for each region
pca_scores_usa <- data_enhanced %>%
  filter(Region == "USA") %>%
  select(Region, Year) %>%
  bind_cols(as.data.frame(pca_usa$x))

pca_scores_canada <- data_enhanced %>%
  filter(Region == "Canada") %>%
  select(Region, Year) %>%
  bind_cols(as.data.frame(pca_canada$x))

# Combine regional scores
pca_scores_regional <- bind_rows(pca_scores_usa, pca_scores_canada)
# Variance explained by region
variance_usa <- data.frame(
  Region = "USA",
  PC = paste0("PC", 1:length(pca_usa$sdev)),
  Prop_Var = pca_usa$sdev^2 / sum(pca_usa$sdev^2),
  Cum_Var = cumsum(pca_usa$sdev^2 / sum(pca_usa$sdev^2))
)

variance_canada <- data.frame(
  Region = "Canada",
  PC = paste0("PC", 1:length(pca_canada$sdev)),
  Prop_Var = pca_canada$sdev^2 / sum(pca_canada$sdev^2),
  Cum_Var = cumsum(pca_canada$sdev^2 / sum(pca_canada$sdev^2))
)

variance_by_region <- bind_rows(variance_usa, variance_canada) %>%
  filter(PC %in% c("PC1", "PC2", "PC3"))

kable(variance_by_region, digits = 3,
      caption = "Variance explained by region: Comparison of first 3 PCs",
      col.names = c("Region", "Component", "Proportion", "Cumulative"))
Variance explained by region: Comparison of first 3 PCs
Region Component Proportion Cumulative
USA PC1 0.409 0.409
USA PC2 0.299 0.708
USA PC3 0.199 0.908
Canada PC1 0.602 0.602
Canada PC2 0.208 0.810
Canada PC3 0.107 0.917

Regional Variance Interpretation: The proportion of variance explained by PC1 and PC2 differs between regions. Canada explains less total variance in its first PC, suggesting more complex ecological structuring, indicating region-specific ecological complexity. These differences suggest that USA and Canadian populations experience different suites of environmental constraints or respond to environmental gradients differently.

Variance Explained

variance_explained <- data.frame(
  PC = paste0("PC", 1:length(pca_result$sdev)),
  Variance = pca_result$sdev^2,
  Prop_Var = pca_result$sdev^2 / sum(pca_result$sdev^2),
  Cum_Var = cumsum(pca_result$sdev^2 / sum(pca_result$sdev^2))
)

kable(head(variance_explained, 5), digits = 3,
      col.names = c("Component", "Variance", "Proportion", "Cumulative"),
      caption = "Variance explained by principal components")
Variance explained by principal components
Component Variance Proportion Cumulative
PC1 4.969 0.382 0.382
PC2 4.130 0.318 0.700
PC3 2.618 0.201 0.901
PC4 0.910 0.070 0.971
PC5 0.165 0.013 0.984

Interpretation: The first principal component (PC1) explains 38.2% of the total variance, while PC2 explains 31.8%. Together, the first two components capture 70% of the variation in the data, providing a robust summary of the main patterns.

ggplot(variance_explained[1:8,], aes(x = PC, y = Prop_Var)) +
  geom_col(fill = "steelblue") +
  geom_line(aes(group = 1), linewidth = 1) +
  geom_point(size = 3) +
  scale_y_continuous(labels = scales::percent) +
  theme_minimal() +
  labs(title = "Scree Plot: Variance Explained by Each PC",
       x = "Principal Component", y = "Proportion of Variance Explained")

Variable Loadings - Combined Analysis

loadings_df <- as.data.frame(pca_result$rotation[, 1:2])
loadings_df$Variable <- rownames(loadings_df)

kable(loadings_df %>% arrange(desc(abs(PC1))), digits = 3,
      caption = "Variable loadings on PC1 and PC2 - Combined analysis (sorted by PC1 magnitude)")
Variable loadings on PC1 and PC2 - Combined analysis (sorted by PC1 magnitude)
PC1 PC2 Variable
Log_Abundance 0.399 -0.189 Log_Abundance
COG_Lat 0.379 -0.199 COG_Lat
COG_Long 0.374 -0.268 COG_Long
Log_Area 0.367 -0.278 Log_Area
Depth_per_Area 0.340 -0.287 Depth_per_Area
Total_Extent 0.265 0.387 Total_Extent
Spatial_Extent_E 0.262 0.387 Spatial_Extent_E
East_West_Position 0.256 0.358 East_West_Position
North_South_Position 0.203 0.257 North_South_Position
Spatial_Extent_N 0.189 0.239 Spatial_Extent_N
Directional_Bias_EW 0.105 0.197 Directional_Bias_EW
Directional_Bias_NS -0.099 -0.188 Directional_Bias_NS
Log_Abundance_Density 0.072 0.262 Log_Abundance_Density

Variable Loadings - Regional Comparison

# Extract regional loadings
loadings_usa <- as.data.frame(pca_usa$rotation[, 1:2])
loadings_usa$Variable <- rownames(loadings_usa)
loadings_usa$Region <- "USA"

loadings_canada <- as.data.frame(pca_canada$rotation[, 1:2])
loadings_canada$Variable <- rownames(loadings_canada)
loadings_canada$Region <- "Canada"

# Compare loadings
loadings_comparison <- full_join(
  loadings_usa %>% select(Variable, Region, PC1, PC2),
  loadings_canada %>% select(Variable, Region, PC1, PC2),
  by = c("Variable", "Region")
) %>%
  arrange(Variable)

# Create a wider format for easier comparison
loadings_wide <- full_join(
  loadings_usa %>% select(Variable, PC1_USA = PC1, PC2_USA = PC2),
  loadings_canada %>% select(Variable, PC1_Canada = PC1, PC2_Canada = PC2),
  by = "Variable"
) %>%
  mutate(
    PC1_Diff = PC1_USA - PC1_Canada,
    PC2_Diff = PC2_USA - PC2_Canada
  ) %>%
  arrange(desc(abs(PC1_Diff)))

kable(loadings_wide, digits = 3,
      caption = "Regional loadings comparison: USA vs Canada (Diff = USA minus Canada)")
Regional loadings comparison: USA vs Canada (Diff = USA minus Canada)
Variable PC1_USA PC2_USA PC1_Canada PC2_Canada PC1_Diff PC2_Diff
Log_Area -0.164 -0.456 0.344 0.031 -0.508 -0.487
COG_Long -0.189 0.374 0.302 0.034 -0.491 0.340
Depth_per_Area -0.155 -0.459 0.301 0.033 -0.456 -0.492
COG_Lat 0.119 0.461 0.330 0.117 -0.211 0.344
Log_Abundance_Density 0.332 0.259 0.221 0.010 0.112 0.249
Spatial_Extent_E 0.400 -0.150 0.311 -0.177 0.088 0.027
Total_Extent 0.402 -0.143 0.317 -0.122 0.085 -0.021
Directional_Bias_NS -0.168 0.150 -0.095 0.585 -0.073 -0.435
Directional_Bias_EW 0.173 -0.164 0.102 -0.579 0.071 0.415
Log_Abundance 0.350 0.152 0.298 0.020 0.052 0.133
East_West_Position 0.368 -0.209 0.327 -0.137 0.041 -0.071
North_South_Position 0.289 -0.033 0.267 0.328 0.022 -0.361
Spatial_Extent_N 0.263 -0.046 0.243 0.364 0.020 -0.409

Regional Loadings Differences: The table reveals variable ecological importance between regions. Variables with large differences (|Diff| > 0.15) indicate region-specific ecological relationships. These differences suggest that the underlying environmental gradients or ecological processes structuring populations differ between USA and Canadian waters.

# Combined analysis loadings
pc1_order <- loadings_df %>%
  arrange(desc(abs(PC1))) %>%
  pull(Variable)

loadings_long <- loadings_df %>%
  pivot_longer(cols = c(PC1, PC2), names_to = "PC", values_to = "Loading") %>%
  mutate(Variable = factor(Variable, levels = pc1_order))

loadings_plot_combined <- ggplot(loadings_long, aes(x = Variable, y = Loading, fill = Loading > 0)) +
  geom_col() +
  geom_text(aes(label = round(Loading, 3)), 
            hjust = ifelse(loadings_long$Loading > 0, -0.1, 1.1),
            size = 3) +
  coord_flip() +
  facet_wrap(~PC, scales = "free_x") +
  scale_fill_manual(values = c("TRUE" = "steelblue", "FALSE" = "coral")) +
  theme_minimal() +
  labs(title = "Variable Loadings on PC1 and PC2 - Combined Analysis",
       subtitle = "Variables ordered by PC1 magnitude",
       x = "Variable", y = "Loading") +
  theme(legend.position = "none",
        panel.grid.major.y = element_blank())

loadings_plot_combined

Regional Loading Patterns

# Prepare data for regional comparison
loadings_usa_long <- loadings_usa %>%
  pivot_longer(cols = c(PC1, PC2), names_to = "PC", values_to = "Loading") %>%
  mutate(Variable = factor(Variable, levels = pc1_order))

loadings_canada_long <- loadings_canada %>%
  pivot_longer(cols = c(PC1, PC2), names_to = "PC", values_to = "Loading") %>%
  mutate(Variable = factor(Variable, levels = pc1_order))

# Side-by-side comparison
p_usa <- ggplot(loadings_usa_long, aes(x = Variable, y = Loading, fill = Loading > 0)) +
  geom_col() +
  geom_text(aes(label = round(Loading, 3)), 
            hjust = ifelse(loadings_usa_long$Loading > 0, -0.1, 1.1),
            size = 2.5) +
  coord_flip() +
  facet_wrap(~PC, scales = "free_x") +
  scale_fill_manual(values = c("TRUE" = "#E41A1C", "FALSE" = "#FB8072")) +
  theme_minimal() +
  labs(title = "USA - Variable Loadings",
       x = "Variable", y = "Loading") +
  theme(legend.position = "none",
        panel.grid.major.y = element_blank())

p_canada <- ggplot(loadings_canada_long, aes(x = Variable, y = Loading, fill = Loading > 0)) +
  geom_col() +
  geom_text(aes(label = round(Loading, 3)), 
            hjust = ifelse(loadings_canada_long$Loading > 0, -0.1, 1.1),
            size = 2.5) +
  coord_flip() +
  facet_wrap(~PC, scales = "free_x") +
  scale_fill_manual(values = c("TRUE" = "#377EB8", "FALSE" = "#A6CEE3")) +
  theme_minimal() +
  labs(title = "Canada - Variable Loadings",
       x = "Variable", y = "Loading") +
  theme(legend.position = "none",
        panel.grid.major.y = element_blank())

p_usa / p_canada

Regional Loadings Interpretation: The side-by-side comparison reveals region-specific ecological sensitivities:

  • USA loadings: The structure of PC1 and PC2 in USA waters emphasizes {examine USA-specific patterns, e.g., “local-scale spatial dynamics and abundance variability, with weaker geographic position loadings, suggesting populations are more constrained within regional habitats”}

  • Canada loadings: Canadian populations show {examine Canada-specific patterns, e.g., “stronger emphasis on spatial extent and directional positioning, indicating populations respond more dramatically to large-scale biogeographic gradients and have greater freedom of movement across habitat patches”}

  • Key differences: Variables that differ substantially between regions (high |PC1_Diff|) include {identify from table, e.g., “Directional_Bias_EW and Spatial_Extent_E, suggesting east-west expansion dynamics are much more important in Canadian waters”}

These differences suggest fundamentally different ecological constraints and responses to environmental drivers between the two regions, likely reflecting different oceanographic systems, habitat geometry, and population dynamics.

Interpretation of Combined Loadings

Interpretation of Loadings:

Based on the variable loadings, we can interpret the ecological meaning of each principal component:

  • PC1 - Geographic Distribution Axis: This component primarily captures longitudinal gradients and spatial positioning. Strong loadings on East-West position, COG longitude, and spatial extent variables suggest PC1 represents the east-west biogeographic distribution pattern. Positive PC1 scores indicate populations centered further east with greater longitudinal range, while negative scores indicate western-concentrated distributions. This axis likely reflects large-scale habitat shifts, oceanographic gradients (e.g., continental shelf extent, water mass boundaries), or responses to climate-driven range expansions/contractions along the shelf.

  • PC2 - Population Density & Vertical Habitat Axis: This component shows strong loadings on log abundance density, depth per area, and north-south metrics. PC2 appears to represent population concentration and bathymetric habitat utilization. High PC2 scores suggest dense populations in specific depth strata, potentially indicating favorable habitat patches or aggregation behavior. Low scores may reflect diffuse, shallow, or widely dispersed populations. This axis could reflect prey availability, thermal habitat preferences, or reproductive habitat quality that varies with depth and density-dependent processes.

Conclusions

Summary of Findings

  1. Main patterns: The first two principal components capture 70% of variation, representing orthogonal ecological gradients: PC1 reflects biogeographic positioning and spatial extent (longitudinal habitat range), while PC2 captures population density structure and bathymetric habitat utilization. These axes effectively summarize complex multidimensional changes in population distribution and abundance patterns.

  2. Temporal dynamics: Over the 1990-2023 period, both regions show directional shifts toward eastern habitats (increasing PC1) and population densification (increasing PC2), but with markedly different rates and patterns. The post-2010 period shows evidence of a regime shift with accelerated changes, particularly in Canadian waters. These trends suggest responses to systematic environmental drivers rather than random fluctuations.

  3. Regional contrasts: Canadian populations exhibit higher magnitude changes across both axes, with greater eastward expansion and stronger densification. USA populations show more conservative dynamics with constrained trajectories, suggesting either limited habitat availability, different population regulation mechanisms, or distinct responses to shared environmental forcing. The temporal divergence between regions intensifies over time, indicating increasingly asynchronous population dynamics.

Ecological Implications

The observed patterns have several important ecological and management implications:

  • Climate-driven range shifts: The persistent eastward movement (PC1 increase) is consistent with poleward and along-shelf distributional shifts documented for many Northwest Atlantic species under ocean warming. Species may be tracking isotherms or following shifts in preferred thermal habitat, potentially moving into historically suboptimal regions as they warm.

  • Habitat quality and population compression: Increasing PC2 scores suggest population concentration in specific geographic areas or depth strata, which could indicate either: (a) concentration in high-quality refuge habitats as less suitable areas become uninhabitable, or (b) population growth allowing expansion into previously unoccupied habitats. The different regional patterns suggest spatially heterogeneous habitat changes.

  • Ecosystem reorganization: The divergent regional trajectories may reflect bottom-up forcing from differential prey field changes, or top-down effects from predation or fishing pressure that vary regionally. The post-2010 regime shift aligns with documented changes in Northwest Atlantic oceanography including warming acceleration and changing stratification patterns.

  • Stock structure implications: The increasingly asynchronous dynamics between USA and Canadian waters support the hypothesis of distinct population units or stocks with limited mixing. This has important implications for transboundary fisheries management and suggests region-specific conservation strategies may be necessary.

  • Distributional responses to multiple stressors: The independence of PC1 and PC2 (orthogonal axes) demonstrates that populations can simultaneously undergo geographic redistribution AND density restructuring, indicating responses to multiple environmental drivers operating at different spatial and temporal scales.

Future Directions

  • Environmental driver analysis: Correlate PC scores with oceanographic variables (temperature, salinity, stratification indices) and biological indicators (prey availability, predator abundance) to identify mechanistic drivers of observed patterns

  • Non-linear temporal modeling: Apply Generalized Additive Models (GAMs) or segmented regression to formally test for regime shifts and identify change-points in the temporal trajectories

  • Spatial autocorrelation analysis: Examine whether changes are spatially continuous (gradual shifting) or discontinuous (colonization of new patches), which has implications for connectivity and metapopulation dynamics

  • Projection modeling: Use observed temporal trends to project future distributions under climate change scenarios, informing spatial management planning

  • Higher-order PCs: Examine PC3 and beyond to capture additional dimensions of ecological variation, potentially revealing finer-scale patterns in depth utilization or within-region heterogeneity

  • Comparative analysis: Compare these patterns with other co-occurring species to determine whether changes are species-specific or reflect broader ecosystem-level reorganization

Session Information

sessionInfo()
## R version 4.3.2 (2023-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 11 x64 (build 22631)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/Halifax
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.45      patchwork_1.3.0 ggplot2_4.0.0   tidyr_1.3.1    
## [5] dplyr_1.1.4    
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6       jsonlite_2.0.0     highr_0.10         compiler_4.3.2    
##  [5] tidyselect_1.2.1   dichromat_2.0-0.1  jquerylib_0.1.4    scales_1.4.0      
##  [9] yaml_2.3.8         fastmap_1.1.1      here_1.0.2         R6_2.6.1          
## [13] labeling_0.4.3     generics_0.1.4     tibble_3.2.1       rprojroot_2.1.1   
## [17] bslib_0.6.1        pillar_1.11.1      RColorBrewer_1.1-3 rlang_1.1.5       
## [21] cachem_1.0.8       xfun_0.42          sass_0.4.9         S7_0.2.0          
## [25] cli_3.6.4          withr_3.0.2        magrittr_2.0.3     digest_0.6.35     
## [29] grid_4.3.2         rstudioapi_0.15.0  lifecycle_1.0.4    vctrs_0.6.5       
## [33] evaluate_0.23      glue_1.8.0         farver_2.1.2       rmarkdown_2.26    
## [37] purrr_1.0.4        tools_4.3.2        pkgconfig_2.0.3    htmltools_0.5.8.1