Exploring ENSO-Related Climate Variability Across Eurasia Using Association Rule Mining

# library
#NetCDF & climate data
library(ncdf4)
library(raster)

# Data manipulation
library(dplyr)
library(tidyr)
library(lubridate)
library(tidyverse)

library(arules)

#visualization
library(ggplot2)
library(arulesViz)
library(kableExtra)

1 Introduction

This project explores the association between the El Niño–Southern Oscillation (ENSO) and regional climate variability across Eurasia. Rather than employing traditional regression-based methods, this study leverages association rule mining to investigate whether specific temperature and precipitation anomaly states co-occur under different ENSO phases. By comparing El Niño and Neutral conditions, we examine variations in rule presence and strength to assess how these co-occurrence patterns evolve across ENSO states.

2 Key References Informing the Project

  • 1.Agrawal & Srikant (1994), Fast Algorithms for Mining Association Rules. This paper provides the foundational methodology for association rule mining used in this project.1

  • 2.McPhaden et al. (2006), ENSO as an Integrating Concept in Earth Science. This study explains the physical mechanisms of ENSO and motivates its role as a large-scale climate driver.2

  • 3.Trenberth et al. (2002), The Evolution of ENSO and Global Atmospheric Circulation. It informs the interpretation of ENSO teleconnections affecting Eurasian climate.3

  • 4.Yuan et al. (2018), ENSO Influence on Eurasian Climate Variability. This paper supports the regional focus and motivates separating El Niño from Neutral conditions.4

  • 5.Hersbach et al. (2020), The ERA5 Global Reanalysis. Although not directly used, this work guides best practices in large-scale climate data processing and anomaly analysis.5

3 Data Process

Summary of dataset
Domain Category Specifications.Data.Source Identification.Criteria.Thresholds Period
Geographic Focus Western Europe UK, France, Germany, Benelux, Scandinavia 35°–60°N, 10°W–15°E
Eastern Europe Poland, Balkans, Western Russia 45°–60°N, 15°E–40°E
Mediterranean Southern Europe, North Mediterranean 30°–45°N, 10°W–40°E
Central Asia Kazakhstan and inland regions 35°–55°N, 40°E–75°E
South Asia India, Pakistan, Bangladesh 5°–30°N, 65°E–95°E
East Asia China, Korea, Japan 20°–50°N, 100°E–145°E
Northern Eurasia Siberia, High-latitude Russia 55°–75°N, 40°E–140°E
Climate Drivers
ENSO (ONI) El Niño NOAA Climate Prediction Center (CPC) ONI > 0.5 Monthly
Neutral -0.5< ONI < 0.5
La Niña ONI < −0.5
Temperature High NOAA GlobalTemp v6 (5∘ Gridded) z≥1 Monthly
Normal −1<z<1
Low z≤−1
Precipitation Wet: GPCP v2.3 (2.5∘ Gridded) z≥1 Monthly
Normal l−1<z<1
Dry z≤−1
▲ Click here to expand the code of data process
#process oni index
monthly_oni <- read.csv("data/monthly_oni.csv")
oni <- read.csv("data/oni.csv")

oni_long <- oni %>%
  pivot_longer(cols = -Year, names_to = "season", values_to = "oni") %>%
  mutate(
    MON = match(season, c("DJF","JFM","FMA","MAM","AMJ","MJJ",
                          "JJA","JAS","ASO","SON","OND","NDJ")),
    year = as.integer(Year),
    month = as.integer(MON)
    ) %>%
  dplyr::select(year, month, oni)

oni_long <- oni_long %>%
  mutate(
    ENSO = case_when(
      oni >=  0.5 ~ "ElNino",
      oni <= -0.5 ~ "LaNina",
      oni >  -0.5 & oni < 0.5 ~ "Neutral",
      TRUE ~ NA_character_
    )
  ) %>%
  filter(!is.na(ENSO))

#process NOAAG index
nc <- nc_open("data/NOAAGlobalTemp_v6.0.0_gridded_s185001_e202512_c20260107T093746.nc")
lon  <- ncvar_get(nc, "lon")
lat  <- ncvar_get(nc, "lat")
time <- ncvar_get(nc, "time")
temp <- ncvar_get(nc, "anom")

time_origin <- as.Date("1800-01-01")
dates <- time_origin + time

time_df <- data.frame(
  year  = year(dates),
  month = month(dates),
  t_idx = seq_along(dates)
)

nc_close(nc)

# define area longtidue&latitude
regions <- tibble(
  region = c("WestEurope","EastEurope","Mediterranean",
             "CentralAsia","SouthAsia","EastAsia","NorthEurasia"),
  lat_min = c(35, 45, 30, 35,  5, 20, 55),
  lat_max = c(60, 60, 45, 55, 30, 50, 75),
  lon_min = c(-10, 15, -10, 40, 65,100, 40),
  lon_max = c( 15, 40,  40, 75, 95,145,140)
)
# aggravate tempo
temp_brick <- brick(temp,
                    xmn = min(lon), xmx = max(lon),
                    ymn = min(lat), ymx = max(lat),
                    crs = CRS("+proj=longlat +datum=WGS84"))

region_mean <- function(r, lat_min, lat_max, lon_min, lon_max) {
  crop_r <- crop(r,
                 extent(lon_min, lon_max,
                        lat_min, lat_max))
  cellStats(crop_r, stat = "mean", na.rm = TRUE)
}

temp_region <- lapply(1:nrow(regions), function(i) {
  reg <- regions[i, ]
  
  vals <- sapply(1:nlayers(temp_brick), function(t) {
    region_mean(temp_brick[[t]],
                reg$lat_min, reg$lat_max,
                reg$lon_min, reg$lon_max)
  })
  
  data.frame(
    region = reg$region,
    t_idx  = 1:length(vals),
    temp_anomaly = vals
  )
}) %>%
  bind_rows()

temp_region_aggeravate <- temp_region %>%
      left_join(time_df, by = "t_idx") %>%
      filter(year >=1950) %>%
      left_join(oni_long, by = c("year","month"))%>%
      dplyr::select(year,month,region,temp_anomaly,ENSO)

temp_state <- temp_region_aggeravate %>%
  group_by(region) %>%
  mutate(
    temp_z = (temp_anomaly - mean(temp_anomaly, na.rm = TRUE)) /
      sd(temp_anomaly, na.rm = TRUE)
  ) %>%
  ungroup()

temp_state <- temp_state %>%
  mutate(
    Temp_state = case_when(
      temp_z >=  1  ~ "Temp_High",
      temp_z <= -1  ~ "Temp_Low",
      TRUE          ~ "Temp_Normal"
    )
  )

temp_state <- temp_state %>%
  mutate(
    Temp_item = paste0(Temp_state, "_", region)
  )

# aggerate rainfall 
nc_p <- nc_open("data/precip.mon.mean.nc")
lon_p  <- ncvar_get(nc_p, "lon")
lat_p  <- ncvar_get(nc_p, "lat")
time_p <- ncvar_get(nc_p, "time")

precip <- ncvar_get(nc_p, "precip")  
nc_close(nc_p)
# ncatt_get(nc_p, "time", "units")

time_origin_p <- as.Date("1800-01-01")
dates_p <- time_origin_p + time_p

time_df_p <- data.frame(
  date = dates_p,
  year  = year(dates_p),
  month = month(dates_p),
  t_idx = seq_along(dates_p)
)

precip_brick <- brick(
  precip,
  xmn = min(lon_p), xmx = max(lon_p),
  ymn = min(lat_p), ymx = max(lat_p),
  crs = CRS("+proj=longlat +datum=WGS84")
)

precip_region <- lapply(1:nrow(regions), function(i) {
  
  reg <- regions[i, ]
  
  vals <- sapply(1:nlayers(precip_brick), function(t) {
    region_mean(
      precip_brick[[t]],
      reg$lat_min, reg$lat_max,
      reg$lon_min, reg$lon_max
    )
  })
  
  data.frame(
    region = reg$region,
    t_idx  = seq_along(vals),
    precip_mean = vals
  )
}) %>%
  bind_rows()

precip_region_aggeravate <- precip_region %>%
  left_join(time_df_p, by = "t_idx") %>%
  left_join(oni_long, by = c("year","month")
  )%>%mutate(
    days_in_month = days_in_month(as.Date(paste(year, month, "01", sep = "-")))
  )%>%mutate(
    precip_monthly = precip_mean * days_in_month
  )%>%
  dplyr::select(year,month,region,precip_monthly,ENSO)

precip_state <- precip_region_aggeravate %>%
  group_by(region) %>%
  mutate(
    precip_z = (precip_monthly - mean(precip_monthly, na.rm = TRUE)) /
      sd(precip_monthly, na.rm = TRUE)
  ) %>%
  ungroup()

precip_state <- precip_state %>%
  mutate(
    Precip_state = case_when(
      precip_z >=  1  ~ "Precip_Wet",
      precip_z <= -1  ~ "Precip_Dry",
      TRUE            ~ "Precip_Normal"
    )
  )

precip_state <- precip_state %>%
  mutate(
    Precip_item = paste0(Precip_state, "_", region)
  )
# The code above loads very slowly, so I saved a copy of Rdata for easy use.
load("data_prepare.RData")

4 Overview of The Dataset

The main dataset (temp_state) contains the following variables:
Description of the variables
Variables Description Types
year, month time identifiers Integer
region predefined Eurasian regions Character
temp_anomaly regional monthly temperature anomaly Numeric
ENSO ENSO phase (El Niño or Neutral) Factor
temp_z region-specific standardized anomaly Numeric
Temp_state categorical state (High, Normal, Low) Factor
Temp_item item label used for association rules Character
Sample of Temperature Anomaly States
t_idx year month region temp_anomaly oni ENSO temp_z Temp_state Temp_item
1201 1950 1 WestEurope 0.9252591 -1.5 LaNina 0.5032373 Temp_Normal Temp_Normal_WestEurope
1202 1950 2 WestEurope 0.8052714 -1.3 LaNina 0.4338347 Temp_Normal Temp_Normal_WestEurope
1203 1950 3 WestEurope -0.3032144 -1.2 LaNina -0.2073293 Temp_Normal Temp_Normal_WestEurope
1204 1950 4 WestEurope -0.3703260 -1.2 LaNina -0.2461477 Temp_Normal Temp_Normal_WestEurope
1205 1950 5 WestEurope -0.4951236 -1.1 LaNina -0.3183324 Temp_Normal Temp_Normal_WestEurope
1206 1950 6 WestEurope 0.4398493 -0.9 LaNina 0.2224693 Temp_Normal Temp_Normal_WestEurope
Sample of Precipitation Anomaly States
year month region precip_monthly ENSO precip_z Precip_state Precip_item
1979 1 WestEurope 0.000000 Neutral -0.9583615 Precip_Normal Precip_Normal_WestEurope
1979 2 WestEurope 9.727943 Neutral 0.7657893 Precip_Normal Precip_Normal_WestEurope
1979 3 WestEurope 8.071862 Neutral 0.4722705 Precip_Normal Precip_Normal_WestEurope
1979 4 WestEurope 5.021848 Neutral -0.0683046 Precip_Normal Precip_Normal_WestEurope
1979 5 WestEurope 16.524888 Neutral 1.9704589 Precip_Wet Precip_Wet_WestEurope
1979 6 WestEurope 6.913927 Neutral 0.2670417 Precip_Normal Precip_Normal_WestEurope
table(temp_state$ENSO, temp_state$Temp_state)
##          
##           Temp_High Temp_Low Temp_Normal
##   ElNino        269      294        1243
##   LaNina        255      273        1327
##   Neutral       442      416        1858
summary(temp_state$temp_z)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -3.50749 -0.63681  0.02815  0.00000  0.66407  4.19279
table(precip_state$ENSO, precip_state$Precip_state)
##          
##           Precip_Dry Precip_Normal Precip_Wet
##   ElNino         151           794        154
##   LaNina         141           804        161
##   Neutral        170          1250        316
summary(precip_state$precip_z)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -2.6576 -0.7592 -0.1847  0.0000  0.6157  6.5851

Based on the statistical distributions shown in the frequency tables, I have integrated the raw counts with the Association Rule Mining (ARM) framework. These results inform my rule discovery process as follows:

  • Frequency and Support: The counts for extreme states, such as Precip_Dry (151) and Temp_Low (294) during El Niño, indicate that these events have sufficient frequency to be analyzed. In ARM terms, these states will meet the minimum Support thresholds I have set to ensure that the discovered co-occurrence patterns across Eurasia are statistically relevant.

  • State Dependency and Confidence: My data shows that while Normal states dominate during Neutral phases, the frequency of extreme anomalies is sustained during El Niño.

  • Non-Linear Teleconnections: The high z-score extremes (Max_Temp: 4.19; Max_Precip: 6.58) suggest that ENSO drives complex, non-linear climate responses. I believe ARM is uniquely suited for this task; unlike traditional regression-based approaches, it allows me to capture whether these extreme “states” tend to cluster in specific regions, such as South Asia or Northern Eurasia, under El Niño conditions.

  • Baseline for Lift: By quantifying the occurrence of states during Neutral phases (where Temp_Normal counts are highest at 1858), I have established a robust baseline. This allows me to calculate the Lift of my rules, identifying which Eurasian climate patterns are truly amplified by ENSO rather than occurring by random chance.

5 Make Transaction List

Each region–month combination was treated as one transaction containing two items: a temperature state and a precipitation state. These were converted into a transactions object using the arules package in R.

#aggravate
state_table <- temp_state %>%
  dplyr::select(year, month, region, Temp_item, ENSO) %>%
  left_join(
    precip_state %>%
      dplyr::select(year, month, region, Precip_item),
    by = c("year", "month", "region")
  )%>%
  filter(year>=1979)%>%drop_na()
#transaction id
state_table <- state_table %>%
  mutate(
    trans_id = paste(year, month, region, sep = "_")
  )
item_long <- state_table %>%
  dplyr::select(trans_id, Temp_item, Precip_item) %>%
  pivot_longer(
    cols = c(Temp_item, Precip_item),
    names_to = "type",
    values_to = "item"
  ) %>%
  dplyr::select(trans_id, item)

5.1 Convert to a Transactions Object

trans_all <- as(
  split(item_long$item, item_long$trans_id),
  "transactions"
)
inspect(trans_all[1:5])
##     items                                                 transactionID       
## [1] {Precip_Normal_CentralAsia, Temp_Normal_CentralAsia}  1979_1_CentralAsia  
## [2] {Precip_Normal_EastAsia, Temp_High_EastAsia}          1979_1_EastAsia     
## [3] {Precip_Normal_EastEurope, Temp_Normal_EastEurope}    1979_1_EastEurope   
## [4] {Precip_Dry_Mediterranean, Temp_Normal_Mediterranean} 1979_1_Mediterranean
## [5] {Precip_Normal_NorthEurasia, Temp_Low_NorthEurasia}   1979_1_NorthEurasia
itemLabels(trans_all)
##  [1] "Precip_Dry_CentralAsia"      "Precip_Dry_EastAsia"        
##  [3] "Precip_Dry_EastEurope"       "Precip_Dry_Mediterranean"   
##  [5] "Precip_Dry_NorthEurasia"     "Precip_Dry_SouthAsia"       
##  [7] "Precip_Normal_CentralAsia"   "Precip_Normal_EastAsia"     
##  [9] "Precip_Normal_EastEurope"    "Precip_Normal_Mediterranean"
## [11] "Precip_Normal_NorthEurasia"  "Precip_Normal_SouthAsia"    
## [13] "Precip_Normal_WestEurope"    "Precip_Wet_CentralAsia"     
## [15] "Precip_Wet_EastAsia"         "Precip_Wet_EastEurope"      
## [17] "Precip_Wet_Mediterranean"    "Precip_Wet_NorthEurasia"    
## [19] "Precip_Wet_SouthAsia"        "Precip_Wet_WestEurope"      
## [21] "Temp_High_CentralAsia"       "Temp_High_EastAsia"         
## [23] "Temp_High_EastEurope"        "Temp_High_Mediterranean"    
## [25] "Temp_High_NorthEurasia"      "Temp_High_SouthAsia"        
## [27] "Temp_High_WestEurope"        "Temp_Low_CentralAsia"       
## [29] "Temp_Low_EastAsia"           "Temp_Low_EastEurope"        
## [31] "Temp_Low_Mediterranean"      "Temp_Low_NorthEurasia"      
## [33] "Temp_Low_SouthAsia"          "Temp_Low_WestEurope"        
## [35] "Temp_Normal_CentralAsia"     "Temp_Normal_EastAsia"       
## [37] "Temp_Normal_EastEurope"      "Temp_Normal_Mediterranean"  
## [39] "Temp_Normal_NorthEurasia"    "Temp_Normal_SouthAsia"      
## [41] "Temp_Normal_WestEurope"

5.2 Samples Divided by ENSO Status (El Niño vs Neutral)

Transactions were split into two subsets based on ENSO phase: El Niño and Neutral, allowing direct comparison under consistent settings.

enso_map <- state_table %>%
  dplyr::select(trans_id, ENSO) %>%
  distinct()
trans_labels <- labels(trans_all)

target_indices <- which(enso_map$ENSO == "ElNino")
safe_indices <- target_indices[target_indices <= length(trans_all)]
trans_elnino <- trans_all[safe_indices]

target_indices <- which(enso_map$ENSO == "Neutral")
safe_indices <- target_indices[target_indices <= length(trans_all)]
trans_neutral <- trans_all[safe_indices]

target_indices <- which(enso_map$ENSO == "LaNina")
safe_indices <- target_indices[target_indices <= length(trans_all)]
trans_lanina <- trans_all[safe_indices]

6 Association Rules

6.1 First round

6.1.1 Apriori

Association rules were mined using conservative thresholds (support ≈ 4%, confidence ≥ 60%, lift > 1) to ensure robustness.

rules_elnino <- apriori(
  trans_elnino,
  parameter = list(
    supp = 0.04,     
    conf = 0.6,
    minlen = 2
  )
)
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.6    0.1    1 none FALSE            TRUE       5    0.04      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 43 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[40 item(s), 1099 transaction(s)] done [0.00s].
## sorting and recoding items ... [14 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [14 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules_neutral <- apriori(
  trans_neutral,
  parameter = list(
    supp = 0.04,    
    conf = 0.6,
    minlen = 2
  )
)
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.6    0.1    1 none FALSE            TRUE       5    0.04      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 69 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[41 item(s), 1736 transaction(s)] done [0.00s].
## sorting and recoding items ... [14 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [14 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules_elnino  <- subset(rules_elnino, lift > 1)
rules_neutral <- subset(rules_neutral, lift > 1)

Association rule mining was applied separately to El Niño and Neutral samples using the same parameter settings (minimum support around 4–5%, confidence ≥ 0.6). In both cases, the algorithm generated 14 rules, all of which involved normal temperature and normal precipitation states within the same region.

No robust rules linking anomalous temperature or precipitation states were identified. This result suggests that, under conservative and comparable thresholds, anomalous climate states do not form frequent or stable co-occurrence patterns, regardless of ENSO phase.

length(rules_elnino)
## [1] 14
length(rules_neutral)
## [1] 14
inspect(sort(rules_elnino, by = "lift")[1:10])
##      lhs                              rhs                              support confidence   coverage     lift count
## [1]  {Temp_Normal_EastEurope}      => {Precip_Normal_EastEurope}    0.08644222  0.8558559 0.10100091 7.348325    95
## [2]  {Precip_Normal_EastEurope}    => {Temp_Normal_EastEurope}      0.08644222  0.7421875 0.11646952 7.348325    95
## [3]  {Temp_Normal_WestEurope}      => {Precip_Normal_WestEurope}    0.09372157  0.8879310 0.10555050 7.228416   103
## [4]  {Precip_Normal_WestEurope}    => {Temp_Normal_WestEurope}      0.09372157  0.7629630 0.12283894 7.228416   103
## [5]  {Precip_Normal_EastAsia}      => {Temp_Normal_EastAsia}        0.06824386  0.7075472 0.09645132 7.069039    75
## [6]  {Temp_Normal_EastAsia}        => {Precip_Normal_EastAsia}      0.06824386  0.6818182 0.10009099 7.069039    75
## [7]  {Precip_Normal_Mediterranean} => {Temp_Normal_Mediterranean}   0.06187443  0.6800000 0.09099181 6.984299    68
## [8]  {Temp_Normal_Mediterranean}   => {Precip_Normal_Mediterranean} 0.06187443  0.6355140 0.09736124 6.984299    68
## [9]  {Temp_Normal_NorthEurasia}    => {Precip_Normal_NorthEurasia}  0.06915378  0.6280992 0.11010009 6.902810    76
## [10] {Precip_Normal_NorthEurasia}  => {Temp_Normal_NorthEurasia}    0.06915378  0.7600000 0.09099181 6.902810    76
inspect(sort(rules_neutral, by = "lift")[1:10])
##      lhs                              rhs                              support confidence   coverage     lift count
## [1]  {Temp_Normal_EastEurope}      => {Precip_Normal_EastEurope}    0.09447005  0.8817204 0.10714286 7.615257   164
## [2]  {Precip_Normal_EastEurope}    => {Temp_Normal_EastEurope}      0.09447005  0.8159204 0.11578341 7.615257   164
## [3]  {Precip_Normal_NorthEurasia}  => {Temp_Normal_NorthEurasia}    0.07546083  0.7705882 0.09792627 7.310061   131
## [4]  {Temp_Normal_NorthEurasia}    => {Precip_Normal_NorthEurasia}  0.07546083  0.7158470 0.10541475 7.310061   131
## [5]  {Temp_Normal_EastAsia}        => {Precip_Normal_EastAsia}      0.07200461  0.6906077 0.10426267 7.179012   125
## [6]  {Precip_Normal_EastAsia}      => {Temp_Normal_EastAsia}        0.07200461  0.7485030 0.09619816 7.179012   125
## [7]  {Precip_Normal_Mediterranean} => {Temp_Normal_Mediterranean}   0.07546083  0.7751479 0.09735023 7.157749   131
## [8]  {Temp_Normal_Mediterranean}   => {Precip_Normal_Mediterranean} 0.07546083  0.6968085 0.10829493 7.157749   131
## [9]  {Temp_Normal_WestEurope}      => {Precip_Normal_WestEurope}    0.08928571  0.8516484 0.10483871 7.073979   155
## [10] {Precip_Normal_WestEurope}    => {Temp_Normal_WestEurope}      0.08928571  0.7416268 0.12039171 7.073979   155
inspect(sort(rules_elnino, by = "confidence")[1:10])
##      lhs                              rhs                            support confidence   coverage     lift count
## [1]  {Temp_Normal_WestEurope}      => {Precip_Normal_WestEurope}  0.09372157  0.8879310 0.10555050 7.228416   103
## [2]  {Temp_Normal_EastEurope}      => {Precip_Normal_EastEurope}  0.08644222  0.8558559 0.10100091 7.348325    95
## [3]  {Precip_Normal_WestEurope}    => {Temp_Normal_WestEurope}    0.09372157  0.7629630 0.12283894 7.228416   103
## [4]  {Precip_Normal_NorthEurasia}  => {Temp_Normal_NorthEurasia}  0.06915378  0.7600000 0.09099181 6.902810    76
## [5]  {Precip_Normal_EastEurope}    => {Temp_Normal_EastEurope}    0.08644222  0.7421875 0.11646952 7.348325    95
## [6]  {Precip_Normal_EastAsia}      => {Temp_Normal_EastAsia}      0.06824386  0.7075472 0.09645132 7.069039    75
## [7]  {Temp_Normal_EastAsia}        => {Precip_Normal_EastAsia}    0.06824386  0.6818182 0.10009099 7.069039    75
## [8]  {Precip_Normal_Mediterranean} => {Temp_Normal_Mediterranean} 0.06187443  0.6800000 0.09099181 6.984299    68
## [9]  {Precip_Normal_CentralAsia}   => {Temp_Normal_CentralAsia}   0.06187443  0.6800000 0.09099181 6.793818    68
## [10] {Precip_Normal_SouthAsia}     => {Temp_Normal_SouthAsia}     0.06369427  0.6481481 0.09827116 6.534998    70
inspect(sort(rules_neutral, by = "confidence")[1:10])
##      lhs                              rhs                             support confidence   coverage     lift count
## [1]  {Temp_Normal_EastEurope}      => {Precip_Normal_EastEurope}   0.09447005  0.8817204 0.10714286 7.615257   164
## [2]  {Temp_Normal_WestEurope}      => {Precip_Normal_WestEurope}   0.08928571  0.8516484 0.10483871 7.073979   155
## [3]  {Precip_Normal_EastEurope}    => {Temp_Normal_EastEurope}     0.09447005  0.8159204 0.11578341 7.615257   164
## [4]  {Precip_Normal_Mediterranean} => {Temp_Normal_Mediterranean}  0.07546083  0.7751479 0.09735023 7.157749   131
## [5]  {Precip_Normal_NorthEurasia}  => {Temp_Normal_NorthEurasia}   0.07546083  0.7705882 0.09792627 7.310061   131
## [6]  {Precip_Normal_EastAsia}      => {Temp_Normal_EastAsia}       0.07200461  0.7485030 0.09619816 7.179012   125
## [7]  {Precip_Normal_WestEurope}    => {Temp_Normal_WestEurope}     0.08928571  0.7416268 0.12039171 7.073979   155
## [8]  {Precip_Normal_SouthAsia}     => {Temp_Normal_SouthAsia}      0.07315668  0.7298851 0.10023041 6.886307   127
## [9]  {Temp_Normal_NorthEurasia}    => {Precip_Normal_NorthEurasia} 0.07546083  0.7158470 0.10541475 7.310061   131
## [10] {Temp_Normal_CentralAsia}     => {Precip_Normal_CentralAsia}  0.07027650  0.7134503 0.09850230 6.768031   122

6.1.2 Comparison

Under both ENSO phases, the rules primarily reflected Normal–Normal temperature–precipitation associations within regions. No stable or frequent rules involving anomalous states (High/Low or Wet/Dry) were identified.

elnino_only <- setdiff(
  labels(rules_elnino),
  labels(rules_neutral)
)

length(elnino_only)
## [1] 0
common_rules <- intersect(
  labels(rules_elnino),
  labels(rules_neutral)
)
length(common_rules)
## [1] 14
common_rules <- intersect(
  labels(rules_elnino),
  labels(rules_neutral)
)

6.1.3 Visualization

plot(rules_elnino, method = "scatterplot", 
     control = list(main = "Scatter Plot for El Nino Rules"))

plot(rules_elnino, method = "grouped", 
     control = list(main = "Grouped Matrix of El Nino Weather States"))
## Available control parameters (with default values):
## k     =  20
## aggr.fun  =  function (x, ...)  UseMethod("mean")
## rhs_max   =  10
## lhs_label_items   =  2
## col   =  c("#EE0000FF", "#EEEEEEFF")
## groups    =  NULL
## engine    =  ggplot2
## verbose   =  FALSE

plot(rules_elnino, method = "graph", 
     control = list(type = "items", main = "Network Graph of Climate Associations"))
## Available control parameters (with default values):
## layout    =  stress
## circular  =  FALSE
## ggraphdots    =  NULL
## edges     =  <environment>
## nodes     =  <environment>
## nodetext  =  <environment>
## colors    =  c("#EE0000FF", "#EEEEEEFF")
## engine    =  ggplot2
## max   =  100
## verbose   =  FALSE

plot(rules_elnino, method = "paracoord", 
     control = list(reorder = TRUE, main = "Parallel Coordinates Plot"))

The scatter plots and grouped visualizations of association rules further confirm the numerical results. Rules are concentrated in a narrow range of support and confidence, with relatively high lift values, indicating strong baseline associations.

Across all visualizations, the left-hand-side and right-hand-side items are dominated by Temp_Normal and Precip_Normal states. The absence of anomalous states in these plots visually reinforces the conclusion that association rules primarily capture baseline climate coupling, rather than extreme or rare events. Differences between El Niño and Neutral conditions remain limited in both rule structure and strength.

6.1.4 Supplement

To complement this result, I examined anomaly frequencies directly. Bar charts of temperature and precipitation anomaly frequencies revealed that El Niño alters how often anomalies occur, even though it does not create frequent co-occurring anomaly patterns. This suggests ENSO mainly affects marginal distributions rather than stable co-occurrence structures.

rules_elnino_non_normal <- subset(
  rules_elnino,
  !(lhs %pin% "Normal" & rhs %pin% "Normal")
)

rules_neutral_non_normal <- subset(
  rules_neutral,
  !(lhs %pin% "Normal" & rhs %pin% "Normal")
)

length(rules_elnino_non_normal)
## [1] 0
length(rules_neutral_non_normal)
## [1] 0
temp_freq <- state_table %>%
  filter(ENSO %in% c("ElNino", "Neutral")) %>%
  filter(!str_detect(Temp_item, "Temp_Normal"))%>%
  group_by(ENSO, region, Temp_item) %>%
  summarise(n = n(), .groups = "drop") %>%
  group_by(ENSO, region) %>%
  mutate(freq = n / sum(n),
         Temp_item = str_remove(Temp_item, "_[^_]+$"))
print(temp_freq)
## # A tibble: 28 × 5
## # Groups:   ENSO, region [14]
##    ENSO   region        Temp_item     n   freq
##    <chr>  <chr>         <chr>     <int>  <dbl>
##  1 ElNino CentralAsia   Temp_High    24 0.6   
##  2 ElNino CentralAsia   Temp_Low     16 0.4   
##  3 ElNino EastAsia      Temp_High    53 0.946 
##  4 ElNino EastAsia      Temp_Low      3 0.0536
##  5 ElNino EastEurope    Temp_High    15 0.441 
##  6 ElNino EastEurope    Temp_Low     19 0.559 
##  7 ElNino Mediterranean Temp_High    21 0.553 
##  8 ElNino Mediterranean Temp_Low     17 0.447 
##  9 ElNino NorthEurasia  Temp_High    51 0.981 
## 10 ElNino NorthEurasia  Temp_Low      1 0.0192
## # ℹ 18 more rows
precip_freq <- state_table %>%
  filter(ENSO %in% c("ElNino", "Neutral")) %>%
  filter(!str_detect(Precip_item, "Precip_Normal"))%>%
  group_by(ENSO, region, Precip_item) %>%
  summarise(n = n(), .groups = "drop") %>%
  group_by(ENSO, region) %>%
  mutate(freq = n / sum(n),
         Precip_item = str_remove(Precip_item , "_[^_]+$"))
print(precip_freq)
## # A tibble: 26 × 5
## # Groups:   ENSO, region [14]
##    ENSO   region        Precip_item     n  freq
##    <chr>  <chr>         <chr>       <int> <dbl>
##  1 ElNino CentralAsia   Precip_Dry     28 0.528
##  2 ElNino CentralAsia   Precip_Wet     25 0.472
##  3 ElNino EastAsia      Precip_Dry     44 0.611
##  4 ElNino EastAsia      Precip_Wet     28 0.389
##  5 ElNino EastEurope    Precip_Dry      6 0.3  
##  6 ElNino EastEurope    Precip_Wet     14 0.7  
##  7 ElNino Mediterranean Precip_Dry     29 0.558
##  8 ElNino Mediterranean Precip_Wet     23 0.442
##  9 ElNino NorthEurasia  Precip_Dry     21 0.477
## 10 ElNino NorthEurasia  Precip_Wet     23 0.523
## # ℹ 16 more rows
ggplot(temp_freq, aes(x = region, y = freq, fill = ENSO)) +
  geom_col(position = "dodge") +
  facet_wrap(~ Temp_item) +
  labs(
    title = "Frequency of Temperature Anomalies by ENSO Phase",
    y = "Relative Frequency",
    x = "Region"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplot(precip_freq, aes(x = region, y = freq, fill = ENSO)) +
  geom_col(position = "dodge") +
  facet_wrap(~ Precip_item) +
  labs(
    title = "Frequency of Precipitation Anomalies by ENSO Phase",
    y = "Relative Frequency",
    x = "Region"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

In contrast to association rules, the frequency comparison plots reveal clear ENSO-related signals. Under El Niño conditions, warm temperature anomalies occur more frequently in several regions, including East Asia, South Asia, and Northern Eurasia. Cold anomalies do not show a similarly consistent increase.

For precipitation, El Niño periods are generally associated with a higher frequency of dry anomalies, while wet anomalies tend to be more common during Neutral conditions. These results indicate that ENSO influences climate variability mainly by shifting the probability of anomalies, rather than creating frequent co-occurring anomalous patterns across regions.

6.2 Second Round

6.2.1 Binarization apporach

To address the issues of excessive “Normal” states and overly fragmented classifications, I have refined my research strategy by shifting the focus from “Which specific anomaly occurs?” to “Does any anomaly occur?”. I am implementing a binarization approach that merges specific states (High/Low, Wet/Dry) into a single “Anomalous” category based on the \(|z|\ge1\) threshold. This redefinition is not meant to manufacture outcomes, but to fundamentally simplify the state space and capture the core atmospheric instability associated with different ENSO phases.

temp_binary <- temp_state %>%
  dplyr::select(year, month, region, Temp_state, ENSO) %>%
  filter(year>=1979)%>%drop_na()%>%
  mutate(Temp_binary = ifelse(Temp_state == "Temp_Normal",
                              "Temp_Normal", "Temp_Anomaly"))

precip_binary <- precip_state %>%
  dplyr::select(year, month, region, Precip_state, ENSO) %>%
  filter(year>=1979)%>%drop_na()%>%
  mutate(Precip_binary =ifelse(Precip_state == "Precip_Normal",
                             "Precip_Normal","Precip_Anomaly"))
#aggravate
state_table_bin <- temp_binary %>%
  left_join(
    precip_binary,by = c("year", "month", "region","ENSO")
  )%>%
mutate(
  Temp_item_bin   = paste(Temp_binary, region, sep = "_"),
  Precip_item_bin = paste(Precip_binary, region, sep = "_"),
  trans_id = paste(year, month, region, sep = "_")
)
item_long_bin <- state_table_bin %>%
  dplyr::select(trans_id, Temp_item_bin, Precip_item_bin) %>%
  pivot_longer(
    cols = c(Temp_item_bin, Precip_item_bin),
    names_to = "type",
    values_to = "item"
  ) %>%
  dplyr::select(trans_id, item)

6.2.2 Convert to a Transactions Object

library(arules)

trans_all_bin <- as(
  split(item_long_bin$item, item_long_bin$trans_id),
  "transactions"
)
inspect(trans_all_bin[1:5])
##     items                                  transactionID
## [1] {Precip_Normal_CentralAsia,                         
##      Temp_Normal_CentralAsia}       1979_1_CentralAsia  
## [2] {Precip_Normal_EastAsia,                            
##      Temp_Anomaly_EastAsia}         1979_1_EastAsia     
## [3] {Precip_Normal_EastEurope,                          
##      Temp_Normal_EastEurope}        1979_1_EastEurope   
## [4] {Precip_Anomaly_Mediterranean,                      
##      Temp_Normal_Mediterranean}     1979_1_Mediterranean
## [5] {Precip_Normal_NorthEurasia,                        
##      Temp_Anomaly_NorthEurasia}     1979_1_NorthEurasia
itemLabels(trans_all_bin)
##  [1] "Precip_Anomaly_CentralAsia"   "Precip_Anomaly_EastAsia"     
##  [3] "Precip_Anomaly_EastEurope"    "Precip_Anomaly_Mediterranean"
##  [5] "Precip_Anomaly_NorthEurasia"  "Precip_Anomaly_SouthAsia"    
##  [7] "Precip_Anomaly_WestEurope"    "Precip_Normal_CentralAsia"   
##  [9] "Precip_Normal_EastAsia"       "Precip_Normal_EastEurope"    
## [11] "Precip_Normal_Mediterranean"  "Precip_Normal_NorthEurasia"  
## [13] "Precip_Normal_SouthAsia"      "Precip_Normal_WestEurope"    
## [15] "Temp_Anomaly_CentralAsia"     "Temp_Anomaly_EastAsia"       
## [17] "Temp_Anomaly_EastEurope"      "Temp_Anomaly_Mediterranean"  
## [19] "Temp_Anomaly_NorthEurasia"    "Temp_Anomaly_SouthAsia"      
## [21] "Temp_Anomaly_WestEurope"      "Temp_Normal_CentralAsia"     
## [23] "Temp_Normal_EastAsia"         "Temp_Normal_EastEurope"      
## [25] "Temp_Normal_Mediterranean"    "Temp_Normal_NorthEurasia"    
## [27] "Temp_Normal_SouthAsia"        "Temp_Normal_WestEurope"
tx_labels_bin <- labels(trans_all_bin)

enso_map_bin <- state_table_bin %>%
  dplyr::select(trans_id, ENSO) %>%
  distinct()
  
target_indices <- which(enso_map_bin$ENSO == "ElNino")
safe_indices <- target_indices[target_indices <= length(trans_all)]
trans_elnino_bin <- trans_all_bin[safe_indices]

target_indices <- which(enso_map_bin$ENSO == "Neutral")
safe_indices <- target_indices[target_indices <= length(trans_all)]
trans_neutral_bin <- trans_all[safe_indices]

6.2.3 Second Apriori

rules_bin_elnino <- apriori(
  trans_elnino_bin,
  parameter = list(
    supp = 0.05,
    conf = 0.6,
    minlen = 2
  )
)
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.6    0.1    1 none FALSE            TRUE       5    0.05      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 54 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[28 item(s), 1099 transaction(s)] done [0.00s].
## sorting and recoding items ... [17 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [14 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules_bin_elnino <- subset(rules_bin_elnino, lift > 1)

By binarizing the data, I successfully reduced the item count from 40 to 17, significantly decreasing data sparsity and consolidating the “Anomalous” states. Despite setting a robust support threshold (minimum count of 54) to ensure climatic significance, the number of generated rules remained low at 14—identical to the previous three-state classification. This critical observation indicates that while El Niño drives regional instability, these climate anomalies tend to occur in isolation rather than as synchronized, co-occurring events across the study area. Consequently, my strategy now pivots from looking for “global” spatial patterns to analyzing why these anomalies remain localized despite the overarching influence of El Niño.

6.2.4 Conditional Probability and Risk Ratio

#temperature
temp_rr <- temp_state %>%
  mutate(
    Temp_Anomaly = ifelse(Temp_state == "Temp_Normal", 0, 1)
  ) %>%
  filter(ENSO %in% c("ElNino", "Neutral")) %>%
  group_by(region, ENSO) %>%
  summarise(
    p_anom = mean(Temp_Anomaly),
    .groups = "drop"
  ) %>%
  tidyr::pivot_wider(
    names_from = ENSO,
    values_from = p_anom
  ) %>%
  mutate(
    RR = ElNino / Neutral,
    Diff = ElNino - Neutral
  )
#preciption
precip_rr <- precip_state %>%
  mutate(
    Anomaly = ifelse(Precip_state == "Precip_Normal", 0, 1)
  ) %>%
  filter(ENSO %in% c("ElNino", "Neutral")) %>%
  group_by(region, ENSO) %>%
  summarise(
    p_anom = mean(Anomaly),
    .groups = "drop"
  ) %>%
  pivot_wider(
    names_from = ENSO,
    values_from = p_anom
  ) %>%
  mutate(
    RR = ElNino / Neutral,
    Diff = ElNino - Neutral
  )

6.2.5 Visalisation

ggplot(temp_rr, aes(x = reorder(region, RR), y = RR, fill = RR > 1)) +
  geom_col(show.legend = FALSE) +

  scale_fill_manual(values = c("TRUE" = "#2E8B57", "FALSE" = "#B3B3B3")) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "black", linewidth = 0.8) +
  geom_text(aes(label = round(RR, 2)), vjust = -0.5, size = 3.5, fontface = "bold") +
  labs(
    title = "Risk Ratio of Temperature Anomalies (El Niño vs Neutral)",
    subtitle = "RR > 1 indicates El Niño increases the probability of temperature instability",
    y = "Risk Ratio",
    x = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.grid.major.x = element_blank()
  )

ggplot(precip_rr, aes(x = reorder(region, RR), y = RR, fill = RR > 1)) +
  geom_col(show.legend = FALSE) +

  scale_fill_manual(values = c("TRUE" = "#2E8B57", "FALSE" = "#B3B3B3")) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "black", linewidth = 0.8) +

  geom_text(aes(label = round(RR, 2)), vjust = -0.5, size = 3.5) +
  labs(
    title = "Risk Ratio of Precipitation Anomalies (El Niño vs Neutral)",
    subtitle = "RR > 1 indicates increased risk of anomalies during El Niño",
    y = "Risk Ratio (Anomalous State)",
    x = "Region"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

I quantified the regional impact of El Niño by calculating the Risk Ratio (RR) for binarized temperature and precipitation anomalies (\(|z|\ge 1\)). By filtering the high-frequency “Normal” background, the results reveal a significant increase in atmospheric instability (RR > 1) across specific regions like East Asia, which shows the highest risk for both thermal (\(RR=1.17\)) and rainfall (\(RR=1.46\)) perturbations.

However, the fact that the number of association rules did not increase alongside these rising RRs suggests that El Niño drives independent, localized regional responses rather than large-scale synchronized anomaly clusters. This localized sensitivity indicates that while the “threat” of an anomaly is higher during El Niño, these events typically do not occur simultaneously across the continent.

7 Conclusion

This project applies association rule mining to explore the relationship between ENSO phases and regional climate anomalies across Eurasia. The main analysis shows that, despite clear ENSO-related influences on regional climate variability, El Niño does not generate frequent or stable co-occurrence patterns of anomalous temperature and precipitation states across regions. Most of the extracted high-confidence rules are dominated by normal climate conditions, indicating that strong baseline regional structures persist under different ENSO phases.

These findings suggest that ENSO does not act as a synchronizing mechanism for extreme regional responses, but rather operates as a large-scale background influence. From the perspective of association rule mining, climate anomalies appear to be relatively rare events whose co-occurrence frequency is too low to form robust rules under reasonable support and confidence thresholds. This highlights the suitability of association rule mining for identifying frequent and stable patterns, as well as its limitations when applied to extreme or infrequent phenomena.

As an additional extension, conditional probability and risk ratio analyses were introduced to further examine ENSO-related effects. These results show that El Niño shifts the probability of temperature and precipitation anomalies in a region-dependent manner, even though such changes are not reflected in frequent co-occurrence rules. This comparison provides extra insight into how different analytical approaches capture different aspects of climate variability, and helps place the association rule results in a broader methodological context.

8 References


  1. Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), 487-499.↩︎

  2. McPhaden, M. J., Zebiak, S. E., & Glantz, M. H. (2006). ENSO as an integrating concept in earth science. Science, 314(5806), 1740-1745.↩︎

  3. Trenberth, K. E., Caron, J. M., Stepaniak, D. P., & Worley, S. (2002). Evolution of El Niño–Southern Oscillation and global atmospheric surface temperatures. Journal of Geophysical Research: Atmospheres, 107(D10), ACL 5-1.↩︎

  4. Yuan, X., Chen, D., Li, C., Wang, H., & Li, L. (2018). Arctic sea ice anomalies during the two types of El Niño. Journal of Climate, 31(5), 1961-1979.↩︎

  5. Hersbach, H., Bell, B., Berrisford, P., Hirahara, S., Horányi, A., Muñoz‐Sabater, J., … & Thépaut, J. N. (2020). The ERA5 global reanalysis. Quarterly Journal of the Royal Meteorological Society, 146(730), 1999-2049.↩︎