Exploring ENSO-Related Climate Variability Across Eurasia Using Association Rule Mining
# library
#NetCDF & climate data
library(ncdf4)
library(raster)
# Data manipulation
library(dplyr)
library(tidyr)
library(lubridate)
library(tidyverse)
library(arules)
#visualization
library(ggplot2)
library(arulesViz)
library(kableExtra)1 Introduction
This project explores the association between the El Niño–Southern Oscillation (ENSO) and regional climate variability across Eurasia. Rather than employing traditional regression-based methods, this study leverages association rule mining to investigate whether specific temperature and precipitation anomaly states co-occur under different ENSO phases. By comparing El Niño and Neutral conditions, we examine variations in rule presence and strength to assess how these co-occurrence patterns evolve across ENSO states.
2 Key References Informing the Project
1.Agrawal & Srikant (1994), Fast Algorithms for Mining Association Rules. This paper provides the foundational methodology for association rule mining used in this project.1
2.McPhaden et al. (2006), ENSO as an Integrating Concept in Earth Science. This study explains the physical mechanisms of ENSO and motivates its role as a large-scale climate driver.2
3.Trenberth et al. (2002), The Evolution of ENSO and Global Atmospheric Circulation. It informs the interpretation of ENSO teleconnections affecting Eurasian climate.3
4.Yuan et al. (2018), ENSO Influence on Eurasian Climate Variability. This paper supports the regional focus and motivates separating El Niño from Neutral conditions.4
5.Hersbach et al. (2020), The ERA5 Global Reanalysis. Although not directly used, this work guides best practices in large-scale climate data processing and anomaly analysis.5
3 Data Process
Monthly data from January 1979 to November 2025 were used.
Dataset were downloaded from the website below:
ENSO phases were identified using the (ONI index from the U.S. National Weather Service).
Temperature anomalies were obtained from the (NOAA Merged Land–Ocean Global Surface Temperature Analysis (V6.0)).
Precipitation data from (GPCP V2.3).
Data were aggregated to monthly regional averages, standardized within each region, and discretized into categorical states under the rule.
| Domain | Category | Specifications.Data.Source | Identification.Criteria.Thresholds | Period |
|---|---|---|---|---|
| Geographic Focus | Western Europe | UK, France, Germany, Benelux, Scandinavia | 35°–60°N, 10°W–15°E | |
| Eastern Europe | Poland, Balkans, Western Russia | 45°–60°N, 15°E–40°E | ||
| Mediterranean | Southern Europe, North Mediterranean | 30°–45°N, 10°W–40°E | ||
| Central Asia | Kazakhstan and inland regions | 35°–55°N, 40°E–75°E | ||
| South Asia | India, Pakistan, Bangladesh | 5°–30°N, 65°E–95°E | ||
| East Asia | China, Korea, Japan | 20°–50°N, 100°E–145°E | ||
| Northern Eurasia | Siberia, High-latitude Russia | 55°–75°N, 40°E–140°E | ||
| Climate Drivers | ||||
| ENSO (ONI) | El Niño | NOAA Climate Prediction Center (CPC) | ONI > 0.5 | Monthly |
| Neutral | -0.5< ONI < 0.5 | |||
| La Niña | ONI < −0.5 | |||
| Temperature | High | NOAA GlobalTemp v6 (5∘ Gridded) | z≥1 | Monthly |
| Normal | −1<z<1 | |||
| Low | z≤−1 | |||
| Precipitation | Wet: | GPCP v2.3 (2.5∘ Gridded) | z≥1 | Monthly |
| Normal | l−1<z<1 | |||
| Dry | z≤−1 |
▲ Click here to expand the code of data process
#process oni index
monthly_oni <- read.csv("data/monthly_oni.csv")
oni <- read.csv("data/oni.csv")
oni_long <- oni %>%
pivot_longer(cols = -Year, names_to = "season", values_to = "oni") %>%
mutate(
MON = match(season, c("DJF","JFM","FMA","MAM","AMJ","MJJ",
"JJA","JAS","ASO","SON","OND","NDJ")),
year = as.integer(Year),
month = as.integer(MON)
) %>%
dplyr::select(year, month, oni)
oni_long <- oni_long %>%
mutate(
ENSO = case_when(
oni >= 0.5 ~ "ElNino",
oni <= -0.5 ~ "LaNina",
oni > -0.5 & oni < 0.5 ~ "Neutral",
TRUE ~ NA_character_
)
) %>%
filter(!is.na(ENSO))
#process NOAAG index
nc <- nc_open("data/NOAAGlobalTemp_v6.0.0_gridded_s185001_e202512_c20260107T093746.nc")
lon <- ncvar_get(nc, "lon")
lat <- ncvar_get(nc, "lat")
time <- ncvar_get(nc, "time")
temp <- ncvar_get(nc, "anom")
time_origin <- as.Date("1800-01-01")
dates <- time_origin + time
time_df <- data.frame(
year = year(dates),
month = month(dates),
t_idx = seq_along(dates)
)
nc_close(nc)
# define area longtidue&latitude
regions <- tibble(
region = c("WestEurope","EastEurope","Mediterranean",
"CentralAsia","SouthAsia","EastAsia","NorthEurasia"),
lat_min = c(35, 45, 30, 35, 5, 20, 55),
lat_max = c(60, 60, 45, 55, 30, 50, 75),
lon_min = c(-10, 15, -10, 40, 65,100, 40),
lon_max = c( 15, 40, 40, 75, 95,145,140)
)
# aggravate tempo
temp_brick <- brick(temp,
xmn = min(lon), xmx = max(lon),
ymn = min(lat), ymx = max(lat),
crs = CRS("+proj=longlat +datum=WGS84"))
region_mean <- function(r, lat_min, lat_max, lon_min, lon_max) {
crop_r <- crop(r,
extent(lon_min, lon_max,
lat_min, lat_max))
cellStats(crop_r, stat = "mean", na.rm = TRUE)
}
temp_region <- lapply(1:nrow(regions), function(i) {
reg <- regions[i, ]
vals <- sapply(1:nlayers(temp_brick), function(t) {
region_mean(temp_brick[[t]],
reg$lat_min, reg$lat_max,
reg$lon_min, reg$lon_max)
})
data.frame(
region = reg$region,
t_idx = 1:length(vals),
temp_anomaly = vals
)
}) %>%
bind_rows()
temp_region_aggeravate <- temp_region %>%
left_join(time_df, by = "t_idx") %>%
filter(year >=1950) %>%
left_join(oni_long, by = c("year","month"))%>%
dplyr::select(year,month,region,temp_anomaly,ENSO)
temp_state <- temp_region_aggeravate %>%
group_by(region) %>%
mutate(
temp_z = (temp_anomaly - mean(temp_anomaly, na.rm = TRUE)) /
sd(temp_anomaly, na.rm = TRUE)
) %>%
ungroup()
temp_state <- temp_state %>%
mutate(
Temp_state = case_when(
temp_z >= 1 ~ "Temp_High",
temp_z <= -1 ~ "Temp_Low",
TRUE ~ "Temp_Normal"
)
)
temp_state <- temp_state %>%
mutate(
Temp_item = paste0(Temp_state, "_", region)
)
# aggerate rainfall
nc_p <- nc_open("data/precip.mon.mean.nc")
lon_p <- ncvar_get(nc_p, "lon")
lat_p <- ncvar_get(nc_p, "lat")
time_p <- ncvar_get(nc_p, "time")
precip <- ncvar_get(nc_p, "precip")
nc_close(nc_p)
# ncatt_get(nc_p, "time", "units")
time_origin_p <- as.Date("1800-01-01")
dates_p <- time_origin_p + time_p
time_df_p <- data.frame(
date = dates_p,
year = year(dates_p),
month = month(dates_p),
t_idx = seq_along(dates_p)
)
precip_brick <- brick(
precip,
xmn = min(lon_p), xmx = max(lon_p),
ymn = min(lat_p), ymx = max(lat_p),
crs = CRS("+proj=longlat +datum=WGS84")
)
precip_region <- lapply(1:nrow(regions), function(i) {
reg <- regions[i, ]
vals <- sapply(1:nlayers(precip_brick), function(t) {
region_mean(
precip_brick[[t]],
reg$lat_min, reg$lat_max,
reg$lon_min, reg$lon_max
)
})
data.frame(
region = reg$region,
t_idx = seq_along(vals),
precip_mean = vals
)
}) %>%
bind_rows()
precip_region_aggeravate <- precip_region %>%
left_join(time_df_p, by = "t_idx") %>%
left_join(oni_long, by = c("year","month")
)%>%mutate(
days_in_month = days_in_month(as.Date(paste(year, month, "01", sep = "-")))
)%>%mutate(
precip_monthly = precip_mean * days_in_month
)%>%
dplyr::select(year,month,region,precip_monthly,ENSO)
precip_state <- precip_region_aggeravate %>%
group_by(region) %>%
mutate(
precip_z = (precip_monthly - mean(precip_monthly, na.rm = TRUE)) /
sd(precip_monthly, na.rm = TRUE)
) %>%
ungroup()
precip_state <- precip_state %>%
mutate(
Precip_state = case_when(
precip_z >= 1 ~ "Precip_Wet",
precip_z <= -1 ~ "Precip_Dry",
TRUE ~ "Precip_Normal"
)
)
precip_state <- precip_state %>%
mutate(
Precip_item = paste0(Precip_state, "_", region)
)4 Overview of The Dataset
The main dataset (temp_state) contains the following variables:| Variables | Description | Types |
|---|---|---|
| year, month | time identifiers | Integer |
| region | predefined Eurasian regions | Character |
| temp_anomaly | regional monthly temperature anomaly | Numeric |
| ENSO | ENSO phase (El Niño or Neutral) | Factor |
| temp_z | region-specific standardized anomaly | Numeric |
| Temp_state | categorical state (High, Normal, Low) | Factor |
| Temp_item | item label used for association rules | Character |
| t_idx | year | month | region | temp_anomaly | oni | ENSO | temp_z | Temp_state | Temp_item |
|---|---|---|---|---|---|---|---|---|---|
| 1201 | 1950 | 1 | WestEurope | 0.9252591 | -1.5 | LaNina | 0.5032373 | Temp_Normal | Temp_Normal_WestEurope |
| 1202 | 1950 | 2 | WestEurope | 0.8052714 | -1.3 | LaNina | 0.4338347 | Temp_Normal | Temp_Normal_WestEurope |
| 1203 | 1950 | 3 | WestEurope | -0.3032144 | -1.2 | LaNina | -0.2073293 | Temp_Normal | Temp_Normal_WestEurope |
| 1204 | 1950 | 4 | WestEurope | -0.3703260 | -1.2 | LaNina | -0.2461477 | Temp_Normal | Temp_Normal_WestEurope |
| 1205 | 1950 | 5 | WestEurope | -0.4951236 | -1.1 | LaNina | -0.3183324 | Temp_Normal | Temp_Normal_WestEurope |
| 1206 | 1950 | 6 | WestEurope | 0.4398493 | -0.9 | LaNina | 0.2224693 | Temp_Normal | Temp_Normal_WestEurope |
| year | month | region | precip_monthly | ENSO | precip_z | Precip_state | Precip_item |
|---|---|---|---|---|---|---|---|
| 1979 | 1 | WestEurope | 0.000000 | Neutral | -0.9583615 | Precip_Normal | Precip_Normal_WestEurope |
| 1979 | 2 | WestEurope | 9.727943 | Neutral | 0.7657893 | Precip_Normal | Precip_Normal_WestEurope |
| 1979 | 3 | WestEurope | 8.071862 | Neutral | 0.4722705 | Precip_Normal | Precip_Normal_WestEurope |
| 1979 | 4 | WestEurope | 5.021848 | Neutral | -0.0683046 | Precip_Normal | Precip_Normal_WestEurope |
| 1979 | 5 | WestEurope | 16.524888 | Neutral | 1.9704589 | Precip_Wet | Precip_Wet_WestEurope |
| 1979 | 6 | WestEurope | 6.913927 | Neutral | 0.2670417 | Precip_Normal | Precip_Normal_WestEurope |
##
## Temp_High Temp_Low Temp_Normal
## ElNino 269 294 1243
## LaNina 255 273 1327
## Neutral 442 416 1858
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.50749 -0.63681 0.02815 0.00000 0.66407 4.19279
##
## Precip_Dry Precip_Normal Precip_Wet
## ElNino 151 794 154
## LaNina 141 804 161
## Neutral 170 1250 316
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.6576 -0.7592 -0.1847 0.0000 0.6157 6.5851
Based on the statistical distributions shown in the frequency tables, I have integrated the raw counts with the Association Rule Mining (ARM) framework. These results inform my rule discovery process as follows:
Frequency and Support: The counts for extreme states, such as
Precip_Dry(151) andTemp_Low(294) during El Niño, indicate that these events have sufficient frequency to be analyzed. In ARM terms, these states will meet the minimum Support thresholds I have set to ensure that the discovered co-occurrence patterns across Eurasia are statistically relevant.State Dependency and Confidence: My data shows that while
Normalstates dominate during Neutral phases, the frequency of extreme anomalies is sustained during El Niño.Non-Linear Teleconnections: The high z-score extremes (
Max_Temp: 4.19;Max_Precip: 6.58) suggest that ENSO drives complex, non-linear climate responses. I believe ARM is uniquely suited for this task; unlike traditional regression-based approaches, it allows me to capture whether these extreme “states” tend to cluster in specific regions, such as South Asia or Northern Eurasia, under El Niño conditions.Baseline for Lift: By quantifying the occurrence of states during Neutral phases (where
Temp_Normalcounts are highest at 1858), I have established a robust baseline. This allows me to calculate the Lift of my rules, identifying which Eurasian climate patterns are truly amplified by ENSO rather than occurring by random chance.
5 Make Transaction List
Each region–month combination was treated as one transaction containing two items: a temperature state and a precipitation state. These were converted into a transactions object using the arules package in R.
#aggravate
state_table <- temp_state %>%
dplyr::select(year, month, region, Temp_item, ENSO) %>%
left_join(
precip_state %>%
dplyr::select(year, month, region, Precip_item),
by = c("year", "month", "region")
)%>%
filter(year>=1979)%>%drop_na()
#transaction id
state_table <- state_table %>%
mutate(
trans_id = paste(year, month, region, sep = "_")
)
item_long <- state_table %>%
dplyr::select(trans_id, Temp_item, Precip_item) %>%
pivot_longer(
cols = c(Temp_item, Precip_item),
names_to = "type",
values_to = "item"
) %>%
dplyr::select(trans_id, item)5.1 Convert to a Transactions Object
## items transactionID
## [1] {Precip_Normal_CentralAsia, Temp_Normal_CentralAsia} 1979_1_CentralAsia
## [2] {Precip_Normal_EastAsia, Temp_High_EastAsia} 1979_1_EastAsia
## [3] {Precip_Normal_EastEurope, Temp_Normal_EastEurope} 1979_1_EastEurope
## [4] {Precip_Dry_Mediterranean, Temp_Normal_Mediterranean} 1979_1_Mediterranean
## [5] {Precip_Normal_NorthEurasia, Temp_Low_NorthEurasia} 1979_1_NorthEurasia
## [1] "Precip_Dry_CentralAsia" "Precip_Dry_EastAsia"
## [3] "Precip_Dry_EastEurope" "Precip_Dry_Mediterranean"
## [5] "Precip_Dry_NorthEurasia" "Precip_Dry_SouthAsia"
## [7] "Precip_Normal_CentralAsia" "Precip_Normal_EastAsia"
## [9] "Precip_Normal_EastEurope" "Precip_Normal_Mediterranean"
## [11] "Precip_Normal_NorthEurasia" "Precip_Normal_SouthAsia"
## [13] "Precip_Normal_WestEurope" "Precip_Wet_CentralAsia"
## [15] "Precip_Wet_EastAsia" "Precip_Wet_EastEurope"
## [17] "Precip_Wet_Mediterranean" "Precip_Wet_NorthEurasia"
## [19] "Precip_Wet_SouthAsia" "Precip_Wet_WestEurope"
## [21] "Temp_High_CentralAsia" "Temp_High_EastAsia"
## [23] "Temp_High_EastEurope" "Temp_High_Mediterranean"
## [25] "Temp_High_NorthEurasia" "Temp_High_SouthAsia"
## [27] "Temp_High_WestEurope" "Temp_Low_CentralAsia"
## [29] "Temp_Low_EastAsia" "Temp_Low_EastEurope"
## [31] "Temp_Low_Mediterranean" "Temp_Low_NorthEurasia"
## [33] "Temp_Low_SouthAsia" "Temp_Low_WestEurope"
## [35] "Temp_Normal_CentralAsia" "Temp_Normal_EastAsia"
## [37] "Temp_Normal_EastEurope" "Temp_Normal_Mediterranean"
## [39] "Temp_Normal_NorthEurasia" "Temp_Normal_SouthAsia"
## [41] "Temp_Normal_WestEurope"
5.2 Samples Divided by ENSO Status (El Niño vs Neutral)
Transactions were split into two subsets based on ENSO phase: El Niño and Neutral, allowing direct comparison under consistent settings.
trans_labels <- labels(trans_all)
target_indices <- which(enso_map$ENSO == "ElNino")
safe_indices <- target_indices[target_indices <= length(trans_all)]
trans_elnino <- trans_all[safe_indices]
target_indices <- which(enso_map$ENSO == "Neutral")
safe_indices <- target_indices[target_indices <= length(trans_all)]
trans_neutral <- trans_all[safe_indices]
target_indices <- which(enso_map$ENSO == "LaNina")
safe_indices <- target_indices[target_indices <= length(trans_all)]
trans_lanina <- trans_all[safe_indices]6 Association Rules
6.1 First round
6.1.1 Apriori
Association rules were mined using conservative thresholds (support ≈ 4%, confidence ≥ 60%, lift > 1) to ensure robustness.
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.04 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 43
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[40 item(s), 1099 transaction(s)] done [0.00s].
## sorting and recoding items ... [14 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [14 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.04 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 69
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[41 item(s), 1736 transaction(s)] done [0.00s].
## sorting and recoding items ... [14 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [14 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
Association rule mining was applied separately to El Niño and Neutral samples using the same parameter settings (minimum support around 4–5%, confidence ≥ 0.6). In both cases, the algorithm generated 14 rules, all of which involved normal temperature and normal precipitation states within the same region.
No robust rules linking anomalous temperature or precipitation states were identified. This result suggests that, under conservative and comparable thresholds, anomalous climate states do not form frequent or stable co-occurrence patterns, regardless of ENSO phase.
## [1] 14
## [1] 14
## lhs rhs support confidence coverage lift count
## [1] {Temp_Normal_EastEurope} => {Precip_Normal_EastEurope} 0.08644222 0.8558559 0.10100091 7.348325 95
## [2] {Precip_Normal_EastEurope} => {Temp_Normal_EastEurope} 0.08644222 0.7421875 0.11646952 7.348325 95
## [3] {Temp_Normal_WestEurope} => {Precip_Normal_WestEurope} 0.09372157 0.8879310 0.10555050 7.228416 103
## [4] {Precip_Normal_WestEurope} => {Temp_Normal_WestEurope} 0.09372157 0.7629630 0.12283894 7.228416 103
## [5] {Precip_Normal_EastAsia} => {Temp_Normal_EastAsia} 0.06824386 0.7075472 0.09645132 7.069039 75
## [6] {Temp_Normal_EastAsia} => {Precip_Normal_EastAsia} 0.06824386 0.6818182 0.10009099 7.069039 75
## [7] {Precip_Normal_Mediterranean} => {Temp_Normal_Mediterranean} 0.06187443 0.6800000 0.09099181 6.984299 68
## [8] {Temp_Normal_Mediterranean} => {Precip_Normal_Mediterranean} 0.06187443 0.6355140 0.09736124 6.984299 68
## [9] {Temp_Normal_NorthEurasia} => {Precip_Normal_NorthEurasia} 0.06915378 0.6280992 0.11010009 6.902810 76
## [10] {Precip_Normal_NorthEurasia} => {Temp_Normal_NorthEurasia} 0.06915378 0.7600000 0.09099181 6.902810 76
## lhs rhs support confidence coverage lift count
## [1] {Temp_Normal_EastEurope} => {Precip_Normal_EastEurope} 0.09447005 0.8817204 0.10714286 7.615257 164
## [2] {Precip_Normal_EastEurope} => {Temp_Normal_EastEurope} 0.09447005 0.8159204 0.11578341 7.615257 164
## [3] {Precip_Normal_NorthEurasia} => {Temp_Normal_NorthEurasia} 0.07546083 0.7705882 0.09792627 7.310061 131
## [4] {Temp_Normal_NorthEurasia} => {Precip_Normal_NorthEurasia} 0.07546083 0.7158470 0.10541475 7.310061 131
## [5] {Temp_Normal_EastAsia} => {Precip_Normal_EastAsia} 0.07200461 0.6906077 0.10426267 7.179012 125
## [6] {Precip_Normal_EastAsia} => {Temp_Normal_EastAsia} 0.07200461 0.7485030 0.09619816 7.179012 125
## [7] {Precip_Normal_Mediterranean} => {Temp_Normal_Mediterranean} 0.07546083 0.7751479 0.09735023 7.157749 131
## [8] {Temp_Normal_Mediterranean} => {Precip_Normal_Mediterranean} 0.07546083 0.6968085 0.10829493 7.157749 131
## [9] {Temp_Normal_WestEurope} => {Precip_Normal_WestEurope} 0.08928571 0.8516484 0.10483871 7.073979 155
## [10] {Precip_Normal_WestEurope} => {Temp_Normal_WestEurope} 0.08928571 0.7416268 0.12039171 7.073979 155
## lhs rhs support confidence coverage lift count
## [1] {Temp_Normal_WestEurope} => {Precip_Normal_WestEurope} 0.09372157 0.8879310 0.10555050 7.228416 103
## [2] {Temp_Normal_EastEurope} => {Precip_Normal_EastEurope} 0.08644222 0.8558559 0.10100091 7.348325 95
## [3] {Precip_Normal_WestEurope} => {Temp_Normal_WestEurope} 0.09372157 0.7629630 0.12283894 7.228416 103
## [4] {Precip_Normal_NorthEurasia} => {Temp_Normal_NorthEurasia} 0.06915378 0.7600000 0.09099181 6.902810 76
## [5] {Precip_Normal_EastEurope} => {Temp_Normal_EastEurope} 0.08644222 0.7421875 0.11646952 7.348325 95
## [6] {Precip_Normal_EastAsia} => {Temp_Normal_EastAsia} 0.06824386 0.7075472 0.09645132 7.069039 75
## [7] {Temp_Normal_EastAsia} => {Precip_Normal_EastAsia} 0.06824386 0.6818182 0.10009099 7.069039 75
## [8] {Precip_Normal_Mediterranean} => {Temp_Normal_Mediterranean} 0.06187443 0.6800000 0.09099181 6.984299 68
## [9] {Precip_Normal_CentralAsia} => {Temp_Normal_CentralAsia} 0.06187443 0.6800000 0.09099181 6.793818 68
## [10] {Precip_Normal_SouthAsia} => {Temp_Normal_SouthAsia} 0.06369427 0.6481481 0.09827116 6.534998 70
## lhs rhs support confidence coverage lift count
## [1] {Temp_Normal_EastEurope} => {Precip_Normal_EastEurope} 0.09447005 0.8817204 0.10714286 7.615257 164
## [2] {Temp_Normal_WestEurope} => {Precip_Normal_WestEurope} 0.08928571 0.8516484 0.10483871 7.073979 155
## [3] {Precip_Normal_EastEurope} => {Temp_Normal_EastEurope} 0.09447005 0.8159204 0.11578341 7.615257 164
## [4] {Precip_Normal_Mediterranean} => {Temp_Normal_Mediterranean} 0.07546083 0.7751479 0.09735023 7.157749 131
## [5] {Precip_Normal_NorthEurasia} => {Temp_Normal_NorthEurasia} 0.07546083 0.7705882 0.09792627 7.310061 131
## [6] {Precip_Normal_EastAsia} => {Temp_Normal_EastAsia} 0.07200461 0.7485030 0.09619816 7.179012 125
## [7] {Precip_Normal_WestEurope} => {Temp_Normal_WestEurope} 0.08928571 0.7416268 0.12039171 7.073979 155
## [8] {Precip_Normal_SouthAsia} => {Temp_Normal_SouthAsia} 0.07315668 0.7298851 0.10023041 6.886307 127
## [9] {Temp_Normal_NorthEurasia} => {Precip_Normal_NorthEurasia} 0.07546083 0.7158470 0.10541475 7.310061 131
## [10] {Temp_Normal_CentralAsia} => {Precip_Normal_CentralAsia} 0.07027650 0.7134503 0.09850230 6.768031 122
6.1.2 Comparison
Under both ENSO phases, the rules primarily reflected Normal–Normal temperature–precipitation associations within regions. No stable or frequent rules involving anomalous states (High/Low or Wet/Dry) were identified.
## [1] 0
## [1] 14
6.1.3 Visualization
plot(rules_elnino, method = "grouped",
control = list(main = "Grouped Matrix of El Nino Weather States"))## Available control parameters (with default values):
## k = 20
## aggr.fun = function (x, ...) UseMethod("mean")
## rhs_max = 10
## lhs_label_items = 2
## col = c("#EE0000FF", "#EEEEEEFF")
## groups = NULL
## engine = ggplot2
## verbose = FALSE
plot(rules_elnino, method = "graph",
control = list(type = "items", main = "Network Graph of Climate Associations"))## Available control parameters (with default values):
## layout = stress
## circular = FALSE
## ggraphdots = NULL
## edges = <environment>
## nodes = <environment>
## nodetext = <environment>
## colors = c("#EE0000FF", "#EEEEEEFF")
## engine = ggplot2
## max = 100
## verbose = FALSE
plot(rules_elnino, method = "paracoord",
control = list(reorder = TRUE, main = "Parallel Coordinates Plot"))
The scatter plots and grouped visualizations of association rules
further confirm the numerical results. Rules are concentrated in a
narrow range of support and confidence, with relatively high lift
values, indicating strong baseline associations.
Across all visualizations, the left-hand-side and right-hand-side
items are dominated by Temp_Normal and
Precip_Normal states. The absence of anomalous states in
these plots visually reinforces the conclusion that association rules
primarily capture baseline climate coupling, rather than extreme or rare
events. Differences between El Niño and Neutral conditions remain
limited in both rule structure and strength.
6.1.4 Supplement
To complement this result, I examined anomaly frequencies directly. Bar charts of temperature and precipitation anomaly frequencies revealed that El Niño alters how often anomalies occur, even though it does not create frequent co-occurring anomaly patterns. This suggests ENSO mainly affects marginal distributions rather than stable co-occurrence structures.
rules_elnino_non_normal <- subset(
rules_elnino,
!(lhs %pin% "Normal" & rhs %pin% "Normal")
)
rules_neutral_non_normal <- subset(
rules_neutral,
!(lhs %pin% "Normal" & rhs %pin% "Normal")
)
length(rules_elnino_non_normal)## [1] 0
## [1] 0
temp_freq <- state_table %>%
filter(ENSO %in% c("ElNino", "Neutral")) %>%
filter(!str_detect(Temp_item, "Temp_Normal"))%>%
group_by(ENSO, region, Temp_item) %>%
summarise(n = n(), .groups = "drop") %>%
group_by(ENSO, region) %>%
mutate(freq = n / sum(n),
Temp_item = str_remove(Temp_item, "_[^_]+$"))
print(temp_freq)## # A tibble: 28 × 5
## # Groups: ENSO, region [14]
## ENSO region Temp_item n freq
## <chr> <chr> <chr> <int> <dbl>
## 1 ElNino CentralAsia Temp_High 24 0.6
## 2 ElNino CentralAsia Temp_Low 16 0.4
## 3 ElNino EastAsia Temp_High 53 0.946
## 4 ElNino EastAsia Temp_Low 3 0.0536
## 5 ElNino EastEurope Temp_High 15 0.441
## 6 ElNino EastEurope Temp_Low 19 0.559
## 7 ElNino Mediterranean Temp_High 21 0.553
## 8 ElNino Mediterranean Temp_Low 17 0.447
## 9 ElNino NorthEurasia Temp_High 51 0.981
## 10 ElNino NorthEurasia Temp_Low 1 0.0192
## # ℹ 18 more rows
precip_freq <- state_table %>%
filter(ENSO %in% c("ElNino", "Neutral")) %>%
filter(!str_detect(Precip_item, "Precip_Normal"))%>%
group_by(ENSO, region, Precip_item) %>%
summarise(n = n(), .groups = "drop") %>%
group_by(ENSO, region) %>%
mutate(freq = n / sum(n),
Precip_item = str_remove(Precip_item , "_[^_]+$"))
print(precip_freq)## # A tibble: 26 × 5
## # Groups: ENSO, region [14]
## ENSO region Precip_item n freq
## <chr> <chr> <chr> <int> <dbl>
## 1 ElNino CentralAsia Precip_Dry 28 0.528
## 2 ElNino CentralAsia Precip_Wet 25 0.472
## 3 ElNino EastAsia Precip_Dry 44 0.611
## 4 ElNino EastAsia Precip_Wet 28 0.389
## 5 ElNino EastEurope Precip_Dry 6 0.3
## 6 ElNino EastEurope Precip_Wet 14 0.7
## 7 ElNino Mediterranean Precip_Dry 29 0.558
## 8 ElNino Mediterranean Precip_Wet 23 0.442
## 9 ElNino NorthEurasia Precip_Dry 21 0.477
## 10 ElNino NorthEurasia Precip_Wet 23 0.523
## # ℹ 16 more rows
ggplot(temp_freq, aes(x = region, y = freq, fill = ENSO)) +
geom_col(position = "dodge") +
facet_wrap(~ Temp_item) +
labs(
title = "Frequency of Temperature Anomalies by ENSO Phase",
y = "Relative Frequency",
x = "Region"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))ggplot(precip_freq, aes(x = region, y = freq, fill = ENSO)) +
geom_col(position = "dodge") +
facet_wrap(~ Precip_item) +
labs(
title = "Frequency of Precipitation Anomalies by ENSO Phase",
y = "Relative Frequency",
x = "Region"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
In contrast to association rules, the frequency comparison plots reveal
clear ENSO-related signals. Under El Niño conditions, warm temperature
anomalies occur more frequently in several regions, including East Asia,
South Asia, and Northern Eurasia. Cold anomalies do not show a similarly
consistent increase.
For precipitation, El Niño periods are generally associated with a higher frequency of dry anomalies, while wet anomalies tend to be more common during Neutral conditions. These results indicate that ENSO influences climate variability mainly by shifting the probability of anomalies, rather than creating frequent co-occurring anomalous patterns across regions.
6.2 Second Round
6.2.1 Binarization apporach
To address the issues of excessive “Normal” states and overly fragmented classifications, I have refined my research strategy by shifting the focus from “Which specific anomaly occurs?” to “Does any anomaly occur?”. I am implementing a binarization approach that merges specific states (High/Low, Wet/Dry) into a single “Anomalous” category based on the \(|z|\ge1\) threshold. This redefinition is not meant to manufacture outcomes, but to fundamentally simplify the state space and capture the core atmospheric instability associated with different ENSO phases.
temp_binary <- temp_state %>%
dplyr::select(year, month, region, Temp_state, ENSO) %>%
filter(year>=1979)%>%drop_na()%>%
mutate(Temp_binary = ifelse(Temp_state == "Temp_Normal",
"Temp_Normal", "Temp_Anomaly"))
precip_binary <- precip_state %>%
dplyr::select(year, month, region, Precip_state, ENSO) %>%
filter(year>=1979)%>%drop_na()%>%
mutate(Precip_binary =ifelse(Precip_state == "Precip_Normal",
"Precip_Normal","Precip_Anomaly"))6.2.2 Convert to a Transactions Object
library(arules)
trans_all_bin <- as(
split(item_long_bin$item, item_long_bin$trans_id),
"transactions"
)## items transactionID
## [1] {Precip_Normal_CentralAsia,
## Temp_Normal_CentralAsia} 1979_1_CentralAsia
## [2] {Precip_Normal_EastAsia,
## Temp_Anomaly_EastAsia} 1979_1_EastAsia
## [3] {Precip_Normal_EastEurope,
## Temp_Normal_EastEurope} 1979_1_EastEurope
## [4] {Precip_Anomaly_Mediterranean,
## Temp_Normal_Mediterranean} 1979_1_Mediterranean
## [5] {Precip_Normal_NorthEurasia,
## Temp_Anomaly_NorthEurasia} 1979_1_NorthEurasia
## [1] "Precip_Anomaly_CentralAsia" "Precip_Anomaly_EastAsia"
## [3] "Precip_Anomaly_EastEurope" "Precip_Anomaly_Mediterranean"
## [5] "Precip_Anomaly_NorthEurasia" "Precip_Anomaly_SouthAsia"
## [7] "Precip_Anomaly_WestEurope" "Precip_Normal_CentralAsia"
## [9] "Precip_Normal_EastAsia" "Precip_Normal_EastEurope"
## [11] "Precip_Normal_Mediterranean" "Precip_Normal_NorthEurasia"
## [13] "Precip_Normal_SouthAsia" "Precip_Normal_WestEurope"
## [15] "Temp_Anomaly_CentralAsia" "Temp_Anomaly_EastAsia"
## [17] "Temp_Anomaly_EastEurope" "Temp_Anomaly_Mediterranean"
## [19] "Temp_Anomaly_NorthEurasia" "Temp_Anomaly_SouthAsia"
## [21] "Temp_Anomaly_WestEurope" "Temp_Normal_CentralAsia"
## [23] "Temp_Normal_EastAsia" "Temp_Normal_EastEurope"
## [25] "Temp_Normal_Mediterranean" "Temp_Normal_NorthEurasia"
## [27] "Temp_Normal_SouthAsia" "Temp_Normal_WestEurope"
tx_labels_bin <- labels(trans_all_bin)
enso_map_bin <- state_table_bin %>%
dplyr::select(trans_id, ENSO) %>%
distinct()
target_indices <- which(enso_map_bin$ENSO == "ElNino")
safe_indices <- target_indices[target_indices <= length(trans_all)]
trans_elnino_bin <- trans_all_bin[safe_indices]
target_indices <- which(enso_map_bin$ENSO == "Neutral")
safe_indices <- target_indices[target_indices <= length(trans_all)]
trans_neutral_bin <- trans_all[safe_indices]6.2.3 Second Apriori
rules_bin_elnino <- apriori(
trans_elnino_bin,
parameter = list(
supp = 0.05,
conf = 0.6,
minlen = 2
)
)## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.05 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 54
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[28 item(s), 1099 transaction(s)] done [0.00s].
## sorting and recoding items ... [17 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [14 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
By binarizing the data, I successfully reduced the item count from 40 to 17, significantly decreasing data sparsity and consolidating the “Anomalous” states. Despite setting a robust support threshold (minimum count of 54) to ensure climatic significance, the number of generated rules remained low at 14—identical to the previous three-state classification. This critical observation indicates that while El Niño drives regional instability, these climate anomalies tend to occur in isolation rather than as synchronized, co-occurring events across the study area. Consequently, my strategy now pivots from looking for “global” spatial patterns to analyzing why these anomalies remain localized despite the overarching influence of El Niño.
6.2.4 Conditional Probability and Risk Ratio
#temperature
temp_rr <- temp_state %>%
mutate(
Temp_Anomaly = ifelse(Temp_state == "Temp_Normal", 0, 1)
) %>%
filter(ENSO %in% c("ElNino", "Neutral")) %>%
group_by(region, ENSO) %>%
summarise(
p_anom = mean(Temp_Anomaly),
.groups = "drop"
) %>%
tidyr::pivot_wider(
names_from = ENSO,
values_from = p_anom
) %>%
mutate(
RR = ElNino / Neutral,
Diff = ElNino - Neutral
)#preciption
precip_rr <- precip_state %>%
mutate(
Anomaly = ifelse(Precip_state == "Precip_Normal", 0, 1)
) %>%
filter(ENSO %in% c("ElNino", "Neutral")) %>%
group_by(region, ENSO) %>%
summarise(
p_anom = mean(Anomaly),
.groups = "drop"
) %>%
pivot_wider(
names_from = ENSO,
values_from = p_anom
) %>%
mutate(
RR = ElNino / Neutral,
Diff = ElNino - Neutral
)6.2.5 Visalisation
ggplot(temp_rr, aes(x = reorder(region, RR), y = RR, fill = RR > 1)) +
geom_col(show.legend = FALSE) +
scale_fill_manual(values = c("TRUE" = "#2E8B57", "FALSE" = "#B3B3B3")) +
geom_hline(yintercept = 1, linetype = "dashed", color = "black", linewidth = 0.8) +
geom_text(aes(label = round(RR, 2)), vjust = -0.5, size = 3.5, fontface = "bold") +
labs(
title = "Risk Ratio of Temperature Anomalies (El Niño vs Neutral)",
subtitle = "RR > 1 indicates El Niño increases the probability of temperature instability",
y = "Risk Ratio",
x = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
panel.grid.major.x = element_blank()
)ggplot(precip_rr, aes(x = reorder(region, RR), y = RR, fill = RR > 1)) +
geom_col(show.legend = FALSE) +
scale_fill_manual(values = c("TRUE" = "#2E8B57", "FALSE" = "#B3B3B3")) +
geom_hline(yintercept = 1, linetype = "dashed", color = "black", linewidth = 0.8) +
geom_text(aes(label = round(RR, 2)), vjust = -0.5, size = 3.5) +
labs(
title = "Risk Ratio of Precipitation Anomalies (El Niño vs Neutral)",
subtitle = "RR > 1 indicates increased risk of anomalies during El Niño",
y = "Risk Ratio (Anomalous State)",
x = "Region"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
I quantified the regional impact of El Niño by calculating the
Risk Ratio (RR) for binarized temperature and
precipitation anomalies (\(|z|\ge 1\)).
By filtering the high-frequency “Normal” background, the results reveal
a significant increase in atmospheric instability (RR > 1) across
specific regions like East Asia, which shows the
highest risk for both thermal (\(RR=1.17\)) and rainfall (\(RR=1.46\)) perturbations.
However, the fact that the number of association rules did not increase alongside these rising RRs suggests that El Niño drives independent, localized regional responses rather than large-scale synchronized anomaly clusters. This localized sensitivity indicates that while the “threat” of an anomaly is higher during El Niño, these events typically do not occur simultaneously across the continent.
7 Conclusion
This project applies association rule mining to explore the relationship between ENSO phases and regional climate anomalies across Eurasia. The main analysis shows that, despite clear ENSO-related influences on regional climate variability, El Niño does not generate frequent or stable co-occurrence patterns of anomalous temperature and precipitation states across regions. Most of the extracted high-confidence rules are dominated by normal climate conditions, indicating that strong baseline regional structures persist under different ENSO phases.
These findings suggest that ENSO does not act as a synchronizing mechanism for extreme regional responses, but rather operates as a large-scale background influence. From the perspective of association rule mining, climate anomalies appear to be relatively rare events whose co-occurrence frequency is too low to form robust rules under reasonable support and confidence thresholds. This highlights the suitability of association rule mining for identifying frequent and stable patterns, as well as its limitations when applied to extreme or infrequent phenomena.
As an additional extension, conditional probability and risk ratio analyses were introduced to further examine ENSO-related effects. These results show that El Niño shifts the probability of temperature and precipitation anomalies in a region-dependent manner, even though such changes are not reflected in frequent co-occurrence rules. This comparison provides extra insight into how different analytical approaches capture different aspects of climate variability, and helps place the association rule results in a broader methodological context.
8 References
Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), 487-499.↩︎
McPhaden, M. J., Zebiak, S. E., & Glantz, M. H. (2006). ENSO as an integrating concept in earth science. Science, 314(5806), 1740-1745.↩︎
Trenberth, K. E., Caron, J. M., Stepaniak, D. P., & Worley, S. (2002). Evolution of El Niño–Southern Oscillation and global atmospheric surface temperatures. Journal of Geophysical Research: Atmospheres, 107(D10), ACL 5-1.↩︎
Yuan, X., Chen, D., Li, C., Wang, H., & Li, L. (2018). Arctic sea ice anomalies during the two types of El Niño. Journal of Climate, 31(5), 1961-1979.↩︎
Hersbach, H., Bell, B., Berrisford, P., Hirahara, S., Horányi, A., Muñoz‐Sabater, J., … & Thépaut, J. N. (2020). The ERA5 global reanalysis. Quarterly Journal of the Royal Meteorological Society, 146(730), 1999-2049.↩︎