During COVID-19, governments rarely implemented a single border
policy in isolation. Instead, they introduced structured policy
packages (e.g., closing air travel + visa bans + specific
exceptions).
This project applies association rule learning (Apriori
+ ECLAT) to discover which border policy measures tend to co-occur.
Transaction logic (core idea):
- Transaction = one policy event: (Country + Start Date + Policy ID)
- Item = a policy attribute active in that event (e.g., AIR, VISA_BAN, CITIZEN_EXCEP, plus TYPE_* and SUBTYPE_*)
Outputs we aim to deliver:
- frequent policy bundles (frequent itemsets),
- strong associations between policy measures (rules),
- visualizations of rule structure,
- interpretation in an institutional/governance context.
We use the COVID Border Accountability Project
(COBAP) policy list dataset (downloaded from Harvard Dataverse
by the student).
The dataset provides event-level border restriction records including:
- POLICY_TYPE, POLICY_SUBTYPE
- AIR, LAND, SEA
- VISA_BAN, HISTORY_BAN, CITIZEN, REFUGEE
- CITIZEN_EXCEP, COUNTRY_EXCEP, WORK_EXCEP
- COUNTRY_NAME, START_DATE, ID

This dataset is well-suited for association rules because each policy event naturally represents a set of co-implemented measures.
We load packages used for:
- data manipulation (tidyverse, lubridate)
- association rule mining (arules)
- rule visualization (arulesViz)
knitr::opts_chunk$set(echo = TRUE)
pkgs <- c("tidyverse", "lubridate", "arules", "arulesViz")
to_install <- pkgs[!sapply(pkgs, requireNamespace, quietly = TRUE)]
if(length(to_install) > 0) install.packages(to_install)
library(tidyverse)
library(lubridate)
library(arules)
library(arulesViz)
## 2) Load Data

df <- read.csv("policy_list.csv", stringsAsFactors = FALSE)
dim(df)
names(df)
head(df, 3)
We create a TRANS identifier for each policy event and convert policy attributes into an item list.

## Data preparation
The original COBAP dataset is provided in a “wide” format, where each row corresponds to a single border policy event and each column represents a specific policy attribute (e.g. air travel restrictions, visa bans, citizen exceptions).
However, association rule algorithms such as Apriori and ECLAT require the data to be in a transaction format, where:
- each transaction is a set of items,
- each item represents a characteristic that is present in that transaction.
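In this project, we define:
- Transaction: one policy event, identified by Country + Start Date + Policy ID (stored in TRANS)
- Item: a policy attribute active in that event (a binary flag equal to 1, or a TYPE_* / SUBTYPE_* label)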
This transformation allows us to interpret each policy event as a “policy package” consisting of several coordinated measures.
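To make the wide-to-transaction idea concrete, here is a minimal base R sketch on a toy table. The three rows of values, the country labels "A" and "B", and the IDs "P1" and "P2" are invented for illustration and are not taken from COBAP; only the column names follow the dataset.

```r
# Toy wide table: one row per policy event (values are made up for illustration)
toy <- data.frame(
  ID           = c("P1", "P2"),
  COUNTRY_NAME = c("A", "B"),
  START_DATE   = c("2020-03-15", "2020-03-20"),
  AIR          = c(1, 0),
  LAND         = c(1, 1),
  VISA_BAN     = c(0, 1),
  stringsAsFactors = FALSE
)

flag_cols <- c("AIR", "LAND", "VISA_BAN")

# Transaction key: Country + Start Date + Policy ID
trans_id <- paste(toy$COUNTRY_NAME, toy$START_DATE, toy$ID, sep = "_")

# For each row, keep the names of the flags that equal 1
items <- lapply(seq_len(nrow(toy)), function(i) {
  flag_cols[unlist(toy[i, flag_cols]) == 1]
})
names(items) <- trans_id

items
```

Each list element is one "policy package": the first event contains AIR and LAND, the second LAND and VISA_BAN.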
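item_cols <- c(
  "POLICY_TYPE", "POLICY_SUBTYPE",
  "AIR", "LAND", "SEA",
  "VISA_BAN", "CITIZEN", "HISTORY_BAN", "REFUGEE",
  "CITIZEN_EXCEP", "COUNTRY_EXCEP", "WORK_EXCEP"
)

# Build the transaction key and prefix the categorical values so that they
# become distinct item labels (TYPE_*, SUBTYPE_*)
df2 <- df %>%
  select(ID, COUNTRY_NAME, START_DATE, all_of(item_cols)) %>%
  mutate(
    TRANS = paste(COUNTRY_NAME, START_DATE, ID, sep = "_"),
    POLICY_TYPE = paste0("TYPE_", POLICY_TYPE),
    POLICY_SUBTYPE = paste0("SUBTYPE_", POLICY_SUBTYPE)
  )

binary_cols <- c(
  "AIR", "LAND", "SEA",
  "VISA_BAN", "CITIZEN", "HISTORY_BAN", "REFUGEE",
  "CITIZEN_EXCEP", "COUNTRY_EXCEP", "WORK_EXCEP"
)

# Binary flags: the column name becomes the item when the flag equals 1
long_bin <- df2 %>%
  select(TRANS, all_of(binary_cols)) %>%
  pivot_longer(-TRANS, names_to = "var", values_to = "val") %>%
  filter(val == 1) %>%
  mutate(ITEM = var) %>%
  select(TRANS, ITEM)

# Categorical attributes: the prefixed value itself is the item.
# paste0() turns NA into the strings "TYPE_NA" / "SUBTYPE_NA", so is.na()
# alone does not catch missing values; they are dropped explicitly here.
long_cat <- df2 %>%
  select(TRANS, POLICY_TYPE, POLICY_SUBTYPE) %>%
  pivot_longer(-TRANS, names_to = "var", values_to = "val") %>%
  filter(!is.na(val), !val %in% c("TYPE_NA", "SUBTYPE_NA", "SUBTYPE_NONE")) %>%
  mutate(ITEM = val) %>%
  select(TRANS, ITEM)

# Stack both parts into one (TRANS, ITEM) table without duplicates
trans_single <- bind_rows(long_cat, long_bin) %>% distinct()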
head(trans_single, 15)
The arules package expects a transactions object. We save the single-format table to CSV and read it back as transactions.
write.csv(trans_single, "trans_single.csv", row.names = FALSE)

cobap_trans <- read.transactions(
  "trans_single.csv",
  format = "single",
  cols = c(1, 2),
  header = TRUE,
  sep = ","
)

cobap_trans
summary(cobap_trans)
inspect(cobap_trans[1:5])
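Writing a CSV and reading it back is one way to obtain a transactions object. A possible alternative, sketched here on a hypothetical toy table (toy_single and its values are invented for illustration), is to group the items by transaction with split() and coerce the resulting named list with as(..., "transactions"). The arules part is guarded so the sketch still runs if the package is not installed.

```r
# Hypothetical toy table in the same single (TRANS, ITEM) layout
toy_single <- data.frame(
  TRANS = c("t1", "t1", "t2", "t2", "t2"),
  ITEM  = c("AIR", "LAND", "AIR", "VISA_BAN", "CITIZEN_EXCEP"),
  stringsAsFactors = FALSE
)

# split() groups the items by transaction id into a named list
item_list <- split(toy_single$ITEM, toy_single$TRANS)

if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  # arules can coerce a named list of item vectors to transactions
  toy_trans <- as(item_list, "transactions")
  inspect(toy_trans)
}
```

This avoids the intermediate file, at the cost of keeping the whole item list in memory. Whether it is preferable for the full COBAP table is a judgment call.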
We first explore: which items are most frequent, and how “long” transactions are (how many items per policy event).
itemFrequencyPlot(cobap_trans, topN = 20, type = "absolute",
                  main = "Top 20 items (absolute frequency)")
itemFrequencyPlot(cobap_trans, topN = 20, type = "relative",
                  main = "Top 20 items (relative frequency)")
sort(itemFrequency(cobap_trans), decreasing = TRUE) %>% head(20)
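The transaction-length part of the exploration can be sketched in base R as follows. The toy table toy_long and its values are invented for illustration. On the real object, size(cobap_trans) from arules should give the same per-transaction item counts.

```r
# Hypothetical toy table in single (TRANS, ITEM) layout
toy_long <- data.frame(
  TRANS = c("t1", "t1", "t2", "t2", "t2", "t3"),
  ITEM  = c("AIR", "LAND", "AIR", "VISA_BAN", "CITIZEN_EXCEP", "SEA"),
  stringsAsFactors = FALSE
)

# Items per transaction: count the rows for each TRANS id
trans_len <- table(toy_long$TRANS)

# Distribution of lengths: how many transactions have 1, 2, 3, ... items
len_dist <- table(as.integer(trans_len))

trans_len
len_dist
```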
## Association rules: measures (support, confidence, lift)

Association rules have the form: {LHS} → {RHS}

Key measures:
- Support: how often the whole rule occurs in the dataset (share of transactions containing both LHS and RHS)
- Confidence: how often RHS occurs given LHS (conditional probability of RHS given LHS)
- Lift: strength compared to random coincidence; lift > 1 means LHS increases the likelihood of RHS.

In policy terms, high-lift rules suggest that certain measures form coherent policy packages.
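As a worked example of the three measures, the following base R sketch computes them by hand for the rule {AIR, LAND} → {VISA_BAN}. The five transactions are invented for illustration, and the helper support() is our own, not an arules function.

```r
# Five made-up transactions
toy <- list(
  c("AIR", "LAND", "VISA_BAN"),
  c("AIR", "LAND"),
  c("AIR", "VISA_BAN"),
  c("LAND"),
  c("AIR", "LAND", "VISA_BAN")
)

# Share of transactions that contain every item in `items`
support <- function(items, trans) {
  mean(vapply(trans, function(tr) all(items %in% tr), logical(1)))
}

lhs <- c("AIR", "LAND")
rhs <- "VISA_BAN"

supp_rule <- support(c(lhs, rhs), toy)       # share with LHS and RHS together
conf_rule <- supp_rule / support(lhs, toy)   # share of LHS transactions that also have RHS
lift_rule <- conf_rule / support(rhs, toy)   # confidence relative to the base rate of RHS

c(support = supp_rule, confidence = conf_rule, lift = lift_rule)
```

Here the rule holds in 2 of 5 transactions (support 0.4), in 2 of the 3 transactions that contain the LHS (confidence about 0.67), and VISA_BAN appears in 3 of 5 transactions overall, so lift is about 1.11.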
We use Apriori with:
- minimum support = 0.02
- minimum confidence = 0.60
- min rule length = 2

This produced a manageable and interpretable rule set.
rules <- apriori( cobap_trans, parameter = list(supp = 0.02, conf = 0.60, minlen = 2) )
rules
summary(rules)
Redundant rules can repeat the same information. We remove them and sort by lift.
rules_nored <- rules[!is.redundant(rules)]
summary(rules_nored)

rules_lift <- sort(rules_nored, by = "lift", decreasing = TRUE)
inspect(head(rules_lift, 20))
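To illustrate what "redundant" means here: to our understanding, the default in arules treats a rule as redundant when a more general rule (same RHS, smaller LHS) has the same or higher confidence. The base R sketch below checks this by hand for one pair of rules on invented transactions. The helper confidence() is our own and is not part of arules.

```r
# Five made-up transactions
toy <- list(
  c("AIR", "LAND", "VISA_BAN"),
  c("AIR", "VISA_BAN"),
  c("AIR", "LAND", "VISA_BAN"),
  c("AIR", "VISA_BAN"),
  c("LAND")
)

# Confidence of lhs -> rhs: share of LHS transactions that also contain RHS
confidence <- function(lhs, rhs, trans) {
  has_lhs  <- vapply(trans, function(tr) all(lhs %in% tr), logical(1))
  has_both <- vapply(trans, function(tr) all(c(lhs, rhs) %in% tr), logical(1))
  sum(has_both) / sum(has_lhs)
}

conf_general  <- confidence("AIR", "VISA_BAN", toy)             # {AIR} -> {VISA_BAN}
conf_specific <- confidence(c("AIR", "LAND"), "VISA_BAN", toy)  # {AIR, LAND} -> {VISA_BAN}

# The longer rule adds nothing if it is not more confident than the shorter one
is_redundant <- conf_specific <= conf_general
is_redundant
```

In this toy data, adding LAND to the LHS does not raise the confidence, so {AIR, LAND} → {VISA_BAN} would be dropped in favour of {AIR} → {VISA_BAN}.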
# Targeted mining: only rules whose right-hand side is TYPE_COMPLETE,
# with lower support and confidence thresholds than the general run
rules_complete <- apriori(
  cobap_trans,
  parameter = list(supp = 0.01, conf = 0.40, minlen = 2),
  appearance = list(default = "lhs", rhs = "TYPE_COMPLETE"),
  control = list(verbose = FALSE)
)

summary(rules_complete)
inspect(head(sort(rules_complete, by = "lift", decreasing = TRUE), 15))
We visualize the rule structure with arulesViz (top rules by lift).
top_rules <- head(rules_lift, 100)
plot(top_rules, measure = c("support", "lift"), shading = "confidence")
plot(top_rules, method = "matrix", measure = "lift")
plot(top_rules, method = "grouped")

plot(head(rules_lift, 30), method = "graph", control = list(type = "items"))
ECLAT finds frequent itemsets directly. Then we can induce rules from those itemsets. This provides a second algorithmic perspective consistent with the course material.
freq_sets <- eclat(cobap_trans, parameter = list(supp = 0.02, maxlen = 5))
summary(freq_sets)
inspect(head(sort(freq_sets, by = "support", decreasing = TRUE), 15))

eclat_rules <- ruleInduction(freq_sets, cobap_trans, confidence = 0.60)
summary(eclat_rules)
inspect(head(sort(eclat_rules, by = "lift", decreasing = TRUE), 15))
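The idea behind ECLAT is a vertical data layout: each item keeps the list of transaction ids that contain it, and the support of an itemset comes from intersecting those lists. The base R sketch below shows only this support computation on invented transactions. It is not the full depth-first search that eclat() performs, and itemset_support() is a hypothetical helper.

```r
# Four made-up transactions, named by transaction id
toy <- list(
  t1 = c("AIR", "LAND", "VISA_BAN"),
  t2 = c("AIR", "LAND"),
  t3 = c("AIR", "VISA_BAN"),
  t4 = c("LAND")
)

# Vertical layout: for each item, the ids of the transactions that contain it
long <- data.frame(
  tid  = rep(names(toy), lengths(toy)),
  item = unlist(toy, use.names = FALSE),
  stringsAsFactors = FALSE
)
tidlists <- split(long$tid, long$item)

# Support of an itemset = size of the intersection of its items' tid-lists,
# divided by the number of transactions
itemset_support <- function(items) {
  tids <- Reduce(intersect, tidlists[items])
  length(tids) / length(toy)
}

tidlists
itemset_support(c("AIR", "LAND"))
itemset_support(c("AIR", "LAND", "VISA_BAN"))
```

AIR and LAND share the ids t1 and t2, so the pair has support 0.5. Adding VISA_BAN leaves only t1, giving 0.25.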
## Interpretation of results (write-up)

From the strongest rules (by lift and confidence), we observe that border closures were typically designed as packages: closure measures (air/land/sea), plus restriction types (visa/history/citizen), plus controlled exception structures (citizens, workers, specific countries).

Example interpretation template:
- Rule: {A, B} → {C}
- Support: x%
- Confidence: y%
- Lift: z
- Meaning: When A and B are present, C appears with high probability; lift > 1 implies the association is stronger than chance.

## Limitations

Association rules reveal co-occurrence patterns, not causality. They help identify which policy measures tend to be implemented together, but not the political or epidemiological reasons behind them. Additionally, results may reflect reporting differences and variation in how policies are coded across countries.

## Conclusion

This project demonstrates how association rule learning can uncover structured patterns in real-world governance data. Using COBAP policy events as transactions, Apriori and ECLAT identify frequent border policy bundles and strong associations, supporting the conclusion that COVID border restrictions were deployed as coordinated policy packages rather than independent actions.