International sanctions increasingly target maritime transportation, leading to a growing number of vessels added to sanctions lists [1]. These vessels differ widely in flag, type, and sanction program, creating a complex, high‑dimensional dataset that is difficult to interpret using simple descriptive methods.
Unsupervised learning offers a way to uncover structure in this complexity by identifying patterns without predefined labels. This study applies association rule mining, PCA, and clustering to explore how vessel characteristics relate to sanction programs and whether meaningful groups emerge.
The analysis addresses three research questions:
How have vessel sanctions evolved over time?
Are there recurring associations between vessel types, flags, and sanction programs?
Can unsupervised learning methods identify meaningful groups of sanctioned vessels?
By addressing these questions, the study provides an exploratory, data‑driven perspective on maritime sanctions and demonstrates how unsupervised learning can reveal hidden structure in complex regulatory datasets.
This study applies a sequence of unsupervised learning techniques to explore structural patterns in sanctioned maritime vessels. The methodological pipeline integrates data collection, preprocessing, association rule mining, dimensionality reduction, and clustering. Each step is designed to uncover a different layer of structure in the dataset, moving from categorical co‑occurrence patterns to broader vessel groupings.
Libraries used:
library(tidyverse)
library(lubridate)
library(fastDummies)
library(factoextra)
library(clusterCrit)
library(cluster)
library(arules)
library(arulesViz)
Data Collection and Preprocessing (Python):
The dataset comes from the OFAC SDN List [2], published in XML format. Python was used to extract vessel‑related records, select relevant attributes, retrieve associated sanction programs, and save the results as a structured CSV file. The process included isolating vessel entries, matching sanction programs and sanction dates, and performing data cleaning and field standardization. All data was collected on 25 December 2025.
The full preprocessing workflow and the Streamlit dashboard used for initial exploration are available on GitHub.
A set of exploratory visualizations was created to understand how vessel‑related sanctions evolved over time. These plots highlight both overall trends and program‑specific dynamics, providing context for the unsupervised learning methods applied later.
Note: A threshold of 25 sanctioned vessels per year was applied when constructing this plot.
Key observations:
Overall growth: The number of sanctioned vessels increases substantially over the period, with especially sharp rises after 2018 and again in 2023–2025.
Periods of escalation: Several years show pronounced spikes, suggesting shifts in enforcement intensity or geopolitical events.
Program‑level variation: Different sanctions programs dominate different years. For example, some years are driven primarily by IRAN‑related programs, while others show strong activity from RUSSIA‑EO14024 or UKRAINE‑EO13662.
These temporal patterns help frame the unsupervised analysis by showing that sanctions activity is not uniform across time or programs. The next sections focus on uncovering structural patterns within the vessel attributes themselves using association rules, PCA, and clustering.
Dataset: Load and Prepare Data in R
After preprocessing in Python, the cleaned CSV file was imported into R for modeling.
vessels <- read_csv("vessels_extracted_cleaned.csv")
For the unsupervised analysis, the dataset was reduced to variables relevant for pattern discovery. Missing categorical values were recoded as “Unknown” to retain all observations without introducing deletion bias.
vessels_clean <- vessels %>%
  select(VESSEL_TYPE, FLAG, PROGRAMS, SANCTION_DATE) %>%
  mutate(
    SANCTION_YEAR = year(as.Date(SANCTION_DATE))
  ) %>%
  select(-SANCTION_DATE) %>%
  mutate(across(where(is.character), ~replace_na(.x, "Unknown")))
With the dataset prepared, the analysis proceeds to Association Rule Mining to identify frequent co‑occurring vessel attributes.
To identify frequently co-occurring vessel characteristics, Association Rule Mining [3] was applied. Each vessel is treated as a transaction containing categorical attributes such as vessel type, flag, sanction program, and the year of sanctioning. Unlike clustering, which groups vessels as a whole, association rules focus on local dependency patterns, answering questions such as:
Which vessel types are commonly associated with specific sanction programs?
Do certain flags repeatedly appear with particular programs in specific time periods?
vessels_rules <- vessels_clean %>%
  # One row per vessel-program pair
  separate_rows(PROGRAMS, sep = ",\\s*") %>%
  mutate(
    PROGRAMS = paste0("PROGRAM_", PROGRAMS),
    # Bin sanction years; rows with a missing year fall through to the last branch
    YEAR_GROUP = case_when(
      SANCTION_YEAR <= 2010 ~ "Before_2010",  # 2010 and earlier
      SANCTION_YEAR <= 2015 ~ "2011_2015",
      SANCTION_YEAR <= 2020 ~ "2016_2020",
      TRUE ~ "After_2020"
    )
  ) %>%
  select(VESSEL_TYPE, FLAG, PROGRAMS, YEAR_GROUP)
Sanction programs are split into separate rows to ensure that vessels sanctioned under multiple programs contribute to each program‑specific rule. A categorical time variable (YEAR_GROUP) is created to capture temporal patterns in sanctions. Only the attributes relevant for rule mining are retained.
vessels_rules <- vessels_rules %>% mutate(across(everything(), as.factor))
transactions <- as(vessels_rules, "transactions")
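All variables are converted to factors and transformed into the transaction format required by the Apriori algorithm. Because programs were split into separate rows, each vessel-program pair becomes a “basket” of categorical attributes, so a vessel listed under several programs contributes several transactions.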
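The row-splitting step can be illustrated with a small base R sketch. The three vessels below are hypothetical and are not taken from the OFAC file. `strsplit()` stands in for `separate_rows()` only so that the example runs without extra packages.

```r
# Hypothetical toy data (not from the OFAC file).
toy <- data.frame(
  VESSEL_TYPE = c("Tug", "Fishing Vessel", "Crude Oil Tanker"),
  FLAG        = c("Venezuela", "China", "Panama"),
  PROGRAMS    = c("VENEZUELA-EO13850", "GLOMAG", "IRAN, IRAN-EO13902"),
  stringsAsFactors = FALSE
)

# Split the comma-separated program field (same pattern as in the pipeline).
program_list <- strsplit(toy$PROGRAMS, ",\\s*")

# Repeat each vessel row once per program it is listed under.
row_index <- rep(seq_len(nrow(toy)), lengths(program_list))
expanded <- toy[row_index, c("VESSEL_TYPE", "FLAG")]
expanded$PROGRAMS <- paste0("PROGRAM_", unlist(program_list))
rownames(expanded) <- NULL

# Each row is now one basket of "variable=value" items.
baskets <- lapply(seq_len(nrow(expanded)), function(i) {
  paste0(names(expanded), "=", unlist(expanded[i, ]))
})

print(expanded)
print(baskets[[4]])
```

The tanker listed under two programs yields two baskets, which is why the number of transactions can exceed the number of vessels.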
The Apriori algorithm [4] is run with a minimum support of 1.5% and a minimum confidence of 70%, so that only reasonably frequent and reliable patterns are retained. Rules are then filtered to those predicting sanction programs, and redundant rules are removed to avoid duplicated information. A lift threshold of 2 keeps only associations more than twice as strong as expected under independence.
rules <- apriori(transactions, parameter = list(supp = 0.015, conf = 0.7, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.7 0.1 1 none FALSE TRUE 5 0.015 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 28
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[126 item(s), 1883 transaction(s)] done [0.00s].
## sorting and recoding items ... [40 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [181 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
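# Keep only rules whose consequent is a sanction program
rules <- subset(rules, rhs %pin% "PROGRAM_")
# Drop rules for which a more general rule has the same or higher confidence
rules <- rules[!is.redundant(rules)]
# Keep associations more than twice as strong as expected under independence
rules <- subset(rules, lift > 2)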
Rules are grouped into interpretable categories:
Flag → Program
Vessel Type → Program
Time Period → Program
Flag + Type → Program
This structure supports clearer interpretation and comparison across attribute types.
flag_program_rules <- subset(rules, lhs %pin% "FLAG=" & rhs %pin% "PROGRAM_")
type_program_rules <- subset(rules, lhs %pin% "VESSEL_TYPE=" & rhs %pin% "PROGRAM_")
time_program_rules <- subset(rules, lhs %pin% "YEAR_GROUP=" & rhs %pin% "PROGRAM_")
combo_rules <- subset(rules, lhs %pin% "FLAG=" & lhs %pin% "VESSEL_TYPE=" & rhs %pin% "PROGRAM_")
The top‑lift rules (5) highlight the strongest associations.
| LHS | RHS | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|
| {FLAG=Venezuela} | {PROGRAMS=PROGRAM_VENEZUELA-EO13850} | 0.0175252 | 0.8684211 | 0.0201806 | 30.853525 | 33 |
| {YEAR_GROUP=2011_2015} | {PROGRAMS=PROGRAM_IRAN} | 0.0323951 | 0.7349398 | 0.0440786 | 7.361125 | 61 |
| {VESSEL_TYPE=Fishing Vessel} | {PROGRAMS=PROGRAM_GLOMAG} | 0.0785980 | 0.9932886 | 0.0791290 | 11.913136 | 148 |
| {FLAG=China} | {PROGRAMS=PROGRAM_GLOMAG} | 0.0833776 | 0.9936709 | 0.0839087 | 11.917722 | 157 |
| {FLAG=Russia} | {PROGRAMS=PROGRAM_RUSSIA-EO14024} | 0.1237387 | 0.7191358 | 0.1720659 | 3.002512 | 233 |
Examples include:
Venezuelan‑flagged vessels → VENEZUELA-EO13850 program
Chinese‑flagged or fishing vessels → GLOMAG program
Russian‑flagged vessels → RUSSIA-EO14024 program
A lift above 1 means that a combination occurs more often than expected if vessel attributes and sanction programs were independent. The lifts in this table range from about 3 to about 31.
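As a worked check on how support, confidence, and lift relate, the sketch below recomputes the metrics for the rule {FLAG=Venezuela} → {PROGRAM_VENEZUELA-EO13850} from raw counts. The transaction total and the rule count come from the output and table above. The LHS count of 38 is implied by the reported coverage. The RHS count of 53 is not reported anywhere in the tables and is inferred here from the reported lift, so treat it as an assumption.

```r
n          <- 1883  # transactions reported in the Apriori output
count_both <- 33    # rows matching both sides (the "count" column)
count_lhs  <- 38    # rows with FLAG=Venezuela (coverage 0.0201806 * 1883)
count_rhs  <- 53    # rows with the program; inferred from the reported lift

support    <- count_both / n                # share of all rows matching the rule
confidence <- count_both / count_lhs        # share of LHS rows that also match the RHS
lift       <- confidence / (count_rhs / n)  # confidence relative to the RHS base rate

print(round(c(support = support, confidence = confidence, lift = lift), 4))
```

The lift of roughly 31 says that a Venezuelan flag makes this program about 31 times more likely than its base rate across all transactions.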
The highest‑confidence rules (5) show cases where the sanction program is almost guaranteed given the vessel attributes.
| LHS | RHS | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|
| {VESSEL_TYPE=Tug,FLAG=Venezuela} | {PROGRAMS=PROGRAM_VENEZUELA-EO13850} | 0.0159320 | 1.0000000 | 0.0159320 | 35.528302 | 30 |
| {VESSEL_TYPE=Fishing Vessel,FLAG=China} | {PROGRAMS=PROGRAM_GLOMAG} | 0.0785980 | 1.0000000 | 0.0785980 | 11.993631 | 148 |
| {VESSEL_TYPE=Fishing Vessel,YEAR_GROUP=After_2020} | {PROGRAMS=PROGRAM_GLOMAG} | 0.0785980 | 1.0000000 | 0.0785980 | 11.993631 | 148 |
| {VESSEL_TYPE=General Cargo,FLAG=Russia} | {PROGRAMS=PROGRAM_RUSSIA-EO14024} | 0.0477961 | 1.0000000 | 0.0477961 | 4.175166 | 90 |
| {FLAG=China} | {PROGRAMS=PROGRAM_GLOMAG} | 0.0833776 | 0.9936709 | 0.0839087 | 11.917722 | 157 |
For example (confidence = 1.0):
Tug + Venezuela flag → EO13850
Fishing Vessel + China flag → GLOMAG
General Cargo + Russia flag → RUSSIA-EO14024
Representative rules from each category illustrate the diversity of associations:
| LHS | RHS | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|
| {FLAG=Venezuela} | {PROGRAMS=PROGRAM_VENEZUELA-EO13850} | 0.0175252 | 0.8684211 | 0.0201806 | 30.853525 | 33 |
| {FLAG=China} | {PROGRAMS=PROGRAM_GLOMAG} | 0.0833776 | 0.9936709 | 0.0839087 | 11.917722 | 157 |
| {FLAG=Russia} | {PROGRAMS=PROGRAM_RUSSIA-EO14024} | 0.1237387 | 0.7191358 | 0.1720659 | 3.002512 | 233 |
| {VESSEL_TYPE=Fishing Vessel} | {PROGRAMS=PROGRAM_GLOMAG} | 0.0785980 | 0.9932886 | 0.0791290 | 11.913136 | 148 |
| {VESSEL_TYPE=Tug,FLAG=Venezuela} | {PROGRAMS=PROGRAM_VENEZUELA-EO13850} | 0.0159320 | 1.0000000 | 0.0159320 | 35.528302 | 30 |
| {YEAR_GROUP=2011_2015} | {PROGRAMS=PROGRAM_IRAN} | 0.0323951 | 0.7349398 | 0.0440786 | 7.361125 | 61 |
| {VESSEL_TYPE=Unknown,FLAG=Democratic People’s Republic of Korea} | {PROGRAMS=PROGRAM_DPRK4} | 0.0238980 | 0.8181818 | 0.0292087 | 23.702098 | 45 |
Flag-based: Venezuelan flag strongly predicts VENEZUELA-EO13850
Type-based: Fishing vessels strongly predict GLOMAG
Time-based: 2011–2015 sanctions often correspond to the IRAN program
Combined attributes: DPRK flag + Unknown vessel type strongly predicts DPRK4
A curated subset of rules is assembled for visualization. The selection balances high‑lift rules, high‑confidence rules, and diverse attribute categories.
The graph visualization reveals how specific vessel attributes form clusters around distinct sanction programs. Clear hubs emerge—such as GLOMAG, EO13850, IRAN, and DPRK4—each linked to characteristic combinations of flags, vessel types, or temporal categories. This network structure provides an intuitive summary of the strongest co‑occurrence patterns identified by the association rules.
While association rule mining reveals localized co‑occurrence patterns among vessel attributes, it does not capture broader similarity relationships across the full multidimensional feature space. To explore these global patterns and identify groups of vessels with comparable profiles, the analysis now turns to dimensionality reduction and clustering. Because clustering algorithms operate on numerical feature representations, the next step involves encoding the categorical variables and preparing the data for cluster selection.
One-hot encoding [5] converts categorical attributes into binary indicators, creating a sparse feature matrix. Scaling ensures that all variables contribute equally to distance calculations used in PCA and clustering.
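vessels_encoded <- vessels_clean %>%
  mutate(across(everything(), as.factor)) %>%
  # No-intercept design matrix: full indicators for the first factor,
  # one reference level dropped for each later factor
  model.matrix(~ . - 1, data = .) %>%
  as.data.frame()
# Standardize each indicator column to mean 0 and unit variance
vessels_scaled <- scale(vessels_encoded)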
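To make the encoding and scaling step concrete, here is a minimal base R sketch on four hypothetical vessels. The data are illustrative only, and the column names are simply what `model.matrix()` generates.

```r
# Hypothetical toy data with two categorical attributes.
toy <- data.frame(
  VESSEL_TYPE = factor(c("Tug", "Fishing Vessel", "Tug", "General Cargo")),
  FLAG        = factor(c("Venezuela", "China", "Venezuela", "Russia"))
)

# No-intercept formula: the first factor gets one indicator per level,
# later factors drop their first level as a reference category.
encoded <- model.matrix(~ . - 1, data = toy)
print(colnames(encoded))

# Centre each column to mean 0 and scale it to unit standard deviation.
scaled <- scale(encoded)
print(round(colMeans(scaled), 10))
print(apply(scaled, 2, sd))
```

Three vessel types and three flags give five columns rather than six, because `FLAG=China` serves as the reference level. If every level should get its own indicator, `fastDummies::dummy_cols()` is one alternative, although the text does not say whether it was used here.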
Elbow Method [6]
The curve drops quickly up to about k = 5. After that it flattens out, meaning that adding more clusters doesn’t improve things much.
Silhouette Score [7]
The highest silhouette score is at k = 2, meaning the data points are best separated into two groups.
Calinski–Harabasz Index [8]
The index peaks at k = 2, then decreases as k increases. This supports the silhouette result, suggesting that two clusters give the best separation.
Although both the Silhouette and Calinski–Harabasz indices indicate that a two-cluster solution provides the strongest statistical separation, this configuration results in groups that are too broad to capture important differences within the data. Selecting four clusters offers a better balance between statistical quality and interpretability, allowing distinct geopolitical and operational vessel patterns to emerge without unnecessarily fragmenting the dataset.
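The three criteria above can be computed along the following lines. This is a sketch on synthetic data, not the code behind the reported plots: the matrix `x` stands in for the scaled feature matrix, and the Calinski–Harabasz index is computed directly from the `kmeans()` output. The document loads `factoextra` and `clusterCrit`, which may have been used for the actual figures.

```r
set.seed(42)
# Synthetic stand-in for the scaled feature matrix: three separated blobs.
x <- rbind(
  matrix(rnorm(100, mean = 0),  ncol = 2),
  matrix(rnorm(100, mean = 6),  ncol = 2),
  matrix(rnorm(100, mean = 12), ncol = 2)
)
n  <- nrow(x)
ks <- 2:8

results <- do.call(rbind, lapply(ks, function(k) {
  fit <- kmeans(x, centers = k, nstart = 25)
  data.frame(
    k   = k,
    # Elbow criterion: total within-cluster sum of squares
    wss = fit$tot.withinss,
    # Calinski-Harabasz: between-cluster vs within-cluster dispersion
    ch  = (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
  )
}))

# Average silhouette width, if the 'cluster' package is available.
if (requireNamespace("cluster", quietly = TRUE)) {
  d <- dist(x)
  results$silhouette <- sapply(ks, function(k) {
    fit <- kmeans(x, centers = k, nstart = 25)
    mean(cluster::silhouette(fit$cluster, d)[, "sil_width"])
  })
}

print(results)
```

On real data the criteria can disagree, as they do in this study, so the final choice of k remains a judgment that weighs statistical separation against interpretability.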
With the number of clusters chosen, dimensionality reduction is applied to improve clustering performance and interpretability in the high‑dimensional encoded feature space.
PCA [9] is used to address the high dimensionality introduced by one-hot encoding of the categorical variables.
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.05869 1.95733 1.83036 1.80175 1.77623 1.72621 1.64736
## Proportion of Variance 0.02903 0.02624 0.02295 0.02224 0.02161 0.02041 0.01859
## Cumulative Proportion 0.02903 0.05527 0.07822 0.10045 0.12206 0.14247 0.16106
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 1.5845 1.55155 1.4899 1.4598 1.44341 1.38208 1.37005
## Proportion of Variance 0.0172 0.01649 0.0152 0.0146 0.01427 0.01308 0.01286
## Cumulative Proportion 0.1782 0.19474 0.2099 0.2245 0.23881 0.25190 0.26475
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 1.34943 1.31341 1.27750 1.25600 1.24606 1.24579 1.24241
## Proportion of Variance 0.01247 0.01182 0.01118 0.01081 0.01063 0.01063 0.01057
## Cumulative Proportion 0.27723 0.28904 0.30022 0.31103 0.32166 0.33229 0.34286
## PC22 PC23 PC24 PC25 PC26 PC27 PC28
## Standard deviation 1.22711 1.21575 1.19990 1.18486 1.18099 1.16490 1.15711
## Proportion of Variance 0.01031 0.01012 0.00986 0.00962 0.00955 0.00929 0.00917
## Cumulative Proportion 0.35318 0.36330 0.37316 0.38278 0.39233 0.40162 0.41079
## PC29 PC30 PC31 PC32 PC33 PC34 PC35
## Standard deviation 1.14514 1.13264 1.12836 1.11898 1.11102 1.10133 1.09563
## Proportion of Variance 0.00898 0.00879 0.00872 0.00858 0.00845 0.00831 0.00822
## Cumulative Proportion 0.41978 0.42856 0.43728 0.44586 0.45431 0.46262 0.47084
## PC36 PC37 PC38 PC39 PC40 PC41 PC42
## Standard deviation 1.09160 1.08828 1.08350 1.07475 1.07075 1.06244 1.05525
## Proportion of Variance 0.00816 0.00811 0.00804 0.00791 0.00785 0.00773 0.00763
## Cumulative Proportion 0.47901 0.48712 0.49516 0.50307 0.51092 0.51865 0.52628
## PC43 PC44 PC45 PC46 PC47 PC48 PC49
## Standard deviation 1.04990 1.04351 1.03853 1.0324 1.02643 1.0253 1.02191
## Proportion of Variance 0.00755 0.00746 0.00739 0.0073 0.00722 0.0072 0.00715
## Cumulative Proportion 0.53383 0.54129 0.54868 0.5560 0.56319 0.5704 0.57755
## PC50 PC51 PC52 PC53 PC54 PC55 PC56
## Standard deviation 1.01749 1.01479 1.01363 1.0109 1.01035 1.00828 1.00681
## Proportion of Variance 0.00709 0.00705 0.00704 0.0070 0.00699 0.00696 0.00694
## Cumulative Proportion 0.58464 0.59169 0.59873 0.6057 0.61272 0.61968 0.62663
## PC57 PC58 PC59 PC60 PC61 PC62 PC63
## Standard deviation 1.00526 1.00410 1.0039 1.00291 1.00257 1.00218 1.00183
## Proportion of Variance 0.00692 0.00691 0.0069 0.00689 0.00688 0.00688 0.00687
## Cumulative Proportion 0.63355 0.64045 0.6474 0.65425 0.66113 0.66801 0.67488
## PC64 PC65 PC66 PC67 PC68 PC69 PC70
## Standard deviation 1.00163 1.00155 1.00131 1.00127 1.00121 1.00104 1.00101
## Proportion of Variance 0.00687 0.00687 0.00687 0.00687 0.00687 0.00686 0.00686
## Cumulative Proportion 0.68176 0.68863 0.69549 0.70236 0.70923 0.71609 0.72295
## PC71 PC72 PC73 PC74 PC75 PC76 PC77
## Standard deviation 1.00092 1.00089 1.00084 1.00079 1.00073 1.00071 1.00070
## Proportion of Variance 0.00686 0.00686 0.00686 0.00686 0.00686 0.00686 0.00686
## Cumulative Proportion 0.72981 0.73668 0.74354 0.75040 0.75726 0.76412 0.77097
## PC78 PC79 PC80 PC81 PC82 PC83 PC84
## Standard deviation 1.00051 1.00046 1.00040 1.00035 1.00035 0.99415 0.98853
## Proportion of Variance 0.00686 0.00686 0.00685 0.00685 0.00685 0.00677 0.00669
## Cumulative Proportion 0.77783 0.78469 0.79154 0.79840 0.80525 0.81202 0.81871
## PC85 PC86 PC87 PC88 PC89 PC90 PC91
## Standard deviation 0.98583 0.9816 0.9745 0.97111 0.96398 0.95293 0.94513
## Proportion of Variance 0.00666 0.0066 0.0065 0.00646 0.00636 0.00622 0.00612
## Cumulative Proportion 0.82537 0.8320 0.8385 0.84493 0.85130 0.85752 0.86363
## PC92 PC93 PC94 PC95 PC96 PC97 PC98
## Standard deviation 0.92738 0.91071 0.90506 0.89924 0.89152 0.88504 0.88450
## Proportion of Variance 0.00589 0.00568 0.00561 0.00554 0.00544 0.00537 0.00536
## Cumulative Proportion 0.86953 0.87521 0.88082 0.88636 0.89180 0.89716 0.90252
## PC99 PC100 PC101 PC102 PC103 PC104 PC105
## Standard deviation 0.87714 0.87341 0.85727 0.84918 0.83226 0.82086 0.8193
## Proportion of Variance 0.00527 0.00522 0.00503 0.00494 0.00474 0.00462 0.0046
## Cumulative Proportion 0.90779 0.91302 0.91805 0.92299 0.92773 0.93235 0.9369
## PC106 PC107 PC108 PC109 PC110 PC111 PC112
## Standard deviation 0.80907 0.79867 0.79327 0.77180 0.75717 0.72637 0.71280
## Proportion of Variance 0.00448 0.00437 0.00431 0.00408 0.00393 0.00361 0.00348
## Cumulative Proportion 0.94143 0.94580 0.95011 0.95419 0.95812 0.96173 0.96521
## PC113 PC114 PC115 PC116 PC117 PC118 PC119
## Standard deviation 0.69958 0.68536 0.66930 0.65163 0.64761 0.60016 0.58629
## Proportion of Variance 0.00335 0.00322 0.00307 0.00291 0.00287 0.00247 0.00235
## Cumulative Proportion 0.96856 0.97178 0.97485 0.97776 0.98063 0.98310 0.98545
## PC120 PC121 PC122 PC123 PC124 PC125 PC126
## Standard deviation 0.55088 0.47345 0.46243 0.44151 0.41643 0.40990 0.35608
## Proportion of Variance 0.00208 0.00154 0.00146 0.00134 0.00119 0.00115 0.00087
## Cumulative Proportion 0.98753 0.98906 0.99053 0.99186 0.99305 0.99420 0.99507
## PC127 PC128 PC129 PC130 PC131 PC132 PC133
## Standard deviation 0.33584 0.32380 0.31639 0.2963 0.27161 0.26007 0.23645
## Proportion of Variance 0.00077 0.00072 0.00069 0.0006 0.00051 0.00046 0.00038
## Cumulative Proportion 0.99584 0.99656 0.99725 0.9979 0.99835 0.99882 0.99920
## PC134 PC135 PC136 PC137 PC138 PC139 PC140
## Standard deviation 0.2103 0.19130 0.12760 0.10035 0.06931 0.05603 0.03750
## Proportion of Variance 0.0003 0.00025 0.00011 0.00007 0.00003 0.00002 0.00001
## Cumulative Proportion 0.9995 0.99975 0.99986 0.99993 0.99997 0.99999 1.00000
## PC141 PC142 PC143 PC144 PC145
## Standard deviation 0.01859 1.931e-14 1.794e-14 1.335e-14 6.296e-15
## Proportion of Variance 0.00000 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.00000 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC146
## Standard deviation 9.142e-16
## Proportion of Variance 0.000e+00
## Cumulative Proportion 1.000e+00
The first 10 principal components capture approximately 21% of the total variance. The low variance per component is expected: with high-dimensional, sparse one-hot encoded variables, variance is spread across many components rather than concentrated in a few. PCA was therefore evaluated on cumulative explained variance and clustering performance rather than on the dominance of individual components, and the first 10 components were retained as a balance between dimensionality reduction and preserving meaningful structure for clustering.
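The cumulative-variance calculation can be sketched as follows. The binary matrix is synthetic, and the `prcomp()` settings are assumptions, since the original PCA call is not shown in the text.

```r
set.seed(1)
# Synthetic stand-in for the one-hot matrix: 200 rows, 12 binary columns.
x <- matrix(rbinom(200 * 12, size = 1, prob = 0.3), nrow = 200)
x_scaled <- scale(x)

# Data are already centred and scaled, so prcomp() does neither again.
pca <- prcomp(x_scaled, center = FALSE, scale. = FALSE)

# Share of total variance per component and its running total.
var_share <- pca$sdev^2 / sum(pca$sdev^2)
cum_share <- cumsum(var_share)

# Keep the first 10 component scores, mirroring the choice in the text.
scores <- pca$x[, 1:10]
print(round(cum_share, 3))
```

With independent binary columns, as in this toy matrix, each component explains a similar share, which is the same flat profile seen in the long tail of the summary above.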
The figure below presents the k-means clustering [10] results projected onto the PCA-reduced space, allowing visual evaluation of cluster cohesion and overlap.
K-means clustering with four centers was applied to the PCA-reduced data. The results were visualized, and each cluster was summarized by its dominant flag, vessel type, and program. This links the statistical output to meaningful domain categories, making the clusters interpretable.
| Cluster | Dominant_Flag | Dominant_Vessel_Type | Dominant_Program | Count |
|---|---|---|---|---|
| 1 | Panama | Crude Oil Tanker | IRAN-EO13902 | 666 |
| 2 | Venezuela | Tug | VENEZUELA-EO13850 | 50 |
| 3 | China | Fishing Vessel | GLOMAG | 157 |
| 4 | Russia | General Cargo | RUSSIA-EO14024 | 542 |
Together, these clusters reveal four distinct geopolitical–operational vessel profiles, each aligned with specific sanction programs and maritime behaviors.
Cluster 1: A large cluster of Panama‑flagged crude oil tankers linked to Iran sanctions, representing a major structural pattern in the dataset.
Cluster 2: A smaller, niche cluster dominated by Venezuelan tugs, likely tied to localized operations and sanctions.
Cluster 3: A distinct group of Chinese fishing vessels under the Global Magnitsky program, clearly separated from cargo and tanker vessels.
Cluster 4: A large cluster of Russian general cargo vessels tied to Russia‑related sanctions, indicating a well-defined and internally consistent cluster.
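The clustering and summary step described above can be sketched on toy data. Everything here is hypothetical: the 40 vessels, the use of two principal components, and the helper `dominant()` are illustrative choices, not the original code.

```r
set.seed(7)
# Hypothetical vessels with two obvious profiles plus a noisy time variable.
toy <- data.frame(
  FLAG        = c(rep("China", 20), rep("Russia", 20)),
  VESSEL_TYPE = c(rep("Fishing Vessel", 20), rep("General Cargo", 20)),
  PROGRAMS    = c(rep("GLOMAG", 20), rep("RUSSIA-EO14024", 20)),
  YEAR_GROUP  = sample(c("2016_2020", "After_2020"), 40, replace = TRUE),
  stringsAsFactors = TRUE
)

# Encode, scale, reduce with PCA, then cluster the component scores.
x      <- scale(model.matrix(~ . - 1, data = toy))
scores <- prcomp(x)$x[, 1:2]
km     <- kmeans(scores, centers = 2, nstart = 25)
toy$cluster <- km$cluster

# Most frequent value of a categorical vector (hypothetical helper).
dominant <- function(v) names(which.max(table(v)))

summary_tbl <- do.call(rbind, lapply(split(toy, toy$cluster), function(g) {
  data.frame(
    Cluster              = g$cluster[1],
    Dominant_Flag        = dominant(g$FLAG),
    Dominant_Vessel_Type = dominant(g$VESSEL_TYPE),
    Dominant_Program     = dominant(g$PROGRAMS),
    Count                = nrow(g)
  )
}))
print(summary_tbl)
```

Reporting only the dominant value hides minority attributes within a cluster, so a full frequency table per cluster is a useful companion when cluster purity matters.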
This study combines exploratory analysis, association rule mining, and clustering to provide a multi-layered view of maritime sanctions. Each method contributes distinct insights into the structure of the data, and together they reveal consistent patterns linking vessel characteristics to sanction programs and time periods.
The exploratory analysis highlights substantial growth in vessel-related sanctions over time, with pronounced spikes corresponding to periods of intensified enforcement. These temporal dynamics suggest that sanctioning activity is shaped by geopolitical developments and evolving regulatory priorities. However, temporal trends alone do not explain how vessel attributes relate to specific sanction programs.
Association rule mining addresses this gap. It uncovers frequent co-occurrence patterns among vessel type, flag, sanction program, and time period. The resulting rules reveal strong and intuitive associations, such as fishing vessels flagged to China being linked to Global Magnitsky sanctions, or Venezuelan-flagged tugs appearing under the VENEZUELA-EO13850 program. High lift and confidence values indicate that these patterns occur far more frequently than would be expected by chance, reinforcing their structural significance. Importantly, this approach also accommodates vessels sanctioned under multiple programs, capturing overlapping regulatory classifications that would be obscured by single-label methods.
While association rules focus on localized relationships between specific attributes, clustering methods provide a complementary global perspective. By encoding categorical variables and applying PCA, the analysis reduces dimensionality while retaining the leading components of variation. K-means clustering on the PCA-reduced data identifies four distinct vessel groups characterized by coherent combinations of flag, vessel type, and sanction program. These clusters correspond to recognizable geopolitical and operational profiles, such as Panama-flagged crude oil tankers linked to Iran sanctions or Russian general cargo vessels associated with Russia-related programs.
Taken together, the results demonstrate that unsupervised learning methods can uncover meaningful structure in complex regulatory datasets. Association rules highlight precise regulatory linkages, while clustering reveals broader vessel typologies. The consistency between these methods strengthens confidence in the findings and illustrates the value of applying multiple unsupervised techniques within a single analytical framework.
This study has several limitations. First, the analysis relies on publicly available sanctions data, which may reflect enforcement priorities rather than the full universe of sanctionable activity. Second, some vessels are associated with multiple sanction programs; while this structure is explicitly modeled in the association rule analysis, clustering results summarize dominant attributes and may underrepresent overlapping regulatory designations. Third, clustering outcomes depend on methodological choices such as distance metrics, the number of retained principal components, and the selected number of clusters. Finally, the unsupervised nature of the methods limits causal interpretation; identified patterns should be understood as descriptive associations rather than explanations of sanctioning decisions.
This study applies unsupervised learning techniques to explore patterns in sanctioned maritime vessels using publicly available OFAC data. By integrating exploratory temporal analysis, association rule mining, dimensionality reduction, and clustering, the analysis reveals both localized relationships and broader structural groupings within the data.
The results show that vessel sanctions are not randomly distributed but instead reflect systematic associations between vessel characteristics, sanction programs, and time periods. Association rule mining identifies strong co-occurrence patterns, while clustering uncovers distinct vessel profiles aligned with geopolitical and operational contexts. These findings demonstrate that unsupervised methods are well suited for analyzing complex, high-dimensional regulatory datasets where predefined labels are incomplete or overlapping.
Beyond the specific case of maritime sanctions, this project illustrates how combining multiple unsupervised learning techniques can enhance interpretability and insight in applied economic and policy-oriented data analysis. The approach can be extended to other sanction domains or regulatory datasets to support exploratory research, risk assessment, and monitoring efforts. Overall, the study highlights the practical value of unsupervised learning as a tool for uncovering hidden structure in real-world data.
Unsupervised Learning course materials by Jacek Lewkowicz, University of Warsaw, Faculty of Economic Sciences
[3] https://www.geeksforgeeks.org/r-language/association-rule-mining-in-r-programming/
[4] https://www.geeksforgeeks.org/r-language/apriori-algorithm-in-r-programming/
[5] https://www.geeksforgeeks.org/machine-learning/ml-one-hot-encoding/
[8] https://en.wikipedia.org/wiki/Calinski%E2%80%93Harabasz_index
[9] https://en.wikipedia.org/wiki/Principal_component_analysis
[10] https://en.wikipedia.org/wiki/K-means_clustering#Applications