Introduction

International sanctions increasingly target maritime transportation, leading to a growing number of vessels added to sanctions lists [1]. These vessels differ widely in flag, type, and sanction program, creating a complex, high‑dimensional dataset that is difficult to interpret using simple descriptive methods.

Unsupervised learning offers a way to uncover structure in this complexity by identifying patterns without predefined labels. This study applies association rule mining, PCA, and clustering to explore how vessel characteristics relate to sanction programs and whether meaningful groups emerge.

The analysis addresses three research questions:

  1. How have vessel sanctions evolved over time?

  2. Are there recurring associations between vessel types, flags, and sanction programs?

  3. Can unsupervised learning methods identify meaningful groups of sanctioned vessels?

By addressing these questions, the study provides an exploratory, data‑driven perspective on maritime sanctions and demonstrates how unsupervised learning can reveal hidden structure in complex regulatory datasets.

Methodology

This study applies a sequence of unsupervised learning techniques to explore structural patterns in sanctioned maritime vessels. The methodological pipeline integrates data collection, preprocessing, association rule mining, dimensionality reduction, and clustering. Each step is designed to uncover a different layer of structure in the dataset, moving from categorical co‑occurrence patterns to broader vessel groupings.

Libraries used:

library(tidyverse)
library(lubridate)
library(fastDummies)
library(factoextra)
library(clusterCrit)
library(cluster)
library(arules)
library(arulesViz)

Data Collection and Preprocessing (Python):

The dataset comes from the OFAC SDN List [2], published in XML format. Python was used to extract vessel‑related records, select relevant attributes, retrieve associated sanction programs, and save the results as a structured CSV file. The process included isolating vessel entries, matching sanction programs and sanction dates, and performing data cleaning and field standardization. All data was collected on 25 December 2025.

The full preprocessing workflow and the Streamlit dashboard used for initial exploration are available on GitHub.

Exploratory Analysis

A set of exploratory visualizations was created to understand how vessel‑related sanctions evolved over time. These plots highlight both overall trends and program‑specific dynamics, providing context for the unsupervised learning methods applied later.

Note: A threshold of 25 sanctioned vessels per year was applied when constructing this plot.
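
One possible reading of this threshold is that years with fewer than 25 sanctioned vessels were left out of the plot. The sketch below illustrates that filter in base R on made-up counts; the `years` vector and the `n_vessels` column name are illustrative assumptions, not the author's code or data.

```r
# Hypothetical sanction years, one entry per vessel (made-up counts)
years <- c(rep(2012L, 30), rep(2016L, 5), rep(2019L, 40), rep(2024L, 120))

# Count vessels per year
per_year <- as.data.frame(table(SANCTION_YEAR = years), responseName = "n_vessels")
per_year$SANCTION_YEAR <- as.integer(as.character(per_year$SANCTION_YEAR))

# Keep only years that reach the assumed threshold of 25 vessels
kept <- per_year[per_year$n_vessels >= 25, ]
print(kept)
```

If the threshold was instead applied per program and year, the same filter would be run on counts grouped by both variables.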

Key observations:

  • Overall growth: The number of sanctioned vessels increases substantially over the period, with especially sharp rises after 2018 and again in 2023–2026.

  • Periods of escalation: Several years show pronounced spikes, suggesting shifts in enforcement intensity or geopolitical events.

  • Program‑level variation: Different sanctions programs dominate different years. For example, some years are driven primarily by IRAN‑related programs, while others show strong activity from RUSSIA‑EO14024 or UKRAINE‑EO13662.

These temporal patterns help frame the unsupervised analysis by showing that sanctions activity is not uniform across time or programs. The next sections focus on uncovering structural patterns within the vessel attributes themselves using association rules, PCA, and clustering.

Unsupervised Learning Analysis

Dataset: Load and Prepare Data in R

After preprocessing in Python, the cleaned CSV file was imported into R for modeling.

vessels <- read_csv("vessels_extracted_cleaned.csv")

For the unsupervised analysis, the dataset was reduced to variables relevant for pattern discovery. Missing categorical values were recoded as “Unknown” to retain all observations without introducing deletion bias.

vessels_clean <- vessels %>%
  select(VESSEL_TYPE, FLAG, PROGRAMS, SANCTION_DATE) %>%
  mutate(
    SANCTION_YEAR = year(as.Date(SANCTION_DATE))
  ) %>%
  select(-SANCTION_DATE) %>%
  mutate(across(where(is.character), ~replace_na(.x, "Unknown")))

With the dataset prepared, the analysis proceeds to Association Rule Mining to identify frequent co‑occurring vessel attributes.

Association Rule Mining

To identify frequently co-occurring vessel characteristics, association rule mining [3] was applied. Each vessel is treated as a transaction containing categorical attributes such as vessel type, flag, sanction program, and the year of sanctioning. Unlike clustering, which groups vessels as a whole, association rules focus on local dependency patterns, answering questions such as:

  • Which vessel types are commonly associated with specific sanction programs?

  • Do certain flags repeatedly appear with particular programs in specific time periods?

Data Preparation

vessels_rules <- vessels_clean %>%
  separate_rows(PROGRAMS, sep = ",\\s*") %>%
  mutate(
    PROGRAMS = paste0("PROGRAM_", PROGRAMS),
    YEAR_GROUP = case_when(
      SANCTION_YEAR <= 2010 ~ "Before_2010",
      SANCTION_YEAR <= 2015 ~ "2011_2015",
      SANCTION_YEAR <= 2020 ~ "2016_2020",
      TRUE ~ "After_2020"
    )
  ) %>%
  select(VESSEL_TYPE, FLAG, PROGRAMS, YEAR_GROUP)

Sanction programs are split into separate rows to ensure that vessels sanctioned under multiple programs contribute to each program‑specific rule. A categorical time variable (YEAR_GROUP) is created to capture temporal patterns in sanctions. Only the attributes relevant for rule mining are retained.
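
To make the row expansion concrete, here is a minimal base R sketch of the same idea on two made-up vessels. It uses `strsplit()` and `cut()` as stand-ins for `separate_rows()` and `case_when()`; the `toy` data frame and its values are hypothetical and not taken from the OFAC file.

```r
# Two hypothetical vessels; the second is listed under two programs
toy <- data.frame(
  VESSEL_TYPE   = c("Tug", "Crude Oil Tanker"),
  FLAG          = c("Venezuela", "Panama"),
  PROGRAMS      = c("VENEZUELA-EO13850", "IRAN, SDGT"),
  SANCTION_YEAR = c(2019, 2012),
  stringsAsFactors = FALSE
)

# Split the comma-separated programs, one list element per vessel
program_list <- strsplit(toy$PROGRAMS, ",\\s*")

# Repeat each vessel row once per program, then attach the prefixed program
expanded <- toy[rep(seq_len(nrow(toy)), lengths(program_list)), ]
expanded$PROGRAMS <- paste0("PROGRAM_", unlist(program_list))
rownames(expanded) <- NULL

# Bin the year into the same four periods (intervals are closed on the right)
expanded$YEAR_GROUP <- cut(
  expanded$SANCTION_YEAR,
  breaks = c(-Inf, 2010, 2015, 2020, Inf),
  labels = c("Before_2010", "2011_2015", "2016_2020", "After_2020")
)
print(expanded)
```

The two-program vessel ends up as two rows, so it can support a rule for each of its programs.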

vessels_rules <- vessels_rules %>% mutate(across(everything(), as.factor))
transactions <- as(vessels_rules, "transactions")

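All variables are converted to factors and transformed into the transaction format required by the Apriori algorithm. Because programs were split into separate rows, each vessel-program pair becomes a “basket” of categorical attributes, so a vessel listed under several programs contributes several transactions.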

Apriori Algorithm

The Apriori algorithm [4] is run with a minimum support of 1.5% and a minimum confidence of 70%, so that only patterns that are both reasonably frequent and reliable are retained. Rules are then filtered to those whose right-hand side is a sanction program, and redundant rules are removed to avoid duplicated information. A lift threshold of 2 ensures that only associations substantially stronger than chance remain.

rules <- apriori(transactions, parameter = list(supp = 0.015, conf = 0.7, minlen = 2))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.7    0.1    1 none FALSE            TRUE       5   0.015      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 28 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[126 item(s), 1883 transaction(s)] done [0.00s].
## sorting and recoding items ... [40 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [181 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
# Keep only rules whose right-hand side is a sanction program
rules <- subset(rules, rhs %pin% "PROGRAM_")
# Drop redundant rules (a more general rule has equal or higher confidence)
rules <- rules[!is.redundant(rules)]
# Keep only associations with lift above 2
rules <- subset(rules, lift > 2)

Rules are grouped into four interpretable categories:

  • Flag → Program

  • Vessel Type → Program

  • Time Period → Program

  • Flag + Type → Program

This structure supports clearer interpretation and comparison across attribute types.

flag_program_rules <- subset(rules, lhs %pin% "FLAG=" & rhs %pin% "PROGRAM_")
type_program_rules <- subset(rules, lhs %pin% "VESSEL_TYPE=" & rhs %pin% "PROGRAM_")
time_program_rules <- subset(rules, lhs %pin% "YEAR_GROUP=" & rhs %pin% "PROGRAM_")
combo_rules <- subset(rules, lhs %pin% "FLAG=" & lhs %pin% "VESSEL_TYPE=" & rhs %pin% "PROGRAM_")

Results and Interpretation

The five top-lift rules highlight the strongest associations.

Top 5 Association Rules by Lift

| LHS | RHS | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|
| {FLAG=Venezuela} | {PROGRAMS=PROGRAM_VENEZUELA-EO13850} | 0.0175252 | 0.8684211 | 0.0201806 | 30.853525 | 33 |
| {YEAR_GROUP=2011_2015} | {PROGRAMS=PROGRAM_IRAN} | 0.0323951 | 0.7349398 | 0.0440786 | 7.361125 | 61 |
| {VESSEL_TYPE=Fishing Vessel} | {PROGRAMS=PROGRAM_GLOMAG} | 0.0785980 | 0.9932886 | 0.0791290 | 11.913136 | 148 |
| {FLAG=China} | {PROGRAMS=PROGRAM_GLOMAG} | 0.0833776 | 0.9936709 | 0.0839087 | 11.917722 | 157 |
| {FLAG=Russia} | {PROGRAMS=PROGRAM_RUSSIA-EO14024} | 0.1237387 | 0.7191358 | 0.1720659 | 3.002512 | 233 |

Examples include:

  • Venezuelan‑flagged vessels → Venezuela EO13850 program

  • Chinese‑flagged or fishing vessels → GLOMAG program

  • Russian‑flagged vessels → EO14024 program

High lift values indicate that these combinations occur far more often than expected by chance.
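
As a worked check, the {FLAG=China} → {PROGRAM_GLOMAG} row can be reproduced from raw counts using the standard definitions: support = count(LHS and RHS) / n, confidence = count(LHS and RHS) / count(LHS), and lift = confidence / support(RHS). In the sketch below, `n` and `n_both` come from the reported output (1883 transactions, count 157). `n_lhs` and `n_rhs` are not reported directly; they are back-calculated here from the table's coverage and lift columns, so treat them as inferred values.

```r
n      <- 1883  # transactions reported by apriori()
n_both <- 157   # rows with FLAG=China and PROGRAM_GLOMAG (count column)
n_lhs  <- 158   # rows with FLAG=China, inferred from coverage 0.0839087 * 1883
n_rhs  <- 157   # rows with PROGRAM_GLOMAG, inferred from confidence / lift

support    <- n_both / n                 # share of all transactions matching the rule
confidence <- n_both / n_lhs             # share of China-flagged rows that are GLOMAG
lift       <- confidence / (n_rhs / n)   # confidence relative to the GLOMAG base rate

round(c(support = support, confidence = confidence, lift = lift), 6)
```

A lift near 11.9 means a China-flagged row is about twelve times more likely to carry the GLOMAG program than a randomly chosen row.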

The five highest-confidence rules show cases where the sanction program is almost fully determined by the vessel attributes.

Top 5 Association Rules by Confidence

| LHS | RHS | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|
| {VESSEL_TYPE=Tug,FLAG=Venezuela} | {PROGRAMS=PROGRAM_VENEZUELA-EO13850} | 0.0159320 | 1.0000000 | 0.0159320 | 35.528302 | 30 |
| {VESSEL_TYPE=Fishing Vessel,FLAG=China} | {PROGRAMS=PROGRAM_GLOMAG} | 0.0785980 | 1.0000000 | 0.0785980 | 11.993631 | 148 |
| {VESSEL_TYPE=Fishing Vessel,YEAR_GROUP=After_2020} | {PROGRAMS=PROGRAM_GLOMAG} | 0.0785980 | 1.0000000 | 0.0785980 | 11.993631 | 148 |
| {VESSEL_TYPE=General Cargo,FLAG=Russia} | {PROGRAMS=PROGRAM_RUSSIA-EO14024} | 0.0477961 | 1.0000000 | 0.0477961 | 4.175166 | 90 |
| {FLAG=China} | {PROGRAMS=PROGRAM_GLOMAG} | 0.0833776 | 0.9936709 | 0.0839087 | 11.917722 | 157 |

For example (confidence = 1.0):

  • Tug + Venezuela flag → EO13850

  • Fishing Vessel + China flag → GLOMAG

  • General Cargo + Russia flag → RUSSIA-EO14024

Representative rules from each category illustrate the diversity of associations:

Selected Association Rules Across Attribute Categories

| LHS | RHS | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|
| {FLAG=Venezuela} | {PROGRAMS=PROGRAM_VENEZUELA-EO13850} | 0.0175252 | 0.8684211 | 0.0201806 | 30.853525 | 33 |
| {FLAG=China} | {PROGRAMS=PROGRAM_GLOMAG} | 0.0833776 | 0.9936709 | 0.0839087 | 11.917722 | 157 |
| {FLAG=Russia} | {PROGRAMS=PROGRAM_RUSSIA-EO14024} | 0.1237387 | 0.7191358 | 0.1720659 | 3.002512 | 233 |
| {VESSEL_TYPE=Fishing Vessel} | {PROGRAMS=PROGRAM_GLOMAG} | 0.0785980 | 0.9932886 | 0.0791290 | 11.913136 | 148 |
| {VESSEL_TYPE=Tug,FLAG=Venezuela} | {PROGRAMS=PROGRAM_VENEZUELA-EO13850} | 0.0159320 | 1.0000000 | 0.0159320 | 35.528302 | 30 |
| {YEAR_GROUP=2011_2015} | {PROGRAMS=PROGRAM_IRAN} | 0.0323951 | 0.7349398 | 0.0440786 | 7.361125 | 61 |
| {VESSEL_TYPE=Unknown,FLAG=Democratic People’s Republic of Korea} | {PROGRAMS=PROGRAM_DPRK4} | 0.0238980 | 0.8181818 | 0.0292087 | 23.702098 | 45 |

  • Flag-based: Venezuela strongly predicts EO13850

  • Type-based: Fishing vessels strongly predict GLOMAG

  • Time-based: 2011–2015 sanctions often correspond to IRAN programs

  • Combined attributes: A DPRK flag with an unknown vessel type strongly predicts DPRK4

A curated subset of rules is assembled for visualization. This selection balances high‑lift rules, high‑confidence rules, and diverse attribute categories.

The graph visualization reveals how specific vessel attributes form clusters around distinct sanction programs. Clear hubs emerge—such as GLOMAG, EO13850, IRAN, and DPRK4—each linked to characteristic combinations of flags, vessel types, or temporal categories. This network structure provides an intuitive summary of the strongest co‑occurrence patterns identified by the association rules.

While association rule mining reveals localized co‑occurrence patterns among vessel attributes, it does not capture broader similarity relationships across the full multidimensional feature space. To explore these global patterns and identify groups of vessels with comparable profiles, the analysis now turns to dimensionality reduction and clustering. Because clustering algorithms operate on numerical feature representations, the next step involves encoding the categorical variables and preparing the data for cluster selection.

Data Encoding and Cluster Selection

Feature Encoding and Scaling

One-hot encoding [5] converts categorical attributes into binary indicators, creating a sparse feature matrix. Scaling standardizes each indicator to zero mean and unit variance, so that all variables contribute equally to the distance calculations used in PCA and clustering.

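# Treat every column (including SANCTION_YEAR) as a factor and expand to dummy columns
vessels_encoded <- vessels_clean %>%
  mutate(across(everything(), as.factor)) %>%
  model.matrix(~ . -1, data = .) %>%
  as.data.frame()

# Standardize each dummy column to zero mean and unit variance
vessels_scaled <- scale(vessels_encoded)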
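
One detail worth checking on a small example is how `model.matrix(~ . - 1)` expands several factors: it keeps every level of the first factor but drops one reference level from each later factor. The sketch below shows this on a made-up `toy` data frame and, as one possible alternative (not the approach used in this study), builds a full one-hot matrix by switching off contrasts through `contrasts.arg`.

```r
# Hypothetical toy data with two categorical attributes, three levels each
toy <- data.frame(
  FLAG        = factor(c("Panama", "Russia", "China", "Russia")),
  VESSEL_TYPE = factor(c("Tug", "Tug", "Fishing Vessel", "General Cargo"))
)

# Same call as in the text: 3 FLAG columns + 2 VESSEL_TYPE columns
dummies <- model.matrix(~ . - 1, data = toy)
colnames(dummies)

# Alternative: full indicator coding for every factor (3 + 3 columns)
full_one_hot <- model.matrix(
  ~ . - 1,
  data = toy,
  contrasts.arg = lapply(toy, contrasts, contrasts = FALSE)
)
colnames(full_one_hot)

# Standardize: each column ends up with mean 0 and standard deviation 1
scaled <- scale(dummies)
round(colMeans(scaled), 10)
```

Either encoding carries the same information, but the dropped reference levels change which columns enter the distance calculations, which is worth keeping in mind when reading PCA loadings.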

Determining the Number of Clusters

Elbow Method [6]

The elbow curve (within-cluster sum of squares) drops quickly up to about k = 5. After that, it flattens out, meaning that adding more clusters yields little further improvement.

Silhouette Score [7]

The highest average silhouette score occurs at k = 2. This means the data points are best separated into two groups.

Calinski–Harabasz Index [8]

The index peaks at k = 2 and then decreases as k increases. This supports the silhouette result, suggesting that two clusters give the best separation.

Although both the Silhouette and Calinski–Harabasz indices indicate that a two-cluster solution provides the strongest statistical separation, this configuration results in groups that are too broad to capture important differences within the data. Selecting four clusters offers a better balance between statistical quality and interpretability, allowing distinct geopolitical and operational vessel patterns to emerge without unnecessarily fragmenting the dataset.
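
For readers who want to reproduce this kind of comparison, the following base R sketch computes two of the criteria over a range of k on synthetic data: the total within-cluster sum of squares used in the elbow plot, and the Calinski–Harabasz index, CH = (BSS / (k - 1)) / (WSS / (n - k)). The data, the range `ks`, and all object names are illustrative assumptions; the study itself loads `factoextra`, `cluster`, and `clusterCrit`, whose exact calls are not shown in the text.

```r
set.seed(42)

# Synthetic 2-D data with three loose groups (purely illustrative)
x <- rbind(
  matrix(rnorm(100, mean = 0),  ncol = 2),
  matrix(rnorm(100, mean = 5),  ncol = 2),
  matrix(rnorm(100, mean = 10), ncol = 2)
)

ks <- 2:6
n  <- nrow(x)

# For each k: fit k-means, record WSS (elbow) and the Calinski-Harabasz index
scores <- as.data.frame(t(sapply(ks, function(k) {
  km <- kmeans(x, centers = k, nstart = 25)
  ch <- (km$betweenss / (k - 1)) / (km$tot.withinss / (n - k))
  c(k = k, wss = km$tot.withinss, ch = ch)
})))

print(scores)
best_k_ch <- scores$k[which.max(scores$ch)]
```

Average silhouette widths could be added in the same loop with `cluster::silhouette()`. As in the study, the k that maximizes an index is only a starting point and has to be weighed against interpretability.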

With the number of clusters chosen, dimensionality reduction is applied to improve clustering performance and interpretability in the high‑dimensional encoded feature space.

Principal Component Analysis (PCA)

PCA [9] is used to address the high dimensionality introduced by one-hot encoding of categorical variables.

## Importance of components:
##                            PC1     PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     2.05869 1.95733 1.83036 1.80175 1.77623 1.72621 1.64736
## Proportion of Variance 0.02903 0.02624 0.02295 0.02224 0.02161 0.02041 0.01859
## Cumulative Proportion  0.02903 0.05527 0.07822 0.10045 0.12206 0.14247 0.16106
##                           PC8     PC9   PC10   PC11    PC12    PC13    PC14
## Standard deviation     1.5845 1.55155 1.4899 1.4598 1.44341 1.38208 1.37005
## Proportion of Variance 0.0172 0.01649 0.0152 0.0146 0.01427 0.01308 0.01286
## Cumulative Proportion  0.1782 0.19474 0.2099 0.2245 0.23881 0.25190 0.26475
##                           PC15    PC16    PC17    PC18    PC19    PC20    PC21
## Standard deviation     1.34943 1.31341 1.27750 1.25600 1.24606 1.24579 1.24241
## Proportion of Variance 0.01247 0.01182 0.01118 0.01081 0.01063 0.01063 0.01057
## Cumulative Proportion  0.27723 0.28904 0.30022 0.31103 0.32166 0.33229 0.34286
##                           PC22    PC23    PC24    PC25    PC26    PC27    PC28
## Standard deviation     1.22711 1.21575 1.19990 1.18486 1.18099 1.16490 1.15711
## Proportion of Variance 0.01031 0.01012 0.00986 0.00962 0.00955 0.00929 0.00917
## Cumulative Proportion  0.35318 0.36330 0.37316 0.38278 0.39233 0.40162 0.41079
##                           PC29    PC30    PC31    PC32    PC33    PC34    PC35
## Standard deviation     1.14514 1.13264 1.12836 1.11898 1.11102 1.10133 1.09563
## Proportion of Variance 0.00898 0.00879 0.00872 0.00858 0.00845 0.00831 0.00822
## Cumulative Proportion  0.41978 0.42856 0.43728 0.44586 0.45431 0.46262 0.47084
##                           PC36    PC37    PC38    PC39    PC40    PC41    PC42
## Standard deviation     1.09160 1.08828 1.08350 1.07475 1.07075 1.06244 1.05525
## Proportion of Variance 0.00816 0.00811 0.00804 0.00791 0.00785 0.00773 0.00763
## Cumulative Proportion  0.47901 0.48712 0.49516 0.50307 0.51092 0.51865 0.52628
##                           PC43    PC44    PC45   PC46    PC47   PC48    PC49
## Standard deviation     1.04990 1.04351 1.03853 1.0324 1.02643 1.0253 1.02191
## Proportion of Variance 0.00755 0.00746 0.00739 0.0073 0.00722 0.0072 0.00715
## Cumulative Proportion  0.53383 0.54129 0.54868 0.5560 0.56319 0.5704 0.57755
##                           PC50    PC51    PC52   PC53    PC54    PC55    PC56
## Standard deviation     1.01749 1.01479 1.01363 1.0109 1.01035 1.00828 1.00681
## Proportion of Variance 0.00709 0.00705 0.00704 0.0070 0.00699 0.00696 0.00694
## Cumulative Proportion  0.58464 0.59169 0.59873 0.6057 0.61272 0.61968 0.62663
##                           PC57    PC58   PC59    PC60    PC61    PC62    PC63
## Standard deviation     1.00526 1.00410 1.0039 1.00291 1.00257 1.00218 1.00183
## Proportion of Variance 0.00692 0.00691 0.0069 0.00689 0.00688 0.00688 0.00687
## Cumulative Proportion  0.63355 0.64045 0.6474 0.65425 0.66113 0.66801 0.67488
##                           PC64    PC65    PC66    PC67    PC68    PC69    PC70
## Standard deviation     1.00163 1.00155 1.00131 1.00127 1.00121 1.00104 1.00101
## Proportion of Variance 0.00687 0.00687 0.00687 0.00687 0.00687 0.00686 0.00686
## Cumulative Proportion  0.68176 0.68863 0.69549 0.70236 0.70923 0.71609 0.72295
##                           PC71    PC72    PC73    PC74    PC75    PC76    PC77
## Standard deviation     1.00092 1.00089 1.00084 1.00079 1.00073 1.00071 1.00070
## Proportion of Variance 0.00686 0.00686 0.00686 0.00686 0.00686 0.00686 0.00686
## Cumulative Proportion  0.72981 0.73668 0.74354 0.75040 0.75726 0.76412 0.77097
##                           PC78    PC79    PC80    PC81    PC82    PC83    PC84
## Standard deviation     1.00051 1.00046 1.00040 1.00035 1.00035 0.99415 0.98853
## Proportion of Variance 0.00686 0.00686 0.00685 0.00685 0.00685 0.00677 0.00669
## Cumulative Proportion  0.77783 0.78469 0.79154 0.79840 0.80525 0.81202 0.81871
##                           PC85   PC86   PC87    PC88    PC89    PC90    PC91
## Standard deviation     0.98583 0.9816 0.9745 0.97111 0.96398 0.95293 0.94513
## Proportion of Variance 0.00666 0.0066 0.0065 0.00646 0.00636 0.00622 0.00612
## Cumulative Proportion  0.82537 0.8320 0.8385 0.84493 0.85130 0.85752 0.86363
##                           PC92    PC93    PC94    PC95    PC96    PC97    PC98
## Standard deviation     0.92738 0.91071 0.90506 0.89924 0.89152 0.88504 0.88450
## Proportion of Variance 0.00589 0.00568 0.00561 0.00554 0.00544 0.00537 0.00536
## Cumulative Proportion  0.86953 0.87521 0.88082 0.88636 0.89180 0.89716 0.90252
##                           PC99   PC100   PC101   PC102   PC103   PC104  PC105
## Standard deviation     0.87714 0.87341 0.85727 0.84918 0.83226 0.82086 0.8193
## Proportion of Variance 0.00527 0.00522 0.00503 0.00494 0.00474 0.00462 0.0046
## Cumulative Proportion  0.90779 0.91302 0.91805 0.92299 0.92773 0.93235 0.9369
##                          PC106   PC107   PC108   PC109   PC110   PC111   PC112
## Standard deviation     0.80907 0.79867 0.79327 0.77180 0.75717 0.72637 0.71280
## Proportion of Variance 0.00448 0.00437 0.00431 0.00408 0.00393 0.00361 0.00348
## Cumulative Proportion  0.94143 0.94580 0.95011 0.95419 0.95812 0.96173 0.96521
##                          PC113   PC114   PC115   PC116   PC117   PC118   PC119
## Standard deviation     0.69958 0.68536 0.66930 0.65163 0.64761 0.60016 0.58629
## Proportion of Variance 0.00335 0.00322 0.00307 0.00291 0.00287 0.00247 0.00235
## Cumulative Proportion  0.96856 0.97178 0.97485 0.97776 0.98063 0.98310 0.98545
##                          PC120   PC121   PC122   PC123   PC124   PC125   PC126
## Standard deviation     0.55088 0.47345 0.46243 0.44151 0.41643 0.40990 0.35608
## Proportion of Variance 0.00208 0.00154 0.00146 0.00134 0.00119 0.00115 0.00087
## Cumulative Proportion  0.98753 0.98906 0.99053 0.99186 0.99305 0.99420 0.99507
##                          PC127   PC128   PC129  PC130   PC131   PC132   PC133
## Standard deviation     0.33584 0.32380 0.31639 0.2963 0.27161 0.26007 0.23645
## Proportion of Variance 0.00077 0.00072 0.00069 0.0006 0.00051 0.00046 0.00038
## Cumulative Proportion  0.99584 0.99656 0.99725 0.9979 0.99835 0.99882 0.99920
##                         PC134   PC135   PC136   PC137   PC138   PC139   PC140
## Standard deviation     0.2103 0.19130 0.12760 0.10035 0.06931 0.05603 0.03750
## Proportion of Variance 0.0003 0.00025 0.00011 0.00007 0.00003 0.00002 0.00001
## Cumulative Proportion  0.9995 0.99975 0.99986 0.99993 0.99997 0.99999 1.00000
##                          PC141     PC142     PC143     PC144     PC145
## Standard deviation     0.01859 1.931e-14 1.794e-14 1.335e-14 6.296e-15
## Proportion of Variance 0.00000 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion  1.00000 1.000e+00 1.000e+00 1.000e+00 1.000e+00
##                            PC146
## Standard deviation     9.142e-16
## Proportion of Variance 0.000e+00
## Cumulative Proportion  1.000e+00

The first 10 PCA components capture approximately 21% of the total variance. This relatively low variance per component is expected due to the high-dimensional, sparse one-hot encoding of categorical variables. Retaining these components balances dimensionality reduction with preserving meaningful structure for clustering.
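
The PCA call itself is not shown in the text. A minimal sketch of how such a summary and the retained component scores might be obtained with `prcomp()` is given below, using a synthetic binary matrix in place of the encoded vessel data; the matrix `x`, the choice `n_keep <- 5`, and the object names are assumptions for illustration only.

```r
set.seed(1)

# Synthetic sparse binary matrix standing in for one-hot encoded features
x <- matrix(rbinom(200 * 12, size = 1, prob = 0.2), nrow = 200)
x <- x[, apply(x, 2, sd) > 0, drop = FALSE]  # drop constant columns before scaling

# PCA on centered, unit-variance columns
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Proportion and cumulative proportion of variance per component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var       <- cumsum(var_explained)
round(cum_var, 3)

# Keep the first few component scores as input for clustering
n_keep     <- 5
pca_scores <- pca$x[, seq_len(n_keep), drop = FALSE]
dim(pca_scores)
```

With sparse indicator columns the cumulative curve typically rises slowly, which matches the pattern in the output above.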

Results and Interpretation

The explained variance of individual principal components is relatively low, which is expected due to the high-dimensional and sparse nature of one-hot encoded categorical variables. In such settings, variance is distributed across many components rather than concentrated in a few. Therefore, PCA was evaluated based on cumulative explained variance and clustering performance rather than individual component dominance.

K-Means Clustering on PCA Components

The figure below presents the k-means clustering [10] results projected onto the PCA-reduced space, allowing for visual evaluation of cluster cohesion and overlap.

Results and Cluster Interpretation

K-means clustering with 4 centers was applied to the PCA-reduced data, the results were visualized, and each cluster was summarized by its dominant flag, vessel type, and program. This links the statistical output to meaningful domain categories, making the clusters interpretable.

Cluster Sizes and Dominant Attributes

| Cluster | Dominant_Flag | Dominant_Vessel_Type | Dominant_Program | Count |
|---|---|---|---|---|
| 1 | Panama | Crude Oil Tanker | IRAN-EO13902 | 666 |
| 2 | Venezuela | Tug | VENEZUELA-EO13850 | 50 |
| 3 | China | Fishing Vessel | GLOMAG | 157 |
| 4 | Russia | General Cargo | RUSSIA-EO14024 | 542 |
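
A summary of this shape can be built by attaching the k-means labels to the original categorical attributes and taking the most frequent value per cluster. The sketch below does this in base R on synthetic component scores and made-up attributes; `comp_scores`, `attrs`, and the helper `mode_of()` are hypothetical names, and two clusters are used only to keep the example small.

```r
set.seed(7)

# Synthetic 2-D component scores forming two well-separated groups
comp_scores <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 4, sd = 0.3), ncol = 2)
)

# Made-up categorical attributes aligned row by row with the scores
attrs <- data.frame(
  FLAG        = rep(c("Panama", "Russia"), each = 20),
  VESSEL_TYPE = rep(c("Crude Oil Tanker", "General Cargo"), each = 20)
)

# Cluster on the scores and attach the labels to the attributes
km <- kmeans(comp_scores, centers = 2, nstart = 25)
attrs$cluster <- km$cluster

# Most frequent value of a vector (ties resolved by table order)
mode_of <- function(v) names(which.max(table(v)))

# One row per cluster: dominant attributes and cluster size
profile <- do.call(rbind, lapply(split(attrs, attrs$cluster), function(d) {
  data.frame(
    Cluster              = d$cluster[1],
    Dominant_Flag        = mode_of(d$FLAG),
    Dominant_Vessel_Type = mode_of(d$VESSEL_TYPE),
    Count                = nrow(d)
  )
}))
print(profile)
```

Because the dominant value is only the mode, it can hide substantial within-cluster variety, so reporting its share alongside the label is a useful extension.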

Together, these clusters reveal four distinct geopolitical–operational vessel profiles, each aligned with specific sanction programs and maritime behaviors.

  • Cluster 1: A large cluster of Panama‑flagged crude oil tankers linked to Iran sanctions, representing a major structural pattern in the dataset.

  • Cluster 2: A smaller, niche cluster dominated by Venezuelan tugs, likely tied to localized operations and sanctions.

  • Cluster 3: A distinct group of Chinese fishing vessels under the Global Magnitsky program, clearly separated from cargo and tanker vessels.

  • Cluster 4: A large cluster of Russian general cargo vessels tied to Russia‑related sanctions, forming a well-defined and internally consistent group.

Discussion

This study combines exploratory analysis, association rule mining, and clustering to provide a multi-layered view of maritime sanctions. Each method contributes distinct insights into the structure of the data, and together they reveal consistent patterns linking vessel characteristics to sanction programs and time periods.

The exploratory analysis highlights substantial growth in vessel-related sanctions over time, with pronounced spikes corresponding to periods of intensified enforcement. These temporal dynamics suggest that sanctioning activity is shaped by geopolitical developments and evolving regulatory priorities. However, temporal trends alone do not explain how vessel attributes relate to specific sanction programs.

Association rule mining addresses this gap. It uncovers frequent co-occurrence patterns among vessel type, flag, sanction program, and time period. The resulting rules reveal strong and intuitive associations, such as fishing vessels flagged to China being linked to Global Magnitsky sanctions, or Venezuelan-flagged tugs appearing under the VENEZUELA-EO13850 program. High lift values indicate that these patterns occur far more frequently than would be expected by chance, while high confidence values show how reliably the attributes point to a given program. Importantly, this approach also accommodates vessels sanctioned under multiple programs, capturing overlapping regulatory classifications that would be obscured by single-label methods.

While association rules focus on localized relationships between specific attributes, clustering methods provide a complementary global perspective. By encoding categorical variables and applying PCA, the analysis reduces dimensionality while retaining the leading directions of variation (the first 10 components, about 21% of total variance). K-means clustering on the PCA-reduced data identifies four distinct vessel groups characterized by coherent combinations of flag, vessel type, and sanction program. These clusters correspond to recognizable geopolitical and operational profiles, such as Panama-flagged crude oil tankers linked to Iran sanctions or Russian general cargo vessels associated with Russia-related programs.

Taken together, the results demonstrate that unsupervised learning methods can uncover meaningful structure in complex regulatory datasets. Association rules highlight precise regulatory linkages, while clustering reveals broader vessel typologies. The consistency between these methods strengthens confidence in the findings and illustrates the value of applying multiple unsupervised techniques within a single analytical framework.

Limitations

This study has several limitations. First, the analysis relies on publicly available sanctions data, which may reflect enforcement priorities rather than the full universe of sanctionable activity. Second, some vessels are associated with multiple sanction programs; while this structure is explicitly modeled in the association rule analysis, clustering results summarize dominant attributes and may underrepresent overlapping regulatory designations. Third, clustering outcomes depend on methodological choices such as distance metrics, the number of retained principal components, and the selected number of clusters. Finally, the unsupervised nature of the methods limits causal interpretation; identified patterns should be understood as descriptive associations rather than explanations of sanctioning decisions.

Conclusion

This study applies unsupervised learning techniques to explore patterns in sanctioned maritime vessels using publicly available OFAC data. By integrating exploratory temporal analysis, association rule mining, dimensionality reduction, and clustering, the analysis reveals both localized relationships and broader structural groupings within the data.

The results show that vessel sanctions are not randomly distributed but instead reflect systematic associations between vessel characteristics, sanction programs, and time periods. Association rule mining identifies strong co-occurrence patterns, while clustering uncovers distinct vessel profiles aligned with geopolitical and operational contexts. These findings demonstrate that unsupervised methods are well suited for analyzing complex, high-dimensional regulatory datasets where predefined labels are incomplete or overlapping.

Beyond the specific case of maritime sanctions, this project illustrates how combining multiple unsupervised learning techniques can enhance interpretability and insight in applied economic and policy-oriented data analysis. The approach can be extended to other sanction domains or regulatory datasets to support exploratory research, risk assessment, and monitoring efforts. Overall, the study highlights the practical value of unsupervised learning as a tool for uncovering hidden structure in real-world data.

References

Unsupervised Learning course materials by Jacek Lewkowicz, University of Warsaw, Faculty of Economic Sciences.