Machine Learning for Peace:

Forecasting DoS Travel Advisories

DevLab@Penn

University of Pennsylvania

November 18, 2024

Mahda Soltani, Jeremy Springman, Erik Wibbels

Overview

Introducing Machine Learning for Peace

Data collection and processing
Digital tools

Forecasting DoS Travel Advisories

Objectives and data
Modeling and terminology
Performance

Introducing MLP Infrastructure

MLP Approach

How can data improve decision-making?

Awareness: data on what’s happening very recently
- Mass scraping online news + ML to track events
- Interactive data dashboard
Planning: predictive analytics for strategic decisions
- Forecasting political events
- Civic Space Early Warning System

High Quality Corpus

Input: Online news

400 news sources
40 languages
100+ million articles

Data quality

Focus on high-quality local sources (medium data)
Direct, human monitored scraping
Much better coverage than big data corpora

Output: Monthly data

60 countries
2012 - last month

Measuring Civic Space: Tunisia

Measuring Chinese Influence

Forecasting Events: Bangladesh

MLP’s Digital Tools

Forecasting High-Level Travel Advisories

Motivation & Takeaways

High-level advisories are costly
- Anticipating volume, location, timing of advisories can improve planning and resource allocation
Surprisingly strong performance:
- Similar to the best conflict forecasting projects
Limitations
- Limited sample of countries and time

Data Collection

Measuring high-level advisory onset

Scraped advisory data posted by DoS and archived on web.archive.org (2012-2023)
Beginning 2018: levels 3 and 4 are considered “high-level”
Before 2018: all advisories are considered “high-level”
Identified start month of every high-level advisory (onset)
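The onset rule above can be sketched as a single pass over one country's monthly advisory history. This is a minimal illustration, not the project's scraper output. The record layout (`AdvisoryMonth`) and the encoding are hypothetical: `None` means no advisory, and any non-`None` level counts as an advisory before 2018. The sketch also assumes every month is present.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AdvisoryMonth:
    """One country's advisory status in one month (hypothetical record layout)."""
    year: int
    month: int
    # Advisory level posted that month; None means no advisory was in effect.
    # Pre-2018 advisories had no levels, so any non-None value marks "an advisory existed".
    level: Optional[int]


def is_high_level(rec: AdvisoryMonth) -> bool:
    """Deck's rule: before 2018 every advisory counts; from 2018 only levels 3 and 4."""
    if rec.level is None:
        return False
    if rec.year < 2018:
        return True
    return rec.level >= 3


def onset_months(records: List[AdvisoryMonth]) -> List[Tuple[int, int]]:
    """Return (year, month) for each month that begins a new high-level spell.

    Assumes records cover consecutive months for a single country.
    """
    onsets: List[Tuple[int, int]] = []
    prev_high = False
    for rec in sorted(records, key=lambda r: (r.year, r.month)):
        high = is_high_level(rec)
        if high and not prev_high:
            onsets.append((rec.year, rec.month))
        prev_high = high
    return onsets


history = [
    AdvisoryMonth(2017, 12, 0),  # pre-2018 advisory (no level system yet)
    AdvisoryMonth(2018, 1, 2),   # level 2: not high-level
    AdvisoryMonth(2018, 2, 3),   # new onset
    AdvisoryMonth(2018, 3, 4),   # continuing spell, not a new onset
]
print(onset_months(history))  # [(2017, 12), (2018, 2)]
```

Only the first month of each spell is an onset. A level 4 month that follows a level 3 month is therefore not counted twice.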

Forecast Target

Probability of high-level advisory onset for each country-month
- Onsets are rare (123 onsets in 8,680 country-months)
- Rare events are difficult to forecast
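The arithmetic behind "rare" is worth making explicit. The sketch below uses only the two counts on this slide. It shows that a model which never predicts an onset would still be about 98.6% accurate. That is why accuracy is uninformative here and why the later slides compare AUPRC against a random baseline near the base rate.

```python
def base_rate(n_onsets: int, n_country_months: int) -> float:
    """Share of country-months in which a high-level advisory began."""
    return n_onsets / n_country_months


rate = base_rate(123, 8680)
print(f"onset base rate: {rate:.4f}")                         # onset base rate: 0.0142
print(f"accuracy of never predicting onset: {1 - rate:.4f}")  # accuracy of never predicting onset: 0.9858
```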

Forecast Predictors

Monthly data on domestic political events
- MLP data on 20 political events (arrests, legal action, raids, etc.)
Monthly data on local economic conditions
- TradingEconomics data on prices, growth, wages, etc.
Modeling variables
- Country-specific features
- COVID months
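To forecast h months ahead, predictors observed in month t are paired with the onset label for month t + h. The sketch below shows that alignment together with a COVID indicator. It assumes monthly feature dictionaries keyed by (year, month). The COVID window (March 2020 to December 2021) is hypothetical, since the deck does not say which months were flagged. The function names are also illustrative.

```python
from typing import Dict, List, Tuple

Month = Tuple[int, int]  # (year, month)


def month_index(m: Month) -> int:
    """Map (year, month) to a running month count so months can be added."""
    return m[0] * 12 + (m[1] - 1)


def shift(m: Month, k: int) -> Month:
    """Return the month k months after m."""
    idx = month_index(m) + k
    return (idx // 12, idx % 12 + 1)


# Hypothetical COVID window; the deck does not specify which months were flagged.
COVID_START: Month = (2020, 3)
COVID_END: Month = (2021, 12)


def is_covid(m: Month) -> int:
    return int(month_index(COVID_START) <= month_index(m) <= month_index(COVID_END))


def make_rows(
    features: Dict[Month, Dict[str, float]],
    onsets: Dict[Month, int],
    horizon: int,
) -> List[Tuple[Dict[str, float], int]]:
    """Pair predictors observed in month t with the onset label for month t + horizon."""
    rows = []
    for m in sorted(features):
        target = shift(m, horizon)
        if target not in onsets:
            continue  # label not observed yet, e.g. past the end of the training data
        rows.append((dict(features[m], covid=is_covid(m)), onsets[target]))
    return rows


print(shift((2023, 12), 3), shift((2023, 12), 6))  # (2024, 3) (2024, 6)
```

Under this reading, predictors from December 2023 map to March 2024 at a 3-month horizon and to June 2024 at a 6-month horizon. That matches the months asked about on the 2024 forecast slide.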

Modeling

Objective: detect patterns in historical data that will generalize to future data
- Train a model on historical chunks of data
- Test ability to predict onsets in later months
- Correct month; correct quarter
Measuring performance:
- Difference between prediction and reality
- Improvement over random guessing
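"Train on historical chunks, test on later months" can be sketched as expanding-window splits over month indices. This is one common form of temporal cross-validation and an assumption about the details here. The fold sizes and the `gap` parameter are illustrative, not taken from the deck. The gap matters when labels lie h months ahead of the predictors: without it, the last training rows would carry labels from inside the test period.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    n_months: int, initial_train: int, test_size: int, gap: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) lists of month indices 0..n_months-1 in time order.

    Each fold trains only on months before its test block. `gap` drops the last
    `gap` training months; with rows indexed by predictor month and labels h months
    ahead, gap = h keeps training labels from falling inside the test block.
    """
    start = initial_train
    while start < n_months:
        end = min(start + test_size, n_months)
        yield list(range(0, max(0, start - gap))), list(range(start, end))
        start = end


for train, test in expanding_window_splits(n_months=10, initial_train=6, test_size=2):
    print(f"train months 0-{train[-1]}, test months {test}")
```

Unlike shuffled k-fold cross-validation, every test block lies strictly after its training data. This mirrors how a forecast would actually be used.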

Terminology & Trade-offs

True positive: onsets we predicted and actually did occur
False positive: onsets we predicted but did not occur
Identifying more true positives means tolerating more false positives
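The trade-off comes from choosing a probability threshold above which a country-month is flagged. The sketch below uses made-up toy scores, not model output. Lowering the threshold catches more real onsets but also raises more false alarms.

```python
from typing import List, Tuple

# Made-up toy scores and outcomes for seven country-months (1 = onset occurred).
PROBS: List[float] = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
LABELS: List[int] = [1, 0, 1, 0, 1, 0, 0]


def confusion_at(probs: List[float], labels: List[int], threshold: float) -> Tuple[int, int, int]:
    """Count (true positives, false positives, false negatives) when every
    country-month with probability >= threshold is flagged as a predicted onset."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    return tp, fp, fn


for t in (0.8, 0.5, 0.25):
    tp, fp, fn = confusion_at(PROBS, LABELS, t)
    print(f"threshold {t}: TP={tp} FP={fp} missed={fn}")
# threshold 0.8: TP=1 FP=0 missed=2
# threshold 0.5: TP=2 FP=1 missed=1
# threshold 0.25: TP=3 FP=2 missed=0
```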

Performance 6 months out (quarter)

ROC-AUC = 0.9
AUPRC = 0.57 (random guessing = 0.02)
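The two metrics can be read as follows. ROC-AUC is the share of (onset, non-onset) pairs in which the onset month gets the higher probability. AUPRC, computed here as average precision, averages the precision reached at the rank of each true onset. For random scores its expected value is roughly the onset share, which is why the random baseline is so low.

The pure-Python sketch below uses made-up toy data. In practice a library routine such as scikit-learn's `roc_auc_score` or `average_precision_score` would normally be used instead.

```python
from typing import List


def roc_auc(probs: List[float], labels: List[int]) -> float:
    """Share of (onset, non-onset) pairs in which the onset month gets the higher
    probability; ties count as half. O(n^2), fine for an illustration."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(probs: List[float], labels: List[int]) -> float:
    """Step-wise area under the precision-recall curve: the mean of the precision
    values at the rank of each true onset (ties in scores are not handled)."""
    ranked = sorted(zip(probs, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / sum(labels)


probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # made-up toy scores
labels = [1, 0, 1, 0, 1, 0, 0]
print(roc_auc(probs, labels))                      # 0.75
print(round(average_precision(probs, labels), 3))  # 0.756
print(round(sum(labels) / len(labels), 3))         # 0.429: roughly what random scores would get
```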

Forecasts for 2024

Our training data ends in December 2023; what does our model say about March and June 2024?

Highest probability country-months:
- 3-month: Zimbabwe, Liberia
- 6-month: Bangladesh, Zimbabwe
Bangladesh was the only true onset

What are the early warning signs?

Election activity, state of emergency, and disasters emerge as consistent, high-impact predictors

Takeaways & Questions

Would regular forecasts be useful to DoS?
- We could provide this on an ongoing basis (quarterly forecasts based on recent data)
Decision points
- How should true positives be traded off against false positives?
- Most actionable forecast horizon?
- Temporal precision: specific months or 3-month window?
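The month-versus-window question can be made concrete in two ways. First, a window target is 1 if any onset occurs within the window. Second, monthly probabilities can be combined into a window probability. The combination step below assumes that month-level onsets are independent, which is a simplifying assumption and not something the deck states. The function names are illustrative.

```python
from typing import List


def window_labels(monthly: List[int], width: int = 3) -> List[int]:
    """1 if at least one onset occurs in months t .. t+width-1.
    Windows that would run past the end of the data are dropped."""
    return [int(any(monthly[t:t + width])) for t in range(len(monthly) - width + 1)]


def window_probability(monthly_probs: List[float]) -> float:
    """P(at least one onset in the window), assuming months are independent."""
    p_none = 1.0
    for p in monthly_probs:
        p_none *= 1.0 - p
    return 1.0 - p_none


print(window_labels([0, 0, 1, 0, 0, 0]))  # [1, 1, 1, 0]
print(round(window_probability([0.1, 0.1, 0.1]), 3))  # 0.271
```

A 3-month window gives a forecaster three chances to be "right", so window-level scores will generally look better than month-level scores built from the same model.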

Appendix

Data Processing Pipeline

Machine learning to detect events and locations
Measure amount of activity for each country-month
Normalization to account for fluctuations in volume
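The deck does not say exactly how normalization is done. One simple version, assumed here, divides the number of articles tagged with each event type by the total number of articles scraped for that country-month. A spike in overall scraping volume then does not masquerade as a spike in activity. The input format, the "none" label, and the country code in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def normalized_event_shares(
    articles: Iterable[Tuple[str, str, str]],
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """articles: (country, 'YYYY-MM', event_label) triples; 'none' marks articles
    with no detected event. Returns, per country-month, each event type's share
    of all articles scraped that month."""
    totals: Counter = Counter()
    events: Counter = Counter()
    for country, month, label in articles:
        totals[(country, month)] += 1
        if label != "none":
            events[(country, month, label)] += 1
    shares: Dict[Tuple[str, str], Dict[str, float]] = {}
    for (country, month, label), n in events.items():
        shares.setdefault((country, month), {})[label] = n / totals[(country, month)]
    return shares


articles = [
    ("BGD", "2024-01", "arrest"),
    ("BGD", "2024-01", "none"),
    ("BGD", "2024-01", "arrest"),
    ("BGD", "2024-01", "raid"),
]
print(normalized_event_shares(articles))
# {('BGD', '2024-01'): {'arrest': 0.5, 'raid': 0.25}}
```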

Measuring Civic Space Activity

MLP vs Big Data Media Corpora

International media sources have limited, skewed coverage of events in developing countries
- Event datasets that rely on international language sources will have major biases
Accurate data collection from domestic outlets requires careful human curation
- GDELT, Common Crawl, Internet Archive have limited coverage and include major errors
MLP’s stable composition has advantages for forecasting

Importance of Domestic News

Challenges of Big Data Scraping

Importance of Domestic News

LexisNexis vs MLP

Countries where LN has zero local sources: 6/56

Albania, Belarus, Kosovo, Jamaica, Angola, South Sudan

Comparing LN on other metrics:

Fewer languages: 17 vs 34
Slightly more local sources per country: 5.5 vs 5 (median; excluding MLP’s regional sources)
Shorter, more sporadic coverage over time

Importance of Domestic News

Scraping Case Study

Bangladesh: easiest case for automated scrapers

Massive volume
Good web architecture
2/5 sources are in English

Scraping Domestic Outlets is Tough

GDELT

MLP: 2013 and 2015
GDELT: 2019 forward
GDELT’s best-covered source: 2,100 articles/month, compared with 2,500 articles/month from MLP
Broken links, redirects, duplicate articles, and advertising
Restricts requests to one search every 5 seconds, so scraping even a single source for the full time period can take several days

Scraping Domestic Outlets is Tough

Internet Archive

Took nearly two weeks to collect URLs for a single source covering 2019-2023
Numerous irrelevant, broken, and duplicate links
Less than half of URLs were usable
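Duplicate links from archive sources often differ only cosmetically, so canonicalizing URLs before deduplication removes many of them. The sketch below makes several assumptions. It treats http/https and a leading "www." as equivalent. It drops `utm_*` tracking parameters and trailing slashes. It unwraps Wayback Machine URLs of the form `/web/<timestamp>/<original URL>`. These rules are illustrative choices, not MLP's pipeline. Detecting broken links would additionally require live HTTP requests, which are not shown.

```python
import re
from typing import Iterable, List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Wayback Machine paths look like /web/<digits>[optional modifier]/<original URL>.
WAYBACK = re.compile(r"^/web/\d+[a-z_]*/(.+)$")


def canonical_url(url: str) -> str:
    """Normalize a URL so cosmetic variants of the same article compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host == "web.archive.org":
        m = WAYBACK.match(parts.path)
        if m:  # unwrap the archived original and canonicalize that instead
            inner = m.group(1) + ("?" + parts.query if parts.query else "")
            return canonical_url(inner)
    if host.startswith("www."):
        host = host[4:]
    # Keep real query parameters (sorted), drop utm_* tracking parameters.
    params = sorted((k, v) for k, v in parse_qsl(parts.query) if not k.startswith("utm_"))
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("https", host, path, urlencode(params), ""))


def dedupe(urls: Iterable[str]) -> List[str]:
    """Keep the first URL seen for each canonical form, preserving order."""
    seen = set()
    kept = []
    for u in urls:
        key = canonical_url(u)
        if key not in seen:
            seen.add(key)
            kept.append(u)
    return kept


print(dedupe([
    "http://example.com/a",
    "https://www.example.com/a/",
    "https://web.archive.org/web/20190101000000/https://example.com/a",
    "https://example.com/b",
]))  # ['http://example.com/a', 'https://example.com/b']
```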

Random Audit of 5 MLP Countries

Task

Use algorithm to identify major events
Use GPT to summarize 5 most important events
Human check of location and event classification accuracy

Results

40 events detected from April - June 2024 (300 possible)
Correct country: 34/40
Correct event: 38/40

Measuring Russian Influence Activity

Modeling

Model: LightGBM + Temporal CV
Hyperparameters: Wide grid search for learning rate, proportion of features, depth of trees
Evaluation Metrics:
- ROC-AUC: how well months in the test set are ranked
- AUC-PR: well suited to imbalanced data
- Brier Score: accuracy and calibration of predicted probabilities
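Two pieces of this setup are easy to sketch without LightGBM itself: expanding a hyperparameter grid, and scoring probabilities with the Brier score. The grid values are hypothetical, since the deck names the tuned hyperparameters but not their ranges. The keys match LightGBM's parameter names for learning rate, feature fraction, and maximum tree depth. In the described setup, each configuration would then be fit and scored with temporal cross-validation, which is not shown here.

```python
from itertools import product
from typing import Dict, List


def brier_score(probs: List[float], labels: List[int]) -> float:
    """Mean squared gap between predicted probability and the 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


# Hypothetical search ranges: the deck names the tuned hyperparameters but not their values.
# Keys are LightGBM parameter names (learning rate, share of features per tree, tree depth).
GRID: Dict[str, list] = {
    "learning_rate": [0.01, 0.05, 0.1],
    "feature_fraction": [0.5, 0.8, 1.0],
    "max_depth": [3, 5, 7],
}


def grid_configs(grid: Dict[str, list]) -> List[Dict[str, object]]:
    """Expand a grid into every combination of its values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


print(len(grid_configs(GRID)))  # 27
# A constant forecast at the base rate r scores r * (1 - r); a useful model should beat it.
r = 123 / 8680
print(round(brier_score([r] * 8680, [1] * 123 + [0] * 8557), 4))  # about 0.014
```

The constant-forecast line uses the onset counts from the forecast-target slide. It gives a simple reference point, since a Brier score near 0.014 would mean the model adds little over the base rate.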

Performance 6 months out (quarter)

ROC-AUC 0.87: a randomly chosen onset month receives a higher probability than a randomly chosen non-onset month 87% of the time
AUPRC 0.31: Random guessing is 0.01