Machine Learning for Peace:

Forecasting DoS Travel Advisories

DevLab@Penn

University of Pennsylvania

November 18, 2024

Mahda Soltani, Jeremy Springman, Erik Wibbels

Overview


Introducing Machine Learning for Peace

  1. Data collection and processing
  2. Digital tools

Forecasting DoS Travel Advisories

  1. Objectives and data
  2. Modeling and terminology
  3. Performance

Introducing MLP Infrastructure

MLP Approach


How can data improve decision-making?

  1. Awareness: data on what’s happening very recently
    • Mass scraping online news + ML to track events
    • Interactive data dashboard

  2. Planning: predictive analytics for strategic decisions
    • Forecasting political events
    • Civic Space Early Warning System

High Quality Corpus

Input: Online news

  • 400 news sources
  • 40 languages
  • 100+ million articles

Data quality

  • Focus on high-quality local sources (medium data)
  • Direct, human monitored scraping
  • Much better coverage than big data corpora



Output: Monthly data

  • 60 countries
  • 2012 - last month

Measuring Civic Space: Tunisia

Measuring Chinese Influence

Forecasting Events: Bangladesh

MLP’s Digital Tools

Forecasting High-Level Travel Advisories

Motivation & Takeaways

  • High-level advisories are costly
    • Anticipating volume, location, timing of advisories can improve planning and resource allocation
  • Surprisingly strong performance:
    • Similar to the best conflict forecasting projects
  • Limitations
    • Limited sample of countries and time

Data Collection

Measuring high-level advisory onset

  • Scraped advisory data posted by DoS and archived on web.archive.org (2012-2023)
  • From 2018 onward: levels 3 and 4 are considered “high-level”
  • Before 2018: all advisories are considered “high-level”
  • Identified start month of every high-level advisory (onset)
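
The coding rule above can be sketched in a few lines. This is a minimal illustration, not the project's actual pipeline: the function names, the (month, level) input format, and the convention that level 0 means "no advisory active" are all assumptions made for the example. It also treats a series that begins with a high-level month as an onset, which a real pipeline might handle differently.

```python
from typing import List, Tuple


def is_high_level(year: int, level: int) -> bool:
    """Apply the slide's rule to one country-month.

    level == 0 means no advisory is active (assumed convention).
    Before 2018 any active advisory counts as high-level;
    from 2018 onward only levels 3 and 4 do.
    """
    if level == 0:
        return False
    if year < 2018:
        return True
    return level >= 3


def find_onsets(series: List[Tuple[str, int]]) -> List[str]:
    """Return the months (YYYY-MM) in which a high-level advisory starts.

    `series` holds one country's (month, level) pairs in chronological
    order. An onset is a high-level month that follows a month that
    was not high-level.
    """
    onsets = []
    previous = False
    for month, level in series:
        current = is_high_level(int(month[:4]), level)
        if current and not previous:
            onsets.append(month)
        previous = current
    return onsets


# Hypothetical series for a single country.
example = [
    ("2017-11", 0), ("2017-12", 1), ("2018-01", 2),
    ("2018-02", 3), ("2018-03", 4), ("2018-04", 2),
]
print(find_onsets(example))  # ['2017-12', '2018-02']
```

Only the first month of each high-level spell is flagged, so the two consecutive high-level months in early 2018 produce a single onset.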

Forecast Target

  • Probability of high-level advisory onset for each country-month
    • Onsets are rare (123 in 8,680 country-months)
    • Rare events are difficult to forecast
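
To make the rarity concrete, the base rate can be computed directly from the two counts above. The snippet is a worked arithmetic sketch; the variable names are illustrative.

```python
onsets = 123
country_months = 8680

# Share of country-months in which a high-level advisory begins.
base_rate = onsets / country_months
print(f"{base_rate:.4f}")  # 0.0142
```

A forecaster that ranks country-months at random has expected precision equal to this prevalence, which is why the random-guessing baselines for precision-based metrics sit near 0.01 to 0.02.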

Forecast Predictors

  • Monthly data on domestic political events
    • MLP data on 20 political events (arrests, legal action, raids, etc.)
  • Monthly data on local economic conditions
    • TradingEconomics data on prices, growth, wages, etc.
  • Modeling variables
    • Country-specific features
    • COVID months

Modeling

  • Objective: detect patterns in historical data that will generalize to future data
    • Train a model on historical chunks of data
    • Test ability to predict onsets in later months
    • Scored two ways: onset in the correct month, or in the correct quarter
  • Measuring performance:
    • Difference between prediction and reality
    • Improvement over random guessing
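
The train-then-test logic above can be sketched as an expanding-window split plus a helper that builds "correct month" and "correct quarter" targets. This is an illustration under stated assumptions, not the project's code: the function names, fold sizes, and the exact definition of the forecast window (months t+horizon through t+horizon+width-1) are hypothetical.

```python
from typing import List, Optional, Tuple


def expanding_window_folds(
    months: List[str], min_train: int, test_size: int
) -> List[Tuple[List[str], List[str]]]:
    """Build folds where every training month precedes every test month."""
    folds = []
    start = min_train
    while start + test_size <= len(months):
        folds.append((months[:start], months[start:start + test_size]))
        start += test_size
    return folds


def window_target(
    onset: List[int], horizon: int, width: int
) -> List[Optional[int]]:
    """Label month t with 1 if an onset falls in t+horizon .. t+horizon+width-1.

    width=1 corresponds to "correct month", width=3 to "correct quarter".
    Months whose window runs past the end of the data get None.
    """
    labels: List[Optional[int]] = []
    for t in range(len(onset)):
        lo = t + horizon
        hi = lo + width
        if hi > len(onset):
            labels.append(None)
        else:
            labels.append(int(any(onset[lo:hi])))
    return labels


months = [f"m{i}" for i in range(8)]
folds = expanding_window_folds(months, min_train=4, test_size=2)
print(len(folds))  # 2

onset = [0, 0, 0, 0, 1, 0, 0, 0]
print(window_target(onset, horizon=3, width=1))
# [0, 1, 0, 0, 0, None, None, None]
print(window_target(onset, horizon=3, width=3))
# [1, 1, 0, None, None, None, None, None]
```

The quarter target marks more months as positive than the month target, which makes the task easier at the cost of temporal precision.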

Terminology & Trade-offs

  • True positive: an onset we predicted that did occur
  • False positive: an onset we predicted that did not occur
  • Identifying more true positives means tolerating more false positives
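
The trade-off can be seen by lowering the probability threshold at which a forecast counts as a predicted onset. The probabilities and labels below are made up for illustration and are not model output; `confusion_at` is a hypothetical helper.

```python
from typing import List, Tuple


def confusion_at(
    probs: List[float], labels: List[int], threshold: float
) -> Tuple[int, int]:
    """Count true and false positives when flagging probs >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp, fp


# Toy forecasts: 8 country-months, 3 of which are real onsets.
probs = [0.90, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 0, 1, 0, 0, 1, 0, 0]

for threshold in (0.8, 0.5, 0.15):
    tp, fp = confusion_at(probs, labels, threshold)
    print(threshold, tp, fp)
# 0.8 1 0
# 0.5 2 1
# 0.15 3 3
```

In this toy case, catching all three onsets requires accepting three false alarms, while a strict threshold yields no false alarms but misses two onsets.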

Performance 6 months out (quarter)

  • ROC-AUC = 0.9
  • AUPRC = 0.57 (random guessing = 0.02)

Forecasts for 2024

Our training data ends in December 2023; what does our model say about March and June 2024?

  • Highest probability country-months:
    • 3-month: Zimbabwe, Liberia
    • 6-month: Bangladesh, Zimbabwe
  • Bangladesh was the only true onset

What are the early warning signs?

Election activity, state of emergency, and disasters emerge as consistent, high-impact predictors

Takeaways & Questions

  • Would regular forecasts be useful to DoS?
    • We could provide this on an ongoing basis (quarterly forecasts based on recent data)
  • Decision points
    • Trade-off between true positives and false positives?
    • Most actionable forecast horizon?
    • Temporal precision: specific months or 3-month window?

Appendix

Data Processing Pipeline

  • Machine learning to detect events and locations
  • Measure amount of activity for each country-month
  • Normalization to account for fluctuations in volume
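
One simple way to normalize for volume is to express event coverage as a share of all articles scraped for that country-month. The slide does not specify the formula, so the share-of-articles approach, the function name, and the data layout below are assumptions for illustration only.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (country code, month as YYYY-MM)


def normalized_share(
    event_counts: Dict[Key, int], total_counts: Dict[Key, int]
) -> Dict[Key, Optional[float]]:
    """Divide event-article counts by total article counts per country-month.

    Country-months with no scraped articles get None rather than 0,
    so that missing coverage is not mistaken for calm.
    """
    shares: Dict[Key, Optional[float]] = {}
    for key, total in total_counts.items():
        if total == 0:
            shares[key] = None
        else:
            shares[key] = event_counts.get(key, 0) / total
    return shares


# Hypothetical counts: same number of event articles, doubled volume.
event_counts = {("BGD", "2024-01"): 50, ("BGD", "2024-02"): 50}
total_counts = {("BGD", "2024-01"): 1000, ("BGD", "2024-02"): 2000}
shares = normalized_share(event_counts, total_counts)
print(shares[("BGD", "2024-01")], shares[("BGD", "2024-02")])  # 0.05 0.025
```

The raw count is identical in both months, but the normalized measure halves when overall publishing volume doubles.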

Measuring Civic Space Activity

MLP vs Big Data Media Corpora

  • International media sources have limited, skewed coverage of events in developing countries
    • Event datasets that rely on international language sources will have major biases
  • Accurate data collection from domestic outlets requires careful human curation
    • GDELT, Common Crawl, Internet Archive have limited coverage and include major errors
  • MLP’s stable composition has advantages for forecasting

Importance of Domestic News

Challenges of Big Data Scraping

Importance of Domestic News

Lexis Nexis vs MLP

Countries where LN has zero local sources: 6/56

  • Albania, Belarus, Kosovo, Jamaica, Angola, South Sudan

Comparing LN on other metrics:

  • Fewer languages: 17 vs 34
  • Slightly more local sources per country: 5.5 vs 5 (median; excluding MLP’s regional sources)
  • Shorter, more sporadic coverage over time

Importance of Domestic News

Scraping Case Study

Bangladesh: easiest case for automated scrapers

  • Massive volume
  • Good web architecture
  • 2/5 sources are in English

Scraping Domestic Outlets is Tough

GDELT

  • MLP: 2013 and 2015
  • GDELT: 2019 forward
  • GDELT’s best-covered source: 2,100 articles per month, compared to 2,500 per month from MLP
  • Broken links, redirects, duplicate articles, and advertising
  • Restricts requests to one search every 5 seconds, so that scraping even a single source for the full time-period can take several days
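
The effect of the rate limit is easy to work out. The 5-second interval comes from the slide; the 50,000-search figure is a hypothetical workload chosen only to show the scale.

```python
SECONDS_PER_REQUEST = 5
SECONDS_PER_DAY = 24 * 60 * 60


def days_needed(n_requests: int) -> float:
    """Wall-clock days to issue n_requests at one request every 5 seconds."""
    return n_requests * SECONDS_PER_REQUEST / SECONDS_PER_DAY


requests_per_day = SECONDS_PER_DAY // SECONDS_PER_REQUEST
print(requests_per_day)  # 17280

# Hypothetical workload: 50,000 searches to cover one source.
print(f"{days_needed(50_000):.1f}")  # 2.9
```

At roughly 17,000 searches per day, any source that needs tens of thousands of queries takes multiple days of uninterrupted scraping.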

Scraping Domestic Outlets is Tough

Internet Archive

  • Took nearly two weeks to collect URLs from a single source from 2019-2023
  • Numerous irrelevant, broken, and duplicate links
  • Fewer than half of the URLs were usable

Random Audit of 5 MLP Countries

Task

  • Use algorithm to identify major events
  • Use GPT to summarize 5 most important events
  • Human check of location and event classification accuracy

Results

  • 40 events detected from April to June 2024 (300 possible)
  • Correct country: 34/40 (85%)
  • Correct event: 38/40 (95%)

Measuring Russian Influence Activity

Modeling

  • Model: LightGBM + Temporal CV
  • Hyperparameters: Wide grid search for learning rate, proportion of features, depth of trees
  • Evaluation Metrics:
    • ROC-AUC: ranking months in test set
    • AUPRC: well suited to imbalanced data
    • Brier Score: calibrating probabilities

Performance 6 months out (quarter)

  • ROC-AUC 0.87: a randomly chosen onset month received a higher probability than a randomly chosen non-onset month 87% of the time
  • AUPRC 0.31: random guessing is 0.01