Ukrainian vs Russian Language Use in Ukraine

Author

John Ihor Campagna

I Introduction

Language has long shaped Ukrainian national identity and Ukraine's internal political struggles. Ukrainian was historically repressed under foreign rule, while Russian dominated cities and public life, where it was often seen as a language of prestige and progress. After independence in 1991, overt suppression of Ukrainian ended, but the status of Russian remained deeply contested.

In the 2001 census, 67.5% of Ukraine's population identified Ukrainian as their native language, and 29.6% named Russian. But “native language” often reflected identity more than use. Most Ukrainians are bilingual and move between languages: Ukrainian, Russian, or a hybrid known as Surzhyk. This makes everyday language use difficult to measure, because traditional surveys capture what people claim they speak, not how they act.

This project uses Google Trends to track the relative usage of Ukrainian and Russian across time and region. By comparing the search frequencies of common queries in both languages, I examine patterns in digital behavior as a proxy for language preference in daily life.

The goal is to understand whether Ukrainians are shifting their linguistic habits online, especially in response to key political events such as the 2013–2014 Maidan Revolution and the 2022 Russian invasion. If search behavior reflects cultural or identity shifts, this approach may offer a window into Ukraine’s evolving national consciousness.

II Data and Methodology

This project uses data from Google Trends, accessed via the gtrendsR package in R, to estimate language usage in Ukraine. I compare the relative search interest of equivalent queries that are spelled differently in Ukrainian and Russian, both over time and across regions.

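Google Trends provides normalized search scores (0–100). I compute a Ukrainian-to-Russian ratio to capture relative language preference, where values above 1 indicate more search interest in the Ukrainian spelling and values below 1 indicate more interest in the Russian spelling: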

Ratio = Ukrainian score / Russian score
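
The ratio is undefined when the Russian score is 0, and Trends exports can report very low interest as "<1" rather than a number. The following is a minimal sketch of one way to handle both cases, written in Python purely for illustration (the project itself works in R with gtrendsR). The function names, the choice to treat "<1" as 0.5, and the sample scores are all assumptions, not part of the original analysis.

```python
from typing import List, Optional, Sequence, Union

Score = Union[int, float, str]


def parse_score(raw: Score, below_one: float = 0.5) -> float:
    """Convert a Trends score to a float.

    Assumption: a "<1" entry is replaced by `below_one` (0.5 by default).
    """
    if isinstance(raw, str):
        raw = raw.strip()
        if raw == "<1":
            return below_one
    return float(raw)


def ukr_rus_ratio(ukr: Score, rus: Score) -> Optional[float]:
    """Ratio = Ukrainian score / Russian score.

    Returns None when the Russian score is 0, since the ratio is undefined.
    """
    u = parse_score(ukr)
    r = parse_score(rus)
    if r == 0:
        return None
    return u / r


def ratio_series(ukr_scores: Sequence[Score],
                 rus_scores: Sequence[Score]) -> List[Optional[float]]:
    """Apply the ratio period by period to two aligned score series."""
    if len(ukr_scores) != len(rus_scores):
        raise ValueError("score series must have the same length")
    return [ukr_rus_ratio(u, r) for u, r in zip(ukr_scores, rus_scores)]


# Hypothetical scores for four periods (not real Trends data).
ukr = [20, "<1", 45, 30]
rus = [80, 10, 45, 0]
print(ratio_series(ukr, rus))  # [0.25, 0.05, 1.0, None]
```

Returning None for undefined periods keeps them visible as gaps instead of silently dropping them or substituting an arbitrary large value.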

Data was collected:

  • Nationally from 2010 to 2025 (weekly).

  • Regionally using Ukrainian ISO subregion codes (e.g., UA-30 for Kyiv).

Limitations

  • Relative scores: Google Trends data is scaled within each request, so scores are not comparable across unrelated searches.

  • Opacity: Sampling methods and geographic boundaries are not publicly disclosed.

  • User behavior bias: Search language reflects usage, not identity or fluency.

  • Content availability bias: Historically, more content was available in Russian. This may overstate Russian usage, especially in earlier years or for content-heavy queries.

  • Digital divide: Results reflect internet users, possibly underrepresenting rural or older populations.