Introduction

The question I set out to answer is: what effect did the COVID-19 pandemic have on air travel worldwide? To answer it, I use the dataset “Passenger-kilometers by air”, published by Our World in Data from data gathered by the International Civil Aviation Organization. The base dataset contains 1048 observations and 4 columns. While there are many ways to interpret my research question, I will specifically measure how air travel changed for individual countries. Note that only UN member states are included in this dataset. I also recognize that my question and analysis imply causation. The link between the pandemic and the drop in air travel is already well established, so this report serves merely as an exploratory analysis.

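My analysis will make use of all four columns: Entity (the name of the country or region), Code (its country code), Year, and _9_1_2__is_rdp_pfvol__air_transport (the passenger-kilometers figure).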

Data Analysis

I will start by importing my data and the necessary tools, then clean the data before beginning the analysis. The question maps directly onto this dataset, so little manipulation is needed to answer it. First, I will filter the data down to rows that correspond to countries. Then, using the 10 countries with the most passenger-kilometers, I will plot how their air travel changed over the course of the pandemic (for argument’s sake, 2020-2022). While it would be nice to graph every country’s data, that is not practical here.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo 
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use
flight_data <- read_csv("air-passenger-kilometers.csv")
## Rows: 1048 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Entity, Code
## dbl (2): Year, _9_1_2__is_rdp_pfvol__air_transport
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(flight_data)
## # A tibble: 6 × 4
##   Entity      Code   Year `_9_1_2__is_rdp_pfvol__air_transport`
##   <chr>       <chr> <dbl>                                 <dbl>
## 1 Afghanistan AFG    2017                            1843686016
## 2 Afghanistan AFG    2018                            1198398208
## 3 Afghanistan AFG    2019                            1039593024
## 4 Afghanistan AFG    2020                             504406688
## 5 Afghanistan AFG    2021                             300047712
## 6 Afghanistan AFG    2022                            2173708800
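# Give the measure and entity columns shorter, readable names
flight_data <- flight_data |>
  rename(
    flight_km = "_9_1_2__is_rdp_pfvol__air_transport",
    name = "Entity"
  )
# Lowercase the remaining column names (Code, Year)
names(flight_data) <- tolower(names(flight_data))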

sum(is.na(flight_data$code)) # 96 rows have no country code, so they aren't relevant to the question
## [1] 96
flight_data2 <- flight_data |>
  filter(!is.na(code))

I’m keeping the Entity column (renamed to “name”) so that results are easier to read when displayed later. Now that the data is organized, I will work on a new copy from here on.
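
Before dropping the rows with no code, it can help to see which entities they belong to. The following is a minimal sketch on a made-up stand-in for flight_data (the toy_flights table and its values are assumptions for illustration only, not real figures); the same filter() and count() calls should apply to the real data.

```r
library(dplyr)

# Toy stand-in for flight_data after renaming; all values are invented.
toy_flights <- data.frame(
  name = c("Afghanistan", "Afghanistan", "Europe", "Europe", "World"),
  code = c("AFG", "AFG", NA, NA, "OWID_WRL"),
  year = c(2019, 2020, 2019, 2020, 2019),
  flight_km = c(10, 5, 100, 40, 500)
)

# List each entity that has no code, with the number of rows it contributes.
missing_code <- toy_flights |>
  filter(is.na(code)) |>
  count(name)

missing_code
```

If every name listed this way is a continent or other aggregate, removing those rows loses no country-level information.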

A visual check of the dataset showed that the code ‘OWID_WRL’ (the world aggregate), along with some continent and zone rows, does not refer to a country. These rows might be useful later, so they stay in the earlier copies of the dataset while the analysis continues on a country-only copy.

flight_data_countries <- flight_data2 |>
  filter(!code %in% c("OWID_WRL"))
length(unique(flight_data_countries$code))
## [1] 168
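
The visual check for non-country codes could also be done in code. This is a sketch under the assumption that aggregate codes share the “OWID_” prefix; the toy_codes table and the code “OWID_XYZ” are hypothetical and only there to make the example runnable.

```r
library(dplyr)

# Toy stand-in for the name/code columns; "OWID_XYZ" is a hypothetical code.
toy_codes <- data.frame(
  name = c("Afghanistan", "World", "Example zone"),
  code = c("AFG", "OWID_WRL", "OWID_XYZ")
)

# Keep only the distinct entities whose code starts with "OWID_".
owid_codes <- toy_codes |>
  filter(startsWith(code, "OWID_")) |>
  distinct(code, name)

owid_codes
```

Any code that shows up here besides OWID_WRL would need to be added to the exclusion list.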

There are now 168 unique countries represented in the dataset, too many to graph. Let’s try taking the 10 biggest fliers in 2019, then plotting their flight data over the ensuing years.

top10_2019 <- flight_data_countries |>
  filter(year == 2019) |>
  arrange(desc(flight_km)) |>
  slice_head(n = 10)

top10_years <- flight_data_countries |>
  filter(name %in% top10_2019$name)

ggplot(top10_years, aes(year, flight_km, color = name, group = name)) +
  geom_line(linewidth = 1) +
  # Scale limits drop any values outside this range rather than just clipping the view
  scale_y_continuous(limits = c(4e10, 2e12)) +
  labs(
    title = "Pandemic Air Travel (2019 top fliers)",
    x = "Year",
    y = "Passenger-kilometers",
    color = "Country"
  ) +
  theme(
    plot.background = element_rect(fill = "black"),
    plot.title = element_text(color = "aliceblue"),
    axis.text = element_text(color = "aliceblue"),
    axis.title = element_text(color = "aliceblue"),
    legend.position = "bottom"
  )
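
To put numbers on what the plot shows, one option is to express each year as a percent change from that country’s 2019 value. This is only a sketch: toy_top and its figures are invented for illustration, and it assumes every country has exactly one 2019 row.

```r
library(dplyr)

# Toy stand-in for top10_years; all values are invented.
toy_top <- data.frame(
  name = rep(c("Country A", "Country B"), each = 3),
  year = rep(2019:2021, times = 2),
  flight_km = c(100, 40, 60, 200, 50, 40)
)

# Within each country, compare every year against the 2019 baseline.
change_vs_2019 <- toy_top |>
  group_by(name) |>
  mutate(
    baseline_2019 = flight_km[year == 2019],
    pct_change = 100 * (flight_km - baseline_2019) / baseline_2019
  ) |>
  ungroup()

change_vs_2019
```

Applied to the real top10_years data, a table like this would make it easier to compare the size of the 2020 drop across countries of very different scale.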

Conclusion and Future Directions

I set out to understand the effect the COVID-19 pandemic had on air travel. Judging from the 10 countries with the most passenger-kilometers logged in 2019, the results were not especially surprising. There was a slight upward trend going into 2019, as there usually is in times of economic growth. 2020 saw a massive drop, especially pronounced in the United States. Most countries saw air travel climb again as early as 2021, but China, surprisingly, continued to trend downward. I would be interested to see what happened over the following years, and what factors led to this continued decline.

To answer my original question, the COVID-19 pandemic had a serious negative effect on air travel. This was more than just data on a screen, however: the decrease in air travel led to tens of thousands of jobs lost. There is also an argument to be made that a lack of maintenance or capital on the part of certain members of the aviation industry has something to do with the increased number of fatal accidents. If I had more time and opportunity, I would like to join this dataset with region and population data, then perform a new analysis using per capita figures instead of national totals. I’m especially interested to see where China, India, and the UAE would end up, considering they are each outliers in wealth or population.
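
The per capita idea could look something like the sketch below. Everything here is hypothetical: toy_flights and toy_population use placeholder codes and invented numbers, and a real population table would need matching code and year columns for the join to work.

```r
library(dplyr)

# Toy stand-ins; the codes and values are placeholders, not real data.
toy_flights <- data.frame(
  code = c("AAA", "BBB"),
  year = c(2019, 2019),
  flight_km = c(1000, 300)
)
toy_population <- data.frame(
  code = c("AAA", "BBB"),
  year = c(2019, 2019),
  population = c(100, 10)
)

# Join on country code and year, then divide totals by population.
per_capita <- toy_flights |>
  left_join(toy_population, by = c("code", "year")) |>
  mutate(flight_km_per_capita = flight_km / population) |>
  arrange(desc(flight_km_per_capita))

per_capita
```

In this toy example the country with the smaller total ranks first once population is taken into account, which is the kind of reordering I would expect to see for small, wealthy countries.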

References

“Data Page: Passenger-kilometers by air”. Our World in Data (2025). Data adapted from International Civil Aviation Organization, International Transport Forum and United Nations Conference on Trade and Development. Retrieved from https://archive.ourworldindata.org/20250909-093708/grapher/air-passenger-kilometers.html [online resource] (archived on September 9, 2025).