The question I have decided to answer is: What effect did the COVID-19 pandemic have on air travel worldwide? To answer this question, I found the dataset “Passenger-kilometers by air”, published by Our World in Data using data gathered from the International Civil Aviation Organization. The base dataset contains 1048 observations and 4 columns. While there are many ways to interpret my research question, I am specifically going to measure how air travel changed for individual countries. It should be noted that only UN member states are included in this dataset. I also recognize that my question and analysis imply causation. The pandemic's effect on air travel is already well documented, so this report serves only as an exploratory analysis.
My analysis will make use of the following columns:
Code
Refers to the three-letter code of a nation (Our World in Data uses ISO 3166-1 alpha-3 codes for countries). I am using this column instead of the Entity column because rows that do not have a country code are not applicable to my research question (e.g. regions, UN development zones).
Year
A value between 2017 and 2022 (inclusive)
_9_1_2__is_rdp_pfvol__air_transport
The number of passengers transported by air, multiplied by the number of kilometers each person traveled; measured as passenger-kilometers
I will start by importing my data and necessary tools, then clean the data before I begin my analysis. My question is fairly straightforward with respect to this dataset, so I don’t have to do much to answer it. First, I will filter my data to keep only rows that correspond to countries. Then, using the 10 countries with the most passenger-kilometers, I will plot how their air travel changed over the course of the pandemic (for argument’s sake, 2020-2022). While it would be nice to graph every single country’s data, that is not practical here.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use
flight_data <- read_csv("air-passenger-kilometers.csv")
## Rows: 1048 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Entity, Code
## dbl (2): Year, _9_1_2__is_rdp_pfvol__air_transport
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(flight_data)
## # A tibble: 6 × 4
## Entity Code Year `_9_1_2__is_rdp_pfvol__air_transport`
## <chr> <chr> <dbl> <dbl>
## 1 Afghanistan AFG 2017 1843686016
## 2 Afghanistan AFG 2018 1198398208
## 3 Afghanistan AFG 2019 1039593024
## 4 Afghanistan AFG 2020 504406688
## 5 Afghanistan AFG 2021 300047712
## 6 Afghanistan AFG 2022 2173708800
flight_data <- flight_data |>
  rename(
    flight_km = "_9_1_2__is_rdp_pfvol__air_transport",
    name = "Entity"
  )
names(flight_data) <- tolower(names(flight_data))
sum(is.na(flight_data$code)) ## Tells us there are 96 rows with no country code, so they aren't relevant to the question
## [1] 96
flight_data2 <- flight_data |>
  filter(!is.na(code))
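Before dropping the rows with no code, it may be worth confirming which entities they belong to. The following is a minimal sketch on a toy data frame, not the real dataset: the Afghanistan values come from the `head()` output above, while the second entity and the names `toy` and `dropped_names` are hypothetical placeholders.

```r
library(dplyr)

# Toy stand-in for flight_data after renaming.
# "Example region (hypothetical)" is a made-up entity used only for illustration.
toy <- data.frame(
  name = c("Afghanistan", "Afghanistan", "Example region (hypothetical)"),
  code = c("AFG", "AFG", NA),
  year = c(2019, 2020, 2019),
  flight_km = c(1039593024, 504406688, NA)
)

# List the entities that the is.na(code) filter would remove
dropped_names <- toy |>
  filter(is.na(code)) |>
  distinct(name) |>
  pull(name)

dropped_names
```

Running the same pipeline on `flight_data` would show whether the 96 rows are all regional aggregates, as assumed here.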
I’m leaving the Entity (renamed to “name”) column in place for clarity when information is displayed later. Now that the data is organized, we will move to a new copy and begin manipulation.
A visual check of the dataset showed that some codes, such as ‘OWID_WRL’ (plus some extra continents/zones), do not refer to a country. They might be useful later, so we keep them in flight_data2 and exclude the world aggregate only from the working copy.
flight_data_countries <- flight_data2 |>
  filter(!code %in% c("OWID_WRL"))
length(unique(flight_data_countries$code))
## [1] 168
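If other aggregate codes share the same prefix as ‘OWID_WRL’, one possible way to separate them is to match on the prefix rather than listing each code. This is only a sketch under that assumption: the toy data frame, the code "OWID_XYZ", and the names `aggregate_rows` and `country_rows` are hypothetical, and the matched rows should be inspected before anything is dropped, since a prefixed code could also belong to a territory worth keeping.

```r
library(dplyr)

# Toy stand-in for flight_data2; "OWID_XYZ" is a made-up code for illustration.
toy_codes <- data.frame(
  name = c("Afghanistan", "World", "Example zone (hypothetical)"),
  code = c("AFG", "OWID_WRL", "OWID_XYZ")
)

# Rows whose code starts with the "OWID_" prefix (inspect these first)
aggregate_rows <- toy_codes |>
  filter(startsWith(code, "OWID_"))

# Everything else is treated as a country
country_rows <- toy_codes |>
  filter(!startsWith(code, "OWID_"))

aggregate_rows
country_rows
```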
There are now 168 unique countries represented in the dataset, too many to graph. Let’s try taking the 10 biggest fliers in 2019, then plotting their flight data over the ensuing years.
top10_2019 <- flight_data_countries |>
  filter(year == 2019) |>
  arrange(desc(flight_km)) |>
  slice_head(n = 10)
top10_years <- flight_data_countries |>
  filter(name %in% top10_2019$name)
ggplot(top10_years, aes(year, flight_km, color = name, group = name)) +
  geom_line(linewidth = 1) +
  # Note: values outside these limits are dropped before plotting, not just hidden
  scale_y_continuous(limits = c(4e10, 2e12)) +
  labs(
    title = "Pandemic Air Travel (2019 top fliers)",
    x = "Year",
    y = "Passenger-kilometers",
    color = "Country"
  ) +
  theme(
    plot.background = element_rect(fill = "black"),
    plot.title = element_text(color = "aliceblue"),
    axis.text = element_text(color = "aliceblue"),
    axis.title = element_text(color = "aliceblue"),
    legend.position = "bottom"
  )
I set out to understand the effect the COVID-19 pandemic had on air travel. Judging by the 10 countries with the most passenger-kilometers logged in 2019, the results were not very surprising. There was a slight upward trend going into 2019, as there usually is in times of economic growth. 2020 saw a massive drop, especially pronounced in the United States. Most countries saw air travel climb again as early as 2021, but China, surprisingly, continued to trend downward. I would be interested to see what happened over the next few years, and what factors led to this continued decline in flight.
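To put a number on a drop like this, each year can be expressed relative to 2019. The sketch below does this for the Afghanistan rows shown in the `head()` output earlier, since those are the only raw values printed in this report. The column names `index_2019` and `pct_vs_2019` are my own, and the same `group_by()` step should carry over to the top-10 table.

```r
library(dplyr)

# Afghanistan values copied from the head(flight_data) output above
afg <- data.frame(
  name = "Afghanistan",
  year = 2017:2022,
  flight_km = c(1843686016, 1198398208, 1039593024,
                504406688, 300047712, 2173708800)
)

# Index each year to 2019 = 100 within each country,
# then express it as a percent change from 2019
indexed <- afg |>
  group_by(name) |>
  mutate(
    index_2019 = 100 * flight_km / flight_km[year == 2019],
    pct_vs_2019 = index_2019 - 100
  ) |>
  ungroup()

indexed |>
  select(year, index_2019, pct_vs_2019)
```

Indexing puts every country on the same scale, so a large market and a small one can be compared by how much they fell rather than by their absolute totals.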
To answer my original question, the COVID-19 pandemic had a serious negative effect on air travel. This was more than just data on a screen, however: the decrease in air travel led to tens of thousands of jobs lost. There is also an argument to be made that a lack of maintenance or capital on the part of certain members of the aviation industry has something to do with the increased number of fatal accidents, although this dataset cannot test that claim. If I had more time and opportunity, I would like to join this dataset with some region and population data, then perform some new analysis using per capita figures instead of national totals. I’m especially interested to see where China, India, and the UAE would end up, considering they are each outliers in wealth or population.
“Data Page: Passenger-kilometers by air”. Our World in Data (2025). Data adapted from International Civil Aviation Organization, International Transport Forum and United Nations Conference on Trade and Development. Retrieved from https://archive.ourworldindata.org/20250909-093708/grapher/air-passenger-kilometers.html [online resource] (archived on September 9, 2025).