The COVID-19 pandemic dramatically disrupted air travel around the world. This project analyzes how domestic and international passenger traffic at San Francisco International Airport (SFO) changed between 2018 and 2024.
The analysis uses the open SFO Air Traffic Passenger Statistics dataset published on data.gov. The dataset includes monthly passenger counts, airline information, geographic summaries (domestic vs. international), and airport codes.
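This project focuses on three columns: Activity.Period (the reporting month, stored as a YYYYMM code), GEO.Summary (domestic or international), and Passenger.Count (the number of passengers).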
The analysis aims to show how the COVID-19 pandemic affected both domestic and international air traffic at SFO. I will start by cleaning the dataset and deriving year and month variables from the activity period so that time-based analysis is possible. Then, I will use summary statistics to explore and refine the dataset. After preparing the data, I will aggregate passenger counts by year and by traffic type (domestic or international) to show the general travel trends. Finally, I will create a line plot comparing passenger traffic over time, which shows how travel volumes dropped during 2020 and gradually recovered afterward.
# Load dataset
sfo <- read.csv("Air_Traffic_Passenger_Statistics.csv")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Cleaning data:
# extracting year and month from Activity.Period
sfo <- sfo %>%
  mutate(
    Year = as.numeric(substr(Activity.Period, 1, 4)),
    Month = as.numeric(substr(Activity.Period, 5, 6))
  )
# Summary statistics of passenger counts for EDA
summary(sfo$Passenger.Count)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 4359 8600 27831 19657 856501
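The Year and Month extraction above relies on Activity.Period being a six-digit YYYYMM code. As a minimal sketch of that assumption, the snippet below parses a few made-up period codes (not values read from the file) and also builds a first-of-month date, which may be convenient for monthly plots. The helper name `split_period()` is hypothetical.

```r
# Hypothetical helper: split YYYYMM period codes into year, month, and a Date.
# Assumes every code has exactly six digits (e.g. 202004 for April 2020).
split_period <- function(period) {
  period_chr <- as.character(period)
  stopifnot(all(nchar(period_chr) == 6))

  year  <- as.numeric(substr(period_chr, 1, 4))
  month <- as.numeric(substr(period_chr, 5, 6))
  stopifnot(all(month >= 1 & month <= 12))

  data.frame(
    Year  = year,
    Month = month,
    Date  = as.Date(sprintf("%04d-%02d-01", year, month))
  )
}

# Illustrative period codes only
periods <- c(199907L, 202004L, 202312L)
parsed <- split_period(periods)
print(parsed)
```

The two `stopifnot()` checks fail early if a code has the wrong length or an impossible month, which would otherwise silently produce wrong years.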
# keeping only the relevant columns and categories
sfo_filtered <- sfo %>%
  select(Year, Month, GEO.Summary, Passenger.Count) %>%
  filter(GEO.Summary %in% c("Domestic", "International"))
head(sfo_filtered)
## Year Month GEO.Summary Passenger.Count
## 1 1999 7 Domestic 31432
## 2 1999 7 Domestic 31353
## 3 1999 7 Domestic 2518
## 4 1999 7 International 1324
## 5 1999 7 International 1198
## 6 1999 7 International 24124
# Total passengers per year for each traffic type
sfo_summary <- sfo_filtered %>%
  group_by(Year, GEO.Summary) %>%
  summarise(Total_Passengers = sum(Passenger.Count, na.rm = TRUE))
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.
sfo_summary
## # A tibble: 54 × 3
## # Groups: Year [27]
## Year GEO.Summary Total_Passengers
## <dbl> <chr> <int>
## 1 1999 Domestic 17180570
## 2 1999 International 3802062
## 3 2000 Domestic 32469208
## 4 2000 International 8222233
## 5 2001 Domestic 26665077
## 6 2001 International 7622247
## 7 2002 Domestic 23699764
## 8 2002 International 7391570
## 9 2003 Domestic 22260057
## 10 2003 International 6789671
## # ℹ 44 more rows
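The yearly totals above may mix complete and partial years: the first records shown earlier start in July 1999, and the 1999 totals are roughly half of the 2000 totals. One way to check this before comparing years is to count how many distinct months each year reports. The sketch below uses made-up toy records, not the SFO data, and the column names `Months_Reported` and `Complete` are illustrative.

```r
library(dplyr)

# Toy monthly records (made-up counts): 2019 has 12 months, 2025 only 3
toy <- data.frame(
  Year  = c(rep(2019, 12), rep(2025, 3)),
  Month = c(1:12, 1:3),
  Passenger.Count = 100
)

# Count the distinct months reported in each year and flag incomplete years
coverage <- toy %>%
  group_by(Year) %>%
  summarise(Months_Reported = n_distinct(Month), .groups = "drop") %>%
  mutate(Complete = Months_Reported == 12)

print(coverage)
```

Applied to `sfo_filtered`, the same pipeline would indicate which years can be compared fairly on annual totals.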
# Line plot of yearly totals, restricted to the 2018–2024 study period
library(ggplot2)
sfo_summary %>%
  filter(Year >= 2018, Year <= 2024) %>%
  ggplot(aes(x = Year, y = Total_Passengers, color = GEO.Summary)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  scale_x_continuous(breaks = 2018:2024) +
  labs(
    title = "SFO Passenger Traffic (2018–2024)",
    x = "Year",
    y = "Total Passengers",
    color = "Traffic Type"
  ) +
  theme_minimal()
The analysis highlights the impact of the COVID-19 pandemic, which caused a dramatic decline in both domestic and international passenger traffic at SFO. In 2020, total passenger volumes dropped to a fraction of their pre-pandemic levels, reflecting global travel bans and public health restrictions. Domestic air travel began to recover sooner, showing noticeable growth in 2021 and 2022 as restrictions eased within the United States. International passenger counts, however, remained lower for a longer period, likely due to extended border closures and a slower global vaccine rollout.
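One way to put numbers on the drop and recovery described above is to express each year's total as a percentage of a pre-pandemic baseline year. The sketch below assumes 2019 as the baseline and uses made-up totals in the same shape as `sfo_summary`; the figures are not real SFO values, and `percent_of_baseline()` is a hypothetical helper.

```r
library(dplyr)

# Made-up yearly totals shaped like sfo_summary (not real SFO figures)
toy_summary <- data.frame(
  Year = rep(2019:2021, each = 2),
  GEO.Summary = rep(c("Domestic", "International"), times = 3),
  Total_Passengers = c(1000, 400, 300, 80, 600, 100)
)

# Hypothetical helper: each year's total as a percent of the baseline year
percent_of_baseline <- function(summary_df, baseline_year = 2019) {
  # Drop any grouping left over from summarise() so select() keeps only
  # the columns named below
  summary_df <- ungroup(summary_df)

  baseline <- summary_df %>%
    filter(Year == baseline_year) %>%
    select(GEO.Summary, Baseline = Total_Passengers)

  summary_df %>%
    left_join(baseline, by = "GEO.Summary") %>%
    mutate(Pct_of_Baseline = round(100 * Total_Passengers / Baseline, 1)) %>%
    arrange(GEO.Summary, Year)
}

recovery <- percent_of_baseline(toy_summary)
print(recovery)
```

With the toy numbers, domestic traffic sits at 30% of baseline in 2020 and 60% in 2021, while international traffic sits at 20% and 25%. Calling `percent_of_baseline(sfo_summary)` would give the corresponding figures for the real data.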
These findings emphasize the unequal pace of recovery between domestic and international air travel, and the stricter policy restrictions that applied to international travel. They also underscore the broader challenges the aviation industry faced throughout the pandemic. Understanding these patterns can help airports and policymakers prepare for future crises with better flight management and recovery strategies.
For future research, it would be useful to track how passenger traffic varied by airline and by geographic region to identify which routes were most resilient. Incorporating additional data, such as flight cancellations or ticket prices, could also provide a broader overview of how the pandemic reshaped air transportation patterns, not only at SFO but globally as well.
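As a rough sketch of the airline-level follow-up suggested above, the snippet below totals passengers by airline and year and computes the share of 2019 traffic each airline retained. It assumes the dataset exposes an `Operating.Airline` column after `read.csv()`; that column name, the airline labels, and all counts here are illustrative assumptions.

```r
library(dplyr)

# Made-up records; Operating.Airline is an assumed column name
toy_airlines <- data.frame(
  Year = c(2019, 2019, 2021, 2021),
  Operating.Airline = c("Airline A", "Airline B", "Airline A", "Airline B"),
  Passenger.Count = c(500, 200, 400, 50)
)

by_airline <- toy_airlines %>%
  group_by(Operating.Airline, Year) %>%
  summarise(Total_Passengers = sum(Passenger.Count, na.rm = TRUE),
            .groups = "drop") %>%
  group_by(Operating.Airline) %>%
  # Share of each airline's own 2019 total (requires a 2019 row per airline)
  mutate(Pct_of_2019 = 100 * Total_Passengers / Total_Passengers[Year == 2019]) %>%
  ungroup()

print(by_airline)
```

Grouping by a region column instead of the airline column would give the analogous regional comparison.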
City and County of San Francisco. (n.d.). Air Traffic Passenger Statistics. Data.gov. https://catalog.data.gov/dataset/air-traffic-passenger-statistics
U.S. Department of Transportation. (n.d.). COVID-19 and the Airline Industry. Bureau of Transportation Statistics. https://www.bts.gov/covid-19