Introduction

How did the COVID-19 pandemic affect domestic and international air passenger traffic at San Francisco International Airport (SFO)?

The COVID-19 pandemic has dramatically disrupted air travel around the world. This project analyzes how domestic and international passenger traffic at San Francisco International Airport (SFO) changed between 2018 and 2024.

The analysis uses the open SFO Air Traffic Passenger Statistics dataset from data.gov, which provides monthly passenger counts, airline information, geographic summaries (domestic vs. international), and airport codes.

For this project, we focus on the following columns:

  • Activity Period: indicates year and month of activity (e.g., 202003 for March 2020)
  • GEO Summary: identifies whether flights were domestic or international
  • Passenger Count: number of passengers per airline, per month
  • Year and Month: numeric year and month derived from the Activity Period
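
As a quick illustration of the Activity Period encoding, the year and month can also be recovered with integer arithmetic instead of substr(). This is a minimal base R sketch; the vector `activity_period` holds made-up example values, not rows from the dataset.

```r
# Hypothetical Activity Period values in YYYYMM form (not taken from the dataset)
activity_period <- c(201912, 202003, 202107)

# Integer division by 100 drops the last two digits, leaving the year
year <- activity_period %/% 100

# The remainder after dividing by 100 is the month number
month <- activity_period %% 100

data.frame(activity_period, year, month)
```

Either approach should give the same Year and Month columns as long as every Activity Period value has exactly six digits.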

Data Analysis

The data analysis focuses on how the COVID-19 pandemic affected both domestic and international air traffic at San Francisco International Airport (SFO). We start by cleaning the dataset and creating new variables for year and month to make time-based analysis possible. Next, we use exploratory data analysis (EDA) functions to inspect and refine the dataset. After preparing the data, we total passenger counts by year and by traffic type (domestic or international) to show the general travel trends. Finally, we create a line plot to compare passenger traffic over time, allowing us to see how travel volumes dropped during 2020 and gradually recovered afterward.

# Load dataset
sfo <- read.csv("Air_Traffic_Passenger_Statistics.csv")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Cleaning data: 
# extracting year and month from Activity.Period
sfo <- sfo %>%
  mutate(
    Year = as.numeric(substr(Activity.Period, 1, 4)),
    Month = as.numeric(substr(Activity.Period, 5, 6))
  )

# Summary statistics of passenger counts for EDA
summary(sfo$Passenger.Count)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    4359    8600   27831   19657  856501
# keeping only the relevant columns and categories
sfo_filtered <- sfo %>%
  select(Year, Month, GEO.Summary, Passenger.Count) %>%
  filter(GEO.Summary %in% c("Domestic", "International"))

head(sfo_filtered)
##   Year Month   GEO.Summary Passenger.Count
## 1 1999     7      Domestic           31432
## 2 1999     7      Domestic           31353
## 3 1999     7      Domestic            2518
## 4 1999     7 International            1324
## 5 1999     7 International            1198
## 6 1999     7 International           24124
# Totals passengers by year and traffic type (domestic vs. international)
sfo_summary <- sfo_filtered %>%
  group_by(Year, GEO.Summary) %>%
  summarise(Total_Passengers = sum(Passenger.Count, na.rm = TRUE))
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.
sfo_summary
## # A tibble: 54 × 3
## # Groups:   Year [27]
##     Year GEO.Summary   Total_Passengers
##    <dbl> <chr>                    <int>
##  1  1999 Domestic              17180570
##  2  1999 International          3802062
##  3  2000 Domestic              32469208
##  4  2000 International          8222233
##  5  2001 Domestic              26665077
##  6  2001 International          7622247
##  7  2002 Domestic              23699764
##  8  2002 International          7391570
##  9  2003 Domestic              22260057
## 10  2003 International          6789671
## # ℹ 44 more rows
# Creates a line plot showing yearly trends for the 2018–2024 study period
library(ggplot2)

sfo_summary %>%
  # restrict the plot to the years named in the title
  filter(Year >= 2018, Year <= 2024) %>%
  ggplot(aes(x = Year, y = Total_Passengers, color = GEO.Summary)) +
  # linewidth replaces the deprecated size aesthetic for lines (ggplot2 >= 3.4.0)
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  labs(
    title = "SFO Passenger Traffic (2018–2024)",
    x = "Year",
    y = "Total Passengers",
    color = "Traffic Type"
  ) +
  theme_minimal()
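
One caveat when reading yearly totals: the first rows shown by head() are from July 1999, and the 1999 totals are roughly half those of 2000, which suggests that some years in the file cover fewer than 12 months. The sketch below shows one way to flag such partial years before comparing annual totals. It uses base R and a small made-up data frame (`records`) rather than the real file, so the years and months in it are placeholders only.

```r
# Hypothetical monthly records: 2019 is complete, 2025 has only three months
records <- data.frame(
  Year  = c(rep(2019, 12), rep(2025, 3)),
  Month = c(1:12, 1:3)
)

# Count the distinct months observed in each year
months_per_year <- tapply(records$Month, records$Year, function(m) length(unique(m)))

# Years with fewer than 12 months would understate the annual total
partial_years <- as.numeric(names(months_per_year)[months_per_year < 12])
partial_years
```

Applied to the real data, the same check could be run on the Year and Month columns of sfo_filtered, and any flagged year could be excluded or annotated in the plot.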

Conclusion

The analysis shows that the COVID-19 pandemic caused a dramatic decline in both domestic and international passenger traffic at San Francisco International Airport (SFO). In 2020, total passenger volumes dropped to a fraction of pre-pandemic levels, reflecting global travel bans and public health restrictions. Domestic air travel began to recover sooner, showing noticeable growth in 2021 and 2022 as restrictions eased within the United States. International passenger counts, however, remained lower for a longer period, likely due to extended border closures and slower global vaccination.
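
To make a statement such as "a fraction of pre-pandemic levels" concrete, each year's total can be expressed as a percentage of a baseline year. The following is a minimal base R sketch that assumes 2019 as the baseline. The data frame `totals` contains placeholder values chosen for easy arithmetic, not results from the SFO data; its column names mirror sfo_summary.

```r
# Hypothetical yearly totals (placeholder values, not results from the SFO data)
totals <- data.frame(
  Year = rep(2019:2021, times = 2),
  GEO.Summary = rep(c("Domestic", "International"), each = 3),
  Total_Passengers = c(1000, 300, 600, 400, 80, 120)
)

# Take each traffic type's 2019 total as the assumed pre-pandemic baseline
baseline <- totals[totals$Year == 2019, c("GEO.Summary", "Total_Passengers")]
names(baseline)[2] <- "Baseline_2019"

# Attach the baseline to every row and express each year as a percentage of it
indexed <- merge(totals, baseline, by = "GEO.Summary")
indexed$Pct_of_2019 <- round(100 * indexed$Total_Passengers / indexed$Baseline_2019, 1)

indexed[order(indexed$GEO.Summary, indexed$Year), ]
```

Indexing both series to the same baseline puts domestic and international traffic on a common scale, which makes differences in the pace of recovery easier to compare than raw passenger totals.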

These findings emphasize the unequal pace of recovery between domestic and international air travel, and the stricter policy restrictions that applied to international travel. They also underscore the broader challenges faced by the aviation industry throughout the pandemic. Understanding these patterns can help airports and policymakers prepare for future crises with better flight management and recovery strategies.

For future research, it would be useful to track how passenger traffic varied by airline and by geographic region to identify which routes were more resilient. Incorporating additional data, such as flight cancellations or ticket prices, could also provide a broader view of how the pandemic reshaped air transportation patterns, not only at SFO but globally.

References:

City and County of San Francisco. (n.d.). Air Traffic Passenger Statistics. Data.gov. https://catalog.data.gov/dataset/air-traffic-passenger-statistics

U.S. Department of Transportation. (n.d.). COVID-19 and the Airline Industry. Bureau of Transportation Statistics. https://www.bts.gov/covid-19