Introduction

This report analyzes U.S. international and domestic air passenger traffic data from 2018 through October 2020. The dataset shows flight and passenger records across regional, major, and international airlines that passed through TSA checkpoints. The goal of this analysis is to use data visualizations to uncover trends in air travel and show the impact that the COVID-19 pandemic had on the aviation industry beginning in early 2020.

Dataset

The data, secured from https://www.kaggle.com/datasets/dkungu/tsa-passengers, used in this analysis comes from three separate CSV files covering the years 2018, 2019, and 2020. It should be noted that the dataset provided for 2020 only goes up to October of the 2020 calendar year. Each file contains records of passenger counts, origin countries, flight months, and airline information. The three files were combined into a single dataset for analysis. All charts exclude domestic U.S. origin travel in order to focus on the overall impact that the pandemic had on closing the U.S. borders. By keeping domestic U.S. flights and their TSA Passenger counts, there would be risk of intermingling additional flights where someone flies in internationally than immediately hops on another plane to get to a certain location.

Findings

The following five charts explore different dimensions of the data, from total flight activity and busiest origin countries to the seasonal patterns of passenger travel and the year over year impact of COVID-19.

Total Flights Through TSA

This scatter plot visualizes the total number of flights recorded each month across 2018, 2019, and 2020. Each bubble represents a specific month-year combination, with the bubble’s size and color reflecting the volume of flights at that time. The chart shows consistent levels of total flights in the U.S. during 2018 and 2019, followed by a decline in flight activity beginning in April 2020, as the COVID-19 pandemic forced travel shutdowns in the U.S. and abroad. The darker, smaller bubbles in the lower half of the 2020 show the decline becoming extremely apparent out of nowhere.

import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'C:/Users/rober/miniconda3/Library/plugins/platforms'

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import warnings

warnings.filterwarnings("ignore")

path = "C:/Users/rober/OneDrive/Desktop/Python/"

files = ['total-passengers-2018.csv',
         'total-passengers-2019.csv',
         'total-passengers-2020.csv']

df_combined = pd.concat([pd.read_csv(path + f, usecols=['year', 'month', 'passengers', 'origin_country']) for f in files], ignore_index=True)

x = df_combined.groupby(['year', 'month'])['year'].count().reset_index(name='total flight occurances')
x = pd.DataFrame(x)
x['total flight occurances_hundreds'] = round(x['total flight occurances']/100, 0)

plt.figure(figsize=(14,6))

plt.scatter(x['month'], x['year'], marker='8', cmap='viridis',
            c=x['total flight occurances_hundreds'], s=x['total flight occurances_hundreds'], edgecolors='black',
            vmin=x['total flight occurances_hundreds'].min(),
            vmax=x['total flight occurances_hundreds'].max())

plt.title('Total Flights By Regional and Major Airlines That Had Passengers Go Through TSA', fontsize=18)
plt.xlabel('Months of the Year', fontsize=14)
plt.ylabel('Year', fontsize=14)
plt.xticks(range(1, 13), ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
plt.yticks([2018, 2019, 2020])
cbar = plt.colorbar()
cbar.set_label('Total Flight Occurances', rotation=270, fontsize=14, color='black', labelpad=30)

my_colorbar_ticks = [*range(20, int(x['total flight occurances_hundreds'].max()), 20)]
cbar.set_ticks(my_colorbar_ticks)

my_colorbar_tick_labels = [*range(2000, int(x['total flight occurances'].max()), 2000)]
my_colorbar_tick_labels = ['{:,}'.format(each) for each in my_colorbar_tick_labels]
cbar.set_ticklabels(my_colorbar_tick_labels)

plt.text(11, 2017.7, '*2020 data only through October', fontsize=10, color='gray')
plt.show()

Top 10 Busiest Countries

This bar chart breaks down which international origin countries contributed the most passengers traveling into the United States between 2018 and October 2020. Taking the United States out of consideration allows for a better picture of which countries we see the largest proportion of travel coming in from. This is not to say that there are not connecting flights that lead to these nations, but there is great information provided here showing what countries we engage the most with. The color-coding system identifies which countries perform above average, near average, and below average in passenger volume, providing a quick visual reference for understanding where the bulk of international air traffic originates and which countries are outliers in either direction.

def pick_colors_according_to_mean_count(this_data):
    colors=[]
    avg = this_data.passengers.mean()
    for each in this_data.passengers:
        if each > avg*1.01:
            colors.append('lightcoral')
        elif each < avg*0.99:
            colors.append('green')
        else:
            colors.append('black')
    return colors

x = df_combined.groupby('origin_country')['passengers'].sum().reset_index()
x = pd.DataFrame(x)

x_no_us = x[x['origin_country'] != 'US'].sort_values('passengers', ascending=False).reset_index(drop=True)
bottom1 = 1
top1 = 157
d1 = x_no_us.loc[bottom1:top1]
my_colors1 = pick_colors_according_to_mean_count(d1)

bottom2 = 1
top2 = 10
d2 = x_no_us.loc[bottom2:top2]
my_colors2 = pick_colors_according_to_mean_count(d2)

Above = mpatches.Patch(color='lightcoral', label='Above Average')
At = mpatches.Patch(color='black', label='Within 1% of the Average')
Below = mpatches.Patch(color='green', label='Below Average')

fig = plt.figure(figsize=(40,32))
fig.suptitle('Frequency of Passengers Traveling Analysis by International Origin Country:\n Top ' + str(top1) + ' and Top ' + str(top2) + ' From 2018- Oct. 2020', fontsize=18, fontweight='bold')

ax1 = fig.add_subplot(2, 1, 1)
ax1.bar(d1.origin_country, d1.passengers, label='Passengers', color=my_colors1)
#ax1.legend(fontsize=14)
ax1.legend(handles=[Above, At, Below], fontsize=14)
plt.axhline(d1.passengers.mean(), color='black', linestyle='dashed')
ax1.spines['right'].set_visible(False)
ax1.spines['top'].set_visible(False)
ax1.axes.xaxis.set_visible(False)
ax1.set_title('Top ' + str(top1) + ' Countries', size=20)
ax1.text(top1-10, d1.passengers.mean()+5, 'Mean = ' + str(round(d1.passengers.mean())), rotation=0, fontsize=14)

ax2 = fig.add_subplot(2, 1, 2)
ax2.bar(d2.origin_country, d2.passengers, label='Passengers', color=my_colors2)
#ax2.legend(fontsize=14)
ax2.legend(handles=[Above, At, Below], fontsize=14)
plt.axhline(d2.passengers.mean(), color='black', linestyle='solid')
ax2.spines['right'].set_visible(False)
ax2.spines['top'].set_visible(False)
#ax2.axes.xaxis.set_visible(False)
ax2.set_title('Top ' + str(top2) + ' Countries', size=20)
ax2.text(top1-1, d2.passengers.mean()+5, 'Mean = ' + str(round(d1.passengers.mean())), rotation=0, fontsize=14)

plt.show()

COVID-19 Impact (Waterfall)

This waterfall diagram tells the most powerful story of the three years of data by showing the year over year change in total passengers. Starting from the 2018 baseline of over one billion passengers, the chart shows a modest green increase into 2019, followed by a massive red decline in 2020 due to the COVID-19 pandemic. The grand total bar on the right shows the combined passenger count across all three years. The visual contrast between the green and red bars makes it immediately clear just how severe the impact of COVID-19 was on global air travel.

#Data Preparation for Waterfall Diagram
wf_df = df_combined.groupby('year')['passengers'].sum().reset_index()
#Individual Variable Creation For Total Passenger Count for Each Year
total_2018 = wf_df[wf_df['year'] == 2018]['passengers'].values[0]
total_2019 = wf_df[wf_df['year'] == 2019]['passengers'].values[0]
total_2020 = wf_df[wf_df['year'] == 2020]['passengers'].values[0]
#Data Deviations
change_2019 = total_2019 - total_2018
change_2020 = total_2020 - total_2019
grand_total = total_2018 + total_2019 + total_2020

wf_df['deviation'] = [0, change_2019, change_2020]
wf_df['year'] = wf_df['year'].astype(str)
wf_df.loc[wf_df.index.max()+1] = ['Total', wf_df['passengers'].sum(), 0]
years = ['2018', '2019', '2020', 'Total']
wf_df['year'] = pd.Categorical(wf_df['year'], categories = years, ordered = True)
wf_df.sort_values(by='year', inplace=True)
wf_df.reset_index(inplace= True, drop = True)
import plotly.graph_objects as go
import plotly.io as pio

if wf_df.loc[3, 'deviation'] > 0:
    end_color = 'black'
elif wf_df.loc[3, 'deviation'] < 0:
    end_color = 'red'
else: end_color = 'steelblue'

wf_fig = go.Figure(go.Waterfall(
    name='', 
    orientation='v',
    x=['2018', '2018 to 2019', '2019 to 2020', 'Total'],
    textposition='outside',
    measure=['relative', 'relative', 'relative', 'total'],
    y=[total_2018/1e9, change_2019/1e9, change_2020/1e9, grand_total/1e9],
    text=['{:.2f}B'.format(total_2018/1e9),
          '{:.2f}B'.format(change_2019/1e9),
          '{:.2f}B'.format(change_2020/1e9),
          '{:.2f}B'.format(grand_total/1e9)],
    decreasing={'marker':{'color':'red'}},
    increasing={'marker':{'color':'green'}},
    totals={'marker':{'color': end_color}}))

wf_fig.update_layout(yaxis=dict(tickformat='.1f'))
wf_fig.update_xaxes(
    title_text='Years and Their Recorded Deviation From Prior Year Total Passenger Count Starting From 2018', 
    title_font={'size': 18}
)
wf_fig.update_yaxes(
    title_text='Total Passengers (Billions)', 
    title_font={'size': 18},
    dtick=0.5, 
    ticksuffix='B', 
    zeroline=True
)
wf_fig.update_layout(
    title=dict(
        text='Total Passenger Traffic 2018-2020: Impact of COVID-19 on Air Travel (Waterfall Diagram)<br>' +
             'Increases in overall passenger traffic appear in Green, Declines appear in Red',
        font=dict(family='Arial', size=18, color='black')
    ),
    template='simple_white',
    title_x=0.5,
    showlegend=False,
    autosize=True,
    height=800,
    width=1400,
    margin=dict(l=100, r=100, t=250, b=100)
)

pio.write_html(wf_fig, path + "waterfall_result.html", auto_open=False)

International Origin Share

This donut chart complements the bar chart by showing the proportional share of total passengers contributed by each of the top 10 international origin countries. Rather than showing raw numbers, it illustrates how the overall international passenger pie is divided, making it easy to see which countries dominate international air travel into the US relative to one another. Countries that appear as larger slices represent a disproportionately large share of total international traffic.

# Building Data Frame For Top 10 Countries 
donut_data = df_combined.groupby('origin_country')['passengers'].sum().reset_index()
donut_data = donut_data[donut_data['origin_country'] != 'US']
donut_data = donut_data.nlargest(10, 'passengers')

print(donut_data)
# Middle of Donut Total Passenger Count
all_passengers = donut_data.passengers.sum()


#Creation for the Full Chart
plt.figure(figsize=(14, 10))

plt.pie(donut_data['passengers'],
        labels=donut_data['origin_country'],
        autopct= lambda p : '{:.2f}%\n({:.1f}M)'. format(p, (p/100)*all_passengers/1e+6),
        startangle=90,
        pctdistance=0.85,
        labeldistance=1.1,
        textprops= {'fontsize':18},
        wedgeprops=dict(edgecolor='white'))
#Middle Hole That Makes Donut
hole = plt.Circle((0,0), 0.3, fc='white')
fig1 = plt.gcf()
fig1.gca().add_artist(hole)

#Text For Middle Hole
plt.text(0, 0, 'Passenger Total\n' + str(round(all_passengers/1e6, 2)) + 'M', ha='center', va='center', fontsize=14)

# Title
plt.title('Top 10 International Origin Countries\nShare of Total Passengers (2018 - Oct. 2020)', 
          fontsize=18)
plt.tight_layout()
plt.show()

Conclusion

This analysis clearly demonstrates the devastating impact that COVID-19 had on international air travel into the United States. Across all five visualizations, the data tells a consistent story. 2018 and 2019 showed stable and growing passenger traffic, while 2020 experienced an unprecedented collapse beginning in April that had never been seen in modern aviation history. This, of course, is a direct result of the COVID-19 pandemic that shocked and locked down the world in 2020. The data also reveals that a small number of countries, led by Canada and the United Kingdom, account for an overwhelmingly large share of international passenger traffic into the US, highlighting the importance of these key travel corridors to the overall health of the aviation industry.