How COVID-19 has affected airline travel in the United States

Over the past year, COVID-19 has affected many aspects of our lives, from how we attend school, to how we interact with strangers, to how we spend our personal time. One decision nearly every American faced was whether to travel at all, and if so, whether to go by car, bus, or plane, or simply stay home. Airlines and airports were among the many US businesses that had to adapt to losing almost all of their customers. This analysis describes and evaluates how COVID-19 has changed commercial air travel in the United States.

(https://www.engineering.com/story/fact-check-flying-the-covid-skies-is-it-safe)

Introduction

Question of interest:

In this project I analyze how COVID-19 has affected one of the most common means of transportation: air travel. The project uses TSA checkpoint travel numbers for the year before and the year after COVID-19 arrived. The first year covers March 2019 through February 2020, and the second year covers March 2020 through February 2021. I first compare the yearly averages and then perform a hypothesis test. The project also compares traveler throughput on the slowest and busiest days in both time frames.

Plan for collecting data:

The data were imported from TSA’s website and manipulated as needed for the project. The TSA data give the number of travelers passing through TSA checkpoints on each day from January 1, 2019 to April 5, 2021. For this analysis, I created an additional table with the monthly averages from January 2019 to March 2021 for easier manipulation.
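Building the monthly-average table amounts to grouping the daily counts by month and averaging within each group. Below is a minimal base-R sketch of that step. It assumes the daily counts have been arranged in long form, with one row per day, a `Date` column, and one count column. The column name `travelers` and all values are hypothetical placeholders, not TSA data.

```r
# Hypothetical daily data in long form (placeholder values, not real TSA counts)
daily <- data.frame(
  Date = as.Date(c("2019-01-01", "2019-01-02", "2019-02-01", "2019-02-02", "2019-02-03")),
  travelers = c(2000000, 2200000, 1900000, 2100000, NA)
)

# Label each day with its year and month, e.g. "2019-01"
daily$Month <- format(daily$Date, "%Y-%m")

# Average the daily counts within each month.
# The formula interface drops rows with NA counts by default (na.action = na.omit).
monthly <- aggregate(travelers ~ Month, data = daily, FUN = mean)
print(monthly)
```

Because NA days are dropped rather than propagated, a month with a few missing days is averaged over the days that are present.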

Extracting the Data

The raw data extracted from TSA’s website is as follows:

Apart from the end of the 2021 column, the data contain no missing values that would affect the calculations. The 2021 TSA checkpoint counts stop at April 5, so the dates after that are NA. The table contains 4 columns and 365 rows.

For further investigation, I averaged the TSA checkpoint travel numbers per month for each year, resulting in:

Data Management

Yearly Data

The scatter plot below shows the daily TSA checkpoint numbers for 2019 through 2021. The data start on January 1, 2019 and end on April 5, 2021. The plot shows the steady number of check-ins in 2019, the sudden drop in 2020, and the gradual recovery in 2021.

The minimum and maximum for each year are listed below with their corresponding dates. The highest value overall occurs in 2019 and the lowest in 2020.
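Finding the slowest and busiest day in each year is a grouped `which.min()`/`which.max()` lookup. Below is a base-R sketch that assumes long-form daily data with a `Date` column and a count column named `travelers`. The name and the values are hypothetical placeholders, not the TSA figures.

```r
# Hypothetical long-form daily data (placeholder values, not real TSA counts)
daily <- data.frame(
  Date = as.Date(c("2019-03-01", "2019-07-01", "2020-04-01", "2020-06-01", "2020-12-01")),
  travelers = c(2000000, 2600000, 100000, NA, 900000)
)
daily$Year <- format(daily$Date, "%Y")

# For each year, record the dates and values of the smallest and largest counts.
# which.min()/which.max() skip NA values, and min()/max() use na.rm = TRUE to match.
extremes <- do.call(rbind, lapply(split(daily, daily$Year), function(d) {
  data.frame(
    Year    = d$Year[1],
    MinDate = d$Date[which.min(d$travelers)],
    Min     = min(d$travelers, na.rm = TRUE),
    MaxDate = d$Date[which.max(d$travelers)],
    Max     = max(d$travelers, na.rm = TRUE)
  )
}))
print(extremes)
```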

Monthly Data

As the graph shows, the monthly averages of TSA check-ins stay at or above roughly 2,000,000 until COVID-19 arrived. The first drop, from 2,158,174 in February 2020 to 1,064,355 in March 2020, was followed by a second, sharper drop to 108,643 in April 2020, the lowest point on the graph. Since then, travel numbers have risen slowly, with a minor decrease in January 2021.
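Percent changes put these drops on a common scale. The short sketch below computes month-over-month changes from the monthly averages reported in the Analysis tables for February through May 2020. The variable names are illustrative.

```r
# Monthly averages copied from the Analysis tables in this report
months <- c("2020-02", "2020-03", "2020-04", "2020-05")
avg <- c(2158174, 1064354.9, 108643.2, 231155.8)

# Percent change from each month to the next: 100 * (x[t] - x[t-1]) / x[t-1]
pct_change <- 100 * diff(avg) / head(avg, -1)
names(pct_change) <- months[-1]
print(round(pct_change, 1))
```

With these values, the February-to-March change is roughly a 51% decline and the March-to-April change is a further decline of roughly 90%.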

The graph below compares the monthly averages of TSA check-ins before and after COVID-19.

Analysis

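This analysis tests whether there is a significant difference between the average monthly TSA check-ins in the year before COVID-19 and the year after COVID-19, and more specifically, whether the average number of TSA check-ins decreased due to COVID-19. Each month in the year before is paired with the same calendar month in the year after, and µ_D is the mean of the monthly differences D = before − after. Testing these differences with a one-sample t-test (equivalent to a paired t-test), my hypotheses are: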

H₀: µ_D = 0

H₁: µ_D > 0
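For reference, the statistic behind this one-sample t-test on the differences is shown below. Here \(\bar{D}\) and \(s_D\) are the sample mean and standard deviation of the \(n = 12\) monthly differences. Substituting the values reported under Descriptive Statistics (mean 1645082, standard deviation 362615.1) recovers the t value in the test output:

\[
\begin{aligned}
t &= \frac{\bar{D} - 0}{s_D/\sqrt{n}}, \qquad \text{df} = n - 1 = 11 \\
  &= \frac{1645082}{362615.1/\sqrt{12}} \approx \frac{1645082}{104678.0} \approx 15.716
\end{aligned}
\]

Large positive values of \(t\) favor \(H_1: \mu_D > 0\), so the p-value is the upper-tail area \(P(T_{11} \ge t)\).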

Before COVID-19

The year before COVID-19 covers the following months: March 2019-February 2020. The following calculations are with respect to these months:

Data for Year Before COVID-19
Date	Average
2019-03-01	2339686
2019-04-01	2345235
2019-05-01	2403202
2019-06-01	2553997
2019-07-01	2564902
2019-08-01	2412129
2019-09-01	2217709
2019-10-01	2325693
2019-11-01	2292922
2019-12-01	2265141
2020-01-01	1997751
2020-02-01	2158174

\(\bar{x}\) = 2323045

After COVID-19

The year after COVID-19 covers the following months: March 2020-February 2021. The following calculations are with respect to these months:

Data for Year After COVID-19
Date	Average
2020-03-01	1064354.9
2020-04-01	108643.2
2020-05-01	231155.8
2020-06-01	482726.7
2020-07-01	669057.5
2020-08-01	700260.4
2020-09-01	716275.4
2020-10-01	826983.7
2020-11-01	850432.9
2020-12-01	851347.3
2021-01-01	761233.2
2021-02-01	873083.8

\(\bar{x}\) = 677962.9

Descriptive Statistics

TSA Check-ins Before & After COVID-19 with Respective Differences
before	after	difference
2339686	1064354.9	1275331
2345235	108643.2	2236592
2403202	231155.8	2172046
2553997	482726.7	2071270
2564902	669057.5	1895845
2412129	700260.4	1711869
2217709	716275.4	1501433
2325693	826983.7	1498710
2292922	850432.9	1442489
2265141	851347.3	1413793
1997751	761233.2	1236518
2158174	873083.8	1285090

The differences in TSA check-ins (before − after) between the year before and the year after COVID-19 have a mean of 1,645,082 and a standard deviation of 362,615.1. A histogram of the differences looks right-skewed. Using the criterion that potential outliers lie more than 2 standard deviations from the mean, any difference below 919,851.9 or above 2,370,312 would be flagged. Every difference falls between these bounds, so there are no potential outliers in this sample.
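The descriptive statistics and the 2-standard-deviation outlier bounds can be reproduced directly from the monthly averages. Below is a minimal base-R sketch. The vectors are typed in from the tables above, which display the before-COVID values rounded to whole numbers, so results may differ from the report's in the last digit.

```r
# Monthly averages typed in from the tables above (not re-imported from the Excel file)
before <- c(2339686, 2345235, 2403202, 2553997, 2564902, 2412129,
            2217709, 2325693, 2292922, 2265141, 1997751, 2158174)
after  <- c(1064354.9, 108643.2, 231155.8, 482726.7, 669057.5, 700260.4,
            716275.4, 826983.7, 850432.9, 851347.3, 761233.2, 873083.8)

# Paired differences: each month before COVID-19 minus the same calendar month after
difference <- before - after

d_mean <- mean(difference)  # center
d_sd   <- sd(difference)    # sample standard deviation (n - 1 denominator)

# Potential-outlier bounds: 2 standard deviations on either side of the mean
lower <- d_mean - 2 * d_sd
upper <- d_mean + 2 * d_sd
flagged <- difference[difference < lower | difference > upper]

print(round(c(mean = d_mean, sd = d_sd, lower = lower, upper = upper), 1))
print(length(flagged))  # number of differences outside the bounds
```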

Assumptions

We assume this sample is representative of the population of interest.
The Q-Q plot of the differences in TSA check-ins (before − after) shows a roughly linear pattern, so it is reasonable to assume that the sampling distribution of D-bar is approximately normal.
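The sketch below runs the normality check on the differences, typed in from the Descriptive Statistics table. It adds a reference line with `qqline()` so the linearity is easier to judge. The correlation between theoretical and sample quantiles is only an informal numeric companion to the plot (values close to 1 indicate a nearly straight pattern); it is an assumption of this sketch, not part of the original analysis.

```r
# Differences (before - after) typed in from the Descriptive Statistics table
difference <- c(1275331, 2236592, 2172046, 2071270, 1895845, 1711869,
                1501433, 1498710, 1442489, 1413793, 1236518, 1285090)

# Normal Q-Q plot; qqline() adds a line through the first and third quartiles
qqnorm(difference, main = "Normal Q-Q Plot of Differences (Before - After)")
qqline(difference)

# Informal numeric companion: correlation of theoretical vs. sample quantiles
q <- qqnorm(difference, plot.it = FALSE)
print(cor(q$x, q$y))
```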

Perform Test

## 
##  One Sample t-test
## 
## data:  difference
## t = 15.716, df = 11, p-value = 3.485e-09
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
##  1457093     Inf
## sample estimates:
## mean of x 
##   1645082
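The output above can be reproduced by hand from \(t = \bar{D}/(s_D/\sqrt{n})\), using the upper tail of the t distribution for the one-sided alternative. The base-R sketch below checks the manual calculation against `t.test()`. It uses the rounded differences from the Descriptive Statistics table, so the last digits may differ slightly from the report's output.

```r
# Differences (before - after) typed in from the Descriptive Statistics table
difference <- c(1275331, 2236592, 2172046, 2071270, 1895845, 1711869,
                1501433, 1498710, 1442489, 1413793, 1236518, 1285090)

n      <- length(difference)
se     <- sd(difference) / sqrt(n)  # standard error of D-bar
t_stat <- mean(difference) / se     # statistic for H0: mu_D = 0
# One-sided (upper-tail) p-value for H1: mu_D > 0 with n - 1 degrees of freedom
p_val  <- pt(t_stat, df = n - 1, lower.tail = FALSE)
# One-sided 95% lower confidence bound for mu_D
lower_bound <- mean(difference) - qt(0.95, df = n - 1) * se

# Built-in version for comparison
res <- t.test(difference, alternative = "greater")
print(c(t_manual = t_stat, t_builtin = unname(res$statistic)))
print(c(p_manual = p_val, p_builtin = res$p.value))
print(c(lower_manual = lower_bound, lower_builtin = res$conf.int[1]))
```

The manual lower bound should agree with the 1457093 shown in the output, up to rounding.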

Summary

Analysis Summary

The p-value (3.485e-09) is smaller than any reasonable significance level, so we reject the null hypothesis. There is sufficient evidence to suggest that the average monthly number of TSA check-ins decreased after COVID-19 arrived, that is, that COVID-19 did in fact affect US air travel.

Conclusion

After performing a one-sample t-test on the monthly averages of daily TSA check-ins for March 2019 through February 2021 (drawn from daily data spanning January 1, 2019 to April 5, 2021), it is reasonable to conclude that COVID-19 did in fact affect airplane travel in the US. The tables and graphs show that COVID-19 had a strongly negative effect on air travel, and with it on the economy. However, the daily scatter plot shows that air travel is slowly making its way back to normal: the 2021 check-ins are slowly rising, signaling a return to normalcy for the US.

My original plan for this project was to compare average travel spending before and after COVID-19. However, after about a week of struggling to find relevant data, I shifted my focus to something more specific: airline travel. TSA’s website was fairly easy to understand, and its data were easily accessible for this analysis.

Based on this conclusion, it is clear that COVID-19, a worldwide pandemic, had a major effect on travel in the US, and on airline travel in particular. This conclusion might help airports and other travel providers be better prepared if something like COVID-19 happens again. It would also be interesting to extend this analysis to other travel-related activity and compare how COVID-19 affected it, such as rental car transactions, tourism and city spending, hotel reservations, Airbnb reservations, gas purchases, RV site rentals, and cruise ship reservations.

Here is my RPubs link.

Personal Information

Name: Taylor Edwards

E-mail: tme@clemson.edu

Semester: Spring 2021

Class: STAT 4020: Introduction to Statistical Computing

About Me: I am a junior Mathematical Sciences major at Clemson this spring. My emphasis is Actuarial Science and Financial Management, with a minor in Accounting. I am from Westminster, SC, a very small town on the other side of Seneca. I enjoy hiking, traveling, and watching crime shows on Netflix. I thought this topic was relevant considering everything we have been through in the past year and how it has affected everyone’s means of travel.

Appendix/Code

Packages

Below are the packages required for the project:

library(readxl)
library(DT)
library(ggplot2)
library(summarytools)

Code for Graphs

Below are the code chunks that produced the graphs for this project:

Scatter Plot of Daily TSA Check-Ins

# Packages
library(DT)
library(ggplot2)
library(readxl)

# Import Data
travel.numbers <- read_xlsx("TSA Checkpoint travel numbers.xlsx")
# Create a scatter plot with the daily TSA Checkpoint travel numbers,
# overlapping the 3 different years: 2019, 2020, 2021
# (na.rm = TRUE drops the NA rows after April 5 in the 2021 column without a warning)
ggplot(travel.numbers, aes(x = Date)) +
  geom_point(aes(y = `2021 TSA Checkpoint`, colour = "2021"), na.rm = TRUE) +
  geom_point(aes(y = `2020 TSA Checkpoint`, colour = "2020"), na.rm = TRUE) +
  geom_point(aes(y = `2019 TSA Checkpoint`, colour = "2019"), na.rm = TRUE) +
  labs(x = "Date", y = "TSA Check-ins", colour = "Year", title = "Daily TSA Check-Ins")

Scatter Plot of Monthly Averages of TSA Check-Ins

# Import data
monthly.averages <- read_xlsx("Monthly Averages.xlsx")

# Scatter plot of monthly averages 
plot <- ggplot(monthly.averages, aes(x=Date, y=Average)) + geom_point() + geom_line() +ggtitle("Monthly Averages of TSA Check-Ins") 
plot

Scatter Plot of Before & After Monthly Averages of TSA Check-Ins

# Import data
new.monthly <- read_xlsx("New Monthly Avg.xlsx")

# Create 2 overlapping scatter plots to compare before & after monthly averages
ggplot(new.monthly, aes(x = Date)) +
  geom_point(aes(y = `Before Average`, colour = "Before")) +
  geom_point(aes(y = `After Average`, colour = "After")) +
  labs(x = "Month", y = "TSA Check-ins", colour = "Period",
       title = "Before & After Monthly Averages of TSA Check-Ins")

Code for Analysis

Below are the code chunks that helped perform the analysis for this project:

Data for Year Before COVID-19

library(DT)
library(readxl)

# Import Data
monthly.averages <- read_xlsx("Monthly Averages.xlsx")
# Rows 3-14 hold the year before COVID-19 (March 2019-February 2020)
first <- monthly.averages[3:14, 1:2]
# Create table of first year using kable
knitr::kable(first, caption="Data for Year Before COVID-19")
# Find the average of the first year's monthly averages
firstAvg <- mean(monthly.averages$Average[3:14])

# Function that formats a number without scientific notation (returns a character string)
scientific <- function(number){
  new.number <- format(number, scientific=FALSE)
  return(new.number)
}

# Store the formatted value for display in the report text
x <- scientific(firstAvg)

Data for Year After COVID-19

library(DT)
library(readxl)

# Import data
monthly.averages <- read_xlsx("Monthly Averages.xlsx")
# Rows 15-26 hold the year after COVID-19 (March 2020-February 2021)
second <- monthly.averages[15:26, 1:2]
# Create kable from data
knitr::kable(second, caption="Data for Year After COVID-19")
# Calculate average of year after COVID-19
secondAvg <- mean(monthly.averages$Average[15:26])

# Format a number without scientific notation
scientific <- function(number){
  new.number <- format(number, scientific=FALSE)
  return(new.number)
}
# Format the average using the function
y <- scientific(secondAvg)

Calculating the Differences

library(DT)
library(readxl)
library(summarytools)

# Import data 
monthly.averages <- read_xlsx("Monthly Averages.xlsx")

# Function that formats a number without scientific notation
scientific <- function(number){
  new.number <- format(number, scientific=FALSE)
  return(new.number)
}

# Calculate the average of the first year averages
firstAvg <- mean(monthly.averages$Average[3:14])
x <- scientific(firstAvg)

# Calculate the average of the second year averages
secondAvg <- mean(monthly.averages$Average[15:26])
y <- scientific(secondAvg)

before <- monthly.averages$Average[3:14]
after <- monthly.averages$Average[15:26]

# Calculate the difference between the monthly averages before and after COVID-19 (paired by calendar month)
difference <- before-after
# Create table with before and after data and the corresponding differences
diff.table <- cbind(before, after, difference)
knitr::kable(diff.table, caption="TSA Check-ins Before & After COVID-19 with Respective Differences")

Calculating Values for Analysis

# Function that formats a number without scientific notation
scientific <- function(number){
  new.number <- format(number, scientific=FALSE)
  return(new.number)
}

# Center, shape, variability, and outlier bounds of the differences are needed for the analysis
center <- mean(difference)
center1 <- scientific(center)
shape <- hist(difference)  # histogram used to judge the shape (skewness)
variability <- sd(difference)
variability1 <- scientific(variability)
# Bounds 2 standard deviations below and above the mean; values outside them are potential outliers
outlier1 <- center-2*variability
outlier2 <- center+2*variability
outlier01 <- scientific(outlier1)
outlier02 <- scientific(outlier2)

QQNORM Graph

# Normal Q-Q plot of the differences, with a reference line, to check the normality assumption
qqnorm(difference)
qqline(difference)

T-test Results

# Print the summary results of the one sample t-test with the alternative hypothesis set to "greater"
result <- t.test(difference, alternative="greater")
print(result)

Final Project Spring 2021

Taylor Edwards

4/6/2021