In the past year, COVID-19 has affected many aspects of our lives, from how we participate in school, to how we interact with strangers, and even how we enjoy our personal time. One of the many decisions every American faced was whether or not to travel. Many questioned whether they should go by car, bus, or plane, or ultimately just stay home. In relation to this, airports were among the many US businesses that had to adapt to losing almost all of their traffic. This analysis will describe and evaluate how COVID-19 has changed travel by commercial airline in the United States.
Question of interest:
In this project I will analyze how COVID-19 has affected one of the most common means of transportation: the airplane. The project uses TSA checkpoint travel numbers for the year before and the year after the onset of COVID-19. The first year covers March 2019 to February 2020 and the second year covers March 2020 to February 2021. I will first compare the yearly averages and then perform a hypothesis test. The project also compares the traveler throughput on the slowest and busiest days in both time frames.
Plan for collecting data:
The data was downloaded from TSA's website and prepared for the project. The TSA data lists the daily checkpoint travel numbers (traveler throughput) for the past 3 years. For this analysis, I created an additional table with the monthly averages from January 2019 to March 2021 for easier manipulation.
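The monthly table can be built from the daily counts by grouping each day under the first day of its month and averaging. A minimal R sketch of that step is below; it assumes a data frame with a `Date`-class column named `date` and a numeric column named `travelers` (hypothetical names, not the headers in TSA's file), and the toy counts are made up purely to show the mechanics.

```r
# Sketch: collapse daily checkpoint counts into monthly averages.
# Assumes hypothetical columns `date` (class Date) and `travelers` (numeric).
monthly_averages <- function(daily) {
  # Label each day with the first day of its month, e.g. "2019-03-01"
  month_start <- as.Date(format(daily$date, "%Y-%m-01"))
  # Average within each month; na.rm skips NA days (e.g. 2021 dates after April 5)
  avg <- tapply(daily$travelers, month_start, mean, na.rm = TRUE)
  data.frame(Date = as.Date(names(avg)), Average = as.numeric(avg))
}

# Toy input with made-up counts spanning two months
toy <- data.frame(
  date = as.Date(c("2019-03-30", "2019-03-31", "2019-04-01", "2019-04-02")),
  travelers = c(100, 200, 300, 500)
)
res <- monthly_averages(toy)
res
```

Grouping by a month-start date rather than a month name keeps the result in chronological order and leaves the `Date` column ready for plotting.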
The raw data extracted from TSA’s website is as follows:
Fortunately, apart from the 2021 column, the data did not contain any missing values that would affect the calculations. The 2021 TSA checkpoint data stops at April 5, so the 2021 column has NA values for the dates afterward. The table contains 4 columns and 365 rows.
For further investigation, I averaged the TSA checkpoint travel numbers per month for each year, resulting in:
Below is a scatter plot of the daily TSA checkpoint numbers for 2019 through 2021. The data starts on January 1, 2019 and ends on April 5, 2021. The graph shows the relatively steady number of check-ins in 2019, the sudden drop in check-ins in 2020, and the gradual recovery in 2021.
The minimum and maximum for each year are displayed below with their corresponding dates. The highest value overall occurs in 2019 and the lowest value overall occurs in 2020.
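The slowest and busiest day in a year can be located with `which.min()` and `which.max()`, which return the position of the smallest and largest value and skip NA entries. A minimal sketch is below; the function name `day_extremes` and its arguments are hypothetical, and the toy counts are made up for illustration.

```r
# Sketch: report the slowest and busiest day in one series of daily counts.
# `day_extremes`, `dates`, and `counts` are hypothetical names.
day_extremes <- function(dates, counts) {
  lo <- which.min(counts)  # position of the smallest count (NA values ignored)
  hi <- which.max(counts)  # position of the largest count (NA values ignored)
  data.frame(
    type  = c("min", "max"),
    date  = dates[c(lo, hi)],
    count = counts[c(lo, hi)]
  )
}

# Toy example: four days of made-up counts, one of them missing
d <- as.Date("2019-01-01") + 0:3
ex <- day_extremes(d, c(250, NA, 100, 400))
ex
```

Running the function once per year column gives a table of minimums and maximums with their dates.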
As seen in the graph, the monthly averages of TSA check-ins stayed near or above 2,000,000 until COVID-19 hit. The first drop, from 2,158,174 to 1,104,630, occurred in March 2020, and the next sudden drop, from 1,104,630 to the lowest point on the graph, 106,275, occurred in April 2020. The graph also shows that since June 2020, travel numbers have slowly risen, with a minor decrease in January 2021.
The graph below demonstrates the comparison between the monthly averages of TSA check-ins before and after COVID-19.
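This analysis tests whether there is a significant difference between the average number of TSA check-ins in the year before COVID-19 and in the year after. More specifically, it tests whether the average number of TSA check-ins decreased after the onset of COVID-19. Because each month in the first year is paired with the same calendar month in the second year, the test is a one-sample t-test on the monthly differences (before minus after), where \(\mu_D\) is the true mean of those differences. My hypotheses are: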
\(H_0: \mu_D = 0\)
\(H_1: \mu_D > 0\)
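For reference, the statistic behind these hypotheses is the standard one-sample t statistic computed on the \(n = 12\) monthly differences:

\[
D_i = \text{before}_i - \text{after}_i, \qquad
\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i, \qquad
s_D = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(D_i - \bar{D}\right)^2}
\]

\[
t = \frac{\bar{D} - 0}{s_D / \sqrt{n}}, \qquad df = n - 1 = 11
\]

Under \(H_0\), \(t\) follows a t-distribution with 11 degrees of freedom, and the one-sided p-value is \(P(T_{11} \ge t)\). With \(\bar{D} = 1645082\) and \(s_D = 362615.1\), this gives \(t = 1645082 / (362615.1/\sqrt{12}) \approx 15.72\).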
The year before COVID-19 covers the following months: March 2019-February 2020. The following calculations are with respect to these months:
| Date | Average |
|---|---|
| 2019-03-01 | 2339686 |
| 2019-04-01 | 2345235 |
| 2019-05-01 | 2403202 |
| 2019-06-01 | 2553997 |
| 2019-07-01 | 2564902 |
| 2019-08-01 | 2412129 |
| 2019-09-01 | 2217709 |
| 2019-10-01 | 2325693 |
| 2019-11-01 | 2292922 |
| 2019-12-01 | 2265141 |
| 2020-01-01 | 1997751 |
| 2020-02-01 | 2158174 |
\(\bar{x}\) = 2323045
The year after COVID-19 covers the following months: March 2020-February 2021. The following calculations are with respect to these months:
| Date | Average |
|---|---|
| 2020-03-01 | 1064354.9 |
| 2020-04-01 | 108643.2 |
| 2020-05-01 | 231155.8 |
| 2020-06-01 | 482726.7 |
| 2020-07-01 | 669057.5 |
| 2020-08-01 | 700260.4 |
| 2020-09-01 | 716275.4 |
| 2020-10-01 | 826983.7 |
| 2020-11-01 | 850432.9 |
| 2020-12-01 | 851347.3 |
| 2021-01-01 | 761233.2 |
| 2021-02-01 | 873083.8 |
\(\bar{x}\) = 677962.9
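As a quick check on the two yearly averages, the monthly values can be typed in directly from the tables. This self-contained sketch hard-codes the tabled numbers instead of reading `Monthly Averages.xlsx`, so it does not depend on the workbook layout.

```r
# Monthly averages copied from the two tables above
before <- c(2339686, 2345235, 2403202, 2553997, 2564902, 2412129,
            2217709, 2325693, 2292922, 2265141, 1997751, 2158174)
after  <- c(1064354.9, 108643.2, 231155.8, 482726.7, 669057.5, 700260.4,
            716275.4, 826983.7, 850432.9, 851347.3, 761233.2, 873083.8)

# Yearly averages, printed without scientific notation
format(mean(before), scientific = FALSE)  # about 2323045
format(mean(after), scientific = FALSE)   # about 677962.9
```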
| before | after | difference |
|---|---|---|
| 2339686 | 1064354.9 | 1275331 |
| 2345235 | 108643.2 | 2236592 |
| 2403202 | 231155.8 | 2172046 |
| 2553997 | 482726.7 | 2071270 |
| 2564902 | 669057.5 | 1895845 |
| 2412129 | 700260.4 | 1711869 |
| 2217709 | 716275.4 | 1501433 |
| 2325693 | 826983.7 | 1498710 |
| 2292922 | 850432.9 | 1442489 |
| 2265141 | 851347.3 | 1413793 |
| 1997751 | 761233.2 | 1236518 |
| 2158174 | 873083.8 | 1285090 |
The differences in TSA check-ins (before minus after) between the year before and the year after COVID-19 have a mean of 1645082 and a standard deviation of 362615.1. A histogram of the differences looks right-skewed. Using the criterion that potential outliers lie more than 2 standard deviations from the mean, the cutoffs are 919851.9 and 2370312. Every difference falls between these cutoffs, so there are no potential outliers in this sample.
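The center, spread, and 2-standard-deviation cutoffs can be reproduced from the difference table. This sketch uses the tabled differences, which are rounded to whole numbers, so the results may differ from the reported values in the last digit or two.

```r
# Monthly differences (before - after) copied from the table above,
# which rounds them to whole numbers
difference <- c(1275331, 2236592, 2172046, 2071270, 1895845, 1711869,
                1501433, 1498710, 1442489, 1413793, 1236518, 1285090)
center <- mean(difference)  # about 1645082
spread <- sd(difference)    # sample standard deviation, about 362615
# Cutoffs for potential outliers: 2 standard deviations on either side of the mean
fences <- center + c(-2, 2) * spread
outside <- difference < fences[1] | difference > fences[2]
fences
any(outside)  # FALSE: no difference falls outside the cutoffs
```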
This sample is representative of the population of interest.
The normal Q-Q plot of the differences in TSA check-ins (before minus after) shows a roughly linear pattern, so it is reasonable to assume that the sampling distribution of \(\bar{D}\) is approximately normal.
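The visual check can be paired with a reference line, which makes departures from linearity easier to see; `qqline()` draws a line through the first and third quartiles. The Shapiro-Wilk test at the end is an added numeric supplement that is not part of the original analysis, and with only 12 differences it has little power, so the plot remains the main evidence.

```r
# Monthly differences (before - after) copied from the difference table
difference <- c(1275331, 2236592, 2172046, 2071270, 1895845, 1711869,
                1501433, 1498710, 1442489, 1413793, 1236518, 1285090)
# Normal Q-Q plot: points close to the line suggest approximate normality
qqnorm(difference)
qqline(difference)
# Added check (not in the original project): Shapiro-Wilk test of normality.
# A small p-value would be evidence against normality.
sw <- shapiro.test(difference)
sw$p.value
```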
##
## One Sample t-test
##
## data: difference
## t = 15.716, df = 11, p-value = 3.485e-09
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
## 1457093 Inf
## sample estimates:
## mean of x
## 1645082
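The numbers in this output can be reproduced by hand, which shows where the t statistic, p-value, and one-sided confidence bound come from. This sketch uses the rounded differences from the table, so the last digits may differ slightly from the output above.

```r
# Monthly differences (before - after) copied from the difference table
difference <- c(1275331, 2236592, 2172046, 2071270, 1895845, 1711869,
                1501433, 1498710, 1442489, 1413793, 1236518, 1285090)
n <- length(difference)
se <- sd(difference) / sqrt(n)                         # standard error of the mean difference
t_stat <- mean(difference) / se                        # tests H0: mu_D = 0
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)  # one-sided: P(T >= t)
lower <- mean(difference) - qt(0.95, df = n - 1) * se  # lower bound of one-sided 95% CI
c(t = t_stat, df = n - 1, p = p_value, lower = lower)
# Cross-check against the built-in test
t.test(difference, alternative = "greater")$statistic
```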
The p-value (3.485e-09) is smaller than any reasonable significance level, so we reject the null hypothesis. There is sufficient evidence to conclude that the mean monthly number of TSA check-ins was lower in the year after the onset of COVID-19 than in the year before, meaning COVID-19 did affect US air travel.
After performing a one-sample t-test on the monthly differences in TSA check-ins between March 2019-February 2020 and March 2020-February 2021, it is safe to conclude that COVID-19 had a significant effect on airplane travel in the US. The tables and graphs show a sharp drop in air travel, which points to a negative effect on this part of the economy. However, the daily scatter plot shows that air travel is slowly making its way back to normal: TSA check-ins in 2021 are slowly rising, suggesting a gradual return toward pre-pandemic levels.
My original plan for this project was to compare average travel spending before and after COVID-19. However, after having difficulty finding relevant data for about a week, I decided to shift my focus to something more specific: airline travel. TSA's website was fairly easy to understand, and its data was easily accessible for this analysis.
Based on this conclusion, it is clear that COVID-19, a worldwide pandemic, had a major effect on travel in the US, and on airline travel in particular. Going forward, this conclusion might help airports and other travel providers be better prepared if something like COVID-19 happens again. It would be interesting to extend this analysis to other means of travel and compare how COVID-19 has affected them, such as rental car transactions, tourism and city spending, hotel reservations, Airbnb reservations, gas purchases, RV site rentals, and cruise ship reservations.
Here is my RPubs link.
Name: Taylor Edwards
E-mail: tme@clemson.edu
Semester: Spring 2021
Class: STAT 4020: Introduction to Statistical Computing
About Me: I am a junior Mathematical Sciences major at Clemson this spring. My emphasis is Actuarial Science and Financial Management, with a minor in Accounting. I am from Westminster, SC, a very small town on the other side of Seneca. I enjoy hiking, traveling, and watching crime shows on Netflix. I thought this topic was relevant considering everything we have been through in the past year and how it has affected everyone's means of travel.
Below are the packages required for the project:
library(readxl)
library(DT)
library(ggplot2)
library(summarytools)
Below are the code chunks that produced the graphs for this project:
# Packages
library(DT)
library(ggplot2)
library(readxl)
# Import Data
travel.numbers <- read_xlsx("TSA Checkpoint travel numbers.xlsx")
# Create a scatter plot with the daily TSA Checkpoint travel numbers
# Overlap the 3 different years: 2019, 2020, 2021
# Each layer maps its own year column to y and a year label to color
ggplot(travel.numbers, aes(x = Date)) +
  geom_point(aes(y = `2021 TSA Checkpoint`, color = "2021")) +
  geom_point(aes(y = `2020 TSA Checkpoint`, color = "2020")) +
  geom_point(aes(y = `2019 TSA Checkpoint`, color = "2019")) +
  labs(x = "Date", y = "TSA Check-Ins", color = "Year", title = "Daily TSA Check-Ins")
# Import data
monthly.averages <- read_xlsx("Monthly Averages.xlsx")
# Scatter plot of monthly averages
plot <- ggplot(monthly.averages, aes(x=Date, y=Average)) + geom_point() + geom_line() +ggtitle("Monthly Averages of TSA Check-Ins")
plot
# Import data
new.monthly <- read_xlsx("New Monthly Avg.xlsx")
# Create 2 overlapping scatter plots to compare before & after monthly averages
ggplot(new.monthly, aes(x = Date)) +
  geom_point(aes(y = `Before Average`, color = "Before")) +
  geom_point(aes(y = `After Average`, color = "After")) +
  labs(x = "Month", y = "TSA Check-Ins", color = "Period",
       title = "Before & After Monthly Averages of TSA Check-Ins")
Below are the code chunks that helped perform the analysis for this project:
library(DT)
library(readxl)
# Import Data
monthly.averages <- read_xlsx("Monthly Averages.xlsx")
# Get the averages of the year before COVID-19
first <- monthly.averages[3:14, 1:2]
# Create table of first year using kable
knitr::kable(first, caption="Data for Year Before COVID-19")
# Found the average of the first year averages
firstAvg <- mean(monthly.averages$Average[3:14])
# Created a function that takes a calculated number and removes the scientific notation
scientific <- function(number){
new.number <- format(number, scientific=F)
return(new.number)
}
# Format the value without scientific notation
x <- scientific(firstAvg)
library(DT)
library(readxl)
# Import data
monthly.averages <- read_xlsx("Monthly Averages.xlsx")
# Second half of data: the year after COVID-19
second <- monthly.averages[15:26, 1:2]
# Create kable from data
knitr::kable(second, caption="Data for Year After COVID-19")
# Calculate average of year after COVID-19
secondAvg <- mean(monthly.averages$Average[15:26])
scientific <- function(number){
new.number <- format(number, scientific=F)
return(new.number)
}
# calculate using function
y <- scientific(secondAvg)
library(DT)
library(readxl)
library(summarytools)
# Import data
monthly.averages <- read_xlsx("Monthly Averages.xlsx")
# Function code
scientific <- function(number){
new.number <- format(number, scientific=F)
return(new.number)
}
# Calculate the average of the first year averages
firstAvg <- mean(monthly.averages$Average[3:14])
x <- scientific(firstAvg)
# Calculate the average of the second year averages
secondAvg <- mean(monthly.averages$Average[15:26])
y <- scientific(secondAvg)
before <- monthly.averages$Average[3:14]
after <- monthly.averages$Average[15:26]
# Calculate difference between the average before and after COVID-19
difference <- before-after
# Create table with before and after data and the corresponding differences
table <- cbind(before, after, difference)
knitr::kable(table, caption="TSA Checkins Before & After COVID-19 with Respecting Differences")
# function CODE
scientific <- function(number){
new.number <- format(number, scientific=F)
return(new.number)
}
# Center, shape, variability, and outliers are needed to check the t-test conditions
center <- mean(difference)
center1 <- scientific(center)
# hist() also draws the histogram used to judge the shape
shape <- hist(difference)
variability <- sd(difference)
variability1 <- scientific(variability)
# Cutoffs for potential outliers: 2 standard deviations below and above the mean
outlier1 <- center-2*variability
outlier2 <- center+2*variability
outlier01 <- scientific(outlier1)
outlier02 <- scientific(outlier2)
# Draw a normal Q-Q plot of the differences to check the normality condition
qqnorm(difference)
qqline(difference)
# Print the summary results of the one sample t-test with the alternative hypothesis set to "greater"
result <- t.test(difference, alternative="greater")
print(result)