At this project we will use Dataset of Trips by US people (from 2019 to Nov 2021).
Datasets consist of 2 parts:
Notes:
Analyze the changes in people’s travel behavior after the pandemic. Please note that The first cases in North America were reported in the United States in January 2020.
library(dplyr)
library(lubridate)
library(ggplot2)
library(scales)Import data csv. Make sure our data placed in the same folder with our R project data.
country <- read.csv('data_input/National_trips.csv')
state <- read.csv('data_input/State_trips.csv')countrystateThe analysis will only be limited to the total population and total trips, so the other columns will be taken out. In addition, it is necessary to create a new column in the form of year data to be able to compare the total trips each year.
Data Inspection:
str(country)#> 'data.frame': 1055 obs. of 18 variables:
#> $ X : int 0 1 2 3 4 5 6 7 8 9 ...
#> $ Level : chr "National" "National" "National" "National" ...
#> $ Date : chr "2019/01/01" "2019/01/02" "2019/01/03" "2019/01/04" ...
#> $ Population.Staying.at.Home : chr "77,433,867" "61,305,201" "63,050,480" "61,803,652" ...
#> $ Population.Not.Staying.at.Home: chr "248,733,553" "264,862,219" "263,116,940" "264,363,768" ...
#> $ Number.of.Trips : chr "897,784,368" "1,139,452,281" "1,162,752,684" "1,181,953,829" ...
#> $ Number.of.Trips..1 : chr "241,667,151" "291,276,735" "296,375,014" "293,159,631" ...
#> $ Number.of.Trips.1.3 : chr "234,284,795" "285,887,315" "290,074,425" "295,643,296" ...
#> $ Number.of.Trips.3.5 : chr "108,078,903" "138,039,296" "140,771,581" "145,251,819" ...
#> $ Number.of.Trips.5.10 : chr "129,670,778" "171,637,514" "175,775,410" "181,324,645" ...
#> $ Number.of.Trips.10.25 : chr "116,904,343" "167,412,698" "172,027,487" "176,144,493" ...
#> $ Number.of.Trips.25.50 : chr "40,432,062" "56,148,976" "57,632,422" "58,761,592" ...
#> $ Number.of.Trips.50.100 : chr "15,686,639" "17,739,183" "18,366,626" "19,315,785" ...
#> $ Number.of.Trips.100.250 : chr "7,525,563" "7,817,044" "8,124,548" "8,687,318" ...
#> $ Number.of.Trips.250.500 : chr "1,806,022" "1,962,301" "2,038,099" "2,096,065" ...
#> $ Number.of.Trips...500 : chr "1,728,112" "1,531,219" "1,567,072" "1,569,185" ...
#> $ Week : int 0 0 0 0 0 1 1 1 1 1 ...
#> $ Month : int 1 1 1 1 1 1 1 1 1 1 ...
Data Cleansing & Coercion:
country <- country %>%
select(c(Date, Population.Staying.at.Home, Population.Not.Staying.at.Home, Number.of.Trips)) %>%
mutate(Date = as.Date(Date, "%Y/%m/%d")) %>%
mutate_at(c('Population.Staying.at.Home', 'Population.Not.Staying.at.Home', 'Number.of.Trips'),
~as.numeric(gsub(",","", .))) %>%
mutate(Year = year(Date))Missing value check:
anyNA(country)#> [1] FALSE
Final Data:
head(country)Data Inspection:
str(state)#> 'data.frame': 53805 obs. of 20 variables:
#> $ X : int 0 1 2 3 4 5 6 7 8 9 ...
#> $ Level : chr "State" "State" "State" "State" ...
#> $ Date : chr "2019/01/01" "2019/01/01" "2019/01/01" "2019/01/01" ...
#> $ State.Postal.Code : chr "NV" "FL" "DC" "DE" ...
#> $ Population.Staying.at.Home : chr "753,054" "4,524,666" "241,030" "224,457" ...
#> $ Population.Not.Staying.at.Home: chr "2,272,063" "16,709,556" "459,278" "739,757" ...
#> $ Number.of.Trips : chr "9,139,676" "55,258,300" "3,114,055" "2,420,942" ...
#> $ Number.of.Trips..1 : chr "2,999,886" "16,036,792" "1,314,825" "584,323" ...
#> $ Number.of.Trips.1.3 : chr "2,181,216" "14,469,767" "833,796" "640,348" ...
#> $ Number.of.Trips.3.5 : chr "1,039,500" "6,658,624" "367,547" "305,441" ...
#> $ Number.of.Trips.5.10 : chr "1,378,024" "7,977,612" "344,116" "370,359" ...
#> $ Number.of.Trips.10.25 : chr "1,108,436" "6,604,304" "183,744" "314,102" ...
#> $ Number.of.Trips.25.50 : chr "222,837" "2,113,009" "42,096" "128,454" ...
#> $ Number.of.Trips.50.100 : chr "93,301" "796,740" "10,730" "50,542" ...
#> $ Number.of.Trips.100.250 : chr "73,578" "412,224" "8,109" "22,701" ...
#> $ Number.of.Trips.250.500 : chr "24,471" "95,341" "3,476" "2,474" ...
#> $ Number.of.Trips...500 : chr "18,427" "93,887" "5,616" "2,198" ...
#> $ Row.ID : chr "32-00000-20190101" "12-00000-20190101" "11-00000-20190101" "10-00000-20190101" ...
#> $ Week : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Month : int 1 1 1 1 1 1 1 1 1 1 ...
Data Cleansing & Coercion:
state <- state %>%
select(c(Date, State.Postal.Code, Population.Staying.at.Home, Population.Not.Staying.at.Home,
Number.of.Trips)) %>%
mutate(Date = as.Date(Date, "%Y/%m/%d")) %>%
mutate_at(c('Population.Staying.at.Home', 'Population.Not.Staying.at.Home', 'Number.of.Trips'),
~as.numeric(gsub(",","", .))) %>%
mutate(Year = year(Date))Missing value check:
anyNA(state)#> [1] FALSE
Final Data:
head(state)Population Staying at Home
country %>%
ggplot(aes(x= Date, y = Population.Staying.at.Home)) +
geom_line(color="#69b3a2")+
labs(title = "Trend of Number of Population Staying at Home",
x = NULL,
y = NULL)+
scale_y_continuous(labels = unit_format(unit = "M", scale = 1e-6))+
geom_rect(
aes(xmin = as.Date('2020-03-01'), xmax = as.Date('2020-04-01'),
ymin = min(Population.Staying.at.Home), ymax = max(Population.Staying.at.Home)
), fill = 'orange', color = NA, alpha = 0.009)+
theme_minimal()+
theme(
plot.title = element_text(size = 15, face = "bold"),
)Based on the visualization above, there is a drastic increase in the number of population staying at home around March - April 2020. This is related to the COVID-19 case in the US dan travel restrictions.
“The coronavirus hit the United States in mid-March 2020, and cases started to soar at an alarming rate. The country has performed a high number of COVID-19 tests, which is a necessary step to manage the outbreak, but new coronavirus cases in the U.S. spiked again over the Christmas and New Year holiday season. Authorities must keep a vigilant eye on the virus, and people should continue to follow important public health measures, such as keeping hands clean and avoiding close contact.”
source: https://www.statista.com/statistics/1103185/cumulative-coronavirus-covid19-cases-number-us-by-day/
The number then continues to increase until the end of 2020. In 2021, the population staying at home is decreasing, this may be because Vaccinations in the United States began to be distributed on December 14, 2020 (source: HHS.gov), so people started traveling. To find out travel activity in the US, a visualization of the number of trips is performed:
country %>%
ggplot(aes(x= Date, y = Number.of.Trips)) +
geom_line(color="#69b3a2")+
labs(title = "Trend of Number of Trips",
x = NULL,
y = NULL)+
scale_y_continuous(labels = unit_format(unit = "B", scale = 1e-9))+
geom_rect(
aes(xmin = as.Date('2020-03-01'), xmax = as.Date('2020-04-01'),
ymin = min(Number.of.Trips), ymax = max(Number.of.Trips)
), fill = 'orange', color = NA, alpha = 0.009)+
theme_minimal()+
theme(
plot.title = element_text(size = 15, face = "bold"),
) It can be seen that when there is an increase in the population staying at home, there is also a drastic decrease in the number of trips. However, despite the increase in the number of people staying at home, the trend in the number of trips appears to be stable.
To find out the details of the trend in the number of trips in 2020, further visualization is carried out:
country %>%
filter(Year %in% '2020') %>%
ggplot(aes(x= Date, y = Number.of.Trips)) +
geom_line(color="#69b3a2")+
labs(title = "Trend of Number of Trips in 2020",
x = NULL,
y = NULL)+
scale_y_continuous(labels = unit_format(unit = "B", scale = 1e-9))+
geom_rect(
aes(xmin = as.Date('2020-03-01'), xmax = as.Date('2020-04-01'),
ymin = min(Number.of.Trips), ymax = max(Number.of.Trips)
), fill = 'orange', color = NA, alpha = 0.009)+
scale_x_date(labels = date_format('%b'), date_breaks ="1 month")+
theme_minimal()+
theme(
plot.title = element_text(size = 15, face = "bold"),
) Apart from March-April, there was a significant decline at the end of the year of 2020. When did this significant decline occur?
country %>%
filter(Year %in% '2020') %>%
filter(Number.of.Trips == min(Number.of.Trips))A significant decline occurred on December 25, 2020, coinciding with Christmas Day. This could be due to strict travel restrictions in December and instructions to celebrate Christmas at home.
country %>%
ggplot(aes(x=Year, y= Number.of.Trips, fill =factor(Year)))+
geom_boxplot()+
scale_y_continuous(labels = unit_format(unit = 'B', scale = 1e-9))+
scale_x_continuous(breaks = c(2019, 2020, 2021))+
labs(title = 'Comparison of Number of Trips in US',
fill = 'Year',
x=NULL,
y=NULL)+
theme(
plot.title = element_text(size = 15, face = "bold"),
plot.subtitle = element_text(size = 10)
) Although it decreased in 2020, the number of trips began to increase again in 2021 with a median that was almost the same as in 2019.
To get a more detailed change in travel behavior, visualization is carried out to find the state with the highest number of trips before the pandemic. After that, an analysis of travel trends was carried out from 2019 to 2021 for the 3 states with the highest number of trips in 2019.
state %>%
filter(Year %in% '2019') %>%
group_by(State.Postal.Code) %>%
summarise(Total_Trips = sum(Number.of.Trips)) %>%
arrange(desc(Total_Trips)) %>%
slice(1:10) %>%
ggplot(aes(x = Total_Trips, y = reorder(State.Postal.Code, Total_Trips))) +
geom_col(aes(fill = Total_Trips), show.legend = F)+
geom_label(aes(label = paste(round(Total_Trips / 1e9, 1), "B"), hjust = 1.05))+
scale_fill_gradient(low = "#69b3a2", high = "orange")+
labs(title = 'Top 10 State',
subtitle = 'by Number of Trips (2019)',
x = NULL,
y = NULL)+
scale_x_continuous(labels = comma)+
theme(
plot.title = element_text(color = "#69b3a2", size = 15, face = "bold"),
plot.subtitle = element_text(size = 10)
)state %>%
filter(State.Postal.Code %in% c('CA', 'TX', 'NY')) %>%
ggplot(aes(x= Date, y = Number.of.Trips, group = State.Postal.Code,
colour=State.Postal.Code, shape=State.Postal.Code))+
geom_line(size=0.01)+
geom_rect(
aes(xmin = as.Date('2020-03-01'), xmax = as.Date('2020-04-01'),
ymin = min(Number.of.Trips), ymax = max(Number.of.Trips)
), fill = 'orange', color = NA, alpha = 0.009)+
scale_y_continuous(labels = unit_format(unit = 'M', scale = 1e-6))+
scale_color_manual(values=c('orange', "gray", "#69b3a2"))+
labs(title = 'Trend of Top 3 State',
subtitle = 'by Highest Number of Trips',
color = 'State Postal Code',
x=NULL,
y=NULL)+
theme(
plot.title = element_text(color = "#69b3a2", size = 15, face = "bold"),
plot.subtitle = element_text(size = 10)
) Based on the visualization above, travel restrictions due to the Covid-19 case in mid-March greatly impacted the number of trips. California (CA) had the highest trend in the number of trips in 2019 compared to New York (NY) and Texas (TX), but dropped drastically to almost the same number as the other states.
state %>%
filter(Year %in% c('2019', '2021')) %>%
filter(State.Postal.Code %in% c('CA', 'TX', 'NY')) %>%
ggplot(aes(x=State.Postal.Code, y= Number.of.Trips, fill =factor(Year)))+
geom_boxplot()+
scale_y_continuous(labels = unit_format(unit = 'M', scale = 1e-6))+
labs(title = 'Comparison of Number of Trips',
subtitle = '2019 vs 2021',
fill = 'Year',
x=NULL,
y=NULL)+
theme(
plot.title = element_text(color = "#69b3a2", size = 15, face = "bold"),
plot.subtitle = element_text(size = 10)
) Based on the visualization above, it can be seen that the number of trips for New York and Texas in 2021 returns to almost the same number as in 2019. In contrast to California, although the number of trips has increased, the median in 2021 is still much different from 2019.
1. How is travel behavior changing in the US?
Compared to 2019, travel behavior underwent a significant change in mid-March 2020 due to Covid-19. The issuance of a travel restriction policy resulted in a decrease in the number of trips in all US states.
After the vaccine was distributed on December 14, 2020, the number of trips across the state increased again in 2021, some even reaching almost the same number as in 2019. This number is likely to increase further when the pandemic is over due to the high demand to travel after staying at home for so long.
“Fifty-seven percent of people are hoping to travel more when the pandemic is over and the same figure expects travel bookings to be flexible in terms of convenient, penalty-free change and cancellations.”
source: https://www.travelpulse.com/news/features/google-reveals-scale-of-pent-up-travel-demand-post-pandemic-trends.html
2. Recommendation for Further Analysis