Trips by US people(from Jan 2019 to Nov 2021)

This visualizations are created with two datasets from Kaggle.com

(https://www.kaggle.com/ramjasmaurya/trips-by-distancefrom-2019-to-nov-2021/version/2?select=National_trips.csv)
The Trips by Distance data and the number of people staying home and not staying home are estimated from the Bureau of Transportation Statistics . The travel statistics are produced from an anonymized national panel of mobile device data from multiple sources.
All data sources used in the creation of the metrics contain no personal information.

Data analysis is conducted at the aggregate national, state, and county levels. A weighting procedure expands the sample of millions of mobile devices, so the results are representative of the entire population in a nation(USA).

When Covid-19 started to impact the United States and the world overall, I still remember my last day at work, March 19th, 2020. I was working at a hotel, and the phone kept ringing because people wanted to cancel their reservations. We had meeting after meeting to plan our future. The next day, I was out of work, just like millions of us.

My family and I stayed home, but wondering if people in the country did the same. Suppose they were traveling where they were going to and how many times.

There are two datasets I used for this project:

1.National_trips 2.State_trips.

Variables in both datasets are very similar which including
Level:National or states
Date : Date of travel
Population Staying at Home : Number of people staying home
Population Not Staying at Home : Number of people are not staying home
Number of Trips : Total trips during Jan 2019-Nov 2021
Number of Trips <1 : Trips from less than 1 trip
Number of Trips 1-3 : Trips between 1-3
Number of Trips 3-5 : Trips between 2-5
Number of Trips 5-10 : Trips between 5-10
Number of Trips 10-25 : Trips between 10-25
Number of Trips 25-50 : Trips between 25-50
Number of Trips 50-100 : Trips between 50-100
Number of Trips 100-250 : Trips between 100-250
Number of Trips 250-500 : Trips between 250-500
Number of Trips >=500 : Trips 500 or more
Week : Number of week in a year
Month : Number of month in a year

Variables in my datasets were characteristics that I had to convert into date and numerical in order to create time-series charts. I did not use all variables. I will explain in sections which variables were used in each chart.

Question I want to explore

Were people stay home or travel during Covid-19?
If they were traveling, approximately how many trips?
Divide by states, which ones have the highest travel?

Let’s get started!

Loading libraries for visualization

# Libraries

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

#library(TSstudio)
library(ggplot2)
library(scales)
library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v tibble  3.1.4     v purrr   0.3.4
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x readr::col_factor() masks scales::col_factor()
## x purrr::discard()    masks scales::discard()
## x dplyr::filter()     masks stats::filter()
## x dplyr::lag()        masks stats::lag()

library(ggrepel)
library(ggtext)

## Warning: package 'ggtext' was built under R version 4.1.2

library(hrbrthemes)

## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.

##       Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and

##       if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow

library(plotly)

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

#library(streamgraph)
library(viridis)

## Loading required package: viridisLite

## 
## Attaching package: 'viridis'

## The following object is masked from 'package:scales':
## 
##     viridis_pal

Load data

setwd("C:/Users/gru_e/OneDrive/Desktop/DATA110/FinalProject")
national <- read.csv("National_trips.csv")
#Change column names to lowercase letters from capital
names(national)<-tolower(names(national)) 
#Convert date format
national$date <- as.Date(national$date)
#national


#This will show variables and type of data
str(national)

## 'data.frame':    1055 obs. of  18 variables:
##  $ x                             : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ level                         : chr  "National" "National" "National" "National" ...
##  $ date                          : Date, format: "2019-01-01" "2019-01-02" ...
##  $ population.staying.at.home    : chr  "77,433,867" "61,305,201" "63,050,480" "61,803,652" ...
##  $ population.not.staying.at.home: chr  "248,733,553" "264,862,219" "263,116,940" "264,363,768" ...
##  $ number.of.trips               : chr  "897,784,368" "1,139,452,281" "1,162,752,684" "1,181,953,829" ...
##  $ number.of.trips..1            : chr  "241,667,151" "291,276,735" "296,375,014" "293,159,631" ...
##  $ number.of.trips.1.3           : chr  "234,284,795" "285,887,315" "290,074,425" "295,643,296" ...
##  $ number.of.trips.3.5           : chr  "108,078,903" "138,039,296" "140,771,581" "145,251,819" ...
##  $ number.of.trips.5.10          : chr  "129,670,778" "171,637,514" "175,775,410" "181,324,645" ...
##  $ number.of.trips.10.25         : chr  "116,904,343" "167,412,698" "172,027,487" "176,144,493" ...
##  $ number.of.trips.25.50         : chr  "40,432,062" "56,148,976" "57,632,422" "58,761,592" ...
##  $ number.of.trips.50.100        : chr  "15,686,639" "17,739,183" "18,366,626" "19,315,785" ...
##  $ number.of.trips.100.250       : chr  "7,525,563" "7,817,044" "8,124,548" "8,687,318" ...
##  $ number.of.trips.250.500       : chr  "1,806,022" "1,962,301" "2,038,099" "2,096,065" ...
##  $ number.of.trips...500         : chr  "1,728,112" "1,531,219" "1,567,072" "1,569,185" ...
##  $ week                          : int  0 0 0 0 0 1 1 1 1 1 ...
##  $ month                         : int  1 1 1 1 1 1 1 1 1 1 ...

Cleaning data, select, to lower, rename, and converting wide to long data

national1 <- national %>%
  select(date, population.staying.at.home, population.not.staying.at.home, number.of.trips) %>%
  rename(stayhome = population.staying.at.home,
         not_stayhome = population.not.staying.at.home,
         total_trips = number.of.trips)
#national1

#Convert char to dbl
national1$stayhome <- gsub(",","", national1$stayhome)
national1$stayhome <- as.double(national1$stayhome)

national1$not_stayhome <- gsub(",","", national1$not_stayhome)
national1$not_stayhome <- as.double(national1$not_stayhome)


#Convert data wide to long
national_long <- gather(national1, condition, measurement, stayhome:not_stayhome, factor_key=TRUE)

#Calculate Population in Million
nation2 <- national_long %>%
 mutate(
    tripmillion = measurement/1000000)
    #not_stayhome_million = not_stayhome/10000
#nation2

#Rename variables
data_new <- nation2
levels(data_new$condition) <- list("Staying home" = "stayhome",
                                   "Not staying home" = "not_stayhome")

Create Trips by US people from Jan 2019 to Nov 2021

We will illustrate the comparison of folks who stay home and not stay home.

This chart uses three variables:
Date(date): to create time series on x axis
tripmillion(numeric or double): I converted 2 columns into one for number of trips and divided by million for easier looking
condition(char): to indicate whether people stay home or not

#Create chart

p<- ggplot(data_new, aes(x=date, y=tripmillion, group=condition, color=condition)) + labs(colour = "")+
  xlab("Date") +
   ylab("Trips (in million)")+
   labs(colour = "Date")+
  geom_line() + 
    scale_color_viridis(discrete = TRUE) +
    theme(legend.position="none",
      plot.title = element_text(size=14)) +
    ggtitle("Trips by US people from Jan 2019 to Nov 2021") +
    theme_ipsum()
ggplotly(p)

Since we know that U.S. populations who did not stay home seem way more than people who choose to stay home(before, during, and after Covid-19). I want to see comparison between the number of trips from less than 1 trip to more than 500 trips.

#Selecting number of trips to start
national3 <- national %>%
  select(date, number.of.trips, 
         number.of.trips..1,
number.of.trips.1.3, 
number.of.trips.3.5,
number.of.trips.5.10,
number.of.trips.10.25,
number.of.trips.25.50,
number.of.trips.50.100,
number.of.trips.100.250,
number.of.trips.250.500,
number.of.trips...500)

#Convert wide to long for spagethi plot
trip_long <- gather(national3, trips, measure, number.of.trips..1:number.of.trips...500, factor_key=TRUE)


trip_long$measure <- gsub(",","", trip_long$measure)
trip_long$measure <- as.double(trip_long$measure)
#trip_long

#Make the data in millions for easy interpretation
trip_long <- trip_long %>%
 mutate(
    measurebymillion = measure/1000000)

Cleaning some more data for our 2nd chart

#trip_long %>%
data_new2 <- trip_long
levels(data_new2$trips) <- list("Less than 1" = "number.of.trips..1",
                                   "1 to 3" = "number.of.trips.1.3",
                                   "3 to 5" = "number.of.trips.3.5",
                                   "5 to 10" = "number.of.trips.5.10",
                                   "10 to 25" = "number.of.trips.10.25",
                                   "25 to 50" = "number.of.trips.25.50",
                                   "50 to 100" = "number.of.trips.50.100",
                                   "100 to 250" = "number.of.trips.100.250",
                                   "250 to 500" = "number.of.trips.250.500",
                                   "500 or more" = "number.of.trips...500")

#Plot
 g1 <- ggplot(data_new2, aes(x=date, y=measurebymillion, group=trips, color=trips)) +  
  xlab("Date") + 
   ylab("Trips (in million)")+
   labs(colour = "Number of trips")+
    geom_line()   +
    scale_color_viridis(discrete = TRUE) +
    theme(legend.position="none",
      plot.title = element_text(size=10)
    ) +
    ggtitle("U.S. Traveling by number of trips during January 2019 - November 2021") +
    theme_ipsum()+
    theme(panel.spacing = unit(0.1, "lines"),
      plot.title = element_text(size=10)) 
 ggplotly(g1)

There was a drop in in 2020. The graph is clearly indicate that U.S. people keep traveling.

Separate charts by number of trips for a better comparison

#trip_long %>%
pfc <-  ggplot(data_new2, aes(x=date, y=measurebymillion, group=trips, fill=trips)) +
    geom_area() +
  xlab("Date") + 
   ylab("Trips (in million)")+
   labs(colour = "Number of trips")+
    scale_fill_viridis(discrete = TRUE) +
    theme(legend.position="none") +
    ggtitle("U.S. traveling from January 2019 - November 2021") +
    theme_ipsum() +
    theme(
      legend.position="none",
      panel.spacing = unit(0.1, "lines"),
      strip.text.x = element_text(size = 6),
      plot.title = element_text(size=9),
      axis.text.x = element_text(size=5),
      axis.text.y = element_text(size=5)
    ) +
    facet_wrap(~trips)
pfc

## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database

## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database

## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

Surprisingly the number of trips between 5-10 and 10-25 more than trips 3-5, which means people make a lot of trips during this pandemic.

Next, This visualization shows U.S. traveling by States

“Every state has different mandates, different guidelines. Those guidelines are changing pretty regularly and within states often times cities have stricter guidelines than the state itself,” said Bill Walsh, the Vice President of content strategy and communications with AARP.
https://komonews.com/news/coronavirus/covid-restrictions-differ-state-to-state-prompting-questions-ahead-of-holiday-travel

“two states have travel restrictions that governors or state agencies issued in response to the coronavirus (COVID-19) pandemic. Overall, beginning in March 2020, 27 states and Washington D.C. enacted travel restrictions during the pandemic.”
https://ballotpedia.org/Travel_restrictions_issued_by_states_in_response_to_the_coronavirus_(COVID-19)_pandemic,_2020-2021

Many states have ended their restrictions in the middle of 2021. However, there are some states that remain enacted.

Timeline

Below are interstate travel restrictions beginning and end dates, 2020-2021
Alaska (March 11 - February 14, 2021)
Arizona (April 7 - May 12)
Arkansas (May 14 - June 15)
Connecticut (June 25 - March 19, 2021)
Delaware (March 29 - June 1)
Florida (March 24 - August 6)
Hawaii (March 17 - )
Idaho (April 15 - May 15)
Kansas (March 18 - )
Kentucky (March 30 - May 6)
Maine (April 3, 2020 - May 1, 2021)
Maryland (December 17 - March 12, 2021)
Massachusetts (August 1, 2020 - March 22, 2021)
Montana (March 30 - June 1)
New Jersey (June 25 - May 17, 2021)
New Mexico (March 27, 2020 - February 11, 2021)
New York (June 25, 2020 - April 1, 2021)
North Dakota (March 28 - May 8)
Oklahoma (March 29 - May 11)
Pennsylvania (November 20, 2020-March 1, 2021)
Rhode Island (March 28 - July 6, 2021)
South Carolina (March 27 - May 1)
Texas (March 26 - May 21)
Utah (April 8 - June 30)
Vermont (March 30 - May 14, 2021)
Washington D.C. (June 27, 2020 - August 6, 2021)
West Virginia (March 31 - May 21)
Wyoming (April 3 - May 7)

Boxplot is a standardized way of displaying the distribution of data based on a five number summary (“minimum”, first quartile (Q1), median, third quartile (Q3), and “maximum”).
https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51

By using this type of plot, we will be able to analyze each particular state using those five numbers, and compare one to another.

#Load Datasets for states
setwd("C:/Users/gru_e/OneDrive/Desktop/DATA110/FinalProject")

State_trips <- read.csv("State_trips.csv")
names(State_trips)<-tolower(names(State_trips)) 
State_trips$date <- as.Date(State_trips$date)

State_trips$number.of.trips <- gsub(",","", State_trips$number.of.trips)
State_trips$number.of.trips <- as.numeric(State_trips$number.of.trips)
#str(State_trips)

 #head(State_trips)
 
#Make trips in millions
State_trips1 <- State_trips %>%
 mutate(
    tripsinmillion = number.of.trips/1000000)
# A boxplot.
stg <- ggplot(State_trips1, aes(x=as.factor(state.postal.code), y=tripsinmillion, fill=state.postal.code)) + 
    geom_boxplot(alpha=0.8) + 
   ylab("Trips (in million)")+
  scale_fill_viridis(discrete = TRUE) +
    theme(legend.position="none") + 
  
    ggtitle("Travels by State from Jan 2019 to Nov 2021") +
    xlab("State Postal Code") +
  theme(axis.text.x = element_text(angle = -45, hjust=1, size=5)) 

 
 ggplotly(stg)

California seems to be the outstanding state along with New York City and Texas. The minimum travels of California(55.26 million) seem to be more than most maximum travels of most states(about 40 states).

We will pick multiple states that we are interested in comparing, including Washington D.C and Maryland.

State_trips2 <- State_trips1 %>%
 filter(state.postal.code == "DC" | state.postal.code == "CA" | state.postal.code == "MD" | state.postal.code == "MA" | state.postal.code == "FL" | state.postal.code == "NY" | state.postal.code == "WA"| state.postal.code == "TX") 
  #arrange(mean(tripsinmillion))



#State_trips2
# A boxplot.
stg2 <- ggplot(State_trips2, aes(x=as.factor(state.postal.code), y=tripsinmillion, fill=state.postal.code)) + 
    geom_boxplot(alpha=0.8) + 
   ylab("Trips (in million)")+
  scale_fill_viridis(discrete = TRUE) +
  geom_jitter(color="black", size=0.1, alpha=0.2) +
    theme(legend.position="none") + 
  
    ggtitle("Travels by Particular States from Jan 2019 to Nov 2021") +
    xlab("State Postal Code") +
  theme(axis.text.x = element_text(angle = -45, hjust=1, size=5)) 

 
 ggplotly(stg2)

In conclusion, people were traveling even though at one point most states had sheltered in place or stay home order. Surprisingly, many people travel over 5-10 trips during this time. At the end of my visualization, California was the destination people traveled to the most, which kind of surprised me when I thought people would have traveled to Florida or New York city more. The number of travels of states such as California indicates the fluctuation throughout 2019 to almost 2022. When some other states, such as Maine, Maryland, and Washington state seem to be very stable. I wonder what makes this state such a popular state to travel to?

Lastly, I wish I had more time, I would love to get this information mapping and visualize it with GIS.