Since the Covid-19 pandemic began in March 2020, the mass transit system in New York has struggled to maintain the same levels of ridership from pre-pandemic levels. Recently, the MTA has proposed cutting subway service on Fridays and Mondays, arguing that there are less commuters on those days than the other three days during the workweek. For purposes of this analysis, I will focus on the daily amount of riders on the subway, as well as the year-to-year trajectory of subway ridership. This dataset was downloaded from the New York State Data website, and starts from March 1, 2020 until the last date data was collected, February 23, 2023.
Using the MTA dataset, I focused on the the number of estimates commuters based on the day of the week.
df <- read.csv("https://raw.githubusercontent.com/moham6839/Data_607_Proj2_MTA_Ridership/main/MTA_Daily_Ridership_Data__Beginning_2020.csv", check.names=FALSE)
DT::datatable(df)
glimpse(df)
## Rows: 1,090
## Columns: 13
## $ Date <chr> "02/2…
## $ `Subways: Total Estimated Ridership` <int> 34999…
## $ `Subways: % of Comparable Pre-Pandemic Day` <dbl> 0.64,…
## $ `Buses: Total Estimated Ridership` <int> 89869…
## $ `Buses: % of Comparable Pre-Pandemic Day` <dbl> 0.42,…
## $ `LIRR: Total Estimated Ridership` <int> 18843…
## $ `LIRR: % of 2019 Monthly Weekday/Saturday/Sunday Average` <dbl> 0.62,…
## $ `Metro-North: Total Estimated Ridership` <int> 16719…
## $ `Metro-North: % of 2019 Monthly Weekday/Saturday/Sunday Average` <dbl> 0.62,…
## $ `Access-A-Ride: Total Scheduled Trips` <int> 27336…
## $ `Access-A-Ride: % of Comparable Pre-Pandemic Day` <dbl> 0.93,…
## $ `Bridges and Tunnels: Total Traffic` <int> 90632…
## $ `Bridges and Tunnels: % of Comparable Pre-Pandemic Day` <dbl> 1.03,…
I identified 4 NA values in the dataset. The was 1 missing value each in 4 of the columns pertaining to Total Estimated Ridership and % of 2019 Monthly Weekday/Saturday/Sunday Average for the LIRR and MetroNorth.
sum(is.na(df))
## [1] 4
which(is.na(df))
## [1] 6540 7630 8170 9260
apply(is.na(df), 2, which)
## $Date
## integer(0)
##
## $`Subways: Total Estimated Ridership`
## integer(0)
##
## $`Subways: % of Comparable Pre-Pandemic Day`
## integer(0)
##
## $`Buses: Total Estimated Ridership`
## integer(0)
##
## $`Buses: % of Comparable Pre-Pandemic Day`
## integer(0)
##
## $`LIRR: Total Estimated Ridership`
## [1] 1090
##
## $`LIRR: % of 2019 Monthly Weekday/Saturday/Sunday Average`
## [1] 1090
##
## $`Metro-North: Total Estimated Ridership`
## [1] 540
##
## $`Metro-North: % of 2019 Monthly Weekday/Saturday/Sunday Average`
## [1] 540
##
## $`Access-A-Ride: Total Scheduled Trips`
## integer(0)
##
## $`Access-A-Ride: % of Comparable Pre-Pandemic Day`
## integer(0)
##
## $`Bridges and Tunnels: Total Traffic`
## integer(0)
##
## $`Bridges and Tunnels: % of Comparable Pre-Pandemic Day`
## integer(0)
df[540, ]
## Date Subways: Total Estimated Ridership
## 540 09/02/2021 1664178
## Subways: % of Comparable Pre-Pandemic Day Buses: Total Estimated Ridership
## 540 0.29 1188941
## Buses: % of Comparable Pre-Pandemic Day LIRR: Total Estimated Ridership
## 540 0.51 105647
## LIRR: % of 2019 Monthly Weekday/Saturday/Sunday Average
## 540 0.32
## Metro-North: Total Estimated Ridership
## 540 NA
## Metro-North: % of 2019 Monthly Weekday/Saturday/Sunday Average
## 540 NA
## Access-A-Ride: Total Scheduled Trips
## 540 21179
## Access-A-Ride: % of Comparable Pre-Pandemic Day
## 540 0.71
## Bridges and Tunnels: Total Traffic
## 540 851145
## Bridges and Tunnels: % of Comparable Pre-Pandemic Day
## 540 0.89
df[1090, ]
## Date Subways: Total Estimated Ridership
## 1090 03/01/2020 2214601
## Subways: % of Comparable Pre-Pandemic Day Buses: Total Estimated Ridership
## 1090 0.96 1077000
## Buses: % of Comparable Pre-Pandemic Day LIRR: Total Estimated Ridership
## 1090 1.08 NA
## LIRR: % of 2019 Monthly Weekday/Saturday/Sunday Average
## 1090 NA
## Metro-North: Total Estimated Ridership
## 1090 55826
## Metro-North: % of 2019 Monthly Weekday/Saturday/Sunday Average
## 1090 0.59
## Access-A-Ride: Total Scheduled Trips
## 1090 19922
## Access-A-Ride: % of Comparable Pre-Pandemic Day
## 1090 1.13
## Bridges and Tunnels: Total Traffic
## 1090 786961
## Bridges and Tunnels: % of Comparable Pre-Pandemic Day
## 1090 0.98
In order to identify the day of the week corresponding to the date, I converted the Date column into month/date/year (mdy) column.
df$Date <- mdy(df$Date)
new_df <- df
new_df$Day_of_Week <- weekdays(new_df$Date)
DT::datatable(new_df)
new_df <- new_df
new_df$Day_of_Week <- weekdays(new_df$Date)
DT::datatable(new_df)
With the day of the week identified, I relocated the column to the left of the Date column.
new_df2 <- new_df %>%
relocate(Day_of_Week)
DT::datatable(new_df2)
#knitr:: kable(new_df2, "pipe", align=c("l", "c", "c"))
I looked at the overall subway ridership from 2020-2023, as well as the amount of daily riders during this period:
Subway_Overall_Totals <- new_df2 %>%
select(Date, `Subways: Total Estimated Ridership`) %>%
group_by(Date) %>%
summarize(Subway_Ridership_Totals = sum(`Subways: Total Estimated Ridership`))
#arrange(desc(Subway_Ridership_Totals))
DT::datatable(Subway_Overall_Totals)
#knitr::kable(Subway_Overall_Totals, "pipe", align=c("l", "c"))
ggplot(Subway_Overall_Totals, aes(x=Date, y=Subway_Ridership_Totals)) +
geom_bar(stat='identity', position='dodge') +
theme_minimal() +
labs(title="Subway Ridership Overall Totals",
y="Number of Subway Riders (in millions)",
x="Years (2020-2023)")
Subway_Daily_Totals <- new_df2 %>%
select(Day_of_Week, `Subways: Total Estimated Ridership`) %>%
group_by(Day_of_Week) %>%
summarize(Subway_Ridership_Totals = sum(`Subways: Total Estimated Ridership`)) %>%
arrange(desc(Subway_Ridership_Totals))
DT::datatable(Subway_Daily_Totals)
#knitr::kable(Subway_Daily_Totals, "pipe", align=c("l", "c"))
ggplot(Subway_Daily_Totals, aes(x=reorder(Day_of_Week, -Subway_Ridership_Totals), y=Subway_Ridership_Totals)) +
geom_bar(stat='identity', position='dodge', width=0.7) +
theme_minimal() +
labs(title="Subway Ridership Totals",
y="Number of Subway Riders (in 100 millions)",
x="Day of the Week")
I wanted to get a closer look at each year in the dataset. While understanding that two of the years, 2020 and 2023, do not contain a full year’s worth of ridership data, there are trends and patterns that can be identified in those years.
Subway_Yearly_Totals_2020 <- new_df2 %>%
select(Date, Day_of_Week, `Subways: Total Estimated Ridership`) %>%
filter(between(Date, as.Date('2020-03-01'), as.Date('2020-12-31')))%>%
group_by(Date, Day_of_Week) %>%
summarize(Subway_Ridership_Totals = sum(`Subways: Total Estimated Ridership`)) %>%
arrange(desc(Subway_Ridership_Totals))
## `summarise()` has grouped output by 'Date'. You can override using the
## `.groups` argument.
DT::datatable(Subway_Yearly_Totals_2020)
#knitr::kable(Subway_Yearly_Totals_2020, "pipe", align=c("l", "c"))
ggplot(Subway_Yearly_Totals_2020, aes(x=Date, y=Subway_Ridership_Totals)) +
geom_bar(stat='identity', position='dodge') +
theme_minimal() +
labs(title="Subway Ridership Totals",
y="Number of Subway Riders (in millions)",
x="Months(2020)")
Subway_Daily_Totals_2020 <- new_df2 %>%
select(Date, Day_of_Week, `Subways: Total Estimated Ridership`) %>%
filter(between(Date, as.Date('2020-03-01'), as.Date('2020-12-31')))%>%
group_by(Day_of_Week) %>%
summarize(Subway_Ridership_Totals = sum(`Subways: Total Estimated Ridership`)) %>%
arrange(desc(Subway_Ridership_Totals))
DT::datatable(Subway_Daily_Totals_2020)
#knitr::kable(Subway_Daily_Totals_2020, "pipe", align=c("l", "c"))
ggplot(Subway_Daily_Totals_2020, aes(x=reorder(Day_of_Week, -Subway_Ridership_Totals), y=Subway_Ridership_Totals)) +
geom_bar(stat='identity', position='dodge') +
theme_minimal() +
labs(title="Subway Ridership Totals 2020",
y="Number of Subway Riders (in 20 million)",
x="Month")
Subway_Yearly_Totals_2021 <- new_df2 %>%
select(Date, Day_of_Week, `Subways: Total Estimated Ridership`) %>%
filter(between(Date, as.Date('2021-01-01'), as.Date('2021-12-31')))%>%
group_by(Date, Day_of_Week) %>%
summarize(Subway_Ridership_Totals = sum(`Subways: Total Estimated Ridership`))
## `summarise()` has grouped output by 'Date'. You can override using the
## `.groups` argument.
#arrange(desc(Subway_Ridership_Totals))
DT::datatable(Subway_Yearly_Totals_2021)
#knitr::kable(Subway_Yearly_Totals_2021, "pipe", align=c("l", "c"))
ggplot(Subway_Yearly_Totals_2021, aes(x=Date, y=Subway_Ridership_Totals)) +
geom_bar(stat='identity', position='dodge') +
theme_minimal() +
labs(title="Subway Ridership Totals in 2021",
y="Number of Subway Riders (in millions)",
x="Month")
Subway_Daily_Totals_2021 <- new_df2 %>%
select(Date, Day_of_Week, `Subways: Total Estimated Ridership`) %>%
filter(between(Date, as.Date('2021-01-01'), as.Date('2021-12-31')))%>%
group_by(Day_of_Week) %>%
summarize(Subway_Ridership_Totals = sum(`Subways: Total Estimated Ridership`)) %>%
arrange(desc(Subway_Ridership_Totals))
DT::datatable(Subway_Daily_Totals_2021)
#knitr::kable(Subway_Daily_Totals_2021, "pipe", align=c("l", "c"))
ggplot(Subway_Daily_Totals_2021, aes(x=reorder(Day_of_Week, -Subway_Ridership_Totals), y=Subway_Ridership_Totals)) +
geom_bar(stat='identity', position='dodge') +
theme_minimal() +
labs(title="Subway 2021 Ridership Totals",
y="Number of Subway Riders (in 50 million)",
x="Months(2021)")
Subway_Yearly_Totals_2022 <- new_df2 %>%
select(Date, Day_of_Week, `Subways: Total Estimated Ridership`) %>%
filter(between(Date, as.Date('2022-01-01'), as.Date('2022-12-31')))%>%
group_by(Date, Day_of_Week) %>%
summarize(Subway_Ridership_Totals = sum(`Subways: Total Estimated Ridership`))
## `summarise()` has grouped output by 'Date'. You can override using the
## `.groups` argument.
#arrange(desc(Subway_Ridership_Totals))
DT::datatable(Subway_Yearly_Totals_2022)
#knitr::kable(Subway_Yearly_Totals_2021, "pipe", align=c("l", "c"))
ggplot(Subway_Yearly_Totals_2022, aes(x=Date, y=Subway_Ridership_Totals)) +
geom_bar(stat='identity', position='dodge') +
theme_minimal() +
labs(title="Subway 2022 Ridership Totals",
y="Number of Subway Riders (in millions)",
x="Months(2022)")
Subway_Daily_Totals_2022 <- new_df2 %>%
select(Date, Day_of_Week, `Subways: Total Estimated Ridership`) %>%
filter(between(Date, as.Date('2022-01-01'), as.Date('2022-12-31')))%>%
group_by(Day_of_Week) %>%
summarize(Subway_Ridership_Totals = sum(`Subways: Total Estimated Ridership`)) %>%
arrange(desc(Subway_Ridership_Totals))
DT::datatable(Subway_Daily_Totals_2022)
#knitr::kable(Subway_Daily_Totals_2021, "pipe", align=c("l", "c"))
ggplot(Subway_Daily_Totals_2022, aes(x=reorder(Day_of_Week, -Subway_Ridership_Totals), y=Subway_Ridership_Totals)) +
geom_bar(stat='identity', position='dodge') +
theme_minimal() +
labs(title="Subway 2022 Ridership Totals",
y="Number of Subway Riders (in 50 million)",
x="Months(2022)")
Subway_Yearly_Totals_2023 <- new_df2 %>%
select(Date, Day_of_Week, `Subways: Total Estimated Ridership`) %>%
filter(between(Date, as.Date('2023-01-01'), as.Date('2023-02-23')))%>%
group_by(Date, Day_of_Week) %>%
summarize(Subway_Ridership_Totals = sum(`Subways: Total Estimated Ridership`))
## `summarise()` has grouped output by 'Date'. You can override using the
## `.groups` argument.
#arrange(desc(Subway_Ridership_Totals))
DT::datatable(Subway_Yearly_Totals_2023)
#knitr::kable(Subway_Yearly_Totals_2023, "pipe", align=c("l", "c"))
ggplot(Subway_Yearly_Totals_2023, aes(x=Date, y=Subway_Ridership_Totals)) +
geom_bar(stat='identity', position='dodge') +
theme_minimal() +
labs(title="Subway 2023 Ridership Totals",
y="Number of Subway Riders (in millions)",
x="Months(2023)")
Subway_Daily_Totals_2023 <- new_df2 %>%
select(Date, Day_of_Week, `Subways: Total Estimated Ridership`) %>%
filter(between(Date, as.Date('2023-01-01'), as.Date('2023-02-23')))%>%
group_by(Day_of_Week) %>%
summarize(Subway_Ridership_Totals = sum(`Subways: Total Estimated Ridership`)) %>%
arrange(desc(Subway_Ridership_Totals))
DT::datatable(Subway_Daily_Totals_2023)
#knitr::kable(Subway_Daily_Totals_2023, "pipe", align=c("l", "c"))
ggplot(Subway_Daily_Totals_2023, aes(x=reorder(Day_of_Week, -Subway_Ridership_Totals), y=Subway_Ridership_Totals)) +
geom_bar(stat='identity', position='dodge') +
theme_minimal() +
labs(title="Subway Ridership Totals in 2023",
y="Number of Subway Riders (in 10 million)",
x="Months(2023)")
When examining the daily totals in each year, the pandemic severely depressed ridership. The level of ridership has not recovered to where it was during the first two weeks of March 2020, where over 5 million commuters were riding the subway. By the end of 2020, ridership was below 2 million. While there have been steady increases in ridership since 2020, the ability to work from home has contributed to the overall decline in ridership since its pre_pandemic peak. Based on the current data, ridership is on the edge of hitting 4 million riders, which is about a 1-1.5 million less than the peak reached in early March 2020
The overall days of the workweek show that Wednesday, Thursday, and Tuesday have the most subway riders, followed by Friday and Monday. In 2021, Fridays had the 2nd-most subway riders, trailing Wednesday by a close margin. The trend changed in 2022, as ridership declined on Fridays, falling behind Wednesday, Thursday, and Tuesday. So far in 2023, the decline of ridership on Fridays has continued, as ridership has lagged behind Wednesdays, Thursdays, and Tuesdays, and ahead of Mondays by only a small margin. These recent statistical trends could be a sign of things to come, as people continue to use the subway system infrequently, as the ability to work from home or attend school online becomes more widespread.
While cutting service on Fridays and Mondays would potentially extend the commute times for millions of people in New York, the number of riders over the last year has decreased on those days, in particular on Fridays. More studies can be done analyzing the reasons for the decline in ridership on those particular days, and seek solutions that will prevent longer wait times and commutes from occurring as a result of the service cuts.