Electric Vehicles and the Future of Highway Funding: Impacts on Fuel Tax Revenue and Infrastructure Needs in the United States

What relationships exist between electric vehicle adoption and highway maintenance funding/needs in the United States?

Author

Eric Lehman, Lydia Ko, Sam Evans, & Hailey Kopp

Published

December 7, 2025

Introduction

Code
#Cleaning Data 
lines <- unlist(strsplit(txt, "\n"))

data_lines <- lines[str_detect(lines, "^[ ]*[A-Za-z]{3}-[0-9]{2}")]

split_rows <- str_split(str_squish(data_lines), " ")

trimmed_rows <- lapply(split_rows, function(x) x[1:5])

df <- as.data.frame(do.call(rbind, trimmed_rows), stringsAsFactors = FALSE)

colnames(df) <- c("MonthYear", "BEV", "PHEV", "HEV", "TotalLDV")

df$Year <- as.numeric(paste0("20", substr(df$MonthYear, 5, 6)))

df$BEV  <- as.numeric(gsub(",", "", df$BEV))
df$PHEV <- as.numeric(gsub(",", "", df$PHEV))

# Not including the hybrids in this calculation
df$EV <- df$BEV


ev_yearly <- df %>%
    group_by(Year) %>%
    summarise(EV_Sold = sum(EV, na.rm = TRUE))

#write.csv(ev_yearly, "data_processed/ev_yearly.csv", row.names = FALSE)

View(ev_yearly)

ev_yearly %>%  
    ggplot(aes(x = Year, y = EV_Sold)) +
    geom_point(alpha = 0.6, color = "gray50") +
    geom_smooth(se = FALSE, method = "loess", color = "black", linewidth = 1) +
    scale_y_continuous(
        labels = comma,
        breaks = scales::extended_breaks()(ev_yearly$EV_Sold)  # auto extend
    ) +
    theme_minimal() +
    labs(
        x = "Year",
        y = "EV's Sold",
        title = "EV Sales by Year in the US"

    ) +
    theme(plot.title = element_text(hjust = 0.5))

Figure 1 represents how EV sales are changing over time.

Electric vehicles (EVs) have rapidly transformed the U.S. transportation landscape over the past decade, driven by growing consumer demand and federal initiatives to reduce greenhouse gas emissions, demonstrated by Figure 1. According to the U.S. Department of Energy’s Alternative Fuels Data Center, EV registrations have surged [1], signaling a nationwide shift toward cleaner mobility. This transition delivers clear environmental benefits but simultaneously introduces urgent challenges for highway infrastructure and transportation funding.

Code
# Cleaning Data 
colnames(raw) <- raw[2, ]

df <- raw[-c(1, 2), ]



names(df)[1] <- "Date"

df$Date <- as.Date(as.numeric(df$Date), origin = "1899-12-30")

df <- df %>% 
    filter(Date >= as.Date("1990-08-20"))


df$GasPrice <- as.numeric(df[[2]])

df$Year <- format(df$Date, "%Y")
df$Year <- as.numeric(df$Year)


gas_yearly <- df %>%
    group_by(Year) %>%
    summarise(MedianPrice = median(GasPrice, na.rm = TRUE))

gas_yearly %>%
    ggplot(aes(x = Year, y = MedianPrice)) +
    geom_point(color = "gray50", size = 2) +
    geom_smooth(method = "loess", se = FALSE, color = "black") +
    scale_x_continuous(
        breaks = seq(min(gas_yearly$Year), max(gas_yearly$Year), by = 5)
    ) +
    scale_y_continuous(
        breaks = seq(0, 4, by = 0.5),
        limits = c(0, 4)
    ) +
    theme_minimal() +
    labs(
        title = "Median U.S. Gas Price per Year",
        x = "Year",
        y = "Median Gas Price (USD)",
        caption = "Figure 2"
    ) +     theme(plot.title = element_text(hjust = 0.5))

Code
#write.csv(gas_yearly, "data_processed/gas_yearly.csv", row.names = FALSE)

Figure 2 represents how gas prices are changing over time.

Figure 1 illustrates, EV sales in the US have surged dramatically since 2020, signaling a nationwide shift toward cleaner mobility. This transition, which occurs against a backdrop of fluctuating but relatively stable gasoline prices as shown in Figure 2, delivers clear environmental benefits but simultaneously introduces urgent challenges for highway infrastructure and transportation funding.

The United States finances most highway maintenance through gasoline and diesel taxes. As EV adoption accelerates, fuel consumption—and therefore fuel tax revenue—continues to decline. Meanwhile, EVs typically weigh 20–30% more than conventional vehicles due to their large battery packs [2], increasing stress on pavements, bridges, and other structures. Together, these trends—shrinking revenue and rising infrastructure wear—intensify pressure on existing funding systems. For example, Michigan lost about $50 million in fuel tax revenue between 2019 and 2021 as EV use increased, contributing to an estimated $105 billion in deferred road and bridge maintenance costs across the state [3]. Since roughly 85% of Highway Trust Fund revenues come from fuel taxes, this funding model faces an unsustainable future [4].

This project analyzes the intersection of EV adoption and highway maintenance funding to highlight these unintended consequences and to support data-driven policy decisions. By examining both national and state-level patterns, we aim to provide insights that help shape future transportation funding strategies, ensuring the long-term financial and structural stability of U.S. roadways.

Research question

What relationships exist between electric vehicle adoption and highway maintenance funding/needs in the United States?

Data Sources

EV Adoption Rate (ev_yearly.csv - used in introduction graph 1)

Data from the Argonne National Laboratories (ANL)[11] was used in this study, which is majorly U.S. funded. ANL is sponsored by the U.S. department of energy, and the site is in the government domain. The site contains additional information about how they collected their data and supplemental resources. This data is pre-processed as they do not publish manufacturer information directly but they receive data from automakers. ANL receives data from automakers and manufacturers and energy information from the DOE. ANL may have missed some data as manufacturers may have missed data, but for our purposes we are looking for an overall trend. There could be potential bias in that EV types are mislabeled, but we are only using this data for an overall trend.

Gasoline Prices over time: (pswrgvwall.xls - used in introduction graph 2)

Data from the U.S. Energy Information Administration (EIA), to analyze the trend of gasoline prices over time [12], was used in this study. This source is part of the U.S. government department of energy. The EIA is the official statistical agency for energy in the U.S. is responsible for publishing energy data, so the data is held to a high standard. They include lots of documentation about how the data is collected, including standard error reports. This data is pre-processed, where the EIA collects data from gas stations and distributors and aggregate and clean the data. This may contain missing data as some gas stations may not report, but the data is being utilized to demonstrate a general trend over multiple years. There could also be non response bias and sampling bias as not every gas station may report their data and not every gas station in the U.S. is sampled. 

State Gas Tax Data (Charts 1 & 2)

Data from the U.S. Energy Information Administration (EIA) article “Federal and State Motor Fuel Taxes” [5]. was used in this study. This dataset reports gasoline and diesel excise taxes, Leaking Underground Storage Tank (LUST) fees, and related state or federal fees for every U.S. state and territory from July 2019 through July 2025.

For this analysis, we used a cleaned version of the dataset. We converted data types, renamed variables for clarity, and removed non-numeric annotations. We did not perform any recalculations, imputations, or estimations to maintain the integrity of the original data. The EIA, a division of the U.S. Department of Energy, compiles this information from state departments of transportation and revenue agencies and validates it against Federal Highway Administration (FHWA) data. These agencies provide authoritative, transparent, and complete datasets covering all 50 states, with no bias or missing entries.

This dataset serves as a reliable foundation for analyzing how EV adoption affects gasoline consumption and transportation funding. The cleaned dataset includes two essential variables—state and total_tax—expressed in dollars per gallon.

Vehicle Registrations by State

To identify the number of electric vehicles on the road we used a data set which is published, and produced, by the U.S. Energy and Information Administration which breaks down the number of vehicles registered in every US state, by powertrain type, from 2016 to 2023 [6]. We feel confident that this data is accurate as it is published by a trusted entity within the US federal government and there are no signs that the data has been improperly recorded (such as missing values). To make the data more usable, we put the data into a tidy format and removed variables not needed for our analysis; however, we did not alter the values of the data. A detailed data dictionary of the cleaned data is presented below.

Miles Driven per Year (Chart 2)

We used the paper “Quantifying Electric Vehicle Mileage in the United States” by Lujin Zhao et al., published in Joule [7], which estimates that EVs average 7,165 miles per year. The authors used data from marketcheck.com, a real-time automotive data provider. Because the paper was authored by experts in the field and published in a reputable journal, we consider the figure credible.

Average Miles per Gallon, Gas Vehicles (Chart 2)

We obtained average fuel efficiency data for light-duty gasoline vehicles from the U.S. Department of Transportation [8]. The data originates from the state-reported Highway Performance and Monitoring System (HPMS), a high-risk subject area that directly affects federal funding allocations and therefore undergoes regular verification.

From this dataset, we recorded the 2022 value for light-duty vehicles, short-wheelbase, defined as passenger cars, light trucks, vans, and SUVs with a wheelbase of 121 inches or less.

Population Data by State (Chart 2)

We retrieved state population data for 2016–2023 from the Federal Reserve Bank of St. Louis (FRED) [9]. Although the U.S. Census Bureau produces the underlying data, FRED aggregates it into a more user-friendly format. FRED’s federal credibility and the Census Bureau’s data accuracy ensure reliability.

Consumer Price Index (CPI) Data

We used Consumer Price Index (CPI) data from 2010 to 2024, obtained from FRED [10]. 1982 is the base year. Although FRED aggregates the data, the U.S. Census Bureau remains its source. We consider this data highly reliable.

Results

Code
reg_long <- registration_data %>%
    pivot_longer(names_to = 'type',
                 values_to = 'total',
                 cols = EV:Diesel
    ) %>% 
    mutate(total = (total/1e6))

#write.csv(reg_long, "data_processed/reg_clean.csv", row.names = FALSE)

# Create a tibble containing the states, regions, and divisions. Had to add DC in manually
region_ref <- tibble(
    state = c(state.name, 'District of Columbia'),
    region = c(as.character(state.region), 'South'),
    division = c(as.character(state.division), 'South Atlantic'))

# Create a df which joins my regions tibble to my registration data
reg_regions <- reg_long %>%
    left_join(region_ref, by = 'state') %>%
    mutate(region = if_else(region == 'North Central', 'Midwest', region)) %>% 
    select(year, state, region, division, type, total)

reg_regions1 <- reg_regions %>% 
  group_by(state, region, division, type) %>% 
  summarise(total = sum(total, na.rm = TRUE), .groups = 'drop')



map_data1 <- us_map(exclude = 'PR') %>%
    left_join(reg_regions1, by = c('full' = 'state')) %>% 
    rename(state = full)



map_colors <- c(
  # Update this to change the colors for both plots!!!
    
  "#2F4F2F",  # Dark green
  "#4F7942",  # Medium green
  "#A3B18A",  # Light green
  "#1E3A5F",  # Dark blue
  "#3A5F82",  # Medium blue
  "#8DA7BE",  # Light blue
  "#5A2A27",  # Dark red
  "#8B4A39",  # Medium red
  "#C97E63"   # Light red
)

gw_buff <- "#f8e08e"
gw_blue <- "#033c5a"

ip_bar <- map_data1 %>%
  filter(type == "EV", state != "United States") %>%
  group_by(division) %>%
  summarise(total = sum(total, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(total)) %>%
  ggplot(aes(x = total, y = reorder(division, total))) +
  geom_col_interactive(
    aes(
      data_id = division,
      tooltip = paste0(division, "<br>Total: ", scales::comma(total)),
    fill = division), show.legend = FALSE) +
  scale_fill_manual(values = map_colors) +
  labs(title = "EV Registrations by Division",
       subtitle = 'Total in Millions',
       x = NULL,
       y = NULL) +
  theme_minimal(base_size = 8) +
      theme(
        plot.title =  element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5, face = 'italic', size = 6),
        plot.background = element_blank(),
        panel.background = element_blank(),
        panel.grid = element_blank(),
        axis.ticks = element_blank(),
        axis.line = element_blank())

# --- Map plot: hover shows state + total ---
ip_map <- map_data1 %>%
  filter(type == "EV") %>%
  ggplot() +
  geom_sf_interactive(
    aes(data_id = division,
        tooltip = NA), alpha = 0, color = NA, linewidth = 0) +
  geom_sf_interactive(
    aes(
      fill = division,
      data_id = state,
      tooltip = paste0(state, "<br>Total: ", scales::comma(total*1e06))
    ),
    color = "white", linewidth = 0.3, show.legend = FALSE
  ) +
  coord_sf() +
  scale_fill_manual(values = map_colors, na.value = "grey90") +
  labs(title = "EV Registrations by State") +
  theme_void() +
  theme(legend.position = "none",
        plot.title = element_text(size = 10, hjust = 0.5))

# --- Combine and make interactive ---
combined_plot <- (ip_bar) / ip_map + patchwork::plot_layout(heights = c(1, 2))

interactive_plot <- girafe(ggobj = combined_plot)
interactive_plot <- girafe_options(
  interactive_plot,
  opts_hover(css = sprintf("fill:%s;stroke:%s;stroke-width:1px;", gw_buff, gw_blue)),
  opts_hover_inv(css = "opacity:0.2;"),
  opts_selection(type = "none")
)

interactive_plot

We begin our analysis by examining where electric vehicles are most concentrated in the United States. This chart highlights the stark regional differences in EV adoption and shows that growth has been highly uneven across the country. The Pacific division leads by a wide margin, primarily because of California, Washington, and Oregon. These states combine strong environmental policies, higher fuel taxes, and extensive charging infrastructure, which together support faster EV adoption. The South Atlantic and Mountain divisions follow with moderate but steadily increasing adoption rates. In contrast, regions such as the East South Central and West North Central divisions remain at the lowest levels of EV penetration, consistent with larger rural populations, lower fuel taxes, fewer incentives, and longer average driving distances. The accompanying state-level map reinforces these disparities by showing clusters of high EV adoption along the coasts and significantly lower adoption throughout the central United States. This geographic context provides an essential foundation for interpreting the patterns observed in later charts related to fuel taxes and revenue impacts.

Code
 read_one <- function(sh, take = 1L) {
   raw <- read_excel(file_path, sheet = sh, col_names = FALSE)
   hdr_row <- which(apply(raw, 1, function(r)
     any(str_detect(as.character(r), regex("^\\s*Total\\s*State(\\[3\\])?$", ignore_case = TRUE)))))
   if (length(hdr_row) == 0) return(tibble(State = character(), Total_State = numeric(), Year = integer()))
   hdr_row <- hdr_row[1]
   
   hdr_vals <- as.character(unlist(raw[hdr_row, ]))
   total_idx <- which(str_detect(hdr_vals, regex("^\\s*Total\\s*State(\\[3\\])?$", ignore_case = TRUE)))
   if (length(total_idx) == 0) return(tibble(State = character(), Total_State = numeric(), Year = integer()))
   
   total_col <- total_idx[pmin(take, length(total_idx))]
   
   dat <- raw[(hdr_row + 1):nrow(raw), c(1, total_col)]
   names(dat) <- c("State", "Total_State")
   
   dat %>%
     mutate(
       State = as.character(State),
       State = str_trim(str_replace_all(State, "\\[.*?\\]", "")),
       Total_State = as.numeric(str_replace_all(as.character(Total_State), "[$,\\s]", "")),
       Year = as.integer(str_extract(sh, "\\d{4}"))
     ) %>%
     filter(!is.na(State), !is.na(Total_State), !is.na(Year)) %>%
     filter(!str_detect(State, regex("Average|Federal|Notes|Updated", ignore_case = TRUE)))
 }
 
 fuel_tax_data <- map_dfr(sheets, ~ read_one(.x, take = 1L)) %>%
   arrange(Year, State)

#write_csv(fuel_tax_data, "fuel_tax_data.csv")

fuel_tax_data <- (read_csv(here("data_processed","fuel_tax_data.csv")))

fuel_tax_data_yearly_mean <- fuel_tax_data %>%
  group_by(State, Year) %>%
  summarize(mean_fuel_tax = mean(Total_State)) %>%
  filter(!str_starts(State, fixed("*")))

#scraping data 

 years <- 2016:2023
 
 get_the_data <- function(year){
     root <- 'https://afdc.energy.gov/vehicle-registration?year='
     url <- paste0(root, year)
     data <- read_html(url) %>%
         html_table(fill = T)
     data <- as.data.frame(data)
      df <- data %>%
     mutate(across(everything(), as.character)) %>%
     mutate(Year = year)
 return(df)
 }
 
 
 
 scraped_vehicle_registration_data_by_year <- map_df(years,get_the_data)

 #write_csv(scraped_vehicle_registration_data_by_year, "scraped_vehicle_registration_data_by_year.csv")

scraped_vehicle_registration_data_by_year <- read_csv(here("data_processed", "scraped_vehicle_registration_data_by_year.csv"))

fuel_tax_data_yearly_mean <- fuel_tax_data %>%
  group_by(State, Year) %>%
  summarize(mean_fuel_tax = mean(Total_State)) %>%
  filter(!str_starts(State, fixed("*")))

state_vehicle_registrations_and_fuel_tax_by_year <- left_join(scraped_vehicle_registration_data_by_year, fuel_tax_data_yearly_mean, by = c("Year", "State"))
# write.csv(state_vehicle_registrations_and_fuel_tax_by_year,"state_vehicle_registrations_and_fuel_tax_by_year.csv")

state_vehicle_registrations_and_fuel_tax_by_year <- state_vehicle_registrations_and_fuel_tax_by_year %>% 
  mutate(mean_total_state_fuel_tax_gasoline_in_dollars = mean_fuel_tax) %>% 
  select(-mean_fuel_tax)


# From Helveston's paper average BEV drives 7,165 miles 
# Source: https://www.sciencedirect.com/science/article/pii/S254243512300404X



# Going to assume average weighted fuel efficiency of 24.8 mpg. Source
# https://afdc.energy.gov/data/10310


# Creating data set with the average loss in tax revenue by year 2017 to 2023


lost_revenue <- state_vehicle_registrations_and_fuel_tax_by_year %>% 
  mutate(Year = as.numeric(Year)) %>% 
  filter(State != "United States") %>% 
  filter(State != "District of Columbia") %>% 
  select(State, Electric..EV., Year, mean_total_state_fuel_tax_gasoline_in_dollars) %>% 
  arrange(State, Year) %>% 
  group_by(State) %>%
  mutate(loss_in_gas_tax_revenue = ((dplyr::lag(Electric..EV.) * 7165 + Electric..EV. * 0.5 * 7165)/24.8)*mean_total_state_fuel_tax_gasoline_in_dollars) %>% 
  mutate(Year = as.double(Year)) %>% 
  filter(Year != 2016)

# Getting state population by year 
# Source is FRED

annual_pop_by_state <- read_csv(here("data_raw", "annual.csv")) 

annual_pop_by_state_longer <- annual_pop_by_state %>% 
  pivot_longer(cols = -State, names_to = "Year") %>% 
  mutate(State = state.name[match(State, state.abb)]) %>% 
  mutate(Year = as.double(Year))

# combining the pop to the lost revenue data: 



lost_revenue_with_pop <- lost_revenue %>% 
  left_join(annual_pop_by_state_longer, by = c("State", "Year")) %>% 
  mutate(Population_in_1ks = value) %>% 
  select(-value)


# calculating per capita loss in gas tax revenue from EV Adoption

lost_revenue_with_pop_per_cap <- lost_revenue_with_pop %>% 
  arrange(State, Year) %>% 
  group_by(State) %>% 
  mutate(Lost_revenue_per_1k_people = loss_in_gas_tax_revenue/(Population_in_1ks))

# Formatting output into real dollars (2024 dollars) [from FRED]:

cpi_data <- read_csv(here("data_raw", "CPIAUCSL(1).csv")) %>% 
  mutate(observation_date = year(observation_date)) %>% 
  select(Year = observation_date,CPIAUCSL) %>% 
  mutate(convert_2024_dollars = 313.698/CPIAUCSL) %>% 
  select(convert_2024_dollars, Year)

lost_revenue_with_pop_per_cap <- left_join(lost_revenue_with_pop_per_cap, cpi_data,'Year' ) 


lost_revenue_with_pop_per_cap <- lost_revenue_with_pop_per_cap%>% 
  mutate(real_lost_revenue_per_1k = Lost_revenue_per_1k_people * convert_2024_dollars) %>% 
  select(Year, State, real_lost_revenue_per_1k)

# Create ranking per year
ranked_data <- lost_revenue_with_pop_per_cap %>%
  group_by(Year) %>%
  mutate(rank = rank(-real_lost_revenue_per_1k),
         value_lbl = paste0("$", round(real_lost_revenue_per_1k, 1))) %>%
  ungroup()


comparing_2016 <- state_vehicle_registrations_and_fuel_tax_by_year %>% 
  filter(Year == 2016) %>% 
  filter(State != "United States") %>% 
  select(State, mean_total_state_fuel_tax_gasoline_in_dollars_2016 = mean_total_state_fuel_tax_gasoline_in_dollars)

comparing_2023 <- state_vehicle_registrations_and_fuel_tax_by_year %>% 
  filter(Year == 2023) %>% 
  filter(State != "United States") %>% 
  select(State, mean_total_state_fuel_tax_gasoline_in_dollars_2023 = mean_total_state_fuel_tax_gasoline_in_dollars)



combined_2016_2023 <- bind_cols(comparing_2023, comparing_2016) 

combined_2016_2023 <- combined_2016_2023 %>% 
  select(-State...3, State = State...1, mean_total_state_fuel_tax_gasoline_in_dollars_2023, mean_total_state_fuel_tax_gasoline_in_dollars_2016) %>% 
  mutate(difference_2016_2023 = mean_total_state_fuel_tax_gasoline_in_dollars_2023 - mean_total_state_fuel_tax_gasoline_in_dollars_2016)

comparing_2023_2016 <- state_vehicle_registrations_and_fuel_tax_by_year %>% 
  filter(Year %in% c(2016, 2023)) %>% 
  filter(State != "United States")

# View(comparing_2023_2016)
# View(combined_2016_2023)

for_gg <- comparing_2023_2016 %>% 
  left_join(combined_2016_2023, by = "State")

for_gg %>% 
  ggplot(aes(x= mean_total_state_fuel_tax_gasoline_in_dollars, y = reorder(State, mean_total_state_fuel_tax_gasoline_in_dollars_2023))) +
  geom_line(aes(group = State), color = "black")+ 
    
  geom_point(aes(color = as.factor(Year))) +
  scale_color_manual(
    values = c(
      "2016" = "#AC9D6E",
      "2023" = "#002654"
    )
  )+
    labs(
    x = " Total State Gas Tax (USD per Gallon)", 
    y = "State",
    title = "Gas Tax Rates in 2016 vs 2023 in The United States", 
    color =  "Year"
  )+ 
  scale_x_continuous(labels = scales::dollar_format())+
  theme_minimal(base_size = 12) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank(),
    text       = element_text(color = "black"),
    axis.text  = element_text(color = "black"),
    plot.title = element_text(face = "bold", hjust = .5),
    plot.subtitle = element_text(hjust = 0.5),
    legend.position = "top",
    legend.title = element_text(color = "black", face = "bold"),
    legend.text = element_text(color = "black")
  )

Our first expectation was that fuel tax rates would vary substantially across states due to differing policy priorities, and the 2023 data strongly supports this. Combined state and federal gasoline tax rates ranged from only $0.09 per gallon in Alaska to over $0.70 in California and Illinois, confirming the wide regulatory gap. This chart also demonstrates how these gaps have widened since 2016. States with already high tax rates, such as California, Pennsylvania, and Washington, implemented some of the largest increases over the past eight years. Meanwhile, states with historically low rates, including Alaska, Mississippi, and Missouri, made minimal or no adjustments during the same period. These patterns illustrate how policy choices diverge between states prioritizing environmental goals and revenue generation and those focused on keeping consumer fuel costs low.

This widening spread also establishes a foundation for understanding the financial impacts of EV adoption. Because fuel taxes directly support transportation infrastructure, states with higher rates face greater revenue exposure as drivers shift away from gasoline.

Code
fuel_2023 <- fuel_2023_jul_raw %>%
  # 1) Remove first 7 rows
  slice(-(1:7)) %>%
  # 2) Remove last 6 rows
  slice(1:(n() - 6)) %>%
  # 3) Keep only column 1 and 4, and rename them
  select(state = 1, total_tax = 4) %>%
  # 4) Clean state names: remove [brackets], numbers, and extra spaces
  mutate(
    state = str_remove_all(state, "\\[[^\\]]*\\]"),  # remove [ ... ]
    state = str_remove_all(state, "[0-9]"),          # remove any digits
    state = str_trim(state)                          # remove leading/trailing spaces
  ) %>%
  # 5) Remove non-state rows
  filter(!state %in% c("District of Columbia",
                       "American Samoa",
                       "Guam",
                       "Northern Mariana Islands",
                       "Puerto Rico",
                       "U.S. Virgin Islands")) %>%
  # 6) Convert total_tax to numeric and round to 4 decimals
  mutate(
    total_tax = as.numeric(total_tax),
    total_tax = round(total_tax, 4)
  )



state_vehicle_data_subset <- state_vehicle_data %>%
  select(State, Electric..EV.) %>%
  slice(366:415)


### 1. Clean EV data (from state_vehicle_data_subset)
ev_2023 <- state_vehicle_data_subset %>%
  rename(
    state = State,
    EV_Count = Electric..EV.
  ) %>%
  mutate(
    state = str_trim(state),
    EV_Count = as.numeric(EV_Count)
  )

#write.csv(ev_2023, "data_processed/ev_2023_clean.csv", row.names = FALSE)


### 2. Clean fuel tax data (from fuel_2023)
fuel_2023_clean <- fuel_2023 %>%
  mutate(
    state = str_trim(state),
    total_tax = as.numeric(total_tax)
  ) %>%
  filter(!is.na(total_tax))

#write.csv(fuel_2023_clean, "data_processed/fuel_2023_clean.csv", row.names = FALSE)

### 2. Clean fuel tax data (from fuel_2023)
fuel_2023_clean <- fuel_2023 %>%
  mutate(
    state = str_trim(state),
    total_tax = as.numeric(total_tax)
  ) %>%
  filter(!is.na(total_tax))

### 3. Find top 5 and bottom 5 states by EV registrations
top5_states <- ev_2023 %>%
  arrange(desc(EV_Count)) %>%
  slice(1:5) %>%
  pull(state)

bottom5_states <- ev_2023 %>%
  arrange(EV_Count) %>%
  slice(1:5) %>%
  pull(state)

plot_data <- fuel_2023_clean %>%
  left_join(ev_2023, by = "state") %>%
  mutate(
    EV_Group = case_when(
      state %in% top5_states    ~ "Top 5 EV States",
      state %in% bottom5_states ~ "Bottom 5 EV States",
      TRUE                      ~ "Other States"
    )
  )


# --- Create lookup table for state abbreviations ---
state_lookup <- data.frame(
  state = state.name,
  abb   = state.abb,
  stringsAsFactors = FALSE
)

# --- rebuild plot_data with adjusted EV_Count and state abbreviations ---
plot_data <- fuel_2023_clean %>%
  left_join(ev_2023, by = "state") %>%
  left_join(state_lookup, by = "state") %>%     # add abbreviations
  filter(!is.na(EV_Count)) %>%                  # <--- drop missing EV counts
  mutate(
    abb = if_else(is.na(abb), state, abb),      # fallback if a name doesn't match
    EV_Group = case_when(
      state %in% top5_states    ~ "Top 5 EV States",
      state %in% bottom5_states ~ "Bottom 5 EV States",
      TRUE                      ~ "Other States"
    ),
    EV_Count_adj = if_else(EV_Count <= 0, 0.1, EV_Count)
  )

highlight_data <- plot_data %>% 
  filter(EV_Group != "Other States")   # Only Top5 & Bottom5 get labels

ggplot() +
  geom_point(
    data = filter(plot_data, EV_Group == "Other States"),
    aes(x = total_tax, y = EV_Count_adj),
    color = "grey75",
    size  = 2.3,
    alpha = 0.7
  ) +
  geom_point(
    data = highlight_data,
    aes(x = total_tax, y = EV_Count_adj, color = EV_Group),
    size = 3
  ) +
  geom_text_repel(
    data = highlight_data,
    aes(x = total_tax, y = EV_Count_adj, label = abb, color = EV_Group),
    size = 3.4,
    fontface = "bold",
    box.padding = 0.3,
    point.padding = 0.2,
    segment.color = NA,
    max.overlaps = Inf
  ) +
  scale_x_continuous(
    name   = "Total Fuel Tax (USD per gallon)",
    breaks = seq(0.1, 0.7, 0.1),
    labels = label_number(accuracy = 0.001)
  ) +
  scale_y_continuous(
    name   = "Number of EV Registrations (log scale)",
    trans  = "log10",
    breaks = c(1e3, 1e4, 1e5, 1e6),
    labels = function(x) format(x, big.mark = ",", scientific = FALSE)
  ) +
  scale_color_manual(
    values = c(
      "Top 5 EV States"    = "#AC9D6E",
      "Bottom 5 EV States" = "#002654"
    ),
    guide = guide_legend(
      title = NULL,
      override.aes = list(size = 4)
    )
  ) +
  labs(
    title = "EV Registrations vs Gasoline Tax by State (2023)"
  ) +
  theme_minimal(base_size = 11) +
  theme(
    legend.position = "top",
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank(),
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5)
  )

We also hypothesized that states with higher EV adoption would tend to maintain higher fuel tax rates, either as a policy strategy or as a reflection of broader environmental priorities. This chart shows a clear, consistent trend supporting this expectation. States with the highest EV registrations, including California, Washington, Florida, New Jersey, and Texas, cluster on the higher end of total fuel tax rates. California stands out with both the highest tax rate and the highest number of EVs, while states such as Washington and New Jersey follow similar patterns.

Conversely, the states with the fewest EVs, including Alaska, North Dakota, South Dakota, Wyoming, and West Virginia, lie at the lower end of the tax spectrum. These results suggest that EV adoption and fuel taxation policies likely stem from shared underlying political, economic, and environmental factors. Higher-tax states appear more willing to encourage EV adoption through broader climate-oriented policy frameworks, while low-tax states show slower EV growth and fewer complementary incentives.

Or Click here to open the Shiny app in a new window

We examined the financial implications of increasing EV adoption, focusing on lost gasoline tax revenue. As EV use increases, states collect less revenue per mile traveled, and because 85% of Highway Trust Fund revenue depends on fuel taxes, this poses significant challenges for road maintenance funding.

This chart visualizes per-capita and total revenue losses for every state from 2017 to 2023. Again, California stands out: in 2023, the state lost approximately $7,743 per 1,000 residents, equivalent to $303.5 million in total foregone revenue (in 2024 dollars). States with high EV adoption consistently show larger losses, reflecting both reduced gasoline usage and their higher baseline fuel tax rates.

In contrast, states such as Mississippi, West Virginia, and North Dakota show minimal revenue losses. Mississippi’s loss per 1,000 residents—almost 99 times smaller than California’s, aligns with its combination of low EV adoption and one of the lowest fuel tax rates in the country. These findings confirm our expectation that states adopting EVs more rapidly face larger short-term fiscal pressures on their transportation budgets.

Conclusion

Overall, the results indicate a clear and interconnected relationship between state fuel tax policy, EV adoption, and the financial sustainability of highway funding. States with higher fuel tax rates tend to have higher EV adoption, experience greater declines in gasoline consumption, and therefore lose more transportation revenue per resident. These fiscal impacts are not evenly distributed. California, Washington, and other high-adoption states face rapidly growing revenue deficits, while low-adoption states experience only modest declines. The differences in fuel tax policy since 2016 have further amplified these patterns, widening the gap in states’ vulnerability to EV related revenue loss.

These findings highlight the need for updated transportation funding strategies as EV adoption accelerates nationwide. Without adjustments such as mileage based user fees, tiered registration systems, or revised excise tax structures, states with high EV uptake may struggle to maintain the condition of their roads and bridges. At the same time, low-adoption states may postpone confronting these financial challenges, creating uneven infrastructure quality across the country. Future work could incorporate additional within-state factors such as vehicle mix, regional infrastructure needs, and differences in driving patterns to produce more precise revenue estimates and enable more accurate comparisons across states. As the United States continues transitioning toward electric transportation, federal and state policymakers will need cohesive, data driven solutions to ensure that infrastructure funding remains stable, equitable, and aligned with long term environmental goals.

Appendix

EV Adoption Rate

Variable Name Description Type Example
Month Month and Year that the data is from Character Dec-10
BEV Battery Electric Vehicle Sold Character 1,144,840
PHEV Plug in Hybrid Electric Vehicles Sold Character 819,938
HEV Hybrids Electric Vehicles Sold Character 993,535
Total LDV Total Light-Duty Vehicles Sold Character 1,246,668

Gasoline Prices over the Years

Variable Name Description Type Example
Date Date recorded Date 1990-08-20
Weekly U.S. Regular Conventional Retail Gasoline Prices (Dollars Per Gallon) Dollar per gallon Character 1.191

Federal and State Motor Fuel Taxes:

Variable Name Description Type Example
state U.S. state name where the tax applies String “California”
Year Year of observation Numeric (float) 2016
total_tax Combined federal and state gasoline tax rate as of July 1, 2025 Numeric (float) 0.7092

Vehicle Registration By State:

Variable Description Type Example
State US state of the observation String  “Oregon”
Electric..EV. Number of battery electric vehicles registered Double 500
Year Year of observation Double 2016

Population Data by State:

Variable Name Description Type Example
State The state of the ith observation  String “Oregon”
Year The year of the ith observation  Double 2017
value The population (measured in 1000s) of the ith observation  Double 736.624

Consumer Price Index (CPI Data)

Variable Name Description Type Example
observation_date The year of the ith observation Date 1/1/2019
CPIAUCSL The CPI for the ith year (base year 1982) Double 218.076
Code
# Install these packages if you do not already have them 
# install.packages("pdftools")

# Load libraries and settings here
library(tidyverse)
library(here)
library(stringr)
library(dplyr)
library(pdftools)
library(ggplot2)
library(scales)
library(readxl)
library(dplyr)
library(ggplot2)
library(scales)
library(stringr)
library(ggrepel)

library(conflicted)
library(rvest)
library(usethis)
library(here)
library(ggplot2)
library(dplyr)
library(janitor)
library(tidyverse)
library(usmap)
library(ggiraph)
library(tigris)
library(patchwork)
library(scales)
library(RColorBrewer)
conflicts_prefer(dplyr::filter)
options(tigris_use_cache = TRUE)

knitr::opts_chunk$set(
  warning = FALSE,
  message = FALSE,
  comment = "#>",
  fig.path = "figs/", # Folder where rendered plots are saved
  fig.width = 7.252, # Default plot width
  fig.height = 4, # Default plot height
  fig.retina = 3 # For better plot resolution
)

# Put any other "global" settings here, e.g. a ggplot theme:
theme_set(theme_bw(base_size = 20))

# Write code below here to load any data used in project

# for the intro scatter plot about ev adoption 
txt <- pdf_text("data_raw/ev_sales.pdf")

# for the intro scatter plot about gasoline prices 
raw <- read_excel("data_raw/pswrgvwall.xls", sheet = 2, col_names = FALSE)

View(raw)

# for the main scatter plot about taxes and ev registration
xlsx_path <- here("data_raw", "fueltaxes.xlsx")
excel_sheets(xlsx_path)

fuel_2023_jul_raw <- read_excel(xlsx_path, sheet = "July 2023")

csv_path <- here("data_raw", "state_vehicle_registrations_and_fuel_tax_by_year.csv")

state_vehicle_data <- read.csv(csv_path)

registration_data <- read_csv(here('data_raw', 'full_reg_df.csv'))

 file_path <- here("data_raw", "fueltaxes(2).xlsx")

 sheets <- excel_sheets(file_path)
#Cleaning Data 
lines <- unlist(strsplit(txt, "\n"))

data_lines <- lines[str_detect(lines, "^[ ]*[A-Za-z]{3}-[0-9]{2}")]

split_rows <- str_split(str_squish(data_lines), " ")

trimmed_rows <- lapply(split_rows, function(x) x[1:5])

df <- as.data.frame(do.call(rbind, trimmed_rows), stringsAsFactors = FALSE)

colnames(df) <- c("MonthYear", "BEV", "PHEV", "HEV", "TotalLDV")

df$Year <- as.numeric(paste0("20", substr(df$MonthYear, 5, 6)))

df$BEV  <- as.numeric(gsub(",", "", df$BEV))
df$PHEV <- as.numeric(gsub(",", "", df$PHEV))

# Not including the hybrids in this calculation
df$EV <- df$BEV


ev_yearly <- df %>%
    group_by(Year) %>%
    summarise(EV_Sold = sum(EV, na.rm = TRUE))

#write.csv(ev_yearly, "data_processed/ev_yearly.csv", row.names = FALSE)

View(ev_yearly)

ev_yearly %>%  
    ggplot(aes(x = Year, y = EV_Sold)) +
    geom_point(alpha = 0.6, color = "gray50") +
    geom_smooth(se = FALSE, method = "loess", color = "black", linewidth = 1) +
    scale_y_continuous(
        labels = comma,
        breaks = scales::extended_breaks()(ev_yearly$EV_Sold)  # auto extend
    ) +
    theme_minimal() +
    labs(
        x = "Year",
        y = "EV's Sold",
        title = "EV Sales by Year in the US"

    ) +
    theme(plot.title = element_text(hjust = 0.5))

# Cleaning Data 
colnames(raw) <- raw[2, ]

df <- raw[-c(1, 2), ]



names(df)[1] <- "Date"

df$Date <- as.Date(as.numeric(df$Date), origin = "1899-12-30")

df <- df %>% 
    filter(Date >= as.Date("1990-08-20"))


df$GasPrice <- as.numeric(df[[2]])

df$Year <- format(df$Date, "%Y")
df$Year <- as.numeric(df$Year)


gas_yearly <- df %>%
    group_by(Year) %>%
    summarise(MedianPrice = median(GasPrice, na.rm = TRUE))

gas_yearly %>%
    ggplot(aes(x = Year, y = MedianPrice)) +
    geom_point(color = "gray50", size = 2) +
    geom_smooth(method = "loess", se = FALSE, color = "black") +
    scale_x_continuous(
        breaks = seq(min(gas_yearly$Year), max(gas_yearly$Year), by = 5)
    ) +
    scale_y_continuous(
        breaks = seq(0, 4, by = 0.5),
        limits = c(0, 4)
    ) +
    theme_minimal() +
    labs(
        title = "Median U.S. Gas Price per Year",
        x = "Year",
        y = "Median Gas Price (USD)",
        caption = "Figure 2"
    ) +     theme(plot.title = element_text(hjust = 0.5))

#write.csv(gas_yearly, "data_processed/gas_yearly.csv", row.names = FALSE)

reg_long <- registration_data %>%
    pivot_longer(names_to = 'type',
                 values_to = 'total',
                 cols = EV:Diesel
    ) %>% 
    mutate(total = (total/1e6))

#write.csv(reg_long, "data_processed/reg_clean.csv", row.names = FALSE)

# Create a tibble containing the states, regions, and divisions. Had to add DC in manually
region_ref <- tibble(
    state = c(state.name, 'District of Columbia'),
    region = c(as.character(state.region), 'South'),
    division = c(as.character(state.division), 'South Atlantic'))

# Create a df which joins my regions tibble to my registration data
reg_regions <- reg_long %>%
    left_join(region_ref, by = 'state') %>%
    mutate(region = if_else(region == 'North Central', 'Midwest', region)) %>% 
    select(year, state, region, division, type, total)

reg_regions1 <- reg_regions %>% 
  group_by(state, region, division, type) %>% 
  summarise(total = sum(total, na.rm = TRUE), .groups = 'drop')



map_data1 <- us_map(exclude = 'PR') %>%
    left_join(reg_regions1, by = c('full' = 'state')) %>% 
    rename(state = full)



map_colors <- c(
  # Update this to change the colors for both plots!!!
    
  "#2F4F2F",  # Dark green
  "#4F7942",  # Medium green
  "#A3B18A",  # Light green
  "#1E3A5F",  # Dark blue
  "#3A5F82",  # Medium blue
  "#8DA7BE",  # Light blue
  "#5A2A27",  # Dark red
  "#8B4A39",  # Medium red
  "#C97E63"   # Light red
)

gw_buff <- "#f8e08e"
gw_blue <- "#033c5a"

ip_bar <- map_data1 %>%
  filter(type == "EV", state != "United States") %>%
  group_by(division) %>%
  summarise(total = sum(total, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(total)) %>%
  ggplot(aes(x = total, y = reorder(division, total))) +
  geom_col_interactive(
    aes(
      data_id = division,
      tooltip = paste0(division, "<br>Total: ", scales::comma(total)),
    fill = division), show.legend = FALSE) +
  scale_fill_manual(values = map_colors) +
  labs(title = "EV Registrations by Division",
       subtitle = 'Total in Millions',
       x = NULL,
       y = NULL) +
  theme_minimal(base_size = 8) +
      theme(
        plot.title =  element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5, face = 'italic', size = 6),
        plot.background = element_blank(),
        panel.background = element_blank(),
        panel.grid = element_blank(),
        axis.ticks = element_blank(),
        axis.line = element_blank())

# --- Map plot: hover shows state + total ---
ip_map <- map_data1 %>%
  filter(type == "EV") %>%
  ggplot() +
  geom_sf_interactive(
    aes(data_id = division,
        tooltip = NA), alpha = 0, color = NA, linewidth = 0) +
  geom_sf_interactive(
    aes(
      fill = division,
      data_id = state,
      tooltip = paste0(state, "<br>Total: ", scales::comma(total*1e06))
    ),
    color = "white", linewidth = 0.3, show.legend = FALSE
  ) +
  coord_sf() +
  scale_fill_manual(values = map_colors, na.value = "grey90") +
  labs(title = "EV Registrations by State") +
  theme_void() +
  theme(legend.position = "none",
        plot.title = element_text(size = 10, hjust = 0.5))

# --- Combine and make interactive ---
combined_plot <- (ip_bar) / ip_map + patchwork::plot_layout(heights = c(1, 2))

interactive_plot <- girafe(ggobj = combined_plot)
interactive_plot <- girafe_options(
  interactive_plot,
  opts_hover(css = sprintf("fill:%s;stroke:%s;stroke-width:1px;", gw_buff, gw_blue)),
  opts_hover_inv(css = "opacity:0.2;"),
  opts_selection(type = "none")
)

interactive_plot
 read_one <- function(sh, take = 1L) {
   raw <- read_excel(file_path, sheet = sh, col_names = FALSE)
   hdr_row <- which(apply(raw, 1, function(r)
     any(str_detect(as.character(r), regex("^\\s*Total\\s*State(\\[3\\])?$", ignore_case = TRUE)))))
   if (length(hdr_row) == 0) return(tibble(State = character(), Total_State = numeric(), Year = integer()))
   hdr_row <- hdr_row[1]
   
   hdr_vals <- as.character(unlist(raw[hdr_row, ]))
   total_idx <- which(str_detect(hdr_vals, regex("^\\s*Total\\s*State(\\[3\\])?$", ignore_case = TRUE)))
   if (length(total_idx) == 0) return(tibble(State = character(), Total_State = numeric(), Year = integer()))
   
   total_col <- total_idx[pmin(take, length(total_idx))]
   
   dat <- raw[(hdr_row + 1):nrow(raw), c(1, total_col)]
   names(dat) <- c("State", "Total_State")
   
   dat %>%
     mutate(
       State = as.character(State),
       State = str_trim(str_replace_all(State, "\\[.*?\\]", "")),
       Total_State = as.numeric(str_replace_all(as.character(Total_State), "[$,\\s]", "")),
       Year = as.integer(str_extract(sh, "\\d{4}"))
     ) %>%
     filter(!is.na(State), !is.na(Total_State), !is.na(Year)) %>%
     filter(!str_detect(State, regex("Average|Federal|Notes|Updated", ignore_case = TRUE)))
 }
 
 fuel_tax_data <- map_dfr(sheets, ~ read_one(.x, take = 1L)) %>%
   arrange(Year, State)

#write_csv(fuel_tax_data, "fuel_tax_data.csv")

fuel_tax_data <- (read_csv(here("data_processed","fuel_tax_data.csv")))

fuel_tax_data_yearly_mean <- fuel_tax_data %>%
  group_by(State, Year) %>%
  summarize(mean_fuel_tax = mean(Total_State)) %>%
  filter(!str_starts(State, fixed("*")))

#scraping data 

 years <- 2016:2023
 
 get_the_data <- function(year){
     root <- 'https://afdc.energy.gov/vehicle-registration?year='
     url <- paste0(root, year)
     data <- read_html(url) %>%
         html_table(fill = T)
     data <- as.data.frame(data)
      df <- data %>%
     mutate(across(everything(), as.character)) %>%
     mutate(Year = year)
 return(df)
 }
 
 
 
 scraped_vehicle_registration_data_by_year <- map_df(years,get_the_data)

 #write_csv(scraped_vehicle_registration_data_by_year, "scraped_vehicle_registration_data_by_year.csv")

scraped_vehicle_registration_data_by_year <- read_csv(here("data_processed", "scraped_vehicle_registration_data_by_year.csv"))

fuel_tax_data_yearly_mean <- fuel_tax_data %>%
  group_by(State, Year) %>%
  summarize(mean_fuel_tax = mean(Total_State)) %>%
  filter(!str_starts(State, fixed("*")))

state_vehicle_registrations_and_fuel_tax_by_year <- left_join(scraped_vehicle_registration_data_by_year, fuel_tax_data_yearly_mean, by = c("Year", "State"))
# write.csv(state_vehicle_registrations_and_fuel_tax_by_year,"state_vehicle_registrations_and_fuel_tax_by_year.csv")

state_vehicle_registrations_and_fuel_tax_by_year <- state_vehicle_registrations_and_fuel_tax_by_year %>% 
  mutate(mean_total_state_fuel_tax_gasoline_in_dollars = mean_fuel_tax) %>% 
  select(-mean_fuel_tax)


# From Helveston's paper average BEV drives 7,165 miles 
# Source: https://www.sciencedirect.com/science/article/pii/S254243512300404X



# Going to assume average weighted fuel efficiency of 24.8 mpg. Source
# https://afdc.energy.gov/data/10310


# Creating data set with the average loss in tax revenue by year 2017 to 2023


lost_revenue <- state_vehicle_registrations_and_fuel_tax_by_year %>% 
  mutate(Year = as.numeric(Year)) %>% 
  filter(State != "United States") %>% 
  filter(State != "District of Columbia") %>% 
  select(State, Electric..EV., Year, mean_total_state_fuel_tax_gasoline_in_dollars) %>% 
  arrange(State, Year) %>% 
  group_by(State) %>%
  mutate(loss_in_gas_tax_revenue = ((dplyr::lag(Electric..EV.) * 7165 + Electric..EV. * 0.5 * 7165)/24.8)*mean_total_state_fuel_tax_gasoline_in_dollars) %>% 
  mutate(Year = as.double(Year)) %>% 
  filter(Year != 2016)

# Getting state population by year 
# Source is FRED

annual_pop_by_state <- read_csv(here("data_raw", "annual.csv")) 

annual_pop_by_state_longer <- annual_pop_by_state %>% 
  pivot_longer(cols = -State, names_to = "Year") %>% 
  mutate(State = state.name[match(State, state.abb)]) %>% 
  mutate(Year = as.double(Year))

# combining the pop to the lost revenue data: 



lost_revenue_with_pop <- lost_revenue %>% 
  left_join(annual_pop_by_state_longer, by = c("State", "Year")) %>% 
  mutate(Population_in_1ks = value) %>% 
  select(-value)


# calculating per capita loss in gas tax revenue from EV Adoption

lost_revenue_with_pop_per_cap <- lost_revenue_with_pop %>% 
  arrange(State, Year) %>% 
  group_by(State) %>% 
  mutate(Lost_revenue_per_1k_people = loss_in_gas_tax_revenue/(Population_in_1ks))

# Formatting output into real dollars (2024 dollars) [from FRED]:

cpi_data <- read_csv(here("data_raw", "CPIAUCSL(1).csv")) %>% 
  mutate(observation_date = year(observation_date)) %>% 
  select(Year = observation_date,CPIAUCSL) %>% 
  mutate(convert_2024_dollars = 313.698/CPIAUCSL) %>% 
  select(convert_2024_dollars, Year)

lost_revenue_with_pop_per_cap <- left_join(lost_revenue_with_pop_per_cap, cpi_data,'Year' ) 


lost_revenue_with_pop_per_cap <- lost_revenue_with_pop_per_cap%>% 
  mutate(real_lost_revenue_per_1k = Lost_revenue_per_1k_people * convert_2024_dollars) %>% 
  select(Year, State, real_lost_revenue_per_1k)

# Create ranking per year
ranked_data <- lost_revenue_with_pop_per_cap %>%
  group_by(Year) %>%
  mutate(rank = rank(-real_lost_revenue_per_1k),
         value_lbl = paste0("$", round(real_lost_revenue_per_1k, 1))) %>%
  ungroup()


comparing_2016 <- state_vehicle_registrations_and_fuel_tax_by_year %>% 
  filter(Year == 2016) %>% 
  filter(State != "United States") %>% 
  select(State, mean_total_state_fuel_tax_gasoline_in_dollars_2016 = mean_total_state_fuel_tax_gasoline_in_dollars)

comparing_2023 <- state_vehicle_registrations_and_fuel_tax_by_year %>% 
  filter(Year == 2023) %>% 
  filter(State != "United States") %>% 
  select(State, mean_total_state_fuel_tax_gasoline_in_dollars_2023 = mean_total_state_fuel_tax_gasoline_in_dollars)



combined_2016_2023 <- bind_cols(comparing_2023, comparing_2016) 

combined_2016_2023 <- combined_2016_2023 %>% 
  select(-State...3, State = State...1, mean_total_state_fuel_tax_gasoline_in_dollars_2023, mean_total_state_fuel_tax_gasoline_in_dollars_2016) %>% 
  mutate(difference_2016_2023 = mean_total_state_fuel_tax_gasoline_in_dollars_2023 - mean_total_state_fuel_tax_gasoline_in_dollars_2016)

comparing_2023_2016 <- state_vehicle_registrations_and_fuel_tax_by_year %>% 
  filter(Year %in% c(2016, 2023)) %>% 
  filter(State != "United States")

# View(comparing_2023_2016)
# View(combined_2016_2023)

for_gg <- comparing_2023_2016 %>% 
  left_join(combined_2016_2023, by = "State")

for_gg %>% 
  ggplot(aes(x= mean_total_state_fuel_tax_gasoline_in_dollars, y = reorder(State, mean_total_state_fuel_tax_gasoline_in_dollars_2023))) +
  geom_line(aes(group = State), color = "black")+ 
    
  geom_point(aes(color = as.factor(Year))) +
  scale_color_manual(
    values = c(
      "2016" = "#AC9D6E",
      "2023" = "#002654"
    )
  )+
    labs(
    x = " Total State Gas Tax (USD per Gallon)", 
    y = "State",
    title = "Gas Tax Rates in 2016 vs 2023 in The United States", 
    color =  "Year"
  )+ 
  scale_x_continuous(labels = scales::dollar_format())+
  theme_minimal(base_size = 12) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank(),
    text       = element_text(color = "black"),
    axis.text  = element_text(color = "black"),
    plot.title = element_text(face = "bold", hjust = .5),
    plot.subtitle = element_text(hjust = 0.5),
    legend.position = "top",
    legend.title = element_text(color = "black", face = "bold"),
    legend.text = element_text(color = "black")
  )

fuel_2023 <- fuel_2023_jul_raw %>%
  # 1) Remove first 7 rows
  slice(-(1:7)) %>%
  # 2) Remove last 6 rows
  slice(1:(n() - 6)) %>%
  # 3) Keep only column 1 and 4, and rename them
  select(state = 1, total_tax = 4) %>%
  # 4) Clean state names: remove [brackets], numbers, and extra spaces
  mutate(
    state = str_remove_all(state, "\\[[^\\]]*\\]"),  # remove [ ... ]
    state = str_remove_all(state, "[0-9]"),          # remove any digits
    state = str_trim(state)                          # remove leading/trailing spaces
  ) %>%
  # 5) Remove non-state rows
  filter(!state %in% c("District of Columbia",
                       "American Samoa",
                       "Guam",
                       "Northern Mariana Islands",
                       "Puerto Rico",
                       "U.S. Virgin Islands")) %>%
  # 6) Convert total_tax to numeric and round to 4 decimals
  mutate(
    total_tax = as.numeric(total_tax),
    total_tax = round(total_tax, 4)
  )



state_vehicle_data_subset <- state_vehicle_data %>%
  select(State, Electric..EV.) %>%
  slice(366:415)


### 1. Clean EV data (from state_vehicle_data_subset)
ev_2023 <- state_vehicle_data_subset %>%
  rename(
    state = State,
    EV_Count = Electric..EV.
  ) %>%
  mutate(
    state = str_trim(state),
    EV_Count = as.numeric(EV_Count)
  )

#write.csv(ev_2023, "data_processed/ev_2023_clean.csv", row.names = FALSE)


### 2. Clean fuel tax data (from fuel_2023)
fuel_2023_clean <- fuel_2023 %>%
  mutate(
    state = str_trim(state),
    total_tax = as.numeric(total_tax)
  ) %>%
  filter(!is.na(total_tax))

#write.csv(fuel_2023_clean, "data_processed/fuel_2023_clean.csv", row.names = FALSE)

### 2. Clean fuel tax data (from fuel_2023)
fuel_2023_clean <- fuel_2023 %>%
  mutate(
    state = str_trim(state),
    total_tax = as.numeric(total_tax)
  ) %>%
  filter(!is.na(total_tax))

### 3. Find top 5 and bottom 5 states by EV registrations
top5_states <- ev_2023 %>%
  arrange(desc(EV_Count)) %>%
  slice(1:5) %>%
  pull(state)

bottom5_states <- ev_2023 %>%
  arrange(EV_Count) %>%
  slice(1:5) %>%
  pull(state)

plot_data <- fuel_2023_clean %>%
  left_join(ev_2023, by = "state") %>%
  mutate(
    EV_Group = case_when(
      state %in% top5_states    ~ "Top 5 EV States",
      state %in% bottom5_states ~ "Bottom 5 EV States",
      TRUE                      ~ "Other States"
    )
  )


# --- Create lookup table for state abbreviations ---
state_lookup <- data.frame(
  state = state.name,
  abb   = state.abb,
  stringsAsFactors = FALSE
)

# --- rebuild plot_data with adjusted EV_Count and state abbreviations ---
plot_data <- fuel_2023_clean %>%
  left_join(ev_2023, by = "state") %>%
  left_join(state_lookup, by = "state") %>%     # add abbreviations
  filter(!is.na(EV_Count)) %>%                  # <--- drop missing EV counts
  mutate(
    abb = if_else(is.na(abb), state, abb),      # fallback if a name doesn't match
    EV_Group = case_when(
      state %in% top5_states    ~ "Top 5 EV States",
      state %in% bottom5_states ~ "Bottom 5 EV States",
      TRUE                      ~ "Other States"
    ),
    EV_Count_adj = if_else(EV_Count <= 0, 0.1, EV_Count)
  )

highlight_data <- plot_data %>% 
  filter(EV_Group != "Other States")   # Only Top5 & Bottom5 get labels

ggplot() +
  geom_point(
    data = filter(plot_data, EV_Group == "Other States"),
    aes(x = total_tax, y = EV_Count_adj),
    color = "grey75",
    size  = 2.3,
    alpha = 0.7
  ) +
  geom_point(
    data = highlight_data,
    aes(x = total_tax, y = EV_Count_adj, color = EV_Group),
    size = 3
  ) +
  geom_text_repel(
    data = highlight_data,
    aes(x = total_tax, y = EV_Count_adj, label = abb, color = EV_Group),
    size = 3.4,
    fontface = "bold",
    box.padding = 0.3,
    point.padding = 0.2,
    segment.color = NA,
    max.overlaps = Inf
  ) +
  scale_x_continuous(
    name   = "Total Fuel Tax (USD per gallon)",
    breaks = seq(0.1, 0.7, 0.1),
    labels = label_number(accuracy = 0.001)
  ) +
  scale_y_continuous(
    name   = "Number of EV Registrations (log scale)",
    trans  = "log10",
    breaks = c(1e3, 1e4, 1e5, 1e6),
    labels = function(x) format(x, big.mark = ",", scientific = FALSE)
  ) +
  scale_color_manual(
    values = c(
      "Top 5 EV States"    = "#AC9D6E",
      "Bottom 5 EV States" = "#002654"
    ),
    guide = guide_legend(
      title = NULL,
      override.aes = list(size = 4)
    )
  ) +
  labs(
    title = "EV Registrations vs Gasoline Tax by State (2023)"
  ) +
  theme_minimal(base_size = 11) +
  theme(
    legend.position = "top",
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank(),
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5)
  )

References

[1] U.S. Department of Energy. (n.d.) Vehicle Registration Counts by State. Alternative Fuels Data Center. https://afdc.energy.gov/vehicle-registration?

[2] Maurer, M. (2024b, March 11). EVs are heavier than gas cars, but are they harder on roads?. Center for Transportation Research. 

https://ctr.utk.edu/evs-heavier-than-cars-are-they-harder-on-roads/ 

[3] ​​Farmer, L., Yousofi, F., & Mills, M. (2025, January 14). More evs, less gas tax revenue create state transportation budget issues. The Pew Charitable Trusts. https://www.pew.org/en/research-and-analysis/articles/2025/01/14/more-evs-less-gas-tax-revenue-create-state-transportation-budget-issues

[4] Congressional Research Service (27 March, 2025). The Highway Trust Fund’s Highway Account. https://www.congress.gov/crs_external_products/R/PDF/R48472/R48472.1.pdf 

[5] U.S. Energy Information Administration. (20 August, 2024). Average state tax rates for retail gasoline and diesel fuel flat since January 2024. https://www.eia.gov/todayinenergy/detail.php?id=62865

[6] U.S. Energy Information Administration.(21 February, 2025). Average state tax rates for retail gasoline and diesel fuel nearly flat since July 2024. Today in Energy. https://www.eia.gov/todayinenergy/detail.php?id=64585

[7] Zhao, L.; Ottinger, E.; Yip, A.; Helveston, J.(15 November, 2023). Quantifying electric vehicle mileage in the United States. Joule. https://www.cell.com/joule/fulltext/S2542-4351(23)00404-X?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS254243512300404X%3Fshowall%3Dtrue

[8] Office of Highway Policy Information. (13 March, 2025). Highway Statistics Series: Highway Statistics 2022. U.S. Department of Transportation. https://www.fhwa.dot.gov/policyinformation/statistics/2022/vm1.cfm

[9] Federal Reserve Bank of St. Louis. (n.d.). https://fred.stlouisfed.org/tags/series?t=population%3Bstate

[10] Federal Reserve Bank of St. Louis. (n.d.). 

https://fred.stlouisfed.org/series/CPIAUCSL

[11] Argonne National Laboratory. (n.d.). EV Model Availability and Sales. Retrieved November 29, 2025, from https://www.anl.gov/ev-facts/model-sales

[12] U.S. Energy Information Administration. (n.d.). Gasoline and Diesel Fuel Update. Retrieved November 29, 2025, from https://www.eia.gov/petroleum/gasdiesel/

Attribution

All members contributed equally