Are tax rates for Diesel cars higher on average than for Gasoline across the US?

This data set is from https://www.fhwa.dot.gov/policyinformation/motorfuelhwy_trustfund.cfm and covers the tax rates of different fuel types (Diesel, Gasoline, Liquefied Petroleum Gas, and Gasohol). There are other variables, but this project answers one question: are tax rates for diesel cars higher on average than for gasoline across the US? It uses the variables state, tax rate (rate), fuel type (fuel_type), and MMFR_year.

Data Analysis

The analysis first selects state, rate, fuel type, and MMFR year. Next it filters out the Federal Tax rows, then filters out Gasohol and Liquefied Petroleum Gas, since this project focuses on Diesel and Gasoline. The data are then grouped by fuel type and summarised to get the mean rate and its standard deviation. A bar graph then shows the tax rate on gasoline vs diesel over time. Finally, the hypothesis is tested.

Loading libraries

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(highcharter)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(RColorBrewer)

Loading Data Set

t1 <- read_csv("Tax_Rates_by_Motor_Fuel_and_State_20251113.csv")

## Rows: 2332 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): id, state, fuel_type, fuel_type_code, abbrev, note, effective_date
## dbl (2): rate, annual
## num (2): fips, MMFR_year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Cleaning: select columns

t2 <- t1 |>
  select(state,rate,fuel_type,MMFR_year)

Cleaning: filter rows

t2 <- t2 |>
  filter(state != "Federal Tax") |>
  filter(fuel_type != "Gasohol" & fuel_type != "Liquefied Petroleum Gas")
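A quick way to confirm that the filter left only the two fuel types of interest is to count rows per fuel type. The sketch below runs on a small made-up data frame called `toy` (hypothetical values, not the FHWA data); with the real data, `t1` or `t2` would take its place. It also writes the fuel type condition with `%in%`, which keeps the same rows here.

```r
library(dplyr)

# Hypothetical toy rows, NOT the FHWA data: made up only to show the filter logic.
toy <- data.frame(
  state     = c("Alabama", "Alabama", "Federal Tax", "Alaska", "Alaska"),
  rate      = c(18, 19, 18.4, 8, 8),
  fuel_type = c("Gasoline", "Diesel", "Gasoline", "Gasohol", "Diesel"),
  MMFR_year = c(2020, 2020, 2020, 2020, 2020)
)

# Drop the Federal Tax rows and keep only Diesel and Gasoline.
toy_clean <- toy |>
  filter(state != "Federal Tax",
         fuel_type %in% c("Diesel", "Gasoline"))

# Count rows per fuel type; only Diesel and Gasoline should appear.
toy_counts <- toy_clean |> count(fuel_type)
toy_counts
```

If any other fuel type or a "Federal Tax" row shows up in the counts, the filter conditions need another look.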

Summary

# Mean and standard deviation of the tax rate for each fuel type
st2 <- t2 |>
  group_by(fuel_type) |>
  summarize(mean_rate = mean(rate),
            sd_rate = sd(rate))

st2

## # A tibble: 2 × 3
##   fuel_type mean_rate sd_rate
##   <chr>         <dbl>   <dbl>
## 1 Diesel         27.2   11.2 
## 2 Gasoline       26.2    9.03
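The means and standard deviations are easier to judge next to the group sizes, since the t-test further down depends on all three. A minimal sketch of a summary that also reports the number of rows and the standard error of the mean is shown here on a made-up data frame `toy` (hypothetical values, not the FHWA data); the column names `n` and `se_rate` are introduced only for this illustration.

```r
library(dplyr)

# Hypothetical toy rows, NOT the FHWA data.
toy <- data.frame(
  fuel_type = c("Diesel", "Diesel", "Diesel", "Gasoline", "Gasoline", "Gasoline"),
  rate      = c(20, 24, 28, 18, 20, 22)
)

# Per fuel type: row count, mean, standard deviation, and standard error of the mean.
toy_summary <- toy |>
  group_by(fuel_type) |>
  summarize(n = n(),
            mean_rate = mean(rate),
            sd_rate = sd(rate),
            se_rate = sd_rate / sqrt(n))

toy_summary
```

Running the same `summarize()` call on `t2` would give the group sizes behind the means reported above.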

Bar graph

# Dodged bars: tax rate by MMFR year, one colour per fuel type
ggplot(t2, aes(x = MMFR_year, y = rate, fill = fuel_type)) +
  geom_col(position = position_dodge(width = 1)) +
  labs(title = "Tax Rate on Gasoline vs Diesel Over Time",
       x = "MMFR year",
       y = "Tax Rate",
       fill = "Fuel type",
       caption = "Source: FHWA (fhwa.dot.gov)") +
  theme_minimal(base_size = 8) +
  scale_fill_brewer(palette = "Set2")
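One thing to keep in mind when reading this graph: `geom_col()` draws one bar per row, and `t2` has one row per state for each year and fuel type. With `position_dodge()`, bars that share the same year and fuel type are likely drawn on top of one another, so the visible height would be the largest state rate rather than an average. A possible alternative, sketched below on a made-up data frame `toy` (hypothetical values, not the FHWA data), is to average the rate per year and fuel type first and plot that; the names `yearly` and `mean_rate` are introduced only for this illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy rows, NOT the FHWA data: two states, two years, two fuel types.
toy <- data.frame(
  state     = rep(c("A", "B"), each = 4),
  MMFR_year = rep(c(2019, 2019, 2020, 2020), times = 2),
  fuel_type = rep(c("Diesel", "Gasoline"), times = 4),
  rate      = c(20, 18, 22, 18, 30, 26, 30, 28)
)

# One row per year and fuel type: the mean rate across states.
yearly <- toy |>
  group_by(MMFR_year, fuel_type) |>
  summarize(mean_rate = mean(rate), .groups = "drop")

# Each bar now represents exactly one value, so dodging shows the averages side by side.
p <- ggplot(yearly, aes(x = factor(MMFR_year), y = mean_rate, fill = fuel_type)) +
  geom_col(position = position_dodge()) +
  labs(x = "MMFR year", y = "Mean tax rate", fill = "Fuel type") +
  theme_minimal(base_size = 8) +
  scale_fill_brewer(palette = "Set2")

p
```

Replacing `toy` with `t2` would give a yearly-average version of the graph above, which lines up more directly with the question about average rates.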

Statistics

\(H_0\): \(\mu_1 = \mu_2\)

\(H_a\): \(\mu_1 > \mu_2\)

where \(\mu_1\) is the mean Diesel tax rate and \(\mu_2\) is the mean Gasoline tax rate.

# One-sided Welch two-sample t-test: is the mean Diesel rate greater than the mean Gasoline rate?
t.test(t2$rate[t2$fuel_type == "Diesel"],
       t2$rate[t2$fuel_type == "Gasoline"],
       conf.level = 0.95,
       alternative = "greater")

## 
##  Welch Two Sample t-test
## 
## data:  t2$rate[t2$fuel_type == "Diesel"] and t2$rate[t2$fuel_type == "Gasoline"]
## t = 1.7485, df = 1094.2, p-value = 0.04033
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.06139763        Inf
## sample estimates:
## mean of x mean of y 
##  27.20520  26.15499
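For reference, the numbers in this output can be tied together with the standard Welch formulas. The symbols \(\bar{x}_i\), \(s_i\), and \(n_i\) (sample mean, standard deviation, and size of group \(i\)) and \(\nu\) (degrees of freedom) are introduced here for illustration and do not appear in the output itself.

\[
t = \frac{\bar{x}_1 - \bar{x}_2}{\mathrm{SE}}, \qquad
\mathrm{SE} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \qquad
\nu \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}
\]

The lower bound of the one-sided 95% interval is \((\bar{x}_1 - \bar{x}_2) - t_{0.95,\nu}\,\mathrm{SE}\). Working backwards from the output: the difference in means is \(27.20520 - 26.15499 = 1.05021\), so \(\mathrm{SE} \approx 1.05021 / 1.7485 \approx 0.6006\). With \(t_{0.95,\nu} \approx 1.646\) for \(\nu \approx 1094\), the lower bound is roughly \(1.05021 - 1.646 \times 0.6006 \approx 0.06\), in line with the reported 0.0614.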

The p-value is 0.04033, which is below α = 0.05, so the result is statistically significant. This is evidence, though not overwhelming at a p-value this close to 0.05, that the mean tax rate for diesel is higher than for gasoline.

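The one-sided 95% confidence interval for the difference in means (Diesel minus Gasoline) is (0.06139763, ∞). Since 0 lies outside the interval, the difference in means is statistically significant at the 5% level, which is consistent with the tax rate for diesel being higher than for gasoline.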

So we reject the null hypothesis.

tax_rates_bmfs

yonas

2025-11-18