This data set comes from https://www.fhwa.dot.gov/policyinformation/motorfuelhwy_trustfund.cfm and covers the tax rates of different fuel types (Diesel, Gasoline, Liquefied Petroleum Gas, and Gasohol). It contains other variables, but this project answers one question: are tax rates on diesel higher on average than tax rates on gasoline across the US? The analysis uses the variables state, rate, fuel_type, and MMFR_year.
The first step selects state, rate, fuel_type, and MMFR_year. The next step filters out the Federal Tax rows, then removes Gasohol and Liquefied Petroleum Gas, since this project focuses on Diesel and Gasoline. The data are then grouped by fuel type and summarized to get the mean rate and its standard deviation. A bar graph then shows the tax rate on gasoline vs diesel over time. Finally, the hypothesis is tested.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(RColorBrewer)
t1 <- read_csv("Tax_Rates_by_Motor_Fuel_and_State_20251113.csv")
## Rows: 2332 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): id, state, fuel_type, fuel_type_code, abbrev, note, effective_date
## dbl (2): rate, annual
## num (2): fips, MMFR_year
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
t2 <- t1 |>
  select(state, rate, fuel_type, MMFR_year)
t2 <- t2 |>
  filter(state != "Federal Tax") |>
  filter(fuel_type != "Gasohol" & fuel_type != "Liquefied Petroleum Gas")
st2 <- t2 |>
  group_by(fuel_type) |>
  summarize(mean_rate = mean(rate),
            sd_rate = sd(rate))
st2
## # A tibble: 2 × 3
## fuel_type mean_rate sd_rate
## <chr> <dbl> <dbl>
## 1 Diesel 27.2 11.2
## 2 Gasoline 26.2 9.03
ggplot(t2, aes(x = MMFR_year, y = rate, fill = fuel_type)) +
  geom_col(position = position_dodge(width = 1)) +
  labs(title = "Tax Rate on Gasoline vs Diesel Over time",
       x = "MMFR_year",
       y = "Tax Rate",
       fill = "fuel type",
       caption = "Source: dot.gov") +
  theme_minimal(base_size = 8) +
  scale_fill_brewer(palette = "Set2")
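Because the filtered data hold one row per state, fuel type, and year, a dodged geom_col() on the raw rows draws many bars on top of each other within each year. One possible alternative, sketched here rather than taken from the original analysis, is to average the rate per year and fuel type first and plot those means. The `toy` data frame and its values are made up purely for illustration and stand in for `t2`.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for t2: two made-up states, two years (not FHWA values)
toy <- data.frame(
  state     = rep(c("A", "B"), each = 4),
  fuel_type = rep(c("Diesel", "Gasoline"), times = 4),
  MMFR_year = rep(c(2020, 2020, 2021, 2021), times = 2),
  rate      = c(30, 28, 31, 28, 20, 22, 21, 23)
)

# One row per year and fuel type, holding the mean rate across states
yearly <- toy |>
  group_by(MMFR_year, fuel_type) |>
  summarize(mean_rate = mean(rate), .groups = "drop")

# Dodged bars now show exactly one bar per fuel type within each year
p <- ggplot(yearly, aes(x = factor(MMFR_year), y = mean_rate, fill = fuel_type)) +
  geom_col(position = position_dodge(width = 0.9)) +
  labs(x = "MMFR year", y = "Mean tax rate", fill = "Fuel type") +
  theme_minimal(base_size = 8) +
  scale_fill_brewer(palette = "Set2")
p
```

Swapping `toy` for `t2` would apply the same aggregation to the real data, assuming the column names match those selected earlier.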
\(H_0: \mu_1 = \mu_2\)
\(H_a: \mu_1 > \mu_2\)
where \(\mu_1\) is the mean tax rate on Diesel and \(\mu_2\) is the mean tax rate on Gasoline.
t.test(t2$rate[t2$fuel_type == "Diesel"],
       t2$rate[t2$fuel_type == "Gasoline"],
       conf.level = 0.95, alternative = "greater")
##
## Welch Two Sample t-test
##
## data: t2$rate[t2$fuel_type == "Diesel"] and t2$rate[t2$fuel_type == "Gasoline"]
## t = 1.7485, df = 1094.2, p-value = 0.04033
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.06139763 Inf
## sample estimates:
## mean of x mean of y
## 27.20520 26.15499
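The output above is labelled a Welch test because t.test() does not assume equal variances by default (var.equal = FALSE). To make the reported t, df, and p-value less of a black box, the sketch below computes the Welch statistic, the Welch-Satterthwaite degrees of freedom, and the one-sided p-value by hand, then compares them with t.test(). The helper `welch_t` and the two sample vectors are hypothetical and use made-up values, not the FHWA data.

```r
# Hypothetical toy samples (made-up values, not the FHWA data)
diesel   <- c(24.0, 31.5, 27.0, 40.2, 18.5, 29.9)
gasoline <- c(23.0, 28.0, 26.5, 33.0, 19.0, 27.5)

# Welch two-sample t-test for H_a: mean(x) > mean(y), computed by hand
welch_t <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  # Upper-tail probability, matching alternative = "greater"
  p <- pt(t_stat, df, lower.tail = FALSE)
  c(t = t_stat, df = df, p_value = p)
}

by_hand  <- welch_t(diesel, gasoline)
built_in <- t.test(diesel, gasoline, alternative = "greater")

# The hand computation should agree with t.test() up to floating-point error
all.equal(unname(by_hand["t"]), unname(built_in$statistic))
all.equal(unname(by_hand["df"]), unname(built_in$parameter))
all.equal(unname(by_hand["p_value"]), built_in$p.value)
```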
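p-value = 0.04033, which is statistically significant at α = 0.05. This is evidence that the mean tax rate on diesel is higher than on gasoline, although the p-value is close to the threshold and the observed difference in means is small (27.21 vs 26.15, about 1.05).
95% CI = (0.06139763, ∞). Since 0 lies outside this one-sided interval, the difference in means is statistically significant at the 5% level, consistent with a higher tax rate on diesel than on gasoline.
So we reject the null hypothesis.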
The key finding is that diesel does have a higher average tax rate than gasoline. With a p-value of 0.04033, the difference was statistically significant at α = 0.05. In the future it would be interesting to see how the different states vary in their tax rates on gasoline vs diesel and to find out whether there is an association between fuel tax rates and state.
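As a starting point for that state-level follow-up, one option is to compute each state's mean rate per fuel type and then take the diesel-minus-gasoline gap. This is only a sketch of one possible approach: the `toy` data frame is hypothetical, with made-up values standing in for `t2`, and the column name `diesel_minus_gasoline` is introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for t2: two made-up states, two years (not FHWA values)
toy <- data.frame(
  state     = rep(c("A", "B"), each = 4),
  fuel_type = rep(c("Diesel", "Gasoline"), times = 4),
  MMFR_year = rep(c(2020, 2020, 2021, 2021), times = 2),
  rate      = c(30, 28, 31, 28, 20, 22, 21, 23)
)

by_state <- toy |>
  # Mean rate per state and fuel type, averaged over years
  group_by(state, fuel_type) |>
  summarize(mean_rate = mean(rate), .groups = "drop") |>
  # One row per state, with a Diesel and a Gasoline column
  pivot_wider(names_from = fuel_type, values_from = mean_rate) |>
  # A positive gap means diesel is taxed more than gasoline in that state
  mutate(diesel_minus_gasoline = Diesel - Gasoline) |>
  arrange(desc(diesel_minus_gasoline))

by_state
```

Sorting by the gap puts the states where diesel is taxed most heavily relative to gasoline at the top, which makes it easy to see whether the overall result is driven by a few states.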
Source : https://www.fhwa.dot.gov/policyinformation/motorfuelhwy_trustfund.cfm