The New York State Energy Research and Development Authority (NYSERDA) launched the Drive Clean Rebate program in 2017 to incentive the adoption of electric vehicles (EVs) across New York State. This initiative is part of New York’s broader strategy to reduce greenhouse gas emissions, improve air quality, and promote sustainable transportation solutions. The program offers rebates to consumers who purchase or lease eligible new electric cars, making EVs more affordable and accessible to a broader range of people.
The analysis will focus on the data collected from the inception of the rebate program in 2017. This dataset includes various attributes such as the types of vehicles eligible for the rebate, the number of rebates issued, the geographic distribution of rebates, and trends in EV adoption over time. By examining this data, we aim to uncover insights into the program’s impact, identify patterns in consumer behavior, and evaluate the effectiveness of the rebate incentives in promoting electric vehicle adoption.
The objectives of this analysis will include:
Analyzing the distribution of rebates across different regions and demographic groups to identify trends and disparities.
Assessing the overall impact of the rebate program on EV adoption rates in New York State.
Determining which electric vehicle models are most popular among rebate recipients.
Observing changes and trends in the data over the years since the program’s launch.
Providing insights and recommendations to policymakers based on the findings to enhance the effectiveness.
##Load the necessary libraries:
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## âś” forcats 1.0.0 âś” stringr 1.5.1
## âś” lubridate 1.9.3 âś” tibble 3.2.1
## âś” purrr 1.0.2 âś” tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## âś– dplyr::filter() masks stats::filter()
## âś– dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("C:/Users/cbash/OneDrive/Desktop/DATA 110")
rebate_data <- read_csv("NYSERDA_Electric_Vehicle_Drive_Clean_Rebate_Data.csv")
## Rows: 150328 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): Data through Date, Submitted Date, Make, Model, County, EV Type, Tr...
## dbl (4): ZIP, Annual GHG Emissions Reductions (MT CO2e), Annual Petroleum Re...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(rebate_data)
## # A tibble: 6 Ă— 11
## `Data through Date` `Submitted Date` Make Model County ZIP `EV Type`
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 4/30/2024 5/28/2020 Tesla Model Y <NA> 10509 BEV
## 2 4/30/2024 8/30/2023 Chevrolet Bolt <NA> NA BEV
## 3 4/30/2024 11/8/2023 Jeep Grand C… <NA> 13647 PHEV
## 4 4/30/2024 4/4/2024 Toyota Prius P… <NA> 12922 PHEV
## 5 4/30/2024 3/30/2017 Audi A3 e-tr… Albany 12189 PHEV
## 6 4/30/2024 3/30/2017 Toyota Prius P… Albany 12211 PHEV
## # ℹ 4 more variables: `Transaction Type` <chr>,
## # `Annual GHG Emissions Reductions (MT CO2e)` <dbl>,
## # `Annual Petroleum Reductions (gallons)` <dbl>, `Rebate Amount (USD)` <dbl>
names(rebate_data) <- tolower(names(rebate_data))
names(rebate_data) <- gsub(" ","_",names(rebate_data))
head(rebate_data)
## # A tibble: 6 Ă— 11
## data_through_date submitted_date make model county zip ev_type
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 4/30/2024 5/28/2020 Tesla Model Y <NA> 10509 BEV
## 2 4/30/2024 8/30/2023 Chevrolet Bolt <NA> NA BEV
## 3 4/30/2024 11/8/2023 Jeep Grand Cherokee <NA> 13647 PHEV
## 4 4/30/2024 4/4/2024 Toyota Prius Prime <NA> 12922 PHEV
## 5 4/30/2024 3/30/2017 Audi A3 e-tron Albany 12189 PHEV
## 6 4/30/2024 3/30/2017 Toyota Prius Prime Albany 12211 PHEV
## # ℹ 4 more variables: transaction_type <chr>,
## # `annual_ghg_emissions_reductions_(mt_co2e)` <dbl>,
## # `annual_petroleum_reductions_(gallons)` <dbl>, `rebate_amount_(usd)` <dbl>
library(dplyr)
# Calculate total EV models
models_summary <- rebate_data |>
group_by(model) |>
summarize(total_models = n())
print(models_summary)
## # A tibble: 108 Ă— 2
## model total_models
## <chr> <int>
## 1 330e 396
## 2 530e 693
## 3 740e 34
## 4 745e 9
## 5 750e xDrive 3
## 6 A3 e-tron 2
## 7 A7 9
## 8 A8 3
## 9 ARIYA 367
## 10 Audi Q4 e-tron 163
## # ℹ 98 more rows
library(ggplot2)
library(ggrepel)
library(dplyr)
library(ggalluvial)
# Convert 'model' to a factor
models_summary$model <- as.factor(models_summary$model)
# Step 1: Calculate total EV vehicles sold per county
county_summary <- rebate_data %>%
group_by(county) %>%
summarise(total_vehicles = n()) %>%
arrange(desc(total_vehicles))
# Check the county summary
print(county_summary)
## # A tibble: 63 Ă— 2
## county total_vehicles
## <chr> <int>
## 1 Nassau 23655
## 2 Suffolk 22843
## 3 Westchester 14597
## 4 Queens 13026
## 5 Kings 8666
## 6 Monroe 7855
## 7 Erie 6956
## 8 New York 5601
## 9 Onondaga 3964
## 10 Richmond 3702
## # ℹ 53 more rows
# Step 2: Identify top counties (top 5 counties)
top_counties <- county_summary %>%
top_n(5, wt = total_vehicles) %>%
select(county)
# Check the top counties
print(top_counties)
## # A tibble: 5 Ă— 1
## county
## <chr>
## 1 Nassau
## 2 Suffolk
## 3 Westchester
## 4 Queens
## 5 Kings
# Step 3: Filter data to only include top counties
top_county_data <- rebate_data %>%
filter(county %in% top_counties$county)
# Check the top county data
print(top_county_data)
## # A tibble: 82,787 Ă— 11
## data_through_date submitted_date make model county zip ev_type
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 4/30/2024 4/13/2017 BMW X5 Kings 11249 PHEV
## 2 4/30/2024 4/26/2017 Chevrolet Bolt Kings 11213 BEV
## 3 4/30/2024 5/23/2017 BMW X5 Kings 11215 PHEV
## 4 4/30/2024 5/23/2017 Tesla Model X Kings 11217 BEV
## 5 4/30/2024 6/6/2017 Tesla Model S Kings 11223 BEV
## 6 4/30/2024 6/12/2017 BMW 740e Kings 11234 PHEV
## 7 4/30/2024 6/14/2017 Tesla Model X Kings 11230 BEV
## 8 4/30/2024 6/16/2017 Tesla Model X Kings 11217 BEV
## 9 4/30/2024 7/14/2017 Toyota Prius Prime Kings 11237 PHEV
## 10 4/30/2024 7/17/2017 Tesla Model X Kings 11231 BEV
## # ℹ 82,777 more rows
## # ℹ 4 more variables: transaction_type <chr>,
## # `annual_ghg_emissions_reductions_(mt_co2e)` <dbl>,
## # `annual_petroleum_reductions_(gallons)` <dbl>, `rebate_amount_(usd)` <dbl>
# Step 4: Calculate total EV vehicles sold by model within top counties
model_summary_top_counties <- top_county_data %>%
group_by(county, model) %>%
summarise(total_sales = n()) %>%
arrange(desc(total_sales))
## `summarise()` has grouped output by 'county'. You can override using the
## `.groups` argument.
# Check the model summary for top counties
print(model_summary_top_counties)
## # A tibble: 447 Ă— 3
## # Groups: county [5]
## county model total_sales
## <chr> <chr> <int>
## 1 Nassau Model Y 7515
## 2 Queens Model Y 5984
## 3 Suffolk Model Y 4595
## 4 Nassau Model 3 4354
## 5 Westchester Model Y 4026
## 6 Kings Model Y 3746
## 7 Suffolk Model 3 3633
## 8 Westchester Model 3 3027
## 9 Suffolk Prius Prime 2880
## 10 Queens Model 3 2153
## # ℹ 437 more rows
# Step 5: Identify top 5 models in each top county
top_models_per_county <- model_summary_top_counties %>%
group_by(county) %>%
top_n(5, wt = total_sales)
# Check the top models per county
print(top_models_per_county)
## # A tibble: 25 Ă— 3
## # Groups: county [5]
## county model total_sales
## <chr> <chr> <int>
## 1 Nassau Model Y 7515
## 2 Queens Model Y 5984
## 3 Suffolk Model Y 4595
## 4 Nassau Model 3 4354
## 5 Westchester Model Y 4026
## 6 Kings Model Y 3746
## 7 Suffolk Model 3 3633
## 8 Westchester Model 3 3027
## 9 Suffolk Prius Prime 2880
## 10 Queens Model 3 2153
## # ℹ 15 more rows
# Ensure the 'total_sales' column is present in the top_models_per_county dataframe
if(!"total_sales" %in% colnames(top_models_per_county)) {
stop("The 'total_sales' column is not found in the top_models_per_county dataframe.")
}
# Step 6: Create the alluvial plot
ggplot(top_models_per_county,
aes(axis1 = county, axis2 = model, y = total_sales)) +
geom_alluvium(aes(fill = model), width = 0.1) +
geom_stratum(width = 0.1, fill = "gray", color = "black") +
geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
scale_x_discrete(limits = c("County", "Model"), expand = c(0.15, 0.05)) +
labs(title = "Top 5 EV Models in Top 5 Counties",
y = "Total Sales") +
theme_minimal()
library(alluvial)
library(ggalluvial)
library(ggplot2)
top_5_models <- rebate_data %>%
group_by(model) %>%
summarize(total_records = n()) %>%
arrange(desc(total_records)) %>%
slice_head(n = 5)
print(top_5_models)
## # A tibble: 5 Ă— 2
## model total_records
## <chr> <int>
## 1 Model Y 37658
## 2 Model 3 22596
## 3 Prius Prime 13227
## 4 RAV4 Prime 10990
## 5 Wrangler 7714
library(dplyr)
# Calculate total EV models sold by county
models_summary <- rebate_data %>%
group_by(county,model) %>%
summarise(total_models = n()) %>%
arrange(desc(total_models))
## `summarise()` has grouped output by 'county'. You can override using the
## `.groups` argument.
# View the summary
print(models_summary)
## # A tibble: 3,146 Ă— 3
## # Groups: county [63]
## county model total_models
## <chr> <chr> <int>
## 1 Nassau Model Y 7515
## 2 Queens Model Y 5984
## 3 Suffolk Model Y 4595
## 4 Nassau Model 3 4354
## 5 Westchester Model Y 4026
## 6 Kings Model Y 3746
## 7 Suffolk Model 3 3633
## 8 Westchester Model 3 3027
## 9 Suffolk Prius Prime 2880
## 10 Queens Model 3 2153
## # ℹ 3,136 more rows
library(dplyr)
# Calculate total distinct EV models sold by county
models_summary <- rebate_data %>%
group_by(county) %>%
summarise(total_models = n_distinct(model)) %>%
arrange(desc(total_models))
# View the summary
print(models_summary)
## # A tibble: 63 Ă— 2
## county total_models
## <chr> <int>
## 1 Suffolk 95
## 2 Nassau 93
## 3 New York 92
## 4 Westchester 89
## 5 Monroe 88
## 6 Queens 87
## 7 Erie 85
## 8 Kings 83
## 9 Albany 80
## 10 Onondaga 73
## # ℹ 53 more rows
library(dplyr)
# Calculate total EV models sold by county and model
models_summary <- rebate_data %>%
group_by(county, model) %>%
summarise(total_sales = n()) %>%
arrange(desc(total_sales))
## `summarise()` has grouped output by 'county'. You can override using the
## `.groups` argument.
# View the summary
print(models_summary)
## # A tibble: 3,146 Ă— 3
## # Groups: county [63]
## county model total_sales
## <chr> <chr> <int>
## 1 Nassau Model Y 7515
## 2 Queens Model Y 5984
## 3 Suffolk Model Y 4595
## 4 Nassau Model 3 4354
## 5 Westchester Model Y 4026
## 6 Kings Model Y 3746
## 7 Suffolk Model 3 3633
## 8 Westchester Model 3 3027
## 9 Suffolk Prius Prime 2880
## 10 Queens Model 3 2153
## # ℹ 3,136 more rows
library(dplyr)
# Step 1: Calculate total EV vehicles sold per county
county_summary <- rebate_data %>%
group_by(county) %>%
summarise(total_vehicles = n()) %>%
arrange(desc(total_vehicles))
# Step 2: Identify top counties (top 5 counties)
top_counties <- county_summary %>%
top_n(5, wt = total_vehicles) %>%
select(county)
# Step 3: Filter data to only include top counties
top_county_data <- rebate_data %>%
filter(county %in% top_counties$county)
# Step 4: Calculate total EV vehicles sold by model within top counties
model_summary_top_counties <- top_county_data %>%
group_by(county, model) %>%
summarise(total_sales = n()) %>%
arrange(desc(total_sales))
## `summarise()` has grouped output by 'county'. You can override using the
## `.groups` argument.
# Step 5: Identify top 5 models in each top county
top_models_per_county <- model_summary_top_counties %>%
group_by(county) %>%
top_n(5, wt = total_sales)
# View the summary of top models in top counties
print(top_models_per_county)
## # A tibble: 25 Ă— 3
## # Groups: county [5]
## county model total_sales
## <chr> <chr> <int>
## 1 Nassau Model Y 7515
## 2 Queens Model Y 5984
## 3 Suffolk Model Y 4595
## 4 Nassau Model 3 4354
## 5 Westchester Model Y 4026
## 6 Kings Model Y 3746
## 7 Suffolk Model 3 3633
## 8 Westchester Model 3 3027
## 9 Suffolk Prius Prime 2880
## 10 Queens Model 3 2153
## # ℹ 15 more rows
# Load necessary libraries
library(ggplot2)
library(ggalluvial)
library(dplyr)
# Creating the alluvial plot
ggplot(top_models_per_county,
aes(axis1 = county, axis2 = model, y = total_sales)) +
geom_alluvium(aes(fill = model), width = 0.1) +
geom_stratum(width = 0.1, fill = "gray", color = "black") +
geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
scale_x_discrete(limits = c("County", "Model"), expand = c(0.15, 0.05)) +
labs(title = "Top 5 EV Models in Top 5 Counties",
y = "Total Sales") +
theme_minimal()
This alluvial plot shows the top 5 most popular electric vehicle models. The x-axis represents the vehicle models, and the y-axis shows the number of rebates. Different colors are used to distinguish the vehicle models. A meaningful title, axis labels, and a caption for the data source are included. The x-axis text is angled for better readability, and the legend is removed for simplicity. Detailed Essay Data Cleaning Process The dataset was cleaned by first checking for any missing values and handling them appropriately. We used the summary() function to identify missing values and then removed any rows with missing data using drop_na(). To ensure data consistency, we converted the date column to a standard date format using the mutate() function from the dplyr library. No irrelevant data was found, so no additional filtering was necessary.
This visualization shows the distribution of rebates over time, highlighting any peaks or troughs in rebate issuance. Different colors represent different vehicle types, making it easy to see which types are more popular at different times. By analyzing the NYSERDA Electric Vehicle Drive Clean Rebate data, we may aim to draw actionable insights that can inform future policies and encourage the adoption of electric vehicles. This analysis not only evaluates the effectiveness of current rebate programs but also provides a foundation for enhancing sustainable transportation initiatives in New York State.
I had a hard time rendering the document and ran into loads of issues with the chunks. Some chunks didn’t provide error messages, but nothing loaded. Also, I finally was able to get the graph, but when I published the document, the graph was missing.