Project 1 - EV Drive Clean Rebate Data

Author

Allenteena Bernard

Source: https://www.pcmag.com/how-to/ev-tax-credits-how-to-get-the-most-money

NYSERDA Electric Vehicle Drive Clean Rebate Data Analysis

The New York State Energy Research and Development Authority (NYSERDA) launched the Drive Clean Rebate program in 2017 to incentivize the adoption of electric vehicles (EVs) across New York State. This initiative is part of New York’s broader strategy to reduce greenhouse gas emissions, improve air quality, and promote sustainable transportation solutions. The program offers rebates to consumers who purchase or lease eligible new electric cars, making EVs more affordable and accessible to a broader range of people.

The analysis will focus on the data collected from the inception of the rebate program in 2017. This dataset includes various attributes such as the types of vehicles eligible for the rebate, the number of rebates issued, the geographic distribution of rebates, and trends in EV adoption over time. By examining this data, we aim to uncover insights into the program’s impact, identify patterns in consumer behavior, and evaluate the effectiveness of the rebate incentives in promoting electric vehicle adoption.

The objectives of this analysis will include:

Analyzing the distribution of rebates across different regions and demographic groups to identify trends and disparities.

Assessing the overall impact of the rebate program on EV adoption rates in New York State.

Determining which electric vehicle models are most popular among rebate recipients.

Observing changes and trends in the data over the years since the program’s launch.

Providing insights and recommendations to policymakers based on the findings to enhance the effectiveness of the Drive Clean Rebate program.

Load the necessary libraries

library(readr)
library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

library(ggplot2)
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ lubridate 1.9.3     ✔ tibble    3.2.1
✔ purrr     1.0.2     ✔ tidyr     1.3.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

getwd()

[1] "C:/Users/cbash/OneDrive/Desktop/DATA 110"

Load the dataset

setwd("C:/Users/cbash/OneDrive/Desktop/DATA 110")
rebate_data <- read_csv("NYSERDA_Electric_Vehicle_Drive_Clean_Rebate_Data.csv")

Rows: 150328 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): Data through Date, Submitted Date, Make, Model, County, EV Type, Tr...
dbl (4): ZIP, Annual GHG Emissions Reductions (MT CO2e), Annual Petroleum Re...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
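The message above suggests specifying column types. A minimal sketch of what that could look like, using a few made-up rows passed as literal text instead of the real file; the choice to read ZIP as character and to parse the date on load is an assumption, not something the original analysis does:

```r
library(readr)

# Made-up rows that mimic three of the dataset's columns
csv_text <- "Submitted Date,Make,ZIP\n5/28/2020,Tesla,10509\n8/30/2023,Chevrolet,\n"

# I() tells read_csv to treat the string as literal data, not a file path
toy <- read_csv(
  I(csv_text),
  col_types = cols(
    `Submitted Date` = col_date(format = "%m/%d/%Y"),  # month/day/year strings
    Make = col_character(),
    ZIP = col_character()  # assumption: ZIP codes are labels, not numbers
  )
)

print(toy)
```

Declaring the types also silences the column specification message.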

head(rebate_data)

# A tibble: 6 × 11
  `Data through Date` `Submitted Date` Make      Model    County   ZIP `EV Type`
  <chr>               <chr>            <chr>     <chr>    <chr>  <dbl> <chr>    
1 4/30/2024           5/28/2020        Tesla     Model Y  <NA>   10509 BEV      
2 4/30/2024           8/30/2023        Chevrolet Bolt     <NA>      NA BEV      
3 4/30/2024           11/8/2023        Jeep      Grand C… <NA>   13647 PHEV     
4 4/30/2024           4/4/2024         Toyota    Prius P… <NA>   12922 PHEV     
5 4/30/2024           3/30/2017        Audi      A3 e-tr… Albany 12189 PHEV     
6 4/30/2024           3/30/2017        Toyota    Prius P… Albany 12211 PHEV     
# ℹ 4 more variables: `Transaction Type` <chr>,
#   `Annual GHG Emissions Reductions (MT CO2e)` <dbl>,
#   `Annual Petroleum Reductions (gallons)` <dbl>, `Rebate Amount (USD)` <dbl>

names(rebate_data) <- tolower(names(rebate_data))
names(rebate_data) <- gsub(" ","_",names(rebate_data))
head(rebate_data)

# A tibble: 6 × 11
  data_through_date submitted_date make      model          county   zip ev_type
  <chr>             <chr>          <chr>     <chr>          <chr>  <dbl> <chr>  
1 4/30/2024         5/28/2020      Tesla     Model Y        <NA>   10509 BEV    
2 4/30/2024         8/30/2023      Chevrolet Bolt           <NA>      NA BEV    
3 4/30/2024         11/8/2023      Jeep      Grand Cherokee <NA>   13647 PHEV   
4 4/30/2024         4/4/2024       Toyota    Prius Prime    <NA>   12922 PHEV   
5 4/30/2024         3/30/2017      Audi      A3 e-tron      Albany 12189 PHEV   
6 4/30/2024         3/30/2017      Toyota    Prius Prime    Albany 12211 PHEV   
# ℹ 4 more variables: transaction_type <chr>,
#   `annual_ghg_emissions_reductions_(mt_co2e)` <dbl>,
#   `annual_petroleum_reductions_(gallons)` <dbl>, `rebate_amount_(usd)` <dbl>
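The renamed columns still contain parentheses, so several of them must be wrapped in backticks. One possible way to strip the parentheses as well is sketched below on a small made-up tibble; `toy` and the helper `clean_names` are illustrative names, not part of the original analysis:

```r
library(dplyr)
library(tibble)

# Toy tibble with two of the original column names; values are made up
toy <- tibble(
  `Rebate Amount (USD)` = c(2000, 500),
  `EV Type` = c("BEV", "PHEV")
)

# Lowercase, drop parentheses, then turn runs of spaces into underscores
clean_names <- function(x) {
  x <- tolower(x)
  x <- gsub("[()]", "", x)
  gsub(" +", "_", x)
}

toy <- rename_with(toy, clean_names)
print(names(toy))
```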

library(dplyr)
# Calculate total EV models
models_summary <- rebate_data |>
  group_by(model) |>
  summarize(total_models = n())

print(models_summary)

# A tibble: 108 × 2
   model          total_models
   <chr>                 <int>
 1 330e                    396
 2 530e                    693
 3 740e                     34
 4 745e                      9
 5 750e xDrive               3
 6 A3 e-tron                 2
 7 A7                        9
 8 A8                        3
 9 ARIYA                   367
10 Audi Q4 e-tron          163
# ℹ 98 more rows

Load the alluvial package

library(alluvial)
library(ggalluvial)
library(ggplot2)

#Group by Model, count records, and get the top 5 models with highest records
top_5_models <- rebate_data %>%
  group_by(model) %>%
  summarize(total_records = n()) %>%
  arrange(desc(total_records)) %>%
  slice_head(n = 5)
print(top_5_models)

# A tibble: 5 × 2
  model       total_records
  <chr>               <int>
1 Model Y             37658
2 Model 3             22596
3 Prius Prime         13227
4 RAV4 Prime          10990
5 Wrangler             7714

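Filtering the top 5 models

# Filter the original data to include only the top 5 models.
# top_5_models is a tibble, so compare against its model column;
# passing the whole tibble to %in% matches no rows, which is what
# produced the empty tibble printed below.
rebate_data_top_10 <- rebate_data %>%
  filter(model %in% top_5_models$model)

# Print the filtered data for the top 5 models
print(head(rebate_data_top_10))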

# A tibble: 0 × 11
# ℹ 11 variables: data_through_date <chr>, submitted_date <chr>, make <chr>,
#   model <chr>, county <chr>, zip <dbl>, ev_type <chr>,
#   transaction_type <chr>, annual_ghg_emissions_reductions_(mt_co2e) <dbl>,
#   annual_petroleum_reductions_(gallons) <dbl>, rebate_amount_(usd) <dbl>
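The empty result above is consistent with `%in%` being given a whole tibble instead of a vector of model names. A minimal sketch of the difference, using made-up rows in place of the rebate data (`toy` and `top_models` are illustrative names):

```r
library(dplyr)
library(tibble)

# Toy data standing in for rebate_data; values are made up
toy <- tibble(
  model = c("Model Y", "Model 3", "Bolt", "Model Y", "Model 3", "Model Y")
)

# Top 2 models by record count, returned as a tibble with columns model and n
top_models <- toy |>
  count(model, sort = TRUE) |>
  slice_head(n = 2)

# Comparing against the whole tibble matches nothing
n_wrong <- nrow(filter(toy, model %in% top_models))

# Comparing against the model column keeps the intended rows
n_right <- nrow(filter(toy, model %in% top_models$model))

print(c(wrong = n_wrong, right = n_right))
```

Adding `pull(model)` to the end of the pipeline that builds the top-models table would have the same effect as using `$model`.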

Examine the number of EVs rebated for each model per county

# Convert county and model to factors, then list the model levels
rebate_data_top_10$county <- as.factor(rebate_data_top_10$county)
rebate_data_top_10$model <- as.factor(rebate_data_top_10$model)
print(levels(rebate_data_top_10$model))

character(0)

Creating the graphic

# Load necessary libraries
library(ggplot2)
library(ggalluvial)
library(dplyr)

# Identify the top 5 models
top_5_models <- rebate_data_top_10 %>%
  group_by(model) %>%
  summarise(total_count = n(), .groups = 'drop') %>%
  arrange(desc(total_count)) %>%
  slice_head(n = 5) %>%
  pull(model)

# Filter the data to include only the top 5 models
filtered_rebate_data <- rebate_data_top_10 %>%
  filter(model %in% top_5_models)

# Summarize the data to get the count of rebates per county and model for the top 5 models
rebate_summary <- filtered_rebate_data %>%
  group_by(county, model) %>%
  summarise(count = n(), .groups = 'drop')

# Aggregate less frequent counties into 'Other'
top_counties <- rebate_summary %>%
  group_by(county) %>%
  summarise(total_count = sum(count), .groups = 'drop') %>%
  arrange(desc(total_count)) %>%
  slice_head(n = 10) %>%
  pull(county)

rebate_summary <- rebate_summary %>%
  mutate(county = ifelse(county %in% top_counties, as.character(county), "Other"))

# Creating the alluvial plot
rebates_by_region <- ggplot(rebate_summary,
                            aes(axis1 = county, axis2 = model, y = count)) +
  geom_alluvium(aes(fill = model), width = 0.25) +
  geom_stratum(width = 0.25) +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)), size = 3) +
  scale_x_discrete(limits = c("County", "Model"), expand = c(0.15, 0.05)) +
  labs(
    title = "Rebate Distribution by Region (Top 5 Models)",
    x = "County",
    y = "Number of Rebates",
    fill = "Model",
    caption = "Data Source: NYSERDA Electric Vehicle Drive Clean Rebate Data"
  ) +
  scale_fill_manual(values = c(
    "Model Y" = "green", "Model 3" = "red", "Prius Prime" = "gold", 
    "RAV4 Prime" = "purple", "Wrangler" = "blue"
  )) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(rebates_by_region)

Warning: No shared levels found between `names(values)` of the manual scale and the
data's fill values.
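The step that folds less frequent counties into "Other" could also be written with forcats. This is only a sketch on made-up counts, and it swaps in `fct_lump_n()` for the original `ifelse()` approach. It also sums the counts again after lumping so that each county and model pair appears once:

```r
library(dplyr)
library(forcats)
library(tibble)

# Made-up per-county, per-model counts
toy <- tibble(
  county = c("Suffolk", "Nassau", "Albany", "Suffolk", "Nassau", "Erie"),
  model = c("Model Y", "Model Y", "Model 3", "Model 3", "Model 3", "Model Y"),
  count = c(50, 40, 5, 30, 20, 3)
)

lumped <- toy |>
  # Keep the 2 counties with the largest total count; the rest become "Other"
  mutate(county = fct_lump_n(county, n = 2, w = count)) |>
  group_by(county, model) |>
  summarise(count = sum(count), .groups = "drop")

print(lumped)
```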

This alluvial plot shows how rebates for the top 5 most popular electric vehicle models are distributed across counties. The two axes represent the county and the vehicle model, and the height of each flow shows the number of rebates. Different colors distinguish the vehicle models. A meaningful title, axis labels, and a caption for the data source are included, and the x-axis text is angled for better readability.

Detailed Essay

Data Cleaning Process

The dataset was cleaned by first checking for missing values with the summary() function and then removing any rows with missing data using drop_na(). Column names were lowercased and spaces were replaced with underscores. To ensure data consistency, the date column was converted to a standard date format using the mutate() function from the dplyr library. No irrelevant data was found, so no additional filtering was necessary.
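The cleaning steps described here might look roughly like the following. This is a sketch on made-up rows, assuming the date column is `submitted_date` in month/day/year form and using `lubridate::mdy()` for the conversion, which the text does not name:

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(tibble)

# Made-up rows with the kinds of gaps seen in the first rows of the dataset
toy <- tibble(
  submitted_date = c("5/28/2020", "8/30/2023", "3/30/2017"),
  county = c(NA, "Albany", "Albany"),
  zip = c(10509, NA, 12189)
)

# Count missing values per column before deciding what to drop
na_counts <- colSums(is.na(toy))
print(na_counts)

cleaned <- toy |>
  drop_na() |>                                  # remove rows with any missing value
  mutate(submitted_date = mdy(submitted_date))  # character -> Date

print(cleaned)
```

Note that `drop_na()` with no arguments removes a row if any column is missing, so on the real data it would discard every rebate without a recorded county.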

This visualization shows the distribution of rebates over time, highlighting any peaks or troughs in rebate issuance. Different colors represent different vehicle types, making it easy to see which types are more popular at different times. By analyzing the NYSERDA Electric Vehicle Drive Clean Rebate data, we aim to draw actionable insights that can inform future policies and encourage the adoption of electric vehicles. This analysis not only evaluates the effectiveness of current rebate programs but also provides a foundation for enhancing sustainable transportation initiatives in New York State.
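A time-based summary of this kind could be computed roughly as follows. This is a minimal sketch on made-up rows, assuming `submitted_date` holds month/day/year strings and `ev_type` is the vehicle type; the names `rebates_by_year` and `rebates_over_time` are illustrative:

```r
library(dplyr)
library(lubridate)
library(tibble)
library(ggplot2)

# Made-up rows standing in for the rebate data
toy <- tibble(
  submitted_date = c("3/30/2017", "5/28/2020", "8/30/2023", "11/8/2023"),
  ev_type = c("PHEV", "BEV", "BEV", "PHEV")
)

# Count rebates per submission year and EV type
rebates_by_year <- toy |>
  mutate(year = year(mdy(submitted_date))) |>
  count(year, ev_type, name = "rebates")

# Stacked columns: one bar per year, colored by EV type
rebates_over_time <- ggplot(rebates_by_year,
                            aes(x = year, y = rebates, fill = ev_type)) +
  geom_col() +
  labs(x = "Year submitted", y = "Number of rebates", fill = "EV type")

print(rebates_by_year)
```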

I had a hard time rendering the document and ran into loads of issues with the chunks. Some chunks didn’t provide error messages, but nothing loaded. Also, I finally was able to get the graph, but when I published the document, the graph was missing.