Project#1 - EV Vehicle Rebate Program NY

Source: https://www.pcmag.com/how-to/ev-tax-credits-how-to-get-the-most-money

NYSERDA Electric Vehicle Drive Clean Rebate Data Analysis

The New York State Energy Research and Development Authority (NYSERDA) launched the Drive Clean Rebate program in 2017 to incentive the adoption of electric vehicles (EVs) across New York State. This initiative is part of New York’s broader strategy to reduce greenhouse gas emissions, improve air quality, and promote sustainable transportation solutions. The program offers rebates to consumers who purchase or lease eligible new electric cars, making EVs more affordable and accessible to a broader range of people.

The analysis will focus on the data collected from the inception of the rebate program in 2017. This dataset includes various attributes such as the types of vehicles eligible for the rebate, the number of rebates issued, the geographic distribution of rebates, and trends in EV adoption over time. By examining this data, we aim to uncover insights into the program’s impact, identify patterns in consumer behavior, and evaluate the effectiveness of the rebate incentives in promoting electric vehicle adoption.

The objectives of this analysis will include:

Analyzing the distribution of rebates across different regions and demographic groups to identify trends and disparities.

Assessing the overall impact of the rebate program on EV adoption rates in New York State.

Determining which electric vehicle models are most popular among rebate recipients.

Observing changes and trends in the data over the years since the program’s launch.

Providing insights and recommendations to policymakers based on the findings to enhance the effectiveness.

##Load the necessary libraries:

library(readr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Load the dataset

setwd("C:/Users/cbash/OneDrive/Desktop/DATA 110")
rebate_data <- read_csv("NYSERDA_Electric_Vehicle_Drive_Clean_Rebate_Data.csv")

## Rows: 150328 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): Data through Date, Submitted Date, Make, Model, County, EV Type, Tr...
## dbl (4): ZIP, Annual GHG Emissions Reductions (MT CO2e), Annual Petroleum Re...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(rebate_data)

## # A tibble: 6 × 11
##   `Data through Date` `Submitted Date` Make      Model    County   ZIP `EV Type`
##   <chr>               <chr>            <chr>     <chr>    <chr>  <dbl> <chr>    
## 1 4/30/2024           5/28/2020        Tesla     Model Y  <NA>   10509 BEV      
## 2 4/30/2024           8/30/2023        Chevrolet Bolt     <NA>      NA BEV      
## 3 4/30/2024           11/8/2023        Jeep      Grand C… <NA>   13647 PHEV     
## 4 4/30/2024           4/4/2024         Toyota    Prius P… <NA>   12922 PHEV     
## 5 4/30/2024           3/30/2017        Audi      A3 e-tr… Albany 12189 PHEV     
## 6 4/30/2024           3/30/2017        Toyota    Prius P… Albany 12211 PHEV     
## # ℹ 4 more variables: `Transaction Type` <chr>,
## #   `Annual GHG Emissions Reductions (MT CO2e)` <dbl>,
## #   `Annual Petroleum Reductions (gallons)` <dbl>, `Rebate Amount (USD)` <dbl>

Correct the titles

names(rebate_data) <- tolower(names(rebate_data))
names(rebate_data) <- gsub(" ","_",names(rebate_data))
head(rebate_data)

## # A tibble: 6 × 11
##   data_through_date submitted_date make      model          county   zip ev_type
##   <chr>             <chr>          <chr>     <chr>          <chr>  <dbl> <chr>  
## 1 4/30/2024         5/28/2020      Tesla     Model Y        <NA>   10509 BEV    
## 2 4/30/2024         8/30/2023      Chevrolet Bolt           <NA>      NA BEV    
## 3 4/30/2024         11/8/2023      Jeep      Grand Cherokee <NA>   13647 PHEV   
## 4 4/30/2024         4/4/2024       Toyota    Prius Prime    <NA>   12922 PHEV   
## 5 4/30/2024         3/30/2017      Audi      A3 e-tron      Albany 12189 PHEV   
## 6 4/30/2024         3/30/2017      Toyota    Prius Prime    Albany 12211 PHEV   
## # ℹ 4 more variables: transaction_type <chr>,
## #   `annual_ghg_emissions_reductions_(mt_co2e)` <dbl>,
## #   `annual_petroleum_reductions_(gallons)` <dbl>, `rebate_amount_(usd)` <dbl>

Calculate the total electric vehicles sold by models

library(dplyr)
# Calculate total EV models
models_summary <- rebate_data |>
  group_by(model) |>
  summarize(total_models = n())

print(models_summary)

## # A tibble: 108 × 2
##    model          total_models
##    <chr>                 <int>
##  1 330e                    396
##  2 530e                    693
##  3 740e                     34
##  4 745e                      9
##  5 750e xDrive               3
##  6 A3 e-tron                 2
##  7 A7                        9
##  8 A8                        3
##  9 ARIYA                   367
## 10 Audi Q4 e-tron          163
## # ℹ 98 more rows

library(ggplot2)
library(ggrepel)
library(dplyr)
library(ggalluvial)

Creating a graphic for the model sales

# Convert 'model' to a factor
models_summary$model <- as.factor(models_summary$model)

# Step 1: Calculate total EV vehicles sold per county
county_summary <- rebate_data %>%
  group_by(county) %>%
  summarise(total_vehicles = n()) %>%
  arrange(desc(total_vehicles))

# Check the county summary
print(county_summary)

## # A tibble: 63 × 2
##    county      total_vehicles
##    <chr>                <int>
##  1 Nassau               23655
##  2 Suffolk              22843
##  3 Westchester          14597
##  4 Queens               13026
##  5 Kings                 8666
##  6 Monroe                7855
##  7 Erie                  6956
##  8 New York              5601
##  9 Onondaga              3964
## 10 Richmond              3702
## # ℹ 53 more rows

# Step 2: Identify top counties (top 5 counties)
top_counties <- county_summary %>%
  top_n(5, wt = total_vehicles) %>%
  select(county)

# Check the top counties
print(top_counties)

## # A tibble: 5 × 1
##   county     
##   <chr>      
## 1 Nassau     
## 2 Suffolk    
## 3 Westchester
## 4 Queens     
## 5 Kings

# Step 3: Filter data to only include top counties
top_county_data <- rebate_data %>%
  filter(county %in% top_counties$county)

# Check the top county data
print(top_county_data)

## # A tibble: 82,787 × 11
##    data_through_date submitted_date make      model       county   zip ev_type
##    <chr>             <chr>          <chr>     <chr>       <chr>  <dbl> <chr>  
##  1 4/30/2024         4/13/2017      BMW       X5          Kings  11249 PHEV   
##  2 4/30/2024         4/26/2017      Chevrolet Bolt        Kings  11213 BEV    
##  3 4/30/2024         5/23/2017      BMW       X5          Kings  11215 PHEV   
##  4 4/30/2024         5/23/2017      Tesla     Model X     Kings  11217 BEV    
##  5 4/30/2024         6/6/2017       Tesla     Model S     Kings  11223 BEV    
##  6 4/30/2024         6/12/2017      BMW       740e        Kings  11234 PHEV   
##  7 4/30/2024         6/14/2017      Tesla     Model X     Kings  11230 BEV    
##  8 4/30/2024         6/16/2017      Tesla     Model X     Kings  11217 BEV    
##  9 4/30/2024         7/14/2017      Toyota    Prius Prime Kings  11237 PHEV   
## 10 4/30/2024         7/17/2017      Tesla     Model X     Kings  11231 BEV    
## # ℹ 82,777 more rows
## # ℹ 4 more variables: transaction_type <chr>,
## #   `annual_ghg_emissions_reductions_(mt_co2e)` <dbl>,
## #   `annual_petroleum_reductions_(gallons)` <dbl>, `rebate_amount_(usd)` <dbl>

# Step 4: Calculate total EV vehicles sold by model within top counties
model_summary_top_counties <- top_county_data %>%
  group_by(county, model) %>%
  summarise(total_sales = n()) %>%
  arrange(desc(total_sales))

## `summarise()` has grouped output by 'county'. You can override using the
## `.groups` argument.

# Check the model summary for top counties
print(model_summary_top_counties)

## # A tibble: 447 × 3
## # Groups:   county [5]
##    county      model       total_sales
##    <chr>       <chr>             <int>
##  1 Nassau      Model Y            7515
##  2 Queens      Model Y            5984
##  3 Suffolk     Model Y            4595
##  4 Nassau      Model 3            4354
##  5 Westchester Model Y            4026
##  6 Kings       Model Y            3746
##  7 Suffolk     Model 3            3633
##  8 Westchester Model 3            3027
##  9 Suffolk     Prius Prime        2880
## 10 Queens      Model 3            2153
## # ℹ 437 more rows

# Step 5: Identify top 5 models in each top county
top_models_per_county <- model_summary_top_counties %>%
  group_by(county) %>%
  top_n(5, wt = total_sales)

# Check the top models per county
print(top_models_per_county)

## # A tibble: 25 × 3
## # Groups:   county [5]
##    county      model       total_sales
##    <chr>       <chr>             <int>
##  1 Nassau      Model Y            7515
##  2 Queens      Model Y            5984
##  3 Suffolk     Model Y            4595
##  4 Nassau      Model 3            4354
##  5 Westchester Model Y            4026
##  6 Kings       Model Y            3746
##  7 Suffolk     Model 3            3633
##  8 Westchester Model 3            3027
##  9 Suffolk     Prius Prime        2880
## 10 Queens      Model 3            2153
## # ℹ 15 more rows

# Ensure the 'total_sales' column is present in the top_models_per_county dataframe
if(!"total_sales" %in% colnames(top_models_per_county)) {
  stop("The 'total_sales' column is not found in the top_models_per_county dataframe.")
}

# Step 6: Create the alluvial plot
ggplot(top_models_per_county, 
       aes(axis1 = county, axis2 = model, y = total_sales)) +
  geom_alluvium(aes(fill = model), width = 0.1) +
  geom_stratum(width = 0.1, fill = "gray", color = "black") +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("County", "Model"), expand = c(0.15, 0.05)) +
  labs(title = "Top 5 EV Models in Top 5 Counties",
       y = "Total Sales") +
  theme_minimal()

Load the Alluvial package

library(alluvial)
library(ggalluvial)
library(ggplot2)

The top 5 models with highest records

top_5_models <- rebate_data %>%
  group_by(model) %>%
  summarize(total_records = n()) %>%
  arrange(desc(total_records)) %>%
  slice_head(n = 5)
print(top_5_models)

## # A tibble: 5 × 2
##   model       total_records
##   <chr>               <int>
## 1 Model Y             37658
## 2 Model 3             22596
## 3 Prius Prime         13227
## 4 RAV4 Prime          10990
## 5 Wrangler             7714

Examine the number of electric ehicles sold by each make per county

library(dplyr)

# Calculate total EV models sold by county
models_summary <- rebate_data %>%
  group_by(county,model) %>%
  summarise(total_models = n()) %>%
  arrange(desc(total_models))

## `summarise()` has grouped output by 'county'. You can override using the
## `.groups` argument.

# View the summary
print(models_summary)

## # A tibble: 3,146 × 3
## # Groups:   county [63]
##    county      model       total_models
##    <chr>       <chr>              <int>
##  1 Nassau      Model Y             7515
##  2 Queens      Model Y             5984
##  3 Suffolk     Model Y             4595
##  4 Nassau      Model 3             4354
##  5 Westchester Model Y             4026
##  6 Kings       Model Y             3746
##  7 Suffolk     Model 3             3633
##  8 Westchester Model 3             3027
##  9 Suffolk     Prius Prime         2880
## 10 Queens      Model 3             2153
## # ℹ 3,136 more rows

Total distinct EV models sold by county

library(dplyr)

# Calculate total distinct EV models sold by county
models_summary <- rebate_data %>%
  group_by(county) %>%
  summarise(total_models = n_distinct(model)) %>%
  arrange(desc(total_models))

# View the summary
print(models_summary)

## # A tibble: 63 × 2
##    county      total_models
##    <chr>              <int>
##  1 Suffolk               95
##  2 Nassau                93
##  3 New York              92
##  4 Westchester           89
##  5 Monroe                88
##  6 Queens                87
##  7 Erie                  85
##  8 Kings                 83
##  9 Albany                80
## 10 Onondaga              73
## # ℹ 53 more rows

Total EV Models sold by county and model

library(dplyr)

# Calculate total EV models sold by county and model
models_summary <- rebate_data %>%
  group_by(county, model) %>%
  summarise(total_sales = n()) %>%
  arrange(desc(total_sales))

## `summarise()` has grouped output by 'county'. You can override using the
## `.groups` argument.

# View the summary
print(models_summary)

## # A tibble: 3,146 × 3
## # Groups:   county [63]
##    county      model       total_sales
##    <chr>       <chr>             <int>
##  1 Nassau      Model Y            7515
##  2 Queens      Model Y            5984
##  3 Suffolk     Model Y            4595
##  4 Nassau      Model 3            4354
##  5 Westchester Model Y            4026
##  6 Kings       Model Y            3746
##  7 Suffolk     Model 3            3633
##  8 Westchester Model 3            3027
##  9 Suffolk     Prius Prime        2880
## 10 Queens      Model 3            2153
## # ℹ 3,136 more rows

Top 5 EV models with in top 5 counties.

library(dplyr)

# Step 1: Calculate total EV vehicles sold per county
county_summary <- rebate_data %>%
  group_by(county) %>%
  summarise(total_vehicles = n()) %>%
  arrange(desc(total_vehicles))

# Step 2: Identify top counties (top 5 counties)
top_counties <- county_summary %>%
  top_n(5, wt = total_vehicles) %>%
  select(county)

# Step 3: Filter data to only include top counties
top_county_data <- rebate_data %>%
  filter(county %in% top_counties$county)

# Step 4: Calculate total EV vehicles sold by model within top counties
model_summary_top_counties <- top_county_data %>%
  group_by(county, model) %>%
  summarise(total_sales = n()) %>%
  arrange(desc(total_sales))

## `summarise()` has grouped output by 'county'. You can override using the
## `.groups` argument.

# Step 5: Identify top 5 models in each top county
top_models_per_county <- model_summary_top_counties %>%
  group_by(county) %>%
  top_n(5, wt = total_sales)

# View the summary of top models in top counties
print(top_models_per_county)

## # A tibble: 25 × 3
## # Groups:   county [5]
##    county      model       total_sales
##    <chr>       <chr>             <int>
##  1 Nassau      Model Y            7515
##  2 Queens      Model Y            5984
##  3 Suffolk     Model Y            4595
##  4 Nassau      Model 3            4354
##  5 Westchester Model Y            4026
##  6 Kings       Model Y            3746
##  7 Suffolk     Model 3            3633
##  8 Westchester Model 3            3027
##  9 Suffolk     Prius Prime        2880
## 10 Queens      Model 3            2153
## # ℹ 15 more rows

Creating the Graphic

# Load necessary libraries
library(ggplot2)
library(ggalluvial)
library(dplyr)

# Creating the alluvial plot
ggplot(top_models_per_county, 
       aes(axis1 = county, axis2 = model, y = total_sales)) +
  geom_alluvium(aes(fill = model), width = 0.1) +
  geom_stratum(width = 0.1, fill = "gray", color = "black") +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("County", "Model"), expand = c(0.15, 0.05)) +
  labs(title = "Top 5 EV Models in Top 5 Counties",
       y = "Total Sales") +
  theme_minimal()

This alluvial plot shows the top 5 most popular electric vehicle models. The x-axis represents the vehicle models, and the y-axis shows the number of rebates. Different colors are used to distinguish the vehicle models. A meaningful title, axis labels, and a caption for the data source are included. The x-axis text is angled for better readability, and the legend is removed for simplicity. Detailed Essay Data Cleaning Process The dataset was cleaned by first checking for any missing values and handling them appropriately. We used the summary() function to identify missing values and then removed any rows with missing data using drop_na(). To ensure data consistency, we converted the date column to a standard date format using the mutate() function from the dplyr library. No irrelevant data was found, so no additional filtering was necessary.

This visualization shows the distribution of rebates over time, highlighting any peaks or troughs in rebate issuance. Different colors represent different vehicle types, making it easy to see which types are more popular at different times. By analyzing the NYSERDA Electric Vehicle Drive Clean Rebate data, we may aim to draw actionable insights that can inform future policies and encourage the adoption of electric vehicles. This analysis not only evaluates the effectiveness of current rebate programs but also provides a foundation for enhancing sustainable transportation initiatives in New York State.

I had a hard time rendering the document and ran into loads of issues with the chunks. Some chunks didn’t provide error messages, but nothing loaded. Also, I finally was able to get the graph, but when I published the document, the graph was missing.