Electric_Vehicle_Population

Author

Zijin Wang

Introduction

This dataset lists the Battery Electric Vehicles (BEVs) and Plug-in Hybrid Electric Vehicles (PHEVs) that are currently registered through the Washington State Department of Licensing (DOL).

The data was sourced from the Washington State DOL and includes various attributes of the vehicles such as make, model, and electric range. I chose this dataset due to a personal and scholarly interest in sustainable transportation, a field where EVs are revolutionizing our approach to mobility. This area is not only technologically progressive but also critical in mitigating environmental impacts such as carbon emissions and fossil fuel dependency.

The momentum behind the electric vehicle (EV) revolution is undeniably significant. According to the International Energy Agency (IEA), global electric car sales have witnessed an extraordinary surge, doubling within a single year to establish a new record in 2021 (IEA, 2021). This surge can be attributed to a confluence of factors driving EV adoption. Governments across the globe are offering incentives to mitigate the higher initial costs of EVs, advancements in battery technology are extending electric range capabilities while simultaneously reducing prices, and consumers are becoming increasingly cognizant of the environmental consequences of their transportation choices.

The final visualization in this document is a boxplot of electric range across vehicle makes. Electric range is a pivotal factor for EV buyers, because it largely determines whether a vehicle suits their driving needs. The boxplot shows that some makes consistently offer longer electric ranges than others, which can help prospective buyers compare the range options within each make. It also highlights outliers: vehicles whose range stands apart from the rest of their make, which may interest buyers looking for extended range.

While the visualization effectively conveys the distribution of electric range, further exploration could delve into regional variations within the state or correlate electric range with other attributes like vehicle price. Nonetheless, the presented visualization underscores the significance of electric range in the context of EV selection, assisting consumers in making informed choices.

References

International Energy Agency. (2021). Global EV Outlook 2021. https://www.iea.org/reports/global-ev-outlook-2021

library(tidyverse)
# ggplot2, readr and dplyr are already attached by library(tidyverse) above
library(plotly)   # interactive charts: ggplotly() and plot_ly()

library(viridis)  # viridis_pal() supplies the colour palettes for the Plotly charts

Data Loading and Cleaning

Load the data

# This path is specific to the author's machine; point it at your own copy of the CSV
electric_vehicle_data <- read_csv("/Users/zwang30/Downloads/Electric_Vehicle_Population_Data.csv")
Rows: 159467 Columns: 17
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (11): VIN (1-10), County, City, State, Make, Model, Electric Vehicle Typ...
dbl  (6): Postal Code, Model Year, Electric Range, Base MSRP, Legislative Di...


Filter for the years 2015 to 2023

electric_vehicle_data <- electric_vehicle_data %>%
  filter(`Model Year` >= 2015, `Model Year` <= 2023)

Find the top 10 makes by count within the year range

top_makes <- electric_vehicle_data %>%
  count(Make) %>%
  slice_max(n, n = 10) %>%  # replaces the superseded top_n(); ties are kept
  pull(Make)
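In dplyr, top_n() has been superseded by slice_max(), but both keep ties by default, so top_makes can contain more than ten makes if several makes share the tenth-largest count. The base-R sketch below reproduces that tie-keeping rule on made-up data; top_k_with_ties() is a hypothetical helper for illustration, not part of dplyr.

```r
# Hypothetical helper: keep every make whose count is at least the k-th largest
# count, mirroring the tie-keeping default of dplyr's top_n() / slice_max().
top_k_with_ties <- function(makes, k) {
  counts <- table(makes)                                   # vehicles per make
  sorted <- sort(as.vector(counts), decreasing = TRUE)
  cutoff <- sorted[min(k, length(sorted))]                 # k-th largest count
  names(counts)[counts >= cutoff]
}

# Made-up toy data: makes "B" and "C" tie for second place
toy_makes <- c("A", "A", "A", "B", "B", "C", "C", "D")
top_k_with_ties(toy_makes, 2)   # "A" "B" "C": three makes for k = 2
```

Checking length(top_makes) after the real pipeline is a quick way to see whether a tie pushed the list past ten.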

Filter the data to include only the top 10 makes

filtered_data <- electric_vehicle_data %>%
  filter(Make %in% top_makes)

Visualization 1: Distribution of Electric Range in Electric Vehicles

Prepare the data by selecting the ‘Electric Range’ for the histogram

prepared_data <- electric_vehicle_data %>%
  select(`Electric Range`) %>%
  filter(!is.na(`Electric Range`))
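Before plotting, it may be worth checking how many vehicles have a recorded range of exactly 0. The assumption here, which is not stated in this document and should be checked against the DOL's data dictionary, is that a 0 can mean the range was not recorded rather than that the vehicle cannot drive on electricity; if so, those rows would pile up in the first bin of the histogram. A minimal base-R sketch with a hypothetical helper and made-up values:

```r
# Hypothetical helper: share of non-missing range values that are exactly zero.
share_zero_range <- function(range_miles) {
  x <- range_miles[!is.na(range_miles)]   # mirror the NA filter used above
  if (length(x) == 0) return(NA_real_)    # nothing to measure
  mean(x == 0)
}

# Made-up example values in miles; two of the five non-missing values are 0
toy_range <- c(0, 215, 0, 32, NA, 291)
share_zero_range(toy_range)   # 0.4

# Keeping only strictly positive ranges, one option if zeros prove to be placeholders
positive_only <- toy_range[!is.na(toy_range) & toy_range > 0]
```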

Create the histogram

histogram <- ggplot(prepared_data, aes(x = `Electric Range`)) +
  geom_histogram(bins = 30, fill = "skyblue", color = "black") +
  scale_x_continuous(breaks = seq(0, max(prepared_data$`Electric Range`), by = 50)) + # X-axis breaks every 50 miles
  labs(
    title = 'Distribution of Electric Range in Electric Vehicles',
    x = 'Electric Range (miles)',
    y = 'Count of Vehicles',
    caption = 'Data Source: Washington State Department of Licensing (DOL)'
  ) +
  theme_minimal()

# Convert the static ggplot into an interactive Plotly widget
interactive_histogram <- ggplotly(histogram)

# Display the widget once (an extra print() call would render it twice)
interactive_histogram
Data Source: Washington State Department of Licensing (DOL)

Summary Visualization 1

The histogram groups the electric range values into 30 bins, drawn with a sky-blue fill and black borders. The x-axis shows electric range in miles, with tick marks every 50 miles, and the y-axis shows how many vehicles fall into each bin. The title, “Distribution of Electric Range in Electric Vehicles,” sets the context, and a caption credits the data source. The plot shows how electric range values are spread across the registered vehicles and where they cluster, which is useful to prospective EV buyers comparing the range options available. Because the chart is converted with ggplotly, viewers can hover over a bin to see its range and vehicle count.

Statistical Analysis

Group the filtered data by ‘Model Year’ and ‘Make’ and calculate the count of electric vehicles for each combination

bar_data <- filtered_data %>%
  group_by(`Model Year`, Make) %>%
  summarise(Count = n(), .groups = 'drop')

Calculate the average number of electric vehicles per make for each Model Year, then merge these averages back into filtered_data

average_count_by_year <- bar_data %>%
  group_by(`Model Year`) %>%
  summarise(Average_Count = mean(Count))

merged_data <- filtered_data %>%
  left_join(average_count_by_year, by = "Model Year")
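The three steps above can be mirrored in base R on a small made-up table (the column names model_year, make and electric_range are hypothetical stand-ins for Model Year, Make and Electric Range). The sketch makes two properties visible: the yearly average only counts make-year combinations that actually occur, and after the join every vehicle from the same model year carries the same average count.

```r
# Made-up toy registrations: one row per vehicle
vehicles <- data.frame(
  model_year     = c(2020, 2020, 2020, 2021, 2021, 2021, 2021),
  make           = c("A",  "A",  "B",  "A",  "B",  "B",  "B"),
  electric_range = c(200,  210,  30,   0,    25,   25,   0)
)

# Step 1: rows per (model year, make), like bar_data; na.pass keeps rows with NA ranges
counts <- aggregate(electric_range ~ model_year + make, data = vehicles,
                    FUN = length, na.action = na.pass)
names(counts)[3] <- "count"

# Step 2: mean of those counts per model year, like average_count_by_year
avg_by_year <- aggregate(count ~ model_year, data = counts, FUN = mean)
names(avg_by_year)[2] <- "average_count"

# Step 3: attach the year-level average to every vehicle row, like merged_data
merged <- merge(vehicles, avg_by_year, by = "model_year", all.x = TRUE)
```

Here 2020 averages (2 + 1) / 2 = 1.5 vehicles per make and 2021 averages (1 + 3) / 2 = 2, and those two values are simply repeated on each vehicle row.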

Perform a correlation analysis between ‘Electric Range’ and ‘Average Count of Electric Vehicles’

correlation <- cor(merged_data$`Electric Range`, merged_data$Average_Count, use = "complete.obs")

cat("Correlation between Electric Range and Average Count of Electric Vehicles:", correlation, "\n")
Correlation between Electric Range and Average Count of Electric Vehicles: -0.6137997 

I then computed the correlation coefficient between ‘Electric Range’ and ‘Average Count of Electric Vehicles’. The result is -0.6137997, or about -0.614.

The correlation coefficient measures the strength and direction of the linear relationship between two variables. A value of about -0.614 indicates a moderately strong negative linear relationship between a vehicle’s electric range and the average number of vehicles per make in that vehicle’s model year.
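The coefficient returned by cor() is Pearson’s r by default. Written out from its definition, with pairs containing a missing value dropped first (the same idea as use = "complete.obs"), it can be sketched as follows; pearson_r() is a hypothetical helper for illustration, and the made-up vectors only serve to check it against cor().

```r
# Pearson's r from its definition:
# r = sum((x - mean(x)) * (y - mean(y))) /
#     sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
pearson_r <- function(x, y) {
  keep <- complete.cases(x, y)   # drop pairs with any NA, like use = "complete.obs"
  x <- x[keep]
  y <- y[keep]
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Made-up ranges and year-level averages; the last pair is dropped because of the NA
toy_range <- c(200, 210, 30, 0, 25, NA)
toy_avg   <- c(1.5, 1.5, 1.5, 2, 2, 2)
pearson_r(toy_range, toy_avg)
```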

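A negative correlation means that as one variable increases, the other tends to decrease. Because Average_Count takes a single value for each model year (the mean number of vehicles per make, among the top 10 makes, in that year), this result does not show that makes or models with longer electric ranges have lower counts. It shows that vehicles from model years with higher average per-make registration counts tend to have shorter recorded electric ranges. Each vehicle is one row in merged_data, so model years with many registrations carry more weight in the coefficient, and, as with any correlation, the result does not establish that one variable causes the other.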
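One way to probe the unit-of-analysis issue is to compare the vehicle-level coefficient with a year-level one computed from a single row per model year. This is an alternative analysis, not what the code above does; summarizing each year by its median range is an assumption, and with only nine model years (2015 to 2023) a year-level coefficient would rest on very few points. A base-R sketch on made-up data:

```r
# Made-up data: one row per vehicle, with its model year's average count attached
toy <- data.frame(
  model_year     = c(2020, 2020, 2020, 2021, 2021, 2021, 2021, 2022),
  electric_range = c(200,  210,  30,   0,    25,   25,   0,    150),
  average_count  = c(1.5,  1.5,  1.5,  2,    2,    2,    2,    1)
)

# Vehicle-level: one row per vehicle, as in merged_data (busy years weigh more)
r_vehicle <- cor(toy$electric_range, toy$average_count)

# Year-level: collapse to one row per model year (median range; the average
# count is constant within a year, so its median is that same value)
by_year <- aggregate(cbind(electric_range, average_count) ~ model_year,
                     data = toy, FUN = median)
r_year <- cor(by_year$electric_range, by_year$average_count)
```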

Visualization 2: Count of Electric Vehicles by Make and Model Year

Create the bar graph in Plotly

interactive_bar_graph <- plot_ly(
  bar_data,
  x = ~`Model Year`,
  y = ~Count,  # vehicles per make and model year (a count, not an average)
  color = ~Make,
  colors = viridis_pal(option = "D")(length(unique(bar_data$Make))),
  type = 'bar'
) %>%
  layout(barmode = 'group',
         title = 'Count of Electric Vehicles by Make and Model Year',
         xaxis = list(title = 'Model Year'),
         yaxis = list(title = 'Count of Vehicles'),
         annotations = list(
           list(
             text = 'Data Source: Washington State Department of Licensing (DOL)',
             xref = 'paper',
             yref = 'paper',
             x = 0.5,
             y = -0.15,
             showarrow = FALSE
           )
         )
       )
interactive_bar_graph
Data Source: Washington State Department of Licensing (DOL)

Summary Visualization 2

This interactive Plotly bar graph shows the number of BEVs and PHEVs registered for each model year, broken down by manufacturer. Registrations rise sharply for several manufacturers in recent model years, which suggests a growing market for these vehicles. Some brands grow steadily, while others grow unevenly from year to year, which may reflect different model launch schedules, market strategies, or consumer preferences. Some model years show distinct spikes in registrations that could coincide with the introduction of new models or stronger incentives, although the dataset alone cannot confirm this. Because the dataset lists only vehicles that are currently registered, older model years are likely underrepresented (vehicles that have since been retired or moved out of state no longer appear), so part of the upward trend may reflect this rather than sales growth alone. A limitation is that this graph does not break adoption down by region within the state, which would give more granular insight into adoption patterns. Overall, the graph illustrates the growing role of EVs in modern transportation. Although the data collection methodology is not documented here, the dataset remains a valuable resource for understanding the trajectory of electric vehicle adoption.
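Claims about steady versus uneven growth could be checked numerically rather than read off the bars, for example by computing each make’s year-over-year change in Count. A minimal base-R sketch on made-up counts; the columns model_year, make and count are hypothetical stand-ins for the Model Year, Make and Count columns of bar_data:

```r
# Made-up counts shaped like bar_data: one row per (model year, make)
counts <- data.frame(
  model_year = c(2019, 2020, 2021, 2019, 2020, 2021),
  make       = c("A",  "A",  "A",  "B",  "B",  "B"),
  count      = c(10,   15,   25,   8,    3,    12)
)

# Sort by make, then year, so diff() compares consecutive model years
counts <- counts[order(counts$make, counts$model_year), ]

# Year-over-year change within each make; the first year of each make has no
# predecessor (NA). A make that skips a model year would be compared across
# the gap, so gaps should be filled or flagged before reading these numbers.
counts$yoy_change <- ave(counts$count, counts$make,
                         FUN = function(v) c(NA, diff(v)))
```

In this toy table, make A rises every year (+5, +10) while make B drops and then rebounds (-5, +9), the kind of difference the bar chart only suggests visually.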

Final Visualization: Distribution of Electric Range Among Different Vehicle Makes (Boxplot)

A boxplot is a different type of visualization that shows the spread of electric range within each of the top 10 makes.

interactive_boxplot_graph <- plot_ly(filtered_data, y = ~`Electric Range`, color = ~Make, colors = viridis_pal(option = "D")(length(unique(filtered_data$Make))), type = 'box') %>%
  layout(title = "Distribution of Electric Range Among Different Vehicle Makes",
         xaxis = list(title = 'Make'),
         yaxis = list(title = 'Electric Range (miles)'),
         annotations = list(
           list(
             text = 'Data Source: Washington State Department of Licensing (DOL)',
             xref = 'paper',
             yref = 'paper',
             x = 0.5,
             y = -0.19,
             showarrow = FALSE
           )
         )
       )

interactive_boxplot_graph

Summary Final Visualization

This boxplot compares the distribution of electric range across the top 10 vehicle makes. Electric range is a critical factor for EV consumers, because it directly affects how usable a vehicle is. Each box spans the middle 50% of a make’s vehicles, the line inside it marks the median, and points beyond the whiskers are outliers. The viridis color palette distinguishes the makes while remaining easy to read. The boxplots show that certain makes consistently offer longer electric ranges than others, which helps consumers compare the range options within each make when choosing an electric vehicle.
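The quantities a boxplot summarizes can also be computed directly, which makes comparisons between makes reproducible in text. A minimal base-R sketch on made-up ranges for two hypothetical makes, using the common 1.5 × IQR outlier rule; plotting libraries differ slightly in how they compute quartiles, so these numbers may not match the Plotly boxes exactly.

```r
# Made-up ranges (miles) for two hypothetical makes
toy <- data.frame(
  make           = c(rep("A", 6), rep("B", 5)),
  electric_range = c(200, 210, 215, 220, 230, 400, 20, 25, 30, 32, 35)
)

# Median and interquartile range (R's default type-7 quantiles) per make
medians <- tapply(toy$electric_range, toy$make, median)
iqrs    <- tapply(toy$electric_range, toy$make, IQR)

# Flag values more than 1.5 * IQR below the first or above the third quartile
is_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75))
  spread <- diff(q)
  x < q[1] - 1.5 * spread | x > q[2] + 1.5 * spread
}

# Apply the rule within each make and put the flags back in row order
toy$outlier <- unsplit(lapply(split(toy$electric_range, toy$make), is_outlier),
                       toy$make)
```

Here make A has a median of 217.5 miles and flags its 400-mile vehicle as an outlier, while make B (median 30 miles) has none.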

In conclusion, these visualizations shed light on the trends in electric vehicle adoption and the variation in electric range among different vehicle makes. While the data collection methodology remains unspecified, the insights gained from these visualizations contribute to a better understanding of the electric vehicle landscape and its significance in addressing environmental concerns.