Introduction

Electric vehicles have exploded in popularity in recent years, sparking a new arms race among automotive companies. Statistica.com forecasts 1.27 million EV sales in 2024. Age-old industry leaders have been forced to adapt to a changing market.

Dataset

The Electric Vehicle Population Dataset records the Battery Electric Vehicles (BEVs) and Plug-in Hybrid Electric Vehicles (PHEVs) registered through the Washington State Department of Licensing. The dataset has a significant number of missing values, notably in the Base MSRP and Electric Range columns. The dataset is available on Kaggle (Kaggle.com). It was published on Jan 26, 2024 and contains 166,800 rows.

Findings

This analysis brings EV growth to life. In just 12 years, EVs in Washington went from 782 in 2011 to over 51,000 in 2023. Only 4 companies held market share in 2011; now 15 do. Tesla quickly shot through the ranks, and 2023 was dominated by Tesla sales: Tesla held 48.6% of the market, while its closest competitor, Hyundai, held 5.14%.

Tab 1

Basic statistics about the Washington Electric Car dataset.

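The number of missing values in Base MSRP is visible in its mean: the average cost of an EV is not $1,153. The summary shows a median of 0 for both Base MSRP and Electric Range, so most missing values appear to be recorded as 0 rather than NA, which drags the means down. Electric Range is skewed further because plug-in hybrids, which have a much shorter electric range, are included.

Below are the summary statistics of the dataframe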

summary(df)
##   VIN (1-10)           County              City              State          
##  Length:166800      Length:166800      Length:166800      Length:166800     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   Postal Code      Model Year       Make              Model          
##  Min.   : 1730   Min.   :1997   Length:166800      Length:166800     
##  1st Qu.:98052   1st Qu.:2018   Class :character   Class :character  
##  Median :98122   Median :2021   Mode  :character   Mode  :character  
##  Mean   :98174   Mean   :2020                                        
##  3rd Qu.:98371   3rd Qu.:2023                                        
##  Max.   :99577   Max.   :2024                                        
##  NA's   :5                                                           
##  Electric Vehicle Type Clean Alternative Fuel Vehicle (CAFV) Eligibility
##  Length:166800         Length:166800                                    
##  Class :character      Class :character                                 
##  Mode  :character      Mode  :character                                 
##                                                                         
##                                                                         
##                                                                         
##                                                                         
##  Electric Range     Base MSRP      Legislative District DOL Vehicle ID     
##  Min.   :  0.00   Min.   :     0   Min.   : 1.00        Min.   :     4385  
##  1st Qu.:  0.00   1st Qu.:     0   1st Qu.:18.00        1st Qu.:179074064  
##  Median :  0.00   Median :     0   Median :33.00        Median :224404526  
##  Mean   : 61.51   Mean   :  1153   Mean   :29.18        Mean   :217241994  
##  3rd Qu.: 84.00   3rd Qu.:     0   3rd Qu.:42.00        3rd Qu.:251342132  
##  Max.   :337.00   Max.   :845000   Max.   :49.00        Max.   :479254772  
##                                    NA's   :360                             
##  Vehicle Location   Electric Utility   2020 Census Tract    
##  Length:166800      Length:166800      Min.   : 1001020100  
##  Class :character   Class :character   1st Qu.:53033009701  
##  Mode  :character   Mode  :character   Median :53033029602  
##                                        Mean   :52977091766  
##                                        3rd Qu.:53053073001  
##                                        Max.   :56033000100  
##                                        NA's   :          5
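The zero medians in the summary suggest that missing prices are stored as 0. As a minimal sketch of how that distorts an average, the toy vector below uses made-up prices (they are not taken from the dataset) and compares the naive mean with the mean after recoding 0 as NA. The names `msrp`, `msrp_clean`, and `share_missing` are illustrative only.

```r
# Toy MSRP vector (made-up values, NOT taken from the dataset):
# zeros stand in for "price not recorded".
msrp <- c(0, 0, 0, 0, 69900, 0, 0, 31950, 0, 0)

# The naive mean treats every 0 as a real $0 price.
naive_mean <- mean(msrp)

# Recode 0 as NA so the gaps are explicit, then drop them from the mean.
msrp_clean <- replace(msrp, msrp == 0, NA)
clean_mean <- mean(msrp_clean, na.rm = TRUE)

# Share of records with no usable price.
share_missing <- mean(is.na(msrp_clean))

print(c(naive = naive_mean, clean = clean_mean, share_missing = share_missing))
```

On the real dataframe the same recoding could be applied to `Base MSRP` and `Electric Range` before summarising, assuming 0 never represents a genuine value in those columns.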


Below is the structure of the dataframe

str(df)
## Classes 'data.table' and 'data.frame':   166800 obs. of  17 variables:
##  $ VIN (1-10)                                       : chr  "3C3CFFGE4E" "5YJXCBE40H" "3MW39FS03P" "7PDSGABA8P" ...
##  $ County                                           : chr  "Yakima" "Thurston" "King" "Snohomish" ...
##  $ City                                             : chr  "Yakima" "Olympia" "Renton" "Bothell" ...
##  $ State                                            : chr  "WA" "WA" "WA" "WA" ...
##  $ Postal Code                                      : int  98902 98513 98058 98012 98031 98370 98367 98370 98366 98019 ...
##  $ Model Year                                       : int  2014 2017 2023 2023 2020 2024 2018 2017 2018 2018 ...
##  $ Make                                             : chr  "FIAT" "TESLA" "BMW" "RIVIAN" ...
##  $ Model                                            : chr  "500" "MODEL X" "330E" "R1S" ...
##  $ Electric Vehicle Type                            : chr  "Battery Electric Vehicle (BEV)" "Battery Electric Vehicle (BEV)" "Plug-in Hybrid Electric Vehicle (PHEV)" "Battery Electric Vehicle (BEV)" ...
##  $ Clean Alternative Fuel Vehicle (CAFV) Eligibility: chr  "Clean Alternative Fuel Vehicle Eligible" "Clean Alternative Fuel Vehicle Eligible" "Not eligible due to low battery range" "Eligibility unknown as battery range has not been researched" ...
##  $ Electric Range                                   : int  87 200 20 0 322 39 33 238 215 114 ...
##  $ Base MSRP                                        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Legislative District                             : int  14 2 11 21 33 23 26 23 26 45 ...
##  $ DOL Vehicle ID                                   : int  1593721 257167501 224071816 260084653 253771913 259427829 477087012 214494213 280785123 129133343 ...
##  $ Vehicle Location                                 : chr  "POINT (-120.524012 46.5973939)" "POINT (-122.817545 46.98876)" "POINT (-122.1298876 47.4451257)" "POINT (-122.1873 47.820245)" ...
##  $ Electric Utility                                 : chr  "PACIFICORP" "PUGET SOUND ENERGY INC" "PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA)" "PUGET SOUND ENERGY INC" ...
##  $ 2020 Census Tract                                :integer64 53077000700 53067012331 53033025803 53061051927 53033029305 53035940100 53035092902 53035090502 ... 
##  - attr(*, ".internal.selfref")=<externalptr>


Below is the number of rows in the dataframe

nrow(df)
## [1] 166800

Tab 2

Vehicle Sales by Year Pie Chart

This chart shows the rapid rise of EV sales in Washington. The innermost circle is sales in 2011. Sales in 2011 highlight the initial dominance Nissan had in the EV space, with an 88.2% market share (690 cars). Their closest competitor was Chevy at a 9.72% market share (76 cars). Tesla was not even a threat to them in 2011, with only 6 cars. This would change quickly in the 5 short years to 2016.

In 2016, Nissan had grown their sales in the state from 690 to 1,120 but saw their market share dwindle from 88.2% to 20.3%. Competitors had already surpassed them: Tesla went from under 1% market share to 28.8% (1,587 cars). Tesla was not the only competitor either; 7 other mainstream makes had entered the market. Ford, Kia, BMW, Volkswagen, Audi, Volvo, and Hyundai, along with many smaller companies, each held roughly 5% to 13% of sales. The dynamics would change massively again from 2016 to 2023.

In 2023, Tesla fortified their lead, growing from a 28.8% market share (1,587 cars) to 48.6% (24,979 cars), a staggering increase of more than 15x in 7 years. Tesla's closest competitor was no longer Nissan but Hyundai, and the gap was massive: Hyundai held only a 5.14% share of sales (2,639 cars). Nissan, now a thing of the past, held a 2.5% share (1,286 cars). In 12 years, Tesla went from last place to market leader. The coming years will show whether Tesla can hold its dominance over the electric car scene, as many car companies are desperate to get a cut of the growing market.

library(dplyr)

# Display names for the makes shown individually; every other make becomes "Other"
brand_names <- c(TESLA = "Tesla", NISSAN = "Nissan", CHEVROLET = "Chevrolet",
                 FORD = "Ford", BMW = "BMW", KIA = "Kia", TOYOTA = "Toyota",
                 VOLKSWAGEN = "Volkswagen", JEEP = "Jeep", HYUNDAI = "Hyundai",
                 VOLVO = "Volvo", RIVIAN = "Rivian", AUDI = "Audi",
                 CHRYSLER = "Chrysler")

# Creating the data frame: cars per brand per model year, plus each brand's share of that year
make_df <- df %>%
  select(Make, `Model Year`) %>%
  mutate(year = `Model Year`,
         # Look up the display name; makes missing from brand_names return NA and fall back to "Other"
         myBrand = coalesce(unname(brand_names[Make]), "Other")) %>%
  group_by(year, myBrand) %>%
  dplyr::summarise(n = n(), .groups = 'keep') %>%
  group_by(year) %>%
  mutate(percent_of_total = round(100*n/sum(n),1)) %>%
  ungroup() %>%
  data.frame()

library(plotly)

# Creating the visualization: three nested pies, 2023 outermost and 2011 innermost
plot_ly(hole=0.7) %>%
  layout(title = "Electric Car Sales by Make (2011, 2016, 2023)") %>%
  add_trace(data = make_df[make_df$year == 2023,],
            labels = ~myBrand,
            values = ~n,
            type = "pie",
            textposition = "inside",
            hovertemplate = "Year: 2023<br>Brand: %{label}<br>Percent: %{percent}<br>Electric Cars: %{value}<extra></extra>") %>%
  add_trace(data = make_df[make_df$year == 2016,],
            labels = ~myBrand,
            values = ~n,
            type = "pie",
            textposition = "inside",
            hovertemplate = "Year: 2016<br>Brand: %{label}<br>Percent: %{percent}<br>Electric Cars: %{value}<extra></extra>",
            domain = list(
              x = c(0.16, 0.84),
              y = c(0.16, 0.84))) %>%
  add_trace(data = make_df[make_df$year == 2011,],
            labels = ~myBrand,
            values = ~n,
            type = "pie",
            textposition = "inside",
            hovertemplate = "Year: 2011<br>Brand: %{label}<br>Percent: %{percent}<br>Electric Cars: %{value}<extra></extra>",
            domain = list(
              x = c(0.27, 0.73),
              y = c(0.27, 0.73)))
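The market shares quoted for this tab can be sanity-checked by hand. The sketch below is only an arithmetic check using the counts stated in the text; the "Other" entry is an assumption, derived as the remainder of the 782 total rather than read from the dataset.

```r
# Model-year 2011 counts quoted in the text. "Other" is an ASSUMPTION:
# it is the remainder of the 782 total, not a value read from the data.
counts_2011 <- c(Nissan = 690, Chevrolet = 76, Tesla = 6)
total_2011 <- 782
counts_2011 <- c(counts_2011, Other = total_2011 - sum(counts_2011))

# Share of each brand, in percent of that year's total.
share_2011 <- round(100 * counts_2011 / sum(counts_2011), 2)
print(share_2011)

# Growth factor in Tesla's count between model years 2016 and 2023.
tesla_growth <- 24979 / 1587
print(round(tesla_growth, 1))
```

The Nissan and Chevrolet shares agree with the 88.2% and 9.72% reported above, and the Tesla growth factor comes out a little above 15x.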

Tab 3

Top Electric Cars in Washington

This chart shows the top EV models sold in Washington and how individual models have grown within the state.

In 2020, the Tesla Model Y had sales of 2,335 cars (10th place). This shot up quickly to 6,570 cars in 2021 (4th place). The growth did not stop there: the 2022 Model Y sold 7,351 cars (2nd place). As if growing from ~2,000 to ~7,000 in 2 years wasn't impressive enough, the 2023 Model Y more than doubled that with 16,566 cars (1st place).

Tesla's closest competitors in 2023 were Volkswagen, Hyundai, and Chevrolet. Volkswagen sold 1,956 ID.4s (11th place), Hyundai sold 1,777 IONIQ 5s (14th place), and Chevrolet sold 1,570 Bolts (15th place).

# Creating a dataframe with the count of car types 
big_tot <- df %>%
  select(Make, Model, `Model Year`) %>%
  group_by(Make, Model, `Model Year`) %>%
  dplyr::summarise(n = n(), .groups = 'keep') %>%
  data.frame()

# Creating a new variable for the full car name to be on the plot
big_tot$MakeModel <- paste(big_tot$Make, big_tot$Model, big_tot$Model.Year)

# Ordering the dataframe
model_df <- big_tot[order(big_tot$n, decreasing = TRUE),]
model_df <- model_df[1:15,]
model_df$Model.Year <- as.factor(model_df$Model.Year)

# Creating a ceiling for the graph. (Prevents parts being cut off)
max_y <- round_any(max(model_df$n), 18000, ceiling)

# Creating the graph
ggplot(model_df, aes(x = reorder(MakeModel, n, sum), y = n, fill = Model.Year)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme_fivethirtyeight() +
  labs(title = "Top Electric Cars in Washington", x="", y = "Quantity", fill = "Year") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "Dark2") +
  geom_line(inherit.aes = FALSE, data=model_df, 
            aes(x = MakeModel, y = n, group=1), colour = "#FF0000", linewidth=1) +
  geom_point(inherit.aes = FALSE, data=model_df,
             aes(x = MakeModel,  y = n, group = 1),
             size = 3, shape = 21, fill = "#4D07D0", color = "black") +
  geom_text(data = model_df, aes(x = MakeModel, y = n, label = n, fill = NULL), hjust = -0.15, size=4) +
  theme(legend.background = element_rect(fill = "transparent"),
        legend.box.background = element_rect(fill = "transparent", colour=NA),
        legend.spacing = unit(-1, "lines")) +
  scale_y_continuous(labels = comma, limits=c(0, max_y))


Tab 4

Top Battery Distances (Dataset had many NULL values for 2021-2024)

This chart shows the top 10 driving ranges among BEVs. Only model years 2020 and earlier are covered, because the dataset has no battery range data for 2021 and later.

The top car was the 2020 Tesla Model S with a range of 333.5 miles, almost 35 miles above second place. The Tesla Model 3 took second place at 298.67 miles. Third place went to the Tesla Model X at 291 miles.

Going all the way down to Tesla's competition, Hyundai and Chevrolet were the only other companies in the top 10. Chevrolet held 7th place with 259 miles. Hyundai held 8th and 9th place at 258 miles.

# Data Battery Plot
# Failed dataframe
#bat_df <- df %>%
#  select(Make, Model, `Model Year`, `Electric Range`)%>%
#  distinct(Make, Model, `Model Year`, `Electric Range`) %>%
#  group_by(Make, Model, `Model Year`) %>%
#  data.frame()
#bat_df <- bat_df[order(-bat_df$Electric.Range),]

# The following dataframe was created with the assistance of ChatGPT to assist in the manual labor. 
# The top cars have different battery ranges, but the same make model and year.
# I could not figure out how to combine and average similar records.
# I pasted data from the top 20 records and asked ChatGPT to hard code the dataframe.

# Combine into a new dataframe
bat_df <- data.frame(
  Model = c("TESLA MODEL S 2020", "TESLA MODEL 3 2020", "TESLA MODEL X 2020", "TESLA MODEL X 2019",
            "TESLA MODEL S 2019", "TESLA MODEL S 2012", "CHEVROLET BOLT EV 2020", "HYUNDAI KONA 2019",
            "HYUNDAI KONA 2020", "TESLA MODEL S 2018", "TESLA ROADSTER 2010", "TESLA ROADSTER 2011",
            "KIA NIRO 2019", "KIA NIRO 2020", "CHEVROLET BOLT EV 2017", "CHEVROLET BOLT EV 2019", "TESLA MODEL X 2018"
            , "CHEVROLET BOLT EV 2018", "JAGUAR I-PACE 2019", "JAGUAR I-PACE 2020"),
  Range = c(round(mean(c(337, 330)), 2), round(mean(c(322, 308, 266)), 2), round(mean(c(293, 289)), 2), 289, 270,
            265, 259, 258, 258, 249, 245, 245, 239, 239, 238, 238, 238, 238, 234, 234)
)

bat_df <- head(bat_df, 10)

# This line of code was generated with ChatGPT. I was having errors with my
# color being out of order. ChatGPT suggested to implement this line of code for correction
bat_df$order <- reorder(bat_df$Model, bat_df$Range, sum)

# Creating manual colors from https://r-charts.com/color-palette-generator/
bar_colors <- c("#65ff1d", "#61f236", "#5ee54f", "#5ad968", "#57cc81", "#53bf9b", "#50b2b4", "#4ca6cd", "#4999e6", "#458cff")

ggplot(bat_df, aes(x = order, y = Range, fill = order)) +
  geom_bar(stat = 'identity') +
  scale_fill_manual(values = bar_colors) +
  labs(title = "Top Driving Distances", x = "Car", y = "Distance") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(size = 8)) +
  geom_text(aes(label = paste(Range, "Miles")), vjust = -1)
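The hard-coded averages above could, in principle, be computed directly. The following is a minimal sketch on a toy data frame, not the author's method: it keeps one row per distinct range for each make, model, and year, then averages those rows. The toy rows reuse ranges that appear in the hard-coded vector; the names `toy`, `range_df`, and `Label` are illustrative, and the column names follow the dotted form produced by `data.frame()`.

```r
library(dplyr)

# Toy rows mimicking the dataset's columns. The ranges are the ones used
# in the hard-coded data frame; the row counts are made up.
toy <- data.frame(
  Make  = c("TESLA", "TESLA", "TESLA", "TESLA", "TESLA", "TESLA", "CHEVROLET", "CHEVROLET"),
  Model = c("MODEL S", "MODEL S", "MODEL S", "MODEL 3", "MODEL 3", "MODEL 3", "BOLT EV", "BOLT EV"),
  Model.Year = c(2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020),
  Electric.Range = c(337, 337, 330, 322, 308, 266, 259, 0)
)

range_df <- toy %>%
  filter(Electric.Range > 0) %>%                          # treat 0 as "range not recorded"
  distinct(Make, Model, Model.Year, Electric.Range) %>%   # one row per distinct range
  group_by(Make, Model, Model.Year) %>%
  summarise(Range = round(mean(Electric.Range), 2), .groups = "drop") %>%
  mutate(Label = paste(Make, Model, Model.Year)) %>%
  arrange(desc(Range))

print(head(range_df, 10))
```

Averaging the distinct ranges weights each variant equally, which matches how the hard-coded values were built; averaging all registrations instead would weight each variant by how many cars carry it.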

Tab 5

Distribution of Electric Vehicle Types Over Years

This heatmap showcases the adoption of Battery Electric Vehicles vs. Plug-in Hybrid Electric Vehicles.

In 2012, there were more PHEVs than BEVs. This trend would reverse by 2015, with 3,560 BEVs and only 1,273 PHEVs.

Battery electric vehicles would continue to run away with the lead. Throughout 2019-2022 there were ~4.5 BEVs for every PHEV. This trend accelerated further with the boom of Tesla sales in 2023. In 2023 there were 6,689 PHEVs, which is significant growth from 870 in 2012. However, there were 44,462 BEVs in 2023, roughly 6.6x the number of PHEVs. Starting at 760 in 2012, battery electric vehicles saw staggering growth through 2023.

# Creating the data frame: count of each vehicle type (BEV and PHEV) per model year, 2012-2023
notElec <- df %>%
  filter(`Model Year`>2011, `Model Year`<2024) %>%
  group_by(`Model Year`, `Electric Vehicle Type`) %>%
  dplyr::summarize(n = n(), .groups = 'keep')

# Creating the heatmap
breaks <- c(seq(0, max(notElec$n), by=5000))


ggplot(notElec, aes(x=`Model Year`, y = `Electric Vehicle Type`, fill = n)) +
  geom_tile(color="black") +
  geom_text(aes(label=comma(n))) +
  labs(title = "Heatmap: Vehicle Type by Year",
       y = "Type of Electric",
       x = "Model Year",
       fill = "Quantity of Cars") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_continuous(low="white", high= "#18FF3B", breaks = breaks) +
  guides(fill = guide_legend(reverse=TRUE, override.aes = list(colour="black")))
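The BEV-to-PHEV ratios mentioned for this tab follow directly from the counts. As a small worked check, the sketch below recomputes them from the figures quoted in the text for model years 2015 and 2023; the data frame `type_counts` and its column names are illustrative.

```r
# Counts quoted in the text for model years 2015 and 2023.
type_counts <- data.frame(
  year = c(2015, 2023),
  BEV  = c(3560, 44462),
  PHEV = c(1273, 6689)
)

# BEVs per PHEV, and BEVs as a percentage of all vehicles that year.
type_counts$bev_per_phev  <- round(type_counts$BEV / type_counts$PHEV, 2)
type_counts$bev_share_pct <- round(100 * type_counts$BEV /
                                     (type_counts$BEV + type_counts$PHEV), 1)

print(type_counts)
```

The same two columns could be derived for every year by reshaping `notElec` so that each vehicle type has its own column.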

Tab 6

Average cost of cars by year (data is an approximation due to the large number of missing MSRP values)

This chart shows the average cost of electric cars by model year. In 2008 there were barely any electric car sales, but those cars carried a staggering $94,000 average price tag. Early electric cars were expensive because there was barely a market for them: they were not mainstream, so mainstream brands ignored them. The highest point in the graph is 2011, at $109,000.

One year later, in 2012, when Nissan took market dominance, the average price dropped to $64,000. It stayed level for 2 years until new competitors entered the market.

The lowest point in the graph, $32,280 in 2016, coincides with 10 companies all vying for market dominance and undercutting each other's prices. Additionally, in 2016 plug-in hybrids still held ~31% of market share.

The graph finishes in 2020 at ~$81,000. This coincides with Tesla taking market dominance with a 60% market share.

# Creating the cost by year dataframe
cost <- df %>%
  mutate(year = `Model Year`) %>%
  filter(`Base MSRP`>0) %>%
  group_by(year) %>%
  dplyr::summarise(AverageCost = mean(`Base MSRP`, na.rm = TRUE),
            CarCount = n()
  ) %>%
  data.frame()

# Reordering the cost
cost <- cost[order(-cost$year),]


# Getting rid of unique outliers with low sales
aveCost <- cost %>%
  filter(CarCount>5)


# Creating logic for the low and high point in the graph
hi_lo <- aveCost %>%
  filter(AverageCost == min(AverageCost) | AverageCost == max(AverageCost)) %>%
  data.frame()


ggplot(aveCost, aes(x=year, y=AverageCost)) +
  geom_line(color='black', linewidth=1) +
  geom_point(shape=21, size=4, color='red', fill='white') +
  labs(x="Year", y="Average Electric Car Cost", title="Electric Car Cost by Year", caption="Source: Washington Electric Cars (Kaggle)") +
  scale_y_continuous(labels=comma) +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_point(data = hi_lo, aes(x = year, y=AverageCost),shape=21, size=4, fill='red', color='white') +
  geom_label_repel(aes(label= ifelse(AverageCost==max(AverageCost) | AverageCost == min(AverageCost),scales::comma(AverageCost), "")),
                   box.padding = 1,
                   point.padding = 1,
                   size=4,
                   color="Grey50",
                   segment.color = 'darkblue')

Conclusion

Over the past decade, the electric vehicle market in Washington has seen dramatic shifts. Starting with just 4 companies and under a thousand sales in 2011, it exploded over the decade. Tesla beat out the competition and took market dominance: the Tesla Model Y sold 16,566 cars in 2023, while the closest competing model, from Volkswagen, sold 1,956. Driving range has increased through the years, with the longest range belonging to the Tesla Model S at 333.5 miles. Hybrids initially outnumbered Battery Electric Vehicles, but BEVs eventually won out and now vastly outnumber them. The average cost of electric cars was above $100,000 in 2011, dropped to $32,280 in 2016 as competition intensified, and rose back up to ~$81,000 in 2020. These graphs highlight the dynamic and rapidly evolving landscape of the EV market.