Tesla has reduced the price of its Electric Vehicle(EV) models 3 times within a year .
The question we are seeking to answer is why is Tesla’s EV sales and price declining though Sales were projected to grow by 30% in 2025.
Possible reasons can be growing competitors, more incentives, more inventory, insufficient charging infrastructure.
Benefit of doing this work is understanding the market conditions to make better decisions from the consumer side as well as from the auto industry side.
library(tidyverse)
library(pdftools)
library(knitr)
library(gifski)
library(png)
library(gganimate)
library(stats)
library(corrplot)
#Load the original .csv file from Global Electric Vehicle Dataset 2023
#file location: https://www.kaggle.com/datasets/padmapiyush/global-electric-vehicle-dataset-2023/data
EV_Raw <- read.csv("https://raw.githubusercontent.com/datanerddhanya/DATA607/main/finalproject/Global%20EV%202023.csv", header = TRUE)
#filter to required parameters and mode as we do not need vans, buses and trucks
EV_Data <- EV_Raw %>%
filter( parameter == "EV sales" |parameter == "EV stock" | parameter == "EV charging points") %>%
filter( mode == "Cars" | mode == "EV")
# for the future years 2025 and 2030, change the category to projection
EV_Data <- EV_Data %>%
mutate( category = ifelse ( year =="2025" | year =="2030","Projection",category ))
# to remove the scientific display format
options(scipen = 999)
# filter the required records as we do not need projected records for the previous years
# filter to years 2021 onwards
# remove groupings within the region variable so that we retain only individual countries.
EV_Data <- EV_Data %>%
filter( category == "Projection" | category == "Historical" ) %>%
filter( year > 2020) %>%
subset(!(region %in% c('EU27', 'Europe', 'Other Europe', 'Rest of the world', 'World'))) %>%
mutate( value = round(as.numeric(value))) %>%
mutate( year = as.factor(as.numeric(year)))
# to find the top countries of interest
# i see that the following are the top countries in EV sales
#"China" , "USA", "India", "Germany", "United Kingdom"
EV_Data |>
filter( parameter == "EV sales" ) |>
filter( mode == "Cars") |>
group_by(region) |>
arrange(desc(value)) |>
ggplot( aes(x= value,y = reorder(region,value))) +
geom_bar(stat = "identity", position = "dodge", fill = "green") +
labs(title = "Top Countries in EV Sales for all years from 2021 till date", x= "EV Sales" , y = "Countries")
#hybrid vs battery vehicles
EV_Data |>
filter( parameter == "EV sales" ) |>
filter( mode == "Cars") |>
ggplot( aes(x= year ,y = value , color = powertrain)) +
geom_point(stat = "identity") +
labs(title = "Battery Electric Vehicle vs Plugin Hybrid Electric Vehicle", x= "Year" , y = "EV Sales")
# Bar plot
EV_Data |>
filter( region %in% c("China" , "USA", "India", "Germany", "United Kingdom")) |>
ggplot( aes(x= year,y = value, fill = parameter)) +
geom_bar(stat = "identity", position = "dodge") +
facet_wrap(~ region, scales = "free") +
theme_minimal() +
theme(legend.position = "bottom") +
scale_fill_brewer(palette = "Set3") +
scale_x_discrete(breaks = unique(EV_Data$year)) +
labs(title = "EV Production units, Sales Units and Charging points by region by year", x= "Year" , y = "Units")
# Line plot
EV_Data |>
filter( region %in% c("China" , "USA", "India", "Germany", "United Kingdom")) |>
ggplot(aes(x = year, y = value, color = region, group = region)) +
geom_line() +
facet_wrap(~ parameter, scales = "free_y") +
labs(title = "Electric Vehicle Data Trends by Region",
x = "Year",
y = "Value") +
theme_minimal() +
theme(legend.position = "bottom") +
scale_color_brewer(palette = "Set2")
# Faceted plot
EV_Data |>
filter( region %in% c("China" , "USA", "India", "Germany", "United Kingdom")) |>
ggplot(aes(x = year, y = value)) +
geom_line() +
facet_grid(parameter ~ region, scales = "free") +
labs(title = "Electric Vehicle Data Trends by Parameter and Region",
x = "Year",
y = "Value") +
theme_minimal() +
theme(legend.position = "bottom") +
scale_color_brewer(palette = "Set2")
Popular countries to sell EV vehicles are China, India, USA, United Kingdom .
BEV powertrain/ Battery operated electric vehicle is becoming more and more popular every year. It is highly sold than Plug-In Hybrid Electric Vehicle.
USA is projected to have huge stock of EV cars, however the EV public charging points are relatively less.The availability of charging stations is hence a big issue. This may be a concern for consumers who prefer to have easily accessible charging points.Hence lower sales for EV cars.
Based on the bar plot, i see that the EV stock/ production is much more than the EV sales for each country. This may cause the Sale price to decrease due to unsold inventory.
Gather Tesla Production and sales units for 2023 and 2024 File location: https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q1-2024-Update.pdf
#read data from the pdf file
pdf_text <- pdf_text("https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q1-2024-Update.pdf")
# Extract text from the 6th page
page_text <- pdf_text("https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q1-2024-Update.pdf")[6]
# Split the text into lines
lines <- strsplit(page_text, "\n")[[1]]
# Identify and extract the lines corresponding to the table
# Adjust these indices based on where the table is located in the text
table_lines <- lines[3:11]
# Assuming the table is tabular, i can then further process the lines to create a data frame
# Define a regular expression for splitting by one or more spaces
split_regex <- "\\s+"
# Split each line by the regular expression
table_data <- lapply(table_lines, function(line) strsplit(line, split_regex )[[1]])
# Convert the list to a data frame
df <- as.data.frame(do.call(rbind, table_data))
# filter only required observations
# rename columns
# concatenate two variables into one
# remove the unwanted columns
Tesla_data <- df |>
filter(V2 == "Total") |>
rename ( type = "V3", Q1_2023 = "V4" , Q2_2023 = "V5", Q3_2023 = "V6", Q4_2023 = "V7", Q1_2024 ="V8", YoY = "V9") |>
mutate ( type = paste(V2, " ", type))|>
subset(select= -c(V1,V2,V10,YoY) )
#transform to pivot longer
Tesla_stats <- Tesla_data |>
pivot_longer(
cols = starts_with("Q"),
names_to = "Time",
values_to = "Units",
values_drop_na = TRUE
) |>
mutate( time.val = Time )
#remove the thousand separator and convert to numeric
Tesla_stats$Units= as.numeric(gsub("," ,"",Tesla_stats$Units))
#Update Time to a numeric format
Tesla_stats <- Tesla_stats |>
separate_wider_delim(Time, delim = "_", names = c("Quarter", "Year.Quarter"),too_few = "align_start")
Tesla_stats$Quarter= gsub("Q" ,"0",Tesla_stats$Quarter)
Tesla_stats$Year.Quarter= as.numeric(paste0(Tesla_stats$Year.Quarter ,Tesla_stats$Quarter))
#visualize
ggplot(Tesla_stats , aes(x= time.val,y = Units, fill = type)) +
geom_bar(stat = "identity", position = "dodge") +
theme_minimal() +
theme(legend.position = "bottom") +
scale_fill_brewer(palette = "Set3") +
labs( title = "Tesla EV models - Production Vs Sales", x= "Year by Quarter")
# Generate animation object and visualize
animate <- ggplot(Tesla_stats, aes(x=factor(type), y=Units)) +
geom_boxplot() +
transition_states(Year.Quarter) +
labs(title = 'Time: {closest_state}', x = 'Productions and Deliveries', y = 'Units')
animate
Competitor Analysis File location: https://www.coxautoinc.com/wp-content/uploads/2024/04/Q1-2024-Kelley-Blue-Book-Electric-Vehicle-Sales-Report.pdf
#read data from the pdf file
comp_pdf_text <- pdf_text("https://www.coxautoinc.com/wp-content/uploads/2024/04/Q1-2024-Kelley-Blue-Book-Electric-Vehicle-Sales-Report.pdf")
# Extract text from the 6th page
comp_page_text <- pdf_text("https://www.coxautoinc.com/wp-content/uploads/2024/04/Q1-2024-Kelley-Blue-Book-Electric-Vehicle-Sales-Report.pdf")[2]
# Split the text into lines
comp_lines <- strsplit(comp_page_text, "\n")[[1]]
print(comp_lines)
## [1] " ELECTRIC VEHICLE (BRAND)"
## [2] " Q1 Sales Segment Share"
## [3] " 2024 2023 YOY Q1 YTD"
## [4] "Audi 5,714 4,438 28.8% 2.1% 2.1%"
## [5] "BMW 10,713 6,588 62.6% 3.9% 3.9%"
## [6] "Brightdrop 256 - - 0.1% 0.1%"
## [7] "Cadillac 5,800 968 499.2% 2.1% 2.1%"
## [8] "Chevrolet 8,701 19,700 -55.8% 3.2% 3.2%"
## [9] "Ford 20,223 10,866 86.1% 7.4% 7.4%"
## [10] "Genesis 992 937 5.9% 0.4% 0.4%"
## [11] "GMC 1,668 2 - 0.6% 0.6%"
## [12] "Fisker 1,660 - - 0.6% 0.6%"
## [13] "Honda 19 - - 0.0% 0.0%"
## [14] "Hyundai 12,290 7,824 57.1% 4.5% 4.5%"
## [15] "Jaguar 256 53 383.0% 0.1% 0.1%"
## [16] "Kia 9,654 5,930 62.8% 3.5% 3.5%"
## [17] "Lexus 1,603 185 766.5% 0.6% 0.6%"
## [18] "Lucid 1,967 1,406 39.9% 0.7% 0.7%"
## [19] "Mazda - 15 -100.0% 0.0% 0.0%"
## [20] "Mercedes 12,250 7,341 66.9% 4.5% 4.5%"
## [21] "Mini 742 672 10.4% 0.3% 0.3%"
## [22] "Nissan 5,284 5,214 1.3% 1.9% 1.9%"
## [23] "Polestar 2,210 2,340 -5.6% 0.8% 0.8%"
## [24] "Porsche 1,925 1,527 26.1% 0.7% 0.7%"
## [25] "Rivian 13,588 8,558 58.8% 5.0% 5.0%"
## [26] "Subaru 1,147 1,359 -15.6% 0.4% 0.4%"
## [27] "Tesla 140,187 161,630 -13.3% 51.3% 51.3%"
## [28] "Toyota 1,897 1,698 11.7% 0.7% 0.7%"
## [29] "Vinfast 927 110 742.7% 0.3% 0.3%"
## [30] "Volvo 1,069 2,924 -63.4% 1.9% 1.9%"
## [31] "VW 6,167 9,758 -36.8% 2.3% 2.3%"
## [32] " Total (Estimates) 268,909 262,043 2.6% 100% 100%"
## [33] ""
## [34] "Toyota 172 668 -74.3%"
## [35] "Hyundai 51 65 -21.5%"
## [36] "Total EV & FCEV (Estimates) 269,132 262,776 2.4%"
# Identify and extract the lines corresponding to the table
# Adjust these indices based on where the table is located in the text
comp_table_lines <- comp_lines[4:31]
# Assuming the table is tabular, we can then further process the lines to create a data frame
# Define a regular expression for splitting by one or more spaces
split_regex <- "\\s+"
# Split each line by the regular expression
comp_table_data <- lapply(comp_table_lines, function(line) strsplit(line, split_regex )[[1]])
# Convert the list to a data frame
comp_ev_sales <- as.data.frame(do.call(rbind, comp_table_data))
# filter only required observations
# rename columns
# concatenate two variables into one
# remove the unwanted columns
ev_brand_sales <- comp_ev_sales|>
rename ( Brand = "V1", Y_2024 = "V2", Y_2023 = "V3", YOY = "V4" , Segment_share = "V5") |>
subset(select= -c(V6) )
#Tidying up data in the variables to remove thousand separator, values with -
ev_brand_sales$Y_2024 <- gsub(",", "", ev_brand_sales$Y_2024)
ev_brand_sales$Y_2024 <- gsub("-", "0", ev_brand_sales$Y_2024)
ev_brand_sales$Y_2023 <- gsub(",", "", ev_brand_sales$Y_2023)
ev_brand_sales$Y_2023 <- gsub("-", "0", ev_brand_sales$Y_2023)
ev_brand_sales$Segment_share <- gsub("%", "", ev_brand_sales$Segment_share)
ev_brand_sales$YOY <- gsub("%", "", ev_brand_sales$YOY)
ev_brand_YOY <- ev_brand_sales |>
mutate( Segment_share= as.numeric(Segment_share),Y_2024 = as.numeric(Y_2024), Y_2023 = as.numeric(Y_2023),YOY = as.numeric(YOY) ) |>
drop_na(YOY)
# Convert Brand to a factor and reorder based on YOY %
# this is needed to sort the yoy% in order
ev_brand_YOY$Brand <- factor(ev_brand_YOY$Brand, levels = ev_brand_YOY$Brand[order(-ev_brand_YOY$YOY)])
#visualize year of year change
ggplot(ev_brand_YOY , aes(x= Brand,y = YOY)) +
geom_bar(stat = "identity", position = "dodge", fill = "blue") +
coord_flip() +
labs ( title = "Year over Year change in Sales of EV cars for 2023 and 2024" ,
y = "YoY %" , x = "Brand")
# Convert Brand to a factor and reorder based on Segment_share %
# this is needed to sort the Segment share % in order
ev_brand_YOY$Brand <- factor(ev_brand_YOY$Brand, levels = ev_brand_YOY$Brand[order(-ev_brand_YOY$Segment_share)])
#visualize segment share YTD
ggplot(ev_brand_YOY , aes(x= reorder(Brand,Segment_share),y = Segment_share)) +
geom_bar(stat = "identity", position = position_stack(reverse = TRUE),color = "black", fill = "green") +
coord_flip() +
labs ( title = "Segment share of brands in Sales of EV cars for 2023 and 2024" ,
y = "Segment Share in %" , x = "Brand")
This Kaggle dataset (“Historic sales of electric vehicles”) is from the Global EV Outlook, an annual publication that identifies and discusses recent developments in electric mobility across the globe.
#Read historic EV sales csv into R
historic <- read.csv('https://raw.githubusercontent.com/evelynbartley/Final-Project-607/main/IEA-EV-dataEV%20salesCarsHistorical.csv', header = TRUE)
tibble(historic)
## # A tibble: 834 × 8
## region category parameter mode powertrain year unit value
## <chr> <chr> <chr> <chr> <chr> <int> <chr> <int>
## 1 Australia Historical EV sales Cars BEV 2011 Vehicles 49
## 2 Australia Historical EV sales Cars BEV 2012 Vehicles 170
## 3 Australia Historical EV sales Cars PHEV 2012 Vehicles 80
## 4 Australia Historical EV sales Cars PHEV 2013 Vehicles 100
## 5 Australia Historical EV sales Cars BEV 2013 Vehicles 190
## 6 Australia Historical EV sales Cars BEV 2014 Vehicles 370
## 7 Australia Historical EV sales Cars PHEV 2014 Vehicles 950
## 8 Australia Historical EV sales Cars PHEV 2015 Vehicles 1000
## 9 Australia Historical EV sales Cars BEV 2015 Vehicles 760
## 10 Australia Historical EV sales Cars BEV 2016 Vehicles 670
## # ℹ 824 more rows
Exploration
I want to look at the number of electric vehicles sold in each country in the dataset.
#sum counts by region
region_counts <- historic %>%
group_by(region) %>%
summarize(count = sum(value))
#remove regions I don't want to look at
region_counts2 <- region_counts[-c(9,36), ]
#plot region counts
region_counts2 %>%
ggplot(aes(x = region, y = count)) +
geom_col(fill = "green") + coord_flip() + scale_y_continuous(labels = scales::comma) +
labs(title = "EV Sales by Region", x = "Region", y = "Count")
China has sold the greatest number of electric vehicles of any region, almost two times more than Europe and nearly four times as many as the USA. This is probably due to the fact that China’s population is much larger compared to any other region. European countries have purchased the second greatest number of electric vehicles. The USA has purchased the third greatest number of electric vehicles.
#get sales counts by year
year_counts <- historic %>%
group_by(year) %>%
summarize(count = sum(value))
tibble(year_counts)
## # A tibble: 13 × 2
## year count
## <int> <int>
## 1 2010 18574
## 2 2011 114463
## 3 2012 283006
## 4 2013 510321
## 5 2014 810547
## 6 2015 1411417
## 7 2016 1835620
## 8 2017 2815219
## 9 2018 4693645
## 10 2019 5148213
## 11 2020 8359524
## 12 2021 17036811
## 13 2022 24862183
year_counts %>%
ggplot(aes(x = year, y = count)) +
geom_col(fill = "green") +
scale_y_continuous(labels = scales::comma) + labs(title = "Global EV Sales 2011-2022", x = "Year", y = "Cars")
Globally, the number of electric vehicles sold has increased every year
since 2011.
#filter historic data for only US data
USdata <- historic %>%
filter(historic$region == "USA")
#plot the trend for US EV sales
USdata %>% ggplot(aes(x = year, y = value)) +
geom_col(fill = "green") + scale_y_continuous(labels = scales::comma) +
labs(title = "US EV Sales 2011-2022", x = "Year", y = "Cars")
The US has closely followed the global trend for EV sales. There was a tiny dip in EV sales in 2015 and a larger dip in 2019 and 2020. EV sales about doubled from 2020 to 2021.
Lets look at BEV vs PHEV sales for the US. BEVs are battery electric vehicles and PHEVs are plug-in hybrid electric vehicles. BEVs run on electricity alone, so they don’t require gasoline like PHEVs do.
historic %>%
ggplot(aes(x = year, y = value, fill = powertrain)) +
geom_col(position = "dodge") +
scale_y_continuous(labels = scales::comma) +
labs(title = "Global BEV vs PHEV Sales 2011-2022" , x = "Year", y = "Cars")
As EV sales have increased over time, the difference in BEV vs PHEV
sales has also increased.
USdata %>%
ggplot(aes(x = year, y = value, fill = powertrain)) +
geom_col(position = "dodge") +
scale_y_continuous(labels = scales::comma) +
labs(title = "USA BEV vs PHEV Sales 2011-2022" , x = "Year", y = "Cars")
In the US, BEV and PHEV sales were steadily growing since 2011. PHEV
sales reached a peak in 2018 and then started declining until 2021 where
they doubled in sales. BEV sales reached a plateau from 2018-2020 and
then spiked in 2021 and 2022. The fluctuation of EV sales in the US from
2011-2022 may indicate that it is normal for the EV market to experience
fluctuation and this could be why Tesla has experienced a decline in
sales in the past year.
I want to analyze production and sales for Tesla in 2023 and the first quarter of 2024.
Gather Tesla Production and sales units for 2023 and 2024 File location: https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q1-2024-Update.pdf The nest two code chunks are copied from Dhanya’s portion of the final project so that I am able to work with that dataset she tidied. I am going to use the infer package to perform a correlation test for total production and deliveries of Tesla vehicles for 2023 and 2024. I also want to calculate surplus since it will give us more information on why Tesla has decreased price and may confirm that Tesla sales have been declining.
library(infer)
#I want to isolate the year variable
Tesla_stats$Year.Quarter = substr(Tesla_stats$Year.Quarter, 1, nchar(Tesla_stats$Year.Quarter)-2)
# rename year column
Tesla2 <- rename(Tesla_stats, Year = Year.Quarter)
#make table longer so I can calculate surplus
Tesla3 <- Tesla2 %>%
pivot_wider(names_from = type, values_from = Units)
# I was having a hard time calculating surplus column with mutate so I did it manually
Tsurplus <- c(17933, 13560, -4571, 10482, 46561)
#add Tsurplus to Tesla3 and save a new df
Tesla4 <- Tesla3 %>% mutate(surplus = Tsurplus)
tibble(Tesla4)
## # A tibble: 5 × 6
## Quarter Year time.val `Total production` `Total deliveries` surplus
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 01 2023 Q1_2023 440808 422875 17933
## 2 02 2023 Q2_2023 479700 466140 13560
## 3 03 2023 Q3_2023 430488 435059 -4571
## 4 04 2023 Q4_2023 494989 484507 10482
## 5 01 2024 Q1_2024 433371 386810 46561
It looks like 2024 Q1 has the highest surplus. This indicates that Tesla has had the most difficulty in 2024 when trying the sell the product that they produce. Considering that total production is low in Q1 of 2024 as well, they are also not producing and selling as much as they did in 2023.
I am interested in the relationship between total production and total deliveries. If Tesla is doing well as a business, ideally their total production will have a positive linear relationship with total deliveries. I will conduct a Pearson correlation test for the two variables.
#Pearson correlation test between total production and total deliveries
cor.test(Tesla4$`Total production`,Tesla4$`Total deliveries`, method = "pearson")
##
## Pearson's product-moment correlation
##
## data: Tesla4$`Total production` and Tesla4$`Total deliveries`
## t = 3.1763, df = 3, p-value = 0.05024
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.01915843 0.99190253
## sample estimates:
## cor
## 0.8779478
The correlation coefficient returned is 0.8779478, which is relatively close to 1 and indicates there is a strong positive correlation. The confidence interval contains zero, so there is a chance that the true correlation coefficient is zero. The Pearson correlation test returned a p-value of 0.05024 which would not be statistically significant at a significance level of 0.05. I fail to conclude that production and deliveries are significantly correlated.
library(rvest)
library(xml2)
I am going to web scrape the EV Charging Station Counts by State table from ‘https://www.cnet.com/roadshow/news/how-many-ev-charging-stations-are-there-in-the-us/’ that contains the number of public charging stations per state.
#save link in a variable
link <- "https://www.cnet.com/roadshow/news/how-many-ev-charging-stations-are-there-in-the-us/"
#read html and save it in variable
page <- read_html(link)
#retrieve html elements
# I used Selector Gadget to help me get the css selector!
html_data <- page %>%
html_elements(css = "td , .c-shortcodeChart_chart_hasBg")
head(html_data)
## {xml_nodeset (6)}
## [1] <th class="c-shortcodeChart_chart_hasBg"><strong>State</strong></th>
## [2] <td><strong>Number of public charging stations</strong></td>
## [3] <th class="c-shortcodeChart_chart_hasBg">Alabama</th>
## [4] <td>254</td>
## [5] <th class="c-shortcodeChart_chart_hasBg">Alaska</th>
## [6] <td>59</td>
#retrieve text from html element
stations <- xml_text(html_data)
#remove first two items because they are column names
stations2 <- stations[-c(1:2)]
#turn html text into a dataframe by first converting it into a matrix
stationsdf <- as.data.frame(matrix(stations2, ncol = 2, byrow = TRUE)) %>%
select("State" = V1, "Stations" = V2)
#remove commas from numbers and convert to numeric
stationsdf$Stations <- as.numeric(gsub(",","", stationsdf$Stations))
tibble(stationsdf)
## # A tibble: 51 × 2
## State Stations
## <chr> <dbl>
## 1 Alabama 254
## 2 Alaska 59
## 3 Arizona 951
## 4 Arkansas 220
## 5 California 14040
## 6 Colorado 1689
## 7 Connecticut 533
## 8 Delaware 145
## 9 District of Columbia 270
## 10 Florida 2766
## # ℹ 41 more rows
stationsdf %>% ggplot(aes(x = State, y = Stations)) +
geom_col(fill = "green") + coord_flip() +
labs(title = "Public EV Charging Stations by State")
I want to do statistical analysis on the Number of Public Charging Stations per State in March 2023.
#Find avg number of public charging stations in a state
mean(stationsdf$Stations)
## [1] 1003.804
On average, there are about 1,004 public charging stations in a US State.
#maximum
max(stationsdf$Stations)
## [1] 14040
print(stationsdf$State[which.max(stationsdf$Stations)])
## [1] "California"
#minimum
min(stationsdf$Stations)
## [1] 59
print(stationsdf$State[which.min(stationsdf$Stations)])
## [1] "Alaska"
The state with the greatest number of public charging stations is California. The state with the least number of public charging stations is Alaska.
#boxplot of the distribution of charging stations
stationsdf %>%
ggplot(aes(x = Stations)) + geom_boxplot() +
xlab('Distribution of Public EV Charging Stations in the US')
From 2011-2022, Global EV sales have shown exponential growth. The US follows this same trend. Recently, the EV giant, Tesla, has failed to sell as many electric vehicles as it has been producing. This may be why Tesla has reduced its prices in the recent past. Another factor in the decline of EV sales may be the number of public charging stations. There are large differences in the number of public charging stations from state to state in the US. At the time of the study, California had the most at 14,040 while Alaska only had 59. The unreliability non-uniform spread of public charging stations in the US may also be contributing to the decline in electric vehicle sales.
Performing a T-test to compare EV sales or production between two countries group : UK and Germany
EV_Stats <- EV_Raw %>%
filter( category == "Historical") %>%
filter( powertrain != "PHEV" , powertrain != "Publicly available slow") %>%
filter( mode =="Cars" | mode == "EV") |>
select( region, parameter, year, value) %>%
mutate( value = as.numeric(value)) %>%
mutate( year = as.numeric(year))
#pivot wider to get the numerical values to variables
EV_Stats_summary <- EV_Stats |>
pivot_wider(names_from = parameter, values_from = value) |>
rename( Electricity.demand.in.GWh = "Electricity demand",
Oil.displacement.in.Mlge = "Oil displacement, million lge",
EV.fast.charging.points = "EV charging points",
EV.sales = "EV sales" ,
EV.stock = "EV stock") |>
drop_na(EV.fast.charging.points)
# Subset data for two countries
UK_data <- subset(EV_Stats_summary, region == "United Kingdom")
Germany_data <- subset(EV_Stats_summary, region == "Germany")
# Perform T-test for EV sales between two countries
sales_t_test <- t.test(UK_data$EV.sales, Germany_data$EV.sales)
# Performing T-test for EV production between two countries
production_t_test <- t.test(UK_data$EV.stock, Germany_data$EV.stock)
# Print results
print("T-test results for EV sales in 2 countries(UK and Germany):")
## [1] "T-test results for EV sales in 2 countries(UK and Germany):"
print(sales_t_test)
##
## Welch Two Sample t-test
##
## data: UK_data$EV.sales and Germany_data$EV.sales
## t = -0.8316, df = 15.671, p-value = 0.4181
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -166241.21 72677.57
## sample estimates:
## mean of x mean of y
## 60809.09 107590.91
print("")
## [1] ""
print("T-test results for EV production in 2 countries(UK and Germany):")
## [1] "T-test results for EV production in 2 countries(UK and Germany):"
print(production_t_test)
##
## Welch Two Sample t-test
##
## data: UK_data$EV.stock and Germany_data$EV.stock
## t = -0.82909, df = 15.465, p-value = 0.4197
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -334962.2 146998.6
## sample estimates:
## mean of x mean of y
## 125663.6 219645.5
Since the p-value (0.4197) is greater than the commonly used significance level of 0.05, we fail to reject the null hypothesis. This suggests that there is not enough evidence to conclude that there is a statistically significant difference in mean EV sales between the UK and Germany at the 5% significance level.
I would like to know the correlation between the variables to understand better the relationship between the variables.
#filter to required parameters
EV_summary <- EV_Raw %>%
filter( category == "Historical") %>%
filter( powertrain != "PHEV" , powertrain != "Publicly available slow") %>%
filter( region == "USA") %>%
select( region, parameter, year, value) %>%
mutate( value = as.numeric(value)) %>%
mutate( year = as.numeric(year))
#pivot wider to get the numerical values to variables
EV_summary_pivot <- EV_summary |>
select(parameter, year, value) |>
pivot_wider(names_from = parameter, values_from = value) |>
rename( Electricity.demand.in.GWh = "Electricity demand",
Oil.displacement.in.Mlge = "Oil displacement, million lge",
EV.fast.charging.points = "EV charging points") |>
drop_na(EV.fast.charging.points)
#Correlation Heatmap
corrplot(cor(EV_summary_pivot), method = "color",
type = "upper", # Show upper triangle
addCoef.col = "black", # Color of coefficients
tl.col = "black", # Color of text labels
tl.srt = 45, # Rotate text labels
title = "Correlation Heatmap of various EV parameters for USA ")
As seen above there is a strong correlation among most of the variables.
Least correlation is between Year and Electricity demand. Less correlation is between Electricity demand and EV fast charging points. Most correlation is between Oil displacement and EV Stock, Sales, Stock share and Sales Share. Also there is strong correlation among EV fast charging points and EV Stock and sales.
Popular countries to sell EV vehicles are China, India, USA, United Kingdom .
BEV powertrain/ Battery operated electric vehicle is becoming more and more popular every year. It is highly sold than Plug-In Hybrid Electric Vehicle.
USA is projected to have huge stock of EV cars, however the EV public charging points are relatively less.The availability of charging stations is hence a big issue. This may be a concern for consumers who prefer to have easily accessible charging points.Hence causing lower sales for EV cars.
Based on the bar plot, i see that the EV stock/ production is much more than the EV sales for each country. This may cause the Sale price to decrease due to unsold inventory.
As we can see in the bar chart there is drop in production and sales of Tesla cars in Q1 2024 compared to Q1 2023.
This may have cause the price to reduce from Q3 2023, which caused a bit of sales increase in Q4 2023.
Based on the visualized plots,we can infer from the bar plot that the competition for the EV cars has significantly increased.
Many brands have significantly invested in EV cars. Tesla YoY % is -13.3 % .This may have caused Tesla to reduce the sale price of its EV cars to compete with the growing competition.
However, Tesla is still a leader in the EV segment with respect to overall sales.
Performing a T-test to compare EV sales or production between two countries group : UK and Germany, is not enough evidence to conclude that there is a statistically significant difference in mean EV sales between the UK and Germany at the 5% significance level.
Among the most variables considered in Correlation analysis,we see that the most correlation is between Oil displacement and EV Stock, Sales, Stock share and Sales Share. This can be agreed as EV cars are displacing almost 1.8 million barrels of oil a day in 2022.
Another factor in the decline of EV sales may be the number of public charging stations. There are large differences in the number of public charging stations from state to state in the US. At the time of the study, California had the most at 14,040 while Alaska only had 59. The unreliability non-uniform spread of public charging stations in the US may also be contributing to the decline in electric vehicle sales.