DATA607 Final Project

Topic for analysis

Tesla has reduced the price of its Electric Vehicle(EV) models 3 times within a year .
The question we are seeking to answer is why is Tesla’s EV sales and price declining though Sales were projected to grow by 30% in 2025.
Possible reasons can be growing competitors, more incentives, more inventory, insufficient charging infrastructure.
Benefit of doing this work is understanding the market conditions to make better decisions from the consumer side as well as from the auto industry side.

Load libraries

library(tidyverse)
library(pdftools)
library(knitr)

library(gifski)
library(png)
library(gganimate)

library(stats)

library(corrplot)

Data sources

1. Global Electric Vehicle Dataset (CSV Format)

Data Gathering

Global Electric Vehicle Dataset ( data till 2022, projections for 2025, 2030)
File location: https://www.kaggle.com/datasets/padmapiyush/global-electric-vehicle-dataset-2023/data

#Load the original .csv file from Global Electric Vehicle Dataset 2023
#file location: https://www.kaggle.com/datasets/padmapiyush/global-electric-vehicle-dataset-2023/data

EV_Raw <- read.csv("https://raw.githubusercontent.com/datanerddhanya/DATA607/main/finalproject/Global%20EV%202023.csv", header = TRUE)

Data Transformation

#filter to required parameters and mode as we do not need vans, buses and trucks
EV_Data <- EV_Raw %>%
  filter( parameter == "EV sales" |parameter == "EV stock" | parameter == "EV charging points") %>%
  filter( mode == "Cars" | mode == "EV") 

# for the future years 2025 and 2030, change the category to projection
EV_Data <- EV_Data %>%
  mutate( category = ifelse ( year =="2025" | year =="2030","Projection",category  ))

# to remove the scientific display format
options(scipen = 999)

# filter the required records as we do not need projected records for the previous years
# filter to years 2021 onwards
# remove groupings within the region variable so that we retain only individual countries.
EV_Data <- EV_Data %>%
filter( category == "Projection" | category == "Historical" ) %>%
filter( year > 2020) %>%
subset(!(region %in% c('EU27', 'Europe', 'Other Europe', 'Rest of the world', 'World'))) %>%
  mutate( value = round(as.numeric(value))) %>%
  mutate( year = as.factor(as.numeric(year)))

Data Analysis

# to find the top countries of interest
# i see that the following are the top countries in EV sales 
#"China" , "USA", "India", "Germany", "United Kingdom"
EV_Data |>
  filter( parameter == "EV sales" ) |>
  filter( mode == "Cars") |>
  group_by(region) |>
  arrange(desc(value)) |>
  ggplot( aes(x= value,y = reorder(region,value))) +
  geom_bar(stat = "identity", position = "dodge", fill = "green") +
  labs(title = "Top Countries in EV Sales for all years from 2021 till date", x= "EV Sales" , y = "Countries")

#hybrid vs battery vehicles
EV_Data |>
  filter( parameter == "EV sales" ) |>
  filter( mode == "Cars") |>
  ggplot( aes(x= year ,y = value , color = powertrain)) +
  geom_point(stat = "identity") +
  labs(title = "Battery Electric Vehicle vs Plugin Hybrid  Electric Vehicle", x= "Year" , y = "EV Sales")

# Bar plot
EV_Data |>
 filter( region %in% c("China" , "USA", "India", "Germany", "United Kingdom")) |>
ggplot( aes(x= year,y = value, fill = parameter)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ region, scales = "free") +
  theme_minimal() +
  theme(legend.position = "bottom") +
  scale_fill_brewer(palette = "Set3") +
  scale_x_discrete(breaks = unique(EV_Data$year)) +
labs(title = "EV Production units, Sales Units and Charging points by region by year", x= "Year" , y = "Units")

# Line plot
EV_Data |>
 filter( region %in% c("China" , "USA", "India", "Germany", "United Kingdom")) |>
ggplot(aes(x = year, y = value, color = region, group = region)) +
  geom_line() +
  facet_wrap(~ parameter, scales = "free_y") +
  labs(title = "Electric Vehicle Data Trends by Region",
       x = "Year",
       y = "Value") +
  theme_minimal() +
  theme(legend.position = "bottom") +
  scale_color_brewer(palette = "Set2")

# Faceted plot
EV_Data |>
 filter( region %in% c("China" , "USA", "India", "Germany", "United Kingdom")) |>
ggplot(aes(x = year, y = value)) +
  geom_line() +
  facet_grid(parameter ~ region, scales = "free") +
  labs(title = "Electric Vehicle Data Trends by Parameter and Region",
       x = "Year",
       y = "Value") +
  theme_minimal() +
  theme(legend.position = "bottom") +
  scale_color_brewer(palette = "Set2")

Conclusion

Popular countries to sell EV vehicles are China, India, USA, United Kingdom .
BEV powertrain/ Battery operated electric vehicle is becoming more and more popular every year. It is highly sold than Plug-In Hybrid Electric Vehicle.
USA is projected to have huge stock of EV cars, however the EV public charging points are relatively less.The availability of charging stations is hence a big issue. This may be a concern for consumers who prefer to have easily accessible charging points.Hence lower sales for EV cars.
Based on the bar plot, i see that the EV stock/ production is much more than the EV sales for each country. This may cause the Sale price to decrease due to unsold inventory.

2. Tesla Production and sales (PDF Format)

Data Gathering

Gather Tesla Production and sales units for 2023 and 2024 File location: https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q1-2024-Update.pdf

#read data from the pdf file
pdf_text <- pdf_text("https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q1-2024-Update.pdf")

# Extract text from the 6th page
page_text <- pdf_text("https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q1-2024-Update.pdf")[6]


# Split the text into lines
lines <- strsplit(page_text, "\n")[[1]]


# Identify and extract the lines corresponding to the table
# Adjust these indices based on where the table is located in the text
table_lines <- lines[3:11]

Data Transformation

# Assuming the table is tabular, i can then further process the lines to create a data frame
# Define a regular expression for splitting by one or more spaces
split_regex <- "\\s+"
# Split each line by the regular expression
table_data <- lapply(table_lines, function(line) strsplit(line, split_regex )[[1]])

# Convert the list to a data frame
df <- as.data.frame(do.call(rbind, table_data))

# filter only required observations
# rename columns
# concatenate two variables into one
# remove the unwanted columns
Tesla_data <- df |>
  filter(V2 == "Total") |>
  rename ( type = "V3", Q1_2023 = "V4" , Q2_2023 = "V5", Q3_2023 = "V6", Q4_2023 = "V7", Q1_2024 ="V8", YoY = "V9")  |>
  mutate ( type = paste(V2, " ", type))|>
  subset(select= -c(V1,V2,V10,YoY) )

#transform to pivot longer
Tesla_stats <- Tesla_data |>
  pivot_longer(
    cols = starts_with("Q"),
    names_to = "Time", 
    values_to = "Units",
   values_drop_na = TRUE  
  )  |>
  mutate( time.val = Time )

#remove the thousand separator and convert to numeric
Tesla_stats$Units= as.numeric(gsub("," ,"",Tesla_stats$Units))

#Update Time to a numeric format
Tesla_stats <- Tesla_stats |>
  separate_wider_delim(Time, delim = "_", names = c("Quarter", "Year.Quarter"),too_few = "align_start")
Tesla_stats$Quarter= gsub("Q" ,"0",Tesla_stats$Quarter)

Tesla_stats$Year.Quarter= as.numeric(paste0(Tesla_stats$Year.Quarter ,Tesla_stats$Quarter))

Data Analysis

#visualize
ggplot(Tesla_stats , aes(x= time.val,y = Units, fill = type)) +
  geom_bar(stat = "identity", position = "dodge") +
    theme_minimal() +
  theme(legend.position = "bottom") +
  scale_fill_brewer(palette = "Set3") +
  labs( title = "Tesla EV models - Production Vs Sales", x= "Year by Quarter")

# Generate animation object and visualize
animate <- ggplot(Tesla_stats, aes(x=factor(type), y=Units)) +
  geom_boxplot() +
  transition_states(Year.Quarter) +
  labs(title = 'Time: {closest_state}', x = 'Productions and Deliveries', y = 'Units')

animate

Conclusion

As we can see in the bar chart there is drop in production and sales of Tesla cars in Q1 2024 compared to Q1 2023.
This may have cause the price to reduce from Q3 2023, which caused a bit of sales increase in Q4 2023.

3. Competitor Analysis (PDF Format)

Data Gathering

Competitor Analysis File location: https://www.coxautoinc.com/wp-content/uploads/2024/04/Q1-2024-Kelley-Blue-Book-Electric-Vehicle-Sales-Report.pdf

#read data from the pdf file
comp_pdf_text <- pdf_text("https://www.coxautoinc.com/wp-content/uploads/2024/04/Q1-2024-Kelley-Blue-Book-Electric-Vehicle-Sales-Report.pdf")

# Extract text from the 6th page
comp_page_text <- pdf_text("https://www.coxautoinc.com/wp-content/uploads/2024/04/Q1-2024-Kelley-Blue-Book-Electric-Vehicle-Sales-Report.pdf")[2]


# Split the text into lines
comp_lines <- strsplit(comp_page_text, "\n")[[1]]

print(comp_lines)

##  [1] "                              ELECTRIC VEHICLE (BRAND)"                              
##  [2] "                                             Q1 Sales               Segment Share"   
##  [3] "                                     2024         2023        YOY       Q1       YTD"
##  [4] "Audi                                5,714       4,438      28.8%      2.1%      2.1%"
##  [5] "BMW                                10,713       6,588      62.6%      3.9%      3.9%"
##  [6] "Brightdrop                            256       -          -          0.1%      0.1%"
##  [7] "Cadillac                            5,800         968     499.2%      2.1%      2.1%"
##  [8] "Chevrolet                           8,701      19,700     -55.8%      3.2%      3.2%"
##  [9] "Ford                               20,223      10,866      86.1%      7.4%      7.4%"
## [10] "Genesis                               992         937        5.9%     0.4%      0.4%"
## [11] "GMC                                 1,668            2     -          0.6%      0.6%"
## [12] "Fisker                              1,660       -          -          0.6%      0.6%"
## [13] "Honda                                   19      -          -          0.0%      0.0%"
## [14] "Hyundai                            12,290       7,824      57.1%      4.5%      4.5%"
## [15] "Jaguar                                256           53    383.0%      0.1%      0.1%"
## [16] "Kia                                 9,654       5,930      62.8%      3.5%      3.5%"
## [17] "Lexus                               1,603         185     766.5%      0.6%      0.6%"
## [18] "Lucid                               1,967       1,406      39.9%      0.7%      0.7%"
## [19] "Mazda                                  -            15   -100.0%      0.0%      0.0%"
## [20] "Mercedes                           12,250       7,341      66.9%      4.5%      4.5%"
## [21] "Mini                                  742         672      10.4%      0.3%      0.3%"
## [22] "Nissan                              5,284       5,214        1.3%     1.9%      1.9%"
## [23] "Polestar                            2,210       2,340       -5.6%     0.8%      0.8%"
## [24] "Porsche                             1,925       1,527      26.1%      0.7%      0.7%"
## [25] "Rivian                             13,588       8,558      58.8%      5.0%      5.0%"
## [26] "Subaru                              1,147       1,359     -15.6%      0.4%      0.4%"
## [27] "Tesla                             140,187     161,630     -13.3%     51.3%    51.3%" 
## [28] "Toyota                              1,897       1,698      11.7%      0.7%      0.7%"
## [29] "Vinfast                               927         110     742.7%      0.3%      0.3%"
## [30] "Volvo                               1,069       2,924     -63.4%      1.9%      1.9%"
## [31] "VW                                  6,167       9,758     -36.8%      2.3%      2.3%"
## [32] " Total (Estimates)                268,909     262,043        2.6%    100%      100%" 
## [33] ""                                                                                    
## [34] "Toyota                                172        668      -74.3%"                    
## [35] "Hyundai                                51         65      -21.5%"                    
## [36] "Total EV & FCEV (Estimates)       269,132    262,776        2.4%"

# Identify and extract the lines corresponding to the table
# Adjust these indices based on where the table is located in the text
comp_table_lines <- comp_lines[4:31]

Data Transformation

# Assuming the table is tabular, we can then further process the lines to create a data frame
# Define a regular expression for splitting by one or more spaces
split_regex <- "\\s+"
# Split each line by the regular expression
comp_table_data <- lapply(comp_table_lines, function(line) strsplit(line, split_regex )[[1]])

# Convert the list to a data frame
comp_ev_sales <- as.data.frame(do.call(rbind, comp_table_data))


# filter only required observations
# rename columns
# concatenate two variables into one
# remove the unwanted columns
ev_brand_sales <- comp_ev_sales|>
  rename ( Brand = "V1", Y_2024 = "V2", Y_2023 = "V3", YOY = "V4" , Segment_share = "V5") |>
  subset(select= -c(V6) ) 

#Tidying up data in the variables to remove thousand separator, values with -
ev_brand_sales$Y_2024 <- gsub(",", "", ev_brand_sales$Y_2024)
ev_brand_sales$Y_2024 <- gsub("-", "0", ev_brand_sales$Y_2024)
ev_brand_sales$Y_2023 <- gsub(",", "", ev_brand_sales$Y_2023)
ev_brand_sales$Y_2023 <- gsub("-", "0", ev_brand_sales$Y_2023)
ev_brand_sales$Segment_share <- gsub("%", "", ev_brand_sales$Segment_share)
ev_brand_sales$YOY <- gsub("%", "", ev_brand_sales$YOY)

ev_brand_YOY <- ev_brand_sales |>
  mutate( Segment_share= as.numeric(Segment_share),Y_2024 = as.numeric(Y_2024), Y_2023 = as.numeric(Y_2023),YOY = as.numeric(YOY) ) |>
  drop_na(YOY)

Data Analysis

# Convert Brand to a factor and reorder based on YOY %
# this is needed to sort the yoy% in order
ev_brand_YOY$Brand <- factor(ev_brand_YOY$Brand, levels = ev_brand_YOY$Brand[order(-ev_brand_YOY$YOY)])


#visualize year of year change
  ggplot(ev_brand_YOY , aes(x= Brand,y = YOY)) +
  geom_bar(stat = "identity", position = "dodge", fill = "blue") +
  coord_flip() +
    labs ( title = "Year over Year change in Sales of EV cars for 2023 and 2024" , 
           y = "YoY %" , x = "Brand")

# Convert Brand to a factor and reorder based on Segment_share %
# this is needed to sort the Segment share % in order
ev_brand_YOY$Brand <- factor(ev_brand_YOY$Brand, levels = ev_brand_YOY$Brand[order(-ev_brand_YOY$Segment_share)])

  
#visualize segment share YTD
  ggplot(ev_brand_YOY , aes(x= reorder(Brand,Segment_share),y = Segment_share)) +
  geom_bar(stat = "identity", position = position_stack(reverse = TRUE),color = "black", fill = "green") +
  coord_flip() +
    labs ( title = "Segment share of brands in Sales of EV cars for 2023 and 2024" , 
           y = "Segment Share in %" , x = "Brand")

Conclusion:

Based on the visualized plots,we can infer from the bar plot that the competition for the EV cars has significantly increased. Many brands have significantly invested in EV cars. Tesla YoY % is -13.3 % .This may have caused Tesla to reduce the sale price of its EV cars to compete with the growing competition.
However, Tesla is still a leader in the EV segment with respect to overall sales.

4. Historic Sales (csv format)

This Kaggle dataset (“Historic sales of electric vehicles”) is from the Global EV Outlook, an annual publication that identifies and discusses recent developments in electric mobility across the globe.

#Read historic EV sales csv into R
historic <- read.csv('https://raw.githubusercontent.com/evelynbartley/Final-Project-607/main/IEA-EV-dataEV%20salesCarsHistorical.csv', header = TRUE)
tibble(historic)

## # A tibble: 834 × 8
##    region    category   parameter mode  powertrain  year unit     value
##    <chr>     <chr>      <chr>     <chr> <chr>      <int> <chr>    <int>
##  1 Australia Historical EV sales  Cars  BEV         2011 Vehicles    49
##  2 Australia Historical EV sales  Cars  BEV         2012 Vehicles   170
##  3 Australia Historical EV sales  Cars  PHEV        2012 Vehicles    80
##  4 Australia Historical EV sales  Cars  PHEV        2013 Vehicles   100
##  5 Australia Historical EV sales  Cars  BEV         2013 Vehicles   190
##  6 Australia Historical EV sales  Cars  BEV         2014 Vehicles   370
##  7 Australia Historical EV sales  Cars  PHEV        2014 Vehicles   950
##  8 Australia Historical EV sales  Cars  PHEV        2015 Vehicles  1000
##  9 Australia Historical EV sales  Cars  BEV         2015 Vehicles   760
## 10 Australia Historical EV sales  Cars  BEV         2016 Vehicles   670
## # ℹ 824 more rows

Exploration

I want to look at the number of electric vehicles sold in each country in the dataset.

#sum counts by region
region_counts <- historic %>% 
  group_by(region) %>%
  summarize(count = sum(value))

#remove regions I don't want to look at
region_counts2 <- region_counts[-c(9,36), ]

#plot region counts 
region_counts2 %>%
  ggplot(aes(x = region, y = count)) + 
  geom_col(fill = "green") + coord_flip() + scale_y_continuous(labels = scales::comma) +
  labs(title = "EV Sales by Region", x = "Region", y = "Count")

China has sold the greatest number of electric vehicles of any region, almost two times more than Europe and nearly four times as many as the USA. This is probably due to the fact that China’s population is much larger compared to any other region. European countries have purchased the second greatest number of electric vehicles. The USA has purchased the third greatest number of electric vehicles.

#get sales counts by year
year_counts <- historic %>% 
  group_by(year) %>%
  summarize(count = sum(value))
tibble(year_counts)

## # A tibble: 13 × 2
##     year    count
##    <int>    <int>
##  1  2010    18574
##  2  2011   114463
##  3  2012   283006
##  4  2013   510321
##  5  2014   810547
##  6  2015  1411417
##  7  2016  1835620
##  8  2017  2815219
##  9  2018  4693645
## 10  2019  5148213
## 11  2020  8359524
## 12  2021 17036811
## 13  2022 24862183

year_counts %>%
  ggplot(aes(x = year, y = count)) + 
  geom_col(fill = "green") + 
  scale_y_continuous(labels = scales::comma) + labs(title = "Global EV Sales 2011-2022", x = "Year", y = "Cars")

Globally, the number of electric vehicles sold has increased every year since 2011.

#filter historic data for only US data
USdata <- historic %>%
  filter(historic$region == "USA")

#plot the trend for US EV sales
USdata %>% ggplot(aes(x = year, y = value)) + 
  geom_col(fill = "green") + scale_y_continuous(labels = scales::comma) + 
  labs(title = "US EV Sales 2011-2022", x = "Year", y = "Cars")

The US has closely followed the global trend for EV sales. There was a tiny dip in EV sales in 2015 and a larger dip in 2019 and 2020. EV sales about doubled from 2020 to 2021.

Lets look at BEV vs PHEV sales for the US. BEVs are battery electric vehicles and PHEVs are plug-in hybrid electric vehicles. BEVs run on electricity alone, so they don’t require gasoline like PHEVs do.

historic %>%
  ggplot(aes(x = year, y = value, fill = powertrain)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Global BEV vs PHEV Sales 2011-2022" , x = "Year", y = "Cars")

As EV sales have increased over time, the difference in BEV vs PHEV sales has also increased.

USdata %>%
  ggplot(aes(x = year, y = value, fill = powertrain)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "USA BEV vs PHEV Sales 2011-2022" , x = "Year", y = "Cars")

In the US, BEV and PHEV sales were steadily growing since 2011. PHEV sales reached a peak in 2018 and then started declining until 2021 where they doubled in sales. BEV sales reached a plateau from 2018-2020 and then spiked in 2021 and 2022. The fluctuation of EV sales in the US from 2011-2022 may indicate that it is normal for the EV market to experience fluctuation and this could be why Tesla has experienced a decline in sales in the past year.

Satistical Inference

I want to analyze production and sales for Tesla in 2023 and the first quarter of 2024.

Gather Tesla Production and sales units for 2023 and 2024 File location: https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q1-2024-Update.pdf The nest two code chunks are copied from Dhanya’s portion of the final project so that I am able to work with that dataset she tidied. I am going to use the infer package to perform a correlation test for total production and deliveries of Tesla vehicles for 2023 and 2024. I also want to calculate surplus since it will give us more information on why Tesla has decreased price and may confirm that Tesla sales have been declining.

library(infer)

#I want to isolate the year variable
Tesla_stats$Year.Quarter = substr(Tesla_stats$Year.Quarter, 1, nchar(Tesla_stats$Year.Quarter)-2)

# rename year column
Tesla2 <- rename(Tesla_stats, Year = Year.Quarter)

#make table longer so I can calculate surplus
Tesla3 <- Tesla2 %>%
  pivot_wider(names_from = type, values_from = Units) 

# I was having a hard time calculating surplus column with mutate so I did it manually
Tsurplus <- c(17933, 13560, -4571, 10482, 46561)

#add Tsurplus to Tesla3 and save a new df
Tesla4 <- Tesla3 %>% mutate(surplus = Tsurplus)

tibble(Tesla4)

## # A tibble: 5 × 6
##   Quarter Year  time.val `Total   production` `Total   deliveries` surplus
##   <chr>   <chr> <chr>                   <dbl>                <dbl>   <dbl>
## 1 01      2023  Q1_2023                440808               422875   17933
## 2 02      2023  Q2_2023                479700               466140   13560
## 3 03      2023  Q3_2023                430488               435059   -4571
## 4 04      2023  Q4_2023                494989               484507   10482
## 5 01      2024  Q1_2024                433371               386810   46561

It looks like 2024 Q1 has the highest surplus. This indicates that Tesla has had the most difficulty in 2024 when trying the sell the product that they produce. Considering that total production is low in Q1 of 2024 as well, they are also not producing and selling as much as they did in 2023.

I am interested in the relationship between total production and total deliveries. If Tesla is doing well as a business, ideally their total production will have a positive linear relationship with total deliveries. I will conduct a Pearson correlation test for the two variables.

#Pearson correlation test between total production and total deliveries
cor.test(Tesla4$`Total   production`,Tesla4$`Total   deliveries`, method = "pearson")

## 
##  Pearson's product-moment correlation
## 
## data:  Tesla4$`Total   production` and Tesla4$`Total   deliveries`
## t = 3.1763, df = 3, p-value = 0.05024
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.01915843  0.99190253
## sample estimates:
##       cor 
## 0.8779478

The correlation coefficient returned is 0.8779478, which is relatively close to 1 and indicates there is a strong positive correlation. The confidence interval contains zero, so there is a chance that the true correlation coefficient is zero. The Pearson correlation test returned a p-value of 0.05024 which would not be statistically significant at a significance level of 0.05. I fail to conclude that production and deliveries are significantly correlated.

5. ev-charging-stations (Web Scraping format)

library(rvest)
library(xml2)

I am going to web scrape the EV Charging Station Counts by State table from ‘https://www.cnet.com/roadshow/news/how-many-ev-charging-stations-are-there-in-the-us/’ that contains the number of public charging stations per state.

#save link in a variable
link <- "https://www.cnet.com/roadshow/news/how-many-ev-charging-stations-are-there-in-the-us/"

#read html and save it in variable
page <- read_html(link)

#retrieve html elements 
# I used Selector Gadget to help me get the css selector!
html_data <- page %>%
  html_elements(css = "td , .c-shortcodeChart_chart_hasBg")

head(html_data)

## {xml_nodeset (6)}
## [1] <th class="c-shortcodeChart_chart_hasBg"><strong>State</strong></th>
## [2] <td><strong>Number of public charging stations</strong></td>
## [3] <th class="c-shortcodeChart_chart_hasBg">Alabama</th>
## [4] <td>254</td>
## [5] <th class="c-shortcodeChart_chart_hasBg">Alaska</th>
## [6] <td>59</td>

#retrieve text from html element
stations <- xml_text(html_data)

#remove first two items because they are column names
stations2 <- stations[-c(1:2)]

#turn html text into a dataframe by first converting it into a matrix
stationsdf <- as.data.frame(matrix(stations2, ncol = 2, byrow = TRUE)) %>%
  select("State" = V1, "Stations" = V2)
#remove commas from numbers and convert to numeric
stationsdf$Stations <- as.numeric(gsub(",","", stationsdf$Stations))
tibble(stationsdf)

## # A tibble: 51 × 2
##    State                Stations
##    <chr>                   <dbl>
##  1 Alabama                   254
##  2 Alaska                     59
##  3 Arizona                   951
##  4 Arkansas                  220
##  5 California              14040
##  6 Colorado                 1689
##  7 Connecticut               533
##  8 Delaware                  145
##  9 District of Columbia      270
## 10 Florida                  2766
## # ℹ 41 more rows

stationsdf %>% ggplot(aes(x = State, y = Stations)) + 
  geom_col(fill = "green") + coord_flip() +
  labs(title = "Public EV Charging Stations by State")

Statistical Analysis on scraped data

I want to do statistical analysis on the Number of Public Charging Stations per State in March 2023.

#Find avg number of public charging stations in a state
mean(stationsdf$Stations)

## [1] 1003.804

On average, there are about 1,004 public charging stations in a US State.

#maximum
max(stationsdf$Stations)

## [1] 14040

print(stationsdf$State[which.max(stationsdf$Stations)])

## [1] "California"

#minimum
min(stationsdf$Stations)

## [1] 59

print(stationsdf$State[which.min(stationsdf$Stations)])

## [1] "Alaska"

The state with the greatest number of public charging stations is California. The state with the least number of public charging stations is Alaska.

#boxplot of the distribution of charging stations
stationsdf %>% 
  ggplot(aes(x = Stations)) + geom_boxplot() + 
  xlab('Distribution of Public EV Charging Stations in the US')

Conclusions

From 2011-2022, Global EV sales have shown exponential growth. The US follows this same trend. Recently, the EV giant, Tesla, has failed to sell as many electric vehicles as it has been producing. This may be why Tesla has reduced its prices in the recent past. Another factor in the decline of EV sales may be the number of public charging stations. There are large differences in the number of public charging stations from state to state in the US. At the time of the study, California had the most at 14,040 while Alaska only had 59. The unreliability non-uniform spread of public charging stations in the US may also be contributing to the decline in electric vehicle sales.

Statistical Analysis

Performing a T-test to compare EV sales or production between two countries group : UK and Germany

EV_Stats <- EV_Raw %>%
  filter( category == "Historical") %>%
  filter( powertrain != "PHEV" , powertrain != "Publicly available slow")  %>%
  filter( mode =="Cars" | mode == "EV") |>
  select( region, parameter, year, value) %>%
  mutate( value = as.numeric(value)) %>%
  mutate( year = as.numeric(year))


#pivot wider to get the numerical values to variables
EV_Stats_summary  <- EV_Stats |>
  pivot_wider(names_from = parameter, values_from = value) |>
  rename( Electricity.demand.in.GWh  = "Electricity demand", 
          Oil.displacement.in.Mlge = "Oil displacement, million lge",
          EV.fast.charging.points = "EV charging points",
          EV.sales = "EV sales" ,
          EV.stock = "EV stock") |>
  drop_na(EV.fast.charging.points)

# Subset data for two countries
UK_data <- subset(EV_Stats_summary, region == "United Kingdom")
Germany_data <- subset(EV_Stats_summary, region == "Germany")

# Perform T-test for EV sales between two countries
sales_t_test <- t.test(UK_data$EV.sales, Germany_data$EV.sales)

# Performing T-test for EV production between two countries
production_t_test <- t.test(UK_data$EV.stock, Germany_data$EV.stock)

# Print results
print("T-test results for EV sales in 2 countries(UK and Germany):")

## [1] "T-test results for EV sales in 2 countries(UK and Germany):"

print(sales_t_test)

## 
##  Welch Two Sample t-test
## 
## data:  UK_data$EV.sales and Germany_data$EV.sales
## t = -0.8316, df = 15.671, p-value = 0.4181
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -166241.21   72677.57
## sample estimates:
## mean of x mean of y 
##  60809.09 107590.91

print("")

## [1] ""

print("T-test results for EV production in 2 countries(UK and Germany):")

## [1] "T-test results for EV production in 2 countries(UK and Germany):"

print(production_t_test)

## 
##  Welch Two Sample t-test
## 
## data:  UK_data$EV.stock and Germany_data$EV.stock
## t = -0.82909, df = 15.465, p-value = 0.4197
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -334962.2  146998.6
## sample estimates:
## mean of x mean of y 
##  125663.6  219645.5

Since the p-value (0.4197) is greater than the commonly used significance level of 0.05, we fail to reject the null hypothesis. This suggests that there is not enough evidence to conclude that there is a statistically significant difference in mean EV sales between the UK and Germany at the 5% significance level.

Feature not covered in class!

I would like to know the correlation between the variables to understand better the relationship between the variables.

#filter to required parameters 
EV_summary <- EV_Raw %>%
  filter( category == "Historical") %>%
  filter( powertrain != "PHEV" , powertrain != "Publicly available slow")  %>%
  filter( region == "USA") %>%
  select( region, parameter, year, value) %>%
  mutate( value = as.numeric(value)) %>%
  mutate( year = as.numeric(year))


#pivot wider to get the numerical values to variables
EV_summary_pivot  <- EV_summary |>
  select(parameter, year, value) |>
  pivot_wider(names_from = parameter, values_from = value) |>
  rename( Electricity.demand.in.GWh  = "Electricity demand", 
          Oil.displacement.in.Mlge = "Oil displacement, million lge",
          EV.fast.charging.points = "EV charging points") |>
  drop_na(EV.fast.charging.points)

#Correlation Heatmap
corrplot(cor(EV_summary_pivot), method = "color", 
         type = "upper", # Show upper triangle
         addCoef.col = "black", # Color of coefficients
         tl.col = "black", # Color of text labels
         tl.srt = 45, # Rotate text labels
         title = "Correlation Heatmap of various EV parameters for USA ")

As seen above there is a strong correlation among most of the variables.

Least correlation is between Year and Electricity demand. Less correlation is between Electricity demand and EV fast charging points. Most correlation is between Oil displacement and EV Stock, Sales, Stock share and Sales Share. Also there is strong correlation among EV fast charging points and EV Stock and sales.

Overall Conclusion

Popular countries to sell EV vehicles are China, India, USA, United Kingdom .
BEV powertrain/ Battery operated electric vehicle is becoming more and more popular every year. It is highly sold than Plug-In Hybrid Electric Vehicle.
USA is projected to have huge stock of EV cars, however the EV public charging points are relatively less.The availability of charging stations is hence a big issue. This may be a concern for consumers who prefer to have easily accessible charging points.Hence causing lower sales for EV cars.
Based on the bar plot, i see that the EV stock/ production is much more than the EV sales for each country. This may cause the Sale price to decrease due to unsold inventory.
As we can see in the bar chart there is drop in production and sales of Tesla cars in Q1 2024 compared to Q1 2023.
This may have cause the price to reduce from Q3 2023, which caused a bit of sales increase in Q4 2023.
Based on the visualized plots,we can infer from the bar plot that the competition for the EV cars has significantly increased.
Many brands have significantly invested in EV cars. Tesla YoY % is -13.3 % .This may have caused Tesla to reduce the sale price of its EV cars to compete with the growing competition.
However, Tesla is still a leader in the EV segment with respect to overall sales.
Performing a T-test to compare EV sales or production between two countries group : UK and Germany, is not enough evidence to conclude that there is a statistically significant difference in mean EV sales between the UK and Germany at the 5% significance level.
Among the most variables considered in Correlation analysis,we see that the most correlation is between Oil displacement and EV Stock, Sales, Stock share and Sales Share. This can be agreed as EV cars are displacing almost 1.8 million barrels of oil a day in 2022.
Another factor in the decline of EV sales may be the number of public charging stations. There are large differences in the number of public charging stations from state to state in the US. At the time of the study, California had the most at 14,040 while Alaska only had 59. The unreliability non-uniform spread of public charging stations in the US may also be contributing to the decline in electric vehicle sales.

DATA607 Final Project

Dhanya Nair and Evelyn Bartley

2024-04-28

Topic for analysis

Load libraries

Data sources

1. Global Electric Vehicle Dataset (CSV Format)

Data Gathering

Data Transformation

Data Analysis

Conclusion

2. Tesla Production and sales (PDF Format)

Data Gathering

Data Transformation

Data Analysis

Conclusion

3. Competitor Analysis (PDF Format)

Data Gathering

Data Transformation

Data Analysis

Conclusion:

4. Historic Sales (csv format)

Satistical Inference

5. ev-charging-stations (Web Scraping format)

Statistical Analysis on scraped data

Conclusions

Statistical Analysis

Feature not covered in class!

Overall Conclusion