Introduction:

The emergence of electric vehicles in Europe has been a major development in the automotive industry over the past decade. With concerns about climate change and air pollution, many consumers and governments have been looking for ways to reduce their fossil fuel consumption. Because of their environmental benefits and low operating costs, electric vehicles have become a source of interest for me.

Data:

The dataset used in this analysis contains information on different EVs, including their price and characteristics. The data was obtained from Kaggle and can be downloaded here.

Let’s first import our dataset and libraries:

# Import the dataset and load the plotting and modeling libraries
electric_data <- read.csv("ElectricCarData_Clean.csv")
library(ggplot2)
library(dplyr)
library(plotly)
library(rpart)
library(rpart.plot)
#Structure of the data
str(electric_data)
## 'data.frame':    103 obs. of  14 variables:
##  $ Brand          : chr  "Tesla " "Volkswagen " "Polestar " "BMW " ...
##  $ Model          : chr  "Model 3 Long Range Dual Motor" "ID.3 Pure" "2" "iX3 " ...
##  $ AccelSec       : num  4.6 10 4.7 6.8 9.5 2.8 9.6 8.1 5.6 6.3 ...
##  $ TopSpeed_KmH   : int  233 160 210 180 145 250 150 150 225 180 ...
##  $ Range_Km       : int  450 270 400 360 170 610 190 275 310 400 ...
##  $ Efficiency_WhKm: int  161 167 181 206 168 180 168 164 153 193 ...
##  $ FastCharge_KmH : chr  "940" "250" "620" "560" ...
##  $ RapidCharge    : chr  "Yes" "Yes" "Yes" "Yes" ...
##  $ PowerTrain     : chr  "AWD" "RWD" "AWD" "RWD" ...
##  $ PlugType       : chr  "Type 2 CCS" "Type 2 CCS" "Type 2 CCS" "Type 2 CCS" ...
##  $ BodyStyle      : chr  "Sedan" "Hatchback" "Liftback" "SUV" ...
##  $ Segment        : chr  "D" "C" "D" "D" ...
##  $ Seats          : int  5 5 5 5 4 5 5 5 5 5 ...
##  $ PriceEuro      : int  55480 30000 56440 68040 32997 105000 31900 29682 46380 55000 ...

Our data is composed of 103 rows (observations) and 14 columns (variables) each describing an electric vehicle’s features such as its model, brand, acceleration, and range.
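The `str()` output above hints at two things worth tidying before modeling: the `Brand` values carry trailing spaces (for example "Tesla "), and `FastCharge_KmH` is stored as character rather than as a number. Below is a minimal sketch of one way to clean both, using a small hand-typed sample instead of the full CSV. The "-" entry is an assumed placeholder for a missing value (a guess at why the column is read as text), and `sample_data` and `cleaned` are illustrative names.

```r
# Small hand-typed sample that mimics the structure shown by str();
# the "-" entry is a hypothetical placeholder for a missing value
sample_data <- data.frame(
  Brand = c("Tesla ", "Volkswagen ", "Polestar "),
  FastCharge_KmH = c("940", "250", "-"),
  stringsAsFactors = FALSE
)

cleaned <- sample_data

# Remove leading and trailing whitespace from the brand names
cleaned$Brand <- trimws(cleaned$Brand)

# Convert to numeric; anything that is not a number becomes NA
cleaned$FastCharge_KmH <- suppressWarnings(as.numeric(cleaned$FastCharge_KmH))

str(cleaned)
```

The same two assignments could be applied to `electric_data` directly, if the full dataset turns out to have the same quirks.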

summary(electric_data)
##     Brand              Model              AccelSec       TopSpeed_KmH  
##  Length:103         Length:103         Min.   : 2.100   Min.   :123.0  
##  Class :character   Class :character   1st Qu.: 5.100   1st Qu.:150.0  
##  Mode  :character   Mode  :character   Median : 7.300   Median :160.0  
##                                        Mean   : 7.396   Mean   :179.2  
##                                        3rd Qu.: 9.000   3rd Qu.:200.0  
##                                        Max.   :22.400   Max.   :410.0  
##     Range_Km     Efficiency_WhKm FastCharge_KmH     RapidCharge       
##  Min.   : 95.0   Min.   :104.0   Length:103         Length:103        
##  1st Qu.:250.0   1st Qu.:168.0   Class :character   Class :character  
##  Median :340.0   Median :180.0   Mode  :character   Mode  :character  
##  Mean   :338.8   Mean   :189.2                                        
##  3rd Qu.:400.0   3rd Qu.:203.0                                        
##  Max.   :970.0   Max.   :273.0                                        
##   PowerTrain          PlugType          BodyStyle           Segment         
##  Length:103         Length:103         Length:103         Length:103        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##      Seats         PriceEuro     
##  Min.   :2.000   Min.   : 20129  
##  1st Qu.:5.000   1st Qu.: 34430  
##  Median :5.000   Median : 45000  
##  Mean   :4.883   Mean   : 55812  
##  3rd Qu.:5.000   3rd Qu.: 65000  
##  Max.   :7.000   Max.   :215000
knitr::kable(electric_data[1:14,],caption = "EV Dataset")
EV Dataset

| Brand | Model | AccelSec | TopSpeed_KmH | Range_Km | Efficiency_WhKm | FastCharge_KmH | RapidCharge | PowerTrain | PlugType | BodyStyle | Segment | Seats | PriceEuro |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Tesla | Model 3 Long Range Dual Motor | 4.6 | 233 | 450 | 161 | 940 | Yes | AWD | Type 2 CCS | Sedan | D | 5 | 55480 |
| Volkswagen | ID.3 Pure | 10.0 | 160 | 270 | 167 | 250 | Yes | RWD | Type 2 CCS | Hatchback | C | 5 | 30000 |
| Polestar | 2 | 4.7 | 210 | 400 | 181 | 620 | Yes | AWD | Type 2 CCS | Liftback | D | 5 | 56440 |
| BMW | iX3 | 6.8 | 180 | 360 | 206 | 560 | Yes | RWD | Type 2 CCS | SUV | D | 5 | 68040 |
| Honda | e | 9.5 | 145 | 170 | 168 | 190 | Yes | RWD | Type 2 CCS | Hatchback | B | 4 | 32997 |
| Lucid | Air | 2.8 | 250 | 610 | 180 | 620 | Yes | AWD | Type 2 CCS | Sedan | F | 5 | 105000 |
| Volkswagen | e-Golf | 9.6 | 150 | 190 | 168 | 220 | Yes | FWD | Type 2 CCS | Hatchback | C | 5 | 31900 |
| Peugeot | e-208 | 8.1 | 150 | 275 | 164 | 420 | Yes | FWD | Type 2 CCS | Hatchback | B | 5 | 29682 |
| Tesla | Model 3 Standard Range Plus | 5.6 | 225 | 310 | 153 | 650 | Yes | RWD | Type 2 CCS | Sedan | D | 5 | 46380 |
| Audi | Q4 e-tron | 6.3 | 180 | 400 | 193 | 540 | Yes | AWD | Type 2 CCS | SUV | D | 5 | 55000 |
| Mercedes | EQC 400 4MATIC | 5.1 | 180 | 370 | 216 | 440 | Yes | AWD | Type 2 CCS | SUV | D | 5 | 69484 |
| Nissan | Leaf | 7.9 | 144 | 220 | 164 | 230 | Yes | FWD | Type 2 CHAdeMO | Hatchback | C | 5 | 29234 |
| Hyundai | Kona Electric 64 kWh | 7.9 | 167 | 400 | 160 | 380 | Yes | FWD | Type 2 CCS | SUV | B | 5 | 40795 |
| BMW | i4 | 4.0 | 200 | 450 | 178 | 650 | Yes | RWD | Type 2 CCS | Sedan | D | 5 | 65000 |

Data Visualization

Let’s first visualize the distribution of EV (electric vehicle) prices:

ggplot(electric_data, aes(x=PriceEuro)) +
  geom_histogram(bins = 10, fill = "#006699") +
  labs(title = "Distribution of Vehicle Prices", 
       x = "Price (Euro)", 
       y = "Frequency")

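The distribution of prices is skewed to the right: most vehicles fall in the lower price ranges (the median is 45,000 Euros), while a small number of expensive models, up to 215,000 Euros, form a long upper tail and pull the mean up to about 55,800 Euros.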
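One quick way to check the direction of skew is to compare the mean with the median: a mean clearly above the median points to a long tail of expensive vehicles. In the `summary()` output above, the mean price is 55,812 Euros against a median of 45,000 Euros. The sketch below runs the same comparison on the 14 prices listed in the table above. It is only a rough check on a small sample, and the variable names are illustrative.

```r
# Prices (in Euros) of the 14 vehicles shown in the table above
prices <- c(55480, 30000, 56440, 68040, 32997, 105000, 31900,
            29682, 46380, 55000, 69484, 29234, 40795, 65000)

price_mean <- mean(prices)
price_median <- median(prices)

# A mean above the median suggests a longer tail on the high-price side
right_skewed <- price_mean > price_median
right_skewed
```

On the full dataset the same check would be `mean(electric_data$PriceEuro) > median(electric_data$PriceEuro)`.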

Now let’s look at the relationship between EV price and range:

The range of an electric vehicle is an important factor when buying an EV because it determines how far the vehicle can go on a single charge. Unlike conventional gasoline-powered vehicles, EVs are limited in range by the size of their battery (because of its weight and energy density), so many consumers choose EVs with the biggest battery and the best efficiency.
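The dataset has no battery-size column, but range and efficiency together imply one: kilometres of range multiplied by watt-hours per kilometre gives watt-hours of usable energy. The sketch below applies that arithmetic to the first three vehicles in the table above. The column name `Battery_kWh_est` is made up for illustration, and the result is only a rough estimate, since it assumes the listed range and efficiency were measured under the same conditions.

```r
# Range and efficiency of the first three vehicles in the table above
vehicles <- data.frame(
  Model = c("Model 3 Long Range Dual Motor", "ID.3 Pure", "2"),
  Range_Km = c(450, 270, 400),
  Efficiency_WhKm = c(161, 167, 181)
)

# Implied usable energy: km * Wh/km = Wh, divided by 1000 to get kWh
vehicles$Battery_kWh_est <- vehicles$Range_Km * vehicles$Efficiency_WhKm / 1000

vehicles
```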

# Note: xlim() and ylim() drop points outside the limits before the trend line is fitted
ggplot(electric_data, aes(x = PriceEuro, y = Range_Km)) +
  geom_point(aes(colour = Range_Km), size = 2) + 
  xlim(0, 90000) + 
  ylim(0, 540) + 
  geom_smooth(method = "lm", col = 'red', size = 2) + 
  ggtitle("Price Vs Range")

Looking at this figure, there is a roughly linear relationship between the price of an electric vehicle and its overall range (and, by extension, its battery size). There also seem to be diminishing returns in range past the 50,000 Euro mark.
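The visual impression can be backed up with a correlation coefficient and a fitted slope. The sketch below does this on the 14 rows listed in the table above, so the numbers will differ from those of the full dataset. It also fits range against the logarithm of price, which is one simple (assumed, not the only) way to probe for diminishing returns: if the log model fits noticeably better, extra Euros buy less and less range. The object names are illustrative.

```r
# Price and range of the 14 vehicles shown in the table above
sample_rows <- data.frame(
  PriceEuro = c(55480, 30000, 56440, 68040, 32997, 105000, 31900,
                29682, 46380, 55000, 69484, 29234, 40795, 65000),
  Range_Km = c(450, 270, 400, 360, 170, 610, 190,
               275, 310, 400, 370, 220, 400, 450)
)

# Strength and direction of the linear association
price_range_cor <- cor(sample_rows$PriceEuro, sample_rows$Range_Km)

# Straight-line fit: the slope is the change in range (km) per extra Euro
fit_linear <- lm(Range_Km ~ PriceEuro, data = sample_rows)
slope <- unname(coef(fit_linear)["PriceEuro"])

# Log-price fit: a better fit here would hint at diminishing returns
fit_log <- lm(Range_Km ~ log(PriceEuro), data = sample_rows)

r2_linear <- summary(fit_linear)$r.squared
r2_log <- summary(fit_log)$r.squared

c(correlation = price_range_cor, slope = slope,
  r2_linear = r2_linear, r2_log = r2_log)
```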

Lastly, let’s take a look at each vehicle’s statistics, such as its range, its rate of fast charging, and its efficiency:

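# FastCharge_KmH is read in as character, so convert it to numeric first;
# entries that are not numbers become NA
plot_data <- electric_data %>%
  mutate(FastCharge_KmH = suppressWarnings(as.numeric(FastCharge_KmH)))

fig <- plot_ly(plot_data, x = ~FastCharge_KmH, 
               y = ~Range_Km, 
               z = ~Efficiency_WhKm, 
               color= ~Brand)

# One 3D marker per vehicle, coloured by brand
fig <- fig %>% add_markers()

fig <- fig %>% layout(title='Relationship Between Fast Charging Vs Range Vs Efficiency',
                      scene = list(xaxis = list(title = 'FastCharge_KmH'),
                                   yaxis = list(title = 'Range_Km'),
                                   zaxis = list(title = 'Efficiency_WhKm')))
fig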

EV Price Prediction Model:

First Method: Linear Regression

Let’s first split the original dataset into training and test datasets:

# Split the original dataset into training (70%) and test (30%) datasets

set.seed(120)

indices <- sample(nrow(electric_data), floor(0.70 * nrow(electric_data)))
train <- electric_data[indices, ]
test <- electric_data[-indices, ]
actual <- test$PriceEuro

# Fit a multivariable linear regression on the training set
Simple_Lm <- lm(PriceEuro ~ TopSpeed_KmH + 
                   Range_Km + Efficiency_WhKm, data = train)

# Predict prices for the held-out test set
P_Simple_Lm <- predict(Simple_Lm, test)

# Keep actual and predicted prices in one data frame so ggplot finds both columns
Predicted.Price.Df <- data.frame(actual, P_Simple_Lm)

ggplot(data = Predicted.Price.Df, aes(x = actual, y = P_Simple_Lm)) + 
  geom_point() +
  labs(title = "Multivariable Linear Regression Prediction Vs Actual", x = "Actual", y = "Predicted") + 
  geom_smooth(method = "lm", colour = "red") + 
  xlim(0,125000) + ylim(0,150000)
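A plot of predicted against actual prices shows the fit visually, but a single error figure makes it easier to compare models. The sketch below defines two common metrics, root mean squared error and mean absolute error. The helper names `rmse` and `mae` and the toy prices are made up for illustration and are not results from the dataset.

```r
# Root mean squared error: typical size of a prediction error, in Euros;
# large misses are penalised more heavily
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Mean absolute error: average absolute miss, less sensitive to outliers
mae <- function(actual, predicted) {
  mean(abs(actual - predicted))
}

# Toy example with made-up prices, for illustration only
toy_actual <- c(30000, 45000, 60000)
toy_predicted <- c(32000, 44000, 66000)

rmse(toy_actual, toy_predicted)
mae(toy_actual, toy_predicted)
```

With the objects from the code above, the equivalent calls would be `rmse(actual, P_Simple_Lm)` and `mae(actual, P_Simple_Lm)`.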

The performance independent variables make sense due to its

Second Method: Decision Trees

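# Note: method = "class" fits a classification tree, so each distinct price is treated as its own class
Decision.Tree.1 <- rpart(PriceEuro ~ TopSpeed_KmH + Range_Km + Efficiency_WhKm, 
                         data = electric_data, method = "class", 
                         minsplit = 30, minbucket = 20,cp = 0.0001)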

rpart.plot(Decision.Tree.1)

printcp(Decision.Tree.1)
## 
## Classification tree:
## rpart(formula = PriceEuro ~ TopSpeed_KmH + Range_Km + Efficiency_WhKm, 
##     data = electric_data, method = "class", minsplit = 30, minbucket = 20, 
##     cp = 1e-04)
## 
## Variables actually used in tree construction:
## [1] Efficiency_WhKm Range_Km        TopSpeed_KmH   
## 
## Root node error: 97/103 = 0.94175
## 
## n= 103 
## 
##         CP nsplit rel error xerror     xstd
## 1 0.020619      0   1.00000 1.0000 0.024506
## 2 0.010309      2   0.95876 1.0309 0.017594
## 3 0.000100      3   0.94845 1.0412 0.014437
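Because `method = "class"` was used, rpart treats every distinct price as its own class, which is why the output is labelled a classification tree and the root node error is as high as 97/103. Since `PriceEuro` is numeric, a regression tree (`method = "anova"`) is probably the closer match for a price prediction model. Below is a minimal sketch under that assumption, using the 14 rows of the table above rather than the full dataset. The control settings are illustrative values picked only so that such a small sample can split at all, and `reg_tree` and `predicted_price` are made-up names.

```r
library(rpart)

# The 14 vehicles shown in the table above (predictors and price only)
sample_rows <- data.frame(
  TopSpeed_KmH = c(233, 160, 210, 180, 145, 250, 150,
                   150, 225, 180, 180, 144, 167, 200),
  Range_Km = c(450, 270, 400, 360, 170, 610, 190,
               275, 310, 400, 370, 220, 400, 450),
  Efficiency_WhKm = c(161, 167, 181, 206, 168, 180, 168,
                      164, 153, 193, 216, 164, 160, 178),
  PriceEuro = c(55480, 30000, 56440, 68040, 32997, 105000, 31900,
                29682, 46380, 55000, 69484, 29234, 40795, 65000)
)

# method = "anova" fits a regression tree: each leaf predicts the mean
# price of the training rows that fall into it
reg_tree <- rpart(PriceEuro ~ TopSpeed_KmH + Range_Km + Efficiency_WhKm,
                  data = sample_rows, method = "anova",
                  control = rpart.control(minsplit = 4, minbucket = 2, cp = 0.01))

# Predictions are numeric prices rather than class labels
predicted_price <- predict(reg_tree, sample_rows)
head(predicted_price)
```

For the full analysis, fitting on `train` and predicting on `test` would keep the comparison with the linear regression like for like.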

Conclusion: