The emergence of electric vehicles (EVs) in Europe has been a major development in the automotive industry over the past decade. With growing concerns about climate change and air pollution, many consumers and governments have been looking for ways to reduce their fossil fuel consumption. The environmental benefits and low operating costs of EVs are what drew my interest to this topic.
The dataset used in this analysis contains information on different EVs, including their prices and characteristics. The data was obtained from Kaggle and can be downloaded here.
Let's first import our dataset and libraries:
# Import the dataset, then load the plotting and modelling libraries
electric_data <- read.csv("ElectricCarData_Clean.csv")
library(ggplot2)
library(dplyr)
library(plotly)
library(rpart)
library(rpart.plot)
#Structure of the data
str(electric_data)
## 'data.frame': 103 obs. of 14 variables:
## $ Brand : chr "Tesla " "Volkswagen " "Polestar " "BMW " ...
## $ Model : chr "Model 3 Long Range Dual Motor" "ID.3 Pure" "2" "iX3 " ...
## $ AccelSec : num 4.6 10 4.7 6.8 9.5 2.8 9.6 8.1 5.6 6.3 ...
## $ TopSpeed_KmH : int 233 160 210 180 145 250 150 150 225 180 ...
## $ Range_Km : int 450 270 400 360 170 610 190 275 310 400 ...
## $ Efficiency_WhKm: int 161 167 181 206 168 180 168 164 153 193 ...
## $ FastCharge_KmH : chr "940" "250" "620" "560" ...
## $ RapidCharge : chr "Yes" "Yes" "Yes" "Yes" ...
## $ PowerTrain : chr "AWD" "RWD" "AWD" "RWD" ...
## $ PlugType : chr "Type 2 CCS" "Type 2 CCS" "Type 2 CCS" "Type 2 CCS" ...
## $ BodyStyle : chr "Sedan" "Hatchback" "Liftback" "SUV" ...
## $ Segment : chr "D" "C" "D" "D" ...
## $ Seats : int 5 5 5 5 4 5 5 5 5 5 ...
## $ PriceEuro : int 55480 30000 56440 68040 32997 105000 31900 29682 46380 55000 ...
Our data consists of 103 rows (observations), one per electric vehicle, and 14 columns (variables) describing features such as model, brand, acceleration, and range.
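One detail in the `str()` output is worth handling before plotting or modelling: `FastCharge_KmH` is stored as character rather than numeric, presumably because at least one entry is not a number. A minimal sketch of a conversion follows. The helper `to_numeric_or_na` is hypothetical, and the `"-"` placeholder in the toy vector is an assumption about how missing values might be encoded, not something shown in the output above.

```r
# Hypothetical helper: coerce a character column to numeric,
# turning anything that does not parse as a number into NA
to_numeric_or_na <- function(x) {
  suppressWarnings(as.numeric(trimws(x)))
}

# Toy example; the "-" placeholder is an assumed encoding for missing values
fast_charge_raw <- c("940", "250", "-", "560")
fast_charge_num <- to_numeric_or_na(fast_charge_raw)

# Count the entries that could not be converted
sum(is.na(fast_charge_num))
```

Applied to the real data, this would look like `electric_data$FastCharge_KmH <- to_numeric_or_na(electric_data$FastCharge_KmH)`, after which the column can be used on a numeric axis.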
summary(electric_data)
## Brand Model AccelSec TopSpeed_KmH
## Length:103 Length:103 Min. : 2.100 Min. :123.0
## Class :character Class :character 1st Qu.: 5.100 1st Qu.:150.0
## Mode :character Mode :character Median : 7.300 Median :160.0
## Mean : 7.396 Mean :179.2
## 3rd Qu.: 9.000 3rd Qu.:200.0
## Max. :22.400 Max. :410.0
## Range_Km Efficiency_WhKm FastCharge_KmH RapidCharge
## Min. : 95.0 Min. :104.0 Length:103 Length:103
## 1st Qu.:250.0 1st Qu.:168.0 Class :character Class :character
## Median :340.0 Median :180.0 Mode :character Mode :character
## Mean :338.8 Mean :189.2
## 3rd Qu.:400.0 3rd Qu.:203.0
## Max. :970.0 Max. :273.0
## PowerTrain PlugType BodyStyle Segment
## Length:103 Length:103 Length:103 Length:103
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Seats PriceEuro
## Min. :2.000 Min. : 20129
## 1st Qu.:5.000 1st Qu.: 34430
## Median :5.000 Median : 45000
## Mean :4.883 Mean : 55812
## 3rd Qu.:5.000 3rd Qu.: 65000
## Max. :7.000 Max. :215000
knitr::kable(electric_data[1:14,],caption = "EV Dataset")
| Brand | Model | AccelSec | TopSpeed_KmH | Range_Km | Efficiency_WhKm | FastCharge_KmH | RapidCharge | PowerTrain | PlugType | BodyStyle | Segment | Seats | PriceEuro |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Tesla | Model 3 Long Range Dual Motor | 4.6 | 233 | 450 | 161 | 940 | Yes | AWD | Type 2 CCS | Sedan | D | 5 | 55480 |
| Volkswagen | ID.3 Pure | 10.0 | 160 | 270 | 167 | 250 | Yes | RWD | Type 2 CCS | Hatchback | C | 5 | 30000 |
| Polestar | 2 | 4.7 | 210 | 400 | 181 | 620 | Yes | AWD | Type 2 CCS | Liftback | D | 5 | 56440 |
| BMW | iX3 | 6.8 | 180 | 360 | 206 | 560 | Yes | RWD | Type 2 CCS | SUV | D | 5 | 68040 |
| Honda | e | 9.5 | 145 | 170 | 168 | 190 | Yes | RWD | Type 2 CCS | Hatchback | B | 4 | 32997 |
| Lucid | Air | 2.8 | 250 | 610 | 180 | 620 | Yes | AWD | Type 2 CCS | Sedan | F | 5 | 105000 |
| Volkswagen | e-Golf | 9.6 | 150 | 190 | 168 | 220 | Yes | FWD | Type 2 CCS | Hatchback | C | 5 | 31900 |
| Peugeot | e-208 | 8.1 | 150 | 275 | 164 | 420 | Yes | FWD | Type 2 CCS | Hatchback | B | 5 | 29682 |
| Tesla | Model 3 Standard Range Plus | 5.6 | 225 | 310 | 153 | 650 | Yes | RWD | Type 2 CCS | Sedan | D | 5 | 46380 |
| Audi | Q4 e-tron | 6.3 | 180 | 400 | 193 | 540 | Yes | AWD | Type 2 CCS | SUV | D | 5 | 55000 |
| Mercedes | EQC 400 4MATIC | 5.1 | 180 | 370 | 216 | 440 | Yes | AWD | Type 2 CCS | SUV | D | 5 | 69484 |
| Nissan | Leaf | 7.9 | 144 | 220 | 164 | 230 | Yes | FWD | Type 2 CHAdeMO | Hatchback | C | 5 | 29234 |
| Hyundai | Kona Electric 64 kWh | 7.9 | 167 | 400 | 160 | 380 | Yes | FWD | Type 2 CCS | SUV | B | 5 | 40795 |
| BMW | i4 | 4.0 | 200 | 450 | 178 | 650 | Yes | RWD | Type 2 CCS | Sedan | D | 5 | 65000 |
Let's first visualize the distribution of EV (electric vehicle) prices:
ggplot(electric_data, aes(x=PriceEuro)) +
geom_histogram(bins = 10, fill = "#006699") +
labs(title = "Distribution of Vehicle Prices",
x = "Price (Euro)",
y = "Frequency")
The distribution of prices is skewed to the right: most vehicles fall in the lower price ranges (the median is 45,000 euros), while a few expensive models, up to 215,000 euros, pull the mean up to about 55,800 euros.
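The direction of the skew can also be checked numerically rather than read off the histogram. The sketch below uses a hypothetical helper, `sample_skewness`, that computes the simple moment-based skewness coefficient; it is not part of the original analysis. A positive value indicates a long right tail (a few expensive vehicles), a negative value a long left tail.

```r
# Hypothetical helper: moment-based sample skewness
# (third central moment divided by the 3/2 power of the second)
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Toy check: one large value creates a long right tail
toy_skew <- sample_skewness(c(1, 2, 3, 4, 100))
```

On the dataset itself the call would be `sample_skewness(electric_data$PriceEuro)`.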
Now let's look at the relationship between EV prices and range:
Range is an important factor when buying an EV because it determines how far the vehicle can go on a single charge. Unlike conventional gasoline-powered vehicles, EVs are limited in range by their battery size (a consequence of battery weight and energy density), so many consumers will choose the EV with the biggest battery and the best efficiency.
ggplot(electric_data, aes(x = PriceEuro, y = Range_Km)) +
geom_point(aes(colour = Range_Km), size = 2) +
xlim(0,90000) +
ylim(0,540) +
geom_smooth(method=lm, col='red', size=2) +
ggtitle("Price Vs Range")
Looking at this figure, there appears to be a positive, roughly linear relationship between the price of an electric vehicle and its overall range (which largely reflects battery size). There also seem to be diminishing returns in range past the 50,000 euro mark.
Lastly, let's take a look at each vehicle's statistics, such as its range, fast-charge rate, and efficiency:
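# FastCharge_KmH is stored as character (see str() above); coerce it so plotly uses a numeric axis
fig <- plot_ly(electric_data, x = ~suppressWarnings(as.numeric(FastCharge_KmH)),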
y = ~Range_Km,
z = ~Efficiency_WhKm,
color= ~Brand)
fig <- fig %>% add_markers()
fig <- fig %>% layout(title='Relationship Between Fast Charging Vs Range Vs Efficiency',
scene = list(xaxis = list(title = 'FastCharge_KmH'),
yaxis = list(title = 'Range_Km'),
zaxis = list(title = 'Efficiency_WhKm')))
fig
Let’s split the original dataset into training and test datasets:
#Splitting original dataset into training and test datasets
set.seed(120)
indices <- sample(nrow(electric_data), 0.70 * nrow(electric_data))
train <- electric_data[indices, ]
test <- electric_data[-indices, ]
actual <- test$PriceEuro
Simple_Lm <- lm(PriceEuro ~ TopSpeed_KmH +
Range_Km + Efficiency_WhKm, data = train)
P_Simple_Lm <- predict(Simple_Lm, test)
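# Keep actual and predicted prices in one data frame so ggplot does not rely on a global variable
Predicted.Price.Df <- data.frame(actual = actual, P_Simple_Lm = P_Simple_Lm)
ggplot(data = Predicted.Price.Df, aes(x = actual, y = P_Simple_Lm)) +
geom_point() +
labs(title = "Multivariable Linear Regression Prediction Vs Actual", x = "Actual", y = "Predicted") +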
geom_smooth(method = "lm", colour = "red") +
xlim(0,125000) + ylim(0,150000)
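The plot compares predictions to actual prices visually; a numeric error measure on the test set can complement it. The sketch below defines two hypothetical helpers, `rmse` and `mae`, which are not part of the original analysis. Both are standard definitions and are expressed in euros, the same unit as `PriceEuro`.

```r
# Hypothetical helper: root mean squared error between actual and predicted values
rmse <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((actual - predicted)^2, na.rm = TRUE))
}

# Hypothetical helper: mean absolute error, less sensitive to a few very large misses
mae <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  mean(abs(actual - predicted), na.rm = TRUE)
}
```

With the objects defined above, the calls would be `rmse(actual, P_Simple_Lm)` and `mae(actual, P_Simple_Lm)`. A large gap between the two would suggest that a handful of expensive vehicles dominate the error.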
The performance independent variables make sense due to its
Decision.Tree.1 <- rpart(PriceEuro ~ TopSpeed_KmH + Range_Km + Efficiency_WhKm,
data = electric_data, method = "class",
minsplit = 30, minbucket = 20,cp = 0.0001)
rpart.plot(Decision.Tree.1)
printcp(Decision.Tree.1)
##
## Classification tree:
## rpart(formula = PriceEuro ~ TopSpeed_KmH + Range_Km + Efficiency_WhKm,
## data = electric_data, method = "class", minsplit = 30, minbucket = 20,
## cp = 1e-04)
##
## Variables actually used in tree construction:
## [1] Efficiency_WhKm Range_Km TopSpeed_KmH
##
## Root node error: 97/103 = 0.94175
##
## n= 103
##
## CP nsplit rel error xerror xstd
## 1 0.020619 0 1.00000 1.0000 0.024506
## 2 0.010309 2 0.95876 1.0309 0.017594
## 3 0.000100 3 0.94845 1.0412 0.014437
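The `printcp()` output labels this a classification tree: with `method = "class"`, `rpart` treats every distinct `PriceEuro` value as its own class, which is consistent with the root node error of 97/103 and a cross-validated error at or above 1. Since price is a numeric outcome, a regression tree (`method = "anova"`) may be a better fit. A minimal sketch, assuming the same predictors and control settings as above; `fit_price_tree` is a hypothetical helper, not part of the original analysis.

```r
library(rpart)

# Hypothetical helper: fit a regression tree for price on the three predictors
# used above. method = "anova" makes rpart split on reduction in squared error
# instead of treating each price as a class label.
fit_price_tree <- function(data) {
  rpart(PriceEuro ~ TopSpeed_KmH + Range_Km + Efficiency_WhKm,
        data = data,
        method = "anova",
        control = rpart.control(minsplit = 30, minbucket = 20, cp = 0.0001))
}
```

Fitting on the training split, for example `Decision.Tree.2 <- fit_price_tree(train)`, would also allow `predict(Decision.Tree.2, test)` to be compared against `actual` in the same way as the linear model.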