EV Pipeline and Analytics

Electric Car Sales Analytics - Rakha Hafish Setiawan

Introduction

In this project, the data set “Electric Car Sales by Model in the US” was used in order to determine the insights and the actionable insights for various users such as investors and engineers to better understand the growth and trend of the Electric Cars from 2012 to 2019. Additionally this project was also initiated in order to practice the author’s skills in pipeline development and analysis with the R Programming Language and the tidyverse library.

Original public data set courtesy of mathurinache on Kaggle.

Library Activation

Here the necessary libraries were activated and the designated data set was read. The raw data set itself contains 99 columns, 3 consists of the make and model data as well as the logo for each of the makes, and the rest contains the actual sales data that needed reformatting from wide to long for the ease of aggregation.

library(tidyverse)
library(sqldf)
library(readxl)
MyCars = read_excel("C:/Users/Rakha Hafish S/Downloads/Electric Car Sales by Model in USA.xlsx")
MyCars %>% head(5)
# A tibble: 5 × 99
  Make      Model     Logo  `janv-12` `Feb 2012` `mars-12` `Apr 2012` `May 2012`
  <chr>     <chr>     <chr>     <dbl>      <dbl>     <dbl>      <dbl>      <dbl>
1 Chevrolet Volt      http…       603       1626      3915       5377       7057
2 Toyota    Prius PHV http…         0         21       912       2566       3652
3 Nissan    Leaf      http…       676       1154      1733       2103       2613
4 Tesla     Model S   http…         0         NA        NA         NA         NA
5 Ford      C-Max En… http…         0         NA        NA         NA         NA
# ℹ 91 more variables: `juin-12` <dbl>, `juil-12` <dbl>, `Aug 2012` <dbl>,
#   `sept-12` <dbl>, `oct-12` <dbl>, `nov-12` <dbl>, `Dec 2012` <dbl>,
#   `janv-13` <dbl>, `Feb 2013` <dbl>, `mars-13` <dbl>, `Apr 2013` <dbl>,
#   `May 2013` <dbl>, `juin-13` <dbl>, `juil-13` <dbl>, `Aug 2013` <dbl>,
#   `sept-13` <dbl>, `oct-13` <dbl>, `nov-13` <dbl>, `Dec 2013` <dbl>,
#   `janv-14` <dbl>, `Feb 2014` <dbl>, `mars-14` <dbl>, `Apr 2014` <dbl>,
#   `May 2014` <dbl>, `juin-14` <dbl>, `juil-14` <dbl>, `Aug 2014` <dbl>, …

Data Transformation

Reformatting and Replacement

In this code chunk, the data set was reformatted using tidyr and dplyr. The logo column will not be necessary so it was deleted out of the set. The wide format was changed to long format, and all the NA values were replaced with 0.

MyCarsTemp = MyCars %>%
  select(everything(), -3) %>%
  gather(Date, Sales, 3:98) %>%
  mutate(Sales = ifelse(is.na(Sales), 0, Sales))
MyCarsTemp %>% tibble() %>% head(5)
# A tibble: 5 × 4
  Make      Model        Date    Sales
  <chr>     <chr>        <chr>   <chr>
1 Chevrolet Volt         janv-12 603  
2 Toyota    Prius PHV    janv-12 0    
3 Nissan    Leaf         janv-12 676  
4 Tesla     Model S      janv-12 0    
5 Ford      C-Max Energi janv-12 0    

Isolation and Cleaning

In this chunk, the Date column was isolated using dplyr and stringr in order to better clean it. This particular column was heavily cleaned in order to enable the yearly aggregate for the Electric Car Sales.

MyDates = MyCars %>%
  select(everything()) %>%
  gather(Dates, Sales, 4:99) %>%
  select(Dates) %>%
  apply(2, FUN = str_remove_all, " ") %>%
  data.frame() %>%
  apply(2, FUN = str_remove_all, "-") %>%
  data.frame() %>%
  apply(2, FUN = str_to_title) %>%
  data.frame()
MyDates %>% head(3) %>% tibble()
# A tibble: 3 × 1
  Dates 
  <chr> 
1 Janv12
2 Janv12
3 Janv12

Recombining and Finalization

In this chunk, the reformatted data frame and isolated date column were recombined to make the final data set for analysis.

CarsFixed = MyCarsTemp %>%
  bind_cols(MyDates) %>%
  select(Make, Model, Dates, Sales) %>%
  mutate(Year = if_else(grepl("12", Dates), 2012,
                if_else(grepl("13", Dates), 2013,
                if_else(grepl("14", Dates), 2014,
                if_else(grepl("15", Dates), 2015,
                if_else(grepl("16", Dates), 2016,
                if_else(grepl("17", Dates), 2017,
                if_else(grepl("18", Dates), 2018, 2019)))))))) %>%
  select(Make, Model, Year, Sales)
CarsFixed %>% tibble() %>% head(5)
# A tibble: 5 × 4
  Make      Model         Year Sales
  <chr>     <chr>        <dbl> <chr>
1 Chevrolet Volt          2012 603  
2 Toyota    Prius PHV     2012 0    
3 Nissan    Leaf          2012 676  
4 Tesla     Model S       2012 0    
5 Ford      C-Max Energi  2012 0    

Exploratory Data Analysis

Exploratory data analysis (EDA) was done in order to obtain a bird’s eyes view of the sales growth and trends among the various makes and models.

Max and Average Sales of each make and model - overall

In this query, the top 3 sales between 2012-2019 were achieved by Chevrolet, Nissan, and Tesla, lead by Volt.

sqldf("SELECT Make,
      Model, 
      MAX(Sales), 
      AVG(Sales) 
      FROM CarsFixed GROUP BY Make, Model
             ORDER BY AVG(Sales) DESC")
         Make              Model MAX(Sales)   AVG(Sales)
1   Chevrolet               Volt      99270 81689.218750
2      Nissan               Leaf      99646 69983.354167
3       Tesla            Model S      99932 68923.708333
4       Tesla            Model 3       9944 33512.614583
5      Toyota          Prius PHV       9623 33408.177083
6        Ford      Fusion Energi       9043 29734.677083
7        Ford       C-Max Energi       9999 24178.354167
8       Tesla            Model X       9737 20397.968750
9         BMW                 i3       9997 16632.937500
10       Fiat               500e       9326 13218.468750
11     Toyota        Prius Prime       8587 11946.229167
12  Chevrolet            Bolt EV       8171 11793.093750
13         VW             e-Golf       9800  5803.104167
14        BMW       X5 xDrive40e       9813  5466.343750
15       Ford     Focus Electric         97  5407.354167
16      Honda       Clarity PHEV       9354  4410.041667
17  Chevrolet           Spark EV        908  4100.302083
18      Smart                 ED        923  3863.312500
19       Audi   A3 Sprtbk e-tron       9978  3404.052083
20        BMW               530e       9328  2674.760417
21        BMW                 i8          9  2600.177083
22   Chrysler    Pacifica Hybrid       9428  2380.406250
23        Kia            Soul EV        981  2324.625000
24    Hyundai        Sonata PHEV        810  2170.322917
25   Mercedes         B-Class ED         92  2145.333333
26        BMW               330e        999  2096.010417
27    Porsche        Cayenne S-E        945  2008.552083
28      Volvo               XC90        926  1963.020833
29     Toyota            Rav4 EV        940  1884.208333
30   Cadillac                ELR        891  1738.177083
31 Mitsubishi             i-MiEV        845  1672.677083
32    Porsche  Panamera E-Hybrid        965  1481.375000
33 Mitsubishi     Outlander PHEV         99   991.875000
34      Honda             Fit EV        980   829.447917
35      Honda         Accord PHV        975   770.343750
36   Mercedes              C350e        988   706.510417
37        Kia          Niro PHEV        966   662.833333
38      Volvo          XC60 PHEV        962   657.687500
39        Kia        Optima PHEV        930   585.177083
40      Honda        Clarity BEV        594   541.552083
41   Mercedes              S550e        998   501.916667
42   Mercedes           GLE 550e         99   471.520833
43       Mini Countryman SE PHEV        882   398.677083
44    Hyundai           IONIQ EV         99   393.468750
45    Hyundai         IONIQ PHEV        958   335.020833
46        BMW               740e        978   276.781250
47   Mercedes           GLC 350e        888   253.479167
48       Audi             e-tron       5369   237.635417
49     Jaguar             I-Pace        789   228.447917
50      Volvo        S90 T8 PHEV        988   148.885417
51    Porsche         918 Spyder         86   118.656250
52   Cadillac           CT6 PHEV         89   108.833333
53    Hyundai      Kona Electric        897    60.708333
54     Subaru   Crosstrek Hybrid     970‬    48.385417
55        Kia            Niro EV     880‬    43.114583
56      Other              Other         35    21.125000
57    Porsche             Taycan        130     1.354167

Average Sales by Year

This query shows the growth of the average sales by year.

sqldf("SELECT Year, AVG(Sales)
      FROM CarsFixed
      GROUP BY Year") 
  Year AVG(Sales)
1 2012   380.0716
2 2013  1757.3041
3 2014  3713.8787
4 2015  5807.9123
5 2016  8125.0307
6 2017 11341.1404
7 2018 15906.2398
8 2019 20955.2851

Top selling models of each year

This query shows the best selling models of each year.

sqldf("SELECT Year, MAX(Sales), Make, Model
      FROM CarsFixed
      GROUP BY Year")
  Year MAX(Sales)      Make            Model
1 2012       9819    Nissan             Leaf
2 2013       9650     Tesla          Model S
3 2014       9999      Ford     C-Max Energi
4 2015       9997       BMW               i3
5 2016      99270 Chevrolet             Volt
6 2017      99932     Tesla          Model S
7 2018       9944     Tesla          Model 3
8 2019       9978      Audi A3 Sprtbk e-tron

Insights

  • According to the analysis, it is recommended to invest in Tesla due it’s stability in population of the best selling model in 2013, 2017 and 2018.

  • Additionally it is imperative if investing in Tesla to evenly distribute the shares over the different models and develop similar machinations, due to their population in the overall yearly analysis.

  • It is recommended to watch over the other best-selling makes and models such as Ford and Chevrolet.