Step 1: Data

Information on the data selected: the Electric Vehicle Population Data published by the State of Washington (see References).

Importing the data

ev_df = read.csv("https://data.wa.gov/api/views/f6w7-q2d2/rows.csv?accessType=DOWNLOAD")

Viewing the first few rows

head(ev_df,n=5)
##   VIN..1.10.    County         City State Postal.Code Model.Year   Make
## 1 5YJSA1E28K Snohomish     Mukilteo    WA       98275       2019  TESLA
## 2 1C4JJXP68P    Yakima       Yakima    WA       98901       2023   JEEP
## 3 WBY8P6C05L    Kitsap     Kingston    WA       98346       2020    BMW
## 4 JTDKARFP1J    Kitsap Port Orchard    WA       98367       2018 TOYOTA
## 5 5UXTA6C09N Snohomish      Everett    WA       98208       2022    BMW
##         Model                  Electric.Vehicle.Type
## 1     MODEL S         Battery Electric Vehicle (BEV)
## 2    WRANGLER Plug-in Hybrid Electric Vehicle (PHEV)
## 3          I3         Battery Electric Vehicle (BEV)
## 4 PRIUS PRIME Plug-in Hybrid Electric Vehicle (PHEV)
## 5          X5 Plug-in Hybrid Electric Vehicle (PHEV)
##   Clean.Alternative.Fuel.Vehicle..CAFV..Eligibility Electric.Range Base.MSRP
## 1           Clean Alternative Fuel Vehicle Eligible            270         0
## 2             Not eligible due to low battery range             21         0
## 3           Clean Alternative Fuel Vehicle Eligible            153         0
## 4             Not eligible due to low battery range             25         0
## 5           Clean Alternative Fuel Vehicle Eligible             30         0
##   Legislative.District DOL.Vehicle.ID                Vehicle.Location
## 1                   21      236424583    POINT (-122.29943 47.912654)
## 2                   15      249905295 POINT (-120.4688751 46.6046178)
## 3                   23      260917289 POINT (-122.5178351 47.7981436)
## 4                   26      186410087 POINT (-122.6530052 47.4739066)
## 5                   44      186076915 POINT (-122.2032349 47.8956271)
##         Electric.Utility X2020.Census.Tract
## 1 PUGET SOUND ENERGY INC        53061042001
## 2             PACIFICORP        53077001601
## 3 PUGET SOUND ENERGY INC        53035090102
## 4 PUGET SOUND ENERGY INC        53035092802
## 5 PUGET SOUND ENERGY INC        53061041605
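
The dotted column names in the output above (for example `VIN..1.10.`) are produced by `read.csv()`, which by default passes header text through `make.names()`. A minimal sketch of that conversion follows. The raw header strings are assumptions inferred from the converted names, not read from the file.

```r
# Assumed raw CSV headers (inferred from the converted names, not confirmed here)
raw_headers <- c("VIN (1-10)",
                 "Clean Alternative Fuel Vehicle (CAFV) Eligibility",
                 "2020 Census Tract")

# read.csv() applies make.names() to headers when check.names = TRUE (the default):
# invalid characters become "." and names starting with a digit get an "X" prefix
clean_headers <- make.names(raw_headers)
print(clean_headers)
```

Passing `check.names = FALSE` to `read.csv()` would keep the original headers, at the cost of needing backticks around any name that contains spaces or parentheses.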

Step 2: Data Quality

Reviewing the data size

dim(ev_df)
## [1] 200048     17

The data has 200,048 rows and 17 columns.

Descriptive Statistics

Calculates summary statistics for each field.

summary(ev_df)
##   VIN..1.10.           County              City              State          
##  Length:200048      Length:200048      Length:200048      Length:200048     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   Postal.Code      Model.Year       Make              Model          
##  Min.   : 1731   Min.   :1997   Length:200048      Length:200048     
##  1st Qu.:98052   1st Qu.:2019   Class :character   Class :character  
##  Median :98125   Median :2022   Mode  :character   Mode  :character  
##  Mean   :98176   Mean   :2021                                        
##  3rd Qu.:98372   3rd Qu.:2023                                        
##  Max.   :99577   Max.   :2025                                        
##  NA's   :4                                                           
##  Electric.Vehicle.Type Clean.Alternative.Fuel.Vehicle..CAFV..Eligibility
##  Length:200048         Length:200048                                    
##  Class :character      Class :character                                 
##  Mode  :character      Mode  :character                                 
##                                                                         
##                                                                         
##                                                                         
##                                                                         
##  Electric.Range     Base.MSRP        Legislative.District DOL.Vehicle.ID     
##  Min.   :  0.00   Min.   :     0.0   Min.   : 1.00        Min.   :     4385  
##  1st Qu.:  0.00   1st Qu.:     0.0   1st Qu.:17.00        1st Qu.:190457312  
##  Median :  0.00   Median :     0.0   Median :33.00        Median :236339648  
##  Mean   : 53.49   Mean   :   947.6   Mean   :28.99        Mean   :226298775  
##  3rd Qu.: 53.00   3rd Qu.:     0.0   3rd Qu.:42.00        3rd Qu.:260965900  
##  Max.   :337.00   Max.   :845000.0   Max.   :49.00        Max.   :479254772  
##                                      NA's   :442                             
##  Vehicle.Location   Electric.Utility   X2020.Census.Tract 
##  Length:200048      Length:200048      Min.   :1.001e+09  
##  Class :character   Class :character   1st Qu.:5.303e+10  
##  Mode  :character   Mode  :character   Median :5.303e+10  
##                                        Mean   :5.298e+10  
##                                        3rd Qu.:5.305e+10  
##                                        Max.   :5.602e+10  
##                                        NA's   :4

Displays data types of each column.

str(ev_df)
## 'data.frame':    200048 obs. of  17 variables:
##  $ VIN..1.10.                                       : chr  "5YJSA1E28K" "1C4JJXP68P" "WBY8P6C05L" "JTDKARFP1J" ...
##  $ County                                           : chr  "Snohomish" "Yakima" "Kitsap" "Kitsap" ...
##  $ City                                             : chr  "Mukilteo" "Yakima" "Kingston" "Port Orchard" ...
##  $ State                                            : chr  "WA" "WA" "WA" "WA" ...
##  $ Postal.Code                                      : int  98275 98901 98346 98367 98208 98107 98576 98033 98033 98506 ...
##  $ Model.Year                                       : int  2019 2023 2020 2018 2022 2020 2023 2012 2011 2015 ...
##  $ Make                                             : chr  "TESLA" "JEEP" "BMW" "TOYOTA" ...
##  $ Model                                            : chr  "MODEL S" "WRANGLER" "I3" "PRIUS PRIME" ...
##  $ Electric.Vehicle.Type                            : chr  "Battery Electric Vehicle (BEV)" "Plug-in Hybrid Electric Vehicle (PHEV)" "Battery Electric Vehicle (BEV)" "Plug-in Hybrid Electric Vehicle (PHEV)" ...
##  $ Clean.Alternative.Fuel.Vehicle..CAFV..Eligibility: chr  "Clean Alternative Fuel Vehicle Eligible" "Not eligible due to low battery range" "Clean Alternative Fuel Vehicle Eligible" "Not eligible due to low battery range" ...
##  $ Electric.Range                                   : int  270 21 153 25 30 291 42 73 73 84 ...
##  $ Base.MSRP                                        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Legislative.District                             : int  21 15 23 26 44 36 2 45 45 22 ...
##  $ DOL.Vehicle.ID                                   : int  236424583 249905295 260917289 186410087 186076915 112984833 236505139 258649240 180120202 187006893 ...
##  $ Vehicle.Location                                 : chr  "POINT (-122.29943 47.912654)" "POINT (-120.4688751 46.6046178)" "POINT (-122.5178351 47.7981436)" "POINT (-122.6530052 47.4739066)" ...
##  $ Electric.Utility                                 : chr  "PUGET SOUND ENERGY INC" "PACIFICORP" "PUGET SOUND ENERGY INC" "PUGET SOUND ENERGY INC" ...
##  $ X2020.Census.Tract                               : num  5.31e+10 5.31e+10 5.30e+10 5.30e+10 5.31e+10 ...

Provides additional stats for numerical fields.

library(dplyr)  # select() and %>%
library(psych)  # describe()
ev_df %>% select(Model.Year, Electric.Range, Base.MSRP) %>%
  describe()
##                vars      n    mean      sd median trimmed  mad  min    max
## Model.Year        1 200048 2020.87    2.99   2022 2021.31 1.48 1997   2025
## Electric.Range    2 200048   53.49   88.79      0   34.47 0.00    0    337
## Base.MSRP         3 200048  947.55 7860.59      0    0.00 0.00    0 845000
##                 range  skew kurtosis    se
## Model.Year         28 -1.19     0.78  0.01
## Electric.Range    337  1.60     1.11  0.20
## Base.MSRP      845000 14.26   741.33 17.57
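
The zero medians for Electric.Range and Base.MSRP above suggest that many records carry a 0 in these fields. The sketch below shows one way to measure the share of zeros and to recode them before summarising. It uses a small made-up data frame (`toy`) rather than `ev_df`, and treating 0 as "not recorded" is an assumption that should be checked against the dataset documentation.

```r
# Toy stand-in for ev_df; the values are illustrative only
toy <- data.frame(
  Electric.Range = c(270, 21, 0, 0, 153, 0),
  Base.MSRP      = c(0, 0, 69900, 0, 0, 31950)
)

# Share of rows where each field is exactly zero
zero_share <- sapply(toy, function(x) mean(x == 0))
print(round(zero_share, 2))

# Assumption: 0 means "not recorded", so recode it as NA before summarising
toy_na <- as.data.frame(lapply(toy, function(x) replace(x, x == 0, NA)))
print(sapply(toy_na, median, na.rm = TRUE))
```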

Missing Data Assessment

Little data is missing overall: 450 values across the whole data frame.

sum(is.na(ev_df))
## [1] 450

Only 442 rows are incomplete

table(complete.cases(ev_df))
## 
##  FALSE   TRUE 
##    442 199606

99.78% of the rows are complete

prop.table(table(complete.cases(ev_df))) * 100
## 
##     FALSE      TRUE 
##  0.220947 99.779053
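
The three figures above measure different things: `sum(is.na(...))` counts missing cells (450), while `complete.cases()` works row by row (442 incomplete rows, and 442 / 200048 is about 0.22%). A minimal sketch of the distinction on a made-up data frame (`toy` is hypothetical and not part of the analysis):

```r
# Toy data: row 2 is missing two values, row 3 is missing one
toy <- data.frame(
  Postal.Code          = c(98275, NA, 98346, 98367),
  Legislative.District = c(21, NA, NA, 26),
  Model.Year           = c(2019, 2023, 2020, 2018)
)

missing_cells   <- sum(is.na(toy))               # counts individual NA cells
incomplete_rows <- sum(!complete.cases(toy))     # counts rows with at least one NA
pct_complete    <- mean(complete.cases(toy)) * 100

print(c(cells = missing_cells, rows = incomplete_rows, pct_complete = pct_complete))
```

A row with several missing fields counts once in the row total but several times in the cell total, so the cell count can exceed the row count.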

Legislative.District appears to be the only field with significant missingness (442 missing values, versus 4 each for Postal.Code and X2020.Census.Tract)

sort(sapply(ev_df, function(x) sum(is.na(x))))
##                                        VIN..1.10. 
##                                                 0 
##                                            County 
##                                                 0 
##                                              City 
##                                                 0 
##                                             State 
##                                                 0 
##                                        Model.Year 
##                                                 0 
##                                              Make 
##                                                 0 
##                                             Model 
##                                                 0 
##                             Electric.Vehicle.Type 
##                                                 0 
## Clean.Alternative.Fuel.Vehicle..CAFV..Eligibility 
##                                                 0 
##                                    Electric.Range 
##                                                 0 
##                                         Base.MSRP 
##                                                 0 
##                                    DOL.Vehicle.ID 
##                                                 0 
##                                  Vehicle.Location 
##                                                 0 
##                                  Electric.Utility 
##                                                 0 
##                                       Postal.Code 
##                                                 4 
##                                X2020.Census.Tract 
##                                                 4 
##                              Legislative.District 
##                                               442
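
A more compact way to get the same per-column counts is `colSums(is.na(...))`. The sketch below uses a made-up data frame (`toy`, hypothetical) and keeps only the columns that have at least one missing value, which shortens the output considerably for wide data.

```r
# Toy data with missing values in two of three columns
toy <- data.frame(
  County               = c("Kitsap", "Yakima", "Kitsap"),
  Postal.Code          = c(98346, NA, 98367),
  Legislative.District = c(NA, NA, 26)
)

# is.na() returns a logical matrix; colSums() counts the TRUEs per column
na_counts <- sort(colSums(is.na(toy)), decreasing = TRUE)

# Show only the columns that actually have missing values
print(na_counts[na_counts > 0])
```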

Most of the fields are character type (10 of 17); the rest are integer or numeric. Only a small fraction of values are missing, primarily in the Legislative.District column.

library(visdat)  # provides vis_dat() and vis_miss()
vis_dat(ev_df, warn_large_data = FALSE)

vis_miss(ev_df,warn_large_data = FALSE)

library(naniar)  # provides gg_miss_var()
ev_df %>% select(Legislative.District, Model.Year) %>%
  gg_miss_var(facet = Model.Year)

Most of the missing values appear to come from a handful of states other than WA, such as CA, VA, TX, and MD

ev_df %>% select(Legislative.District,State) %>%
  gg_miss_var(facet = State)
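
The faceted plot can be backed up with a plain count of missing Legislative.District values per state. A minimal sketch on made-up data (`toy` is hypothetical; the same `tapply()` call could be pointed at `ev_df`):

```r
# Toy data: the missing districts sit in the non-WA rows
toy <- data.frame(
  State                = c("WA", "WA", "CA", "TX", "CA", "WA"),
  Legislative.District = c(21, 15, NA, NA, NA, 44)
)

# Sum the NA indicator within each state
missing_by_state <- tapply(is.na(toy$Legislative.District), toy$State, sum)
print(sort(missing_by_state, decreasing = TRUE))
```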

Step 3: Data Visualization

One-dimensional visuals

This plot illustrates the frequency of electric vehicle range values, excluding records with a range of 0.

The distribution of electric vehicle ranges is positively skewed and appears bimodal, with peaks around 30 and 220.

library(ggplot2)
ev_df %>% filter(Electric.Range != 0) %>%
  ggplot(aes(x=Electric.Range))+
  geom_histogram(bins = 30,color= "black",fill = "light blue")+
  labs(title = "Electric Vehicle Range Frequencies")+
  xlab("Range")+
  ylab("Frequency")+
  theme_bw()
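
One way to investigate the two peaks is to compare ranges by Electric.Vehicle.Type, since plug-in hybrids and battery electric vehicles might cluster at different ranges. This is a possible explanation to check, not a result established here. The sketch uses a made-up data frame (`toy`, hypothetical) with abbreviated type labels.

```r
# Toy data; the real column holds longer labels such as
# "Battery Electric Vehicle (BEV)"
toy <- data.frame(
  Electric.Vehicle.Type = c("BEV", "PHEV", "BEV", "PHEV", "PHEV", "BEV"),
  Electric.Range        = c(270, 21, 153, 25, 30, 0)
)

# Drop zero ranges, matching the filter used for the histogram
nonzero <- toy[toy$Electric.Range != 0, ]

# Median range within each vehicle type
median_by_type <- tapply(nonzero$Electric.Range,
                         nonzero$Electric.Vehicle.Type,
                         median)
print(median_by_type)
```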

These plots show the number of electric vehicles for each make.

Tesla seems to be the most common make of EVs by far.

ev_df %>%
  ggplot(aes(x=Make))+
  geom_bar(color="black",fill="light blue")+
  labs(title = "Electric Vehicle Make Barplot")+
  xlab("Make")+
  ylab("Count")+
  theme_bw()+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

With Tesla removed from the plot to make the other manufacturers easier to see, Chevrolet, Ford, and Nissan are the next three makes by total vehicles registered.

ev_df %>% filter(Make != "TESLA") %>%
  ggplot(aes(x=Make))+
  geom_bar(color="black",fill="light blue")+
  labs(title = "Electric Vehicle Make Barplot")+
  xlab("Make")+
  ylab("Count")+
  theme_bw()+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
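
The rankings read off the bar plots can also be checked numerically with a sorted frequency table. A minimal sketch with made-up counts (the `makes` vector is hypothetical and does not reflect the real totals):

```r
# Toy vector of makes with arbitrary, tie-free counts
makes <- rep(c("TESLA", "CHEVROLET", "FORD", "NISSAN", "BMW"),
             times = c(5, 4, 3, 2, 1))

# Frequency table sorted from most to least common
make_counts <- sort(table(makes), decreasing = TRUE)
print(head(make_counts, 3))

# Same ranking with Tesla excluded, mirroring the second bar plot
others <- make_counts[names(make_counts) != "TESLA"]
print(head(others, 3))
```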

Comparison visual

This plot shows the distribution of vehicle ranges by car make.

Based on the boxplot below, Tesla vehicles appear to have higher ranges than their competitors. Ford has the lowest median range for its EVs, and Chevrolet has the largest variation in its vehicle ranges.

ev_df %>% filter(Make %in% c("TESLA","FORD","CHEVROLET","NISSAN"),
                 Electric.Range != 0) %>%
  ggplot(aes(x=Electric.Range,y = Make))+
  geom_boxplot()+
  xlab("Vehicle Range")+
  ylab("")+
  ggtitle("Electric Vehicle Range by Make")+
  theme_bw()
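
The quantities a boxplot displays (the median and the spread of the middle half of the data) can be tabulated directly. The sketch below computes the median and interquartile range per make on made-up values (`toy` is hypothetical), so the numbers illustrate the method rather than the dataset.

```r
# Toy ranges for three makes; values are illustrative only
toy <- data.frame(
  Make = rep(c("TESLA", "FORD", "CHEVROLET"), each = 4),
  Electric.Range = c(208, 215, 220, 291,   # TESLA
                     19, 20, 21, 26,       # FORD
                     38, 53, 238, 259)     # CHEVROLET
)

# Median and interquartile range within each make
med <- tapply(toy$Electric.Range, toy$Make, median)
iqr <- tapply(toy$Electric.Range, toy$Make, IQR)

range_stats <- data.frame(median = as.vector(med),
                          iqr    = as.vector(iqr),
                          row.names = names(med))
print(range_stats)
```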

Two-dimensional visual

This plot displays the relationship between vehicle range and base MSRP.

There appears to be a slight positive correlation between range and base MSRP. Unfortunately, a large portion of the Base.MSRP values are 0, which limits the amount of useful data available for this analysis.

ev_df %>% filter(Base.MSRP != 0) %>%
  ggplot(aes(x=Electric.Range,y=Base.MSRP))+
  geom_point()+
  xlab("Vehicle Range")+
  ylab("Base MSRP")+
  # Set labels and limits in one scale; a separate ylim() would replace this scale and drop the dollar labels
  scale_y_continuous(labels=scales::dollar_format(), limits=c(0,200000))+
  geom_smooth(method="lm",formula = y~x)+
  ggtitle("Electric Vehicle Range vs Base MSRP")+
  theme_bw()
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

cor.test(ev_df$Electric.Range,ev_df$Base.MSRP)
## 
##  Pearson's product-moment correlation
## 
## data:  ev_df$Electric.Range and ev_df$Base.MSRP
## t = 50.505, df = 200046, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1078773 0.1165312
## sample estimates:
##       cor 
## 0.1122064
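
Note that the test above runs on the full data frame, including records where Base.MSRP is 0, while the scatter plot excludes them. A sketch of running the correlation on the priced subset only is shown below. It uses a made-up data frame (`toy`, hypothetical), so the resulting coefficients say nothing about the real data.

```r
# Toy data; zeros in Base.MSRP stand in for unpriced records
toy <- data.frame(
  Electric.Range = c(270, 21, 153, 25, 84, 208, 0, 0),
  Base.MSRP      = c(69900, 0, 0, 33400, 31950, 59900, 0, 0)
)

# Keep only rows with a nonzero MSRP, matching the scatter plot filter
priced <- subset(toy, Base.MSRP != 0)

# Pearson correlation on all rows versus the priced subset
r_all    <- cor(toy$Electric.Range, toy$Base.MSRP)
r_priced <- cor.test(priced$Electric.Range, priced$Base.MSRP)$estimate

print(c(all_rows = r_all, priced_only = unname(r_priced)))
```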

References

State of Washington. (2020, November 10). Electric Vehicle Population Data. Retrieved from Data.gov: https://catalog.data.gov/dataset/electric-vehicle-population-data