Information on the data selected:
Importing the data
ev_df = read.csv("https://data.wa.gov/api/views/f6w7-q2d2/rows.csv?accessType=DOWNLOAD")
Viewing the first few rows
head(ev_df,n=5)
## VIN..1.10. County City State Postal.Code Model.Year Make
## 1 5YJSA1E28K Snohomish Mukilteo WA 98275 2019 TESLA
## 2 1C4JJXP68P Yakima Yakima WA 98901 2023 JEEP
## 3 WBY8P6C05L Kitsap Kingston WA 98346 2020 BMW
## 4 JTDKARFP1J Kitsap Port Orchard WA 98367 2018 TOYOTA
## 5 5UXTA6C09N Snohomish Everett WA 98208 2022 BMW
## Model Electric.Vehicle.Type
## 1 MODEL S Battery Electric Vehicle (BEV)
## 2 WRANGLER Plug-in Hybrid Electric Vehicle (PHEV)
## 3 I3 Battery Electric Vehicle (BEV)
## 4 PRIUS PRIME Plug-in Hybrid Electric Vehicle (PHEV)
## 5 X5 Plug-in Hybrid Electric Vehicle (PHEV)
## Clean.Alternative.Fuel.Vehicle..CAFV..Eligibility Electric.Range Base.MSRP
## 1 Clean Alternative Fuel Vehicle Eligible 270 0
## 2 Not eligible due to low battery range 21 0
## 3 Clean Alternative Fuel Vehicle Eligible 153 0
## 4 Not eligible due to low battery range 25 0
## 5 Clean Alternative Fuel Vehicle Eligible 30 0
## Legislative.District DOL.Vehicle.ID Vehicle.Location
## 1 21 236424583 POINT (-122.29943 47.912654)
## 2 15 249905295 POINT (-120.4688751 46.6046178)
## 3 23 260917289 POINT (-122.5178351 47.7981436)
## 4 26 186410087 POINT (-122.6530052 47.4739066)
## 5 44 186076915 POINT (-122.2032349 47.8956271)
## Electric.Utility X2020.Census.Tract
## 1 PUGET SOUND ENERGY INC 53061042001
## 2 PACIFICORP 53077001601
## 3 PUGET SOUND ENERGY INC 53035090102
## 4 PUGET SOUND ENERGY INC 53035092802
## 5 PUGET SOUND ENERGY INC 53061041605
Reviewing the data size
dim(ev_df)
## [1] 200048 17
The data has 200,048 rows and 17 columns.
Calculates summary statistics for each field.
summary(ev_df)
## VIN..1.10. County City State
## Length:200048 Length:200048 Length:200048 Length:200048
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Postal.Code Model.Year Make Model
## Min. : 1731 Min. :1997 Length:200048 Length:200048
## 1st Qu.:98052 1st Qu.:2019 Class :character Class :character
## Median :98125 Median :2022 Mode :character Mode :character
## Mean :98176 Mean :2021
## 3rd Qu.:98372 3rd Qu.:2023
## Max. :99577 Max. :2025
## NA's :4
## Electric.Vehicle.Type Clean.Alternative.Fuel.Vehicle..CAFV..Eligibility
## Length:200048 Length:200048
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
## Electric.Range Base.MSRP Legislative.District DOL.Vehicle.ID
## Min. : 0.00 Min. : 0.0 Min. : 1.00 Min. : 4385
## 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.:17.00 1st Qu.:190457312
## Median : 0.00 Median : 0.0 Median :33.00 Median :236339648
## Mean : 53.49 Mean : 947.6 Mean :28.99 Mean :226298775
## 3rd Qu.: 53.00 3rd Qu.: 0.0 3rd Qu.:42.00 3rd Qu.:260965900
## Max. :337.00 Max. :845000.0 Max. :49.00 Max. :479254772
## NA's :442
## Vehicle.Location Electric.Utility X2020.Census.Tract
## Length:200048 Length:200048 Min. :1.001e+09
## Class :character Class :character 1st Qu.:5.303e+10
## Mode :character Mode :character Median :5.303e+10
## Mean :5.298e+10
## 3rd Qu.:5.305e+10
## Max. :5.602e+10
## NA's :4
Displays data types of each column.
str(ev_df)
## 'data.frame': 200048 obs. of 17 variables:
## $ VIN..1.10. : chr "5YJSA1E28K" "1C4JJXP68P" "WBY8P6C05L" "JTDKARFP1J" ...
## $ County : chr "Snohomish" "Yakima" "Kitsap" "Kitsap" ...
## $ City : chr "Mukilteo" "Yakima" "Kingston" "Port Orchard" ...
## $ State : chr "WA" "WA" "WA" "WA" ...
## $ Postal.Code : int 98275 98901 98346 98367 98208 98107 98576 98033 98033 98506 ...
## $ Model.Year : int 2019 2023 2020 2018 2022 2020 2023 2012 2011 2015 ...
## $ Make : chr "TESLA" "JEEP" "BMW" "TOYOTA" ...
## $ Model : chr "MODEL S" "WRANGLER" "I3" "PRIUS PRIME" ...
## $ Electric.Vehicle.Type : chr "Battery Electric Vehicle (BEV)" "Plug-in Hybrid Electric Vehicle (PHEV)" "Battery Electric Vehicle (BEV)" "Plug-in Hybrid Electric Vehicle (PHEV)" ...
## $ Clean.Alternative.Fuel.Vehicle..CAFV..Eligibility: chr "Clean Alternative Fuel Vehicle Eligible" "Not eligible due to low battery range" "Clean Alternative Fuel Vehicle Eligible" "Not eligible due to low battery range" ...
## $ Electric.Range : int 270 21 153 25 30 291 42 73 73 84 ...
## $ Base.MSRP : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Legislative.District : int 21 15 23 26 44 36 2 45 45 22 ...
## $ DOL.Vehicle.ID : int 236424583 249905295 260917289 186410087 186076915 112984833 236505139 258649240 180120202 187006893 ...
## $ Vehicle.Location : chr "POINT (-122.29943 47.912654)" "POINT (-120.4688751 46.6046178)" "POINT (-122.5178351 47.7981436)" "POINT (-122.6530052 47.4739066)" ...
## $ Electric.Utility : chr "PUGET SOUND ENERGY INC" "PACIFICORP" "PUGET SOUND ENERGY INC" "PUGET SOUND ENERGY INC" ...
## $ X2020.Census.Tract : num 5.31e+10 5.31e+10 5.30e+10 5.30e+10 5.31e+10 ...
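The structure above shows Postal.Code parsed as an integer and X2020.Census.Tract as a number, although both are identifiers rather than quantities. The minimum Postal.Code of 1731 in the summary suggests that a leading zero may have been dropped. A minimal sketch of one way to keep such columns as text is shown here; the small CSV string and its values are illustrative assumptions, not rows from the real file.

```r
# Toy CSV standing in for the real download (values are illustrative only).
csv_text <- "Postal.Code,X2020.Census.Tract,Model.Year
01731,25017359100,2021
98275,53061042001,2019"

# Default parsing guesses numeric types for both identifier columns.
as_numbers <- read.csv(text = csv_text)

# A named colClasses vector keeps the identifiers as text;
# columns that are not named are still guessed as usual.
as_text <- read.csv(
  text = csv_text,
  colClasses = c(Postal.Code = "character", X2020.Census.Tract = "character")
)

str(as_numbers)
str(as_text)
```

The same colClasses argument could be passed to the read.csv call on the real URL, assuming the column names match those shown by str(ev_df).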
Provides additional stats for numerical fields.
ev_df %>% select(Model.Year,Electric.Range,Base.MSRP) %>%
describe()
## vars n mean sd median trimmed mad min max
## Model.Year 1 200048 2020.87 2.99 2022 2021.31 1.48 1997 2025
## Electric.Range 2 200048 53.49 88.79 0 34.47 0.00 0 337
## Base.MSRP 3 200048 947.55 7860.59 0 0.00 0.00 0 845000
## range skew kurtosis se
## Model.Year 28 -1.19 0.78 0.01
## Electric.Range 337 1.60 1.11 0.20
## Base.MSRP 845000 14.26 741.33 17.57
There is little missing data overall: only 450 values are missing across the whole data frame.
sum(is.na(ev_df))
## [1] 450
Only 442 rows are incomplete.
table(complete.cases(ev_df))
##
## FALSE TRUE
## 442 199606
99.78% of the rows are complete.
prop.table(table(complete.cases(ev_df))) * 100
##
## FALSE TRUE
## 0.220947 99.779053
Legislative.District is the only field with a notable number of missing values (442); Postal.Code and X2020.Census.Tract are each missing 4.
sort(sapply(ev_df, function(x) sum(is.na(x))))
## VIN..1.10.
## 0
## County
## 0
## City
## 0
## State
## 0
## Model.Year
## 0
## Make
## 0
## Model
## 0
## Electric.Vehicle.Type
## 0
## Clean.Alternative.Fuel.Vehicle..CAFV..Eligibility
## 0
## Electric.Range
## 0
## Base.MSRP
## 0
## DOL.Vehicle.ID
## 0
## Vehicle.Location
## 0
## Electric.Utility
## 0
## Postal.Code
## 4
## X2020.Census.Tract
## 4
## Legislative.District
## 442
The majority of the fields are character types, with a few integer and numeric fields. Only a small fraction of values is missing, primarily in the Legislative.District column.
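The per-column counts can also be expressed as percentages, which makes the scale of the missingness easier to judge. The sketch below uses a small hypothetical data frame (toy) in place of ev_df; the column names follow the dataset, but the values are made up for illustration.

```r
# Hypothetical stand-in for ev_df with a few missing values.
toy <- data.frame(
  State = c("WA", "WA", "CA", "TX"),
  Postal.Code = c(98275L, 98901L, NA, 78701L),
  Legislative.District = c(21L, 15L, NA, NA)
)

# Count and percentage of missing values per column.
n_missing <- colSums(is.na(toy))
pct_missing <- round(100 * colMeans(is.na(toy)), 2)

# Combine into one table, most-missing columns first.
missing_summary <- data.frame(n_missing, pct_missing)
missing_summary[order(-missing_summary$n_missing), ]
```

Replacing toy with ev_df should give the same counts as the sapply() call above, with a percentage column alongside.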
vis_dat(ev_df,warn_large_data = FALSE)
vis_miss(ev_df,warn_large_data = FALSE)
ev_df %>% select(Legislative.District,Model.Year) %>%
gg_miss_var(facet = Model.Year)
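Most of the missing values appear to come from vehicles registered to addresses in a handful of other states, such as CA, VA, TX, and MD. This is plausible, since Legislative.District likely refers to Washington State legislative districts and would not apply to out-of-state addresses.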
ev_df %>% select(Legislative.District,State) %>%
gg_miss_var(facet = State)
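The faceted plot can be backed up with a plain count of missing Legislative.District values per state. This is a minimal base R sketch on a hypothetical data frame (toy); the states and values are illustrative, not taken from the real data.

```r
# Hypothetical stand-in for ev_df.
toy <- data.frame(
  State = c("WA", "WA", "WA", "CA", "CA", "TX"),
  Legislative.District = c(21L, 15L, 23L, NA, NA, NA)
)

# Split the missingness indicator by state and count the TRUE values.
missing_by_state <- vapply(
  split(is.na(toy$Legislative.District), toy$State),
  sum,
  integer(1)
)

# States with the most missing values first.
sort(missing_by_state, decreasing = TRUE)
```

On the full dataset, a table like this would show whether any Washington rows are also missing a district.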
This plot illustrates the frequency of electric vehicle range values, excluding vehicles with a recorded range of 0.
The distribution of electric vehicle ranges is positively skewed. There appear to be two distinct peaks, around ~30 and ~220.
ev_df %>% filter(Electric.Range != 0) %>%
ggplot(aes(x=Electric.Range))+
geom_histogram(bins = 30,color= "black",fill = "light blue")+
labs(title = "Electric Vehicle Range Frequencies")+
xlab("Range")+
ylab("Frequency")+
theme_bw()
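One possible explanation for the two peaks is that plug-in hybrids and battery electric vehicles have very different ranges. That is an assumption, not something the histogram establishes. A quick way to check it is to compare the median range by Electric.Vehicle.Type; the sketch below uses only the five rows shown by head() earlier, so the numbers are illustrative rather than representative.

```r
# The five rows printed by head(ev_df, n = 5), reduced to two columns.
toy <- data.frame(
  Electric.Vehicle.Type = c(
    "Battery Electric Vehicle (BEV)",
    "Plug-in Hybrid Electric Vehicle (PHEV)",
    "Battery Electric Vehicle (BEV)",
    "Plug-in Hybrid Electric Vehicle (PHEV)",
    "Plug-in Hybrid Electric Vehicle (PHEV)"
  ),
  Electric.Range = c(270, 21, 153, 25, 30)
)

# Same filter as the histogram: drop rows with a range of 0.
nonzero <- subset(toy, Electric.Range != 0)

# Median range within each vehicle type.
median_by_type <- vapply(
  split(nonzero$Electric.Range, nonzero$Electric.Vehicle.Type),
  median,
  numeric(1)
)
median_by_type
```

If the medians on the full data sit near the two peaks, adding fill = Electric.Vehicle.Type to the histogram aesthetics would be a natural follow-up.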
These plots show the number of electric vehicles for each make.
Tesla seems to be the most common make of EVs by far.
ev_df %>%
ggplot(aes(x=Make))+
geom_bar(color="black",fill="light blue")+
labs(title = "Electric Vehicle Make Barplot")+
xlab("Make")+
ylab("Count")+
theme_bw()+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
Tesla is removed from the next plot to make the other manufacturers easier to see. Chevrolet, Ford, and Nissan are the next three makes by total vehicles registered.
ev_df %>% filter(Make != "TESLA") %>%
ggplot(aes(x=Make))+
geom_bar(color="black",fill="light blue")+
labs(title = "Electric Vehicle Make Barplot")+
xlab("Make")+
ylab("Count")+
theme_bw()+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
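The ranking read off the bar plots can be confirmed with a sorted frequency table. The sketch below uses a hypothetical data frame (toy) with made-up counts; only the Make column name comes from the dataset.

```r
# Hypothetical stand-in for ev_df with made-up counts.
toy <- data.frame(
  Make = c("TESLA", "TESLA", "TESLA", "CHEVROLET", "CHEVROLET", "NISSAN")
)

# Frequency of each make, most common first.
make_counts <- sort(table(toy$Make), decreasing = TRUE)
make_counts

# The same table without Tesla, mirroring the second bar plot.
make_counts[names(make_counts) != "TESLA"]

# ggplot2 draws a discrete axis in factor-level order, so setting the
# levels by frequency would sort the bars from most to least common.
toy$Make <- factor(toy$Make, levels = names(make_counts))
levels(toy$Make)
```

With the bars ordered by count, the top makes stand out without having to read the rotated axis labels.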
This plot shows the distribution of vehicle ranges by car make.
Based on the boxplot below, Tesla vehicles tend to have higher ranges than their competitors. Ford has the lowest median range for its EVs, and Chevrolet has the largest variation in its vehicle ranges.
ev_df %>% filter(Make %in% c("TESLA","FORD","CHEVROLET","NISSAN"),
Electric.Range != 0) %>%
ggplot(aes(x=Electric.Range,y = Make))+
geom_boxplot()+
xlab("Vehicle Range")+
ylab("")+
ggtitle("Electric Vehicle Range by Make")+
theme_bw()
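The statements about medians and variation can be checked numerically by computing the median and interquartile range of Electric.Range for each make. This is a minimal sketch on a hypothetical data frame (toy); the range values are invented for illustration.

```r
# Hypothetical stand-in for the filtered ev_df (non-zero ranges only).
toy <- data.frame(
  Make = c("TESLA", "TESLA", "TESLA", "FORD", "FORD", "FORD"),
  Electric.Range = c(210, 250, 290, 19, 21, 26)
)

# Group the ranges by make.
ranges_by_make <- split(toy$Electric.Range, toy$Make)

# Median (box centre) and IQR (box width) for each make.
range_stats <- data.frame(
  median = vapply(ranges_by_make, median, numeric(1)),
  iqr = vapply(ranges_by_make, IQR, numeric(1))
)
range_stats
```

The IQR is only one measure of variation; the standard deviation or the full min-to-max spread could lead to a different ordering of the makes.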
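This plot displays the relationship between vehicle range and base MSRP.
There appears to be a slight positive relationship between range and base MSRP. However, a large portion of the Base.MSRP values are 0 (the median is 0), which limits the amount of useful data for this analysis. Note that the plot excludes the zero-MSRP rows, while the correlation test that follows is run on the full dataset, zeros included, and yields a weak correlation of about 0.11.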
ev_df %>% filter(Base.MSRP != 0) %>%
ggplot(aes(x=Electric.Range,y=Base.MSRP))+
geom_point()+
xlab("Vehicle Range")+
ylab("Base MSRP")+
scale_y_continuous(labels=scales::dollar_format(),
limits=c(0,200000))+
geom_smooth(method="lm",formula = y~x)+
ggtitle("Electric Vehicle Range vs Base MSRP")+
theme_bw()
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
cor.test(ev_df$Electric.Range,ev_df$Base.MSRP)
##
## Pearson's product-moment correlation
##
## data: ev_df$Electric.Range and ev_df$Base.MSRP
## t = 50.505, df = 200046, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1078773 0.1165312
## sample estimates:
## cor
## 0.1122064
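The test above uses every row, including those where Base.MSRP is recorded as 0, whereas the scatter plot drops them. A sketch of running the correlation on the same subset as the plot is shown below. The data frame (toy) and its prices are hypothetical, so the resulting coefficients say nothing about the real data.

```r
# Hypothetical stand-in for ev_df; a Base.MSRP of 0 marks a missing price.
toy <- data.frame(
  Electric.Range = c(270, 21, 153, 25, 30, 84, 208, 0),
  Base.MSRP = c(69900, 0, 44450, 0, 52900, 0, 59900, 0)
)

# Keep only the rows with a recorded price, as the scatter plot does.
priced <- subset(toy, Base.MSRP != 0)

# Correlation with and without the zero-price rows.
cor_all <- cor(toy$Electric.Range, toy$Base.MSRP)
cor_priced <- cor(priced$Electric.Range, priced$Base.MSRP)
c(all_rows = cor_all, priced_only = cor_priced)

# Full test on the priced subset.
cor.test(priced$Electric.Range, priced$Base.MSRP)
```

Comparing the two coefficients on the real data would show how much the zero-valued prices influence the reported correlation.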
State of Washington. (2020, November 10). Electric Vehicle Population Data. Retrieved from Data.gov: https://catalog.data.gov/dataset/electric-vehicle-population-data