library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.1
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(lubridate)
library(zoo)
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(forcats)
library(ggplot2)

electricVehiclePop <- read.csv("Electric_Vehicle_Population_Data.csv")

Dataset Information

The Electric_Vehicle_Population_Data data set describes the electric cars that are owned and driven in the state of Washington, as gathered by the government (from data.gov). Each row lists one vehicle along with its make, model, model year, and other attributes. Selected variables will be examined to compare the proportion of Teslas among all electric cars in Washington with the proportion of all other electric cars combined. This leads to the question: is the proportion of Teslas in Washington’s electric car population greater than the proportion of all other electric cars? Because the two proportions sum to 1, this is the same as asking whether the Tesla proportion exceeds 0.5.

Key variables:
- Model.Year
- Make
- Model

Note: The column names will be cleaned by converting uppercase letters to lowercase and replacing periods with underscores.

Hypotheses (p is the proportion of Teslas among all electric cars in Washington)

\(H_0\): p = 0.5

\(H_a\): p > 0.5

α : 0.05

head(electricVehiclePop)
##   VIN..1.10.    County     City State Postal.Code Model.Year   Make   Model
## 1 WA1E2AFY8R  Thurston  Olympia    WA       98512       2024   AUDI    Q5 E
## 2 WAUUPBFF4J    Yakima   Wapato    WA       98951       2018   AUDI      A3
## 3 1N4AZ0CP0F      King  Seattle    WA       98125       2015 NISSAN    LEAF
## 4 WA1VAAGE5K      King     Kent    WA       98031       2019   AUDI  E-TRON
## 5 7SAXCAE57N Snohomish  Bothell    WA       98021       2022  TESLA MODEL X
## 6 KNDJP3AEXG Snohomish Lynnwood    WA       98037       2016    KIA    SOUL
##                    Electric.Vehicle.Type
## 1 Plug-in Hybrid Electric Vehicle (PHEV)
## 2 Plug-in Hybrid Electric Vehicle (PHEV)
## 3         Battery Electric Vehicle (BEV)
## 4         Battery Electric Vehicle (BEV)
## 5         Battery Electric Vehicle (BEV)
## 6         Battery Electric Vehicle (BEV)
##              Clean.Alternative.Fuel.Vehicle..CAFV..Eligibility Electric.Range
## 1                        Not eligible due to low battery range             23
## 2                        Not eligible due to low battery range             16
## 3                      Clean Alternative Fuel Vehicle Eligible             84
## 4                      Clean Alternative Fuel Vehicle Eligible            204
## 5 Eligibility unknown as battery range has not been researched              0
## 6                      Clean Alternative Fuel Vehicle Eligible             93
##   Base.MSRP Legislative.District DOL.Vehicle.ID            Vehicle.Location
## 1         0                   22      263239938  POINT (-122.90787 46.9461)
## 2         0                   15      318160860 POINT (-120.42083 46.44779)
## 3         0                   46      184963586 POINT (-122.30253 47.72656)
## 4         0                   11      259426821 POINT (-122.17743 47.41185)
## 5         0                    1      208182236  POINT (-122.18384 47.8031)
## 6     31950                   21      209171889 POINT (-122.27734 47.83785)
##                                Electric.Utility X2020.Census.Tract
## 1                        PUGET SOUND ENERGY INC        53067010910
## 2                                    PACIFICORP        53077940008
## 3  CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA)        53033000700
## 4 PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA)        53033029306
## 5                        PUGET SOUND ENERGY INC        53061051922
## 6                        PUGET SOUND ENERGY INC        53061051928

Cleaning the Data Set

The column names contain periods and capital letters, so they are cleaned before the analysis. The gsub() function replaces every period in the column names with an underscore, and the tolower() function converts every character in the column names to lowercase.

names(electricVehiclePop) <- gsub("\\.", "_", names(electricVehiclePop))
names(electricVehiclePop) <- tolower(names(electricVehiclePop))

head(electricVehiclePop)
##   vin__1_10_    county     city state postal_code model_year   make   model
## 1 WA1E2AFY8R  Thurston  Olympia    WA       98512       2024   AUDI    Q5 E
## 2 WAUUPBFF4J    Yakima   Wapato    WA       98951       2018   AUDI      A3
## 3 1N4AZ0CP0F      King  Seattle    WA       98125       2015 NISSAN    LEAF
## 4 WA1VAAGE5K      King     Kent    WA       98031       2019   AUDI  E-TRON
## 5 7SAXCAE57N Snohomish  Bothell    WA       98021       2022  TESLA MODEL X
## 6 KNDJP3AEXG Snohomish Lynnwood    WA       98037       2016    KIA    SOUL
##                    electric_vehicle_type
## 1 Plug-in Hybrid Electric Vehicle (PHEV)
## 2 Plug-in Hybrid Electric Vehicle (PHEV)
## 3         Battery Electric Vehicle (BEV)
## 4         Battery Electric Vehicle (BEV)
## 5         Battery Electric Vehicle (BEV)
## 6         Battery Electric Vehicle (BEV)
##              clean_alternative_fuel_vehicle__cafv__eligibility electric_range
## 1                        Not eligible due to low battery range             23
## 2                        Not eligible due to low battery range             16
## 3                      Clean Alternative Fuel Vehicle Eligible             84
## 4                      Clean Alternative Fuel Vehicle Eligible            204
## 5 Eligibility unknown as battery range has not been researched              0
## 6                      Clean Alternative Fuel Vehicle Eligible             93
##   base_msrp legislative_district dol_vehicle_id            vehicle_location
## 1         0                   22      263239938  POINT (-122.90787 46.9461)
## 2         0                   15      318160860 POINT (-120.42083 46.44779)
## 3         0                   46      184963586 POINT (-122.30253 47.72656)
## 4         0                   11      259426821 POINT (-122.17743 47.41185)
## 5         0                    1      208182236  POINT (-122.18384 47.8031)
## 6     31950                   21      209171889 POINT (-122.27734 47.83785)
##                                electric_utility x2020_census_tract
## 1                        PUGET SOUND ENERGY INC        53067010910
## 2                                    PACIFICORP        53077940008
## 3  CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA)        53033000700
## 4 PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA)        53033029306
## 5                        PUGET SOUND ENERGY INC        53061051922
## 6                        PUGET SOUND ENERGY INC        53061051928
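The renaming step can be tried on a few of the raw column names in isolation. The sketch below is only an illustration: the vector raw_names is copied from the head() output, and the final step that collapses repeated underscores is an optional extra that is not part of the original cleaning.

```r
# A few raw column names copied from the head() output above
raw_names <- c("VIN..1.10.", "Model.Year",
               "Clean.Alternative.Fuel.Vehicle..CAFV..Eligibility")

# Same two steps as in the report: periods -> underscores, then lowercase
cleaned <- tolower(gsub("\\.", "_", raw_names))
print(cleaned)

# Optional extra (an assumption, not in the report): collapse runs of
# underscores and drop leading/trailing ones for tidier names
tidier <- gsub("^_|_$", "", gsub("_+", "_", cleaned))
print(tidier)
```

The period is escaped as "\\." because an unescaped "." in a regular expression matches any character, which would turn every character of each name into an underscore.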

Filtering Data Set Suitable To Key Variables

A new data set, imputed_electric_vehicle_pop, is created from the electricVehiclePop data set. I used the select() function to keep only the cleaned make, model, and model_year columns; all rows are retained. Because of the order in which the columns are passed to select(), the new data set arranges them as make, model, then model_year. I then used the head() function to view the first six rows of the data set, and the summary() function to see the range of model years.

imputed_electric_vehicle_pop <- electricVehiclePop |>
  # keep only the three key variables, in this column order
  select(make, model, model_year)

head(imputed_electric_vehicle_pop)
##     make   model model_year
## 1   AUDI    Q5 E       2024
## 2   AUDI      A3       2018
## 3 NISSAN    LEAF       2015
## 4   AUDI  E-TRON       2019
## 5  TESLA MODEL X       2022
## 6    KIA    SOUL       2016
summary(imputed_electric_vehicle_pop)
##      make              model             model_year  
##  Length:264628      Length:264628      Min.   :1999  
##  Class :character   Class :character   1st Qu.:2021  
##  Mode  :character   Mode  :character   Median :2023  
##                                        Mean   :2022  
##                                        3rd Qu.:2024  
##                                        Max.   :2026
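How select() keeps all rows while reordering the chosen columns can be seen on a small stand-in table. This is a minimal sketch: the data frame toy is hypothetical, with values borrowed from the first rows shown above.

```r
library(dplyr)

# Hypothetical three-row stand-in for electricVehiclePop
toy <- data.frame(
  county     = c("Thurston", "Yakima", "King"),
  model_year = c(2024, 2018, 2015),
  make       = c("AUDI", "AUDI", "NISSAN"),
  model      = c("Q5 E", "A3", "LEAF")
)

# select() keeps every row and returns the columns in the order requested
key_vars <- toy |>
  select(make, model, model_year)

print(names(key_vars))
print(nrow(key_vars) == nrow(toy))
print(range(key_vars$model_year))
```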

Tesla Cars Amount

A new data set, teslaPopDataSet, is created from the imputed_electric_vehicle_pop data set. I used the filter() function to keep only the electric cars whose make equals the string “TESLA”. I then used the head() function to view the first six rows of the data set, and the summary() function to see the range of model years for all Tesla cars.

teslaPopDataSet <- imputed_electric_vehicle_pop |>
  # keep only the rows for the Tesla brand
  filter(make == "TESLA")

head(teslaPopDataSet)
##    make   model model_year
## 1 TESLA MODEL X       2022
## 2 TESLA MODEL Y       2023
## 3 TESLA MODEL S       2015
## 4 TESLA MODEL 3       2018
## 5 TESLA MODEL S       2016
## 6 TESLA MODEL S       2018
summary(teslaPopDataSet)
##      make              model             model_year  
##  Length:108633      Length:108633      Min.   :2008  
##  Class :character   Class :character   1st Qu.:2021  
##  Mode  :character   Mode  :character   Median :2023  
##                                        Mean   :2022  
##                                        3rd Qu.:2024  
##                                        Max.   :2026

Non-Tesla Cars Amount

A new data set, nonTeslaPopDataSet, is created from the imputed_electric_vehicle_pop data set. I used the filter() function to keep only the electric cars whose make does not equal the string “TESLA”. I then used the head() function to view the first six rows of the data set, and the summary() function to see the range of model years for all electric cars other than Teslas.

nonTeslaPopDataSet <- imputed_electric_vehicle_pop |>
  filter(make != "TESLA")

head(nonTeslaPopDataSet)
##     make  model model_year
## 1   AUDI   Q5 E       2024
## 2   AUDI     A3       2018
## 3 NISSAN   LEAF       2015
## 4   AUDI E-TRON       2019
## 5    KIA   SOUL       2016
## 6 NISSAN   LEAF       2019
summary(nonTeslaPopDataSet)
##      make              model             model_year  
##  Length:155995      Length:155995      Min.   :1999  
##  Class :character   Class :character   1st Qu.:2020  
##  Mode  :character   Mode  :character   Median :2023  
##                                        Mean   :2022  
##                                        3rd Qu.:2024  
##                                        Max.   :2026
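A quick sanity check on a split like this is that the two subsets add up to the full data set. The sketch below uses the row counts reported by summary() above, plus a hypothetical toy data frame to show the one case where the check can fail: filter() drops rows whose condition is NA, so a missing make would land in neither subset.

```r
library(dplyr)

# Row counts copied from the summary() outputs above
tesla_rows     <- 108633
non_tesla_rows <- 155995
all_rows       <- 264628
print(tesla_rows + non_tesla_rows == all_rows)

# Hypothetical toy data with one missing make
toy <- data.frame(make = c("TESLA", "AUDI", "TESLA", "KIA", NA))

toy_tesla     <- toy |> filter(make == "TESLA")
toy_non_tesla <- toy |> filter(make != "TESLA")

# The NA row satisfies neither condition, so one row is unaccounted for
missing_rows <- nrow(toy) - nrow(toy_tesla) - nrow(toy_non_tesla)
print(missing_rows)
```

In this report the counts do add up, which suggests that no make values are missing in the data as read.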

Single Proportion Test

Because one group is compared with the rest of a single population, I used a one-sample proportion test on the proportion of Teslas. I first counted each group by applying the sum() function to a logical comparison: the number of rows whose make equals “TESLA” is stored in teslaCount, and the number of rows whose make does not is stored in nonteslaCount. Then I used the prop.test() function to test teslaCount (the x argument) against the total number of electric cars in imputed_electric_vehicle_pop (the n argument). The p argument specifies the null proportion; it is left as NULL, so prop.test() uses its default of 0.5, which matches the null hypothesis. The alternative argument is set to “greater”, which corresponds to the alternative hypothesis (the proportion of Teslas is greater than the proportion of all other electric cars). The conf.level argument is set to 0.95, which corresponds to α, as 1 - 0.95 = 0.05.

teslaCount <- sum(imputed_electric_vehicle_pop$make == "TESLA")
nonteslaCount <- sum(imputed_electric_vehicle_pop$make != "TESLA")
allCarsPop <- nrow(imputed_electric_vehicle_pop)

# One-sample test of the Tesla proportion against the null value;
# p = NULL falls back to prop.test()'s default null proportion of 0.5
prop.test(x = teslaCount, n = allCarsPop, p = NULL, alternative = "greater", conf.level = 0.95, correct = TRUE)
## 
##  1-sample proportions test with continuity correction
## 
## data:  teslaCount out of allCarsPop, null probability 0.5
## X-squared = 8476.3, df = 1, p-value = 1
## alternative hypothesis: true p is greater than 0.5
## 95 percent confidence interval:
##  0.4089382 1.0000000
## sample estimates:
##         p 
## 0.4105121
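The figures in this output can be reproduced by hand, which helps show why the p-value is 1. The sketch below is only an illustration: it uses the counts reported earlier (108633 Teslas out of 264628 cars) and variable names of its own, and computes the continuity-corrected statistic and the one-sided p-value directly.

```r
# Counts taken from the summary() outputs above
tesla_count <- 108633
all_cars    <- 264628
p0          <- 0.5   # null proportion

# Sample proportion of Teslas
p_hat <- tesla_count / all_cars

# Chi-squared statistic with Yates' continuity correction (one sample)
x_squared <- (abs(tesla_count - all_cars * p0) - 0.5)^2 /
  (all_cars * p0 * (1 - p0))

# One-sided p-value for H_a: p > p0, using the signed z statistic
z       <- sign(p_hat - p0) * sqrt(x_squared)
p_value <- pnorm(z, lower.tail = FALSE)

# Compare with prop.test() called on the same numbers
result <- prop.test(tesla_count, all_cars, p = p0, alternative = "greater")

print(round(c(p_hat = p_hat, x_squared = x_squared, p_value = p_value), 4))
print(all.equal(x_squared, unname(result$statistic)))
```

The statistic is large because the sample proportion is far from 0.5, but it lies below 0.5, so z is strongly negative and the upper-tail p-value is essentially 1.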

Visual Representation: Bar Plot

A bar plot made with the barplot() function summarizes the findings. The bar heights come from the table() function, which cross-tabulates two logical vectors (whether each value in imputed_electric_vehicle_pop’s make column equals “TESLA”, and whether it does not equal “TESLA”), so one bar shows the total number of Teslas and the other shows the total number of non-Tesla electric cars. The plot shows each group’s count in the entire population and, consistent with the proportion test in the previous chunk, non-Tesla cars make up a larger share than Tesla cars.

barplot(main = "Proportions of Electric Cars", 
        xlab = "Electric Cars", 
        ylab = "Count",  
        table(imputed_electric_vehicle_pop$make == "TESLA", imputed_electric_vehicle_pop$make != "TESLA"), 
        names.arg = c("Tesla", "Non-tesla"))
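A plot of shares rather than counts makes the comparison with the null value of 0.5 direct. The following is a minimal alternative sketch, not the plot used in this report: it assumes the two counts reported earlier, stores them in a named vector so the bar labels cannot be mismatched, and draws a dashed reference line at 0.5.

```r
# Counts taken from the summary() outputs above
counts <- c("Tesla" = 108633, "Non-Tesla" = 155995)

# Convert counts to shares of the whole population
shares <- counts / sum(counts)

# barplot() labels the bars from the vector names and
# returns the x positions of the bar midpoints
bar_midpoints <- barplot(shares,
                         main = "Share of Electric Cars in Washington",
                         xlab = "Electric Cars",
                         ylab = "Proportion",
                         ylim = c(0, 1))

# Dashed line at the null proportion of 0.5
abline(h = 0.5, lty = 2)

print(round(shares, 3))
```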

Conclusion

Because the p-value is 1.0 (greater than α = 0.05), the result is not statistically significant and we fail to reject the null hypothesis. There is not sufficient evidence that the proportion of Teslas in the population of electric cars is greater than the proportion of all other electric cars, so the alternative hypothesis is not supported. As the sample estimate from the one-sample proportion test shows, Tesla cars made up about 41% of Washington’s entire population of electric cars, which is below one half. One possible interpretation is that more car companies are producing electric cars, giving people many other electric options (potentially more reliable than Teslas) while Tesla’s share stagnates or declines, although this data set alone does not show a trend over time.

Works Cited

DATA.GOV. (2022, February 25). Electric Vehicle Population Data. Data.gov; data.wa.gov. https://catalog.data.gov/dataset/electric-vehicle-population-data