library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.1
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(lubridate)
library(zoo)
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(forcats)
library(ggplot2)
electricVehiclePop <- read.csv("Electric_Vehicle_Population_Data.csv")
The Electric_Vehicle_Population_Data data set, gathered by the government and obtained from data.gov, records the electric cars that are owned and driven in the state of Washington, including each car's brand (make), model, and model year. Specific variables of the data set will be examined to compare the proportion of Teslas in the entire population of electric cars in Washington with the proportion of all other electric cars combined. This leads to the question: is the proportion of Teslas in Washington's population of electric cars greater than the proportion of all other electric cars?
Key variables:
- Model.Year
- Make
- Model
Note: The column names will be modified to replace periods with underscores and to convert uppercase letters to lowercase.
Hypotheses: \(H_0\): p = 0.5, \(H_a\): p > 0.5, where p is the proportion of Teslas among all electric cars in Washington.
Significance level: α = 0.05
head(electricVehiclePop)
## VIN..1.10. County City State Postal.Code Model.Year Make Model
## 1 WA1E2AFY8R Thurston Olympia WA 98512 2024 AUDI Q5 E
## 2 WAUUPBFF4J Yakima Wapato WA 98951 2018 AUDI A3
## 3 1N4AZ0CP0F King Seattle WA 98125 2015 NISSAN LEAF
## 4 WA1VAAGE5K King Kent WA 98031 2019 AUDI E-TRON
## 5 7SAXCAE57N Snohomish Bothell WA 98021 2022 TESLA MODEL X
## 6 KNDJP3AEXG Snohomish Lynnwood WA 98037 2016 KIA SOUL
## Electric.Vehicle.Type
## 1 Plug-in Hybrid Electric Vehicle (PHEV)
## 2 Plug-in Hybrid Electric Vehicle (PHEV)
## 3 Battery Electric Vehicle (BEV)
## 4 Battery Electric Vehicle (BEV)
## 5 Battery Electric Vehicle (BEV)
## 6 Battery Electric Vehicle (BEV)
## Clean.Alternative.Fuel.Vehicle..CAFV..Eligibility Electric.Range
## 1 Not eligible due to low battery range 23
## 2 Not eligible due to low battery range 16
## 3 Clean Alternative Fuel Vehicle Eligible 84
## 4 Clean Alternative Fuel Vehicle Eligible 204
## 5 Eligibility unknown as battery range has not been researched 0
## 6 Clean Alternative Fuel Vehicle Eligible 93
## Base.MSRP Legislative.District DOL.Vehicle.ID Vehicle.Location
## 1 0 22 263239938 POINT (-122.90787 46.9461)
## 2 0 15 318160860 POINT (-120.42083 46.44779)
## 3 0 46 184963586 POINT (-122.30253 47.72656)
## 4 0 11 259426821 POINT (-122.17743 47.41185)
## 5 0 1 208182236 POINT (-122.18384 47.8031)
## 6 31950 21 209171889 POINT (-122.27734 47.83785)
## Electric.Utility X2020.Census.Tract
## 1 PUGET SOUND ENERGY INC 53067010910
## 2 PACIFICORP 53077940008
## 3 CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA) 53033000700
## 4 PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA) 53033029306
## 5 PUGET SOUND ENERGY INC 53061051922
## 6 PUGET SOUND ENERGY INC 53061051928
The column names contain periods and capital letters, so they are cleaned first. The gsub() function replaces every period in the column names with an underscore, and the tolower() function converts every character in the column names to lowercase.
names(electricVehiclePop) <- gsub("\\.", "_", names(electricVehiclePop))
names(electricVehiclePop) <- tolower(names(electricVehiclePop))
head(electricVehiclePop)
## vin__1_10_ county city state postal_code model_year make model
## 1 WA1E2AFY8R Thurston Olympia WA 98512 2024 AUDI Q5 E
## 2 WAUUPBFF4J Yakima Wapato WA 98951 2018 AUDI A3
## 3 1N4AZ0CP0F King Seattle WA 98125 2015 NISSAN LEAF
## 4 WA1VAAGE5K King Kent WA 98031 2019 AUDI E-TRON
## 5 7SAXCAE57N Snohomish Bothell WA 98021 2022 TESLA MODEL X
## 6 KNDJP3AEXG Snohomish Lynnwood WA 98037 2016 KIA SOUL
## electric_vehicle_type
## 1 Plug-in Hybrid Electric Vehicle (PHEV)
## 2 Plug-in Hybrid Electric Vehicle (PHEV)
## 3 Battery Electric Vehicle (BEV)
## 4 Battery Electric Vehicle (BEV)
## 5 Battery Electric Vehicle (BEV)
## 6 Battery Electric Vehicle (BEV)
## clean_alternative_fuel_vehicle__cafv__eligibility electric_range
## 1 Not eligible due to low battery range 23
## 2 Not eligible due to low battery range 16
## 3 Clean Alternative Fuel Vehicle Eligible 84
## 4 Clean Alternative Fuel Vehicle Eligible 204
## 5 Eligibility unknown as battery range has not been researched 0
## 6 Clean Alternative Fuel Vehicle Eligible 93
## base_msrp legislative_district dol_vehicle_id vehicle_location
## 1 0 22 263239938 POINT (-122.90787 46.9461)
## 2 0 15 318160860 POINT (-120.42083 46.44779)
## 3 0 46 184963586 POINT (-122.30253 47.72656)
## 4 0 11 259426821 POINT (-122.17743 47.41185)
## 5 0 1 208182236 POINT (-122.18384 47.8031)
## 6 31950 21 209171889 POINT (-122.27734 47.83785)
## electric_utility x2020_census_tract
## 1 PUGET SOUND ENERGY INC 53067010910
## 2 PACIFICORP 53077940008
## 3 CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA) 53033000700
## 4 PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA) 53033029306
## 5 PUGET SOUND ENERGY INC 53061051922
## 6 PUGET SOUND ENERGY INC 53061051928
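As a quick check of the renaming step, the same two calls can be applied to a few of the original column names. This is a minimal sketch: the vector old_names is a hypothetical stand-in for names(electricVehiclePop), and the second variant, which collapses repeated underscores, is an optional variation that is not used in this report.

```r
# Hypothetical stand-in for names(electricVehiclePop); the values are copied
# from the first head() output above.
old_names <- c("VIN..1.10.", "Postal.Code", "Model.Year", "X2020.Census.Tract")

# "\\." matches a literal period; an unescaped "." would match any character.
new_names <- tolower(gsub("\\.", "_", old_names))
print(new_names)

# Optional variation (an assumption, not used in this report): collapse runs of
# periods into a single underscore and drop a trailing underscore.
tidy_names <- tolower(sub("_$", "", gsub("\\.+", "_", old_names)))
print(tidy_names)
```

The first result reproduces the cleaned names shown above, such as vin__1_10_, while the variation would give vin_1_10 instead.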
A new data set, imputed_electric_vehicle_pop, is created from the electricVehiclePop data set. I used the select() function to keep only the cleaned make, model, and model_year columns; all rows are retained. Because of the order in which the columns are passed to select(), the new data set arranges them as make, model, then model_year. I then used the head() function to display the first six rows of the data set, and the summary() function to see the range of model years.
imputed_electric_vehicle_pop <- electricVehiclePop |>
  select(make, model, model_year)
head(imputed_electric_vehicle_pop)
## make model model_year
## 1 AUDI Q5 E 2024
## 2 AUDI A3 2018
## 3 NISSAN LEAF 2015
## 4 AUDI E-TRON 2019
## 5 TESLA MODEL X 2022
## 6 KIA SOUL 2016
summary(imputed_electric_vehicle_pop)
## make model model_year
## Length:264628 Length:264628 Min. :1999
## Class :character Class :character 1st Qu.:2021
## Mode :character Mode :character Median :2023
## Mean :2022
## 3rd Qu.:2024
## Max. :2026
A new data set, teslaPopDataSet, is created from the imputed_electric_vehicle_pop data set. I used the filter() function to keep only the rows whose make equals the string “TESLA”, giving a subset that contains only Tesla cars. I then used the head() function to display the first six rows of the data set, and the summary() function to see the range of model years for all Tesla cars.
teslaPopDataSet <- imputed_electric_vehicle_pop |>
  filter(make == "TESLA")
head(teslaPopDataSet)
## make model model_year
## 1 TESLA MODEL X 2022
## 2 TESLA MODEL Y 2023
## 3 TESLA MODEL S 2015
## 4 TESLA MODEL 3 2018
## 5 TESLA MODEL S 2016
## 6 TESLA MODEL S 2018
summary(teslaPopDataSet)
## make model model_year
## Length:108633 Length:108633 Min. :2008
## Class :character Class :character 1st Qu.:2021
## Mode :character Mode :character Median :2023
## Mean :2022
## 3rd Qu.:2024
## Max. :2026
A new data set, nonTeslaPopDataSet, is created from the imputed_electric_vehicle_pop data set. I used the filter() function to keep only the rows whose make does not equal the string “TESLA”, giving a subset that contains every non-Tesla electric car. I then used the head() function to display the first six rows of the data set, and the summary() function to see the range of model years for all electric cars other than Teslas.
nonTeslaPopDataSet <- imputed_electric_vehicle_pop |>
  filter(make != "TESLA")
head(nonTeslaPopDataSet)
## make model model_year
## 1 AUDI Q5 E 2024
## 2 AUDI A3 2018
## 3 NISSAN LEAF 2015
## 4 AUDI E-TRON 2019
## 5 KIA SOUL 2016
## 6 NISSAN LEAF 2019
summary(nonTeslaPopDataSet)
## make model model_year
## Length:155995 Length:155995 Min. :1999
## Class :character Class :character 1st Qu.:2020
## Mode :character Mode :character Median :2023
## Mean :2022
## 3rd Qu.:2024
## Max. :2026
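Because the two filters are complements of each other, their row counts should add up to the full data set: 108633 + 155995 = 264628, which matches the summary() output above. The sketch below illustrates the same check on a small made-up data frame (toy_cars is hypothetical, not the real data). It also shows a caveat worth knowing: filter() drops rows where the condition evaluates to NA, so a car with a missing make would land in neither subset.

```r
library(dplyr)

# Hypothetical toy data, including one row with a missing make.
toy_cars <- data.frame(
  make = c("TESLA", "AUDI", "TESLA", NA, "KIA"),
  model_year = c(2022, 2018, 2023, 2020, 2016)
)

toy_tesla <- toy_cars |> filter(make == "TESLA")
toy_non_tesla <- toy_cars |> filter(make != "TESLA")

# The NA row is dropped by both filters, so the subsets cover only 4 of 5 rows.
print(nrow(toy_tesla) + nrow(toy_non_tesla))
print(nrow(toy_cars))

# The counts reported by summary() in this report add up exactly, which
# suggests that no rows were lost between the two subsets.
print(108633 + 155995 == 264628)
```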
Because one group is being compared against the rest of a single population, I used a one-sample proportion test on the proportion of Teslas. I first counted the rows in each group with the sum() function, storing the number of rows whose make equals “TESLA” in teslaCount and the number of remaining rows in nonteslaCount, and used nrow() to store the total number of electric cars in allCarsPop. Then, I used the prop.test() function to test the value of teslaCount (the x argument) out of the whole population in imputed_electric_vehicle_pop (the n argument). The p argument holds the null hypothesis value of 0.5, while the alternative argument is set to “greater” to express the alternative hypothesis (the proportion of Teslas is greater than the proportion of all other electric cars). The conf.level argument is set to 0.95, the confidence level that corresponds to α = 0.05, as 1 - 0.95 = 0.05.
teslaCount <- sum(imputed_electric_vehicle_pop$make == "TESLA")
nonteslaCount <- sum(imputed_electric_vehicle_pop$make != "TESLA")
allCarsPop <- nrow(imputed_electric_vehicle_pop)
prop.test(x = teslaCount, n = allCarsPop, p = 0.5, alternative = "greater", conf.level = 0.95, correct = TRUE)
##
## 1-sample proportions test with continuity correction
##
## data: teslaCount out of allCarsPop, null probability 0.5
## X-squared = 8476.3, df = 1, p-value = 1
## alternative hypothesis: true p is greater than 0.5
## 95 percent confidence interval:
## 0.4089382 1.0000000
## sample estimates:
## p
## 0.4105121
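To see where X-squared = 8476.3 and the p-value of 1 come from, the statistic can be recomputed by hand from the counts reported earlier (108633 Teslas out of 264628 electric cars). This is a sketch that assumes the usual continuity-corrected chi-squared formula for a single proportion; the variable names are illustrative only.

```r
# Counts taken from the summary() outputs earlier in this report.
x  <- 108633   # Tesla rows
n  <- 264628   # all electric cars
p0 <- 0.5      # null hypothesis value

p_hat <- x / n

# Continuity-corrected chi-squared statistic for a single proportion.
x_squared <- (abs(x - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))

# For a one-sided "greater" test, use the signed square root as a z score
# and take the upper-tail probability.
z <- sign(p_hat - p0) * sqrt(x_squared)
p_value <- pnorm(z, lower.tail = FALSE)

print(round(p_hat, 7))
print(round(x_squared, 1))
print(p_value)
```

Because the sample proportion (about 0.41) falls below 0.5, the z score is large and negative, so the upper-tail p-value is essentially 1.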
A bar plot, drawn with the barplot() function, summarizes the findings by displaying the total number of Tesla and non-Tesla electric cars. The make column of imputed_electric_vehicle_pop is compared with “TESLA”, the resulting TRUE/FALSE values are converted to a two-level factor (Tesla, Non-Tesla), and table() counts the rows in each level to give one bar per group. This shows each group's count within the entire population. Consistent with the sample estimate from the proportion test in the previous chunk, non-Tesla cars make up a larger share than Tesla cars.
tesla_groups <- factor(imputed_electric_vehicle_pop$make == "TESLA",
                       levels = c(TRUE, FALSE),
                       labels = c("Tesla", "Non-Tesla"))
barplot(table(tesla_groups),
        main = "Counts of Tesla and Non-Tesla Electric Cars",
        xlab = "Electric Cars",
        ylab = "Count")
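Since the hypothesis is about proportions, it may also help to plot shares instead of raw counts. The sketch below does this from the two counts reported earlier (108633 Teslas and 155995 non-Teslas), using prop.table() to convert counts to proportions. The object names and the dashed reference line at 0.5 are illustrative choices, not part of the original analysis.

```r
# Counts taken from the summary() outputs earlier in this report.
group_counts <- c(Tesla = 108633, "Non-Tesla" = 155995)

# prop.table() divides each count by the total, so the proportions sum to 1.
group_props <- prop.table(group_counts)
print(round(group_props, 4))

barplot(group_props,
        main = "Proportion of Tesla vs. Non-Tesla Electric Cars",
        xlab = "Electric Cars",
        ylab = "Proportion",
        ylim = c(0, 1))

# Dashed reference line at the null hypothesis value p = 0.5.
abline(h = 0.5, lty = 2)
```

The Tesla bar sits below the dashed line, which mirrors the sample estimate of about 0.41 from the proportion test.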
Because the p-value is 1.0 (greater than α = 0.05), the result is not statistically significant and we fail to reject the null hypothesis. The data provide no evidence that the proportion of Teslas in Washington's population of electric cars is greater than the proportion of all other electric cars, so the alternative hypothesis is not supported. As the sample estimate from the one-sample proportion test shows, Teslas made up about 41% of Washington's entire population of electric cars. These results could point to a growing trend of more car companies producing electric cars, giving people many other electric options (potentially more reliable than Teslas) while the Tesla trend stagnates or declines.
DATA.GOV. (2022, February 25). Electric Vehicle Population Data. Data.gov; data.wa.gov. https://catalog.data.gov/dataset/electric-vehicle-population-data