Project Summary

The team will analyze fuel consumption differences between electric (EV), gasoline, and plug-in hybrid electric (PHEV) vehicles. Specifically, we address the question: what are the costs and features of EVs and PHEVs, compared with gas vehicles, that contribute to fuel efficiency? We analyze cost differences between these car types and perform a multiple regression to identify the car features that contribute to the fuel efficiency rating. Finally, we implement a Shiny app that allows users to select a vehicle in each class and compare cost differences between the selected vehicles.

The primary dataset for vehicle fuel consumption is sourced from https://www.fueleconomy.gov/feg/download.shtml . We used the 2022 vehicle fuel data: https://www.fueleconomy.gov/feg/epadata/22data.zip

Project Team Members

  • Sanielle Worrell
  • Vladimir Nimchenko
  • Jose Rodriguez
  • Johnny Rodriguez


Downloading, Reading and Cleaning an Excel file

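#Load the packages used throughout this report
library(reshape)    # merge_recurse()
library(here)       # here()
library(readxl)     # read_excel()
library(tidyverse)  # dplyr, tidyr, stringr, ggplot2
library(magrittr)   # %<>%
library(ggrepel)    # geom_label_repel()
library(ggthemes)   # scale_fill_ptol()

#Download and read the excel file
url <- "https://github.com/johnnydrodriguez/data607_finalproject/blob/main/2022_FE_Guide.xlsx?raw=true"
destfile <- here("myfile.xlsx")
#mode = "wb" keeps the binary .xlsx file intact on Windows
download.file(url, destfile, mode = "wb")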


#create data frame for the gas vehicles
Gas_df = read_excel(here("myfile.xlsx"), sheet = "22")[,c('Model Year','Mfr Name','Division', 
        'Carline','Index (Model Type Index)','Eng Displ','# Cyl','Transmission',
        'City FE (Guide) - Conventional Fuel','Hwy FE (Guide) - Conventional Fuel','Comb FE (Guide) - Conventional Fuel',
        'Air Aspiration Method Desc','Trans','Trans Desc','# Gears','Drive Sys','Drive Desc',
        'Fuel Usage Desc - Conventional Fuel','Annual Fuel1 Cost - Conventional Fuel',
        'Fuel2 EPA Calculated Annual Fuel Cost - Alternative Fuel','Carline Class Desc',
        'EPA FE Label Dataset ID','MFR Calculated Gas Guzzler MPG',
        '$ You Save over 5 years (amount saved in fuel costs over 5 years - on label)',
        '$ You Spend over 5 years (increased amount spent in fuel costs over 5 years - on label)', 'FE Rating (1-10 rating on Label)',
        'Comb CO2 Rounded Adjusted (as shown on FE Label)')]

#replacing all column names in the Gas_df data frame which have white space with underscore for filtering purposes
names(Gas_df)<-str_replace_all(names(Gas_df), c(" " = "_" ))


#create data frame for plug-in electric vehicles
Phev_df <-  read_excel(here("myfile.xlsx"), sheet = '22 PHEVs',skip = 4)[,c('Model Yr  (gold fill means release date is after today\'s date)',
            'Mfr Name','Division', 'Carline','Index (Model Type Index)','Eng Displ','# Cyl','Transmission',
            'City FE (Guide) - Conventional Fuel','Hwy FE (Guide) - Conventional Fuel','Comb FE (Guide) - Conventional Fuel',
            'Air Aspiration Method Desc','Trans','Trans Desc','# Gears','Drive Sys','Drive Desc',
            'Fuel Usage Desc - Conventional Fuel','Annual Fuel1 Cost - Conventional Fuel','Fuel2 EPA Calculated Annual Fuel Cost - Alternative Fuel',
            'Carline Class Desc','EPA FE Label Dataset ID','MFR Calculated Gas Guzzler MPG',
            '$ You Save over 5 years (amount saved in fuel costs over 5 years - on label)','FE Rating (1-10 rating on Label)',
            '$ You Spend over 5 years (increased amount spent in fuel costs over 5 years - on label)','Comb CO2 Rounded Adjusted (as shown on FE Label)')]


#create data frame for electric vehicle
Ev_df = read_excel(here("myfile.xlsx") ,sheet = '22 EVs',skip = 8)[,c('Model Yr','Mfr Name','Division', 
        'Carline','Index (Model Type Index)','Eng Displ','# Cyl','Transmission','City FE (Guide) - Conventional Fuel',
        'Hwy FE (Guide) - Conventional Fuel','Comb FE (Guide) - Conventional Fuel','Air Aspiration Method Desc',
        'Trans','Trans Desc','# Gears','Drive Sys','Drive Desc','Fuel Usage Desc - Conventional Fuel',
        'Annual Fuel1 Cost - Conventional Fuel','Fuel2 EPA Calculated Annual Fuel Cost - Alternative Fuel',
        'Carline Class Desc','EPA FE Label Dataset ID','MFR Calculated Gas Guzzler MPG',
        '$ You Save over 5 years (amount saved in fuel costs over 5 years - on label)','FE Rating (1-10 rating on Label)',
        '$ You Spend over 5 years (increased amount spent in fuel costs over 5 years - on label)','Comb CO2 Rounded Adjusted (as shown on FE Label)','Fuel Unit - Conventional Fuel')]



# Adding a "Vehicle_Type" column to all three data frames to use to identify what sheet the info came from
Gas_df <-cbind(Vehicle_Type = 'Gas', Gas_df)
Phev_df <-cbind(Vehicle_Type = 'Phev', Phev_df)
Ev_df <-cbind(Vehicle_Type = 'Ev', Ev_df)

# In Phev_df, keep only rows where the model year is 2022. Because the column names contain spaces,
# first replace white space with underscores so the columns are easier to reference when filtering
names(Phev_df)<-str_replace_all(names(Phev_df), c(" " = "_" ))

# filtering the  Phev_df data frame to retrieve only 2022 model year
Phev_df <- Phev_df %>%
  filter(`Model_Yr__(gold_fill_means_release_date_is_after_today's_date)` == "2022")

#replacing all column names in the Ev_df data frame which have white space with underscore for filtering purposes
names(Ev_df)<-str_replace_all(names(Ev_df), c(" " = "_" ))
# filtering the Ev_df data frame to retrieve only rows reported in KW-HR/100Miles
Ev_df <- Ev_df %>%
  filter(`Fuel_Unit_-_Conventional_Fuel` == "KW-HR/100Miles")
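As a side note on the filtering step above, dplyr can also reference a column whose name contains spaces by wrapping it in backticks, which would avoid renaming the columns back and forth. The following is a minimal sketch on an invented toy table (`toy_ev` and its values are hypothetical, not rows from the EPA workbook):

```r
library(dplyr)

# Toy stand-in for one sheet of the workbook (hypothetical values)
toy_ev <- tibble::tibble(
  `Mfr Name` = c("A", "B", "C"),
  `Fuel Unit - Conventional Fuel` = c("KW-HR/100Miles", "MPG", "KW-HR/100Miles")
)

# Backticks let filter() refer to a column name that contains spaces
toy_ev_kwh <- toy_ev %>%
  filter(`Fuel Unit - Conventional Fuel` == "KW-HR/100Miles")

nrow(toy_ev_kwh)
```

The column names are left untouched, so no step is needed later to restore them.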


Combining the data frames for analysis

#replacing underscore with white space to return column names to original state

names(Gas_df)<-str_replace_all(names(Gas_df), c("_" = " " ))

#Extracting the needed columns from the Gas_df data frame for analysis

Gas_df <- select(Gas_df, c('Vehicle Type','Mfr Name','Division', 'Carline','Index (Model Type Index)','Eng Displ','# Cyl','Transmission','City FE (Guide) - Conventional Fuel','Hwy FE (Guide) - Conventional Fuel','Comb FE (Guide) - Conventional Fuel','Air Aspiration Method Desc','Trans','Trans Desc','# Gears','Drive Sys','Drive Desc','Fuel Usage Desc - Conventional Fuel','Annual Fuel1 Cost - Conventional Fuel','Fuel2 EPA Calculated Annual Fuel Cost - Alternative Fuel','Carline Class Desc','EPA FE Label Dataset ID','MFR Calculated Gas Guzzler MPG','$ You Save over 5 years (amount saved in fuel costs over 5 years - on label)','$ You Spend over 5 years (increased amount spent in fuel costs over 5 years - on label)','Comb CO2 Rounded Adjusted (as shown on FE Label)','FE Rating (1-10 rating on Label)'))

#replacing underscore with white space to return column names to original state

names(Phev_df)<-str_replace_all(names(Phev_df), c("_" = " " ))

#Extracting the needed columns from the Phev_df data frame for analysis

Phev_df <- select(Phev_df, c('Vehicle Type','Mfr Name','Division', 'Carline','Index (Model Type Index)','Eng Displ','# Cyl','Transmission','City FE (Guide) - Conventional Fuel','Hwy FE (Guide) - Conventional Fuel','Comb FE (Guide) - Conventional Fuel','Air Aspiration Method Desc','Trans','Trans Desc','# Gears','Drive Sys','Drive Desc','Fuel Usage Desc - Conventional Fuel','Annual Fuel1 Cost - Conventional Fuel','Fuel2 EPA Calculated Annual Fuel Cost - Alternative Fuel','Carline Class Desc','EPA FE Label Dataset ID','MFR Calculated Gas Guzzler MPG','$ You Save over 5 years (amount saved in fuel costs over 5 years - on label)','$ You Spend over 5 years (increased amount spent in fuel costs over 5 years - on label)','Comb CO2 Rounded Adjusted (as shown on FE Label)','FE Rating (1-10 rating on Label)'))


#replacing underscore with white space to return column names to original state

names(Ev_df)<-str_replace_all(names(Ev_df), c("_" = " " ))

#Extracting the needed columns from the Ev_df data frame for analysis

Ev_df <- select(Ev_df, c('Vehicle Type','Mfr Name','Division', 'Carline','Index (Model Type Index)','Eng Displ','# Cyl','Transmission','City FE (Guide) - Conventional Fuel','Hwy FE (Guide) - Conventional Fuel','Comb FE (Guide) - Conventional Fuel','Air Aspiration Method Desc','Trans','Trans Desc','# Gears','Drive Sys','Drive Desc','Fuel Usage Desc - Conventional Fuel', 'Annual Fuel1 Cost - Conventional Fuel','Fuel2 EPA Calculated Annual Fuel Cost - Alternative Fuel','Carline Class Desc','EPA FE Label Dataset ID','MFR Calculated Gas Guzzler MPG','$ You Save over 5 years (amount saved in fuel costs over 5 years - on label)' ,'$ You Spend over 5 years (increased amount spent in fuel costs over 5 years - on label)','Comb CO2 Rounded Adjusted (as shown on FE Label)','FE Rating (1-10 rating on Label)'))

#combine all data frames into one for analysis
df_list <- list(Gas_df,Phev_df,Ev_df)
combined_df <- merge_recurse(df_list)

#Remove duplicates
combined_df<- combined_df %>% distinct()

#Rename columns
names(combined_df) %<>% stringr::str_replace_all("\\s","_") %>% tolower

glimpse(combined_df)
## Rows: 1,313
## Columns: 27
## $ vehicle_type                                                                              <chr> …
## $ mfr_name                                                                                  <chr> …
## $ division                                                                                  <chr> …
## $ carline                                                                                   <chr> …
## $ `index_(model_type_index)`                                                                <dbl> …
## $ eng_displ                                                                                 <chr> …
## $ `#_cyl`                                                                                   <dbl> …
## $ transmission                                                                              <chr> …
## $ `city_fe_(guide)_-_conventional_fuel`                                                     <dbl> …
## $ `hwy_fe_(guide)_-_conventional_fuel`                                                      <dbl> …
## $ `comb_fe_(guide)_-_conventional_fuel`                                                     <dbl> …
## $ air_aspiration_method_desc                                                                <chr> …
## $ trans                                                                                     <chr> …
## $ trans_desc                                                                                <chr> …
## $ `#_gears`                                                                                 <dbl> …
## $ drive_sys                                                                                 <chr> …
## $ drive_desc                                                                                <chr> …
## $ `fuel_usage_desc_-_conventional_fuel`                                                     <chr> …
## $ `annual_fuel1_cost_-_conventional_fuel`                                                   <dbl> …
## $ `fuel2_epa_calculated_annual_fuel_cost_-_alternative_fuel`                                <dbl> …
## $ carline_class_desc                                                                        <chr> …
## $ epa_fe_label_dataset_id                                                                   <dbl> …
## $ mfr_calculated_gas_guzzler_mpg                                                            <dbl> …
## $ `$_you_save_over_5_years_(amount_saved_in_fuel_costs_over_5_years_-_on_label)`            <dbl> …
## $ `$_you_spend_over_5_years_(increased_amount_spent_in_fuel_costs_over_5_years_-_on_label)` <dbl> …
## $ `comb_co2_rounded_adjusted_(as_shown_on_fe_label)`                                        <dbl> …
## $ `fe_rating_(1-10_rating_on_label)`                                                        <dbl> …
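Because the three data frames are reduced to the same set of columns before they are combined, stacking them row-wise is another way to build the combined table. The sketch below uses `dplyr::bind_rows()` on invented toy frames; it is an alternative shown for illustration, not the `merge_recurse()` call the report actually ran, and the `*_toy` names and values are hypothetical:

```r
library(dplyr)

# Toy frames sharing the same columns (hypothetical values)
gas_toy  <- tibble::tibble(vehicle_type = "Gas",  carline = c("Car A", "Car A"), comb_fe = c(30, 30))
phev_toy <- tibble::tibble(vehicle_type = "Phev", carline = "Car B", comb_fe = 45)
ev_toy   <- tibble::tibble(vehicle_type = "Ev",   carline = "Car C", comb_fe = 110)

# Stack the frames row-wise, then drop exact duplicate rows
combined_toy <- bind_rows(gas_toy, phev_toy, ev_toy) %>%
  distinct()

nrow(combined_toy)
```

The duplicated "Car A" row is removed by `distinct()`, leaving one row per unique vehicle.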


Analysis

We set out to answer the questions below to determine whether, overall, electric vehicles (EVs) are cost effective compared with combustion-engine vehicles. More details on the guide used to perform the analysis and interpret the data can be found at https://www.fueleconomy.gov/feg/pdfs/guides/FEG2022.pdf

What is the difference in cost between vehicle class types (vans, trucks, etc.) for EV, gas, and PHEV vehicles?

Based on the chart below, EVs had the lowest annual fuel cost, followed by PHEVs and then gas vehicles. For each vehicle type, fuel refers to a different source, as noted below:

  • Gas Vehicles - Gasoline (Premium Unleaded Recommended,Premium Unleaded Required, Regular Unleaded Recommended )
  • PHEVs - Gasoline (Premium Unleaded Recommended,Premium Unleaded Required,Regular Unleaded Recommended)
  • EVs - Electricity
# Group by car types related to EVs and PHEVs 
combined_df %>% 
  group_by(vehicle_type, carline_class_desc) %>% 
  filter(carline_class_desc %in% c("Compact Cars", "Midsize Cars","Large Cars", "Subcompact Cars")) %>% 
  summarize(yearly_avg_cost = round(mean(`annual_fuel1_cost_-_conventional_fuel`))) %>% 
  ggplot(aes(x = reorder(vehicle_type,yearly_avg_cost), y = yearly_avg_cost, group = carline_class_desc, fill = carline_class_desc)) + 
  geom_label(aes(label = yearly_avg_cost),position = position_dodge(width = 1),vjust = -.2, size = 2,fill = "white") +
  geom_bar(stat = "identity", color = NA, position = position_dodge(width = 0.9))+
    theme_bw()+scale_fill_brewer(palette = "Dark2")+ 
  labs(title = "Yearly Average Cost by Car Type", x = "Car Type", y = "Yearly Average Cost") 
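The averages behind the chart above come from a grouped mean. As a minimal sketch of that calculation on invented numbers (the `toy` table and its costs are hypothetical), the example below also passes `na.rm = TRUE`, which is an assumption about how missing costs should be handled rather than something the report does:

```r
library(dplyr)

# Hypothetical annual fuel costs, including one missing value
toy <- tibble::tibble(
  vehicle_type     = c("Gas", "Gas", "Phev", "Phev", "Ev", "Ev"),
  annual_fuel_cost = c(2000, 2400, 1500, NA, 600, 700)
)

# Average cost per vehicle type, ignoring missing values, sorted from cheapest
avg_cost <- toy %>%
  group_by(vehicle_type) %>%
  summarize(yearly_avg_cost = round(mean(annual_fuel_cost, na.rm = TRUE)),
            .groups = "drop") %>%
  arrange(yearly_avg_cost)

avg_cost
```

Without `na.rm = TRUE`, a single missing cost would turn the whole group average into `NA`.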

What is the difference in cost between transmission types (automatic, CVT)?

Based on the chart below, EVs with a continuously variable transmission had the lowest cost, followed by PHEVs and then gas vehicles. Based on research, it was noted that “Continuously variable transmissions (CVTs) can change seamlessly through an infinite number of ‘gears.’ Transmissions with more gears allow the engine to run at its most efficient speed more often, improving fuel economy.”

# Group by most common transmission types related to Gas, EVs and PHEVs 
combined_df %>% 
  group_by(vehicle_type, trans_desc) %>% 
  filter(trans_desc %in% c("Automatic", "Continuously Variable","Semi-Automatic", "Automated Manual")) %>% 
  summarize(m = round(mean(`annual_fuel1_cost_-_conventional_fuel`))) %>% 
  
  ggplot(aes(x = reorder(vehicle_type, m), y = m, group = trans_desc, fill = trans_desc, color = trans_desc)) +
  geom_line(stat = "identity", size = 1)+ 
  geom_label_repel(aes(label = m),nudge_y = 1, size = 2,fill = "black", color = "white") +
    theme_bw()+scale_colour_brewer(palette = "BrBG") + 
  labs(title = "Average cost by car transmission type", x = "Car Type", y = "Average Cost") 

Number of vehicles in each class?

Based on the chart below, gas vehicles significantly outnumbered EVs and PHEVs, which may have skewed the analysis.

combined_df %>% 
group_by(vehicle_type, carline_class_desc) %>% 
summarize(count=n())%>% 
top_n(5, count)%>% 
ggplot(aes(x = vehicle_type, y= count, group = carline_class_desc, fill = carline_class_desc)) +
  geom_bar(stat = "identity", color = "black") +  
     scale_fill_ptol() +
  theme_minimal()+
  labs(title = "Number of cars in top 5 car description types", x = "Car Type", y = "Count") 

Comparison between fuel efficiency and the different vehicle types (highway, city, combined mileage)

Reviewing city fuel efficiency by vehicle type and manufacturer, EVs show higher city mileage than gas vehicles, even for luxury brands such as BMW. For example, Toyota’s city mileage was even lower than that of a BMW electric car. PHEV city mileage was surprisingly low, almost in line with that of gas vehicles.

combined_df %>% 
group_by(vehicle_type, mfr_name, carline_class_desc) %>% 
filter(vehicle_type %in% c("Ev")) %>%
summarize(avg_city_ef = round(mean(`city_fe_(guide)_-_conventional_fuel`)))%>%
arrange(desc(mfr_name),.by_group=TRUE) %>%
  top_n(3, avg_city_ef)%>%
ggplot(aes(x= vehicle_type, y = avg_city_ef)) + 
  geom_point(position=position_jitter(h=0.1, w=0.1),
             shape = 21, alpha = 0.5, size = 3) + geom_label_repel(aes(label = avg_city_ef),nudge_y = 1, size = 2,fill = "black", color = "white") +
  facet_wrap(mfr_name~., )+ theme_bw()+
  labs(title = "Electric Vehicles Average City Mileage by Car Type", x = "Car Type", y = "Mileage")

combined_df %>% 
group_by(vehicle_type, mfr_name, carline_class_desc) %>% 
filter(vehicle_type %in% c("Gas")) %>%
summarize(avg_city_ef = round(mean(`city_fe_(guide)_-_conventional_fuel`)))%>%
arrange(desc(mfr_name),.by_group=TRUE) %>%
top_n(3,avg_city_ef)%>%
ggplot(aes(x= vehicle_type, y = avg_city_ef)) + 
geom_point(position=position_jitter(h=0.1, w=0.1),shape = 21, alpha = 0.5, size = 3) + geom_label_repel(aes(label = avg_city_ef),nudge_y = 1, size = 2,fill = "black", color = "white") +
  facet_wrap(mfr_name~., )+ theme_bw()+
  labs(title = "Gas Vehicles Average City Mileage by Car Type", x = "Car Type", y = "Mileage")

combined_df %>% 
group_by(vehicle_type, mfr_name, carline_class_desc) %>% 
filter(vehicle_type %in% c("Phev")) %>%
summarize(avg_city_ef = round(mean(`city_fe_(guide)_-_conventional_fuel`)))%>%
arrange(desc(mfr_name),.by_group=TRUE) %>%
top_n(3,avg_city_ef)%>%
ggplot(aes(x= vehicle_type, y = avg_city_ef)) + 
geom_point(position=position_jitter(h=0.1, w=0.1),shape = 21, alpha = 0.5, size = 3) + geom_label_repel(aes(label = avg_city_ef),nudge_y = 1, size = 2,fill = "black", color = "white") +
  facet_wrap(mfr_name~., )+ theme_bw()+
  labs(title = "PHEV Vehicles Average City Mileage by Car Type", x = "Car Type", y = "Mileage")


Identifying the variables that make for an efficient vehicle

All vehicles are given a fuel efficiency (FE) rating by the EPA.

Multiple Regression Approach

We selected 11 variables from the data set, then:

  1. Checked for a linear relationship between each predictor variable and the response variable
  2. Checked for collinearity among the predictor variables
  3. Isolated the predictor variables that contribute to the Fuel Efficiency Rating response variable

Subset the data to create a model data set.

The predictor variables we select are:

  • engine displacement
  • number of cylinders
  • transmission type
  • combined mpg
  • city mpg
  • highway mpg
  • number of gears
  • drive train type
  • car size
  • CO2 emissions weight
  • engine air aspiration type
model <- combined_df %>% 
  select(division, carline, eng_displ, '#_cyl', trans, `comb_fe_(guide)_-_conventional_fuel`, `#_gears`, drive_desc, `comb_co2_rounded_adjusted_(as_shown_on_fe_label)`,carline_class_desc,`fe_rating_(1-10_rating_on_label)`, air_aspiration_method_desc, `city_fe_(guide)_-_conventional_fuel` , `hwy_fe_(guide)_-_conventional_fuel`) %>% 
  dplyr::rename(engine_displacement = eng_displ,
                no_cylinders = '#_cyl',
                transmission = trans,
                combined_mpg = 'comb_fe_(guide)_-_conventional_fuel',
                no_gears = `#_gears`,
                drive_train = drive_desc,
                C02_emmission_grams = `comb_co2_rounded_adjusted_(as_shown_on_fe_label)`,
                car_size = carline_class_desc,
                fe_rating = `fe_rating_(1-10_rating_on_label)`,
                air_aspiration = air_aspiration_method_desc,
                city_mpg = `city_fe_(guide)_-_conventional_fuel`, 
                highway_mpg = `hwy_fe_(guide)_-_conventional_fuel` ) %>% 
  unite(vehicle_name, division:carline,  sep = " ", remove = TRUE, na.rm = FALSE)%>% 
  mutate_at('no_cylinders', ~replace_na(.,0)) %>% 
  filter(no_cylinders != 0)

model$engine_displacement <- as.numeric(as.character(model$engine_displacement))

Distribution of the Fuel Efficiency Ratings

  • An EPA FE Rating of 10 is the best efficiency rating granted to a vehicle
  • The model excludes all EVs, as they are all rated 10, lack similar characteristics, and would skew the model
  • FE ratings are nearly normally distributed for gas vehicles and PHEVs and average around 5 (the middle rating)
#Plot Fuel Efficiency Rating Distribution
ggplot(model, aes(x=fe_rating)) + 
  geom_histogram(binwidth = 1) +
  ggtitle("Distribution of Fuel Efficiency Rating for Gas & PHEV")+
  theme_minimal()

Check linear relationships between predictor and response variables

  • Multiple regression requires that there be a linear relationship between the various predictor variables and the response variable (FE Rating).
  • Here we check each of the variables for a linear relationship
  • Ideally we would drop any variables that do not show this linear relationship

In this case, we would consider dropping the transmission type variable.

library(gridExtra)

# geom_jitter() already draws the points, so a separate geom_point() layer would plot each one twice
a <- ggplot(model, aes(x=fe_rating, y=engine_displacement)) + 
  geom_jitter()+
  geom_smooth(method=lm, se=FALSE)+
  ggtitle ("Engine Displacement")

b <- ggplot(model, aes(x=fe_rating, y=no_cylinders)) + 
  geom_jitter()+
  geom_smooth(method=lm, se=FALSE)+
  ggtitle ("No. of Cylinders")

c<- ggplot(model, aes(x=fe_rating, y=transmission)) + 
  geom_jitter()+
  geom_smooth(method=lm, se=FALSE)+
  ggtitle ("Transmission")

d<- ggplot(model, aes(x=fe_rating, y=combined_mpg)) + 
  geom_jitter()+
  geom_smooth(method=lm, se=FALSE)+
  ggtitle ("Combined MPG")

e<- ggplot(model, aes(x=fe_rating, y=city_mpg)) + 
  geom_jitter()+
  geom_smooth(method=lm, se=FALSE)+
  ggtitle ("City MPG")

f<- ggplot(model, aes(x=fe_rating, y=highway_mpg)) + 
  geom_jitter()+
  geom_smooth(method=lm, se=FALSE)+
  ggtitle ("Highway MPG")

g<- ggplot(model, aes(x=fe_rating, y=no_gears)) + 
  geom_jitter()+
  geom_smooth(method=lm, se=FALSE)+
  ggtitle ("No. of Gears")

h<- ggplot(model, aes(x=fe_rating, y=C02_emmission_grams)) + 
  geom_jitter()+
  geom_smooth(method=lm, se=FALSE)+
  ggtitle ("C02 Emissions")



grid.arrange(a,b,c,d,e,f,g,h,   ncol = 3)
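The scatter plots above can be complemented with a numeric check. As a rough sketch, the Pearson correlation between each numeric predictor and the response gives one number per variable. The example uses the built-in `mtcars` data, with `mpg` standing in for the response, purely as an assumption for illustration; it is not the report's `model` data set:

```r
# mtcars ships with base R; mpg plays the role of the response variable here
predictors <- c("disp", "cyl", "wt", "gear")

# Pearson correlation of each predictor with the response
cors <- sapply(predictors, function(p) cor(mtcars[[p]], mtcars$mpg))

round(cors, 2)
```

Values near -1 or 1 suggest a strong linear relationship, while values near 0 suggest a weak one. This check applies only to numeric predictors, so a categorical variable such as transmission type would need a different treatment.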

Check for collinearity using Variance Inflation Factor (VIF)

  • VIF is a calculation that allows us to check for collinearity among the predictor variables.
  • VIF values greater than 5 indicate collinearity among variables
  • Ideally we drop the redundant variables

In this case, combined, city and highway are collinear variables. We drop 2 redundant variables and keep only one.

#install.packages("caTools")
library(caTools)

#install.packages('car')
library(car)

m_full <- lm(fe_rating ~ engine_displacement + no_cylinders + transmission + combined_mpg+ no_gears + drive_train
             + C02_emmission_grams + car_size + air_aspiration + city_mpg + highway_mpg,  data = model)


#create vector of VIF values
#(with factor predictors, car::vif() returns a matrix; column 3 holds GVIF^(1/(2*Df)))
vif_values  <- vif(m_full)[,3]


#create vertical bar chart to display each VIF value
barplot(vif_values, main = "VIF Values",las=2, horiz = FALSE, col = "steelblue")

#add horizontal line at 5
abline(h = 5, lwd = 3, lty = 2)
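To make the VIF calculation less of a black box: for a numeric predictor, the VIF equals 1 / (1 - R²), where R² comes from regressing that predictor on all of the other predictors. The following is a minimal sketch in base R on the built-in `mtcars` data. The helper `vif_by_hand()` is hypothetical and written for illustration; it handles numeric predictors only and does not reproduce the generalized VIF that `car::vif()` reports for factor terms:

```r
# Hypothetical helper: VIF for each numeric predictor, computed from first principles
vif_by_hand <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    # Regress this predictor on all of the other predictors
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_by_hand(mtcars, c("disp", "cyl", "wt", "hp"))
round(vifs, 2)
```

A predictor that the others explain well has an R² close to 1, which pushes its VIF up. This is why closely related variables such as combined, city and highway mpg stand out.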

Check coefficients and P values

  • We look at the p values to determine which variables are statistically significant – p < .05
  • We look at all of the variables at once
  • Ideally, we only keep variables where p < .05
#Calculate coefficients and p values for all variables, excluding the redundant variables
m_full <- lm(fe_rating ~ engine_displacement + no_cylinders + transmission + combined_mpg+ no_gears + drive_train
             + C02_emmission_grams + car_size + air_aspiration,  data = model)

#summary(m_full)

myco <- summary(m_full)
myco$coefficients[,c(1, 4)]
##                                                   Estimate      Pr(>|t|)
## (Intercept)                                   4.0620502084  7.868593e-86
## engine_displacement                          -0.0180451437  5.553525e-01
## no_cylinders                                 -0.0368558622  4.017385e-02
## transmissionAM                               -0.1498907156  5.691394e-02
## transmissionAMS                              -0.0096919276  8.426122e-01
## transmissionCVT                               0.0580550286  5.607903e-01
## transmissionM                                 0.0189950764  7.475603e-01
## transmissionSA                               -0.0232072919  4.639333e-01
## transmissionSCV                               0.0348591631  5.234184e-01
## combined_mpg                                  0.1308304891 1.139187e-191
## no_gears                                     -0.0001595765  9.879834e-01
## drive_train2-Wheel Drive, Rear               -0.0039616896  9.343136e-01
## drive_train4-Wheel Drive                     -0.0128353422  8.295335e-01
## drive_trainAll Wheel Drive                   -0.0611233462  1.859050e-01
## drive_trainPart-time 4-Wheel Drive           -0.1480398396  4.299070e-02
## C02_emmission_grams                          -0.0052151474 7.783861e-181
## car_sizeLarge Cars                           -0.0462641695  4.282426e-01
## car_sizeMidsize Cars                         -0.0866703547  7.766356e-02
## car_sizeMidsize Station Wagons                0.0531470053  6.842121e-01
## car_sizeMinicompact Cars                     -0.0280731508  6.954433e-01
## car_sizeSmall Pick-up Trucks 2WD              0.0224445000  8.596769e-01
## car_sizeSmall Pick-up Trucks 4WD             -0.0129439292  9.136116e-01
## car_sizeSmall Station Wagons                 -0.0229782405  7.931103e-01
## car_sizeSmall SUV 2WD                        -0.0194471551  7.577500e-01
## car_sizeSmall SUV 4WD                        -0.0062186274  9.054634e-01
## car_sizeSpecial Purpose Vehicle 2WD          -0.0467491420  6.443761e-01
## car_sizeSpecial Purpose Vehicle 4WD          -0.3665428671  1.636027e-01
## car_sizeSpecial Purpose Vehicle cab chassis  -0.3040989507  1.873347e-02
## car_sizeSpecial Purpose Vehicle, minivan 2WD  0.4503981567  4.460379e-03
## car_sizeSpecial Purpose Vehicle, minivan 4WD -0.0425567492  8.718541e-01
## car_sizeStandard Pick-up Trucks 2WD          -0.1127363447  1.491567e-01
## car_sizeStandard Pick-up Trucks 4WD          -0.0945219917  1.918647e-01
## car_sizeStandard SUV 2WD                     -0.0085049900  9.058210e-01
## car_sizeStandard SUV 4WD                     -0.0550889684  3.336882e-01
## car_sizeSubcompact Cars                       0.0518452314  3.420687e-01
## car_sizeTwo Seaters                          -0.1111808690  1.104750e-01
## car_sizeVans, Passenger Type                 -0.0081758668  9.754003e-01
## air_aspirationOther                           0.0606086901  6.641090e-01
## air_aspirationSupercharged                   -0.0538369026  4.902906e-01
## air_aspirationTurbocharged                    0.0427706085  2.221638e-01
## air_aspirationTurbocharged+Supercharged       0.0440866343  6.084340e-01
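Rather than scanning the coefficient table by eye, the rows with p < .05 can be pulled out programmatically. The sketch below shows one way to do this. It fits a small model on the built-in `mtcars` data as a stand-in, so the formula and variables are assumptions for illustration and not the report's `m_full`:

```r
# Stand-in model on built-in data
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients

# Keep only the rows whose p value is below .05
significant <- coefs[coefs[, "Pr(>|t|)"] < 0.05, c("Estimate", "Pr(>|t|)"), drop = FALSE]

significant
```

Using `drop = FALSE` keeps the result a matrix even when only one coefficient passes the threshold.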

Profile of a Fuel Efficient Vehicle (Non EV)

To recap:

  1. Eliminated predictor variables that did not indicate a linear relationship with the response variable
  2. Eliminated collinearity issues by dropping redundant variables
  3. Kept variables with statistically significant p values (p < .05)

The features of a gas or plug-in hybrid vehicle that predict a higher efficiency rating:

  • fewer engine cylinders
  • higher combined mpg
  • no part-time 4-wheel drive train
  • lower weight of CO2 emissions
  • a smaller, non-truck vehicle (as indicated by the cab chassis variable)
  • a 2-wheel drive vehicle

Equation of the efficiency rating multiple regression prediction

y = 4.0620502 - 0.0368559(no_cylinders) + 0.1308305(combined_mpg) - 0.1480398(drive_trainPart-time 4-Wheel Drive) - 0.0052151(C02_emmission_grams) - 0.3040990(car_sizeSpecial Purpose Vehicle cab chassis) + 0.4503982(car_sizeSpecial Purpose Vehicle, minivan 2WD)
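To show how the equation is applied, the sketch below wraps the coefficients listed above in a small function and evaluates it for one vehicle. The function name `predict_fe_rating()` and the input values (4 cylinders, 30 combined mpg, 296 g/mi of CO2) are hypothetical and chosen only for illustration. The three dummy variables are passed as TRUE/FALSE flags:

```r
# Hypothetical helper applying the regression equation above
predict_fe_rating <- function(no_cylinders, combined_mpg, co2_grams,
                              part_time_4wd = FALSE,
                              cab_chassis = FALSE,
                              minivan_2wd = FALSE) {
  4.0620502 -
    0.0368559 * no_cylinders +
    0.1308305 * combined_mpg -
    0.1480398 * part_time_4wd -
    0.0052151 * co2_grams -
    0.3040990 * cab_chassis +
    0.4503982 * minivan_2wd
}

# Illustrative vehicle: 4 cylinders, 30 combined mpg, 296 g/mi CO2
predict_fe_rating(no_cylinders = 4, combined_mpg = 30, co2_grams = 296)
# about 6.3
```

The result is a continuous value, whereas the label rating is a whole number from 1 to 10, so the prediction is best read as an approximate position on that scale.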

Shiny App - Compare Fuel Costs Over Time For Each Car Type

We can use the shiny app to continue comparing efficiency and cost between vehicles.

https://sj8vjw-jose-rodriguez.shinyapps.io/2022VehFuelConsumptionCost/

Conclusion