In this assignment, we assess the MPG statistics of popular vehicles from multiple brands that had a new model-year release from 1999-2008. The purpose of this evaluation is to determine which model year provided the best mpg, whether city, highway, or average. The ultimate question is how manufacturers iterated on their car models, and whether those changes affected the mpg figures reported in the EPA ratings for city and highway driving.
MPG <-read.csv("C:/Users/JG/Desktop/mpg.csv")
MPG
summary(MPG)
> summary(MPG)
X manufacturer model displ year
Min. : 1.00 Length:234 Length:234 Min. :1.600 Min. :1999
1st Qu.: 59.25 Class :character Class :character 1st Qu.:2.400 1st Qu.:1999
Median :117.50 Mode :character Mode :character Median :3.300 Median :2004
Mean :117.50 Mean :3.472 Mean :2004
3rd Qu.:175.75 3rd Qu.:4.600 3rd Qu.:2008
Max. :234.00 Max. :7.000 Max. :2008
cyl trans drv cty hwy
Min. :4.000 Length:234 Length:234 Min. : 9.00 Min. :12.00
1st Qu.:4.000 Class :character Class :character 1st Qu.:14.00 1st Qu.:18.00
Median :6.000 Mode :character Mode :character Median :17.00 Median :24.00
Mean :5.889 Mean :16.86 Mean :23.44
3rd Qu.:8.000 3rd Qu.:19.00 3rd Qu.:27.00
Max. :8.000 Max. :35.00 Max. :44.00
fl class
Length:234 Length:234
Class :character Class :character
Mode :character Mode :character
#Create new data frame called New_Df
New_Df <- subset(MPG, select = c(manufacturer, model, displ, year, cyl, trans, drv, cty, hwy, fl, class))
New_Df
colnames(New_Df) <- c('Make', 'Model', 'Engine_Displacement_(Liters)', 'Year' , 'No_of_Cylinders', 'Transmission_Type', 'Drive_Type', 'City_MPG', 'Highway_MPG', 'Fuel_Type', 'Vehicle_Class')
New_Df
#Rename elements in Drive_Type Column
New_Df$Drive_Type <- with(New_Df, replace(Drive_Type, Drive_Type == 'f', 'FWD'))
New_Df
New_Df$Drive_Type <- with(New_Df, replace(Drive_Type, Drive_Type == 'r', 'RWD'))
New_Df
New_Df$Drive_Type <- with(New_Df, replace(Drive_Type, Drive_Type == '4', '4WD'))
New_Df
#Rename elements in Fuel_Type Column
New_Df$Fuel_Type <- with(New_Df, replace(Fuel_Type, Fuel_Type == 'p', 'Premium'))
New_Df
New_Df$Fuel_Type <- with(New_Df, replace(Fuel_Type, Fuel_Type == 'r', 'Regular'))
New_Df
New_Df$Fuel_Type <- with(New_Df, replace(Fuel_Type, Fuel_Type == 'e', 'Ethanol'))
New_Df
New_Df$Fuel_Type <- with(New_Df, replace(Fuel_Type, Fuel_Type == 'd', 'Diesel'))
New_Df
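The one-code-at-a-time replacements above can also be written as a single lookup per column. The sketch below is one possible alternative, not the method used in this assignment. The names `raw`, `drive_labels`, and `fuel_labels` are illustrative, and the small data frame is a made-up sample that only reuses the codes handled above.

```r
# Hypothetical sample standing in for the drive and fuel columns of the data set
raw <- data.frame(
  Drive_Type = c("f", "4", "r", "f"),
  Fuel_Type  = c("p", "r", "e", "d"),
  stringsAsFactors = FALSE
)

# Named vectors map each code to its label
drive_labels <- c("f" = "FWD", "r" = "RWD", "4" = "4WD")
fuel_labels  <- c("p" = "Premium", "r" = "Regular", "e" = "Ethanol", "d" = "Diesel")

# Indexing a named vector by the codes recodes the whole column in one step
raw$Drive_Type <- unname(drive_labels[raw$Drive_Type])
raw$Fuel_Type  <- unname(fuel_labels[raw$Fuel_Type])
raw
```

One difference to keep in mind: a code that is missing from the lookup becomes NA here, whereas replace() leaves it unchanged.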
#Calculate mean of CITY MPG of specific Vehicle make and model
mean_audi <- with(New_Df, mean(City_MPG[Make == 'audi' & Model == 'a4']))
mean_audi
mean_honda <- with(New_Df, mean(City_MPG[Make == 'honda' & Model == 'civic']))
mean_honda
mean_toyota <- with(New_Df, mean(City_MPG[Make == 'toyota' & Model == '4runner 4wd']))
mean_toyota
#Calculate mean of Highway MPG of specific Vehicle make and model
#(separate variable names so the City MPG means above are not overwritten)
mean_audi_hwy <- with(New_Df, mean(Highway_MPG[Make == 'audi' & Model == 'a4']))
mean_audi_hwy
mean_honda_hwy <- with(New_Df, mean(Highway_MPG[Make == 'honda' & Model == 'civic']))
mean_honda_hwy
mean_toyota_hwy <- with(New_Df, mean(Highway_MPG[Make == 'toyota' & Model == '4runner 4wd']))
mean_toyota_hwy
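The per-model means above can also be computed for every make and model at once with base R's aggregate(). This is a minimal sketch under assumptions: `sample_df` and `model_means` are illustrative names, and the sample holds only rows 1-7, 19, and 20 of New_Df as printed in this document, not the full data set.

```r
# Rows 1-7, 19 and 20 of New_Df as printed in this document
sample_df <- data.frame(
  Make        = c(rep("audi", 7), rep("chevrolet", 2)),
  Model       = c(rep("a4", 7), rep("c1500 suburban 2wd", 2)),
  City_MPG    = c(18, 21, 20, 21, 16, 18, 18, 14, 11),
  Highway_MPG = c(29, 29, 31, 30, 26, 26, 27, 20, 15)
)

# One row per Make/Model pair, with the mean of each MPG column
model_means <- aggregate(cbind(City_MPG, Highway_MPG) ~ Make + Model,
                         data = sample_df, FUN = mean)
model_means
```

Passing `data = New_Df` instead of the sample would give the same kind of table for the full data set.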
#create a new column to get mean of CITY and HIGHWAY MPG
MPG_AVG <- (New_Df$City_MPG + New_Df$Highway_MPG) / 2
MPG_AVG
#Preview the data frame with MPG_AVG bound on as an extra column (the result is not stored)
cbind(New_Df, MPG_AVG)
#Add Column with $-Operator
New_Df$new_col <- MPG_AVG
New_Df
#Display specified columns including our new column for Average MPG
input <- New_Df[,c('City_MPG','Highway_MPG', 'new_col')]
print(input)
head(input, 20)
tail(input, 20)
Make Model Engine_Displacement_(Liters) Year No_of_Cylinders
1 audi a4 1.8 1999 4
2 audi a4 1.8 1999 4
3 audi a4 2.0 2008 4
4 audi a4 2.0 2008 4
5 audi a4 2.8 1999 6
6 audi a4 2.8 1999 6
7 audi a4 3.1 2008 6
8 audi a4 quattro 1.8 1999 4
9 audi a4 quattro 1.8 1999 4
10 audi a4 quattro 2.0 2008 4
11 audi a4 quattro 2.0 2008 4
12 audi a4 quattro 2.8 1999 6
13 audi a4 quattro 2.8 1999 6
14 audi a4 quattro 3.1 2008 6
15 audi a4 quattro 3.1 2008 6
16 audi a6 quattro 2.8 1999 6
17 audi a6 quattro 3.1 2008 6
18 audi a6 quattro 4.2 2008 8
19 chevrolet c1500 suburban 2wd 5.3 2008 8
20 chevrolet c1500 suburban 2wd 5.3 2008 8
Transmission_Type Drive_Type City_MPG Highway_MPG Fuel_Type Vehicle_Class new_col
1 auto(l5) FWD 18 29 Premium compact 23.5
2 manual(m5) FWD 21 29 Premium compact 25.0
3 manual(m6) FWD 20 31 Premium compact 25.5
4 auto(av) FWD 21 30 Premium compact 25.5
5 auto(l5) FWD 16 26 Premium compact 21.0
6 manual(m5) FWD 18 26 Premium compact 22.0
7 auto(av) FWD 18 27 Premium compact 22.5
8 manual(m5) 4WD 18 26 Premium compact 22.0
9 auto(l5) 4WD 16 25 Premium compact 20.5
10 manual(m6) 4WD 20 28 Premium compact 24.0
11 auto(s6) 4WD 19 27 Premium compact 23.0
12 auto(l5) 4WD 15 25 Premium compact 20.0
13 manual(m5) 4WD 17 25 Premium compact 21.0
14 auto(s6) 4WD 17 25 Premium compact 21.0
15 manual(m6) 4WD 15 25 Premium compact 20.0
16 auto(l5) 4WD 15 24 Premium midsize 19.5
17 auto(s6) 4WD 17 25 Premium midsize 21.0
18 auto(s6) 4WD 16 23 Premium midsize 19.5
19 auto(l4) RWD 14 20 Regular suv 17.0
20 auto(l4) RWD 11 15 Ethanol suv 13.0
#These first attempts at scatterplots do show our data, but they are not well developed. On their own they are ambiguous, since the data can support multiple interpretations.
plot(City_MPG ~ Highway_MPG, data = New_Df, main = "City versus Highway MPG")
plot(New_Df$new_col, main = 'MPG Average', ylab = 'Average MPG')
abline(h = mean(New_Df$new_col))
#Figure: City_vs_Highway
#Figure: MPG_Average
#Here we evaluate every model from each Make by its average mpg. The scatterplot shows all models and their respective average mpg from 1999-2008.
library(ggplot2)
ggplot(data = New_Df, aes(y = Model, x = new_col)) + geom_point()
#Figure: Vehicle_Model_VS_Average_MPG
#MPG Average VS Drive Type
ggplot(data = New_Df, aes(y = Drive_Type, x = new_col)) + geom_boxplot(aes(color = Make)) + facet_wrap(~Make) + ggtitle("MPG Average vs Drive Type")
#Figure: Drive_Type_VS_Average_MPG
#Shows data for Average MPG
hist(New_Df$new_col)
hist(New_Df$new_col,
     main="Histogram for Average MPG",
     xlab="Average MPG Range",
     border="blue",
     col="navy",
     xlim=c(10,50),
     las=1,
     breaks=5)
#Figure: Average_MPG_Histogram
#Figure: Histogram_for_Average_MPG
slices <- c(6, 7,7, 5, 2, 7)
lbls <- c("4runner 4wd", "camry", "camry solara", "corolla", "land cruiser wagon 4wd", "toyota tacoma 4wd")
pie(slices, labels = lbls, main="Pie Chart of Toyota Vehicles")
#Figure: Pie_Chart_of_all_Toyota_Vehicles_in_Dataset
#The question for analysis has been stated at the top.
From our scatter plot of vehicle models against average mpg, we see that most manufacturers keep their MPG values within a set range. Most vehicles fall between 10-20 mpg or 20-30 mpg. Upon further inspection, some models make a larger leap into the 30-40 mpg range. These include the VW New Beetle, the VW Jetta, and the Toyota Corolla.
Surprisingly, the VW New Beetle and Jetta entries with an average mpg in the 30-40 range are from the 1999 model year. The Toyota Corolla in that range, on the other hand, is the 2008 manual-transmission model. What we can conclude from our analysis of the mpg data set is that multiple factors determine how well a given vehicle performs. For instance, our box plot shows that front-wheel drive is especially common among the vehicles in the data set. The car resource Edmunds points out that front-wheel drive reduces weight and production costs and ultimately improves fuel economy, compared to rear-wheel drive and four-wheel drive, which favor better performance.
On top of that, larger engines with greater displacement are likely to improve performance, and such changes will affect MPG numbers.
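The model-year and displacement points above can be checked numerically. The following is a rough sketch, not part of the original analysis: `a4`, `by_year`, and `displ_cor` are illustrative names, `Displ` is a shortened stand-in for the displacement column, and the sample holds only the seven Audi a4 rows (rows 1-7) of New_Df as printed in this document.

```r
# Audi a4 rows (1-7) of New_Df as printed in this document
a4 <- data.frame(
  Displ   = c(1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1),
  Year    = c(1999, 1999, 2008, 2008, 1999, 1999, 2008),
  new_col = c(23.5, 25.0, 25.5, 25.5, 21.0, 22.0, 22.5)
)

# Mean average MPG for each model year
by_year <- aggregate(new_col ~ Year, data = a4, FUN = mean)
by_year

# Correlation between engine displacement and average MPG
displ_cor <- cor(a4$Displ, a4$new_col)
displ_cor
```

On these seven rows the 2008 mean is higher than the 1999 mean and the correlation is negative, but a sample this small cannot settle the question for the full data set.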
mpg<- read.csv('https://raw.githubusercontent.com/jakegeo/RBridge-Course-Work/main/mpg.csv')
mpg