Analysis:

In this assignment, we assess the MPG statistics of popular vehicles from multiple manufacturers with model years from 1999 to 2008. The purpose of this evaluation is to determine which model year delivered the best MPG, whether city, highway, or the average of the two. The ultimate question is how manufacturers iterated on their car models, and whether those changes affected the EPA-rated city and highway MPG.
 

Task 1: Data Exploration: This should include summary statistics, means, medians, quartiles, or any other relevant information about the dataset. Please include some conclusions in the R Markdown text.

MPG <- read.csv("C:/Users/JG/Desktop/mpg.csv")
MPG
summary(MPG)

Solution:

> summary(MPG)
       X          manufacturer          model               displ            year     
 Min.   :  1.00   Length:234         Length:234         Min.   :1.600   Min.   :1999  
 1st Qu.: 59.25   Class :character   Class :character   1st Qu.:2.400   1st Qu.:1999  
 Median :117.50   Mode  :character   Mode  :character   Median :3.300   Median :2004  
 Mean   :117.50                                         Mean   :3.472   Mean   :2004  
 3rd Qu.:175.75                                         3rd Qu.:4.600   3rd Qu.:2008  
 Max.   :234.00                                         Max.   :7.000   Max.   :2008  
      cyl           trans               drv                 cty             hwy       
 Min.   :4.000   Length:234         Length:234         Min.   : 9.00   Min.   :12.00  
 1st Qu.:4.000   Class :character   Class :character   1st Qu.:14.00   1st Qu.:18.00  
 Median :6.000   Mode  :character   Mode  :character   Median :17.00   Median :24.00  
 Mean   :5.889                                         Mean   :16.86   Mean   :23.44  
 3rd Qu.:8.000                                         3rd Qu.:19.00   3rd Qu.:27.00  
 Max.   :8.000                                         Max.   :35.00   Max.   :44.00  
      fl               class          
 Length:234         Length:234        
 Class :character   Class :character  
 Mode  :character   Mode  :character  
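
The overall summary does not separate the model years, which is what the analysis question asks about. A minimal sketch of a per-year breakdown follows, assuming a small hypothetical data frame `sample_mpg` built from six rows of the table printed later in this report (rows 1-4 and 19-20); it is not the full 234-row data set, so the numbers only illustrate the technique.

```r
# Hypothetical sample: six rows copied from the table printed later in
# this report (rows 1-4 and 19-20), using the original CSV column names.
sample_mpg <- data.frame(
  year = c(1999, 1999, 2008, 2008, 2008, 2008),
  cty  = c(18, 21, 20, 21, 14, 11),
  hwy  = c(29, 29, 31, 30, 20, 15)
)

# Quartiles of city MPG (the same quantities summary() reports)
cty_quartiles <- quantile(sample_mpg$cty, probs = c(0.25, 0.5, 0.75))

# Mean city and highway MPG for each model year
by_year <- aggregate(cbind(cty, hwy) ~ year, data = sample_mpg, FUN = mean)

print(cty_quartiles)
print(by_year)
```

Running the same `aggregate()` call on the full `MPG` data frame would give the per-year means for the whole data set.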
                                      

Task 2: Data wrangling: Please perform some basic transformations. They will need to make sense but could include column renaming, creating a subset of the data, replacing values, or creating new columns with derived data (for example, if it makes sense, you could sum two columns together).

#Create new data frame called New_Df
New_Df <- subset(MPG, select = c(manufacturer, model, displ, year, cyl, trans, drv, cty, hwy, fl, class))
New_Df
colnames(New_Df) <- c('Make', 'Model', 'Engine_Displacement_(Liters)', 'Year' , 'No_of_Cylinders', 'Transmission_Type', 'Drive_Type', 'City_MPG', 'Highway_MPG', 'Fuel_Type', 'Vehicle_Class')
New_Df
#Rename elements in Drive_Type Column
New_Df$Drive_Type <- with(New_Df, replace(Drive_Type, Drive_Type == 'f', 'FWD'))
New_Df
New_Df$Drive_Type <- with(New_Df, replace(Drive_Type, Drive_Type == 'r', 'RWD'))
New_Df
New_Df$Drive_Type <- with(New_Df, replace(Drive_Type, Drive_Type == '4', '4WD'))
New_Df

#Rename elements in Fuel_Type Column
New_Df$Fuel_Type <- with(New_Df, replace(Fuel_Type, Fuel_Type == 'p', 'Premium'))
New_Df
New_Df$Fuel_Type <- with(New_Df, replace(Fuel_Type, Fuel_Type == 'r', 'Regular'))
New_Df
New_Df$Fuel_Type <- with(New_Df, replace(Fuel_Type, Fuel_Type == 'e', 'Ethanol'))
New_Df
New_Df$Fuel_Type <- with(New_Df, replace(Fuel_Type, Fuel_Type == 'd', 'Diesel'))
New_Df
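
The one-code-at-a-time replacements above can also be written as a lookup with a named vector. This is only a sketch of an alternative, assuming a hypothetical mini data frame `df` that uses the same single-letter codes; the names `drive_labels` and `fuel_labels` are illustrative. One difference to be aware of: any code that is missing from the lookup vector becomes NA, whereas `replace()` leaves it unchanged.

```r
# Hypothetical mini data frame with the same single-letter codes as the CSV
df <- data.frame(
  Drive_Type = c("f", "4", "r", "f"),
  Fuel_Type  = c("p", "r", "e", "d"),
  stringsAsFactors = FALSE
)

# Named vectors: the name is the old code, the value is the new label
drive_labels <- c(f = "FWD", r = "RWD", "4" = "4WD")
fuel_labels  <- c(p = "Premium", r = "Regular", e = "Ethanol", d = "Diesel")

# Indexing by the old codes returns the matching labels in row order
df$Drive_Type <- unname(drive_labels[df$Drive_Type])
df$Fuel_Type  <- unname(fuel_labels[df$Fuel_Type])

print(df)
```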
#Calculate mean of CITY MPG of specific Vehicle make and model
mean_audi <- with(New_Df, mean(City_MPG[Make == 'audi' & Model == 'a4']))
mean_audi
mean_honda <- with(New_Df, mean(City_MPG[Make == 'honda' & Model == 'civic']))
mean_honda
mean_toyota <- with(New_Df, mean(City_MPG[Make == 'toyota' & Model == '4runner 4wd']))
mean_toyota


#Calculate mean of Highway MPG of specific Vehicle make and model
#(stored under separate names so the city means above are not overwritten)
mean_audi_hwy <- with(New_Df, mean(Highway_MPG[Make == 'audi' & Model == 'a4']))
mean_audi_hwy
mean_honda_hwy <- with(New_Df, mean(Highway_MPG[Make == 'honda' & Model == 'civic']))
mean_honda_hwy
mean_toyota_hwy <- with(New_Df, mean(Highway_MPG[Make == 'toyota' & Model == '4runner 4wd']))
mean_toyota_hwy
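
Computing the means one model at a time works for a few vehicles, but the same figures can be produced for every make and model in a single call. The sketch below assumes a hypothetical `sample_df` built from six rows of the table printed later in this report (rows 1, 2, 8, 9, 19 and 20), so it covers only three models.

```r
# Hypothetical sample: six rows copied from the table printed later in
# this report, using the renamed columns.
sample_df <- data.frame(
  Make  = c("audi", "audi", "audi", "audi", "chevrolet", "chevrolet"),
  Model = c("a4", "a4", "a4 quattro", "a4 quattro",
            "c1500 suburban 2wd", "c1500 suburban 2wd"),
  City_MPG    = c(18, 21, 18, 16, 14, 11),
  Highway_MPG = c(29, 29, 26, 25, 20, 15)
)

# Mean city and highway MPG for every Make/Model combination at once
model_means <- aggregate(cbind(City_MPG, Highway_MPG) ~ Make + Model,
                         data = sample_df, FUN = mean)
print(model_means)
```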
#Create a new vector holding the mean of CITY and HIGHWAY MPG for each row
MPG_AVG <- (New_Df$City_MPG + New_Df$Highway_MPG) / 2
MPG_AVG

#Preview the data frame with the new vector bound on as a column
cbind(New_Df, MPG_AVG)

#Add Column with $-Operator
New_Df$new_col <- MPG_AVG
New_Df
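
As a possible alternative to adding the two columns by hand, `rowMeans()` computes the same per-row average and extends to more than two columns. This is a minimal sketch on a hypothetical three-row `sample_df` (rows 1-3 of the table printed below); the column name `Average_MPG` is an illustrative choice, not the name used in this report.

```r
# Hypothetical sample: city and highway MPG from rows 1-3 of the table
sample_df <- data.frame(
  City_MPG    = c(18, 21, 20),
  Highway_MPG = c(29, 29, 31)
)

# Per-row mean of the two MPG columns, stored under a descriptive name
sample_df$Average_MPG <- rowMeans(sample_df[, c("City_MPG", "Highway_MPG")])
print(sample_df)
```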

#Display specified columns including our new column for Average MPG
input <- New_Df[,c('City_MPG','Highway_MPG', 'new_col')]
print(input)
head(input, 20)
tail(input, 20)
        Make               Model Engine_Displacement_(Liters) Year No_of_Cylinders
1       audi                  a4                          1.8 1999               4
2       audi                  a4                          1.8 1999               4
3       audi                  a4                          2.0 2008               4
4       audi                  a4                          2.0 2008               4
5       audi                  a4                          2.8 1999               6
6       audi                  a4                          2.8 1999               6
7       audi                  a4                          3.1 2008               6
8       audi          a4 quattro                          1.8 1999               4
9       audi          a4 quattro                          1.8 1999               4
10      audi          a4 quattro                          2.0 2008               4
11      audi          a4 quattro                          2.0 2008               4
12      audi          a4 quattro                          2.8 1999               6
13      audi          a4 quattro                          2.8 1999               6
14      audi          a4 quattro                          3.1 2008               6
15      audi          a4 quattro                          3.1 2008               6
16      audi          a6 quattro                          2.8 1999               6
17      audi          a6 quattro                          3.1 2008               6
18      audi          a6 quattro                          4.2 2008               8
19 chevrolet  c1500 suburban 2wd                          5.3 2008               8
20 chevrolet  c1500 suburban 2wd                          5.3 2008               8

   Transmission_Type Drive_Type City_MPG Highway_MPG Fuel_Type Vehicle_Class new_col
1           auto(l5)        FWD       18          29   Premium       compact    23.5
2         manual(m5)        FWD       21          29   Premium       compact    25.0
3         manual(m6)        FWD       20          31   Premium       compact    25.5
4           auto(av)        FWD       21          30   Premium       compact    25.5
5           auto(l5)        FWD       16          26   Premium       compact    21.0
6         manual(m5)        FWD       18          26   Premium       compact    22.0
7           auto(av)        FWD       18          27   Premium       compact    22.5
8         manual(m5)        4WD       18          26   Premium       compact    22.0
9           auto(l5)        4WD       16          25   Premium       compact    20.5
10        manual(m6)        4WD       20          28   Premium       compact    24.0
11          auto(s6)        4WD       19          27   Premium       compact    23.0
12          auto(l5)        4WD       15          25   Premium       compact    20.0
13        manual(m5)        4WD       17          25   Premium       compact    21.0
14          auto(s6)        4WD       17          25   Premium       compact    21.0
15        manual(m6)        4WD       15          25   Premium       compact    20.0
16          auto(l5)        4WD       15          24   Premium       midsize    19.5
17          auto(s6)        4WD       17          25   Premium       midsize    21.0
18          auto(s6)        4WD       16          23   Premium       midsize    19.5
19          auto(l4)        RWD       14          20   Regular           suv    17.0
20          auto(l4)        RWD       11          15   Ethanol           suv    13.0

Task 3: Graphics: Please make sure to display at least one scatter plot, box plot and histogram. Don’t be limited to this. Please explore the many other options in R packages such as ggplot2.

Scatterplot

#These first scatterplots display the data, but they are not well developed. On their own they are ambiguous, since the data can be interpreted in multiple ways.

plot(City_MPG ~ Highway_MPG, data = New_Df, main = "City versus Highway MPG")

#new_col is a column of New_Df, so it must be referenced through the data frame
plot(New_Df$new_col, main = 'MPG Average')
abline(h = mean(New_Df$new_col))

City_vs_Highway

MPG_Average

#Here we look at every model from each make and its average MPG. The scatterplot shows each model against its average MPG for model years 1999-2008.

library(ggplot2) 
ggplot(data = New_Df, aes(y = Model, x = new_col)) + geom_point()

Vehicle_Model_VS_Average_MPG

Boxplot

#MPG Average VS Drive Type
ggplot(data = New_Df, aes(y = Drive_Type, x = new_col)) + geom_boxplot(aes(color = Make)) + facet_wrap(~Make) + ggtitle("MPG Average vs Drive Type") 

Drive_Type_VS_Average_MPG

Histogram

#Shows data for Average MPG
hist(New_Df$new_col)

hist(New_Df$new_col, 
     main="Histogram for Average MPG", 
     xlab="Average MPG Range", 
     border="blue", 
     col="navy",
     xlim=c(10,50),
     las=1, 
     breaks=5)
     

Average_MPG_Histogram

Histogram_for_Average_MPG
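
In `hist()`, a single number passed to `breaks` is treated as a suggestion, so the bins R actually draws may differ from the number requested. A hedged sketch of one way to control and inspect the bins is shown below: it passes explicit break points and uses `plot = FALSE` to return the counts without drawing. The vector `avg_mpg` is a hypothetical sample holding the 20 `new_col` values from the table printed above, not the full column.

```r
# Hypothetical sample: the 20 average-MPG values shown in the table above
avg_mpg <- c(23.5, 25.0, 25.5, 25.5, 21.0, 22.0, 22.5, 22.0, 20.5, 24.0,
             23.0, 20.0, 21.0, 21.0, 20.0, 19.5, 21.0, 19.5, 17.0, 13.0)

# Explicit bin edges at 10, 20, 30, 40 and 50 MPG; plot = FALSE returns
# the histogram object instead of drawing it
h <- hist(avg_mpg, breaks = seq(10, 50, by = 10), plot = FALSE)

# Number of vehicles in each bin; a value equal to an edge falls in the
# bin to its left (right-closed intervals are the default)
print(data.frame(lower = head(h$breaks, -1),
                 upper = tail(h$breaks, -1),
                 count = h$counts))
```

Dropping `plot = FALSE` draws the histogram with the same bins.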

Piechart

slices <- c(6, 7, 7, 5, 2, 7)
lbls <- c("4runner 4wd", "camry", "camry solara", "corolla", "land cruiser wagon 4wd", "toyota tacoma 4wd")
pie(slices, labels = lbls, main="Pie Chart of Toyota Vehicles")

Pie_Chart_of_all_Toyota_Vehicles_in_Dataset

Task 4: Meaningful question for analysis: Please state at the beginning a meaningful question for analysis. Use the first three steps and anything else that would be helpful to answer the question you are posing from the data set you chose. Please write a brief conclusion paragraph in R Markdown at the end.

#The question for analysis has been stated at the top. 

From the scatter plot of vehicle models against average MPG, we see that most manufacturers keep their MPG values within a set range. Most vehicles fall between 10-20 mpg or 20-30 mpg. On closer inspection, a few models make a larger leap into the 30-40 mpg range. These include the VW New Beetle, the VW Jetta, and the Toyota Corolla.

Surprisingly, the VW New Beetle and Jetta entries with an average MPG in the 30-40 range are 1999 models. The Toyota Corolla entry in that range, on the other hand, is the 2008 manual-transmission model. What we can conclude from our analysis of the mpg data set is that multiple factors determine how well a given vehicle performs. For instance, our box plot suggests that a large share of the vehicles in the data set are front-wheel drive. The car resource Edmunds points out that front-wheel drive reduces weight and production costs and ultimately improves fuel economy compared to rear-wheel drive and four-wheel drive, which favor better performance.

On top of that, larger engines with greater displacement tend to improve performance, and such changes affect MPG numbers.
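
The two claims in this conclusion, a difference between model years and a link between engine displacement and MPG, can each be checked with one line of R. The sketch below assumes a hypothetical data frame `a4` holding only the seven Audi A4 rows from the table printed earlier (rows 1-7), so its output describes that one model and should not be read as a result for the whole data set.

```r
# Hypothetical sample: the seven Audi A4 rows from the table printed earlier
a4 <- data.frame(
  Displacement = c(1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1),
  Year         = c(1999, 1999, 2008, 2008, 1999, 1999, 2008),
  Average_MPG  = c(23.5, 25.0, 25.5, 25.5, 21.0, 22.0, 22.5)
)

# Mean average MPG per model year
year_means <- aggregate(Average_MPG ~ Year, data = a4, FUN = mean)
print(year_means)

# Pearson correlation between displacement and average MPG;
# a negative value means larger engines go with lower MPG in this sample
displ_cor <- cor(a4$Displacement, a4$Average_MPG)
print(displ_cor)
```

Applying the same two calls to `New_Df` would test the claims against all 234 vehicles.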

Task 5: BONUS: Place the original .csv in a GitHub file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.

mpg <- read.csv('https://raw.githubusercontent.com/jakegeo/RBridge-Course-Work/main/mpg.csv')
mpg