Data set mpg - question list:

  1. What is the most popular fuel type in this data set?

  2. Regarding the fuel type variable, the value “d” represents diesel, “p” represents premium (petrol) and “r” represents regular (petrol). Do you think there is an effect of fuel type on fuel economy?

  3. Do you think there is a difference in fuel economy between vehicles made in 1999 and 2008? (When plotting with “year” variable, use “as.factor(year)” to convert it to a categorical variable. This will be explained in future classes.)

  4. What happens if you make a scatter plot of “class” vs “drv”? Do you think this plot is useful or not?


Load library

library(tidyverse)

Question 2: Regarding the fuel type variable, the value “d” represents diesel, “p” represents premium (petrol) and “r” represents regular (petrol). Do you think there is an effect of fuel type on fuel economy?

ggplot(data = mpg) + 
  geom_boxplot(mapping = aes(x = fl, y = hwy)) + 
  xlab("Fuel Type") + ylab("Highway Miles per Gallon") + 
  ggtitle("Fuel Economy (Highway) vs Fuel Type") +
  theme(plot.title = element_text(hjust = 0.5))

ggplot(data = mpg) + 
  geom_boxplot(mapping = aes(x = fl, y = cty)) +
  xlab("Fuel Type") + ylab("City Miles per Gallon") + 
  ggtitle("Fuel Economy (City) vs Fuel Type") +
  theme(plot.title = element_text(hjust = 0.5))

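Answer: Both boxplots show clear differences in highway (hwy) and city (cty) fuel economy across fuel types, and diesel (“d”) vehicles have the highest median fuel economy on both measures. (The plots also show two fuel codes not described in the question, “c” and “e”.) Two caveats apply: the data set contains only a handful of diesel vehicles, and a visual comparison of boxplots is neither a formal statistical test nor evidence that fuel type itself causes the difference.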
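To put numbers behind the boxplots, a grouped summary can report the number of vehicles and the median highway and city mileage for each fuel type. This is a minimal sketch using dplyr; the summary column names (`n`, `median_hwy`, `median_cty`) are our own choices, not part of the exercise. The `n` column makes the small diesel sample size visible.

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)

# One row per fuel type: sample size and median fuel economy,
# sorted so the fuel type with the highest median highway mpg comes first
fuel_summary <- mpg %>%
  group_by(fl) %>%
  summarise(
    n = n(),
    median_hwy = median(hwy),
    median_cty = median(cty)
  ) %>%
  arrange(desc(median_hwy))

fuel_summary
```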

Question 3: Do you think there is a difference in fuel economy between vehicles made in 1999 and 2008? (When plotting with “year” variable, use “as.factor(year)” to convert it to a categorical variable. This will be explained in future classes.)

ggplot(data = mpg, mapping = aes(x = as.factor(year), y = cty)) + 
  # The error-bar layer draws horizontal caps at the ends of the whiskers
  stat_boxplot(geom = "errorbar", width = 0.5) +
  geom_boxplot() +
  labs(x = "Year", y = "City Miles per Gallon", title = "Fuel Economy (City) Between 1999 and 2008") +
  theme(plot.title = element_text(hjust = 0.5))

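Answer: The boxplots for 1999 and 2008 are very similar, so this data set shows little evidence of a difference in city fuel economy between vehicles made in the two years. Note that the plot compares city mileage only, and the conclusion rests on visual inspection rather than a formal significance test.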
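The boxplot above uses city mileage only. As a rough numeric check, the sketch below summarises both city and highway mileage by year and runs a Welch two-sample t-test on city mpg. Using a t-test here is our own choice, not part of the original exercise; a rank-based test such as `wilcox.test()` would be a reasonable alternative if the mileage distributions look skewed.

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)

# Mean and median fuel economy for each model year
year_summary <- mpg %>%
  group_by(year) %>%
  summarise(
    n = n(),
    mean_cty = mean(cty), median_cty = median(cty),
    mean_hwy = mean(hwy), median_hwy = median(hwy)
  )
year_summary

# Welch two-sample t-test of city mpg, 1999 vs 2008
# (the formula method turns the two-valued year column into a grouping factor)
cty_test <- t.test(cty ~ year, data = mpg)
cty_test$p.value
```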

Question 4: What happens if you make a scatter plot of “class” vs “drv”? Do you think this plot is useful or not?

ggplot(data = mpg) +
  geom_point(mapping = aes(x = class, y = drv)) +
  labs(x = "Vehicle Class", y = "Drive Train Type", title = "Vehicle Class vs Drive Train Type") + 
  theme(plot.title = element_text(hjust = 0.5))

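Answer: This plot is still useful in the sense that it shows which combinations of class and drv exist and which do not. For example, all 2-seater vehicles are rear-wheel drive. Its limitation is that both variables are categorical, so every vehicle with the same class and drv is drawn on the same point, and the plot cannot show how many vehicles fall into each combination.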
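One way around the overplotting is to encode the number of vehicles at each point. The sketch below uses ggplot2's `geom_count()`, which sizes each point by the number of overlapping observations, and dplyr's `count()` to give the same information as a table. The plot title text is our own.

```r
library(ggplot2)  # provides the mpg data set and geom_count()
library(dplyr)

# Point size encodes how many vehicles share each class/drv combination
p <- ggplot(data = mpg) +
  geom_count(mapping = aes(x = class, y = drv)) +
  labs(x = "Vehicle Class", y = "Drive Train Type",
       title = "Vehicle Class vs Drive Train Type (point size = count)") +
  theme(plot.title = element_text(hjust = 0.5))
print(p)

# The same information as a table: one row per observed combination
class_drv_counts <- count(mpg, class, drv)
class_drv_counts
```

Combinations missing from `class_drv_counts` are exactly the empty positions in the original scatter plot.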