Question 2: Regarding the fuel type variable, the value “d” represents diesel, “p” represents premium (petrol) and “r” represents regular (petrol). Do you think there is an effect of fuel type on how many miles a vehicle can run on average per gallon of fuel?

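library(ggplot2)  # provides ggplot() and the mpg dataset
# Boxplot of highway mpg by fuel type; stat_boxplot() adds caps to the whiskers
ggplot(data = mpg, mapping = aes(x = fl, y = hwy)) +
  stat_boxplot(geom = "errorbar", width = 0.5) +
  geom_boxplot()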

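Yes, fuel type appears to be related to fuel efficiency. In the boxplot, diesel (d) vehicles have a higher median highway mileage (hwy) than vehicles using regular (r) or premium (p) petrol. The plot shows an association in the observed data; on its own it does not prove that fuel type causes the difference.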
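
The visual impression from the boxplot can be checked against simple numbers. The sketch below is one possible way to do that, assuming the mpg dataset from ggplot2 is available; the name fl_summary is introduced here only for illustration. It reports the median hwy and the number of vehicles for each fuel type, which helps show how much data sits behind each box.

```r
library(ggplot2)  # provides the mpg dataset

# Median highway mpg for each fuel type
med <- tapply(mpg$hwy, mpg$fl, median)

# Number of vehicles recorded for each fuel type
n <- table(mpg$fl)

# Combine both into one small table (fl_summary is an illustrative name)
fl_summary <- data.frame(
  fl = names(med),
  median_hwy = as.vector(med),
  n = as.vector(n[names(med)])
)
print(fl_summary)
```

A fuel type with only a handful of vehicles gives a less reliable box than one with many, so the counts are worth reading alongside the medians.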

Question 3: Do you think there is a difference in fuel economy between vehicles made in 1999 and 2008? (When plotting with the “year” variable, use as.factor(year) to convert it to a categorical variable. This will be explained in future classes.)

ggplot(data = mpg, mapping = aes(x = as.factor(year), y = hwy)) +
  stat_boxplot(geom = "errorbar", width = 0.5) +
  geom_boxplot()

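Based on the boxplot, there is no apparent difference in highway fuel economy (hwy) between vehicles made in 1999 and those made in 2008: the two medians are nearly identical and the distributions overlap substantially. A boxplot alone cannot establish statistical significance, so this is a visual judgment.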
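
To put numbers on what the boxes show, the quartiles of hwy can be computed for each year. This is a minimal sketch, assuming the mpg dataset from ggplot2; the name year_quartiles is introduced here only for illustration.

```r
library(ggplot2)  # provides the mpg dataset

# Split hwy by model year, then compute the 25th, 50th and 75th percentiles
# for each group. The result is a matrix with one column per year.
year_quartiles <- sapply(
  split(mpg$hwy, mpg$year),
  quantile,
  probs = c(0.25, 0.5, 0.75)
)
print(year_quartiles)
```

The middle row is the median drawn as the line inside each box, and the first and third rows are the box edges, so similar columns correspond to similar-looking boxes.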

Question 4: What happens if you make a scatter plot of class vs drv? Do you think this plot is useful or not?

ggplot(data = mpg, mapping = aes(x = class, y = drv)) +
  geom_point()

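The scatter plot of class versus drv is not very useful. Both variables are categorical, so every vehicle lands on one of a few grid positions and vehicles with the same class and drv are drawn on top of each other. The plot shows which combinations occur in the data, but not how many vehicles each point represents.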
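
One possible way to recover the hidden counts is sketched below, assuming the mpg dataset from ggplot2. It tabulates the combinations with table() and redraws the plot with geom_count(), which sizes each point by the number of observations at that position. The names combo_counts and p are introduced here only for illustration.

```r
library(ggplot2)  # provides ggplot(), geom_count() and the mpg dataset

# Cross-tabulate class against drv to see how many vehicles share each combination
combo_counts <- table(mpg$class, mpg$drv)
print(combo_counts)

# Same axes as the scatter plot, but point size now reflects the count
p <- ggplot(data = mpg, mapping = aes(x = class, y = drv)) +
  geom_count()
print(p)
```

geom_jitter() would be another option: it adds a small random offset to each point so that overlapping vehicles become visible individually.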