What is the most popular fuel type in this data set?

library(ggplot2)
library(dplyr)
mpg %>% 
  count(fl) %>% 
  arrange(desc(n))
## # A tibble: 5 × 2
##   fl        n
##   <chr> <int>
## 1 r       168
## 2 p        52
## 3 e         8
## 4 d         5
## 5 c         1


The most popular fuel type is regular petrol (“r”), with 168 of the 234 observations.
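To put the counts in proportion, the same table can be extended with each fuel type's share of the data set. This is a minimal sketch; the names `fuel_share` and `share` are my own and not part of the assignment.

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)

# Count vehicles per fuel type, largest first, and add each type's share
fuel_share <- mpg %>%
  count(fl, sort = TRUE) %>%
  mutate(share = n / sum(n))

fuel_share
```

Regular petrol accounts for 168 / 234, or roughly 72%, of the vehicles.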


Regarding the fuel type variable, the value “d” represents diesel, “p” represents premium (petrol) and “r” represents regular (petrol). Do you think there is an effect of fuel type on how many miles a vehicle can run on average per gallon of fuel?

ggplot(mpg, aes(x = fl, y = hwy, fill = fl)) +
  geom_boxplot() +
  labs(title = "Fuel Economy by Fuel Type", x = "Fuel Type", y = "Highway MPG") 


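Yes, there is a visible effect. Diesel (“d”) vehicles tend to have noticeably higher highway miles per gallon than premium (“p”) or regular (“r”) petrol vehicles. With only 5 diesel vehicles in the data, though, this comparison should be read with caution.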
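A numeric summary can back up what the boxplot suggests. The sketch below reports the group size, median and mean highway mileage for each fuel type; the column names `median_hwy` and `mean_hwy` are my own choices. Note that a boxplot draws the median, not the mean, so both are shown.

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)

# Group size, median and mean highway mpg per fuel type
hwy_by_fuel <- mpg %>%
  group_by(fl) %>%
  summarise(
    n = n(),
    median_hwy = median(hwy),
    mean_hwy = mean(hwy)
  ) %>%
  arrange(desc(median_hwy))

hwy_by_fuel
```

The `n` column matters here: groups with very few vehicles give much less reliable averages than the large regular and premium groups.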


Do you think there is a difference in fuel economy between vehicles made in 1999 and 2008? (When plotting with the “year” variable, use as.factor(year) to convert it to a categorical variable. This will be explained in future classes.)

ggplot(mpg, aes(x = as.factor(year), y = hwy, fill = as.factor(year))) +
  geom_boxplot() +
  labs(title = "Fuel Economy Comparison: 1999 vs 2008", x = "Year", y = "Highway MPG")


Not much. The two boxplots look similar, although 1999 tends to have more outliers than 2008, and these appear to be front-wheel-drive vehicles.
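The boxplot itself does not show drive type, so the outliers can be listed directly to see which vehicles they are. This is a rough sketch using the usual 1.5 × IQR rule; `quantile()` may compute the quartiles slightly differently from the hinges that `geom_boxplot()` draws, so the flagged points could differ a little from the plot. The names `hwy_outliers`, `lower` and `upper` are my own.

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)

# Flag vehicles whose highway mpg falls outside 1.5 * IQR of their year's quartiles
hwy_outliers <- mpg %>%
  group_by(year) %>%
  mutate(
    lower = quantile(hwy, 0.25) - 1.5 * IQR(hwy),
    upper = quantile(hwy, 0.75) + 1.5 * IQR(hwy)
  ) %>%
  ungroup() %>%
  filter(hwy < lower | hwy > upper) %>%
  select(year, manufacturer, model, drv, fl, hwy)

hwy_outliers

# Number of flagged vehicles per year and drive type
count(hwy_outliers, year, drv)
```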


What happens if you make a scatter plot of class vs drv? Do you think this plot is useful or not?

ggplot(mpg, aes(x = class, y = drv)) +
  geom_point()


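Not really useful. Both class and drv are categorical, so many vehicles are drawn on top of each other at the same point. The plot only shows which class and drive combinations exist, not how many vehicles fall into each.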
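One possible way to make this comparison more informative is to show how many vehicles share each combination. The sketch below tabulates the combinations with `count()` and draws them with `geom_count()`, which scales each point by the number of observations at that position. The object names and plot labels are my own.

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)

# Number of vehicles for each class and drive type combination
class_drv_counts <- count(mpg, class, drv)
class_drv_counts

# Same axes as the scatter plot, but point size reflects the count
p <- ggplot(mpg, aes(x = class, y = drv)) +
  geom_count() +
  labs(title = "Vehicle Class by Drive Type", x = "Class", y = "Drive Type")
print(p)
```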