Question 1: What is the most popular fuel type in the mpg data
set?
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
table(mpg$fl)
##
## c d e p r
## 1 5 8 52 168
From the table above, regular gasoline (r) is the most common fuel type, appearing in 168 of the 234 vehicles in the data set.
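The same answer can be obtained programmatically rather than by reading the table. The following is a minimal sketch, assuming dplyr and ggplot2 are installed; the names `fuel_counts` and `top_fuel` are illustrative and not part of the original assignment.

```r
library(dplyr)
library(ggplot2)  # provides the mpg data set

# Count vehicles per fuel type, sorted so the most common comes first
fuel_counts <- count(mpg, fl, sort = TRUE)
fuel_counts

# Pull out the code of the most common fuel type
top_fuel <- fuel_counts$fl[1]
top_fuel
```

Sorting the counts makes the answer the first row, which is easier to check than scanning an unsorted table.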
Question 2: Regarding the fuel type variable, the value “d”
represents diesel, “p” represents premium (petrol) and “r” represents
regular (petrol). Do you think there is an effect of fuel type on how
many miles a vehicle can run on average per gallon of fuel?
ggplot(data = mpg, mapping = aes(x = fl, y = hwy)) +
  stat_boxplot(geom = "errorbar", width = 0.5) +
  geom_boxplot()

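Yes, fuel type appears to be related to fuel efficiency. In the boxplot, vehicles using diesel (d) generally have higher highway miles per gallon than those using regular (r) or premium (p) gasoline. This comparison is tentative, however: the table in Question 1 shows only 5 diesel vehicles, and the plot does not account for other differences between vehicles, such as class or engine size.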
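A numeric summary can back up what the boxplot suggests. This is a sketch under the assumption that the median is a reasonable measure of "typical" highway mileage here; the name `hwy_by_fuel` is hypothetical, and the group sizes are included because some fuel types have very few vehicles.

```r
library(dplyr)
library(ggplot2)  # provides the mpg data set

# Group size, median and mean highway mpg for each fuel type,
# ordered from the highest median to the lowest
hwy_by_fuel <- mpg |>
  group_by(fl) |>
  summarise(
    n = n(),
    median_hwy = median(hwy),
    mean_hwy = mean(hwy)
  ) |>
  arrange(desc(median_hwy))

hwy_by_fuel
```

Reading `n` alongside `median_hwy` shows how much data supports each group's value.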
Question 3: Do you think there is a difference in fuel economy for
vehicles made in 1999 and 2008? (When plotting with the “year” variable, use
as.factor(year) to convert it to a categorical variable. This will be
explained in future classes.)
ggplot(data = mpg, mapping = aes(x = as.factor(year), y = hwy)) +
  stat_boxplot(geom = "errorbar", width = 0.5) +
  geom_boxplot()

Based on the boxplot, there is no clear difference in highway fuel economy between vehicles made in 1999 and those made in 2008: the medians are nearly identical and the two distributions overlap substantially.
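The quantities a boxplot draws (quartiles and median) can also be computed directly for each year, which makes the comparison easier to verify. This is a minimal sketch, not a formal statistical test; `hwy_by_year` is an illustrative name.

```r
library(dplyr)
library(ggplot2)  # provides the mpg data set

# First quartile, median and third quartile of highway mpg per model year
hwy_by_year <- mpg |>
  group_by(year) |>
  summarise(
    n = n(),
    q1 = quantile(hwy, 0.25, names = FALSE),
    median_hwy = median(hwy),
    q3 = quantile(hwy, 0.75, names = FALSE)
  )

hwy_by_year
```

If the two rows show similar medians and overlapping quartile ranges, that agrees with the visual reading of the boxplot.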
Question 4: What happens if you make a scatter plot of class vs drv?
Do you think this plot is useful or not?
ggplot(data = mpg, mapping = aes(x = class, y = drv)) +
  geom_point()

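The scatter plot of class versus drv is not very useful. Both variables are categorical, so every vehicle with the same class and drive type is drawn at exactly the same position. The plot shows which combinations occur but not how many vehicles fall into each one, so it reveals no meaningful relationship.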
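One possible way to make the overlap visible is to count the vehicles in each combination and map that count to point size. The sketch below uses `geom_count()` from ggplot2 and a matching count table; it is offered as an alternative, not as part of the original exercise, and the names `combos` and `p` are hypothetical.

```r
library(dplyr)
library(ggplot2)  # provides the mpg data set

# Table of how many vehicles share each class/drive-type combination
combos <- count(mpg, class, drv)
combos

# Same axes as the scatter plot, but point size reflects the count
p <- ggplot(data = mpg, mapping = aes(x = class, y = drv)) +
  geom_count()
p
```

Larger points mark combinations shared by many vehicles, which is the information hidden by overplotting in the plain scatter plot.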