What is the most popular fuel type in this data set?
library(ggplot2)
library(dplyr)
mpg %>%
  count(fl) %>%
  arrange(desc(n))
## # A tibble: 5 × 2
##   fl        n
##   <chr> <int>
## 1 r       168
## 2 p        52
## 3 e         8
## 4 d         5
## 5 c         1
The most popular fuel type is regular petrol (“r”), with 168 of the
234 observations.
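To put the count in context, the same table can also report each fuel type's share of the data set. This is a minimal sketch; the names `fuel_share` and `share` are chosen here for illustration and are not part of the original answer.

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)

# Count vehicles per fuel type, sorted from most to least common,
# then add each type's fraction of all rows.
fuel_share <- mpg %>%
  count(fl, sort = TRUE) %>%
  mutate(share = n / sum(n))

fuel_share
```

`count(fl, sort = TRUE)` is equivalent to `count(fl) %>% arrange(desc(n))`.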
Regarding the fuel type variable, the value “d” represents
diesel, “p” represents premium (petrol) and “r” represents regular
(petrol). Do you think there is an effect of fuel type on how many miles
a vehicle can run on average per gallon of fuel?
ggplot(mpg, aes(x = fl, y = hwy, fill = fl)) +
  geom_boxplot() +
  labs(title = "Fuel Economy by Fuel Type", x = "Fuel Type", y = "Highway MPG")
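Yes, there is a visible effect. Diesel (“d”) vehicles tend to have a
higher median highway miles per gallon than premium (“p”) or regular
(“r”) vehicles. With only 5 diesel observations, though, the comparison
is tentative, and a boxplot alone does not show statistical
significance.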
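A numeric summary can back up what the boxplot suggests. The sketch below reports the group size, median, and mean of `hwy` for each fuel type; the name `hwy_by_fuel` and its column names are illustrative choices, not part of the original answer.

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)

# One row per fuel type: group size plus median and mean highway MPG,
# ordered from highest to lowest median.
hwy_by_fuel <- mpg %>%
  group_by(fl) %>%
  summarise(
    n          = n(),
    median_hwy = median(hwy),
    mean_hwy   = mean(hwy)
  ) %>%
  arrange(desc(median_hwy))

hwy_by_fuel
```

Reporting `n` next to each median makes it easy to see which groups are too small to support a firm conclusion.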
Do you think there is a difference in fuel economy for vehicles
made in 1999 and 2008? (When plotting with the “year” variable, use
as.factor(year) to convert it to a categorical variable. This will be
explained in future classes.)
ggplot(mpg, aes(x = as.factor(year), y = hwy, fill = as.factor(year))) +
  geom_boxplot() +
  labs(title = "Fuel Economy Comparison: 1999 vs 2008", x = "Year", y = "Highway MPG")
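Not much. The two distributions look similar, but the 1999 vehicles
tend to have more outliers than the 2008 vehicles. Attributing those
outliers to front-wheel-drive cars would require also looking at drv,
which this plot does not show.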
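To see which vehicles fall outside the whiskers, and what drive type they have, one could list the outliers for each year. This is a sketch under the usual 1.5 × IQR rule; `geom_boxplot()` computes its hinges slightly differently from `quantile()`, so the result approximates the points drawn in the plot. The name `hwy_outliers` is illustrative.

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)

# Within each year, keep rows whose hwy lies more than 1.5 * IQR
# beyond the first or third quartile.
hwy_outliers <- mpg %>%
  group_by(year) %>%
  filter(
    hwy > quantile(hwy, 0.75) + 1.5 * IQR(hwy) |
      hwy < quantile(hwy, 0.25) - 1.5 * IQR(hwy)
  ) %>%
  ungroup() %>%
  select(year, manufacturer, model, fl, drv, hwy)

hwy_outliers
```

The `drv` and `fl` columns in the result show whether the outliers share a drive type or a fuel type.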
What happens if you make a scatter plot of class vs drv? Do you
think this plot is useful or not?
ggplot(mpg, aes(x = class, y = drv)) +
  geom_point()
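Not really useful. Both class and drv are categorical, so many
observations are plotted on top of each other. The plot only shows
which class and drv combinations exist, not how many vehicles fall
into each.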
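One possible alternative, sketched below, is to make the number of vehicles in each combination visible, either as a table with `count()` or as a plot with `geom_count()`, which scales each point by the number of observations at that position. The names `class_drv_counts` and `p` and the axis labels are illustrative choices.

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)

# Table of how many vehicles fall into each class/drv combination.
class_drv_counts <- mpg %>%
  count(class, drv)

class_drv_counts

# Same information as a plot: point size reflects the count.
p <- ggplot(mpg, aes(x = class, y = drv)) +
  geom_count() +
  labs(x = "Vehicle Class", y = "Drive Type", size = "Vehicles")

p
```

`geom_jitter()` is another option if the goal is only to separate overlapping points rather than to count them.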