Q1: What is the most popular fuel type in the ‘mpg’ data
set?
ggplot(mpg) +
geom_bar(aes(x = fl, color = fl, fill = fl))
Answer: According to the plot, the most popular fuel type is regular petrol.
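The bar heights can also be checked numerically. The following is a minimal sketch, assuming the ggplot2 package (which ships the mpg data set) is installed; the variable names fl_counts and top_fl are only illustrative.

```r
library(ggplot2)  # provides the mpg data set

# Count vehicles per fuel type code, sorted from most to least common
fl_counts <- sort(table(mpg$fl), decreasing = TRUE)
fl_counts

# Code of the most frequent fuel type ("r" stands for regular petrol)
top_fl <- names(fl_counts)[1]
top_fl
```

Reading the counts next to the bar chart makes it easier to see how far apart the fuel types are, which the plot alone only shows approximately.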
Q2: Regarding the fuel type variable, the value “d”
represents diesel, “p” represents premium (petrol) and “r” represents
regular (petrol). Do you think there is an effect of fuel type on how
many miles a vehicle can run on average per gallon of
fuel?
ggplot(data = mpg) +
geom_boxplot(mapping = aes(x = fl, y = hwy)) +
xlab("Fuel Type") + ylab("Highway Miles per Gallon") +
ggtitle("Fuel Economy (Highway) vs Fuel Type") +
theme(plot.title = element_text(hjust = 0.5))
ggplot(data = mpg) +
geom_boxplot(mapping = aes(x = fl, y = cty)) +
xlab("Fuel Type") + ylab("City Miles per Gallon") +
ggtitle("Fuel Economy (City) vs Fuel Type") +
theme(plot.title = element_text(hjust = 0.5))
Answer: According to the data, diesel is more fuel efficient than both premium and regular petrol. Between the two petrol grades, using a higher octane fuel than the car manufacturer recommends generally does not significantly improve gas mileage. The primary factor is using the correct fuel type for the engine, since a lower octane than required can reduce efficiency and potentially damage the engine.
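To back up the reading of the box plots, the group medians can be computed directly. This is a sketch under the assumption that the median is the statistic of interest (it is the centre line of each box); fl_medians is an illustrative name, and the fuel types can differ a lot in how many vehicles they contain, so small groups should be interpreted with care.

```r
library(ggplot2)  # provides the mpg data set

# Median highway and city mileage for each fuel type code
fl_medians <- aggregate(cbind(hwy, cty) ~ fl,
                        data = as.data.frame(mpg),
                        FUN = median)
fl_medians
```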
Q3: Do you think there is a difference in fuel economy
for vehicles made in 1999 and 2008? (When plotting with the “year” variable,
use as.factor(year) to convert it to a categorical variable. This will be
explained in future classes.)
ggplot(data = mpg) +
geom_point(mapping = aes(x = hwy, y = cty, color = as.factor(year)))
ggplot(data = mpg, mapping = aes(x = as.factor(year), y = cty)) +
stat_boxplot(geom = "errorbar", width = 0.5) +
geom_boxplot() +
labs(x = "Year", y = "City Miles per Gallon",
title = "Fuel Economy (City) Between 1999 and 2008") +
theme(plot.title = element_text(hjust = 0.5))
Answer: In the scatter plot, a few 1999 samples
(top-right corner) are more fuel-efficient than some 2008 samples
(bottom-left corner). Meanwhile, the majority of the samples (in the
middle section) show no clear difference between the two years. (I included
the scatter plot to see whether a scatter plot could be used for this question…)
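A numeric summary per model year can complement the plots when the visual difference is unclear. The sketch below assumes that medians and means of cty and hwy are adequate summaries; year_medians and year_means are illustrative names, not part of the original exercise.

```r
library(ggplot2)  # provides the mpg data set

mpg_df <- as.data.frame(mpg)

# Median city and highway mileage for each model year
year_medians <- aggregate(cbind(cty, hwy) ~ year, data = mpg_df, FUN = median)
year_medians

# Mean city and highway mileage for each model year
year_means <- aggregate(cbind(cty, hwy) ~ year, data = mpg_df, FUN = mean)
year_means
```

If the two rows are close to each other, that supports the observation that most vehicles show no clear difference between the years.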
Q4: What happens if you make a scatter plot of class vs
drv? Do you think this plot is useful or not?
ggplot(data = mpg) +
geom_point(mapping = aes(x = class, y = drv)) +
labs(x = "Vehicle Class", y = "Drive Train Type",
title = "Vehicle Class vs Drive Train Type") +
theme(plot.title = element_text(hjust = 0.5))
Answer: The plot hides how many vehicles fall into each
combination, because overlapping samples stack into a single dot, so it
is of limited use. It does still show which
combinations of class and drive type exist or are absent. For instance,
all pickups are 4-wheel drive.
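One possible way to recover the information lost to overlapping points is geom_count(), which maps the number of observations at each location to point size. This is only a sketch of one option (jittering or a tile plot would be alternatives); p_count and class_drv are illustrative names.

```r
library(ggplot2)  # provides the mpg data set

# Point size now reflects how many vehicles share each class/drv combination
p_count <- ggplot(data = mpg) +
  geom_count(mapping = aes(x = class, y = drv)) +
  labs(x = "Vehicle Class", y = "Drive Train Type", size = "Vehicles",
       title = "Vehicle Class vs Drive Train Type") +
  theme(plot.title = element_text(hjust = 0.5))
p_count

# The same information as a contingency table; zeros mark absent combinations
class_drv <- table(mpg$drv, mpg$class)
class_drv
```

The table makes the absent combinations explicit, for example the zero counts for front- and rear-wheel-drive pickups.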