Lab 2:

Load the necessary packages:

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Q2: Regarding the fuel type variable, the value “d” represents diesel, “p” represents premium (petrol) and “r” represents regular (petrol). Do you think there is an effect of fuel type on how many miles a vehicle can run on average per gallon of fuel?

In my opinion, fuel type does affect how many miles a vehicle can travel per gallon: the box plots below show that the distribution of mpg differs between fuel types, both in the city and on the highway.

Highway mpg

ggplot(data = mpg, mapping = aes(x = fl, y = hwy)) +
  stat_boxplot(geom = "errorbar", width = 0.5) +
  geom_boxplot()

City mpg

ggplot(data = mpg, mapping = aes(x = fl, y = cty)) +
  stat_boxplot(geom = "errorbar", width = 0.5) +
  geom_boxplot()
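To put numbers next to the box plots, the mean mpg per fuel type can also be tabulated. This is only a minimal sketch: the name fuel_summary and the choice of the mean as the summary statistic are my own assumptions, not part of the lab instructions.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Hypothetical helper table: group size and mean city/highway mpg per fuel type
fuel_summary <- mpg %>%
  group_by(fl) %>%
  summarise(
    n        = n(),        # number of vehicles with this fuel type
    mean_cty = mean(cty),  # average city mpg
    mean_hwy = mean(hwy),  # average highway mpg
    .groups  = "drop"
  )

fuel_summary
```

The n column matters when reading the result, because a fuel type with very few vehicles gives a much less reliable average than one with many.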

Q3: Do you think there is a difference in fuel economy for vehicles made in 1999 and 2008? (When plotting with “year” variable, use as.factor(year) to convert it to categorical variables. This will be explained in future classes.)

There are differences in fuel economy between vehicles made in 1999 and 2008. The most obvious thing we can tell from the graph is that vehicles manufactured in 1999 do not use fuel types c or e. Vehicles from 2008 use fuel type p more often than those from 1999, and they barely use fuel type d.

ggplot(data = mpg) +
  geom_point(mapping = aes(x = fl, y = cty, color = factor(year)), position = "jitter")
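A more direct way to compare fuel economy between the two years might be to put as.factor(year) on the x axis, as the question suggests. The sketch below is one possible approach rather than the required answer: the names year_summary and year_plot and the use of highway mpg are my own assumptions.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Hypothetical summary: number of vehicles and median mpg for each model year
year_summary <- mpg %>%
  group_by(year) %>%
  summarise(
    n          = n(),
    median_cty = median(cty),
    median_hwy = median(hwy),
    .groups    = "drop"
  )
year_summary

# Box plot of highway mpg, with year converted to a categorical variable
year_plot <- ggplot(data = mpg, mapping = aes(x = as.factor(year), y = hwy)) +
  geom_boxplot() +
  labs(x = "Model year", y = "Highway mpg")
year_plot
```

Swapping hwy for cty in the aes() call gives the same comparison for city driving.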

Q4: What happens if you make a scatter plot of class vs drv? Do you think this plot is useful or not?

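This plot is slightly hard to read at first and not that useful, but it is still readable. Because both drv and class are categorical, vehicles with the same combination are drawn on top of each other, so the plot shows which combinations exist but not how many vehicles fall into each one.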

ggplot(data = mpg) +
  geom_point(mapping = aes(x = drv, y = class))
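Since a scatter plot of two categorical variables draws every vehicle with the same drv and class at the same position, one possible improvement is to show how many vehicles share each position. The sketch below uses geom_count() from ggplot2 and count() from dplyr. The names combo_counts and count_plot are my own, and this is only one alternative, not the plot the question asks for.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Hypothetical table: number of vehicles for each drv/class combination
combo_counts <- mpg %>%
  count(drv, class)
combo_counts

# Same axes as the scatter plot, but point size reflects the number of vehicles
count_plot <- ggplot(data = mpg) +
  geom_count(mapping = aes(x = drv, y = class))
count_plot
```

With point size mapped to the count, it becomes visible which combinations are common and which are rare.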