Lab Homework

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights13)  # flights data set; dplyr is already attached by tidyverse

1. In the mpg data set, which manufacturer produced the most fuel-efficient SUVs?

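suv <- filter(mpg, class == "suv")
# Compare fuel economy, not model counts: average highway mpg per manufacturer
suv_avg <- summarise(suv, avg_hwy = mean(hwy), .by = manufacturer)
ggplot(suv_avg, aes(x = reorder(manufacturer, avg_hwy), y = avg_hwy)) +
  geom_col(fill = "skyblue") +
  labs(title = "Average SUV Highway MPG by Manufacturer",
       x = "Manufacturer",
       y = "Average Highway MPG") +
  theme(plot.title = element_text(hjust = 0.5))

Conclusion: In the mpg data set, Subaru produced the most fuel-efficient SUVs, with the highest average highway mileage (about 25 mpg). Chevrolet and Ford have the most SUV models in the data, but a model count does not measure fuel economy.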
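A numeric summary can back up a bar chart for this question. The following is a minimal sketch, assuming that average highway (`hwy`) and city (`cty`) mileage are reasonable measures of fuel economy; the names `suv_ranking`, `n_models`, `avg_cty`, and `avg_hwy` are illustrative and not part of the assignment.

```r
library(dplyr)
library(ggplot2)  # provides the mpg data set

# Rank SUV manufacturers by mean highway mileage (assumed measure of economy)
suv_ranking <- mpg |>
  filter(class == "suv") |>
  summarise(
    n_models = n(),        # number of SUV rows per manufacturer
    avg_cty  = mean(cty),  # mean city mpg
    avg_hwy  = mean(hwy),  # mean highway mpg
    .by = manufacturer
  ) |>
  arrange(desc(avg_hwy))

suv_ranking
```

Keeping `n_models` next to the averages shows how many models each mean is based on, which helps when a manufacturer has only a handful of SUVs in the data.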

2. In the mpg data set, which SUV manufacturer improved fuel economy most between 1999 and 2008?

suv <- filter(mpg, class == "suv", year %in% c(1999, 2008))
# Average highway mpg per manufacturer and year; .by returns ungrouped output
result <- summarise(suv, avg_hwy = mean(hwy), .by = c(manufacturer, year))
ggplot(result, aes(x = manufacturer, y = avg_hwy, fill = factor(year))) +
  geom_col(position = "dodge") +
  labs(title = "SUV Fuel Economy (1999 vs 2008)",
       x = "Manufacturer",
       y = "Average Highway MPG",
       fill = "Year") +
  theme(plot.title = element_text(hjust = 0.5))

Conclusion: Land Rover improved SUV fuel economy the most between 1999 and 2008. Its average highway mileage rose from 15 to 18 mpg, the largest gain among the SUV manufacturers in the data set.
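The size of each manufacturer's improvement is hard to read off a dodged bar chart, so it may help to compute the change directly. This is a minimal sketch, assuming improvement is measured as the difference in average highway mileage between the two model years; `suv_change`, `hwy_1999`, `hwy_2008`, and `change` are illustrative names.

```r
library(dplyr)
library(tidyr)
library(ggplot2)  # provides the mpg data set

suv_change <- mpg |>
  filter(class == "suv") |>
  # Mean highway mpg for each manufacturer in each model year
  summarise(avg_hwy = mean(hwy), .by = c(manufacturer, year)) |>
  # One row per manufacturer, one column per year: hwy_1999, hwy_2008
  pivot_wider(names_from = year, values_from = avg_hwy, names_prefix = "hwy_") |>
  # Positive values mean fuel economy improved
  mutate(change = hwy_2008 - hwy_1999) |>
  arrange(desc(change))

suv_change
```

The first row of `suv_change` is the manufacturer with the largest gain. Because each average rests on only a few models, small differences between manufacturers should be read with caution.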

3. In the flights data set, pick a variable other than carrier and analyze whether it correlates with long-delay flights.

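# Long delay: arrival more than 60 minutes late
long_delay <- filter(flights, arr_delay > 60)
# dep_time is stored as HHMM, so 100-wide bins starting at 0 align with clock hours
ggplot(long_delay, aes(x = dep_time)) +
  geom_histogram(binwidth = 100, boundary = 0, fill = "steelblue") +
  labs(title = "Distribution of Departure Time for Long Delays",
       x = "Departure Time (HHMM)",
       y = "Count") +
  theme(plot.title = element_text(hjust = 0.5))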

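Conclusion: Long-delay flights (arrival more than 60 minutes late) are concentrated in the afternoon and evening. This suggests that later departure times are associated with long delays. Because the histogram shows raw counts, part of the pattern may reflect how many flights depart at each hour, so the share of long-delay flights per hour is the more direct evidence.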
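A count of long delays by departure time does not account for how many flights leave at each hour. The following is a minimal sketch of a rate-based check, assuming the scheduled departure hour (the `hour` column of `flights`) is an acceptable stand-in for departure time; `delay_by_hour`, `n_flights`, and `share_long` are illustrative names.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)  # provides the flights data set

delay_by_hour <- flights |>
  # Drop flights with no recorded arrival delay (e.g. cancelled flights)
  filter(!is.na(arr_delay)) |>
  summarise(
    n_flights  = n(),                  # flights scheduled in this hour
    share_long = mean(arr_delay > 60), # fraction arriving over 60 minutes late
    .by = hour
  ) |>
  arrange(hour)

ggplot(delay_by_hour, aes(x = hour, y = share_long)) +
  geom_col(fill = "steelblue") +
  labs(title = "Share of Long-Delay Flights by Scheduled Departure Hour",
       x = "Scheduled Departure Hour",
       y = "Share of Flights Delayed > 60 min") +
  theme(plot.title = element_text(hjust = 0.5))
```

If `share_long` rises through the day, that supports the association independently of flight volume. Hours with very small `n_flights` give unstable shares and are best interpreted with care.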