── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.2.0 ✔ readr 2.2.0
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.2 ✔ tibble 3.3.1
✔ lubridate 1.9.5 ✔ tidyr 1.3.2
✔ purrr 1.2.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Dataset
library(nycflights23)  # assumed: the package that provides flights, per the plot caption
data(flights)
Departure Delay by Specific Hours
plot1 <- flights |>
  # Keep delays under 50 minutes and the five scheduled departure hours of interest
  filter(dep_delay < 50, hour %in% c(6, 9, 12, 15, 18)) |>
  ggplot(aes(x = factor(hour), y = dep_delay, fill = factor(hour))) +
  geom_boxplot(alpha = 0.8, width = 0.8) +
  labs(
    title = "Side-by-Side Boxplot of Departure Delay by Specific Hours (6, 9, 12, 15, 18)",
    x = "Hour",
    y = "Departure Delay (minutes)",
    fill = "Hour",
    caption = "Source: New York City Flights23"
  )

plot1
Essay
I created a side-by-side boxplot of departure delay for flights scheduled to depart in the 6, 9, 12, 15, and 18 o'clock hours. Because the dataset is large, I kept only flights with a departure delay of less than 50 minutes so the graph would be easier to read, and I used hour %in% c(6,9,12,15,18) to keep only flights scheduled to depart in those hours. Each color represents a different hour, and the legend shows which color corresponds to which hour. The line inside each box is the median, which shows the typical delay for that hour. The whiskers show the spread of the data, and the individual points mark possible outliers, which are flights with unusually long delays or unusually early departures.

From the plot, I noticed that flights later in the day, especially around 15:00 and 18:00, show a wider spread of delays than earlier flights. This may happen because airports become busier during those hours. I also noticed that some median values are slightly below zero, which means that at least half of the plotted flights in those hours departed a little earlier than their scheduled departure time.
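The observations read off the boxplot can also be checked numerically. The following is a minimal sketch, not part of the original analysis: delay_summary is a hypothetical helper name, and toy is made-up data used only to show the shape of the result. It applies the same filter as the plot and reports the median and interquartile range of departure delay for each hour.

```r
library(dplyr)

# Hypothetical helper: per-hour summary using the same filter as the boxplot.
# Rows with a missing dep_delay are dropped by filter(), as in the plot.
delay_summary <- function(df, hours = c(6, 9, 12, 15, 18), max_delay = 50) {
  df |>
    filter(dep_delay < max_delay, hour %in% hours) |>
    group_by(hour) |>
    summarise(
      n_flights    = n(),
      median_delay = median(dep_delay),
      q1           = quantile(dep_delay, 0.25, names = FALSE),
      q3           = quantile(dep_delay, 0.75, names = FALSE),
      iqr          = IQR(dep_delay),
      .groups      = "drop"
    )
}

# Made-up toy data (not from the flights dataset), for illustration only.
toy <- tibble(
  hour      = c(6, 6, 6, 18, 18, 18, 18, 7),
  dep_delay = c(-5, -3, 0, -2, 10, 30, 75, 4)
)

toy_summary <- delay_summary(toy)
print(toy_summary)
```

With the real data loaded, delay_summary(flights) should give one row per hour. A larger iqr at 15 and 18 than at 6 and 9 would support the wider spread seen later in the day, and a negative median_delay would confirm the early departures.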