# Filter the dataset to focus on reasonable delays (between -20 and 200 minutes)
flights_clean <- flights %>%
  filter(dep_delay > -20, dep_delay < 200)

ggplot(flights_clean, aes(x = dep_delay, fill = dep_delay < 0)) +
  geom_histogram(binwidth = 5, color = "black") +
  # Light blue and light green colors
  scale_fill_manual(values = c("#AEDFF7", "#90EE90"), name = "Early Departure") +
  labs(
    title = "Distribution of Departure Delays (Under 200 Minutes, 5-Minute Bins)",
    x = "Departure Delay (minutes)",
    y = "Number of Flights",
    caption = "Source: nycflights23 dataset"
  ) +
  theme_minimal()
Explanation:
The histogram shows the distribution of departure delays for flights from NYC airports in 2023, restricted to delays under 200 minutes, with each bar covering a 5-minute range. The x-axis shows the departure delay in minutes, and the y-axis shows the number of flights. Early departures (negative delays) are drawn in light green, while flights that left on time or late are drawn in light blue. The dataset was filtered with dplyr's filter() to keep delays strictly between -20 and 200 minutes, which excludes extreme outliers. The bar around 0 is part green and part blue because that 5-minute bin straddles zero: flights that depart exactly on time (0 minutes delay) fall into the same bin as flights that depart slightly early or slightly late, and the two groups are stacked within the one bar. Most flights cluster around 0 minutes, indicating that they depart very close to their scheduled time. Beyond about 50 minutes of delay the number of flights drops off sharply, and very few flights are delayed by more than 100 minutes.
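
The mixed-colour bar at 0 can be reproduced on a handful of values. The sketch below uses base R's cut() on toy delays (made up for illustration, not taken from the dataset) and assumes the histogram's central bin runs from about -2.5 to 2.5 minutes, which is what ggplot2's default bin alignment should give for a 5-minute bin width. It then compares that with bins that have an edge exactly at 0.

```r
# Whole-minute delays around zero (toy values, not from nycflights23)
delays <- c(-2, -1, 0, 1, 2)

# One 5-minute bin centred on 0: (-2.5, 2.5]
# (assumed to match the histogram's central bar)
centred <- cut(delays, breaks = c(-2.5, 2.5))

# Two 5-minute bins with an edge at 0, closed on the left: [-5, 0) and [0, 5)
aligned <- cut(delays, breaks = c(-5, 0, 5), right = FALSE)

# Cross-tabulate bin membership against the early / not-early flag
print(table(bin = centred, early = delays < 0))
print(table(bin = aligned, early = delays < 0))
```

With the centred bin, early and non-early flights share a single bar, so it is drawn in both colours. With an edge at 0 they fall into separate bins. In the plot itself, the analogous change would likely be `geom_histogram(binwidth = 5, boundary = 0, closed = "left")`, which should give every bar a single colour.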