DATA110-W5-HW-NYCflights

Author

ZS

Create one plot to visualize one aspect of this dataset. The plot may be any type we have covered so far in this class (bar graphs, scatterplots, boxplots, histograms, treemaps, heatmaps, streamgraphs, or alluvials).

Requirements for the plot:

  1. Include at least one dplyr command (filter, arrange, summarize, group_by, select, mutate, ...)

  2. Include labels for the x- and y-axes

  3. Include a title and caption for the data source

  4. Your plot must incorporate at least 2 colors

  5. Include a legend that indicates what the colors represent

  6. Write a brief paragraph that describes the visualization you have created and at least one aspect of the plot that you would like to highlight.

#setup + import data
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights23)
data("flights")
options(scipen = 999)
# Count flights and average the departure delay for each scheduled hour
vdf <- flights |>
  select(hour, dep_delay) |>
  group_by(hour) |>
  summarise(n = n(),
            mean_dep_delay = mean(dep_delay, na.rm = TRUE))

# Bar height shows the flight count; fill color shows the mean delay
vdf |>
  ggplot(aes(x = hour, y = n)) +
  geom_col(aes(fill = mean_dep_delay)) +
  scale_fill_gradient(low = "lightblue", high = "darkred") +
  labs(x = "Scheduled Departure Hour", y = "Total Number of Flights",
       fill = "Mean Delay (Minutes)",
       title = "Total Flights and Mean Delay of Flights, by Hour",
       caption = "Source: RITA, Bureau of Transportation Statistics, https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236")

Total Flights and Mean Delay of Flights, by Hour

The visualization "Total Flights and Mean Delay of Flights, by Hour" is a bar chart showing the number of flights scheduled in each hour of the day, using one year of data (2023) from New York area airports. Flights appear to be scheduled only between 5 AM and midnight, with fewer flights at the earliest and latest hours. Over the year, anywhere from 0 to over 3000 flights are scheduled in each hour. Additionally, the mean departure delay of all flights scheduled in each hour is mapped onto a color scale, from light blue at the minimum of about 0 minutes to dark red at the maximum of about 30 minutes. There appears to be a pattern in which flights scheduled later in the day are delayed longer on average.
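One way to check the apparent pattern numerically, rather than by eye, is to rank-correlate the hour with the mean delay. The sketch below is only an illustration under stated assumptions: `toy_flights` is a small made-up table standing in for `flights` (its values are not real data), and the choice of Spearman correlation is an assumption, not part of the original analysis.

```r
library(dplyr)

# Hypothetical toy data standing in for `flights`; values are invented
toy_flights <- tibble(
  hour      = c(6, 6, 12, 12, 18, 18, 21, 21),
  dep_delay = c(-2, 4, 5, NA, 20, 30, 25, 45)
)

# Same summary as in the plot: flight count and mean delay per hour
hourly <- toy_flights |>
  group_by(hour) |>
  summarise(n = n(),
            mean_dep_delay = mean(dep_delay, na.rm = TRUE))

# Spearman rank correlation: values near 1 mean the mean delay
# rises steadily as the scheduled hour gets later
rho <- cor(hourly$hour, hourly$mean_dep_delay, method = "spearman")
print(hourly)
print(rho)
```

Swapping `toy_flights` for `flights` would apply the same check to the real data; a correlation close to 1 would support the reading of the color gradient, while a value near 0 would suggest the pattern is weaker than it looks.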