NYC Flights HW

Author

Emma Poch

Setting working directory and loading necessary packages

setwd("C:/Users/emmap/Downloads/DATA110")
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights23)
Warning: package 'nycflights23' was built under R version 4.3.3
library(RColorBrewer)
library(gridExtra)
Warning: package 'gridExtra' was built under R version 4.3.3

Attaching package: 'gridExtra'

The following object is masked from 'package:dplyr':

    combine
flightlog <- nycflights23::flights

# Keep only the columns needed for the departure summary
departing <- flightlog |>
  select(dep_delay, carrier, origin)
# Mean departure delay (minutes) for each carrier/origin pair
departing2 <- departing |>
  group_by(carrier, origin) |>
  summarise(mean_delay = mean(dep_delay, na.rm = TRUE), .groups = "drop")
departplot <- departing2 |>
  ggplot(aes(x = carrier, y = origin, fill = mean_delay))+
  geom_tile()+
  scale_fill_distiller(palette = "BuGn")+
  theme_bw()+
  labs(
    title = "Departure Delays by Carrier and Airport",
    x = "Airline",
    y = "Departing Airport",
    fill = "Mean Delay (min)",
    caption = "Source: Bureau of Transportation Statistics"
  )
departplot

# Mean arrival delay (minutes) for each carrier/destination pair
arriving <- flightlog |>
  select(arr_delay, carrier, dest)
arriving2 <- arriving |>
  group_by(carrier, dest) |>
  summarise(mean_delay = mean(arr_delay, na.rm = TRUE), .groups = "drop")
# Fix the seed (arbitrary value) so the random sample is the same on every render
set.seed(110)
arriving3 <- arriving2[sample(nrow(arriving2), 29), ]
arriveplot <- arriving3 |>
  ggplot(aes(x = carrier, y = dest, fill = mean_delay))+
  geom_tile()+
  scale_fill_distiller(palette = "OrRd")+
  theme_bw()+
  labs(
    title = "Arrival Delays by Carrier and Airport",
    x = "Airline",
    y = "Destination Airport",
    fill = "Mean Delay (min)",
    caption = "Source: Bureau of Transportation Statistics"
  )
arriveplot

grid.arrange(departplot, arriveplot)

Reflection

The pair of tile plots I created compares mean delay times for departure and arrival, respectively, by airline. I included the airports as well as the airlines in the hope that, if one factor had a greater influence on delays than the other, this would be visually apparent. The main issue I encountered was that, although the number of carriers was consistent between the two plots, the number of destination airports greatly exceeded the number of origin airports. I attempted to make the two more comparable by randomly selecting 29 observations from the data frame of arrival delays (29 being the total number of observations in the departure data frame), which made the image easier to read. As a result, the arrival plot shows only a subset of the carrier and destination pairs. It should also be noted that not all airlines were equally represented in either the departing or arriving sets. Generally, YX seemed to have the best record, with consistently low delay times at all airports shown for both departure and arrival. B6, on the other hand, displayed fairly high delay times on both charts. OO was an interesting case, as it tended toward higher delay times upon departure but had generally low delay times upon arrival. Although the visualization was not as succinct as I’d hoped it would be, I still feel that it’s a useful visual aid for comparing the efficiency of the various airlines and airports involved.
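The two caveats in the reflection (random sampling and uneven representation of airlines) could be examined with something like the sketch below. It is only an illustration under stated assumptions: the toy data frame and its values are made up to stand in for the flights data, and the names `toy_flights`, `arr_summary`, `n_flights`, and `arr_sample`, as well as the seed value, are hypothetical and not part of the original analysis.

```r
library(dplyr)

# Toy stand-in for the flights data (made-up values, illustration only)
toy_flights <- tibble(
  carrier   = c("YX", "YX", "YX", "B6", "B6", "OO"),
  dest      = c("BOS", "BOS", "ORD", "BOS", "MCO", "ORD"),
  arr_delay = c(-5, NA, 3, 40, 25, 10)
)

# Mean arrival delay per carrier/destination pair, plus the number of
# flights behind each mean, so thinly represented pairs are easy to spot
arr_summary <- toy_flights |>
  group_by(carrier, dest) |>
  summarise(
    mean_delay = mean(arr_delay, na.rm = TRUE),
    n_flights  = n(),
    .groups    = "drop"
  )

# Setting a seed (arbitrary value) makes the random draw repeatable
set.seed(110)
arr_sample <- arr_summary |>
  slice_sample(n = 3)

arr_summary
arr_sample
```

Keeping `n_flights` next to `mean_delay` shows how many flights each tile represents, which helps when judging whether a carrier's low or high mean rests on only a handful of flights.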