New York Flights

Author

Marie-Anne Kemajou

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights23)
data("flights")
flights$carrier[flights$carrier == "UA"]<-"United Airlines"
flights$carrier[flights$carrier == "DL"]<-"Delta Air Lines"
flights$carrier[flights$carrier == "B6"]<-"JetBlue Airways"
flights$carrier[flights$carrier == "AA"]<-"American Airlines"
flights$carrier[flights$carrier == "NK"]<-"Spirit Airlines"
flights$carrier[flights$carrier == "WN"]<-"Southwest Airlines"
flights$carrier[flights$carrier == "AS"]<-"Alaska Airlines"
flights$carrier[flights$carrier == "YX"]<-"Republic Airways"
flights$carrier[flights$carrier == "9E"]<-"Endeavor Air"
flights$carrier[flights$carrier == "HA"]<-"Hawaiian Airlines"
flights$carrier[flights$carrier == "G4"]<-"Allegiant Air"
flights$carrier[flights$carrier == "MQ"]<-"Envoy Air"
flights$carrier[flights$carrier == "OO"]<-"SkyWest Airlines"
flights$carrier[flights$carrier == "F9"]<-"Frontier Airlines"
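The same recoding can also be written as a single lookup against a named vector. The sketch below uses toy carrier codes rather than the flights data, and the names `codes`, `carrier_names`, `full`, and `recoded` are illustrative assumptions, not part of the original analysis:

```r
# Toy carrier codes (illustrative only); "ZZ" is a made-up code with no match
codes <- c("UA", "DL", "ZZ", "UA")

# Named vector: names are carrier codes, values are full airline names
carrier_names <- c(
  UA = "United Airlines",
  DL = "Delta Air Lines",
  B6 = "JetBlue Airways"
)

# Look up each code by name; codes without a match come back as NA
full <- unname(carrier_names[codes])

# Keep the original code wherever no full name was found
recoded <- ifelse(is.na(full), codes, full)
recoded
```

A code that starts with a digit, such as 9E, would need backticks or quotes around its name inside `c()`.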
chrono_flights <- flights %>%
  arrange(dep_delay)
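flights2 <- chrono_flights  %>%
  # cut() uses right-closed intervals, so a lowest break of 0 would turn
  # on-time and early departures (dep_delay <= 0) into NA; -Inf keeps them
  # in "Low Delay"
  mutate(delay_category = cut(
    dep_delay, 
    breaks = c(-Inf, 50, 200, Inf),
    labels = c("Low Delay", "Medium Delay", "High Delay")
  ))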
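Because `cut()` builds right-closed intervals by default, the lowest break decides what happens to on-time and early departures. A small sketch with made-up delay values (not taken from the flights data) compares a lowest break of 0 with a lowest break of -Inf; the names `delays`, `from_zero`, and `from_neg_inf` are illustrative:

```r
# Made-up departure delays in minutes (illustrative only)
delays <- c(-5, 0, 10, 50, 51, 200, 201, NA)
labels <- c("Low Delay", "Medium Delay", "High Delay")

# Lowest break at 0: intervals are (0,50], (50,200], (200,Inf]
from_zero <- cut(delays, breaks = c(0, 50, 200, Inf), labels = labels)

# Lowest break at -Inf: the first interval becomes (-Inf,50]
from_neg_inf <- cut(delays, breaks = c(-Inf, 50, 200, Inf), labels = labels)

# Side-by-side comparison of the two categorizations
data.frame(delays, from_zero, from_neg_inf)
```

With a lowest break of 0, the values -5 and 0 fall outside every interval and become NA, while a lowest break of -Inf places them in "Low Delay".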
ggplot(flights2, aes(x = carrier, y = dep_delay, fill = delay_category)) +
  geom_col() +
  # The mapped aesthetic (fill) has to match the manual scale used here
  scale_fill_manual(values = c("Low Delay" = "pink", 
                               "Medium Delay" = "magenta", 
                               "High Delay" = "red")) +
  labs(
    title = "Departure Delays by Airline",
    x = "Airline",
    y = "Departure Delay in Minutes",
    fill = "Delay Category",
    caption = "Source: Bureau of Transportation Statistics, via nycflights23") +
  theme_dark() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 9))
Warning: Removed 10738 rows containing missing values or values outside the scale range
(`geom_col()`).

This visualization compares airlines by their departure delays in minutes. I used a bar plot and grouped the delays into three categories: red marks a “high delay” (more than 200 minutes), magenta a “medium delay” (more than 50 and up to 200 minutes), and pink a “low delay” (50 minutes or less). I settled on three categories to avoid having too many colors, which I quickly realized would be a problem the first time I created this graph. The graph does a good job of showing the differences between airlines, since you can see which types of delay each airline has and how much each type adds to its total delay minutes.

I struggled a lot once I realized that the variables I picked would need to be color coded differently, and I had to look up how to write the code so that I could control the categories. I also struggled with keeping my code organized. I tried the same code with small tweaks to make the plot less cluttered, then removed old code and forgot that some of it was still necessary for the current plot to work. It was also confusing because I started with a box plot, referencing the airquality homework assignment. When I could not make it look organized, I switched to a scatter plot based on code from a scatter plot I made last year, but that did not work either because it was incompatible with one of the variables I chose.

I am relatively happy with this bar plot, though I would have liked the number scale on the y axis to be different. I did my best to use the code I already knew or had saved from Data 101, what I could research, and what I could reference from this class.
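One possible way to get a more readable y axis, sketched here on made-up data rather than the real flights, is to count flights per airline and delay category before plotting and to format the axis labels with commas. The data frame `toy`, the column `n_flights`, and the object `delay_counts` are hypothetical names, and this approach is an assumption about what a different scale could look like, not the method used above:

```r
library(dplyr)
library(ggplot2)

# Toy data standing in for flights2 (made-up values)
toy <- tibble(
  carrier = c("A", "A", "A", "B", "B"),
  delay_category = factor(
    c("Low Delay", "Low Delay", "High Delay", "Low Delay", "Medium Delay"),
    levels = c("Low Delay", "Medium Delay", "High Delay")
  )
)

# One row per carrier and category, holding the number of flights in each
delay_counts <- toy %>%
  count(carrier, delay_category, name = "n_flights")

# Stacked bars of flight counts, with comma-formatted y axis labels
p <- ggplot(delay_counts,
            aes(x = carrier, y = n_flights, fill = delay_category)) +
  geom_col() +
  scale_y_continuous(labels = scales::label_comma()) +
  labs(x = "Airline", y = "Number of Flights", fill = "Delay Category")
p
```

Plotting counts changes what the bar height means: it shows how many flights fall in each category instead of the summed delay minutes.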