Airplane_Flights

Use the nycflights23 dataset to create a heatmap that explores late arrivals.

Source: FAA Aircraft Registry,
https://www.faa.gov/licenses_certificates/aircraft_certification/aircraft_registry/releasable_aircraft_download/

library(nycflights23)
library(tidyverse)

data(flights)

First, remove flights whose arrival delay is missing (NA).

flights_nona <- flights |>
  filter(!is.na(arr_delay))
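Removing these rows matters because `mean()` returns NA as soon as a single value is missing. The minimal sketch below uses a small hypothetical table (`toy_flights`, invented for illustration rather than taken from nycflights23) to show that behavior, and shows that `tidyr::drop_na()` keeps the same rows as the `filter(!is.na(...))` call above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: four flights, one with a missing arrival delay
toy_flights <- tibble(
  carrier   = c("AA", "AA", "DL", "DL"),
  arr_delay = c(10, NA, -5, 15)
)

# With an NA present, mean() returns NA unless na.rm = TRUE is set
mean(toy_flights$arr_delay)
mean(toy_flights$arr_delay, na.rm = TRUE)

# Two equivalent ways to drop rows whose arr_delay is missing
kept_filter <- toy_flights |> filter(!is.na(arr_delay))
kept_dropna <- toy_flights |> drop_na(arr_delay)
```

Either form works; `filter(!is.na(arr_delay))` makes the condition explicit, while `drop_na()` is shorter when several columns must be complete.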

Simpler (summary) table using group_by and summarise

The question I want to answer with this dataset is which airline carriers have the highest and lowest average arrival delays. To answer it, I make a summary table with the number of flights and the average arrival delay for each carrier, and then plot the averages.

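companydelay <- flights_nona |>
  group_by(carrier) |>
  summarise(count = n(),              # number of flights per carrier
            delay = mean(arr_delay)   # average arrival delay in minutes
            ) |>
  # keep only carriers with more than 20 flights so no average rests on a tiny sample
  filter(count > 20)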
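To make the `group_by()` / `summarise()` / `filter()` pattern concrete, here is a small sketch on hypothetical data (`toy_nona` stands in for `flights_nona`; the values are invented). Each carrier collapses to one row holding its flight count and mean delay, and the count filter then drops carriers with too few flights. The threshold of 1 is arbitrary for this tiny example; the notebook uses 20.

```r
library(dplyr)

# Hypothetical stand-in for flights_nona (values invented for illustration)
toy_nona <- tibble(
  carrier   = c("AA", "AA", "AA", "DL", "DL", "HA"),
  arr_delay = c(10, 20, 30, -10, 0, 50)
)

toy_summary <- toy_nona |>
  group_by(carrier) |>                # one group per carrier code
  summarise(count = n(),              # number of flights in the group
            delay = mean(arr_delay))  # average arrival delay in minutes

# Drop carriers with too few flights (threshold chosen only for this toy example)
toy_filtered <- toy_summary |> filter(count > 1)
toy_filtered
```

Here AA averages 20 minutes over 3 flights and DL averages -5 minutes over 2, while HA is dropped because its single flight is below the threshold.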

Rename the codes in the “carrier” column to airline names

The values in the “carrier” column are two-character airline codes, which should be replaced with the actual company names so that general readers can understand the graph. I found the airline names on airlinecodes.info.

companydelay <- companydelay |>
  mutate(
    carrier = case_when(
      carrier == "9E" ~ "Endeavor",
      carrier == "AA" ~ "American",
      carrier == "AS" ~ "Alaska",
      carrier == "B6" ~ "JetBlue",
      carrier == "DL" ~ "Delta",
      carrier == "F9" ~ "Frontier",
      carrier == "G4" ~ "Allegiant",
      carrier == "HA" ~ "Hawaiian",
      carrier == "MQ" ~ "American Eagle",
      carrier == "NK" ~ "Spirit",
      carrier == "OO" ~ "SkyWest",
      carrier == "UA" ~ "United",
      carrier == "WN" ~ "Southwest",
      carrier == "YX" ~ "Republic",
      # keep the original code for any carrier not listed above instead of turning it into NA
      .default = carrier
    ))
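Two details of this renaming step can be checked on a small example. Without a default, `case_when()` returns NA for any code it does not list. A lookup table joined by code is an alternative to writing one condition per airline; nycflights23 also ships an `airlines` table with `carrier` and `name` columns that could serve as that lookup. The sketch below uses hypothetical tables (`toy_summary`, `carrier_names`, and the made-up code "ZZ") so it runs without the dataset.

```r
library(dplyr)

# Hypothetical summary table keyed by carrier code ("ZZ" is a made-up unknown code)
toy_summary <- tibble(
  carrier = c("9E", "AA", "ZZ"),
  delay   = c(5, 10, 0)
)

# Without .default, case_when() yields NA for the unlisted code "ZZ"
codes <- toy_summary$carrier
no_default <- case_when(codes == "9E" ~ "Endeavor",
                        codes == "AA" ~ "American")
no_default

# Alternative: a hypothetical lookup table joined on the code column
carrier_names <- tibble(
  carrier = c("9E", "AA"),
  name    = c("Endeavor", "American")
)

toy_named <- toy_summary |>
  left_join(carrier_names, by = "carrier") |>
  mutate(carrier = coalesce(name, carrier)) |>  # fall back to the code when no name matches
  select(-name)
toy_named
```

The join keeps the code-to-name mapping as data, which is easier to extend than a long `case_when()`; the `case_when()` version keeps everything visible in one place.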

# Renaming the codes changed the alphabetical order of the carriers, which caused problems when plotting, so I re-sorted the "carrier" column alphabetically.

companydelay <- companydelay |> arrange(carrier)
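Why the order changes can be seen on three hypothetical rows: `summarise()` returns rows sorted by code, but the new names do not sort the same way (for example, "9E" comes first while "Endeavor" does not). ggplot2 places a character variable on a discrete axis in alphabetical factor-level order, so sorting the rows by name makes the row order match the axis order.

```r
library(dplyr)

# Hypothetical three-row table, sorted by code as summarise() would return it
by_code <- tibble(carrier = c("9E", "AA", "DL"), delay = c(1, 2, 3))

renamed <- by_code |>
  mutate(carrier = case_when(
    carrier == "9E" ~ "Endeavor",
    carrier == "AA" ~ "American",
    carrier == "DL" ~ "Delta",
    .default = carrier
  ))

renamed$carrier                  # row order is still the old code order
levels(factor(renamed$carrier))  # alphabetical order used on the axis

sorted <- renamed |> arrange(carrier)
sorted$carrier                   # rows now match the axis order
```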

Plot graph

# one color per carrier (14 in total), a gradient from purple to blue
acolors <- c("#B429F9", "#A935F9", "#9E41F8", "#934DF8", "#8859F7", "#7D65F7", "#7271F6", "#687DF6", "#5D89F5", "#5295F5", "#47A1F4", "#3CADF4", "#31B9F3", "#26C5F3")

ggplot(data = companydelay, aes(x = delay, y = carrier)) +
  # geom_col() is shorthand for geom_bar(stat = "identity")
  geom_col(fill = acolors) +
  # label each bar with its average delay rounded to two decimals, shifted right of x = delay
  geom_text(aes(label = round(delay, 2)), hjust = -0.5, color = "black", size = 2.5) +
  theme_minimal() +
  labs(x = "Average Arrival Delay (minutes)",
       y = "Airline Carrier",
       caption = "Source: FAA Aircraft Registry",
       title = "Average Arrival Delay by Airline Carrier for Flights from NYC")
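Passing a fixed vector as `fill = acolors` requires exactly one color per plotted row, and the colors are matched to rows of the plotted data rather than to carrier names, which is why row order mattered above. One possible alternative, sketched here on hypothetical data (`toy_delay` and `toy_colors` are invented for illustration), maps `fill` to `carrier` and supplies a named palette through `scale_fill_manual()`; `guide = "none"` keeps the chart legend-free, as in the notebook.

```r
library(ggplot2)

# Hypothetical toy summary (values invented for illustration), deliberately unsorted
toy_delay <- data.frame(
  carrier = c("Delta", "American", "Endeavor"),
  delay   = c(-2.5, 4.1, 0.8)
)

# Named palette: each color is tied to a carrier name, not to a row position
toy_colors <- c(American = "#B429F9", Delta = "#7D65F7", Endeavor = "#26C5F3")

p <- ggplot(toy_delay, aes(x = delay, y = carrier, fill = carrier)) +
  geom_col() +
  geom_text(aes(label = round(delay, 2)), hjust = -0.5, size = 2.5) +
  scale_fill_manual(values = toy_colors, guide = "none") +  # colors only, no legend
  theme_minimal()
p
```

With this mapping the rows can be in any order, and the same palette stays attached to the same airline even if a carrier is filtered out.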

Essay

This visualization was somewhat difficult but very fun to make. I got help from two friends and from websites such as GeeksforGeeks, Stack Exchange, and R-Charts with several steps, including understanding what Professor Saidi was doing with the summary table, re-alphabetizing the “carrier” column, and adding value labels.

My question compares a quantitative variable (average arrival delay) across categories (airline carriers), so a bar plot was the most reasonable choice. Because color in this bar plot is purely cosmetic, there was no reason to create a key or legend. Instead, I added a value label to each bar so readers can see the exact average delay, and used the round function that I learned in Data101 to round the values to two decimal places.

I initially put the carriers on the x-axis, but moved them to the y-axis to keep the graph readable after replacing the airline codes with the longer company names. This change helped me better understand geom_text, which I found online: I realized that vjust means vertical adjustment, so hjust adjusts the text horizontally and aligns the labels with the new horizontal bars.

Overall, I’m very proud of my work and of figuring out what the code I was writing actually does and how to customize graphs to my liking.