NYC Flights Homework

Author

Wilfried Bilong

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights13)
library(RColorBrewer)
data(flights)
tibble(flights)
# A tibble: 336,776 × 19
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
 1  2013     1     1      517            515         2      830            819
 2  2013     1     1      533            529         4      850            830
 3  2013     1     1      542            540         2      923            850
 4  2013     1     1      544            545        -1     1004           1022
 5  2013     1     1      554            600        -6      812            837
 6  2013     1     1      554            558        -4      740            728
 7  2013     1     1      555            600        -5      913            854
 8  2013     1     1      557            600        -3      709            723
 9  2013     1     1      557            600        -3      838            846
10  2013     1     1      558            600        -2      753            745
# ℹ 336,766 more rows
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>
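# Drop flights with a missing distance or arrival delay (e.g. cancelled flights)
flights_nona <- flights |>
  filter(!is.na(distance) & !is.na(arr_delay))
# Each flight is drawn as its own bar, dodged by carrier; geom_col() is
# shorthand for geom_bar(stat = "identity"), so bar height is the raw arr_delay
flights_nona |>
  ggplot() +
  geom_col(aes(x = month, y = arr_delay, fill = carrier),
           position = "dodge") +
  labs(x = "Month",
       fill = "Airline Carrier",
       y = "Arrival Delay (minutes)",
       title = "Monthly Arrival Delay by Carrier",
       caption = "Source: NYC Flights Data")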

This bar graph shows the arrival delay in minutes for each month, broken down by carrier. Each color represents a different carrier, so we can see which airlines tend to arrive on time and which do not. It should also be noted that the same airlines show arrival delays month after month. This can help travelers decide which airlines to fly with, if they have that choice. It also raises an obvious question: what is causing certain carriers to be delayed more or less than others?
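In the plot above, every individual flight becomes its own bar. Bars for the same month and carrier are drawn on top of one another, so the visible height mostly reflects the most extreme single delay rather than a typical one. A minimal sketch of one way to compare carriers more directly is to first average `arr_delay` per month and carrier, then plot those means. The summary table name `monthly_delay` and the choice of the mean (rather than, say, the median) are illustrative assumptions, not part of the original analysis.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Collapse per-flight rows into one mean arrival delay per month and carrier,
# so each bar summarises many flights instead of overlapping individual ones.
# (monthly_delay is a hypothetical name chosen for this sketch.)
monthly_delay <- flights |>
  filter(!is.na(arr_delay)) |>
  group_by(month, carrier) |>
  summarise(mean_arr_delay = mean(arr_delay),
            n_flights = n(),
            .groups = "drop")

# Treat month as a discrete axis; preserve = "single" keeps bar widths equal
# even when a carrier has no flights in a given month.
p <- monthly_delay |>
  ggplot(aes(x = factor(month), y = mean_arr_delay, fill = carrier)) +
  geom_col(position = position_dodge(preserve = "single")) +
  labs(x = "Month",
       fill = "Airline Carrier",
       y = "Mean Arrival Delay (minutes)",
       title = "Mean Monthly Arrival Delay by Carrier",
       caption = "Source: NYC Flights Data")
print(p)
```

The `n_flights` column is kept so that a carrier with only a handful of flights in a month, whose mean can swing widely, is easy to spot before drawing conclusions about reliability.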