NYC Flights HW

Author

Andrew Hart

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights13)
library(psych)

Attaching package: 'psych'

The following objects are masked from 'package:ggplot2':

    %+%, alpha
head(flights)
# A tibble: 6 × 19
   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
1  2013     1     1      517            515         2      830            819
2  2013     1     1      533            529         4      850            830
3  2013     1     1      542            540         2      923            850
4  2013     1     1      544            545        -1     1004           1022
5  2013     1     1      554            600        -6      812            837
6  2013     1     1      554            558        -4      740            728
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>
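# Summarize by carrier: mean departure delay (ignoring missing values)
# and the total number of flights
flight_summary <- flights %>%
  group_by(carrier) %>%
  summarize(avg_delay = mean(dep_delay, na.rm = TRUE),
            total_flights = n())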

flight_plot <- ggplot(flight_summary, aes(x = carrier, y = avg_delay, fill = total_flights)) +
  geom_col(color = "black") +
  labs(x = "Airlines by Carrier",
       y = "Average Flight Delay in Minutes",
       title = "Average Flight Delay by Carrier \nCompared to Total Number of Flights") +
  scale_fill_gradient(low = "yellow", high = "darkred", name = "Total Number of Flights")
  
flight_plot

I chose to make a bar graph showing the average departure delay per airline, with each bar shaded by that airline's total number of flights. I originally wanted to show the average delay by tail number, but the visual was too cluttered to be effective. One thing to highlight from the graph is that there is no clear relationship between total flights and average delay. For example, F9 had fewer than 700 total flights yet a higher average delay than EV, an airline with more than 54,000 flights.
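
The F9 and EV comparison is only one pair of carriers. As a rough check on the broader claim, the sketch below computes the correlation between total flights and average delay across all carriers. It rebuilds the same `flight_summary` table used for the plot; the name `delay_flights_cor` is introduced here for illustration, and Pearson correlation is just one possible measure, not part of the original analysis.

```r
library(dplyr)
library(nycflights13)

# Same per-carrier summary as used for the bar graph
flight_summary <- flights %>%
  group_by(carrier) %>%
  summarize(avg_delay = mean(dep_delay, na.rm = TRUE),
            total_flights = n())

# Pearson correlation between flight volume and mean departure delay
# (delay_flights_cor is an illustrative name, not from the original code)
delay_flights_cor <- cor(flight_summary$total_flights, flight_summary$avg_delay)
delay_flights_cor

# The two carriers singled out in the write-up, side by side
flight_summary %>%
  filter(carrier %in% c("F9", "EV"))
```

A value near zero would be consistent with what the graph suggests. With only a handful of carriers, though, the estimate is noisy and should be read as descriptive rather than conclusive.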