NYC Flights23 HW

Author

Balemlay

These are the packages I used.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(RColorBrewer)
library(nycflights23)
data(flights)

Here I arrange and clean my data.

# Sort by departure delay, then drop every row that has an NA in any column
departure_delay <- flights |>
  arrange(dep_delay) |>
  na.omit()
departure_delay
# A tibble: 422,818 × 19
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
 1  2023     8    16     1839           1929       -50     2254           2218
 2  2023    12     2     2147           2225       -38     2332             35
 3  2023     9     3     1801           1834       -33     1934           2040
 4  2023     8    13     1804           1834       -30     1947           2040
 5  2023     1    28     1930           1959       -29     2308           2336
 6  2023     1    18     2103           2129       -26     2301           2336
 7  2023     5     5     1129           1155       -26     1337           1420
 8  2023     4    16      925            950       -25     1102           1140
 9  2023    11    14     2055           2120       -25     2215           2239
10  2023    12     6     2234           2259       -25     2350             30
# ℹ 422,808 more rows
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>
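The cleaning step drops a row when any of its columns is missing, not only when the departure delay is missing. The sketch below shows the difference on a small made-up table; the `toy` data frame and its values are invented for illustration and are not taken from `flights`.

```r
# Toy data (hypothetical values, not from nycflights23)
toy <- data.frame(
  dep_delay = c(5, NA, -3, 12),
  arr_delay = c(7, NA, NA, 20)
)

# na.omit() drops a row if ANY column is NA
all_complete <- na.omit(toy)

# Subsetting on dep_delay alone keeps rows where only arr_delay is missing
dep_known <- toy[!is.na(toy$dep_delay), ]

nrow(all_complete)
nrow(dep_known)
```

Which version is appropriate depends on whether flights with a known departure delay but a missing arrival delay should count toward the departure analysis.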

Here I replace the carrier codes with the airlines' full names.

departure_delay$carrier[departure_delay$carrier == "B6"] <- "JetBlue Airways"
departure_delay$carrier[departure_delay$carrier == "YX"] <- "Republic Airlines"
departure_delay$carrier[departure_delay$carrier == "NK"] <- "Spirit Airlines"
departure_delay$carrier[departure_delay$carrier == "AS"] <- "Alaska Airlines"
departure_delay$carrier[departure_delay$carrier == "G4"] <- "Allegiant Air"
departure_delay$carrier[departure_delay$carrier == "9E"] <- "Endeavor Air"
departure_delay$carrier[departure_delay$carrier == "AA"] <- "American Airlines"
departure_delay$carrier[departure_delay$carrier == "DL"] <- "Delta Air Lines"
departure_delay$carrier[departure_delay$carrier == "F9"] <- "Frontier Airlines"
departure_delay$carrier[departure_delay$carrier == "HA"] <- "Hawaiian Airlines"
departure_delay$carrier[departure_delay$carrier == "UA"] <- "United Airlines"
departure_delay$carrier[departure_delay$carrier == "WN"] <- "Southwest Airlines"
departure_delay$carrier[departure_delay$carrier == "OO"] <- "SkyWest Airlines"
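The same renaming can be written more compactly with a named vector used as a lookup table. This is only a sketch of the idea: the `carrier_names` and `codes` objects are hypothetical, the lookup lists just three carriers, and "ZZ" is a made-up code included to show what happens when a code has no match.

```r
# Lookup table: names are carrier codes, values are full airline names
carrier_names <- c(
  "B6" = "JetBlue Airways",
  "DL" = "Delta Air Lines",
  "UA" = "United Airlines"
)

# Toy vector of codes (hypothetical; "ZZ" is deliberately not in the lookup)
codes <- c("DL", "B6", "ZZ", "UA")

# Index the lookup by code; codes with no match come back as NA
full <- unname(carrier_names[codes])

# Keep the original code wherever no full name was found
full[is.na(full)] <- codes[is.na(full)]
full
```

With this approach, any carrier code that is missing from the lookup stays visible as a raw code instead of silently disappearing.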

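Here I group by carrier and add each carrier's mean departure delay as a new column.

# Add each carrier's mean departure delay to every row (the result stays grouped)
dep_mean <- departure_delay |>
  group_by(carrier) |>
  mutate(mean_delay = mean(dep_delay))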
head(dep_mean)
# A tibble: 6 × 20
# Groups:   carrier [4]
   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
1  2023     8    16     1839           1929       -50     2254           2218
2  2023    12     2     2147           2225       -38     2332             35
3  2023     9     3     1801           1834       -33     1934           2040
4  2023     8    13     1804           1834       -30     1947           2040
5  2023     1    28     1930           1959       -29     2308           2336
6  2023     1    18     2103           2129       -26     2301           2336
# ℹ 12 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>, mean_delay <dbl>
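Because `mutate()` keeps every flight and repeats the group mean on each row, the table above still has one row per flight. A one-row-per-carrier table would need `summarise()` instead. The sketch below contrasts the two on a made-up data frame; `toy`, `with_mean`, and `per_carrier` are illustrative names, and it assumes dplyr is installed.

```r
library(dplyr)

# Toy data (hypothetical values, not from nycflights23)
toy <- data.frame(
  carrier   = c("A", "A", "B", "B", "B"),
  dep_delay = c(10, 20, 0, 30, 60)
)

# mutate() keeps every row and repeats the group mean on each one
with_mean <- toy |>
  group_by(carrier) |>
  mutate(mean_delay = mean(dep_delay)) |>
  ungroup()

# summarise() collapses the data to one row per carrier
per_carrier <- toy |>
  group_by(carrier) |>
  summarise(mean_delay = mean(dep_delay), n_flights = n())

nrow(with_mean)
nrow(per_carrier)
```

The `mutate()` form is convenient when the mean is needed next to each flight, for example as a fill color in a plot; the `summarise()` form is easier to read when only the per-carrier averages matter.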

Here is the code for the boxplot.

# Boxplot of delays over 500 minutes; fill shows each carrier's overall mean delay
Plot1 <- dep_mean |>
  filter(dep_delay > 500) |>
  ggplot(aes(x = carrier, y = dep_delay, fill = mean_delay)) +
  geom_boxplot() +
  labs(title = "Boxplot of Departure Delays Exceeding 500 Minutes by Carrier",
       caption = "Source: Bureau of Transportation Statistics (via nycflights23)",
       x = "Carrier",
       y = "Departure Delay (minutes)") +
  theme(axis.text.x = element_text(angle = 90))

Plot1
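In the plot code, the filter runs after `mean_delay` has already been computed, so the fill color reflects each carrier's mean over all of its flights, not only the flights delayed by more than 500 minutes. The sketch below illustrates how the two means can differ; the `toy` data and the `mean_all` and `mean_late` columns are made up for this example, and base R's `ave()` stands in for the grouped `mutate()`.

```r
# Toy data (hypothetical values, not from nycflights23)
toy <- data.frame(
  carrier   = c("A", "A", "A", "B", "B"),
  dep_delay = c(-5, 10, 600, 0, 520)
)

# Mean over ALL flights per carrier, attached to every row
# (ave() uses mean by default, similar to group_by() + mutate())
toy$mean_all <- ave(toy$dep_delay, toy$carrier)

# Filter afterwards: mean_all is not recomputed by the filter
late <- toy[toy$dep_delay > 500, ]

# Mean computed only over the filtered flights, for comparison
late$mean_late <- ave(late$dep_delay, late$carrier)
late
```

Either choice can be reasonable, but the plot description should say which mean the color encodes.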

This is my essay.

I analyzed departure delays by carrier using each carrier's average delay. First, I grouped the flights by carrier and computed each carrier's mean departure delay across all of its flights. Then I filtered the data to flights whose departure delay is greater than 500 minutes. After that, I drew a boxplot of these extreme delays, with carrier on the x-axis and departure delay on the y-axis, and filled each box by the carrier's mean delay. Looking at the graph, we can clearly see outliers for American Airlines, JetBlue Airways, and SkyWest Airlines. For Allegiant Air and Southwest Airlines, we see only a single horizontal line. Because every flight in the plot is already more than 500 minutes late, this does not mean those airlines departed on time; it means there is no variation in their plotted data, most likely because each has only one flight above the cutoff. The boxes with a lighter blue fill have the highest mean delay, and the darker boxes have the lowest. From the fill colors, we can conclude that Frontier Airlines has the highest average delay and Republic Airlines has the lowest.
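One way to check why a box collapses to a flat line is to count the flights behind each box and look at the five-number summary. The sketch below does this on made-up data; `toy`, `n_per_carrier`, and `five_num` are illustrative names, and the delay values are invented.

```r
# Toy delays above the 500-minute cutoff (hypothetical values)
toy <- data.frame(
  carrier   = c("A", "A", "A", "B"),
  dep_delay = c(510, 640, 900, 575)
)

# Number of flights behind each box
n_per_carrier <- table(toy$carrier)

# Five-number summary (min, lower hinge, median, upper hinge, max) per carrier
five_num <- lapply(split(toy$dep_delay, toy$carrier), fivenum)

n_per_carrier
five_num$A
five_num$B
```

For carrier "B" in this toy data there is a single flight, so all five summary values coincide and the box would be drawn as one horizontal line.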