NYC Flights Homework

Author

D Shima

Load the packages: tidyverse and nycflights23

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights23)

Load the dataset

data(flights)
head(flights)
# A tibble: 6 × 19
   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
1  2023     1     1        1           2038       203      328              3
2  2023     1     1       18           2300        78      228            135
3  2023     1     1       31           2344        47      500            426
4  2023     1     1       33           2140       173      238           2352
5  2023     1     1       36           2048       228      223           2252
6  2023     1     1      503            500         3      808            815
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>

Select, Filter, Group by, Summarize

# Count the number of 2023 flights to each destination
Dataone <- flights |>
  select(carrier, origin, year, dest, sched_dep_time, dep_delay, month, minute) |>
  filter(year == 2023) |>
  group_by(dest) |>
  summarize(count = n())
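
The group_by() plus summarize(n()) pattern above can also be written with dplyr's count(). The sketch below is only an illustration: `toy_flights` is a made-up stand-in for the flights table, not data from nycflights23, and the two results should agree row for row once sorted the same way.

```r
library(dplyr)

# Hypothetical toy data standing in for the flights table
toy_flights <- tibble(
  dest = c("ORD", "ATL", "ORD", "LAX", "ORD", "ATL")
)

# Same pattern as above: one row per destination with its flight count
by_group <- toy_flights |>
  group_by(dest) |>
  summarize(count = n())

# count() is dplyr shorthand for group_by() + summarize(n());
# name = "count" sets the column name, sort = TRUE lists the busiest destination first
by_count <- toy_flights |>
  count(dest, name = "count", sort = TRUE)

print(by_group)
print(by_count)
```

Sorting by count is a convenience here; it makes the most frequent destinations easy to spot without a separate arrange() step.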

Summarize data by carrier

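# Average departure delay per carrier, ignoring flights with no recorded delay
flight_summary <- flights |>
  filter(!is.na(dep_delay)) |>
  group_by(carrier) |>
  summarize(avg_dep_delay = mean(dep_delay)) |>
  mutate(
    # Carriers averaging more than 20 minutes are labeled "high delay"
    delay_category = if_else(avg_dep_delay > 20, "high delay", "low delay")
  )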
flights <- flights %>%
  mutate(carrier_name = recode(carrier,
    "AA" = "American Airlines",
    "AS" = "Alaska Airlines",
    "B6" = "JetBlue Airways",
    "DL" = "Delta Air Lines",
    "EV" = "ExpressJet Airlines",
    "F9" = "Frontier Airlines",
    "G4" = "Allegiant Air",
    "HA" = "Hawaiian Airlines",
    "MQ" = "Envoy Air",
    "NK" = "Spirit Airlines",
    "OO" = "SkyWest Airlines",
    "UA" = "United Airlines",
    "WN" = "Southwest Airlines",
    "YX" = "Republic Airways",
    "9E" = "Endeavor Air"))

Bar plot

ggplot(flight_summary, aes(x = carrier, y = avg_dep_delay, fill = delay_category)) +
  geom_col() +
  labs(
    title = "Average Departure Delay by Airline",
    caption = "New York City Flights23 Dataset",
    x = "Airline Carrier",
    y = "Average Departure Delay (minutes)",
    fill = "Delay Category"
  ) +
  theme_minimal()
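
The plot labels each bar with a two-letter carrier code, because the summary table holds only the `carrier` column. One possible way to show full airline names is to attach them to the summary with a join. The sketch below assumes a small lookup table (`carrier_lookup`, a hypothetical name) and uses made-up delay values; "ZZ" is an invented code included only to show what happens when a carrier is missing from the lookup.

```r
library(dplyr)

# Hypothetical toy summary shaped like flight_summary (values are made up)
toy_summary <- tibble(
  carrier = c("AA", "B6", "ZZ"),
  avg_dep_delay = c(12.5, 24.0, 8.0)
)

# Small lookup table; the two names come from the recode() mapping in this document
carrier_lookup <- tibble(
  carrier = c("AA", "B6"),
  carrier_name = c("American Airlines", "JetBlue Airways")
)

# left_join() keeps every summary row; codes missing from the lookup get NA,
# so coalesce() falls back to the raw carrier code
toy_named <- toy_summary |>
  left_join(carrier_lookup, by = "carrier") |>
  mutate(carrier_name = coalesce(carrier_name, carrier))

print(toy_named)
```

With a table like this, mapping `x = carrier_name` in aes() would put the full names on the axis instead of the codes.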

Description of the summary

This visualization is a bar graph of the average departure delay for each airline carrier in the NYC Flights 2023 dataset (the nycflights23 package). The x-axis shows the airline carriers and the y-axis shows the average departure delay in minutes. Bars are colored by delay category: carriers averaging more than 20 minutes are labeled high delay, and all others are labeled low delay. According to the graph, F9 (Frontier Airlines), HA (Hawaiian Airlines), and B6 (JetBlue Airways) fall in the high-delay category, and the remaining carriers fall in the low-delay category.
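
The two delay categories come from a strict greater-than-20-minutes test on the carrier average. The sketch below illustrates that rule on invented carriers (`C1` to `C4`) with made-up averages, not results from the dataset, and then lists the high-delay group from worst to best. Note that a carrier averaging exactly 20 minutes is labeled low delay.

```r
library(dplyr)

# Hypothetical averages in minutes (made-up values, not results from the dataset)
toy_summary <- tibble(
  carrier = c("C1", "C2", "C3", "C4"),
  avg_dep_delay = c(35.2, 20.0, 9.8, 21.4)
)

# Same rule as in the summary code: strictly more than 20 minutes is "high delay"
toy_categorized <- toy_summary |>
  mutate(delay_category = if_else(avg_dep_delay > 20, "high delay", "low delay"))

# List the high-delay carriers, largest average first
high_delay <- toy_categorized |>
  filter(delay_category == "high delay") |>
  arrange(desc(avg_dep_delay))

print(high_delay)
```

Running the same filter and arrange steps on the real summary table would be one way to confirm which carriers the graph places in the high-delay group.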

One key aspect of the visualization is that it clearly shows how much delays differ among carriers: some have substantial average delays, others have moderate delays, and others have very low delays. The graph makes patterns across carriers easy to compare and could prompt further investigation into the reasons behind the delays, such as scheduling practices, operational challenges, weather, or airport traffic.