NYC Flights Homework

Author

Gamaliel

NYC Flights Assignments

Loading Libraries and datasets

library(tidyverse)
Warning: package 'ggplot2' was built under R version 4.5.2
Warning: package 'tibble' was built under R version 4.5.2
Warning: package 'tidyr' was built under R version 4.5.2
Warning: package 'readr' was built under R version 4.5.2
Warning: package 'purrr' was built under R version 4.5.2
Warning: package 'dplyr' was built under R version 4.5.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights23)
data(flights)
?flights
head(flights)
# A tibble: 6 × 19
   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
1  2023     1     1        1           2038       203      328              3
2  2023     1     1       18           2300        78      228            135
3  2023     1     1       31           2344        47      500            426
4  2023     1     1       33           2140       173      238           2352
5  2023     1     1       36           2048       228      223           2252
6  2023     1     1      503            500         3      808            815
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>

Cleaning the dataset

# Keep only flights with non-missing distance, arrival delay, and departure delay
NYC_clean <- flights |>
  filter(!is.na(distance) & !is.na(arr_delay) & !is.na(dep_delay))
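
As a quick check on what this filter does, the sketch below applies the same condition to a small made-up table and compares it with tidyr's drop_na(), which is an alternative way to write the same step. The toy_flights table and its values are hypothetical and only stand in for flights; they are not taken from nycflights23.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for `flights` (values are made up)
toy_flights <- tibble::tibble(
  origin    = c("EWR", "JFK", "LGA", "JFK"),
  distance  = c(1400, 2475, 733, 1089),
  dep_delay = c(12, NA, -3, 40),
  arr_delay = c(5, NA, -10, NA)
)

# Same condition as the cleaning step above
toy_filter <- toy_flights |>
  filter(!is.na(distance) & !is.na(arr_delay) & !is.na(dep_delay))

# Alternative: drop rows with a missing value in any of the named columns
toy_drop <- toy_flights |>
  drop_na(distance, arr_delay, dep_delay)

# Compare the two results and count how many rows were removed
all.equal(toy_filter, toy_drop)
nrow(toy_flights) - nrow(toy_drop)
```

Running the same row-count comparison on flights and NYC_clean would show how many flights the cleaning step removes.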

Using dplyr to summarise flights by origin, to see whether departing from a particular airport is associated with longer delays.

airport_summary <- NYC_clean |>
  group_by(origin) |>
  summarise(
    total_flights = n(),
    avg_arr_delay = mean(arr_delay),
    avg_dep_delay = mean(dep_delay),
    .groups = "drop"
  )

airport_summary
# A tibble: 3 × 4
  origin total_flights avg_arr_delay avg_dep_delay
  <chr>          <int>         <dbl>         <dbl>
1 EWR           134398         7.33           15.2
2 JFK           129620         5.86           15.7
3 LGA           158800         0.582          10.7

Converting the summary data frame into a numeric matrix (dropping the origin column and using it for row names)

airport_matrix <- data.matrix(airport_summary[, -1])
row.names(airport_matrix) <- airport_summary$origin
library(viridis)
Loading required package: viridisLite
airport_heatmap <- heatmap(
  airport_matrix,
  Rowv = NA,
  Colv = NA,
  cexCol = .6,
  col = viridis(3),  # heatmap() takes the palette through `col`, not `color`
  scale = "column",
  xlab = "Flight Delay Metrics",
  ylab = "NYC Airports",
  main = "Heatmap of Average Flight Delays from NYC Airports"
)
mtext("Data Source: nycflights23 dataset (NYC flights in 2023)", 
      side = 1, line = 4, cex = 0.8)

VISUAL DESCRIPTION

This visualization is a heatmap of flight activity and delays at the three major New York City airports, Newark (EWR), JFK, and LaGuardia (LGA), built from the nycflights23 dataset. The data was first cleaned by removing flights with missing values for distance, arrival delay, or departure delay. Using the dplyr functions group_by() and summarise(), the cleaned data was then aggregated by airport of origin. The resulting summary table holds the total number of flights departing from each airport, along with the average arrival delay and average departure delay in minutes.

The heatmap maps these numerical values to colors along a palette gradient. The rows represent the NYC airports, and the columns represent the three summary metrics: total flights, average arrival delay, and average departure delay. Because the values are scaled by column, the color of a cell reflects how an airport compares with the other two on that metric, not the raw size of the value.

The main pattern the heatmap highlights is the difference in delays across airports. LGA has the lowest average delays on both measures (about 0.6 minutes on arrival and 10.7 minutes on departure), while EWR has the highest average arrival delay (7.33 minutes) and JFK the highest average departure delay (15.7 minutes). In the delay columns, LGA therefore sits at the opposite end of the color scale from EWR and JFK, which makes it easy to compare airports visually and quickly identify which ones experience greater delays.
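
To make the effect of scale = "column" concrete, the sketch below rebuilds the matrix from the rounded values printed in the airport_summary output and standardises each column with base R's scale(), which centers each column on its mean and divides by its standard deviation, the same transformation heatmap() applies for column scaling. Because the inputs are the rounded printed values, the results are approximate.

```r
# Values copied from the airport_summary output above (rounded as printed)
airport_matrix <- matrix(
  c(134398, 129620, 158800,   # total_flights
    7.33, 5.86, 0.582,        # avg_arr_delay
    15.2, 15.7, 10.7),        # avg_dep_delay
  nrow = 3,
  dimnames = list(
    c("EWR", "JFK", "LGA"),
    c("total_flights", "avg_arr_delay", "avg_dep_delay")
  )
)

# Center each column on its mean and divide by its standard deviation
scaled <- scale(airport_matrix)

# Standardised values that drive the cell colors
round(scaled, 2)

# Airport with the largest value in each column
rownames(scaled)[apply(scaled, 2, which.max)]
```

Scaling by column keeps total_flights, which is in the hundreds of thousands, from swamping the delay columns, which are only a few minutes. The trade-off is that colors can only be compared within a column, not across columns.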