Flights Assignment

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights23)
library(lubridate)
data(flights)

Removing rows with missing (NA) distance, arrival delay, or departure delay

flights_clean <- flights |>
  filter(!is.na(distance) & !is.na(arr_delay) & !is.na(dep_delay))
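The filter above keeps only rows where all three columns are present. As a minimal sketch on made-up data (the `toy` tibble is hypothetical, not taken from `flights`), the same result can be obtained with `tidyr::drop_na()`, which some readers may find easier to scan:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: only the first row is complete.
toy <- tibble(
  distance  = c(100, NA, 300, 400),
  arr_delay = c(5, 10, NA, -3),
  dep_delay = c(0, 2, 4, NA)
)

# Same approach as in the assignment: keep rows with no NA in any of the three columns.
kept_filter <- toy |>
  filter(!is.na(distance) & !is.na(arr_delay) & !is.na(dep_delay))

# Alternative: drop_na() removes rows with an NA in any of the listed columns.
kept_drop <- toy |>
  drop_na(distance, arr_delay, dep_delay)

nrow(kept_filter)
nrow(kept_drop)
```

Both versions keep the same rows; which one to use is mostly a matter of taste.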

Giving the months names instead of numbers, and adding the weekday and week of the month

flights_month <- flights_clean |>
  mutate(date = make_date(year, month, day)) |>
  mutate(month_name = case_when(
    month == 1  ~ "January",
    month == 2  ~ "February",
    month == 3  ~ "March",
    month == 4  ~ "April",
    month == 5  ~ "May",
    month == 6  ~ "June",
    month == 7  ~ "July",
    month == 8  ~ "August",
    month == 9  ~ "September",
    month == 10 ~ "October",
    month == 11 ~ "November",
    month == 12 ~ "December"
  )) |>
  mutate(week_day = wday(date, label = TRUE, abbr = TRUE)) |>
  mutate(month_week = ceiling(day / 7))
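To make the `month_week` column easier to interpret, here is a small sketch of what `ceiling(day / 7)` returns for a few hand-picked days of the month. It counts 7-day blocks starting on the 1st, not calendar weeks, so any month with 29 or more days gets a short fifth block.

```r
# A few hand-picked days of the month (illustrative values only).
day <- c(1, 7, 8, 14, 15, 28, 29, 31)

# Same formula as in the pipeline above: days 1-7 fall in block 1, 8-14 in block 2, and so on.
month_week <- ceiling(day / 7)

data.frame(day, month_week)
```

Days 29 to 31 land in block 5, so the fifth column of each heatmap panel covers at most three days.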

Grouping by destination, month, weekday, and week of the month

by_dest <- flights_month |>
  group_by(dest, month_name, week_day, month_week) |>  
  summarise(count = n(),   
            avg_dist = mean(distance), 
            avg_arr_delay = mean(arr_delay),  
            avg_dep_delay = mean(dep_delay), 
            .groups = "drop") |>
  arrange(avg_arr_delay) |>
  mutate(avg_total_delay = avg_arr_delay + avg_dep_delay)

head(by_dest)
# A tibble: 6 × 9
  dest  month_name week_day month_week count avg_dist avg_arr_delay
  <chr> <chr>      <ord>         <dbl> <int>    <dbl>         <dbl>
1 ABQ   June       Thu               2     1     1826           -76
2 PSP   April      Tue               2     1     2378           -76
3 SJC   June       Thu               2     1     2569           -76
4 BUR   April      Tue               2     1     2465           -75
5 SJC   June       Tue               2     1     2569           -74
6 PSP   December   Tue               4     2     2378           -69
# ℹ 2 more variables: avg_dep_delay <dbl>, avg_total_delay <dbl>

Faceted Heatmap

ggplot(by_dest, aes(x = as.character(month_week), y = week_day, fill = avg_dist)) +
  geom_tile(color = "white") +
  facet_wrap(~ month_name, nrow = 3, scales = "free") +
  scale_fill_gradient(low = "#f0cb35", high = "#c02425") +
  xlab("Week of Month") + 
  ylab("Days of Week") +
  labs(fill = "Avg distance", caption = "Source: New York Flights Data", title = "Faceted Heatmap") +
  theme_minimal() +
  theme(
    panel.grid.major = element_blank(),
    panel.border     = element_blank(),
    strip.text       = element_text(size = 12),
    plot.caption     = element_text(hjust = 0.5, size = 9),
    plot.title       = element_text(hjust = 0.5, size = 11)
  )
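One thing worth checking with this plot: `by_dest` holds one row per destination, so several rows can share the same month, weekday, and week of the month. In that case `geom_tile()` draws the tiles on top of each other and only the last one drawn is visible. A possible way around this, sketched below on made-up data (the `toy` tibble and the `by_tile` name are hypothetical), is to collapse to one value per tile first, weighting each destination's average by its flight count.

```r
library(dplyr)

# Hypothetical summary rows: two destinations share the same (January, Mon, week 1) tile.
toy <- tibble(
  month_name = c("January", "January", "January"),
  week_day   = c("Mon", "Mon", "Tue"),
  month_week = c(1, 1, 1),
  count      = c(1L, 3L, 2L),
  avg_dist   = c(1000, 2000, 500)
)

# One row per tile: the count-weighted mean recovers the per-flight average distance.
by_tile <- toy |>
  group_by(month_name, week_day, month_week) |>
  summarise(avg_dist = weighted.mean(avg_dist, w = count), .groups = "drop")

by_tile
```

Passing a table shaped like `by_tile` to `ggplot()` in place of `by_dest` would give each tile exactly one fill value.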

Reflection

This heatmap shows the average flight distance for flights departing New York City, broken down by month, week of the month, and day of the week. Darker tiles represent greater average distances, while lighter tiles represent shorter ones. Each month's panel also has its own set of day labels beside it; this comes from setting scales = "free" in facet_wrap(), which makes the map much easier to read and interpret. One major issue I encountered was getting the months to appear in chronological order. I also applied a minimal theme so that viewers stay focused on the tiles without being distracted by surrounding visual clutter. The darker tiles are spread fairly consistently across the map, but people seem to travel the farthest during the spring break period (March to April). Overall, the map highlights patterns in flight distance across different times of the year.
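On the month-ordering issue: `month_name` is stored as a character column, and `facet_wrap()` orders character values alphabetically (April, August, December, ...). One common fix, sketched here on made-up month numbers, is to store the names as a factor whose levels follow R's built-in `month.name` constant. This is only an illustration, not necessarily the approach used for the plot above.

```r
# Hypothetical month numbers in arbitrary order.
month <- c(12, 1, 4, 3)

# month.name is a base R constant holding "January" through "December".
# Using it for both the labels and the levels keeps the calendar order.
month_name <- factor(month.name[month], levels = month.name)

# Sorting a factor follows its level order, not the alphabet.
sort(month_name)

# For comparison, plain character strings sort alphabetically.
sort(as.character(month_name))
```

Because ggplot2 lays out facets in factor-level order, a column built this way would place the panels from January through December.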