NYC Flights Homework

Author

N Diker

Loaded the tidyverse

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Loaded the nycflights23 package

library(nycflights23)

Loaded the flights data set to look through its contents and brainstorm graph ideas

data('flights')

Summarized the number of flights by month in order to (eventually) visualize how many there are each month

flights_by_month <- flights |>
  group_by(month) |>
  summarize(flights = n())

print(flights_by_month)
# A tibble: 12 × 2
   month flights
   <int>   <int>
 1     1   36020
 2     2   34761
 3     3   39514
 4     4   37476
 5     5   38710
 6     6   35921
 7     7   36211
 8     8   36765
 9     9   35505
10    10   36586
11    11   34521
12    12   33362
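
The group_by() and summarize() step above can also be written with dplyr's count(). Here is a minimal sketch on a small made-up tibble; the `toy` data and its values are hypothetical stand-ins, not the real flights data:

```r
library(dplyr)

# Hypothetical stand-in for `flights`: one row per flight, with its month
toy <- tibble(month = c(1, 1, 2, 3, 3, 3))

# group_by() + summarize(n()), the pattern used above
by_group <- toy |>
  group_by(month) |>
  summarize(flights = n())

# count() groups and counts in a single call
by_count <- toy |>
  count(month, name = "flights")

# Compare the two results
all.equal(by_group, by_count)
```

Either form should work for the monthly summary; count() is just shorter when the only summary needed is the number of rows.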

Added a color column with the mutate function that flags the three months with the most flights, with help from ChatGPT

flights_by_month <- flights_by_month |>
  mutate(color = ifelse(flights %in% sort(flights, decreasing = TRUE)[1:3], 
                         "> 37000", "< 37000"))

Built a scatter plot that shows the relationship between months and flights taken

ggplot(flights_by_month, aes(x = factor(month, levels = 1:12,
                                         labels = c("Jan", "Feb", "Mar", "Apr",
                                                    "May", "Jun", "Jul", "Aug",
                                                    "Sep", "Oct", "Nov", "Dec")), 
                              y = flights)) +
  geom_point(aes(color = color)) +
  labs(x = "Months", y = "Number of Flights",
       title = "Number of Flights by Month (2023)",
       caption = "RITA, Bureau of Transportation Statistics, https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236") +
  scale_color_manual(values = c("> 37000" = "red", "< 37000" = "purple"))
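
The "> 37000" and "< 37000" labels are typed in by hand, while the code actually picks the three largest counts. One possible alternative, sketched below, is to flag the top three by rank so the flag does not depend on a fixed threshold. The `monthly` tibble copies the totals from the table printed earlier, and the `top_three` column name is my own choice:

```r
library(dplyr)

# Monthly totals copied from the summary table printed earlier
monthly <- tibble(
  month = 1:12,
  flights = c(36020, 34761, 39514, 37476, 38710, 35921,
              36211, 36765, 35505, 36586, 34521, 33362)
)

# min_rank(desc(flights)) gives rank 1 to the busiest month,
# so ranks 1 to 3 mark the three busiest months
flagged <- monthly |>
  mutate(top_three = min_rank(desc(flights)) <= 3)

# Months flagged as top three
flagged |>
  filter(top_three) |>
  pull(month)
```

A logical column like this could then be mapped to color in the plot in place of the text labels.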

The scatter plot shows the number of flights that departed from New York’s three largest airports in each month of 2023. Purple points mark the months with fewer than 37,000 flights, and red points mark the months with more than 37,000 flights. Before making this visualization, I expected June, July, August, and maybe December to have the most flights, since people are typically on vacation during those months; however, the busiest months turned out to be March, April, and May. That surprised me, which is why I highlighted those three months in red. With help from ChatGPT, I used the mutate function to add a column that labels these months, and the plot maps that column to the point color.
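
Since months have different lengths, one way to check the result is to compare flights per day. The sketch below assumes the monthly totals printed earlier and the calendar lengths of the months in 2023; the `days` and `flights_per_day` columns are names I made up for this check:

```r
library(dplyr)

monthly <- tibble(
  month = 1:12,
  # Monthly totals copied from the summary table printed earlier
  flights = c(36020, 34761, 39514, 37476, 38710, 35921,
              36211, 36765, 35505, 36586, 34521, 33362),
  # Days per month in 2023 (not a leap year)
  days = c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
)

# Divide by month length, then keep the three busiest months per day
per_day <- monthly |>
  mutate(flights_per_day = flights / days) |>
  slice_max(flights_per_day, n = 3)

per_day
```

On these totals, March, April, and May are still the top three after dividing by month length, with February close behind.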