Flights from JFK

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights13)
flights
## # A tibble: 336,776 × 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013     1     1      517            515         2      830            819
##  2  2013     1     1      533            529         4      850            830
##  3  2013     1     1      542            540         2      923            850
##  4  2013     1     1      544            545        -1     1004           1022
##  5  2013     1     1      554            600        -6      812            837
##  6  2013     1     1      554            558        -4      740            728
##  7  2013     1     1      555            600        -5      913            854
##  8  2013     1     1      557            600        -3      709            723
##  9  2013     1     1      557            600        -3      838            846
## 10  2013     1     1      558            600        -2      753            745
## # ℹ 336,766 more rows
## # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## #   hour <dbl>, minute <dbl>, time_hour <dttm>
jfk_flights <- flights |>
  filter(origin == "JFK")

Bar Plot

flights_summary <- jfk_flights |>
  group_by(carrier) |>
  summarise(count = n()) |>
  # rank(desc(count)) gives rank 1 to the busiest carrier, so <= 3 flags the top three
  mutate(top_carrier = ifelse(rank(desc(count)) <= 3, "Top 3", "Other"))


ggplot(flights_summary, aes(x = carrier, y = count, fill = top_carrier)) +
  # geom_col() plots pre-computed counts (equivalent to geom_bar(stat = "identity"))
  geom_col() +
  labs(
    x = "Airline",
    y = "Number of Flights",
    fill = "Carrier Group",
    title = "Number of Flights by Airline from JFK Airport",
    caption = "Data Source: nycflights13 package + ChatGPT"
  ) +
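  scale_fill_manual(
    values = c("Top 3" = "pink", "Other" = "gray"),
    # Explicit breaks keep each legend label paired with the matching fill level
    breaks = c("Top 3", "Other"),
    labels = c("Top 3 Carriers", "Other Carriers")
  ) +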
  theme_minimal() +
  theme(legend.position = "right")

Paragraph Analysis

The bar plot that I made focuses specifically on flights departing from JFK Airport in New York. I filtered out flights from the other two origin airports in the dataset (EWR and LGA) and then focused on the carriers themselves by grouping the data by carrier. I was a little confused about how to do this, so I consulted ChatGPT on how to take the groupings of each airline and summarize the number of flights per carrier, which gave me the values for my y-axis. While making my bar plot, I wanted to focus on what the colors would convey. I had initially filled the bars by carrier, which meant that each bar had its own color, so the legend felt redundant and useless. I therefore decided to highlight the top 3 carriers with the most flights to differentiate them from the rest.
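The grouping and counting step described above can also be written with dplyr's `count()` shortcut, which is shorthand for `group_by()` followed by `summarise(n = n())`. The sketch below is an alternative, not the code used for the plot; the names `jfk_counts`, `top3`, and `flagged` are hypothetical. It uses `slice_max()` to pull out the three busiest carriers and cross-checks them against the `rank()`-based flag used in the bar plot.

```r
library(dplyr)
library(nycflights13)

# count() groups by carrier and tallies rows; name = "count" matches the
# column name used in flights_summary
jfk_counts <- flights |>
  filter(origin == "JFK") |>
  count(carrier, name = "count")

# slice_max() keeps the rows with the largest counts; with_ties = FALSE
# guarantees exactly three rows even if two carriers were tied
top3 <- jfk_counts |>
  slice_max(count, n = 3, with_ties = FALSE)

# Cross-check: the rank()-based flag from the bar plot should pick the same carriers
flagged <- jfk_counts |>
  mutate(top_carrier = ifelse(rank(desc(count)) <= 3, "Top 3", "Other")) |>
  filter(top_carrier == "Top 3")

print(top3)
```

`slice_max()` returns the top rows directly, which is handy for listing them, while the `rank()` flag keeps every carrier in the table so that the non-top carriers can still be drawn in gray.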