NYC_FlightsAssignment

Author

Asma Abbas

Assignment: Create one data visualization with this dataset. Open a new quarto document and use the chunk above to load the nycflights23 library package and then load the data into your global environment. Create one plot to visualize some aspect of this dataset. The plot may be any type we have covered so far in this class (bar graphs, scatterplots, boxplots, histograms, treemaps, heatmaps, streamgraphs, or alluvials).

Loading the necessary libraries and data

library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.4.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights23)
Warning: package 'nycflights23' was built under R version 4.4.3
data(flights)
data(airlines)

The graph I want to make shows each carrier and how late its flights arrived at the DMV airports.

Filtering data for the DMV airports (IAD, BWI, and DCA) and renaming a variable

  • Include at least one dplyr command (filter, sort, summarize, group_by, select, mutate, ….)
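flights_DMVdelayed <- flights |>
  filter(dest %in% c("IAD", "BWI", "DCA")) |>  # keep flights into the three DMV airports
  left_join(airlines, by = "carrier") |>       # add the full airline name for each carrier code
  mutate(delay = arr_delay) |>                 # copy arr_delay into a column named delay
  select(carrier, delay, name, dest)           # keep only the columns the boxplot needs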

To make this I mostly followed the framework of what was done in the data journalism document. Essentially, I filtered the entire dataset down to the flights going to the DMV airports. I used a left join to keep every flight row intact while matching each one to the airlines data, which has the full names of the carriers. Then I used mutate to copy arr_delay into a new column called delay, because I thought delay was a better name. In the last line I selected only the columns that would make sense for the boxplot.
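As a small check on the steps described above, here is a sketch on made-up toy data (the data frames toy_flights and toy_airlines and the carrier code "ZZ" are hypothetical, not from nycflights23). It shows that a left join keeps every flight row even when a carrier has no match, and how mutate (which adds a copy of the column) differs from rename (which changes the name in place).

```r
library(dplyr)

# Toy stand-ins for flights and airlines (invented values, for illustration only)
toy_flights <- data.frame(
  carrier   = c("AA", "DL", "ZZ"),
  dest      = c("IAD", "BWI", "DCA"),
  arr_delay = c(12, -5, NA)
)
toy_airlines <- data.frame(
  carrier = c("AA", "DL"),
  name    = c("American Airlines Inc.", "Delta Air Lines Inc.")
)

# left_join() keeps all rows of toy_flights; the unmatched "ZZ" row gets NA for name
joined <- left_join(toy_flights, toy_airlines, by = "carrier")

# mutate() adds a new column, so arr_delay and delay both exist afterwards
copied <- mutate(joined, delay = arr_delay)

# rename() changes the column name in place, so arr_delay no longer exists
renamed <- rename(joined, delay = arr_delay)

names(copied)
names(renamed)
```

Either approach works for the boxplot, since select() drops arr_delay afterwards anyway; rename() is just the more direct way to say "change the variable name."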

Making the actual boxplot:

  • Include labels for the x- and y-axes and a caption for the source for the data
  • Include a title
  • Your plot must incorporate at least 2 different colors
  • Include a legend that indicates what the colors represent

ggplot(flights_DMVdelayed, aes(x = name, y = delay, fill = dest)) +
  geom_boxplot() +
  scale_fill_manual(values = c("lightblue", "forestgreen", "darkmagenta")) + 
  labs(
    title = "Delayed Flights At DMV Airports",
    x = "Airline Carrier",
    y = "Arrival Delay (minutes)",
    caption = "Data Source: nycflights23"
  ) +
  theme_gray() +
  theme(
    axis.text.x = element_text(angle = 30, hjust = 1)  
  )
Warning: Removed 369 rows containing non-finite outside the scale range
(`stat_boxplot()`).

So here I made a boxplot using the data frame I created. I mapped the carrier name to the x-axis, the delay to the y-axis, and the destination airport to the fill, and then selected the fill colors. Then I added the title, axis labels, and caption and selected a theme. I looked at all the different ones, but I thought this one was the most readable. Then I noticed that the names of the airline carriers ran into one another, so I looked up how I could fix it and used the axis.text.x argument of theme() to adjust the angle and alignment. I wanted to make the boxplots bigger and easier to read, but was unsure how.
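One possible way to make the boxplots easier to read is sketched below. This is not the only option, and several choices are my own assumptions: the name flights_DMV_clean, the -60 to 180 minute window, and the outlier settings. The sketch drops rows with a missing arr_delay first (those missing values are most likely what the "Removed 369 rows" warning refers to), puts the carrier names on the y-axis so they no longer need to be rotated, and zooms in on the bulk of the delays with coord_cartesian(), which changes the view without discarding any data.

```r
library(tidyverse)
library(nycflights23)

# Same pipeline as before, but dropping flights with no recorded arrival delay
flights_DMV_clean <- flights |>
  filter(dest %in% c("IAD", "BWI", "DCA"), !is.na(arr_delay)) |>
  left_join(airlines, by = "carrier") |>
  mutate(delay = arr_delay) |>
  select(carrier, delay, name, dest)

# Horizontal boxplots: delay on x, carrier name on y, so long names read normally
p <- ggplot(flights_DMV_clean, aes(x = delay, y = name, fill = dest)) +
  # Smaller, fainter outlier points keep the boxes themselves visible
  geom_boxplot(outlier.size = 0.5, outlier.alpha = 0.3) +
  # Zoom only; the window is an arbitrary choice and hides extreme outliers from view
  coord_cartesian(xlim = c(-60, 180)) +
  scale_fill_manual(values = c("lightblue", "forestgreen", "darkmagenta")) +
  labs(
    title = "Delayed Flights At DMV Airports",
    x = "Arrival Delay (minutes)",
    y = "Airline Carrier",
    fill = "Destination",
    caption = "Data Source: nycflights23"
  ) +
  theme_gray()

p
```

In a Quarto document, the chunk options `#| fig-width:` and `#| fig-height:` (for example 10 and 8) can also make the rendered figure physically larger.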

The website citation:

thda.com. (n.d.). ggplot2 axis ticks: A guide to customize tick marks and labels. THDA. Retrieved June 13, 2025, from https://thda.com/english/wiki/ggplot2-axis-ticks-a-guide-to-customize-tick-marks-and-labels