NYC Flights Analysis

Author

A Warsaw

Introduction

In this assignment, we create a visualization using the NYC Flights data set. The visualization must use dplyr commands introduced in previous assignments and at least 2 distinct colors, with a legend stating what the colors mean. The graph must be a type previously covered and must include detailed labels that showcase the skills gained up to this point.

Load the Libraries

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Make sure to install the nycflights23 package before running the next chunk.

library(nycflights23)

Grab the Data Sets Into the Global Environment

We will need the flights and airlines data sets to start.

data("flights")
data("airlines")

The airlines data set contains the full name of each airline flying out of NYC along with its carrier code. The flights data set contains numerical and categorical information about flights departing from the NYC airports in 2023.

Data Observation & Preparations Before Final Visualization

Although the data set is ready to work with as is, I would like to clean it up a little. Before explaining, I will use an initial visual to show what I want to adjust first.

flights2023 <- flights |>
  ggplot(aes(x = month, fill = carrier)) +
  geom_bar() +
  labs(x = "Months Flying out of NYC",
       y = "Frequency of Flights",
       title = "Flights Departing NYC from All Carriers in 2023") +
  theme_minimal()
flights2023

Based on this visualization, the x-axis is hard to read because month is displayed as a numerical variable rather than a categorical one. To fix this, I will create a new tibble and convert month from numerical to categorical.

Final Visualization

To start off, I must create the new tibble flightsnew:

flightsnew <- flights |>
  group_by(month, carrier) |>
  select(month, carrier)

Next, I need to convert the variable flightsnew$month from numerical to categorical:

flightsnew$month[flightsnew$month == 1] <- "January"
flightsnew$month[flightsnew$month == 2] <- "February"
flightsnew$month[flightsnew$month == 3] <- "March"
flightsnew$month[flightsnew$month == 4] <- "April"
flightsnew$month[flightsnew$month == 5] <- "May"
flightsnew$month[flightsnew$month == 6] <- "June"
flightsnew$month[flightsnew$month == 7] <- "July"
flightsnew$month[flightsnew$month == 8] <- "August"
flightsnew$month[flightsnew$month == 9] <- "September"
flightsnew$month[flightsnew$month == 10] <- "October"
flightsnew$month[flightsnew$month == 11] <- "November"
flightsnew$month[flightsnew$month == 12] <- "December"

And now to double check that the month variable has been updated:

summary(flightsnew$month)
   Length     Class      Mode 
   435352 character character 

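One final touch-up ensures that the months appear in calendar order rather than alphabetical order. The levels are listed in reverse so that January ends up at the top once the axes are flipped: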

flightsnew$month <- factor(flightsnew$month,
                           levels = c("December", "November", "October", "September", "August", "July", "June", "May", "April", "March", "February", "January"))
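As an aside, the recoding and reordering steps above could likely be collapsed into a single factor() call. The following is a minimal sketch under stated assumptions, not the method used in this report: month_num is a made-up stand-in vector for the month column, and month.name is the base R constant holding the twelve English month names.

```r
# Toy stand-in for the numeric month column (illustrative values, not real data)
month_num <- c(1, 3, 12, 3, 7)

# levels = 12:1 puts December first; labels maps each number to its name
month_fct <- factor(month_num, levels = 12:1, labels = rev(month.name))

levels(month_fct)   # month names from "December" down to "January"
table(month_fct)    # counts per month, in the reversed level order
```

This keeps the numeric-to-name mapping in one place, which avoids a typo in any one of the twelve separate assignments.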

For this version of the bar graph, I will flip the coordinates so that the bars extend horizontally instead of vertically. This makes it easier to see the differences in the number of flights departing with each carrier.

I would also like to move the fill legend to the bottom of the graph to make the entire visualization more appealing.

flightsfinal <- flightsnew |>
  ggplot(aes(x = month, fill = carrier)) +
  geom_bar() +
  scale_fill_discrete(name = "Carriers",
                      labels = c("Endeavor Air", "American Airlines", "Alaska Airlines", "JetBlue Airways", "Delta Air Lines", "Frontier Airlines", "Allegiant Air", "Hawaiian Airlines", "Envoy Air", "Spirit Air Lines", "SkyWest Airlines", "United Air Lines", "Southwest Airlines", "Republic Airline")) +
  labs(x = "Months Flying out of NYC",
       y = "Frequency of Flights",
       title = "Flights Departing NYC from All Carriers in 2023") +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "bottom",
        legend.direction = "horizontal")
flightsfinal
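The legend labels above are typed by hand and only line up with the bars because they follow the alphabetical order of the carrier codes. A possibly safer alternative is to join the full names from the airlines table and fill by that column instead. The sketch below uses two made-up toy tables, flights_toy and airlines_toy (hypothetical names and rows), in place of the real flights and airlines data sets.

```r
library(dplyr)

# Toy stand-ins for the flights and airlines tables (illustrative rows only)
flights_toy <- data.frame(carrier = c("UA", "DL", "UA", "B6"))
airlines_toy <- data.frame(
  carrier = c("B6", "DL", "UA"),
  name = c("JetBlue Airways", "Delta Air Lines", "United Air Lines")
)

# Attach the full airline name to every flight via the shared carrier code
flights_named <- flights_toy |>
  left_join(airlines_toy, by = "carrier")

# Count flights per airline name, most frequent first
carrier_counts <- flights_named |>
  count(name, sort = TRUE)
```

With the real data, mapping fill to the joined name column would let ggplot2 build the legend labels from the data itself, so the labels cannot drift out of order.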

Analysis

The final visualization makes it much easier to capture the details. To give a short analysis based on this:

  • In 2023, NYC had the most departing flights in the month of March
  • United Air Lines appears to have had the most flight departures across all months in 2023
  • A number of carriers also appear to have had drastically fewer departures than the roughly 6 carriers that dominated the NYC airports
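The observations above are read off the chart. One way to check them numerically is to tabulate the counts with dplyr's count(). The sketch below runs on a made-up toy data frame (hypothetical rows, not the real flights data), so its numbers say nothing about 2023. Swapping toy for flights would give the actual figures.

```r
library(dplyr)

# Toy stand-in for the month and carrier columns (made-up rows)
toy <- data.frame(
  month = c(1, 1, 3, 3, 3, 7),
  carrier = c("UA", "DL", "UA", "UA", "B6", "DL")
)

# Flights per month, busiest month first
by_month <- toy |> count(month, sort = TRUE)

# Flights per carrier, busiest carrier first
by_carrier <- toy |> count(carrier, sort = TRUE)

by_month
by_carrier
```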