In this assignment, we create a visualization using the NYC Flights data set. The visualization should use dplyr commands that have been introduced in previous assignments and at least 2 separate colors, with a legend stating what the colors mean. The graph must be a type previously covered and must include detailed labels that showcase the skills gained up to this point.
Load the Libraries
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Make sure to install the nycflights23 package before running the next chunk.
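One possible way to handle the installation is sketched below. It assumes the package is installed from CRAN and that a CRAN mirror is already configured in the R session; neither detail is stated in the original write-up.

```r
# Assumption: nycflights23 is installed from CRAN with a mirror already set.
# Install the package only if it is not already available.
if (!requireNamespace("nycflights23", quietly = TRUE)) {
  install.packages("nycflights23")
}
```

Checking with requireNamespace() first avoids reinstalling the package every time the document is knitted.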
library(nycflights23)
Grab the Data Sets Into the Global Environment
We will need the flights and airlines data sets to start.
data("flights")
data("airlines")
The airlines data set includes the full name of each airline flying out of NYC along with its carrier code. The flights data set offers numerical and categorical information about flights departing from all airports in NYC in 2023.
Data Observation & Preparations Before Final Visualization
Although the data set is ready to work with, I would like to clean it up a little. Before explaining, I would like to show with an initial visual what I want to adjust first.
flights2023 <- flights |>
  ggplot(aes(x = month, fill = carrier)) +
  geom_bar() +
  labs(x = "Months Flying out of NYC",
       y = "Frequency of Flights",
       title = "Flights Departing NYC from All Carriers in 2023") +
  theme_minimal()
flights2023
Based on this visualization, the x-axis is hard to read because month is displayed as a numerical variable rather than a categorical one. I will fix that by creating a new tibble in which month is categorical instead of numerical.
Final Visualization
To start off, I must create the new tibble flightsnew
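The chunk that builds flightsnew is not shown in this write-up. A minimal sketch of one way it could be built is given below, assuming month is recoded from the integers 1 to 12 into a factor labeled with R's built-in month.abb abbreviations. The choice of abbreviated labels is an assumption, not the original code.

```r
library(dplyr)
library(nycflights23)

# Assumption: month is converted from an integer (1-12) to a factor
# labeled with abbreviated month names (Jan, Feb, ..., Dec).
flightsnew <- flights |>
  mutate(month = factor(month, levels = 1:12, labels = month.abb))

# Inspect the new levels to confirm month is now categorical
levels(flightsnew$month)
```

Because a factor keeps its levels in the order given, the months plot in calendar order rather than alphabetical order.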
Now for this version of the bar graph, I will flip the coordinates so that the bars extend horizontally instead of vertically. This makes it easier to see the differences in the number of flights departing with each carrier.
I would also like to move the fill legend to the bottom of the graph to make the entire visualization more appealing.
flightsfinal <- flightsnew |>
  ggplot(aes(x = month, fill = carrier)) +
  geom_bar() +
  scale_fill_discrete(name = "Carriers",
                      labels = c("Endeavor Air", "American Airlines", "Alaska Airlines",
                                 "JetBlue Airways", "Delta Air Lines", "Frontier Airlines",
                                 "Allegiant Air", "Hawaiian Airlines", "Envoy Air",
                                 "Spirit Air Lines", "SkyWest Airlines", "United Air Lines",
                                 "Southwest Airlines", "Republic Airline")) +
  labs(x = "Months Flying out of NYC",
       y = "Frequency of Flights",
       title = "Flights Departing NYC from All Carriers in 2023") +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "bottom", legend.direction = "horizontal")
flightsfinal
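The hand-typed legend labels depend on the carrier codes sorting in exactly the order the names are listed. As an alternative sketch, the names could be pulled from the airlines data set with a dplyr join. The object names flights_named and flights_named_plot are hypothetical, and the month recoding repeats the assumption that month is turned into a factor of abbreviated names.

```r
library(dplyr)
library(ggplot2)
library(nycflights23)

# Attach the full airline name to every flight by matching on carrier code
flights_named <- flights |>
  mutate(month = factor(month, levels = 1:12, labels = month.abb)) |>
  left_join(airlines, by = "carrier")

# Fill by the joined `name` column so the legend text comes from the data
flights_named_plot <- flights_named |>
  ggplot(aes(x = month, fill = name)) +
  geom_bar() +
  labs(x = "Months Flying out of NYC",
       y = "Frequency of Flights",
       fill = "Carriers",
       title = "Flights Departing NYC from All Carriers in 2023") +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "bottom", legend.direction = "horizontal")
flights_named_plot
```

With this approach the legend shows the names exactly as stored in airlines, sorted alphabetically by name, so the wording and order may differ slightly from the hand-typed labels.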
Analysis
The final visualization makes it much easier to capture the details. To give a short analysis based on this:
In 2023, NYC had the most departing flights in the month of March.
It appears that United Air Lines has the most flight departures across all months in 2023.
It also appears that a number of carriers had drastically fewer departures than the 6 carriers that dominated the airports in NYC.
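Since these observations are read off a stacked bar chart, they can be checked against the underlying counts. The sketch below uses dplyr's count() to tabulate flights per month and per carrier. The object names monthly_totals and carrier_totals are hypothetical and not part of the original analysis.

```r
library(dplyr)
library(nycflights23)

# Number of departing flights per month, busiest month first
monthly_totals <- flights |>
  count(month, sort = TRUE)
monthly_totals

# Number of departing flights per carrier, with full airline names attached
carrier_totals <- flights |>
  count(carrier, sort = TRUE) |>
  left_join(airlines, by = "carrier")
carrier_totals
```

The first rows of each table show which month and which carrier had the most departures, and the tail of carrier_totals shows how small the least active carriers are in comparison.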