NYC Flights Homework

Load the libraries and view the “flights” dataset

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(nycflights13)
library(psych)
## 
## Attaching package: 'psych'
## 
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
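# Uncomment to browse or summarize the raw data interactively
#view(flights)
#describe(flights)
#summary(flights)
library(dplyr)         # already attached by tidyverse; loading it again is harmless
library(RColorBrewer)  # color palettes used by scale_fill_brewer()
# Copy the package data into the global environment so it can be modified
flights <- flights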

Now create one data visualization with this dataset

Your assignment is to create one plot to visualize one aspect of this dataset. The plot may be any type we have covered so far in this class (bar graphs, scatterplots, boxplots, histograms, treemaps, heatmaps, streamgraphs, or alluvials).

Requirements for the plot:

  1. Include at least one dplyr command (filter, arrange, summarize, group_by, select, mutate, …)
  2. Include labels for the x- and y-axes
  3. Include a title
  4. Your plot must incorporate at least 2 colors
  5. Include a legend that indicates what the colors represent
  6. Write a brief paragraph that describes the visualization you have created and at least one aspect of the plot that you would like to highlight.

Change the months from 1-12 to Jan through Dec

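# The first assignment coerces the column to character, so the later
# comparisons match strings such as "2" and "10"
flights$month[flights$month == 1] <- "Jan"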
flights$month[flights$month == 2] <- "Feb"
flights$month[flights$month == 3] <- "Mar"
flights$month[flights$month == 4] <- "Apr"
flights$month[flights$month == 5] <- "May"
flights$month[flights$month == 6] <- "Jun"
flights$month[flights$month == 7] <- "Jul"
flights$month[flights$month == 8] <- "Aug"
flights$month[flights$month == 9] <- "Sep"
flights$month[flights$month == 10] <- "Oct"
flights$month[flights$month == 11] <- "Nov"
flights$month[flights$month == 12] <- "Dec"

Reorder the months so they do not default to alphabetical order

flights$month <- factor(flights$month,
                        levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                   "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
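The twelve assignments and the explicit level vector above can be collapsed into one step with base R's built-in `month.abb` constant, which holds the abbreviations "Jan" through "Dec". A minimal sketch on a small hypothetical vector of month numbers (standing in for the original integer `flights$month` column):

```r
# Hypothetical month numbers standing in for the original integer flights$month
month_num <- c(1L, 7L, 12L, 7L)

# Index month.abb by month number, then fix the level order to Jan..Dec
month_lab <- factor(month.abb[month_num], levels = month.abb)

print(as.character(month_lab))  # "Jan" "Jul" "Dec" "Jul"
print(levels(month_lab)[1:3])   # "Jan" "Feb" "Mar"
```

On the real data this would be `flights$month <- factor(month.abb[flights$month], levels = month.abb)`, run on the untouched integer column rather than after the character replacements.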

Remove NAs in dep_time and arr_delay from the dataset

# Drop rows with a missing departure time or arrival delay
removena_flights <- flights %>%
  filter(!is.na(dep_time), !is.na(arr_delay))
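The same row filter can be written in base R with a logical index. The toy rows below are hypothetical and only illustrate that a row is dropped when either column is missing:

```r
# Hypothetical rows: row 2 never departed, row 3 departed but has no arrival delay
toy <- data.frame(
  dep_time  = c(517, NA, 600),
  arr_delay = c(11, NA, NA)
)

# Keep only rows where both columns are present
kept <- toy[!is.na(toy$dep_time) & !is.na(toy$arr_delay), ]
print(kept)
```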

Use group_by and summarise to create a summary table

# One row per carrier and month: number of flights and mean arrival delay (minutes)
by_carrier <- removena_flights %>%
  group_by(carrier, month) %>%
  summarize(count = n(),
            delay = mean(arr_delay),
            .groups = "drop")
# Keep carrier-months with more than 100 flights and a mean delay above -20 minutes
flightDelay <- filter(by_carrier, count > 100, delay > -20)
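Because the data are grouped by both carrier and month, `n()` and `mean()` are computed once per carrier-month, so `count > 100` is a monthly threshold rather than a yearly one. A base-R sketch of the same computation with `aggregate()`, on a small hypothetical data frame (a lower threshold of `count > 1` is used to fit the toy data):

```r
# Hypothetical flights: carrier, month, arrival delay in minutes
toy <- data.frame(
  carrier   = c("AA", "AA", "AA", "FL", "FL"),
  month     = c(1, 1, 2, 1, 2),
  arr_delay = c(10, 20, -5, 30, 50)
)

# Per carrier-month flight counts and mean delays
n_tbl    <- aggregate(arr_delay ~ carrier + month, data = toy, FUN = length)
mean_tbl <- aggregate(arr_delay ~ carrier + month, data = toy, FUN = mean)

# Join the two summaries on carrier and month
summary_tbl <- merge(
  setNames(n_tbl,    c("carrier", "month", "count")),
  setNames(mean_tbl, c("carrier", "month", "delay"))
)

# Thresholds are applied to each carrier-month row
kept_groups <- subset(summary_tbl, count > 1 & delay > -20)
print(kept_groups)
```

Only AA in month 1 has more than one flight here, so it is the single row kept.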

Alluvial

library(ggalluvial)  # provides geom_alluvium()
newAlluv <- ggplot(flightDelay, aes(x = month, y = delay, alluvium = carrier)) +
  theme_bw() +
  geom_alluvium(aes(fill = carrier),
                color = "white",
                width = .1,
                alpha = .8,
                decreasing = FALSE) +
  # Set the palette and legend title in one fill scale; a second
  # scale_fill_*() call would replace this one and discard the Set3 colors
  scale_fill_brewer(palette = "Set3", name = "Airline Carrier") +
  ggtitle("Average Flight Arrival Delay by Carrier and Month, 2013\n") +
  ylab("Average Arrival Delay (minutes)") +
  xlab("Month")
newAlluv

This alluvial plot displays the average arrival delay, in minutes, for each airline carrier in each month of 2013, limited to carrier-months with more than 100 flights and an average delay above -20 minutes. Each carrier has its own color (shown in the legend) to distinguish it from its competitors. The plot makes it easy to compare arrival delays across carriers and to identify which months had longer or shorter delays. Beginning around late April, carrier FL (AirTran Airways) had longer delays than its competitors. The months with the longest average arrival delays were June and July, most likely as a result of higher tourism. On the other hand, September had by far the shortest delays, most likely as a result of fewer scheduled flights. One of the most useful aspects of this graph is that it shows the size of each carrier's average delay in every month. This makes it easier to see when an airline had a particularly good or bad month and how its performance compares with its competitors'. Overall, this plot can help a traveler decide which airline to choose to avoid arrival delays, or plan a trip based on when and with which carrier they intend to travel.