library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(nycflights13)
library(psych)
##
## Attaching package: 'psych'
##
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
#view(flights)
#describe(flights)
library(dplyr)
library(RColorBrewer)
# Copy the nycflights13 dataset into the workspace so it can be modified below
flights <- flights
#summary(flights)
Your assignment is to create one plot to visualize one aspect of this dataset. The plot may be any type we have covered so far in this class (bar graphs, scatterplots, boxplots, histograms, treemaps, heatmaps, streamgraphs, or alluvials).
# Recode the integer month (1-12) as a factor of abbreviations in calendar order,
# using R's built-in month.abb vector ("Jan", ..., "Dec")
flights$month <- factor(month.abb[flights$month], levels = month.abb)
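# Drop flights with no recorded arrival delay
removena_flights <- flights %>%
  filter(!is.na(arr_delay))
# Count flights and average the arrival delay (minutes) per carrier and month
by_carrier <- removena_flights %>%
  group_by(carrier, month) %>%
  summarize(count = n(),
            delay = mean(arr_delay))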
## `summarise()` has grouped output by 'carrier'. You can override using the
## `.groups` argument.
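The message above is informational: after `summarize()`, the result is still grouped by `carrier`. As a minimal sketch on hypothetical toy data (the `toy` and `by_group` names are illustrative, not part of the analysis), passing `.groups = "drop"` returns an ungrouped result and silences the message:

```r
library(dplyr)

# Hypothetical toy data standing in for the flights table
toy <- tibble(
  carrier   = c("AA", "AA", "UA"),
  month     = c("Jan", "Jan", "Feb"),
  arr_delay = c(10, 20, -5)
)

# Same summary as in the analysis, but with the grouping dropped explicitly
by_group <- toy %>%
  group_by(carrier, month) %>%
  summarize(count = n(),
            delay = mean(arr_delay),
            .groups = "drop")

is_grouped_df(by_group)
```

The row-wise filter applied next in the analysis behaves the same on grouped and ungrouped data, so this is a matter of tidiness rather than correctness.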
# Keep carrier-month rows with more than 100 flights and a mean delay above -20 minutes
flightDelay <- filter(by_carrier, count > 100, delay > -20)
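Note that `count > 100` is evaluated for each carrier-month row, not for each carrier's yearly total. The sketch below uses hypothetical toy data (`toy`, `per_month`, and `per_year` are illustrative names) to contrast the row-wise filter with one possible per-carrier alternative:

```r
library(dplyr)

# Hypothetical toy data: "AA" is busy in both months, "ZZ" only in January
toy <- tibble(
  carrier = c("AA", "AA", "ZZ", "ZZ"),
  month   = c("Jan", "Feb", "Jan", "Feb"),
  count   = c(500, 450, 150, 40),
  delay   = c(5.2, -1.0, 12.3, 30.0)
)

# Row-wise filter, as in the analysis: only the ZZ/Feb row is dropped
per_month <- filter(toy, count > 100, delay > -20)

# Alternative: keep every month for carriers whose total across months exceeds 100
per_year <- toy %>%
  group_by(carrier) %>%
  filter(sum(count) > 100) %>%
  ungroup()
```

With the row-wise version, a carrier can appear in some months and be missing in others, which shows up as a band that starts or stops partway across the plot.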
library(alluvial)
library(ggalluvial)
newAlluv <- ggplot(flightDelay, aes(x = month, y = delay, alluvium = carrier)) +
  theme_bw() +
  geom_alluvium(aes(fill = carrier),
                color = "white",
                width = .1,
                alpha = .8,
                decreasing = FALSE) +
  # Use a single fill scale: a second scale_fill_*() call would replace the Set3 palette.
  # Set3 offers at most 12 colors, so this assumes no more than 12 carriers remain.
  scale_fill_brewer(palette = "Set3", name = "Airline Carrier") +
  ggtitle("Mean Arrival Delay by Carrier and Month, 2013\n") +
  ylab("Mean Arrival Delay (minutes)") +
  xlab("Month")
newAlluv
This alluvial plot displays the mean arrival delay, in minutes, for each airline carrier in each month of 2013. Only carrier-month combinations with more than 100 flights are included. Each carrier has its own color (shown in the legend), and the width of its band in a given month reflects its mean delay. The plot makes it easy to compare arrival delays across carriers and to see which months had longer or shorter delays. Beginning around late April, carrier FL (AirTran Airways) had longer average delays than its competitors. June and July had the longest average delays, possibly because of heavier summer travel. September had by far the shortest, possibly because fewer flights were scheduled, although the plot itself does not show flight counts. One of the most useful aspects of this graph is that it shows how each carrier's delays change from month to month, which makes it easier to spot when an airline had a particularly good or bad month and how its performance compares with that of its competitors. Overall, the plot can help a traveler choose an airline to reduce the risk of arrival delays, or plan a trip around the month and carrier they expect to fly.
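The month-level observations above can be checked directly against the data. The following is a minimal sketch, assuming the `nycflights13` package is installed; it reads the original table via `nycflights13::flights` (where `month` is still an integer from 1 to 12), and `monthly` is an illustrative name:

```r
library(dplyr)
library(nycflights13)

# One row per month across all carriers, sorted from longest to shortest mean delay
monthly <- nycflights13::flights %>%
  filter(!is.na(arr_delay)) %>%
  group_by(month) %>%
  summarize(flights = n(),
            mean_delay = mean(arr_delay),
            .groups = "drop") %>%
  arrange(desc(mean_delay))

monthly
```

Comparing the `flights` and `mean_delay` columns side by side is one way to test whether months with shorter delays also had fewer flights.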