# A tibble: 14 × 2
carrier n
<fct> <int>
1 Endeavor Air 54141
2 American Airlines 40525
3 Alaska Airlines 7843
4 JetBlue Airways 66169
5 Delta Airlines 61562
6 Frontier Airlines 1286
7 Allegiant Air 671
8 Hawaiian Airlines 366
9 Envoy Air 357
10 Spirit Airlines 15189
11 Skywest Airlines 6432
12 United Airlines 79641
13 Southwest Airlines 12385
14 Republic Airline 88785
Then I replace the origin abbreviations with the full airport names as well.
flights$origin <- factor(
  flights$origin,
  levels = c("EWR", "JFK", "LGA"),
  labels = c("Newark Liberty Airport", "John F. Kennedy Airport", "LaGuardia Airport")
)
Visualization
For my visualization, I want to show the average departure delay, in minutes, for each airline at each of the three origin airports during August, which I chose because I expect it to be one of the busiest travel months. To do this, I first filter for flights in August.
august_flights <- flights |> filter(month == 8)
Next, I build a summary dataset by grouping the August flights by origin and carrier and computing the average departure delay for each group.
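The code for this step is not shown, so here is a minimal sketch of what the grouping and averaging might look like. It assumes `dplyr` is loaded and that missing delays are dropped with `na.rm = TRUE` (the averages in the output are not `NA`, which suggests this, but it is an assumption). The small `august_flights` data frame below is a hypothetical stand-in with made-up values, not the real data.

```r
library(dplyr)

# Hypothetical toy stand-in for august_flights (values are made up)
august_flights <- data.frame(
  origin    = c("EWR", "EWR", "EWR", "JFK", "JFK"),
  carrier   = c("DL", "DL", "UA", "DL", "DL"),
  dep_delay = c(10, NA, 4, -2, 6)
)

# Group by airport and airline, then average the departure delay per group.
# na.rm = TRUE is an assumption: it ignores flights with a missing delay.
grouped_august_flights <- august_flights |>
  group_by(origin, carrier) |>
  summarise(average_delay = mean(dep_delay, na.rm = TRUE))

grouped_august_flights
```

Because `.groups` is not set, `summarise()` keeps the result grouped by `origin` and prints a message saying so.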
`summarise()` has grouped output by 'origin'. You can override using the
`.groups` argument.
grouped_august_flights
# A tibble: 28 × 3
# Groups: origin [3]
origin carrier average_delay
<fct> <fct> <dbl>
1 Newark Liberty Airport Endeavor Air 6.01
2 Newark Liberty Airport American Airlines 11.8
3 Newark Liberty Airport Alaska Airlines 18.3
4 Newark Liberty Airport JetBlue Airways 16.7
5 Newark Liberty Airport Delta Airlines 13.3
6 Newark Liberty Airport Allegiant Air 7.91
7 Newark Liberty Airport Spirit Airlines 25.8
8 Newark Liberty Airport Skywest Airlines 7.5
9 Newark Liberty Airport United Airlines 15.5
10 Newark Liberty Airport Republic Airline 1.72
# ℹ 18 more rows
Finally, I create the visualization as a heat map, which can show how a value varies across two categorical variables.
ggp <- ggplot(grouped_august_flights, aes(origin, carrier)) +
  geom_tile(aes(fill = average_delay)) +
  scale_fill_distiller(palette = "RdYlBu") +
  theme_dark() +
  labs(
    x = "NYC Airports",
    y = "Airlines",
    caption = "Source: FAA Aircraft registry",
    fill = "Average Departure Delay \n (in minutes)",
    title = "Average Flight Departure Delays of Airlines from \n Three Different Airports in August"
  )
ggp
Summary
For my visualization, I chose to show the average departure delay of the airlines flying out of Newark Liberty Airport, John F. Kennedy Airport, and LaGuardia Airport. I used a heat map because it can display two categorical variables, airport and airline, at once, with the fill color showing the average departure delay in minutes. I recommend taking these fill values with a grain of salt, because a mean is sensitive to extreme outliers. For Delta Airlines flights departing from Newark Liberty Airport in August, for example, the average departure delay was 13.28 minutes, yet the median was -1.5 minutes. The minimum was -16, meaning a flight left 16 minutes early, and the maximum was 1047 minutes. A few very long delays can therefore pull the average well above what a typical flight experienced. Something that may stick out at first glance is the empty tiles for some airport and airline combinations. These appear because there was no data for those combinations, at least for the month of August.
ss <- flights |>
  filter(origin == "Newark Liberty Airport", carrier == "Delta Airlines", month == 8)
summary(ss$dep_delay)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
-16.00 -5.00 -1.50 13.28 6.00 1047.00 10
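One way to make the outlier caveat concrete is to compute the median next to the mean for each group. This is only a sketch, not part of the original analysis: `toy_flights` and `delay_summary` are hypothetical names, and the delays are made-up values with one extreme outlier.

```r
library(dplyr)

# Hypothetical toy delays with a single extreme outlier (made-up values)
toy_flights <- data.frame(
  origin    = rep("EWR", 5),
  carrier   = rep("DL", 5),
  dep_delay = c(-5, -2, 0, 3, 1000)
)

# Compare the mean with the median, which is far less affected by outliers
delay_summary <- toy_flights |>
  group_by(origin, carrier) |>
  summarise(
    average_delay = mean(dep_delay, na.rm = TRUE),
    median_delay  = median(dep_delay, na.rm = TRUE),
    .groups = "drop"
  )

delay_summary
```

In this toy example the single 1000-minute delay drags the mean to 199.2 minutes while the median stays at 0, the same pattern seen in the Delta Airlines summary above.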