boxpl <- flights |>
  ggplot(aes(x = carrier, y = arr_delay, fill = carrier)) +
  geom_boxplot() +
  labs(title = "Arrival Delay Distribution by Airline",
       caption = "Source: Bureau of Transportation Statistics",
       x = "Airline Carrier",
       y = "Arrival delay (min)") +
  coord_flip()
boxpl
Warning: Removed 12534 rows containing non-finite outside the scale range
(`stat_boxplot()`).
## I copied this code from the heatmap code
final_flights <- flights_nona |>
  select(carrier, arr_delay) |>
  left_join(airlines, by = "carrier")
# Drop "Inc." / "Co." suffixes and the trailing space they leave behind
final_flights$name <- trimws(gsub("Inc\\.|Co\\.", "", final_flights$name))
# My inclusion/exclusion criteria: number of flights per carrier
final_flights |>
  count(carrier)
# Selected carriers: B6, DL, 9E, AA, NK
# A tibble: 14 × 2
carrier n
<chr> <int>
1 9E 52204
2 AA 39750
3 AS 7734
4 B6 64280
5 DL 60364
6 F9 1218
7 G4 667
8 HA 362
9 MQ 354
10 NK 14769
11 OO 6199
12 UA 77438
13 WN 12048
14 YX 85431
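# Keep only the five selected carriers
top_carrier <- final_flights |>
  filter(carrier %in% c("B6", "DL", "9E", "AA", "NK"))

boxpl1 <- top_carrier |>
  ggplot(aes(x = name, y = arr_delay, fill = name)) +
  geom_boxplot() +
  labs(title = "Arrival Delay Distribution by Airline",
       caption = "Source: Bureau of Transportation Statistics",
       x = "Airline Carrier",
       y = "Arrival delay (min)",
       fill = "Airline") +
  # Unnamed colors are matched to the airline names in alphabetical order
  scale_fill_manual(values = c("#2ca02c", "#d62728", "#1f77b4",
                               "#ff7f0e", "#9467bd")) +
  coord_flip()
boxpl1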
In this visualization, I used a boxplot to show the distribution of arrival delays for each airline. At first I included all 14 carriers, but the plot was crowded and hard to read. I therefore narrowed it to five high-volume airlines: JetBlue Airways (B6), Delta Air Lines (DL), Endeavor Air (9E), American Airlines (AA), and Spirit Airlines (NK). Strictly speaking these are not the five largest carriers, since Republic Airways (YX) and United Airlines (UA) have more flights in the counts above. Reducing the number of airlines made the delays much easier to compare.
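The five carriers are typed in by hand in the code. As a minimal base-R sketch, the ranking can instead be computed from the flight counts printed above. `carrier_counts`, `ranked`, and `top5` are illustrative names that are not part of the original analysis.

```r
# Carrier counts copied from the count(carrier) output above
carrier_counts <- data.frame(
  carrier = c("9E", "AA", "AS", "B6", "DL", "F9", "G4", "HA",
              "MQ", "NK", "OO", "UA", "WN", "YX"),
  n = c(52204, 39750, 7734, 64280, 60364, 1218, 667, 362,
        354, 14769, 6199, 77438, 12048, 85431),
  stringsAsFactors = FALSE
)

# Sort carriers by flight count (largest first) and keep the first five
ranked <- carrier_counts[order(carrier_counts$n, decreasing = TRUE), ]
top5 <- head(ranked$carrier, 5)
print(top5)
# [1] "YX" "UA" "B6" "DL" "9E"
```

With these counts, the sketch returns YX, UA, B6, DL, and 9E. In dplyr, `final_flights |> count(carrier) |> slice_max(n, n = 5)` does the same ranking, and `filter(carrier %in% top5)` would then select the rows to plot.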
Each airline is drawn in its own fill color, and the same colors appear in the legend. This lets the reader see at a glance which box belongs to which airline, which makes the plot easier to read and the differences between airlines easier to spot.
One thing that stands out in the plot is the large number of outliers for American Airlines. For every airline, most flights have delays close to zero minutes, meaning they arrive close to their scheduled time. American Airlines, however, has many points far to the right of its box. In a boxplot, each of these points is a flight whose delay lies more than 1.5 times the interquartile range beyond the box, so they represent unusually long delays. This suggests that although most American Airlines flights arrive roughly on time, it has more extreme delays than the other airlines.
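To check the outlier observation with numbers instead of by eye, a minimal base-R sketch can count the points beyond the whiskers. It uses the same 1.5 × IQR rule, with type-7 quantiles, that `geom_boxplot()` applies by default. The function `count_outliers()` and the toy `delays` data are hypothetical and made up for illustration only.

```r
# Count values beyond the 1.5 * IQR whiskers, the default rule
# geom_boxplot() uses to decide which points to draw as outliers
count_outliers <- function(x, coef = 1.5) {
  x <- x[is.finite(x)]  # drop NA delays, as ggplot does before plotting
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  sum(x < q[1] - coef * iqr | x > q[2] + coef * iqr)
}

# Toy arrival delays in minutes (made up, not real flight data)
delays <- data.frame(
  carrier = c(rep("A", 8), rep("B", 8)),
  arr_delay = c(-5, -2, 0, 1, 3, 4, 6, 240,
                -10, -4, -1, 0, 2, 5, 8, 12),
  stringsAsFactors = FALSE
)

# One outlier count per carrier
outliers <- tapply(delays$arr_delay, delays$carrier, count_outliers)
print(outliers)
```

On the real data, the analogous call would be `tapply(top_carrier$arr_delay, top_carrier$name, count_outliers)`. Raw counts grow with the number of flights, so dividing each count by that carrier's flight total gives a fairer comparison between airlines.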