# used to compare carriers within categories (NOT FINAL VISUALIZATION)
flights2 |>
  ggplot(aes(carrier, ratio)) +
  geom_boxplot() +
  facet_grid(~dist_cat) +
  theme(axis.text.x = element_text(angle = 90))
Warning: Removed 12534 rows containing non-finite outside the scale range
(`stat_boxplot()`).
# used to ensure each carrier actually was the best or worst in its category
# (instead of guessing based only on the box plot, I used this code outline to check close calls in all categories)
flights2 |>
  filter(carrier == "DL" & dist_cat == "short") |>
  summary()
year month day dep_time sched_dep_time
Min. :2023 Min. : 1.000 Min. : 1.00 Min. : 3 Min. : 600
1st Qu.:2023 1st Qu.: 3.000 1st Qu.: 8.00 1st Qu.:1347 1st Qu.:1512
Median :2023 Median : 5.000 Median :16.00 Median :2024 Median :2000
Mean :2023 Mean : 5.612 Mean :15.77 Mean :1752 Mean :1774
3rd Qu.:2023 3rd Qu.: 8.000 3rd Qu.:23.00 3rd Qu.:2155 3rd Qu.:2159
Max. :2023 Max. :12.000 Max. :31.00 Max. :2358 Max. :2256
NA's :27
dep_delay arr_time sched_arr_time arr_delay
Min. :-17.00 Min. : 1 Min. : 27 Min. :-51.00
1st Qu.: -4.00 1st Qu.:1337 1st Qu.:1616 1st Qu.:-15.00
Median : -1.00 Median :2012 Median :2119 Median : -4.00
Mean : 25.57 Mean :1701 Mean :1914 Mean : 18.42
3rd Qu.: 24.00 3rd Qu.:2254 3rd Qu.:2308 3rd Qu.: 23.00
Max. :975.00 Max. :2400 Max. :2359 Max. :984.00
NA's :27 NA's :29 NA's :31
carrier flight tailnum origin
Length:1693 Min. : 96.0 Length:1693 Length:1693
Class :character 1st Qu.: 409.0 Class :character Class :character
Mode :character Median : 610.0 Mode :character Mode :character
Mean : 610.4
3rd Qu.: 807.0
Max. :1087.0
dest air_time distance hour
Length:1693 Min. : 30.00 Min. :184.0 Min. : 6.00
Class :character 1st Qu.: 37.00 1st Qu.:184.0 1st Qu.:15.00
Mode :character Median : 41.00 Median :187.0 Median :20.00
Mean : 43.06 Mean :212.9 Mean :17.47
3rd Qu.: 44.75 3rd Qu.:187.0 3rd Qu.:21.00
Max. :164.00 Max. :431.0 Max. :22.00
NA's :31
minute time_hour dist_cat
Min. : 0.00 Min. :2023-01-01 14:00:00.00 short :1693
1st Qu.: 0.00 1st Qu.:2023-03-03 13:00:00.00 medium short: 0
Median :29.00 Median :2023-05-19 21:00:00.00 medium long : 0
Mean :27.66 Mean :2023-06-05 00:08:40.97 long : 0
3rd Qu.:55.00 3rd Qu.:2023-08-27 21:00:00.00
Max. :59.00 Max. :2023-12-31 20:00:00.00
ratio
Min. :1.207
1st Qu.:4.452
Median :4.842
Mean :4.941
3rd Qu.:5.412
Max. :7.167
NA's :31
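The per-carrier check above can also be done for all carriers at once. The following is a minimal sketch, not the method used in this analysis: it runs on a small made-up tibble (`toy`, with hypothetical values) that mimics the `carrier`, `dist_cat`, and `ratio` columns of `flights2`, and it assumes the median of `ratio` is the statistic used to rank carriers.

```r
library(dplyr)

# Toy stand-in for flights2 (hypothetical values, not the real data)
toy <- tibble(
  carrier  = c("AA", "AA", "BB", "BB", "CC", "CC", "AA", "BB", "CC"),
  dist_cat = c(rep("short", 6), rep("long", 3)),
  ratio    = c(4.0, 4.4, 5.0, 5.2, 3.0, NA, 7.0, 8.0, 7.5)
)

# Median speed per carrier within each distance category (NAs ignored)
medians <- toy |>
  group_by(dist_cat, carrier) |>
  summarise(med_ratio = median(ratio, na.rm = TRUE), n = n(), .groups = "drop")

# Keep only the slowest and fastest carrier in each category
extremes <- medians |>
  group_by(dist_cat) |>
  filter(med_ratio == min(med_ratio) | med_ratio == max(med_ratio)) |>
  ungroup() |>
  arrange(dist_cat, med_ratio)

print(extremes)
```

Keeping the `n` column in the result makes it easy to spot carriers whose ranking rests on only a handful of flights.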
# makes a data set with only the best and worst carriers in each category
bw <- flights2 |>
  filter(
    ((carrier == "HA" | carrier == "YX") & dist_cat == "long") |
      ((carrier == "OO" | carrier == "G4") & dist_cat == "medium long") |
      ((carrier == "OO" | carrier == "F9") & dist_cat == "medium short") |
      ((carrier == "NK" | carrier == "DL") & dist_cat == "short")
  )
Final Visualization
flights2 |>
  ggplot(aes(dist_cat, ratio)) +
  geom_jitter(data = bw, aes(dist_cat, ratio, color = carrier), alpha = 0.2) +
  geom_boxplot(alpha = 0.5) +
  labs(
    y = "Average Speed (miles/min)",
    x = "Distance (miles)",
    title = "NY Flight Speed by Distance 2023",
    caption = "The fastest and slowest airlines are plotted for each distance category\nSource: FAA Aircraft registry",
    color = "Airline Carrier"
  ) +
  scale_colour_discrete(labels = c(
    "HA" = "Hawaiian Airlines", "YX" = "Republic Airlines",
    "OO" = "Skywest Airlines", "G4" = "Allegiant Air LLC",
    "F9" = "Frontier Airlines", "NK" = "Spirit Airlines",
    "DL" = "Delta Air Lines"
  )) +
  theme_light() +
  scale_x_discrete(labels = c(
    "short" = "80.0 - 479.0", "medium short" = "479.1 - 762.0",
    "medium long" = "762.1 - 1182.0", "long" = "1182.1 - 4983.0"
  ))
Warning: Removed 12534 rows containing non-finite outside the scale range
(`stat_boxplot()`).
Warning: Removed 330 rows containing missing values or values outside the scale range
(`geom_point()`).
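The warnings above refer to rows where `ratio` is not finite. The code that builds `flights2` is not shown in this section, so the following is only a sketch under two assumptions: that `ratio` is distance divided by air time (consistent with the miles/min axis label), and that `dist_cat` is cut at the quartiles of `distance` (suggested by the four axis ranges). The tibble `toy` and its values are made up for illustration.

```r
library(dplyr)

# Toy stand-in for the flights data (hypothetical values)
toy <- tibble(
  distance = c(100, 200, 500, 700, 900, 1100, 2000, 4000),
  air_time = c(30, 45, 90, NA, 130, 150, 250, 480)
)

# Quartile break points of distance (assumed binning rule)
breaks <- quantile(toy$distance, probs = c(0, 0.25, 0.5, 0.75, 1))

toy2 <- toy |>
  mutate(
    # speed in miles per minute; NA whenever air_time is missing
    ratio = distance / air_time,
    # four ordered distance categories, lowest value included in "short"
    dist_cat = cut(
      distance,
      breaks = breaks,
      include.lowest = TRUE,
      labels = c("short", "medium short", "medium long", "long")
    )
  )

# Number of rows a plot of ratio would drop as non-finite
sum(!is.finite(toy2$ratio))
```

Under these assumptions, a flight with a missing `air_time` gets an `NA` speed, which is the kind of row ggplot2 drops with the "non-finite" warning.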
Reflection
I created side-by-side boxplots that are separated based on the distance in miles of each flight; each box represents roughly 25% of all of the observations. The y-axis shows the plane’s average speed in miles per minute, which I calculated as distance divided by air time. Each box also has the “best” and “worst” airline carriers for that category plotted as points through geom_jitter. The “best” airlines have the fastest calculated speed, while the “worst” have the slowest. If I were to improve this plot, I would make the legend key fully opaque so that it is easier for viewers to see the colors.

I want to highlight that Skywest Airlines came out slowest in two categories and didn’t perform well in the other two, even though that is not shown on the graph. I would also like to point out the general upward trend in speed as flights get longer. That is why I split the boxes into categories: I don’t think it would be fair to compare airlines that fly mostly short flights with those that fly mostly long ones. For example, Hawaiian Airlines only flies long flights.
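One possible way to make the legend easier to read, as suggested above, is to override the transparency for the legend keys only. This is a minimal sketch on made-up data (`toy` is hypothetical), using ggplot2's `guide_legend(override.aes = ...)`; the points themselves keep `alpha = 0.2`.

```r
library(ggplot2)

# Toy stand-in for the bw data set (hypothetical values)
toy <- data.frame(
  dist_cat = rep(c("short", "long"), each = 4),
  ratio    = c(4.1, 4.8, 5.0, 4.5, 7.9, 8.3, 8.0, 7.6),
  carrier  = rep(c("DL", "NK", "HA", "YX"), each = 2)
)

p <- ggplot(toy, aes(dist_cat, ratio)) +
  geom_jitter(aes(color = carrier), alpha = 0.2) +
  # draw the legend keys fully opaque while the points stay transparent
  guides(color = guide_legend(override.aes = list(alpha = 1)))
```

The same `guides()` call could be added to the final visualization without changing any other layer.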