library(tidyverse)
library(nycflights13)
head(flights)
## # A tibble: 6 × 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 1 517 515 2 830 819
## 2 2013 1 1 533 529 4 850 830
## 3 2013 1 1 542 540 2 923 850
## 4 2013 1 1 544 545 -1 1004 1022
## 5 2013 1 1 554 600 -6 812 837
## 6 2013 1 1 554 558 -4 740 728
## # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## # hour <dbl>, minute <dbl>, time_hour <dttm>
data("flights")
I want to look at a specific airport and determine which of the top 3 airlines performs best there. I’ll measure this by examining departure delay times.
unique(flights$carrier)
## [1] "UA" "AA" "B6" "DL" "EV" "MQ" "US" "WN" "VX" "FL" "AS" "9E" "F9" "HA" "YV"
## [16] "OO"
I’ve been to JFK, so I’ll choose that one. To pick the top 3 airlines, I used this source: Top 10 Largest Airlines in the United States by Capacity.
According to that source, United, American, and Delta were the top 3 airlines in 2023, so I want to look at their performance in 2013 to see whether there were any significant delays.
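The carrier codes for these three airlines can be confirmed against the `airlines` lookup table that ships with `nycflights13`. This is a minimal sketch; the object name `top3_names` is only illustrative and not part of the original analysis.

```r
library(dplyr)
library(nycflights13)

# Look up the full airline names behind the three two-letter carrier codes.
# `airlines` is the carrier lookup table bundled with nycflights13.
top3_names <- airlines %>%
  filter(carrier %in% c("AA", "DL", "UA"))

top3_names
```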
JFK <- flights %>%
  filter(origin == "JFK", carrier %in% c("AA", "DL", "UA"))
JFK
## # A tibble: 39,018 × 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 1 542 540 2 923 850
## 2 2013 1 1 558 600 -2 924 917
## 3 2013 1 1 606 610 -4 837 845
## 4 2013 1 1 611 600 11 945 931
## 5 2013 1 1 628 630 -2 1137 1140
## 6 2013 1 1 655 655 0 1021 1030
## 7 2013 1 1 655 700 -5 1037 1045
## 8 2013 1 1 656 659 -3 949 959
## 9 2013 1 1 712 715 -3 1023 1035
## 10 2013 1 1 743 730 13 1107 1100
## # ℹ 39,008 more rows
## # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## # hour <dbl>, minute <dbl>, time_hour <dttm>
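Before comparing delays, it may help to see how many JFK departures each of the three carriers contributes, since a carrier with more flights will naturally place more points on any plot. A rough sketch, where `jfk_top3` and `jfk_counts` are illustrative names:

```r
library(dplyr)
library(nycflights13)

# Same filter as above: JFK departures for the three carriers of interest.
jfk_top3 <- flights %>%
  filter(origin == "JFK", carrier %in% c("AA", "DL", "UA"))

# Number of flights per carrier, largest first.
jfk_counts <- jfk_top3 %>%
  count(carrier, sort = TRUE)

jfk_counts
```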
dep_delays <- JFK %>%
  arrange(desc(dep_delay))
head(dep_delays)
## # A tibble: 6 × 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 9 20 1139 1845 1014 1457 2210
## 2 2013 4 10 1100 1900 960 1342 2211
## 3 2013 6 27 959 1900 899 1236 2226
## 4 2013 5 19 713 1700 853 1007 1955
## 5 2013 12 14 830 1845 825 1210 2154
## 6 2013 3 18 1020 2100 800 1336 32
## # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## # hour <dbl>, minute <dbl>, time_hour <dttm>
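The printed tibble cuts off the `carrier` column, so it is not obvious which airline each of these long delays belongs to. One way to see it, sketched here with the illustrative name `worst_delays`, is to select only the relevant columns and convert the delay from minutes to hours:

```r
library(dplyr)
library(nycflights13)

# Six longest departure delays at JFK for the three carriers,
# keeping only the columns needed to identify each flight.
worst_delays <- flights %>%
  filter(origin == "JFK", carrier %in% c("AA", "DL", "UA")) %>%
  arrange(desc(dep_delay)) %>%          # NA delays sort to the end
  select(month, day, carrier, flight, dep_delay) %>%
  mutate(dep_delay_hours = dep_delay / 60) %>%  # dep_delay is in minutes
  head()

worst_delays
```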
names(flights)
## [1] "year" "month" "day" "dep_time"
## [5] "sched_dep_time" "dep_delay" "arr_time" "sched_arr_time"
## [9] "arr_delay" "carrier" "flight" "tailnum"
## [13] "origin" "dest" "air_time" "distance"
## [17] "hour" "minute" "time_hour"
# Convert the month number to a labelled factor so the x-axis reads Jan to Dec.
delays <- dep_delays %>%
  mutate(month = factor(month, levels = 1:12, labels = month.abb))
# Plot each flight's departure delay (in minutes) by month, colored by carrier.
flight_plot <- delays %>%
  ggplot(aes(x = month, y = dep_delay, color = carrier)) +
  geom_point(alpha = 0.5) +
  scale_color_discrete(name = "Airlines", labels = c("American Airlines", "Delta Airlines", "United Airlines")) +
  xlab("Months") +
  ylab("Delay Times (minutes)") +
  ggtitle("JFK 2013 Airlines Delays")
flight_plot
This visualization shows the departure delays for American, Delta, and United Airlines flights out of JFK in 2013. Each point is one flight, plotted by month, so the plot shows how large the delays were rather than a count of them. Delta Airlines appears to have the widest spread of delay times. It also has many more outliers, meaning its passengers faced some of the longest waits. The green dots (Delta) bunch together more densely than the others, which points to frequent delays across the year, particularly during the summer. Interestingly, American Airlines had the single highest delay, in September: 1,014 minutes, or almost 17 hours. United Airlines, by contrast, showed no extreme delays and little spread that year. In fact, in March, April, July, and August, the data suggest that United flights often left earlier than scheduled.
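A scatter plot makes spread and outliers visible, but overlapping points can hide how the carriers compare on a typical flight. The sketch below summarizes departure delays per carrier as a numeric check on the visual impressions. The name `delay_summary` and the 15-minute threshold for counting a flight as "delayed" are assumptions made for illustration, not part of the original analysis.

```r
library(dplyr)
library(nycflights13)

# Per-carrier summary of departure delays (minutes) for JFK flights.
delay_summary <- flights %>%
  filter(origin == "JFK", carrier %in% c("AA", "DL", "UA")) %>%
  group_by(carrier) %>%
  summarise(
    n_flights     = n(),
    n_missing     = sum(is.na(dep_delay)),            # flights with no recorded delay
    mean_delay    = mean(dep_delay, na.rm = TRUE),
    median_delay  = median(dep_delay, na.rm = TRUE),
    share_over_15 = mean(dep_delay > 15, na.rm = TRUE), # assumed 15-minute threshold
    max_delay     = max(dep_delay, na.rm = TRUE)
  )

delay_summary
```

The median and the share of flights over the threshold are less sensitive to a handful of extreme delays than the mean, so comparing all three gives a fuller picture of which carrier tends to leave on time.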