library(tidyr)
library(dplyr)
After loading tidyr and dplyr, I read in the CSV file as a data frame called “delays”.
delays <- read.csv('https://raw.githubusercontent.com/chrosemo/data607_fall19_week5/master/assignment5.csv', header=TRUE, sep=',')
delays
## X X.1 Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1 ALASKA on time 497 221 212 503 1841
## 2 delayed 62 12 20 102 305
## 3 NA NA NA NA NA
## 4 AM WEST on time 694 4840 383 320 201
## 5 delayed 117 415 65 129 61
Next, I removed the blank third row and inserted airline names into the empty cells in the first column.
delays <- delays[-3,]
delays[2,1] <- 'ALASKA'
delays[4,1] <- 'AM WEST'
delays
## X X.1 Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1 ALASKA on time 497 221 212 503 1841
## 2 ALASKA delayed 62 12 20 102 305
## 4 AM WEST on time 694 4840 383 320 201
## 5 AM WEST delayed 117 415 65 129 61
I then renamed the first (“airline”) and second columns (“status”) and reshaped the data frame from wide to long format using gather(). The columns containing delay count values were gathered in the “destination” column, and the delay count values themselves were gathered in the “delay_count” column.
delays <- delays %>% rename('Airline' = 'X', 'Status' = 'X.1')
delays <- delays %>% gather(Destination, Flight_Count, -Airline, -Status)
delays
## Airline Status Destination Flight_Count
## 1 ALASKA on time Los.Angeles 497
## 2 ALASKA delayed Los.Angeles 62
## 3 AM WEST on time Los.Angeles 694
## 4 AM WEST delayed Los.Angeles 117
## 5 ALASKA on time Phoenix 221
## 6 ALASKA delayed Phoenix 12
## 7 AM WEST on time Phoenix 4840
## 8 AM WEST delayed Phoenix 415
## 9 ALASKA on time San.Diego 212
## 10 ALASKA delayed San.Diego 20
## 11 AM WEST on time San.Diego 383
## 12 AM WEST delayed San.Diego 65
## 13 ALASKA on time San.Francisco 503
## 14 ALASKA delayed San.Francisco 102
## 15 AM WEST on time San.Francisco 320
## 16 AM WEST delayed San.Francisco 129
## 17 ALASKA on time Seattle 1841
## 18 ALASKA delayed Seattle 305
## 19 AM WEST on time Seattle 201
## 20 AM WEST delayed Seattle 61
I then renamed the values in the “airline” and “destination” columns.
delays <- delays %>% mutate(Airline=recode(Airline,
'ALASKA' = 'Alaska Airlines',
'AM WEST' = 'America West Airlines'),
Destination=recode(Destination,
'Los.Angeles' = 'Los Angeles',
'San.Diego' = 'San Diego',
'San.Francisco' = 'San Francisco')
)
I began my analysis by totaling the number of flights by airline, status, and destination. There were 11,000 flights total, and America West Airlines had nearly twice as many flights (7,225 flights, or approximately 65.7% of the total) as Alaska Airlines had (3,775 flights, or approximately 34.3% of the total).
delays %>% summarize(Total = sum(Flight_Count))
## Total
## 1 11000
delays %>% group_by(Airline) %>% summarize(Total = sum(Flight_Count)) %>% mutate('Relative Frequency' = Total / sum(Total))
## # A tibble: 2 x 3
## Airline Total `Relative Frequency`
## <fct> <int> <dbl>
## 1 Alaska Airlines 3775 0.343
## 2 America West Airlines 7225 0.657
Regarding destination, Phoenix had 5,488 flights, or approximately 49.9% of all flights; Seattle had 1,054 flights, or approximately 21.9%; Los Angeles had 1,370 flights, or approximately 12.5%; San Francisco had 1,054 flights, or approximately 9.5%; and San Diego had 680 flights, or approximately 6.2%.
delays %>% group_by(Destination) %>% summarize(Total = sum(Flight_Count)) %>% mutate('Relative Frequency' = Total / sum(Total))
## # A tibble: 5 x 3
## Destination Total `Relative Frequency`
## <chr> <int> <dbl>
## 1 Los Angeles 1370 0.125
## 2 Phoenix 5488 0.499
## 3 San Diego 680 0.0618
## 4 San Francisco 1054 0.0958
## 5 Seattle 2408 0.219
For Alaska Airlines, Seattle (n = 2,146) had approximately 56.8% of all flights, followed by San Francisco (n = 605) with approximately 16%, Los Angeles (n = 559) with approximately 14.8%, and Phoenix (n = 233) and San Diego (n = 232) with approximately 6.2% each.
For America West Airlines, Phoenix (n = 5,255) had approximately 72.7% of all flights. Next were Los Angeles (n = 811) with approximately 11.2%, San Francisco (n = 449) and San Diego (n = 448) with approximately 6.2% each, and Seattle (n = 262) with approximately 3.6%.
delays %>% group_by(Airline, Destination) %>% summarize(Total = sum(Flight_Count)) %>% mutate('Relative Frequency' = Total / sum(Total))
## # A tibble: 10 x 4
## # Groups: Airline [2]
## Airline Destination Total `Relative Frequency`
## <fct> <chr> <int> <dbl>
## 1 Alaska Airlines Los Angeles 559 0.148
## 2 Alaska Airlines Phoenix 233 0.0617
## 3 Alaska Airlines San Diego 232 0.0615
## 4 Alaska Airlines San Francisco 605 0.160
## 5 Alaska Airlines Seattle 2146 0.568
## 6 America West Airlines Los Angeles 811 0.112
## 7 America West Airlines Phoenix 5255 0.727
## 8 America West Airlines San Diego 448 0.0620
## 9 America West Airlines San Francisco 449 0.0621
## 10 America West Airlines Seattle 262 0.0363
Regarding arrival status, across all destinations, 1,288 flights were delayed, and 9,712 flights arrived on time. Roughly 12 in every 100 flights were delayed.
delays %>% group_by(Status) %>% summarize(Total = sum(Flight_Count)) %>% mutate('Relative Frequency' = Total / sum(Total))
## # A tibble: 2 x 3
## Status Total `Relative Frequency`
## <fct> <int> <dbl>
## 1 delayed 1288 0.117
## 2 on time 9712 0.883
Across all destinations, Alaska Airlines had 501 flights delayed (approximately 13.3% of its flights) and 3,274 flights arriving on time (approximately 86.7% of its flights); the mean numbers of flights delayed and arriving on time per destination were 100.2 and 654.8 flights, respectively.
For America West Airlines, 787 flights were delayed (approximately 10.9% of its flights), and 6,438 flights arrived on time (approximately 89.1% of its flights); the mean numbers of flights delayed and arriving on time per destination were 157.4 and 1,287.6 flights, respectively.
delays %>% group_by(Airline, Status) %>% summarize(Total = sum(Flight_Count), Mean = mean(Flight_Count)) %>% mutate('Relative Frequency' = Total / sum(Total))
## # A tibble: 4 x 5
## # Groups: Airline [2]
## Airline Status Total Mean `Relative Frequency`
## <fct> <fct> <int> <dbl> <dbl>
## 1 Alaska Airlines delayed 501 100. 0.133
## 2 Alaska Airlines on time 3274 655. 0.867
## 3 America West Airlines delayed 787 157. 0.109
## 4 America West Airlines on time 6438 1288. 0.891
Considering delayed flights, flights out of Phoenix (n = 427) represented approximately 33.2% of all flights delayed. Next were Seattle (n = 366) at approximately 28.4%; San Francisco (n = 231) at approximately 17.9%; Los Angeles (n = 179) at approximately 13.9%; and San Diego (n = 85) at approximately 6.6%.
For flights arriving on time, Phoenix had 5,061 flights representing approximately 52.1% of all flights arriving on time. The next highest, in order, were Seattle (n = 2,042) at approximately 21%; Los Angeles (n = 1,191) at approximately 12.3%; San Francisco (n = 823) at approximately 8.5%; and San Diego (n = 595) at approximately 6.1%.
delays %>% group_by(Status, Destination) %>% summarize(Total = sum(Flight_Count)) %>% mutate('Relative Frequency' = Total / sum(Total))
## # A tibble: 10 x 4
## # Groups: Status [2]
## Status Destination Total `Relative Frequency`
## <fct> <chr> <int> <dbl>
## 1 delayed Los Angeles 179 0.139
## 2 delayed Phoenix 427 0.332
## 3 delayed San Diego 85 0.0660
## 4 delayed San Francisco 231 0.179
## 5 delayed Seattle 366 0.284
## 6 on time Los Angeles 1191 0.123
## 7 on time Phoenix 5061 0.521
## 8 on time San Diego 595 0.0613
## 9 on time San Francisco 823 0.0847
## 10 on time Seattle 2042 0.210
Considering arrival status for each destination, approximately 21.9% (n = 231) of flights out of San Francisco were delayed. In Seattle, approximately 15.2% (n = 366) of flights were delayed. Approximately 13.1% (n = 179) of flights out of Los Angeles were delayed. Approximately 12.5% (n = 85) of flights out of San Diego were delayed. And only approximately 7.8% (n = 427) of flights out of Phoenix were delayed.
delays %>% group_by(Destination, Status) %>% summarize(Total = sum(Flight_Count)) %>% mutate('Relative Frequency' = Total / sum(Total))
## # A tibble: 10 x 4
## # Groups: Destination [5]
## Destination Status Total `Relative Frequency`
## <chr> <fct> <int> <dbl>
## 1 Los Angeles delayed 179 0.131
## 2 Los Angeles on time 1191 0.869
## 3 Phoenix delayed 427 0.0778
## 4 Phoenix on time 5061 0.922
## 5 San Diego delayed 85 0.125
## 6 San Diego on time 595 0.875
## 7 San Francisco delayed 231 0.219
## 8 San Francisco on time 823 0.781
## 9 Seattle delayed 366 0.152
## 10 Seattle on time 2042 0.848
Numbers and proportions of flights delayed varied by airline and destination. Overall, the airline hubs (Seattle for Alaska Airlines and Phoenix for America West Airlines) had the most flights as well as the most delayed flights. However, San Francisco had the highest proportion of flights delayed, at approximately 21.9%. That proportion compared with 11.7% of flights delayed across all airports. Phoenix, despite having the most total arrivals (n = 5,488), had the lowest proportion of flights delayed, with approximately 7.7%.