The chart above describes arrival delays for two airlines across five destinations. Your task is to:
flights <- read.csv("FlightData.csv", header = T)
dim(flights)
## [1] 5 7
head(flights)
## X X.1 Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1 ALASKA on time 497 221 212 503 1841
## 2 delayed 62 12 20 102 305
## 3 NA NA NA NA NA
## 4 AM WEST on time 694 4840 383 320 210
## 5 delayed 117 415 65 129 61
flights <- flights[c(1,2,4,5), ]
flights
## X X.1 Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1 ALASKA on time 497 221 212 503 1841
## 2 delayed 62 12 20 102 305
## 4 AM WEST on time 694 4840 383 320 210
## 5 delayed 117 415 65 129 61
flights[c(2,4),1] <- flights[c(1,3),1]
flights
## X X.1 Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1 ALASKA on time 497 221 212 503 1841
## 2 ALASKA delayed 62 12 20 102 305
## 4 AM WEST on time 694 4840 383 320 210
## 5 AM WEST delayed 117 415 65 129 61
colnames(flights)[1:2] <- c('airline', 'status')
flights
## airline status Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1 ALASKA on time 497 221 212 503 1841
## 2 ALASKA delayed 62 12 20 102 305
## 4 AM WEST on time 694 4840 383 320 210
## 5 AM WEST delayed 117 415 65 129 61
tidyflights <- gather(flights, key = 'destination', value = 'n', 3:7)
tidyflights
## airline status destination n
## 1 ALASKA on time Los.Angeles 497
## 2 ALASKA delayed Los.Angeles 62
## 3 AM WEST on time Los.Angeles 694
## 4 AM WEST delayed Los.Angeles 117
## 5 ALASKA on time Phoenix 221
## 6 ALASKA delayed Phoenix 12
## 7 AM WEST on time Phoenix 4840
## 8 AM WEST delayed Phoenix 415
## 9 ALASKA on time San.Diego 212
## 10 ALASKA delayed San.Diego 20
## 11 AM WEST on time San.Diego 383
## 12 AM WEST delayed San.Diego 65
## 13 ALASKA on time San.Francisco 503
## 14 ALASKA delayed San.Francisco 102
## 15 AM WEST on time San.Francisco 320
## 16 AM WEST delayed San.Francisco 129
## 17 ALASKA on time Seattle 1841
## 18 ALASKA delayed Seattle 305
## 19 AM WEST on time Seattle 210
## 20 AM WEST delayed Seattle 61
tidyflights <- spread(tidyflights, status, n)
tidyflights
## airline destination delayed on time
## 1 ALASKA Los.Angeles 62 497
## 2 ALASKA Phoenix 12 221
## 3 ALASKA San.Diego 20 212
## 4 ALASKA San.Francisco 102 503
## 5 ALASKA Seattle 305 1841
## 6 AM WEST Los.Angeles 117 694
## 7 AM WEST Phoenix 415 4840
## 8 AM WEST San.Diego 65 383
## 9 AM WEST San.Francisco 129 320
## 10 AM WEST Seattle 61 210
tidyflights <- tidyflights %>%
mutate(total = delayed + `on time`,
percent_on_time = round(`on time`/total*100, 2),
percent_delayed = round(delayed/total*100, 2))
datatable(tidyflights)
compareAll <- tidyflights %>%
group_by(airline) %>%
summarize(OnTime = sum(`on time`), Delayed = sum(delayed),
PercentOnTime = round(OnTime/sum(OnTime,Delayed),2))%>%
select(airline, PercentOnTime)
compareAll %>% arrange(desc(PercentOnTime))
## # A tibble: 2 x 2
## airline PercentOnTime
## <fct> <dbl>
## 1 AM WEST 0.890
## 2 ALASKA 0.870
This is not what I expected because Alaska airlines has 4 out of the top five spots for percentage of on time flights! AM West has a better overall average because they have a very large number of flights to Phoenix with a high on time percentage to that destination.
minusPhoenix <- tidyflights %>%
filter(!destination %in% c('Phoenix')) %>%
group_by(airline) %>%
summarize(OnTime = sum(`on time`), Delayed = sum(delayed),
PercentOnTime = round(OnTime/sum(OnTime,Delayed),2)) %>%
select(airline, PercentOnTime)
minusPhoenix %>% arrange(desc(PercentOnTime))
## # A tibble: 2 x 2
## airline PercentOnTime
## <fct> <dbl>
## 1 ALASKA 0.860
## 2 AM WEST 0.810
When you filter out Phoenix you can see that now Alaska Airlines has a better average on time percentage.
So if you are flying to Phoenix take AM West! But if you are flying to any of the other destinations you have a better chance of being on time with Alaska airlines.
Alaska <- tidyflights %>% filter(airline %in% c('ALASKA')) %>%
select(airline, destination, percent_on_time)
Alaska %>% arrange(desc(percent_on_time))
## airline destination percent_on_time
## 1 ALASKA Phoenix 94.85
## 2 ALASKA San.Diego 91.38
## 3 ALASKA Los.Angeles 88.91
## 4 ALASKA Seattle 85.79
## 5 ALASKA San.Francisco 83.14
AMWest <- tidyflights %>% filter(airline %in% c('AM WEST')) %>%
select(airline, destination, percent_on_time)
AMWest %>% arrange(desc(percent_on_time))
## airline destination percent_on_time
## 1 AM WEST Phoenix 92.10
## 2 AM WEST Los.Angeles 85.57
## 3 AM WEST San.Diego 85.49
## 4 AM WEST Seattle 77.49
## 5 AM WEST San.Francisco 71.27