The table is stored in a csv on my github accounts. Using “read_csv” to read data into a dataframe.
It will be difficult to extract the flight counts when their stored in different columns. Here I use the gather function to narrow the data set and store all of the flight counts in one column.
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
flights_narrow <- flights %>%
gather(City, Flights, 3:7)
head(flights_narrow)
## Airline Status City Flights
## 1 ALASKA on time Los.Angeles 497
## 2 ALASKA delayed Los.Angeles 62
## 3 AM WEST on time Los.Angeles 694
## 4 AM WEST delayed Los.Angeles 117
## 5 ALASKA on time Pheonix 221
## 6 ALASKA delayed Pheonix 12
To discover which airline is more punctional we use a couple functions from the dplyr package. Namely mutate, to single out on time flights, and summarize to calculate the on time rate.
flights_narrow %>%
mutate(on_time = ifelse(Status=='on time', Flights, 0)) %>%
group_by(Airline) %>%
summarise(
on_time_rate = sum(on_time) / sum(Flights)
)
## # A tibble: 2 x 2
## Airline on_time_rate
## * <chr> <dbl>
## 1 ALASKA 0.867
## 2 AM WEST 0.891
Here we can see AM WEST is slightly better than Alaska w/ being on time. Let’s investigate further and see if there are any stand out cities.
flights_narrow %>%
mutate(on_time = ifelse(Status=='on time', Flights, 0)) %>%
group_by(City) %>%
summarise(
on_time_rate = sum(on_time) / sum(Flights)
)
## # A tibble: 5 x 2
## City on_time_rate
## * <chr> <dbl>
## 1 Los.Angeles 0.869
## 2 Pheonix 0.922
## 3 San.Deigo 0.875
## 4 San.Francisco 0.781
## 5 Seattle 0.848
Interesting. San Francisco performs the worst in terms of on time flights.
flights_narrow %>%
mutate(sf_flights = ifelse(City=='San.Francisco', Flights, 0)) %>%
group_by(Airline) %>%
summarise(
sf_flights = sum(sf_flights) / sum(Flights)
)
## # A tibble: 2 x 2
## Airline sf_flights
## * <chr> <dbl>
## 1 ALASKA 0.160
## 2 AM WEST 0.0621
A larger share of Alaska’s flights are flying into SF. This could be the reason it has a lower proportion of flights being delayed. Not because it’s an inferior Airline.