First, to make our lives easy, we will use three packages from the tidyverse: readr, tidyr, and dplyr. Then, we will read the data in with read_csv and fill in the null spots with the correct data using fill().

require(readr)
## Loading required package: readr
require(tidyr)
## Loading required package: tidyr
require(dplyr)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
data <- read_csv('flights.csv')
## Warning: Missing column names filled in: 'X1' [1], 'X2' [2]
## Parsed with column specification:
## cols(
##   X1 = col_character(),
##   X2 = col_character(),
##   `Los Angeles` = col_integer(),
##   Phoenix = col_integer(),
##   `San Diego` = col_integer(),
##   `San Francisco` = col_integer(),
##   Seattle = col_integer()
## )
data <- fill(data, X1)
data
## # A tibble: 4 x 7
##   X1     X2      `Los Angeles` Phoenix `San Diego` `San Francisco` Seattle
##   <chr>  <chr>           <int>   <int>       <int>           <int>   <int>
## 1 ALASKA ontime            497     221         212             503    1841
## 2 ALASKA delayed            62      12          20             201     305
## 3 AMWEST ontime            694    4840         383             320     201
## 4 AMWEST delayed           117     415          65             129      61

Next we crea data frames for each airline and give the columns names.

alaska <- data.frame(t(filter(data, X1 == "ALASKA")[3:7]))
amwest <- data.frame(t(filter(data, X1 == "AMWEST")[3:7]))
colnames(alaska) <- c( 'Ontime', 'Delayed')
colnames(amwest) <- c( 'Ontime', 'Delayed')

Then, we add a column that shows the delay rate for each city and airline.

alaska <- mutate(alaska, Rates = Delayed/(Ontime+Delayed), Totals = Ontime+Delayed)
amwest <- mutate(amwest, Rates = Delayed/(Ontime+Delayed), Totals = Ontime+Delayed)
alaska
##   Ontime Delayed      Rates Totals
## 1    497      62 0.11091234    559
## 2    221      12 0.05150215    233
## 3    212      20 0.08620690    232
## 4    503     201 0.28551136    704
## 5   1841     305 0.14212488   2146

Next, we bind the rates columns from each airline and give the new data frame column names.

rates <- data.frame(cbind(alaska$Rates, amwest$Rates))
colnames(rates) <- c("Alaska", "Amwest")
mean(rates$Alaska)
## [1] 0.1352515
mean(rates$Amwest)
## [1] 0.1776915
sd(rates$Alaska)
## [1] 0.09032856
sd(rates$Amwest)
## [1] 0.08212854
median(rates$Alaska)
## [1] 0.1109123
median(rates$Amwest)
## [1] 0.1450893

This shows us that at the average city, Amwest is more likely to be delayed than Alaska but Alaska has more variance between cities. However, we can see that Alaska had an average of .15 delays and Amwest had .10 when we look at the aggregate data.

sum(alaska$Delayed)/sum(alaska$Totals)
## [1] 0.1548787
sum(amwest$Delayed)/sum(amwest$Totals)
## [1] 0.1089273

This makes sense when we see that both sets of data are skewed.

summary(rates)
##      Alaska            Amwest       
##  Min.   :0.05150   Min.   :0.07897  
##  1st Qu.:0.08621   1st Qu.:0.14427  
##  Median :0.11091   Median :0.14509  
##  Mean   :0.13525   Mean   :0.17769  
##  3rd Qu.:0.14212   3rd Qu.:0.23282  
##  Max.   :0.28551   Max.   :0.28731