The data contained in the table above are replicated in a .csv file.

Reading the Data

The data is read into a data frame, using the check.names = FALSE option to avoid inserting unnecessary periods in cities with spaces:

airlines <- read.table('./airlines.csv', sep = ",", check.names = FALSE, header = TRUE, stringsAsFactors = FALSE)

Cleaning the Data

The first two column names are filled in, and the blank entries in the airline column are filled in:

names(airlines) <- c('Airline', 'Status', names(airlines)[3:7])
for (i in 1:nrow(airlines)) {
  if (airlines[i, 1] == "") {
    airlines[i, 1] = airlines[i - 1, 1]
  }
}

Tidying the Data

To tidy the data, the frequencies are gathered by destination using tidyr. The blank row is filtered out using dplyr:

airlines <- airlines %>%
  gather(Destination, Frequency, -Airline, -Status) %>%
  filter(Frequency != "")

Flight statuses are renamed for consistency, and then spread out in order to allow for analysis:

airlines$Status[airlines$Status == "on time"] <- "OnTime"
airlines$Status[airlines$Status == "delayed"] <- "Delayed"
airlines <- airlines %>% 
  spread(Status, Frequency)
kable(airlines)
Airline Destination Delayed OnTime
ALASKA Los Angeles 62 497
ALASKA Phoenix 12 221
ALASKA San Diego 20 212
ALASKA San Francisco 102 503
ALASKA Seattle 305 1841
AM WEST Los Angeles 117 694
AM WEST Phoenix 415 4840
AM WEST San Diego 65 383
AM WEST San Francisco 129 320
AM WEST Seattle 61 201

Analyzing the Data

In order to analyze the two airlines’ performance, the mutate function is used to calculate the total number of flights and use that to calculate the percentage of flights that arrive on time:

airlines <- airlines %>% 
  mutate(
    Total = Delayed + OnTime,
    OnTimeRate = OnTime / Total) %>% 
  select(-Delayed, -OnTime)
kable(airlines)
Airline Destination Total OnTimeRate
ALASKA Los Angeles 559 0.8890877
ALASKA Phoenix 233 0.9484979
ALASKA San Diego 232 0.9137931
ALASKA San Francisco 605 0.8314050
ALASKA Seattle 2146 0.8578751
AM WEST Los Angeles 811 0.8557337
AM WEST Phoenix 5255 0.9210276
AM WEST San Diego 448 0.8549107
AM WEST San Francisco 449 0.7126949
AM WEST Seattle 262 0.7671756

The airlines’ on-time performance at each of the five airports is visualized using ggplot2:

From this chart, it is clear that Alaska has better on-time performance at each of the five destinations. To investigate if there is a relationship between on-time performance and volume of flights to a destination:

From this plot, there does not seem to be much relationship between flight volume and on-time performance — the overall correlation between the two variables is only 0.302.