First we load the csv data into R from Github. We also load the required packages.
library(stringr)
library(RCurl)
## Loading required package: bitops
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.2.2
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.2.3
##
## Attaching package: 'tidyr'
##
## The following object is masked from 'package:RCurl':
##
## complete
url = "https://raw.githubusercontent.com/cyadusha/arrival-delays/master/arrival%20delays.csv"
x = getURL(url)
arrivaldelays = read.csv(file = textConnection(x), header = TRUE)
Then, we remove the row that has null values using the command below.
arrivaldelays = arrivaldelays[complete.cases(arrivaldelays), ]
We also replace the blank below Alaska with Alaska and the blank below AM West with AM West.
arrivaldelays[2,1] = "Alaska"
arrivaldelays[4,1] = "AM West"
We can also rename two of the columns in the original dataset as follows.
colnames(arrivaldelays)[1] = "Airline"
colnames(arrivaldelays)[2] = "Status"
We now use the gather command from the tidyr package to collect the names of the last 5 columns into one column.
arrivaldelays = arrivaldelays %>%
gather(Destination, Number_of_Arrivals, Los.Angeles:Seattle, na.rm = T)
Because a punctuation mark exists in some of the destinations, we replace each punctuation mark with a space.
arrivaldelays$Destination = str_replace_all(arrivaldelays$Destination, "[[:punct:]]", " ")
Because each of the arrival statuses will soon be the names of the two columns in each of our tables, to avoid spaces, we replace “on time” with one word that would suffice. That word is “punctual”.
arrivaldelays$Status = str_replace_all(arrivaldelays$Status, "on time", "punctual")
Now we use the spread command in the tidyr package to organize the number of arrivals into the two new columns created which are “delayed” and “punctual”.
arrivaldelays = arrivaldelays %>%
spread(Status, Number_of_Arrivals)
arrivaldelays
## Airline Destination delayed punctual
## 1 Alaska Los Angeles 62 497
## 2 Alaska Phoenix 12 221
## 3 Alaska San Diego 20 212
## 4 Alaska San Francisco 102 503
## 5 Alaska Seattle 305 1841
## 6 AM West Los Angeles 117 694
## 7 AM West Phoenix 415 4840
## 8 AM West San Diego 65 383
## 9 AM West San Francisco 129 320
## 10 AM West Seattle 61 201
To facilitate the comparison of the arrival delays for each airline, we add a new column which computes the percentage of punctual flights in each city for each airline.
arrivaldelays$percentage = (arrivaldelays$punctual/(arrivaldelays$delayed + arrivaldelays$punctual))*100
arrivaldelays
## Airline Destination delayed punctual percentage
## 1 Alaska Los Angeles 62 497 88.90877
## 2 Alaska Phoenix 12 221 94.84979
## 3 Alaska San Diego 20 212 91.37931
## 4 Alaska San Francisco 102 503 83.14050
## 5 Alaska Seattle 305 1841 85.78751
## 6 AM West Los Angeles 117 694 85.57337
## 7 AM West Phoenix 415 4840 92.10276
## 8 AM West San Diego 65 383 85.49107
## 9 AM West San Francisco 129 320 71.26949
## 10 AM West Seattle 61 201 76.71756
Now we use the group_by command to compute the mean percentage of punctual arrivals for each flight.
arrivalcomparison = arrivaldelays %>%
group_by(Airline) %>% summarize(percentage = round(mean(percentage), 0))
arrivalcomparison
## Source: local data frame [2 x 2]
##
## Airline percentage
## (fctr) (dbl)
## 1 Alaska 89
## 2 AM West 82
We notice that for each destination, Alaska Airlines has a higher percentage of punctual arrivals as compared to AM West Airlines. But yet, the percentage of punctual arrivals for each city is much greater than 50.
It turns out that each airline has an average of at least 80% of punctual arrivals because the average percentage of punctual arrivals for each airline falls between 80 and 90. However, Alaska Airlines has a better performance because 89% of its flights arrive punctually. On the other hand, for AM West Airlines only 82% of its flights arrive punctually.