Introduction

This assignment begins with a chart that describes arrival delays for two airlines across five destinations. The objective is to: 1. Create a .CSV file that includes all of the information above. You’re encouraged to use a “wide” structure similar to how the information appears above, so that you can practice tidying and transformations as described below. 2. Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data. 3. Perform analysis to compare the arrival delays for the two airlines.

PREPARE

Load required packages

library(RCurl)
## Loading required package: bitops
library(knitr)
library(stringr)
library(tidyr)
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:RCurl':
## 
##     complete
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(DT)

Read in CSV file from GitHub

GitHubFile <- getURL("https://raw.githubusercontent.com/jillenergy/Week6/master/ArrivalTimes.csv")
flights_raw <- data.frame(read.csv(text = GitHubFile, header = T, check.names=FALSE))

Check the raw table imported

flights_raw
##   Airline  Status Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1  ALASKA on time         497     221       212           503    1841
## 2  ALASKA delayed          62      12        20           102     305
## 3 AM WEST on time         694    4840       383           320     201
## 4 AM WEST delayed         117     415        65           129      61

Replace period with space in the column headers

names(flights_raw) <- gsub("\\.", " ", names(flights_raw)) 
flights_raw
##   Airline  Status Los Angeles Phoenix San Diego San Francisco Seattle
## 1  ALASKA on time         497     221       212           503    1841
## 2  ALASKA delayed          62      12        20           102     305
## 3 AM WEST on time         694    4840       383           320     201
## 4 AM WEST delayed         117     415        65           129      61

TIDY

Use the Gather function to transform the table into a more usable format

flights_tidy <- gather(flights_raw, "City", "TotalFlights", "Los Angeles":"Seattle") %>% 
  arrange(Airline)
flights_tidy
##    Airline  Status          City TotalFlights
## 1   ALASKA on time   Los Angeles          497
## 2   ALASKA delayed   Los Angeles           62
## 3   ALASKA on time       Phoenix          221
## 4   ALASKA delayed       Phoenix           12
## 5   ALASKA on time     San Diego          212
## 6   ALASKA delayed     San Diego           20
## 7   ALASKA on time San Francisco          503
## 8   ALASKA delayed San Francisco          102
## 9   ALASKA on time       Seattle         1841
## 10  ALASKA delayed       Seattle          305
## 11 AM WEST on time   Los Angeles          694
## 12 AM WEST delayed   Los Angeles          117
## 13 AM WEST on time       Phoenix         4840
## 14 AM WEST delayed       Phoenix          415
## 15 AM WEST on time     San Diego          383
## 16 AM WEST delayed     San Diego           65
## 17 AM WEST on time San Francisco          320
## 18 AM WEST delayed San Francisco          129
## 19 AM WEST on time       Seattle          201
## 20 AM WEST delayed       Seattle           61

RESHAPE

Reshape the data looking at the number of flights that were on time or delayed to set-up for analysis

flights_shaped <- flights_tidy %>% 
  spread("Status", "TotalFlights",fill=NA)
colnames(flights_shaped) <- c("Airline", "City", "Delayed", "OnTime")
flights_shaped
##    Airline          City Delayed OnTime
## 1   ALASKA   Los Angeles      62    497
## 2   ALASKA       Phoenix      12    221
## 3   ALASKA     San Diego      20    212
## 4   ALASKA San Francisco     102    503
## 5   ALASKA       Seattle     305   1841
## 6  AM WEST   Los Angeles     117    694
## 7  AM WEST       Phoenix     415   4840
## 8  AM WEST     San Diego      65    383
## 9  AM WEST San Francisco     129    320
## 10 AM WEST       Seattle      61    201

CALCULATE

Add a column for Total Flights by Airline and City

flights_shaped_tot <- flights_shaped %>% 
  mutate(TotalFlights = Delayed + OnTime)
flights_shaped_tot
##    Airline          City Delayed OnTime TotalFlights
## 1   ALASKA   Los Angeles      62    497          559
## 2   ALASKA       Phoenix      12    221          233
## 3   ALASKA     San Diego      20    212          232
## 4   ALASKA San Francisco     102    503          605
## 5   ALASKA       Seattle     305   1841         2146
## 6  AM WEST   Los Angeles     117    694          811
## 7  AM WEST       Phoenix     415   4840         5255
## 8  AM WEST     San Diego      65    383          448
## 9  AM WEST San Francisco     129    320          449
## 10 AM WEST       Seattle      61    201          262

Add a column for Percentage of On Time

flights_shaped_perc <- flights_shaped_tot %>% 
  mutate(PercentOnTime = OnTime / TotalFlights)
flights_shaped_perc
##    Airline          City Delayed OnTime TotalFlights PercentOnTime
## 1   ALASKA   Los Angeles      62    497          559     0.8890877
## 2   ALASKA       Phoenix      12    221          233     0.9484979
## 3   ALASKA     San Diego      20    212          232     0.9137931
## 4   ALASKA San Francisco     102    503          605     0.8314050
## 5   ALASKA       Seattle     305   1841         2146     0.8578751
## 6  AM WEST   Los Angeles     117    694          811     0.8557337
## 7  AM WEST       Phoenix     415   4840         5255     0.9210276
## 8  AM WEST     San Diego      65    383          448     0.8549107
## 9  AM WEST San Francisco     129    320          449     0.7126949
## 10 AM WEST       Seattle      61    201          262     0.7671756

DISPLAY

Round to shorten the Percentage on Time calculation two two digits and format the final data table

flights_shaped_perc[,'PercentOnTime']=format(round(flights_shaped_perc[,'PercentOnTime'],2),nsmall=2)
datatable(flights_shaped_perc)

PLOT

Plot the percentage of flights on time for each city comparing airlines

flights_shaped_perc %>%  
  ggplot( aes(x=City, y=PercentOnTime, fill=Airline)) +
  geom_bar(stat="identity", position=position_dodge()) +
  geom_text(aes(label=PercentOnTime), vjust=.5, hjust=1,position= position_dodge(width=0.9)) +
  ggtitle("Percentage of Flights On Time by Airline by City") +
  xlab("City") + ylab("Percentage of Flights On Time") +
  coord_flip() 

DISCUSS

The graph above visually illustrates the performance difference between airlines and cities, making it easier to draw conclusions about current state and areas for improvement than reading the data in a table. Conclusions include: 1. Phoenix has the best on time performance of all of the cities regardless of airlines. 2. San Francisco has the worst on time performance of all of the cities regardless of airlines. 3. Alaska Airlines performs better in every city compared to America West.