This assignment begins with a chart that describes arrival delays for two airlines across five destinations. The objective is to: 1. Create a .CSV file that includes all of the information above. You’re encouraged to use a “wide” structure similar to how the information appears above, so that you can practice tidying and transformations as described below. 2. Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data. 3. Perform analysis to compare the arrival delays for the two airlines.
library(RCurl)
## Loading required package: bitops
library(knitr)
library(stringr)
library(tidyr)
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:RCurl':
##
## complete
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(DT)
GitHubFile <- getURL("https://raw.githubusercontent.com/jillenergy/Week6/master/ArrivalTimes.csv")
flights_raw <- data.frame(read.csv(text = GitHubFile, header = T, check.names=FALSE))
flights_raw
## Airline Status Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1 ALASKA on time 497 221 212 503 1841
## 2 ALASKA delayed 62 12 20 102 305
## 3 AM WEST on time 694 4840 383 320 201
## 4 AM WEST delayed 117 415 65 129 61
names(flights_raw) <- gsub("\\.", " ", names(flights_raw))
flights_raw
## Airline Status Los Angeles Phoenix San Diego San Francisco Seattle
## 1 ALASKA on time 497 221 212 503 1841
## 2 ALASKA delayed 62 12 20 102 305
## 3 AM WEST on time 694 4840 383 320 201
## 4 AM WEST delayed 117 415 65 129 61
flights_tidy <- gather(flights_raw, "City", "TotalFlights", "Los Angeles":"Seattle") %>%
arrange(Airline)
flights_tidy
## Airline Status City TotalFlights
## 1 ALASKA on time Los Angeles 497
## 2 ALASKA delayed Los Angeles 62
## 3 ALASKA on time Phoenix 221
## 4 ALASKA delayed Phoenix 12
## 5 ALASKA on time San Diego 212
## 6 ALASKA delayed San Diego 20
## 7 ALASKA on time San Francisco 503
## 8 ALASKA delayed San Francisco 102
## 9 ALASKA on time Seattle 1841
## 10 ALASKA delayed Seattle 305
## 11 AM WEST on time Los Angeles 694
## 12 AM WEST delayed Los Angeles 117
## 13 AM WEST on time Phoenix 4840
## 14 AM WEST delayed Phoenix 415
## 15 AM WEST on time San Diego 383
## 16 AM WEST delayed San Diego 65
## 17 AM WEST on time San Francisco 320
## 18 AM WEST delayed San Francisco 129
## 19 AM WEST on time Seattle 201
## 20 AM WEST delayed Seattle 61
flights_shaped <- flights_tidy %>%
spread("Status", "TotalFlights",fill=NA)
colnames(flights_shaped) <- c("Airline", "City", "Delayed", "OnTime")
flights_shaped
## Airline City Delayed OnTime
## 1 ALASKA Los Angeles 62 497
## 2 ALASKA Phoenix 12 221
## 3 ALASKA San Diego 20 212
## 4 ALASKA San Francisco 102 503
## 5 ALASKA Seattle 305 1841
## 6 AM WEST Los Angeles 117 694
## 7 AM WEST Phoenix 415 4840
## 8 AM WEST San Diego 65 383
## 9 AM WEST San Francisco 129 320
## 10 AM WEST Seattle 61 201
flights_shaped_tot <- flights_shaped %>%
mutate(TotalFlights = Delayed + OnTime)
flights_shaped_tot
## Airline City Delayed OnTime TotalFlights
## 1 ALASKA Los Angeles 62 497 559
## 2 ALASKA Phoenix 12 221 233
## 3 ALASKA San Diego 20 212 232
## 4 ALASKA San Francisco 102 503 605
## 5 ALASKA Seattle 305 1841 2146
## 6 AM WEST Los Angeles 117 694 811
## 7 AM WEST Phoenix 415 4840 5255
## 8 AM WEST San Diego 65 383 448
## 9 AM WEST San Francisco 129 320 449
## 10 AM WEST Seattle 61 201 262
flights_shaped_perc <- flights_shaped_tot %>%
mutate(PercentOnTime = OnTime / TotalFlights)
flights_shaped_perc
## Airline City Delayed OnTime TotalFlights PercentOnTime
## 1 ALASKA Los Angeles 62 497 559 0.8890877
## 2 ALASKA Phoenix 12 221 233 0.9484979
## 3 ALASKA San Diego 20 212 232 0.9137931
## 4 ALASKA San Francisco 102 503 605 0.8314050
## 5 ALASKA Seattle 305 1841 2146 0.8578751
## 6 AM WEST Los Angeles 117 694 811 0.8557337
## 7 AM WEST Phoenix 415 4840 5255 0.9210276
## 8 AM WEST San Diego 65 383 448 0.8549107
## 9 AM WEST San Francisco 129 320 449 0.7126949
## 10 AM WEST Seattle 61 201 262 0.7671756
flights_shaped_perc[,'PercentOnTime']=format(round(flights_shaped_perc[,'PercentOnTime'],2),nsmall=2)
datatable(flights_shaped_perc)
flights_shaped_perc %>%
ggplot( aes(x=City, y=PercentOnTime, fill=Airline)) +
geom_bar(stat="identity", position=position_dodge()) +
geom_text(aes(label=PercentOnTime), vjust=.5, hjust=1,position= position_dodge(width=0.9)) +
ggtitle("Percentage of Flights On Time by Airline by City") +
xlab("City") + ylab("Percentage of Flights On Time") +
coord_flip()
The graph above visually illustrates the performance difference between airlines and cities, making it easier to draw conclusions about current state and areas for improvement than reading the data in a table. Conclusions include: 1. Phoenix has the best on time performance of all of the cities regardless of airlines. 2. San Francisco has the worst on time performance of all of the cities regardless of airlines. 3. Alaska Airlines performs better in every city compared to America West.