To create the CSV I manually entered the information contained in the image into a google sheets file and downloaded it as a .csv. I start by reading in the .csv file
airline2<-read.csv("CSV for HW 5 - Sheet1.csv", stringsAsFactors = FALSE)
airline2[2,1]<-airline2[1,1]#inserting duplicate values so that I can use gather without a problem
airline2[5,1]<-airline2[4,1]
airline2
## X X.1 Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1 Alaska on time 497 221 212 503 1841
## 2 Alaska delayed 62 12 20 102 305
## 3 NA NA NA NA NA
## 4 AM West on time 694 4840 383 383 201
## 5 AM West delayed 117 415 65 129 61
In order to get this into a tidy format, I need to first melt (or gather) the data.frame and then spread one of the colums. Gather nicely takes care of row 3 with the blanks and NAs if I use the na.rm option. I also took the time to fix the name of two columns, one that had X as the tile and the other that had a space in it, making it harder to work with.
AirPartTidy<-gather(airline2,city,number_of_flights,c(3:7), na.rm = TRUE) #melting it all together
AirTidy<-spread(AirPartTidy,X.1,number_of_flights) #spreading out the on time/delayed part into two columns for easy use
names(AirTidy)[1]<-"Airline" #fixing column names
names(AirTidy)[4]<-"On_Time"
For the analysis, I first created two vectors, one tracking the percentage of flights that were on time, the other tracking delays, both by location and airline. I then attached them to the AirTidy data.frame.
On_Time_Percentage<-(AirTidy$On_Time/(AirTidy$delayed+AirTidy$On_Time))
Delay_Percentage<-1-On_Time_Percentage
AirTidy<-cbind(AirTidy,On_Time_Percentage)
AirTidy<-cbind(AirTidy,Delay_Percentage)
AirTidy
## Airline city delayed On_Time On_Time_Percentage Delay_Percentage
## 1 Alaska Los.Angeles 62 497 0.8890877 0.11091234
## 2 Alaska Phoenix 12 221 0.9484979 0.05150215
## 3 Alaska San.Diego 20 212 0.9137931 0.08620690
## 4 Alaska San.Francisco 102 503 0.8314050 0.16859504
## 5 Alaska Seattle 305 1841 0.8578751 0.14212488
## 6 AM West Los.Angeles 117 694 0.8557337 0.14426634
## 7 AM West Phoenix 415 4840 0.9210276 0.07897241
## 8 AM West San.Diego 65 383 0.8549107 0.14508929
## 9 AM West San.Francisco 129 383 0.7480469 0.25195312
## 10 AM West Seattle 61 201 0.7671756 0.23282443
The assignment asked us to compare the delay and on time rates of both airlines. I created two arrays that give that information. The percentage of delayed flights plus the percentage of on time flights will equal 100%, so I used that to calculate delay percentages.
TotalOnTime<-with(AirTidy, tapply(On_Time,Airline,sum)) #creating totals of on time and delayed for each airline to compare them
TotalDelayed<-with(AirTidy, tapply(delayed,Airline,sum))
OverallOTPercent<-TotalOnTime/(TotalDelayed+TotalOnTime)
OverallDPercent<-1-OverallOTPercent
OverallOTPercent
## Alaska AM West
## 0.8672848 0.8920143
OverallDPercent
## Alaska AM West
## 0.1327152 0.1079857
The data here is quite interesting, as we have a case of Simpson’s Paradox. For each location, Alaska has a lower delay percentage. However, overall, AM West has a lower delay percentage. The reasoning for this can be seen when we look at the city based delay percentages.
CitiesOnTime<-with(AirTidy, tapply(On_Time,city,sum)) #creating totals for cities instead of airlines
CitiesDelayed<-with(AirTidy, tapply(delayed,city,sum))
CitiesOTPercent<-CitiesOnTime/(CitiesDelayed+CitiesOnTime)
CitiesOTPercent
## Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 0.8693431 0.9221939 0.8750000 0.7931961 0.8480066
CitiesDPercent<-1-CitiesOTPercent
CitiesDPercent
## Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 0.13065693 0.07780612 0.12500000 0.20680394 0.15199336
Phoenix has a fantastically low delay percentage, while San Francisco and Seattle have much higher delay percentages. AM West flies a more planes out of Phoenix than all of Alaska’s locations combined. Alaska’s main hub is Seattle, which has only a marginally better on time percentage than San Francisco. Despite AM West having a lower overall delay percentage, the data indicates that a flier is better off taking Alaska over AM West.