To create the CSV I manually entered the information contained in the image into a google sheets file and downloaded it as a .csv. I start by reading in the .csv file

airline2<-read.csv("CSV for HW 5 - Sheet1.csv", stringsAsFactors = FALSE)
airline2[2,1]<-airline2[1,1]#inserting duplicate values so that I can use gather without a problem
airline2[5,1]<-airline2[4,1]
airline2
##         X     X.1 Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1  Alaska on time         497     221       212           503    1841
## 2  Alaska delayed          62      12        20           102     305
## 3                          NA      NA        NA            NA      NA
## 4 AM West on time         694    4840       383           383     201
## 5 AM West delayed         117     415        65           129      61

In order to get this into a tidy format, I need to first melt (or gather) the data.frame and then spread one of the colums. Gather nicely takes care of row 3 with the blanks and NAs if I use the na.rm option. I also took the time to fix the name of two columns, one that had X as the tile and the other that had a space in it, making it harder to work with.

AirPartTidy<-gather(airline2,city,number_of_flights,c(3:7), na.rm = TRUE) #melting it all together
AirTidy<-spread(AirPartTidy,X.1,number_of_flights) #spreading out the on time/delayed part into two columns for easy use
names(AirTidy)[1]<-"Airline" #fixing column names
names(AirTidy)[4]<-"On_Time"

For the analysis, I first created two vectors, one tracking the percentage of flights that were on time, the other tracking delays, both by location and airline. I then attached them to the AirTidy data.frame.

On_Time_Percentage<-(AirTidy$On_Time/(AirTidy$delayed+AirTidy$On_Time))  
Delay_Percentage<-1-On_Time_Percentage
AirTidy<-cbind(AirTidy,On_Time_Percentage)
AirTidy<-cbind(AirTidy,Delay_Percentage)
AirTidy
##    Airline          city delayed On_Time On_Time_Percentage Delay_Percentage
## 1   Alaska   Los.Angeles      62     497          0.8890877       0.11091234
## 2   Alaska       Phoenix      12     221          0.9484979       0.05150215
## 3   Alaska     San.Diego      20     212          0.9137931       0.08620690
## 4   Alaska San.Francisco     102     503          0.8314050       0.16859504
## 5   Alaska       Seattle     305    1841          0.8578751       0.14212488
## 6  AM West   Los.Angeles     117     694          0.8557337       0.14426634
## 7  AM West       Phoenix     415    4840          0.9210276       0.07897241
## 8  AM West     San.Diego      65     383          0.8549107       0.14508929
## 9  AM West San.Francisco     129     383          0.7480469       0.25195312
## 10 AM West       Seattle      61     201          0.7671756       0.23282443

The assignment asked us to compare the delay and on time rates of both airlines. I created two arrays that give that information. The percentage of delayed flights plus the percentage of on time flights will equal 100%, so I used that to calculate delay percentages.

TotalOnTime<-with(AirTidy, tapply(On_Time,Airline,sum)) #creating totals of on time and delayed for each airline to compare them
TotalDelayed<-with(AirTidy, tapply(delayed,Airline,sum))
OverallOTPercent<-TotalOnTime/(TotalDelayed+TotalOnTime)
OverallDPercent<-1-OverallOTPercent
OverallOTPercent
##    Alaska   AM West 
## 0.8672848 0.8920143
OverallDPercent
##    Alaska   AM West 
## 0.1327152 0.1079857

The data here is quite interesting, as we have a case of Simpson’s Paradox. For each location, Alaska has a lower delay percentage. However, overall, AM West has a lower delay percentage. The reasoning for this can be seen when we look at the city based delay percentages.

CitiesOnTime<-with(AirTidy, tapply(On_Time,city,sum)) #creating totals for cities instead of airlines
CitiesDelayed<-with(AirTidy, tapply(delayed,city,sum))
CitiesOTPercent<-CitiesOnTime/(CitiesDelayed+CitiesOnTime)
CitiesOTPercent
##   Los.Angeles       Phoenix     San.Diego San.Francisco       Seattle 
##     0.8693431     0.9221939     0.8750000     0.7931961     0.8480066
CitiesDPercent<-1-CitiesOTPercent
CitiesDPercent
##   Los.Angeles       Phoenix     San.Diego San.Francisco       Seattle 
##    0.13065693    0.07780612    0.12500000    0.20680394    0.15199336

Phoenix has a fantastically low delay percentage, while San Francisco and Seattle have much higher delay percentages. AM West flies a more planes out of Phoenix than all of Alaska’s locations combined. Alaska’s main hub is Seattle, which has only a marginally better on time percentage than San Francisco. Despite AM West having a lower overall delay percentage, the data indicates that a flier is better off taking Alaska over AM West.