Airline CSV

What writes great csv files R can read without a problem? R does. So let us create a matrix (transposed because I misread things), and write that out. We then read it back in.

airlineData<-t(matrix(c("","","Los Angeles", "Phoenix", "San Diego", "San Francisco", "Seattle","Alaska", "on time", 497,221,212,503,1841,"","delayed",62,12,20,102,305,"","","","","","","","AM WEST","on time",694,4840,383,320,201,"","delayed",117,415,65,129,61),nrow=7))
write.csv(airlineData,"airlineData.csv")
airlineData <- read.csv("airlineData.csv", as.is = TRUE)
kable(airlineData)
X V1 V2 V3 V4 V5 V6 V7
1 Los Angeles Phoenix San Diego San Francisco Seattle
2 Alaska on time 497 221 212 503 1841
3 delayed 62 12 20 102 305
4
5 AM WEST on time 694 4840 383 320 201
6 delayed 117 415 65 129 61

Smashed Data.

First we take off the left index count and get a data frame. Then we take the top row and make the the names of the data. Next we cleanup the blank row. Then we dupe the category level labels.

airlineData<-as.data.frame(airlineData[-1],stringsAsFactors = FALSE)
names(airlineData)<-airlineData[1,]
names(airlineData)[1:2]<-c("Airline","Flight Status")
airlineData<-airlineData[c(-1,-4),]
airlineData[2,1]<-airlineData[1,1]
airlineData[4,1]<-airlineData[3,1]
kable(airlineData)
Airline Flight Status Los Angeles Phoenix San Diego San Francisco Seattle
2 Alaska on time 497 221 212 503 1841
3 Alaska delayed 62 12 20 102 305
5 AM WEST on time 694 4840 383 320 201
6 AM WEST delayed 117 415 65 129 61

Because it says to use tidyr we canโ€™t just use the base functions quixotically. So we gather into City with Flights excluding airlinge and flight status and sorting on airline.

airlineDataLong <- gather(airlineData ,key=City, value="Flights", -"Flight Status",-Airline) %>% arrange(Airline)
airlineDataLong$Flights<-as.numeric(airlineDataLong$Flights)
kable(airlineDataLong)
Airline Flight Status City Flights
Alaska on time Los Angeles 497
Alaska delayed Los Angeles 62
Alaska on time Phoenix 221
Alaska delayed Phoenix 12
Alaska on time San Diego 212
Alaska delayed San Diego 20
Alaska on time San Francisco 503
Alaska delayed San Francisco 102
Alaska on time Seattle 1841
Alaska delayed Seattle 305
AM WEST on time Los Angeles 694
AM WEST delayed Los Angeles 117
AM WEST on time Phoenix 4840
AM WEST delayed Phoenix 415
AM WEST on time San Diego 383
AM WEST delayed San Diego 65
AM WEST on time San Francisco 320
AM WEST delayed San Francisco 129
AM WEST on time Seattle 201
AM WEST delayed Seattle 61

Analysis

onTime <- airlineDataLong %>% spread(`Flight Status`,"Flights")
onTime$onTimePercent <- onTime$`on time` / (onTime$delayed+ onTime$`on time`) *100
onTime <-onTime %>% arrange(onTimePercent)
ggplot(onTime, aes(City, onTimePercent, color=Airline, size=40))+geom_point()

We can clearly see that for ever city Am West has a lower on-time percentage.