What writes great csv files R can read without a problem? R does. So let us create a matrix (transposed because I misread things), and write that out. We then read it back in.
airlineData<-t(matrix(c("","","Los Angeles", "Phoenix", "San Diego", "San Francisco", "Seattle","Alaska", "on time", 497,221,212,503,1841,"","delayed",62,12,20,102,305,"","","","","","","","AM WEST","on time",694,4840,383,320,201,"","delayed",117,415,65,129,61),nrow=7))
write.csv(airlineData,"airlineData.csv")
airlineData <- read.csv("airlineData.csv", as.is = TRUE)
kable(airlineData)
| X | V1 | V2 | V3 | V4 | V5 | V6 | V7 |
|---|---|---|---|---|---|---|---|
| 1 | Los Angeles | Phoenix | San Diego | San Francisco | Seattle | ||
| 2 | Alaska | on time | 497 | 221 | 212 | 503 | 1841 |
| 3 | delayed | 62 | 12 | 20 | 102 | 305 | |
| 4 | |||||||
| 5 | AM WEST | on time | 694 | 4840 | 383 | 320 | 201 |
| 6 | delayed | 117 | 415 | 65 | 129 | 61 |
First we take off the left index count and get a data frame. Then we take the top row and make the the names of the data. Next we cleanup the blank row. Then we dupe the category level labels.
airlineData<-as.data.frame(airlineData[-1],stringsAsFactors = FALSE)
names(airlineData)<-airlineData[1,]
names(airlineData)[1:2]<-c("Airline","Flight Status")
airlineData<-airlineData[c(-1,-4),]
airlineData[2,1]<-airlineData[1,1]
airlineData[4,1]<-airlineData[3,1]
kable(airlineData)
| Airline | Flight Status | Los Angeles | Phoenix | San Diego | San Francisco | Seattle | |
|---|---|---|---|---|---|---|---|
| 2 | Alaska | on time | 497 | 221 | 212 | 503 | 1841 |
| 3 | Alaska | delayed | 62 | 12 | 20 | 102 | 305 |
| 5 | AM WEST | on time | 694 | 4840 | 383 | 320 | 201 |
| 6 | AM WEST | delayed | 117 | 415 | 65 | 129 | 61 |
Because it says to use tidyr we canโt just use the base functions quixotically. So we gather into City with Flights excluding airlinge and flight status and sorting on airline.
airlineDataLong <- gather(airlineData ,key=City, value="Flights", -"Flight Status",-Airline) %>% arrange(Airline)
airlineDataLong$Flights<-as.numeric(airlineDataLong$Flights)
kable(airlineDataLong)
| Airline | Flight Status | City | Flights |
|---|---|---|---|
| Alaska | on time | Los Angeles | 497 |
| Alaska | delayed | Los Angeles | 62 |
| Alaska | on time | Phoenix | 221 |
| Alaska | delayed | Phoenix | 12 |
| Alaska | on time | San Diego | 212 |
| Alaska | delayed | San Diego | 20 |
| Alaska | on time | San Francisco | 503 |
| Alaska | delayed | San Francisco | 102 |
| Alaska | on time | Seattle | 1841 |
| Alaska | delayed | Seattle | 305 |
| AM WEST | on time | Los Angeles | 694 |
| AM WEST | delayed | Los Angeles | 117 |
| AM WEST | on time | Phoenix | 4840 |
| AM WEST | delayed | Phoenix | 415 |
| AM WEST | on time | San Diego | 383 |
| AM WEST | delayed | San Diego | 65 |
| AM WEST | on time | San Francisco | 320 |
| AM WEST | delayed | San Francisco | 129 |
| AM WEST | on time | Seattle | 201 |
| AM WEST | delayed | Seattle | 61 |
onTime <- airlineDataLong %>% spread(`Flight Status`,"Flights")
onTime$onTimePercent <- onTime$`on time` / (onTime$delayed+ onTime$`on time`) *100
onTime <-onTime %>% arrange(onTimePercent)
ggplot(onTime, aes(City, onTimePercent, color=Airline, size=40))+geom_point()
We can clearly see that for ever city Am West has a lower on-time percentage.