Task: Perform tasks with tidyr and dplyr to clean and tidy a data frame. Next, analyze arrival delays for the two airlines.
Our data frame, in its original form looks like:
## X X.1 Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1 ALASKA on time 497 221 212 503 1841
## 2 delayed 62 12 20 102 305
## 3 NA NA NA NA NA
## 4 AM WEST on time 694 4840 383 320 201
## 5 delayed 117 415 65 129 61
Fill in the Airline name in the two missing places. Then eliminate the row full of NAs. Then, gather it with X and X.1 as keys. (They will be renamed airline and category.)
airline.chart[2,1]<-airline.chart[1,1]
airline.chart[5,1]<-airline.chart[4,1]
airline.chart<-airline.chart[-3,]
airline.chart<-gather(airline.chart,key='X','X.1')
airline.chart
## X X.1 X X.1
## 1 ALASKA on time Los.Angeles 497
## 2 ALASKA delayed Los.Angeles 62
## 3 AM WEST on time Los.Angeles 694
## 4 AM WEST delayed Los.Angeles 117
## 5 ALASKA on time Phoenix 221
## 6 ALASKA delayed Phoenix 12
## 7 AM WEST on time Phoenix 4840
## 8 AM WEST delayed Phoenix 415
## 9 ALASKA on time San.Diego 212
## 10 ALASKA delayed San.Diego 20
## 11 AM WEST on time San.Diego 383
## 12 AM WEST delayed San.Diego 65
## 13 ALASKA on time San.Francisco 503
## 14 ALASKA delayed San.Francisco 102
## 15 AM WEST on time San.Francisco 320
## 16 AM WEST delayed San.Francisco 129
## 17 ALASKA on time Seattle 1841
## 18 ALASKA delayed Seattle 305
## 19 AM WEST on time Seattle 201
## 20 AM WEST delayed Seattle 61
Attach column names. Arrange by airline. Use spread to move on time status (category) to the columns.
colnames(airline.chart)<-c('airline','category','city','number')
airline.chart<-arrange(airline.chart,airline.chart[,1])
airline.chart<-spread(airline.chart,category,number)
airline.chart
## airline city delayed on time
## 1 ALASKA Los.Angeles 62 497
## 2 ALASKA Phoenix 12 221
## 3 ALASKA San.Diego 20 212
## 4 ALASKA San.Francisco 102 503
## 5 ALASKA Seattle 305 1841
## 6 AM WEST Los.Angeles 117 694
## 7 AM WEST Phoenix 415 4840
## 8 AM WEST San.Diego 65 383
## 9 AM WEST San.Francisco 129 320
## 10 AM WEST Seattle 61 201
Create a new column for on-time percentage.
## airline city delayed on time on_time_percentage
## 1 ALASKA Los.Angeles 62 497 0.8890877
## 2 ALASKA Phoenix 12 221 0.9484979
## 3 ALASKA San.Diego 20 212 0.9137931
## 4 ALASKA San.Francisco 102 503 0.8314050
## 5 ALASKA Seattle 305 1841 0.8578751
## 6 AM WEST Los.Angeles 117 694 0.8557337
## 7 AM WEST Phoenix 415 4840 0.9210276
## 8 AM WEST San.Diego 65 383 0.8549107
## 9 AM WEST San.Francisco 129 320 0.7126949
## 10 AM WEST Seattle 61 201 0.7671756
AM West has an online percentage that is lower in each city, which is clear from the above graph.
In San Francisco, Alaska Airlines has the greatest advantage, whereas Pheonix has the smallest difference
For average overall performance, the result is the opposite. This means that, in the city where on-time percentage is the lowest, it matters most which airline you choose.
———————————————————————————————————————–
The last 2 charts were aided by the following tables:
## city ALASKA AM WEST Alaska_advantage
## 1 Los.Angeles 0.8890877 0.8557337 0.03335399
## 2 Phoenix 0.9484979 0.9210276 0.02747026
## 3 San.Diego 0.9137931 0.8549107 0.05888239
## 4 San.Francisco 0.8314050 0.7126949 0.11871008
## 5 Seattle 0.8578751 0.7671756 0.09069954
## city ALASKA AM WEST On_time_average
## 1 Los.Angeles 0.8890877 0.8557337 0.8724107
## 2 Phoenix 0.9484979 0.9210276 0.9347627
## 3 San.Diego 0.9137931 0.8549107 0.8843519
## 4 San.Francisco 0.8314050 0.7126949 0.7720499
## 5 Seattle 0.8578751 0.7671756 0.8125253