Data 607 - Assignment 5

John Kellogg

2019-09-29


Data Generation

The data is saved on github. In this chunk I pull the data locally then run a few simple cleanup operations to get it ready for ‘tidyr’ and ‘dplyr’

##    Flight  Status Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1  ALASKA on.time         497     221       212           503    1841
## 2         delayed          62      12        20           102     305
## 3                          NA      NA        NA            NA      NA
## 4 AM WEST on.time         694    4840       383           320     201
## 5         delayed         117     415        65           129      61
##    Flight  Status Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1  ALASKA on.time         497     221       212           503    1841
## 2  ALASKA delayed          62      12        20           102     305
## 4 AM WEST on.time         694    4840       383           320     201
## 5 AM WEST delayed         117     415        65           129      61

Using dplyr methods and functions

Using the Gather method

##     Flight  Status          City flight_length
## 1   ALASKA on.time   Los.Angeles           497
## 2   ALASKA delayed   Los.Angeles            62
## 3  AM WEST on.time   Los.Angeles           694
## 4  AM WEST delayed   Los.Angeles           117
## 5   ALASKA on.time       Phoenix           221
## 6   ALASKA delayed       Phoenix            12
## 7  AM WEST on.time       Phoenix          4840
## 8  AM WEST delayed       Phoenix           415
## 9   ALASKA on.time     San.Diego           212
## 10  ALASKA delayed     San.Diego            20
## 11 AM WEST on.time     San.Diego           383
## 12 AM WEST delayed     San.Diego            65
## 13  ALASKA on.time San.Francisco           503
## 14  ALASKA delayed San.Francisco           102
## 15 AM WEST on.time San.Francisco           320
## 16 AM WEST delayed San.Francisco           129
## 17  ALASKA on.time       Seattle          1841
## 18  ALASKA delayed       Seattle           305
## 19 AM WEST on.time       Seattle           201
## 20 AM WEST delayed       Seattle            61

Using the spread function to create second Data frame on delayed and on time for use later.

##     Flight          City delayed on.time
## 1   ALASKA   Los.Angeles      62     497
## 2   ALASKA       Phoenix      12     221
## 3   ALASKA     San.Diego      20     212
## 4   ALASKA San.Francisco     102     503
## 5   ALASKA       Seattle     305    1841
## 6  AM WEST   Los.Angeles     117     694
## 7  AM WEST       Phoenix     415    4840
## 8  AM WEST     San.Diego      65     383
## 9  AM WEST San.Francisco     129     320
## 10 AM WEST       Seattle      61     201

Using tidyr methods and functions

Using Select Statements

##     Status flight_length
## 1  on.time           497
## 2  delayed            62
## 3  on.time           694
## 4  delayed           117
## 5  on.time           221
## 6  delayed            12
## 7  on.time          4840
## 8  delayed           415
## 9  on.time           212
## 10 delayed            20
## 11 on.time           383
## 12 delayed            65
## 13 on.time           503
## 14 delayed           102
## 15 on.time           320
## 16 delayed           129
## 17 on.time          1841
## 18 delayed           305
## 19 on.time           201
## 20 delayed            61
##     Flight  Status flight_length
## 1   ALASKA on.time           497
## 2   ALASKA delayed            62
## 3  AM WEST on.time           694
## 4  AM WEST delayed           117
## 5   ALASKA on.time           221
## 6   ALASKA delayed            12
## 7  AM WEST on.time          4840
## 8  AM WEST delayed           415
## 9   ALASKA on.time           212
## 10  ALASKA delayed            20
## 11 AM WEST on.time           383
## 12 AM WEST delayed            65
## 13  ALASKA on.time           503
## 14  ALASKA delayed           102
## 15 AM WEST on.time           320
## 16 AM WEST delayed           129
## 17  ALASKA on.time          1841
## 18  ALASKA delayed           305
## 19 AM WEST on.time           201
## 20 AM WEST delayed            61

Using filter to separate out each status

##   Flight  Status          City flight_length
## 1 ALASKA on.time   Los.Angeles           497
## 2 ALASKA on.time       Phoenix           221
## 3 ALASKA on.time     San.Diego           212
## 4 ALASKA on.time San.Francisco           503
## 5 ALASKA on.time       Seattle          1841
##   Flight  Status          City flight_length
## 1 ALASKA delayed   Los.Angeles            62
## 2 ALASKA delayed       Phoenix            12
## 3 ALASKA delayed     San.Diego            20
## 4 ALASKA delayed San.Francisco           102
## 5 ALASKA delayed       Seattle           305
##    Flight  Status          City flight_length
## 1 AM WEST on.time   Los.Angeles           694
## 2 AM WEST on.time       Phoenix          4840
## 3 AM WEST on.time     San.Diego           383
## 4 AM WEST on.time San.Francisco           320
## 5 AM WEST on.time       Seattle           201
##    Flight  Status          City flight_length
## 1 AM WEST delayed   Los.Angeles           117
## 2 AM WEST delayed       Phoenix           415
## 3 AM WEST delayed     San.Diego            65
## 4 AM WEST delayed San.Francisco           129
## 5 AM WEST delayed       Seattle            61

Using mutate to define total time of each flight location

##     Flight          City delayed on.time total
## 1   ALASKA   Los.Angeles      62     497   559
## 2   ALASKA       Phoenix      12     221   233
## 3   ALASKA     San.Diego      20     212   232
## 4   ALASKA San.Francisco     102     503   605
## 5   ALASKA       Seattle     305    1841  2146
## 6  AM WEST   Los.Angeles     117     694   811
## 7  AM WEST       Phoenix     415    4840  5255
## 8  AM WEST     San.Diego      65     383   448
## 9  AM WEST San.Francisco     129     320   449
## 10 AM WEST       Seattle      61     201   262

Using arrange on the total column

##     Flight          City delayed on.time total
## 1  AM WEST       Phoenix     415    4840  5255
## 2   ALASKA       Seattle     305    1841  2146
## 3  AM WEST   Los.Angeles     117     694   811
## 4   ALASKA San.Francisco     102     503   605
## 5   ALASKA   Los.Angeles      62     497   559
## 6  AM WEST San.Francisco     129     320   449
## 7  AM WEST     San.Diego      65     383   448
## 8  AM WEST       Seattle      61     201   262
## 9   ALASKA       Phoenix      12     221   233
## 10  ALASKA     San.Diego      20     212   232

Using summarise with group by

## # A tibble: 2 x 4
##   Flight  median   max `min(total)`
##   <chr>    <int> <int>        <int>
## 1 ALASKA     559  2146          232
## 2 AM WEST    449  5255          262

ANALYSIS

Assumption: The amount of flights and delays are normally distributed

Best Flight to take

Which Flight To Los.Angeles had a better arrival time?

Logic: Probability your flight will be delayed to each city

##     Flight          City delayed on.time total delay_ratio
## 1  AM WEST       Phoenix     415    4840  5255  0.07897241
## 2   ALASKA       Seattle     305    1841  2146  0.14212488
## 3  AM WEST   Los.Angeles     117     694   811  0.14426634
## 4   ALASKA San.Francisco     102     503   605  0.16859504
## 5   ALASKA   Los.Angeles      62     497   559  0.11091234
## 6  AM WEST San.Francisco     129     320   449  0.28730512
## 7  AM WEST     San.Diego      65     383   448  0.14508929
## 8  AM WEST       Seattle      61     201   262  0.23282443
## 9   ALASKA       Phoenix      12     221   233  0.05150215
## 10  ALASKA     San.Diego      20     212   232  0.08620690
## # A tibble: 2 x 8
##   Flight    Ttl ONTIME.mean DELAY.mean M.Ratio ONTIME.SD DELAY.SD SD.Ratio
##   <chr>   <int>       <dbl>      <dbl>   <dbl>     <dbl>    <dbl>    <dbl>
## 1 ALASKA   3775        655.       100.   0.153      678.     120.   0.177 
## 2 AM WEST  7225       1288.       157.   0.122     1994.     147.   0.0738

Which Flight To Phoenix has the probability of delaying less?

## [1] "ALASKA"
## [1] "Alaska has a delay ratio of 0.0515 and AM WEST a delay ratio of 0.0790"

Which Flight To San.Diego has the probability of delaying less?

## [1] "ALASKA"
## [1] "Alaska has a delay ratio of 0.0862 and AM WEST a delay ratio of 0.1451"

Which Flight To San.Francisco has the probability of delaying less?

## [1] "ALASKA"
## [1] "Alaska has a delay ratio of 0.1686 and AM WEST a delay ratio of 0.2873"

Which Flight To Seattle has the probability of delaying less?

## [1] "ALASKA"
## [1] "Alaska has a delay ratio of 0.1421 and AM WEST a delay ratio of 0.2328"

Conclusion:

If you look at the Analysis table only for your choice of the better airline, AM WEST seems to be the better choice. Their ratio of mean On Time vs. delay is less than Alaska. AM West does have 26% more flights in total than Alaska allowing for a larger chance of delays. Using just this table is inconclusive.

Analyzing each individual flight destination for probability of less delays, Alaska has the advantage taking each of the individual flights by having a smaller delay ratio.

If I was going to chose a single carrier to use and didn’t mind a smaller pool of available flights I would chose:

Alaska