By William Outcault

Problem

The Data

##         X     X.1 Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1  Alaska on time         497     221       212           503    1840
## 2         delayed          62      12        20           102     305
## 3                          NA      NA        NA            NA      NA
## 4 AM West on time         694    4840       383           320     201
## 5         delayed         117     415        65           129      61

Janitorial Work

##   Airline  Status  LA Phoenix San Diego San Fran Seattle
## 1  Alaska on time 497     221       212      503    1840
## 2  Alaska delayed  62      12        20      102     305
## 4 AM West on time 694    4840       383      320     201
## 5 AM West delayed 117     415        65      129      61
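
The code behind this cleanup step is not shown, so the following is only a sketch of one way to get from the raw table to the one above. It assumes the raw data frame looks like the printed output (re-typed here as `raw`, with blank cells as empty strings) and uses `dplyr` and `tidyr`; the object names are placeholders.

```r
library(dplyr)
library(tidyr)

# Raw table re-typed from the printed output (assumed: blank cells are empty strings)
raw <- data.frame(
  X             = c("Alaska", "", "", "AM West", ""),
  X.1           = c("on time", "delayed", "", "on time", "delayed"),
  Los.Angeles   = c(497, 62, NA, 694, 117),
  Phoenix       = c(221, 12, NA, 4840, 415),
  San.Diego     = c(212, 20, NA, 383, 65),
  San.Francisco = c(503, 102, NA, 320, 129),
  Seattle       = c(1840, 305, NA, 201, 61),
  stringsAsFactors = FALSE
)

clean <- raw %>%
  filter(X.1 != "") %>%          # drop the empty separator row
  mutate(X = na_if(X, "")) %>%   # turn blank airline cells into NA
  fill(X) %>%                    # carry each airline name down to its "delayed" row
  setNames(c("Airline", "Status", "LA", "Phoenix",
             "San Diego", "San Fran", "Seattle"))

print(clean)
```

Dropping the separator row before filling keeps the airline name from being copied into a row that holds no data.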

Tidy Data

##    Airline      city delayed on time ratio
## 1   Alaska        LA      62     497 0.111
## 2   Alaska   Phoenix      12     221 0.052
## 3   Alaska San Diego      20     212 0.086
## 4   Alaska  San Fran     102     503 0.169
## 5   Alaska   Seattle     305    1840 0.142
## 6  AM West        LA     117     694 0.144
## 7  AM West   Phoenix     415    4840 0.079
## 8  AM West San Diego      65     383 0.145
## 9  AM West  San Fran     129     320 0.287
## 10 AM West   Seattle      61     201 0.233
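
As a sketch of how this reshaping might be done (the original chunk is hidden, so the functions used here are an assumption), the cleaned table can be pivoted so each row is one airline and city pair, with `ratio` computed as delayed flights over all flights. The data frame `clean` is re-typed from the output in the previous section.

```r
library(dplyr)
library(tidyr)

# Cleaned table re-typed from the "Janitorial Work" output
clean <- data.frame(
  Airline     = c("Alaska", "Alaska", "AM West", "AM West"),
  Status      = c("on time", "delayed", "on time", "delayed"),
  LA          = c(497, 62, 694, 117),
  Phoenix     = c(221, 12, 4840, 415),
  `San Diego` = c(212, 20, 383, 65),
  `San Fran`  = c(503, 102, 320, 129),
  Seattle     = c(1840, 305, 201, 61),
  check.names = FALSE
)

tidy <- clean %>%
  # one row per airline, status and city
  pivot_longer(-c(Airline, Status), names_to = "city", values_to = "flights") %>%
  # spread the two statuses into their own columns
  pivot_wider(names_from = Status, values_from = flights) %>%
  # share of flights delayed = delayed / (delayed + on time)
  mutate(ratio = round(delayed / (delayed + `on time`), 3)) %>%
  select(Airline, city, delayed, `on time`, ratio)

print(tidy)
```

For Alaska in LA, for instance, this gives 62 / (62 + 497), which rounds to the 0.111 shown in the table.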

Box Plot

A short box, as for Phoenix and LA, means the two airlines have similar delay ratios in that city. Because both airlines are delayed at roughly the same rate there, can we attribute those delays to the airport rather than to the airline? Perhaps conditions at Phoenix and LA, such as the weather, produce a consistent share of delayed flights regardless of the carrier.
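
The plotting code is not shown, so here is a minimal sketch of how a box plot of this kind could be drawn with `ggplot2`, using the ratios from the tidy table. The coloured points are an addition that is not necessarily in the original figure; they are included because each box summarises only two values, one per airline.

```r
library(ggplot2)

# Delay ratios copied from the tidy table
ratios <- data.frame(
  Airline = rep(c("Alaska", "AM West"), each = 5),
  city    = rep(c("LA", "Phoenix", "San Diego", "San Fran", "Seattle"), times = 2),
  ratio   = c(0.111, 0.052, 0.086, 0.169, 0.142,
              0.144, 0.079, 0.145, 0.287, 0.233)
)

p <- ggplot(ratios, aes(x = city, y = ratio)) +
  geom_boxplot() +                       # one box per city, spanning the two airlines
  geom_point(aes(colour = Airline)) +    # optional: show which airline is which
  labs(x = "City", y = "Share of flights delayed")

print(p)
```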

Conclusion

Notice in the box plot above that Phoenix, LA and San Diego all show a narrow spread between the two airlines' 'late' ratios. A 'late' ratio is the number of delayed flights divided by the total number of flights. A narrow spread means both airlines are delayed about the same share of the time: in Phoenix, for example, the ratios are 0.052 for Alaska and 0.079 for AM West. When both airlines are delayed at similar rates, the airline itself may not be at fault, and the cause may instead lie with the airport, for instance the weather that Phoenix or LA experiences. San Francisco and Seattle show a wider spread, so one airline is noticeably more likely to be late there: in San Francisco the ratio is 0.287 for AM West against 0.169 for Alaska. So when flying to San Francisco or Seattle it is smart to do your research on the airline.
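
To put numbers on the spread described above, one option is to compute the gap between the two airlines' ratios in each city. This is a sketch rather than part of the original analysis; the `gap` column and object names are placeholders, and the ratios are copied from the tidy table.

```r
library(dplyr)

# Delay ratios copied from the tidy table
ratios <- data.frame(
  Airline = rep(c("Alaska", "AM West"), each = 5),
  city    = rep(c("LA", "Phoenix", "San Diego", "San Fran", "Seattle"), times = 2),
  ratio   = c(0.111, 0.052, 0.086, 0.169, 0.142,
              0.144, 0.079, 0.145, 0.287, 0.233)
)

gaps <- ratios %>%
  group_by(city) %>%
  summarise(gap = max(ratio) - min(ratio)) %>%  # difference between the two airlines
  arrange(gap)                                  # narrowest spread first

print(gaps)
# Order by gap: Phoenix, LA, San Diego, Seattle, San Fran
```

The gap is about 0.03 in Phoenix and LA and about 0.12 in San Francisco, which matches the visual impression from the box plot.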