Loading libraries

library(tidyr)
library(dplyr)


Reading in the data

After loading tidyr and dplyr, I read in the CSV file as a data frame called “delays”.

delays <- read.csv('https://raw.githubusercontent.com/chrosemo/data607_fall19_week5/master/assignment5.csv', header=TRUE, sep=',')
delays
##         X     X.1 Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1  ALASKA on time         497     221       212           503    1841
## 2         delayed          62      12        20           102     305
## 3                          NA      NA        NA            NA      NA
## 4 AM WEST on time         694    4840       383           320     201
## 5         delayed         117     415        65           129      61


Manipulating the data

Next, I removed the blank third row and inserted airline names into the empty cells in the first column.

delays <- delays[-3,]
delays[2,1] <- 'ALASKA'
delays[4,1] <- 'AM WEST'
delays
##         X     X.1 Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1  ALASKA on time         497     221       212           503    1841
## 2  ALASKA delayed          62      12        20           102     305
## 4 AM WEST on time         694    4840       383           320     201
## 5 AM WEST delayed         117     415        65           129      61


I then renamed the first (“airline”) and second columns (“status”) and reshaped the data frame from wide to long format using gather(). The columns containing delay count values were gathered in the “destination” column, and the delay count values themselves were gathered in the “delay_count” column.

delays <- delays %>% rename('Airline' = 'X', 'Status' = 'X.1')
delays <- delays %>% gather(Destination, Flight_Count, -Airline, -Status)
delays
##    Airline  Status   Destination Flight_Count
## 1   ALASKA on time   Los.Angeles          497
## 2   ALASKA delayed   Los.Angeles           62
## 3  AM WEST on time   Los.Angeles          694
## 4  AM WEST delayed   Los.Angeles          117
## 5   ALASKA on time       Phoenix          221
## 6   ALASKA delayed       Phoenix           12
## 7  AM WEST on time       Phoenix         4840
## 8  AM WEST delayed       Phoenix          415
## 9   ALASKA on time     San.Diego          212
## 10  ALASKA delayed     San.Diego           20
## 11 AM WEST on time     San.Diego          383
## 12 AM WEST delayed     San.Diego           65
## 13  ALASKA on time San.Francisco          503
## 14  ALASKA delayed San.Francisco          102
## 15 AM WEST on time San.Francisco          320
## 16 AM WEST delayed San.Francisco          129
## 17  ALASKA on time       Seattle         1841
## 18  ALASKA delayed       Seattle          305
## 19 AM WEST on time       Seattle          201
## 20 AM WEST delayed       Seattle           61


I then renamed the values in the “airline” and “destination” columns.

delays <- delays %>% mutate(Airline=recode(Airline,
                                 'ALASKA' = 'Alaska Airlines',
                                 'AM WEST' = 'America West Airlines'),
                  Destination=recode(Destination,
                                     'Los.Angeles' = 'Los Angeles',
                                     'San.Diego' = 'San Diego',
                                     'San.Francisco' = 'San Francisco')
                  )


Analyzing the data

I began my analysis by totaling the number of flights by airline, status, and destination. There were 11,000 flights total, and America West Airlines had nearly twice as many flights (7,225 flights, or approximately 65.7% of the total) as Alaska Airlines had (3,775 flights, or approximately 34.3% of the total).

delays %>% summarize(Total = sum(Flight_Count))
##   Total
## 1 11000
delays %>% group_by(Airline) %>% summarize(Total = sum(Flight_Count)) %>% mutate('Relative Frequency' = Total / sum(Total))
## # A tibble: 2 x 3
##   Airline               Total `Relative Frequency`
##   <fct>                 <int>                <dbl>
## 1 Alaska Airlines        3775                0.343
## 2 America West Airlines  7225                0.657


Regarding destination, Phoenix had 5,488 flights, or approximately 49.9% of all flights; Seattle had 1,054 flights, or approximately 21.9%; Los Angeles had 1,370 flights, or approximately 12.5%; San Francisco had 1,054 flights, or approximately 9.5%; and San Diego had 680 flights, or approximately 6.2%.

delays %>% group_by(Destination) %>% summarize(Total = sum(Flight_Count)) %>% mutate('Relative Frequency' = Total / sum(Total))
## # A tibble: 5 x 3
##   Destination   Total `Relative Frequency`
##   <chr>         <int>                <dbl>
## 1 Los Angeles    1370               0.125 
## 2 Phoenix        5488               0.499 
## 3 San Diego       680               0.0618
## 4 San Francisco  1054               0.0958
## 5 Seattle        2408               0.219


For Alaska Airlines, Seattle (n = 2,146) had approximately 56.8% of all flights, followed by San Francisco (n = 605) with approximately 16%, Los Angeles (n = 559) with approximately 14.8%, and Phoenix (n = 233) and San Diego (n = 232) with approximately 6.2% each.

For America West Airlines, Phoenix (n = 5,255) had approximately 72.7% of all flights. Next were Los Angeles (n = 811) with approximately 11.2%, San Francisco (n = 449) and San Diego (n = 448) with approximately 6.2% each, and Seattle (n = 262) with approximately 3.6%.

delays %>% group_by(Airline, Destination) %>% summarize(Total = sum(Flight_Count)) %>% mutate('Relative Frequency' = Total / sum(Total))
## # A tibble: 10 x 4
## # Groups:   Airline [2]
##    Airline               Destination   Total `Relative Frequency`
##    <fct>                 <chr>         <int>                <dbl>
##  1 Alaska Airlines       Los Angeles     559               0.148 
##  2 Alaska Airlines       Phoenix         233               0.0617
##  3 Alaska Airlines       San Diego       232               0.0615
##  4 Alaska Airlines       San Francisco   605               0.160 
##  5 Alaska Airlines       Seattle        2146               0.568 
##  6 America West Airlines Los Angeles     811               0.112 
##  7 America West Airlines Phoenix        5255               0.727 
##  8 America West Airlines San Diego       448               0.0620
##  9 America West Airlines San Francisco   449               0.0621
## 10 America West Airlines Seattle         262               0.0363


Regarding arrival status, across all destinations, 1,288 flights were delayed, and 9,712 flights arrived on time. Roughly 12 in every 100 flights were delayed.

delays %>% group_by(Status) %>% summarize(Total = sum(Flight_Count)) %>% mutate('Relative Frequency' = Total / sum(Total))
## # A tibble: 2 x 3
##   Status  Total `Relative Frequency`
##   <fct>   <int>                <dbl>
## 1 delayed  1288                0.117
## 2 on time  9712                0.883

Across all destinations, Alaska Airlines had 501 flights delayed (approximately 13.3% of its flights) and 3,274 flights arriving on time (approximately 86.7% of its flights); the mean numbers of flights delayed and arriving on time per destination were 100.2 and 654.8 flights, respectively.

For America West Airlines, 787 flights were delayed (approximately 10.9% of its flights), and 6,438 flights arrived on time (approximately 89.1% of its flights); the mean numbers of flights delayed and arriving on time per destination were 157.4 and 1,287.6 flights, respectively.

delays %>% group_by(Airline, Status) %>% summarize(Total = sum(Flight_Count), Mean = mean(Flight_Count)) %>%  mutate('Relative Frequency' = Total / sum(Total))
## # A tibble: 4 x 5
## # Groups:   Airline [2]
##   Airline               Status  Total  Mean `Relative Frequency`
##   <fct>                 <fct>   <int> <dbl>                <dbl>
## 1 Alaska Airlines       delayed   501  100.                0.133
## 2 Alaska Airlines       on time  3274  655.                0.867
## 3 America West Airlines delayed   787  157.                0.109
## 4 America West Airlines on time  6438 1288.                0.891


Considering delayed flights, flights out of Phoenix (n = 427) represented approximately 33.2% of all flights delayed. Next were Seattle (n = 366) at approximately 28.4%; San Francisco (n = 231) at approximately 17.9%; Los Angeles (n = 179) at approximately 13.9%; and San Diego (n = 85) at approximately 6.6%.

For flights arriving on time, Phoenix had 5,061 flights representing approximately 52.1% of all flights arriving on time. The next highest, in order, were Seattle (n = 2,042) at approximately 21%; Los Angeles (n = 1,191) at approximately 12.3%; San Francisco (n = 823) at approximately 8.5%; and San Diego (n = 595) at approximately 6.1%.

delays %>% group_by(Status, Destination) %>% summarize(Total = sum(Flight_Count)) %>% mutate('Relative Frequency' = Total / sum(Total))
## # A tibble: 10 x 4
## # Groups:   Status [2]
##    Status  Destination   Total `Relative Frequency`
##    <fct>   <chr>         <int>                <dbl>
##  1 delayed Los Angeles     179               0.139 
##  2 delayed Phoenix         427               0.332 
##  3 delayed San Diego        85               0.0660
##  4 delayed San Francisco   231               0.179 
##  5 delayed Seattle         366               0.284 
##  6 on time Los Angeles    1191               0.123 
##  7 on time Phoenix        5061               0.521 
##  8 on time San Diego       595               0.0613
##  9 on time San Francisco   823               0.0847
## 10 on time Seattle        2042               0.210


Considering arrival status for each destination, approximately 21.9% (n = 231) of flights out of San Francisco were delayed. In Seattle, approximately 15.2% (n = 366) of flights were delayed. Approximately 13.1% (n = 179) of flights out of Los Angeles were delayed. Approximately 12.5% (n = 85) of flights out of San Diego were delayed. And only approximately 7.8% (n = 427) of flights out of Phoenix were delayed.

delays %>% group_by(Destination, Status) %>% summarize(Total = sum(Flight_Count)) %>% mutate('Relative Frequency' = Total / sum(Total))
## # A tibble: 10 x 4
## # Groups:   Destination [5]
##    Destination   Status  Total `Relative Frequency`
##    <chr>         <fct>   <int>                <dbl>
##  1 Los Angeles   delayed   179               0.131 
##  2 Los Angeles   on time  1191               0.869 
##  3 Phoenix       delayed   427               0.0778
##  4 Phoenix       on time  5061               0.922 
##  5 San Diego     delayed    85               0.125 
##  6 San Diego     on time   595               0.875 
##  7 San Francisco delayed   231               0.219 
##  8 San Francisco on time   823               0.781 
##  9 Seattle       delayed   366               0.152 
## 10 Seattle       on time  2042               0.848


Making brief conclusions

Numbers and proportions of flights delayed varied by airline and destination. Overall, the airline hubs (Seattle for Alaska Airlines and Phoenix for America West Airlines) had the most flights as well as the most delayed flights. However, San Francisco had the highest proportion of flights delayed, at approximately 21.9%. That proportion compared with 11.7% of flights delayed across all airports. Phoenix, despite having the most total arrivals (n = 5,488), had the lowest proportion of flights delayed, with approximately 7.7%.