This is the R Bridge final project. The data are a subset of hflights (flights from December 2011). The project visualizes airline arrival and departure delays by carrier and explores the relationships between delays and other factors.

library(hflights)
library(ggplot2)

Subsetting data and displaying descriptive summary statistics

sub.delay <- hflights[hflights$Month == 12, 
                c("ArrDelay", "DepDelay", "UniqueCarrier", "DayOfWeek",
                  "AirTime", "Year", "Month", "DepTime", "Distance")]

str(sub.delay)
## 'data.frame':    19117 obs. of  9 variables:
##  $ ArrDelay     : int  47 -2 -17 53 -23 -6 -12 -12 -12 -7 ...
##  $ DepDelay     : int  63 -6 -3 58 -2 15 11 5 3 -3 ...
##  $ UniqueCarrier: chr  "AA" "AA" "AA" "AA" ...
##  $ DayOfWeek    : int  4 5 7 1 2 3 4 5 1 2 ...
##  $ AirTime      : int  44 39 46 54 41 43 39 36 43 53 ...
##  $ Year         : int  2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
##  $ Month        : int  12 12 12 12 12 12 12 12 12 12 ...
##  $ DepTime      : int  2113 2004 2007 2108 2008 2025 2021 2015 2013 2007 ...
##  $ Distance     : int  224 224 224 224 224 224 224 224 224 224 ...
head(sub.delay)
##         ArrDelay DepDelay UniqueCarrier DayOfWeek AirTime Year Month
## 5596743       47       63            AA         4      44 2011    12
## 5596744       -2       -6            AA         5      39 2011    12
## 5596745      -17       -3            AA         7      46 2011    12
## 5596746       53       58            AA         1      54 2011    12
## 5596747      -23       -2            AA         2      41 2011    12
## 5596748       -6       15            AA         3      43 2011    12
##         DepTime Distance
## 5596743    2113      224
## 5596744    2004      224
## 5596745    2007      224
## 5596746    2108      224
## 5596747    2008      224
## 5596748    2025      224
summary(sub.delay)
##     ArrDelay          DepDelay      UniqueCarrier        DayOfWeek    
##  Min.   :-57.000   Min.   :-33.00   Length:19117       Min.   :1.000  
##  1st Qu.:-10.000   1st Qu.: -2.00   Class :character   1st Qu.:2.000  
##  Median : -2.000   Median :  1.00   Mode  :character   Median :4.000  
##  Mean   :  5.013   Mean   : 10.99                      Mean   :4.018  
##  3rd Qu.:  9.000   3rd Qu.: 11.00                      3rd Qu.:6.000  
##  Max.   :978.000   Max.   :970.00                      Max.   :7.000  
##  NA's   :240       NA's   :164                                        
##     AirTime           Year          Month       DepTime        Distance   
##  Min.   : 24.0   Min.   :2011   Min.   :12   Min.   :   1   Min.   : 127  
##  1st Qu.: 61.0   1st Qu.:2011   1st Qu.:12   1st Qu.:1018   1st Qu.: 427  
##  Median :107.0   Median :2011   Median :12   Median :1408   Median : 844  
##  Mean   :108.9   Mean   :2011   Mean   :12   Mean   :1390   Mean   : 809  
##  3rd Qu.:141.0   3rd Qu.:2011   3rd Qu.:12   3rd Qu.:1755   3rd Qu.:1075  
##  Max.   :541.0   Max.   :2011   Max.   :12   Max.   :2359   Max.   :3904  
##  NA's   :240                                 NA's   :164
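
The summary reports 240 missing values in ArrDelay and AirTime and 164 in DepDelay and DepTime. As a minimal sketch (not part of the original analysis), missing values can be counted per column and incomplete rows dropped before plotting. The data frame `toy` is a made-up stand-in so the snippet runs on its own; replacing it with sub.delay should give the counts for the real subset.

```r
# Toy stand-in for sub.delay; the values are made up for illustration.
toy <- data.frame(
  ArrDelay = c(47, -2, NA, 53, NA),
  DepDelay = c(63, -6, NA, 58, 4),
  Distance = c(224, 224, 224, 224, 224)
)

# Number of missing values in each column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only the rows where both delay columns are recorded.
toy_complete <- toy[complete.cases(toy[, c("ArrDelay", "DepDelay")]), ]
print(nrow(toy_complete))
```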

Facet the delays by Day of Week

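# Keep the delay numeric so that xlim = c(0, 200) means minutes of delay;
# with as.factor() the limits would refer to factor-level positions instead.
ggplot(sub.delay, aes(x = ArrDelay)) + geom_bar() +
  coord_cartesian(xlim = c(0, 200)) +
  facet_wrap(~DayOfWeek, nrow = 1) +
  xlab("Arrival Delay") +
  ylab("frequency") +
  labs(title = "Arrival Delay by Day of the Week in Dec. of 2011")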

# Numeric x axis, so that xlim = c(0, 200) is read as minutes of delay
# rather than as positions of factor levels.
ggplot(sub.delay, aes(x = DepDelay)) + geom_bar() +
  coord_cartesian(xlim = c(0, 200)) +
  facet_wrap(~DayOfWeek, nrow = 1) +
  xlab("Departure Delay") +
  ylab("frequency") +
  labs(title = "Departure Delay by Day of the Week in Dec. of 2011")
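
The faceted bar charts compare days by eye. A numeric summary per DayOfWeek could back that comparison up. The sketch below is an assumption about how one might do it, not the original method; `toy` is a made-up stand-in for sub.delay, and "late" is taken here to mean ArrDelay > 0.

```r
# Toy stand-in for sub.delay; the values are made up for illustration.
toy <- data.frame(
  DayOfWeek = c(1, 1, 5, 5, 5, 7),
  ArrDelay  = c(10, -4, 30, 20, NA, 0)
)

# Mean arrival delay per day of week.
# The formula interface of aggregate() omits rows with NA by default.
mean_by_day <- aggregate(ArrDelay ~ DayOfWeek, data = toy, FUN = mean)

# Share of flights arriving late (ArrDelay > 0) per day of week.
late_by_day <- aggregate(ArrDelay ~ DayOfWeek, data = toy,
                         FUN = function(x) mean(x > 0))

print(mean_by_day)
print(late_by_day)
```

Running the same two calls with `data = sub.delay` would give one row per day, which can then be compared directly.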

Rank carriers by delay frequency and explore relationships between delays and distance.

# geom_bar() counts rows, so each bar is the number of December flights
# per carrier (delayed or not), ordered from most to fewest.
ggplot(sub.delay[, c("ArrDelay", "UniqueCarrier")],
       aes(x = reorder(UniqueCarrier, UniqueCarrier,
                       function(x) -length(x)))) + geom_bar() +
  xlab("Carrier") +
  ylab("Number of flights") +
  labs(title = "Arrival Delay by Carriers in Dec. of 2011")

# As above, the bar height is the row count per carrier, ordered from most to fewest.
ggplot(sub.delay[, c("DepDelay", "UniqueCarrier")],
       aes(x = reorder(UniqueCarrier, UniqueCarrier,
                       function(x) -length(x)))) + geom_bar() +
  xlab("Carrier") +
  ylab("Number of flights") +
  labs(title = "Departure Delay by Carriers in Dec. of 2011")
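
To rank carriers by how often they are actually late, the rows can first be restricted to delayed flights. This is a hedged sketch rather than the original analysis: `toy` is a made-up stand-in for sub.delay, and a flight is treated as delayed when ArrDelay > 0.

```r
# Toy stand-in for sub.delay; the values are made up for illustration.
toy <- data.frame(
  UniqueCarrier = c("XE", "XE", "XE", "CO", "CO", "WN"),
  ArrDelay      = c(12, -3, 5, 8, NA, -1),
  stringsAsFactors = FALSE
)

# which() ignores NA comparisons, so flights with a missing ArrDelay are excluded.
late <- toy[which(toy$ArrDelay > 0), ]

# Number of delayed flights per carrier, largest first.
late_counts <- sort(table(late$UniqueCarrier), decreasing = TRUE)

# Delayed flights as a share of each carrier's recorded arrivals.
late_share <- tapply(toy$ArrDelay > 0, toy$UniqueCarrier, mean, na.rm = TRUE)

print(late_counts)
print(late_share)
```

The share is worth reading next to the raw count, because a carrier with many flights will tend to have many delayed flights even if its delay rate is ordinary.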

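# Rows with a missing ArrDelay pass through the logical subset as all-NA rows,
# which geom_point() then drops (hence the warning).
ggplot(sub.delay[sub.delay$ArrDelay > 0, c("ArrDelay", "DepDelay")],
       aes(x = DepDelay, y = ArrDelay)) + geom_point() +
       xlab("Departure Delay") +
       ylab("Arrival Delay") +
       labs(title = "Departure Delay vs. Arrival Delay in Dec. of 2011")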
## Warning: Removed 240 rows containing missing values (geom_point).

ggplot(sub.delay[sub.delay$ArrDelay > 0, c("ArrDelay", "Distance")],
       aes(x = Distance, y = ArrDelay)) + geom_point()+
  xlab("Distance") + 
  ylab("Arrival Delay") +
  labs(title = "Distance vs. Arrival Delay in Dec. of 2011") 
## Warning: Removed 240 rows containing missing values (geom_point).
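
The two scatter plots can be complemented with correlation coefficients. The sketch below is illustrative only: `toy` is a made-up stand-in for sub.delay, and Pearson correlation is one possible measure among several. The original plots use only flights with ArrDelay > 0, so the same subset would need to be applied first for a like-for-like comparison.

```r
# Toy stand-in for sub.delay; the values are made up for illustration.
toy <- data.frame(
  DepDelay = c(0, 10, 20, 30, NA),
  ArrDelay = c(2, 12, 22, 32, 5),
  Distance = c(500, 100, 400, 200, 300)
)

# Pearson correlations, using only rows where both variables are present.
r_dep  <- cor(toy$DepDelay, toy$ArrDelay, use = "complete.obs")
r_dist <- cor(toy$Distance, toy$ArrDelay, use = "complete.obs")

print(c(DepDelay = r_dep, Distance = r_dist))
```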

The observations show:
- The delay data contain outliers (the maximum arrival delay is 978 minutes, against a median of -2).
- Departure delays tend to carry over into arrival delays.
- Delays are highest on Friday.
- The carriers at the top of the delay-frequency list are XE, CO, WN, and OO.
- Distance shows no clear relationship with arrival delay.