I want to choose my travel time and airline wisely, so that I have the best chance of avoiding delays and arriving at my destination on time. So I analyzed the hflights data, using arrival delay as the KPI:

#load data
library("hflights")
## Warning: package 'hflights' was built under R version 3.2.1
#change column data type
hflights$UniqueCarrier<-as.factor(hflights$UniqueCarrier)
hflights$DayOfWeek<-as.factor(hflights$DayOfWeek)

#create and append flight date field (built from Year, Month, DayofMonth)
FlightDate<-paste(hflights$Year, hflights$Month, hflights$DayofMonth,sep="-")
data<-cbind(hflights,FlightDate)

#select rows where flights arrived late (ArrDelay > 0) and drop the Year, Month, DayofMonth columns
del<-data[which(data$ArrDelay >0),-(1:3)]

#subset columns of interest
sub<-subset(del, select=c(DayOfWeek,UniqueCarrier,ArrDelay,DepDelay,AirTime,FlightDate))

#summary statistics for data analysis
summary(sub)
##  DayOfWeek UniqueCarrier      ArrDelay         DepDelay     
##  1:16771   XE     :35429   Min.   :  1.00   Min.   :-15.00  
##  2:14243   CO     :34044   1st Qu.:  5.00   1st Qu.:  0.00  
##  3:13933   WN     :20685   Median : 12.00   Median :  9.00  
##  4:17256   OO     : 8443   Mean   : 24.28   Mean   : 21.04  
##  5:16483   MQ     : 1665   3rd Qu.: 27.00   3rd Qu.: 26.00  
##  6:13261   US     : 1317   Max.   :978.00   Max.   :981.00  
##  7:14973   (Other): 5337                                    
##     AirTime          FlightDate    
##  Min.   : 22.0   2011-5-20:   565  
##  1st Qu.: 60.0   2011-6-22:   564  
##  Median :109.0   2011-3-14:   558  
##  Mean   :111.5   2011-4-4 :   543  
##  3rd Qu.:145.0   2011-6-21:   527  
##  Max.   :549.0   2011-4-25:   520  
##                  (Other)  :103643

1. Histogram of Arrival Delays by day of week:

#qplot(UniqueCarrier, data=sub, geom="bar", fill=DayOfWeek)
#ggplot(sub, aes(UniqueCarrier)) + geom_freqpoly(aes(group = DayOfWeek, colour = DayOfWeek))
#Stacked bars are easy, but might be overloaded with information. Faceting might be a better solution.



Congestion tends to happen on weekends and midweek, although there are variations for airlines with smaller volume.
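
Raw counts of delayed flights also depend on how many flights operate on each day, so the share of delayed flights per weekday may be easier to compare. Here is a minimal sketch in base R. The data frame `toy` and its values are made up for illustration; only the column names match hflights. I am assuming hflights follows the usual coding of 1 = Monday through 7 = Sunday.

```r
# Toy stand-in for hflights (values are invented for illustration only)
toy <- data.frame(
  DayOfWeek = c(1, 1, 1, 2, 2, 6, 6, 6, 7, 7),
  ArrDelay  = c(12, -3, 40, -5, NA, 7, -2, -10, 25, 3)
)

# Label the weekdays (assumption: 1 = Monday ... 7 = Sunday)
toy$Day <- factor(toy$DayOfWeek, levels = 1:7,
                  labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))

# Keep only flights with arrival information, then compute the % delayed per day
known <- toy[!is.na(toy$ArrDelay), ]
delay_rate <- 100 * tapply(known$ArrDelay > 0, known$Day, mean)

# Days with no flights in the toy data show up as NA
round(delay_rate, 1)
```

On the real data, replacing `toy` with `hflights` should give one percentage per weekday, which can then be split further by carrier.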


2. Boxplot of Delay Time by airline:


Most delays are short (median = 12 min, mean = 24 min), although certain airlines have significant outliers.
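
The numbers behind a boxplot can also be read off directly. The sketch below computes the median delay per carrier and counts the points a boxplot would flag as outliers (more than 1.5 times the interquartile range beyond the box). The data frame `toy` is invented for illustration; on the real data it would be replaced by `sub`.

```r
# Toy delayed-flight data (invented values; column names follow `sub`)
toy <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "AA", "AA", "XE", "XE", "XE", "XE", "XE"),
  ArrDelay      = c(5, 8, 10, 12, 15, 4, 9, 11, 14, 300)
)

# Median delay per carrier: the centre line of each box
med <- tapply(toy$ArrDelay, toy$UniqueCarrier, median)

# Number of points a boxplot would draw as outliers for each carrier
n_out <- tapply(toy$ArrDelay, toy$UniqueCarrier,
                function(x) length(boxplot.stats(x)$out))

med
n_out
```

In the toy data the two carriers have similar medians, but a single 300-minute delay pulls the XE mean far above its median, which is the same pattern as the mean of 24 min against the median of 12 min in the summary.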


3. Evaluate airline performance:
Simply counting the number of delayed flights per airline is misleading, as the bigger airlines account for the majority of flights and so likely most of the delays. Therefore, I created calculated fields for the analysis:

#calculate number of delayed flights by carrier
library("sqldf")
ucd<-sqldf("select UniqueCarrier, count(ArrDelay) as delays from hflights where ArrDelay >0 group by UniqueCarrier")

#calculate number of total flights by carrier, if arrival information exists
uc<-sqldf("select UniqueCarrier, count(ArrDelay) as flights from hflights where ArrDelay is not null group by UniqueCarrier")

#merge the results and create calculated field: %of delayed flights
library(dplyr)
stats<-merge(ucd,uc,all=TRUE)
ratio<-with(stats, 100*delays/flights)
stats<-cbind(stats,ratio)
stats
##    UniqueCarrier delays flights    ratio
## 1             AA    963    3178 30.30208
## 2             AS    159     364 43.68132
## 3             B6    266     673 39.52452
## 4             CO  34044   69373 49.07385
## 5             DL   1003    2591 38.71092
## 6             EV    780    2121 36.77511
## 7             F9    463     832 55.64904
## 8             FL    657    2111 31.12269
## 9             MQ   1665    4504 36.96714
## 10            OO   8443   15781 53.50105
## 11            UA   1009    2033 49.63109
## 12            US   1317    4030 32.67990
## 13            WN  20685   44536 46.44557
## 14            XE  35429   71669 49.43420
## 15            YV     37      78 47.43590
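
For comparison, the same three columns can be computed in one pass with base R, without sqldf or a merge. This is only a sketch of an alternative, not the method used above. The data frame `toy` holds invented values, and `stats_toy` is a name I made up for the result.

```r
# Toy stand-in for hflights (invented values for illustration only)
toy <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "AA", "F9", "F9", "F9"),
  ArrDelay      = c(10, -4, NA, -1, 25, 3, -6)
)

# Drop flights without arrival information, as the SQL "is not null" filter does
known <- toy[!is.na(toy$ArrDelay), ]

# Count delayed and total flights per carrier, then derive the percentage
delays  <- tapply(known$ArrDelay > 0, known$UniqueCarrier, sum)
flights <- tapply(known$ArrDelay, known$UniqueCarrier, length)
stats_toy <- data.frame(
  UniqueCarrier = names(flights),
  delays  = as.vector(delays),
  flights = as.vector(flights),
  ratio   = as.vector(100 * delays / flights)
)
stats_toy
```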


Bar chart of percentage of delayed flights, organized by airline size; number of flights labeled:


AA has the lowest arrival delay rate (30.3%) while F9 has the highest (55.6%). Notice that from a certain point on, the delay rate tends to increase as the volume of flights grows. This can also be seen in the following scatter plot:
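
One way to check the "from a certain point on" pattern numerically is a rank correlation between volume and delay rate, computed once for all carriers and once for the larger ones only. The sketch below rebuilds the stats table from the printed output. The cutoff `min_flights` of 2000 flights is my own hypothetical choice, not something derived from the data.

```r
# Delayed and total flights per carrier, copied from the stats table above
stats <- data.frame(
  UniqueCarrier = c("AA", "AS", "B6", "CO", "DL", "EV", "F9", "FL",
                    "MQ", "OO", "UA", "US", "WN", "XE", "YV"),
  delays  = c(963, 159, 266, 34044, 1003, 780, 463, 657,
              1665, 8443, 1009, 1317, 20685, 35429, 37),
  flights = c(3178, 364, 673, 69373, 2591, 2121, 832, 2111,
              4504, 15781, 2033, 4030, 44536, 71669, 78)
)
stats$ratio <- 100 * stats$delays / stats$flights

# Carriers with the lowest and highest share of delayed flights
best  <- as.character(stats$UniqueCarrier[which.min(stats$ratio)])
worst <- as.character(stats$UniqueCarrier[which.max(stats$ratio)])

# Spearman rank correlation between volume and delay rate, for all carriers
rho_all <- cor(stats$flights, stats$ratio, method = "spearman")

# Same correlation restricted to larger carriers (hypothetical cutoff)
min_flights <- 2000
big <- stats[stats$flights >= min_flights, ]
rho_big <- cor(big$flights, big$ratio, method = "spearman")

c(best = best, worst = worst)
c(all = rho_all, big = rho_big)
```

If the small carriers are noisy, as their low flight counts suggest, the correlation among the larger carriers is the more informative of the two.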



4. (Is it obvious?) Scatter plot of arrival delay and departure delay colored by airline:

There is a strong positive correlation between departure delays and arrival delays, suggesting that airport congestion, rather than flight time, likely caused the delays.
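
The strength of that relationship can be put into numbers with a correlation coefficient, and the difference between the two delays shows how much time was gained or lost after leaving the gate. A minimal sketch with invented values follows; on the real data `toy` would be replaced by `sub`.

```r
# Toy delayed flights (invented values; column names follow `sub`)
toy <- data.frame(
  DepDelay = c(0, 5, 10, 30, 60, 120),
  ArrDelay = c(3, 4, 12, 28, 65, 115)
)

# Pearson correlation between departure and arrival delay
r <- cor(toy$DepDelay, toy$ArrDelay, use = "complete.obs")

# Minutes lost (positive) or gained (negative) between departure and arrival
en_route <- toy$ArrDelay - toy$DepDelay

r
summary(en_route)
```

If `r` is close to 1 and the `en_route` values are small compared with the delays themselves, most of the delay was already present at departure.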

My conclusion is that I want to avoid weekends; Tuesday/Thursday may be best. I would also choose to fly US, FL, or AA: they have a decent number of flights but low delay rates.
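
The airline shortlist can be reproduced from the stats table by filtering on volume and sorting by delay rate. This is a sketch only: what counts as a "decent number of flights" is a judgment call, and the cutoff `min_flights` of 2000 is a hypothetical value I picked for illustration.

```r
# Total flights and % delayed per carrier, copied from the stats table
flights <- c(AA = 3178, AS = 364, B6 = 673, CO = 69373, DL = 2591,
             EV = 2121, F9 = 832, FL = 2111, MQ = 4504, OO = 15781,
             UA = 2033, US = 4030, WN = 44536, XE = 71669, YV = 78)
ratio <- c(AA = 30.30208, AS = 43.68132, B6 = 39.52452, CO = 49.07385,
           DL = 38.71092, EV = 36.77511, F9 = 55.64904, FL = 31.12269,
           MQ = 36.96714, OO = 53.50105, UA = 49.63109, US = 32.67990,
           WN = 46.44557, XE = 49.43420, YV = 47.43590)

# Hypothetical cutoff for "a decent number of flights"
min_flights <- 2000

# Keep carriers above the cutoff and take the three lowest delay rates
eligible  <- ratio[flights >= min_flights]
shortlist <- names(sort(eligible))[1:3]
shortlist
```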