Introduction

The hflights dataset contains all flights departing from the Houston airports IAH (George Bush Intercontinental) and HOU (Houston Hobby) in 2011. The data comes from the Research and Innovative Technology Administration (RITA), Bureau of Transportation Statistics. These are the variables or columns in the dataset.

Variables

First Look at Dataset

We can take our first look at the hflights dataset with R’s head() and tail() functions, then follow up with its structure (str()) and summary (summary()). Notice that the summary statistics for numeric codes such as Year, Month, FlightNum, Cancelled, and Diverted are mostly meaningless: a mean flight number, for example, tells us nothing.

library(hflights)
head(hflights)
##      Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier
## 5424 2011     1          1         6    1400    1500            AA
## 5425 2011     1          2         7    1401    1501            AA
## 5426 2011     1          3         1    1352    1502            AA
## 5427 2011     1          4         2    1403    1513            AA
## 5428 2011     1          5         3    1405    1507            AA
## 5429 2011     1          6         4    1359    1503            AA
##      FlightNum TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin
## 5424       428  N576AA                60      40      -10        0    IAH
## 5425       428  N557AA                60      45       -9        1    IAH
## 5426       428  N541AA                70      48       -8       -8    IAH
## 5427       428  N403AA                70      39        3        3    IAH
## 5428       428  N492AA                62      44       -3        5    IAH
## 5429       428  N262AA                64      45       -7       -1    IAH
##      Dest Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted
## 5424  DFW      224      7      13         0                         0
## 5425  DFW      224      6       9         0                         0
## 5426  DFW      224      5      17         0                         0
## 5427  DFW      224      9      22         0                         0
## 5428  DFW      224      9       9         0                         0
## 5429  DFW      224      6      13         0                         0
tail(hflights)
##         Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier
## 6083254 2011    12          6         2    1307    1600            WN
## 6083255 2011    12          6         2    1818    2111            WN
## 6083256 2011    12          6         2    2047    2334            WN
## 6083257 2011    12          6         2     912    1031            WN
## 6083258 2011    12          6         2     656     812            WN
## 6083259 2011    12          6         2    1600    1713            WN
##         FlightNum TailNum ActualElapsedTime AirTime ArrDelay DepDelay
## 6083254       471  N632SW               113      98        0        7
## 6083255      1191  N284WN               113      97       -9        8
## 6083256      1674  N366SW               107      94        4        7
## 6083257       127  N777QC                79      61       -4       -3
## 6083258       621  N727SW                76      64      -13       -4
## 6083259      1597  N745SW                73      59      -12        0
##         Origin Dest Distance TaxiIn TaxiOut Cancelled CancellationCode
## 6083254    HOU  TPA      781      5      10         0                 
## 6083255    HOU  TPA      781      5      11         0                 
## 6083256    HOU  TPA      781      4       9         0                 
## 6083257    HOU  TUL      453      4      14         0                 
## 6083258    HOU  TUL      453      3       9         0                 
## 6083259    HOU  TUL      453      3      11         0                 
##         Diverted
## 6083254        0
## 6083255        0
## 6083256        0
## 6083257        0
## 6083258        0
## 6083259        0
str(hflights)
## 'data.frame':    227496 obs. of  21 variables:
##  $ Year             : int  2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
##  $ Month            : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ DayofMonth       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ DayOfWeek        : int  6 7 1 2 3 4 5 6 7 1 ...
##  $ DepTime          : int  1400 1401 1352 1403 1405 1359 1359 1355 1443 1443 ...
##  $ ArrTime          : int  1500 1501 1502 1513 1507 1503 1509 1454 1554 1553 ...
##  $ UniqueCarrier    : chr  "AA" "AA" "AA" "AA" ...
##  $ FlightNum        : int  428 428 428 428 428 428 428 428 428 428 ...
##  $ TailNum          : chr  "N576AA" "N557AA" "N541AA" "N403AA" ...
##  $ ActualElapsedTime: int  60 60 70 70 62 64 70 59 71 70 ...
##  $ AirTime          : int  40 45 48 39 44 45 43 40 41 45 ...
##  $ ArrDelay         : int  -10 -9 -8 3 -3 -7 -1 -16 44 43 ...
##  $ DepDelay         : int  0 1 -8 3 5 -1 -1 -5 43 43 ...
##  $ Origin           : chr  "IAH" "IAH" "IAH" "IAH" ...
##  $ Dest             : chr  "DFW" "DFW" "DFW" "DFW" ...
##  $ Distance         : int  224 224 224 224 224 224 224 224 224 224 ...
##  $ TaxiIn           : int  7 6 5 9 9 6 12 7 8 6 ...
##  $ TaxiOut          : int  13 9 17 22 9 13 15 12 22 19 ...
##  $ Cancelled        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ CancellationCode : chr  "" "" "" "" ...
##  $ Diverted         : int  0 0 0 0 0 0 0 0 0 0 ...
summary(hflights)
##       Year          Month          DayofMonth      DayOfWeek    
##  Min.   :2011   Min.   : 1.000   Min.   : 1.00   Min.   :1.000  
##  1st Qu.:2011   1st Qu.: 4.000   1st Qu.: 8.00   1st Qu.:2.000  
##  Median :2011   Median : 7.000   Median :16.00   Median :4.000  
##  Mean   :2011   Mean   : 6.514   Mean   :15.74   Mean   :3.948  
##  3rd Qu.:2011   3rd Qu.: 9.000   3rd Qu.:23.00   3rd Qu.:6.000  
##  Max.   :2011   Max.   :12.000   Max.   :31.00   Max.   :7.000  
##                                                                 
##     DepTime        ArrTime     UniqueCarrier        FlightNum   
##  Min.   :   1   Min.   :   1   Length:227496      Min.   :   1  
##  1st Qu.:1021   1st Qu.:1215   Class :character   1st Qu.: 855  
##  Median :1416   Median :1617   Mode  :character   Median :1696  
##  Mean   :1396   Mean   :1578                      Mean   :1962  
##  3rd Qu.:1801   3rd Qu.:1953                      3rd Qu.:2755  
##  Max.   :2400   Max.   :2400                      Max.   :7290  
##  NA's   :2905   NA's   :3066                                    
##    TailNum          ActualElapsedTime    AirTime         ArrDelay      
##  Length:227496      Min.   : 34.0     Min.   : 11.0   Min.   :-70.000  
##  Class :character   1st Qu.: 77.0     1st Qu.: 58.0   1st Qu.: -8.000  
##  Mode  :character   Median :128.0     Median :107.0   Median :  0.000  
##                     Mean   :129.3     Mean   :108.1   Mean   :  7.094  
##                     3rd Qu.:165.0     3rd Qu.:141.0   3rd Qu.: 11.000  
##                     Max.   :575.0     Max.   :549.0   Max.   :978.000  
##                     NA's   :3622      NA's   :3622    NA's   :3622     
##     DepDelay          Origin              Dest              Distance     
##  Min.   :-33.000   Length:227496      Length:227496      Min.   :  79.0  
##  1st Qu.: -3.000   Class :character   Class :character   1st Qu.: 376.0  
##  Median :  0.000   Mode  :character   Mode  :character   Median : 809.0  
##  Mean   :  9.445                                         Mean   : 787.8  
##  3rd Qu.:  9.000                                         3rd Qu.:1042.0  
##  Max.   :981.000                                         Max.   :3904.0  
##  NA's   :2905                                                            
##      TaxiIn           TaxiOut         Cancelled       CancellationCode  
##  Min.   :  1.000   Min.   :  1.00   Min.   :0.00000   Length:227496     
##  1st Qu.:  4.000   1st Qu.: 10.00   1st Qu.:0.00000   Class :character  
##  Median :  5.000   Median : 14.00   Median :0.00000   Mode  :character  
##  Mean   :  6.099   Mean   : 15.09   Mean   :0.01307                     
##  3rd Qu.:  7.000   3rd Qu.: 18.00   3rd Qu.:0.00000                     
##  Max.   :165.000   Max.   :163.00   Max.   :1.00000                     
##  NA's   :3066      NA's   :2947                                         
##     Diverted       
##  Min.   :0.000000  
##  1st Qu.:0.000000  
##  Median :0.000000  
##  Mean   :0.002853  
##  3rd Qu.:0.000000  
##  Max.   :1.000000  
## 

We see a dataframe of 227,496 observations of the 21 variables described above. The variables can be grouped as follows.

Although we have 21 variables, they really cover only five concepts or ideas. "How Long" a flight takes looks like an important concept, since eight variables (DepTime, ArrTime, ActualElapsedTime, AirTime, ArrDelay, DepDelay, TaxiIn, TaxiOut) have something to do with the time involved in a flight. By the way, although the dataset write-up does not define it, DayOfWeek runs from 1 (Monday) to 7 (Sunday): the first row of head() is January 1, 2011, a Saturday, and has DayOfWeek 6.
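The weekday convention can be cross-checked in base R. This sketch copies the six dates and DayOfWeek values printed by head(hflights) and compares them with the ISO 8601 weekday returned by format(..., "%u"), which numbers Monday as 1 and Sunday as 7.

```r
# Dates and DayOfWeek values copied from the six rows printed by head(hflights)
dates <- as.Date(paste(2011, 1, 1:6, sep = "-"))
day_of_week <- c(6L, 7L, 1L, 2L, 3L, 4L)

# "%u" gives the ISO 8601 weekday: 1 = Monday ... 7 = Sunday
iso_weekday <- as.integer(format(dates, "%u"))

# TRUE if these rows follow the Monday = 1 convention
print(all(iso_weekday == day_of_week))  # TRUE
```

On the full data, the same comparison could be run with dates built from hflights$Year, hflights$Month, and hflights$DayofMonth against hflights$DayOfWeek; it should return TRUE if the convention holds for every row.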

Based on this initial examination of the dataset we should be able to gain some insight into where people fly to from Houston, how long it takes, and the likelihood of getting there.

Formatting the Dataframe

To make the dataframe easier to use, I want to combine the date-related fields into a single date, convert the categorical variables to factors (I find the summary information on factor data very useful), and make the binary fields logical (TRUE/FALSE). We start the formatting by making a copy of the hflights dataframe so the original stays untouched.

When Variables

Most of the time I find a real Date field more useful than separate integers for year, month, and day. Here we combine the Year, Month, and DayofMonth variables into a single Date variable and turn the DayOfWeek variable into a readable factor. After the conversions we drop the separate Year, Month, and DayofMonth fields. Notice from the summary that flights are spread fairly evenly across the days of the week, with Saturday the lowest.

hf.new <- hflights
hf.new$Date <- as.Date(paste(hf.new$Year, hf.new$Month, hf.new$DayofMonth, sep="-"))
hf.new$DayOfWeek[hf.new$DayOfWeek == 1] <- "Mon"
hf.new$DayOfWeek[hf.new$DayOfWeek == 2] <- "Tue"
hf.new$DayOfWeek[hf.new$DayOfWeek == 3] <- "Wed"
hf.new$DayOfWeek[hf.new$DayOfWeek == 4] <- "Thu"
hf.new$DayOfWeek[hf.new$DayOfWeek == 5] <- "Fri"
hf.new$DayOfWeek[hf.new$DayOfWeek == 6] <- "Sat"
hf.new$DayOfWeek[hf.new$DayOfWeek == 7] <- "Sun"
hf.new$DayOfWeek <- as.factor(hf.new$DayOfWeek)
hf.new <- subset(hf.new, select = c(-Year, -Month, -DayofMonth))
summary(hf.new)
##  DayOfWeek      DepTime        ArrTime     UniqueCarrier     
##  Fri:34972   Min.   :   1   Min.   :   1   Length:227496     
##  Mon:34360   1st Qu.:1021   1st Qu.:1215   Class :character  
##  Sat:27629   Median :1416   Median :1617   Mode  :character  
##  Sun:32058   Mean   :1396   Mean   :1578                     
##  Thu:34902   3rd Qu.:1801   3rd Qu.:1953                     
##  Tue:31649   Max.   :2400   Max.   :2400                     
##  Wed:31926   NA's   :2905   NA's   :3066                     
##    FlightNum      TailNum          ActualElapsedTime    AirTime     
##  Min.   :   1   Length:227496      Min.   : 34.0     Min.   : 11.0  
##  1st Qu.: 855   Class :character   1st Qu.: 77.0     1st Qu.: 58.0  
##  Median :1696   Mode  :character   Median :128.0     Median :107.0  
##  Mean   :1962                      Mean   :129.3     Mean   :108.1  
##  3rd Qu.:2755                      3rd Qu.:165.0     3rd Qu.:141.0  
##  Max.   :7290                      Max.   :575.0     Max.   :549.0  
##                                    NA's   :3622      NA's   :3622   
##     ArrDelay          DepDelay          Origin              Dest          
##  Min.   :-70.000   Min.   :-33.000   Length:227496      Length:227496     
##  1st Qu.: -8.000   1st Qu.: -3.000   Class :character   Class :character  
##  Median :  0.000   Median :  0.000   Mode  :character   Mode  :character  
##  Mean   :  7.094   Mean   :  9.445                                        
##  3rd Qu.: 11.000   3rd Qu.:  9.000                                        
##  Max.   :978.000   Max.   :981.000                                        
##  NA's   :3622      NA's   :2905                                           
##     Distance          TaxiIn           TaxiOut         Cancelled      
##  Min.   :  79.0   Min.   :  1.000   Min.   :  1.00   Min.   :0.00000  
##  1st Qu.: 376.0   1st Qu.:  4.000   1st Qu.: 10.00   1st Qu.:0.00000  
##  Median : 809.0   Median :  5.000   Median : 14.00   Median :0.00000  
##  Mean   : 787.8   Mean   :  6.099   Mean   : 15.09   Mean   :0.01307  
##  3rd Qu.:1042.0   3rd Qu.:  7.000   3rd Qu.: 18.00   3rd Qu.:0.00000  
##  Max.   :3904.0   Max.   :165.000   Max.   :163.00   Max.   :1.00000  
##                   NA's   :3066      NA's   :2947                      
##  CancellationCode      Diverted             Date           
##  Length:227496      Min.   :0.000000   Min.   :2011-01-01  
##  Class :character   1st Qu.:0.000000   1st Qu.:2011-04-03  
##  Mode  :character   Median :0.000000   Median :2011-07-02  
##                     Mean   :0.002853   Mean   :2011-07-01  
##                     3rd Qu.:0.000000   3rd Qu.:2011-09-29  
##                     Max.   :1.000000   Max.   :2011-12-31  
## 

How Long Variables

These variables are already numeric, and it makes sense to leave them in that form, since they are the only variables we are likely to do calculations with.
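Two cautions apply when calculating with these columns. DepTime and ArrTime are clock times in hhmm form (1400 means 14:00), so they need converting to minutes after midnight before subtracting. They also appear to be local times. The sketch below uses four rows copied from the head() and tail() output, checks that ActualElapsedTime equals AirTime + TaxiIn + TaxiOut, and shows that the clock-time gap for the Houston to Tampa (TPA) flight is 60 minutes longer than its elapsed time, consistent with Tampa being one time zone east. The helper hhmm_to_minutes() is a hypothetical name introduced here.

```r
# Four rows copied from the head(hflights) and tail(hflights) output above
flights <- data.frame(
  Dest              = c("DFW", "DFW", "TPA", "TUL"),
  DepTime           = c(1400L, 1352L, 1307L, 912L),
  ArrTime           = c(1500L, 1502L, 1600L, 1031L),
  ActualElapsedTime = c(60L, 70L, 113L, 79L),
  AirTime           = c(40L, 48L, 98L, 61L),
  TaxiIn            = c(7L, 5L, 5L, 4L),
  TaxiOut           = c(13L, 17L, 10L, 14L)
)

# Hypothetical helper: convert hhmm clock times (e.g. 1352) to minutes after midnight
hhmm_to_minutes <- function(hhmm) (hhmm %/% 100) * 60 + hhmm %% 100

# Elapsed time should equal time in the air plus taxi time at both ends
components_match <- with(flights, AirTime + TaxiIn + TaxiOut == ActualElapsedTime)

# Naive clock gap: ignores time zones and breaks for flights arriving after midnight
clock_gap <- with(flights, hhmm_to_minutes(ArrTime) - hhmm_to_minutes(DepTime))
tz_offset <- clock_gap - flights$ActualElapsedTime

print(components_match)  # TRUE TRUE TRUE TRUE
print(tz_offset)         # 0 0 60 0
```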

Flight ID & Location Variables

It makes sense to turn these variables into factors in the dataframe, because summary() then reports counts of the most common values directly. Distance, which belongs with the location variables, is numeric and will stay that way. Notice the most common carriers, planes (by tail number), the only two origins, and the most common destinations.

hf.new$UniqueCarrier <- as.factor(hf.new$UniqueCarrier)
hf.new$FlightNum <- as.factor(hf.new$FlightNum)
hf.new$TailNum <- as.factor(hf.new$TailNum)
hf.new$Origin <- as.factor(hf.new$Origin)
hf.new$Dest <- as.factor(hf.new$Dest)
summary(hf.new)
##  DayOfWeek      DepTime        ArrTime     UniqueCarrier  
##  Fri:34972   Min.   :   1   Min.   :   1   XE     :73053  
##  Mon:34360   1st Qu.:1021   1st Qu.:1215   CO     :70032  
##  Sat:27629   Median :1416   Median :1617   WN     :45343  
##  Sun:32058   Mean   :1396   Mean   :1578   OO     :16061  
##  Thu:34902   3rd Qu.:1801   3rd Qu.:1953   MQ     : 4648  
##  Tue:31649   Max.   :2400   Max.   :2400   US     : 4082  
##  Wed:31926   NA's   :2905   NA's   :3066   (Other):14277  
##    FlightNum         TailNum       ActualElapsedTime    AirTime     
##  52     :   667   N14945 :   971   Min.   : 34.0     Min.   : 11.0  
##  8      :   636   N15926 :   960   1st Qu.: 77.0     1st Qu.: 58.0  
##  1590   :   634   N16927 :   951   Median :128.0     Median :107.0  
##  35     :   618   N12946 :   948   Mean   :129.3     Mean   :108.1  
##  1      :   606   N14937 :   946   3rd Qu.:165.0     3rd Qu.:141.0  
##  60     :   600   N14942 :   946   Max.   :575.0     Max.   :549.0  
##  (Other):223735   (Other):221774   NA's   :3622      NA's   :3622   
##     ArrDelay          DepDelay       Origin            Dest       
##  Min.   :-70.000   Min.   :-33.000   HOU: 52299   DAL    :  9820  
##  1st Qu.: -8.000   1st Qu.: -3.000   IAH:175197   ATL    :  7886  
##  Median :  0.000   Median :  0.000                MSY    :  6823  
##  Mean   :  7.094   Mean   :  9.445                DFW    :  6653  
##  3rd Qu.: 11.000   3rd Qu.:  9.000                LAX    :  6064  
##  Max.   :978.000   Max.   :981.000                DEN    :  5920  
##  NA's   :3622      NA's   :2905                   (Other):184330  
##     Distance          TaxiIn           TaxiOut         Cancelled      
##  Min.   :  79.0   Min.   :  1.000   Min.   :  1.00   Min.   :0.00000  
##  1st Qu.: 376.0   1st Qu.:  4.000   1st Qu.: 10.00   1st Qu.:0.00000  
##  Median : 809.0   Median :  5.000   Median : 14.00   Median :0.00000  
##  Mean   : 787.8   Mean   :  6.099   Mean   : 15.09   Mean   :0.01307  
##  3rd Qu.:1042.0   3rd Qu.:  7.000   3rd Qu.: 18.00   3rd Qu.:0.00000  
##  Max.   :3904.0   Max.   :165.000   Max.   :163.00   Max.   :1.00000  
##                   NA's   :3066      NA's   :2947                      
##  CancellationCode      Diverted             Date           
##  Length:227496      Min.   :0.000000   Min.   :2011-01-01  
##  Class :character   1st Qu.:0.000000   1st Qu.:2011-04-03  
##  Mode  :character   Median :0.000000   Median :2011-07-02  
##                     Mean   :0.002853   Mean   :2011-07-01  
##                     3rd Qu.:0.000000   3rd Qu.:2011-09-29  
##                     Max.   :1.000000   Max.   :2011-12-31  
## 
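summary() prints only the six most common values of each factor plus an (Other) row, so it does not say how many distinct carriers or destinations there are. A minimal sketch using nlevels() on a small made-up data frame (the values are illustrative, not taken from hflights):

```r
# Made-up rows standing in for hf.new; values are illustrative only
toy <- data.frame(
  UniqueCarrier = factor(c("XE", "CO", "WN", "CO", "XE", "AA")),
  Origin        = factor(c("IAH", "IAH", "HOU", "IAH", "IAH", "IAH")),
  Dest          = factor(c("DAL", "ATL", "DAL", "MSY", "DFW", "DFW"))
)

# Pick out the factor columns and count the distinct values of each.
# droplevels() matters after filtering rows, since unused levels still count.
factor_cols <- vapply(toy, is.factor, logical(1))
n_distinct <- vapply(droplevels(toy)[factor_cols], nlevels, integer(1))
print(n_distinct)
```

With hf.new in place of toy, this should report 15 levels for UniqueCarrier and 2 for Origin, matching the counts shown by summary(hf.new, 20) at the end of this post.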

Cancelled Variables

Cancelled and Diverted are binary 0/1 variables for “Flight Cancelled” and “Flight Diverted”. I changed them to logical (TRUE/FALSE) fields for simplicity. The CancellationCode field was turned into a factor to make it easy to count each type, after its many blank values were converted to NA. Notice that we now know there were 2,973 cancelled flights and 649 diverted flights, and that the cancellation code counts (1,202 + 1,652 + 118 + 1) add up to exactly the number of cancelled flights.

hf.new[["Cancelled"]] <- hf.new$Cancelled == 1
hf.new[["Diverted"]] <- hf.new$Diverted == 1
hf.new$CancellationCode[hf.new$CancellationCode == ""] <- NA
hf.new$CancellationCode <- as.factor(hf.new$CancellationCode)
summary(hf.new)
##  DayOfWeek      DepTime        ArrTime     UniqueCarrier  
##  Fri:34972   Min.   :   1   Min.   :   1   XE     :73053  
##  Mon:34360   1st Qu.:1021   1st Qu.:1215   CO     :70032  
##  Sat:27629   Median :1416   Median :1617   WN     :45343  
##  Sun:32058   Mean   :1396   Mean   :1578   OO     :16061  
##  Thu:34902   3rd Qu.:1801   3rd Qu.:1953   MQ     : 4648  
##  Tue:31649   Max.   :2400   Max.   :2400   US     : 4082  
##  Wed:31926   NA's   :2905   NA's   :3066   (Other):14277  
##    FlightNum         TailNum       ActualElapsedTime    AirTime     
##  52     :   667   N14945 :   971   Min.   : 34.0     Min.   : 11.0  
##  8      :   636   N15926 :   960   1st Qu.: 77.0     1st Qu.: 58.0  
##  1590   :   634   N16927 :   951   Median :128.0     Median :107.0  
##  35     :   618   N12946 :   948   Mean   :129.3     Mean   :108.1  
##  1      :   606   N14937 :   946   3rd Qu.:165.0     3rd Qu.:141.0  
##  60     :   600   N14942 :   946   Max.   :575.0     Max.   :549.0  
##  (Other):223735   (Other):221774   NA's   :3622      NA's   :3622   
##     ArrDelay          DepDelay       Origin            Dest       
##  Min.   :-70.000   Min.   :-33.000   HOU: 52299   DAL    :  9820  
##  1st Qu.: -8.000   1st Qu.: -3.000   IAH:175197   ATL    :  7886  
##  Median :  0.000   Median :  0.000                MSY    :  6823  
##  Mean   :  7.094   Mean   :  9.445                DFW    :  6653  
##  3rd Qu.: 11.000   3rd Qu.:  9.000                LAX    :  6064  
##  Max.   :978.000   Max.   :981.000                DEN    :  5920  
##  NA's   :3622      NA's   :2905                   (Other):184330  
##     Distance          TaxiIn           TaxiOut       Cancelled      
##  Min.   :  79.0   Min.   :  1.000   Min.   :  1.00   Mode :logical  
##  1st Qu.: 376.0   1st Qu.:  4.000   1st Qu.: 10.00   FALSE:224523   
##  Median : 809.0   Median :  5.000   Median : 14.00   TRUE :2973     
##  Mean   : 787.8   Mean   :  6.099   Mean   : 15.09   NA's :0        
##  3rd Qu.:1042.0   3rd Qu.:  7.000   3rd Qu.: 18.00                  
##  Max.   :3904.0   Max.   :165.000   Max.   :163.00                  
##                   NA's   :3066      NA's   :2947                    
##  CancellationCode  Diverted            Date           
##  A   :  1202      Mode :logical   Min.   :2011-01-01  
##  B   :  1652      FALSE:226847    1st Qu.:2011-04-03  
##  C   :   118      TRUE :649       Median :2011-07-02  
##  D   :     1      NA's :0         Mean   :2011-07-01  
##  NA's:224523                      3rd Qu.:2011-09-29  
##                                   Max.   :2011-12-31  
## 
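The cross-checks in this section can also be done in code rather than by reading the summary. The sketch below uses a small made-up data frame in the raw 0/1 format, repeats the conversions above, and confirms that every cancelled flight has a code and no other flight does. The descriptive labels assume the standard BTS cancellation codes (A carrier, B weather, C National Air System, D security). mean() of a logical vector gives the share of TRUE values, which is how summary() arrives at the 0.01307 cancellation rate shown earlier (2,973 / 227,496).

```r
# Made-up rows in the raw hflights format: 0/1 indicator and blank codes
raw <- data.frame(
  Cancelled        = c(0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L),
  CancellationCode = c("", "A", "", "B", "", "B", "", ""),
  stringsAsFactors = FALSE
)

# Same conversions as in the text
raw$Cancelled <- raw$Cancelled == 1
raw$CancellationCode[raw$CancellationCode == ""] <- NA

# Assumes the standard BTS meaning of codes A to D
raw$CancellationCode <- factor(raw$CancellationCode,
                               levels = c("A", "B", "C", "D"),
                               labels = c("Carrier", "Weather", "NAS", "Security"))

code_counts <- table(raw$CancellationCode)    # NA codes are left out by default
codes_match <- sum(code_counts) == sum(raw$Cancelled)
# A code should be present exactly when the flight was cancelled
code_iff_cancelled <- all(!is.na(raw$CancellationCode) == raw$Cancelled)
cancel_rate <- mean(raw$Cancelled)            # share of TRUE values

print(code_counts)
print(c(codes_match, code_iff_cancelled))     # TRUE TRUE
print(cancel_rate)                            # 0.375
```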

Individual Variable Examination

Let’s take a closer look at the data and do some data visualization, following the same order of variable groups as above.

When Variables

We can use a histogram and a bar chart to look at the number of flights by time period: by date throughout the year and by day of the week. I found that coding the days of the week with their first three letters makes the chart easier to read, but I still need to figure out how to set their order. Factor levels are sorted alphabetically by default, so Friday comes first, followed by Monday.
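One way to set the order, sketched here with a short made-up vector of day codes: build the factor from the integer codes with explicit levels and labels, so the levels run Monday to Sunday instead of alphabetically. An existing factor such as hf.new$DayOfWeek can be reordered the same way with factor(x, levels = ...). ggplot2 draws a discrete axis in level order, so the bars should then appear Monday through Sunday.

```r
day_names <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# Made-up integer codes in the raw DayOfWeek format (1 = Monday)
day_codes <- c(6L, 7L, 1L, 2L, 5L, 6L)

# levels fixes the order; labels renames the levels in that same order
dow <- factor(day_codes, levels = 1:7, labels = day_names)

# Reordering a factor that already has alphabetical levels
dow_alpha <- factor(c("Sat", "Sun", "Mon", "Fri"))
dow_fixed <- factor(dow_alpha, levels = day_names)

print(levels(dow_alpha))  # "Fri" "Mon" "Sat" "Sun"
print(levels(dow_fixed))  # "Mon" "Tue" "Wed" "Thu" "Fri" "Sat" "Sun"
print(table(dow))
```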

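Notice that the summer months are the peak flight times for the Houston airports: August reaches about 20,000 flights, July is slightly under that, and June is a little lower still. Fall looks flat and steady at close to 18,000 flights a month. January is the lowest, with just over 15,000 flights. Because binwidth = 30 groups the dates into 30-day bins rather than calendar months, these monthly figures are approximate.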

The days of the week are interesting too, with Saturday having the fewest flights and Friday the most. There really is not much variation by day of the week: over the whole year Saturday has 27,629 flights, and every other day falls between 31,649 (Tuesday) and 34,972 (Friday).

library(ggplot2)
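# binwidth is in days for a Date axis, so these bins only approximate months
ggplot(hf.new, aes(x = Date)) + geom_histogram(binwidth = 30, fill = "red", color = "black")

# DayOfWeek is a factor, so geom_bar() counts flights per day; geom_histogram() needs a continuous x
ggplot(hf.new, aes(x = DayOfWeek)) + geom_bar(fill = "orange", color = "black")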
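Because a 30-day binwidth does not line up with calendar months, exact monthly counts need grouping by month instead. A minimal base R sketch on a made-up Date vector (swap in hf.new$Date for the real data); month.abb is R's built-in vector of English month abbreviations.

```r
# Made-up dates standing in for hf.new$Date
dates <- as.Date(c("2011-01-15", "2011-01-31", "2011-02-01",
                   "2011-07-04", "2011-08-10", "2011-08-31"))

# Calendar month as a factor with all 12 levels, so empty months show as 0
month <- factor(month.abb[as.integer(format(dates, "%m"))], levels = month.abb)
per_month <- table(month)
print(per_month)
```

Adding such a month column to hf.new and plotting it with geom_bar() instead of geom_histogram() should give one bar per calendar month.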

How Long Variables

We can quickly look at when flights are delayed by plotting the departure delay over time. The plot is crowded, so we may need to spread it out, perhaps by looking at individual months, to get real detail. However, this overview does tell us some important information. The thick band of black points along the bottom reaches up to about the first grid line above zero, at roughly 125 minutes, so delays of up to about two hours occur all through the year. The band reflects how many points are stacked together rather than how likely a long delay is: the summary shows that three quarters of flights depart no more than 9 minutes late. The very long delays (over 500 minutes) are infrequent, but spread fairly evenly across the year.
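As a first step toward looking at individual months, departure delays can be summarized per calendar month instead of plotting every flight. This sketch uses made-up values; with the real data, delays would be hf.new. Rows with a missing DepDelay (2,905 in hflights) are dropped, and the 120-minute threshold is an arbitrary choice for illustration.

```r
# Made-up rows standing in for hf.new[, c("Date", "DepDelay")]
delays <- data.frame(
  Date     = as.Date(c("2011-01-03", "2011-01-20", "2011-01-28",
                       "2011-06-05", "2011-06-18", "2011-06-30")),
  DepDelay = c(-2L, 5L, 130L, 0L, NA, 45L)
)
delays$Month <- factor(month.abb[as.integer(format(delays$Date, "%m"))],
                       levels = month.abb)

# Keep only flights with a recorded departure delay
ok <- !is.na(delays$DepDelay)

# Median delay for each month that has data
median_by_month <- aggregate(DepDelay ~ Month, data = delays[ok, ], FUN = median)

# Share of flights in each month delayed more than 120 minutes (NA for empty months)
share_over_2h <- tapply(delays$DepDelay[ok] > 120, delays$Month[ok], mean)

print(median_by_month)
print(round(share_over_2h[c("Jan", "Jun")], 3))
```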

ggplot(hf.new, aes(x= Date, y = DepDelay)) + geom_line(color = "Green") + geom_point()
## Warning: Removed 2905 rows containing missing values (geom_point).

Fini

I am out of time now, so I will end here. My last look at this dataset uses my favorite function again, with a new parameter: summary(hf.new, 20). The second argument (maxsum) raises the number of rows shown for each factor variable to 20, listing the 19 most common values and lumping the rest into (Other). From this we can see that 15 different airlines flew out of Houston in 2011; that each of the 19 busiest planes flew roughly 890 to 970 flights in 2011; that over 175,000 flights originated at IAH and just over 52,000 at HOU; and which were the 19 most common destination airports, with Dallas Love Field (DAL) leading the list at 9,820 flights in 2011.
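summary(hf.new, 20) shows at most 19 values per factor plus (Other). For an exact ranking of any length, sorting a frequency table works; this sketch uses a made-up destination vector in place of hf.new$Dest.

```r
# Made-up destinations standing in for hf.new$Dest
dest <- factor(c("DAL", "ATL", "DAL", "MSY", "DFW", "DAL", "ATL", "LAX"))

# Count each destination and sort from most to least frequent
dest_counts <- sort(table(dest), decreasing = TRUE)
top_dest <- head(dest_counts, 3)
print(top_dest)

# Share of all flights going to each of the top destinations
print(round(prop.table(dest_counts)[names(top_dest)], 3))
```

Changing the 3 in head() to 20 would give a true top-20 list for the real data.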

summary(hf.new, 20)
##  DayOfWeek      DepTime        ArrTime     UniqueCarrier   FlightNum     
##  Fri:34972   Min.   :   1   Min.   :   1   AA: 3244      52     :   667  
##  Mon:34360   1st Qu.:1021   1st Qu.:1215   AS:  365      8      :   636  
##  Sat:27629   Median :1416   Median :1617   B6:  695      1590   :   634  
##  Sun:32058   Mean   :1396   Mean   :1578   CO:70032      35     :   618  
##  Thu:34902   3rd Qu.:1801   3rd Qu.:1953   DL: 2641      1      :   606  
##  Tue:31649   Max.   :2400   Max.   :2400   EV: 2204      60     :   600  
##  Wed:31926   NA's   :2905   NA's   :3066   F9:  838      47     :   574  
##                                            FL: 2139      6      :   567  
##                                            MQ: 4648      5      :   537  
##                                            OO:16061      33     :   523  
##                                            UA: 2072      1294   :   411  
##                                            US: 4082      731    :   405  
##                                            WN:45343      286    :   387  
##                                            XE:73053      270    :   382  
##                                            YV:   79      106    :   371  
##                                                          3216   :   370  
##                                                          62     :   366  
##                                                          89     :   365  
##                                                          1586   :   365  
##                                                          (Other):218112  
##     TailNum       ActualElapsedTime    AirTime         ArrDelay      
##  N14945 :   971   Min.   : 34.0     Min.   : 11.0   Min.   :-70.000  
##  N15926 :   960   1st Qu.: 77.0     1st Qu.: 58.0   1st Qu.: -8.000  
##  N16927 :   951   Median :128.0     Median :107.0   Median :  0.000  
##  N12946 :   948   Mean   :129.3     Mean   :108.1   Mean   :  7.094  
##  N14937 :   946   3rd Qu.:165.0     3rd Qu.:141.0   3rd Qu.: 11.000  
##  N14942 :   946   Max.   :575.0     Max.   :549.0   Max.   :978.000  
##  N15948 :   942   NA's   :3622      NA's   :3622    NA's   :3622     
##  N14938 :   935                                                      
##  N13935 :   934                                                      
##  N14943 :   934                                                      
##  N14947 :   921                                                      
##  N15932 :   920                                                      
##  N13936 :   915                                                      
##  N14930 :   913                                                      
##  N15941 :   911                                                      
##  N16944 :   909                                                      
##  N14939 :   902                                                      
##  N14933 :   897                                                      
##  N12934 :   889                                                      
##  (Other):209852                                                      
##     DepDelay       Origin            Dest           Distance     
##  Min.   :-33.000   HOU: 52299   DAL    :  9820   Min.   :  79.0  
##  1st Qu.: -3.000   IAH:175197   ATL    :  7886   1st Qu.: 376.0  
##  Median :  0.000                MSY    :  6823   Median : 809.0  
##  Mean   :  9.445                DFW    :  6653   Mean   : 787.8  
##  3rd Qu.:  9.000                LAX    :  6064   3rd Qu.:1042.0  
##  Max.   :981.000                DEN    :  5920   Max.   :3904.0  
##  NA's   :2905                   ORD    :  5748                   
##                                 PHX    :  5096                   
##                                 AUS    :  5022                   
##                                 SAT    :  4893                   
##                                 CRP    :  4813                   
##                                 CLT    :  4735                   
##                                 EWR    :  4314                   
##                                 LAS    :  4082                   
##                                 HRL    :  3983                   
##                                 MCO    :  3687                   
##                                 BNA    :  3481                   
##                                 MCI    :  3174                   
##                                 OKC    :  3170                   
##                                 (Other):128132                   
##      TaxiIn           TaxiOut       Cancelled       CancellationCode
##  Min.   :  1.000   Min.   :  1.00   Mode :logical   A   :  1202     
##  1st Qu.:  4.000   1st Qu.: 10.00   FALSE:224523    B   :  1652     
##  Median :  5.000   Median : 14.00   TRUE :2973      C   :   118     
##  Mean   :  6.099   Mean   : 15.09   NA's :0         D   :     1     
##  3rd Qu.:  7.000   3rd Qu.: 18.00                   NA's:224523     
##  Max.   :165.000   Max.   :163.00                                   
##  NA's   :3066      NA's   :2947                                     
##                                                                     
##                                                                     
##                                                                     
##                                                                     
##                                                                     
##                                                                     
##                                                                     
##                                                                     
##                                                                     
##                                                                     
##                                                                     
##                                                                     
##                                                                     
##   Diverted            Date           
##  Mode :logical   Min.   :2011-01-01  
##  FALSE:226847    1st Qu.:2011-04-03  
##  TRUE :649       Median :2011-07-02  
##  NA's :0         Mean   :2011-07-01  
##                  3rd Qu.:2011-09-29  
##                  Max.   :2011-12-31  
##                                      
##                                      
##                                      
##                                      
##                                      
##                                      
##                                      
##                                      
##                                      
##                                      
##                                      
##                                      
##                                      
##