Overview

Data Collection

The nycflights13 R package contains information about all flights that departed from the three NYC airports (EWR, JFK and LGA) to destinations in the United States, Puerto Rico, and the American Virgin Islands in 2013: 336,776 flights in total.

To help understand what causes delays, it also includes a number of other useful datasets:

  • flights: all flights that departed from NYC in 2013
  • weather: hourly meteorological data for each airport
  • planes: construction information about each plane
  • airports: airport names and locations
  • airlines: airline carrier codes and names.

This is an observational study.

Dependent Variable

The response variable is a quantitative variable, “departure delay”, in minutes. Negative times represent early departures.

This variable will be transformed to log(departure delay + 1) to obtain a more symmetrical distribution.

Independent variables to be considered:

  • origin.x - NYC airport
  • date
  • month.x
  • dest - destination airport
  • distance - distance of the trip, in miles
  • name - airline carrier
  • year.y - year the airplane was manufactured (proxy for airplane age)
  • type - airplane type
  • engine - airplane engine
  • engines - number of engines
  • manufacturer - airplane maker
  • seats - number of seats (proxy for size of plane)
  • temp - air temp
  • wind_dir - wind direction
  • wind_speed - wind speed
  • wind_gust - wind gust speed
  • humid - relative humidity
  • dewp - dew point
  • precip - Precipitation, in inches
  • visibility - visibility in miles
  • pressure - sea level pressure in millibars

Research question

What are the most significant variables driving JFK airline departure delays?

Part 1 - Introduction

Data Preparation

A substantial amount of data cleaning and wrangling preceded the analysis.

The four datasets were merged so that fields from the weather, planes and airlines tables supplement the flights dataset.

The airlines and planes tables were merged into flights first. A key variable, origin.time, was then created by concatenation in both the weather table and the previously merged table, and used as the key for the final merge.

Observations with a missing departure delay were dropped.
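
The merge steps described above can be sketched in base R. This is a minimal illustration on made-up toy tables, not the report's actual code: the column names mimic nycflights13, the use of `time_hour` as the time part of `origin.time` is an assumption, and `merge()` is a stand-in for whatever join function was really used. It does show where suffixed names such as `year.y` and `origin.x` come from.

```r
# Toy stand-ins for the nycflights13 tables (all values are made up).
flights <- data.frame(
  origin    = c("JFK", "JFK", "EWR", "LGA"),
  time_hour = c("2013-01-01 06", "2013-01-01 07", "2013-01-01 06", "2013-01-01 06"),
  carrier   = c("B6", "DL", "UA", "AA"),
  tailnum   = c("N1", "N2", "N3", "N4"),
  year      = 2013,
  dep_delay = c(20, NA, -3, 45),
  stringsAsFactors = FALSE
)
airlines <- data.frame(
  carrier = c("B6", "DL", "UA", "AA"),
  name    = c("JetBlue Airways", "Delta Air Lines Inc.",
              "United Air Lines Inc.", "American Airlines Inc."),
  stringsAsFactors = FALSE
)
planes <- data.frame(
  tailnum = c("N1", "N2", "N3"),
  year    = c(2005, 1999, 2010),   # year of manufacture
  seats   = c(200, 178, 55),
  stringsAsFactors = FALSE
)
weather <- data.frame(
  origin    = c("JFK", "JFK", "EWR", "LGA"),
  time_hour = c("2013-01-01 06", "2013-01-01 07", "2013-01-01 06", "2013-01-01 06"),
  temp      = c(39.0, 39.9, 37.0, 40.0),
  stringsAsFactors = FALSE
)

# Left joins: keep every flight even when a lookup table has no match.
merged <- merge(flights, airlines, by = "carrier", all.x = TRUE)
merged <- merge(merged, planes, by = "tailnum", all.x = TRUE)
# 'year' exists in both tables, so merge() renames them year.x (flight year)
# and year.y (year of manufacture).

# Build the concatenated key in both tables (assumed form: origin + hour).
merged$origin.time  <- paste(merged$origin, merged$time_hour)
weather$origin.time <- paste(weather$origin, weather$time_hour)
merged <- merge(merged, weather, by = "origin.time", all.x = TRUE)

# Drop observations with a missing departure delay.
merged <- merged[!is.na(merged$dep_delay), ]
names(merged)
```

Using left joins keeps flights whose plane is missing from the planes table, which is consistent with the NA counts for seats and year.y in the summary later in the report.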

The research question: What is driving delays at JFK in 2013?

There are over 330,000 observations with variables relating to flights, weather, airplanes and carriers.

A multiple linear regression model will aid in identifying significant drivers of delays.

Part 2 - Data

Let’s take a look at the dependent variable departure delays:

hist(flights6$dep_delay)

DISCUSSION:

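This variable has negative values and is right skewed. Let’s transform this variable to log(dep_delay + 1). Note that this transform is undefined for departures one or more minutes early (dep_delay of -1 or less); it is well defined for the delays of 15 minutes or more that are modeled below.

Also, create a variable depdelay_YN: “N” if the delay is less than 15 minutes, “Y” if the delay is 15 minutes or more.

In addition, change month from 1, 2, 3, … to labels such as 01-JAN, 02-FEB, 03-MAR.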

Create a precip_YN variable.

Create a visibility10_YN variable.

Modify the wind_gust variable: if it is NA, use wind_speed.
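
The derived variables listed above might be built along these lines. This is a sketch on a toy data frame, not the report's code. The cutoffs for precip_YN (any precipitation) and visib10_YN (visibility of 10 miles) are assumptions, and logwindgust = log(wind_gust + 1) is inferred from the summary ranges shown later.

```r
# Toy data (values are made up for illustration).
d <- data.frame(
  dep_delay  = c(-5, 0, 14, 15, 120),
  month_num  = c(1, 2, 6, 7, 12),
  precip     = c(0, 0.02, 0, 0, 0.1),
  visib      = c(10, 6, 10, 10, 2.5),
  wind_speed = c(5.8, 10.4, 12.7, 9.2, 20.7),
  wind_gust  = c(NA, NA, 21.9, NA, 31.1)
)

# log(dep_delay + 1) is only defined when dep_delay > -1; leave the rest NA.
d$logdepdelay <- NA_real_
ok <- d$dep_delay > -1
d$logdepdelay[ok] <- log(d$dep_delay[ok] + 1)

# Delay indicator: "Y" for 15 minutes or more, "N" otherwise.
d$depdelay_YN <- ifelse(d$dep_delay >= 15, "Y", "N")

# Month labels such as "01-JAN" that sort in calendar order.
d$month <- sprintf("%02d-%s", as.integer(d$month_num),
                   toupper(month.abb[d$month_num]))

# Weather indicators (cutoffs are assumptions).
d$precip_YN  <- ifelse(d$precip > 0, "Y", "N")
d$visib10_YN <- ifelse(d$visib >= 10, "Y", "N")

# Fill missing gust readings with the sustained wind speed, then log-transform.
d$wind_gust   <- ifelse(is.na(d$wind_gust), d$wind_speed, d$wind_gust)
d$logwindgust <- log(d$wind_gust + 1)

d
```

Zero-padded month labels keep the months in calendar order when R sorts the factor levels alphabetically, which also makes January the reference level in the regression.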

Part 3 - Exploratory data analysis

Let’s look at departure delays by airport.

* The airport with the most flights is Newark, followed by JFK and LaGuardia.

However, is there an association between airports and delays?

## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  328521 
## 
##  
##                 | flights6$depdelay_YN 
## flights6$origin |         N |         Y | Row Total | 
## ----------------|-----------|-----------|-----------|
##             EWR |     87821 |     29775 |    117596 | 
##                 |   147.610 |   517.460 |           | 
##                 |     0.747 |     0.253 |     0.358 | 
##                 |     0.344 |     0.408 |           | 
##                 |     0.267 |     0.091 |           | 
## ----------------|-----------|-----------|-----------|
##             JFK |     86069 |     23347 |    109416 | 
##                 |    10.323 |    36.190 |           | 
##                 |     0.787 |     0.213 |     0.333 | 
##                 |     0.337 |     0.320 |           | 
##                 |     0.262 |     0.071 |           | 
## ----------------|-----------|-----------|-----------|
##             LGA |     81717 |     19792 |    101509 | 
##                 |    94.887 |   332.636 |           | 
##                 |     0.805 |     0.195 |     0.309 | 
##                 |     0.320 |     0.271 |           | 
##                 |     0.249 |     0.060 |           | 
## ----------------|-----------|-----------|-----------|
##    Column Total |    255607 |     72914 |    328521 | 
##                 |     0.778 |     0.222 |           | 
## ----------------|-----------|-----------|-----------|
## 
## 
## 
##  Pearson's Chi-squared test
## 
## data:  flights6$origin and flights6$depdelay_YN
## X-squared = 1139.1, df = 2, p-value < 2.2e-16

The chi-squared test indicates a statistically significant association between delays and origin airport (X-squared = 1139.1, df = 2, p-value < 2.2e-16).
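
As a check, the test can be reproduced directly from the cell counts in the table above, without the underlying data. The sketch below uses base R's `chisq.test()` on a matrix of those counts; the object names are illustrative only.

```r
# Cell counts copied from the cross-tabulation above.
counts <- matrix(
  c(87821, 29775,
    86069, 23347,
    81717, 19792),
  nrow = 3, byrow = TRUE,
  dimnames = list(origin = c("EWR", "JFK", "LGA"),
                  depdelay_YN = c("N", "Y"))
)

# Pearson chi-squared test of independence
# (no continuity correction is applied for tables larger than 2 x 2).
res <- chisq.test(counts)
res$statistic   # X-squared
res$parameter   # degrees of freedom

# Row proportions: share of flights delayed 15+ minutes at each airport.
round(prop.table(counts, margin = 1), 3)
```

With more than 300,000 observations, even small differences in delay rates are statistically significant, so the row proportions (about 25% delayed at EWR versus about 20% at LGA) are the more informative summary.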

The three NYC airports have differing attributes. They are separated geographically and may have slightly different weather patterns. Different airline carriers use different airports and offer service to different destinations. It is therefore best to look at one airport at a time.

At this point, let’s drill down on one airport: JFK.

  • What is driving delays at JFK in 2013?

DISCUSSION:

JetBlue dominates the flights at JFK, with over 40,000 flights, followed by Delta with about 20,000 flights. Missing values (NA) appear to be few overall; however, Endeavor seems to have an inordinate number of missing values.

We are interested in flights with delays, so let’s subset the dataset. Let’s consider significant delays of 15 minutes or more.

# Keep JFK flights delayed 15 minutes or more (filter() is from dplyr)
JFK_d <- filter(JFK, dep_delay > 14)

The dataset now contains only the JFK flights with significant delays.

dim(JFK_d)
## [1] 23347    30

DISCUSSION:

The number of 2013 JFK departures that were delayed 15 minutes or more is 23,347.

Here is a summary of the new data set of delays.

##    origin.x              date               month.x         dep_delay      
##  Length:23347       Min.   :2013-01-01   Min.   : 1.000   Min.   :  15.00  
##  Class :character   1st Qu.:2013-04-11   1st Qu.: 4.000   1st Qu.:  25.00  
##  Mode  :character   Median :2013-06-29   Median : 6.000   Median :  43.00  
##                     Mean   :2013-06-28   Mean   : 6.419   Mean   :  63.69  
##                     3rd Qu.:2013-09-02   3rd Qu.: 9.000   3rd Qu.:  81.00  
##                     Max.   :2013-12-31   Max.   :12.000   Max.   :1301.00  
##                                                                            
##      dest              distance        speed           name          
##  Length:23347       Min.   :  94   Min.   :105.0   Length:23347      
##  Class :character   1st Qu.: 425   1st Qu.:108.0   Class :character  
##  Mode  :character   Median :1005   Median :117.5   Mode  :character  
##                     Mean   :1182   Mean   :143.5                     
##                     3rd Qu.:1990   3rd Qu.:167.0                     
##                     Max.   :4983   Max.   :232.0                     
##                                    NA's   :23317                     
##  manufacturer           seats           year.y        engine         
##  Length:23347       Min.   :  2.0   Min.   :1956   Length:23347      
##  Class :character   1st Qu.: 55.0   1st Qu.:2001   Class :character  
##  Mode  :character   Median :178.0   Median :2005   Mode  :character  
##                     Mean   :138.1   Mean   :2003                     
##                     3rd Qu.:200.0   3rd Qu.:2008                     
##                     Max.   :450.0   Max.   :2013                     
##                     NA's   :3211    NA's   :3422                     
##     engines          type                temp          wind_dir    
##  Min.   :1.000   Length:23347       Min.   :12.02   Min.   :  0.0  
##  1st Qu.:2.000   Class :character   1st Qu.:42.98   1st Qu.:150.0  
##  Median :2.000   Mode  :character   Median :60.08   Median :190.0  
##  Mean   :1.993                      Mean   :58.21   Mean   :200.8  
##  3rd Qu.:2.000                      3rd Qu.:73.94   3rd Qu.:280.0  
##  Max.   :4.000                      Max.   :98.06   Max.   :360.0  
##  NA's   :3211                       NA's   :132     NA's   :260    
##    wind_speed       wind_gust         visib           pressure     
##  Min.   : 0.000   Min.   :16.11   Min.   : 0.000   Min.   : 985.7  
##  1st Qu.: 9.206   1st Qu.:24.17   1st Qu.:10.000   1st Qu.:1011.5  
##  Median :12.659   Median :27.62   Median :10.000   Median :1016.5  
##  Mean   :12.897   Mean   :28.39   Mean   : 8.898   Mean   :1016.6  
##  3rd Qu.:16.111   3rd Qu.:32.22   3rd Qu.:10.000   3rd Qu.:1021.5  
##  Max.   :37.976   Max.   :66.75   Max.   :10.000   Max.   :1041.6  
##  NA's   :139      NA's   :18739   NA's   :132      NA's   :3709    
##      precip           humid             dewp        logdepdelay   
##  Min.   :0.0000   Min.   : 15.21   Min.   :-9.04   Min.   :2.773  
##  1st Qu.:0.0000   1st Qu.: 49.92   1st Qu.:28.94   1st Qu.:3.258  
##  Median :0.0000   Median : 68.86   Median :48.92   Median :3.784  
##  Mean   :0.0072   Mean   : 66.48   Mean   :45.78   Mean   :3.871  
##  3rd Qu.:0.0000   3rd Qu.: 84.46   3rd Qu.:64.04   3rd Qu.:4.407  
##  Max.   :0.6500   Max.   :100.00   Max.   :78.08   Max.   :7.172  
##  NA's   :132      NA's   :132      NA's   :132                    
##   logwindgust       month           depdelay_YN         depdelay_1_0
##  Min.   :0.000   Length:23347       Length:23347       Min.   :1    
##  1st Qu.:2.323   Class :character   Class :character   1st Qu.:1    
##  Median :2.614   Mode  :character   Mode  :character   Median :1    
##  Mean   :2.586                                         Mean   :1    
##  3rd Qu.:2.966                                         3rd Qu.:1    
##  Max.   :4.216                                         Max.   :1    
##  NA's   :139                                                        
##   precip_YN          visib10_YN       
##  Length:23347       Length:23347      
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##                                       
## 

DISCUSSION:

Let’s look at the possible independent variables.

Boxplots

Data visualizations for categorical variables:

## 
##   JFK 
## 23347

## 
## 01-JAN 02-FEB 03-MAR 04-APR 05-MAY 06-JUN 07-JUL 08-AUG 09-SEP 10-OCT 11-NOV 
##   1539   1738   1947   1913   2054   2676   3194   2344   1288   1194   1129 
## 12-DEC 
##   2331

## 
##     N     Y 
##  5498 17717

## 
##     N     Y 
## 20635  2580

Histograms

Let’s look at the quantitative variables.

DISCUSSION:

After handling missing values and creating the categorical variables, the following continuous variables are plotted.

Let’s look at complete cases.

## [1] 19699    15
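
A minimal sketch of the complete-case filter and the train/test split follows, using toy data. The 70/30 split is an inference, not something the report states: the model summaries below imply 13,789 training rows and 5,910 test rows (residual degrees of freedom plus 53 estimated coefficients), which sum to the 19,699 complete cases. The seed and object names are illustrative.

```r
set.seed(1)  # illustrative seed so the split is reproducible

# Toy data with some missing values (values are made up).
n <- 200
toy <- data.frame(
  logdepdelay = rnorm(n, mean = 3.9, sd = 0.7),
  temp        = rnorm(n, mean = 58, sd = 15),
  humid       = runif(n, min = 15, max = 100)
)
toy$temp[sample(n, 20)] <- NA

# Keep only rows with no missing values in any column.
cc <- toy[complete.cases(toy), ]
dim(cc)

# Random 70/30 split into training and test sets.
train_idx <- sample(nrow(cc), size = round(0.7 * nrow(cc)))
train_set <- cc[train_idx, ]
test_set  <- cc[-train_idx, ]

# Split implied by the report's model summaries: 13789 / 19699
implied_share <- 13789 / 19699
implied_share
```

Restricting to complete cases before splitting ensures both models are fit on rows with the same set of observed predictors.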

Part 4 - Inference

Linear Regression

What is driving the delays at JFK in 2013?

Model using the training set.

## 
## Call:
## lm(formula = logdepdelay ~ month + distance + name + type + engine + 
##     manufacturer + year.y + temp + logwindgust + wind_dir + visib10_YN + 
##     humid + dewp + precip_YN, data = JFK_d_train)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.50297 -0.58401 -0.07475  0.50939  3.04919 
## 
## Coefficients: (5 not defined because of singularities)
##                                             Estimate Std. Error t value
## (Intercept)                                5.590e+00  4.748e+00   1.177
## month02-FEB                                4.987e-02  3.307e-02   1.508
## month03-MAR                                3.354e-02  3.258e-02   1.029
## month04-APR                                8.293e-02  3.609e-02   2.298
## month05-MAY                               -2.788e-02  3.859e-02  -0.722
## month06-JUN                                1.161e-01  4.436e-02   2.618
## month07-JUL                                1.492e-01  4.877e-02   3.060
## month08-AUG                               -6.741e-02  4.684e-02  -1.439
## month09-SEP                               -3.435e-03  4.680e-02  -0.073
## month10-OCT                               -1.226e-01  4.354e-02  -2.815
## month11-NOV                               -7.700e-02  3.800e-02  -2.026
## month12-DEC                               -3.608e-02  3.140e-02  -1.149
## distance                                   2.542e-06  1.199e-05   0.212
## nameDelta Air Lines Inc.                  -3.050e-01  4.628e-02  -6.590
## nameEndeavor Air Inc.                      6.643e-02  1.756e-01   0.378
## nameEnvoy Air                              2.214e-01  4.352e-01   0.509
## nameExpressJet Airlines Inc.               1.261e-01  1.541e-01   0.818
## nameHawaiian Airlines Inc.                -1.830e-02  2.098e-01  -0.087
## nameJetBlue Airways                       -4.009e-01  6.393e-02  -6.271
## nameUnited Air Lines Inc.                 -1.623e-01  5.639e-02  -2.878
## nameUS Airways Inc.                       -3.459e-01  7.808e-02  -4.430
## nameVirgin America                        -3.198e-01  7.631e-02  -4.191
## typeFixed wing single engine               5.288e-01  6.089e-01   0.868
## typeRotorcraft                             1.701e+00  1.101e+00   1.545
## engineReciprocating                       -1.415e-01  4.315e-01  -0.328
## engineTurbo-fan                            2.639e-02  1.029e+00   0.026
## engineTurbo-jet                            5.743e-02  1.029e+00   0.056
## engineTurbo-prop                           3.640e-01  1.063e+00   0.342
## engineTurbo-shaft                                 NA         NA      NA
## manufacturerAIRBUS                         3.343e-01  1.406e-01   2.377
## manufacturerAIRBUS INDUSTRIE               2.847e-01  1.370e-01   2.077
## manufacturerAVIAT AIRCRAFT INC             3.853e-01  6.592e-01   0.585
## manufacturerBEECH                                 NA         NA      NA
## manufacturerBELL                          -1.261e+00  6.574e-01  -1.919
## manufacturerBOEING                         1.182e-01  1.327e-01   0.890
## manufacturerBOMBARDIER INC                 1.138e-01  9.550e-02   1.191
## manufacturerCANADAIR                              NA         NA      NA
## manufacturerCANADAIR LTD                  -5.320e-01  4.950e-01  -1.075
## manufacturerCESSNA                        -2.345e-01  6.102e-01  -0.384
## manufacturerCIRRUS DESIGN CORP            -1.693e-01  5.281e-01  -0.321
## manufacturerDOUGLAS                       -5.130e-01  8.992e-01  -0.571
## manufacturerEMBRAER                        4.668e-01  1.432e-01   3.260
## manufacturerGULFSTREAM AEROSPACE           7.491e-01  3.993e-01   1.876
## manufacturerMCDONNELL DOUGLAS              8.636e-01  7.312e-01   1.181
## manufacturerMCDONNELL DOUGLAS AIRCRAFT CO  3.229e-01  1.412e-01   2.287
## manufacturerMCDONNELL DOUGLAS CORPORATION         NA         NA      NA
## manufacturerPIPER                          4.105e-01  7.198e-01   0.570
## manufacturerROBINSON HELICOPTER CO        -1.382e+00  4.348e-01  -3.179
## manufacturerSIKORSKY                      -2.246e+00  6.583e-01  -3.412
## manufacturerSTEWART MACO                          NA         NA      NA
## year.y                                    -1.294e-03  2.322e-03  -0.557
## temp                                       7.598e-03  3.928e-03   1.934
## logwindgust                                3.975e-02  1.101e-02   3.612
## wind_dir                                   1.099e-04  7.863e-05   1.398
## visib10_YNY                               -1.208e-02  2.033e-02  -0.594
## humid                                      8.064e-03  1.995e-03   4.042
## dewp                                      -6.681e-03  4.231e-03  -1.579
## precip_YNY                                 7.137e-02  2.367e-02   3.015
##                                           Pr(>|t|)    
## (Intercept)                               0.239127    
## month02-FEB                               0.131618    
## month03-MAR                               0.303309    
## month04-APR                               0.021579 *  
## month05-MAY                               0.470000    
## month06-JUN                               0.008856 ** 
## month07-JUL                               0.002219 ** 
## month08-AUG                               0.150130    
## month09-SEP                               0.941496    
## month10-OCT                               0.004891 ** 
## month11-NOV                               0.042764 *  
## month12-DEC                               0.250528    
## distance                                  0.832141    
## nameDelta Air Lines Inc.                  4.56e-11 ***
## nameEndeavor Air Inc.                     0.705262    
## nameEnvoy Air                             0.610885    
## nameExpressJet Airlines Inc.              0.413106    
## nameHawaiian Airlines Inc.                0.930494    
## nameJetBlue Airways                       3.69e-10 ***
## nameUnited Air Lines Inc.                 0.004007 ** 
## nameUS Airways Inc.                       9.50e-06 ***
## nameVirgin America                        2.79e-05 ***
## typeFixed wing single engine              0.385186    
## typeRotorcraft                            0.122421    
## engineReciprocating                       0.742975    
## engineTurbo-fan                           0.979539    
## engineTurbo-jet                           0.955503    
## engineTurbo-prop                          0.732035    
## engineTurbo-shaft                               NA    
## manufacturerAIRBUS                        0.017475 *  
## manufacturerAIRBUS INDUSTRIE              0.037778 *  
## manufacturerAVIAT AIRCRAFT INC            0.558875    
## manufacturerBEECH                               NA    
## manufacturerBELL                          0.055031 .  
## manufacturerBOEING                        0.373260    
## manufacturerBOMBARDIER INC                0.233649    
## manufacturerCANADAIR                            NA    
## manufacturerCANADAIR LTD                  0.282503    
## manufacturerCESSNA                        0.700764    
## manufacturerCIRRUS DESIGN CORP            0.748584    
## manufacturerDOUGLAS                       0.568340    
## manufacturerEMBRAER                       0.001117 ** 
## manufacturerGULFSTREAM AEROSPACE          0.060682 .  
## manufacturerMCDONNELL DOUGLAS             0.237581    
## manufacturerMCDONNELL DOUGLAS AIRCRAFT CO 0.022189 *  
## manufacturerMCDONNELL DOUGLAS CORPORATION       NA    
## manufacturerPIPER                         0.568467    
## manufacturerROBINSON HELICOPTER CO        0.001483 ** 
## manufacturerSIKORSKY                      0.000648 ***
## manufacturerSTEWART MACO                        NA    
## year.y                                    0.577492    
## temp                                      0.053096 .  
## logwindgust                               0.000305 ***
## wind_dir                                  0.162236    
## visib10_YNY                               0.552274    
## humid                                     5.32e-05 ***
## dewp                                      0.114306    
## precip_YNY                                0.002577 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7192 on 13736 degrees of freedom
## Multiple R-squared:  0.0666, Adjusted R-squared:  0.06307 
## F-statistic: 18.85 on 52 and 13736 DF,  p-value: < 2.2e-16
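
Because the response is log(dep_delay + 1), each coefficient acts multiplicatively on (dep_delay + 1): a coefficient b corresponds to a change of roughly 100 * (exp(b) - 1) percent, holding the other predictors fixed. The sketch below applies this to a few estimates copied from the training summary above; it is an interpretation aid, not part of the original analysis.

```r
# Selected estimates copied from the training-set summary above.
coefs <- c(
  "nameDelta Air Lines Inc." = -0.3050,
  "nameJetBlue Airways"      = -0.4009,
  "humid"                    =  0.008064,
  "precip_YNY"               =  0.07137
)

# Percent change in (dep_delay + 1) per unit change in the predictor,
# or relative to the reference level for a categorical predictor.
pct_change <- 100 * (exp(coefs) - 1)
round(pct_change, 1)
```

For example, among flights already delayed 15 minutes or more, precipitation is associated with about a 7% longer delay, and Delta flights with roughly 26% shorter delays than the reference carrier.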

Refit the same model on the test data.

lm.out2 <- lm(logdepdelay ~ month + distance + name + type + engine +
                manufacturer + year.y + temp + logwindgust + wind_dir +
                visib10_YN + humid + dewp + precip_YN,
              data = JFK_d_test)

plot(fitted(lm.out2), resid(lm.out2))

qqnorm(resid(lm.out2))

summary(lm.out2)
## 
## Call:
## lm(formula = logdepdelay ~ month + distance + name + type + engine + 
##     manufacturer + year.y + temp + logwindgust + wind_dir + visib10_YN + 
##     humid + dewp + precip_YN, data = JFK_d_test)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.49248 -0.57190 -0.06083  0.51087  2.97823 
## 
## Coefficients: (5 not defined because of singularities)
##                                             Estimate Std. Error t value
## (Intercept)                                1.862e+01  7.157e+00   2.601
## month02-FEB                               -1.802e-02  5.129e-02  -0.351
## month03-MAR                                1.067e-03  4.989e-02   0.021
## month04-APR                                6.583e-02  5.504e-02   1.196
## month05-MAY                               -1.154e-01  5.873e-02  -1.965
## month06-JUN                                5.243e-02  6.846e-02   0.766
## month07-JUL                                6.127e-02  7.397e-02   0.828
## month08-AUG                               -5.829e-02  7.078e-02  -0.824
## month09-SEP                               -7.575e-02  7.164e-02  -1.057
## month10-OCT                               -1.666e-01  6.725e-02  -2.477
## month11-NOV                               -6.059e-02  5.871e-02  -1.032
## month12-DEC                               -7.629e-02  4.879e-02  -1.564
## distance                                  -4.534e-06  1.809e-05  -0.251
## nameDelta Air Lines Inc.                  -1.700e-01  6.717e-02  -2.532
## nameEndeavor Air Inc.                      5.453e-01  2.719e-01   2.006
## nameEnvoy Air                              1.426e+00  7.187e-01   1.985
## nameExpressJet Airlines Inc.               4.564e-01  2.438e-01   1.872
## nameHawaiian Airlines Inc.                 2.809e-01  2.830e-01   0.993
## nameJetBlue Airways                       -1.502e-01  9.670e-02  -1.554
## nameUnited Air Lines Inc.                  4.664e-02  8.621e-02   0.541
## nameUS Airways Inc.                       -1.273e-01  1.182e-01  -1.076
## nameVirgin America                         4.699e-02  1.147e-01   0.410
## typeFixed wing single engine              -2.672e-01  9.488e-01  -0.282
## typeRotorcraft                             1.448e+00  1.733e+00   0.836
## engineReciprocating                       -2.527e-01  7.476e-01  -0.338
## engineTurbo-fan                            1.061e-01  1.671e+00   0.063
## engineTurbo-jet                            5.951e-02  1.672e+00   0.036
## engineTurbo-prop                           6.658e-01  1.734e+00   0.384
## engineTurbo-shaft                                 NA         NA      NA
## manufacturerAIRBUS                         2.834e-01  2.249e-01   1.260
## manufacturerAIRBUS INDUSTRIE               2.178e-01  2.194e-01   0.993
## manufacturerAVIAT AIRCRAFT INC             1.835e+00  1.018e+00   1.802
## manufacturerBEECH                                 NA         NA      NA
## manufacturerBELL                          -8.363e-01  6.221e-01  -1.344
## manufacturerBOEING                         1.828e-01  2.123e-01   0.861
## manufacturerBOMBARDIER INC                -1.733e-01  1.393e-01  -1.244
## manufacturerCANADAIR                              NA         NA      NA
## manufacturerCANADAIR LTD                  -1.163e+00  8.565e-01  -1.358
## manufacturerCESSNA                         2.621e-01  9.500e-01   0.276
## manufacturerCIRRUS DESIGN CORP             1.102e+00  7.578e-01   1.454
## manufacturerDOUGLAS                       -7.192e-01  1.297e+00  -0.554
## manufacturerEMBRAER                        4.565e-01  2.286e-01   1.998
## manufacturerGULFSTREAM AEROSPACE          -2.839e-01  6.571e-01  -0.432
## manufacturerMCDONNELL DOUGLAS              7.166e-02  7.468e-01   0.096
## manufacturerMCDONNELL DOUGLAS AIRCRAFT CO  3.351e-01  2.245e-01   1.493
## manufacturerMCDONNELL DOUGLAS CORPORATION         NA         NA      NA
## manufacturerPIPER                          4.276e-01  8.292e-01   0.516
## manufacturerROBINSON HELICOPTER CO        -1.018e+00  5.443e-01  -1.870
## manufacturerSIKORSKY                      -1.448e+00  6.573e-01  -2.203
## manufacturerSTEWART MACO                          NA         NA      NA
## year.y                                    -8.078e-03  3.502e-03  -2.306
## temp                                       1.295e-02  6.073e-03   2.132
## logwindgust                                5.463e-02  1.708e-02   3.198
## wind_dir                                   1.650e-04  1.206e-04   1.368
## visib10_YNY                               -1.182e-02  3.095e-02  -0.382
## humid                                      1.093e-02  3.060e-03   3.572
## dewp                                      -1.100e-02  6.515e-03  -1.688
## precip_YNY                                 1.010e-01  3.602e-02   2.804
##                                           Pr(>|t|)    
## (Intercept)                               0.009316 ** 
## month02-FEB                               0.725285    
## month03-MAR                               0.982936    
## month04-APR                               0.231723    
## month05-MAY                               0.049437 *  
## month06-JUN                               0.443808    
## month07-JUL                               0.407522    
## month08-AUG                               0.410171    
## month09-SEP                               0.290353    
## month10-OCT                               0.013270 *  
## month11-NOV                               0.302084    
## month12-DEC                               0.117948    
## distance                                  0.802058    
## nameDelta Air Lines Inc.                  0.011383 *  
## nameEndeavor Air Inc.                     0.044919 *  
## nameEnvoy Air                             0.047237 *  
## nameExpressJet Airlines Inc.              0.061226 .  
## nameHawaiian Airlines Inc.                0.320955    
## nameJetBlue Airways                       0.120312    
## nameUnited Air Lines Inc.                 0.588479    
## nameUS Airways Inc.                       0.281867    
## nameVirgin America                        0.682145    
## typeFixed wing single engine              0.778265    
## typeRotorcraft                            0.403355    
## engineReciprocating                       0.735383    
## engineTurbo-fan                           0.949386    
## engineTurbo-jet                           0.971602    
## engineTurbo-prop                          0.700976    
## engineTurbo-shaft                               NA    
## manufacturerAIRBUS                        0.207553    
## manufacturerAIRBUS INDUSTRIE              0.320902    
## manufacturerAVIAT AIRCRAFT INC            0.071519 .  
## manufacturerBEECH                               NA    
## manufacturerBELL                          0.178853    
## manufacturerBOEING                        0.389275    
## manufacturerBOMBARDIER INC                0.213661    
## manufacturerCANADAIR                            NA    
## manufacturerCANADAIR LTD                  0.174612    
## manufacturerCESSNA                        0.782654    
## manufacturerCIRRUS DESIGN CORP            0.146034    
## manufacturerDOUGLAS                       0.579360    
## manufacturerEMBRAER                       0.045812 *  
## manufacturerGULFSTREAM AEROSPACE          0.665676    
## manufacturerMCDONNELL DOUGLAS             0.923563    
## manufacturerMCDONNELL DOUGLAS AIRCRAFT CO 0.135565    
## manufacturerMCDONNELL DOUGLAS CORPORATION       NA    
## manufacturerPIPER                         0.606082    
## manufacturerROBINSON HELICOPTER CO        0.061508 .  
## manufacturerSIKORSKY                      0.027617 *  
## manufacturerSTEWART MACO                        NA    
## year.y                                    0.021123 *  
## temp                                      0.033046 *  
## logwindgust                               0.001391 ** 
## wind_dir                                  0.171322    
## visib10_YNY                               0.702449    
## humid                                     0.000356 ***
## dewp                                      0.091414 .  
## precip_YNY                                0.005070 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7163 on 5857 degrees of freedom
## Multiple R-squared:  0.07988,    Adjusted R-squared:  0.07171 
## F-statistic: 9.779 on 52 and 5857 DF,  p-value: < 2.2e-16
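
Refitting the model on the test set gives a second in-sample fit. An alternative way to use held-out data, sketched below on toy data, is to keep the training-set model fixed and score its predictions on the test set. This is a suggestion under stated assumptions rather than the approach taken in this report; the data, seed, and object names are made up.

```r
set.seed(42)  # illustrative seed

# Toy data loosely shaped like the report's variables (values are made up).
n <- 300
toy <- data.frame(
  humid       = runif(n, min = 15, max = 100),
  logwindgust = runif(n, min = 0, max = 4)
)
toy$logdepdelay <- 3.5 + 0.008 * toy$humid + 0.04 * toy$logwindgust +
  rnorm(n, sd = 0.7)

# 70/30 split.
idx       <- sample(n, size = round(0.7 * n))
train_set <- toy[idx, ]
test_set  <- toy[-idx, ]

# Fit on the training set only.
fit <- lm(logdepdelay ~ humid + logwindgust, data = train_set)

# Predict the held-out rows and measure the error.
pred <- predict(fit, newdata = test_set)
rmse <- sqrt(mean((test_set$logdepdelay - pred)^2))

# Out-of-sample R-squared, relative to predicting the training mean.
sse    <- sum((test_set$logdepdelay - pred)^2)
sst    <- sum((test_set$logdepdelay - mean(train_set$logdepdelay))^2)
r2_oos <- 1 - sse / sst

c(rmse = rmse, r2_oos = r2_oos)
```

With factor predictors such as manufacturer, `predict()` stops with an error if the test set contains a level that was absent from the training set, so rare levels may need to be pooled before splitting.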

Part 5 - Conclusion

The model produced a low R-squared of 0.067 on the training set (adjusted 0.063). Refit on the test set, the R-squared was 0.080 (adjusted 0.072).

A low R-squared indicates a weak linear fit, so we need to consider changing the independent variables. A low R-squared can have several causes: for example, the linearity assumption may not be correct, the underlying normality assumption of regression may not be appropriate, or important predictor variables may be missing.

DISCUSSION:

The diagnostic plots raise serious concerns about the model fit. The plot of residuals versus fitted values does not indicate that a straight-line fit is reasonable. There are some outliers which should be investigated and possibly modeled separately. The normal Q-Q plot has concerning points in the tails.

Furthermore, are we missing important predictors? A literature review did shed some light on the source of airline delays.

The University of Maryland, Institute for Systems Research published a manuscript “Total Delay Impact Study: A Comprehensive Assessment of the Costs and Impacts of Flight Delay in the United States.”

According to this study, one third of late arrivals are due to the inability of the aviation system to handle the air traffic. There were no variables with respect to air traffic congestion. Another third resulted from airline internal problems. This could mean personnel issues, passenger boarding difficulties, mechanical problems. While the variables of aircraft type, manufacturer, age, carrier were thought to serve as a proxy to these problems, the airline problems were not adequately captured by these variables. The remainder of the delays were caused by late arriving flights. There were no variables in the dataset to indicate that the previous flight arrived late, thus causing the delay. The manuscript mentions weather can cause a delay if safety issues arise due to severe weather with the notion that certain delays are unavoidable.

In fact, the United States Department of Transportation, Bureau of Transportation Statistics, states that airlines report the causes of delay in broad categories that were created by the Air Carrier On-Time Reporting Advisory Committee. The categories are Air Carrier, National Aviation System, Weather, Late-Arriving Aircraft and Security.

Furthermore, the Bureau of Transportation Statistics website included delays at JFK for 2013:

From Bureau of Transportation Statistics website

In conclusion, the dataset did not contain predictors that adequately explain flight delays, particularly delays caused by air traffic congestion, late-arriving aircraft, and boarding or other airline problems. This is a limitation of the study.

The dataset could be used to produce descriptive statistics highlighting attributes and proportions of flights delayed. Some of these plots can be seen in the initial data exploration.

References

Simon J. Sheather, 2009, “A Modern Approach to Regression with R”, NY, NY, Springer.

Diez, DM, Barr, CD & Çetinkaya-Rundel, M, 2019, “OpenIntro Statistics” (4th ed).

Michael Ball, Cynthia Barnhart, Martin Dresner, Mark Hansen, Kevin Neels, Amedeo Odoni, Everett Peterson, Lance Sherry, Antonio Trani, Bo Zou, “Total Delay Impact Study: A Comprehensive Assessment of the Costs and Impacts of Flight Delay in the United States”, Final Report — October, 2010, Nextor, University Of Maryland, Institute for Systems Research.

United States Department of Transportation, Bureau of Transportation Statistics website: https://www.bts.gov/topics/airlines-and-airports/understanding-reporting-causes-flight-delays-and-cancellations#q6