The nycflights13 R package contains information about all flights that departed from NYC (i.e., EWR, JFK and LGA) to destinations in the United States, Puerto Rico, and the American Virgin Islands in 2013: 336,776 flights in total.
To help understand what causes delays, it also includes a number of other useful datasets, including the weather, planes, and airlines tables used in this analysis.
This is an observational study.
Dependent Variable
The response variable is a quantitative variable, “departure delay”, measured in minutes. Negative times represent early departures.
This variable will be log-transformed to obtain a more symmetrical distribution.
Independent variables to be considered:
Research question
What are the most significant variables driving JFK airline departure delays?
Data Preparation
A substantial amount of data cleaning and wrangling preceded the analysis.
The four datasets were merged so that fields from the weather, planes, and airlines tables supplement the flights dataset.
The airlines and planes tables were merged into flights first. A key variable, origin.time, was then created by concatenation in both the weather table and the previously merged dataset and used for the final merge.
Observations with missing values in the departure delay variable were dropped.
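The merge steps described above might be sketched as follows. This is a minimal illustration on toy data frames, not the report's actual code: the toy tables, the use of base merge(), and the choice of origin and time_hour as the two fields concatenated into origin.time are all assumptions.

```r
# Toy stand-ins for the flights, airlines, planes, and weather tables (hypothetical data).
flights_toy <- data.frame(
  year      = 2013,
  origin    = c("JFK", "JFK", "EWR"),
  time_hour = c("2013-01-01 05", "2013-01-01 06", "2013-01-01 05"),
  carrier   = c("B6", "DL", "UA"),
  tailnum   = c("N1", "N2", "N3"),
  dep_delay = c(20, NA, -3)
)
airlines_toy <- data.frame(
  carrier = c("B6", "DL", "UA"),
  name    = c("JetBlue Airways", "Delta Air Lines Inc.", "United Air Lines Inc.")
)
planes_toy <- data.frame(
  tailnum      = c("N1", "N2"),
  year         = c(2005, 1999),
  manufacturer = c("AIRBUS", "BOEING")
)
weather_toy <- data.frame(
  origin    = c("JFK", "JFK", "EWR"),
  time_hour = c("2013-01-01 05", "2013-01-01 06", "2013-01-01 05"),
  temp      = c(39.0, 39.9, 37.9)
)

# Step 1: left-join airlines and planes onto flights.
# Columns present in both tables (here: year) get .x / .y suffixes.
merged <- merge(flights_toy, airlines_toy, by = "carrier", all.x = TRUE)
merged <- merge(merged, planes_toy, by = "tailnum", all.x = TRUE)

# Step 2: build the concatenated key in both tables (assumed: origin + time_hour).
merged$origin.time      <- paste(merged$origin, merged$time_hour)
weather_toy$origin.time <- paste(weather_toy$origin, weather_toy$time_hour)

# Step 3: final merge on the key, then drop rows with a missing departure delay.
merged <- merge(merged, weather_toy, by = "origin.time", all.x = TRUE)
merged <- merged[!is.na(merged$dep_delay), ]
```

Merging on a single concatenated key leaves duplicated columns such as origin.x and origin.y, which is consistent with the suffixed names (origin.x, month.x, year.y) seen in the summary output later in the report.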
The research question: What is driving delays at JFK in 2013?
There are over 330,000 observations with variables relating to flights, weather, airplanes and carriers.
A multiple linear regression model will aid in identifying significant drivers of delays.
Let’s take a look at the dependent variable departure delays:
hist(flights6$dep_delay)
DISCUSSION:
This variable has negative values and is right-skewed. Let’s transform it to log(dep_delay + 1).
Also, create a variable depdelay_YN: “N” if the delay is less than 15 minutes, “Y” if the delay is 15 minutes or more.
In addition, relabel month from 1, 2, 3, ... to 01-JAN, 02-FEB, 03-MAR, ...
Create a precip_YN variable.
Create a visib10_YN variable.
Modify the wind_gust variable: if it is NA, use wind_speed.
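The derived variables listed above could be built roughly as follows. This is a sketch on a toy data frame. The cut-offs for precip_YN (any precipitation above zero) and visib10_YN (visibility of 10) are assumptions, since the report does not state them. The log(wind_gust + 1) form of logwindgust is inferred from the summary output and is likewise an assumption.

```r
# Toy stand-in for the merged data; column names follow the report, values are made up.
flights_toy <- data.frame(
  month      = c(1, 2, 3, 7),
  dep_delay  = c(-5, 14, 15, 120),
  precip     = c(0, 0.02, 0, 0),
  visib      = c(10, 6, 10, 10),
  wind_speed = c(8.1, 12.7, 16.1, 9.2),
  wind_gust  = c(NA, 24.2, NA, 27.6)
)

# Month labels of the form 01-JAN, 02-FEB, ... (sorts correctly as text).
month_labels <- sprintf("%02d-%s", 1:12, toupper(month.abb))
flights_toy$month_lab <- month_labels[flights_toy$month]

# Delay indicator: "Y" for delays of 15 minutes or more.
flights_toy$depdelay_YN <- ifelse(flights_toy$dep_delay >= 15, "Y", "N")

# Weather indicators (thresholds are assumptions, not taken from the report).
flights_toy$precip_YN  <- ifelse(flights_toy$precip > 0, "Y", "N")
flights_toy$visib10_YN <- ifelse(flights_toy$visib >= 10, "Y", "N")

# Fall back to wind speed where the gust reading is missing, then log-transform.
flights_toy$wind_gust <- ifelse(is.na(flights_toy$wind_gust),
                                flights_toy$wind_speed,
                                flights_toy$wind_gust)
flights_toy$logwindgust <- log(flights_toy$wind_gust + 1)
```

Zero-padding the month number keeps the labels in calendar order when R sorts factor levels alphabetically, which is why January becomes the reference level in the regression output.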
Let’s look at departure delays by airport.
* The airport with the most flights is Newark, followed by JFK and LaGuardia.
However, is there an association between airports and delays?
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 328521
##
##
## | flights6$depdelay_YN
## flights6$origin | N | Y | Row Total |
## ----------------|-----------|-----------|-----------|
## EWR | 87821 | 29775 | 117596 |
## | 147.610 | 517.460 | |
## | 0.747 | 0.253 | 0.358 |
## | 0.344 | 0.408 | |
## | 0.267 | 0.091 | |
## ----------------|-----------|-----------|-----------|
## JFK | 86069 | 23347 | 109416 |
## | 10.323 | 36.190 | |
## | 0.787 | 0.213 | 0.333 |
## | 0.337 | 0.320 | |
## | 0.262 | 0.071 | |
## ----------------|-----------|-----------|-----------|
## LGA | 81717 | 19792 | 101509 |
## | 94.887 | 332.636 | |
## | 0.805 | 0.195 | 0.309 |
## | 0.320 | 0.271 | |
## | 0.249 | 0.060 | |
## ----------------|-----------|-----------|-----------|
## Column Total | 255607 | 72914 | 328521 |
## | 0.778 | 0.222 | |
## ----------------|-----------|-----------|-----------|
##
##
##
## Pearson's Chi-squared test
##
## data: flights6$origin and flights6$depdelay_YN
## X-squared = 1139.1, df = 2, p-value < 2.2e-16
The chi-squared test indicates a significant association between delays and airports.
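The test statistic can be reproduced directly from the cell counts printed in the cross-table above. The sketch below enters those counts by hand; the object names are illustrative only.

```r
# Cell counts copied from the cross-table: rows are airports, columns are depdelay_YN.
delay_counts <- matrix(
  c(87821, 29775,
    86069, 23347,
    81717, 19792),
  nrow = 3, byrow = TRUE,
  dimnames = list(origin = c("EWR", "JFK", "LGA"), depdelay_YN = c("N", "Y"))
)

# Pearson chi-squared test of independence (no continuity correction for a 3 x 2 table).
test_result <- chisq.test(delay_counts)

# Share of flights delayed 15 minutes or more at each airport.
prop_delayed <- delay_counts[, "Y"] / rowSums(delay_counts)

print(test_result)
print(round(prop_delayed, 3))
```

With more than 328,000 observations, even small differences in delay rates are statistically significant, so the row proportions (about 25% at EWR, 21% at JFK, and 19–20% at LGA) are the more informative summary of how much the airports differ.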
The three NYC airports do have differing attributes. They are separated geographically and may have slightly different weather patterns. Different airline carriers use different airports and offer service to different destinations. It is best to look at one airport.
At this point, let’s drill down on one airport: JFK.
JetBlue dominates the flights at JFK, with over 40,000 flights, followed by Delta at 20,000 flights. Missing values (NA) appear to be few overall; however, Endeavor seems to have an inordinate number of missing values.
We are interested in flights with delays, so let’s subset the dataset. Let’s consider significant delays of 15 minutes or more.
JFK_d <- filter(JFK, dep_delay > 14)  # dplyr::filter; keeps delays of 15 minutes or more
So the dataset now contains only the significantly delayed JFK flights.
dim(JFK_d)
## [1] 23347 30
DISCUSSION:
The number of 2013 JFK departures that were delayed 15 minutes or more is 23,347.
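As a small check on how the subset interacts with the log transform, the sketch below applies the same rule to a toy vector of delays, using base subsetting in place of the filter() call. The toy values are chosen to include the minimum, median, and maximum delays reported in the summary that follows.

```r
# Toy delays in minutes (hypothetical rows; 15, 43, and 1301 appear in the report's summary).
JFK_toy <- data.frame(dep_delay = c(-4, 0, 14, 15, 43, 1301))

# Keep delays of 15 minutes or more, as in the filter() call above.
JFK_d_toy <- JFK_toy[JFK_toy$dep_delay > 14, , drop = FALSE]

# log(dep_delay + 1) is always defined on this subset because every delay is positive.
JFK_d_toy$logdepdelay <- log(JFK_d_toy$dep_delay + 1)

print(round(JFK_d_toy$logdepdelay, 3))
```

The smallest value, log(16), is about 2.773 and the largest, log(1302), is about 7.172, which agrees with the logdepdelay range in the summary. On the full data, log(dep_delay + 1) is undefined for flights that left more than a minute early, so the transform is only safe after subsetting.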
Here is a summary of the new data set of delays.
## origin.x date month.x dep_delay
## Length:23347 Min. :2013-01-01 Min. : 1.000 Min. : 15.00
## Class :character 1st Qu.:2013-04-11 1st Qu.: 4.000 1st Qu.: 25.00
## Mode :character Median :2013-06-29 Median : 6.000 Median : 43.00
## Mean :2013-06-28 Mean : 6.419 Mean : 63.69
## 3rd Qu.:2013-09-02 3rd Qu.: 9.000 3rd Qu.: 81.00
## Max. :2013-12-31 Max. :12.000 Max. :1301.00
##
## dest distance speed name
## Length:23347 Min. : 94 Min. :105.0 Length:23347
## Class :character 1st Qu.: 425 1st Qu.:108.0 Class :character
## Mode :character Median :1005 Median :117.5 Mode :character
## Mean :1182 Mean :143.5
## 3rd Qu.:1990 3rd Qu.:167.0
## Max. :4983 Max. :232.0
## NA's :23317
## manufacturer seats year.y engine
## Length:23347 Min. : 2.0 Min. :1956 Length:23347
## Class :character 1st Qu.: 55.0 1st Qu.:2001 Class :character
## Mode :character Median :178.0 Median :2005 Mode :character
## Mean :138.1 Mean :2003
## 3rd Qu.:200.0 3rd Qu.:2008
## Max. :450.0 Max. :2013
## NA's :3211 NA's :3422
## engines type temp wind_dir
## Min. :1.000 Length:23347 Min. :12.02 Min. : 0.0
## 1st Qu.:2.000 Class :character 1st Qu.:42.98 1st Qu.:150.0
## Median :2.000 Mode :character Median :60.08 Median :190.0
## Mean :1.993 Mean :58.21 Mean :200.8
## 3rd Qu.:2.000 3rd Qu.:73.94 3rd Qu.:280.0
## Max. :4.000 Max. :98.06 Max. :360.0
## NA's :3211 NA's :132 NA's :260
## wind_speed wind_gust visib pressure
## Min. : 0.000 Min. :16.11 Min. : 0.000 Min. : 985.7
## 1st Qu.: 9.206 1st Qu.:24.17 1st Qu.:10.000 1st Qu.:1011.5
## Median :12.659 Median :27.62 Median :10.000 Median :1016.5
## Mean :12.897 Mean :28.39 Mean : 8.898 Mean :1016.6
## 3rd Qu.:16.111 3rd Qu.:32.22 3rd Qu.:10.000 3rd Qu.:1021.5
## Max. :37.976 Max. :66.75 Max. :10.000 Max. :1041.6
## NA's :139 NA's :18739 NA's :132 NA's :3709
## precip humid dewp logdepdelay
## Min. :0.0000 Min. : 15.21 Min. :-9.04 Min. :2.773
## 1st Qu.:0.0000 1st Qu.: 49.92 1st Qu.:28.94 1st Qu.:3.258
## Median :0.0000 Median : 68.86 Median :48.92 Median :3.784
## Mean :0.0072 Mean : 66.48 Mean :45.78 Mean :3.871
## 3rd Qu.:0.0000 3rd Qu.: 84.46 3rd Qu.:64.04 3rd Qu.:4.407
## Max. :0.6500 Max. :100.00 Max. :78.08 Max. :7.172
## NA's :132 NA's :132 NA's :132
## logwindgust month depdelay_YN depdelay_1_0
## Min. :0.000 Length:23347 Length:23347 Min. :1
## 1st Qu.:2.323 Class :character Class :character 1st Qu.:1
## Median :2.614 Mode :character Mode :character Median :1
## Mean :2.586 Mean :1
## 3rd Qu.:2.966 3rd Qu.:1
## Max. :4.216 Max. :1
## NA's :139
## precip_YN visib10_YN
## Length:23347 Length:23347
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
DISCUSSION:
Let’s look at the possible independent variables.
Boxplots
Data visualizations for categorical variables:
##
## JFK
## 23347
##
## 01-JAN 02-FEB 03-MAR 04-APR 05-MAY 06-JUN 07-JUL 08-AUG 09-SEP 10-OCT 11-NOV
## 1539 1738 1947 1913 2054 2676 3194 2344 1288 1194 1129
## 12-DEC
## 2331
##
## N Y
## 5498 17717
##
## N Y
## 20635 2580
Histograms
Let’s look at quantitative variables.
DISCUSSION:
With respect to missing values and categorical variable creation, the following continuous variables are plotted.
Let’s look at complete cases.
## [1] 19699 15
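The complete-case filter and the training/test split used for the models below might look like the following sketch. The report does not show this step, so the 70/30 proportion, the random sampling, the seed, and the toy data are assumptions. The proportion is inferred from the model outputs, which imply about 13,789 training and 5,910 test rows out of 19,699.

```r
set.seed(2013)  # arbitrary seed, only to make the sketch reproducible

# Toy data with one missing predictor value (hypothetical values).
toy <- data.frame(
  logdepdelay = rnorm(20, mean = 3.9, sd = 0.7),
  temp        = c(NA, runif(19, 12, 98)),
  humid       = runif(20, 15, 100)
)

# Keep only the modelling columns and the rows with no missing values in them.
model_vars <- c("logdepdelay", "temp", "humid")
toy_cc <- toy[complete.cases(toy[, model_vars]), model_vars]

# Random 70/30 split into training and test sets.
n         <- nrow(toy_cc)
train_idx <- sample(seq_len(n), size = round(0.7 * n))
toy_train <- toy_cc[train_idx, ]
toy_test  <- toy_cc[-train_idx, ]

dim(toy_cc)
```

Restricting to complete cases before splitting keeps the two sets comparable, because lm() would otherwise silently drop rows with missing values and the effective sample sizes would differ from the nominal split.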
Linear Regression
What is driving the delays at JFK in 2013?
Model using the training set.
##
## Call:
## lm(formula = logdepdelay ~ month + distance + name + type + engine +
## manufacturer + year.y + temp + logwindgust + wind_dir + visib10_YN +
## humid + dewp + precip_YN, data = JFK_d_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.50297 -0.58401 -0.07475 0.50939 3.04919
##
## Coefficients: (5 not defined because of singularities)
## Estimate Std. Error t value
## (Intercept) 5.590e+00 4.748e+00 1.177
## month02-FEB 4.987e-02 3.307e-02 1.508
## month03-MAR 3.354e-02 3.258e-02 1.029
## month04-APR 8.293e-02 3.609e-02 2.298
## month05-MAY -2.788e-02 3.859e-02 -0.722
## month06-JUN 1.161e-01 4.436e-02 2.618
## month07-JUL 1.492e-01 4.877e-02 3.060
## month08-AUG -6.741e-02 4.684e-02 -1.439
## month09-SEP -3.435e-03 4.680e-02 -0.073
## month10-OCT -1.226e-01 4.354e-02 -2.815
## month11-NOV -7.700e-02 3.800e-02 -2.026
## month12-DEC -3.608e-02 3.140e-02 -1.149
## distance 2.542e-06 1.199e-05 0.212
## nameDelta Air Lines Inc. -3.050e-01 4.628e-02 -6.590
## nameEndeavor Air Inc. 6.643e-02 1.756e-01 0.378
## nameEnvoy Air 2.214e-01 4.352e-01 0.509
## nameExpressJet Airlines Inc. 1.261e-01 1.541e-01 0.818
## nameHawaiian Airlines Inc. -1.830e-02 2.098e-01 -0.087
## nameJetBlue Airways -4.009e-01 6.393e-02 -6.271
## nameUnited Air Lines Inc. -1.623e-01 5.639e-02 -2.878
## nameUS Airways Inc. -3.459e-01 7.808e-02 -4.430
## nameVirgin America -3.198e-01 7.631e-02 -4.191
## typeFixed wing single engine 5.288e-01 6.089e-01 0.868
## typeRotorcraft 1.701e+00 1.101e+00 1.545
## engineReciprocating -1.415e-01 4.315e-01 -0.328
## engineTurbo-fan 2.639e-02 1.029e+00 0.026
## engineTurbo-jet 5.743e-02 1.029e+00 0.056
## engineTurbo-prop 3.640e-01 1.063e+00 0.342
## engineTurbo-shaft NA NA NA
## manufacturerAIRBUS 3.343e-01 1.406e-01 2.377
## manufacturerAIRBUS INDUSTRIE 2.847e-01 1.370e-01 2.077
## manufacturerAVIAT AIRCRAFT INC 3.853e-01 6.592e-01 0.585
## manufacturerBEECH NA NA NA
## manufacturerBELL -1.261e+00 6.574e-01 -1.919
## manufacturerBOEING 1.182e-01 1.327e-01 0.890
## manufacturerBOMBARDIER INC 1.138e-01 9.550e-02 1.191
## manufacturerCANADAIR NA NA NA
## manufacturerCANADAIR LTD -5.320e-01 4.950e-01 -1.075
## manufacturerCESSNA -2.345e-01 6.102e-01 -0.384
## manufacturerCIRRUS DESIGN CORP -1.693e-01 5.281e-01 -0.321
## manufacturerDOUGLAS -5.130e-01 8.992e-01 -0.571
## manufacturerEMBRAER 4.668e-01 1.432e-01 3.260
## manufacturerGULFSTREAM AEROSPACE 7.491e-01 3.993e-01 1.876
## manufacturerMCDONNELL DOUGLAS 8.636e-01 7.312e-01 1.181
## manufacturerMCDONNELL DOUGLAS AIRCRAFT CO 3.229e-01 1.412e-01 2.287
## manufacturerMCDONNELL DOUGLAS CORPORATION NA NA NA
## manufacturerPIPER 4.105e-01 7.198e-01 0.570
## manufacturerROBINSON HELICOPTER CO -1.382e+00 4.348e-01 -3.179
## manufacturerSIKORSKY -2.246e+00 6.583e-01 -3.412
## manufacturerSTEWART MACO NA NA NA
## year.y -1.294e-03 2.322e-03 -0.557
## temp 7.598e-03 3.928e-03 1.934
## logwindgust 3.975e-02 1.101e-02 3.612
## wind_dir 1.099e-04 7.863e-05 1.398
## visib10_YNY -1.208e-02 2.033e-02 -0.594
## humid 8.064e-03 1.995e-03 4.042
## dewp -6.681e-03 4.231e-03 -1.579
## precip_YNY 7.137e-02 2.367e-02 3.015
## Pr(>|t|)
## (Intercept) 0.239127
## month02-FEB 0.131618
## month03-MAR 0.303309
## month04-APR 0.021579 *
## month05-MAY 0.470000
## month06-JUN 0.008856 **
## month07-JUL 0.002219 **
## month08-AUG 0.150130
## month09-SEP 0.941496
## month10-OCT 0.004891 **
## month11-NOV 0.042764 *
## month12-DEC 0.250528
## distance 0.832141
## nameDelta Air Lines Inc. 4.56e-11 ***
## nameEndeavor Air Inc. 0.705262
## nameEnvoy Air 0.610885
## nameExpressJet Airlines Inc. 0.413106
## nameHawaiian Airlines Inc. 0.930494
## nameJetBlue Airways 3.69e-10 ***
## nameUnited Air Lines Inc. 0.004007 **
## nameUS Airways Inc. 9.50e-06 ***
## nameVirgin America 2.79e-05 ***
## typeFixed wing single engine 0.385186
## typeRotorcraft 0.122421
## engineReciprocating 0.742975
## engineTurbo-fan 0.979539
## engineTurbo-jet 0.955503
## engineTurbo-prop 0.732035
## engineTurbo-shaft NA
## manufacturerAIRBUS 0.017475 *
## manufacturerAIRBUS INDUSTRIE 0.037778 *
## manufacturerAVIAT AIRCRAFT INC 0.558875
## manufacturerBEECH NA
## manufacturerBELL 0.055031 .
## manufacturerBOEING 0.373260
## manufacturerBOMBARDIER INC 0.233649
## manufacturerCANADAIR NA
## manufacturerCANADAIR LTD 0.282503
## manufacturerCESSNA 0.700764
## manufacturerCIRRUS DESIGN CORP 0.748584
## manufacturerDOUGLAS 0.568340
## manufacturerEMBRAER 0.001117 **
## manufacturerGULFSTREAM AEROSPACE 0.060682 .
## manufacturerMCDONNELL DOUGLAS 0.237581
## manufacturerMCDONNELL DOUGLAS AIRCRAFT CO 0.022189 *
## manufacturerMCDONNELL DOUGLAS CORPORATION NA
## manufacturerPIPER 0.568467
## manufacturerROBINSON HELICOPTER CO 0.001483 **
## manufacturerSIKORSKY 0.000648 ***
## manufacturerSTEWART MACO NA
## year.y 0.577492
## temp 0.053096 .
## logwindgust 0.000305 ***
## wind_dir 0.162236
## visib10_YNY 0.552274
## humid 5.32e-05 ***
## dewp 0.114306
## precip_YNY 0.002577 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7192 on 13736 degrees of freedom
## Multiple R-squared: 0.0666, Adjusted R-squared: 0.06307
## F-statistic: 18.85 on 52 and 13736 DF, p-value: < 2.2e-16
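Because the response is log(dep_delay + 1), the coefficients of indicator variables are easier to read after exponentiating. The sketch below converts three estimates copied from the training-set output above into approximate percentage changes in (dep_delay + 1) relative to each variable's reference level. The helper vector names are illustrative only.

```r
# Estimates copied from the training-set summary (log scale).
coefs <- c(
  "precip_YNY"               = 0.07137,
  "nameDelta Air Lines Inc." = -0.3050,
  "nameJetBlue Airways"      = -0.4009
)

# exp(beta) is the multiplicative effect on (dep_delay + 1); convert to a percent change.
pct_change <- (exp(coefs) - 1) * 100

print(round(pct_change, 1))
```

Read this way, precipitation is associated with delays that are roughly 7% longer among already-delayed flights, while Delta and JetBlue delays are roughly 26% and 33% shorter than those of the reference carrier, holding the other predictors fixed. These are associations in observational data, not causal effects.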
Refit the same model on the test data.
lm.out2 <- lm(logdepdelay ~ month + distance + name + type + engine + manufacturer + year.y + temp + logwindgust + wind_dir + visib10_YN + humid + dewp + precip_YN, data = JFK_d_test)
plot(fitted(lm.out2), resid(lm.out2))
qqnorm(resid(lm.out2))
summary(lm.out2)
##
## Call:
## lm(formula = logdepdelay ~ month + distance + name + type + engine +
## manufacturer + year.y + temp + logwindgust + wind_dir + visib10_YN +
## humid + dewp + precip_YN, data = JFK_d_test)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.49248 -0.57190 -0.06083 0.51087 2.97823
##
## Coefficients: (5 not defined because of singularities)
## Estimate Std. Error t value
## (Intercept) 1.862e+01 7.157e+00 2.601
## month02-FEB -1.802e-02 5.129e-02 -0.351
## month03-MAR 1.067e-03 4.989e-02 0.021
## month04-APR 6.583e-02 5.504e-02 1.196
## month05-MAY -1.154e-01 5.873e-02 -1.965
## month06-JUN 5.243e-02 6.846e-02 0.766
## month07-JUL 6.127e-02 7.397e-02 0.828
## month08-AUG -5.829e-02 7.078e-02 -0.824
## month09-SEP -7.575e-02 7.164e-02 -1.057
## month10-OCT -1.666e-01 6.725e-02 -2.477
## month11-NOV -6.059e-02 5.871e-02 -1.032
## month12-DEC -7.629e-02 4.879e-02 -1.564
## distance -4.534e-06 1.809e-05 -0.251
## nameDelta Air Lines Inc. -1.700e-01 6.717e-02 -2.532
## nameEndeavor Air Inc. 5.453e-01 2.719e-01 2.006
## nameEnvoy Air 1.426e+00 7.187e-01 1.985
## nameExpressJet Airlines Inc. 4.564e-01 2.438e-01 1.872
## nameHawaiian Airlines Inc. 2.809e-01 2.830e-01 0.993
## nameJetBlue Airways -1.502e-01 9.670e-02 -1.554
## nameUnited Air Lines Inc. 4.664e-02 8.621e-02 0.541
## nameUS Airways Inc. -1.273e-01 1.182e-01 -1.076
## nameVirgin America 4.699e-02 1.147e-01 0.410
## typeFixed wing single engine -2.672e-01 9.488e-01 -0.282
## typeRotorcraft 1.448e+00 1.733e+00 0.836
## engineReciprocating -2.527e-01 7.476e-01 -0.338
## engineTurbo-fan 1.061e-01 1.671e+00 0.063
## engineTurbo-jet 5.951e-02 1.672e+00 0.036
## engineTurbo-prop 6.658e-01 1.734e+00 0.384
## engineTurbo-shaft NA NA NA
## manufacturerAIRBUS 2.834e-01 2.249e-01 1.260
## manufacturerAIRBUS INDUSTRIE 2.178e-01 2.194e-01 0.993
## manufacturerAVIAT AIRCRAFT INC 1.835e+00 1.018e+00 1.802
## manufacturerBEECH NA NA NA
## manufacturerBELL -8.363e-01 6.221e-01 -1.344
## manufacturerBOEING 1.828e-01 2.123e-01 0.861
## manufacturerBOMBARDIER INC -1.733e-01 1.393e-01 -1.244
## manufacturerCANADAIR NA NA NA
## manufacturerCANADAIR LTD -1.163e+00 8.565e-01 -1.358
## manufacturerCESSNA 2.621e-01 9.500e-01 0.276
## manufacturerCIRRUS DESIGN CORP 1.102e+00 7.578e-01 1.454
## manufacturerDOUGLAS -7.192e-01 1.297e+00 -0.554
## manufacturerEMBRAER 4.565e-01 2.286e-01 1.998
## manufacturerGULFSTREAM AEROSPACE -2.839e-01 6.571e-01 -0.432
## manufacturerMCDONNELL DOUGLAS 7.166e-02 7.468e-01 0.096
## manufacturerMCDONNELL DOUGLAS AIRCRAFT CO 3.351e-01 2.245e-01 1.493
## manufacturerMCDONNELL DOUGLAS CORPORATION NA NA NA
## manufacturerPIPER 4.276e-01 8.292e-01 0.516
## manufacturerROBINSON HELICOPTER CO -1.018e+00 5.443e-01 -1.870
## manufacturerSIKORSKY -1.448e+00 6.573e-01 -2.203
## manufacturerSTEWART MACO NA NA NA
## year.y -8.078e-03 3.502e-03 -2.306
## temp 1.295e-02 6.073e-03 2.132
## logwindgust 5.463e-02 1.708e-02 3.198
## wind_dir 1.650e-04 1.206e-04 1.368
## visib10_YNY -1.182e-02 3.095e-02 -0.382
## humid 1.093e-02 3.060e-03 3.572
## dewp -1.100e-02 6.515e-03 -1.688
## precip_YNY 1.010e-01 3.602e-02 2.804
## Pr(>|t|)
## (Intercept) 0.009316 **
## month02-FEB 0.725285
## month03-MAR 0.982936
## month04-APR 0.231723
## month05-MAY 0.049437 *
## month06-JUN 0.443808
## month07-JUL 0.407522
## month08-AUG 0.410171
## month09-SEP 0.290353
## month10-OCT 0.013270 *
## month11-NOV 0.302084
## month12-DEC 0.117948
## distance 0.802058
## nameDelta Air Lines Inc. 0.011383 *
## nameEndeavor Air Inc. 0.044919 *
## nameEnvoy Air 0.047237 *
## nameExpressJet Airlines Inc. 0.061226 .
## nameHawaiian Airlines Inc. 0.320955
## nameJetBlue Airways 0.120312
## nameUnited Air Lines Inc. 0.588479
## nameUS Airways Inc. 0.281867
## nameVirgin America 0.682145
## typeFixed wing single engine 0.778265
## typeRotorcraft 0.403355
## engineReciprocating 0.735383
## engineTurbo-fan 0.949386
## engineTurbo-jet 0.971602
## engineTurbo-prop 0.700976
## engineTurbo-shaft NA
## manufacturerAIRBUS 0.207553
## manufacturerAIRBUS INDUSTRIE 0.320902
## manufacturerAVIAT AIRCRAFT INC 0.071519 .
## manufacturerBEECH NA
## manufacturerBELL 0.178853
## manufacturerBOEING 0.389275
## manufacturerBOMBARDIER INC 0.213661
## manufacturerCANADAIR NA
## manufacturerCANADAIR LTD 0.174612
## manufacturerCESSNA 0.782654
## manufacturerCIRRUS DESIGN CORP 0.146034
## manufacturerDOUGLAS 0.579360
## manufacturerEMBRAER 0.045812 *
## manufacturerGULFSTREAM AEROSPACE 0.665676
## manufacturerMCDONNELL DOUGLAS 0.923563
## manufacturerMCDONNELL DOUGLAS AIRCRAFT CO 0.135565
## manufacturerMCDONNELL DOUGLAS CORPORATION NA
## manufacturerPIPER 0.606082
## manufacturerROBINSON HELICOPTER CO 0.061508 .
## manufacturerSIKORSKY 0.027617 *
## manufacturerSTEWART MACO NA
## year.y 0.021123 *
## temp 0.033046 *
## logwindgust 0.001391 **
## wind_dir 0.171322
## visib10_YNY 0.702449
## humid 0.000356 ***
## dewp 0.091414 .
## precip_YNY 0.005070 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7163 on 5857 degrees of freedom
## Multiple R-squared: 0.07988, Adjusted R-squared: 0.07171
## F-statistic: 9.779 on 52 and 5857 DF, p-value: < 2.2e-16
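The test-set output above comes from refitting the same formula on JFK_d_test. A complementary check, sketched below on simulated data, is to keep the model fitted on the training set and score it on the held-out rows with predict(). The simulated variables, coefficients, and seed are assumptions for illustration and are not results from the report.

```r
set.seed(1)  # arbitrary seed

# Simulated data with a weak signal, loosely mimicking the scale of logdepdelay.
n   <- 1000
sim <- data.frame(
  humid     = runif(n, 15, 100),
  precip_YN = sample(c("N", "Y"), n, replace = TRUE, prob = c(0.9, 0.1))
)
sim$logdepdelay <- 3.5 + 0.005 * sim$humid +
  0.1 * (sim$precip_YN == "Y") + rnorm(n, sd = 0.7)

# 70/30 split, fit on the training rows only.
train_idx <- sample(n, size = floor(0.7 * n))
train     <- sim[train_idx, ]
test      <- sim[-train_idx, ]
fit       <- lm(logdepdelay ~ humid + precip_YN, data = train)

# Score the held-out rows with the training-set model.
pred <- predict(fit, newdata = test)

# Out-of-sample R-squared (relative to the training mean) and RMSE.
sse      <- sum((test$logdepdelay - pred)^2)
sst      <- sum((test$logdepdelay - mean(train$logdepdelay))^2)
r2_out   <- 1 - sse / sst
rmse_out <- sqrt(mean((test$logdepdelay - pred)^2))
```

Unlike a refit, this out-of-sample R-squared can be negative when the model predicts worse than the training mean, so it gives a more direct read on whether the training-set fit generalizes.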
The model produced a low adjusted R-squared of 0.06 on the training set, and an adjusted R-squared of 0.07 when the same model was refit on the test set.
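Both adjusted values can be recomputed from the multiple R-squared, the residual degrees of freedom, and the number of estimated slopes shown in the two summaries. The helper function name below is hypothetical.

```r
# Adjusted R-squared from R-squared, sample size n, and number of slope coefficients p.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# n = residual df + p slopes + 1 intercept, taken from the summary outputs.
train_adj <- adj_r2(r2 = 0.0666,  n = 13736 + 52 + 1, p = 52)
test_adj  <- adj_r2(r2 = 0.07988, n = 5857 + 52 + 1,  p = 52)

print(round(c(train = train_adj, test = test_adj), 4))
```

The adjustment is small here because the sample sizes are large relative to the 52 estimated slopes. Either way, the model explains only about 6 to 8 percent of the variance in log departure delay.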
Low R-squared values indicate a weak linear fit for the model, so we need to consider changing the independent variables. A low R-squared could have several causes: for example, the linearity assumption may not hold, the normality assumption underlying the regression may not be appropriate, or important predictor variables may be missing.
DISCUSSION:
The diagnostic plots raise some serious concerns about the model fit. The plot of residuals versus fitted values does not indicate that a straight-line fit is reasonable. There are some outliers that should be investigated and possibly modeled separately. The normal QQ plot has concerning points in the tails.
Furthermore, are we missing important predictors? A literature review did shed some light on the source of airline delays.
The University of Maryland, Institute for Systems Research published a manuscript “Total Delay Impact Study: A Comprehensive Assessment of the Costs and Impacts of Flight Delay in the United States.”
According to this study, one third of late arrivals are due to the inability of the aviation system to handle the air traffic. The dataset contained no variables on air traffic congestion. Another third resulted from airline internal problems, which could mean personnel issues, passenger boarding difficulties, or mechanical problems. While aircraft type, manufacturer, age, and carrier were thought to serve as proxies for these problems, they did not adequately capture them. The remainder of the delays were caused by late-arriving flights. The dataset contained no variable indicating that the previous flight arrived late and thus caused the delay. The manuscript also notes that severe weather can cause delays when safety issues arise, and that certain delays are therefore unavoidable.
In fact, the United States Department of Transportation, Bureau of Transportation Statistics, states that airlines report the causes of delay in broad categories that were created by the Air Carrier On-Time Reporting Advisory Committee. The categories are Air Carrier, National Aviation System, Weather, Late-Arriving Aircraft and Security.
Furthermore, the Bureau of Transportation Statistics website included delays at JFK for 2013:
From Bureau of Transportation Statistics website
In conclusion, the dataset did not contain predictors that adequately explain flight delays, particularly delays caused by air traffic congestion, late-arriving aircraft, and boarding or other airline problems. This is a limitation of the study.
The dataset could still be used to produce descriptive statistics highlighting the attributes and proportions of delayed flights. Some of these can be seen in the initial data exploration.
Sheather, S. J. (2009), “A Modern Approach to Regression with R”, New York, NY: Springer.
Diez, D. M., Barr, C. D. & Çetinkaya-Rundel, M. (2019), “OpenIntro Statistics” (4th ed.).
Ball, M., Barnhart, C., Dresner, M., Hansen, M., Neels, K., Odoni, A., Peterson, E., Sherry, L., Trani, A. & Zou, B. (2010), “Total Delay Impact Study: A Comprehensive Assessment of the Costs and Impacts of Flight Delay in the United States”, Final Report, October 2010, NEXTOR, University of Maryland, Institute for Systems Research.
United States Department of Transportation, Bureau of Transportation Statistics website: https://www.bts.gov/topics/airlines-and-airports/understanding-reporting-causes-flight-delays-and-cancellations#q6