arrivalTimeMod <- lm(arr_delay~dep_delay+air_time, flights)

What I’d like to test is: After accounting for the number of minutes a departure is delayed, is air time still linearly related to arrival delay?

Ho: coefficient of air_time = 0
Ha: coefficient of air_time =/= 0

summary(arrivalTimeMod)
## 
## Call:
## lm(formula = arr_delay ~ dep_delay + air_time, data = flights)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -107.679  -10.979   -1.759    8.810  203.240 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.8318053  0.0606415  -79.68   <2e-16 ***
## dep_delay    1.0187233  0.0007861 1295.92   <2e-16 ***
## air_time    -0.0070547  0.0003362  -20.98   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.02 on 327343 degrees of freedom
##   (9430 observations deleted due to missingness)
## Multiple R-squared:  0.8371, Adjusted R-squared:  0.8371 
## F-statistic: 8.41e+05 on 2 and 327343 DF,  p-value: < 2.2e-16

The test statistic for air_time in this model is -20.98, and, when compared to a t-distribution with 327343 degrees of freedom, the p-value is essentially 0 (less than 2e-16). We reject Ho: there is still a linear relationship between arrival delay and air time when departure delay is accounted for.
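As a quick check on the numbers above, the t value and p-value can be reproduced from the printed estimate and standard error alone. This is a minimal sketch that uses the rounded values from the summary output; the variable names (est, se, df, t_stat, p_value) are illustrative and not part of the original analysis.

```r
# Values copied (rounded) from the air_time row of summary(arrivalTimeMod)
est <- -0.0070547   # estimated coefficient of air_time
se  <- 0.0003362    # its standard error
df  <- 327343       # residual degrees of freedom

# t statistic: how many standard errors the estimate lies from 0
t_stat <- est / se

# Two-sided p-value from a t-distribution with df degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df)

round(t_stat, 2)   # should agree with the "t value" column
p_value < 2e-16    # should agree with the "<2e-16" shown in the table
```

Because the inputs are rounded, the result should match the summary table only to the digits printed there.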

Now let’s check the confidence interval for the coefficient of air_time:

confint(arrivalTimeMod)
##                    2.5 %       97.5 %
## (Intercept) -4.950660982 -4.712949620
## dep_delay    1.017182581  1.020264040
## air_time    -0.007713591 -0.006395809

We’re 95% confident that, among flights with the same departure delay, each additional minute of air time is associated with a decrease in average arrival delay of between 0.0064 and 0.0077 minutes.
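The interval that confint() reports for air_time can be rebuilt by hand as estimate ± critical t value × standard error. The sketch below assumes the rounded estimate and standard error from the summary output; the names est, se, df, t_crit and ci are illustrative only.

```r
# Values copied (rounded) from the air_time row of summary(arrivalTimeMod)
est <- -0.0070547
se  <- 0.0003362
df  <- 327343

# Critical value for a 95% two-sided interval
t_crit <- qt(0.975, df)

# Lower and upper limits: estimate -/+ t_crit * se
ci <- est + c(-1, 1) * t_crit * se
ci   # compare with the air_time row of confint(arrivalTimeMod)
```

With this many degrees of freedom the critical value is very close to the normal value of about 1.96, so the interval is roughly the estimate plus or minus two standard errors.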

### CI for the mean value of y and PI for an individual value of y
## predict(modelName, nameOfDataframeWithValuesYouWantToTest, interval = "prediction" or "confidence")
## ex:
checkTheseValues <- data.frame(dep_delay = 4, air_time = 120)
predict(arrivalTimeMod, checkTheseValues, interval = "confidence")
##         fit       lwr       upr
## 1 -1.603476 -1.669833 -1.537119
predict(arrivalTimeMod, checkTheseValues, interval = "prediction")
##         fit       lwr      upr
## 1 -1.603476 -36.91314 33.70619

We’re 95% confident that, on average, flights which depart 4 minutes late and which are in the air for 120 minutes arrive between 1.54 and 1.67 minutes early.

We’re 95% confident that a particular flight which departed 4 minutes late and which was in the air for 120 minutes will arrive between 36.91 minutes early and 33.71 minutes late.
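To see where the two intervals come from, the fitted value and the approximate width of the prediction interval can be reconstructed from the printed output. This is a rough sketch, not the exact computation predict() performs internally: it assumes the rounded coefficients and the rounded residual standard error (18.02) from the summary, and it backs out the standard error of the fitted mean from the printed confidence interval. All variable names are illustrative.

```r
# Coefficients copied (rounded) from summary(arrivalTimeMod)
b0    <- -4.8318053
b_dep <-  1.0187233
b_air <- -0.0070547

# Fitted arrival delay for dep_delay = 4, air_time = 120
fit <- b0 + b_dep * 4 + b_air * 120

df     <- 327343
sigma  <- 18.02              # residual standard error (rounded in the output)
t_crit <- qt(0.975, df)

# Standard error of the fitted mean, recovered from the printed CI limits
se_fit <- (-1.537119 - (-1.669833)) / (2 * t_crit)

# The prediction interval adds the residual variance to the variance of the mean
pi_half <- t_crit * sqrt(sigma^2 + se_fit^2)
pred_int <- c(fit - pi_half, fit + pi_half)

fit        # compare with the "fit" column
pred_int   # compare (approximately) with the prediction interval limits
```

The prediction interval is far wider than the confidence interval because sigma (about 18 minutes of flight-to-flight variation) dominates se_fit (a few hundredths of a minute), which is why an individual flight is much harder to pin down than the average.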