Background

In Assessment Task 2, our linear model yielded little insight into our questions because the assumptions of linear regression were heavily violated. Furthermore, our dataset was large and we did not aggregate it, which made the analysis difficult and patterns hard to identify. Even though we focused on a single state, California, our data was recorded at the level of individual flights, with delays in minutes, making it hard to see the big picture. Therefore, for this assessment, I will aggregate the data to a monthly level to gain relevant insights, and extend our model from linear regression to multilevel models.

The dataset we obtained is nested, and multilevel models are an extension of regression models designed for nested data. Linear regression assumes that each observation is independent. In Assessment Task 2 this assumption was violated because flights on the same day were correlated; aggregating the data by month removes this problem. However, observations are still not independent, because departure delays are recorded repeatedly at the same airports over a period of about two years (2019 to 2021). Each airport has its own infrastructure and structure, an idiosyncratic factor that affects all observations from that airport. Extending linear regression to a multilevel model (by adding a random effect for airport) accounts for this dependence, which should make the analysis more accurate and provide a better model for future use.

Original research questions: What are the main causes of airline delays? (Variables that influence departure delays.) How does American Airlines' on-time performance compare with that of our two main competitors?

New research question: Given the year, month, and origin airport, which airline should I pick to minimize expected departure delay?

For the following analysis, the first original question is dropped because of the structure of the data: since the data has many levels, each level has its own reasons for airline delays (for example, the employees at a particular airport). I decided to shift the focus of the research questions to comparing airlines, as this is a choice consumers can make when buying their tickets. From my personal experience, there is little flexibility in choosing the departure airport, year, and month I fly out, but there are usually flights on the same route operated by different airlines to choose from. Therefore, the main research question and analysis focus on differences in on-time performance between airlines.

Justification

In our last assessment, we found out that there are correlations among different states.

I will first create a multilevel model for California, then generalize it to all the airports across different states.

knitr::include_graphics("Multilevel model diagram.jpg")

Figure 1: Multilevel model diagram

Airport is the random effect in the model; it represents the dependence among observations from the same airport. Within the same state, California (CA), mean departure delays differ across airports (see Figure 2). Some airports, such as KSBA and KOAK, have the lowest delays; in fact, their means indicate that flights depart early on average. We model these individual differences by assuming a different random intercept for each airport.

knitr::include_graphics("Figure 2.jpg")

Figure 2: Departure delays based on airport

Fixed effects variables include:
- Airline
- Year
- Month
- Age of the aircraft

To examine these variables and the assumptions of linear regression, I first fitted a linear regression with all of these fixed effects as a benchmark. The multiple R-squared is 0.4118 and the adjusted R-squared is 0.3983; the overall F-test is significant (p-value < 2.2e-16).

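I will not compare multiple R-squared against models with fewer variables, because multiple R-squared can only stay the same or drop when variables are removed. Adjusted R-squared, which penalizes the number of predictors, is used for these comparisons instead.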
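A minimal base-R sketch of this point, using simulated data (the variables `x`, `noise`, and `y` are hypothetical, not the flight dataset). Adding a predictor that is pure noise cannot lower multiple R-squared. Adjusted R-squared, \(1 - (1 - R^2)\frac{n-1}{n-p-1}\) with \(p\) predictors, applies a penalty for the extra parameter.

```r
# Simulated example (hypothetical data, not the flight dataset)
set.seed(1)
n <- 200
sim <- data.frame(x = rnorm(n), noise = rnorm(n))
sim$y <- 2 + 1.5 * sim$x + rnorm(n)

fit_small <- lm(y ~ x, data = sim)
fit_big   <- lm(y ~ x + noise, data = sim)  # adds a predictor unrelated to y

r2_small <- summary(fit_small)$r.squared
r2_big   <- summary(fit_big)$r.squared

# Adjusted R-squared by hand; p = number of predictors, excluding the intercept
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

cat("Multiple R-squared:", r2_small, "->", r2_big, "\n")
cat("Adjusted R-squared:", adj_r2(r2_small, n, 1), "->", adj_r2(r2_big, n, 2), "\n")
```

The hand-computed values match `summary(fit)$adj.r.squared`. Multiple R-squared always rises (or stays flat) for the larger model, while adjusted R-squared is free to fall.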

Linear regression model formula:

\(\text{delay} = \text{month} + \text{year} + \text{age} + \text{airline} + \varepsilon\)

To test whether aircraft age is a useful variable in explaining delays, I compared a linear model without the aircraft age variable against the benchmark model. The model without aircraft age has a lower adjusted R-squared (0.3845), and an ANOVA comparison shows that the two models differ significantly.
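The ANOVA comparison here is a partial F-test between nested linear models. The smaller model must be a special case of the larger one for the test to be valid. A minimal sketch with simulated data shows the call pattern (the columns `delay`, `airline`, and `age` are hypothetical stand-ins, not the real data):

```r
set.seed(42)
n <- 300
sim <- data.frame(
  airline = factor(sample(c("American", "Delta", "United"), n, replace = TRUE)),
  age     = runif(n, 1, 25)  # hypothetical aircraft age in years
)
# Simulated delays: an airline effect, a modest age effect, and noise
sim$delay <- ifelse(sim$airline == "United", -2, 0) + 0.3 * sim$age + rnorm(n, sd = 3)

fit_without_age <- lm(delay ~ airline, data = sim)
fit_with_age    <- lm(delay ~ airline + age, data = sim)

# Partial F-test: does adding `age` reduce the residual sum of squares significantly?
cmp <- anova(fit_without_age, fit_with_age)
print(cmp)
p_age <- cmp[["Pr(>F)"]][2]
```

The second row of the table holds the test for the added term: one extra degree of freedom for `age`.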

However, if we include origin_airport_code in the linear regression, the model formula becomes:

\(\text{delay} = \text{month} + \text{year} + \text{age} + \text{airline} + \text{origin airport} + \varepsilon\)

The adjusted R-squared rises to 0.4956, indicating that origin airport is an important factor in explaining the variation in departure delays. When I repeated the aircraft-age test with this model, the adjusted R-squared without aircraft age was 0.4951. Aircraft age therefore contributes very little once origin airport is included, and its coefficient is not significant (p = 0.19). I suspected a strong association between airport and aircraft age and checked this with a chi-square test of independence. The p-value is 0.3973 (> 0.05), so the test gives no evidence of such an association. The decision to drop aircraft age therefore rests on its negligible contribution to the adjusted R-squared, not on overlap with origin_airport_code.
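`ave_age` is continuous, so `chisq.test()` treats every distinct value as its own category. This is why the appendix output reports 8376 degrees of freedom and warns that the approximation may be incorrect. A one-way ANOVA of age on airport is a more conventional check for association between a continuous and a categorical variable. Note that this is a different test from the one used above. The minimal sketch below uses simulated data; the airport subset and age values are hypothetical.

```r
set.seed(7)
airports <- c("KBUR", "KLAX", "KOAK", "KSFO")  # hypothetical subset of airport codes
sim <- data.frame(airport = factor(sample(airports, 400, replace = TRUE)))
# Simulated average aircraft age, slightly older at one airport
sim$age <- 10 + ifelse(sim$airport == "KOAK", 3, 0) + rnorm(400, sd = 2)

# Treated as categories, a continuous variable has (almost) one level per observation
n_age_levels <- length(unique(sim$age))
print(n_age_levels)

# One-way ANOVA: does mean aircraft age differ across airports?
fit_aov <- aov(age ~ airport, data = sim)
print(summary(fit_aov))
p_assoc <- summary(fit_aov)[[1]][["Pr(>F)"]][1]
```

A small `p_assoc` would indicate that mean aircraft age differs by airport, which is the association the chi-square test was meant to detect.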

knitr::include_graphics("old diagnostic plot.jpg")

Figure 3: Old diagnostic plot (From AT2B)

knitr::include_graphics("new diagnostic plot .jpg")

Figure 4: New diagnostic plot

The linear model on the new aggregated data satisfies most assumptions of linear regression except normality of the residuals. The normal Q-Q plot in Figure 4 shows heavy tails, but the residuals vs leverage plot shows no influential observations. Compared with the old diagnostic plots (Figure 3), the new diagnostic plots (Figure 4) show a large improvement.
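The residuals vs leverage panel is a visual check of Cook's distance. The same check can be done numerically. The minimal sketch below uses simulated heavy-tailed data (hypothetical, not the flight data). The cut-offs of 1 and 4/n are common rules of thumb, not fixed standards.

```r
set.seed(3)
n <- 300
sim <- data.frame(x = runif(n))
sim$y <- 1 + 2 * sim$x + rt(n, df = 3)  # t(3) errors give heavy tails, as in the Q-Q plot

fit <- lm(y ~ x, data = sim)
cd  <- cooks.distance(fit)

# Values above 1 are usually treated as influential; values above 4/n deserve a closer look
n_over_1      <- sum(cd > 1)
n_over_4_on_n <- sum(cd > 4 / n)
cat("Max Cook's distance:", round(max(cd), 3), "\n")
cat("Points above 1:", n_over_1, " above 4/n:", n_over_4_on_n, "\n")
```

On a fitted `lm` object, `plot(fit, which = 5)` draws the residuals vs leverage panel with Cook's distance contours.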

Results

Building on the linear regression results above, the random intercept multilevel model is:

\(\text{delay} = \text{month} + \text{year} + \text{airline} + (1 | \text{origin airport}) + \varepsilon\)

knitr::include_graphics("lmer1.jpg")

The airport random effect, i.e. idiosyncratic differences between airports, accounts for around 3.169 of the variability, while the residual term, i.e. variation the model cannot explain, is 9.583.
Several variables are statistically significant, notably year and United Airlines. Departure delays are largest in 2019 and much smaller in 2020 and 2021. According to Figure 5, the mean delay in 2021 is higher than in 2020. However, the coefficient for 2021 is smaller than the coefficient for 2020. There might be interactions among the fixed effects.
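Written out, the random intercept model gives each airport \(j\) its own intercept shift \(u_j\). Assume that 3.169 and 9.583 are the variance components from the `lmer` output (the Variance column). This reading is consistent with the residual standard error of 3.127 in the linear model, which corresponds to a residual variance of about 9.8. Under that assumption, the share of the remaining variation that lies between airports, the intraclass correlation (ICC), works out as:

\[
\text{delay}_i = \mathbf{x}_i^\top \boldsymbol{\beta} + u_{j[i]} + \varepsilon_i, \qquad u_j \sim N(0, \sigma^2_{\text{airport}}), \quad \varepsilon_i \sim N(0, \sigma^2_{\varepsilon})
\]

\[
\text{ICC} = \frac{\sigma^2_{\text{airport}}}{\sigma^2_{\text{airport}} + \sigma^2_{\varepsilon}} = \frac{3.169}{3.169 + 9.583} = \frac{3.169}{12.752} \approx 0.25
\]

If that reading is right, roughly a quarter of the variation left after the fixed effects (month, year, airline) is attributable to differences between airports. This is the dependence the random intercept is meant to absorb.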

knitr::include_graphics("Figure 5.jpg")

Figure 5: Box plot of delay vs year

Year and month should also be treated as random effects, since every airport observation comes from a particular year and month (e.g. March 2020), and observations sharing the same year or month are not independent. Year and month are therefore added as random intercepts.
Updated random intercept multilevel model:

\(\text{delay} = \text{airline} + (1 | \text{month}) + (1 | \text{year}) + (1 | \text{origin airport}) + \varepsilon\)
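In the usual lme4 reading, this formula specifies three crossed random intercepts: one for each month, one for each year, and one for each airport. American Airlines is the reference airline, so \(\beta_{\text{American}} = 0\). Written out for observation \(i\), the model is:

\[
\text{delay}_i = \beta_0 + \beta_{\text{airline}[i]} + u_{\text{month}[i]} + v_{\text{year}[i]} + w_{\text{airport}[i]} + \varepsilon_i
\]

\[
u_m \sim N(0, \sigma^2_{\text{month}}), \quad v_t \sim N(0, \sigma^2_{\text{year}}), \quad w_j \sim N(0, \sigma^2_{\text{airport}}), \quad \varepsilon_i \sim N(0, \sigma^2_{\varepsilon})
\]

All of these terms are mutually independent. Only three years (2019 to 2021) are observed, so \(\sigma^2_{\text{year}}\) is estimated from just three levels and should be read with caution.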

knitr::include_graphics("lmer3.jpg")

After adding month and year as random effects, the variation attributed to origin_airport_code increased slightly. Furthermore, year accounts for the most variation.

knitr::include_graphics("coef lmer3.jpg")

The intercepts vary across the levels of the random effects, but the slopes are fixed. It would be interesting to investigate whether airlines interact with the random-effect grouping variables.
To check this, I used interaction terms in linear regression.

To test airline & origin_airport_code:

Formula 1:

\(\text{delay} = \text{airline} + \text{origin airport} + \varepsilon\)

Formula 2:

\(\text{delay} = \text{airline} * \text{origin airport} + \varepsilon\)

The adjusted R-squared is 0.08489 for Formula 1 and 0.1342 for Formula 2, which contains the interaction effect. An ANOVA test also shows that the two models differ significantly.
Hence, the interaction effect is included in the random slope multilevel model. In addition, the previous model's output shows some correlation (0.426) between the fixed-effect estimates for the airlines. This is a correlation between coefficient estimates, not between observations, so it does not by itself indicate a violation of the independence assumption.
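The additive-versus-interaction comparison can be sketched in base R with simulated data. Here `airline`, `airport`, and `delay` are hypothetical stand-ins, and the delay pattern is invented: Delta is worse only at one airport. In the real data, some airline and airport combinations do not exist. For those combinations, `lm()` reports NA coefficients ("not defined because of singularities"), as in the lm6 output in the appendix. In this simulation every combination is present.

```r
set.seed(11)
n <- 600
sim <- data.frame(
  airline = factor(sample(c("American", "Delta", "United"), n, replace = TRUE)),
  airport = factor(sample(c("KBUR", "KLAX", "KSFO"), n, replace = TRUE))
)
# Simulated delays with an airline-by-airport interaction: Delta is worse only at KLAX
sim$delay <- ifelse(sim$airline == "Delta" & sim$airport == "KLAX", 4, 0) + rnorm(n, sd = 3)

fit_additive    <- lm(delay ~ airline + airport, data = sim)
fit_interaction <- lm(delay ~ airline * airport, data = sim)  # main effects plus all products

adj_add <- summary(fit_additive)$adj.r.squared
adj_int <- summary(fit_interaction)$adj.r.squared

# The additive model is nested in the interaction model, so a partial F-test applies
cmp <- anova(fit_additive, fit_interaction)
p_interaction <- cmp[["Pr(>F)"]][2]
cat("Adjusted R-squared:", round(adj_add, 4), "->", round(adj_int, 4), "\n")
```

With three airlines and three airports, the interaction adds (3 - 1) × (3 - 1) = 4 parameters, which is the Df shown in the second row of `cmp`.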

For airline and the other random-effect variables (month and year), there are no interaction effects.

Final random slope multilevel model:

\(\text{delay} = \text{airline} + (1 | \text{month}) + (1 | \text{year}) + (1 + \text{airline} | \text{origin airport}) + \varepsilon\)
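In lme4 notation, `(1 + airline | origin airport)` gives each airport its own intercept. It also gives each airport its own deviations from the fixed Delta and United effects. These are drawn jointly from a multivariate normal distribution with an unstructured covariance matrix. With American Airlines as the reference level, the model can be sketched as:

\[
\text{delay}_i = \beta_0 + \beta_{\text{airline}[i]} + u_{\text{month}[i]} + v_{\text{year}[i]} + w_{0,\,j[i]} + w_{\text{airline}[i],\,j[i]} + \varepsilon_i
\]

\[
\begin{pmatrix} w_{0,j} \\ w_{\text{Delta},j} \\ w_{\text{United},j} \end{pmatrix} \sim N(\mathbf{0}, \Sigma_{\text{airport}}), \qquad w_{\text{American},j} = 0
\]

\(\Sigma_{\text{airport}}\) has three variances and three correlations, which makes six parameters. This is why this model is more demanding to fit than the random intercept version.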

knitr::include_graphics("lmer4.jpg")

Among all the multilevel models fitted, the final model has the lowest AIC, indicating the best trade-off between goodness of fit and model complexity. The airport-by-airline random slopes account for more variation than the airport random intercept alone did in the previous model. After allowing the slopes to vary, the correlations between the fixed-effect estimates are close to 0.
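AIC trades goodness of fit against the number of estimated parameters, \(\text{AIC} = 2k - 2\log\hat{L}\). A lower value therefore indicates a better balance, not necessarily the smallest residual error. The sketch below checks this definition against R's `AIC()` on simulated linear models (hypothetical data).

One caution applies to the mixed models. `lmer()` fits by REML by default, and REML-based AIC values are only comparable between models with the same fixed effects. When the fixed effects differ, as they do between the first random intercept model and the later ones, the usual approach is to refit with `REML = FALSE`.

```r
set.seed(5)
n <- 250
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 + rnorm(n)

fit1 <- lm(y ~ x1, data = sim)
fit2 <- lm(y ~ x1 + x2, data = sim)  # x2 is pure noise

# AIC from its definition; for lm, k counts the coefficients plus the residual variance
aic_by_hand <- function(fit) {
  ll <- logLik(fit)
  -2 * as.numeric(ll) + 2 * attr(ll, "df")
}

print(c(manual = aic_by_hand(fit1), builtin = AIC(fit1)))
print(c(manual = aic_by_hand(fit2), builtin = AIC(fit2)))
```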

Research question - How does American Airlines' on-time performance compare with that of our two main competitors?
The intercept and the Delta Air Lines coefficient are not statistically significant (their t values are low), so there is no evidence of a difference in on-time performance between American Airlines and Delta Air Lines. Meanwhile, United Airlines has better on-time performance, with a coefficient of -2.21. This result applies only to California.

The coefficient of the random effects are as follows,

knitr::include_graphics("coef lmer4.jpg")

Research question - Which airline should I choose for my flight?

Suppose I want to fly from KBUR in November 2021. I should take United Airlines, because the KBUR-specific slope for United Air Lines Inc. is -5.98, compared with 2.1637 for Delta Air Lines (both relative to American Airlines).
Decision based on the mean: United Airlines > American Airlines > Delta Air Lines.
Year and month enter the model only as random intercepts, so they shift all three airlines by the same amount. Regardless of the year and month, I should therefore always take United Airlines if I am departing from KBUR.
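The cancellation can be written out as follows. Let \(\hat u_m\) and \(\hat v_t\) be the predicted month and year intercepts, \(\hat w_{0,\text{KBUR}}\) the KBUR intercept, and \(\hat w_{\text{United},\text{KBUR}}\) KBUR's deviation from the fixed United effect. The predicted United minus American difference at KBUR is then:

\[
\widehat{\text{delay}}_{\text{United}} - \widehat{\text{delay}}_{\text{American}}
= \left(\hat\beta_0 + \hat\beta_{\text{United}} + \hat u_m + \hat v_t + \hat w_{0,\text{KBUR}} + \hat w_{\text{United},\text{KBUR}}\right) - \left(\hat\beta_0 + \hat u_m + \hat v_t + \hat w_{0,\text{KBUR}}\right)
= \hat\beta_{\text{United}} + \hat w_{\text{United},\text{KBUR}}
\]

The right-hand side is the KBUR slope for United reported by `coef()`, -5.98, and it contains no month or year term. The point predictions for KBUR in November 2021 give the same gap: \(-8.71 - (-2.73) = -5.98\). The Delta comparison works the same way: \(-0.57 - (-2.73) = 2.16\), matching the slope of 2.1637.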

Using the model for prediction, a flight departing from KBUR in November 2021 is expected to depart early whichever airline I take.
American Airlines: early by 2.73 minutes ± 2 × 1.8203
CI: [-6.3706, 0.9106]
Delta Airlines: early by 0.57 minutes ± 2 × 0.4382
CI: [-1.4464, 0.3064]
United Airlines: early by 8.71 minutes ± 2 × 0.7757
CI: [-10.2614, -7.1586]
Each confidence interval (CI) is the estimate ± 2 standard errors, i.e. an approximate 95% interval (roughly the 2.5% and 97.5% points). Negative values mean the flight departs early.
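The interval arithmetic can be reproduced directly from the quoted estimates and standard errors. The sketch below recomputes the ±2 SE bounds and, for comparison, the bounds using the exact normal quantile, `qnorm(0.975)` ≈ 1.96. The bounds confirm that American's lower bound is -6.37 and that United's upper bound is negative (-7.16), so the whole United interval lies on the early side.

```r
# Point predictions (minutes; negative = early) and standard errors quoted above
pred <- data.frame(
  airline  = c("American", "Delta", "United"),
  estimate = c(-2.73, -0.57, -8.71),
  se       = c(1.8203, 0.4382, 0.7757)
)

# Approximate 95% interval: estimate +/- 2 standard errors
pred$lower <- pred$estimate - 2 * pred$se
pred$upper <- pred$estimate + 2 * pred$se

# Same interval using the exact normal quantile instead of 2
pred$lower_z <- pred$estimate - qnorm(0.975) * pred$se
pred$upper_z <- pred$estimate + qnorm(0.975) * pred$se

print(pred)
```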

Even though American Airlines has a lower mean departure delay than Delta Air Lines, their confidence intervals overlap, so customers can choose between American Airlines and Delta Air Lines based on their preference.
For example, suppose I do not care whether the flight departs early and only want to minimize my delay if the flight is late. In that case I would pick Delta Air Lines, because its upper bound (0.3064 minutes) is lower than American's (0.9106 minutes).

For the diagnostic plots:

knitr::include_graphics("lmer4 diagnostic.jpg")

The diagnostic results are similar to those of the linear model. All the assumptions of the multilevel model are satisfied except normality, as there are heavy tails due to outliers. The outliers are not influential points, so they should not have a large impact on the analysis. Thus, the model remains reliable for prediction and analysis.

To extend the model to airports in other states, the multilevel model becomes:

\(\text{delay} = \text{airline} + (1 | \text{month}) + (1 | \text{year}) + (1 | \text{origin state}) + (1 + \text{airline} | \text{origin airport}) + \varepsilon\)

There are interaction effects between airline and origin_state_abr, and between airline and origin_airport_code. However, the model cannot include both of these interaction effects, as doing so would overfit the model. To decide between them, I fitted both models. The one that captures the interaction between airline and origin_state_abr fits better and accounts for more variation in the random effects.

Research question - How does American Airlines' on-time performance compare with that of our two main competitors?

Considering the top 50 airports in the United States across all states, instead of only airports in California, Delta Air Lines performs best, followed by United Airlines, with American Airlines performing worst.

Even though this multilevel model is more generalizable than the California-specific one, it is more computationally demanding and adds an extra level to the model.

Conclusion

The multilevel model created is a useful tool to help customers decide which airline to take based on year, month, and departure airport. It also gives an overview of how the big three airlines perform at each airport. This makes it a stepping stone for the airlines to investigate airports where they perform poorly against competitors. The first original research question, 'What are the main causes of airline delays?', could then be answered for a specific airport.

Reflection

In Assessment Task 2, the statistical models (linear and logistic regression) yielded few insights and results, as we were unaware that the complexity of the data structure rendered our analysis useless. Data engineering, i.e. aggregating the data, was needed to simplify the data structure, and models beyond linear and logistic regression had to be considered to obtain better results. After aggregating the data, the new analysis obtained through multilevel models is generalizable to all airports. It also provides insights to customers making decisions, and to the big three airline companies on their competitiveness in on-time performance.

Reference

Winter, B. (2013, August 26). Linear models and linear mixed effects models in R with linguistic. . . ArXiv.Org. https://arxiv.org/abs/1308.5499

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1). https://doi.org/10.18637/jss.v067.i01

Appendix

Read file

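library(arrow); library(here); library(dplyr); library(ggplot2)  # assumption: read_feather() may instead come from the feather package
raw_data <- read_feather(here('mlm_dataset_5.feather'))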

Aggregate data by airport, airline, year and month - data1

data1 <- raw_data %>%
  filter(origin_state_abr == "CA") %>%                     # keep California departures only
  group_by(origin_airport_code, airline, year, month) %>%  # one group per airport, airline and month
  summarise(avg_delay = mean(dep_delay), ave_age = mean(age), .groups = "drop") %>%  # monthly means, ungrouped
  droplevels()                                             # drop factor levels with no remaining rows
#%>% dplyr::summarise(avg_delay = mean(dep_delay), avg_air = mean(air_time))

Aggregate data for logistic regression - data2

data2 <- data1 %>% mutate(delay = ifelse(avg_delay > 0, 1, 0)) %>% select(-avg_delay)  # 1 = delayed on average, 0 = on time or early
chisq.test(data1$origin_airport_code, data1$ave_age)
## Warning in chisq.test(data1$origin_airport_code, data1$ave_age): Chi-squared
## approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  data1$origin_airport_code and data1$ave_age
## X-squared = 8409.1, df = 8376, p-value = 0.3973

Box plot of average delay by year

data1 %>% ggplot(aes(x = avg_delay, y = year)) + geom_boxplot(alpha = 0.2) + geom_point(alpha = 0.3)

Box plot of average delay by airport

data1 %>% ggplot(aes(x = avg_delay, y = origin_airport_code)) + geom_boxplot(alpha = 0.2) + geom_point(alpha = 0.3)

Linear regression

lm1 <- lm(avg_delay ~ month + ave_age + year + airline + origin_airport_code, data = data1)
plot(lm1)

summary(lm1)
## 
## Call:
## lm(formula = avg_delay ~ month + ave_age + year + airline + origin_airport_code, 
##     data = data1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.3436  -1.8405  -0.0812   1.6016  16.0585 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   5.82581    0.81182   7.176 1.87e-12 ***
## month2                        0.98353    0.55037   1.787 0.074376 .  
## month3                       -0.02812    0.54708  -0.051 0.959022    
## month4                       -2.77752    0.58016  -4.787 2.07e-06 ***
## month5                       -0.42761    0.60432  -0.708 0.479443    
## month6                        0.39870    0.60804   0.656 0.512227    
## month7                       -1.01172    0.59423  -1.703 0.089103 .  
## month8                       -1.04270    0.58860  -1.771 0.076922 .  
## month9                       -2.36898    0.58598  -4.043 5.88e-05 ***
## month10                      -1.23109    0.57786  -2.130 0.033492 *  
## month11                      -1.95104    0.58293  -3.347 0.000862 ***
## month12                      -0.09814    0.58085  -0.169 0.865882    
## ave_age                      -0.05172    0.03984  -1.298 0.194575    
## year2020                     -5.39344    0.24783 -21.763  < 2e-16 ***
## year2021                     -6.88777    0.56117 -12.274  < 2e-16 ***
## airlineDelta Air Lines Inc.   0.37603    0.36172   1.040 0.298905    
## airlineUnited Air Lines Inc. -1.87326    0.38359  -4.883 1.30e-06 ***
## origin_airport_codeKFAT       0.87288    0.68353   1.277 0.202027    
## origin_airport_codeKLAX       0.45120    0.57932   0.779 0.436340    
## origin_airport_codeKLGB       0.86158    1.11408   0.773 0.439579    
## origin_airport_codeKOAK      -4.97019    0.72527  -6.853 1.61e-11 ***
## origin_airport_codeKONT      -2.31271    0.60158  -3.844 0.000132 ***
## origin_airport_codeKPSP       0.34626    0.61845   0.560 0.575744    
## origin_airport_codeKSAN      -1.74473    0.57809  -3.018 0.002638 ** 
## origin_airport_codeKSBA      -4.54528    0.90677  -5.013 6.84e-07 ***
## origin_airport_codeKSFO      -0.21008    0.58317  -0.360 0.718785    
## origin_airport_codeKSJC      -2.48899    0.60051  -4.145 3.83e-05 ***
## origin_airport_codeKSMF      -0.95413    0.57977  -1.646 0.100283    
## origin_airport_codeKSNA      -0.98079    0.57833  -1.696 0.090359 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.127 on 686 degrees of freedom
## Multiple R-squared:  0.5154, Adjusted R-squared:  0.4956 
## F-statistic: 26.05 on 28 and 686 DF,  p-value: < 2.2e-16

lm2 - without aircraft age

lm2 <- lm(avg_delay ~ month + year + airline + origin_airport_code, data = data1)
summary(lm2)
## 
## Call:
## lm(formula = avg_delay ~ month + year + airline + origin_airport_code, 
##     data = data1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.4243  -1.7643  -0.0633   1.5610  16.3044 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   5.15565    0.62694   8.223 9.88e-16 ***
## month2                        0.98180    0.55065   1.783 0.075029 .  
## month3                       -0.03073    0.54735  -0.056 0.955242    
## month4                       -2.73509    0.57953  -4.720 2.87e-06 ***
## month5                       -0.40118    0.60428  -0.664 0.506975    
## month6                        0.39821    0.60834   0.655 0.512961    
## month7                       -1.00754    0.59452  -1.695 0.090585 .  
## month8                       -1.02579    0.58875  -1.742 0.081899 .  
## month9                       -2.33225    0.58559  -3.983 7.54e-05 ***
## month10                      -1.19130    0.57734  -2.063 0.039446 *  
## month11                      -1.92893    0.58297  -3.309 0.000986 ***
## month12                      -0.04016    0.57942  -0.069 0.944763    
## year2020                     -5.36862    0.24721 -21.717  < 2e-16 ***
## year2021                     -6.81362    0.55853 -12.199  < 2e-16 ***
## airlineDelta Air Lines Inc.   0.09867    0.29205   0.338 0.735568    
## airlineUnited Air Lines Inc. -2.19533    0.29275  -7.499 1.99e-13 ***
## origin_airport_codeKFAT       0.82246    0.68276   1.205 0.228774    
## origin_airport_codeKLAX       0.53348    0.57614   0.926 0.354793    
## origin_airport_codeKLGB       0.72644    1.10976   0.655 0.512949    
## origin_airport_codeKOAK      -5.15615    0.71134  -7.249 1.14e-12 ***
## origin_airport_codeKONT      -2.31752    0.60187  -3.851 0.000129 ***
## origin_airport_codeKPSP       0.29624    0.61756   0.480 0.631591    
## origin_airport_codeKSAN      -1.67866    0.57614  -2.914 0.003688 ** 
## origin_airport_codeKSBA      -4.78873    0.88762  -5.395 9.44e-08 ***
## origin_airport_codeKSFO      -0.09048    0.57614  -0.157 0.875250    
## origin_airport_codeKSJC      -2.51392    0.60051  -4.186 3.20e-05 ***
## origin_airport_codeKSMF      -0.90217    0.57867  -1.559 0.119450    
## origin_airport_codeKSNA      -0.99267    0.57855  -1.716 0.086649 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.128 on 687 degrees of freedom
## Multiple R-squared:  0.5142, Adjusted R-squared:  0.4951 
## F-statistic: 26.93 on 27 and 687 DF,  p-value: < 2.2e-16
anova(lm1, lm2)
lm3 - airline only

lm3 <- lm(avg_delay ~ airline, data = data1)
summary(lm3)
## 
## Call:
## lm(formula = avg_delay ~ airline, data = data1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.128  -3.084  -0.329   2.722  21.043 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   0.466927   0.259104   1.802 0.071956 .  
## airlineDelta Air Lines Inc.   0.008991   0.389229   0.023 0.981577    
## airlineUnited Air Lines Inc. -1.338767   0.400177  -3.345 0.000865 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.367 on 712 degrees of freedom
## Multiple R-squared:  0.01905,    Adjusted R-squared:  0.0163 
## F-statistic: 6.914 on 2 and 712 DF,  p-value: 0.001062

lm4

lm4 <- lm(avg_delay ~ origin_airport_code * year, data = data1)
summary(lm4)
## 
## Call:
## lm(formula = avg_delay ~ origin_airport_code * year, data = data1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -17.3320  -1.6051  -0.2122   1.6129  17.6947 
## 
## Coefficients: (2 not defined because of singularities)
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        3.8238     0.6623   5.773 1.18e-08 ***
## origin_airport_codeKFAT            1.6528     0.9803   1.686 0.092251 .  
## origin_airport_codeKLAX            0.7672     0.8622   0.890 0.373879    
## origin_airport_codeKLGB           -0.1699     2.0235  -0.084 0.933123    
## origin_airport_codeKOAK           -5.2060     0.9568  -5.441 7.41e-08 ***
## origin_airport_codeKONT           -4.0318     0.8672  -4.649 4.01e-06 ***
## origin_airport_codeKPSP            0.0082     0.9038   0.009 0.992764    
## origin_airport_codeKSAN           -1.6794     0.8622  -1.948 0.051836 .  
## origin_airport_codeKSBA           -6.7131     1.1982  -5.603 3.07e-08 ***
## origin_airport_codeKSFO            1.2101     0.8622   1.404 0.160919    
## origin_airport_codeKSJC           -2.4048     0.8622  -2.789 0.005432 ** 
## origin_airport_codeKSMF           -2.1573     0.8622  -2.502 0.012578 *  
## origin_airport_codeKSNA           -0.9158     0.8622  -1.062 0.288503    
## year2020                          -5.3569     0.9803  -5.465 6.52e-08 ***
## year2021                          -9.4041     2.4336  -3.864 0.000122 ***
## origin_airport_codeKFAT:year2020  -2.2559     1.4881  -1.516 0.129987    
## origin_airport_codeKLAX:year2020  -0.8463     1.2531  -0.675 0.499672    
## origin_airport_codeKLGB:year2020   1.7081     2.5386   0.673 0.501283    
## origin_airport_codeKOAK:year2020   2.5209     1.5604   1.616 0.106651    
## origin_airport_codeKONT:year2020   4.3173     1.3158   3.281 0.001087 ** 
## origin_airport_codeKPSP:year2020   0.2212     1.3643   0.162 0.871232    
## origin_airport_codeKSAN:year2020  -0.2118     1.2531  -0.169 0.865801    
## origin_airport_codeKSBA:year2020   4.8562     1.9457   2.496 0.012802 *  
## origin_airport_codeKSFO:year2020  -2.7376     1.2531  -2.185 0.029253 *  
## origin_airport_codeKSJC:year2020  -0.0455     1.3055  -0.035 0.972208    
## origin_airport_codeKSMF:year2020   2.1095     1.2602   1.674 0.094607 .  
## origin_airport_codeKSNA:year2020  -0.4014     1.2602  -0.318 0.750210    
## origin_airport_codeKFAT:year2021   2.2842     3.4537   0.661 0.508607    
## origin_airport_codeKLAX:year2021   3.9503     2.8381   1.392 0.164416    
## origin_airport_codeKLGB:year2021   8.7501     4.5327   1.930 0.053970 .  
## origin_airport_codeKOAK:year2021       NA         NA      NA       NA    
## origin_airport_codeKONT:year2021   7.5604     2.9963   2.523 0.011853 *  
## origin_airport_codeKPSP:year2021   5.8679     2.8510   2.058 0.039957 *  
## origin_airport_codeKSAN:year2021   3.1919     2.8381   1.125 0.261126    
## origin_airport_codeKSBA:year2021       NA         NA      NA       NA    
## origin_airport_codeKSFO:year2021   1.4290     2.8381   0.504 0.614772    
## origin_airport_codeKSJC:year2021   1.6423     3.4221   0.480 0.631445    
## origin_airport_codeKSMF:year2021   6.1471     2.8381   2.166 0.030667 *  
## origin_airport_codeKSNA:year2021   2.7394     2.8381   0.965 0.334780    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.312 on 678 degrees of freedom
## Multiple R-squared:  0.4627, Adjusted R-squared:  0.4342 
## F-statistic: 16.22 on 36 and 678 DF,  p-value: < 2.2e-16

-> change: no difference in adjusted R-squared -> no interaction

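anova(lm3, lm4)  # caution: lm3 (airline only) and lm4 (airport * year) are not nested, so this F-test is not a valid model comparison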

There is a difference -> interaction is present

lm5 - airline & origin_airport_code

lm5 <- lm(avg_delay ~ airline + origin_airport_code , data = data1)
summary(lm5)
## 
## Call:
## lm(formula = avg_delay ~ airline + origin_airport_code, data = data1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.9423  -3.1195  -0.0521   2.7534  19.8037 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    1.4864     0.6319   2.352 0.018945 *  
## airlineDelta Air Lines Inc.    0.2285     0.3924   0.582 0.560623    
## airlineUnited Air Lines Inc.  -1.5441     0.3915  -3.944 8.82e-05 ***
## origin_airport_codeKFAT        1.0981     0.9185   1.195 0.232320    
## origin_airport_codeKLAX        0.2607     0.7742   0.337 0.736462    
## origin_airport_codeKLGB       -0.3155     1.4894  -0.212 0.832284    
## origin_airport_codeKOAK       -3.9205     0.9542  -4.109 4.45e-05 ***
## origin_airport_codeKONT       -2.0040     0.8092  -2.477 0.013498 *  
## origin_airport_codeKPSP        0.4883     0.8285   0.589 0.555799    
## origin_airport_codeKSAN       -1.9515     0.7742  -2.521 0.011937 *  
## origin_airport_codeKSBA       -4.1890     1.1929  -3.511 0.000474 ***
## origin_airport_codeKSFO       -0.3633     0.7742  -0.469 0.639038    
## origin_airport_codeKSJC       -2.1134     0.8075  -2.617 0.009058 ** 
## origin_airport_codeKSMF       -1.1203     0.7780  -1.440 0.150339    
## origin_airport_codeKSNA       -1.2245     0.7778  -1.574 0.115881    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.212 on 700 degrees of freedom
## Multiple R-squared:  0.1028, Adjusted R-squared:  0.08489 
## F-statistic: 5.731 on 14 and 700 DF,  p-value: 1.199e-10

lm6

lm6 <- lm(avg_delay ~ airline * origin_airport_code , data = data1)
summary(lm6)
## 
## Call:
## lm(formula = avg_delay ~ airline * origin_airport_code, data = data1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.3754  -3.0324  -0.1989   2.6510  18.2852 
## 
## Coefficients: (5 not defined because of singularities)
##                                                      Estimate Std. Error
## (Intercept)                                            1.2550     0.8542
## airlineDelta Air Lines Inc.                            4.4728     1.5018
## airlineUnited Air Lines Inc.                          -4.0857     1.3887
## origin_airport_codeKFAT                               -0.1403     1.1727
## origin_airport_codeKLAX                                1.5884     1.1727
## origin_airport_codeKLGB                               -4.3285     1.7899
## origin_airport_codeKOAK                               -2.3774     1.3887
## origin_airport_codeKONT                               -0.3038     1.1727
## origin_airport_codeKPSP                                0.8502     1.1727
## origin_airport_codeKSAN                               -1.6898     1.1727
## origin_airport_codeKSBA                               -3.8193     1.4215
## origin_airport_codeKSFO                               -0.6832     1.1727
## origin_airport_codeKSJC                               -3.2685     1.1727
## origin_airport_codeKSMF                               -1.0981     1.1727
## origin_airport_codeKSNA                               -0.6734     1.1727
## airlineDelta Air Lines Inc.:origin_airport_codeKFAT        NA         NA
## airlineUnited Air Lines Inc.:origin_airport_codeKFAT   7.1958     1.9931
## airlineDelta Air Lines Inc.:origin_airport_codeKLAX   -5.3153     1.8831
## airlineUnited Air Lines Inc.:origin_airport_codeKLAX   0.3235     1.7942
## airlineDelta Air Lines Inc.:origin_airport_codeKLGB        NA         NA
## airlineUnited Air Lines Inc.:origin_airport_codeKLGB       NA         NA
## airlineDelta Air Lines Inc.:origin_airport_codeKOAK   -6.4742     2.0720
## airlineUnited Air Lines Inc.:origin_airport_codeKOAK       NA         NA
## airlineDelta Air Lines Inc.:origin_airport_codeKONT   -6.3888     1.8900
## airlineUnited Air Lines Inc.:origin_airport_codeKONT  -0.7023     1.9931
## airlineDelta Air Lines Inc.:origin_airport_codeKPSP   -5.3249     2.0735
## airlineUnited Air Lines Inc.:origin_airport_codeKPSP   2.8559     1.8725
## airlineDelta Air Lines Inc.:origin_airport_codeKSAN   -4.6870     1.8831
## airlineUnited Air Lines Inc.:origin_airport_codeKSAN   2.8934     1.7942
## airlineDelta Air Lines Inc.:origin_airport_codeKSBA        NA         NA
## airlineUnited Air Lines Inc.:origin_airport_codeKSBA   1.9535     2.7230
## airlineDelta Air Lines Inc.:origin_airport_codeKSFO   -4.7696     1.8831
## airlineUnited Air Lines Inc.:origin_airport_codeKSFO   4.7206     1.7942
## airlineDelta Air Lines Inc.:origin_airport_codeKSJC   -2.2467     1.9339
## airlineUnited Air Lines Inc.:origin_airport_codeKSJC   5.3294     1.8871
## airlineDelta Air Lines Inc.:origin_airport_codeKSMF   -4.0418     1.8831
## airlineUnited Air Lines Inc.:origin_airport_codeKSMF   2.9848     1.8092
## airlineDelta Air Lines Inc.:origin_airport_codeKSNA   -3.3760     1.8974
## airlineUnited Air Lines Inc.:origin_airport_codeKSNA   0.8054     1.7942
##                                                      t value Pr(>|t|)    
## (Intercept)                                            1.469 0.142232    
## airlineDelta Air Lines Inc.                            2.978 0.003001 ** 
## airlineUnited Air Lines Inc.                          -2.942 0.003370 ** 
## origin_airport_codeKFAT                               -0.120 0.904794    
## origin_airport_codeKLAX                                1.355 0.176014    
## origin_airport_codeKLGB                               -2.418 0.015857 *  
## origin_airport_codeKOAK                               -1.712 0.087345 .  
## origin_airport_codeKONT                               -0.259 0.795672    
## origin_airport_codeKPSP                                0.725 0.468703    
## origin_airport_codeKSAN                               -1.441 0.150038    
## origin_airport_codeKSBA                               -2.687 0.007389 ** 
## origin_airport_codeKSFO                               -0.583 0.560352    
## origin_airport_codeKSJC                               -2.787 0.005464 ** 
## origin_airport_codeKSMF                               -0.936 0.349367    
## origin_airport_codeKSNA                               -0.574 0.565989    
## airlineDelta Air Lines Inc.:origin_airport_codeKFAT       NA       NA    
## airlineUnited Air Lines Inc.:origin_airport_codeKFAT   3.610 0.000328 ***
## airlineDelta Air Lines Inc.:origin_airport_codeKLAX   -2.823 0.004903 ** 
## airlineUnited Air Lines Inc.:origin_airport_codeKLAX   0.180 0.856993    
## airlineDelta Air Lines Inc.:origin_airport_codeKLGB       NA       NA    
## airlineUnited Air Lines Inc.:origin_airport_codeKLGB      NA       NA    
## airlineDelta Air Lines Inc.:origin_airport_codeKOAK   -3.125 0.001856 ** 
## airlineUnited Air Lines Inc.:origin_airport_codeKOAK      NA       NA    
## airlineDelta Air Lines Inc.:origin_airport_codeKONT   -3.380 0.000765 ***
## airlineUnited Air Lines Inc.:origin_airport_codeKONT  -0.352 0.724671    
## airlineDelta Air Lines Inc.:origin_airport_codeKPSP   -2.568 0.010438 *  
## airlineUnited Air Lines Inc.:origin_airport_codeKPSP   1.525 0.127679    
## airlineDelta Air Lines Inc.:origin_airport_codeKSAN   -2.489 0.013051 *  
## airlineUnited Air Lines Inc.:origin_airport_codeKSAN   1.613 0.107297    
## airlineDelta Air Lines Inc.:origin_airport_codeKSBA       NA       NA    
## airlineUnited Air Lines Inc.:origin_airport_codeKSBA   0.717 0.473368    
## airlineDelta Air Lines Inc.:origin_airport_codeKSFO   -2.533 0.011540 *  
## airlineUnited Air Lines Inc.:origin_airport_codeKSFO   2.631 0.008707 ** 
## airlineDelta Air Lines Inc.:origin_airport_codeKSJC   -1.162 0.245744    
## airlineUnited Air Lines Inc.:origin_airport_codeKSJC   2.824 0.004879 ** 
## airlineDelta Air Lines Inc.:origin_airport_codeKSMF   -2.146 0.032199 *  
## airlineUnited Air Lines Inc.:origin_airport_codeKSMF   1.650 0.099439 .  
## airlineDelta Air Lines Inc.:origin_airport_codeKSNA   -1.779 0.075639 .  
## airlineUnited Air Lines Inc.:origin_airport_codeKSNA   0.449 0.653655    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.097 on 681 degrees of freedom
## Multiple R-squared:  0.1742, Adjusted R-squared:  0.1342 
## F-statistic: 4.352 on 33 and 681 DF,  p-value: 5.404e-14

In the previous assessment, we tested for interaction (inter-dependence) among variables by checking the statistical significance of the interaction terms. This poses a challenge, as there are many interaction terms. The adjusted R-squared of the model that includes the interaction terms is 0.4342, while that of the model without them is 0.3945, and the overall F-tests of both models are statistically significant. The model that accounts for inter-dependence therefore explains departure delay better.

Test whether the interaction terms are significant

anova(lm5, lm6)

The p-value is below 0.05, so the two models differ significantly: adding the interaction terms improves the fit beyond what chance would explain. We therefore need to take this interaction effect into account in the mixed model.
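When anova() is given two nested lm fits, it runs a partial F-test. This test asks whether the extra terms reduce the residual sum of squares by more than chance would. The minimal sketch below reproduces that F-test by hand on simulated data. The data frame `sim` and its columns `airline`, `airport` and `delay` are hypothetical stand-ins, not the flight data.

```r
# Minimal sketch: the nested-model F-test behind anova(lm5, lm6),
# reproduced by hand on simulated (hypothetical) data.
set.seed(1)
n <- 300
sim <- data.frame(
  airline = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  airport = factor(sample(c("W", "X", "Y", "Z"), n, replace = TRUE))
)
# Build in a genuine airline-by-airport interaction (airline B at airport X)
sim$delay <- 2 + (sim$airline == "B") * (sim$airport == "X") * 3 + rnorm(n, sd = 2)

fit_add <- lm(delay ~ airline + airport, data = sim)  # additive, like lm5
fit_int <- lm(delay ~ airline * airport, data = sim)  # with interactions, like lm6

rss_add <- sum(resid(fit_add)^2)
rss_int <- sum(resid(fit_int)^2)
df_add  <- df.residual(fit_add)
df_int  <- df.residual(fit_int)

# F = [(RSS_reduced - RSS_full) / (df_reduced - df_full)] / (RSS_full / df_full)
f_stat <- ((rss_add - rss_int) / (df_add - df_int)) / (rss_int / df_int)
p_val  <- pf(f_stat, df_add - df_int, df_int, lower.tail = FALSE)

# Compare with the table anova() produces
tab <- anova(fit_add, fit_int)
c(manual_F = f_stat, anova_F = tab$F[2], manual_p = p_val, anova_p = tab$`Pr(>F)`[2])
```

The numerator degrees of freedom equal the number of interaction coefficients that can actually be estimated. In lm6 several interaction coefficients are NA because some airline-airport combinations have no flights, and those terms do not count towards the test.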

lm7: additive model with airline and month

lm7 <- lm(avg_delay ~ airline + month , data = data1)
summary(lm7)
## 
## Call:
## lm(formula = avg_delay ~ airline + month, data = data1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.9643  -3.0615  -0.2416   2.8177  19.9071 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                   0.94012    0.59274   1.586  0.11317   
## airlineDelta Air Lines Inc.   0.05144    0.38380   0.134  0.89342   
## airlineUnited Air Lines Inc. -1.29619    0.39423  -3.288  0.00106 **
## month2                        0.02217    0.72839   0.030  0.97573   
## month3                       -1.10846    0.72302  -1.533  0.12570   
## month4                       -2.32934    0.79510  -2.930  0.00350 **
## month5                        0.29473    0.82813   0.356  0.72203   
## month6                        1.21051    0.83292   1.453  0.14658   
## month7                       -0.34253    0.81459  -0.420  0.67426   
## month8                       -0.29125    0.80639  -0.361  0.71808   
## month9                       -1.72793    0.80241  -2.153  0.03162 * 
## month10                      -0.66418    0.79154  -0.839  0.40170   
## month11                      -1.27094    0.79872  -1.591  0.11201   
## month12                       0.61982    0.79512   0.780  0.43593   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.298 on 701 degrees of freedom
## Multiple R-squared:  0.06416,    Adjusted R-squared:  0.0468 
## F-statistic: 3.697 on 13 and 701 DF,  p-value: 1.027e-05

lm8: airline and month with interaction

lm8 <- lm(avg_delay ~ airline * month , data = data1)
summary(lm8)
## 
## Call:
## lm(formula = avg_delay ~ airline * month, data = data1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.6398  -2.9774  -0.2722   2.6974  19.2316 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)                            1.0640     0.9194   1.157    0.248
## airlineDelta Air Lines Inc.            0.3676     1.3323   0.276    0.783
## airlineUnited Air Lines Inc.          -2.0604     1.3705  -1.503    0.133
## month2                                -0.4743     1.1943  -0.397    0.691
## month3                                -1.3671     1.1943  -1.145    0.253
## month4                                -1.9532     1.2860  -1.519    0.129
## month5                                 0.4917     1.3002   0.378    0.705
## month6                                 1.7618     1.3002   1.355    0.176
## month7                                -0.5849     1.3002  -0.450    0.653
## month8                                -1.3212     1.3002  -1.016    0.310
## month9                                -1.8926     1.3156  -1.439    0.151
## month10                                0.3802     1.3002   0.292    0.770
## month11                               -1.1281     1.3002  -0.868    0.386
## month12                               -0.7801     1.3002  -0.600    0.549
## airlineDelta Air Lines Inc.:month2     1.1871     1.7449   0.680    0.497
## airlineUnited Air Lines Inc.:month2    0.3615     1.7984   0.201    0.841
## airlineDelta Air Lines Inc.:month3     0.1436     1.7379   0.083    0.934
## airlineUnited Air Lines Inc.:month3    0.7150     1.7818   0.401    0.688
## airlineDelta Air Lines Inc.:month4    -1.3410     1.9017  -0.705    0.481
## airlineUnited Air Lines Inc.:month4    0.1357     1.9619   0.069    0.945
## airlineDelta Air Lines Inc.:month5    -1.9449     2.0126  -0.966    0.334
## airlineUnited Air Lines Inc.:month5    1.1537     2.0129   0.573    0.567
## airlineDelta Air Lines Inc.:month6    -2.5525     2.0126  -1.268    0.205
## airlineUnited Air Lines Inc.:month6    0.5077     2.0381   0.249    0.803
## airlineDelta Air Lines Inc.:month7    -0.4634     1.9449  -0.238    0.812
## airlineUnited Air Lines Inc.:month7    1.3727     2.0129   0.682    0.496
## airlineDelta Air Lines Inc.:month8     0.5529     1.9449   0.284    0.776
## airlineUnited Air Lines Inc.:month8    2.9530     1.9712   1.498    0.135
## airlineDelta Air Lines Inc.:month9    -0.1711     1.9219  -0.089    0.929
## airlineUnited Air Lines Inc.:month9    0.7414     1.9814   0.374    0.708
## airlineDelta Air Lines Inc.:month10   -2.3629     1.8842  -1.254    0.210
## airlineUnited Air Lines Inc.:month10  -0.9123     1.9712  -0.463    0.644
## airlineDelta Air Lines Inc.:month11   -0.2687     1.9272  -0.139    0.889
## airlineUnited Air Lines Inc.:month11  -0.1619     1.9538  -0.083    0.934
## airlineDelta Air Lines Inc.:month12    1.6353     1.8842   0.868    0.386
## airlineUnited Air Lines Inc.:month12   3.0110     1.9908   1.512    0.131
## 
## Residual standard error: 4.312 on 679 degrees of freedom
## Multiple R-squared:  0.08762,    Adjusted R-squared:  0.04059 
## F-statistic: 1.863 on 35 and 679 DF,  p-value: 0.002102

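Adjusted R-squared is even lower for lm8 (0.04059) than for lm7 (0.0468). None of the airline-by-month interaction terms is significant even at the 10% level, so adding this interaction does not help.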
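Adjusted R-squared can fall even when R-squared rises, because it penalises each extra coefficient. The sketch below recomputes both reported values from the printed R-squared figures, using n = 715 rows and p non-intercept coefficients. The helper name `adj_r2` is ours.

```r
# Adjusted R-squared penalises extra terms:
#   adj R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)
# n = number of observations, p = number of non-intercept coefficients
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 715  # rows in data1 (lm7: 701 residual df + 13 coefficients + intercept)
adj_lm7 <- adj_r2(0.06416, n, p = 13)  # airline + month
adj_lm8 <- adj_r2(0.08762, n, p = 35)  # airline * month
round(c(lm7 = adj_lm7, lm8 = adj_lm8), 4)
```

The 22 interaction terms raise R-squared by only about 0.023. That gain is too small to offset the penalty for the extra terms.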

Boxplot

origin_airport_code boxplot

boxplot(avg_delay ~ origin_airport_code, data = data1)

year boxplot

boxplot(avg_delay ~ year, data = data1)

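The delay distribution shifts clearly from year to year. A boxplot shows medians and spread rather than means, so this is a visual difference rather than a formal significance test.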

airline * origin_airport_code boxplot (relationship between airline and origin airport)

boxplot(avg_delay ~ airline * origin_airport_code, data = data1)

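The centres of the airline-airport groups differ noticeably, so the grouping should be taken into account (by splitting the data into subgroups or by modelling the groups) before performing the analysis.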

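We can model these group differences by assuming a different random intercept for each year and each airport. Why not airlines? Because the airline is the choice we can make ourselves, we keep it as a fixed effect so that each airline gets its own estimated coefficient. These random effects essentially give structure to the error term: observations from the same airport (or year) share a common deviation, so they are allowed to be correlated.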
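In equation form, a random-intercept model of this kind can be sketched as follows. The notation is ours: $a[i]$ and $t[i]$ are the airport and year of monthly observation $i$, and $\mathbf{x}_i$ collects the fixed-effect dummies (airline, and month or year wherever they enter as fixed effects).

```latex
y_i = \beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta} + u_{a[i]} + v_{t[i]} + \varepsilon_i,
\qquad
u_a \sim N(0, \sigma^2_{\text{airport}}),\quad
v_t \sim N(0, \sigma^2_{\text{year}}),\quad
\varepsilon_i \sim N(0, \sigma^2)
```

The composite error $u_{a[i]} + v_{t[i]} + \varepsilon_i$ is what gets structured. Take two observations from the same airport in different years: they share $u_a$, so their covariance is $\sigma^2_{\text{airport}}$ rather than zero.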

Mixed model

lmer1 <- lmer(avg_delay ~  airline + year + month + (1|origin_airport_code), data = data1, REML = FALSE)
summary(lmer1)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: avg_delay ~ airline + year + month + (1 | origin_airport_code)
##    Data: data1
## 
##      AIC      BIC   logLik deviance df.resid 
##   3717.8   3800.1  -1840.9   3681.8      697 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.6368 -0.5762 -0.0324  0.5081  5.2907 
## 
## Random effects:
##  Groups              Name        Variance Std.Dev.
##  origin_airport_code (Intercept) 3.169    1.780   
##  Residual                        9.583    3.096   
## Number of obs: 715, groups:  origin_airport_code, 13
## 
## Fixed effects:
##                              Estimate Std. Error t value
## (Intercept)                   3.89818    0.67153   5.805
## airlineDelta Air Lines Inc.   0.11484    0.28726   0.400
## airlineUnited Air Lines Inc. -2.16100    0.28935  -7.469
## year2020                     -5.34366    0.24446 -21.859
## year2021                     -6.75335    0.55234 -12.227
## month2                        0.97777    0.54489   1.794
## month3                       -0.03747    0.54161  -0.069
## month4                       -2.74447    0.57340  -4.786
## month5                       -0.40261    0.59792  -0.673
## month6                        0.40168    0.60191   0.667
## month7                       -1.00432    0.58823  -1.707
## month8                       -1.02200    0.58253  -1.754
## month9                       -2.32818    0.57940  -4.018
## month10                      -1.17772    0.57121  -2.062
## month11                      -1.92774    0.57682  -3.342
## month12                      -0.02905    0.57336  -0.051
## 
## Correlation matrix not shown by default, as p = 16 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it

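In lmer1 the only random effect is the origin airport, with a variance of 3.169 (SD 1.78). The residual variance, i.e. the variability not accounted for by the airport intercepts and the fixed effects, is 9.583 (SD 3.10) and is the largest component. Year enters this model as a fixed effect, and its coefficients are large: relative to 2019, average delays are about 5.3 lower in 2020 and 6.8 lower in 2021.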
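A common one-number summary of these two variance components is the intraclass correlation (ICC). It is the share of the remaining variance that lies between airports. Equivalently, it is the correlation between two monthly averages from the same airport once the fixed effects are accounted for. A sketch using the values printed by summary(lmer1):

```r
# Variance components reported by summary(lmer1)
var_airport  <- 3.169  # origin_airport_code (Intercept)
var_residual <- 9.583  # Residual

# Intraclass correlation: between-airport share of the unexplained variance
icc_airport <- var_airport / (var_airport + var_residual)
round(icc_airport, 3)
```

Roughly a quarter of the variation left after the fixed effects is attributable to which airport the flight departs from. That share is large enough to justify the random effect.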

lmer2

lmer2 <- lmer(avg_delay ~  airline + year + month + (1|origin_airport_code), data = data1, REML = FALSE)
summary(lmer2)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: avg_delay ~ airline + year + month + (1 | origin_airport_code)
##    Data: data1
## 
##      AIC      BIC   logLik deviance df.resid 
##   3717.8   3800.1  -1840.9   3681.8      697 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.6368 -0.5762 -0.0324  0.5081  5.2907 
## 
## Random effects:
##  Groups              Name        Variance Std.Dev.
##  origin_airport_code (Intercept) 3.169    1.780   
##  Residual                        9.583    3.096   
## Number of obs: 715, groups:  origin_airport_code, 13
## 
## Fixed effects:
##                              Estimate Std. Error t value
## (Intercept)                   3.89818    0.67153   5.805
## airlineDelta Air Lines Inc.   0.11484    0.28726   0.400
## airlineUnited Air Lines Inc. -2.16100    0.28935  -7.469
## year2020                     -5.34366    0.24446 -21.859
## year2021                     -6.75335    0.55234 -12.227
## month2                        0.97777    0.54489   1.794
## month3                       -0.03747    0.54161  -0.069
## month4                       -2.74447    0.57340  -4.786
## month5                       -0.40261    0.59792  -0.673
## month6                        0.40168    0.60191   0.667
## month7                       -1.00432    0.58823  -1.707
## month8                       -1.02200    0.58253  -1.754
## month9                       -2.32818    0.57940  -4.018
## month10                      -1.17772    0.57121  -2.062
## month11                      -1.92774    0.57682  -3.342
## month12                      -0.02905    0.57336  -0.051
## 
## Correlation matrix not shown by default, as p = 16 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it

Difference in models

anova(lmer1, lmer2)

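As written, lmer2 uses exactly the same formula as lmer1, and the two summaries are identical (AIC 3717.8, log-likelihood -1840.9). anova(lmer1, lmer2) therefore has nothing to compare: the two models fit equally well, and neither is better.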

Check coefficients

coef(lmer2)
## $origin_airport_code
##      (Intercept) airlineDelta Air Lines Inc. airlineUnited Air Lines Inc.
## KBUR   5.0541516                   0.1148361                    -2.161005
## KFAT   5.8015804                   0.1148361                    -2.161005
## KLAX   5.5895392                   0.1148361                    -2.161005
## KLGB   5.3906646                   0.1148361                    -2.161005
## KOAK   0.3009409                   0.1148361                    -2.161005
## KONT   2.8607443                   0.1148361                    -2.161005
## KPSP   5.3434286                   0.1148361                    -2.161005
## KSAN   3.4599584                   0.1148361                    -2.161005
## KSBA   0.8852358                   0.1148361                    -2.161005
## KSFO   4.9888639                   0.1148361                    -2.161005
## KSJC   2.6730857                   0.1148361                    -2.161005
## KSMF   4.2078303                   0.1148361                    -2.161005
## KSNA   4.1203339                   0.1148361                    -2.161005
##       year2020  year2021    month2      month3    month4     month5    month6
## KBUR -5.343658 -6.753348 0.9777669 -0.03746621 -2.744472 -0.4026089 0.4016801
## KFAT -5.343658 -6.753348 0.9777669 -0.03746621 -2.744472 -0.4026089 0.4016801
## KLAX -5.343658 -6.753348 0.9777669 -0.03746621 -2.744472 -0.4026089 0.4016801
## KLGB -5.343658 -6.753348 0.9777669 -0.03746621 -2.744472 -0.4026089 0.4016801
## KOAK -5.343658 -6.753348 0.9777669 -0.03746621 -2.744472 -0.4026089 0.4016801
## KONT -5.343658 -6.753348 0.9777669 -0.03746621 -2.744472 -0.4026089 0.4016801
## KPSP -5.343658 -6.753348 0.9777669 -0.03746621 -2.744472 -0.4026089 0.4016801
## KSAN -5.343658 -6.753348 0.9777669 -0.03746621 -2.744472 -0.4026089 0.4016801
## KSBA -5.343658 -6.753348 0.9777669 -0.03746621 -2.744472 -0.4026089 0.4016801
## KSFO -5.343658 -6.753348 0.9777669 -0.03746621 -2.744472 -0.4026089 0.4016801
## KSJC -5.343658 -6.753348 0.9777669 -0.03746621 -2.744472 -0.4026089 0.4016801
## KSMF -5.343658 -6.753348 0.9777669 -0.03746621 -2.744472 -0.4026089 0.4016801
## KSNA -5.343658 -6.753348 0.9777669 -0.03746621 -2.744472 -0.4026089 0.4016801
##         month7    month8    month9   month10   month11     month12
## KBUR -1.004316 -1.021997 -2.328179 -1.177725 -1.927744 -0.02905211
## KFAT -1.004316 -1.021997 -2.328179 -1.177725 -1.927744 -0.02905211
## KLAX -1.004316 -1.021997 -2.328179 -1.177725 -1.927744 -0.02905211
## KLGB -1.004316 -1.021997 -2.328179 -1.177725 -1.927744 -0.02905211
## KOAK -1.004316 -1.021997 -2.328179 -1.177725 -1.927744 -0.02905211
## KONT -1.004316 -1.021997 -2.328179 -1.177725 -1.927744 -0.02905211
## KPSP -1.004316 -1.021997 -2.328179 -1.177725 -1.927744 -0.02905211
## KSAN -1.004316 -1.021997 -2.328179 -1.177725 -1.927744 -0.02905211
## KSBA -1.004316 -1.021997 -2.328179 -1.177725 -1.927744 -0.02905211
## KSFO -1.004316 -1.021997 -2.328179 -1.177725 -1.927744 -0.02905211
## KSJC -1.004316 -1.021997 -2.328179 -1.177725 -1.927744 -0.02905211
## KSMF -1.004316 -1.021997 -2.328179 -1.177725 -1.927744 -0.02905211
## KSNA -1.004316 -1.021997 -2.328179 -1.177725 -1.927744 -0.02905211
## 
## attr(,"class")
## [1] "coef.mer"
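For each airport, coef() on a fitted lmer model adds the airport's predicted deviation to the fixed effects. That is why only the intercept column varies across airports in lmer2. The sketch below re-enters the printed values and recovers the deviations that ranef(lmer2) would give. It also checks a property of random-intercept fits that include a fixed intercept: the predicted deviations sum to zero, so the average airport intercept equals the fixed intercept.

```r
# Airport intercepts printed by coef(lmer2): fixed intercept + airport deviation
airport_int <- c(
  KBUR = 5.0541516, KFAT = 5.8015804, KLAX = 5.5895392, KLGB = 5.3906646,
  KOAK = 0.3009409, KONT = 2.8607443, KPSP = 5.3434286, KSAN = 3.4599584,
  KSBA = 0.8852358, KSFO = 4.9888639, KSJC = 2.6730857, KSMF = 4.2078303,
  KSNA = 4.1203339
)
fixed_int <- 3.89818  # (Intercept) in the fixed-effects table of lmer2

# Predicted airport deviations (the values ranef(lmer2) would report)
airport_dev <- airport_int - fixed_int
round(sort(airport_dev), 2)

# The deviations sum to (numerically) zero, so this mean recovers fixed_int
mean(airport_int)
```

Oakland (KOAK) and Santa Barbara (KSBA) sit about 3 below the average airport in this model. Fresno (KFAT) and Los Angeles (KLAX) sit the furthest above it.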

lmer3

lmer3 <- lmer(avg_delay ~  airline + (1|year) + (1|month) + (1|origin_airport_code), data = data1, REML = FALSE)

summary(lmer3)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: 
## avg_delay ~ airline + (1 | year) + (1 | month) + (1 | origin_airport_code)
##    Data: data1
## 
##      AIC      BIC   logLik deviance df.resid 
##   3744.2   3776.2  -1865.1   3730.2      708 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.6660 -0.5959 -0.0289  0.5193  5.2814 
## 
## Random effects:
##  Groups              Name        Variance Std.Dev.
##  origin_airport_code (Intercept) 3.425    1.851   
##  month               (Intercept) 1.102    1.050   
##  year                (Intercept) 8.395    2.897   
##  Residual                        9.766    3.125   
## Number of obs: 715, groups:  origin_airport_code, 13; month, 12; year, 3
## 
## Fixed effects:
##                              Estimate Std. Error t value
## (Intercept)                   -0.8323     1.7932  -0.464
## airlineDelta Air Lines Inc.    0.1144     0.2900   0.395
## airlineUnited Air Lines Inc.  -2.1555     0.2920  -7.382
## 
## Correlation of Fixed Effects:
##             (Intr) aDALI.
## arlnDALInc. -0.069       
## arlnUALInc. -0.060  0.426
anova(lmer2, lmer3)
coef(lmer3)
## $origin_airport_code
##      (Intercept) airlineDelta Air Lines Inc. airlineUnited Air Lines Inc.
## KBUR   0.3166692                   0.1144391                    -2.155512
## KFAT   1.0812247                   0.1144391                    -2.155512
## KLAX   0.8608820                   0.1144391                    -2.155512
## KLGB   0.6730247                   0.1144391                    -2.155512
## KOAK  -4.4162527                   0.1144391                    -2.155512
## KONT  -1.8701801                   0.1144391                    -2.155512
## KPSP   0.6351172                   0.1144391                    -2.155512
## KSAN  -1.2732370                   0.1144391                    -2.155512
## KSBA  -3.8926558                   0.1144391                    -2.155512
## KSFO   0.2589266                   0.1144391                    -2.155512
## KSJC  -2.0524523                   0.1144391                    -2.155512
## KSMF  -0.5262898                   0.1144391                    -2.155512
## KSNA  -0.6141267                   0.1144391                    -2.155512
## 
## $month
##    (Intercept) airlineDelta Air Lines Inc. airlineUnited Air Lines Inc.
## 1   -0.1575372                   0.1144391                    -2.155512
## 2    0.6902799                   0.1144391                    -2.155512
## 3   -0.2253344                   0.1144391                    -2.155512
## 4   -2.5335050                   0.1144391                    -2.155512
## 5   -0.5128050                   0.1144391                    -2.155512
## 6    0.1656154                   0.1144391                    -2.155512
## 7   -1.0240491                   0.1144391                    -2.155512
## 8   -1.0392074                   0.1144391                    -2.155512
## 9   -2.1653179                   0.1144391                    -2.155512
## 10  -1.1785374                   0.1144391                    -2.155512
## 11  -1.8225210                   0.1144391                    -2.155512
## 12  -0.1841734                   0.1144391                    -2.155512
## 
## $year
##      (Intercept) airlineDelta Air Lines Inc. airlineUnited Air Lines Inc.
## 2019    3.102546                   0.1144391                    -2.155512
## 2020   -2.208223                   0.1144391                    -2.155512
## 2021   -3.391096                   0.1144391                    -2.155512
## 
## attr(,"class")
## [1] "coef.mer"
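In lmer3, year and month are random intercepts as well, so the four variance components can be compared directly as shares of their total. A sketch using the printed values. Year has only three levels, so its variance estimate is very imprecise.

```r
# Variance components printed by summary(lmer3)
vc <- c(origin_airport_code = 3.425, month = 1.102, year = 8.395, Residual = 9.766)

# Share of the total variance (given airline) attributable to each component
share <- vc / sum(vc)
round(share, 3)
```

Among the grouping factors, year accounts for the largest share (about 37%), ahead of origin airport (about 15%) and month (about 5%). The residual remains the largest single component.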

The above are random intercept models.

Random slope model: allow the airline effect (slope) to differ across airports
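The lmer4 term `(1 + airline|origin_airport_code)` can be read as the following model. This is a sketch in our own notation: $D_i$ and $U_i$ are 0/1 indicators for Delta and United (American is the reference level), and $t[i]$, $m[i]$ are the year and month.

```latex
y_i = (\beta_0 + u_{0,a[i]}) + (\beta_D + u_{D,a[i]})\,D_i + (\beta_U + u_{U,a[i]})\,U_i
      + v_{t[i]} + w_{m[i]} + \varepsilon_i,
\qquad
(u_{0,a},\, u_{D,a},\, u_{U,a})^{\top} \sim N(\mathbf{0}, \Sigma)
```

Here $\Sigma$ is an unstructured $3 \times 3$ covariance matrix, and $v_t \sim N(0,\sigma^2_{\text{year}})$ and $w_m \sim N(0,\sigma^2_{\text{month}})$. $\Sigma$ holds the three variances and three correlations listed for origin_airport_code in the Random effects table. The airport-specific airline effects $\beta_D + u_{D,a}$ and $\beta_U + u_{U,a}$ are the columns that coef(lmer4) reports for each airport.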

lmer4 <- lmer(avg_delay ~ airline + (1|year) + (1|month) + (1 + airline|origin_airport_code), data = data1, REML = FALSE)
summary(lmer4)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: avg_delay ~ airline + (1 | year) + (1 | month) + (1 + airline |  
##     origin_airport_code)
##    Data: data1
## 
##      AIC      BIC   logLik deviance df.resid 
##   3697.6   3752.4  -1836.8   3673.6      703 
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -4.960 -0.599 -0.006  0.513  5.706 
## 
## Random effects:
##  Groups              Name                         Variance Std.Dev. Corr       
##  origin_airport_code (Intercept)                  3.360    1.833               
##                      airlineDelta Air Lines Inc.  1.189    1.090     0.08      
##                      airlineUnited Air Lines Inc. 5.764    2.401    -0.32 -0.10
##  month               (Intercept)                  1.156    1.075               
##  year                (Intercept)                  8.700    2.950               
##  Residual                                         8.573    2.928               
## Number of obs: 715, groups:  origin_airport_code, 13; month, 12; year, 3
## 
## Fixed effects:
##                              Estimate Std. Error t value
## (Intercept)                   -0.8723     1.8203  -0.479
## airlineDelta Air Lines Inc.    0.1501     0.4382   0.343
## airlineUnited Air Lines Inc.  -2.2177     0.7757  -2.859
## 
## Correlation of Fixed Effects:
##             (Intr) aDALI.
## arlnDALInc. -0.031       
## arlnUALInc. -0.098  0.031

coefficients

coef(lmer4)
## $origin_airport_code
##      (Intercept) airlineDelta Air Lines Inc. airlineUnited Air Lines Inc.
## KBUR  0.95172591                  2.16037631                   -5.9819803
## KFAT  0.05685458                  0.04520088                    1.0052288
## KLAX  1.35156875                 -0.29553265                   -3.3516133
## KLGB  0.31471168                  0.60308929                   -2.7875589
## KOAK -4.08088397                 -0.67662155                   -0.7476625
## KONT -0.71292282                 -1.04055433                   -6.1287612
## KPSP  0.71440357                 -0.32542698                   -2.2908863
## KSAN -1.52087309                 -0.15792817                   -1.2480626
## KSBA -3.88841428                  0.07315146                   -2.3799211
## KSFO -0.53646118                 -0.21432024                    0.4658495
## KSJC -2.68813920                  0.76742881                   -0.6863337
## KSMF -0.85989826                  0.29693617                   -1.3924544
## KSNA -0.44143813                  0.71597734                   -3.3054280
## 
## $month
##    (Intercept) airlineDelta Air Lines Inc. airlineUnited Air Lines Inc.
## 1   -0.1431508                   0.1501366                     -2.21766
## 2    0.7294980                   0.1501366                     -2.21766
## 3   -0.1708902                   0.1501366                     -2.21766
## 4   -2.5231718                   0.1501366                     -2.21766
## 5   -0.4888234                   0.1501366                     -2.21766
## 6    0.1425545                   0.1501366                     -2.21766
## 7   -1.1439909                   0.1501366                     -2.21766
## 8   -1.1056968                   0.1501366                     -2.21766
## 9   -2.2864294                   0.1501366                     -2.21766
## 10  -1.2985517                   0.1501366                     -2.21766
## 11  -1.9206356                   0.1501366                     -2.21766
## 12  -0.2581888                   0.1501366                     -2.21766
## 
## $year
##      (Intercept) airlineDelta Air Lines Inc. airlineUnited Air Lines Inc.
## 2019    3.139365                   0.1501366                     -2.21766
## 2020   -2.253663                   0.1501366                     -2.21766
## 2021   -3.502572                   0.1501366                     -2.21766
## 
## attr(,"class")
## [1] "coef.mer"

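The airline slopes differ a lot across airports. For example, the United coefficient ranges from -6.13 at KONT to 1.01 at KFAT, against a fixed effect of -2.22.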
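A likelihood-ratio test of lmer3 against lmer4 checks whether the extra slope variances and correlations are worth their cost. This is the same comparison anova(lmer3, lmer4) would report. A sketch using the printed log-likelihoods and residual degrees of freedom:

```r
# Log-likelihoods and residual df printed by summary(lmer3) and summary(lmer4)
ll_intercepts <- -1865.1  # lmer3: random intercepts only
ll_slopes     <- -1836.8  # lmer4: adds airline slopes by airport
df_diff <- 708 - 703      # 5 extra parameters: 2 slope variances + 3 correlations

lr_stat <- 2 * (ll_slopes - ll_intercepts)  # equals the drop in deviance
p_val   <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
c(LR = lr_stat, df = df_diff, p = p_val)
```

The variances are tested at the boundary of their parameter space (zero), so the chi-square p-value is conservative. Even so, it is far below any usual threshold, which supports keeping the random slopes.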

# Scenario: departures from Burbank (KBUR) in November 2021, one row per airline
newdata <- data.frame(
  origin_airport_code = "KBUR",
  airline = c("American Airlines Inc.", "Delta Air Lines Inc.", "United Air Lines Inc."),
  year = 2021,
  month = 11
)

predict

predict(lmer4, newdata)
##          1          2          3 
## -2.7269021 -0.5665258 -8.7088824
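By default, predict() combines the fixed effects with the predicted random effects of the groups in newdata. As a sanity check, the three predictions above can be rebuilt by hand from the coef(lmer4) output. Because the printed values are rounded, agreement holds only to about four decimals.

```r
# Values taken from summary(lmer4) and coef(lmer4) for KBUR, 2021, month 11.
# coef() reports fixed + random parts, so each grouping factor's intercept
# already contains the fixed intercept once.
b0          <- -0.8723     # fixed (Intercept)
int_airport <-  0.95172591 # KBUR row of $origin_airport_code
int_year    <- -3.502572   # 2021 row of $year
int_month   <- -1.9206356  # 11 row of $month
slope_kbur  <- c(American = 0, Delta = 2.16037631, United = -5.9819803)

# Fixed intercept once, plus the airport, year and month deviations
baseline <- int_airport + (int_year - b0) + (int_month - b0)
pred <- baseline + slope_kbur
round(pred, 3)
names(which.min(pred))
```

For this route and month, the model's lowest expected departure delay is United's. That answers the research question for one concrete choice of year, month and origin airport.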

diagnostic plot

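# Residuals vs fitted values, with a smoother
plot(lmer4, type = c("p", "smooth"))

# Scale-location plot: sqrt(|residual|) vs fitted, to check for constant variance
plot(lmer4, sqrt(abs(resid(.))) ~ fitted(.), type = c("p", "smooth"))

# Normal Q-Q plot of the residuals (qqmath is a lattice generic)
qqmath(lmer4, id = 0.05)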

confint

#confint(lmer4)
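The profile-likelihood call is commented out. A quicker alternative for the fixed effects is a Wald interval: the estimate plus or minus 1.96 standard errors. lme4's confint() can produce this with `method = "Wald"`. The sketch below simply applies the formula to the estimates printed by summary(lmer4). Wald intervals ignore the uncertainty in the variance parameters, so treat them as approximate.

```r
# Fixed-effect estimates and standard errors printed by summary(lmer4)
est <- c(Delta = 0.1501, United = -2.2177)
se  <- c(Delta = 0.4382, United = 0.7757)

# 95% Wald interval: estimate +/- z * SE
z  <- qnorm(0.975)
ci <- cbind(lower = est - z * se, upper = est + z * se)
round(ci, 2)
```

The United interval lies entirely below zero, while the Delta interval straddles zero. On average across airports, only United differs clearly from American in this model.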

data3

# Aggregate the raw data to monthly averages by state, airport, airline, year and month
data3 <- raw_data %>%
  group_by(origin_state_abr, origin_airport_code, airline, year, month) %>%
  summarise(avg_delay = mean(dep_delay), ave_age = mean(age)) %>%
  droplevels()
## `summarise()` has grouped output by 'origin_state_abr', 'origin_airport_code', 'airline', 'year'. You can override using the `.groups` argument.

lmer5

lmer5 <- lmer(avg_delay ~ airline + (1|year) + (1|month) + (1|origin_airport_code) + (1+ airline|origin_state_abr), data = data3, REML = FALSE)
summary(lmer5)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: 
## avg_delay ~ airline + (1 | year) + (1 | month) + (1 | origin_airport_code) +  
##     (1 + airline | origin_state_abr)
##    Data: data3
## 
##      AIC      BIC   logLik deviance df.resid 
##  36477.6  36565.6 -18225.8  36451.6     6383 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.4148 -0.4867 -0.0400  0.3869 23.2122 
## 
## Random effects:
##  Groups              Name                         Variance Std.Dev. Corr       
##  origin_airport_code (Intercept)                   2.86078 1.6914              
##  origin_state_abr    (Intercept)                   0.08272 0.2876              
##                      airlineDelta Air Lines Inc.   1.18156 1.0870   -0.78      
##                      airlineUnited Air Lines Inc.  3.02602 1.7395   -0.52 -0.13
##  month               (Intercept)                   0.89824 0.9478              
##  year                (Intercept)                   4.77851 2.1860              
##  Residual                                         16.33502 4.0417              
## Number of obs: 6396, groups:  
## origin_airport_code, 131; origin_state_abr, 47; month, 12; year, 3
## 
## Fixed effects:
##                              Estimate Std. Error t value
## (Intercept)                   -0.4751     1.3067  -0.364
## airlineDelta Air Lines Inc.   -1.3599     0.2204  -6.171
## airlineUnited Air Lines Inc.  -1.1069     0.3206  -3.453
## 
## Correlation of Fixed Effects:
##             (Intr) aDALI.
## arlnDALInc. -0.060       
## arlnUALInc. -0.037  0.071

lmer6

lmer6 <- lmer(avg_delay ~ airline + (1|year) + (1|month) + (1+ airline|origin_airport_code) + (1|origin_state_abr), data = data3, REML = FALSE)
summary(lmer6)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: avg_delay ~ airline + (1 | year) + (1 | month) + (1 + airline |  
##     origin_airport_code) + (1 | origin_state_abr)
##    Data: data3
## 
##      AIC      BIC   logLik deviance df.resid 
##  36114.3  36202.2 -18044.1  36088.3     6383 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -9.9789 -0.4804 -0.0348  0.3874 24.3303 
## 
## Random effects:
##  Groups              Name                         Variance Std.Dev. Corr       
##  origin_airport_code (Intercept)                   2.8957  1.7017              
##                      airlineDelta Air Lines Inc.   3.3313  1.8252   -0.48      
##                      airlineUnited Air Lines Inc. 19.9879  4.4708    0.23 -0.35
##  origin_state_abr    (Intercept)                   0.1511  0.3887              
##  month               (Intercept)                   0.9080  0.9529              
##  year                (Intercept)                   4.8660  2.2059              
##  Residual                                         14.8082  3.8481              
## Number of obs: 6396, groups:  
## origin_airport_code, 131; origin_state_abr, 47; month, 12; year, 3
## 
## Fixed effects:
##                              Estimate Std. Error t value
## (Intercept)                   -0.4894     1.3204  -0.371
## airlineDelta Air Lines Inc.   -1.3862     0.2148  -6.453
## airlineUnited Air Lines Inc.  -0.7066     0.4823  -1.465
## 
## Correlation of Fixed Effects:
##             (Intr) aDALI.
## arlnDALInc. -0.087       
## arlnUALInc.  0.014 -0.176
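lmer5 and lmer6 are not nested: the airline slopes vary by state in one and by airport in the other. A likelihood-ratio test is therefore not appropriate, but AIC = 2k - 2 log L can still be compared. Both fits have the same number of parameters k, which can be read off the output as observations minus residual df. A sketch with the printed log-likelihoods:

```r
# Both fits: 6396 observations and 6383 residual df, so 13 parameters each
n_obs <- 6396
k <- n_obs - 6383

aic <- function(loglik, k) 2 * k - 2 * loglik
aic_lmer5 <- aic(-18225.8, k)  # airline slopes vary by state
aic_lmer6 <- aic(-18044.1, k)  # airline slopes vary by airport
c(lmer5 = aic_lmer5, lmer6 = aic_lmer6, difference = aic_lmer5 - aic_lmer6)
```

lmer6 has the lower AIC by about 363. Letting the airline effect vary by airport therefore fits considerably better than letting it vary by state.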

coef lmer6

coef(lmer6)
## $origin_airport_code
##       (Intercept) airlineDelta Air Lines Inc. airlineUnited Air Lines Inc.
## KABQ -0.124806656                 -2.22576054                  -0.18294408
## KAGS -0.864615426                 -1.87810165                  -0.41020862
## KALB -1.681679541                 -1.86563651                   1.09178478
## KAMA -0.326643016                 -1.47023513                  -0.60783678
## KATL  0.206139388                  0.56599529                   1.39145887
## KAUS -1.708292081                 -0.26278792                  -0.74205165
## KAVL -0.827251516                 -1.82911246                  -0.43972769
## KAVP -1.683928367                 -2.95233270                   0.23708328
## KBDL -2.093709591                 -0.78992001                  -1.91872524
## KBHM -1.372073150                 -0.62232213                  -3.03947897
## KBIL -1.188598292                 -2.30288739                  -0.15424845
## KBIS -1.173216197                 -2.28271936                  -0.16640095
## KBNA -0.390058781                 -2.51927344                   1.77800016
## KBOI -0.597068400                 -2.16708904                  -4.00032676
## KBOS -0.938576576                  0.58409063                  -0.51667518
## KBTR  0.117346444                 -3.17001896                   5.08264234
## KBTV  1.256775175                  2.14476077                  -5.10320486
## KBUF -1.713689287                 -0.85466412                  -3.30082765
## KBUR  0.728568273                  1.88463592                  -5.46677180
## KBWI -0.884927077                 -1.43339912                  -2.13870668
## KBZN -0.029193418                 -2.66934079                   0.22513936
## KCAE  1.994006413                 -4.38981838                   2.11616053
## KCHS -1.104696702                 -0.49090690                  -2.84765990
## KCID  1.039835256                 -1.52788244                   3.30243538
## KCLE -1.990922065                 -0.53181557                  -1.18092287
## KCLT  1.446133770                 -2.59632296                  -2.56746371
## KCMH -0.182680520                 -2.62363217                  -1.21783332
## KCOS -0.285146319                 -1.88469397                   0.43295896
## KCRP -2.018270967                  0.04612231                  -7.85117621
## KCRW -1.157773629                 -2.26247205                  -0.17860123
## KCVG  1.337658376                 -1.89356005                  -2.33440990
## KDAL -0.219326160                 -1.03203938                  -0.92001445
## KDAY -0.943275252                 -1.98123541                  -0.34806403
## KDCA -1.722547977                 -0.81352218                  -0.81394012
## KDFW  3.109294613                 -3.20063058                  -4.40102044
## KDSM -0.933006367                 -1.18104941                  -0.74355323
## KDTW -1.030875041                  0.59307390                   0.81843707
## KEGE  1.906223556                 -2.46028142                  -2.49377785
## KELP -0.838116236                  0.28052182                  -4.76115318
## KEUG -1.252974825                 -0.67083155                  -4.27485229
## KEWR -0.269374802                 -0.36074558                   2.59586475
## KEYW  1.428871061                 -2.37467280                   0.45593704
## KFAR -1.750488284                 -0.52935195                  -1.62910276
## KFAT  0.124470034                 -1.91524573                   1.71691002
## KFAY -0.518434653                 -1.42421142                  -0.68370608
## KFLL -0.070580008                 -0.78029267                  -2.43404541
## KFNT -1.615548501                 -2.86267737                   0.18306029
## KFSD  0.907856990                  1.13634888                  -3.48863078
## KGEG -3.476503971                  0.41552428                  -3.11303840
## KGPT -0.544744114                 -1.45870672                  -0.66292049
## KGRB -3.087709654                 -2.19560479                  -4.96594965
## KGRR  0.464521343                 -3.19790725                  -3.80035842
## KGSO -1.228859029                 -1.46955485                  -0.79976825
## KGSP  1.142197029                 -4.63962265                   2.64948984
## KGTF -1.186736786                 -2.30044670                  -0.15571911
## KHOU -1.015647066                 -2.07612473                  -0.29088723
## KHPN  0.400822630                 -0.21893970                  -1.40995824
## KIAD -0.121110039                 -1.90582423                  -0.27205711
## KIAH  1.018786504                  0.23099401                  -0.58201644
## KICT -0.896486142                 -0.77412657                  -3.46820487
## KILM -1.335925462                 -0.07312441                  -1.88987907
## KIND -0.764415335                 -1.54843411                  -0.38774472
## KJAC  0.703043938                 -2.28627580                   1.23971298
## KJAN -1.054443943                 -2.12699273                  -0.26023606
## KJAX -0.861157419                 -2.09101810                  -1.43634111
## KJFK -0.166810456                  0.84313361                   1.18941110
## KLAS  0.102163709                 -1.90125788                  -1.57914633
## KLAX  1.496803097                 -0.76723925                  -3.30120005
## KLBB -0.696537601                 -1.27917572                  -0.83235348
## KLEX -0.480480316                 -1.37444811                  -0.71369161
## KLFT -1.279137278                 -2.42159635                  -0.08271881
## KLGA -0.619968404                  0.70943939                  -0.16964069
## KLGB  0.436733491                 -0.17185567                  -1.43832934
## KLIT  1.440846015                 -4.14866715                   1.81447378
## KMAF -1.094642796                 -0.80582971                  -3.66392696
## KMCI -1.696340272                 -1.00949521                  -2.76239957
## KMCO -0.966324848                 -0.58738172                  -0.11903815
## KMDT -1.042673909                 -1.57112684                   3.49870020
## KMDW -1.063309369                 -2.13861652                  -0.25323200
## KMEM -1.270724205                 -1.64503075                  -4.13034778
## KMFE -0.149791852                 -1.03808504                  -5.56492399
## KMFR -0.421569680                 -1.44971843                  -0.38959279
## KMHT -3.034707651                 -0.17189027                  -2.17479356
## KMIA  1.669997519                 -1.13780017                  -1.82295338
## KMKE -1.515360332                 -1.90452870                   4.46842586
## KMOB -1.095697825                 -2.18108220                  -0.22764375
## KMSN  2.135142579                 -5.19321797                  -0.87997703
## KMSO  1.725380223                 -3.25207539                   1.18950777
## KMSP  0.008099133                 -0.47335876                  -1.08378611
## KMSY -0.560291225                 -0.67440715                  -1.44235750
## KMYR -2.368485009                 -1.71777572                   7.75586071
## KOAK -3.084939639                 -1.24046193                  -1.36866161
## KOMA -2.070391852                 -1.12366140                  -0.33799929
## KONT -0.392227216                 -1.49204089                  -6.01769697
## KORD  2.419264516                 -2.52482717                  -1.37393217
## KPBI -1.035169378                 -0.54686528                  -0.77968343
## KPDX  0.416476836                 -1.97205533                  -2.68458005
## KPHF -1.858225961                 -3.18086063                   0.37478575
## KPHL -0.404577177                 -1.59351733                  -1.40309762
## KPHX  0.372098463                 -0.60891647                  -2.24307151
## KPIT -2.122852646                 -0.66507869                   0.84640340
## KPNS  0.414422554                 -2.90599915                  -1.64620657
## KPSP  0.877956692                 -1.13285942                  -2.04749524
## KPVD -1.669366332                 -1.51844768                  -2.59524958
## KPWM -1.507813049                 -0.25835001                  -3.17123826
## KRDU  0.038232676                 -1.42573223                  -1.32719764
## KRIC -0.995625319                 -1.38222652                  -3.44970968
## KRNO -0.866761691                 -0.83901437                  -2.48480857
## KROA -0.282789957                 -1.11524908                  -0.86987536
## KROC -0.813258606                 -3.24826923                   1.67510862
## KRSW -0.909453668                 -2.13616666                  -2.18808181
## KSAN -1.202785145                 -0.52707248                  -1.39687575
## KSAT -1.424264296                 -0.06287290                  -2.98270279
## KSAV -0.577039207                 -1.11436300                   8.61142810
## KSBA -3.125825794                  0.05212237                  -3.04706892
## KSBN  2.061274282                  1.90614283                  -2.59542078
## KSDF  0.007914770                 -2.29744675                  -0.45152310
## KSEA -0.069818835                  0.01901700                  -2.26914253
## KSFO -0.248506105                 -0.67253348                   0.43314227
## KSJC -2.368715899                  0.62648806                  -0.60551023
## KSLC  0.807836979                 -1.14948891                   0.59552196
## KSMF -0.627544226                 -0.04407589                  -1.44655929
## KSNA -0.300133868                  0.43198059                  -3.32572048
## KSRQ -1.372006409                 -0.37063331                   0.08766097
## KSTL -1.110269337                 -1.13637403                   4.60826019
## KSYR -2.883140061                 -1.05132379                   1.17441377
## KTLH -0.834065260                 -1.83804621                  -0.43434454
## KTPA -0.863439377                 -0.52518987                  -0.83061246
## KTUL -1.266170508                 -1.53738853                  -4.09700713
## KTVC  6.180013927                 -7.73159321                  30.69689371
## KTYS  0.829790432                 -4.62654564                   2.05011221
## 
## $origin_state_abr
##    (Intercept) airlineDelta Air Lines Inc. airlineUnited Air Lines Inc.
## AL  -0.5886419                    -1.38616                   -0.7066342
## AR  -0.4425415                    -1.38616                   -0.7066342
## AZ  -0.4051077                    -1.38616                   -0.7066342
## CA  -0.1667563                    -1.38616                   -0.7066342
## CO  -0.3563375                    -1.38616                   -0.7066342
## CT  -0.5794514                    -1.38616                   -0.7066342
## FL  -0.3489484                    -1.38616                   -0.7066342
## GA  -0.4446634                    -1.38616                   -0.7066342
## IA  -0.4206968                    -1.38616                   -0.7066342
## ID  -0.5134320                    -1.38616                   -0.7066342
## IL  -0.3845164                    -1.38616                   -0.7066342
## IN  -0.2399631                    -1.38616                   -0.7066342
## KS  -0.4943391                    -1.38616                   -0.7066342
## KY  -0.3683839                    -1.38616                   -0.7066342
## LA  -0.5785233                    -1.38616                   -0.7066342
## MA  -0.4631256                    -1.38616                   -0.7066342
## MD  -0.5151774                    -1.38616                   -0.7066342
## ME  -0.5216482                    -1.38616                   -0.7066342
## MI  -0.3617128                    -1.38616                   -0.7066342
## MN  -0.4281670                    -1.38616                   -0.7066342
## MO  -0.6020677                    -1.38616                   -0.7066342
## MS  -0.5563684                    -1.38616                   -0.7066342
## MT  -0.5541916                    -1.38616                   -0.7066342
## NC  -0.4627414                    -1.38616                   -0.7066342
## ND  -0.6227537                    -1.38616                   -0.7066342
## NE  -0.5904767                    -1.38616                   -0.7066342
## NH  -0.6252932                    -1.38616                   -0.7066342
## NJ  -0.4506183                    -1.38616                   -0.7066342
## NM  -0.4898935                    -1.38616                   -0.7066342
## NV  -0.4689670                    -1.38616                   -0.7066342
## NY  -0.6589458                    -1.38616                   -0.7066342
## OH  -0.6293680                    -1.38616                   -0.7066342
## OK  -0.5406497                    -1.38616                   -0.7066342
## OR  -0.4635166                    -1.38616                   -0.7066342
## PA  -0.7616960                    -1.38616                   -0.7066342
## RI  -0.5704103                    -1.38616                   -0.7066342
## SC  -0.5673327                    -1.38616                   -0.7066342
## SD  -0.3154291                    -1.38616                   -0.7066342
## TN  -0.5840006                    -1.38616                   -0.7066342
## TX  -0.2356007                    -1.38616                   -0.7066342
## UT  -0.3962856                    -1.38616                   -0.7066342
## VA  -0.7024263                    -1.38616                   -0.7066342
## VT  -0.2592775                    -1.38616                   -0.7066342
## WA  -0.5642329                    -1.38616                   -0.7066342
## WI  -0.7084898                    -1.38616                   -0.7066342
## WV  -0.5615489                    -1.38616                   -0.7066342
## WY  -0.4377071                    -1.38616                   -0.7066342
## 
## $month
##    (Intercept) airlineDelta Air Lines Inc. airlineUnited Air Lines Inc.
## 1    0.5530361                    -1.38616                   -0.7066342
## 2    0.9382499                    -1.38616                   -0.7066342
## 3   -0.5505058                    -1.38616                   -0.7066342
## 4   -1.7948625                    -1.38616                   -0.7066342
## 5   -0.9119892                    -1.38616                   -0.7066342
## 6    0.6979843                    -1.38616                   -0.7066342
## 7   -0.2347804                    -1.38616                   -0.7066342
## 8   -0.6804378                    -1.38616                   -0.7066342
## 9   -1.6794191                    -1.38616                   -0.7066342
## 10  -0.8607703                    -1.38616                   -0.7066342
## 11  -1.5606916                    -1.38616                   -0.7066342
## 12   0.2112274                    -1.38616                   -0.7066342
## 
## $year
##      (Intercept) airlineDelta Air Lines Inc. airlineUnited Air Lines Inc.
## 2019    2.593033                    -1.38616                   -0.7066342
## 2020   -2.028811                    -1.38616                   -0.7066342
## 2021   -2.032461                    -1.38616                   -0.7066342
## 
## attr(,"class")
## [1] "coef.mer"
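The `coef()` output above is, for each grouping factor, the fixed effects plus that factor's random effects. This explains why the airline columns are identical across every state, month and year: those factors only carry random intercepts, so the airline columns are the fixed effects repeated. They differ across airports (the first table) because the airline effect was allowed to vary there. Because each table adds only one grouping factor's random effects, a predicted delay for a specific airline, airport, month and year should come from `predict()`, which sums all of them. The sketch below shows both points. It assumes `lme4` is installed and uses simulated toy data with hypothetical names (`toy`, `fit`, airports `AP1` to `AP8`). It also fits a simpler model with random intercepts only, so it is not the model or data that produced the output above.

```r
library(lme4)  # assumed to be installed

# Hypothetical toy data: 3 airlines x 8 airports x 12 months, one row per cell
set.seed(1)
toy <- expand.grid(airline = c("American", "Delta", "United"),
                   origin  = paste0("AP", 1:8),
                   month   = factor(1:12))
airline_eff <- c(American = 0, Delta = -1.4, United = -0.7)  # made-up effects
origin_eff  <- rnorm(8, sd = 2)    # simulated airport-level deviations
month_eff   <- rnorm(12, sd = 1)   # simulated month-level deviations
toy$avg_delay <- 10 + unname(airline_eff[as.character(toy$airline)]) +
  origin_eff[as.integer(toy$origin)] + month_eff[as.integer(toy$month)] +
  rnorm(nrow(toy), sd = 1.5)

# Fixed airline effect plus crossed random intercepts for airport and month
fit <- lmer(avg_delay ~ airline + (1 | origin) + (1 | month), data = toy)

fe <- fixef(fit)   # fixed effects
re <- ranef(fit)   # conditional modes of the random effects
cf <- coef(fit)    # one table per grouping factor, class "coef.mer"

# coef() = fixed effects + that factor's random effects; with random
# intercepts only, the airline columns are the fixed effect repeated
print(head(cf$origin))

# Predicted delay of each airline at airport "AP3" in month 4;
# predict() adds the random effects of every grouping factor
nd <- data.frame(
  airline = factor(levels(toy$airline), levels = levels(toy$airline)),
  origin  = factor("AP3", levels = levels(toy$origin)),
  month   = factor("4", levels = levels(toy$month))
)
pred <- predict(fit, newdata = nd)
best <- as.character(nd$airline[which.min(pred)])  # lowest predicted delay
print(data.frame(airline = nd$airline, predicted_delay = unname(pred)))
print(best)
```

With the real model, the same `predict()` call on a one-row-per-airline `newdata` for a given year, month and origin airport would answer the research question directly. In that model, the airport-specific airline slopes would be included automatically.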

Linear model (lm) to test whether the airline effect depends on origin_state_abr

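# Additive model: airline and origin-state effects combine independently
lmtest1 <- lm(avg_delay ~ airline + origin_state_abr, data = data3)
# Interaction model: the airline effect may differ by origin state
lmtest2 <- lm(avg_delay ~ airline * origin_state_abr, data = data3)
# F-test of the nested models: a small p-value favours the interaction
anova(lmtest1, lmtest2)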
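To show what `anova()` computes when comparing two nested `lm` fits, here is a self-contained base-R sketch. It runs on simulated data with hypothetical names (`toy`, `m_add`, `m_int`), not the real `data3`. It reproduces the reported F statistic by hand: the drop in residual sum of squares per extra parameter, divided by the residual variance of the larger (interaction) model.

```r
# Hypothetical toy data: 3 airlines x 4 states, 20 replicate rows per cell,
# simulated with an extra United-in-TX delay (an interaction) built in
set.seed(42)
toy <- expand.grid(airline  = c("American", "Delta", "United"),
                   state    = c("CA", "NY", "TX", "FL"),
                   cell_rep = 1:20)
toy$avg_delay <- 5 + 2 * (toy$airline == "Delta") +
  3 * (toy$airline == "United" & toy$state == "TX") +
  rnorm(nrow(toy), sd = 2)

m_add <- lm(avg_delay ~ airline + state, data = toy)  # additive
m_int <- lm(avg_delay ~ airline * state, data = toy)  # with interaction
cmp <- anova(m_add, m_int)
print(cmp)

# The same F test by hand: extra RSS explained per extra parameter,
# scaled by the residual variance of the larger model
rss_add <- deviance(m_add)      # for lm, deviance() is the RSS
rss_int <- deviance(m_int)
df_add  <- df.residual(m_add)
df_int  <- df.residual(m_int)
f_manual <- ((rss_add - rss_int) / (df_add - df_int)) / (rss_int / df_int)
p_manual <- pf(f_manual, df_add - df_int, df_int, lower.tail = FALSE)
```

With 3 airlines and 4 states, the interaction adds (3 - 1) x (4 - 1) = 6 parameters, which is the `Df` reported on the second row of the `anova()` table.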