# Load libraries
library(dplyr)
library(ggplot2)
library(tidyr)
library(scales)

# Read the 2021–2023 post-COVID airline delay data
Airline_Delay_Post_COVID_2021_2023 <- read.csv("Airline_Delay_Post_COVID_2021_2023.csv")
df <- Airline_Delay_Post_COVID_2021_2023

I created four new variables:

  1. total_delay_minutes — the sum of all five delay cause categories (carrier, weather, NAS, security, and late aircraft), giving a single comprehensive measure of total delay burden per observation.

  2. delay_pct — the percentage of flights delayed more than 15 minutes, calculated as (arr_del15 / arr_flights) * 100. Using a percentage instead of a raw count avoids unfairly penalizing airlines that simply operate more flights.

  3. avg_carrier_delay — the average delay in minutes per carrier-caused delay event (carrier_delay / carrier_ct). This normalizes carrier delay by frequency so we can compare the severity of each event rather than just how many occurred. Observations where carrier_ct is zero are assigned NA to avoid division by zero.

  4. delay_rate — the proportion of arriving flights delayed 15+ minutes (arr_del15 / arr_flights), i.e. delay_pct expressed as a proportion rather than a percentage. This serves as the response variable in Pair 2 and captures airline reliability as passengers experience it.

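df <- df %>%
  mutate(
    # Sum of the five delay-cause columns (minutes)
    total_delay_minutes = carrier_delay + weather_delay +
      nas_delay + security_delay + late_aircraft_delay,
    # Percent of arriving flights delayed 15+ minutes
    delay_pct           = (arr_del15 / arr_flights) * 100,
    # Minutes per carrier-caused delay event; NA when there were no such events
    avg_carrier_delay   = ifelse(carrier_ct > 0,
                                 carrier_delay / carrier_ct,
                                 NA),
    # Same quantity as delay_pct, expressed as a proportion (0 to 1)
    delay_rate          = arr_del15 / arr_flights
  )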
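As a quick check of the formulas above, the sketch below applies the same four definitions to a few hypothetical rows (invented numbers, not from the dataset), written in base R so it runs without dplyr. It shows a carrier_ct of zero producing NA for avg_carrier_delay and, in a hypothetical zero-flight row, delay_rate becoming NaN (0 / 0). R's is.na() treats NaN as missing, so use = "complete.obs" in cor() would drop such rows if any exist in the real data.

```r
# Hypothetical toy rows (invented numbers, NOT from the real CSV)
toy <- data.frame(
  arr_flights         = c(100, 40, 0),
  arr_del15           = c(20, 10, 0),
  carrier_ct          = c(5, 0, 0),
  carrier_delay       = c(150, 0, 0),
  weather_delay       = c(30, 10, 0),
  nas_delay           = c(60, 20, 0),
  security_delay      = c(0, 0, 0),
  late_aircraft_delay = c(200, 90, 0)
)

# Same formulas as the mutate() call, in base R
toy$total_delay_minutes <- with(toy, carrier_delay + weather_delay +
  nas_delay + security_delay + late_aircraft_delay)
toy$delay_pct  <- (toy$arr_del15 / toy$arr_flights) * 100  # 0 / 0 gives NaN
toy$delay_rate <- toy$arr_del15 / toy$arr_flights
toy$avg_carrier_delay <- ifelse(toy$carrier_ct > 0,
                                toy$carrier_delay / toy$carrier_ct,
                                NA)                     # NA when no carrier events

print(toy[, c("total_delay_minutes", "delay_pct", "delay_rate",
              "avg_carrier_delay")])
```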

These new variables help us compare airlines more fairly: rates and per-event averages do not grow mechanically with the number of flights an airline operates, unlike raw counts and totals.


Pair 1: Number of Arriving Flights vs Total Delay Minutes

Variables for analysis:
Explanatory variable (X): arr_flights - original column
Response variable (Y): total_delay_minutes - created variable

This direction is appropriate because each additional arriving flight is another opportunity for delay, so flight volume plausibly drives the total delay minutes accumulated rather than the reverse.

Visualization

ggplot(df, aes(x = arr_flights, y = total_delay_minutes)) +
  geom_point(alpha = 0.2, size = 1.2, color = "steelblue") +
  scale_x_continuous(labels = scales::comma) +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title    = "Number of Arriving Flights vs Total Delay Minutes",
    subtitle = "Each point = one carrier–airport–month observation (2021–2023)",
    x        = "Number of Arriving Flights",
    y        = "Total Delay Minutes"
  ) +
  theme_minimal(base_size = 15)

The scatterplot shows a strong positive relationship: as the number of arriving flights increases, total delay minutes also increase. The pattern appears approximately linear, supporting the use of Pearson’s correlation coefficient. Variability is slightly higher for large hubs, but the linear trend still holds.

A few high-volume carrier–airport–month observations have extreme total delay minutes, but they appear to follow the overall trend rather than depart from it, so they are unlikely to distort the linear relationship.

Correlation

r1 <- cor(df$arr_flights, df$total_delay_minutes, use = "complete.obs")
cat("Pearson r (arr_flights vs total_delay_minutes):", round(r1, 4))
## Pearson r (arr_flights vs total_delay_minutes): 0.9004

The correlation is strong and positive (r ≈ 0.90), consistent with the nearly linear pattern observed in the scatterplot. Total delay minutes largely track operational volume: observations with more arriving flights accumulate more delay minutes.
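Two quick additions can make this result more concrete: r² and a confidence interval for the population correlation. With the reported r ≈ 0.90, r² ≈ 0.9004² ≈ 0.81, so roughly 81% of the variation in total_delay_minutes is linearly associated with arr_flights. On the real data, cor.test(df$arr_flights, df$total_delay_minutes) would report an interval for the correlation based on Fisher's z transformation. The sketch below runs on simulated stand-in data (an assumption, so the block is self-contained) and also confirms that r² equals the R² of a simple linear regression.

```r
# Simulated stand-in data (NOT the real file): delay minutes grow roughly in
# proportion to flight counts, plus noise
set.seed(1)
n_obs   <- 500
flights <- round(runif(n_obs, min = 10, max = 2000))
minutes <- 4 * flights + rnorm(n_obs, mean = 0, sd = 800)

r         <- cor(flights, minutes)
r_squared <- r^2
fit_r2    <- summary(lm(minutes ~ flights))$r.squared       # equals r^2 here
ct        <- cor.test(flights, minutes, method = "pearson")  # Fisher-z CI

cat("r =", round(r, 3), " r^2 =", round(r_squared, 3), "\n")
cat("95% CI for the correlation:", round(ct$conf.int, 3), "\n")
```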

Confidence Interval for Mean Total Delay Minutes

t.test(df$total_delay_minutes)
## 
##  One Sample t-test
## 
## data:  df$total_delay_minutes
## t = 70.298, df = 44876, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  3948.316 4174.800
## sample estimates:
## mean of x 
##  4061.558

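We can be 95% confident that the true mean total delay per carrier–airport–month observation lies between approximately 3,948 and 4,175 minutes, with a point estimate of about 4,062 minutes. Because a few very large operations accumulate extreme delay totals, this mean likely sits well above what a typical observation experiences; with nearly 45,000 observations, the t interval relies on the central limit theorem rather than on normally distributed delay totals.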
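The interval printed by t.test() is the sample mean ± t* × SE, where SE = s / √n. The reported output is consistent with this: SE = 4061.558 / 70.298 ≈ 57.8 minutes, and 1.96 × 57.8 ≈ 113, matching the interval's half-width ((4174.800 − 3948.316) / 2 ≈ 113.2). The sketch below reproduces the t.test() interval by hand on simulated, right-skewed stand-in data (an exponential distribution with mean 4,000 is an illustrative assumption, not a model of the real data).

```r
# Simulated right-skewed stand-in for total_delay_minutes (NOT the real data)
set.seed(42)
x <- rexp(2000, rate = 1 / 4000)

n      <- length(x)
x_bar  <- mean(x)
se     <- sd(x) / sqrt(n)          # standard error of the mean
t_crit <- qt(0.975, df = n - 1)    # two-sided 95% critical value

ci_manual  <- c(x_bar - t_crit * se, x_bar + t_crit * se)
ci_builtin <- as.numeric(t.test(x)$conf.int)  # drop the conf.level attribute

print(rbind(manual = ci_manual, t.test = ci_builtin))
```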

Pair 2: Number of Arriving Flights vs Delay Rate

Variables for analysis, using the delay_rate variable created above:

Explanatory (X): arr_flights - original column
Response (Y): delay_rate - created variable

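arr_flights is used as X instead of arr_del15 because arr_del15 is the numerator of delay_rate: pairing a variable with a ratio built from it would produce a correlation that partly reflects the formula rather than a real relationship.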

Visualization

ggplot(df, aes(x = arr_flights, y = delay_rate)) +
  geom_jitter(alpha = 0.2, size = 1.0, color = "darkorange",
              width = 10, height = 0.002) +
  scale_x_continuous(labels = scales::comma) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title    = "Number of Arriving Flights vs Delay Rate",
    subtitle = "Each point = one carrier–airport–month observation (2021–2023)",
    x        = "Number of Arriving Flights",
    y        = "Delay Rate (Proportion of Flights Delayed)"
  ) +
  theme_minimal(base_size = 13)

The scatterplot displays a clear funnel-shaped pattern. For small operations (low arr_flights), delay rates vary widely, ranging from near 0% to well above 50%. In contrast, larger operations cluster within a narrower band, generally around 15–30% delay rates. This indicates that while smaller carrier–airport combinations experience substantial month-to-month variability, larger operations exhibit more stable and consistent delay proportions. Importantly, the plot does not show a systematic increase in delay rate as flight volume grows; instead, variability decreases with scale.
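A funnel of this shape is also what binomial sampling noise alone would produce. If every operation had the same underlying delay probability p, the observed rate from n flights would have standard error √(p(1 − p) / n), which shrinks as n grows. The simulation below assumes a single p = 0.19 (the mean delay rate reported later) purely for illustration; it does not claim that real operations share one p, only that at least part of the narrowing could be a sample-size effect.

```r
# Illustrative assumption: every operation has the same true delay
# probability p; only the number of flights differs
set.seed(7)
p      <- 0.19
n_reps <- 2000

small_ops <- rbinom(n_reps, size = 20,   prob = p) / 20    # 20 flights each
large_ops <- rbinom(n_reps, size = 2000, prob = p) / 2000  # 2,000 flights each

# Theoretical standard error of an observed proportion
se_theory <- function(n) sqrt(p * (1 - p) / n)

print(data.frame(
  flights   = c(20, 2000),
  sd_sim    = c(sd(small_ops), sd(large_ops)),
  se_theory = c(se_theory(20), se_theory(2000))
))
```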

A small number of observations show extremely high delay rates, particularly among low-volume operations. These points occur where a limited number of flights experienced unusually high disruption within a single month. Because they are concentrated among small operations and do not form a broader pattern, they do not alter the overall relationship observed in the plot.

Correlation

r2 <- cor(df$arr_flights, df$delay_rate, use = "complete.obs")
cat("Pearson r (arr_flights vs delay_rate):", round(r2, 4))
## Pearson r (arr_flights vs delay_rate): -0.0031

The correlation between arr_flights and delay_rate is essentially zero (r ≈ -0.003), matching what the scatterplot suggests: operational scale does not predict delay proportion. Pearson's r measures only linear association, so it does not capture the narrowing spread visible in the funnel. While larger operations accumulate more total delay minutes, they are not systematically more delayed on a per-flight basis, which is why delay_rate is a more meaningful performance measure than raw delay minutes.
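Whether r = -0.0031 is distinguishable from zero can be checked with the standard test for a Pearson correlation, t = r √((n − 2) / (1 − r²)) on n − 2 degrees of freedom, which is what cor.test() reports. In the sketch below, cor_t_test is a hypothetical helper, and n = 44,865 is an assumption inferred from the delay_rate t-test (df = 44864); the exact number of complete pairs depends on how missing values line up, and cor.test(df$arr_flights, df$delay_rate) on the real data would give the exact result. Under that assumption |t| is well below 1 and the p-value is far above 0.05. A simulated check confirms that the formula matches cor.test().

```r
# Hypothetical helper: t statistic and two-sided p-value for H0: rho = 0
cor_t_test <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  p_val  <- 2 * pt(-abs(t_stat), df = n - 2)
  c(t = t_stat, p = p_val)
}

# Reported r; n is an ASSUMPTION inferred from the delay_rate t-test df
res <- cor_t_test(r = -0.0031, n = 44865)
print(round(res, 4))

# Sanity check on simulated data: the formula matches cor.test()
set.seed(3)
a <- runif(1000)
b <- rnorm(1000)
check <- cor_t_test(cor(a, b), length(a))
ct    <- cor.test(a, b)
print(rbind(manual = check, cor.test = c(ct$statistic, ct$p.value)))
```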

Confidence Interval for Mean Delay Rate

t.test(df$delay_rate)
## 
##  One Sample t-test
## 
## data:  df$delay_rate
## t = 351.3, df = 44864, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.1904073 0.1925439
## sample estimates:
## mean of x 
## 0.1914756

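The 95% confidence interval for the mean delay rate is approximately 19.0% to 19.3%. In other words, the average carrier–airport–month observation had about one in five arriving flights delayed by 15 minutes or more during 2021–2023. Because each observation counts equally regardless of how many flights it contains, this is not necessarily the same as the share of all flights that were delayed. The narrow interval reflects the large sample size, although it treats observations as independent even though the same carrier–airport pairs recur across months.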
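The difference between the two averages is easy to see on a toy example with hypothetical counts: one small and one large carrier–airport–month. The mean of per-observation rates (what t.test(df$delay_rate) estimates) weights both equally, while the pooled share of delayed flights, sum(arr_del15) / sum(arr_flights), equals a flight-weighted mean of the rates. On the real data the pooled figure would be computed after dropping rows where either column is missing, for example with ok <- complete.cases(df$arr_del15, df$arr_flights). Which of the two is larger in the real data is not implied by this sketch.

```r
# Hypothetical toy data: one small and one large carrier-airport-month
arr_flights <- c(10, 1000)
arr_del15   <- c(5, 150)
delay_rate  <- arr_del15 / arr_flights      # 0.50 and 0.15

# Mean of per-observation rates: every observation weighted equally
unweighted <- mean(delay_rate)

# Share of all flights delayed: pool the counts first
pooled   <- sum(arr_del15) / sum(arr_flights)
weighted <- weighted.mean(delay_rate, w = arr_flights)  # same as pooled

cat("unweighted:", unweighted, " pooled:", round(pooled, 4), "\n")
```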

Overall Insights

  1. There is a strong positive linear relationship between the number of arriving flights and total delay minutes (r ≈ 0.90), indicating that delay accumulation largely tracks operational volume.
  2. In contrast, operational scale does not predict delay rate (r ≈ -0.003). Larger operations are not systematically more or less reliable on a per-flight basis, although their delay rates vary less from month to month.
  3. The mean delay rate of approximately 19% across carrier–airport–month observations provides a practical system-wide benchmark for airline reliability during 2021–2023.

Further Questions

  1. Do certain carriers consistently operate above or below the 19% baseline?
  2. Does seasonality (month) significantly affect delay rates?
  3. Are specific airports contributing disproportionately to extreme delay observations?
  4. Has system-wide reliability improved from 2021 to 2023?

Conclusion

This analysis highlights the distinction between operational volume and operational performance. While larger operations generate more total delay minutes, delay proportion remains relatively stable across scale. By combining visualization, correlation analysis, and interval estimation, we obtain a clearer understanding of airline reliability during the post-COVID recovery period.