Data Dive: Time Series Analysis

Introduction

This data dive explores how flight delays change over time using the nycflights13 dataset, which records flights departing New York City airports in 2013. By converting the date information into a proper date format and analyzing trends and seasonality, we aim to understand how delays vary across the year.

library(tidyverse)
library(nycflights13)
library(lubridate)
library(tsibble)

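# Build a Date column and keep only flights with a recorded arrival delay
# (rows with NA arr_delay are typically cancelled or diverted flights)
df <- flights |>
  mutate(date = make_date(year, month, day)) |>
  filter(!is.na(arr_delay))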

Response Variable

Response Variable: arr_delay (arrival delay in minutes; negative values indicate early arrivals)

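# Collapse individual flights to one mean arrival delay per day
daily_delay <- df |>
  group_by(date) |>
  summarise(
    mean_delay = mean(arr_delay),
    .groups = "drop"
  )

# Convert to a tsibble so that date is treated as the time index
delay_ts <- daily_delay |>
  as_tsibble(index = date)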

delay_ts
## # A tsibble: 365 x 2 [1D]
##    date       mean_delay
##    <date>          <dbl>
##  1 2013-01-01     12.7  
##  2 2013-01-02     12.7  
##  3 2013-01-03      5.73 
##  4 2013-01-04     -1.93 
##  5 2013-01-05     -1.53 
##  6 2013-01-06      4.24 
##  7 2013-01-07     -4.95 
##  8 2013-01-08     -3.23 
##  9 2013-01-09     -0.264
## 10 2013-01-10     -5.90 
## # ℹ 355 more rows

delay_ts |>
  ggplot(aes(x = date, y = mean_delay)) +
  geom_line() +
  labs(
    title = "Average Arrival Delay Over Time",
    x = "Date",
    y = "Mean Arrival Delay (minutes)"
  ) +
  theme_classic()

Interpretation

The time series plot shows that average arrival delay fluctuates throughout the year rather than remaining constant. Sharp spikes and dips dominate the series and no steady upward or downward drift is visible, which points to short-term factors rather than a gradual trend. The plot alone cannot identify the causes, but external conditions such as weather, travel demand, and operational disruptions are plausible contributors.

Analyzing Different Seasons

season_data <- daily_delay |>
  mutate(season = case_when(
    month(date) %in% c(12, 1, 2) ~ "Winter",
    month(date) %in% c(3, 4, 5) ~ "Spring",
    month(date) %in% c(6, 7, 8) ~ "Summer",
    month(date) %in% c(9, 10, 11) ~ "Fall"
  ))

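# group = 1 keeps a single continuous line; without it, each season is drawn
# as its own line and Winter would connect late February straight to December
season_data |>
  ggplot(aes(x = date, y = mean_delay, color = season, group = 1)) +
  geom_line() +
  labs(
    title = "Average Arrival Delay by Season",
    x = "Date",
    y = "Mean Arrival Delay (minutes)"
  ) +
  theme_classic()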

Interpretation of Seasonal Patterns

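Comparing delays across seasons shows that delay patterns vary throughout the year. Summer appears to have higher variability and more frequent spikes in delay, while fall shows relatively lower delays. Winter and spring exhibit moderate fluctuations. Note that, because the data cover a single calendar year, the "Winter" group combines January and February with December 2013 rather than one continuous winter.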

This suggests that seasonal factors, such as increased travel demand in summer or weather conditions in winter, may influence delay behavior. Therefore, delay patterns are not uniform across the year.
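
The visual comparison above can be backed by a per-season summary of the daily means. The following is a minimal sketch, not part of the original analysis: `demo_delay` is a simulated stand-in with the same columns as `daily_delay`, and the column names `n_days`, `avg_delay`, and `sd_delay` are chosen here for illustration.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for daily_delay: one simulated mean delay per day of 2013.
# Swap in the real daily_delay to reproduce the comparison on the flight data.
set.seed(1)
demo_delay <- data.frame(
  date = seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day"),
  mean_delay = rnorm(365, mean = 5, sd = 15)
)

season_summary <- demo_delay |>
  mutate(season = case_when(
    month(date) %in% c(12, 1, 2) ~ "Winter",
    month(date) %in% c(3, 4, 5) ~ "Spring",
    month(date) %in% c(6, 7, 8) ~ "Summer",
    month(date) %in% c(9, 10, 11) ~ "Fall"
  )) |>
  group_by(season) |>
  summarise(
    n_days = n(),                 # number of days in the season
    avg_delay = mean(mean_delay), # average of the daily mean delays
    sd_delay = sd(mean_delay),    # day-to-day variability within the season
    .groups = "drop"
  )

season_summary
```

On the real data, a larger `sd_delay` for summer would support the claim of higher variability, and a lower `avg_delay` for fall would support the claim of lower delays.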

Linear Regression

trend_model <- lm(mean_delay ~ as.numeric(date), data = daily_delay)
summary(trend_model)
## 
## Call:
## lm(formula = mean_delay ~ as.numeric(date), data = daily_delay)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -26.906 -11.083  -5.328   5.207  78.079 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)      113.475652 136.476416   0.831    0.406
## as.numeric(date)  -0.006701   0.008590  -0.780    0.436
## 
## Residual standard error: 17.29 on 363 degrees of freedom
## Multiple R-squared:  0.001674,   Adjusted R-squared:  -0.001076 
## F-statistic: 0.6086 on 1 and 363 DF,  p-value: 0.4358

Interpretation of Trend Model

The linear regression model shows no statistically significant linear trend in average arrival delay over the year. The estimated slope is -0.0067 minutes per day (roughly a 2.4-minute decrease across the whole year), and its p-value of 0.436 is well above 0.05, so we fail to detect a meaningful upward or downward trend.

Additionally, the R² value is extremely low (0.0017), meaning that a straight-line time trend explains less than 0.2% of the variation in daily mean delay. This suggests that delays fluctuate rather than consistently increasing or decreasing over the year.
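
One detail of the model output is worth unpacking: `as.numeric(date)` counts days since 1970-01-01, so the intercept of about 113 is an extrapolation back to 1970 and has no practical meaning. The sketch below, which uses a simulated stand-in (`demo_delay`) and a hypothetical `day_index` column, shows that re-expressing time as days since the first observation changes only the intercept, and that the slope can then be converted into a total change over the year.

```r
# Hypothetical stand-in for daily_delay (simulated, with no trend built in)
set.seed(42)
demo_delay <- data.frame(
  date = seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day")
)
demo_delay$mean_delay <- rnorm(nrow(demo_delay), mean = 5, sd = 15)

# Days elapsed since the first observation, so the intercept becomes the
# fitted delay on the first day instead of an extrapolation to 1970-01-01
demo_delay$day_index <- as.numeric(demo_delay$date - min(demo_delay$date))

raw_fit      <- lm(mean_delay ~ as.numeric(date), data = demo_delay)
centered_fit <- lm(mean_delay ~ day_index, data = demo_delay)

# Shifting the time origin leaves the slope unchanged
raw_slope     <- unname(coef(raw_fit)[2])
slope_per_day <- unname(coef(centered_fit)["day_index"])
all.equal(raw_slope, slope_per_day, tolerance = 1e-6)

# Implied change in fitted delay between the first and last day (minutes)
total_change <- slope_per_day * max(demo_delay$day_index)
total_change
```

Applied to the real `daily_delay`, the same conversion gives the roughly 2.4-minute decrease implied by the reported slope of -0.006701 over 364 days.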

Plot Smoothed Trend

delay_ts |>
  ggplot(aes(x = date, y = mean_delay)) +
  geom_point(alpha = 0.3) +
  geom_smooth(se = FALSE) +
  labs(
    title = "Smoothed Trend of Arrival Delay",
    x = "Date",
    y = "Mean Arrival Delay (minutes)"
  ) +
  theme_classic()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Interpretation of Smoothed Trend

The smoothed trend reveals a nonlinear pattern in delays over time. Delays increase slightly during the middle of the year, decrease toward the fall, and then rise again toward the end of the year.

This indicates that while there is no strong linear trend, there are underlying seasonal or cyclical patterns influencing delays.
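
To inspect the smoothed curve numerically rather than only visually, the loess fit can be computed directly. This is a sketch under assumptions: it uses `stats::loess()` with `span = 0.75`, which is the `loess()` default and, as far as I know, what `geom_smooth()` uses for series of this length. The data are simulated, and `day_index` and `smoothed` are illustrative names.

```r
# Simulated series with a mid-year bump plus noise (illustration only)
set.seed(7)
dates <- seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day")
day_index <- as.numeric(dates - min(dates))
demo_delay <- data.frame(
  date = dates,
  day_index = day_index,
  mean_delay = 5 + 8 * sin(pi * day_index / 364) + rnorm(365, sd = 10)
)

# Fit the local regression smoother on the numeric day index
smooth_fit <- loess(mean_delay ~ day_index, data = demo_delay, span = 0.75)

# Fitted (smoothed) value for every observed day
demo_delay$smoothed <- predict(smooth_fit)

# Date at which the smoothed curve reaches its maximum
peak_date <- demo_delay$date[which.max(demo_delay$smoothed)]
peak_date
```

A smaller `span` follows the data more closely and a larger one gives a flatter curve, so the apparent timing of rises and falls depends on that choice.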

Seasonality Using ACF

acf(daily_delay$mean_delay, main = "ACF of Mean Arrival Delay")

Interpretation of ACF

The ACF plot shows significant correlation at small lags, indicating that delay values are related to recent past values. There are also smaller repeating peaks at higher lags, suggesting the presence of weak periodic patterns.

This indicates that delays are not independent over time and may follow short-term cycles, such as weekly patterns.
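
Whether a weekly cycle is present can be checked by reading the autocorrelation at lag 7 and comparing it with the approximate white-noise bound that `acf()` draws as dashed lines. The sketch below uses a simulated series with a deliberate 7-day cycle, so the names and values are illustrative rather than results from the flight data.

```r
# Simulated series with a built-in 7-day cycle (illustration only)
set.seed(123)
n_days <- 364
t <- seq_len(n_days)
demo_series <- 10 * sin(2 * pi * t / 7) + rnorm(n_days, sd = 1)

# Compute the autocorrelations without plotting, out to four weeks
acf_values <- acf(demo_series, lag.max = 28, plot = FALSE)

# Position 1 holds lag 0, so lag k sits at position k + 1
lag7_acf <- acf_values$acf[8]

# Approximate 95% bound under the white-noise assumption
white_noise_bound <- qnorm(0.975) / sqrt(n_days)

c(lag7 = lag7_acf, bound = white_noise_bound)
```

For the real series, the same check would use `daily_delay$mean_delay` in place of `demo_series`. Peaks at lags 7, 14, 21, and 28 that all exceed the bound would be stronger evidence of a weekly pattern than a single peak.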

pacf(daily_delay$mean_delay, main = "PACF of Mean Arrival Delay")

Interpretation of PACF

The PACF plot shows that only a few early lags have significant partial correlation, while later lags are mostly insignificant. This suggests that most of the dependence in the data can be explained by recent observations rather than long-term dependencies.

Overall, the time series shows short-term correlation but limited long-term structure.
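
A PACF that cuts off after a few lags is the usual signature of a low-order autoregressive process. One way to make that reading concrete, sketched here on a simulated AR(1) series rather than the flight data, is to let `ar()` choose the order by AIC. The upper limit of 14 lags is an arbitrary choice for illustration.

```r
# Simulated AR(1) series standing in for daily_delay$mean_delay
set.seed(2013)
demo_series <- as.numeric(arima.sim(model = list(ar = 0.6), n = 365))

# Let AIC pick the autoregressive order from 0 to 14
# (ar() uses Yule-Walker estimation by default)
ar_fit <- ar(demo_series, order.max = 14, aic = TRUE)

ar_fit$order  # number of lags selected
ar_fit$ar     # estimated AR coefficients, one per selected lag
```

If the selected order on the real series is small, that agrees with the interpretation that recent days carry most of the predictive information.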

Insights and Significance

The analysis shows that arrival delays vary over time and are influenced more by short-term fluctuations and seasonal patterns than by a long-term trend. While there is no strong evidence of delays increasing or decreasing over the year, there are clear periods of higher variability, particularly during summer.

These patterns are consistent with external and recurring factors such as weather, travel demand, and operational congestion, rather than a steady change over time, although this analysis does not test those factors directly. Understanding these patterns is important for anticipating periods of higher delay risk.

Further Questions

  • Do weather conditions explain the spikes observed in certain periods?
  • Are delays more strongly seasonal at specific airports?
  • Do certain airlines perform differently across seasons?
  • Is there a weekly pattern in delays that could be modeled more explicitly?
  • Can time series forecasting models improve prediction of delays?

Conclusion

This time series analysis shows that arrival delays vary over time but do not follow a strong long-term upward or downward trend. Instead, delays are driven by short-term fluctuations and seasonal patterns, with certain periods showing higher variability than others. The regression results confirm that time alone does not explain much of the variation in delays, while the smoothed trend and ACF/PACF plots suggest the presence of cyclical behavior.

Overall, this indicates that delays are influenced more by recurring external factors such as weather, travel demand, and operational conditions than by a steady trend over time. Understanding these patterns is important for better planning and forecasting, and future analysis could incorporate additional variables to improve predictive accuracy.