Time Series Linear Regression (TSLM) with fpp3

Author

Published

February 28, 2026

Introduction

Time series linear regression is a fundamental technique for modeling time series data with deterministic patterns like trends and seasonality. Unlike simple linear regression, time series regression must account for the temporal dependencies in the data.

In this analysis, we’ll examine the classic AirPassengers dataset - monthly international airline passenger numbers from 1949-1960. We’ll build three increasingly sophisticated models:

Model 1: Quadratic trend only
Model 2: Quadratic trend + seasonal dummies
Model 3: Log-linear with trend + seasonality

Setup and Data Preparation

suppressPackageStartupMessages({
  library(fpp3)      # Main time series package
  library(patchwork) # For combining plots
  library(broom)     # For model summaries
  library(stargazer) # For regression tables
})

?AirPassengers
# Load and prepare the AirPassengers data
AP_data <- AirPassengers %>%
  as_tsibble() %>%
  mutate(
    t = row_number(),           # Time trend variable
    t_squared = t^2,            # Quadratic trend
    log_value = log(value)      # Log transformation
  )

# Display first few rows
head(AP_data)

The as_tsibble() function converts the classic R time series object into the modern fpp3 format. We create additional variables for modeling:

t: Linear time trend (1, 2, 3, …)
t_squared: Quadratic trend for curved relationships
log_value: Log transformation for multiplicative patterns

Exploratory Data Analysis

??ggplot2
??ggplot2::autoplot

# Create side-by-side plots
p1 <- AP_data %>% 
  autoplot(value) + 
  labs(title = "Original AirPassengers Data", 
       y = "Passengers (thousands)",
       x = "Year")

p2 <- AP_data %>% 
  autoplot(log_value) + 
  labs(title = "Log-Transformed Data", 
       y = "Log(Passengers)",
       x = "Year")

# Display plots together
p1 / p2

Key Observations:

Original data shows clear upward trend and seasonal pattern
Seasonality appears multiplicative - seasonal fluctuations increase over time
Log transformation converts multiplicative seasonality to additive
Both show strong annual cycles with summer peaks

ACF/PACF Analysis for Model Identification

https://towardsdatascience.com/interpreting-acf-and-pacf-plots-for-time-series-forecasting-af0d6db4061c/

# Analyze autocorrelation patterns
AP_data %>% 
  gg_tsdisplay(value, plot_type = "partial", lag_max = 36) +
  labs(title = "ACF/PACF Analysis: Original Series")

Warning: `gg_tsdisplay()` was deprecated in feasts 0.4.2.
ℹ Please use `ggtime::gg_tsdisplay()` instead.

ACF/PACF Interpretation:

ACF (top): Slow decay + seasonal spikes at lags 12, 24, 36 → non-stationary with trend and seasonality
PACF (bottom): Strong spike at lag 1, then smaller oscillations → suggests autoregressive structure
Seasonal pattern: Clear 12-month cycle confirms annual seasonality
Conclusion: Series needs detrending and deseasoning before ARIMA modeling

# ACF/PACF of log-transformed data
AP_data %>% 
  gg_tsdisplay(log_value, plot_type = "partial", lag_max = 36) +
  labs(title = "ACF/PACF Analysis: Log-Transformed Series")

Log-Transformed Analysis:

Log transformation stabilizes variance but doesn’t remove trend/seasonality
ACF pattern similar to original, confirming need for regression approach (explicit modeling of these components i.e. regression with trend + seasonal terms)
This motivates our Model 3 (log-linear regression)

Model Development

Model 1: Quadratic Trend Only

The first model captures only the long-term growth pattern: \[AP_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \epsilon_t\]

?fable::TSLM

# Fit Model 1: Quadratic trend
model1 <- AP_data %>% 
  model(Model1 = TSLM(formula = value ~ trend() + I(trend()^2)
                      )
        )

# Display model summary
report(model1)

Series: value 
Model: TSLM 

Residuals:
     Min       1Q   Median       3Q      Max 
-100.353  -27.339   -7.442   21.603  146.116 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.124e+02  1.138e+01   9.872  < 2e-16 ***
trend()      1.641e+00  3.625e-01   4.527 1.26e-05 ***
I(trend()^2) 7.008e-03  2.421e-03   2.894  0.00441 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 44.91 on 141 degrees of freedom
Multiple R-squared: 0.8618, Adjusted R-squared: 0.8599
F-statistic: 439.8 on 2 and 141 DF, p-value: < 2.22e-16

Model 1 Interpretation:

Intercept: Starting level of passengers
trend(): Linear growth rate per month
I(trend()^2): Quadratic term captures accelerating growth
R-squared: Proportion of variance explained by trend alone

# Extract fitted values and residuals
aug1 <- augment(model1)

# Plot fitted vs actual
autoplot(AP_data, value) +
  autolayer(aug1, .fitted, color = "red", size = 1) +
  labs(title = "Model 1: Quadratic Trend Fit",
       y = "Passengers (thousands)",
       subtitle = "Red line shows fitted values")

Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
ℹ The deprecated feature was likely used in the fabletools package.
  Please report the issue at <https://github.com/tidyverts/fabletools/issues>.

Model 1 Assessment:

Captures overall upward trend reasonably well
Missing seasonal patterns - systematic over/under-prediction
Residuals will show strong seasonal autocorrelation

Model 2: Trend + Seasonality

Model 2 adds seasonal dummy variables to capture monthly effects: \[AP_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \gamma_1 dm1_t + ... + \gamma_{11} dm11_t + \epsilon_t\]

# Fit Model 2: Trend + seasonal dummies
model2 <- AP_data %>% 
  model(Model2 = TSLM(value ~ trend() + I(trend()^2) + season()
                      )
        )

# Display model summary  
report(model2)

Series: value 
Model: TSLM 

Residuals:
     Min       1Q   Median       3Q      Max 
-46.1215 -13.3935   0.8246  12.7327  75.7730 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     88.557838   8.799762  10.064  < 2e-16 ***
trend()          1.625504   0.191722   8.478 4.34e-14 ***
I(trend()^2)     0.007137   0.001281   5.573 1.38e-07 ***
season()year2   -9.338962   9.694487  -0.963 0.337172    
season()year3   23.224469   9.694859   2.396 0.018021 *  
season()year4   17.523627   9.695469   1.807 0.073012 .  
season()year5   19.641845   9.696310   2.026 0.044842 *  
season()year6   56.829122   9.697379   5.860 3.59e-08 ***
season()year7   93.835460   9.698673   9.675  < 2e-16 ***
season()year8   90.910857   9.700192   9.372 2.91e-16 ***
season()year9   39.555314   9.701939   4.077 7.90e-05 ***
season()year10   1.018831   9.703917   0.105 0.916544    
season()year11 -35.448592   9.706131  -3.652 0.000376 ***
season()year12  -9.180288   9.708591  -0.946 0.346115    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 23.75 on 130 degrees of freedom
Multiple R-squared: 0.9644, Adjusted R-squared: 0.9608
F-statistic: 270.8 on 13 and 130 DF, p-value: < 2.22e-16

Model 2 Features:

season() automatically creates 11 dummy variables (December = reference)
Seasonal coefficients show relative difference vs December
Much higher R-squared indicates better fit

# Extract fitted values and residuals
aug2 <- augment(model2)

# Plot fitted vs actual
autoplot(AP_data, value) +
  autolayer(aug2, .fitted, color = "blue", size = 1) +
  labs(title = "Model 2: Trend + Seasonal Fit",
       y = "Passengers (thousands)",
       subtitle = "Blue line shows fitted values")

Model 2 Assessment: - Excellent fit - captures both trend and seasonality - Residuals should show much less autocorrelation - Still assumes additive seasonality

Model 3: Log-Linear Regression

Model 3 uses log transformation to handle multiplicative seasonality: \[\log(AP_t) = \beta_0 + \beta_1 t + \beta_2 t^2 + \gamma_1 dm1_t + ... + \gamma_{11} dm11_t + \epsilon_t\]

# Fit Model 3: Log-linear with trend + seasonality
model3 <- AP_data %>% 
  model(Model3 = TSLM(log(value) ~ trend() + I(trend()^2) + season()
                      )
        )

# Display model summary
report(model3)

Series: value 
Model: TSLM 
Transformation: log(value) 

Residuals:
     Min       1Q   Median       3Q      Max 
-0.12748 -0.03709  0.00418  0.03197  0.11529 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     4.651e+00  1.786e-02 260.410  < 2e-16 ***
trend()         1.318e-02  3.892e-04  33.877  < 2e-16 ***
I(trend()^2)   -2.148e-05  2.599e-06  -8.265 1.41e-13 ***
season()year2  -2.227e-02  1.968e-02  -1.132 0.259839    
season()year3   1.078e-01  1.968e-02   5.477 2.15e-07 ***
season()year4   7.639e-02  1.968e-02   3.882 0.000164 ***
season()year5   7.393e-02  1.968e-02   3.756 0.000259 ***
season()year6   1.960e-01  1.968e-02   9.959  < 2e-16 ***
season()year7   3.000e-01  1.969e-02  15.238  < 2e-16 ***
season()year8   2.907e-01  1.969e-02  14.765  < 2e-16 ***
season()year9   1.462e-01  1.969e-02   7.423 1.33e-11 ***
season()year10  8.145e-03  1.970e-02   0.414 0.679912    
season()year11 -1.354e-01  1.970e-02  -6.873 2.36e-10 ***
season()year12 -2.132e-02  1.971e-02  -1.082 0.281286    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0482 on 130 degrees of freedom
Multiple R-squared: 0.9892, Adjusted R-squared: 0.9881
F-statistic: 912.7 on 13 and 130 DF, p-value: < 2.22e-16

Model 3 Advantages:

Multiplicative seasonality: Seasonal effects grow with trend
Percentage interpretation: Coefficients represent % changes
Variance stabilization: More homoscedastic residuals

# Extract fitted values with back-transformation
#aug3 <- model3 %>%
#  augment() %>%
#  mutate(fitted_original = exp(.fitted))  # Back-transform to original scale

# Since .fitted is already on original scale, use it directly:
aug3 <- model3 %>%
  augment()  

# Plot without back-transformation
autoplot(AP_data, value) +
  autolayer(aug3, .fitted, color = "green", size = 1) +  # Use .fitted directly
  labs(title = "Model 3: Log-Linear Fit",
       y = "Passengers (thousands)",
       subtitle = "Fitted values already on original scale")

Model 3 Assessment:

Best visual fit for multiplicative pattern
Handles increasing seasonal variation appropriately
Residuals on log scale should be most well-behaved

The stargazer function isn’t working because fpp3 model objects have a different structure than standard R regression objects. str(model1) output shows a complex nested list structure that stargazer doesn’t recognize.

str(model1)
stargazer(model1, model2, model3, type = "text", 
          title = "Regression Results", 
          dep.var.labels = c("Passengers", "Passengers", "Log(Passengers)"),
          column.labels = c("Model 1", "Model 2", "Model 3"),
          omit.stat = c("f", "ser"), 
          no.space = TRUE)

So use alternate command to extract and display coefficients.

# Get model summaries in a clean format
model_comparison <- bind_rows(
  glance(model1) %>% mutate(Model = "Model 1"),
  glance(model2) %>% mutate(Model = "Model 2"),
  glance(model3) %>% mutate(Model = "Model 3")
)

model_comparison

Residual Diagnostics

Residual analysis is crucial for validating model assumptions and identifying remaining patterns.

# Model 1 residuals
aug1 %>% 
  gg_tsdisplay(.resid, plot_type = "partial", lag_max = 24) + 
  labs(title = "Model 1 Residuals: Strong Seasonal Pattern Remains")

# Model 2 residuals  
aug2 %>% 
  gg_tsdisplay(.resid, plot_type = "partial", lag_max = 24) + 
  labs(title = "Model 2 Residuals: Much Improved")

# Model 3 residuals (on log scale)
aug3 %>% 
  gg_tsdisplay(.resid, plot_type = "partial", lag_max = 24) + 
  labs(title = "Model 3 Residuals: Cleanest Pattern")

Residual Interpretation:

Model 1: Strong seasonal spikes in ACF at lags 12, 24 → seasonality not captured - Model 2: Seasonal spikes largely removed → successful deseasonalization
Model 3: Cleanest ACF/PACF pattern → best model for residual behavior

Model Comparison

# Calculate accuracy metrics for all models
acc <- bind_rows(
  accuracy(model1) %>% mutate(Model = "Model 1: Trend Only"),
  accuracy(model2) %>% mutate(Model = "Model 2: Trend + Seasonal"),
  accuracy(model3) %>% mutate(Model = "Model 3: Log-Linear")
)

# Display key accuracy measures
acc %>% 
  select(Model,  RMSE, MAE, MAPE, MASE) %>%
  knitr::kable(digits = 3, 
               caption = "Model Accuracy Comparison")

Model Accuracy Comparison
Model	RMSE	MAE	MAPE	MASE
Model 1: Trend Only	44.435	32.842	11.264	1.025
Model 2: Trend + Seasonal	22.562	17.579	7.273	0.549
Model 3: Log-Linear	13.119	10.001	3.696	0.312

Accuracy Metrics Explained:

RMSE: Root Mean Square Error (lower = better)
MAE: Mean Absolute Error (lower = better)
MAPE: Mean Absolute Percentage Error (lower = better)
MASE: Mean Absolute Scaled Error (< 1 = better than naive forecast)

# Custom R-squared function for time series models / or pull model metric from glance command

rsq <- function(mdl) {
  aug <- augment(mdl)
  1 - sum(aug$.resid^2) / sum((aug$value - mean(aug$value))^2)
}

# Calculate pseudo R-squared for each model
r2_comparison <- tibble(
  Model = c("Model 1: Trend Only", 
            "Model 2: Trend + Seasonal", 
            "Model 3: Log-Linear"),
  R_squared = c(rsq(model1), rsq(model2), rsq(model3))
)

r2_comparison %>%
  knitr::kable(digits = 4, 
               caption = "Model R-squared Comparison")

Model R-squared Comparison
Model	R_squared
Model 1: Trend Only	0.8618
Model 2: Trend + Seasonal	0.9644
Model 3: Log-Linear	0.9880

R-squared Results:

Model 1: Moderate fit from trend alone
Model 2: Dramatic improvement with seasonality
Model 3: Comparable to Model 2, with additional benefits

Forecasting

# Generate 12-month ahead forecasts
fc <- bind_rows(
  model1 %>% forecast(h = 12) %>% mutate(Model = "Model 1"),
  model2 %>% forecast(h = 12) %>% mutate(Model = "Model 2"),
  model3 %>% forecast(h = 12) %>% mutate(Model = "Model 3")
)

# Plot forecasts with prediction intervals
AP_data %>%
  autoplot(value) +                             # original series (black line)
  autolayer(fc, .mean, alpha = 0.8, size = 1) + # model forecasts (colored lines)
  autolayer(fc, level = 80, alpha = 0.3) +      # 80% PI (shaded bands)
  facet_wrap(~Model, ncol = 1) +                # separate panels for each model
  labs(title = "12-Month Ahead Forecasts with 80% Prediction Intervals",
       y = "Passengers (thousands)",            
       x = "Year") +                    
  theme_minimal()

Scale for fill_ramp is already present.
Adding another scale for fill_ramp, which will replace the existing scale.

Forecast Comparison:

Model 1: Smooth trend, misses seasonal peaks/troughs
Model 2: Proper seasonal pattern, constant seasonal amplitude
Model 3: Seasonal pattern with increasing amplitude (most realistic)

Key Findings and Recommendations

Statistical Results:

Model 1 captures 86% of variance with trend alone
Model 2 improves to 96% by adding seasonality
Model 3 achieves similar fit with better residual properties

Practical Insights:

Summer months (June-August) show highest passenger demand
Winter months (November-February) show lowest demand
Growth rate: Approximately 1% per month with acceleration
Seasonal variation increases over time (multiplicative pattern)

Model Selection:

Recommended: Model 3 (Log-Linear)

Handles multiplicative seasonality appropriately
Most realistic forecast behavior
Best residual diagnostics
Interpretable percentage effects

Future Extensions:

ARIMA modeling for residual autocorrelation
External regressors (economic factors, fuel prices)
Structural breaks for regime changes
Exponential smoothing for comparison

Conclusion

This analysis demonstrates the evolution from simple trend models to sophisticated time series regression. The log-linear model (Model 3) provides the best balance of fit, interpretability, and realistic forecasting behavior for the AirPassengers data.

Key Learning Points:

ACF/PACF analysis guides model specification
Seasonal dummy variables effectively capture monthly patterns
Log transformation handles multiplicative seasonality
Residual diagnostics are essential for model validation
fpp3 framework provides comprehensive tools for time series regression