1. Executive Summary

This report evaluates monthly new residential construction starts using a compact and reproducible time-series workflow. The model comparison includes Holt-Winters variants and baseline methods (Naive and Seasonal Naive). Detailed backtest evidence and forecast-quality assessment are presented in the Forecast Error Evaluation and Forecast Comparison subsections.

2. Analysis Setup

analysis_env <- new.env(parent = globalenv())
invisible(capture.output(sys.source("construction_data_analysis.R", envir = analysis_env)))

champion_model <- analysis_env$champion_model_key
all_metrics <- analysis_env$metrics_table

test_metrics <- subset(all_metrics, dataset == "test")
test_metrics <- test_metrics[order(test_metrics$MAE), ]

train_metrics <- subset(all_metrics, dataset == "train")
train_metrics <- train_metrics[order(train_metrics$MAE), ]

hw_train_metrics <- subset(train_metrics, model_family == "holt_winters")
hw_test_metrics <- subset(test_metrics, model_family == "holt_winters")

diagnostics_table <- analysis_env$diagnostics_table
forward_forecast <- analysis_env$forward_forecast_table

decomposition_fit <- stats::stl(analysis_env$full_ts, s.window = "periodic")
decomposition_df <- data.frame(
  date = analysis_env$construction_df$date,
  observed = as.numeric(analysis_env$full_ts),
  trend = as.numeric(decomposition_fit$time.series[, "trend"]),
  seasonal = as.numeric(decomposition_fit$time.series[, "seasonal"]),
  irregular = as.numeric(decomposition_fit$time.series[, "remainder"])
)

decomposition_components <- rbind(
  data.frame(date = decomposition_df$date, component = "Seasonal", value = decomposition_df$seasonal),
  data.frame(date = decomposition_df$date, component = "Irregular", value = decomposition_df$irregular)
)

hw_backtest_data <- subset(
  analysis_env$backtest_plot_data,
  series %in% c(
    "Actual",
    "HW Additive (Auto)",
    "HW Additive (Fixed)",
    "HW Multiplicative (Auto)",
    "HW Multiplicative (Fixed)"
  )
)

hw_backtest_data$series <- factor(
  hw_backtest_data$series,
  levels = c(
    "Actual",
    "HW Additive (Auto)",
    "HW Additive (Fixed)",
    "HW Multiplicative (Auto)",
    "HW Multiplicative (Fixed)"
  )
)

3. Seasonal Time Series

Data Description

The dataset used in this section describes the number of newly started residential construction projects in the United States, expressed in thousands of housing units.

The observations are recorded at a monthly frequency and are not seasonally adjusted. The data originate from the official statistical database of the United States Census Bureau, which publishes long-term indicators related to construction activity in the U.S. housing market.

The dataset contains 475 observations, beginning in August 1985 and ending in February 2025. The relatively long observation window allows both long-term cyclical dynamics and short-term seasonal fluctuations in construction activity to be examined.

Residential construction activity is known to be influenced by macroeconomic conditions, housing demand, financing conditions, and seasonal factors related to weather and construction logistics. As a result, the series provides a suitable example of an economic time series exhibiting both seasonal and cyclical components.

Before model estimation, the production workflow applies practical quality controls, including timestamp parsing checks, missing-value validation, duplicate-period detection, monthly continuity verification, and positivity checks required by multiplicative seasonal specifications.


Time Series Decomposition

Visual inspection of the time series indicates the presence of both seasonal and cyclical patterns. Regular annual fluctuations can be clearly observed, indicating a strong seasonal component in residential construction activity.

This seasonal pattern is consistent with the annual cycle of construction activity, which tends to intensify during warmer months when weather conditions allow for more intensive building work. Conversely, construction activity typically declines during colder months.

The decomposition results, obtained with an STL decomposition (Seasonal-Trend decomposition using Loess), illustrate these structural components of the series. In particular, the smoothed trend component (represented by the yellow line in the decomposition plot) highlights the cyclical behaviour of the housing construction market. Periods of expansion and contraction correspond to broader macroeconomic and housing market cycles.

Such cyclical movements reflect changes in housing demand, credit conditions, and economic growth, which significantly influence investment in residential construction.

decomposition_overlay <- rbind(
  data.frame(
    date = decomposition_df$date,
    series = "Smoothed cyclical trend (STL trend)",
    value = decomposition_df$trend
  ),
  data.frame(
    date = decomposition_df$date,
    series = "Observed construction starts",
    value = decomposition_df$observed
  )
)

decomposition_overlay$series <- factor(
  decomposition_overlay$series,
  levels = c(
    "Smoothed cyclical trend (STL trend)",
    "Observed construction starts"
  )
)

ggplot2::ggplot(
  decomposition_overlay,
  ggplot2::aes(x = date, y = value, color = series)
) +
  ggplot2::geom_line(linewidth = 0.95) +
  ggplot2::scale_color_manual(
    values = c(
      "Smoothed cyclical trend (STL trend)" = "#F1C40F",
      "Observed construction starts" = "#FF1F1F"
    )
  ) +
  ggplot2::scale_x_date(date_labels = "%b %Y", date_breaks = "5 years") +
  ggplot2::labs(
    title = "Residential construction starts",
    x = NULL,
    y = "Thousand units",
    color = NULL
  ) +
  ggplot2::theme_classic(base_size = 12) +
  ggplot2::theme(
    legend.position = "top",
    legend.justification = "left",
    legend.box.just = "left",
    plot.title = ggplot2::element_text(face = "bold")
  )
Figure 1. Observed residential construction starts and smoothed cyclical trend (STL).

Figure 1. Observed residential construction starts and smoothed cyclical trend (STL).

ggplot2::ggplot(
  decomposition_components,
  ggplot2::aes(x = date, y = value)
) +
  ggplot2::geom_line(color = "#2A9D8F", linewidth = 0.55) +
  ggplot2::facet_wrap(~component, ncol = 1, scales = "free_y") +
  ggplot2::labs(
    title = "Seasonal and Irregular Components",
    x = "Date",
    y = "Component value"
  ) +
  ggplot2::theme_minimal(base_size = 12)
Figure 2. Seasonal and irregular components from STL decomposition.

Figure 2. Seasonal and irregular components from STL decomposition.


Exploratory Seasonal Models

In order to evaluate both the goodness of fit and the forecasting performance of seasonal time-series models, classical exponential smoothing models with seasonality were applied using the Holt-Winters method.

Two types of seasonal models were considered:

  • Additive Holt-Winters models
  • Multiplicative Holt-Winters models

Each specification was estimated in two versions:

  1. A version with manually selected smoothing parameters
  2. A version with automatically estimated parameters

The smoothing parameters included:

  • alpha - level smoothing parameter
  • beta - trend smoothing parameter
  • gamma - seasonal smoothing parameter

To evaluate predictive performance, the 12 most recent observations were separated from the dataset and used as an out-of-sample test set. The remaining observations were used as the training dataset for model estimation.

Two types of model performance were therefore examined:

  • In-sample fit errors, measuring how well the models reproduce the training data
  • Out-of-sample forecast errors, measuring how accurately the models predict unseen observations

Model Fit Evaluation

The results of the in-sample fit evaluation are presented in Table 1, which reports the forecast accuracy metrics for additive and multiplicative Holt-Winters models.

Model performance was evaluated using the following error measures:

  • Mean Absolute Error (MAE)
  • Mean Squared Error (MSE)
  • Mean Absolute Percentage Error (MAPE)
  • Adjusted Mean Absolute Percentage Error (AMAPE)

Among the models estimated on the training data, the multiplicative Holt-Winters model with automatically estimated parameters achieved the lowest values for all error metrics, indicating the best fit to the historical observations.

This result suggests that the multiplicative formulation captures the relationship between the level of the series and the magnitude of seasonal fluctuations more accurately than the additive specification when fitting the historical data.

train_fit_display <- hw_train_metrics
names(train_fit_display)[names(train_fit_display) == "sMAPE"] <- "AMAPE"

knitr::kable(
  train_fit_display,
  digits = 4,
  caption = "Table 1. In-sample fit errors for additive and multiplicative Holt-Winters models"
)
Table 1. In-sample fit errors for additive and multiplicative Holt-Winters models
model model_name model_family seasonal_type parameter_source dataset alpha beta gamma MAE MSE RMSE MAPE AMAPE
7 HW_Mult_Auto HW Multiplicative (Auto) holt_winters multiplicative auto train 0.4466 0.0037 0.3208 7.2368 87.9262 9.3769 7.0926 7.0867
8 HW_Add_Auto HW Additive (Auto) holt_winters additive auto train 0.4331 0.0152 0.4040 7.3771 90.4554 9.5108 7.6791 7.7899
9 HW_Add_Fixed HW Additive (Fixed) holt_winters additive fixed train 0.3000 0.1000 0.4000 7.4624 93.9604 9.6933 7.8672 8.0601
10 HW_Mult_Fixed HW Multiplicative (Fixed) holt_winters multiplicative fixed train 0.4000 0.2000 0.3000 7.8202 106.5169 10.3207 7.6365 7.5843

Forecast Error Evaluation

Forecast performance was then evaluated using the out-of-sample test dataset consisting of the last 12 observations.

The results are reported in Table 2, which compares forecast errors for all tested Holt-Winters specifications.

In contrast to the in-sample fit results, the additive Holt-Winters model with manually selected parameters produced the lowest forecast errors across all evaluated metrics.

This discrepancy between training and testing performance may be explained by the phenomenon of model overfitting. The multiplicative model with automatically optimized parameters was calibrated using the same observations on which its fit was evaluated, which may lead the model to capture noise specific to the training dataset rather than the true underlying data-generating process.

As a result, although the multiplicative model achieved a better fit to the historical data, it performed worse when predicting new observations.

This finding highlights an important principle in time-series forecasting: models that achieve the best in-sample fit do not necessarily produce the most accurate forecasts.

For completeness, baseline benchmark models (Naive and Seasonal Naive) are also evaluated in the production script and written to analysis_outputs/model_metrics.csv.

test_fit_display <- hw_test_metrics
names(test_fit_display)[names(test_fit_display) == "sMAPE"] <- "AMAPE"

knitr::kable(
  test_fit_display,
  digits = 4,
  caption = "Table 2. Out-of-sample forecast errors for additive and multiplicative Holt-Winters models"
)
Table 2. Out-of-sample forecast errors for additive and multiplicative Holt-Winters models
model model_name model_family seasonal_type parameter_source dataset alpha beta gamma MAE MSE RMSE MAPE AMAPE
1 HW_Add_Fixed HW Additive (Fixed) holt_winters additive fixed test 0.3000 0.1000 0.4000 6.4556 59.2120 7.6949 5.9171 5.6859
4 HW_Add_Auto HW Additive (Auto) holt_winters additive auto test 0.4331 0.0152 0.4040 8.3955 100.1151 10.0058 7.7004 7.2834
5 HW_Mult_Auto HW Multiplicative (Auto) holt_winters multiplicative auto test 0.4466 0.0037 0.3208 11.6801 174.7865 13.2207 10.4639 9.8160
6 HW_Mult_Fixed HW Multiplicative (Fixed) holt_winters multiplicative fixed test 0.4000 0.2000 0.3000 19.3681 456.6959 21.3704 17.5748 15.8468

Forecast Comparison

Figure 3 presents a visual comparison between the forecasts generated by the additive and multiplicative Holt-Winters models and the actual observed values of the variable representing the number of newly started residential construction projects.

The comparison confirms the quantitative evaluation based on forecast error metrics. The additive model with manually selected parameters tracks the actual values more closely during the forecast horizon, while the multiplicative models exhibit larger deviations from realized observations.

This graphical comparison supports the conclusion that the additive specification provides more reliable short-term forecasts for the analysed series.

ggplot2::ggplot(
  hw_backtest_data,
  ggplot2::aes(x = date, y = value, color = series, linetype = series)
) +
  ggplot2::geom_line(linewidth = 0.9) +
  ggplot2::geom_vline(
    xintercept = max(analysis_env$train_df$date),
    color = "#9B2226",
    linetype = "dashed",
    linewidth = 0.8
  ) +
  ggplot2::scale_color_manual(
    values = c(
      "Actual" = "#1B1F3B",
      "HW Additive (Auto)" = "#005F73",
      "HW Additive (Fixed)" = "#0A9396",
      "HW Multiplicative (Auto)" = "#EE9B00",
      "HW Multiplicative (Fixed)" = "#CA6702"
    )
  ) +
  ggplot2::labs(
    title = "Forecast comparison: additive vs multiplicative Holt-Winters",
    subtitle = "Dashed line marks forecast origin (last historical observation)",
    x = "Date",
    y = "Construction starts (thousand units)",
    color = "Series",
    linetype = "Series"
  ) +
  ggplot2::theme_minimal(base_size = 12) +
  ggplot2::theme(
    plot.title = ggplot2::element_text(face = "bold"),
    legend.position = "bottom"
  )
Figure 3. Forecast comparison for additive and multiplicative Holt-Winters models against observed values.

Figure 3. Forecast comparison for additive and multiplicative Holt-Winters models against observed values.

4. Residual Diagnostics (Train Window)

Residual diagnostics are reported to flag potential remaining autocorrelation and bias.

In this analysis, Ljung-Box p-values are generally low across specifications, indicating that residual autocorrelation is still present. This suggests that Holt-Winters captures a substantial share of trend-seasonal structure but does not fully absorb all serial dependence in the series.

knitr::kable(
  diagnostics_table,
  digits = 6,
  caption = "Table 3. Residual diagnostics by model"
)
Table 3. Residual diagnostics by model
model model_name residual_mean residual_sd ljung_box_p_value
HW_Add_Auto HW Additive (Auto) 0.393067 9.513233 0.001117
HW_Add_Fixed HW Additive (Fixed) 0.066823 9.703849 0.000000
HW_Mult_Auto HW Multiplicative (Auto) 0.771482 9.355485 0.004570
HW_Mult_Fixed HW Multiplicative (Fixed) -0.366337 10.325654 0.000000
Naive Naive -0.091342 13.336106 0.000000
Seasonal_Naive Seasonal Naive -0.745011 17.544460 0.000000

5. 12-Month Forward Forecast (Champion Model)

The table below reports point forecasts together with 80% and 95% prediction intervals, which quantify forecast uncertainty around the central projection path.

knitr::kable(
  forward_forecast,
  digits = 3,
  caption = "Table 4. Champion model forward forecast with 80% and 95% intervals"
)
Table 4. Champion model forward forecast with 80% and 95% intervals
date point_forecast lower_80 upper_80 lower_95 upper_95
2025-03-01 112.322 99.963 124.682 93.420 131.225
2025-04-01 122.701 109.686 135.716 102.796 142.606
2025-05-01 123.650 109.895 137.405 102.614 144.686
2025-06-01 124.235 109.660 138.810 101.944 146.525
2025-07-01 119.039 103.567 134.511 95.377 142.701
2025-08-01 118.381 101.940 134.822 93.237 143.526
2025-09-01 115.504 98.025 132.983 88.773 142.236
2025-10-01 115.027 96.446 133.607 86.610 143.443
2025-11-01 106.449 86.706 126.192 76.255 136.643
2025-12-01 103.108 82.146 124.071 71.049 135.167
2026-01-01 97.018 74.782 119.253 63.012 131.024
2026-02-01 105.961 82.401 129.520 69.930 141.992

6. Exported Artifacts

artifact_dir <- file.path("analysis_outputs")
artifact_files <- if (dir.exists(artifact_dir)) list.files(artifact_dir) else character(0)

knitr::kable(
  data.frame(file = artifact_files),
  caption = "Table 5. Files generated by the production script"
)
Table 5. Files generated by the production script
file
backtest_forecast_comparison.png
champion_model_forward_forecast.csv
historical_series.png
holt_winters_model_metrics.csv
model_metrics.csv
model_residual_diagnostics.csv
session_info.txt

7. Reproducibility Note

The script writes session details to analysis_outputs/session_info.txt. This supports traceability of package versions and runtime environment for RPubs publication.

8. Full Reproducible Code

To preserve the same implementation detail as the standalone analysis script, the full production code is embedded below.


options(stringsAsFactors = FALSE, scipen = 999)

required_packages <- c(
  "readr",
  "dplyr",
  "lubridate",
  "forecast",
  "Metrics",
  "ggplot2"
)

missing_packages <- setdiff(required_packages, rownames(utils::installed.packages()))
if (length(missing_packages) > 0L) {
  stop(
    sprintf(
      "Missing packages detected: %s. Install them before running the analysis.",
      paste(missing_packages, collapse = ", ")
    )
  )
}

get_script_directory <- function() {
  script_flag <- "--file="
  matching_args <- grep(script_flag, commandArgs(trailingOnly = FALSE), value = TRUE)
  if (length(matching_args) == 0L) {
    return(normalizePath(getwd(), winslash = "/", mustWork = TRUE))
  }
  script_path <- sub(script_flag, "", matching_args[1], fixed = TRUE)
  dirname(normalizePath(script_path, winslash = "/", mustWork = TRUE))
}

parse_monthly_date <- function(month_string) {
  current_locale <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", current_locale), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  as.Date(paste0("01-", month_string), format = "%d-%b-%y")
}

compute_error_metrics <- function(actual, predicted) {
  actual <- as.numeric(actual)
  predicted <- as.numeric(predicted)

  if (length(actual) != length(predicted)) {
    stop("Metric computation failed: actual and predicted vectors have different lengths.")
  }

  valid_rows <- is.finite(actual) & is.finite(predicted)
  actual <- actual[valid_rows]
  predicted <- predicted[valid_rows]

  if (length(actual) == 0L) {
    stop("Metric computation failed: no valid observations after NA filtering.")
  }

  epsilon <- .Machine$double.eps
  mape <- mean(abs((actual - predicted) / pmax(abs(actual), epsilon))) * 100
  smape <- mean((2 * abs(actual - predicted)) / pmax(abs(actual) + abs(predicted), epsilon)) * 100

  data.frame(
    MAE = Metrics::mae(actual, predicted),
    MSE = Metrics::mse(actual, predicted),
    RMSE = sqrt(Metrics::mse(actual, predicted)),
    MAPE = mape,
    sMAPE = smape,
    check.names = FALSE
  )
}

fit_holt_winters <- function(series, seasonal_type, alpha = NA_real_, beta = NA_real_, gamma = NA_real_) {
  if (is.na(alpha) && is.na(beta) && is.na(gamma)) {
    return(stats::HoltWinters(series, seasonal = seasonal_type))
  }
  stats::HoltWinters(
    series,
    seasonal = seasonal_type,
    alpha = alpha,
    beta = beta,
    gamma = gamma
  )
}

build_monthly_ts <- function(data_frame) {
  stats::ts(
    data_frame$nstarted,
    start = c(lubridate::year(data_frame$date[1]), lubridate::month(data_frame$date[1])),
    frequency = 12
  )
}


project_dir <- get_script_directory()
data_path <- file.path(project_dir, "construction_data.csv")

if (!file.exists(data_path)) {
  stop(sprintf("Input file not found: %s", data_path))
}

construction_df <- readr::read_csv(
  data_path,
  col_types = readr::cols(
    date = readr::col_character(),
    nstarted = readr::col_double()
  )
)

construction_df <- dplyr::mutate(construction_df, date = parse_monthly_date(date))
construction_df <- dplyr::arrange(construction_df, date)

if (anyNA(construction_df$date)) {
  stop("Date parsing failed for at least one record.")
}
if (anyNA(construction_df$nstarted)) {
  stop("Missing target values detected in nstarted.")
}
if (anyDuplicated(construction_df$date) > 0L) {
  stop("Duplicate monthly timestamps detected. Each month must be unique.")
}
expected_months <- seq.Date(min(construction_df$date), max(construction_df$date), by = "month")
missing_months <- setdiff(expected_months, construction_df$date)
if (length(missing_months) > 0L) {
  stop(sprintf("Time index contains %d missing month(s).", length(missing_months)))
}
if (any(construction_df$nstarted <= 0)) {
  stop("Non-positive values detected. Multiplicative Holt-Winters requires strictly positive series.")
}

forecast_horizon <- 12L
history_window_months <- 24L

if (nrow(construction_df) <= forecast_horizon + history_window_months) {
  stop("Insufficient data to run a 12-month backtest with a 24-month history window.")
}

split_index <- nrow(construction_df) - forecast_horizon
train_df <- construction_df[seq_len(split_index), , drop = FALSE]
test_df <- construction_df[(split_index + 1L):nrow(construction_df), , drop = FALSE]

if (nrow(test_df) != forecast_horizon) {
  stop("Backtest split error: test window length does not match forecast horizon.")
}
if (max(train_df$date) >= min(test_df$date)) {
  stop("Backtest split error: train and test periods are not strictly time-ordered.")
}
if (length(intersect(train_df$date, test_df$date)) > 0L) {
  stop("Backtest split error: train and test date ranges overlap.")
}

train_ts <- build_monthly_ts(train_df)
test_ts <- build_monthly_ts(test_df)
full_ts <- build_monthly_ts(construction_df)


model_specs <- data.frame(
  model_key = c(
    "HW_Add_Auto",
    "HW_Add_Fixed",
    "HW_Mult_Auto",
    "HW_Mult_Fixed",
    "Naive",
    "Seasonal_Naive"
  ),
  display_name = c(
    "HW Additive (Auto)",
    "HW Additive (Fixed)",
    "HW Multiplicative (Auto)",
    "HW Multiplicative (Fixed)",
    "Naive",
    "Seasonal Naive"
  ),
  model_family = c(
    "holt_winters",
    "holt_winters",
    "holt_winters",
    "holt_winters",
    "naive",
    "snaive"
  ),
  seasonal_type = c("additive", "additive", "multiplicative", "multiplicative", NA, NA),
  alpha = c(NA_real_, 0.3, NA_real_, 0.4, NA_real_, NA_real_),
  beta = c(NA_real_, 0.1, NA_real_, 0.2, NA_real_, NA_real_),
  gamma = c(NA_real_, 0.4, NA_real_, 0.3, NA_real_, NA_real_),
  stringsAsFactors = FALSE
)

model_results <- lapply(seq_len(nrow(model_specs)), function(i) {
  spec <- model_specs[i, , drop = FALSE]
  if (spec$model_family == "holt_winters") {
    fitted_model <- fit_holt_winters(
      series = train_ts,
      seasonal_type = spec$seasonal_type,
      alpha = spec$alpha,
      beta = spec$beta,
      gamma = spec$gamma
    )
    model_forecast <- forecast::forecast(fitted_model, h = forecast_horizon)
    hw_fitted <- fitted_model$fitted[, "xhat"]
    fitted_values <- as.numeric(hw_fitted)
    aligned_train_actual <- as.numeric(
      stats::window(train_ts, start = stats::start(hw_fitted), end = stats::end(hw_fitted))
    )

    estimated_alpha <- ifelse(is.na(spec$alpha), fitted_model$alpha, spec$alpha)
    estimated_beta <- ifelse(is.na(spec$beta), fitted_model$beta, spec$beta)
    estimated_gamma <- ifelse(is.na(spec$gamma), fitted_model$gamma, spec$gamma)
  } else if (spec$model_family == "naive") {
    fitted_model <- forecast::naive(train_ts, h = forecast_horizon)
    model_forecast <- fitted_model
    fitted_values <- as.numeric(fitted_model$fitted)
    aligned_train_actual <- as.numeric(train_ts)

    estimated_alpha <- NA_real_
    estimated_beta <- NA_real_
    estimated_gamma <- NA_real_
  } else if (spec$model_family == "snaive") {
    fitted_model <- forecast::snaive(train_ts, h = forecast_horizon)
    model_forecast <- fitted_model
    fitted_values <- as.numeric(fitted_model$fitted)
    aligned_train_actual <- as.numeric(train_ts)

    estimated_alpha <- NA_real_
    estimated_beta <- NA_real_
    estimated_gamma <- NA_real_
  } else {
    stop(sprintf("Unsupported model family: %s", spec$model_family))
  }

  train_metrics <- compute_error_metrics(aligned_train_actual, fitted_values)
  test_metrics <- compute_error_metrics(test_ts, model_forecast$mean)

  residual_vector <- aligned_train_actual - fitted_values
  residual_vector <- residual_vector[is.finite(residual_vector)]
  ljung_box_p_value <- NA_real_
  if (length(residual_vector) > 12L) {
    ljung_box_p_value <- stats::Box.test(residual_vector, lag = 12, type = "Ljung-Box")$p.value
  }

  list(
    spec = spec,
    model = fitted_model,
    forecast = model_forecast,
    train_metrics = train_metrics,
    test_metrics = test_metrics,
    alpha = estimated_alpha,
    beta = estimated_beta,
    gamma = estimated_gamma,
    diagnostics = data.frame(
      residual_mean = mean(residual_vector),
      residual_sd = stats::sd(residual_vector),
      ljung_box_p_value = ljung_box_p_value,
      check.names = FALSE
    )
  )
})

names(model_results) <- model_specs$model_key

build_metrics_row <- function(model_key, result_object, dataset_label, metric_frame) {
  spec <- result_object$spec
  parameter_source <- if (
    spec$model_family == "holt_winters" && is.na(spec$alpha)
  ) {
    "auto"
  } else if (spec$model_family == "holt_winters") {
    "fixed"
  } else {
    "baseline"
  }

  data.frame(
    model = model_key,
    model_name = spec$display_name,
    model_family = spec$model_family,
    seasonal_type = spec$seasonal_type,
    parameter_source = parameter_source,
    dataset = dataset_label,
    alpha = result_object$alpha,
    beta = result_object$beta,
    gamma = result_object$gamma,
    MAE = metric_frame$MAE,
    MSE = metric_frame$MSE,
    RMSE = metric_frame$RMSE,
    MAPE = metric_frame$MAPE,
    sMAPE = metric_frame$sMAPE,
    check.names = FALSE
  )
}

metrics_table <- do.call(
  rbind,
  lapply(names(model_results), function(model_key) {
    result_object <- model_results[[model_key]]
    rbind(
      build_metrics_row(model_key, result_object, "train", result_object$train_metrics),
      build_metrics_row(model_key, result_object, "test", result_object$test_metrics)
    )
  })
)

metrics_table <- dplyr::arrange(metrics_table, dataset, MAE)

cat("\nModel Performance Summary:\n")
print(metrics_table, row.names = FALSE)

diagnostics_table <- do.call(
  rbind,
  lapply(names(model_results), function(model_key) {
    result_object <- model_results[[model_key]]
    diagnostics <- result_object$diagnostics
    data.frame(
      model = model_key,
      model_name = result_object$spec$display_name,
      residual_mean = diagnostics$residual_mean,
      residual_sd = diagnostics$residual_sd,
      ljung_box_p_value = diagnostics$ljung_box_p_value,
      check.names = FALSE
    )
  })
)

cat("\nResidual Diagnostics (train window):\n")
print(diagnostics_table, row.names = FALSE)

test_metrics_table <- metrics_table[metrics_table$dataset == "test", , drop = FALSE]
champion_model_key <- test_metrics_table$model[which.min(test_metrics_table$MAE)]
cat(sprintf("\nChampion model based on test MAE: %s\n", champion_model_key))

champion_spec <- model_specs[model_specs$model_key == champion_model_key, , drop = FALSE]
if (champion_spec$model_family == "holt_winters") {
  champion_model_full <- fit_holt_winters(
    series = full_ts,
    seasonal_type = champion_spec$seasonal_type,
    alpha = champion_spec$alpha,
    beta = champion_spec$beta,
    gamma = champion_spec$gamma
  )
  champion_forecast <- forecast::forecast(champion_model_full, h = forecast_horizon)
} else if (champion_spec$model_family == "naive") {
  champion_model_full <- forecast::naive(full_ts, h = forecast_horizon)
  champion_forecast <- champion_model_full
} else if (champion_spec$model_family == "snaive") {
  champion_model_full <- forecast::snaive(full_ts, h = forecast_horizon)
  champion_forecast <- champion_model_full
} else {
  stop(sprintf("Unsupported champion model family: %s", champion_spec$model_family))
}

future_dates <- seq.Date(
  from = seq.Date(max(construction_df$date), by = "month", length.out = 2L)[2],
  by = "month",
  length.out = forecast_horizon
)

forward_forecast_table <- data.frame(
  date = future_dates,
  point_forecast = as.numeric(champion_forecast$mean),
  lower_80 = as.numeric(champion_forecast$lower[, "80%"]),
  upper_80 = as.numeric(champion_forecast$upper[, "80%"]),
  lower_95 = as.numeric(champion_forecast$lower[, "95%"]),
  upper_95 = as.numeric(champion_forecast$upper[, "95%"])
)

cat("\nChampion Model Forward Forecast (12 Months):\n")
print(forward_forecast_table, row.names = FALSE)


historical_plot <- ggplot2::ggplot(construction_df, ggplot2::aes(x = date, y = nstarted)) +
  ggplot2::geom_line(color = "#1B4965", linewidth = 1) +
  ggplot2::scale_x_date(date_labels = "%Y", date_breaks = "2 years") +
  ggplot2::labs(
    title = "Monthly Residential Construction Starts",
    subtitle = "Raw historical observations",
    x = "Date",
    y = "Construction starts (thousand units)"
  ) +
  ggplot2::theme_minimal(base_size = 12) +
  ggplot2::theme(
    plot.title = ggplot2::element_text(face = "bold"),
    axis.text.x = ggplot2::element_text(angle = 45, hjust = 1)
  )

history_df <- tail(train_df, history_window_months)
actual_frame <- dplyr::bind_rows(
  dplyr::transmute(history_df, date = date, series = "Actual", value = nstarted),
  dplyr::transmute(test_df, date = date, series = "Actual", value = nstarted)
)

forecast_frame <- dplyr::bind_rows(
  lapply(names(model_results), function(model_key) {
    data.frame(
      date = test_df$date,
      series = model_results[[model_key]]$spec$display_name,
      value = as.numeric(model_results[[model_key]]$forecast$mean),
      stringsAsFactors = FALSE
    )
  })
)

forecast_anchor_frame <- data.frame(
  date = rep(max(train_df$date), times = nrow(model_specs)),
  series = model_specs$display_name,
  value = rep(tail(train_df$nstarted, 1), times = nrow(model_specs)),
  stringsAsFactors = FALSE
)

backtest_plot_data <- dplyr::bind_rows(actual_frame, forecast_anchor_frame, forecast_frame)

backtest_plot <- ggplot2::ggplot(
  backtest_plot_data,
  ggplot2::aes(x = date, y = value, color = series, linetype = series)
) +
  ggplot2::geom_line(linewidth = 0.9) +
  ggplot2::geom_vline(
    xintercept = max(train_df$date),
    color = "#9B2226",
    linetype = "dashed",
    linewidth = 0.8
  ) +
  ggplot2::scale_color_brewer(palette = "Dark2") +
  ggplot2::labs(
    title = "Backtest: Actuals vs Model Forecasts",
    subtitle = "Dashed line marks forecast origin (last historical observation)",
    x = "Date",
    y = "Construction starts (thousand units)",
    color = "Series",
    linetype = "Series"
  ) +
  ggplot2::theme_minimal(base_size = 12) +
  ggplot2::theme(
    plot.title = ggplot2::element_text(face = "bold"),
    legend.position = "bottom"
  )

print(historical_plot)
print(backtest_plot)


output_dir <- file.path(project_dir, "analysis_outputs")
if (!dir.exists(output_dir)) {
  dir.create(output_dir, recursive = TRUE)
}

readr::write_csv(metrics_table, file.path(output_dir, "model_metrics.csv"))
readr::write_csv(metrics_table, file.path(output_dir, "holt_winters_model_metrics.csv"))
readr::write_csv(diagnostics_table, file.path(output_dir, "model_residual_diagnostics.csv"))
readr::write_csv(forward_forecast_table, file.path(output_dir, "champion_model_forward_forecast.csv"))
ggplot2::ggsave(
  filename = file.path(output_dir, "historical_series.png"),
  plot = historical_plot,
  width = 12,
  height = 6,
  dpi = 300
)
ggplot2::ggsave(
  filename = file.path(output_dir, "backtest_forecast_comparison.png"),
  plot = backtest_plot,
  width = 12,
  height = 6,
  dpi = 300
)
writeLines(capture.output(sessionInfo()), con = file.path(output_dir, "session_info.txt"))

cat(sprintf("\nAnalysis artifacts written to: %s\n", output_dir))