New Residential Construction Time Series Analysis

1. Executive Summary

This report evaluates monthly new residential construction starts using a compact and reproducible time-series workflow. The model comparison includes Holt-Winters variants and baseline methods (Naive and Seasonal Naive). Detailed backtest evidence and forecast-quality assessment are presented in the Forecast Error Evaluation and Forecast Comparison subsections.

2. Analysis Setup

analysis_env <- new.env(parent = globalenv())
invisible(capture.output(sys.source("construction_data_analysis.R", envir = analysis_env)))

champion_model <- analysis_env$champion_model_key
all_metrics <- analysis_env$metrics_table

test_metrics <- subset(all_metrics, dataset == "test")
test_metrics <- test_metrics[order(test_metrics$MAE), ]

train_metrics <- subset(all_metrics, dataset == "train")
train_metrics <- train_metrics[order(train_metrics$MAE), ]

hw_train_metrics <- subset(train_metrics, model_family == "holt_winters")
hw_test_metrics <- subset(test_metrics, model_family == "holt_winters")

diagnostics_table <- analysis_env$diagnostics_table
forward_forecast <- analysis_env$forward_forecast_table

decomposition_fit <- stats::stl(analysis_env$full_ts, s.window = "periodic")
decomposition_df <- data.frame(
  date = analysis_env$construction_df$date,
  observed = as.numeric(analysis_env$full_ts),
  trend = as.numeric(decomposition_fit$time.series[, "trend"]),
  seasonal = as.numeric(decomposition_fit$time.series[, "seasonal"]),
  irregular = as.numeric(decomposition_fit$time.series[, "remainder"])
)

decomposition_components <- rbind(
  data.frame(date = decomposition_df$date, component = "Seasonal", value = decomposition_df$seasonal),
  data.frame(date = decomposition_df$date, component = "Irregular", value = decomposition_df$irregular)
)

hw_backtest_data <- subset(
  analysis_env$backtest_plot_data,
  series %in% c(
    "Actual",
    "HW Additive (Auto)",
    "HW Additive (Fixed)",
    "HW Multiplicative (Auto)",
    "HW Multiplicative (Fixed)"
  )
)

hw_backtest_data$series <- factor(
  hw_backtest_data$series,
  levels = c(
    "Actual",
    "HW Additive (Auto)",
    "HW Additive (Fixed)",
    "HW Multiplicative (Auto)",
    "HW Multiplicative (Fixed)"
  )
)

3. Seasonal Time Series

Data Description

The dataset used in this section describes the number of newly started residential construction projects in the United States, expressed in thousands of housing units.

The observations are recorded at a monthly frequency and are not seasonally adjusted. The data originate from the official statistical database of the United States Census Bureau, which publishes long-term indicators related to construction activity in the U.S. housing market.

The dataset contains 475 observations, beginning in August 1985 and ending in February 2025. The relatively long observation window allows both long-term cyclical dynamics and short-term seasonal fluctuations in construction activity to be examined.

Residential construction activity is known to be influenced by macroeconomic conditions, housing demand, financing conditions, and seasonal factors related to weather and construction logistics. As a result, the series provides a suitable example of an economic time series exhibiting both seasonal and cyclical components.

Before model estimation, the production workflow applies practical quality controls, including timestamp parsing checks, missing-value validation, duplicate-period detection, monthly continuity verification, and positivity checks required by multiplicative seasonal specifications.

Time Series Decomposition

Visual inspection of the time series indicates the presence of both seasonal and cyclical patterns. Regular annual fluctuations can be clearly observed, indicating a strong seasonal component in residential construction activity.

This seasonal pattern is consistent with the annual cycle of construction activity, which tends to intensify during warmer months when weather conditions allow for more intensive building work. Conversely, construction activity typically declines during colder months.

The decomposition results, obtained with an STL decomposition (Seasonal-Trend decomposition using Loess), illustrate these structural components of the series. In particular, the smoothed trend component (represented by the yellow line in the decomposition plot) highlights the cyclical behaviour of the housing construction market. Periods of expansion and contraction correspond to broader macroeconomic and housing market cycles.

Such cyclical movements reflect changes in housing demand, credit conditions, and economic growth, which significantly influence investment in residential construction.

decomposition_overlay <- rbind(
  data.frame(
    date = decomposition_df$date,
    series = "Smoothed cyclical trend (STL trend)",
    value = decomposition_df$trend
  ),
  data.frame(
    date = decomposition_df$date,
    series = "Observed construction starts",
    value = decomposition_df$observed
  )
)

decomposition_overlay$series <- factor(
  decomposition_overlay$series,
  levels = c(
    "Smoothed cyclical trend (STL trend)",
    "Observed construction starts"
  )
)

ggplot2::ggplot(
  decomposition_overlay,
  ggplot2::aes(x = date, y = value, color = series)
) +
  ggplot2::geom_line(linewidth = 0.95) +
  ggplot2::scale_color_manual(
    values = c(
      "Smoothed cyclical trend (STL trend)" = "#F1C40F",
      "Observed construction starts" = "#FF1F1F"
    )
  ) +
  ggplot2::scale_x_date(date_labels = "%b %Y", date_breaks = "5 years") +
  ggplot2::labs(
    title = "Residential construction starts",
    x = NULL,
    y = "Thousand units",
    color = NULL
  ) +
  ggplot2::theme_classic(base_size = 12) +
  ggplot2::theme(
    legend.position = "top",
    legend.justification = "left",
    legend.box.just = "left",
    plot.title = ggplot2::element_text(face = "bold")
  )

Figure 1. Observed residential construction starts and smoothed cyclical trend (STL).

ggplot2::ggplot(
  decomposition_components,
  ggplot2::aes(x = date, y = value)
) +
  ggplot2::geom_line(color = "#2A9D8F", linewidth = 0.55) +
  ggplot2::facet_wrap(~component, ncol = 1, scales = "free_y") +
  ggplot2::labs(
    title = "Seasonal and Irregular Components",
    x = "Date",
    y = "Component value"
  ) +
  ggplot2::theme_minimal(base_size = 12)

Figure 2. Seasonal and irregular components from STL decomposition.

Exploratory Seasonal Models

In order to evaluate both the goodness of fit and the forecasting performance of seasonal time-series models, classical exponential smoothing models with seasonality were applied using the Holt-Winters method.

Two types of seasonal models were considered:

Additive Holt-Winters models
Multiplicative Holt-Winters models

Each specification was estimated in two versions:

A version with manually selected smoothing parameters
A version with automatically estimated parameters

The smoothing parameters included:

alpha - level smoothing parameter
beta - trend smoothing parameter
gamma - seasonal smoothing parameter

To evaluate predictive performance, the 12 most recent observations were separated from the dataset and used as an out-of-sample test set. The remaining observations were used as the training dataset for model estimation.

Two types of model performance were therefore examined:

In-sample fit errors, measuring how well the models reproduce the training data
Out-of-sample forecast errors, measuring how accurately the models predict unseen observations

Model Fit Evaluation

The results of the in-sample fit evaluation are presented in Table 1, which reports the forecast accuracy metrics for additive and multiplicative Holt-Winters models.

Model performance was evaluated using the following error measures:

Mean Absolute Error (MAE)
Mean Squared Error (MSE)
Mean Absolute Percentage Error (MAPE)
Adjusted Mean Absolute Percentage Error (AMAPE)

Among the models estimated on the training data, the multiplicative Holt-Winters model with automatically estimated parameters achieved the lowest values for all error metrics, indicating the best fit to the historical observations.

This result suggests that the multiplicative formulation captures the relationship between the level of the series and the magnitude of seasonal fluctuations more accurately than the additive specification when fitting the historical data.

train_fit_display <- hw_train_metrics
names(train_fit_display)[names(train_fit_display) == "sMAPE"] <- "AMAPE"

knitr::kable(
  train_fit_display,
  digits = 4,
  caption = "Table 1. In-sample fit errors for additive and multiplicative Holt-Winters models"
)

Table 1. In-sample fit errors for additive and multiplicative Holt-Winters models
	model	model_name	model_family	seasonal_type	parameter_source	dataset	alpha	beta	gamma	MAE	MSE	RMSE	MAPE	AMAPE
7	HW_Mult_Auto	HW Multiplicative (Auto)	holt_winters	multiplicative	auto	train	0.4466	0.0037	0.3208	7.2368	87.9262	9.3769	7.0926	7.0867
8	HW_Add_Auto	HW Additive (Auto)	holt_winters	additive	auto	train	0.4331	0.0152	0.4040	7.3771	90.4554	9.5108	7.6791	7.7899
9	HW_Add_Fixed	HW Additive (Fixed)	holt_winters	additive	fixed	train	0.3000	0.1000	0.4000	7.4624	93.9604	9.6933	7.8672	8.0601
10	HW_Mult_Fixed	HW Multiplicative (Fixed)	holt_winters	multiplicative	fixed	train	0.4000	0.2000	0.3000	7.8202	106.5169	10.3207	7.6365	7.5843

Forecast Error Evaluation

Forecast performance was then evaluated using the out-of-sample test dataset consisting of the last 12 observations.

The results are reported in Table 2, which compares forecast errors for all tested Holt-Winters specifications.

In contrast to the in-sample fit results, the additive Holt-Winters model with manually selected parameters produced the lowest forecast errors across all evaluated metrics.

This discrepancy between training and testing performance may be explained by the phenomenon of model overfitting. The multiplicative model with automatically optimized parameters was calibrated using the same observations on which its fit was evaluated, which may lead the model to capture noise specific to the training dataset rather than the true underlying data-generating process.

As a result, although the multiplicative model achieved a better fit to the historical data, it performed worse when predicting new observations.

This finding highlights an important principle in time-series forecasting: models that achieve the best in-sample fit do not necessarily produce the most accurate forecasts.

For completeness, baseline benchmark models (Naive and Seasonal Naive) are also evaluated in the production script and written to analysis_outputs/model_metrics.csv.

test_fit_display <- hw_test_metrics
names(test_fit_display)[names(test_fit_display) == "sMAPE"] <- "AMAPE"

knitr::kable(
  test_fit_display,
  digits = 4,
  caption = "Table 2. Out-of-sample forecast errors for additive and multiplicative Holt-Winters models"
)

Table 2. Out-of-sample forecast errors for additive and multiplicative Holt-Winters models
	model	model_name	model_family	seasonal_type	parameter_source	dataset	alpha	beta	gamma	MAE	MSE	RMSE	MAPE	AMAPE
1	HW_Add_Fixed	HW Additive (Fixed)	holt_winters	additive	fixed	test	0.3000	0.1000	0.4000	6.4556	59.2120	7.6949	5.9171	5.6859
4	HW_Add_Auto	HW Additive (Auto)	holt_winters	additive	auto	test	0.4331	0.0152	0.4040	8.3955	100.1151	10.0058	7.7004	7.2834
5	HW_Mult_Auto	HW Multiplicative (Auto)	holt_winters	multiplicative	auto	test	0.4466	0.0037	0.3208	11.6801	174.7865	13.2207	10.4639	9.8160
6	HW_Mult_Fixed	HW Multiplicative (Fixed)	holt_winters	multiplicative	fixed	test	0.4000	0.2000	0.3000	19.3681	456.6959	21.3704	17.5748	15.8468

Forecast Comparison

Figure 3 presents a visual comparison between the forecasts generated by the additive and multiplicative Holt-Winters models and the actual observed values of the variable representing the number of newly started residential construction projects.

The comparison confirms the quantitative evaluation based on forecast error metrics. The additive model with manually selected parameters tracks the actual values more closely during the forecast horizon, while the multiplicative models exhibit larger deviations from realized observations.

This graphical comparison supports the conclusion that the additive specification provides more reliable short-term forecasts for the analysed series.

ggplot2::ggplot(
  hw_backtest_data,
  ggplot2::aes(x = date, y = value, color = series, linetype = series)
) +
  ggplot2::geom_line(linewidth = 0.9) +
  ggplot2::geom_vline(
    xintercept = max(analysis_env$train_df$date),
    color = "#9B2226",
    linetype = "dashed",
    linewidth = 0.8
  ) +
  ggplot2::scale_color_manual(
    values = c(
      "Actual" = "#1B1F3B",
      "HW Additive (Auto)" = "#005F73",
      "HW Additive (Fixed)" = "#0A9396",
      "HW Multiplicative (Auto)" = "#EE9B00",
      "HW Multiplicative (Fixed)" = "#CA6702"
    )
  ) +
  ggplot2::labs(
    title = "Forecast comparison: additive vs multiplicative Holt-Winters",
    subtitle = "Dashed line marks forecast origin (last historical observation)",
    x = "Date",
    y = "Construction starts (thousand units)",
    color = "Series",
    linetype = "Series"
  ) +
  ggplot2::theme_minimal(base_size = 12) +
  ggplot2::theme(
    plot.title = ggplot2::element_text(face = "bold"),
    legend.position = "bottom"
  )

Figure 3. Forecast comparison for additive and multiplicative Holt-Winters models against observed values.

4. Residual Diagnostics (Train Window)

Residual diagnostics are reported to flag potential remaining autocorrelation and bias.

In this analysis, Ljung-Box p-values are generally low across specifications, indicating that residual autocorrelation is still present. This suggests that Holt-Winters captures a substantial share of trend-seasonal structure but does not fully absorb all serial dependence in the series.

knitr::kable(
  diagnostics_table,
  digits = 6,
  caption = "Table 3. Residual diagnostics by model"
)

Table 3. Residual diagnostics by model
model	model_name	residual_mean	residual_sd	ljung_box_p_value
HW_Add_Auto	HW Additive (Auto)	0.393067	9.513233	0.001117
HW_Add_Fixed	HW Additive (Fixed)	0.066823	9.703849	0.000000
HW_Mult_Auto	HW Multiplicative (Auto)	0.771482	9.355485	0.004570
HW_Mult_Fixed	HW Multiplicative (Fixed)	-0.366337	10.325654	0.000000
Naive	Naive	-0.091342	13.336106	0.000000
Seasonal_Naive	Seasonal Naive	-0.745011	17.544460	0.000000

5. 12-Month Forward Forecast (Champion Model)

The table below reports point forecasts together with 80% and 95% prediction intervals, which quantify forecast uncertainty around the central projection path.

knitr::kable(
  forward_forecast,
  digits = 3,
  caption = "Table 4. Champion model forward forecast with 80% and 95% intervals"
)

Table 4. Champion model forward forecast with 80% and 95% intervals
date	point_forecast	lower_80	upper_80	lower_95	upper_95
2025-03-01	112.322	99.963	124.682	93.420	131.225
2025-04-01	122.701	109.686	135.716	102.796	142.606
2025-05-01	123.650	109.895	137.405	102.614	144.686
2025-06-01	124.235	109.660	138.810	101.944	146.525
2025-07-01	119.039	103.567	134.511	95.377	142.701
2025-08-01	118.381	101.940	134.822	93.237	143.526
2025-09-01	115.504	98.025	132.983	88.773	142.236
2025-10-01	115.027	96.446	133.607	86.610	143.443
2025-11-01	106.449	86.706	126.192	76.255	136.643
2025-12-01	103.108	82.146	124.071	71.049	135.167
2026-01-01	97.018	74.782	119.253	63.012	131.024
2026-02-01	105.961	82.401	129.520	69.930	141.992

6. Exported Artifacts

artifact_dir <- file.path("analysis_outputs")
artifact_files <- if (dir.exists(artifact_dir)) list.files(artifact_dir) else character(0)

knitr::kable(
  data.frame(file = artifact_files),
  caption = "Table 5. Files generated by the production script"
)

Table 5. Files generated by the production script
file
backtest_forecast_comparison.png
champion_model_forward_forecast.csv
historical_series.png
holt_winters_model_metrics.csv
model_metrics.csv
model_residual_diagnostics.csv
session_info.txt

7. Reproducibility Note

The script writes session details to analysis_outputs/session_info.txt. This supports traceability of package versions and runtime environment for RPubs publication.

8. Full Reproducible Code

To preserve the same implementation detail as the standalone analysis script, the full production code is embedded below.


options(stringsAsFactors = FALSE, scipen = 999)

required_packages <- c(
  "readr",
  "dplyr",
  "lubridate",
  "forecast",
  "Metrics",
  "ggplot2"
)

missing_packages <- setdiff(required_packages, rownames(utils::installed.packages()))
if (length(missing_packages) > 0L) {
  stop(
    sprintf(
      "Missing packages detected: %s. Install them before running the analysis.",
      paste(missing_packages, collapse = ", ")
    )
  )
}

get_script_directory <- function() {
  script_flag <- "--file="
  matching_args <- grep(script_flag, commandArgs(trailingOnly = FALSE), value = TRUE)
  if (length(matching_args) == 0L) {
    return(normalizePath(getwd(), winslash = "/", mustWork = TRUE))
  }
  script_path <- sub(script_flag, "", matching_args[1], fixed = TRUE)
  dirname(normalizePath(script_path, winslash = "/", mustWork = TRUE))
}

parse_monthly_date <- function(month_string) {
  current_locale <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", current_locale), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  as.Date(paste0("01-", month_string), format = "%d-%b-%y")
}

compute_error_metrics <- function(actual, predicted) {
  actual <- as.numeric(actual)
  predicted <- as.numeric(predicted)

  if (length(actual) != length(predicted)) {
    stop("Metric computation failed: actual and predicted vectors have different lengths.")
  }

  valid_rows <- is.finite(actual) & is.finite(predicted)
  actual <- actual[valid_rows]
  predicted <- predicted[valid_rows]

  if (length(actual) == 0L) {
    stop("Metric computation failed: no valid observations after NA filtering.")
  }

  epsilon <- .Machine$double.eps
  mape <- mean(abs((actual - predicted) / pmax(abs(actual), epsilon))) * 100
  smape <- mean((2 * abs(actual - predicted)) / pmax(abs(actual) + abs(predicted), epsilon)) * 100

  data.frame(
    MAE = Metrics::mae(actual, predicted),
    MSE = Metrics::mse(actual, predicted),
    RMSE = sqrt(Metrics::mse(actual, predicted)),
    MAPE = mape,
    sMAPE = smape,
    check.names = FALSE
  )
}

fit_holt_winters <- function(series, seasonal_type, alpha = NA_real_, beta = NA_real_, gamma = NA_real_) {
  if (is.na(alpha) && is.na(beta) && is.na(gamma)) {
    return(stats::HoltWinters(series, seasonal = seasonal_type))
  }
  stats::HoltWinters(
    series,
    seasonal = seasonal_type,
    alpha = alpha,
    beta = beta,
    gamma = gamma
  )
}

build_monthly_ts <- function(data_frame) {
  stats::ts(
    data_frame$nstarted,
    start = c(lubridate::year(data_frame$date[1]), lubridate::month(data_frame$date[1])),
    frequency = 12
  )
}


project_dir <- get_script_directory()
data_path <- file.path(project_dir, "construction_data.csv")

if (!file.exists(data_path)) {
  stop(sprintf("Input file not found: %s", data_path))
}

construction_df <- readr::read_csv(
  data_path,
  col_types = readr::cols(
    date = readr::col_character(),
    nstarted = readr::col_double()
  )
)

construction_df <- dplyr::mutate(construction_df, date = parse_monthly_date(date))
construction_df <- dplyr::arrange(construction_df, date)

if (anyNA(construction_df$date)) {
  stop("Date parsing failed for at least one record.")
}
if (anyNA(construction_df$nstarted)) {
  stop("Missing target values detected in nstarted.")
}
if (anyDuplicated(construction_df$date) > 0L) {
  stop("Duplicate monthly timestamps detected. Each month must be unique.")
}
expected_months <- seq.Date(min(construction_df$date), max(construction_df$date), by = "month")
missing_months <- setdiff(expected_months, construction_df$date)
if (length(missing_months) > 0L) {
  stop(sprintf("Time index contains %d missing month(s).", length(missing_months)))
}
if (any(construction_df$nstarted <= 0)) {
  stop("Non-positive values detected. Multiplicative Holt-Winters requires strictly positive series.")
}

forecast_horizon <- 12L
history_window_months <- 24L

if (nrow(construction_df) <= forecast_horizon + history_window_months) {
  stop("Insufficient data to run a 12-month backtest with a 24-month history window.")
}

split_index <- nrow(construction_df) - forecast_horizon
train_df <- construction_df[seq_len(split_index), , drop = FALSE]
test_df <- construction_df[(split_index + 1L):nrow(construction_df), , drop = FALSE]

if (nrow(test_df) != forecast_horizon) {
  stop("Backtest split error: test window length does not match forecast horizon.")
}
if (max(train_df$date) >= min(test_df$date)) {
  stop("Backtest split error: train and test periods are not strictly time-ordered.")
}
if (length(intersect(train_df$date, test_df$date)) > 0L) {
  stop("Backtest split error: train and test date ranges overlap.")
}

train_ts <- build_monthly_ts(train_df)
test_ts <- build_monthly_ts(test_df)
full_ts <- build_monthly_ts(construction_df)


model_specs <- data.frame(
  model_key = c(
    "HW_Add_Auto",
    "HW_Add_Fixed",
    "HW_Mult_Auto",
    "HW_Mult_Fixed",
    "Naive",
    "Seasonal_Naive"
  ),
  display_name = c(
    "HW Additive (Auto)",
    "HW Additive (Fixed)",
    "HW Multiplicative (Auto)",
    "HW Multiplicative (Fixed)",
    "Naive",
    "Seasonal Naive"
  ),
  model_family = c(
    "holt_winters",
    "holt_winters",
    "holt_winters",
    "holt_winters",
    "naive",
    "snaive"
  ),
  seasonal_type = c("additive", "additive", "multiplicative", "multiplicative", NA, NA),
  alpha = c(NA_real_, 0.3, NA_real_, 0.4, NA_real_, NA_real_),
  beta = c(NA_real_, 0.1, NA_real_, 0.2, NA_real_, NA_real_),
  gamma = c(NA_real_, 0.4, NA_real_, 0.3, NA_real_, NA_real_),
  stringsAsFactors = FALSE
)

model_results <- lapply(seq_len(nrow(model_specs)), function(i) {
  spec <- model_specs[i, , drop = FALSE]
  if (spec$model_family == "holt_winters") {
    fitted_model <- fit_holt_winters(
      series = train_ts,
      seasonal_type = spec$seasonal_type,
      alpha = spec$alpha,
      beta = spec$beta,
      gamma = spec$gamma
    )
    model_forecast <- forecast::forecast(fitted_model, h = forecast_horizon)
    hw_fitted <- fitted_model$fitted[, "xhat"]
    fitted_values <- as.numeric(hw_fitted)
    aligned_train_actual <- as.numeric(
      stats::window(train_ts, start = stats::start(hw_fitted), end = stats::end(hw_fitted))
    )

    estimated_alpha <- ifelse(is.na(spec$alpha), fitted_model$alpha, spec$alpha)
    estimated_beta <- ifelse(is.na(spec$beta), fitted_model$beta, spec$beta)
    estimated_gamma <- ifelse(is.na(spec$gamma), fitted_model$gamma, spec$gamma)
  } else if (spec$model_family == "naive") {
    fitted_model <- forecast::naive(train_ts, h = forecast_horizon)
    model_forecast <- fitted_model
    fitted_values <- as.numeric(fitted_model$fitted)
    aligned_train_actual <- as.numeric(train_ts)

    estimated_alpha <- NA_real_
    estimated_beta <- NA_real_
    estimated_gamma <- NA_real_
  } else if (spec$model_family == "snaive") {
    fitted_model <- forecast::snaive(train_ts, h = forecast_horizon)
    model_forecast <- fitted_model
    fitted_values <- as.numeric(fitted_model$fitted)
    aligned_train_actual <- as.numeric(train_ts)

    estimated_alpha <- NA_real_
    estimated_beta <- NA_real_
    estimated_gamma <- NA_real_
  } else {
    stop(sprintf("Unsupported model family: %s", spec$model_family))
  }

  train_metrics <- compute_error_metrics(aligned_train_actual, fitted_values)
  test_metrics <- compute_error_metrics(test_ts, model_forecast$mean)

  residual_vector <- aligned_train_actual - fitted_values
  residual_vector <- residual_vector[is.finite(residual_vector)]
  ljung_box_p_value <- NA_real_
  if (length(residual_vector) > 12L) {
    ljung_box_p_value <- stats::Box.test(residual_vector, lag = 12, type = "Ljung-Box")$p.value
  }

  list(
    spec = spec,
    model = fitted_model,
    forecast = model_forecast,
    train_metrics = train_metrics,
    test_metrics = test_metrics,
    alpha = estimated_alpha,
    beta = estimated_beta,
    gamma = estimated_gamma,
    diagnostics = data.frame(
      residual_mean = mean(residual_vector),
      residual_sd = stats::sd(residual_vector),
      ljung_box_p_value = ljung_box_p_value,
      check.names = FALSE
    )
  )
})

names(model_results) <- model_specs$model_key

build_metrics_row <- function(model_key, result_object, dataset_label, metric_frame) {
  spec <- result_object$spec
  parameter_source <- if (
    spec$model_family == "holt_winters" && is.na(spec$alpha)
  ) {
    "auto"
  } else if (spec$model_family == "holt_winters") {
    "fixed"
  } else {
    "baseline"
  }

  data.frame(
    model = model_key,
    model_name = spec$display_name,
    model_family = spec$model_family,
    seasonal_type = spec$seasonal_type,
    parameter_source = parameter_source,
    dataset = dataset_label,
    alpha = result_object$alpha,
    beta = result_object$beta,
    gamma = result_object$gamma,
    MAE = metric_frame$MAE,
    MSE = metric_frame$MSE,
    RMSE = metric_frame$RMSE,
    MAPE = metric_frame$MAPE,
    sMAPE = metric_frame$sMAPE,
    check.names = FALSE
  )
}

metrics_table <- do.call(
  rbind,
  lapply(names(model_results), function(model_key) {
    result_object <- model_results[[model_key]]
    rbind(
      build_metrics_row(model_key, result_object, "train", result_object$train_metrics),
      build_metrics_row(model_key, result_object, "test", result_object$test_metrics)
    )
  })
)

metrics_table <- dplyr::arrange(metrics_table, dataset, MAE)

cat("\nModel Performance Summary:\n")
print(metrics_table, row.names = FALSE)

diagnostics_table <- do.call(
  rbind,
  lapply(names(model_results), function(model_key) {
    result_object <- model_results[[model_key]]
    diagnostics <- result_object$diagnostics
    data.frame(
      model = model_key,
      model_name = result_object$spec$display_name,
      residual_mean = diagnostics$residual_mean,
      residual_sd = diagnostics$residual_sd,
      ljung_box_p_value = diagnostics$ljung_box_p_value,
      check.names = FALSE
    )
  })
)

cat("\nResidual Diagnostics (train window):\n")
print(diagnostics_table, row.names = FALSE)

test_metrics_table <- metrics_table[metrics_table$dataset == "test", , drop = FALSE]
champion_model_key <- test_metrics_table$model[which.min(test_metrics_table$MAE)]
cat(sprintf("\nChampion model based on test MAE: %s\n", champion_model_key))

champion_spec <- model_specs[model_specs$model_key == champion_model_key, , drop = FALSE]
if (champion_spec$model_family == "holt_winters") {
  champion_model_full <- fit_holt_winters(
    series = full_ts,
    seasonal_type = champion_spec$seasonal_type,
    alpha = champion_spec$alpha,
    beta = champion_spec$beta,
    gamma = champion_spec$gamma
  )
  champion_forecast <- forecast::forecast(champion_model_full, h = forecast_horizon)
} else if (champion_spec$model_family == "naive") {
  champion_model_full <- forecast::naive(full_ts, h = forecast_horizon)
  champion_forecast <- champion_model_full
} else if (champion_spec$model_family == "snaive") {
  champion_model_full <- forecast::snaive(full_ts, h = forecast_horizon)
  champion_forecast <- champion_model_full
} else {
  stop(sprintf("Unsupported champion model family: %s", champion_spec$model_family))
}

future_dates <- seq.Date(
  from = seq.Date(max(construction_df$date), by = "month", length.out = 2L)[2],
  by = "month",
  length.out = forecast_horizon
)

forward_forecast_table <- data.frame(
  date = future_dates,
  point_forecast = as.numeric(champion_forecast$mean),
  lower_80 = as.numeric(champion_forecast$lower[, "80%"]),
  upper_80 = as.numeric(champion_forecast$upper[, "80%"]),
  lower_95 = as.numeric(champion_forecast$lower[, "95%"]),
  upper_95 = as.numeric(champion_forecast$upper[, "95%"])
)

cat("\nChampion Model Forward Forecast (12 Months):\n")
print(forward_forecast_table, row.names = FALSE)


historical_plot <- ggplot2::ggplot(construction_df, ggplot2::aes(x = date, y = nstarted)) +
  ggplot2::geom_line(color = "#1B4965", linewidth = 1) +
  ggplot2::scale_x_date(date_labels = "%Y", date_breaks = "2 years") +
  ggplot2::labs(
    title = "Monthly Residential Construction Starts",
    subtitle = "Raw historical observations",
    x = "Date",
    y = "Construction starts (thousand units)"
  ) +
  ggplot2::theme_minimal(base_size = 12) +
  ggplot2::theme(
    plot.title = ggplot2::element_text(face = "bold"),
    axis.text.x = ggplot2::element_text(angle = 45, hjust = 1)
  )

history_df <- tail(train_df, history_window_months)
actual_frame <- dplyr::bind_rows(
  dplyr::transmute(history_df, date = date, series = "Actual", value = nstarted),
  dplyr::transmute(test_df, date = date, series = "Actual", value = nstarted)
)

forecast_frame <- dplyr::bind_rows(
  lapply(names(model_results), function(model_key) {
    data.frame(
      date = test_df$date,
      series = model_results[[model_key]]$spec$display_name,
      value = as.numeric(model_results[[model_key]]$forecast$mean),
      stringsAsFactors = FALSE
    )
  })
)

forecast_anchor_frame <- data.frame(
  date = rep(max(train_df$date), times = nrow(model_specs)),
  series = model_specs$display_name,
  value = rep(tail(train_df$nstarted, 1), times = nrow(model_specs)),
  stringsAsFactors = FALSE
)

backtest_plot_data <- dplyr::bind_rows(actual_frame, forecast_anchor_frame, forecast_frame)

backtest_plot <- ggplot2::ggplot(
  backtest_plot_data,
  ggplot2::aes(x = date, y = value, color = series, linetype = series)
) +
  ggplot2::geom_line(linewidth = 0.9) +
  ggplot2::geom_vline(
    xintercept = max(train_df$date),
    color = "#9B2226",
    linetype = "dashed",
    linewidth = 0.8
  ) +
  ggplot2::scale_color_brewer(palette = "Dark2") +
  ggplot2::labs(
    title = "Backtest: Actuals vs Model Forecasts",
    subtitle = "Dashed line marks forecast origin (last historical observation)",
    x = "Date",
    y = "Construction starts (thousand units)",
    color = "Series",
    linetype = "Series"
  ) +
  ggplot2::theme_minimal(base_size = 12) +
  ggplot2::theme(
    plot.title = ggplot2::element_text(face = "bold"),
    legend.position = "bottom"
  )

print(historical_plot)
print(backtest_plot)


output_dir <- file.path(project_dir, "analysis_outputs")
if (!dir.exists(output_dir)) {
  dir.create(output_dir, recursive = TRUE)
}

readr::write_csv(metrics_table, file.path(output_dir, "model_metrics.csv"))
readr::write_csv(metrics_table, file.path(output_dir, "holt_winters_model_metrics.csv"))
readr::write_csv(diagnostics_table, file.path(output_dir, "model_residual_diagnostics.csv"))
readr::write_csv(forward_forecast_table, file.path(output_dir, "champion_model_forward_forecast.csv"))
ggplot2::ggsave(
  filename = file.path(output_dir, "historical_series.png"),
  plot = historical_plot,
  width = 12,
  height = 6,
  dpi = 300
)
ggplot2::ggsave(
  filename = file.path(output_dir, "backtest_forecast_comparison.png"),
  plot = backtest_plot,
  width = 12,
  height = 6,
  dpi = 300
)
writeLines(capture.output(sessionInfo()), con = file.path(output_dir, "session_info.txt"))

cat(sprintf("\nAnalysis artifacts written to: %s\n", output_dir))