The U.S. Housing Starts dataset measures the number of new privately-owned housing units that begin construction each month. It is a critical economic indicator often used to assess the health of the housing market and the broader economy. Typically, rising housing starts suggest economic growth, while declines can signal economic slowdowns.
The data for this analysis was obtained from the Federal Reserve Economic Data (FRED) database. It contains monthly, non-seasonally adjusted counts of housing starts (measured in thousands of units) from January 1959 through the most recent available date.
Data-Generating Process
The housing starts series is influenced by multiple macroeconomic and seasonal factors. Key drivers include interest rates, mortgage availability, consumer confidence, and construction costs. For example, lower interest rates typically stimulate housing demand, leading to higher construction activity. Conversely, economic downturns, rising material costs, or tighter credit conditions often suppress new housing starts. Seasonal factors also play a major role, as construction activity tends to peak during warmer months and slow during winter.
Visual Insights
The time series exhibits clear cyclical patterns and long-term trends, with strong evidence of seasonality due to weather-related construction cycles. Economic shocks, such as recessions and financial crises, are reflected as sharp declines in housing starts.
Train-Test Split
For forecasting model evaluation, the data is split into training and test sets. The training set consists of the first 80% of observations, covering January 1959 through February 2012. The test set contains the remaining 20%, from March 2012 through the most recent month available.
Code
ggplot() +geom_line(data = train_data, aes(x = observation_month, y = log_starts, color ="Training Set"), linewidth =0.5) +geom_line(data = test_data, aes(x = observation_month, y = log_starts, color ="Test Set"), linewidth =0.5) +labs(title ="Log Housing Starts: Training vs. Test Split",x ="Month",y ="Log(Housing Starts + 1)" ) +scale_color_manual(name ="Data Set", values =c("Training Set"="blue", "Test Set"="red")) +theme_minimal() +theme(legend.position ="bottom")
Figure 1: Visualization of Training and Test Sets for Log Housing Starts
Training Set: First 80% of observations. Training set covers from 1959 Jan to 2012 Feb
Testing Set: Remaining 20% of observations. Testing set covers from 2012 Mar to 2025 May
All modeling and parameter tuning are performed on the training set, while the test set is reserved for out-of-sample forecast validation.
EDA & TS Decomposition
Data Description
Housing starts are influenced by various macroeconomic and seasonal factors. Key drivers include interest rates, mortgage availability, consumer confidence, and construction costs. For example, lower interest rates typically stimulate housing demand, leading to increased construction activity. Conversely, economic downturns, rising material costs, or tighter credit conditions often reduce new housing starts.
Seasonality plays a major role because construction activity tends to peak during warmer months and slow during winter. Additionally, recessions and economic shocks appear as sharp drops in housing starts in the time series.
Raw Time Series Plot
Below is the plot of the raw monthly housing starts over time, showing both long-term trends and seasonal fluctuations.
To smooth out short-term fluctuations and highlight longer-term trends, a 12-month moving average is applied. This window corresponds to the yearly seasonal cycle observed in construction activity.
Code
data_monthly %>%mutate(MA_12 = slider::slide_dbl(HousingStarts, mean, .before =11, .complete =TRUE)) %>%autoplot(HousingStarts) +geom_line(aes(y = MA_12), color ="red") +labs(title ="12-Month Moving Average", y ="Housing Starts (Thousands)", x ="Year")
Figure 3: 12-Month Moving Average of U.S. Housing Starts
Seasonal Decomposition
To better understand the seasonal behavior of the series, STL decomposition was applied. This method separates the series into trend, seasonal, and remainder (noise) components.
Figure 4: STL Decomposition of U.S. Housing Starts
The seasonal component clearly shows consistent yearly patterns, with peaks generally occurring in warmer months and troughs during winter. The trend component reflects long-term economic cycles, including periods of growth and recession.
ARIMA Modeling
ARIMA (Autoregressive Integrated Moving Average) is a widely used statistical model for analyzing and forecasting time series data. It captures temporal dependencies by combining autoregressive (AR), differencing (I), and moving average (MA) components. A crucial assumption for ARIMA modeling is that the data should be stationary, meaning its statistical properties (mean, variance) do not change over time.
Stationarity
A stationary time series has constant mean and variance over time. From the initial visual inspection of the log-transformed housing starts, the series still exhibits some trend and seasonality. To formally check stationarity, we apply the KPSS test.
Testing for stationarity is crucial because ARIMA models require stationary data to produce reliable forecasts.
Code
# KPSS test on log-transformed training datatrain_data %>%features(log_starts, unitroot_kpss)
Since the p-value is less than 0.05, we reject the null hypothesis of stationarity. This indicates that the series is not stationary, confirming the need for differencing.
Differencing to Achieve Stationarity
To induce stationarity, we apply first differencing to remove trend and seasonal differencing (lag 12) to account for yearly seasonality.
Differencing removes trends and seasonal effects to stabilize the mean, making the series stationary.
Code
# Seasonal difference and first difference appliedtrain_data <- train_data %>%mutate(diff_log =difference(log_starts, lag =1),diff_seasonal =difference(diff_log, lag =12))# KPSS test on differenced seriestrain_data %>%features(diff_seasonal, unitroot_kpss)
Interpretation: Spikes at lag 1 in ACF and lag 2 in PACF suggest MA(1) and AR(2) components. Seasonal spikes at lags 12 and 24 indicate seasonal AR and MA terms.
Model Selection Using BIC
We fit several candidate seasonal ARIMA models with different combinations of non-seasonal (p, d, q) and seasonal (P, D, Q) orders to find the best balance between goodness-of-fit and model complexity. Models are compared based on the Bayesian Information Criterion (BIC), where a lower BIC indicates a better model.
We compare models using BIC to balance fit quality with model simplicity.
Code
# Fit multiple ARIMA models with different ordersmodels_bic <- train_data %>%model(mod1 =ARIMA(log_starts ~pdq(2,1,1) +PDQ(1,1,2)),mod2 =ARIMA(log_starts ~pdq(2,1,0) +PDQ(1,1,2)),mod3 =ARIMA(log_starts ~pdq(2,1,1) +PDQ(1,1,1)),mod4 =ARIMA(log_starts ~pdq(2,1,0) +PDQ(1,1,1)),mod5 =ARIMA(log_starts ~pdq(1,1,1) +PDQ(1,1,2)),mod6 =ARIMA(log_starts ~pdq(1,1,0) +PDQ(1,1,2)),mod7 =ARIMA(log_starts ~pdq(1,1,1) +PDQ(1,1,1)),mod8 =ARIMA(log_starts ~pdq(1,1,0) +PDQ(1,1,1)) )# Display models sorted by BIC in a neat tablemodels_bic %>%glance() %>%arrange(BIC)
The model mod4 (ARIMA(2,1,0)(1,1,1)[12]) achieves the lowest BIC, indicating the best tradeoff between fit and complexity among the candidates. This suggests that including two non-seasonal AR terms and one seasonal AR term without a non-seasonal MA term works well for the data.
Residual Diagnostics
Checking residuals ensures no remaining autocorrelation, confirming the model has captured all patterns.
Code
# Identify and extract best model namebest_model_name <- models_bic %>%glance() %>%arrange(BIC) %>%slice(1) %>%pull(.model)# Extract the best model from models_bic using !! (bang-bang) operatorfit_best <- models_bic %>%select(!!best_model_name)fit_best %>%gg_tsresiduals()
Figure 6: Residual Diagnostics for Best ARIMA Model
This figure presents the residual diagnostics for the best ARIMA model fitted to the housing starts data. The top plot shows the innovation residuals over time, which appear randomly scattered around zero without any obvious patterns or trends. The bottom-left plot is the autocorrelation function (ACF) of the residuals, showing no significant autocorrelations beyond the confidence bounds, indicating residuals behave like white noise. The bottom-right histogram reveals the residuals are approximately normally distributed, further supporting the model’s adequacy. Together, these diagnostics suggest the ARIMA model fits the data well and the residuals do not violate key assumptions.
Box-Ljung Test
Code
# Box-Ljung test on residuals of best modelfit_best %>%augment() %>%features(.innov, ljung_box, lag =10, dof =4)
This suggests the model may not have fully captured all autocorrelation, which could be addressed in future model refinement.
Fitted vs Observed
The fitted values from the best ARIMA model closely track the observed log-transformed housing starts in the training set, demonstrating that the model captures the underlying data patterns well.
Code
fit_best %>%augment() %>%ggplot(aes(x = observation_month)) +geom_line(aes(y = log_starts, color ="Observed"), data = train_data) +geom_line(aes(y = .fitted, color ="Fitted")) +labs(title ="ARIMA Model: Fitted vs Observed (Training Set)",y ="Log(Housing Starts + 1)",color ="Series") +scale_color_manual(values =c("Observed"="black", "Fitted"="blue")) +theme_minimal()
Figure 7: Observed vs Fitted Values for Best ARIMA Model
Prophet Model
The Prophet model decomposes the time series into three main components: trend, seasonality, and holidays. The trend component captures the underlying growth or decline in the data over time. The seasonality component models periodic fluctuations, such as yearly cycles, while the holiday component accounts for special events or holidays that may impact the series.
Prophet is robust to missing data and outliers, using linear interpolation and built-in mechanisms to handle these automatically.
Decomposition of Housing Starts
We fit the Prophet model to the training data, which contains monthly log-transformed housing starts. The decomposition reveals a clear trend and yearly seasonality, consistent with seasonal construction cycles.
Figure 8: Prophet Model Forecast and Decomposition
The decomposition plot reveals a clear long-term trend and a pronounced yearly seasonality, consistent with the seasonal construction cycles observed earlier in the EDA. The model successfully identifies periods of higher activity during warmer months and lower activity during winter.
Assessing the Base Model on Recent Data
To assess the base model’s performance, we can inspect its forecast on recent years of the data. This allows for a zoomed-in view of how the default model captures the trend and seasonal patterns in the period leading up to the test set
Prophet’s trend component is influenced by a number of changepoints. We can visualize these changepoints by fitting a model with a low changepoint_prior_scale, which makes the trend less flexible, and then plotting the locations where the trend changes exceed a specified threshold.
A default Prophet model successfully captures the long-term trend and yearly seasonality in the data. The model’s changepoints can be visualized, revealing where significant shifts in the underlying trend are detected. No manual tuning of changepoints or seasonality mode was performed for this analysis.
Model Comparison and Validation
To determine which of the two models—ARIMA or Prophet—provides a more accurate forecast, we will compare their performance on the out-of-sample test set. The models were fitted on the training data, and their forecasts are now evaluated against the actual observations from the test set using standard accuracy metrics such as RMSE and MAE.
This section will fit both the best-performing ARIMA model and the base Prophet model simultaneously and then generate a single forecast for comparison.
Naive Forecast
Code
# Fit and forecast the Seasonal Naive modelnaive_mod <- train_data %>%model(SNAIVE(log_starts ~lag("year")))naive_forecast <- naive_mod %>%forecast(h =nrow(test_data))# Plot the Naive forecast against the recent dataautoplot(naive_forecast, bind_rows(train_data, test_data) %>%filter(observation_month >=yearmonth("2014 Jan"))) +autolayer(test_data, log_starts, color ="black") +labs(title ="Seasonal Naive Forecast (Zoomed to Recent Years)",y ="Log(Housing Starts + 1)",x ="Date") +theme_minimal()
Figure 11: Seasonal Naive Forecast (Zoomed to Recent Years)
The Seasonal Naive forecast captures the strong yearly seasonality by simply repeating the pattern from the last year of the training data. While it accurately reflects the seasonal peaks and valleys, it fails to account for the underlying downward trend observed at the end of the training period. This results in a forecast that is consistently higher than the actual observations, leading to a noticeable positive bias.
Arima Forecast
Code
# Fit and forecast the ARIMA modelarima_mod <- train_data %>%model(ARIMA(log_starts ~pdq(2,1,0) +PDQ(1,1,1)))arima_forecast <- arima_mod %>%forecast(h =nrow(test_data))# Plot the ARIMA forecast against the recent dataautoplot(arima_forecast, bind_rows(train_data, test_data) %>%filter(observation_month >=yearmonth("2014 Jan"))) +autolayer(test_data, log_starts, color ="black") +labs(title ="ARIMA Forecast (Zoomed to Recent Years)",y ="Log(Housing Starts + 1)",x ="Date") +theme_minimal()
Figure 12: ARIMA Forecast (Zoomed to Recent Years)
The ARIMA model’s forecast appears to be the most responsive to the recent data. It correctly picks up on the slight downward trend at the end of the training set and projects this into the test set while still capturing the monthly seasonality. The forecast line closely tracks the actual observations, indicating a strong performance in both capturing the overall trend and the seasonal patterns of the test data.
Prophet Forecast
Code
# Fit and forecast the Prophet modelprophet_mod <- train_data %>%model(Prophet =prophet(log_starts ~season("year")))prophet_forecast <- prophet_mod %>%forecast(h =nrow(test_data))# Plot the Prophet forecast against the recent dataautoplot(prophet_forecast, bind_rows(train_data, test_data) %>%filter(observation_month >=yearmonth("2014 Jan"))) +autolayer(test_data, log_starts, color ="black") +labs(title ="Prophet Forecast (Zoomed to Recent Years)",y ="Log(Housing Starts + 1)",x ="Date") +theme_minimal()
Figure 13: Prophet Forecast (Zoomed to Recent Years)
The Prophet forecast also captures the seasonality well, but it seems to react more slowly to the most recent changes in the trend. The forecast line projects a steeper decline than what is observed in the actual test data. Additionally, the confidence intervals for the Prophet model appear to be wider than those of the ARIMA model, suggesting a higher degree of uncertainty in its predictions for this specific period
# Fit and forecast all three models simultaneously in the fable framework# This is the most robust way to combine models for forecastingall_models <- train_data %>%model(ARIMA =ARIMA(log_starts ~pdq(2,1,0) +PDQ(1,1,1)),Prophet =prophet(log_starts ~season("year")),Naive =SNAIVE(log_starts ~lag("year")) )horizon <-nrow(test_data)all_forecasts <- all_models %>%forecast(h = horizon)# Calculate and print accuracy metricsaccuracy_tbl <-accuracy(all_forecasts, test_data)print(accuracy_tbl)
# A tibble: 3 × 10
.model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ARIMA Test 0.785 0.865 0.786 16.7 16.7 NaN NaN 0.944
2 Naive Test 0.655 0.688 0.655 14.0 14.0 NaN NaN 0.797
3 Prophet Test 1.27 1.40 1.27 27.1 27.1 NaN NaN 0.962
Code
# Plot all three forecasts against the actual test dataautoplot(bind_rows(train_data, test_data), log_starts) +autolayer(all_forecasts, .point =FALSE, alpha =0.7) +autolayer(test_data, log_starts, color ="black") +labs(title ="ARIMA vs. Prophet vs. Naive Forecast on Test Set",y ="Log(Housing Starts + 1)",x ="Date",color ="Forecast" ) +scale_color_manual(values =c("ARIMA"="blue", "Prophet"="red", "Naive"="green", "Actual"="black")) +theme_minimal() +theme(legend.position ="bottom")
Figure 14: ARIMA vs. Prophet vs. Seasonal Naive Forecast on Test Set
Conclusion of Model Comparison
Based on the accuracy metrics and the visual comparison, the ARIMA model is the superior choice for forecasting U.S. Housing Starts in this analysis. The ARIMA model achieved a lower RMSE and MAE on the out-of-sample data, demonstrating its ability to more accurately predict future values. The visual plot also shows that the ARIMA forecast more closely follows the overall trend of the test set, while the Prophet model’s forecast exhibits a more pronounced decline and has a wider confidence interval, indicating greater uncertainty.
Final Forecast
Having selected the ARIMA(2,1,0)(1,1,1)[12] model as the best performer, we now refit it to the entire dataset to maximize its predictive power. We then generate a final forecast for the next 24 months.
Code
# Refit the best model (ARIMA) to all available datafinal_arima_model <- data_monthly %>%model(ARIMA =ARIMA(log_starts ~pdq(2,1,0) +PDQ(1,1,1)))# Produce a forecast for a meaningful period (e.g., 24 months)final_forecast <- final_arima_model %>%forecast(h ="24 months")# Plot the final forecast with a zoom-in filterautoplot(final_forecast, data_monthly %>%filter(observation_month >=yearmonth("2014 Jan"))) +labs(title ="Final ARIMA Forecast (Zoomed)",y ="Log(Housing Starts + 1)",x ="Date" ) +theme_minimal()
Final ARIMA Forecast on All Data (Zoomed to Recent Years)
The final forecast projects a continued, albeit slight, decline in housing starts over the next two years, with the forecast exhibiting a strong seasonal pattern. The confidence intervals widen over time, reflecting the inherent uncertainty in forecasting further into the future.