Split the time series into a training and test set
Code
# Load necessary libraries
library(tidyverse)
library(tsibble)
library(fable)
library(feasts)
library(ggplot2)

# Load data
df <- read_csv("../dataset/zillow_sales.csv")

# Convert to a tsibble
zillow_ts <- df %>%
  select(date, zillow_sales) %>%
  mutate(date = yearmonth(date)) %>%
  as_tsibble(index = date)

# Create training and test sets
train <- zillow_ts %>% filter(date <= yearmonth("2021 Jun"))
test  <- zillow_ts %>% filter(date > yearmonth("2021 Jun"))

# Plot the sets with the split point marked
split_point <- yearmonth("2021 Jun")
zillow_ts %>%
  ggplot(aes(x = date, y = zillow_sales)) +
  geom_line(color = "steelblue", linewidth = 1) +
  geom_vline(xintercept = as.Date(split_point), linetype = "dashed",
             color = "red", linewidth = 1) +
  labs(title = "Train/Test Split for Zillow Home Sales",
       x = "Date", y = "Home Sales Count") +
  annotate("text", x = as.Date(split_point), y = max(zillow_ts$zillow_sales),
           label = "Train/Test Split", vjust = -0.5, color = "red") +
  theme_minimal()
To split the data into approximately 80% training and 20% test data, I set the cutoff at June 2021. A visual inspection of the two segments shows that the test set follows the same seasonal and trend patterns observed in the training set, suggesting it is a representative sample.
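As a quick check on the 80/20 claim, the split proportions can be computed directly; this is a small sketch using the train, test, and zillow_ts objects defined above.

# Verify the approximate 80/20 split by counting monthly observations
nrow(train) / nrow(zillow_ts)  # should be roughly 0.80
nrow(test) / nrow(zillow_ts)   # should be roughly 0.20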
Section 2 - Cross-Validation Scheme
Set up a rolling-origin (expanding) window
Code
# Load necessary libraries
library(fabletools)

# Define the cross-validation folds: start with 160 months, grow by 3 each fold
cv_data <- zillow_ts %>%
  stretch_tsibble(.init = 160, .step = 3)

# Visualize which observations fall into each CV iteration
cv_data %>%
  ggplot() +
  geom_point(aes(date, factor(.id), color = factor(.id))) +
  ylab('Iteration') +
  ggtitle('Samples included in each CV Iteration')
I tested step sizes of 1, 3, and 6 months. With a step of 1, the rolling window produced more than 40 iterations, making the results cluttered and difficult to interpret. At a step of 6, only 8 iterations were generated, which felt too limited. I chose a step of 3 as a balanced approach—it expands the training window by one quarter at a time, yielding a sufficient number of iterations while maintaining readability. This setup strikes a good balance between computational efficiency and evaluating the model across multiple forecast origins.
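The iteration counts mentioned above can be checked directly; this is a small sketch that reuses the same .init value and counts the folds produced by each candidate step size.

# Count the CV iterations produced by step sizes of 1, 3, and 6 months
sapply(c(1, 3, 6), function(s) {
  zillow_ts %>%
    stretch_tsibble(.init = 160, .step = s) %>%
    pull(.id) %>%
    max()
})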
The best ARIMA appears to fit better than the Naive model, whose forecasts are essentially straight lines that miss the seasonal pattern. Next, let's look at the RMSE, MAPE, and MASE to see which model fits better.
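The code that fit the candidate models on the CV folds is not shown above. Below is a minimal sketch of how the cv_forecast object used in the results could be produced with fable; the Guerrero-based lambda estimate, the Naive specification, and the 12-month horizon are assumptions, while the ARIMA specification mirrors the final refit later in this section.

# Assumed earlier step: estimate the Box-Cox lambda via Guerrero's method
lambda_bc <- train %>%
  features(zillow_sales, features = guerrero) %>%
  pull(lambda_guerrero)

# Fit both candidate models on every CV fold, then forecast ahead
cv_forecast <- cv_data %>%
  model(
    Naive = NAIVE(zillow_sales),
    best_ARIMA = ARIMA(box_cox(zillow_sales, lambda_bc) ~ pdq(0,1,1) + PDQ(0,1,1))
  ) %>%
  forecast(h = 12)

# Pooled accuracy across folds, scored against the full series
cv_forecast %>% accuracy(zillow_ts)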
       .model  .type        ME     RMSE      MAE        MPE      MAPE      MASE     RMSSE      ACF1
       <char> <char>     <num>    <num>    <num>      <num>     <num>     <num>     <num>     <num>
1:      Naive   Test -157.7488 738.9811 603.0540 -10.049778 25.736619 1.9891370 1.9097743 0.7897712
2: best_ARIMA   Test -119.9645 303.8663 236.1257  -5.202612  9.030149 0.7788462 0.7852921 0.6895161
The ARIMA model consistently outperforms the naive model across most folds and metrics, which aligns with the visual inspection of the forecast plots. The ARIMA forecasts track the trend and seasonality of the actual data more closely, which is the main reason for the stronger performance.
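To see the fold-by-fold behavior behind that claim, the metrics can also be grouped by CV iteration; a short sketch assuming the cv_forecast object above:

# Per-fold accuracy for each model
cv_forecast %>%
  accuracy(zillow_ts, by = c(".id", ".model"))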
Code
cv_forecast %>%
  group_by(.id, .model) %>%
  mutate(h = row_number()) %>%
  ungroup() %>%
  as_fable(response = "zillow_sales", distribution = zillow_sales) %>%
  accuracy(zillow_ts, by = c("h", ".model")) %>%
  ggplot(aes(x = h, y = RMSE, color = .model)) +
  geom_point() +
  geom_line() +
  ylab('Average RMSE at Forecasting Intervals') +
  xlab('Months in the Future')
Code
cv_forecast %>%
  group_by(.id, .model) %>%
  mutate(h = row_number()) %>%
  ungroup() %>%
  as_fable(response = "zillow_sales", distribution = zillow_sales) %>%
  accuracy(zillow_ts, by = c("h", ".model")) %>%
  mutate(MAPE = MAPE / 100) %>%  # Rescale so percent labels display correctly
  ggplot(aes(x = h, y = MAPE, color = .model)) +
  geom_point() +
  geom_line() +
  theme_bw() +
  scale_y_continuous(name = 'Average MAPE at Forecasting Intervals',
                     labels = scales::percent)
Comparing the two plots, we can also confirm that the best_ARIMA model forecasts far more accurately than the Naive model at every horizon.
Code
# Refit the selected model on the training set
fit_final <- train %>%
  model(
    ARIMA_best = ARIMA(box_cox(zillow_sales, lambda_bc) ~ pdq(0,1,1) + PDQ(0,1,1))
  )

# Forecast over the 12-month test-set horizon
fc_final <- fit_final %>%
  forecast(h = 12)

# Plot forecasts against the full series, with the test set overlaid in red
fc_final %>%
  autoplot(zillow_ts) +
  autolayer(test, zillow_sales, color = "red") +
  labs(
    title = "Forecast vs. Actual: Test Set Performance",
    x = "Date",
    y = "Home Sales Count"  # fable back-transforms Box-Cox forecasts to the original scale
  )
The best ARIMA forecast does a good job of capturing the trend and the seasonality, and the forecast and observed values stay close over the 12-month period. As the horizon-wise RMSE and MAPE plots above suggest, accuracy degrades the further out we forecast, since errors accumulate with the horizon. While not perfect, the forecast values are not far from the actual recorded observations.
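For reference, the accuracy table below can be generated from the refit forecast; a one-line sketch assuming the fc_final and test objects above.

# Score the refit model's forecasts on the held-out test set
fc_final %>% accuracy(test)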
# A tibble: 1 × 10
  .model     .type     ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
  <chr>      <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ARIMA_best Test    34.1  321.  284. 0.823  8.69   NaN   NaN 0.724
After selecting the ARIMA model as the most accurate during cross-validation, I refit it to the training set and produced a 12-month forecast. The metrics on the test data continue to indicate that the model performs well on unseen data, suggesting minimal overfitting. The model showed an average deviation of 8.69% from the real values during out-of-sample testing, corresponding to an average error of about 321 points in the sales index. The NaN values for MASE and RMSSE most likely appear because the accuracy was computed against the test set alone, without the training observations needed for the scaling factor.
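If MASE and RMSSE values are needed, one likely fix, assuming the NaN entries do come from scoring against the test set alone, is to pass the full series so that accuracy() can reach the training observations required for the scaling denominator:

# Scoring against the full series lets accuracy() compute the MASE/RMSSE scale
fc_final %>% accuracy(zillow_ts)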