BANA7050_Assignment4_RFischer

Author

Ruowei Fischer

Section 1 - Train and Test Splits

Split the time series into a training and test set

Code
# Load necessary libraries
library(tidyverse)
library(tsibble)
library(fable)
library(feasts)
library(ggplot2)

# Load data
df <- read_csv("../dataset/zillow_sales.csv")

# Convert to a tsibble
zillow_ts <- df %>%
  select(date, zillow_sales) %>%
  mutate(date = yearmonth(date)) %>%
  as_tsibble(index = date)

# Create training and test sets
train <- zillow_ts %>% filter(date <= yearmonth("2021 Jun"))
test <- zillow_ts %>% filter(date > yearmonth("2021 Jun"))

# Plot the series with the train/test split marked
split_point <- yearmonth("2021 Jun")
zillow_ts %>%
  ggplot(aes(x = date, y = zillow_sales)) +
  geom_line(color = "steelblue", linewidth = 1) +
  geom_vline(xintercept = as.Date(split_point), linetype = "dashed", color = "red", linewidth = 1) +
  labs(title = "Train/Test Split for Zillow Home Sales",
       x = "Date", y = "Home Sales Count") +
  annotate("text", x = as.Date(split_point), y = max(zillow_ts$zillow_sales),
           label = "Train/Test Split", vjust = -0.5, color = "red") +
  theme_minimal()

To split the data into approximately 80% training and 20% test data, I set the cutoff at June 2021. A visual inspection of the two segments shows that the test set follows the same seasonal and trend patterns observed in the training set, suggesting it is a representative sample.
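
As a quick sanity check (a minimal sketch using only the objects defined above), the actual split proportions can be verified directly:

Code
# Confirm the June 2021 cutoff yields roughly an 80/20 split
nrow(train) / nrow(zillow_ts)  # expected to be close to 0.80
nrow(test) / nrow(zillow_ts)   # expected to be close to 0.20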

Section 2 - Cross-Validation Scheme

Set up a rolling window

Code
# Load necessary libraries
library(fabletools)

# Define the cross-validation folds
cv_data <- zillow_ts %>%
  stretch_tsibble(.init = 160, .step = 3)

cv_data %>%
  ggplot() +
  geom_point(aes(x = date, y = factor(.id), color = factor(.id))) +
  ylab('Iteration') +
  ggtitle('Samples included in each CV Iteration')

I tested step sizes of 1, 3, and 6 months. With a step of 1, the rolling window produced more than 40 iterations, making the results cluttered and difficult to interpret. With a step of 6, only 8 iterations were generated, which felt too limited. I chose a step of 3 as a balanced approach: it expands the training window by one quarter at a time, yielding enough forecast origins to evaluate the models thoroughly while keeping the output readable and the computation manageable.
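
The iteration counts quoted above follow from simple window arithmetic: as I understand stretch_tsibble()'s behavior, it produces floor((n - .init) / .step) + 1 windows for a series of length n. A minimal sketch of that check, assuming the same .init = 160 as above:

Code
# Number of CV iterations implied by each candidate step size
n <- nrow(zillow_ts)
sapply(c(1, 3, 6), function(step) floor((n - 160) / step) + 1)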

Section 3 - Model Selection and Comparison

Code
# Find Box-Cox lambda
lambda_bc <- zillow_ts %>%
  features(zillow_sales, features = guerrero) %>%
  pull(lambda_guerrero)

cv_forecast <- cv_data %>%
  model(
    Naive = NAIVE(box_cox(zillow_sales, lambda_bc)),
    best_ARIMA = ARIMA(box_cox(zillow_sales, lambda_bc) ~ pdq(0,1,1) + PDQ(0,1,1))
  ) %>%
  forecast(h = 6)

cv_forecast %>%
  autoplot(cv_data) +
  facet_wrap(~.id, nrow = 4) +
  theme_bw() +
  ylab('Zillow Sales')  # fable back-transforms forecasts to the original scale

Code
# Combined view
cv_forecast %>%
  as_tsibble() %>%
  select(-zillow_sales) %>%                # drop the forecast distribution, keep .mean
  left_join(zillow_ts, by = "date") %>%    # attach the observed values
  ggplot() +
  geom_line(aes(x = date, y = zillow_sales)) +
  geom_line(aes(x = date, y = .mean, color = factor(.id), linetype = .model)) +
  scale_color_discrete(name = 'Iteration') +
  theme_bw()

The best ARIMA fits noticeably better than the Naive model. The Naive model simply carries the last observed value forward, so its forecasts are essentially flat and miss the seasonal swings. Next, let's look at RMSE, MAPE, and MASE to quantify which model fits better.
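
For reference, here is what those metrics measure, computed by hand. This is a minimal sketch with my own helper functions (not part of fabletools); as I understand fabletools' defaults, MASE for monthly data is scaled by the in-sample MAE of a seasonal naive forecast (lag 12):

Code
# Hand-rolled versions of the metrics reported by accuracy()
rmse <- function(y, yhat) sqrt(mean((y - yhat)^2, na.rm = TRUE))
mape <- function(y, yhat) 100 * mean(abs((y - yhat) / y), na.rm = TRUE)
mase <- function(y, yhat, train_y, m = 12) {
  # Denominator: in-sample MAE of the seasonal naive forecast
  scale <- mean(abs(diff(train_y, lag = m)), na.rm = TRUE)
  mean(abs(y - yhat), na.rm = TRUE) / scale
}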

Code
library(data.table)

cv_forecast %>%
  group_by(.id) %>%
  accuracy(zillow_ts) %>%
  ungroup() %>%
  data.table()
        .model   .id  .type           ME       RMSE        MAE         MPE
        <char> <int> <char>        <num>      <num>      <num>       <num>
 1:      Naive     1   Test   151.687130  298.97915  266.53258   3.7036385
 2:      Naive     2   Test  -797.865009 1014.53772  797.86501 -31.0124571
 3:      Naive     3   Test  -459.052296  616.21593  459.30813 -18.5243671
 4:      Naive     4   Test   946.246255  965.48394  946.24625  28.7730835
 5:      Naive     5   Test  -494.812521  673.19818  537.52879 -18.8602214
 6:      Naive     6   Test -1164.794055 1287.85850 1164.79405 -60.6619083
 7:      Naive     7   Test  -358.171126  480.58489  387.07334 -20.3952531
 8:      Naive     8   Test   839.251223  865.57833  839.25122  32.5047136
 9:      Naive     9   Test  -212.055505  392.54413  318.01847 -10.3716152
10:      Naive    10   Test  -769.925282  831.04831  769.92528 -42.2515466
11:      Naive    11   Test     7.282686  385.47170  340.31163  -3.3158457
12:      Naive    12   Test   748.486962  766.59520  748.48696  29.4710742
13:      Naive    13   Test  -425.863986  474.98637  425.86399 -18.4381347
14:      Naive    14   Test  -280.046482  293.71743  280.04648 -12.5863198
15:      Naive    15   Test          NaN        NaN        NaN         NaN
16: best_ARIMA     1   Test   321.200693  370.77509  321.20069   8.9194546
17: best_ARIMA     2   Test   225.648069  303.86443  289.90776   6.2884688
18: best_ARIMA     3   Test  -475.042491  565.58152  476.71266 -16.4145493
19: best_ARIMA     4   Test  -320.408247  358.42549  320.40825  -9.6705443
20: best_ARIMA     5   Test  -302.821065  319.64731  302.82106 -10.5702771
21: best_ARIMA     6   Test  -290.906534  336.51603  290.90653 -15.0448373
22: best_ARIMA     7   Test  -354.236658  382.18018  354.23666 -16.8690971
23: best_ARIMA     8   Test   -86.310817  147.87371  136.48120  -3.4435675
24: best_ARIMA     9   Test   -31.591840   75.14266   60.77345  -1.4192294
25: best_ARIMA    10   Test   -13.277510   69.66050   60.24393  -0.3763834
26: best_ARIMA    11   Test    42.855895   94.12774   78.79915   1.7542993
27: best_ARIMA    12   Test  -144.838127  246.43347  179.09354  -5.8434827
28: best_ARIMA    13   Test  -247.079543  302.76419  248.20281 -10.1352993
29: best_ARIMA    14   Test   114.576103  162.84458  135.81796   5.1795719
30: best_ARIMA    15   Test          NaN        NaN        NaN         NaN
        .model   .id  .type           ME       RMSE        MAE         MPE
         MAPE      MASE     RMSSE         ACF1
        <num>     <num>     <num>        <num>
 1:  7.229750 0.9543663 0.8535551  0.069051847
 2: 31.012457 2.8570863 2.8979124  0.485052409
 3: 18.531841 1.6536553 1.7690444  0.083970897
 4: 28.773083 3.4367562 2.7934596  0.244340574
 5: 20.053413 1.9757548 1.9636414  0.436901209
 6: 60.661908 4.2382622 3.7380224  0.535634711
 7: 21.513760 1.3544874 1.3289182  0.152821144
 8: 32.504714 2.8353082 2.2900437  0.132043692
 9: 13.993043 1.0457734 1.0113441  0.281485490
10: 42.251547 2.4805664 2.1054819  0.572456474
11: 16.716475 1.0867114 0.9714420  0.520691227
12: 29.471074 2.4177057 1.9469582 -0.109442405
13: 18.438135 1.3829681 1.2133086  0.411266431
14: 12.586320 0.9125691 0.7533412 -0.378161472
15:       NaN       NaN       NaN           NA
16:  8.919455 1.1501150 1.0585252  0.573925641
17:  9.161052 1.0381349 0.8679544  0.300987161
18: 16.465594 1.7163172 1.6236822  0.511170791
19:  9.670544 1.1637193 1.0370417  0.434814315
20: 10.570277 1.1130570 0.9323743  0.141865049
21: 15.044837 1.0585031 0.9767412  0.009964871
22: 16.869097 1.2395819 1.0568085  0.184811985
23:  5.563144 0.4610851 0.3912266 -0.289824148
24:  2.442009 0.1998477 0.1935963 -0.340919853
25:  3.116730 0.1940955 0.1764866 -0.071639006
26:  3.647046 0.2516280 0.2372149  0.108269287
27:  7.188633 0.5784944 0.6258788 -0.145945792
28: 10.187399 0.8060239 0.7733830 -0.126019914
29:  6.122397 0.4425811 0.4176720 -0.002465525
30:       NaN       NaN       NaN           NA
         MAPE      MASE     RMSSE         ACF1
Code
#Average comparison

cv_forecast %>%
  accuracy(zillow_ts) %>%
  data.table()
       .model  .type        ME     RMSE      MAE        MPE      MAPE      MASE
       <char> <char>     <num>    <num>    <num>      <num>     <num>     <num>
1:      Naive   Test -157.7488 738.9811 603.0540 -10.049778 25.736619 1.9891370
2: best_ARIMA   Test -119.9645 303.8663 236.1257  -5.202612  9.030149 0.7788462
       RMSSE      ACF1
       <num>     <num>
1: 1.9097743 0.7897712
2: 0.7852921 0.6895161

The ARIMA model consistently outperforms the naive model across most folds and metrics, which aligns with the visual inspection of the forecast plots: the ARIMA forecasts track the trend and seasonality of the actual data much more closely, and that is the main source of its advantage.
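
To make "most folds" concrete, the per-fold RMSE values can be tallied directly (a minimal sketch; pivot_wider() comes from tidyr, which loads with the tidyverse):

Code
# Count the CV folds in which best_ARIMA has the lower RMSE
cv_forecast %>%
  group_by(.id) %>%
  accuracy(zillow_ts) %>%
  ungroup() %>%
  select(.id, .model, RMSE) %>%
  pivot_wider(names_from = .model, values_from = RMSE) %>%
  summarise(
    arima_wins = sum(best_ARIMA < Naive, na.rm = TRUE),
    folds      = sum(!is.na(best_ARIMA) & !is.na(Naive))
  )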

Code
cv_forecast %>%
  group_by(.id,.model) %>%
  mutate(h = row_number()) %>%
  ungroup() %>%
  as_fable(response = "zillow_sales", distribution = zillow_sales) %>%
  accuracy(zillow_ts, by = c("h", ".model")) %>%
  ggplot(aes(x = h, y = RMSE,color=.model)) +
  geom_point()+
  geom_line()+
  ylab('Average RMSE at Forecasting Intervals')+
  xlab('Months in the Future')

Code
cv_forecast %>%
  group_by(.id,.model) %>%
  mutate(h = row_number()) %>%
  ungroup() %>%
  as_fable(response = "zillow_sales", distribution = zillow_sales) %>%
  accuracy(zillow_ts, by = c("h", ".model")) %>%
  mutate(MAPE = MAPE/100) %>% # Rescale
  ggplot(aes(x = h, y = MAPE,color=.model)) +
  geom_point()+
  geom_line()+
  theme_bw()+
  scale_y_continuous(
    name = 'Average MAPE at Forecasting Intervals',labels=scales::percent)

The two horizon plots confirm the same conclusion: the best_ARIMA model forecasts substantially better than the Naive model, and its advantage holds at every forecast horizon.

Section 4 - Final Model Fit and Test Set Evaluation

Code
# refit on training set
fit_final <- train %>%
  model(
    ARIMA_best = ARIMA(box_cox(zillow_sales, lambda_bc) ~ pdq(0,1,1) + PDQ(0,1,1))
  )

# forecast test set horizon
fc_final <- fit_final %>%
  forecast(h = 12)

fc_final %>%
  autoplot(zillow_ts) +
  autolayer(test, zillow_sales, color = "red") +
  labs(
    title = "Forecast vs. Actual: Test Set Performance",
    x = "Date",
    y = "Home Sales Count"  # forecasts are back-transformed to the original scale
  )

The best ARIMA forecast does a good job of capturing both the trend and the seasonality: the forecast and observed values stay close over the 12-month period. As expected, accuracy degrades the further out we forecast, which is consistent with the horizon-wise RMSE and MAPE curves from cross-validation. While not perfect, the forecasts are not far from the actual recorded observations.
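
That degradation can be seen on the held-out data itself by plotting the absolute percentage error month by month (a minimal sketch, reusing fc_final and test from above):

Code
# Absolute percentage error for each forecasted month in the test window
fc_final %>%
  as_tsibble() %>%
  select(date, .mean) %>%
  left_join(as_tibble(test), by = "date") %>%
  mutate(ape = abs(zillow_sales - .mean) / zillow_sales) %>%
  ggplot(aes(x = date, y = ape)) +
  geom_col(fill = "steelblue") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Date", y = "Absolute Percentage Error") +
  theme_minimal()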

Code
accuracy_test <- fc_final %>%
  accuracy(test)

accuracy_test
# A tibble: 1 × 10
  .model     .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
  <chr>      <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ARIMA_best Test   34.1  321.  284. 0.823  8.69   NaN   NaN 0.724
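
MASE and RMSSE come back as NaN here because accuracy() was only given the test window: both metrics scale the error by an in-sample naive error, which requires the training history. A minimal sketch of the call that supplies it (the other metrics are unchanged):

Code
# Passing the full series lets accuracy() see the training period
# needed for the MASE/RMSSE scaling term
fc_final %>%
  accuracy(zillow_ts)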

After selecting the ARIMA model as the most accurate during cross-validation, I refit it on the training set and produced a 12-month forecast. The metrics on the test data continue to indicate that the model performs well on unseen data, suggesting minimal overfitting. Out of sample, the model deviated from the actual values by about 8.7% on average (MAPE), corresponding to an average error (RMSE) of roughly 321 home sales.