Introduction

‘Zillow Housing Data’ here refers to the Sales Count collected monthly. The data is publicly available and provided by Zillow.

The bursting of the housing bubble in the late 2000s showed that the US housing market is a linchpin of the US economy. Understanding its volatility and forecasting it is a tricky proposition even with first-hand, high-quality data, as Zillow itself learned.

Data Manipulation and extraction
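
The steps below assume the following packages (or equivalents) are loaded in a setup chunk and that the Zillow monthly sales-count file has been read into sales. A minimal setup sketch; the file name here is an assumption, not the exact file used:

library(tidyverse)     # dplyr, tidyr, ggplot2, readr
library(lubridate)     # ymd()
library(gtsummary)     # tbl_summary()
library(tseries)       # adf.test(), kpss.test()
library(forecast)      # auto.arima(), Arima(), forecast(), checkresiduals()
library(modelsummary)  # modelsummary()
library(prophet)       # prophet(), cross_validation(), ...
library(Metrics)       # rmse(), mape(), mae()
library(patchwork)     # combining ggplots with +

sales <- read_csv("zillow_sales_count_monthly.csv")  # hypothetical file name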

#pivot sales data
sales_pivot<-sales %>% 
  pivot_longer(cols = 6:ncol(sales),
               names_to = "Time",
               values_to = "sales",
               values_drop_na = TRUE) %>% 
  mutate(Month = ymd(Time)) %>%
  select(RegionName,Month, sales)

#extract sales for seattle
seattle_sales <- sales_pivot %>% 
  filter(RegionName == "Seattle, WA") %>%  
  select(Month, sales)

# prophet requirements
prophet_data = seattle_sales %>% 
  rename(ds = Month, # Have to name our date variable "ds"
         y = sales)  # Have to name our time series "y"

# split data into train and test
train = prophet_data %>% 
  filter(ds<ymd("2021-01-01"))

test = prophet_data %>%
  filter(ds>=ymd("2021-01-01"))

Exploratory Data Analysis

Distribution of data

tbl_summary(seattle_sales)
Characteristic   N = 167¹
Month            2008-02-29 to 2021-12-31
sales            4,497 (3,326, 6,060)

¹ Range; Median (IQR)

seattle_sales %>% ggplot(aes(x = sales))+
  geom_histogram(bins =20, aes(y = ..density..)) +
  geom_density(col = "blue", lwd = 1) +
  ylab("Density") + xlab("Sales(count)") +
  ggtitle("Distribution of sales")

The distribution of sales appears bimodal; looking at the series over time, this can be explained by a change in the nature of the data-generating process and the rising trend from 2012 to 2016.

Distribution across time

Looking at the plot of sales through time, we can see a few changes in the trend of Seattle home sales. Most noticeably, sales appear to have flattened in recent years. Some of this can be attributed to COVID-19, but the flattening appears to have begun before 2017. There may also be multiplicative seasonality present, as the peaks get taller and the troughs deeper over time.

ggplot(seattle_sales,aes(x = Month, y = sales)) +
  geom_line(color = "blue") +
  geom_vline(aes(xintercept=ymd("2012-01-31")),color='red') +
  geom_vline(aes(xintercept=ymd("2016-01-31")),color='red') +
  xlab("Year") + ylab("Sales (Count)") +
  ggtitle("Home Sales through the years") +theme_bw()

Upon further visual inspection, the data does not appear to be mean- or variance-stationary. We will confirm this with the ADF and KPSS tests.

Stationarity tests

Augmented Dickey-Fuller Test

adf.test(seattle_sales$sales)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  seattle_sales$sales
## Dickey-Fuller = -4.8812, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

The p-value is < 0.05, so the ADF test rejects its null hypothesis of a unit root, which indicates stationarity.

KPSS Test

kpss.test(seattle_sales$sales)
## 
##  KPSS Test for Level Stationarity
## 
## data:  seattle_sales$sales
## KPSS Level = 2.4902, Truncation lag parameter = 4, p-value = 0.01

Here too the p-value is < 0.05, but because the KPSS test's null hypothesis is the opposite (level stationarity), rejecting it indicates non-stationarity.

Since the two tests disagree, we take the first-order difference of sales and test again to get a more convincing answer about stationarity.

seattle_diff = seattle_sales %>% 
  mutate(sales_diff = sales - lag(sales)) %>%   # first-order difference
  filter(!is.na(sales_diff))                    # drop the first row, which has no lag

seattle_diff %>%
  ggplot()+
  geom_line(aes(Month,sales_diff), col = "blue") + 
  ggtitle("Sales First Difference") +
  ylab("Sales (difference)")

Upon visual inspection, the first-order difference does in fact appear to be mean-stationary.

adf.test(seattle_diff$sales_diff)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  seattle_diff$sales_diff
## Dickey-Fuller = -8.8536, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
kpss.test(seattle_diff$sales_diff)
## 
##  KPSS Test for Level Stationarity
## 
## data:  seattle_diff$sales_diff
## KPSS Level = 0.013639, Truncation lag parameter = 4, p-value = 0.1

After differencing, the two tests agree on stationarity: the p-value is > 0.05 for the KPSS test and < 0.05 for the ADF test.

ACF and Partial ACF

acf(seattle_diff$sales_diff,lag.max=37,main = "Sales")

pacf(seattle_diff$sales_diff,lag.max=37, main = "Sales")

The ACF and PACF plots suggest an ARMA structure on the differenced series with an AR order p >= 4 and a non-zero MA component. By extending lag.max beyond three years, we can also see possible yearly seasonality in the peaks around the 12th, 24th, and 37th lags.
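
One way to inspect this further is a sketch that treats the series as a monthly ts (starting February 2008, per the summary table above) and views the differenced series, its ACF/PACF, and a season plot with the forecast package:

# Sketch: convert to a monthly ts and inspect the differenced series,
# its ACF/PACF, and the seasonal pattern across years.
sales_ts <- ts(seattle_sales$sales, start = c(2008, 2), frequency = 12)
ggtsdisplay(diff(sales_ts), lag.max = 37)
ggseasonplot(sales_ts)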

ARIMA Model

We now proceed with modelling the sales data.

ar_mod_aic<-auto.arima(seattle_sales$sales,stepwise = F,approximation = F,
                   max.p = 20, max.d = 20, max.q = 20, trace = T, ic = "aic")
## 
##  ARIMA(0,1,0)                    : 2677.872
##  ARIMA(0,1,0) with drift         : 2679.77
##  ARIMA(0,1,1)                    : 2679.694
##  ARIMA(0,1,1) with drift         : 2681.595
##  ARIMA(0,1,2)                    : 2680.58
##  ARIMA(0,1,2) with drift         : 2682.491
##  ARIMA(0,1,3)                    : 2678.176
##  ARIMA(0,1,3) with drift         : 2680.106
##  ARIMA(0,1,4)                    : 2663.666
##  ARIMA(0,1,4) with drift         : 2662.412
##  ARIMA(0,1,5)                    : 2652.04
##  ARIMA(0,1,5) with drift         : 2649.99
##  ARIMA(1,1,0)                    : 2679.672
##  ARIMA(1,1,0) with drift         : 2681.573
##  ARIMA(1,1,1)                    : 2681.508
##  ARIMA(1,1,1) with drift         : 2683.412
##  ARIMA(1,1,2)                    : 2681.974
##  ARIMA(1,1,2) with drift         : 2683.893
##  ARIMA(1,1,3)                    : 2679.672
##  ARIMA(1,1,3) with drift         : 2681.597
##  ARIMA(1,1,4)                    : 2655.839
##  ARIMA(1,1,4) with drift         : Inf
##  ARIMA(2,1,0)                    : 2681.012
##  ARIMA(2,1,0) with drift         : 2682.921
##  ARIMA(2,1,1)                    : 2682.966
##  ARIMA(2,1,1) with drift         : 2684.876
##  ARIMA(2,1,2)                    : Inf
##  ARIMA(2,1,2) with drift         : Inf
##  ARIMA(2,1,3)                    : Inf
##  ARIMA(2,1,3) with drift         : Inf
##  ARIMA(3,1,0)                    : 2682.553
##  ARIMA(3,1,0) with drift         : 2684.47
##  ARIMA(3,1,1)                    : 2683.218
##  ARIMA(3,1,1) with drift         : 2685.136
##  ARIMA(3,1,2)                    : Inf
##  ARIMA(3,1,2) with drift         : Inf
##  ARIMA(4,1,0)                    : 2673.199
##  ARIMA(4,1,0) with drift         : 2675.061
##  ARIMA(4,1,1)                    : 2633.279
##  ARIMA(4,1,1) with drift         : 2631.546
##  ARIMA(5,1,0)                    : 2657.442
##  ARIMA(5,1,0) with drift         : 2659.164
## 
## 
## 
##  Best model: ARIMA(4,1,1) with drift

With AIC as the information criterion for model selection, we get an ARIMA(4,1,1) with drift as the best model.

With BIC as the selection criterion, we would not expect a very different result.

ar_mod_bic<-auto.arima(seattle_sales$sales,stepwise = F,approximation = F,
                   max.p = 20, max.d = 20, max.q = 20, trace = T, ic = "bic")
## 
##  ARIMA(0,1,0)                    : 2680.984
##  ARIMA(0,1,0) with drift         : 2685.994
##  ARIMA(0,1,1)                    : 2685.918
##  ARIMA(0,1,1) with drift         : 2690.931
##  ARIMA(0,1,2)                    : 2689.916
##  ARIMA(0,1,2) with drift         : 2694.939
##  ARIMA(0,1,3)                    : 2690.624
##  ARIMA(0,1,3) with drift         : 2695.666
##  ARIMA(0,1,4)                    : 2679.226
##  ARIMA(0,1,4) with drift         : 2681.084
##  ARIMA(0,1,5)                    : 2670.712
##  ARIMA(0,1,5) with drift         : 2671.774
##  ARIMA(1,1,0)                    : 2685.896
##  ARIMA(1,1,0) with drift         : 2690.909
##  ARIMA(1,1,1)                    : 2690.844
##  ARIMA(1,1,1) with drift         : 2695.86
##  ARIMA(1,1,2)                    : 2694.422
##  ARIMA(1,1,2) with drift         : 2699.453
##  ARIMA(1,1,3)                    : 2695.232
##  ARIMA(1,1,3) with drift         : 2700.269
##  ARIMA(1,1,4)                    : 2674.511
##  ARIMA(1,1,4) with drift         : Inf
##  ARIMA(2,1,0)                    : 2690.348
##  ARIMA(2,1,0) with drift         : 2695.369
##  ARIMA(2,1,1)                    : 2695.414
##  ARIMA(2,1,1) with drift         : 2700.436
##  ARIMA(2,1,2)                    : Inf
##  ARIMA(2,1,2) with drift         : Inf
##  ARIMA(2,1,3)                    : Inf
##  ARIMA(2,1,3) with drift         : Inf
##  ARIMA(3,1,0)                    : 2695.001
##  ARIMA(3,1,0) with drift         : 2700.03
##  ARIMA(3,1,1)                    : 2698.778
##  ARIMA(3,1,1) with drift         : 2703.808
##  ARIMA(3,1,2)                    : Inf
##  ARIMA(3,1,2) with drift         : Inf
##  ARIMA(4,1,0)                    : 2688.759
##  ARIMA(4,1,0) with drift         : 2693.732
##  ARIMA(4,1,1)                    : 2651.951
##  ARIMA(4,1,1) with drift         : 2653.33
##  ARIMA(5,1,0)                    : 2676.114
##  ARIMA(5,1,0) with drift         : 2680.948
## 
## 
## 
##  Best model: ARIMA(4,1,1)

With BIC as the selection criterion, the best model is an ARIMA(4,1,1) without drift.

arima_models = list(
  "AIC" = ar_mod_aic,
  "BIC" = ar_mod_bic
)
modelsummary(arima_models)
             AIC        BIC
ar1          0.665      0.656
             (0.075)    (0.075)
ar2          0.068      0.069
             (0.088)    (0.088)
ar3          0.025      0.027
             (0.088)    (0.087)
ar4          -0.397     -0.396
             (0.073)    (0.072)
ma1          -0.877     -0.846
             (0.038)    (0.038)
drift        22.171
             (10.079)
Num.Obs.     166        166
AIC          2631.5     2633.3
BIC          2653.3     2652.0
RMSE         636.52     644.21
x            0.861      0.860

We will proceed with the model with drift (selected by AIC) as our best model.

checkresiduals(ar_mod_aic)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(4,1,1) with drift
## Q* = 6.9708, df = 4, p-value = 0.1374
## 
## Model df: 6.   Total lags used: 10

The residuals appear to be white noise and roughly normally distributed. While there is no visible pattern or trend in the residual plot, the ACF of the residuals suggests there may still be some autocorrelation that our simple ARIMA model does not capture. This possible yearly seasonality can be checked by increasing the lag and running the Ljung-Box test.

acf(ar_mod_aic$residuals,lag.max = 37)

There is autocorrelation among the residuals at every 12th lag; a seasonal ARIMA (sARIMA) model may be a better choice to address it.

resid = ar_mod_aic$residuals

# Ljung-Box tests on the ARIMA residuals at lags 1-12, 24 and 36
for (i in c(1:12,24,36)){
  print(Box.test(resid,type='Ljung-Box',lag = i))
} 
## 
##  Box-Ljung test
## 
## data:  resid
## X-squared = 0.89524, df = 1, p-value = 0.3441
## 
## 
##  Box-Ljung test
## 
## data:  resid
## X-squared = 1.3282, df = 2, p-value = 0.5147
## 
## 
##  Box-Ljung test
## 
## data:  resid
## X-squared = 1.8049, df = 3, p-value = 0.6139
## 
## 
##  Box-Ljung test
## 
## data:  resid
## X-squared = 3.8881, df = 4, p-value = 0.4214
## 
## 
##  Box-Ljung test
## 
## data:  resid
## X-squared = 3.914, df = 5, p-value = 0.5619
## 
## 
##  Box-Ljung test
## 
## data:  resid
## X-squared = 4.1442, df = 6, p-value = 0.6572
## 
## 
##  Box-Ljung test
## 
## data:  resid
## X-squared = 4.2928, df = 7, p-value = 0.7455
## 
## 
##  Box-Ljung test
## 
## data:  resid
## X-squared = 5.3994, df = 8, p-value = 0.7142
## 
## 
##  Box-Ljung test
## 
## data:  resid
## X-squared = 5.4962, df = 9, p-value = 0.7891
## 
## 
##  Box-Ljung test
## 
## data:  resid
## X-squared = 6.9708, df = 10, p-value = 0.7282
## 
## 
##  Box-Ljung test
## 
## data:  resid
## X-squared = 8.433, df = 11, p-value = 0.6741
## 
## 
##  Box-Ljung test
## 
## data:  resid
## X-squared = 70.694, df = 12, p-value = 2.375e-10
## 
## 
##  Box-Ljung test
## 
## data:  resid
## X-squared = 126.53, df = 24, p-value = 6.661e-16
## 
## 
##  Box-Ljung test
## 
## data:  resid
## X-squared = 173.13, df = 36, p-value < 2.2e-16

The ACF plot and the Ljung-Box tests confirm that our ARIMA model leaves some yearly seasonality unmodelled, as the p-values are < 0.05 only at the 12th, 24th, and 36th lags.

Seasonal ARIMA

ar_mod_season <- Arima(seattle_sales$sales, order = c(4,1,1), include.drift = T,
                       seasonal = list(order = c(12,0,0)))
arima_models = list(
  "with AIC" = ar_mod_aic,
  "with BIC" = ar_mod_bic,
  "with Seasonality" = ar_mod_season
)
modelsummary(arima_models)
           with AIC    with BIC    with Seasonality
ar1        0.665       0.656       0.452
           (0.075)     (0.075)     (0.226)
ar2        0.068       0.069       -0.229
           (0.088)     (0.088)     (0.126)
ar3        0.025       0.027       0.165
           (0.088)     (0.087)     (0.129)
ar4        -0.397      -0.396      -0.316
           (0.073)     (0.072)     (0.122)
ma1        -0.877      -0.846      -0.436
           (0.038)     (0.038)     (0.251)
drift      22.171                  17.034
           (10.079)                (14.723)
sar1                               -0.180
                                   (0.118)
sar2                               0.036
                                   (0.118)
sar3                               -0.136
                                   (0.108)
sar4                               -0.095
                                   (0.113)
sar5                               -0.122
                                   (0.101)
sar6                               -0.161
                                   (0.088)
sar7                               -0.168
                                   (0.087)
sar8                               -0.088
                                   (0.090)
sar9                               -0.027
                                   (0.088)
sar10                              -0.125
                                   (0.086)
sar11                              -0.081
                                   (0.086)
sar12                              0.602
                                   (0.081)
Num.Obs.   166         166         166
AIC        2631.5      2633.3      2560.9
BIC        2653.3      2652.0      2620.1
RMSE       636.52      644.21      468.83
x          0.861       0.860

The AIC, BIC, and RMSE have all decreased with the sARIMA model once yearly seasonality is taken into account. However, the main aim was to model the autocorrelation we had observed among the residuals.
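
As an aside, instead of fixing the seasonal order by hand, the seasonal structure could also be searched automatically; a sketch, assuming the series is converted to a monthly ts with frequency 12:

# Sketch (assumption): let auto.arima search seasonal orders on a monthly ts
sales_ts <- ts(seattle_sales$sales, start = c(2008, 2), frequency = 12)
sarima_auto <- auto.arima(sales_ts, seasonal = TRUE, stepwise = FALSE,
                          approximation = FALSE, ic = "aic")
summary(sarima_auto)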

checkresiduals(ar_mod_season)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(4,1,1) with drift
## Q* = 18.495, df = 3, p-value = 0.0003477
## 
## Model df: 18.   Total lags used: 21

The residual ACF plot now shows no significant autocorrelation at the seasonal lags after accounting for yearly seasonality, although the Ljung-Box statistic reported above remains significant at its default lag choice.
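
To verify this at the seasonal lags specifically, the earlier Ljung-Box loop can be repeated on the seasonal model's residuals; a short verification sketch using the same functions as above:

# Re-check the seasonal model's residuals at the yearly lags
acf(ar_mod_season$residuals, lag.max = 37, main = "sARIMA residuals")
for (i in c(12, 24, 36)){
  print(Box.test(ar_mod_season$residuals, type = 'Ljung-Box', lag = i))
}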

Model Fit

resid_season = ar_mod_season$residuals
pred =  seattle_sales$sales - resid
pred_season = seattle_sales$sales - resid_season

plot1 = ggplot() +
  geom_line(aes(seattle_sales$Month,seattle_sales$sales), alpha = 0.6) +
  geom_line(aes(seattle_sales$Month,pred_season), color = "blue", alpha = 0.6) +
  xlab("Year") + ylab("Sales (count)") + 
  ggtitle("Observed vs fitted values")
plot1

Here we can see that the sARIMA model is able to capture most of the variation in the data.

Comparing with a linear model

Linear Model

A linear model can provide a benchmark with which we can compare our model’s performance.

lin_mod<-lm(sales~Month,data = seattle_sales)
modelsummary(lin_mod)
              Model 1
(Intercept)   -9706.469
              (989.598)
Month         0.875
              (0.060)
Num.Obs.      167
R2            0.564
R2 Adj.       0.561
AIC           2827.5
BIC           2836.8
Log.Lik.      -1410.728
F             213.520

The linear model captures only the long-run trend in the data (R2 of about 0.56) and none of the seasonal variation.

RMSE_lin = sqrt(mean(((predict(lin_mod)) - seattle_sales$sales)^2,na.rm=T))
RMSE = sqrt(mean((pred_season - seattle_sales$sales)^2,na.rm=T))
RMSE_lin
## [1] 1128.378

The linear model's in-sample RMSE is about 140% higher than the sARIMA model's (equivalently, the sARIMA RMSE is roughly 58% lower), which shows that the sARIMA model captures the variation in the data far better.
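
For reference, the same comparison expressed as ratios of the two in-sample RMSE values computed above:

# Relative in-sample error of the linear model vs the sARIMA model
RMSE_lin / RMSE       # ~2.4: the linear model's RMSE is about 2.4 times larger
1 - RMSE / RMSE_lin   # ~0.58: the sARIMA RMSE is roughly 58% lower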

seattle_sales %>% 
  ggplot()+
  geom_line(aes(Month,sales), col = "blue")+
  geom_line(aes(Month,predict(lin_mod)), col= "green") +
  geom_line(aes(Month,pred_season), color = "red", alpha = 0.6) +
  ggtitle("Comparison with a Linear model") + xlab("Year") + ylab("Sales (Count)")

Visually as well it is apparent that the sARIMA model significantly outperforms the simple linear model.

sARIMA only on train

To eventually compare the performance of the sARIMA model with the prophet model we will use a train and test split.

train_mod_season <- Arima(train$y, order = c(4,1,1), include.drift = T,
                          seasonal = list(order = c(12,0,0)))
modelsummary(train_mod_season)
          Model 1
ar1       0.439
          (0.428)
ar2       -0.189
          (0.135)
ar3       0.205
          (0.149)
ar4       -0.222
          (0.139)
ma1       -0.465
          (0.498)
sar1      -0.206
          (0.153)
sar2      0.013
          (0.154)
sar3      -0.110
          (0.133)
sar4      -0.123
          (0.135)
sar5      -0.135
          (0.130)
sar6      -0.164
          (0.119)
sar7      -0.154
          (0.115)
sar8      -0.128
          (0.118)
sar9      -0.104
          (0.119)
sar10     -0.148
          (0.116)
sar11     -0.044
          (0.117)
sar12     0.613
          (0.108)
drift     17.478
          (15.244)
Num.Obs.  154
AIC       2366.7
BIC       2424.4
RMSE      449.35

The sARIMA model fit on the training data has an in-sample RMSE of 449.35, lower than that of the model trained on the full data, partly because of the smaller sample.

sARIMA out of sample

forecast(train_mod_season, h=12) %>% autoplot()

The plot shows that the forecast largely replicates, almost copies, the behaviour from a year prior.

sarima_forecast<-forecast(train_mod_season, h=12)
test %>% 
  ggplot()+
  geom_line(aes(ds,y), col = "blue")+
  geom_line(aes(ds,sarima_forecast$mean), col= "green") +
  ggtitle("sARIMA out of sample") + xlab("Year") + ylab("Sales (Count)")

Here we get a better look at the model's performance on the test sample, and we can see that it does not do a very good job of predicting the test data.

RMSE_test_ar = rmse(test$y,sarima_forecast$mean)
MAPE_test_ar = mape(test$y,sarima_forecast$mean)
MAE_test_ar=mae(test$y,sarima_forecast$mean)
print(paste("RMSE:",round(RMSE_test_ar,2)))
## [1] "RMSE: 1162.68"
print(paste("MAPE:",round(MAPE_test_ar*100,2),"%"))
## [1] "MAPE: 16.64 %"
print(paste("MAE:",round(MAE_test_ar,2)))
## [1] "MAE: 997.06"

The out-of-sample RMSE is roughly 2.6 times the in-sample RMSE, indicating that the sARIMA model may have overfit the training data. Other metrics such as MAPE and MAE corroborate this.

Modelling Sales with Prophet

model = prophet(train)
future = make_future_dataframe(model,periods = 12,freq = 'month')
forecast = predict(model,future)

dyplot.prophet(model,forecast)

The plot above shows that the Prophet model does a decent job of capturing the trend of sales through the years, and upon visual inspection there seems to be no need to set a floor or cap for the model.
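
For completeness, if a saturating trend were ever needed, Prophet supports logistic growth with explicit cap and floor columns; a sketch with hypothetical bounds, not used in this analysis:

# Sketch only: logistic growth with hypothetical cap/floor bounds
train_capped  <- train %>% mutate(cap = 12000, floor = 0)
logistic_mod  <- prophet(train_capped, growth = 'logistic')
future_capped <- make_future_dataframe(logistic_mod, periods = 12, freq = 'month') %>% 
  mutate(cap = 12000, floor = 0)
logistic_fcst <- predict(logistic_mod, future_capped)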

Decomposition of the elements of the time series (trend, seasonality, etc.)

prophet_plot_components(model,forecast)

Since Prophet fits an additive model, it decomposes the series into an underlying trend and seasonality. Because the sales data are monthly, we can see the seasonality across the year.

The trend indicated here is in agreement with our earlier assessment that the sales have flattened in recent years.

The seasonality component shows an interesting pattern in the variation of sales. The sales seem to dip early in the year and peak around the summer.

Changepoints

plot(model,forecast)+add_changepoints_to_plot(model)+
  ylab("Sales")+xlab("Date")+theme_bw()

Here we can see the changepoints automatically identified by the Prophet procedure. It has identified 9 changepoints, which cluster into 2 major periods of change.
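
The selected changepoint dates can be inspected directly, and the number and flexibility of changepoints tuned if needed; the tuning values below are illustrative assumptions, not settings used in this report:

model$changepoints                                  # dates of the detected changepoints
model_flex <- prophet(train,
                      n.changepoints = 25,           # default
                      changepoint.range = 0.9,       # default is 0.8
                      changepoint.prior.scale = 0.1) # default is 0.05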

Additive and multiplicative seasonality

Seasonality is an important component of the Prophet procedure, and the plot above showed the seasonal component in play. Although the seasonality here appears to be multiplicative rather than additive, we will see that in-sample and out-of-sample performance is similar for both specifications.

#seasonality
#additive
additive = prophet(train,seasonality.mode = 'additive')
add_fcst = predict(additive,future)
add_p<-plot(additive,add_fcst) + 
  ylab("Sales")+xlab("Date")+ ggtitle("Additive seasonality")+
  theme_bw()

#multiplicative
multi = prophet(train,seasonality.mode = 'multiplicative')
multi_fcst = predict(multi,future)
m_p<-plot(multi,multi_fcst) +
  ylab("Sales")+xlab("Date")+ ggtitle("Multiplicative seasonality")+
  theme_bw()
add_p + m_p

The plots above show that the model with multiplicative seasonality performs similarly to the model with additive seasonality.

# plot test with actual data
forecast_plot_data_add = add_fcst %>% 
  as_tibble() %>% 
  mutate(ds = as.Date(ds)) %>% 
  filter(ds>=ymd("2021-01-01"))
a_test<-ggplot()+
  geom_line(aes(test$ds,test$y))+
  geom_line(aes(forecast_plot_data_add$ds,forecast_plot_data_add$yhat),color='red')+
  ylab("Sales")+xlab("Date")+ ggtitle("Additive seasonality")+
  theme_bw()

# plot test with actual data
forecast_plot_data_multi = multi_fcst %>% 
  as_tibble() %>% 
  mutate(ds = as.Date(ds)) %>% 
  filter(ds>=ymd("2021-01-01"))
m_test<-ggplot()+
  geom_line(aes(test$ds,test$y))+
  geom_line(aes(forecast_plot_data_multi$ds,forecast_plot_data_multi$yhat),color='red')+
  ylab("Sales")+xlab("Date")+ ggtitle("Multiplicative seasonality")+
  theme_bw()

a_test+m_test

Out of sample as well, both models seem to perform similarly.

Cross Validation

We can also conduct rolling-window cross-validation with Prophet.

df.cv <- cross_validation(model, initial = 731, period = 365, horizon = 365, units = "days")
head(df.cv)
## # A tibble: 6 x 6
##       y ds                   yhat yhat_lower yhat_upper cutoff             
##   <dbl> <dttm>              <dbl>      <dbl>      <dbl> <dttm>             
## 1  1733 2011-01-31 00:00:00 1194.       646.      1717. 2011-01-03 00:00:00
## 2  1794 2011-02-28 00:00:00  962.       372.      1461. 2011-01-03 00:00:00
## 3  2622 2011-03-31 00:00:00 2484.      1959.      3015. 2011-01-03 00:00:00
## 4  2641 2011-04-30 00:00:00 2990.      2447.      3531. 2011-01-03 00:00:00
## 5  2841 2011-05-31 00:00:00 3326.      2731.      3867. 2011-01-03 00:00:00
## 6  3329 2011-06-30 00:00:00 4036.      3480.      4610. 2011-01-03 00:00:00
df.cv %>% 
  ggplot()+
  geom_line(aes(ds,y)) +
  geom_point(aes(ds,yhat,color=factor(cutoff)))+
  theme_bw()+
  xlab("Date")+
  ylab("Home Sales(count)")+
  scale_color_discrete(name = 'Cutoff')

Cross-validation shows that the model has instances of high error early in the time series; these points appear to lie around the previously identified changepoints. However, Prophet corrects for this well as the cutoff advances.

Model performance

In-sample performance with cross validation

With prophet we can see RMSE through the various cutoffs in cross validation.

p1<-plot_cross_validation_metric(df.cv, metric = 'rmse')
p2<-plot_cross_validation_metric(df.cv,metric = 'mae')
p1+p2

The RMSE increases slightly as the cross-validation horizon grows but remains mostly flat. Since the RMSE does not vary much with horizon, the model does not necessarily perform much worse further into the future than it does nearer term. The MAE plot corroborates this.

plot_cross_validation_metric(df.cv,metric = 'mape')

Plotting MAPE, we see that it actually tends to decrease as the horizon increases. The plot reinforces that model performance does not diminish as the horizon grows.
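
The same cross-validation metrics can also be tabulated by horizon with performance_metrics():

# Tabulate cross-validation error metrics by forecast horizon
cv_metrics <- performance_metrics(df.cv, metrics = c('rmse', 'mae', 'mape'))
head(cv_metrics)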

Out-of-sample performance of original model

We can also get an understanding of how the model performs on the test data by looking at the same metrics.

#performance metrics
forecast_metric_data = forecast %>% 
  as_tibble() %>% 
  mutate(ds = as.Date(ds)) %>% 
  filter(ds>=ymd("2021-01-01"))
RMSE = sqrt(mean((test$y - forecast_metric_data$yhat)^2))
MAE = mean(abs(test$y - forecast_metric_data$yhat))
MAPE = mean(abs((test$y - forecast_metric_data$yhat)/test$y))
print(paste("RMSE:",round(RMSE,2)))
## [1] "RMSE: 471.91"
print(paste("MAE:",round(MAE,2)))
## [1] "MAE: 402.43"
print(paste("MAPE:",round(MAPE,2)))
## [1] "MAPE: 0.07"

On average, the model forecasts sales to within about 7% of the observed values (MAPE of roughly 0.07).

Out-of-sample performance

# plot test with actual data
forecast_plot_data = forecast %>% 
  as_tibble() %>% 
  mutate(ds = as.Date(ds)) %>% 
  filter(ds>=ymd("2021-01-01"))
forecast_not_scaled = ggplot()+
  geom_line(aes(test$ds,test$y))+
  geom_line(aes(forecast_plot_data$ds,forecast_plot_data$yhat),color='red') +
  ylab("Sales")+xlab("Date")+ ggtitle("Out of sample performance")+
  theme_bw()
forecast_scaled = forecast_not_scaled + 
  ylim(2000,10000)

forecast_scaled

Upon visual inspection, the original Prophet model performs the best out of sample of the three models considered so far, and significantly better than the sARIMA model on the test sample.

Model Comparison between ARIMA and Prophet

tibble('Metric' = c("RMSE","MAPE","MAE"),
  'sARIMA' = round(c(RMSE_test_ar,MAPE_test_ar*100,MAE_test_ar),2),
         'Prophet' = round(c(RMSE,MAPE*100,MAE),2))
## # A tibble: 3 x 3
##   Metric sARIMA Prophet
##   <chr>   <dbl>   <dbl>
## 1 RMSE   1163.   472.  
## 2 MAPE     16.6    7.26
## 3 MAE     997.   402.

The table above illustrates the superiority of the Prophet method here, as it captures the variation in the data much better than the sARIMA model. Adding external regressors to encode the changepoints might help the ARIMA model perform better out of sample. The Prophet model performs within acceptable error limits and, by the nature of its modelling procedure, captures the nuances present in the sales data without overfitting.
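
A sketch of that suggested extension, assuming the two trend breaks marked earlier (2012 and 2016) are encoded as step regressors and passed to Arima() via xreg:

# Hypothetical step regressors for the 2012 and 2016 trend breaks
xreg_train <- cbind(cp2012 = as.numeric(train$ds >= ymd("2012-01-01")),
                    cp2016 = as.numeric(train$ds >= ymd("2016-01-01")))
xreg_test  <- cbind(cp2012 = as.numeric(test$ds >= ymd("2012-01-01")),
                    cp2016 = as.numeric(test$ds >= ymd("2016-01-01")))
ar_mod_xreg <- Arima(train$y, order = c(4,1,1), include.drift = TRUE,
                     seasonal = list(order = c(12,0,0)), xreg = xreg_train)
forecast(ar_mod_xreg, h = 12, xreg = xreg_test)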