‘Zillow Housing Data’ here refers to the monthly Sales Count series, which is publicly available and provided by Zillow.
The bursting of the US housing bubble in the late 2000s showed that the housing market is a linchpin of the US economy. Understanding its volatility and forecasting it can be a tricky proposition even with first-hand, high-quality data, as Zillow themselves learnt.
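The analysis relies on a handful of R packages whose setup chunk is not shown; a minimal setup along the following lines is assumed, with the package choices inferred from the functions used below.
library(tidyverse)    # dplyr, tidyr, ggplot2 for wrangling and plotting
library(lubridate)    # ymd() date parsing
library(gtsummary)    # tbl_summary()
library(tseries)      # adf.test(), kpss.test()
library(forecast)     # auto.arima(), Arima(), checkresiduals(), forecast()
library(modelsummary) # modelsummary() tables
library(prophet)      # prophet(), cross_validation(), plotting helpers
library(Metrics)      # rmse(), mape(), mae()
library(patchwork)    # combining ggplots with `+`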
#pivot sales data
sales_pivot <- sales %>%
  pivot_longer(cols = 6:ncol(sales),
               names_to = "Time",
               values_to = "sales",
               values_drop_na = TRUE) %>%
  mutate(Month = ymd(Time)) %>%
  select(RegionName, Month, sales)
#extract sales for seattle
seattle_sales <- sales_pivot %>%
  filter(RegionName == "Seattle, WA") %>%
  select(Month, sales)
# prophet requirements
prophet_data = seattle_sales %>%
  rename(ds = Month, # Have to name our date variable "ds"
         y = sales)  # Have to name our time series "y"
# split data into train and test
train = prophet_data %>%
  filter(ds < ymd("2021-01-01"))
test = prophet_data %>%
  filter(ds >= ymd("2021-01-01"))
tbl_summary(seattle_sales)
| Characteristic | N = 167¹ |
|---|---|
| Month | 2008-02-29 to 2021-12-31 |
| sales | 4,497 (3,326, 6,060) |

¹ Range; Median (IQR)
seattle_sales %>% ggplot(aes(x = sales))+
geom_histogram(bins =20, aes(y = ..density..)) +
geom_density(col = "blue", lwd = 1) +
ylab("Density") + xlab("Sales(count)") +
ggtitle("Distribution of sales")
The distribution of sales appears to be bimodal; looking at the trend through time, this can be explained by a change in the data generating process and the rising trend from 2012 to 2016.
Looking at the plot of sales through time we can see that there have been a few changes in the trend of home sales in Seattle. Most noticeably, sales appear to have flattened in recent years. Some of this could be attributed to COVID-19; however, the flattening appears to have begun before 2017. There may also be multiplicative seasonality present, as the peaks are getting taller and the troughs deeper with time.
ggplot(seattle_sales,aes(x = Month, y = sales)) +
geom_line(color = "blue") +
geom_vline(aes(xintercept=ymd("2012-01-31")),color='red') +
geom_vline(aes(xintercept=ymd("2016-01-31")),color='red') +
xlab("Year") + ylab("Sales (Count)") +
ggtitle("Home Sales through the years") +theme_bw()
Upon further visual inspection we can also see that the data is neither mean stationary nor variance stationary. We will confirm this with ADF and KPSS tests.
adf.test(seattle_sales$sales)
##
## Augmented Dickey-Fuller Test
##
## data: seattle_sales$sales
## Dickey-Fuller = -4.8812, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
We observe that the p-value is < 0.05, which (since the ADF null hypothesis is a unit root) indicates stationarity.
kpss.test(seattle_sales$sales)
##
## KPSS Test for Level Stationarity
##
## data: seattle_sales$sales
## KPSS Level = 2.4902, Truncation lag parameter = 4, p-value = 0.01
Here as well the p-value is < 0.05, but because the null hypothesis of the KPSS test is the opposite (level stationarity), this indicates non-stationarity.
We take the first-order difference of the sales and run the same tests to get a more convincing answer regarding stationarity.
seattle_diff = seattle_sales %>%
  mutate(sales_diff = sales - lag(sales)) %>%
  filter(!is.na(sales_diff))
seattle_diff %>%
  ggplot() +
  geom_line(aes(Month, sales_diff), col = "blue") +
  ggtitle("Sales First Difference") +
  ylab("Sales (difference)")
Upon visual inspection, the first-order difference does in fact appear to be mean stationary.
adf.test(seattle_diff$sales_diff)
##
## Augmented Dickey-Fuller Test
##
## data: seattle_diff$sales_diff
## Dickey-Fuller = -8.8536, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
kpss.test(seattle_diff$sales_diff)
##
## KPSS Test for Level Stationarity
##
## data: seattle_diff$sales_diff
## KPSS Level = 0.013639, Truncation lag parameter = 4, p-value = 0.1
The test results now agree on stationarity after differencing: the p-value is > 0.05 for the KPSS test and < 0.05 for the ADF test.
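The differencing order can also be checked programmatically; as a quick sketch (not part of the original analysis), forecast::ndiffs applies a unit-root test to suggest how many first differences are needed.
# suggested number of first differences; defaults to a KPSS-based test,
# so it should agree with the results above (i.e. one difference)
ndiffs(seattle_sales$sales)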
acf(seattle_diff$sales_diff, lag.max = 37, main = "Sales (first difference)")
pacf(seattle_diff$sales_diff, lag.max = 37, main = "Sales (first difference)")
From the ACF and PACF plots, the differenced series looks like an AR process with p >= 4 and possibly an MA component (q > 0). By extending lag.max to more than three years of lags, we can also see possible yearly seasonality from the peaks at the 12th, 24th, and 36th lags.
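To make the suspected yearly seasonality easier to see than it is from raw ACF lags, a seasonal plot of the series is one option. This is an illustrative sketch rather than part of the original analysis, reusing the sales_ts object from the decomposition sketch above.
# month-by-month view of each year; a stable within-year pattern supports seasonality at lag 12
ggseasonplot(sales_ts, year.labels = TRUE) +
  ggtitle("Seasonal plot: Seattle home sales")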
We now proceed with modelling the sales data.
ar_mod_aic<-auto.arima(seattle_sales$sales,stepwise = F,approximation = F,
max.p = 20, max.d = 20, max.q = 20, trace = T, ic = "aic")
##
## ARIMA(0,1,0) : 2677.872
## ARIMA(0,1,0) with drift : 2679.77
## ARIMA(0,1,1) : 2679.694
## ARIMA(0,1,1) with drift : 2681.595
## ARIMA(0,1,2) : 2680.58
## ARIMA(0,1,2) with drift : 2682.491
## ARIMA(0,1,3) : 2678.176
## ARIMA(0,1,3) with drift : 2680.106
## ARIMA(0,1,4) : 2663.666
## ARIMA(0,1,4) with drift : 2662.412
## ARIMA(0,1,5) : 2652.04
## ARIMA(0,1,5) with drift : 2649.99
## ARIMA(1,1,0) : 2679.672
## ARIMA(1,1,0) with drift : 2681.573
## ARIMA(1,1,1) : 2681.508
## ARIMA(1,1,1) with drift : 2683.412
## ARIMA(1,1,2) : 2681.974
## ARIMA(1,1,2) with drift : 2683.893
## ARIMA(1,1,3) : 2679.672
## ARIMA(1,1,3) with drift : 2681.597
## ARIMA(1,1,4) : 2655.839
## ARIMA(1,1,4) with drift : Inf
## ARIMA(2,1,0) : 2681.012
## ARIMA(2,1,0) with drift : 2682.921
## ARIMA(2,1,1) : 2682.966
## ARIMA(2,1,1) with drift : 2684.876
## ARIMA(2,1,2) : Inf
## ARIMA(2,1,2) with drift : Inf
## ARIMA(2,1,3) : Inf
## ARIMA(2,1,3) with drift : Inf
## ARIMA(3,1,0) : 2682.553
## ARIMA(3,1,0) with drift : 2684.47
## ARIMA(3,1,1) : 2683.218
## ARIMA(3,1,1) with drift : 2685.136
## ARIMA(3,1,2) : Inf
## ARIMA(3,1,2) with drift : Inf
## ARIMA(4,1,0) : 2673.199
## ARIMA(4,1,0) with drift : 2675.061
## ARIMA(4,1,1) : 2633.279
## ARIMA(4,1,1) with drift : 2631.546
## ARIMA(5,1,0) : 2657.442
## ARIMA(5,1,0) with drift : 2659.164
##
##
##
## Best model: ARIMA(4,1,1) with drift
With AIC as the information criterion for selection, the best model is an ARIMA(4,1,1) with drift.
With BIC as the selection criterion, we would not expect the chosen model to differ much.
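As a side note, both criteria can be read off any fitted model directly, which is a quick way to compare candidate specifications without rerunning the search; a brief sketch:
# information criteria of the AIC-selected model
# (BIC penalises each extra parameter by log(n) rather than 2, so it favours smaller models)
AIC(ar_mod_aic)
BIC(ar_mod_aic)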
ar_mod_bic<-auto.arima(seattle_sales$sales,stepwise = F,approximation = F,
max.p = 20, max.d = 20, max.q = 20, trace = T, ic = "bic")
##
## ARIMA(0,1,0) : 2680.984
## ARIMA(0,1,0) with drift : 2685.994
## ARIMA(0,1,1) : 2685.918
## ARIMA(0,1,1) with drift : 2690.931
## ARIMA(0,1,2) : 2689.916
## ARIMA(0,1,2) with drift : 2694.939
## ARIMA(0,1,3) : 2690.624
## ARIMA(0,1,3) with drift : 2695.666
## ARIMA(0,1,4) : 2679.226
## ARIMA(0,1,4) with drift : 2681.084
## ARIMA(0,1,5) : 2670.712
## ARIMA(0,1,5) with drift : 2671.774
## ARIMA(1,1,0) : 2685.896
## ARIMA(1,1,0) with drift : 2690.909
## ARIMA(1,1,1) : 2690.844
## ARIMA(1,1,1) with drift : 2695.86
## ARIMA(1,1,2) : 2694.422
## ARIMA(1,1,2) with drift : 2699.453
## ARIMA(1,1,3) : 2695.232
## ARIMA(1,1,3) with drift : 2700.269
## ARIMA(1,1,4) : 2674.511
## ARIMA(1,1,4) with drift : Inf
## ARIMA(2,1,0) : 2690.348
## ARIMA(2,1,0) with drift : 2695.369
## ARIMA(2,1,1) : 2695.414
## ARIMA(2,1,1) with drift : 2700.436
## ARIMA(2,1,2) : Inf
## ARIMA(2,1,2) with drift : Inf
## ARIMA(2,1,3) : Inf
## ARIMA(2,1,3) with drift : Inf
## ARIMA(3,1,0) : 2695.001
## ARIMA(3,1,0) with drift : 2700.03
## ARIMA(3,1,1) : 2698.778
## ARIMA(3,1,1) with drift : 2703.808
## ARIMA(3,1,2) : Inf
## ARIMA(3,1,2) with drift : Inf
## ARIMA(4,1,0) : 2688.759
## ARIMA(4,1,0) with drift : 2693.732
## ARIMA(4,1,1) : 2651.951
## ARIMA(4,1,1) with drift : 2653.33
## ARIMA(5,1,0) : 2676.114
## ARIMA(5,1,0) with drift : 2680.948
##
##
##
## Best model: ARIMA(4,1,1)
We get the ARIMA(4,1,1) model as the best model when we set BIC as the criterion for selection.
arima_models = list(
"AIC" = ar_mod_aic,
"BIC" = ar_mod_bic
)
modelsummary(arima_models)
|          | AIC      | BIC      |
|---|---|---|
| ar1      | 0.665    | 0.656    |
|          | (0.075)  | (0.075)  |
| ar2      | 0.068    | 0.069    |
|          | (0.088)  | (0.088)  |
| ar3      | 0.025    | 0.027    |
|          | (0.088)  | (0.087)  |
| ar4      | -0.397   | -0.396   |
|          | (0.073)  | (0.072)  |
| ma1      | -0.877   | -0.846   |
|          | (0.038)  | (0.038)  |
| drift    | 22.171   |          |
|          | (10.079) |          |
| Num.Obs. | 166      | 166      |
| AIC      | 2631.5   | 2633.3   |
| BIC      | 2653.3   | 2652.0   |
| RMSE     | 636.52   | 644.21   |
| x        | 0.861    | 0.860    |
We will proceed with the AIC-selected model (with drift) as our best model.
checkresiduals(ar_mod_aic)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(4,1,1) with drift
## Q* = 6.9708, df = 4, p-value = 0.1374
##
## Model df: 6. Total lags used: 10
The residuals appear to be white noise and roughly normally distributed. However, while there is no visible pattern or trend in the residual plot, the ACF plot of the residuals suggests there may still be some autocorrelation that our simple ARIMA model has not captured. This suspected yearly seasonality can be checked by increasing the lag and conducting Ljung-Box tests.
acf(ar_mod_aic$residuals,lag.max = 37)
There is autocorrelation among the residuals at every 12th lag; to address this, a seasonal ARIMA (sARIMA) model may be a better choice.
resid = ar_mod_aic$residuals
for (i in c(1:12,24,36)){
print(Box.test(resid,type='Ljung-Box',lag = i))
}
##
## Box-Ljung test
##
## data: resid
## X-squared = 0.89524, df = 1, p-value = 0.3441
##
##
## Box-Ljung test
##
## data: resid
## X-squared = 1.3282, df = 2, p-value = 0.5147
##
##
## Box-Ljung test
##
## data: resid
## X-squared = 1.8049, df = 3, p-value = 0.6139
##
##
## Box-Ljung test
##
## data: resid
## X-squared = 3.8881, df = 4, p-value = 0.4214
##
##
## Box-Ljung test
##
## data: resid
## X-squared = 3.914, df = 5, p-value = 0.5619
##
##
## Box-Ljung test
##
## data: resid
## X-squared = 4.1442, df = 6, p-value = 0.6572
##
##
## Box-Ljung test
##
## data: resid
## X-squared = 4.2928, df = 7, p-value = 0.7455
##
##
## Box-Ljung test
##
## data: resid
## X-squared = 5.3994, df = 8, p-value = 0.7142
##
##
## Box-Ljung test
##
## data: resid
## X-squared = 5.4962, df = 9, p-value = 0.7891
##
##
## Box-Ljung test
##
## data: resid
## X-squared = 6.9708, df = 10, p-value = 0.7282
##
##
## Box-Ljung test
##
## data: resid
## X-squared = 8.433, df = 11, p-value = 0.6741
##
##
## Box-Ljung test
##
## data: resid
## X-squared = 70.694, df = 12, p-value = 2.375e-10
##
##
## Box-Ljung test
##
## data: resid
## X-squared = 126.53, df = 24, p-value = 6.661e-16
##
##
## Box-Ljung test
##
## data: resid
## X-squared = 173.13, df = 36, p-value < 2.2e-16
The ACF plot and the Ljung-Box tests confirm that there is unmodelled yearly seasonality in our ARIMA model, as the p-values are < 0.05 only at the 12th, 24th, and 36th lags.
ar_mod_season<-Arima(seattle_sales$sales,order = c(4,1,1),include.drift = T,seasonal = list(order = c(12,0,0)))
arima_models = list(
"with AIC" = ar_mod_aic,
"with BIC" = ar_mod_bic,
"with Seasonality" = ar_mod_season
)
modelsummary(arima_models)
|          | with AIC | with BIC | with Seasonality |
|---|---|---|---|
| ar1      | 0.665    | 0.656    | 0.452    |
|          | (0.075)  | (0.075)  | (0.226)  |
| ar2      | 0.068    | 0.069    | -0.229   |
|          | (0.088)  | (0.088)  | (0.126)  |
| ar3      | 0.025    | 0.027    | 0.165    |
|          | (0.088)  | (0.087)  | (0.129)  |
| ar4      | -0.397   | -0.396   | -0.316   |
|          | (0.073)  | (0.072)  | (0.122)  |
| ma1      | -0.877   | -0.846   | -0.436   |
|          | (0.038)  | (0.038)  | (0.251)  |
| drift    | 22.171   |          | 17.034   |
|          | (10.079) |          | (14.723) |
| sar1     |          |          | -0.180   |
|          |          |          | (0.118)  |
| sar2     |          |          | 0.036    |
|          |          |          | (0.118)  |
| sar3     |          |          | -0.136   |
|          |          |          | (0.108)  |
| sar4     |          |          | -0.095   |
|          |          |          | (0.113)  |
| sar5     |          |          | -0.122   |
|          |          |          | (0.101)  |
| sar6     |          |          | -0.161   |
|          |          |          | (0.088)  |
| sar7     |          |          | -0.168   |
|          |          |          | (0.087)  |
| sar8     |          |          | -0.088   |
|          |          |          | (0.090)  |
| sar9     |          |          | -0.027   |
|          |          |          | (0.088)  |
| sar10    |          |          | -0.125   |
|          |          |          | (0.086)  |
| sar11    |          |          | -0.081   |
|          |          |          | (0.086)  |
| sar12    |          |          | 0.602    |
|          |          |          | (0.081)  |
| Num.Obs. | 166      | 166      | 166      |
| AIC      | 2631.5   | 2633.3   | 2560.9   |
| BIC      | 2653.3   | 2652.0   | 2620.1   |
| RMSE     | 636.52   | 644.21   | 468.83   |
| x        | 0.861    | 0.860    |          |
The AIC, BIC, and RMSE have all decreased with the sARIMA model once yearly seasonality is taken into account. The main aim, however, was to model the autocorrelation we had observed among the residuals, so we check those again.
checkresiduals(ar_mod_season)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(4,1,1) with drift
## Q* = 18.495, df = 3, p-value = 0.0003477
##
## Model df: 18. Total lags used: 21
After accounting for yearly seasonality, the residual ACF no longer shows the pronounced spikes at the seasonal lags, although the Ljung-Box statistic above still flags some remaining autocorrelation.
resid_season = ar_mod_season$residuals
# fitted values = observed minus residuals, for the plain and seasonal ARIMA fits
pred = seattle_sales$sales - resid
pred_season = seattle_sales$sales - resid_season
plot1 = ggplot() +
  geom_line(aes(seattle_sales$Month, seattle_sales$sales), alpha = 0.6) +
  geom_line(aes(seattle_sales$Month, pred_season), color = "blue", alpha = 0.6) +
  xlab("Year") + ylab("Sales (count)") +
  ggtitle("Observed vs fitted values")
plot1
Here we can see that the sARIMA model is able to capture most of the variation in the data.
A linear model can provide a benchmark with which we can compare our model’s performance.
lin_mod<-lm(sales~Month,data = seattle_sales)
modelsummary(lin_mod)
|             | Model 1   |
|---|---|
| (Intercept) | -9706.469 |
|             | (989.598) |
| Month       | 0.875     |
|             | (0.060)   |
| Num.Obs.    | 167       |
| R2          | 0.564     |
| R2 Adj.     | 0.561     |
| AIC         | 2827.5    |
| BIC         | 2836.8    |
| Log.Lik.    | -1410.728 |
| F           | 213.520   |
The linear trend captures only about half of the variation in the data (R² ≈ 0.56) and none of the seasonal structure.
RMSE_lin = sqrt(mean(((predict(lin_mod)) - seattle_sales$sales)^2,na.rm=T))
RMSE = sqrt(mean((pred_season - seattle_sales$sales)^2,na.rm=T))
RMSE_lin
## [1] 1128.378
The in-sample RMSE of the linear model (~1128) is about 140% higher than that of the sARIMA model (~469); equivalently, the sARIMA RMSE is roughly 58% lower, showing that the sARIMA model captures the variation in the data far better.
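For transparency, that percentage comes from comparing the two in-sample RMSE objects computed above:
# how much larger the linear model's in-sample RMSE is, relative to the sARIMA RMSE
round((RMSE_lin - RMSE) / RMSE * 100, 1)  # roughly 140% higher for the linear model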
seattle_sales %>%
ggplot()+
geom_line(aes(Month,sales), col = "blue")+
geom_line(aes(Month,predict(lin_mod)), col= "green") +
geom_line(aes(Month,pred_season), color = "red", alpha = 0.6) +
ggtitle("Comparison with a Linear model") + xlab("Year") + ylab("Sales (Count)")
Visually as well it is apparent that the sARIMA model significantly outperforms the simple linear model.
To eventually compare the performance of the sARIMA model with the prophet model we will use a train and test split.
train_mod_season<-Arima(train$y,order = c(4,1,1),include.drift = T,seasonal = list(order = c(12,0,0)))
modelsummary(train_mod_season)
|          | Model 1  |
|---|---|
| ar1      | 0.439    |
|          | (0.428)  |
| ar2      | -0.189   |
|          | (0.135)  |
| ar3      | 0.205    |
|          | (0.149)  |
| ar4      | -0.222   |
|          | (0.139)  |
| ma1      | -0.465   |
|          | (0.498)  |
| sar1     | -0.206   |
|          | (0.153)  |
| sar2     | 0.013    |
|          | (0.154)  |
| sar3     | -0.110   |
|          | (0.133)  |
| sar4     | -0.123   |
|          | (0.135)  |
| sar5     | -0.135   |
|          | (0.130)  |
| sar6     | -0.164   |
|          | (0.119)  |
| sar7     | -0.154   |
|          | (0.115)  |
| sar8     | -0.128   |
|          | (0.118)  |
| sar9     | -0.104   |
|          | (0.119)  |
| sar10    | -0.148   |
|          | (0.116)  |
| sar11    | -0.044   |
|          | (0.117)  |
| sar12    | 0.613    |
|          | (0.108)  |
| drift    | 17.478   |
|          | (15.244) |
| Num.Obs. | 154      |
| AIC      | 2366.7   |
| BIC      | 2424.4   |
| RMSE     | 449.35   |
The sARIMA model fitted on the training set has an in-sample RMSE of 449.35; it is lower than that of the model trained on the full data, partly because of the smaller sample.
forecast(train_mod_season, h=12) %>% autoplot()
The forecast plot shows that the model essentially replicates the behaviour from a year prior.
sarima_forecast<-forecast(train_mod_season, h=12)
test %>%
ggplot()+
geom_line(aes(ds,y), col = "blue")+
geom_line(aes(ds,sarima_forecast$mean), col= "green") +
ggtitle("sARIMA out of sample") + xlab("Year") + ylab("Sales (Count)")
Here we get a better look at the model's performance on the test sample, and we can see that it does not do a very good job of predicting the test data.
RMSE_test_ar = rmse(test$y,sarima_forecast$mean)
MAPE_test_ar = mape(test$y,sarima_forecast$mean)
MAE_test_ar=mae(test$y,sarima_forecast$mean)
print(paste("RMSE:",round(RMSE_test_ar,2)))
## [1] "RMSE: 1162.68"
print(paste("MAPE:",round(MAPE_test_ar*100,2),"%"))
## [1] "MAPE: 16.64 %"
print(paste("MAE:",round(MAE_test_ar,2)))
## [1] "MAE: 997.06"
The out-of-sample RMSE (~1163) is roughly 2.6 times the in-sample RMSE (~449), indicating that the sARIMA model may have overfit the training data. The other metrics, MAPE and MAE, corroborate this.
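To probe the overfitting concern beyond a single train/test split, a rolling-origin evaluation of the same (4,1,1)-with-drift specification could be run with forecast::tsCV. The sketch below is illustrative and was not part of the original analysis; the non-seasonal spec and the 60-month initial window are assumptions made for simplicity.
# rolling-origin cross-validation: refit the spec at each origin and forecast 12 steps ahead
arima_fc <- function(y, h) forecast(Arima(y, order = c(4, 1, 1), include.drift = TRUE), h = h)
cv_err <- tsCV(ts(train$y), arima_fc, h = 12, initial = 60)
sqrt(colMeans(cv_err^2, na.rm = TRUE))  # RMSE by forecast horizon (1 to 12 months ahead)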
model = prophet(train)
future = make_future_dataframe(model,periods = 12,freq = 'month')
forecast = predict(model,future)
dyplot.prophet(model,forecast)
The plot above shows that the prophet model does a decent job of capturing the trend of sales through the years, and upon visual inspection there seems to be no need to set floors or caps for the model.
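For completeness, if a saturation level did need to be imposed, prophet supports logistic growth with a cap column. The sketch below is purely hypothetical; the cap of 10,000 sales is an assumed illustration, not a value derived from the data.
# hypothetical: cap monthly sales at 10,000 and fit a logistic-growth model
train_cap <- train %>% mutate(cap = 10000)
model_cap <- prophet(train_cap, growth = "logistic")
future_cap <- make_future_dataframe(model_cap, periods = 12, freq = "month") %>%
  mutate(cap = 10000)  # the future frame needs the same cap column
forecast_cap <- predict(model_cap, future_cap)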
prophet_plot_components(model,forecast)
As prophet is an additive model, it can capture the underlying trend and seasonality present in the time series. Since the sales data is monthly, we can see the seasonality across the year.
The trend indicated here is in agreement with our earlier assessment that the sales have flattened in recent years.
The seasonality component shows an interesting pattern in the variation of sales. The sales seem to dip early in the year and peak around the summer.
plot(model,forecast)+add_changepoints_to_plot(model)+
ylab("Sales")+xlab("Date")+theme_bw()
Here we can see the change points automatically identified by the prophet procedure during modelling. It has identified 9 change points, which cluster into 2 major periods of change.
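The changepoint information can also be inspected directly from the fitted object, and the trend flexibility tuned if needed; a brief sketch (the prior-scale value shown is just the package default, stated for illustration):
# candidate changepoint dates prophet placed; the plot above highlights only those
# whose estimated rate change is large enough
model$changepoints
# trend flexibility could be tuned when refitting, e.g.:
# prophet(train, changepoint.prior.scale = 0.05)  # 0.05 is the default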
Seasonality is an important component of the prophet procedure, and the plot above showed the seasonal component in play. Although the seasonality here looks multiplicative rather than additive, we will see that performance both in and out of sample is similar for both specifications.
#seasonality
#additive
additive = prophet(train,seasonality.mode = 'additive')
add_fcst = predict(additive,future)
add_p<-plot(additive,add_fcst) +
ylab("Sales")+xlab("Date")+ ggtitle("Additive seasonality")+
theme_bw()
#multiplicative
multi = prophet(train,seasonality.mode = 'multiplicative')
multi_fcst = predict(multi,future)
m_p<-plot(multi,multi_fcst) +
ylab("Sales")+xlab("Date")+ ggtitle("Multiplicative seasonality")+
theme_bw()
add_p + m_p
The above plots show us that the model with multiplicative seasonality performs similarly to the model with additive seasonality.
# plot test with actual data
forecast_plot_data_add = add_fcst %>%
as_tibble() %>%
mutate(ds = as.Date(ds)) %>%
filter(ds>=ymd("2021-01-01"))
a_test<-ggplot()+
geom_line(aes(test$ds,test$y))+
geom_line(aes(forecast_plot_data_add$ds,forecast_plot_data_add$yhat),color='red')+
ylab("Sales")+xlab("Date")+ ggtitle("Additive seasonality")+
theme_bw()
# plot test with actual data
forecast_plot_data_multi = multi_fcst %>%
as_tibble() %>%
mutate(ds = as.Date(ds)) %>%
filter(ds>=ymd("2021-01-01"))
m_test<-ggplot()+
geom_line(aes(test$ds,test$y))+
geom_line(aes(forecast_plot_data_multi$ds,forecast_plot_data_multi$yhat),color='red')+
ylab("Sales")+xlab("Date")+ ggtitle("Multiplicative seasonality")+
theme_bw()
a_test+m_test
Out of sample as well, both models seem to perform similarly.
We can conduct rolling window cross validation with prophet as well.
# initial training window of ~2 years (731 days), then cutoffs every 365 days with a one-year horizon
df.cv <- cross_validation(model, initial = 731, period = 365, horizon = 365, units = "days")
head(df.cv)
## # A tibble: 6 x 6
## y ds yhat yhat_lower yhat_upper cutoff
## <dbl> <dttm> <dbl> <dbl> <dbl> <dttm>
## 1 1733 2011-01-31 00:00:00 1194. 646. 1717. 2011-01-03 00:00:00
## 2 1794 2011-02-28 00:00:00 962. 372. 1461. 2011-01-03 00:00:00
## 3 2622 2011-03-31 00:00:00 2484. 1959. 3015. 2011-01-03 00:00:00
## 4 2641 2011-04-30 00:00:00 2990. 2447. 3531. 2011-01-03 00:00:00
## 5 2841 2011-05-31 00:00:00 3326. 2731. 3867. 2011-01-03 00:00:00
## 6 3329 2011-06-30 00:00:00 4036. 3480. 4610. 2011-01-03 00:00:00
df.cv %>%
ggplot()+
geom_line(aes(ds,y)) +
geom_point(aes(ds,yhat,color=factor(cutoff)))+
theme_bw()+
xlab("Date")+
ylab("Home Sales(count)")+
scale_color_discrete(name = 'Cutoff')
With cross-validation we can see that the model has instances of high error early on in the time series. These points appear to be located around the earlier identified change points. However, prophet seems to do a good enough job of correcting it as the cut-off increases.
With prophet we can see RMSE through the various cutoffs in cross validation.
p1<-plot_cross_validation_metric(df.cv, metric = 'rmse')
p2<-plot_cross_validation_metric(df.cv,metric = 'mae')
p1+p2
The RMSE increases slightly as the cross-validation horizon grows but stays mostly flat. Since the RMSE does not vary much, the model does not perform substantially worse further into the future than it does at shorter horizons. The MAE plot corroborates this.
plot_cross_validation_metric(df.cv,metric = 'mape')
Plotting MAPE, we see that it, if anything, declines as the horizon increases, reinforcing that model performance does not deteriorate at longer horizons.
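The same cross-validation metrics can also be tabulated rather than plotted, using prophet's performance_metrics helper; a brief sketch:
# rolling-window summary of RMSE, MAE, MAPE, etc. by horizon (0.1 is the default window)
head(performance_metrics(df.cv, rolling_window = 0.1))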
We can also get an understanding of how the model performs on the test data by looking at the same metrics.
#performance metrics
forecast_metric_data = forecast %>%
as_tibble() %>%
mutate(ds = as.Date(ds)) %>%
filter(ds>=ymd("2021-01-01"))
RMSE = sqrt(mean((test$y - forecast_metric_data$yhat)^2))
MAE = mean(abs(test$y - forecast_metric_data$yhat))
MAPE = mean(abs((test$y - forecast_metric_data$yhat)/test$y))
print(paste("RMSE:",round(RMSE,2)))
## [1] "RMSE: 471.91"
print(paste("MAE:",round(MAE,2)))
## [1] "MAE: 402.43"
print(paste("MAPE:",round(MAPE,2)))
## [1] "MAPE: 0.07"
The model can forecast sales to within 7% of the original data.
# plot test with actual data
forecast_plot_data = forecast %>%
as_tibble() %>%
mutate(ds = as.Date(ds)) %>%
filter(ds>=ymd("2021-01-01"))
forecast_not_scaled = ggplot()+
geom_line(aes(test$ds,test$y))+
geom_line(aes(forecast_plot_data$ds,forecast_plot_data$yhat),color='red') +
ylab("Sales")+xlab("Date")+ ggtitle("Out of sample performance")+
theme_bw()
forecast_scaled = forecast_not_scaled +
ylim(2000,10000)
forecast_scaled
Upon visual inspection, the original prophet model appears to perform the best out of sample of the three models considered so far; it performs significantly better than the sARIMA model did on the test sample.
tibble('Metric' = c("RMSE","MAPE","MAE"),
'sARIMA' = round(c(RMSE_test_ar,MAPE_test_ar*100,MAE_test_ar),2),
'Prophet' = round(c(RMSE,MAPE*100,MAE),2))
## # A tibble: 3 x 3
## Metric sARIMA Prophet
## <chr> <dbl> <dbl>
## 1 RMSE 1163. 472.
## 2 MAPE 16.6 7.26
## 3 MAE 997. 402.
The above table illustrates the superiority of the prophet method here, as it captures the variation in the data much better than the sARIMA model does. Perhaps the addition of external regressors to model the change points would aid the ARIMA model and help it perform better out of sample. The prophet model performs within acceptable error limits and, owing to the nature of its modelling procedure, is able to capture the nuances present in the sales data without overfitting.
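As a rough sketch of that suggestion, change-point information could be supplied to the ARIMA model as external regressors via the xreg argument. The dummy variables below, based on the 2012 and 2016 breaks marked earlier, are illustrative assumptions rather than a fitted result from this analysis.
# hypothetical regime dummies at the visually identified 2012 and 2016 breaks
xreg_train <- cbind(post2012 = as.numeric(train$ds >= ymd("2012-01-01")),
                    post2016 = as.numeric(train$ds >= ymd("2016-01-01")))
ar_mod_xreg <- Arima(train$y, order = c(4, 1, 1), include.drift = TRUE, xreg = xreg_train)
# forecasting then requires the same regressors for the test period (both dummies equal 1 there)
xreg_test <- cbind(post2012 = rep(1, nrow(test)), post2016 = rep(1, nrow(test)))
forecast(ar_mod_xreg, xreg = xreg_test)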