Data Manipulation and Extraction

# load the libraries used throughout
library(tidyverse)   # dplyr, tidyr, ggplot2
library(lubridate)   # date parsing (ymd)
library(prophet)
library(patchwork)   # combining ggplots with +

# pivot sales data from wide to long format
sales_long <- sales %>% 
  pivot_longer(cols = 6:ncol(sales),
               names_to = "Time",
               values_to = "sales",
               values_drop_na = TRUE) %>% 
  mutate(Month = ymd(Time)) %>%
  select(RegionName, Month, sales)

# extract sales for Seattle
seattle_sales <- sales_long %>% 
  filter(RegionName == "Seattle, WA") %>%  
  select(Month, sales)

# prophet requirements
prophet_data = seattle_sales %>% 
  rename(ds = Month, # Have to name our date variable "ds"
         y = sales)  # Have to name our time series "y"

# split data into train and test
train = prophet_data %>% 
  filter(ds<ymd("2021-01-01"))

test = prophet_data %>%
  filter(ds>=ymd("2021-01-01"))

Looking at the plot of sales through time, we can see that there have been a few changes in the trend of home sales in Seattle. Most noticeably, sales appear to have flattened in recent years. Some of this change can be attributed to COVID-19; however, the flattening appears to have begun before 2019. There may also be multiplicative seasonality present, as the peaks are getting taller and the troughs deeper over time (a quick numeric check of this follows the plot).

ggplot(seattle_sales, aes(x = Month, y = sales)) +
  geom_line(color = "blue") + 
  xlab("Year") + ylab("Sales (Count)") +
  ggtitle("Home Sales through the years") + theme_bw()

Modelling Sales with Prophet

# fit the prophet model on the training data
model = prophet(train)
# extend the data frame 12 months past the end of the training window
future = make_future_dataframe(model, periods = 12, freq = 'month')
forecast = predict(model, future)

dyplot.prophet(model,forecast)

The plot above shows that the Prophet model does a decent job of capturing the trend of sales through the years, and on visual inspection there is no obvious saturation that would call for setting a floor or cap (i.e., switching to logistic growth).
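
For reference, if the series did look like it was saturating, we could switch to logistic growth by supplying a cap column (and optionally a floor) on both the training and future data frames. A minimal sketch, with a purely illustrative cap of 10,000 sales:

# hypothetical sketch: logistic growth with a saturating cap
# (the cap of 10,000 sales is an assumption for illustration only)
train_capped <- train %>% mutate(cap = 10000)
capped_model <- prophet(train_capped, growth = 'logistic')
future_capped <- make_future_dataframe(capped_model, periods = 12, freq = 'month') %>% 
  mutate(cap = 10000)
capped_fcst <- predict(capped_model, future_capped)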

Decomposition of the time series components (trend, seasonality, etc.)

prophet_plot_components(model,forecast)

As Prophet is an additive model, it can capture the underlying trend and seasonality present in the time series. Since the sales data is a monthly series, the seasonal component shows the variation across the year.
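
Concretely, the data frame returned by predict() carries these components as columns. Since this is monthly data, yearly seasonality is the only seasonal term fitted, so for the additive model the point forecast should equal the trend plus the yearly effect:

# for this additive model, yhat is the sum of the trend and yearly components
forecast %>% 
  as_tibble() %>% 
  select(ds, trend, yearly, yhat) %>% 
  mutate(check = trend + yearly - yhat) %>%  # should be ~0 throughout
  head()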

The trend component agrees with our earlier assessment that sales have flattened in recent years.

The seasonality component shows an interesting pattern in the variation of sales: sales dip early in the year and peak around the summer.
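
To pin down which months those are, we can average the fitted yearly component by calendar month, giving a quick tabular summary of the seasonality panel above:

# average yearly seasonal effect by calendar month
forecast %>% 
  as_tibble() %>% 
  mutate(month = month(ds, label = TRUE)) %>% 
  group_by(month) %>% 
  summarise(seasonal_effect = mean(yearly), .groups = "drop") %>% 
  arrange(desc(seasonal_effect))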

Changepoints

plot(model, forecast) + add_changepoints_to_plot(model) +
  ylab("Sales") + xlab("Date") + theme_bw()

Here we can see the changepoints automatically identified by the prophet fitting procedure. It has identified nine changepoints, which cluster into two major periods where the trend shifted.
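
We can also read the changepoints off the model object instead of the plot. A hedged sketch: model$changepoints holds the candidate dates and model$params$delta the fitted rate adjustments (a one-row matrix under the default MAP fit); add_changepoints_to_plot only draws those whose adjustment exceeds a small threshold:

# candidate changepoint dates and their fitted rate adjustments
changepoint_effects <- tibble(
  changepoint = model$changepoints,
  rate_change = model$params$delta[1, ]
)
# keep the ones large enough to be drawn on the plot (default threshold 0.01)
changepoint_effects %>% filter(abs(rate_change) >= 0.01)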

Additive and multiplicative seasonality

Seasonality is an important component of the Prophet procedure, and the component plot above already showed it in play. Although the seasonality here looks multiplicative rather than additive, we will see that both in- and out-of-sample performance is similar under either mode.

# seasonality modes
# additive
additive = prophet(train, seasonality.mode = 'additive')
add_fcst = predict(additive, future)
add_p <- plot(additive, add_fcst) + 
  ylab("Sales") + xlab("Date") + ggtitle("Additive seasonality") +
  theme_bw()

# multiplicative
multi = prophet(train, seasonality.mode = 'multiplicative')
multi_fcst = predict(multi, future)
m_p <- plot(multi, multi_fcst) +
  ylab("Sales") + xlab("Date") + ggtitle("Multiplicative seasonality") +
  theme_bw()

# side-by-side comparison (patchwork)
add_p + m_p

The plots above show that the model with multiplicative seasonality performs similarly to the model with additive seasonality in sample.

# plot the additive forecast against the actual test data
forecast_plot_data_add = add_fcst %>% 
  as_tibble() %>% 
  mutate(ds = as.Date(ds)) %>% 
  filter(ds >= ymd("2021-01-01"))
a_test <- ggplot() +
  geom_line(data = test, aes(ds, y)) +
  geom_line(data = forecast_plot_data_add, aes(ds, yhat), color = 'red') +
  ylab("Sales") + xlab("Date") + ggtitle("Additive seasonality") +
  theme_bw()

# plot the multiplicative forecast against the actual test data
forecast_plot_data_multi = multi_fcst %>% 
  as_tibble() %>% 
  mutate(ds = as.Date(ds)) %>% 
  filter(ds >= ymd("2021-01-01"))
m_test <- ggplot() +
  geom_line(data = test, aes(ds, y)) +
  geom_line(data = forecast_plot_data_multi, aes(ds, yhat), color = 'red') +
  ylab("Sales") + xlab("Date") + ggtitle("Multiplicative seasonality") +
  theme_bw()

a_test + m_test

Out of sample as well, both models seem to perform similarly.
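
To put numbers behind the visual comparison, we can score both forecasts on the test window with the same metrics used later for the original model; score_forecast is a small helper written here just for this comparison:

# helper to score a forecast against the 2021 test window
score_forecast <- function(fcst, actual) {
  pred <- fcst %>% 
    as_tibble() %>% 
    mutate(ds = as.Date(ds)) %>% 
    filter(ds >= ymd("2021-01-01"))
  tibble(
    RMSE = sqrt(mean((actual$y - pred$yhat)^2)),
    MAPE = mean(abs((actual$y - pred$yhat) / actual$y))
  )
}
bind_rows(
  additive = score_forecast(add_fcst, test),
  multiplicative = score_forecast(multi_fcst, test),
  .id = "model"
)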

Cross Validation

We can also conduct rolling-window cross-validation with Prophet.

# rolling-window CV: ~2 years of initial training, cutoffs stepped 365 days apart, 365-day horizon
df.cv <- cross_validation(model, initial = 731, period = 365, horizon = 365, units = "days")
head(df.cv)
## # A tibble: 6 x 6
##       y ds                   yhat yhat_lower yhat_upper cutoff             
##   <dbl> <dttm>              <dbl>      <dbl>      <dbl> <dttm>             
## 1  1733 2011-01-31 00:00:00 1194.       666.      1758. 2011-01-03 00:00:00
## 2  1794 2011-02-28 00:00:00  962.       400.      1475. 2011-01-03 00:00:00
## 3  2622 2011-03-31 00:00:00 2484.      1978.      3034. 2011-01-03 00:00:00
## 4  2641 2011-04-30 00:00:00 2990.      2445.      3545. 2011-01-03 00:00:00
## 5  2841 2011-05-31 00:00:00 3326.      2802.      3902. 2011-01-03 00:00:00
## 6  3329 2011-06-30 00:00:00 4036.      3508.      4608. 2011-01-03 00:00:00
df.cv %>% 
  ggplot() +
  geom_line(aes(ds, y)) +
  geom_point(aes(ds, yhat, color = factor(cutoff))) +
  theme_bw() +
  xlab("Date") +
  ylab("Home Sales (count)") +
  scale_color_discrete(name = 'Cutoff')

Cross-validation shows that the model has instances of high error early in the time series. These points sit around the changepoints identified earlier. However, Prophet corrects for this reasonably well as the cutoff moves later.
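
We can check this by computing the RMSE within each cutoff; if the early changepoints are driving the error, the earliest cutoffs should score worst:

# RMSE within each cross-validation cutoff
df.cv %>% 
  group_by(cutoff) %>% 
  summarise(rmse = sqrt(mean((y - yhat)^2)), .groups = "drop")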

Model performance

In-sample performance with cross validation

With Prophet we can plot the RMSE across the forecast horizon, aggregated over the cross-validation cutoffs.

p1<-plot_cross_validation_metric(df.cv, metric = 'rmse')
p2<-plot_cross_validation_metric(df.cv,metric = 'mae')
p1+p2

The RMSE increases slightly as the horizon grows but stays mostly flat. Since the RMSE doesn't vary much, the model doesn't perform substantially worse further into the future than it does early on. The MAE plot corroborates this.
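
The prophet package can also tabulate these metrics directly: performance_metrics() aggregates the cross-validation results over a rolling window of the horizon and returns RMSE, MAE, and MAPE, among others, per horizon:

# tabulated horizon-wise error metrics from the cross-validation results
cv_metrics <- performance_metrics(df.cv)
head(cv_metrics)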

plot_cross_validation_metric(df.cv,metric = 'mape')

The MAPE, if anything, declines as the horizon increases, reinforcing that model performance does not degrade at longer horizons.

Out-of-sample performance of original model

We can also get an understanding of how the model performs on the test data by looking at the same metrics.

# performance metrics on the test window
forecast_metric_data = forecast %>% 
  as_tibble() %>% 
  mutate(ds = as.Date(ds)) %>% 
  filter(ds >= ymd("2021-01-01"))
RMSE = sqrt(mean((test$y - forecast_metric_data$yhat)^2))
MAE = mean(abs(test$y - forecast_metric_data$yhat))
MAPE = mean(abs((test$y - forecast_metric_data$yhat)/test$y))  # as a fraction, not a percentage
print(paste("RMSE:",round(RMSE,2)))
## [1] "RMSE: 471.91"
print(paste("MAE:",round(MAE,2)))
## [1] "MAE: 402.43"
print(paste("MAPE:",round(MAPE,2)))
## [1] "MAPE: 0.07"

On average, the model forecasts sales to within about 7% of the actual values (MAPE ≈ 0.07).

Out-of-sample performance

# plot the original model's forecast against the actual test data
forecast_plot_data = forecast %>% 
  as_tibble() %>% 
  mutate(ds = as.Date(ds)) %>% 
  filter(ds >= ymd("2021-01-01"))
forecast_not_scaled = ggplot() +
  geom_line(data = test, aes(ds, y)) +
  geom_line(data = forecast_plot_data, aes(ds, yhat), color = 'red') +
  ylab("Sales") + xlab("Date") + ggtitle("Out of sample performance") +
  theme_bw()
# zoom the y-axis so the test window is easier to compare
forecast_scaled = forecast_not_scaled + 
  ylim(2000, 10000)

forecast_scaled

Upon visual inspection, of the three models we have looked at so far, the original Prophet model seems to perform the best out of sample.