Probabilistic programming enables domain experts to build probabilistic reasoning tools in the language of their own domain. This is a contrast to other machine learning approaches, where often require a procrustean effort to reframe a domain-specific problem into the input format of the chosen machine learning method.
The idea is to take domain abstractions and build them in to programmable model components. The programmer can assemble these components like Legos to build a model that addresses decision problems in her domain. Once she connects the components, the inference algorithm is generated automatically, allowing her to prototype models quickly. Since the model is written in the language of the domain, it is easily understood by and shared with others working in the domain (who might not be ML experts).
bsts (stands for Bayesian structural time series, see Scott and Varian (2014)) is an excellent case study from the forecasting domain. bsts is an open source R package from Google. It enables one to assemble a time series model from components familiar to forecasters, namely periodic and seasonal trends, as as regression components that capture correlation between different time series.
The bsts package contains the initial.claims dataset, which contains a weekly time series of US initial claims for unemployment from Federal Reserve Economic Data.
library(tidyverse, quietly = TRUE)
library(bsts, quietly = TRUE)
data(iclaims)
.data <- initial.claims
claims <- .data$iclaimsNSA
plot(claims, ylab = "")
The mathematical model behind the approach :
Start with an empty list of model components.
(model_components <- list())
## list()
Next I add a trend component. There are many choices.
?AddAr
?AddAutoAr
?AddLocalLevel
?AddLocalLinearTrend
?AddStudentLocalLinearTrend
?AddGeneralizedLocalLinearTrend
Of many choices available, I choose a local linear trend. One can think of a linear trend as a regression relationship between the time series and time. You expect that for a change in time \(\Delta t\), the response should change by \(\beta_1 * \Delta t\).
The AddLocalLinearTrend function adds the local linear trend component to the component list. The various values of the component are set to their defaults.
summary(model_components <- AddLocalLinearTrend(model_components,
y = claims))
## Length Class Mode
## [1,] 6 LocalLinearTrend list
Seasonal components capture periodicity in the data. Choices include;
?AddTrig # Trigonometric seasonal
?AddSeasonal
?AddNamedHolidays
?AddFixedDateHoliday
?AddNthWeekdayInMonthHoliday
?AddLastWeekdayInMonthHoliday
The holiday components are particularly useful in retail forecasting.
I add a weekly seasonal component. A week about 52 weeks. I also add a yearly commponent.
summary(model_components <- AddSeasonal(model_components, y = claims,
nseasons = 52))
## Length Class Mode
## [1,] 6 LocalLinearTrend list
## [2,] 6 Seasonal list
fit <- bsts(claims, model_components, niter = 2000)
## =-=-=-=-= Iteration 0 Thu Mar 9 17:13:13 2017 =-=-=-=-=
## =-=-=-=-= Iteration 200 Thu Mar 9 17:13:20 2017 =-=-=-=-=
## =-=-=-=-= Iteration 400 Thu Mar 9 17:13:26 2017 =-=-=-=-=
## =-=-=-=-= Iteration 600 Thu Mar 9 17:13:35 2017 =-=-=-=-=
## =-=-=-=-= Iteration 800 Thu Mar 9 17:13:42 2017 =-=-=-=-=
## =-=-=-=-= Iteration 1000 Thu Mar 9 17:13:50 2017 =-=-=-=-=
## =-=-=-=-= Iteration 1200 Thu Mar 9 17:13:57 2017 =-=-=-=-=
## =-=-=-=-= Iteration 1400 Thu Mar 9 17:14:05 2017 =-=-=-=-=
## =-=-=-=-= Iteration 1600 Thu Mar 9 17:14:13 2017 =-=-=-=-=
## =-=-=-=-= Iteration 1800 Thu Mar 9 17:14:20 2017 =-=-=-=-=
Since this is a Bayesian model, the trend and seasonal component parameters are MCMC samples from a posterior. The following visualizes the contributions of each component to the model fit, based on posterior sample means:
burnin <- 500 # Throw away first 500
tibble(
date = as.Date(time(claims)),
trend = colMeans(fit$state.contributions[-(1:burnin),"trend",]),
seasonality = colMeans(fit$state.contributions[-(1:burnin),"seasonal.52.1",])) %>%
gather("component", "value", trend, seasonality) %>%
ggplot(aes(x = date, y= value)) +
geom_line() + theme_bw() +
theme(legend.title = element_blank()) + ylab("") + xlab("") +
facet_grid(component ~ ., scales="free") + guides(colour=FALSE) +
theme(axis.text.x=element_text(angle = -90, hjust = 0))
The generic predict method works on the bsts fit. Predictions into the future has credible intervals. Note how quickly they expand as time progresses.
pred <- predict(fit, horizon = 100, burn = burnin, quantiles = c(.05, .95))
plot(pred)
I can plot to see where the true model deviates the most from my fitted model.
errors <- bsts.prediction.errors(fit, burn = 1000)
PlotDynamicDistribution(errors)
## NULL
The dataset also includes time series that correlate strongly with initial claims. bsts will combine the trend and seasonal components with a regression on these other time series.
fit2 <- bsts(iclaimsNSA ~ ., state.specification = model_components,
data = initial.claims, niter = 1000)
## =-=-=-=-= Iteration 0 Thu Mar 9 17:14:43 2017 =-=-=-=-=
## =-=-=-=-= Iteration 100 Thu Mar 9 17:14:46 2017 =-=-=-=-=
## =-=-=-=-= Iteration 200 Thu Mar 9 17:14:50 2017 =-=-=-=-=
## =-=-=-=-= Iteration 300 Thu Mar 9 17:14:53 2017 =-=-=-=-=
## =-=-=-=-= Iteration 400 Thu Mar 9 17:14:57 2017 =-=-=-=-=
## =-=-=-=-= Iteration 500 Thu Mar 9 17:15:00 2017 =-=-=-=-=
## =-=-=-=-= Iteration 600 Thu Mar 9 17:15:04 2017 =-=-=-=-=
## =-=-=-=-= Iteration 700 Thu Mar 9 17:15:08 2017 =-=-=-=-=
## =-=-=-=-= Iteration 800 Thu Mar 9 17:15:11 2017 =-=-=-=-=
## =-=-=-=-= Iteration 900 Thu Mar 9 17:15:15 2017 =-=-=-=-=
We can obtain parameter estimates by again taking the sample mean of the sampled beta values.
colMeans(fit2$coefficients)
## (Intercept) michigan.unemployment
## 0.000000e+00 9.846581e-06
## idaho.unemployment pennsylvania.unemployment
## 1.362501e-01 -3.951598e-04
## unemployment.filing new.jersey.unemployment
## 4.600161e-03 5.138106e-06
## department.of.unemployment illinois.unemployment
## -1.049029e-04 -1.513685e-05
## rhode.island.unemployment unemployment.office
## 7.806800e-05 5.065605e-01
## filing.unemployment
## 3.790764e-04
This basic look demonstrates how a forecaster could use this modeling approach without having to dive into the underlying Bayesian computation. That said, there is much more to dive into with bsts, particularly with setting Bayes-related parameters such as prior probability distributions. More complex data might require a deeper understanding of the model to get a good fit. Even in this case, the fitted model can be shared with forecasting experts, who will understand its structure and interpretation.
Scott, Steven, and Hal Varian. 2014. “Predicting the Present with Bayesian Structural Time Series” 5. Inderscience Publishers Ltd: 4–23.