Example: First-Order DLM
This example comes from Mike West and Jeff Harrison's book, Bayesian Forecasting and Dynamic Models, and runs as follows:
A pharmaceutical company markets KURIT, an ethical drug, which currently sells an average of \(100\) units per month. Medical advice leads to a change in drug formulation that is expected to result in wider market demand for the product. It is agreed that from January, \(t = 1\), the new formulation with new packaging will replace the current product, but the price and the brand name KURIT remain unchanged. In order to plan production, stocks and raw material supplies, short-term forecasts of future demand are required. The drug is used regularly by individual patients, so demand tends to be locally constant in time. Hence a constant first-order polynomial DLM is adopted for total monthly sales. Sales fluctuations and observational variation about the demand level are expected to exceed month-to-month variation in the demand level considerably, so \(W\) is small compared to \(V\). Accordingly, the constant DLM \(\{F, G, V, W\} = \{1, 1, 100, 5\}\), which operated successfully for the old formulation, is retained for the new formulation.
In December, \(t = 0\), the expert market view for the new product is that demand has most likely expanded by about \(30\%\), to \(130\) units per month. Demand is believed unlikely to have fallen by more than \(10\) units or to have risen by more than \(70\). This range of \(80\) units is taken to represent 4 standard deviations for \(\mu_0\), giving a standard deviation of \(20\). Hence the company's initial view prior to launch is described by \(m_0 = 130\) and \(C_0 = 20^2 = 400\), so that \[ \mu_0 \sim \mathcal{N}(130,400).\] Consequently, the operational routine model for sales \(Y_t\) in month \(t\) is
\[\begin{aligned} Y_t &= \mu_t + \upsilon_t, && \upsilon_t \sim \mathcal{N}(0, 100), \\ \mu_t &= \mu_{t-1} + \omega_t, && \omega_t \sim \mathcal{N}(0, 5), \\ (\mu_0 \mid D_0) &\sim \mathcal{N}(130, 400). && \end{aligned}\]
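The step from the expert's range to \(m_0 = 130\) and \(C_0 = 400\) is plain arithmetic. A minimal Python sketch, assuming the range is read as two standard deviations either side of the prior mean (which also implies roughly 95% prior probability of lying inside it); the helper `normal_cdf` is hypothetical and written with `math.erf`:

```python
import math

# Expert judgement from the text: relative to the old level of 100 units,
# demand is unlikely to fall by more than 10 or rise by more than 70.
old_level = 100
lower = old_level - 10   # 90
upper = old_level + 70   # 170

# Assumption: the 80-unit range spans 4 standard deviations (about +/- 2 sd).
m0 = (lower + upper) / 2      # midpoint, equal to the "most likely" value 130
sd0 = (upper - lower) / 4     # 20.0
C0 = sd0 ** 2                 # prior variance 400.0

# Hypothetical helper: normal CDF via the error function
def normal_cdf(x, mean, sd):
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

# Prior probability that mu_0 lies in [lower, upper] under N(m0, C0)
coverage = normal_cdf(upper, m0, sd0) - normal_cdf(lower, m0, sd0)
print(m0, C0, round(coverage, 3))   # 130.0 400.0 0.954
```

The midpoint of the stated range coincides with the most likely value of 130, so the symmetric normal prior is a consistent summary of the expert view.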
For this model the signal-to-noise ratio is \(r = W/V = 5/100 = 0.05\), a low value typical of this sort of application.
The observations for the next nine months, together with the components of the one-step forecasting and updating recurrences, are computed below (row \(i\) of the table corresponds to month \(t = i + 1\)):
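To get a feel for what \(r = 0.05\) means, one can simulate data from the model. The sketch below is illustrative only: the function `simulate_first_order_dlm`, its defaults and the random seed are hypothetical choices, and the simulated values are not KURIT data. It uses NumPy's `Generator.normal`, whose `scale` argument is a standard deviation, hence the square roots of the variances.

```python
import numpy as np

def simulate_first_order_dlm(n, m0=130.0, C0=400.0, V=100.0, W=5.0, seed=0):
    """Hypothetical helper: draw mu_0 from the prior, then n months of levels
    mu_t and sales Y_t from the constant first-order polynomial DLM {1, 1, V, W}."""
    rng = np.random.default_rng(seed)
    mu = np.empty(n)
    y = np.empty(n)
    level = rng.normal(m0, np.sqrt(C0))                # mu_0 | D_0 ~ N(m0, C0)
    for t in range(n):
        level = level + rng.normal(0.0, np.sqrt(W))    # system: mu_t = mu_{t-1} + omega_t
        mu[t] = level
        y[t] = level + rng.normal(0.0, np.sqrt(V))     # observation: Y_t = mu_t + upsilon_t
    return mu, y

mu, y = simulate_first_order_dlm(12)
print(np.round(y, 1))
```

With \(W\) twenty times smaller than \(V\), month-to-month movements in the simulated sales are dominated by observation noise rather than by changes in the underlying level, which is exactly the situation the text describes.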
```python
import pandas as pd

# Monthly KURIT sales, January (t = 1) onwards
Yt = [150, 136, 143, 154, 135, 148, 128, 149, 146]

# Prior (mu_0 | D_0) ~ N(m0, C0) and the constant variances of the DLM {1, 1, V, W}
m0 = 130
C0 = 400
V = 100
W = 5

rows = []
m_t, C_t = m0, C0              # start from the prior at t = 0
for Y in Yt:
    f_t = m_t                  # one-step forecast mean: f_t = m_{t-1}
    R_t = C_t + W              # prior variance of mu_t: R_t = C_{t-1} + W
    Q_t = R_t + V              # one-step forecast variance
    A_t = R_t / Q_t            # adaptive coefficient
    e_t = Y - f_t              # one-step forecast error
    m_t = f_t + A_t * e_t      # posterior mean
    C_t = A_t * V              # posterior variance (equals R_t - A_t**2 * Q_t)
    rows.append({'Yt': Y, 'mt': m_t, 'Ct': C_t, 'ft': f_t,
                 'Qt': Q_t, 'At': A_t, 'et': e_t})

# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows
df = pd.DataFrame(rows, columns=['Yt', 'mt', 'Ct', 'ft', 'Qt', 'At', 'et']).astype(float)
print(df.round(1))
```

```
##       Yt     mt    Ct     ft     Qt   At     et
## 0  150.0  146.0  80.2  130.0  505.0  0.8   20.0
## 1  136.0  141.4  46.0  146.0  185.2  0.5  -10.0
## 2  143.0  142.0  33.8  141.4  151.0  0.3    1.6
## 3  154.0  145.3  27.9  142.0  138.8  0.3   12.0
## 4  135.0  142.8  24.8  145.3  132.9  0.2  -10.3
## 5  148.0  144.0  22.9  142.8  129.8  0.2    5.2
## 6  128.0  140.5  21.8  144.0  127.9  0.2  -16.0
## 7  149.0  142.3  21.2  140.5  126.8  0.2    8.5
## 8  146.0  143.1  20.7  142.3  126.2  0.2    3.7
```
Initially, at \(t = 0\), the company’s prior view of market demand is vitally important in making decisions about production and stocks. Subsequently, however, the value of this particular subjective prior diminishes rapidly as data is received. For example, the adaptive coefficient \(A_1\) takes the value \(0.8\), so that
\[ m_1 = m_0 + 0.8 e_1 = (4Y_1 + m_0) / 5\]
Thus, the January observation is given \(4\) times the weight of the prior mean \(m_0\) in calculating the posterior mean \(m_1\). At \(t = 2\), \(A_2 = 0.46\) and \[m_2 = m_1 + 0.46 e_2 = 0.46 Y_2 + 0.54 m_1 = 0.46 Y_2 + 0.43 Y_1 + 0.11 m_0,\]
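The weights quoted above come from unrolling the update \(m_t = A_t Y_t + (1 - A_t) m_{t-1}\). A minimal sketch that recomputes them; the helper `adaptive_coefficients` is hypothetical, and it relies on the fact that in this constant DLM with known \(V\) and \(W\) the coefficients \(A_t\) do not depend on the observations:

```python
m0, C0, V, W = 130.0, 400.0, 100.0, 5.0

def adaptive_coefficients(n, C0, V, W):
    """Hypothetical helper: A_1, ..., A_n for the constant first-order DLM."""
    A = []
    C = C0
    for _ in range(n):
        R = C + W              # R_t = C_{t-1} + W
        a = R / (R + V)        # A_t = R_t / Q_t
        A.append(a)
        C = a * V              # C_t = A_t * V
    return A

A1, A2 = adaptive_coefficients(2, C0, V, W)

# m_1 = A1*Y1 + (1 - A1)*m0 and m_2 = A2*Y2 + (1 - A2)*m_1, so expanding:
w_Y2 = A2
w_Y1 = (1 - A2) * A1
w_m0 = (1 - A2) * (1 - A1)
print(round(w_Y2, 2), round(w_Y1, 2), round(w_m0, 2))   # 0.46 0.43 0.11

# Check against the filtering recursion with the observed Y1 = 150, Y2 = 136
Y1, Y2 = 150.0, 136.0
m1 = m0 + A1 * (Y1 - m0)
m2 = m1 + A2 * (Y2 - m1)
assert abs(m2 - (w_Y2 * Y2 + w_Y1 * Y1 + w_m0 * m0)) < 1e-9
```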
so \(Y_2\) is also relatively highly weighted and \(m_0\) contributes only \(11\%\) of the weight of information incorporated in \(m_2\). As \(t\) increases, \(A_t\) decays rapidly towards its limiting value of \(0.2\). Finally, the coefficient of \(m_0\) in \(m_t\) is simply \((1 - A_t)(1 - A_{t-1}) \cdots (1 - A_1)\), so \(m_0\) contributes only about \(1\%\) of the information used to calculate \(m_{10}\), and as \(t\) increases the relevance of this subjective prior decays to zero.
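Both claims can be checked without any data, since \(A_t\) depends only on \(C_0\), \(V\) and \(W\). At a fixed point \(C = AV\), \(R = C + W\) and \(A = R/(R + V)\); dividing through by \(V\) gives \(A^2 + rA - r = 0\), whose positive root is the limiting adaptive coefficient. A minimal sketch, assuming the constant DLM \(\{1, 1, 100, 5\}\) with \(C_0 = 400\):

```python
import math

C0, V, W = 400.0, 100.0, 5.0
r = W / V   # signal-to-noise ratio, 0.05

# Limiting adaptive coefficient: positive root of A**2 + r*A - r = 0
A_limit = (-r + math.sqrt(r * r + 4 * r)) / 2

C = C0
weight_m0 = 1.0
for t in range(1, 11):
    R = C + W            # R_t = C_{t-1} + W
    A = R / (R + V)      # A_t = R_t / Q_t
    C = A * V            # C_t = A_t * V
    weight_m0 *= 1 - A   # coefficient of m_0 in m_t
    print(f"t={t:2d}  A_t={A:.3f}  weight of m_0={weight_m0:.4f}")

print(f"limiting A = {A_limit:.3f}")
```

By \(t = 10\) the weight on \(m_0\) is roughly \(0.011\), and \(A_{10}\) is already within about \(0.005\) of the limit \(0.2\).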