Sigrid Keydana, Trivadis
2017/11/14
Trivadis
My background
My passion
Where to find me
It's 2000, just before the Olympics. This is the data we have:
lm(seconds ~ year, male400_1996) %>% coefficients()
(Intercept) year
207.46609 -0.08257
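As a quick illustration (not on the original slide), this fit could be used for a naive point prediction for 2000, assuming male400_1996 has the columns year and seconds as above:

# naive extrapolation from the plain linear fit (illustrative only)
fit_lm <- lm(seconds ~ year, male400_1996)
predict(fit_lm, newdata = data.frame(year = 2000))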
(yes, the series is not totally regular, but let me make the point ;-))
library(forecast)
arima_fit <- auto.arima(male400_1996$seconds)
arima_fit$coef
ma1 drift
-0.7999 -0.3807
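A sketch of how one might forecast from this fit (the horizon is chosen purely for illustration):

# one-step-ahead forecast from the fitted ARIMA model
forecast(arima_fit, h = 1)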
What if we didn't have to:
or even:
States form a Markov chain.
Observations are independent conditional on states.
\( \theta_t = G_t\theta_{t-1} + w_t \)
\( Y_t = F_t\theta_t + v_t \)
\( w_t \sim \mathcal{N}(0, W_t) \)
\( v_t \sim \mathcal{N}(0, V_t) \)
In theory, any and all of \( G_t \), \( F_t \), \( W_t \) and \( V_t \) could change over time.
How can we optimise our assumptions about the true position?
Here's the model again
Kalman steps (given: \( \theta_{t-1}|y_{1:t-1} \sim \mathcal{N}(m_{t-1}, C_{t-1}) \))
One-step-ahead predictive distribution of \( \theta_t \) given \( y_{1:t-1} \):
\( E(\theta_t|y_{1:t-1}) = a_t = G_t m_{t-1} \)
\( Var(\theta_t|y_{1:t-1}) = R_t = G_t C_{t-1} G'_t + W_t \)
One-step-ahead predictive distribution of \( Y_t \) given \( y_{1:t-1} \):
\( E(Y_t|y_{1:t-1}) = f_t = F_t a_t \)
\( Var(Y_t|y_{1:t-1}) = Q_t = F_t R_t F'_t + V_t \)
Filtering distribution of \( \theta_t \) given \( y_{1:t} \):
\( E(\theta_t|y_{1:t}) = m_t = a_t + R_t F'_t Q_t^{-1} e_t \)
\( Var(\theta_t|y_{1:t}) = C_t = R_t - R_t F'_t Q_t^{-1} F_t R_t \)
where \( e_t = Y_t - f_t \) is the forecast error.
We correct our prediction by how much it differs from the incoming observation.
Posterior estimate \( E(\theta_t|y_{1:t}) = m_t = a_t + R_tF'_tQ_t^{-1}e_t \)
Kalman gain \( R_tF'_tQ_t^{-1} \)
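A minimal base-R sketch of one filtering step, transcribing the formulas above one-to-one (purely illustrative; all arguments are matrices, which also covers the scalar case):

kalman_step <- function(m_prev, C_prev, y, G, FF, W, V) {
  a <- G %*% m_prev                 # prior mean of theta_t
  R <- G %*% C_prev %*% t(G) + W    # prior variance of theta_t
  f <- FF %*% a                     # one-step-ahead forecast of Y_t
  Q <- FF %*% R %*% t(FF) + V       # forecast variance
  e <- y - f                        # forecast error
  K <- R %*% t(FF) %*% solve(Q)     # Kalman gain
  m <- a + K %*% e                  # filtered mean
  C <- R - K %*% FF %*% R           # filtered variance
  list(m = m, C = C)
}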
State equation: \( \mu_t = \mu_{t-1} + w_t, w_t \sim \mathcal{N}(0, W) \)
Observation equation: \( Y_t = \mu_t + v_t, v_t \sim \mathcal{N}(0, V) \)
\( G = F = 1 \)
W and V could be time-invariant or time-dependent.
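A tiny simulation sketch of this random walk plus noise model (W and V chosen arbitrarily):

set.seed(123)
n <- 100
W <- 0.1; V <- 1                      # arbitrary system / observation variances
mu <- cumsum(rnorm(n, sd = sqrt(W)))  # latent random walk mu_t
y  <- mu + rnorm(n, sd = sqrt(V))     # observations Y_t
plot(y, type = "l"); lines(mu, col = "red")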
library(dlm)
# local level model: V and W parameterised on the log scale
build <- function(params) {
  dlmModPoly(order = 1, dV = exp(params[1]), dW = exp(params[2]))
}
# start both variances at the variance of the differenced series
initial_params <- rep(log(var(diff(lakeSup))), 2)
mle <- dlmMLE(lakeSup, initial_params, build)
model <- dlmModPoly(order = 1, dV = exp(mle$par[1]), dW = exp(mle$par[2]))
Map(function(x) round(x,2), unlist(model)) %>% unlist()
m0 C0 FF V GG W
0.00 10000000.00 1.00 9.47 1.00 0.12
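The filtered and smoothed state estimates behind the curves shown next can (presumably) be obtained like this:

filt   <- dlmFilter(lakeSup, model)   # theta_t | y_{1:t}
smooth <- dlmSmooth(lakeSup, model)   # theta_t | y_{1:n}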
How would the filtering and smoothing curves have looked had we specified equal degrees of system and observation noise?
model2 <- dlmModPoly(order = 1, dV = 10, dW = 10)
filt2 <- dlmFilter(lakeSup, model2)
smooth2 <- dlmSmooth(lakeSup, model2)
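A sketch of the comparison plot (dropFirst() removes the time-0 prior from the state estimates):

plot(lakeSup, type = "l")
lines(dropFirst(smooth$s),  col = "red")   # MLE-estimated V and W
lines(dropFirst(smooth2$s), col = "blue")  # dV = dW = 10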
# same local level model, now for the NVIDIA series (nvda)
build <- function(params) {
  dlmModPoly(order = 1, dV = exp(params[1]), dW = exp(params[2]))
}
# assumed definition (not shown on the slide), analogous to the lakeSup example:
log_var_diff <- log(var(diff(nvda)))
initial_params <- rep(log_var_diff, 2)
mle <- dlmMLE(nvda, initial_params, build)
model <- dlmModPoly(order = 1, dV = exp(mle$par[1]), dW = exp(mle$par[2]))
Map(function(x) round(x,2), unlist(model)) %>% unlist()
m0 C0 FF V GG W
0.00 10000000.00 1.00 0.43 1.00 12.06
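With W estimated so much larger than V, the filtered states hug the observations; the plot referred to below presumably came from something like this (variable names mine):

filt_nvda   <- dlmFilter(nvda, model)
smooth_nvda <- dlmSmooth(nvda, model)
plot(nvda, type = "l")
lines(dropFirst(filt_nvda$m),   col = "red")   # nearly on top of the data
lines(dropFirst(smooth_nvda$s), col = "blue")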
(No, the lines did not turn invisible…)
Not with this we won't…
Again, the system and observation noise (as well as the transition equations) could be time-invariant or time-varying.
Comparing Nvidia to AMD, Mu, and IBM:
Does it look like Nvidia and IBM have a somewhat “complementary” relationship?
lm(nvda ~ ibm) %>% summary()
Call:
lm(formula = nvda ~ ibm)
Residuals:
Min 1Q Median 3Q Max
-44.30 -8.29 2.70 10.08 29.24
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 457.2733 14.4070 31.7 <2e-16 ***
ibm -2.0611 0.0914 -22.6 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 14.9 on 186 degrees of freedom
Multiple R-squared: 0.732, Adjusted R-squared: 0.731
F-statistic: 509 on 1 and 186 DF, p-value: <2e-16
… but is it really constant over time?
lm(nvda ~ ibm) %>% residuals() %>% qqnorm()
# dynamic regression: intercept and slope each follow a random walk;
# u[1] is the log observation variance, u[2:3] the log system variances
build <- function(u) {
  dlmModReg(ibm, dV = exp(u[1]), dW = exp(u[2 : 3]))
}
mle <- dlmMLE(nvda, parm = rep(0, 3), build)
model <- build(mle$par)
model$FF
[,1] [,2]
[1,] 1 1
model$GG
[,1] [,2]
[1,] 1 0
[2,] 0 1
model$W
[,1] [,2]
[1,] 12.2 0.000e+00
[2,] 0.0 2.393e-12
model$V
[,1]
[1,] 0.3863
Final coefficients:
[1] 170.38366 0.05716
Development of coefficients over time (except initial noise):
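A sketch of how these numbers and the coefficient paths can be extracted (same pattern as in the 400m example below; variable names mine):

filt_reg <- dlmFilter(nvda, model)
filt_reg$m[1 + length(nvda), ]        # final intercept and slope
coef_paths <- dropFirst(filt_reg$m)   # intercept and slope over time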
Let's model this as a dynamic regression problem!
# dynamic regression of winning time on year
build <- function(u) {
  dlmModReg(male400_1996$year, dV = exp(u[1]), dW = exp(u[2 : 3]))
}
mle <- dlmMLE(male400_1996$seconds, parm = rep(0, 3), build)
model <- build(mle$par)
filt <- dlmFilter(male400_1996$seconds, model)
filt$m[1 + length(male400_1996$seconds), ] # most recent coefficients
[1] 209.48317 -0.08335
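Purely as an illustration (my arithmetic, not on the slide): plugging the year 2000 into these final coefficients gives a naive point prediction.

# 209.48 - 0.08335 * 2000, i.e. roughly 42.8 seconds --
# extrapolating the downward walk yields a suspiciously fast time
coefs <- filt$m[1 + length(male400_1996$seconds), ]
coefs[1] + coefs[2] * 2000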
Clearly, remembering the downward walk won't help us predict the near future…
Here, the sequence length (the number of timesteps over which the hidden state is carried) was chosen to be 30.
A way to get several predictions (one per timestep) from an LSTM network in Keras: use a TimeDistributed layer:
library(keras)

model <- keras_model_sequential()
model %>%
  # return_sequences = TRUE: emit the LSTM output at every timestep
  layer_lstm(units = lstm_units, input_shape = c(num_steps, num_features),
             return_sequences = TRUE) %>%
  # apply the same dense layer to each timestep's output
  time_distributed(layer_dense(units = 1)) %>%
  compile(
    loss = 'mean_squared_error',
    optimizer = 'adam'
  )
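A hedged sketch of training this model; X_train and Y_train are hypothetical arrays shaped (samples, num_steps, num_features) and (samples, num_steps, 1) respectively:

# hypothetical training call -- one target value per timestep,
# matching the TimeDistributed dense layer
model %>% fit(
  X_train, Y_train,
  epochs = 100,
  batch_size = 32,
  validation_split = 0.1
)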
… let's give it some more adequate input!
This series has multiple seasonality.
lstm_units <- 24 * 7 # 24 hours x 7 days: one full week of hourly data
model <- keras_model_sequential()
model %>%
  layer_lstm(units = lstm_units, input_shape = c(num_steps, num_features),
             return_sequences = TRUE) %>%
  time_distributed(layer_dense(units = 1)) %>%
  compile(
    loss = 'mean_squared_error',
    optimizer = 'adam'
  )
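And a sketch of generating predictions (X_test hypothetical, shaped like the training input):

# returns one prediction per timestep of each input sequence:
# an array of shape (samples, num_steps, 1)
preds <- model %>% predict(X_test)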
And here are the forecasts, 168 steps ahead (displaying 3 only)
Have fun!