Traditional time series models can rely on assumptions about dependence structure that poorly describe real time series. For example, time series are rarely simple auto-regressive processes, and data are often non-Gaussian. Bayesian state-space models estimate latent states together with parameters, so there is broad flexibility in the their posterior covariance structure.


Resources

Data file

combinedLevelChem2007.csv

Software

source('clarkFunctions2022.r')

Objectives


Stream chemistry application

A former student wanted to know if total dissolved nitrogen (TDN) and dissolved organic nitrogen (DON) can be explained by fluctuations in temperature, precipitation, and chlorine concentation ([Cl]), and if it differs between coastal streams. We will answer these questions using a heirarchical state-space model.

The model I use is:

\[ \begin{aligned} x_{j,t+1} &= x_{j,t} + \mathbf{z}_{j,t} \beta + \epsilon_{j,t} \\ y_{j,t} &= x_{j,t} + \nu_{j,t} \end{aligned} \]

First I read the data and set up a design matrix z for the sites and several covariates:

data <- read.csv('combinedLevelChem2007.csv', header=T)
nt   <- nrow(data)

formula <- as.formula( ~ prec + Temp*Site + Cl*Site)    # site names used in interactions
tmp <- model.frame(formula, data, na.action=na.pass)    # will not remove rows with NA
z   <- model.matrix(formula, tmp)                       # design matrix

ynames <- c('TDN','DON')
q <- ncol(z)
y <- data[,ynames[1]]                                   # use TDN as response
xnames <- colnames(z)                                   # names of predictors

There are four sites. Use the function colSums to determine the number of observations by site.

Exercise 1: What model is being fitted in the above block of code? What will the interaction terms tell us?

Multiple time series

This example is complicated by the fact that there are not one, but four time series. I will stack them up, but I will need to identify the beginning and end of each series and model them accordingly.

To compute transitions from time \(t\) to \(t+1\) I need two sequences of indices, call them from and to. from will be used to reference all times in a sequence except the last. to will reference all times except the first. Say that I have four transitions, 1, 2, 3, 4. Then I can set from <- c(1:3) and to <- c(2:4). They are the same length and align like this:

*The vectors `from` and `to` can be used to vectorize the transitions.*

The vectors from and to can be used to vectorize the transitions.

Note that if I wanted to get rates of change from a time series y as diff(y), it could also be done as x[to] - x[from].

If I have three time series that are cancatenated, I can stack from and to for each series into two vectors, call them prows and rrows:

*The vectors `prows` and `rrows` vectorize the transitions for three time series.*

The vectors prows and rrows vectorize the transitions for three time series.

There is no transition at the breaks between series 1 and 2 (4 \(\rightarrow\) 5) or between 2 and 3 (7 \(\rightarrow\) 8).

Here I locate the breaks between the four series in data by first extracting their four names:

slevs  <-  unique( data$Site )        # find site names
ngroup <- length(slevs)               # number of groups

In the code that follows I make vectors first and last, holding the locations of first and last values in the design matrix data and show the breaks with vertical lines:

sl    <- match(data$Site, slevs)     # assign integers
last  <- which( diff(sl) == 1 )      # first and last row for each site
first <- c(1, last + 1)          
last  <- c(last, nt)
plot( y, type = 'l')
abline(v = first, col = 'brown')
*Stacked response with vertical lines separating the four sites.*

Stacked response with vertical lines separating the four sites.

Missing covariates, missing states

I have written a function to indicate the locations of missing values in y and z, both in the full data set and by sites. It also constructs the vectors prows and rrows that I use for MCMC:

v <- getMissX(z, y, first, last)
prows <- v$prows                 # t rows by site
rrows <- v$rrows                 # t+1 rows by site
missz <- v$missx                 # missing predictors in design matrix
missy <- v$missy                 # missing y index in stacked data
missY <- v$missY                 # missing y by site as a list

The objects missz, missy, and missY hold locations that will be used to vectorize imputation of missing values.

I need a prior distributions and initial values for parameters and missing values. In the code below, the initial values for missing states come from a function initialStatesSS.

priorx   <- colMeans(z,na.rm=T)   # prior mean for missing covariates
z[missz] <- priorx[missz[,2]]     # initial missing z

sigM <- .1                        # prior mean variances
tauM <- .1
sp <- varPrior(sigM, nt)            
tp <- varPrior(tauM, nt/5)        # weak prior

sg <- 1/rgamma(1,sp[1],sp[2])     # initialize values
tg <- 1/rgamma(1,tp[1],tp[2])
bg <- matrix(0,q,1)

tmp <- initialStatesSS(y)        # initial missing y
xg  <- tmp$x

Gibbs sampling

In the Gibbs sampler that follows, note two new steps, i) a loop to estimate latent variables by site, and ii) imputation for missing \(z\):

ng     <- 2000
vgibbs <- matrix(0, ng, 2)
xgibbs <- matrix(0, ng, nt)
bgibbs <- matrix(0, ng, q)
zgibbs <- matrix(0, ng, length(missz))
colnames(vgibbs) <- c('sigma','tau')
colnames(bgibbs) <- xnames

sg <- .1

for(g in 1:ng){
  
  # parameters and mean vector
  bg <- .updateBeta(z[prows,], xg[rrows] - xg[prows], sigma=sg, 
                          priorIVB=diag(.001,q), priorB=matrix(0,q)) 
  mu <- z%*%bg                              
  
  # latent variables by site (there are ngroup sites)
  for(j in 1:ngroup){           
    jj <- first[j]:last[j]
    xg[jj] <- updateSSRW(states=xg[jj], yy=y[jj],
                         zb=mu[jj], missing=missY[[j]], tg=tg, sg=sg)   
  }
  
  # variances
  sg <- updateVariance(xg[rrows],xg[prows] + mu[prows],sp[1],sp[2])
  tg <- updateVariance(y[-missy],xg[-missy],tp[1],tp[2])
  
  # missing observations
  z[missz] <- imputeX( missx = missz, xx = z[prows,], 
                       yy = xg[prows] - xg[rrows],
                       bg = bg, sg, priorx )[missz]
  vgibbs[g,] <- c(sg,tg)
  xgibbs[g,] <- xg
  bgibbs[g,] <- bg
  zgibbs[g,] <- z[missz]
}

Here are latent states:

keep  <- 200:ng
xpost <- t( apply(xgibbs[keep,],2,quantile,c(.5,.025,.975)) )
ylim  <- range(xpost)

par(mfrow=c(2,2),bty='n', mar=c(3,2,2,1))
for(j in 1:4){
  ij <- first[j]:last[j]
  shadeInterval(ij,xpost[ij,2:3],add=F,ylim=ylim,xlab='Weeks')  
  points(ij,y[ij],lwd=2,cex=.1, col='blue')                 #obs 
  wm <- missY[[j]]
  points(ij[wm],xpost[ij[wm],1],col='orange',cex=.5,pch=16) #missing
  title(slevs[j])
}
*Posterior estimates of true response (shaded 95% CI) with observations (dots).*

Posterior estimates of true response (shaded 95% CI) with observations (dots).

Here are estimates of the missing z’s;

# variables that are missing;
zvars <- table(missz[,2])
zpost <- t( apply(zgibbs[keep,],2,quantile,c(.5,.025,.975)) )
zvar  <- 3                         # I choose variable 3: temperature

par(mfrow=c(2,2),bty='n', mar=c(3,2,2,1))

for(j in 1:4){
  
  ww <- which(missz[,2] == zvar & missz[,1] >= 
                first[j] & missz[,1] <= last[j])
  if(length(ww) == 0)next
  
  zq <- zpost[ww,]
  mj <- missz[ww, 1]
  zj <- matrix(z[,zvar], nt, 3)
  zj[,2:3] <- 0
  zj[ mj,] <- zq
  
  ij <- first[j]:last[j]
  ylim <- range(zj[ij,])
  shadeInterval(ij,zj[ij,2:3], add=F, ylim = ylim, xlab='Weeks', col='brown')  
  lines(ij, zj[ij,1])
  segments(mj, zq[,2], mj, zq[,3], lwd = 1, col='brown')
  points(mj, zq[,1], col='brown', pch=3)
  
  if(j == 1)title(colnames(z)[zvar])
}

Here are some parameter estimates:

.processPars(bgibbs, xtrue=bg*0, CPLOT=T) 
## $summary
##                         estimate        se      0.025     0.975 true value
## (Intercept)           -0.0391300 0.0437000 -1.321e-01 3.884e-02          0
## prec                  -0.0001049 0.0009432 -1.891e-03 1.824e-03          0
## Temp                   0.0035520 0.0025370 -8.486e-04 9.199e-03          0
## SiteMidpoint          -0.0215500 0.0566300 -1.292e-01 8.974e-02          0
## SiteOutflow           -0.0575500 0.0561400 -1.668e-01 5.925e-02          0
## SitePump station       0.0303800 0.0597200 -8.311e-02 1.573e-01          0
## Cl                    -0.0002553 0.0001341 -5.122e-04 1.036e-05          0
## Temp:SiteMidpoint      0.0010100 0.0032420 -5.639e-03 7.025e-03          0
## Temp:SiteOutflow       0.0011620 0.0032540 -5.547e-03 7.184e-03          0
## Temp:SitePump station -0.0035680 0.0033930 -1.103e-02 2.596e-03          0
## SiteMidpoint:Cl        0.0001430 0.0001504 -1.562e-04 4.384e-04          0
## SiteOutflow:Cl         0.0002965 0.0001371  1.721e-05 5.564e-04          0
## SitePump station:Cl    0.0004041 0.0002109 -3.205e-05 8.052e-04          0
*MCMC chains for coefficients*.

MCMC chains for coefficients.

.processPars(bgibbs[keep,], xtrue=bg*0, DPLOT=T, burnin = 200) 
## $summary
##                         estimate        se      0.025     0.975 true value
## (Intercept)           -3.965e-02 0.0441000 -1.321e-01 4.013e-02          0
## prec                  -9.772e-05 0.0009441 -1.886e-03 1.843e-03          0
## Temp                   3.557e-03 0.0025480 -8.500e-04 9.236e-03          0
## SiteMidpoint          -2.029e-02 0.0565100 -1.297e-01 8.994e-02          0
## SiteOutflow           -5.645e-02 0.0561700 -1.732e-01 6.034e-02          0
## SitePump station       3.062e-02 0.0600800 -8.268e-02 1.580e-01          0
## Cl                    -2.506e-04 0.0001335 -5.041e-04 9.570e-06          0
## Temp:SiteMidpoint      9.826e-04 0.0032740 -5.781e-03 7.115e-03          0
## Temp:SiteOutflow       1.135e-03 0.0032850 -5.594e-03 7.236e-03          0
## Temp:SitePump station -3.575e-03 0.0033980 -1.106e-02 2.679e-03          0
## SiteMidpoint:Cl        1.378e-04 0.0001501 -1.647e-04 4.316e-04          0
## SiteOutflow:Cl         2.917e-04 0.0001366  1.690e-05 5.559e-04          0
## SitePump station:Cl    4.046e-04 0.0002114 -3.906e-05 8.048e-04          0
*Posterior estimates of coefficients*.

Posterior estimates of coefficients.

.processPars(vgibbs[keep,], xtrue=c(sigM,tauM), DPLOT=T, burnin = 200) #compare prior mean
*Posterior estimates for variances*.

Posterior estimates for variances.

## $summary
##       estimate       se  0.025  0.975 true value
## sigma   0.1069 0.002848 0.1016 0.1127        0.1
## tau     0.1703 0.007939 0.1556 0.1861        0.1

Here are some site differences for precipitation and Cl effects:

vars <- c('Temp', 'Cl')
par(mfrow=c(2,1),bty='n', mar=c(4,4,2,2))

sites <- unique(data$Site)

for(k in 1:2){
  
  wk   <- grep(vars[k],colnames(bgibbs))
  bk   <- bgibbs[keep,wk]
  xlim <- range(bk)
  ymax <- 8/diff(xlim)
  plot(NULL, xlim=xlim,ylim=c(0,ymax),xlab=vars[k],ylab='Density')
  
  for(j in 1:ncol(bk)){
    dk <- density(bk[,j])
    lines(dk[[1]],dk[[2]],col=j, lwd=2)
    wmax <- which.max(dk[[2]])
    text( dk[[1]][wmax], 1.2*dk[[2]][wmax], sites[j] ,col=j)
  }
}
*Posterior densities for coefficients.*

Posterior densities for coefficients.

Can you interpret these site differences?

Exercise 5 The following questions can be answered with this example:

  • Before running any more analysis, say what you think should be the variance on the process and the observations. Justify your answer.

  • How does the prior on \(\sigma^2, \tau^2\) affect estimates of \(\mathbf{\beta}\)? Explain how you manipulated prior weight to arrive at your answer.

  • Repeat the analysis with DON as the response variable and interpret results.

  • What is the answer to the motivating question: Can total dissolved nitrogen (TDN) and dissolved organic nitrogen (DON) be explained by temperature, precipitation, and chlorine concentration ([Cl]), and does it differ between coastal streams?

Recap

State-space models model serial dependence hierarchically. Hierarchical models do not require marginal independence, only conditional independence. Modeling latent states allows for dependence structure that could be quite different from what could be achieved with a standard models (e.g., autoregressive). The flexibility of hierarchical modeling makes it attractive for time series data.

Informative prior distributions are important for variances. Examples here show how they affect estimates of covariates and of latent states. The relative magnitudes of process and observation error determine the extent which data versus process dominate the fit.

Appendix

The random walk has a conditionally normal posterior distribution that can be written this way:

\[ \begin{aligned} x_t | x_{t-1}, x_{t+1}, y_t, \sigma^2 &\propto N(x_t|x_{t-1}, \sigma^2) \times N(x_{t+1} | x_t, \sigma^2) \times N(y_t | x_t ) \\ &\propto exp \left[ -\frac{(x_t - x_{t-1})^2}{2\sigma^2} - \frac{(x_{t+1} - x_{t})^2}{2\sigma^2} - \frac{(y_t - x_t)^2}{2\tau^2} \right] \end{aligned} \] Because this is a product of normals, it will be normal, with this form,

\[ \begin{aligned} x_t &\sim N(Vv, V) \\ V &= \left( \frac{1}{2 \sigma^2} + \frac{1}{\tau^2} \right)^{-1} \\ v &= \frac{x_{t-1} + x_{t+1}}{\sigma^2} + \frac{y_t}{\tau^2} \end{aligned} \] Recall, \(v\) is a sum of the previous and next values in the series \((t-1, t+1)\) and the observation \(y_t\). The variances act as weights. The fitted latent states will be pulled to the data if observation variance \(\tau^2\) is small relative to \(\sigma^2\). The latent states will be pulled to the mean of past and previous values if \(\tau^2\) is large relative to \(\sigma^2\).

Here is the function:

updateSSRW <- function( states, yy, zb=rep(0,length(y)), missing, tg, sg){       

  # state-space random walk 
  # update continuous states, random walk
  # missing times, obs y, obs error tg, process error sg
  # zb = z%*%beta
  
  nn <- length(states)
  
  for(t in 1:nn){
    
    VI <- 0
    v  <- 0
    
    if(!t %in% missing){                    # observation
      v  <- yy[t]/tg
      VI <- 1/tg
    }
    
    if(t < nn){                             # t+1 term excluded for last 
      v  <- v + (states[t+1] - zb[t])/sg 
      VI <- VI + 1/sg
    }
    
    if(t > 1){                              # t-1 term excluded for 1st 
      v  <- v + (states[t-1] + zb[t-1])/sg
      VI <- VI + 1/sg
    }
    
    V     <- 1/VI
    states[t] <- rnorm( 1, V*v, sqrt(V) )
  }
  states
}

The conditional distribution for the process variance looks like this:

\[ \begin{aligned} \left[ \sigma^2 | \mathbf{x}, \mathbf{y} \right] &\propto \prod_{t=2}^T N(x_t | x_{t-1}, \sigma^2) IG(\sigma^2 | s_1, s_2 ) \\ &\propto \left( \sigma^2 \right)^{-(T-1)/2}exp \left[ -\frac{1}{2\sigma^2} \sum_{t=2}^T (x_t - x_{t-1})^2 \right] \times \left( \sigma^2 \right)^{-(s_1 + 1)} exp(-s_2/\sigma^2) \\ &= \left( \sigma^2 \right)^{-(T-1)/2 - s_1 - 1} exp \left[ -\frac{1}{2\sigma^2} \sum_{t=2}^T (x_t - x_{t-1})^2 - \frac{s_2}{\sigma^2} \right] \\ &= IG \left( \sigma^2 \bigg| (T-1)/2 + s_1, \frac{1}{2}\sum_{t=2}^T (x_t - x_{t-1})^2 + s_2 \right) \end{aligned} \]

The conditional distribution for the observation variance looks like this: \[ \begin{aligned} \left[ \tau^2 | \mathbf{x}, \mathbf{y} \right] &\propto \prod_{t=1}^T N(y_t | x_t, \tau^2) IG(\tau^2 | u_1, u_2 ) \\ &\propto \left( \tau^2 \right)^{-T/2}exp \left[ -\frac{1}{2\tau^2} \sum_{t=2}^T (y_t - x_t)^2 \right] \times \left( \tau^2 \right)^{-(u_1 + 1)} exp(-u_2/\tau^2) \\ &= IG \left( \tau^2 \bigg| T/2 + u_1, \frac{1}{2}\sum_{t=1}^T (y_t - x_t)^2 + u_2 \right) \end{aligned} \]