Multiple time series may be viewed as independent realizations of the same process or as distinct processes that are subject to the same model. This vignette considers an example of each.


Resources

Data file

combinedLevelChem2007.csv

Colp_K_bayes.csv

Software

The second file, written in C++, includes functions for the truncated multivariate normal:

source('clarkFunctions2024.r')
Rcpp::sourceCpp("cppFns.cpp")

Objectives


There are two examples in this vignette: i) one that views multiple population time series as drawn from the same “mean” population, and ii) another that views multiple streams as distinct, but subject to the same process and observation models.

Protist population growth

There can be multiple time series of measurements that are viewed as independent realizations of the same process. In this example of protists tracked under lab conditions, there are multiple populations that should be growing according to the same growth model.

Logistic population growth

The logistic model is

\[ dN = \left( rN(1 - N/K) + \epsilon \right) \times dt \]

Parameters include the per capita growth rate \(r\), carrying capacity \(K\), and model misspecification error \(\epsilon\). In fact, the time increments \(dt\) are not all the same; I omit the additional subscripts to reduce clutter.

The population size at time \(t\) is

\[N_{t+dt} = N_t + dN_t\]
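To see how the increments accumulate, here is a minimal simulation sketch, with hypothetical parameter values, that steps the logistic model forward on an uneven time grid:

# minimal simulation sketch of noisy logistic growth (hypothetical values)
r     <- .1                          # per capita growth rate
K     <- 5                           # carrying capacity
sigma <- .05                         # sd of model misspecification error
tgrid <- c( 0, 4, 8, 16, 24, seq( 36, 200, by = 12 ) )  # uneven observation times
dt    <- diff( tgrid )
N     <- numeric( length( tgrid ) )
N[1]  <- .1                          # initial population size

for( t in 1:length( dt ) ){
  dN     <- ( r*N[t]*( 1 - N[t]/K ) + rnorm( 1, 0, sigma ) )*dt[t]
  N[t+1] <- max( N[t] + dN, 0 )      # keep the population non-negative
}
plot( tgrid, N, type = 'l', las = 1, bty = 'n' )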

Count data

Here are the data. I remove the first time increment, during which the population is not yet growing logistically:

data <- read.csv( "Colp_K_bayes.csv" )
data <- data[ data$Hours > 0, ]
print( data[ 1:20, ] )

These are counts, so I will model them as Poisson. Cell counts are made at discrete time increments. For each time period there are five populations, indexed \(k = 1, \dots, K\) with \(K = 5\) (not to be confused with the carrying capacity \(K\) above), that arise from a model with a shared mean \(N_t\) and effort given by the column \(E_{k,t} =\) count_vol_mL,

\[ \prod_{k=1}^K Poi( y_{k,t} | E_{k,t}N_t) \]

Here is code to set up the data. Note that I have divided by 1000 to reduce the number of digits:

effort <- data$count_vol_mL                          # count volume = sampling effort
ytot   <- tapply( data$count, list( hr = data$Hours, tube = data$tube.id), sum )
etot   <- tapply( effort*1000, 
                  list( hr = data$Hours, tube = data$tube.id), sum )
ng     <- rowMeans( ytot/etot, na.rm = TRUE )        # mean density over tubes
hr     <- as.numeric( rownames( ytot ) )             # observation hours

plot( hr, ng, type = 'l', ylim = c(0, 7), las = 1, bty = 'n'  )
points( data$Hours, data$count/( data$count_vol_mL )/1000, pch = 16, cex = .5 )

Here are some increment vectors:

nt <- length( ng )       # number of times
dn <- ng[-1] - ng[-nt]   # growth increments
dt <- diff( hr )         # time increments

Data modeling

Noticing that the model can be linearized for efficient Gibbs sampling, I rewrite the growth model like this,

\[ \begin{aligned} dN_t/dt &= \mathbf{z}'_t \beta + \epsilon_t \\ \mathbf{z}'_t &= ( N_t, N_t^2 ) \\ \beta &= (r, -r/K ) \end{aligned} \]

I can use this form to directly sample the coefficients \(\beta\). I will draw them from a truncated MVN distribution bounded by reasonable values (see below).
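The function getVv is defined in clarkFunctions2024.r, so its internals are not reproduced here, but a minimal sketch of the standard conjugate calculation it presumably performs looks like this:

# sketch of the conditional mean and covariance for beta (assumed form of getVv)
# V  = ( Z'Z/sigma^2 + priorIVB )^{-1}
# mu = V( Z'y/sigma^2 + priorIVB %*% priorB )
getVvSketch <- function( x, y, sg, priorB, priorIVB ){
  V  <- solve( crossprod( x )/sg + priorIVB )
  mu <- V%*%( crossprod( x, y )/sg + priorIVB%*%priorB )
  list( mu = mu, V = V )
}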

For the latent states \(N\) I will need indirect sampling. I will use Metropolis. The conditional distribution of latent states is

\[ [N_t | \mathbf{y}_t] \propto N(dN_t/dt \,|\, \mathbf{z}'_t \beta, \sigma^2) \times \prod_{k=1}^K Poi( y_{k,t} | E_{k,t}N_t) \]

Notice that there are \(K = 5\) observations for each latent state \(N_t\).
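For reference, the log of this conditional can be packaged as a function. This hypothetical helper mirrors the two terms that appear in the Metropolis step of the sampler below:

# hypothetical helper: log conditional for the latent vector N (up to a constant)
logPostN <- function( N, dt, beta, sg, ytot, etot ){
  z  <- cbind( N, N^2 )[ -length( N ), ]      # design matrix from current states
  dx <- diff( N )/dt                          # growth increments per unit time
  sum( dnorm( dx, z%*%beta, sqrt( sg ), log = TRUE ) ) +     # process model
    sum( dpois( ytot, etot*N, log = TRUE ) )                 # count likelihood
}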

Gibbs sampler

Here are some objects needed for a Gibbs sampler:

sp <- varPrior( 1, 2 )               # weak prior for variance
sg <- 1/rgamma( 1, sp[1], sp[2] )    # initialize values for variances, coefficients
bg <- matrix( c(.02, -.1 ), 2, 1 )   # initialize beta

loB <- matrix( c( .001, -.1 ), 1 )   # prior and truncated values for coefficients
hiB <- loB*0 + c( .5, -.01 )
priorB   <- matrix( c(.1, -.01), 2 )
priorIVB <- diag(.00001, 2 )

G      <- 50000                      # number of MCMC iterations
sgibbs <- rep(0, G )
xgibbs <- matrix( 0, G, nt )
bgibbs <- matrix( 0, G, 2 )
colnames(bgibbs) <- c( 'r', 'K' )

cmat <- diag( nt )*.000001                         # covariance to draw latent N
updateCmat <- c( 100, 200, 500, 1000, 2000, 5000 ) # iterations to update cmat
accept <- 0

Here is a Gibbs sampler:

for(g in 1:G){
  
  if( g %in% updateCmat ){                # update covariance between states
    const <- 1
    if( accept/g > .5 )const <- 2         # if acceptance high, increase jump size
    if( accept/g < .1 )const <- .2        # if acceptance low, decrease jump size
    cmat <- cov( xgibbs[1:(g-1),] )*const
  }
  
  dx <- diff( ng )/dt                     # growth increments
  z  <- cbind( ng, ng^2 )[-nt,]           # design vector
  
  # direct sample for beta
  tmp <- getVv( x = z, y = dx, sg, priorB = priorB, priorIVB = priorIVB )
  bg <- t( .tnormMVNmatrix( t( bg ), t( tmp$mu ), tmp$V, loB, hiB ) )
  zb <- z%*%bg
  
  # Metropolis for latent N
  xp <- as.vector( .tnormMVNmatrix( ng, ng, cmat, ng*0, ng*0 + 20 ) ) # proposed N
  dp <- diff( xp )/dt                                                 # proposed dN
  
  pnow <- sum( dnorm( dx, zb[ 1:(nt-1) ], sqrt( sg ), log = T ) ) +   # process update
          sum( dpois( ytot, etot*ng, log = T ) )                      # likelihood for counts
  pnew <- sum( dnorm( dp, zb[ 1:(nt-1) ], sqrt( sg ), log = T ) ) +   # same for proposed values
          sum( dpois( ytot, etot*xp, log = T ) )
  a <- exp( pnew - pnow )
  u <- runif( 1, 0, 1)
  if( u < a ){
    ng <- xp
    accept <- accept + 1
  }
    
  sg <- updateVariance( (ng[-1] - ng[-nt])/dt,  zb, sp[1], sp[2] )           # process variance
  
  K  <- -bg[1]/bg[2]
  sgibbs[g]  <- sg
  xgibbs[g,] <- ng
  bgibbs[g,] <- c( bg[1], K )
}
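After the run, it is worth checking the realized acceptance rate; the adaptation steps above aim to keep it roughly between .1 and .5:

accept/G       # realized Metropolis acceptance rate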

Evaluation

Here are some chains showing that mixing is good:

chainPlot( bgibbs )    # r and K chains

chainPlot( xgibbs )    # latent states

Here are density plots for coefficients:

chain2density( bgibbs )

Here are estimated latent states with 95% credible intervals shaded:

xci <- t( apply( xgibbs, 2, quantile, c(.025, .975 ) ) )
xsd <- apply( xgibbs, 2, sd )
xmu <- colMeans( xgibbs )

plot( hr, xmu, lwd = 2, type = 'l', ylim = range( ytot/etot ), col = 'brown',
      bty = 'n' )
shadeInterval( hr, xci, col = 'brown', add = T )
points( rep( hr, ncol(ytot) ), as.vector(ytot/etot), pch = 16 )

Could the fitted model project forward in time? Here is an attempt where the model is started from the estimates at hour 24:

ns <- 500 
startTime <- 24                    # time in hours
incTime   <- c(startTime:200)

wt <- which( hr >= startTime )[1]  # find estimate for initial population size

xe <- matrix( 0, ns, length( incTime ) )
ib <- sample( nrow(xgibbs), ns )   # sample starting population size at random
xe[,1] <- xgibbs[ib,wt]            # exponential model
xl     <- xe                       # logistic model

for( t in 2:length(incTime) ){     # project both models
  ib <- sample( nrow(xgibbs), ns )
  xe[,t] <- xe[,t-1]*( 1 + bgibbs[ib, 1] ) 
  xl[,t] <- xl[,t-1]*( 1 + bgibbs[ib,1]*( 1 - xl[,t-1]/bgibbs[ib,2] ) )
}
eci <- t( apply( xe, 2, quantile, c(.025, .975 ) ) )
lci <- t( apply( xl, 2, quantile, c(.025, .975 ) ) )

plot( hr, xmu, lwd = 2, type = 'l', ylim = range( ytot/etot ), col = 'brown',
      bty = 'n' )
shadeInterval( incTime, eci, col = '#9ecae1', add = T )
lines( incTime, colMeans( xe ), lwd = 2, col = 'blue', lty = 2 )

shadeInterval( incTime, lci, col = '#d6604d', add = T )
lines( incTime, colMeans( xl ), lwd = 2, col = 'brown', lty = 2 )

As the fitted model projects farther forward in time, its predictions drift away from the data.

Exercise 1 Does the logistic model fit these data well? Does it predict? What does it assume and which assumptions could be unrealistic?


Stream chemistry application

A former student wanted to know if total dissolved nitrogen (TDN) and dissolved organic nitrogen (DON) can be explained by fluctuations in temperature, precipitation, and chlorine concentration ([Cl]), and if the answer differs between coastal streams. We will answer these questions using a hierarchical state-space model.

The model I use is:

\[ \begin{aligned} x_{j,t+1} &= x_{j,t} + \mathbf{z}_{j,t} \beta + \epsilon_{j,t} \\ y_{j,t} &= x_{j,t} + \nu_{j,t} \end{aligned} \]
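Before fitting, it can help to see this model as a generative process. Here is a minimal simulation sketch for a single series, with hypothetical covariates and parameter values:

# sketch: simulate one series from the state-space model above (hypothetical values)
ntSim <- 100
zSim  <- cbind( rnorm( ntSim ), rnorm( ntSim ) )   # two covariates
bSim  <- c( .2, -.1 )                              # coefficients
sigma <- .1                                        # process error sd
tau   <- .3                                        # observation error sd

xSim <- numeric( ntSim )                           # latent states
for( t in 1:( ntSim - 1 ) ){
  xSim[t+1] <- xSim[t] + sum( zSim[t,]*bSim ) + rnorm( 1, 0, sigma )
}
ySim <- xSim + rnorm( ntSim, 0, tau )              # observations around latent states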

First I read the data and set up a design matrix z for the sites and several covariates:

data <- read.csv('combinedLevelChem2007.csv', header=T)
nt   <- nrow(data)

formula <- as.formula( ~ prec + Temp*Site + Cl*Site)    # site names used in interactions
tmp <- model.frame(formula, data, na.action=na.pass)    # will not remove rows with NA
z   <- model.matrix(formula, tmp)                       # design matrix

ynames <- c('TDN','DON')
wf <- which( is.finite( rowSums( cbind( z, data[,ynames ] ) ) ) )  # rows with complete data
q <- ncol(z)                                            # number of predictors
y <- data[,ynames[1]]                                   # use TDN as response
xnames <- colnames(z)                                   # names of predictors

There are four sites. Use the function colSums to determine the number of observations by site.
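One way to do this, assuming Site holds the site labels, is to sum the columns of a site indicator matrix:

colSums( model.matrix( ~ Site - 1, data ) )   # number of observations by site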

Exercise 2: What model is being fitted in the above block of code? What will the interaction terms tell us?

Multiple time series

This example is complicated by the fact that there are not one, but four time series, and each can be different. I will stack them up, but I will need to identify the beginning and end of each series and model them accordingly.

To compute transitions from time \(t\) to \(t+1\) I need two sequences of indices, call them from and to. from will be used to reference all times in a sequence except the last. to will reference all times except the first. Say that I have four times, 1, 2, 3, 4. Then I can set from <- c(1:3) and to <- c(2:4). They are the same length and align element by element, so that each pair (from, to) = (1,2), (2,3), (3,4) is one transition.

*The vectors `from` and `to` can be used to vectorize the transitions.*

Note that if I wanted to get rates of change from a time series y as diff(y), it could also be done as y[to] - y[from].
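A quick check on a toy series confirms the equivalence:

yToy <- c( 2, 5, 9, 14 )
from <- 1:3
to   <- 2:4
diff( yToy )                  # 3 4 5
yToy[to] - yToy[from]         # same transitions, vectorized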

Here I locate the breaks between the four series in data by first extracting their four names:

slevs  <-  unique( data$Site )        # find site names
ngroup <- length(slevs)               # number of groups

In the code that follows I make vectors first and last, holding the locations of the first and last rows for each site in the stacked data, and show the breaks with vertical lines:

sl    <- match(data$Site, slevs)     # site as an integer
last  <- which( diff(sl) == 1 )      # rows where the site changes
first <- c(1, last + 1)              # first row for each site
last  <- c(last, nt)                 # last row for each site
plot( y, type = 'l')
abline(v = first, col = 'brown')
*Stacked response with vertical lines separating the four sites.*

Missing covariates, missing states

I have written a function to indicate the locations of missing values in y and z, both in the full data set and by sites. It also constructs the vectors prows and rrows that I use for MCMC:

v <- getMissX(z, y, first, last)
prows <- v$prows                 # t rows by site
rrows <- v$rrows                 # t+1 rows by site
missz <- v$missx                 # missing predictors in design matrix
missy <- v$missy                 # missing y index in stacked data
missY <- v$missY                 # missing y by site as a list

The objects missz, missy, and missY hold locations that will be used to vectorize imputation of missing values.
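getMissX is defined in clarkFunctions2024.r, so its internals are not shown here, but prows and rrows presumably stack the within-site transitions. A sketch of that assumed structure, checked against the function output:

# sketch: assumed structure of the within-site transition indices
pCheck <- unlist( mapply( seq, first, last - 1 ) )  # times t, omitting each site's last row
all( pCheck == prows )                              # compare with getMissX output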

I need prior distributions and initial values for parameters and missing values. In the code below, the initial values for missing states come from a function initialStatesSS.

priorx   <- colMeans(z,na.rm=T)   # prior mean for missing covariates
z[missz] <- priorx[missz[,2]]     # initial missing z

sigM <- .1                        # prior mean variances
tauM <- .1
sp <- varPrior(sigM, nt)            
tp <- varPrior(tauM, nt/5)        # weak prior

sg <- 1/rgamma(1,sp[1],sp[2])     # initialize values
tg <- 1/rgamma(1,tp[1],tp[2])
bg <- matrix(0,q,1)

tmp <- initialStatesSS(y)        # initial missing y
xg  <- tmp$x

Gibbs sampling

In the Gibbs sampler that follows, note two new steps, i) a loop to estimate latent variables by site, and ii) imputation for missing \(z\):

ng     <- 2000
vgibbs <- matrix(0, ng, 2)
xgibbs <- matrix(0, ng, nt)
bgibbs <- matrix(0, ng, q)
zgibbs <- matrix(0, ng, length(missz))
colnames(vgibbs) <- c('sigma','tau')
colnames(bgibbs) <- xnames

sg <- .1

for(g in 1:ng){
  
  # parameters and mean vector
  bg <- updateBeta( z[prows,], xg[rrows] - xg[prows], sigma = sg, 
                          priorIVB = diag(.001, q), priorB = matrix(0, q) ) 
  mu <- z%*%bg                              
  
  # latent variables by site (there are ngroup sites)
  for(j in 1:ngroup){           
    jj <- first[j]:last[j]
    xg[jj] <- updateSSRW(states = xg[jj], yy = y[jj],
                         zb = mu[jj], missing = missY[[j]], tg = tg, sg = sg)   
  }
  
  # variances
  sg <- updateVariance(xg[rrows],xg[prows] + mu[prows],sp[1],sp[2])
  tg <- updateVariance(y[-missy],xg[-missy],tp[1],tp[2])
  
  # missing observations
  z[missz] <- imputeX( missx = missz, xx = z[prows,], 
                       yy = xg[prows] - xg[rrows],
                       bg = bg, sg, priorx )[missz]
  vgibbs[g,] <- c(sg,tg)
  xgibbs[g,] <- xg
  bgibbs[g,] <- bg
  zgibbs[g,] <- z[missz]
}

Here are latent states:

keep  <- 200:ng
xpost <- t( apply(xgibbs[keep,],2,quantile,c(.5,.025,.975)) )
ylim  <- range(xpost)

par(mfrow=c(2,2),bty='n', mar=c(3,2,2,1))
for(j in 1:4){
  ij <- first[j]:last[j]
  shadeInterval(ij,xpost[ij,2:3],add=F,ylim=ylim,xlab='Weeks')  
  points(ij,y[ij],lwd=2,cex=.1, col='blue')                 #obs 
  wm <- missY[[j]]
  points(ij[wm],xpost[ij[wm],1],col='orange',cex=.5,pch=16) #missing
  title(slevs[j])
}
*Posterior estimates of true response (shaded 95% CI) with observations (dots).*

Here are some parameter estimates:

chainPlot( bgibbs, trueValues = bg*0 )
*MCMC chains for coefficients.*

chain2density( vgibbs[keep,], trueValues = c(sigM,tauM) ) # compare prior mean
*Posterior estimates for variances.*

Here are some site differences for temperature and Cl effects:

vars <- c('Temp', 'Cl')
par(mfrow=c(2,1),bty='n', mar=c(4,4,2,2))

sites <- unique(data$Site)

for(k in 1:2){
  
  wk   <- grep(vars[k],colnames(bgibbs))
  bk   <- bgibbs[keep,wk]
  xlim <- range(bk)
  ymax <- 8/diff(xlim)
  plot(NULL, xlim=xlim,ylim=c(0,ymax),xlab=vars[k],ylab='Density')
  
  for(j in 1:ncol(bk)){
    dk <- density(bk[,j])
    lines(dk[[1]],dk[[2]],col=j, lwd=2)
    wmax <- which.max(dk[[2]])
    text( dk[[1]][wmax], 1.2*dk[[2]][wmax], sites[j] ,col=j)
  }
}
*Posterior densities for coefficients.*

Can you interpret these site differences?

Exercise 3 The following questions can be answered with this example:

  • Before running any more analysis, say what you think the process and observation variances should be. Justify your answer.

  • How does the prior on \(\sigma^2, \tau^2\) affect estimates of \(\mathbf{\beta}\)? Explain how you manipulated prior weight to arrive at your answer.

  • Repeat the analysis with DON as the response variable and interpret results.

  • What is the answer to the motivating question: Can total dissolved nitrogen (TDN) and dissolved organic nitrogen (DON) be explained by temperature, precipitation, and chlorine concentration ([Cl]), and does it differ between coastal streams?

Recap

State-space models handle serial dependence hierarchically. Hierarchical models do not require marginal independence, only conditional independence. Modeling latent states allows for dependence structures that could be quite different from those achievable with standard models (e.g., autoregressive). This flexibility makes hierarchical modeling attractive for time series data.

Informative prior distributions are important for variances. Examples here show how they affect estimates of covariates and of latent states. The relative magnitudes of process and observation error determine the extent to which the data versus the process dominate the fit.

Appendix

The random walk has a conditionally normal posterior distribution that can be written this way:

\[ \begin{aligned} \left[ x_t | x_{t-1}, x_{t+1}, y_t, \sigma^2, \tau^2 \right] &\propto N(x_t|x_{t-1}, \sigma^2) \times N(x_{t+1} | x_t, \sigma^2) \times N(y_t | x_t, \tau^2 ) \\ &\propto exp \left[ -\frac{(x_t - x_{t-1})^2}{2\sigma^2} - \frac{(x_{t+1} - x_{t})^2}{2\sigma^2} - \frac{(y_t - x_t)^2}{2\tau^2} \right] \end{aligned} \]

Because this is a product of normals, it will be normal, with this form,

\[ \begin{aligned} x_t &\sim N(Vv, V) \\ V &= \left( \frac{2}{\sigma^2} + \frac{1}{\tau^2} \right)^{-1} \\ v &= \frac{x_{t-1} + x_{t+1}}{\sigma^2} + \frac{y_t}{\tau^2} \end{aligned} \]

Recall that \(v\) is a weighted sum of the previous and next values in the series \((t-1, t+1)\) and the observation \(y_t\). The variances act as weights. The fitted latent states will be pulled toward the data if the observation variance \(\tau^2\) is small relative to \(\sigma^2\). They will be pulled toward the mean of the neighboring values if \(\tau^2\) is large relative to \(\sigma^2\).

Here is the function:

updateSSRW <- function( states, yy, zb = rep(0, length(yy)), missing, tg, sg){       

  # state-space random walk 
  # update continuous states, random walk
  # missing times, obs y, obs error tg, process error sg
  # zb = z%*%beta
  
  nn <- length(states)
  
  for(t in 1:nn){
    
    VI <- 0
    v  <- 0
    
    if(!t %in% missing){                    # observation
      v  <- yy[t]/tg
      VI <- 1/tg
    }
    
    if(t < nn){                             # t+1 term excluded for last 
      v  <- v + (states[t+1] - zb[t])/sg 
      VI <- VI + 1/sg
    }
    
    if(t > 1){                              # t-1 term excluded for 1st 
      v  <- v + (states[t-1] + zb[t-1])/sg
      VI <- VI + 1/sg
    }
    
    V     <- 1/VI
    states[t] <- rnorm( 1, V*v, sqrt(V) )
  }
  states
}
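As a usage sketch, the function can be exercised on a simulated random walk with two missing observation times (all values hypothetical):

# usage sketch: one Gibbs draw of latent states for a simulated random walk
set.seed( 1 )
xTrue <- cumsum( rnorm( 50, 0, .2 ) )        # latent random walk
yObs  <- xTrue + rnorm( 50, 0, .5 )          # noisy observations
yObs[ c( 10, 25 ) ] <- NA                    # two missing times
xDraw <- updateSSRW( states = xTrue, yy = yObs, zb = rep( 0, 50 ),
                     missing = c( 10, 25 ), tg = .25, sg = .04 )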

The conditional distribution for the process variance looks like this:

\[ \begin{aligned} \left[ \sigma^2 | \mathbf{x}, \mathbf{y} \right] &\propto \prod_{t=2}^T N(x_t | x_{t-1}, \sigma^2) IG(\sigma^2 | s_1, s_2 ) \\ &\propto \left( \sigma^2 \right)^{-(T-1)/2}exp \left[ -\frac{1}{2\sigma^2} \sum_{t=2}^T (x_t - x_{t-1})^2 \right] \times \left( \sigma^2 \right)^{-(s_1 + 1)} exp(-s_2/\sigma^2) \\ &= \left( \sigma^2 \right)^{-(T-1)/2 - s_1 - 1} exp \left[ -\frac{1}{2\sigma^2} \sum_{t=2}^T (x_t - x_{t-1})^2 - \frac{s_2}{\sigma^2} \right] \\ &= IG \left( \sigma^2 \bigg| (T-1)/2 + s_1, \frac{1}{2}\sum_{t=2}^T (x_t - x_{t-1})^2 + s_2 \right) \end{aligned} \]

The conditional distribution for the observation variance looks like this: \[ \begin{aligned} \left[ \tau^2 | \mathbf{x}, \mathbf{y} \right] &\propto \prod_{t=1}^T N(y_t | x_t, \tau^2) IG(\tau^2 | u_1, u_2 ) \\ &\propto \left( \tau^2 \right)^{-T/2}exp \left[ -\frac{1}{2\tau^2} \sum_{t=1}^T (y_t - x_t)^2 \right] \times \left( \tau^2 \right)^{-(u_1 + 1)} exp(-u_2/\tau^2) \\ &= IG \left( \tau^2 \bigg| T/2 + u_1, \frac{1}{2}\sum_{t=1}^T (y_t - x_t)^2 + u_2 \right) \end{aligned} \]
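Both conditionals share the same inverse gamma form, so a single update serves for both the process and observation variances. Here is a minimal sketch consistent with these derivations (the packaged updateVariance presumably does something similar):

# sketch: draw a variance from its inverse gamma conditional
# yy and mu give residuals yy - mu; s1, s2 are the prior IG parameters
updateVarianceSketch <- function( yy, mu, s1, s2 ){
  u1 <- s1 + length( yy )/2
  u2 <- s2 + .5*sum( ( yy - mu )^2 )
  1/rgamma( 1, u1, u2 )                # inverse gamma via reciprocal of a gamma
}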