David’s model involves change in multiple constituents dissolved or suspended in stream water, evaluated over time. This is a multivariate state-space problem.


Resources

Data file

combinedLevelChem2007.csv

Colp_K_bayes.csv

Software

The second file in C++ includes functions for the truncated multivariate normal:

source('clarkFunctions2024.r')
Rcpp::sourceCpp("cppFns.cpp")
library( rWishart )

Objectives

Dependence in multiple variables

Although seemingly daunting, it is possible to incorporate dependence structure in multivariate time series, where there could be autoregressive terms. Consider dissolved constituents in a stream that have some residual (autoregressive) lag together with effects from outside the stream, e.g., runoff affected by weather. A model could look something like this:

\[ \begin{align} z_{t+1} &= \alpha z_t + \beta' x_t + \epsilon_t \\ y_t &= z_t + \nu_t \\ \epsilon_t &= MVN_S(0, \Sigma ) \\ \nu_t &= MVN_S(0, \mathbf{T}) \end{align} \] In this model \(z_t\) is a length-\(S\) vector of dissolved constituents, \(x_t\) is a length-\(Q\) vector of predictors. The \(S \times S\) matrix \(\alpha\) is the autoregressive part or lag-1 effect.

The \(Q \times S\) matrix \(\beta\) is the effect of environmental predictors with coefficients \(\beta\). The process and observation errors are \(\epsilon_t\) and \(\nu_t\), respectively.

The strategy is to simulate data and see if we can recover parameters.

Simulated time series data

Here is a simulated time series

S <- 4                                # no. responses
Q <- 3                                # no. predictors
n <- 200                              # no. time increments

xnames <- paste('x', 1:Q, sep = '-')
ynames <- paste('y', 1:S, sep = '-')

x <- round( matrix( rnorm( n*Q ), n, Q, dimnames = list( NULL, xnames) ), 4 )
x[,1] <- 1
y     <- matrix( 10, n, S, dimnames = list(NULL, ynames) )
z     <- y
Sigma <- cov( .rMVN( S+1, rep(0, S), diag(.05, S) ) )
tau   <- rep( .005, S )
names( tau ) <- ynames

beta  <- round( matrix( rnorm( Q*S ), Q, S, dimnames = list( xnames, ynames ) ), 3 )
alpha <- diag( .1, S )

for( j in 2:n ){
  z[j,] <- z[drop=F, j-1, ]%*%alpha + x[drop=F, j-1, ]%*%beta + .rMVN( 1, 0, Sigma )
}
y <- z + .rMVN( n, 0, diag(tau) )   # observation error

Here is a Gibbs sampler:

ag <- alpha*0
bg <- beta*0
sg <- tg <- diag(.05, S)
zg <- y 

rownames(bg) <- colnames(x) <- xnames
colnames(bg) <- rownames(sg) <- colnames(sg) <- ynames
  
ng <- 4000                        # setup Gibbs sampler
bgibbs <- matrix( 0, ng, S*Q)
sgibbs <- matrix( 0, ng, S*S)
predy  <- matrix( 0, ng, S*n)        # predict the data

colnames(bgibbs) <- as.vector( outer(xnames,ynames,paste,sep='_') )
colnames(sgibbs) <- as.vector( outer(ynames,ynames,paste,sep='_') )
agibbs <- sgibbs

IXX <- solve(crossprod(x[-n,]))       # only do this once
df  <- n + S - 1             # Wishart 

for(g in 1:ng){
  
  bg <- updateBetaMVN( x[-n,], zg[-1,] - zg[-n, ]%*%ag, sg)
  ag <- updateBetaMVN( zg[-n,], zg[-1,] - x[-n, ]%*%bg, sg)
  sg <- solve( rWishart(1, df, diag(10, S) + 
                          crossprod( z[-n, ]%*%ag + x[-n,]%*%bg - zg[-1,]) )[,,1] )
  tg <- solve( rWishart(1, df, diag(10, S) + 
                          crossprod( zg - y ) )[,,1] )
  zg <- updateZ( x, y, zg, ag, bg, sg, tg )
 
  sgibbs[g,] <- sg
  bgibbs[g,] <- bg
  agibbs[g,] <- ag
}

Here are chains for \(\beta\):

chainPlot( bgibbs, trueValues = beta ) 

Here are chains for \(\alpha\):

chainPlot( agibbs, trueValues = alpha ) 

Here are chains for \(\Sigma\):

chainPlot( sgibbs, trueValues = Sigma ) 

It can be difficult to recover the error covariances in time series data. For this reason, it is a good idea to have effective prior distributions.


Appendix