David’s model involves change in multiple constituents dissolved or suspended in stream water, evaluated over time. This is a multivariate state-space problem.
combinedLevelChem2007.csv
Colp_K_bayes.csv
The second file in C++ includes functions for the truncated multivariate normal:
source('clarkFunctions2024.r')
Rcpp::sourceCpp("cppFns.cpp")
library( rWishart )
Although seemingly daunting, it is possible to incorporate dependence structure in multivariate time series, where there could be autoregressive terms. Consider dissolved constituents in a stream that have some residual (autoregressive) lag together with effects from outside the stream, e.g., runoff affected by weather. A model could look something like this:
\[ \begin{align} z_{t+1} &= \alpha z_t + \beta' x_t + \epsilon_t \\ y_t &= z_t + \nu_t \\ \epsilon_t &= MVN_S(0, \Sigma ) \\ \nu_t &= MVN_S(0, \mathbf{T}) \end{align} \] In this model \(z_t\) is a length-\(S\) vector of dissolved constituents, \(x_t\) is a length-\(Q\) vector of predictors. The \(S \times S\) matrix \(\alpha\) is the autoregressive part or lag-1 effect.
The \(Q \times S\) matrix \(\beta\) is the effect of environmental predictors with coefficients \(\beta\). The process and observation errors are \(\epsilon_t\) and \(\nu_t\), respectively.
The strategy is to simulate data and see if we can recover parameters.
Here is a simulated time series
S <- 4 # no. responses
Q <- 3 # no. predictors
n <- 200 # no. time increments
xnames <- paste('x', 1:Q, sep = '-')
ynames <- paste('y', 1:S, sep = '-')
x <- round( matrix( rnorm( n*Q ), n, Q, dimnames = list( NULL, xnames) ), 4 )
x[,1] <- 1
y <- matrix( 10, n, S, dimnames = list(NULL, ynames) )
z <- y
Sigma <- cov( .rMVN( S+1, rep(0, S), diag(.05, S) ) )
tau <- rep( .005, S )
names( tau ) <- ynames
beta <- round( matrix( rnorm( Q*S ), Q, S, dimnames = list( xnames, ynames ) ), 3 )
alpha <- diag( .1, S )
for( j in 2:n ){
z[j,] <- z[drop=F, j-1, ]%*%alpha + x[drop=F, j-1, ]%*%beta + .rMVN( 1, 0, Sigma )
}
y <- z + .rMVN( n, 0, diag(tau) ) # observation error
Here is a Gibbs sampler:
ag <- alpha*0
bg <- beta*0
sg <- tg <- diag(.05, S)
zg <- y
rownames(bg) <- colnames(x) <- xnames
colnames(bg) <- rownames(sg) <- colnames(sg) <- ynames
ng <- 4000 # setup Gibbs sampler
bgibbs <- matrix( 0, ng, S*Q)
sgibbs <- matrix( 0, ng, S*S)
predy <- matrix( 0, ng, S*n) # predict the data
colnames(bgibbs) <- as.vector( outer(xnames,ynames,paste,sep='_') )
colnames(sgibbs) <- as.vector( outer(ynames,ynames,paste,sep='_') )
agibbs <- sgibbs
IXX <- solve(crossprod(x[-n,])) # only do this once
df <- n + S - 1 # Wishart
for(g in 1:ng){
bg <- updateBetaMVN( x[-n,], zg[-1,] - zg[-n, ]%*%ag, sg)
ag <- updateBetaMVN( zg[-n,], zg[-1,] - x[-n, ]%*%bg, sg)
sg <- solve( rWishart(1, df, diag(10, S) +
crossprod( z[-n, ]%*%ag + x[-n,]%*%bg - zg[-1,]) )[,,1] )
tg <- solve( rWishart(1, df, diag(10, S) +
crossprod( zg - y ) )[,,1] )
zg <- updateZ( x, y, zg, ag, bg, sg, tg )
sgibbs[g,] <- sg
bgibbs[g,] <- bg
agibbs[g,] <- ag
}
Here are chains for \(\beta\):
chainPlot( bgibbs, trueValues = beta )
Here are chains for \(\alpha\):
chainPlot( agibbs, trueValues = alpha )
Here are chains for \(\Sigma\):
chainPlot( sgibbs, trueValues = Sigma )
It can be difficult to recover the error covariances in time series data. For this reason, it is a good idea to have effective prior distributions.