When writing model strings in JAGS, we have become familiar with standardizing the data and eventually transforming the results back to the original scale (the same can also be done in Stan). In this topic, we talk about why and how this is done.
Let's look at the two code chunks marked off by the dashed (----------) comment lines in the model string below, taken from fitting a simple linear model under a robust (Student-t) assumption:
modelString = "
  # Standardize the data---------------------------------------------------------
  data {
    Ntotal <- length(y)
    xm <- mean(x)
    ym <- mean(y)
    xsd <- sd(x)
    ysd <- sd(y)
    for ( i in 1:length(y) ) {
      zx[i] <- (x[i] - xm) / xsd
      zy[i] <- (y[i] - ym) / ysd
    }
  } #----------------------------------------------------------------------------
  # Specify the model for standardized data:
  model {
    for ( i in 1:Ntotal ) {
      zy[i] ~ dt( zbeta0 + zbeta1 * zx[i] , 1/zsigma^2 , nu )
    }
    # Priors vague on standardized scale:
    zbeta0 ~ dnorm(0, 1/(10)^2 )
    zbeta1 ~ dnorm(0, 1/(10)^2 )
    zsigma ~ dunif(1.0E-3, 1.0E+3 )
    nu ~ dexp(1/30.0)
    # Transform to original scale------------------------------------------------
    beta1 <- zbeta1 * ysd / xsd
    beta0 <- zbeta0 * ysd + ym - zbeta1 * xm * ysd / xsd
    sigma <- zsigma * ysd
    #----------------------------------------------------------------------------
  }
"
So why do we need to standardize the response and the predictors, and then scale back later on?
The purpose of using z-scores in JAGS is to overcome the problem of correlated parameters (see the simulation of the correlation between β0 and β1 in another Bayesian workshop).
Strong correlation produces a long, thin shape in the scatter plot of the sampled parameters, which makes Gibbs sampling very slow and inefficient.
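The effect of standardization on the intercept-slope correlation can be sketched in base R without running any sampler. The snippet below is only an illustration under stated assumptions: the data are simulated (hypothetical), and the least-squares covariance matrix from `lm()` is used as a stand-in for the posterior correlation, which it approximates under normal errors and flat priors but not exactly under the robust model above.

```r
# Hypothetical toy data: the predictor is far from zero relative to its spread.
set.seed(2)
x <- rnorm(100, mean = 50, sd = 5)
y <- 3 + 2 * x + rnorm(100, sd = 10)

# Correlation between the intercept and slope estimates on the raw scale.
cor_raw <- cov2cor(vcov(lm(y ~ x)))[1, 2]

# The same quantity after standardizing both variables.
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)
cor_std <- cov2cor(vcov(lm(zy ~ zx)))[1, 2]

print(c(raw = cor_raw, standardized = cor_std))
```

On the raw scale the two estimates are strongly negatively correlated, because moving the slope forces the intercept to compensate far away from the data. After centering, the predictor has mean zero and the correlation vanishes up to rounding error.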
But remember to transform the parameters back to the original scale. The same standardization can be applied in Stan in all situations, although it matters less there: the HMC sampler implemented in Stan is much less sensitive to this problem.
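As a quick sanity check of the back-transformation, the sketch below reuses the variable names from the model string (`xm`, `ysd`, `zbeta1`, and so on) in plain R. It is an assumption-laden illustration, not the JAGS workflow itself: the data are simulated, and a least-squares fit on the standardized scale stands in for a single posterior draw.

```r
# Hypothetical toy data; variable names mirror the JAGS model string.
set.seed(1)
x <- rnorm(50, mean = 170, sd = 10)
y <- 5 + 0.8 * x + rnorm(50, sd = 4)

# Same standardization as in the data block.
xm <- mean(x); ym <- mean(y)
xsd <- sd(x);  ysd <- sd(y)
zx <- (x - xm) / xsd
zy <- (y - ym) / ysd

# Least-squares fit on the standardized scale (stand-in for one MCMC draw).
zfit   <- lm(zy ~ zx)
zbeta0 <- unname(coef(zfit)[1])
zbeta1 <- unname(coef(zfit)[2])
zsigma <- summary(zfit)$sigma

# Same back-transformation as in the model block.
beta1 <- zbeta1 * ysd / xsd
beta0 <- zbeta0 * ysd + ym - zbeta1 * xm * ysd / xsd
sigma <- zsigma * ysd

# Direct fit on the original scale for comparison.
fit <- lm(y ~ x)
print(rbind(transformed = c(beta0, beta1, sigma),
            direct      = c(unname(coef(fit)), summary(fit)$sigma)))
```

The two rows should agree up to floating-point error. In JAGS the same three assignments are evaluated at every iteration, so the monitored `beta0`, `beta1` and `sigma` are already on the original scale.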
For the regression model using standardized variables, we assume the following form for the regression line (in the present case, both the response and the predictors are transformed):
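$$E[Y_{\text{scaled}}] = \beta_0 + \sum_{j=1}^{k} \beta_j z_j$$
where $z_j$ is the $j$-th standardized regressor, defined as follows:
$$z_j = \frac{x_j - \bar{x}_j}{S_{x_j}}, \quad \text{and} \quad \hat{Y}_{\text{scaled}} = \frac{\hat{Y}_{\text{unscaled}} - \bar{Y}}{S_Y}$$
Here $S_{x_j}$ and $S_Y$ are the sample standard deviations of $x_j$ and $Y$.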
Carrying out the regression with the standardized regressors, we obtain the fitted regression line:
$$\hat{Y}_{\text{scaled}} = \hat{\beta}_0 + \sum_{j=1}^{k} \hat{\beta}_j z_j$$
We now wish to find the regression coefficients for the raw (non-standardized) predictors, from:
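$$
\begin{aligned}
\hat{Y}_{\text{scaled}} &= \hat{\beta}_0 + \sum_{j=1}^{k} \hat{\beta}_j \left( \frac{x_j - \bar{x}_j}{S_{x_j}} \right) \\
\frac{\hat{Y}_{\text{unscaled}} - \bar{Y}}{S_Y} &= \hat{\beta}_0 + \sum_{j=1}^{k} \frac{1}{S_{x_j}} (x_j - \bar{x}_j) \hat{\beta}_j \\
\hat{Y}_{\text{unscaled}} &= \hat{\beta}_0 S_Y + \bar{Y} + \sum_{j=1}^{k} \left( \frac{S_Y}{S_{x_j}} \right) (x_j - \bar{x}_j) \hat{\beta}_j \\
\hat{Y}_{\text{unscaled}} &= \hat{\beta}_0 S_Y + \bar{Y} - \sum_{j=1}^{k} \left( \frac{S_Y}{S_{x_j}} \right) \bar{x}_j \hat{\beta}_j + \sum_{j=1}^{k} \left( \frac{S_Y}{S_{x_j}} \right) x_j \hat{\beta}_j
\end{aligned}
$$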
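As we can see, the intercept on the original scale is $\hat{\beta}_0 S_Y + \bar{Y} - \sum_{j=1}^{k} (S_Y / S_{x_j}) \bar{x}_j \hat{\beta}_j$, and the coefficient of the raw predictor $x_j$ is $(S_Y / S_{x_j}) \hat{\beta}_j$. With a single predictor, these are exactly the `beta0` and `beta1` transformations in the JAGS model string above.
This ends the proof.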
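The general result for k predictors can be checked numerically in base R. The sketch below is illustrative only: `unstandardize_coefs` is a hypothetical helper name, the data are simulated, and least-squares fits are used in place of posterior samples.

```r
# Hypothetical helper: map coefficients fitted on z-scores back to the raw scale.
# zbeta is c(intercept, slopes) from the standardized fit.
unstandardize_coefs <- function(zbeta, x_means, x_sds, y_mean, y_sd) {
  slopes    <- zbeta[-1] * y_sd / x_sds                  # (S_Y / S_xj) * beta_j
  intercept <- zbeta[1] * y_sd + y_mean - sum(slopes * x_means)
  c(intercept, slopes)
}

# Hypothetical toy data with k = 2 predictors.
set.seed(3)
n <- 80
X <- cbind(x1 = rnorm(n, 10, 2), x2 = rnorm(n, -5, 3))
y <- 1 + 0.5 * X[, "x1"] - 2 * X[, "x2"] + rnorm(n)

ZX <- scale(X)                      # column-wise z-scores using the sample sd
zy <- (y - mean(y)) / sd(y)

zbeta <- coef(lm(zy ~ ZX))
beta  <- unstandardize_coefs(zbeta, colMeans(X), apply(X, 2, sd), mean(y), sd(y))

direct <- coef(lm(y ~ X))           # fit on the raw scale for comparison
print(rbind(transformed = unname(beta), direct = unname(direct)))
```

Both rows should match up to floating-point error, which is the numerical counterpart of the last line of the derivation.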