When writing model strings in JAGS, we have become familiar with standardizing the data before sampling and eventually scaling the results back to the original scale (we can do the same in Stan, as noted below). In this post, we look at why this is done.
Let's look at two code chunks, set off by the ---------- comment lines, from the simple linear model fit with a robust (t-distributed) noise assumption:
modelString = "
# Standardize the data----------------------------------------------------------
data {
  Ntotal <- length(y)
  xm  <- mean(x)
  ym  <- mean(y)
  xsd <- sd(x)
  ysd <- sd(y)
  for ( i in 1:length(y) ) {
    zx[i] <- (x[i] - xm) / xsd
    zy[i] <- (y[i] - ym) / ysd
  }
}
#-------------------------------------------------------------------------------
# Specify the model for the standardized data:
model {
  for ( i in 1:Ntotal ) {
    zy[i] ~ dt( zbeta0 + zbeta1 * zx[i] , 1/zsigma^2 , nu )
  }
  # Priors vague on the standardized scale:
  zbeta0 ~ dnorm( 0 , 1/(10)^2 )
  zbeta1 ~ dnorm( 0 , 1/(10)^2 )
  zsigma ~ dunif( 1.0E-3 , 1.0E+3 )
  nu     ~ dexp( 1/30.0 )
  # Transform back to the original scale------------------------------------------
  beta1 <- zbeta1 * ysd / xsd
  beta0 <- zbeta0 * ysd + ym - zbeta1 * xm * ysd / xsd
  sigma <- zsigma * ysd
  #--------------------------------------------------------------------------------
}
"
So why do we need to standardize the response and the predictors, only to scale the results back afterwards?
The point of using z-scores in JAGS is to overcome the problem of strong posterior correlation between the parameters (as seen in the simulated correlation between \(\beta_0\) and \(\beta_1\) in another Bayesian workshop).
Strong correlation produces a long, thin ridge in the scatter-plot of the two parameters, which makes Gibbs sampling very slow and inefficient.
Standardizing removes most of this correlation; just remember to scale the estimates back to the original units. The same trick can also be applied in Stan, although the HMC sampler implemented in Stan largely does not suffer from this problem.
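To see the problem concretely, here is a small illustrative R sketch (not from the original post): when the predictor's mean is far from zero, the sampling correlation between the intercept and slope estimates is close to -1, and standardizing removes it.

set.seed(1)
x <- rnorm(100, mean = 50, sd = 3)  # predictor far from zero
y <- 2 + 0.5 * x + rnorm(100)

# On the raw scale the intercept and slope estimates are almost perfectly
# negatively correlated (off-diagonal of the correlation matrix near -1).
cov2cor(vcov(lm(y ~ x)))

# After standardizing, the correlation essentially disappears.
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)
cov2cor(vcov(lm(zy ~ zx)))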
For the regression model using the standardized variables, we assume the following form for the regression line (in the present case, both the response and the predictors have been transformed to z-scores):
\[ E[Y_{scaled}] = \beta_0 + \sum_{j=1}^k \beta_j z_j \]
where \(z_j\) is the j-th (standardized) regressor, defined by
\[ z_j = \frac{x_j - \bar{x}_j}{S_{x_j}}, \] and the standardized response is \[ \hat{Y}_{scaled} = \frac{\hat{Y}_{unscaled} - \bar{Y}}{S_Y}, \]
where \(S_{x_j}\) and \(S_Y\) denote the sample standard deviations of the j-th predictor and of the response, respectively.
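In R, this standardization can also be done with the built-in scale() function (an illustrative snippet, not part of the original post):

# scale() centers and divides by the sample standard deviation,
# i.e. exactly (x - mean(x)) / sd(x).
zx <- as.vector(scale(x))
zy <- as.vector(scale(y))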
Carrying out the regression with the standardized regressors, we obtain the fitted regression line:
\[ \hat{Y}_{scaled} = \hat{\beta}_0 + \sum_{j=1}^k \hat{\beta}_j z_j \]
We now wish to recover the regression coefficients for the raw (non-standardized) variables. Substituting the definitions of \(z_j\) and \(\hat{Y}_{scaled}\) and solving for \(\hat{Y}_{unscaled}\):
\[\begin{align*} \hat{Y}_{scaled} &= \hat{\beta}_0 + \sum_{j=1}^k \hat{\beta}_j \left( \frac{x_j - \bar{x}_j}{S_{x_j}} \right) \\ \frac{\hat{Y}_{unscaled} - \bar{Y}}{S_Y} &= \hat{\beta}_0 + \sum_{j=1}^k \frac{1}{S_{x_j}} (x_j - \bar{x}_j) \hat{\beta}_j \\ \hat{Y}_{unscaled} &= \hat{\beta}_0 S_Y + \bar{Y} + \sum_{j=1}^k \left( \frac{S_Y}{S_{x_j}} \right) (x_j - \bar{x}_j) \hat{\beta}_j \\ \hat{Y}_{unscaled} &= \hat{\beta}_0 S_Y + \bar{Y} - \sum_{j=1}^k \left( \frac{S_Y}{S_{x_j}} \right) \bar{x}_j \hat{\beta}_j + \sum_{j=1}^k \left( \frac{S_Y}{S_{x_j}} \right) x_j \hat{\beta}_j \end{align*}\]
As we can see, matching terms gives the coefficients on the original scale: \[ \hat{\beta}_j^{unscaled} = \frac{S_Y}{S_{x_j}} \hat{\beta}_j, \qquad \hat{\beta}_0^{unscaled} = \hat{\beta}_0 S_Y + \bar{Y} - \sum_{j=1}^k \frac{S_Y}{S_{x_j}} \bar{x}_j \hat{\beta}_j. \] With \(k = 1\), these are exactly the beta1 and beta0 lines in the JAGS model string above, and sigma <- zsigma * ysd rescales the noise standard deviation in the same way. This ends the proof.
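As a numerical sanity check, here is a short R sketch (illustrative, not from the original post) verifying that back-transforming the coefficients of a fit on standardized data reproduces the coefficients of an ordinary fit on the raw data:

set.seed(42)
x <- rnorm(50, mean = 10, sd = 2)
y <- 3 + 1.5 * x + rnorm(50)

# Fit on standardized data, then transform back using the formulas above.
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)
zfit  <- coef(lm(zy ~ zx))
beta1 <- zfit["zx"] * sd(y) / sd(x)
beta0 <- zfit["(Intercept)"] * sd(y) + mean(y) - zfit["zx"] * mean(x) * sd(y) / sd(x)

# The two lines below print the same intercept and slope.
c(beta0, beta1)
coef(lm(y ~ x))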