Framingham heart study

We use the data set reported in [6], which consists of measurements of the cholesterol level for 200 randomly selected individuals from the Framingham heart study. The measurements were taken at the beginning of the study and then every 2 years for 10 years. We are interested in the relationship between the cholesterol level and the age (at baseline) and gender of the patients. [6] model this relationship through the LMM: \[\begin{eqnarray}\label{FraminghamModel} y_{ij} = \beta_1 + \beta_2 \text{age}_i + \beta_3 \text{sex}_i + \beta_4 t_{ij} + u_{1i} + u_{2i} t_{ij} + \varepsilon_{ij}, \,\,\,\,\, (1) \end{eqnarray}\]

where \(y_{ij}\) represents the cholesterol level divided by 100 at the \(j\)th time for subject \(i\) and \(t_{ij}\) is \((\text{time}-5)/10\), with time measured in years from baseline. [6] assume normal residual errors \(\varepsilon_{ij}\) and the density of the random effects \({\bf u}_i=(u_{1i}, u_{2i})^\top\) is represented by a semi-nonparametric truncated series expansion.

Model (1) is simply the LMM in (1) from [1] with \(p=4\) and \(q=2\). Here we adopt the following hierarchical structure: \[\begin{eqnarray*} \varepsilon_{ij}\mid \sigma_{\varepsilon},\delta_{\varepsilon} &\stackrel{i.i.d.}{\sim}& \text{t}({0},\sigma_{\varepsilon} ,\delta_{\varepsilon}), \notag\\ %y_j \vert {\bf u}, {\bf \beta}, \sigma_{\varepsilon},\delta_{\varepsilon} &\stackrel{ind.}{\sim}& \text{St}_{1} ({\bf x}_j^{\top}\bm{\beta} + {\bf z}_j^{\top}{\bf u},\sigma_{\varepsilon} , \delta_{\varepsilon}),\notag\\ {\bf u}_i\mid {\bf \theta_1},{\bf \theta_2},\rho &\sim& \text{GC}(F_1,F_2\mid \rho), \end{eqnarray*}\] where GC denotes a bivariate Gaussian copula with marginals \(F_1\) and \(F_2\), and correlation parameter \(\rho\). For these marginal distributions we use a two-piece sinh-arcsinh distribution with the epsilon–skew parameterisation which contains a scale parameter \(\sigma_i>0\), a skewness parameter \(\gamma_i\in (-1,1)\), and a kurtosis parameter \(\delta_i>0\), \(i=1,2\). We denote this model as Model 1. We adopt the prior structure: \[\begin{eqnarray}\label{PriorFramingham} \pi({\bf \beta},\sigma_{\varepsilon},\delta_{\varepsilon},\sigma_1,\sigma_2,\gamma_1,\gamma_2,\delta_1,\delta_2,\rho) &\sim& \dfrac{\pi(\delta_{\varepsilon})\pi(\rho)}{\sigma_{\varepsilon}} \prod_{i=1}^2 \pi(\sigma_i)\pi(\gamma_i)\pi(\delta_i). \end{eqnarray}\] As a weakly informative prior for the degrees of freedom of the Student-\(t\) distribution, \(\delta_{\varepsilon}\) we employ the prior proposed in [2]. For each of the scale parameters \((\sigma_1,\sigma_2\)) we adopt a half-Cauchy prior. The shape parameters \((\delta_1,\delta_2)\) are assigned the prior proposed in for a general class of kurtosis parameters. For each of the skewness parameters \((\gamma_1,\gamma_2)\) we adopt a uniform distribution on \((-1,1)\). In a bivariate Gaussian copula the Spearman measure of association, \(r_{\rho}\in(-1,1)\), can be calculated in closed form as \[\begin{eqnarray*} r_{\rho} = \dfrac{6}{\pi}\arcsin\left(\dfrac{\rho}{2}\right), \end{eqnarray*}\] for which we assume a uniform prior \(r_{\rho}\sim U(-1,1)\), inducing the following prior density on \(\rho\) \[\begin{eqnarray*} \pi(\rho) \propto \dfrac{1}{\sqrt{1-\left({\rho}/{2}\right)^2}}. \end{eqnarray*}\]

The propriety of the corresponding posterior distribution is guaranteed by Theorem 1 from [1]. See [1] for a description of other sub-models of interest, which can be implemented with some little tweaks of the R code presented below. The following R code illustrates the implementation of Model 1. A posterior sample of size \(5,000\) was obtained after a burn-in period of \(15,000\) iterations and thinning every \(25\) iterations (\(140,000\) simulations in total).

The format of the tables is better in Google Chrome and Mozilla Firefox in our experience

References

  1. Flexible linear mixed models with improper priors for longitudinal and survival data

  2. Bayesian modelling of skewness and kurtosis with Two-Piece Scale and shape distributions

  3. A weakly informative prior for the degrees of freedom of the t distribution

  4. twopiece R package

  5. TPSAS R Package

  6. Linear mixed models with flexible distributions of random effects for longitudinal data

  7. The data can be found here

rm(list=ls())

# Required packagges
library(twopiece)
## Loading required package: flexclust
## Loading required package: grid
## Loading required package: lattice
## Loading required package: modeltools
## Loading required package: stats4
## Loading required package: mvtnorm
## Loading required package: label.switching
library(TPSAS)
library(mvtnorm)
library(compiler)
library(spBayes)
## Loading required package: coda
## Loading required package: magic
## Loading required package: abind
## Loading required package: Formula
library(knitr)
enableJIT(3)
## [1] 3
# Preparing the data
data <- read.table("cholst.txt",header=F)
colnames(data) <- c("newid","oriid","cholst","sex","age","year")

# Some plots
plot(data$year,data$cholst,xlab="Year",ylab="Cholesterol Level",cex.axis=1.5, cex.lab=1.5)

plot(data$year,data$cholst,xlab="Year",ylab="Cholesterol Level",type="l",cex.axis=1.5, cex.lab=1.5)

hist(data$cholst,xlab="Cholesterol Level",cex.axis=1.5, cex.lab=1.5)
box()

hist(data$age,xlab="Age",cex.axis=1.5, cex.lab=1.5)
box()

# variables
Y <- data$cholst/100 # response
time <- (data$year-5)/10 # time
sex <- data$sex
age <- data$age

n <- length(Y) # sample size
q <- max(data$newid) # Number of clusters
# cluster indicators
ind <- data$newid
ind1 <- unique(ind)

#-----------------------------------------------------------------------------------
#Prior on delta Student-T
#-----------------------------------------------------------------------------------

# Parameters obtained by fitting the sample from \pi(\nu)
MLE <- c(0.84886337, 1.44326981, 0.05384718, 0.89544807)

# Prior on nu
logkdeT <- Vectorize(function(x) dtpsas(log(x),MLE[1],MLE[2],MLE[3],MLE[4],param="eps")/x)
curve(logkdeT,0,15,n=10000,lwd=2,xlab=~nu,ylab="Density",cex.axis=1.5,cex.lab=1.5)

#-----------------------------------------------------------------------------------
#Prior on delta TPSAS
#-----------------------------------------------------------------------------------
MLE2 <- c(-0.1553498,  0.1620447, -0.2605134,  0.5742682)

# Prior on delta
logkde <- Vectorize(function(t)  dtpsas(log(t),MLE2[1],MLE2[2],MLE2[3],MLE2[4],param="eps")/t)
curve(logkde,0,10,n=10000,lwd=2,xlab=~delta,ylab="Density",cex.axis=1.5,cex.lab=1.5)

# Log Copula
lpriord <- cmpfun(function(par){
   p1 <- par[1]
   p2 <- par[2]
   r <- par[3]
   R.I <- rbind(c (1/(1 - r^2), -r/(1 - r^2)),c (-r/(1 - r^2), 1/(1 - r^2)))
   V <- c(qnorm(p1),qnorm(p2))
   out <- - 0.5*log(1.-r^2.) -0.5*V%*%(R.I-diag(2))%*%V
   return(out)
})


# log posterior
lpc <- cmpfun(function(par){
if(par[5]>0 & par[6]>0.1 & par[7]>0 & par[8]>-1 & par[8]<1 & par[9]>0 & par[10]>-1 & par[10]<1 & par[11]>0 & par[12]>0 & par[13]>-1 & par[13]<1){
var = dt((Y-(par[1] + par[2]*age + par[3]*sex + par[4]*time + par[13+ind] + par[213+ind]*time))/par[5],df=par[6],log=T)-log(par[5])
p1 = ptpsas(par[13+ind1],0,par[7],par[8],par[11],param="eps")
p2 = ptpsas(par[213+ind1],0,par[9],par[10],par[12],param="eps")
d1 = dtpsas(par[13+ind1],0,par[7],par[8],par[11],param="eps",log=T)
d2 = dtpsas(par[213+ind1],0,par[9],par[10],par[12],param="eps",log=T)
varc = apply(cbind(p1,p2,par[13]),1,lpriord)
ifelse(is.na(sum(var) + sum(varc)),
return(-Inf),
return( sum(var) + sum(varc) + sum(d1) + sum(d2) - log(par[5]) + log(logkdeT(par[6])) +  dt(par[7],df=1,log=T) + dt(par[9],df=1,log=T) + log(logkde(par[11])) + log(logkde(par[12])) - 0.5*log(1-par[13]^2/4)))
}
else return(-Inf)
})

# Initial points
inits <- c(1.8,0.01,0,0,0.5,5,0.4,0,0.2,0,1,1,0.4,rep(0,400))
lpc(inits)
## [1] -560.1483
########################################################################################################
# # Sampling from the posterior: approximately 3 hours for each 25x100 iterations
# # Uncomment this section is you want to run the sampler
# n.batch <- 5600
# batch.length <- 25
# 
# # Here only fixed effects and shape parameters are saved
# # To save the posterior samples of the random effects, just increase the index in p.theta.samples[,1:13]
# set.seed(1234)
# chain <- adaptMetropGibbs(ltd=lpc, starting=inits, accept.rate=0.44, batch=n.batch, 
#                           batch.length=batch.length, report=100, verbose=FALSE)$p.theta.samples[,1:13]
# 
# # Burning and thinning the chain
# burn <- 15000
# thin <- 25
# NS <- n.batch*batch.length
# index <- seq(burn,NS,thin)
# 
# beta1p <- chain[index,1]
# beta2p <- chain[index,2]
# beta3p <- chain[index,3]
# beta4p <- chain[index,4]
# sigmap <- chain[index,5]
# deltap <- chain[index,6]
# sigma1p <- chain[index,7]
# gamma1p <- chain[index,8]
# sigma2p <- chain[index,9]
# gamma2p <- chain[index,10]
# delta1p <- chain[index,11]
# delta2p <- chain[index,12]
# rhop <- chain[index,13]
########################################################################################################

# Already saved outcome
load('postsamp.RData')

# Summaries of the posterior samples
hist(beta1p)

hist(beta2p)

hist(beta3p)

hist(beta4p)

hist(sigmap)

hist(deltap)

hist(sigma1p)

hist(gamma1p)

hist(sigma2p)

hist(gamma2p)

hist(delta1p)

hist(delta2p)

hist(rhop)

post.samp <- as.matrix(cbind(beta1p,beta2p,beta3p,beta4p,sigmap,deltap,sigma1p,gamma1p,sigma2p,gamma2p,delta1p,delta2p,rhop))

SUM <- apply(post.samp,2,summary)
colnames(SUM) <- c("beta1", "beta2", "beta3", "beta4", "sigma_e", "delta_e", "sigma_1", "gamma_1", "sigma_2", "gamma_2", "delta_1", "delta_2", "rho")
kable(SUM,digits = 3)
beta1 beta2 beta3 beta4 sigma_e delta_e sigma_1 gamma_1 sigma_2 gamma_2 delta_1 delta_2 rho
Min. 1.093 0.003 -0.211 -0.108 0.130 2.687 0.118 -0.641 0.038 -1.000 0.529 0.456 -0.023
1st Qu. 1.402 0.014 -0.092 0.077 0.157 4.445 0.239 -0.397 0.142 -0.754 0.757 0.859 0.342
Median 1.494 0.016 -0.057 0.130 0.163 5.094 0.275 -0.331 0.202 -0.546 0.820 1.048 0.425
Mean 1.496 0.016 -0.057 0.146 0.164 5.275 0.281 -0.327 0.264 -0.498 0.830 1.280 0.420
3rd Qu. 1.584 0.018 -0.022 0.200 0.170 5.916 0.317 -0.259 0.307 -0.296 0.893 1.406 0.503
Max. 2.006 0.028 0.122 0.546 0.202 16.794 0.677 0.080 1.505 0.686 1.427 5.392 0.810