Objective

The objective of the report is to analyse the Intensive Care Unit (ICU) dataset using Bayesian analysis methods. The report starts with a comprehensive overview of the ICU dataset including visualisations of each variable and univariate analysis, following with a Bayesian multivariate logistic regression model by using the given predictors in the dataset and the variable “outcome” as the target. The report specifies the model parameters and implements the model in JAGS. Lastly, the report implements diagnostic checking for each parameter.

Dataset Description

The following is the description of features in the ICU dataset.

ID: the ID of each patient. This variable will not be used and will be deleted
AGE: age of each patient
GENDER: self-explanatory
ETHNIC: ethnicity of the patient. There are four ethnic groups in total, including “Chinese”, “Indian”, “Malay” and “Others”
CHD: Chronic health groups: AIDS, Metastatic cancer, Cirrhosis, Hepatic failure, Immunosuppression, Leukemia or myeloma, Lymphoma, Diabetes, None
DIAB: presence of Diabetes, 0-dont have Diabetes, 1-have Diabetes
OPSTAT: ICU admission source, 0-floor, 1-other special care unit, 2-operating room
VENT: mechanically ventilated, 0-no, 1-yes
NOGCS: unable to obtain Glasgow Coma Score (GCS), 1-unable to obtain 0-obtain
ADMDIS: admission diagnosis, 9 disease categories in total, 0-Trauma, 1-Cardiovascular, 2-Respiratory, 3-Neurologic, 4-Gastrointestinal, 5-Genitourinary, 6-Metabolic, 7-skin, 8-Hematologic
APS: Acute Physiology Score
OUTCOME: mortality outcome, 0-survive, 1-not survive, this is used as our target variable

Univariate Analysis

The below plot (fig 1) demonstrates the distribution of the 6 variables Gender, Operation Statistics, Outcome, GCS Availability, Ethinicity, Diabetes and Ventilation details.

The following graph indicates that the target variable “Outcome” has very similar distribution to the other two variables “NOGOCS”, which describes whether patients have obtained Glasgow Coma Score, and “vent”, which contains information on whether patients are mechanically ventilated.

Fig 1 - Univariate Analysis Plot - for Categorical Variables

From Fig 2 - we see the details for Age and APS. As seen in Fig 2 Age is across a large set of patients and is not just a older section of the population thus not skewing our results.

Fig 2 - Univariate Analysis Plot - for Numeric Variables

Given below is the distribution of the variable “Chronic Health Group” and “Admission Diagnosis”.

Fig 3 - Univariate Analysis Plot - for Caetgorical Variables

We see from Fig 3 that most of the CHD is not available.

Multivariate Analysis

AS a next step we compare the outcome against each variable in our dataset.

From the graph below (figure 4) , it is clear that Acute Physiology score has close connection to the outcome of the patient. The higher the score is, the more likely that the patients would not survive. The variation of age, on the other hand, does not show a strong indication to the outcome of the patients.

Fig 4 - Outcome vs Age and Outcome vs APS

From figure 5 below we see that Gender and Ethnicity does not seem to make a significant impact.

Fig 5 - Multivariate Analysis Plot - Outcome vs Gender and Outcome vs Ethnicity

The following graph describes the relationship between patients outcome and their admission diagnosis.

Fig 6 - Outcome vs Admission Diagnosis

The following graph describes the relationship between outcome and chronic health groups.

Fig 7 - Outcome vs CHD

Model

The dependent variable, OUTCOME is a dichotomous. Hence logistic regression model is used to define relation between the dependent variable and independent variables.

Correlation

An initial check to analyse the presence of multi-collinearity is done. To create a logistic regression model, it is necessary that there is no multi-collinearity present in the data elements.

“Correlation Matrix”

From the above analysis, the following inferences can be made:

Most of elements in the data are independent, and do not show any strong correlation.
There is correlation between columns CHD_CIRRHOSIS and DIAB, but since the source data has some discrepancies in this column definition we are not sure of the medical significance and hence we have included here.
We ran a model by removing this column and the results were similar when the MCMC samples were executed. Also, to generate the full sampling without this column would take a further 96+ hours for completion.

Hence an assumption can be made that the data elements are independent.

Guess Co-Efficient

Sometimes the predictability of a model is increased by adding a randomness or “guess co-efficient” whereby the y value comes from the flip of a fair coin: i.e. y ~ Bernoulli($\mu$=1/2). When we add this guess co-efficient, the equation for logistic regression becomes

$\mu= \alpha\cdot 1/2+(1-\alpha)\cdot \textrm{logistic}\bigg(\beta_{0}+\sum_{i=1}^{k}\beta_{i}\bigg)$

When the guessing co-efficient is 0 the equation simplifies to that of a logistic regression.
Notice that when the guessing coefficient is 1, then the value of $\mu$ becomes totally random.

Our goal is to estimate the guessing coefficient along with the logistic parameters.

For this, we need to establish a prior on the guessing coefficient, $\alpha$. In most applications we would expect the proportion of random outliers in the data to be small, and therefore the prior for $\alpha$ should emphasize small values. In our case we have set the prior as

$\alpha \sim \textrm{dbeta}(1, 9)$

Fig 8 - Guess Coefficient Defn

This prior distribution gives values of $\alpha$ greater than 0.5 very low but non-zero probability.

Prior Distributions

Model Diagram

“Bayesian Model Diagram”

JAGS model string

We translate the above diagram into a model string that can be used by JAGS.

We tried a few samples runs with various prior distributions including:

uniform distribution with very large variances - they failed to converge even with 50,000 iterations with 100 thinning steps.
normal distribution with large variances but they failed to converge
uniform with small variances but they also failed to converge

data {

  for ( j in 1:2 ) {
    xm[j]  <- mean(x[,j])
    xsd[j] <-   sd(x[,j])
    
    for ( i in 1:Ntotal ) {
      zx[i,j] <- ( x[i,j] - xm[j] ) / xsd[j]
    }
  }

}

model {

for ( i in 1:Ntotal ) {
  y[i] ~ dbern( mu[i] )
  mu[i] <- ( guess*(1/2) + (1.0-guess)*ilogit(zbeta0+sum(zbeta[1:2]*zx[i,1:2]) + sum(beta[3:Nx]*x[i,3:Nx])) )
}

zbeta0 ~ dnorm( 0, 1/10 )

for ( j in 1:2 ) {
   zbeta[j] ~ dnorm( 0, 1/1000 )
}

beta[1:2] <- zbeta[1:2] / xsd[1:2]
for ( j in 3:Nx ) {
   beta[j] ~ dnorm( 0, 1/10 )
}

beta0 <- zbeta0 - sum(zbeta[1:2]*xm[1:2]/xsd[1:2]) - sum(beta[3:Nx])
guess ~ dbeta(1,9)

}

Markov Chain Parameters

For the Bayesian model, we have decided to use:

4 chains,
100 thinning steps
50000 number of iterations
10000 burn in steps and
1000 adapt steps

Model Diagnostics

After the chains have been generated, we take a look at the diagnostics plot to fine tune the parameters.

$Figure 9: Diagnostic plots for $beta_{0}$$

Figure 9: Diagnostic plots for $beta_{0}$

Interpretation

From the $\beta_{0}$ diagnostics (fig 9), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{0}$

$Figure 10: Diagnostic plots for $beta_{1}$$

Figure 10: Diagnostic plots for $beta_{1}$

Interpretation

From the $\beta_{1}$ diagnostics (fig 10), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{1}$

$Figure 11 : Diagnostic plots for $beta_{2}$$

Figure 11 : Diagnostic plots for $beta_{2}$

Interpretation

From the $\beta_{2}$ diagnostics (fig 11), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{2}$

$Figure 12: Diagnostic plots for $beta_{3}$$

Figure 12: Diagnostic plots for $beta_{3}$

Interpretation

From the $\beta_{3}$ diagnostics (fig 12), we see that

The chains are well mixed
The ESS values are high but lower when compared to other coefficients.
Some autocorrelation is observed for $\beta_{3}$ which handled by use of thinning. We see auto correlation reducing over the course of model run.
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{3}$

$Figure 13: Diagnostic plots for $beta_{4}$$

Figure 13: Diagnostic plots for $beta_{4}$

Interpretation

From the $\beta_{4}$ diagnostics (fig 13), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{4}$

$Figure 14: Diagnostic plots for $beta_{5}$$

Figure 14: Diagnostic plots for $beta_{5}$

Interpretation

From the $\beta_{5}$ diagnostics (fig 14), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{5}$

$Figure 15: Diagnostic plots for $beta_{6}$$

Figure 15: Diagnostic plots for $beta_{6}$

Interpretation

From the $\beta_{6}$ diagnostics (fig 15), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{6}$

$Figure 16: Diagnostic plots for $beta_{7}$$

Figure 16: Diagnostic plots for $beta_{7}$

Interpretation

From the $\beta_{7}$ diagnostics (fig 16), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{7}$

$Figure 17: Diagnostic plots for $beta_{8}$$

Figure 17: Diagnostic plots for $beta_{8}$

Interpretation

From the $\beta_{8}$ diagnostics (fig 17), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{8}$

$Figure 18: Diagnostic plots for $beta_{9}$$

Figure 18: Diagnostic plots for $beta_{9}$

Interpretation

From the $\beta_{9}$ diagnostics (fig 18), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{9}$

$Figure 19: Diagnostic plots for $beta_{10}$$

Figure 19: Diagnostic plots for $beta_{10}$

Interpretation

From the $\beta_{10}$ diagnostics (fig 19), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{10}$

$Figure 20: Diagnostic plots for $beta_{11}$$

Figure 20: Diagnostic plots for $beta_{11}$

Interpretation

From the $\beta_{11}$ diagnostics (fig 20), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{11}$

$Figure 21: Diagnostic plots for $beta_{12}$$

Figure 21: Diagnostic plots for $beta_{12}$

Interpretation

From the $\beta_{12}$ diagnostics (fig 21), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{12}$

$Figure 22: Diagnostic plots for $beta_{13}$$

Figure 22: Diagnostic plots for $beta_{13}$

Interpretation

From the $\beta_{13}$ diagnostics (fig 22), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{13}$

$Figure 23: Diagnostic plots for $beta_{14}$$

Figure 23: Diagnostic plots for $beta_{14}$

Interpretation

From the $\beta_{14}$ diagnostics (fig 23), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{14}$

$Figure 24: Diagnostic plots for $beta_{15}$$

Figure 24: Diagnostic plots for $beta_{15}$

Interpretation

From the $\beta_{15}$ diagnostics (fig 24), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{15}$

$Figure 25: Diagnostic plots for $beta_{16}$$

Figure 25: Diagnostic plots for $beta_{16}$

Interpretation

From the $\beta_{16}$ diagnostics (fig 25), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{16}$

$Figure 26: Diagnostic plots for $beta_{17}$$

Figure 26: Diagnostic plots for $beta_{17}$

Interpretation

From the $\beta_{17}$ diagnostics (fig 26), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{17}$

$Figure 27: Diagnostic plots for $beta_{18}$$

Figure 27: Diagnostic plots for $beta_{18}$

Interpretation

From the $\beta_{18}$ diagnostics (fig 27), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{18}$

$Figure 28: Diagnostic plots for $beta_{19}$$

Figure 28: Diagnostic plots for $beta_{19}$

Interpretation

From the $\beta_{19}$ diagnostics (fig 28), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{19}$

$Figure 29: Diagnostic plots for $beta_{20}$$

Figure 29: Diagnostic plots for $beta_{20}$

Interpretation

From the $\beta_{20}$ diagnostics (fig 29), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{20}$

$Figure 30: Diagnostic plots for $beta_{21}$$

Figure 30: Diagnostic plots for $beta_{21}$

Interpretation

From the $\beta_{21}$ diagnostics (fig 30), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{21}$

$Figure 31: Diagnostic plots for $beta_{22}$$

Figure 31: Diagnostic plots for $beta_{22}$

Interpretation

From the $\beta_{22}$ diagnostics (fig 31), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{22}$

$Figure 32: Diagnostic plots for $beta_{23}$$

Figure 32: Diagnostic plots for $beta_{23}$

Interpretation

From the $\beta_{23}$ diagnostics (fig 32), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{23}$

$Figure 33: Diagnostic plots for $beta_{24}$$

Figure 33: Diagnostic plots for $beta_{24}$

Interpretation

From the $\beta_{24}$ diagnostics (fig 33), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{24}$

$Figure 34: Diagnostic plots for $beta_{25}$$

Figure 34: Diagnostic plots for $beta_{25}$

Interpretation

From the $\beta_{25}$ diagnostics (fig 34), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{25}$

$Figure 35: Diagnostic plots for $beta_{26}$$

Figure 35: Diagnostic plots for $beta_{26}$

Interpretation

From the $\beta_{26}$ diagnostics (fig 35), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{26}$

$Figure 36: Diagnostic plots for $beta_{27}$$

Figure 36: Diagnostic plots for $beta_{27}$

Interpretation

From the $\beta_{27}$ diagnostics (fig 36), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for $\beta_{27}$

Figure 37: Diagnostic plots for guess

Interpretation

From the guess parameter diagnostics (fig 37), we see that

The chains are well mixed
The ESS values are also quite high. The lags are decreasing nicely, suggesting there is no evidence of autocorrelation
The Shrink factor after burn in is < 1.1 and hence are all valid
The MCSE is also very small
Overall very good diagnostic for guess parameter

All Coefficients

The table below (table 1) gives the Bayesian estimates for all coefficients.

Table 1: Bayesian estimates of all coefficients
	Mean	Median	Mode	ESS	HDImass	HDIlow	HDIhigh
CHAIN	2.5000000	2.5000000	1.9976523	1.5	0.95	1.0000000	4.0000000
beta0	-0.9429240	-0.9897880	-0.9805010	33474.6	0.95	-12.6322000	10.5130000
beta[1]	-0.0086887	-0.0085927	-0.0084821	48279.0	0.95	-0.0233384	0.0055350
beta[2]	0.0461086	0.0455797	0.0454050	47304.3	0.95	0.0347933	0.0587648
beta[3]	1.2023071	1.0250350	0.7600194	12311.3	0.95	-0.5400100	3.3816600
beta[4]	1.9134414	1.8939550	1.8874196	50000.0	0.95	1.2754700	2.5830200
beta[5]	-0.6083301	-0.5828250	-0.5263448	48965.9	0.95	-1.9099000	0.6105210
beta[6]	-0.5898821	-0.5838540	-0.5431524	50000.0	0.95	-1.0900200	-0.1037930
beta[7]	-0.0440617	-0.0461140	-0.0362615	53309.5	0.95	-0.8666780	0.7404060
beta[8]	-0.2605335	-0.2570815	-0.2305686	50000.0	0.95	-1.2044000	0.6720400
beta[9]	-0.1899954	-0.1902695	-0.1645840	50000.0	0.95	-0.9035720	0.5375890
beta[10]	0.4474317	0.4939450	0.5996662	50000.0	0.95	-1.4120200	2.1693400
beta[11]	0.1153628	0.1771660	0.3405421	50000.0	0.95	-2.1081600	2.2268800
beta[12]	0.9276330	0.9015410	0.7938341	48896.4	0.95	-0.3497580	2.2669500
beta[13]	-2.0402505	-1.9103400	-1.5981884	50000.0	0.95	-6.9720100	2.3416700
beta[14]	-0.5664209	-0.4771105	-0.4547327	51742.3	0.95	-2.7378200	1.3346900
beta[15]	2.6529404	2.6056250	2.5131043	50000.0	0.95	-0.4489040	5.8450300
beta[16]	0.3657012	0.4163850	0.4937717	50000.0	0.95	-3.3722100	4.0915000
beta[17]	-1.2841846	-1.0697100	-0.5392907	50000.0	0.95	-6.2287400	3.0623800
beta[18]	-0.9028516	-0.8852760	-0.8473646	50000.0	0.95	-1.8137800	-0.0432879
beta[19]	-0.3479895	-0.3456610	-0.3343690	49875.9	0.95	-0.9025080	0.1870730
beta[20]	-0.2228800	-0.2189495	-0.2099645	48071.1	0.95	-0.9312560	0.4294400
beta[21]	-0.5379596	-0.5290155	-0.5217323	50000.0	0.95	-1.2862900	0.2314980
beta[22]	-0.7345654	-0.7081525	-0.6588856	50000.0	0.95	-1.6243400	0.1263530
beta[23]	-0.2182389	-0.2102815	-0.2226587	50000.0	0.95	-1.0473300	0.5883670
beta[24]	-3.1360171	-2.7926800	-2.0571548	50000.0	0.95	-6.7724800	-0.2850510
beta[25]	-0.5727760	-0.5423465	-0.4388815	50000.0	0.95	-1.7670900	0.5679480
beta[26]	-1.3232109	-1.2863300	-1.1541217	51557.4	0.95	-2.6811500	-0.0571146
beta[27]	1.0139087	1.0072050	0.9416865	50149.7	0.95	-0.1490040	2.1675200
guess	0.0405235	0.0376888	0.0304350	39640.6	0.95	0.0000043	0.0852877
zbeta[1]	-0.1541820	-0.1524780	-0.1505150	48279.0	0.95	-0.4141430	0.0982192
zbeta[2]	1.4853673	1.4683300	1.4626998	47304.3	0.95	1.1208500	1.8930800

Posterior distributions

The posterior Distributions of the various parameters are as below.

Figure 38: Posterior distribution for beta 0

From the figure 38 and values printed above, the 95% HDI ranges for the $\beta_{0}$ are -12.6322 and 10.513 with a mean value of -0.942924 and mode -0.980501.

Figure 39: Posterior distribution for beta 1

From the figure 39 and values printed above, the 95% HDI ranges for the $\beta_{1}$ are -0.0233384 and 0.005535 with a mean value of -0.0086887 and mode -0.0084821.

Figure 40: Posterior distribution for beta 2

From the figure 40 and values printed above, the 95% HDI ranges for the $\beta_{2}$ are 0.0347933 and 0.0587648 with a mean value of 0.0461086 and mode 0.045405.

Figure 41: Posterior distribution for beta 3

From the figure 41 and values printed above, the 95% HDI ranges for the $\beta_{3}$ are -0.54001 and 3.38166 with a mean value of 1.2023071 and mode 0.7600194.

Figure 42: Posterior distribution for beta 4

From the figure 42 and values printed above, the 95% HDI ranges for the $\beta_{4}$ are 1.27547 and 2.58302 with a mean value of 1.9134414 and mode 1.8874196.

Figure 43: Posterior distribution for beta 5

From the figure 43 and values printed above, the 95% HDI ranges for the $\beta_{5}$ are -1.9099 and 0.610521 with a mean value of -0.6083301 and mode -0.5263448.

Figure 44: Posterior distribution for beta 6

From the figure 44 and values printed above, the 95% HDI ranges for the $\beta_{6}$ are -1.09002 and -0.103793 with a mean value of -0.5898821 and mode -0.5431524.

Figure 45: Posterior distribution for beta 7

From the figure 45 and values printed above, the 95% HDI ranges for the $\beta_{7}$ are -0.866678 and 0.740406 with a mean value of -0.0440617 and mode -0.0362615.

Figure 46: Posterior distribution for beta 8

From the figure 46 and values printed above, the 95% HDI ranges for the $\beta_{8}$ are -1.2044 and 0.67204 with a mean value of -0.2605335 and mode -0.2305686.

Figure 47: Posterior distribution for beta 9

From the figure 47 and values printed above, the 95% HDI ranges for the $\beta_{9}$ are -0.903572 and 0.537589 with a mean value of -0.1899954 and mode -0.164584.

Figure 48 : Posterior distribution for beta 10

From the figure 48 and values printed above, the 95% HDI ranges for the $\beta_{10}$ are -1.41202 and 2.16934 with a mean value of 0.4474317 and mode 0.5996662.

Figure 49: Posterior distribution for beta 11

From the figure 49 and values printed above, the 95% HDI ranges for the $\beta_{11}$ are -2.10816 and 2.22688 with a mean value of 0.1153628 and mode 0.3405421.

Figure 50: Posterior distribution for beta 12

From the figure 50 and values printed above, the 95% HDI ranges for the $\beta_{12}$ are -0.349758 and 2.26695 with a mean value of 0.927633 and mode 0.7938341.

Figure 51: Posterior distribution for beta 13

From the figure 51 and values printed above, the 95% HDI ranges for the $\beta_{13}$ are -6.97201 and 2.34167 with a mean value of -2.0402505 and mode -1.5981884.

Figure 52: Posterior distribution for beta 14

From the figure 52 and values printed above, the 95% HDI ranges for the $\beta_{14}$ are -2.73782 and 1.33469 with a mean value of -0.5664209 and mode -0.4547327.

Figure 53: Posterior distribution for beta 15

From the figure 53 and values printed above, the 95% HDI ranges for the $\beta_{15}$ are -0.448904 and 5.84503 with a mean value of 2.6529404 and mode 2.5131043.

Figure 54: Posterior distribution for beta 16

From the figure 54 and values printed above, the 95% HDI ranges for the $\beta_{16}$ are -3.37221 and 4.0915 with a mean value of 0.3657012 and mode 0.4937717.

Figure 55: Posterior distribution for beta 17

From the figure 55 and values printed above, the 95% HDI ranges for the $\beta_{17}$ are -6.22874 and 3.06238 with a mean value of -1.2841846 and mode -0.5392907.

Figure 56: Posterior distribution for beta 18

From the figure 56 and values printed above, the 95% HDI ranges for the $\beta_{18}$ are -1.81378 and -0.0432879 with a mean value of -0.9028516 and mode -0.8473646.

Figure 57 : Posterior distribution for beta 19

From the figure 57 and values printed above, the 95% HDI ranges for the $\beta_{19}$ are -0.902508 and 0.187073 with a mean value of -0.3479895 and mode -0.334369.

Figure 58: Posterior distribution for beta 20

From the figure 58 and values printed above, the 95% HDI ranges for the $\beta_{20}$ are -0.931256 and 0.42944 with a mean value of -0.22288 and mode -0.2099645.

Figure 59: Posterior distribution for beta 21

From the figure 59 and values printed above, the 95% HDI ranges for the $\beta_{21}$ are -1.28629 and 0.231498 with a mean value of -0.5379596 and mode -0.5217323.

Figure 60: Posterior distribution for beta 22

From the figure 60 and values printed above, the 95% HDI ranges for the $\beta_{22}$ are -1.62434 and 0.126353 with a mean value of -0.7345654 and mode -0.6588856.

Figure 61: Posterior distribution for beta 23

From the figure 61 and values printed above, the 95% HDI ranges for the $\beta_{23}$ are -1.04733 and 0.588367 with a mean value of -0.2182389 and mode -0.2226587.

Figure 62: Posterior distribution for beta 24

From the figure 62 and values printed above, the 95% HDI ranges for the $\beta_{24}$ are -6.77248 and -0.285051 with a mean value of -3.1360171 and mode -2.0571548.

Figure 63: Posterior distribution for beta 25

From the figure 63 and values printed above, the 95% HDI ranges for the $\beta_{25}$ are -1.76709 and 0.567948 with a mean value of -0.572776 and mode -0.4388815.

Figure 64 : Posterior distribution for beta 26

From the figure 64 and values printed above, the 95% HDI ranges for the $\beta_{26}$ are -2.68115 and -0.0571146 with a mean value of -1.3232109 and mode -1.1541217.

Figure 65: Posterior distribution for beta 27

From the figure 65 and values printed above, the 95% HDI ranges for the $\beta_{27}$ are -0.149004 and 2.16752 with a mean value of 1.0139087 and mode 0.9416865.

Figure 66 : Posterior distribution for guess

From the figure 66 and values printed above, the 95% HDI ranges for the guess are 0.0000043 and 0.0852877 with a mean value of 0.0405235 and mode 0.030435.

We see from the results that the alpha values are very less and the model of the parameter is at 0.05. This suggests that the data that we are using for the analysis contains most of the values for prediction and is not a random guess. This is extremely crucial since lives are at risk and randomness makes it harder to save lives.

Limitation

We observe that one of the co-efficient $\beta_{3}$ gives us a decent ESS of 10k. However, we still see that there is some correlation left in the dataset.
Our MCMC samples run took about 96 hours for completion. In an ideal world we would want to run it for much longer time periods but we are restricted by time available for execution.
There is still a minor < 0.04 co-efficient for the guess co-efficient, we have see n that with longer runs this keeps getting reduced. As discussed above we would like to run this data for a much longer period to test the co-efficient of all the parameters.

Next Steps

We want to expand the analysis to

Run for larger chains (possibly 1M), with longer thinning (> 500) over a cloud server for faster processing.
Perform a softmax regression to distinguish the impacts on GENDER, ETHNICITY etc.
Perform Hierarchical Modelling to correctly identify the priors.

Conclusion

We start off with an analysis of the In-ICU mortality rate by looking at the various patient data collected by the research agency. We then proceed to analyse the variables and do appropriate transformations, converting it to all metric datasets. After getting rid of columns that show multicollinearity we proceed to perform a Bayesian Logistic Regression of the data.

We proceed to perform Markov-Chain Monte Carlo Simulation Gibbs sampling across 27 variables to identify the significant variables that impact the outcome. We also add in a Guess co-efficient to test if the outcome is purely randomness. From running long chains, we can see that the model parameters all conform to good diagnostic and the guess co-efficient is not that significant.

We conclude that given the dataset we are able to predict the In-CPU mortality rate with a high level of accuracy.

References

Kruschke, J. K. (2015). Doing Bayesian Data Analysis, Second Edition:A Tutorial with R, JAGS, and Stan. Academic Press / Elsevier.
Yin Wong, R., & Ismail, N. (2016). An Application of Bayesian Approach in Modeling Risk of Death in an Intensive Care Unit. Retrieved from https://figshare.com/articles/An_Application_of_Bayesian_Approach_in_Modeling_Risk_of_Death_in_an_Intensive_Care_Unit/3133291

Appendix

Code

knitr::opts_chunk$set(fig.width=15, fig.height=6, fig.align = "center", 
                      warning=FALSE, message=FALSE, cache = FALSE,tidy = FALSE)
rm(list = ls())
options(scipen=999)

library(easypackages)
lib_vec = c("knitr","car","jsonlite","lattice","stringr","sp","tidyr","dplyr",
            "readr","data.table","rjags","coda","Hmisc","runjags","tictoc","readxl")
lib_installed = lib_vec %in% rownames(installed.packages())
if ( any(!lib_installed) ) { install.packages( lib_vec[!lib_installed] ) }

libraries(lib_vec)
# Setting debugging Flag
debug = FALSE

source("DBDA2E-utilities.R") 
model_file_name = "icu_model"

icu_data = read_xlsx("S1File.xlsx") 

if (debug == TRUE) {
  seed = 20181014
  icu_data = icu_data[sample(nrow(icu_data),200),] #icu_data # icu_data[sample(nrow(icu_data),500),]
  nbr_of_chains = 5
  thinning_steps = 10
  nbr_of_iters = 1000
  burn_in_steps = 150
  adaptSteps = 500
} else {
  icu_data = icu_data
  nbr_of_chains = 4
  thinning_steps = 100
  nbr_of_iters = 50000
  burn_in_steps = 10000
  adaptSteps = 1000
}
gender = factor(icu_data$GENDER,levels = c("male","female"), 
                labels = c("male","female"))

ethnic = factor(icu_data$ETHNIC, levels = c("Others","Chinese","Indian","Malay"),
                labels = c("Others","Chinese","Indian","Malay"))

chd = factor(icu_data$CHD, levels = c(0,1,2,3,4,5,6,7,8),
             labels = c("None","AIDS", "metastaticCancer", "cirrhosis", 
                        "hepaticFailure", "immunosuppression", "leukemia", 
                        "lymphoma", "diabetes"))

opstat = factor (icu_data$OPSTAT, levels = c(0,1,2),
                 labels=c("Floor","SpecialCare","OpRoom"))

admdis = factor(icu_data$ADMDIS, levels = c(0,1,2,3,4,5,6,7,8),
                labels=c("Trauma",'Cardiovascular', 'Respiratory', 'Neurologic', 
                         'Gastrointestinal', 'Genitourinary', 'Metabolic', 
                         'skin', 'Hematologic'))


gender_dummies = data.frame(model.matrix(~gender))
ethnic_dummies = data.frame(model.matrix(~ethnic))
chd_dummies = data.frame(model.matrix(~chd))
opstat_dummies = data.frame(model.matrix(~opstat))
admdis_dummies = data.frame(model.matrix(~admdis))

icu_data_upd = cbind(icu_data,gender_dummies,ethnic_dummies,chd_dummies,opstat_dummies,admdis_dummies)
drop_columns = c("GENDER","ETHNIC", "X.Intercept.","CHD","OPSTAT","ADMDIS")
icu_data_clean = icu_data_upd[,!(names(icu_data_upd) %in% drop_columns)]
rm(icu_data,icu_data_upd, gender_dummies,ethnic_dummies,drop_columns,ethnic, 
   gender, chd_dummies, admdis_dummies,opstat_dummies)
icu_data_clean = icu_data_clean[,c("AGE" , "APS" , "VENT" , "NOGCS" , "OUTCOME" ,"DIAB", 
                                    "genderfemale" , "ethnicChinese" , "ethnicIndian" , 
                                    "ethnicMalay" , "chdAIDS" , "chdmetastaticCancer" , 
                                    "chdcirrhosis" , "chdhepaticFailure" , 
                                   "chdimmunosuppression" , "chdleukemia" , 
                                   "chdlymphoma"  , "chddiabetes",
                                   "opstatSpecialCare" , "opstatOpRoom"  , 
                                   "admdisCardiovascular" , "admdisRespiratory" , 
                                   "admdisNeurologic" , "admdisGastrointestinal" , 
                                   "admdisGenitourinary" , "admdisMetabolic" , 
                                   "admdisskin" , "admdisHematologic")]
summary(icu_data_clean)
round(cor(icu_data_clean),3)
  modelString = "
  # Standardize the data:
  data {
      for ( j in 1:2 ) {
        xm[j]  <- mean(x[,j])
        xsd[j] <-   sd(x[,j])
        
        for ( i in 1:Ntotal ) {
          zx[i,j] <- ( x[i,j] - xm[j] ) / xsd[j]
        }
      }
  }

  # Specify the model for standardized data:
  model {
    for ( i in 1:Ntotal ) {
      # In JAGS, ilogit is logistic:
      y[i] ~ dbern( mu[i] )
      mu[i] <- ( guess*(1/2) + (1.0-guess)*ilogit(zbeta0+sum(zbeta[1:2]*zx[i,1:2]) + sum(beta[3:Nx]*x[i,3:Nx])) )
    }

    # Standardize zbetas:
    zbeta0 ~ dnorm( 0, 1/10) # dunif( -50 ,50 ) 
    
    for ( j in 1:2 ) {
       zbeta[j] ~ dnorm( 0, 1/1000)
    }

    for ( j in 3:Nx ) {
       beta[j] ~ dnorm( 0, 1/10) # dunif( -50 ,50 ) 
    }

    # Reverse Transform    
    beta[1:2] <- zbeta[1:2] / xsd[1:2]
    beta0 <- zbeta0 - sum(zbeta[1:2]*xm[1:2]/xsd[1:2]) - sum(beta[3:Nx])

    guess ~ dbeta(1,9)
  }
  "
genMCMC = function( data , xName="x" , yName="y" , 
                    numSavedSteps=10000 , thinSteps=thinning_steps , saveName=NULL  ,
                    runjagsMethod=runjagsMethodDefault , burnInSteps = burn_in_steps,
                    nChains=nbr_of_chains, model=modelString ){ #, xPred = xPred) { 
  require(runjags)
  
  tic("Starting Model Creation")
  y = data[,yName]
  x = as.matrix(data[,xName],ncol=length(xName))
  if ( any( !is.finite(y) ) ) { stop("All y values must be finite.") }
  if ( any( !is.finite(x) ) ) { stop("All x values must be finite.") }
  
  dataList = list(
    x = x ,
    y = y ,
    Nx = dim(x)[2] ,
    Ntotal = dim(x)[1]
  )
  
  writeLines( model , con=saveName )


  parameters = c( "beta0" ,  "beta[1]" , "beta[2]",  "beta[3]",  "beta[4]",  
                  "beta[5]",  "beta[6]", "beta[7]",  "beta[8]",  "beta[9]",  
                  "beta[10]", "beta[11]", "beta[12]","beta[13]", "beta[14]", 
                  "beta[15]", "beta[16]", "beta[17]", "beta[18]", "beta[19]", 
                  "beta[20]", "beta[21]", "beta[22]", "beta[23]", "beta[24]", 
                  "beta[25]", "beta[26]", "beta[27]", "guess" ,
                  "zbeta[1]" , "zbeta[2]")
  
  tic("Starting Model Run")
  runJagsOut <- run.jags( method=runjagsMethod ,
                          model=saveName , 
                          monitor=parameters , 
                          data=dataList ,  
                          #inits=initsList , 
                          n.chains=nChains ,
                          adapt=adaptSteps ,
                          burnin=burnInSteps , 
                          sample=ceiling(numSavedSteps/nChains) ,
                          thin=thinning_steps ,
                          summarise=FALSE ,
                          plots=FALSE )
  toc()
  codaSamples = as.mcmc.list( runJagsOut )

  if ( !is.null(saveName) ) {
    save( codaSamples , file=paste(saveName,"Mcmc.Rdata",sep="") )
  }
  return( codaSamples )
} 
smryMCMC = function(  codaSamples , saveName=NULL) {
  summaryInfo = NULL
  mcmcMat = as.matrix(codaSamples,chains=TRUE)
  paramName = colnames(mcmcMat)
  for ( pName in paramName ) {
    summaryInfo = rbind( summaryInfo , summarizePost( mcmcMat[,pName] ) )
  }
  rownames(summaryInfo) = paramName
  if ( !is.null(saveName) ) {
    write.csv( summaryInfo , file=paste(saveName,"SummaryInfo.csv",sep="") )
  }
  
  return( summaryInfo)
}
plotMCMC = function( codaSamples , data , xName="x" , yName="y" ,
                     showCurve=FALSE ,  pairsPlot=FALSE ,
                     saveName=NULL , saveType="jpg" ) {
  y = data[,yName]
  x = as.matrix(data[,xName])
  mcmcMat = as.matrix(codaSamples,chains=TRUE)
  chainLength = NROW( mcmcMat )
  guess = mcmcMat[,"guess"]
  if ( ncol(x)==1 ) { zbeta = matrix( zbeta , ncol=1 ) }
  beta0 = mcmcMat[,"beta0"]
  beta  = mcmcMat[,grep("^beta$|^beta\\[",colnames(mcmcMat))]
  if ( ncol(x)==1 ) { beta = matrix( beta , ncol=1 ) }
  if ( pairsPlot ) {
    openGraph()
    nPtToPlot = 1000
    plotIdx = floor(seq(1,chainLength,by=chainLength/nPtToPlot))
    panel.cor = function(x, y, digits=2, prefix="", cex.cor, ...) {
      usr = par("usr"); on.exit(par(usr))
      par(usr = c(0, 1, 0, 1))
      r = (cor(x, y))
      txt = format(c(r, 0.123456789), digits=digits)[1]
      txt = paste(prefix, txt, sep="")
      if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
      text(0.5, 0.5, txt, cex=1.5 ) # was cex=cex.cor*r
    }
    pairs( cbind( beta0 , beta , guess )[plotIdx,] ,
           labels=c( "beta[0]" , 
                     paste0("beta[",1:ncol(beta),"]\n",xName) , "guessing" ) , 
           lower.panel=panel.cor , col="skyblue" )
    if ( !is.null(saveName) ) {
      saveGraph( file=paste(saveName,"PostPairs",sep=""), type=saveType)
    }
  }
  if ( ncol(x)==1 ) {
    openGraph(width=7,height=6)
    par( mar=c(3.5,3.5,2,1) , mgp=c(2.0,0.7,0) )
    plot( x[,1] , y , xlab=xName[1] , ylab=yName , 
          cex=2.0 , cex.lab=1.5 , col="black" , main="Data with Post. Pred." )
    abline(h=0.5,lty="dotted")
    cVec = floor(seq(1,chainLength,length=30))
    xWid=max(x)-min(x)
    xComb = seq(min(x)-0.1*xWid,max(x)+0.1*xWid,length=201)
    for ( cIdx in cVec ) {
      lines( xComb , 
             guess[cIdx]*0.5 
             + (1.0-guess[cIdx])*1/(1+exp(-(beta0[cIdx]+beta[cIdx,1]*xComb ))) , 
             lwd=1.5 ,
             col="skyblue" )
      xInt = -beta0[cIdx]/beta[cIdx,1]
      arrows( xInt,0.5, xInt,-0.04, length=0.1 , col="skyblue" , lty="dashed" )
    }
    if ( !is.null(saveName) ) {
      saveGraph( file=paste(saveName,"DataThresh",sep=""), type=saveType)
    }
  }
  if ( ncol(x)==2 ) {
    openGraph(width=7,height=7)
    par( mar=c(3.5,3.5,2,1) , mgp=c(2.0,0.7,0) )
    plot( x[,1] , x[,2] , pch=as.character(y) , xlab=xName[1] , ylab=xName[2] ,
          col="black" , main="Data with Post. Pred.")
    cVec = floor(seq(1,chainLength,length=30))
    for ( cIdx in cVec ) {
      abline( -beta0[cIdx]/beta[cIdx,2] , -beta[cIdx,1]/beta[cIdx,2] , col="skyblue" )
    }
    if ( !is.null(saveName) ) {
      saveGraph( file=paste(saveName,"DataThresh",sep=""), type=saveType)
    }
  }
  
  decideOpenGraph = function( panelCount , saveName , finished=FALSE , 
                              nRow=6, nCol=3 ) {
    if ( finished==TRUE ) {
      if ( !is.null(saveName) ) {
        saveGraph( file=paste0(saveName,ceiling((panelCount-1)/(nRow*nCol))), 
                   type=saveType)
      }
      panelCount = 1 
      return(panelCount)
    } else {
    
    if ( ( panelCount %% (nRow*nCol) ) == 1 ) {
      if ( panelCount>1 & !is.null(saveName) ) {
        saveGraph( file=paste0(saveName,(panelCount%/%(nRow*nCol))), 
                   type=saveType)
      }
      openGraph(width=nCol*7.0/3,height=nRow*2.0)
      layout( matrix( 1:(nRow*nCol) , nrow=nRow, byrow=TRUE ) )
      par( mar=c(4,4,2.5,0.5) , mgp=c(2.5,0.7,0) )
    }
    panelCount = panelCount+1
    return(panelCount)
    }
  }
  
  panelCount = 1
  panelCount = decideOpenGraph( panelCount , saveName=paste0(saveName,"PostMarg") )
  histInfo = plotPost( beta0 , cex.lab = 1.75 , showCurve=showCurve ,
                       xlab=bquote(beta[0]) , main="Intercept" )
  
  panelCount = decideOpenGraph( panelCount , saveName=paste0(saveName,"PostMarg") )
  histInfo = plotPost( guess , cex.lab = 1.75 , showCurve=showCurve ,
                       xlab=bquote("Mix. Coef.") , main="'guessing'" )

  for ( bIdx in 1:ncol(beta) ) {
    panelCount = decideOpenGraph( panelCount , saveName=paste0(saveName,"PostMarg") )
    histInfo = plotPost( beta[,bIdx] , cex.lab = 1.75 , showCurve=showCurve ,
                         xlab=bquote(beta[.(bIdx)]) , main=xName[bIdx] )
  }
  panelCount = decideOpenGraph( panelCount , finished=TRUE , saveName=paste0(saveName,"PostMarg") )
  
}
xname=c("AGE" , "APS" , "VENT" , "NOGCS" , "DIAB", 
                                    "genderfemale" , "ethnicChinese" , "ethnicIndian" , 
                                    "ethnicMalay" , "chdAIDS" , "chdmetastaticCancer" , 
                                    "chdcirrhosis" , "chdhepaticFailure" , 
                                   "chdimmunosuppression" , "chdleukemia" , 
                                   "chdlymphoma"  , "chddiabetes",
                                   "opstatSpecialCare" , "opstatOpRoom"  , 
                                   "admdisCardiovascular" , "admdisRespiratory" , 
                                   "admdisNeurologic" , "admdisGastrointestinal" , 
                                   "admdisGenitourinary" , "admdisMetabolic" , 
                                   "admdisskin" , "admdisHematologic")
yname = "OUTCOME"
fileNameRoot = "icu_data"
graphFileType = "png" 
mcmcCoda = genMCMC( data=icu_data_clean, xName=xname , yName=yname , 
                    numSavedSteps=nbr_of_iters , thinSteps=thinning_steps,
                    model = modelString,burnInSteps = burn_in_steps,
                    nChains=nbr_of_chains, saveName = paste(fileNameRoot,"_model.txt"))

parameterNames = varnames(mcmcCoda) # get all parameter names
for ( parName in parameterNames ) {
  diagMCMC( codaObject=mcmcCoda , parName=parName , 
            saveName=fileNameRoot , saveType=graphFileType )
}
summaryInfo = smryMCMC( mcmcCoda ,saveName=fileNameRoot )
show(summaryInfo)

plotMCMC( mcmcCoda , data=icu_data_clean , xName=xname , yName=yname , 
          pairsPlot=TRUE , showCurve=FALSE ,
          saveName=fileNameRoot , saveType=graphFileType )
print("All Finished")
save.image("log_model-normal_2018-10-14.RData")

Bayesian Analysis of In-ICU Mortality Rate

Arjun Balaji (s3629999), Li Xie (s3308052), Rupesh Papneja(s3637387), Sarun S Menon (s3642783)

14 October 2018

Objective

Dataset Description

Univariate Analysis

Multivariate Analysis

Model

Correlation

Guess Co-Efficient

Prior Distributions

Model Diagram

JAGS model string

Markov Chain Parameters

Model Diagnostics

All Coefficients

Posterior distributions

Limitation

Next Steps

Conclusion

References

Appendix

Code