The objective of the report is to analyse the Intensive Care Unit (ICU) dataset using Bayesian analysis methods. The report starts with a comprehensive overview of the ICU dataset including visualisations of each variable and univariate analysis, following with a Bayesian multivariate logistic regression model by using the given predictors in the dataset and the variable “outcome” as the target. The report specifies the model parameters and implements the model in JAGS. Lastly, the report implements diagnostic checking for each parameter.
The following is the description of features in the ICU dataset.
The below plot (fig 1) demonstrates the distribution of the 6 variables Gender, Operation Statistics, Outcome, GCS Availability, Ethinicity, Diabetes and Ventilation details.
The following graph indicates that the target variable “Outcome” has very similar distribution to the other two variables “NOGOCS”, which describes whether patients have obtained Glasgow Coma Score, and “vent”, which contains information on whether patients are mechanically ventilated.
Fig 1 - Univariate Analysis Plot - for Categorical Variables
From Fig 2 - we see the details for Age and APS. As seen in Fig 2 Age is across a large set of patients and is not just a older section of the population thus not skewing our results.
Fig 2 - Univariate Analysis Plot - for Numeric Variables
Given below is the distribution of the variable “Chronic Health Group” and “Admission Diagnosis”.
Fig 3 - Univariate Analysis Plot - for Caetgorical Variables
We see from Fig 3 that most of the CHD is not available.
AS a next step we compare the outcome against each variable in our dataset.
From the graph below (figure 4) , it is clear that Acute Physiology score has close connection to the outcome of the patient. The higher the score is, the more likely that the patients would not survive. The variation of age, on the other hand, does not show a strong indication to the outcome of the patients.Fig 4 - Outcome vs Age and Outcome vs APS
From figure 5 below we see that Gender and Ethnicity does not seem to make a significant impact.
Fig 5 - Multivariate Analysis Plot - Outcome vs Gender and Outcome vs Ethnicity
Fig 6 - Outcome vs Admission Diagnosis
Fig 7 - Outcome vs CHD
The dependent variable, OUTCOME is a dichotomous. Hence logistic regression model is used to define relation between the dependent variable and independent variables.
An initial check to analyse the presence of multi-collinearity is done. To create a logistic regression model, it is necessary that there is no multi-collinearity present in the data elements.
“Correlation Matrix”
From the above analysis, the following inferences can be made:
Most of elements in the data are independent, and do not show any strong correlation.
There is correlation between columns CHD_CIRRHOSIS and DIAB, but since the source data has some discrepancies in this column definition we are not sure of the medical significance and hence we have included here.
We ran a model by removing this column and the results were similar when the MCMC samples were executed. Also, to generate the full sampling without this column would take a further 96+ hours for completion.
Hence an assumption can be made that the data elements are independent.
Sometimes the predictability of a model is increased by adding a randomness or “guess co-efficient” whereby the y value comes from the flip of a fair coin: i.e. y ~ Bernoulli(\(\mu\)=1/2). When we add this guess co-efficient, the equation for logistic regression becomes
\(\mu= \alpha\cdot 1/2+(1-\alpha)\cdot \textrm{logistic}\bigg(\beta_{0}+\sum_{i=1}^{k}\beta_{i}\bigg)\)
Our goal is to estimate the guessing coefficient along with the logistic parameters.
For this, we need to establish a prior on the guessing coefficient, \(\alpha\). In most applications we would expect the proportion of random outliers in the data to be small, and therefore the prior for \(\alpha\) should emphasize small values. In our case we have set the prior as
\(\alpha \sim \textrm{dbeta}(1, 9)\)
Fig 8 - Guess Coefficient Defn
This prior distribution gives values of \(\alpha\) greater than 0.5 very low but non-zero probability.
“Bayesian Model Diagram”
We translate the above diagram into a model string that can be used by JAGS.
We tried a few samples runs with various prior distributions including:
data {
for ( j in 1:2 ) {
xm[j] <- mean(x[,j])
xsd[j] <- sd(x[,j])
for ( i in 1:Ntotal ) {
zx[i,j] <- ( x[i,j] - xm[j] ) / xsd[j]
}
}
}
model {
for ( i in 1:Ntotal ) {
y[i] ~ dbern( mu[i] )
mu[i] <- ( guess*(1/2) + (1.0-guess)*ilogit(zbeta0+sum(zbeta[1:2]*zx[i,1:2]) + sum(beta[3:Nx]*x[i,3:Nx])) )
}
zbeta0 ~ dnorm( 0, 1/10 )
for ( j in 1:2 ) {
zbeta[j] ~ dnorm( 0, 1/1000 )
}
beta[1:2] <- zbeta[1:2] / xsd[1:2]
for ( j in 3:Nx ) {
beta[j] ~ dnorm( 0, 1/10 )
}
beta0 <- zbeta0 - sum(zbeta[1:2]*xm[1:2]/xsd[1:2]) - sum(beta[3:Nx])
guess ~ dbeta(1,9)
}
For the Bayesian model, we have decided to use:
After the chains have been generated, we take a look at the diagnostics plot to fine tune the parameters.
Figure 9: Diagnostic plots for \(beta_{0}\)
Interpretation
From the \(\beta_{0}\) diagnostics (fig 9), we see that
Figure 10: Diagnostic plots for \(beta_{1}\)
Interpretation
From the \(\beta_{1}\) diagnostics (fig 10), we see that
Figure 11 : Diagnostic plots for \(beta_{2}\)
Interpretation
From the \(\beta_{2}\) diagnostics (fig 11), we see that
Figure 12: Diagnostic plots for \(beta_{3}\)
Interpretation
From the \(\beta_{3}\) diagnostics (fig 12), we see that
Figure 13: Diagnostic plots for \(beta_{4}\)
Interpretation
From the \(\beta_{4}\) diagnostics (fig 13), we see that
Figure 14: Diagnostic plots for \(beta_{5}\)
Interpretation
From the \(\beta_{5}\) diagnostics (fig 14), we see that
Figure 15: Diagnostic plots for \(beta_{6}\)
Interpretation
From the \(\beta_{6}\) diagnostics (fig 15), we see that
Figure 16: Diagnostic plots for \(beta_{7}\)
Interpretation
From the \(\beta_{7}\) diagnostics (fig 16), we see that
Figure 17: Diagnostic plots for \(beta_{8}\)
Interpretation
From the \(\beta_{8}\) diagnostics (fig 17), we see that
Figure 18: Diagnostic plots for \(beta_{9}\)
Interpretation
From the \(\beta_{9}\) diagnostics (fig 18), we see that
Figure 19: Diagnostic plots for \(beta_{10}\)
Interpretation
From the \(\beta_{10}\) diagnostics (fig 19), we see that
Figure 20: Diagnostic plots for \(beta_{11}\)
Interpretation
From the \(\beta_{11}\) diagnostics (fig 20), we see that
Figure 21: Diagnostic plots for \(beta_{12}\)
Interpretation
From the \(\beta_{12}\) diagnostics (fig 21), we see that
Figure 22: Diagnostic plots for \(beta_{13}\)
Interpretation
From the \(\beta_{13}\) diagnostics (fig 22), we see that
Figure 23: Diagnostic plots for \(beta_{14}\)
Interpretation
From the \(\beta_{14}\) diagnostics (fig 23), we see that
Figure 24: Diagnostic plots for \(beta_{15}\)
Interpretation
From the \(\beta_{15}\) diagnostics (fig 24), we see that
Figure 25: Diagnostic plots for \(beta_{16}\)
Interpretation
From the \(\beta_{16}\) diagnostics (fig 25), we see that
Figure 26: Diagnostic plots for \(beta_{17}\)
Interpretation
From the \(\beta_{17}\) diagnostics (fig 26), we see that
Figure 27: Diagnostic plots for \(beta_{18}\)
Interpretation
From the \(\beta_{18}\) diagnostics (fig 27), we see that
Figure 28: Diagnostic plots for \(beta_{19}\)
Interpretation
From the \(\beta_{19}\) diagnostics (fig 28), we see that
Figure 29: Diagnostic plots for \(beta_{20}\)
Interpretation
From the \(\beta_{20}\) diagnostics (fig 29), we see that
Figure 30: Diagnostic plots for \(beta_{21}\)
Interpretation
From the \(\beta_{21}\) diagnostics (fig 30), we see that
Figure 31: Diagnostic plots for \(beta_{22}\)
Interpretation
From the \(\beta_{22}\) diagnostics (fig 31), we see that
Figure 32: Diagnostic plots for \(beta_{23}\)
Interpretation
From the \(\beta_{23}\) diagnostics (fig 32), we see that
Figure 33: Diagnostic plots for \(beta_{24}\)
Interpretation
From the \(\beta_{24}\) diagnostics (fig 33), we see that
Figure 34: Diagnostic plots for \(beta_{25}\)
Interpretation
From the \(\beta_{25}\) diagnostics (fig 34), we see that
Figure 35: Diagnostic plots for \(beta_{26}\)
Interpretation
From the \(\beta_{26}\) diagnostics (fig 35), we see that
Figure 36: Diagnostic plots for \(beta_{27}\)
Interpretation
From the \(\beta_{27}\) diagnostics (fig 36), we see that
Figure 37: Diagnostic plots for guess
Interpretation
From the guess parameter diagnostics (fig 37), we see that
The table below (table 1) gives the Bayesian estimates for all coefficients.
| Mean | Median | Mode | ESS | HDImass | HDIlow | HDIhigh | |
|---|---|---|---|---|---|---|---|
| CHAIN | 2.5000000 | 2.5000000 | 1.9976523 | 1.5 | 0.95 | 1.0000000 | 4.0000000 |
| beta0 | -0.9429240 | -0.9897880 | -0.9805010 | 33474.6 | 0.95 | -12.6322000 | 10.5130000 |
| beta[1] | -0.0086887 | -0.0085927 | -0.0084821 | 48279.0 | 0.95 | -0.0233384 | 0.0055350 |
| beta[2] | 0.0461086 | 0.0455797 | 0.0454050 | 47304.3 | 0.95 | 0.0347933 | 0.0587648 |
| beta[3] | 1.2023071 | 1.0250350 | 0.7600194 | 12311.3 | 0.95 | -0.5400100 | 3.3816600 |
| beta[4] | 1.9134414 | 1.8939550 | 1.8874196 | 50000.0 | 0.95 | 1.2754700 | 2.5830200 |
| beta[5] | -0.6083301 | -0.5828250 | -0.5263448 | 48965.9 | 0.95 | -1.9099000 | 0.6105210 |
| beta[6] | -0.5898821 | -0.5838540 | -0.5431524 | 50000.0 | 0.95 | -1.0900200 | -0.1037930 |
| beta[7] | -0.0440617 | -0.0461140 | -0.0362615 | 53309.5 | 0.95 | -0.8666780 | 0.7404060 |
| beta[8] | -0.2605335 | -0.2570815 | -0.2305686 | 50000.0 | 0.95 | -1.2044000 | 0.6720400 |
| beta[9] | -0.1899954 | -0.1902695 | -0.1645840 | 50000.0 | 0.95 | -0.9035720 | 0.5375890 |
| beta[10] | 0.4474317 | 0.4939450 | 0.5996662 | 50000.0 | 0.95 | -1.4120200 | 2.1693400 |
| beta[11] | 0.1153628 | 0.1771660 | 0.3405421 | 50000.0 | 0.95 | -2.1081600 | 2.2268800 |
| beta[12] | 0.9276330 | 0.9015410 | 0.7938341 | 48896.4 | 0.95 | -0.3497580 | 2.2669500 |
| beta[13] | -2.0402505 | -1.9103400 | -1.5981884 | 50000.0 | 0.95 | -6.9720100 | 2.3416700 |
| beta[14] | -0.5664209 | -0.4771105 | -0.4547327 | 51742.3 | 0.95 | -2.7378200 | 1.3346900 |
| beta[15] | 2.6529404 | 2.6056250 | 2.5131043 | 50000.0 | 0.95 | -0.4489040 | 5.8450300 |
| beta[16] | 0.3657012 | 0.4163850 | 0.4937717 | 50000.0 | 0.95 | -3.3722100 | 4.0915000 |
| beta[17] | -1.2841846 | -1.0697100 | -0.5392907 | 50000.0 | 0.95 | -6.2287400 | 3.0623800 |
| beta[18] | -0.9028516 | -0.8852760 | -0.8473646 | 50000.0 | 0.95 | -1.8137800 | -0.0432879 |
| beta[19] | -0.3479895 | -0.3456610 | -0.3343690 | 49875.9 | 0.95 | -0.9025080 | 0.1870730 |
| beta[20] | -0.2228800 | -0.2189495 | -0.2099645 | 48071.1 | 0.95 | -0.9312560 | 0.4294400 |
| beta[21] | -0.5379596 | -0.5290155 | -0.5217323 | 50000.0 | 0.95 | -1.2862900 | 0.2314980 |
| beta[22] | -0.7345654 | -0.7081525 | -0.6588856 | 50000.0 | 0.95 | -1.6243400 | 0.1263530 |
| beta[23] | -0.2182389 | -0.2102815 | -0.2226587 | 50000.0 | 0.95 | -1.0473300 | 0.5883670 |
| beta[24] | -3.1360171 | -2.7926800 | -2.0571548 | 50000.0 | 0.95 | -6.7724800 | -0.2850510 |
| beta[25] | -0.5727760 | -0.5423465 | -0.4388815 | 50000.0 | 0.95 | -1.7670900 | 0.5679480 |
| beta[26] | -1.3232109 | -1.2863300 | -1.1541217 | 51557.4 | 0.95 | -2.6811500 | -0.0571146 |
| beta[27] | 1.0139087 | 1.0072050 | 0.9416865 | 50149.7 | 0.95 | -0.1490040 | 2.1675200 |
| guess | 0.0405235 | 0.0376888 | 0.0304350 | 39640.6 | 0.95 | 0.0000043 | 0.0852877 |
| zbeta[1] | -0.1541820 | -0.1524780 | -0.1505150 | 48279.0 | 0.95 | -0.4141430 | 0.0982192 |
| zbeta[2] | 1.4853673 | 1.4683300 | 1.4626998 | 47304.3 | 0.95 | 1.1208500 | 1.8930800 |
The posterior Distributions of the various parameters are as below.
Figure 38: Posterior distribution for beta 0
From the figure 38 and values printed above, the 95% HDI ranges for the \(\beta_{0}\) are -12.6322 and 10.513 with a mean value of -0.942924 and mode -0.980501.
Figure 39: Posterior distribution for beta 1
From the figure 39 and values printed above, the 95% HDI ranges for the \(\beta_{1}\) are -0.0233384 and 0.005535 with a mean value of -0.0086887 and mode -0.0084821.
Figure 40: Posterior distribution for beta 2
From the figure 40 and values printed above, the 95% HDI ranges for the \(\beta_{2}\) are 0.0347933 and 0.0587648 with a mean value of 0.0461086 and mode 0.045405.
Figure 41: Posterior distribution for beta 3
From the figure 41 and values printed above, the 95% HDI ranges for the \(\beta_{3}\) are -0.54001 and 3.38166 with a mean value of 1.2023071 and mode 0.7600194.
Figure 42: Posterior distribution for beta 4
From the figure 42 and values printed above, the 95% HDI ranges for the \(\beta_{4}\) are 1.27547 and 2.58302 with a mean value of 1.9134414 and mode 1.8874196.
Figure 43: Posterior distribution for beta 5
From the figure 43 and values printed above, the 95% HDI ranges for the \(\beta_{5}\) are -1.9099 and 0.610521 with a mean value of -0.6083301 and mode -0.5263448.
Figure 44: Posterior distribution for beta 6
From the figure 44 and values printed above, the 95% HDI ranges for the \(\beta_{6}\) are -1.09002 and -0.103793 with a mean value of -0.5898821 and mode -0.5431524.
Figure 45: Posterior distribution for beta 7
From the figure 45 and values printed above, the 95% HDI ranges for the \(\beta_{7}\) are -0.866678 and 0.740406 with a mean value of -0.0440617 and mode -0.0362615.
Figure 46: Posterior distribution for beta 8
From the figure 46 and values printed above, the 95% HDI ranges for the \(\beta_{8}\) are -1.2044 and 0.67204 with a mean value of -0.2605335 and mode -0.2305686.
Figure 47: Posterior distribution for beta 9
From the figure 47 and values printed above, the 95% HDI ranges for the \(\beta_{9}\) are -0.903572 and 0.537589 with a mean value of -0.1899954 and mode -0.164584.
Figure 48 : Posterior distribution for beta 10
From the figure 48 and values printed above, the 95% HDI ranges for the \(\beta_{10}\) are -1.41202 and 2.16934 with a mean value of 0.4474317 and mode 0.5996662.
Figure 49: Posterior distribution for beta 11
From the figure 49 and values printed above, the 95% HDI ranges for the \(\beta_{11}\) are -2.10816 and 2.22688 with a mean value of 0.1153628 and mode 0.3405421.
Figure 50: Posterior distribution for beta 12
From the figure 50 and values printed above, the 95% HDI ranges for the \(\beta_{12}\) are -0.349758 and 2.26695 with a mean value of 0.927633 and mode 0.7938341.
Figure 51: Posterior distribution for beta 13
From the figure 51 and values printed above, the 95% HDI ranges for the \(\beta_{13}\) are -6.97201 and 2.34167 with a mean value of -2.0402505 and mode -1.5981884.
Figure 52: Posterior distribution for beta 14
From the figure 52 and values printed above, the 95% HDI ranges for the \(\beta_{14}\) are -2.73782 and 1.33469 with a mean value of -0.5664209 and mode -0.4547327.
Figure 53: Posterior distribution for beta 15
From the figure 53 and values printed above, the 95% HDI ranges for the \(\beta_{15}\) are -0.448904 and 5.84503 with a mean value of 2.6529404 and mode 2.5131043.
Figure 54: Posterior distribution for beta 16
From the figure 54 and values printed above, the 95% HDI ranges for the \(\beta_{16}\) are -3.37221 and 4.0915 with a mean value of 0.3657012 and mode 0.4937717.
Figure 55: Posterior distribution for beta 17
From the figure 55 and values printed above, the 95% HDI ranges for the \(\beta_{17}\) are -6.22874 and 3.06238 with a mean value of -1.2841846 and mode -0.5392907.
Figure 56: Posterior distribution for beta 18
From the figure 56 and values printed above, the 95% HDI ranges for the \(\beta_{18}\) are -1.81378 and -0.0432879 with a mean value of -0.9028516 and mode -0.8473646.
Figure 57 : Posterior distribution for beta 19
From the figure 57 and values printed above, the 95% HDI ranges for the \(\beta_{19}\) are -0.902508 and 0.187073 with a mean value of -0.3479895 and mode -0.334369.
Figure 58: Posterior distribution for beta 20
From the figure 58 and values printed above, the 95% HDI ranges for the \(\beta_{20}\) are -0.931256 and 0.42944 with a mean value of -0.22288 and mode -0.2099645.
Figure 59: Posterior distribution for beta 21
From the figure 59 and values printed above, the 95% HDI ranges for the \(\beta_{21}\) are -1.28629 and 0.231498 with a mean value of -0.5379596 and mode -0.5217323.
Figure 60: Posterior distribution for beta 22
From the figure 60 and values printed above, the 95% HDI ranges for the \(\beta_{22}\) are -1.62434 and 0.126353 with a mean value of -0.7345654 and mode -0.6588856.
Figure 61: Posterior distribution for beta 23
From the figure 61 and values printed above, the 95% HDI ranges for the \(\beta_{23}\) are -1.04733 and 0.588367 with a mean value of -0.2182389 and mode -0.2226587.
Figure 62: Posterior distribution for beta 24
From the figure 62 and values printed above, the 95% HDI ranges for the \(\beta_{24}\) are -6.77248 and -0.285051 with a mean value of -3.1360171 and mode -2.0571548.
Figure 63: Posterior distribution for beta 25
From the figure 63 and values printed above, the 95% HDI ranges for the \(\beta_{25}\) are -1.76709 and 0.567948 with a mean value of -0.572776 and mode -0.4388815.
Figure 64 : Posterior distribution for beta 26
From the figure 64 and values printed above, the 95% HDI ranges for the \(\beta_{26}\) are -2.68115 and -0.0571146 with a mean value of -1.3232109 and mode -1.1541217.
Figure 65: Posterior distribution for beta 27
From the figure 65 and values printed above, the 95% HDI ranges for the \(\beta_{27}\) are -0.149004 and 2.16752 with a mean value of 1.0139087 and mode 0.9416865.
Figure 66 : Posterior distribution for guess
From the figure 66 and values printed above, the 95% HDI ranges for the guess are 0.0000043 and 0.0852877 with a mean value of 0.0405235 and mode 0.030435.
We see from the results that the alpha values are very less and the model of the parameter is at 0.05. This suggests that the data that we are using for the analysis contains most of the values for prediction and is not a random guess. This is extremely crucial since lives are at risk and randomness makes it harder to save lives.
We observe that one of the co-efficient \(\beta_{3}\) gives us a decent ESS of 10k. However, we still see that there is some correlation left in the dataset.
Our MCMC samples run took about 96 hours for completion. In an ideal world we would want to run it for much longer time periods but we are restricted by time available for execution.
There is still a minor < 0.04 co-efficient for the guess co-efficient, we have see n that with longer runs this keeps getting reduced. As discussed above we would like to run this data for a much longer period to test the co-efficient of all the parameters.
We want to expand the analysis to
Run for larger chains (possibly 1M), with longer thinning (> 500) over a cloud server for faster processing.
Perform a softmax regression to distinguish the impacts on GENDER, ETHNICITY etc.
Perform Hierarchical Modelling to correctly identify the priors.
We start off with an analysis of the In-ICU mortality rate by looking at the various patient data collected by the research agency. We then proceed to analyse the variables and do appropriate transformations, converting it to all metric datasets. After getting rid of columns that show multicollinearity we proceed to perform a Bayesian Logistic Regression of the data.
We proceed to perform Markov-Chain Monte Carlo Simulation Gibbs sampling across 27 variables to identify the significant variables that impact the outcome. We also add in a Guess co-efficient to test if the outcome is purely randomness. From running long chains, we can see that the model parameters all conform to good diagnostic and the guess co-efficient is not that significant.
We conclude that given the dataset we are able to predict the In-CPU mortality rate with a high level of accuracy.
Kruschke, J. K. (2015). Doing Bayesian Data Analysis, Second Edition:A Tutorial with R, JAGS, and Stan. Academic Press / Elsevier.
Yin Wong, R., & Ismail, N. (2016). An Application of Bayesian Approach in Modeling Risk of Death in an Intensive Care Unit. Retrieved from https://figshare.com/articles/An_Application_of_Bayesian_Approach_in_Modeling_Risk_of_Death_in_an_Intensive_Care_Unit/3133291
knitr::opts_chunk$set(fig.width=15, fig.height=6, fig.align = "center",
warning=FALSE, message=FALSE, cache = FALSE,tidy = FALSE)
rm(list = ls())
options(scipen=999)
library(easypackages)
lib_vec = c("knitr","car","jsonlite","lattice","stringr","sp","tidyr","dplyr",
"readr","data.table","rjags","coda","Hmisc","runjags","tictoc","readxl")
lib_installed = lib_vec %in% rownames(installed.packages())
if ( any(!lib_installed) ) { install.packages( lib_vec[!lib_installed] ) }
libraries(lib_vec)
# Setting debugging Flag
debug = FALSE
source("DBDA2E-utilities.R")
model_file_name = "icu_model"
icu_data = read_xlsx("S1File.xlsx")
if (debug == TRUE) {
seed = 20181014
icu_data = icu_data[sample(nrow(icu_data),200),] #icu_data # icu_data[sample(nrow(icu_data),500),]
nbr_of_chains = 5
thinning_steps = 10
nbr_of_iters = 1000
burn_in_steps = 150
adaptSteps = 500
} else {
icu_data = icu_data
nbr_of_chains = 4
thinning_steps = 100
nbr_of_iters = 50000
burn_in_steps = 10000
adaptSteps = 1000
}
gender = factor(icu_data$GENDER,levels = c("male","female"),
labels = c("male","female"))
ethnic = factor(icu_data$ETHNIC, levels = c("Others","Chinese","Indian","Malay"),
labels = c("Others","Chinese","Indian","Malay"))
chd = factor(icu_data$CHD, levels = c(0,1,2,3,4,5,6,7,8),
labels = c("None","AIDS", "metastaticCancer", "cirrhosis",
"hepaticFailure", "immunosuppression", "leukemia",
"lymphoma", "diabetes"))
opstat = factor (icu_data$OPSTAT, levels = c(0,1,2),
labels=c("Floor","SpecialCare","OpRoom"))
admdis = factor(icu_data$ADMDIS, levels = c(0,1,2,3,4,5,6,7,8),
labels=c("Trauma",'Cardiovascular', 'Respiratory', 'Neurologic',
'Gastrointestinal', 'Genitourinary', 'Metabolic',
'skin', 'Hematologic'))
gender_dummies = data.frame(model.matrix(~gender))
ethnic_dummies = data.frame(model.matrix(~ethnic))
chd_dummies = data.frame(model.matrix(~chd))
opstat_dummies = data.frame(model.matrix(~opstat))
admdis_dummies = data.frame(model.matrix(~admdis))
icu_data_upd = cbind(icu_data,gender_dummies,ethnic_dummies,chd_dummies,opstat_dummies,admdis_dummies)
drop_columns = c("GENDER","ETHNIC", "X.Intercept.","CHD","OPSTAT","ADMDIS")
icu_data_clean = icu_data_upd[,!(names(icu_data_upd) %in% drop_columns)]
rm(icu_data,icu_data_upd, gender_dummies,ethnic_dummies,drop_columns,ethnic,
gender, chd_dummies, admdis_dummies,opstat_dummies)
icu_data_clean = icu_data_clean[,c("AGE" , "APS" , "VENT" , "NOGCS" , "OUTCOME" ,"DIAB",
"genderfemale" , "ethnicChinese" , "ethnicIndian" ,
"ethnicMalay" , "chdAIDS" , "chdmetastaticCancer" ,
"chdcirrhosis" , "chdhepaticFailure" ,
"chdimmunosuppression" , "chdleukemia" ,
"chdlymphoma" , "chddiabetes",
"opstatSpecialCare" , "opstatOpRoom" ,
"admdisCardiovascular" , "admdisRespiratory" ,
"admdisNeurologic" , "admdisGastrointestinal" ,
"admdisGenitourinary" , "admdisMetabolic" ,
"admdisskin" , "admdisHematologic")]
summary(icu_data_clean)
round(cor(icu_data_clean),3)
modelString = "
# Standardize the data:
data {
for ( j in 1:2 ) {
xm[j] <- mean(x[,j])
xsd[j] <- sd(x[,j])
for ( i in 1:Ntotal ) {
zx[i,j] <- ( x[i,j] - xm[j] ) / xsd[j]
}
}
}
# Specify the model for standardized data:
model {
for ( i in 1:Ntotal ) {
# In JAGS, ilogit is logistic:
y[i] ~ dbern( mu[i] )
mu[i] <- ( guess*(1/2) + (1.0-guess)*ilogit(zbeta0+sum(zbeta[1:2]*zx[i,1:2]) + sum(beta[3:Nx]*x[i,3:Nx])) )
}
# Standardize zbetas:
zbeta0 ~ dnorm( 0, 1/10) # dunif( -50 ,50 )
for ( j in 1:2 ) {
zbeta[j] ~ dnorm( 0, 1/1000)
}
for ( j in 3:Nx ) {
beta[j] ~ dnorm( 0, 1/10) # dunif( -50 ,50 )
}
# Reverse Transform
beta[1:2] <- zbeta[1:2] / xsd[1:2]
beta0 <- zbeta0 - sum(zbeta[1:2]*xm[1:2]/xsd[1:2]) - sum(beta[3:Nx])
guess ~ dbeta(1,9)
}
"
genMCMC = function( data , xName="x" , yName="y" ,
numSavedSteps=10000 , thinSteps=thinning_steps , saveName=NULL ,
runjagsMethod=runjagsMethodDefault , burnInSteps = burn_in_steps,
nChains=nbr_of_chains, model=modelString ){ #, xPred = xPred) {
require(runjags)
tic("Starting Model Creation")
y = data[,yName]
x = as.matrix(data[,xName],ncol=length(xName))
if ( any( !is.finite(y) ) ) { stop("All y values must be finite.") }
if ( any( !is.finite(x) ) ) { stop("All x values must be finite.") }
dataList = list(
x = x ,
y = y ,
Nx = dim(x)[2] ,
Ntotal = dim(x)[1]
)
writeLines( model , con=saveName )
parameters = c( "beta0" , "beta[1]" , "beta[2]", "beta[3]", "beta[4]",
"beta[5]", "beta[6]", "beta[7]", "beta[8]", "beta[9]",
"beta[10]", "beta[11]", "beta[12]","beta[13]", "beta[14]",
"beta[15]", "beta[16]", "beta[17]", "beta[18]", "beta[19]",
"beta[20]", "beta[21]", "beta[22]", "beta[23]", "beta[24]",
"beta[25]", "beta[26]", "beta[27]", "guess" ,
"zbeta[1]" , "zbeta[2]")
tic("Starting Model Run")
runJagsOut <- run.jags( method=runjagsMethod ,
model=saveName ,
monitor=parameters ,
data=dataList ,
#inits=initsList ,
n.chains=nChains ,
adapt=adaptSteps ,
burnin=burnInSteps ,
sample=ceiling(numSavedSteps/nChains) ,
thin=thinning_steps ,
summarise=FALSE ,
plots=FALSE )
toc()
codaSamples = as.mcmc.list( runJagsOut )
if ( !is.null(saveName) ) {
save( codaSamples , file=paste(saveName,"Mcmc.Rdata",sep="") )
}
return( codaSamples )
}
smryMCMC = function( codaSamples , saveName=NULL) {
summaryInfo = NULL
mcmcMat = as.matrix(codaSamples,chains=TRUE)
paramName = colnames(mcmcMat)
for ( pName in paramName ) {
summaryInfo = rbind( summaryInfo , summarizePost( mcmcMat[,pName] ) )
}
rownames(summaryInfo) = paramName
if ( !is.null(saveName) ) {
write.csv( summaryInfo , file=paste(saveName,"SummaryInfo.csv",sep="") )
}
return( summaryInfo)
}
plotMCMC = function( codaSamples , data , xName="x" , yName="y" ,
showCurve=FALSE , pairsPlot=FALSE ,
saveName=NULL , saveType="jpg" ) {
y = data[,yName]
x = as.matrix(data[,xName])
mcmcMat = as.matrix(codaSamples,chains=TRUE)
chainLength = NROW( mcmcMat )
guess = mcmcMat[,"guess"]
if ( ncol(x)==1 ) { zbeta = matrix( zbeta , ncol=1 ) }
beta0 = mcmcMat[,"beta0"]
beta = mcmcMat[,grep("^beta$|^beta\\[",colnames(mcmcMat))]
if ( ncol(x)==1 ) { beta = matrix( beta , ncol=1 ) }
if ( pairsPlot ) {
openGraph()
nPtToPlot = 1000
plotIdx = floor(seq(1,chainLength,by=chainLength/nPtToPlot))
panel.cor = function(x, y, digits=2, prefix="", cex.cor, ...) {
usr = par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r = (cor(x, y))
txt = format(c(r, 0.123456789), digits=digits)[1]
txt = paste(prefix, txt, sep="")
if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
text(0.5, 0.5, txt, cex=1.5 ) # was cex=cex.cor*r
}
pairs( cbind( beta0 , beta , guess )[plotIdx,] ,
labels=c( "beta[0]" ,
paste0("beta[",1:ncol(beta),"]\n",xName) , "guessing" ) ,
lower.panel=panel.cor , col="skyblue" )
if ( !is.null(saveName) ) {
saveGraph( file=paste(saveName,"PostPairs",sep=""), type=saveType)
}
}
if ( ncol(x)==1 ) {
openGraph(width=7,height=6)
par( mar=c(3.5,3.5,2,1) , mgp=c(2.0,0.7,0) )
plot( x[,1] , y , xlab=xName[1] , ylab=yName ,
cex=2.0 , cex.lab=1.5 , col="black" , main="Data with Post. Pred." )
abline(h=0.5,lty="dotted")
cVec = floor(seq(1,chainLength,length=30))
xWid=max(x)-min(x)
xComb = seq(min(x)-0.1*xWid,max(x)+0.1*xWid,length=201)
for ( cIdx in cVec ) {
lines( xComb ,
guess[cIdx]*0.5
+ (1.0-guess[cIdx])*1/(1+exp(-(beta0[cIdx]+beta[cIdx,1]*xComb ))) ,
lwd=1.5 ,
col="skyblue" )
xInt = -beta0[cIdx]/beta[cIdx,1]
arrows( xInt,0.5, xInt,-0.04, length=0.1 , col="skyblue" , lty="dashed" )
}
if ( !is.null(saveName) ) {
saveGraph( file=paste(saveName,"DataThresh",sep=""), type=saveType)
}
}
if ( ncol(x)==2 ) {
openGraph(width=7,height=7)
par( mar=c(3.5,3.5,2,1) , mgp=c(2.0,0.7,0) )
plot( x[,1] , x[,2] , pch=as.character(y) , xlab=xName[1] , ylab=xName[2] ,
col="black" , main="Data with Post. Pred.")
cVec = floor(seq(1,chainLength,length=30))
for ( cIdx in cVec ) {
abline( -beta0[cIdx]/beta[cIdx,2] , -beta[cIdx,1]/beta[cIdx,2] , col="skyblue" )
}
if ( !is.null(saveName) ) {
saveGraph( file=paste(saveName,"DataThresh",sep=""), type=saveType)
}
}
decideOpenGraph = function( panelCount , saveName , finished=FALSE ,
nRow=6, nCol=3 ) {
if ( finished==TRUE ) {
if ( !is.null(saveName) ) {
saveGraph( file=paste0(saveName,ceiling((panelCount-1)/(nRow*nCol))),
type=saveType)
}
panelCount = 1
return(panelCount)
} else {
if ( ( panelCount %% (nRow*nCol) ) == 1 ) {
if ( panelCount>1 & !is.null(saveName) ) {
saveGraph( file=paste0(saveName,(panelCount%/%(nRow*nCol))),
type=saveType)
}
openGraph(width=nCol*7.0/3,height=nRow*2.0)
layout( matrix( 1:(nRow*nCol) , nrow=nRow, byrow=TRUE ) )
par( mar=c(4,4,2.5,0.5) , mgp=c(2.5,0.7,0) )
}
panelCount = panelCount+1
return(panelCount)
}
}
panelCount = 1
panelCount = decideOpenGraph( panelCount , saveName=paste0(saveName,"PostMarg") )
histInfo = plotPost( beta0 , cex.lab = 1.75 , showCurve=showCurve ,
xlab=bquote(beta[0]) , main="Intercept" )
panelCount = decideOpenGraph( panelCount , saveName=paste0(saveName,"PostMarg") )
histInfo = plotPost( guess , cex.lab = 1.75 , showCurve=showCurve ,
xlab=bquote("Mix. Coef.") , main="'guessing'" )
for ( bIdx in 1:ncol(beta) ) {
panelCount = decideOpenGraph( panelCount , saveName=paste0(saveName,"PostMarg") )
histInfo = plotPost( beta[,bIdx] , cex.lab = 1.75 , showCurve=showCurve ,
xlab=bquote(beta[.(bIdx)]) , main=xName[bIdx] )
}
panelCount = decideOpenGraph( panelCount , finished=TRUE , saveName=paste0(saveName,"PostMarg") )
}
xname=c("AGE" , "APS" , "VENT" , "NOGCS" , "DIAB",
"genderfemale" , "ethnicChinese" , "ethnicIndian" ,
"ethnicMalay" , "chdAIDS" , "chdmetastaticCancer" ,
"chdcirrhosis" , "chdhepaticFailure" ,
"chdimmunosuppression" , "chdleukemia" ,
"chdlymphoma" , "chddiabetes",
"opstatSpecialCare" , "opstatOpRoom" ,
"admdisCardiovascular" , "admdisRespiratory" ,
"admdisNeurologic" , "admdisGastrointestinal" ,
"admdisGenitourinary" , "admdisMetabolic" ,
"admdisskin" , "admdisHematologic")
yname = "OUTCOME"
fileNameRoot = "icu_data"
graphFileType = "png"
mcmcCoda = genMCMC( data=icu_data_clean, xName=xname , yName=yname ,
numSavedSteps=nbr_of_iters , thinSteps=thinning_steps,
model = modelString,burnInSteps = burn_in_steps,
nChains=nbr_of_chains, saveName = paste(fileNameRoot,"_model.txt"))
parameterNames = varnames(mcmcCoda) # get all parameter names
for ( parName in parameterNames ) {
diagMCMC( codaObject=mcmcCoda , parName=parName ,
saveName=fileNameRoot , saveType=graphFileType )
}
summaryInfo = smryMCMC( mcmcCoda ,saveName=fileNameRoot )
show(summaryInfo)
plotMCMC( mcmcCoda , data=icu_data_clean , xName=xname , yName=yname ,
pairsPlot=TRUE , showCurve=FALSE ,
saveName=fileNameRoot , saveType=graphFileType )
print("All Finished")
save.image("log_model-normal_2018-10-14.RData")