Yuan Yang
04/16/2019
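The SMR-type measure is calculated by \[ SMR_j = \frac{\sum_{i=1}^{n_j} O_{ji}}{\sum_{i=1}^{n_j} E_{ji}}, \] where \( O_{ji} \) and \( E_{ji} \) are the observed and expected numbers of events for patient \( i \) in facility \( j \), and \( n_j \) is the number of patients in facility \( j \). The inputs and data requirements for each measure are: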
| Measure | Input | Data requirement | stratify.var |
|---|---|---|---|
| SMR | obs_death, exp_death, facility, stratify.var, stratify_cutoff, year | Remove ‘Short’ and small facilities (facility expected death < 3) | facility size |
| SHR | obs_admission, exp_admission, facility, stratify.var, stratify_cutoff, year | Remove ‘Short’ and small facilities (facility patient-years < 5) | facility size |
| STrR | obs_transyr, exp_transyr, facility, stratify.var, stratify_cutoff, year | Remove ‘Short’ and small facilities (facility trans_yar < 10) | facility size |
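The SMR formula can be illustrated with a minimal sketch. It assumes a patient-level data frame that uses the column names from this report (`provfs`, `dial_drd`, `expectda`); the toy values are hypothetical and chosen only so the arithmetic is easy to follow.

```r
# Toy patient-level records (hypothetical values, for illustration only)
dat <- data.frame(
  provfs   = c("A", "A", "A", "B", "B"),   # facility id
  dial_drd = c(1, 0, 1, 0, 1),             # observed deaths O_ji
  expectda = c(0.5, 0.3, 0.2, 0.8, 0.2)    # expected deaths E_ji
)

# Sum observed and expected deaths within each facility
totals <- aggregate(cbind(dial_drd, expectda) ~ provfs, data = dat, FUN = sum)

# SMR_j = sum_i O_ji / sum_i E_ji
totals$smr <- totals$dial_drd / totals$expectda
print(totals)
```

Facility A has 2 observed and 1.0 expected deaths, and facility B has 1 observed and 1.0 expected, so the ratios are 2 and 1. The same pattern applies to SHR and STrR with the corresponding observed and expected columns.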
rm(list=ls())
uniqname = 'yuanyang'
output_path = 'K:/Users/kecc-yuanyang/IUR0321'
data_path = 'K:/Projects/Dialysis_Reports_Shared/Data/SMR/special_request'
raw_data_name = 'SMRSHR_2014to2017.sas7bdat'
obs_death = "dial_drd" #observed death
exp_death = "expectda" #expected death
facility = "provfs" #facility id
death_yar = "DIAL_yar" #dialysis YAR
year = "year" #data year
IUR for 2014-2017 SMR data:
rm(list=ls())
uniqname = 'yuanyang'
output_path = 'K:/Users/kecc-yuanyang/IUR0321'
data_path = 'K:/Projects/Dialysis_Reports_Shared/Data/SMR/special_request'
raw_data_name = 'SMRSHR_2014to2017.sas7bdat'
obs_admission = "h_admits" #observed admission
exp_admission = "expectta" #expected admission
hosp_yar = "h_dy_yar" #hospital YAR
facility = "provfs" #facility id
year = "year" #data year
IUR for 2017 SHR data:
The C statistic measures the proportion of concordant pairs among all possible pairs of observed and predicted outcomes.
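Normally, the C statistic is defined as \[ C_{stat}=\frac{\rm number \ of\ concordances}{\rm number\ of\ pairs}= \frac{ \sum_{i < j} I(y_i < y_j\ \&\ \hat{y}_i < \hat{y}_j) }{\sum_{i < j}1}, \] where \( y_i \) is the observed outcome and \( \hat{y}_i \) is the predicted outcome for observation \( i \), with the observations taken in increasing order of \( y \).
\( 0 \le C \le 1 \). A higher C indicates higher concordance between \( y \) and \( \hat{y} \), and thus a better model fit.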
Data:
| ID | y | y_hat |
|---|---|---|
| 1 | 1.1 | 4.5 |
| 2 | 2.3 | 3.4 |
| 3 | 4.3 | 5.3 |
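As a worked check on the three rows above, here is a minimal sketch of the pairwise count. The function name `c_stat` is hypothetical. It treats a pair as concordant whenever `y` and `y_hat` are ordered the same way, which matches the definition above when the rows are sorted by `y`, as they are here.

```r
y     <- c(1.1, 2.3, 4.3)
y_hat <- c(4.5, 3.4, 5.3)

# Proportion of pairs (i < j) in which y and y_hat are ordered the same way
c_stat <- function(y, y_hat) {
  n <- length(y)
  concordant <- 0
  pairs <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      pairs <- pairs + 1
      # Same sign of the two differences means the pair is concordant
      if ((y[i] - y[j]) * (y_hat[i] - y_hat[j]) > 0) {
        concordant <- concordant + 1
      }
    }
  }
  concordant / pairs
}

c_stat(y, y_hat)
```

Pair (1, 2) is discordant because `y` increases while `y_hat` decreases. Pairs (1, 3) and (2, 3) are concordant, so the result is 2/3.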
Data preparation:
rm(list=ls())
uniqname = 'yuanyang'
output_path = 'K:/Users/kecc-yuanyang/C_statistics0214'
data_path = 'K:/Projects/Dialysis_Reports_Shared/Data/SMR/special_request'
raw_data_name = 'SMRSHR_2014to2017.sas7bdat'
SHR = 'shrty'
obs_admission = "h_admits" #observed admission
exp_admission = "expectta" #expected admission
hosp_yar = "h_dy_yar" #hospital YAR
facility = "provfs" #facility id
year = "year" #data year
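Sort the data by event time. The C statistic for the survival model is then defined as \[ C_{stat} = \frac{ \rm number \ of\ concordances}{\rm number\ of\ pairs} =\frac{\sum_i \sum_{j:j > i} \delta_i I(T_i < T_j\ \&\ ( X \hat{\beta})_i>( X \hat{\beta})_j)}{\sum_i \sum_{j:j > i}\delta_i}, \] where \( T_i \) is the observed time, \( \delta_i \) is the event indicator (1 for an observed event, 0 for censoring), and \( (X \hat{\beta})_i \) is the predicted linear predictor for subject \( i \).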
Data:
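For a pair \( (i, j) \) with \( T_i < T_j \): if \( \delta_i = 1 \) and \( \delta_j = 1 \), both events are observed, so the pair is comparable;
if \( \delta_i = 1 \) and \( \delta_j=0 \), subject \( j \) is censored after the event of subject \( i \), so the pair is still comparable.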
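The survival version can be sketched the same way. This is a plain O(n^2) illustration, not the Rcpp implementation used in the project. The function name `surv_c_stat` and the toy inputs are hypothetical. Following the \( \delta_i \) weights in the denominator, only pairs whose earlier subject has an observed event are counted.

```r
# C statistic for survival data: time, event indicator, linear predictor
surv_c_stat <- function(time, delta, xbeta) {
  # Sort by event time so that j > i implies time[j] >= time[i]
  ord   <- order(time)
  time  <- time[ord]
  delta <- delta[ord]
  xbeta <- xbeta[ord]

  n <- length(time)
  concordant <- 0
  comparable <- 0
  for (i in seq_len(n - 1)) {
    if (delta[i] == 0) next          # censored i: pair is not counted
    for (j in (i + 1):n) {
      comparable <- comparable + 1   # denominator: sum of delta_i over j > i
      # Concordant: earlier event has the higher predicted risk
      if (time[i] < time[j] && xbeta[i] > xbeta[j]) {
        concordant <- concordant + 1
      }
    }
  }
  concordant / comparable
}

# Toy data (hypothetical): four subjects, one censored
time  <- c(2, 5, 3, 8)
delta <- c(1, 0, 1, 1)
xbeta <- c(1.2, 0.1, 0.4, 0.9)
surv_c_stat(time, delta, xbeta)
```

After sorting, the subjects with events at times 2 and 3 form five counted pairs, four of which are concordant, giving 0.8. The subject censored at time 5 starts no pairs of its own.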
rm(list=ls())
uniqname = 'yuanyang'
output_path = 'K:/Users/kecc-yuanyang/C_statistics0214'
data_path = 'K:/Projects/Dialysis_Reports_Shared/Data/SMR/special_request'
raw_data_name = 'SMRSHR_2014to2017.sas7bdat'
obs_death = "dial_drd" #observed death
exp_xbeta = "xbeta_smr" #predicted Xbeta
death_yar = "DIAL_yar" #dialysis YAR
facility = "provfs" #facility id
Following Liu et al. (2012), we first introduce some notation:
The partial likelihood for the SHR model is \[ PL_f = \prod_{k=1}^{K} \prod_{l=1}^{L} \prod_{i=1}^{n_{kl}} \Big[ \frac{e^{ \{\mathbf{\beta}^T\mathbf{Z}_{kli}+o_{kli}\}}}{\sum_{j=1}^{n_{kl}}t_{klj}e^{\{\mathbf{\beta}^T\mathbf{Z}_{klj}\}}} \Big]^{d_{kli}}. \]
Then the log-partial likelihood is \[ l(\mathbf{\beta})=\sum_{k=1}^{K} \sum_{l=1}^{L} \sum_{i=1}^{n_{kl}} \Big[ \mathbf{\beta}^T\mathbf{Z}_{kli}+o_{kli} - \log\Big\{ \sum_{j=1}^{n_{kl}}t_{klj}e^{\{\mathbf{\beta}^T\mathbf{Z}_{klj}\}} \Big\} \Big] {d_{kli}}. \]
First derivative: \[ U(\mathbf{\beta})=\frac{\partial l}{\partial \mathbf{\beta}}= \sum_{k=1}^{K} \sum_{l=1}^{L} \sum_{i=1}^{n_{kl}} \Big[ \mathbf{Z}_{kli} - \frac{\mathbf{S}_{kl}^{(1)}}{S_{kl}^{(0)}} \Big] {d_{kli}}, \] where \( S_{kl}^{(0)}=\sum_{j=1}^{n_{kl}} t_{klj}e^{\mathbf{\beta}^T\mathbf{Z}_{klj}} \) and \( \mathbf{S}_{kl}^{(1)}=\sum_{j=1}^{n_{kl}} t_{klj}e^{\mathbf{\beta}^T\mathbf{Z}_{klj}} \mathbf{Z}_{klj} \).
Second derivative: \[ G(\mathbf{\beta})=\frac{\partial^2 l}{\partial \mathbf{\beta}\,\partial \mathbf{\beta}^T}= \sum_{k=1}^{K} \sum_{l=1}^{L} \sum_{i=1}^{n_{kl}} \Big[ - \frac{\mathbf{S}_{kl}^{(2)} S_{kl}^{(0)} -\mathbf{S}_{kl}^{(1)} {\mathbf{S}_{kl}^{(1)}}^T}{{S_{kl}^{(0)}}^2} \Big] {d_{kli}}, \] where \( \mathbf{S}_{kl}^{(2)}=\sum_{j=1}^{n_{kl}}t_{klj}e^{\mathbf{\beta}^T\mathbf{Z}_{klj}}\mathbf{Z}_{klj}\mathbf{Z}_{klj}^T \).
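The derivatives above can be checked numerically. The sketch below evaluates the log-partial likelihood, score, and Hessian for a single stratum \( (k, l) \) on simulated data and compares the score with a finite-difference gradient. The simulated data, the function names, and the choice \( o = \log t \) for the offset are all assumptions made for illustration.

```r
set.seed(1)
n  <- 50
Z  <- cbind(z1 = rnorm(n), z2 = rbinom(n, 1, 0.4))  # covariates Z_kli
tm <- runif(n, 0.5, 2)                              # time at risk t_kli
d  <- rpois(n, tm * exp(0.3 * Z[, 1]))              # event counts d_kli
o  <- log(tm)                                       # offset o_kli (assumed log t)

# Log-partial likelihood for one stratum
loglik <- function(beta) {
  eta <- as.vector(Z %*% beta)
  S0  <- sum(tm * exp(eta))
  sum(d * (eta + o - log(S0)))
}

# Score: sum_i d_i * (Z_i - S1 / S0)
score <- function(beta) {
  w  <- tm * exp(as.vector(Z %*% beta))
  S0 <- sum(w)
  S1 <- colSums(Z * w)              # row i of Z is scaled by w[i]
  colSums(Z * d) - sum(d) * S1 / S0
}

# Hessian: -sum_i d_i * (S2 * S0 - S1 S1^T) / S0^2
hess <- function(beta) {
  w  <- tm * exp(as.vector(Z %*% beta))
  S0 <- sum(w)
  S1 <- colSums(Z * w)
  S2 <- crossprod(Z * w, Z)         # sum_i w_i Z_i Z_i^T
  -sum(d) * (S2 * S0 - tcrossprod(S1)) / S0^2
}

# Central finite-difference gradient at an arbitrary beta
beta <- c(0.1, -0.2)
eps  <- 1e-5
num_grad <- sapply(1:2, function(j) {
  e <- replace(numeric(2), j, eps)
  (loglik(beta + e) - loglik(beta - e)) / (2 * eps)
})

print(rbind(analytic = unname(score(beta)), numeric = num_grad))
```

The two rows should agree to several decimal places. The Hessian is minus a weighted covariance matrix of the covariates scaled by the event total, so its eigenvalues should all be negative.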
The goal is to select important comorbidity groupers. At each step, we update the coefficients of all adjustment variables and of one comorbidity grouper using a boosting algorithm. The groupers should be normalized before boosting.
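One way to read the step described above is as componentwise boosting with damped Newton updates. The sketch below is an illustration under that assumption, not the project's Rcpp code. The selection rule (largest standardized score), the step size `nu`, the number of steps, and the simulated single-stratum data are all hypothetical choices.

```r
set.seed(1)
n   <- 300
adj <- cbind(age = rnorm(n))                        # adjustment variable
grp <- scale(matrix(rbinom(n * 4, 1, 0.3), n, 4))   # normalized groupers
colnames(grp) <- paste0("g", 1:4)
Z   <- cbind(adj, grp)
tm  <- runif(n, 0.5, 2)                             # time at risk
# Simulated truth: only grouper g2 affects the event rate
d   <- rpois(n, tm * exp(0.3 * adj[, 1] + 0.5 * grp[, 2]))

# Score U and Hessian G of the single-stratum log-partial likelihood
score_hess <- function(beta) {
  w  <- tm * exp(as.vector(Z %*% beta))
  S0 <- sum(w)
  S1 <- colSums(Z * w)
  S2 <- crossprod(Z * w, Z)
  list(U = colSums(Z * d) - sum(d) * S1 / S0,
       G = -sum(d) * (S2 * S0 - tcrossprod(S1)) / S0^2)
}

adj_idx <- 1        # columns that are always updated
grp_idx <- 2:5      # candidate groupers
beta    <- setNames(numeric(ncol(Z)), colnames(Z))
nu      <- 0.1      # damping factor (assumed)
n_steps <- 50
picked  <- character(n_steps)

for (m in seq_len(n_steps)) {
  sh <- score_hess(beta)
  # Pick the grouper with the largest standardized score U_j^2 / (-G_jj)
  gain <- sh$U[grp_idx]^2 / (-diag(sh$G)[grp_idx])
  j    <- grp_idx[which.max(gain)]
  # Damped Newton step on the adjustment variables plus the chosen grouper
  idx  <- c(adj_idx, j)
  beta[idx] <- beta[idx] - nu * solve(sh$G[idx, idx], sh$U[idx])
  picked[m] <- colnames(Z)[j]
}

print(table(picked))
print(round(beta, 3))
```

Groupers that are picked often and end with coefficients away from zero are the candidates to keep. In this simulation `g2` is the only grouper with a real effect, so it should dominate the selection counts.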
| Project | Status | Efficiency Improvement |
|---|---|---|
| IUR | Completed. User-friendly code available on GitLab. | new R vs. old R ~144 times |
| C Index | Completed. User-friendly code available on GitLab. | Rcpp vs. R ~60 times; Rcpp vs. SAS ~1000 times |
| Variable selection | 80% completed. Still needs to be tested on real data. | Rcpp vs. R ~115 times |
Liu, D., Schaubel, D. E., and Kalbfleisch, J. D. (2012). Computationally efficient marginal models for clustered recurrent event data. Biometrics, 68(2), 637-647.