Analysis from February 9, 2018
Curves from February 9, 2018
The ‘short list’ of univariate predictors is: . These were selected on the basis of having an FDR <= - .
| | coef | exp(coef) | se(coef) | z | p |
|---|---|---|---|---|---|
| v002_bn_crtnn_rt_3097_3_infoNormal | -1.118 | 0.3271 | 0.4644 | -2.406 | 0.01611 |
| e_opd_anlgscsTRUE | 0.6246 | 1.867 | 0.3407 | 1.833 | 0.06675 |
Likelihood ratio test=10.87 on 2 df, p=0.004357139 n= 540, number of events= 75
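The derived columns in the table above can be checked from `coef` and `se(coef)` alone. The following is a minimal sketch in base R, assuming the usual Wald construction (hazard ratio = exp(coef), z = coef / se, two-sided normal p-value) and a chi-squared reference distribution for the likelihood ratio test. The inputs are the rounded values printed above, so small discrepancies in the last digit are expected.

```r
# Coefficients and standard errors copied from the table above (rounded).
fit <- data.frame(
  term = c("v002_bn_crtnn_rt_3097_3_infoNormal", "e_opd_anlgscsTRUE"),
  coef = c(-1.118, 0.6246),
  se   = c(0.4644, 0.3407)
)

# Hazard ratio, Wald z statistic, and two-sided p-value.
fit$hr <- exp(fit$coef)
fit$z  <- fit$coef / fit$se
fit$p  <- 2 * pnorm(-abs(fit$z))

# Likelihood ratio test p-value from the reported statistic and df.
lrt_p <- pchisq(10.87, df = 2, lower.tail = FALSE)

print(fit, digits = 4)
print(lrt_p)
```

Because the statistic 10.87 is itself rounded, `lrt_p` agrees with the reported p-value only to about three significant digits.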
Full model from last time
| | coef | exp(coef) | se(coef) | z | p |
|---|---|---|---|---|---|
| e_hispTRUE | 0.2062 | 1.229 | 0.2792 | 0.7383 | 0.4603 |
| e_rdwrbc | 0.06473 | 1.067 | 0.0549 | 1.179 | 0.2383 |
| e_opd_anlgscsTRUE | 0.8273 | 2.287 | 0.4724 | 1.751 | 0.07988 |
Likelihood ratio test=4.33 on 3 df, p=0.2275098 n= 350, number of events= 52
This version only uses e_hisp as a predictor
| | coef | exp(coef) | se(coef) | z | p |
|---|---|---|---|---|---|
| e_hispTRUE | 0.2008 | 1.222 | 0.278 | 0.7224 | 0.47 |
Likelihood ratio test=0.52 on 1 df, p=0.4698551 n= 350, number of events= 52
This version omits e_hisp
| | coef | exp(coef) | se(coef) | z | p |
|---|---|---|---|---|---|
| e_rdwrbc | 0.0612 | 1.063 | 0.0546 | 1.121 | 0.2624 |
| e_opd_anlgscsTRUE | 0.842 | 2.321 | 0.4719 | 1.784 | 0.07439 |
Likelihood ratio test=3.79 on 2 df, p=0.1503818 n= 350, number of events= 52
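The full model and the model omitting e_hisp are nested and report the same n and number of events, so their likelihood ratio statistics can be differenced to gauge what e_hisp adds. This is a rough sketch rather than part of the original output: it assumes both fits used exactly the same rows, and it works from the rounded statistics printed above.

```r
# Likelihood ratio statistics (each versus the null model) as reported above.
lrt_full    <- c(stat = 4.33, df = 3)  # e_hisp + e_rdwrbc + e_opd_anlgscs
lrt_no_hisp <- c(stat = 3.79, df = 2)  # e_rdwrbc + e_opd_anlgscs

# Both statistics are measured against the same null model, so their
# difference is the likelihood ratio statistic for adding e_hisp.
delta_stat <- lrt_full[["stat"]] - lrt_no_hisp[["stat"]]
delta_df   <- lrt_full[["df"]] - lrt_no_hisp[["df"]]

# Chi-squared reference distribution with delta_df degrees of freedom.
delta_p <- pchisq(delta_stat, df = delta_df, lower.tail = FALSE)

print(c(stat = delta_stat, df = delta_df, p = delta_p))
```

The resulting p-value is roughly 0.46, in line with the Wald p-value for e_hispTRUE in the full model.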
Full model
Ethnicity-only
No Ethnicity
The following variables were used by the earlier analysis:

- v029_Hspnc_or_Ltn is the Hispanic ethnicity indicator
  - e_hisp for EMR/i2b2
  - a_hisp_naaccr for our NAACCR registry.
- v050_RDW_RBC_At_Rt_GENERIC_KUH_COMPONENT_ID_5629_numnona, the red blood cell count.
  - v113_rdw_rbc_at_rt_788_0_num in EMR/i2b2
- v037_CN_ANLGSCS, analgesics,
  - v110_opd_anlgscs (opioid) and v123_nn_opd_anlgscs (non-opioid) in EMR/i2b2
- a_cens_1 is the event indicator variable, which in this case is the earliest occurrence of a secondary neoplasm as represented by any of these variables: v003_Scndr_nrndcrn, v004_mlgnt_unspcfd, v005_rsprtr_dgstv, and v006_unspcfd_mlgnt.
  - v008_scndr_nrndcrn, v008_scndr_nrndcrn_inactive, v009_mlgnt_unspcfd, v009_mlgnt_unspcfd_inactive, v010_rsprtr_dgstv, v010_rsprtr_dgstv_inactive, v011_unspcfd_mlgnt, v011_unspcfd_mlgnt_inactive, v012_unspcfd_mlgnt, v012_unspcfd_mlgnt_inactive, v013_rsprtr_dgstv, v013_rsprtr_dgstv_inactive, v014_mlgnt_spcfd, v014_mlgnt_spcfd_inactive.
  - a_n_recur variable.
- a_dxage3 was obtained by taking the patient’s age at the earliest of: their first secondary neoplasm (i.e. a_cens_1 is equal to 1), their last followup, or 2723 days (in the old data, the last recurrence to be observed). The patient’s age at initial diagnosis was then subtracted from this quantity, so that a_dxage3 is (supposed to be) the number of days from initial diagnosis to first recurrence.
  - a_e_kc is the analytic variable in the EMR/i2b2 data that has this information.
  - a_tdiag

So, the following model should reproduce the old results:
Surv(a_emr_tdiag,a_emr_crecur == 1) ~ e_hisp + v113_rdw_rbc_at_rt_788_0_num + v110_opd_anlgscs
The following model just shows the effect of Hispanic ethnicity:
Surv(a_emr_tdiag,a_emr_crecur == 1) ~ e_hisp
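The time-to-event derivation described for a_dxage3 and the ethnicity-only model above can be sketched as follows. This is an illustration on synthetic data, not the original pipeline. The columns `age_dx`, `age_2nd`, `age_last`, `days_to_2nd`, and `days_to_last` are hypothetical names introduced here, and the 2723-day cutoff is assumed to be counted from initial diagnosis. Only `e_hisp`, `a_emr_tdiag`, and `a_emr_crecur` follow the formulas above.

```r
library(survival)

set.seed(1)
n <- 200

# Synthetic stand-in data; all ages are in days (hypothetical columns).
dat <- data.frame(
  e_hisp       = rbinom(n, 1, 0.3) == 1,
  age_dx       = round(runif(n, 40, 80) * 365.25),
  days_to_2nd  = ifelse(runif(n) < 0.4, round(rexp(n, 1 / 1500)) + 1, NA),
  days_to_last = round(runif(n, 30, 4000))
)
dat$age_2nd  <- dat$age_dx + dat$days_to_2nd   # NA if no secondary neoplasm
dat$age_last <- dat$age_dx + dat$days_to_last
max_days <- 2723  # assumed to be measured from initial diagnosis

# Age at the earliest of: first secondary neoplasm, last followup, or cutoff.
age_end <- pmin(dat$age_2nd, dat$age_last, dat$age_dx + max_days, na.rm = TRUE)

# Days from initial diagnosis to the end of observation.
dat$a_emr_tdiag <- age_end - dat$age_dx

# Event indicator: 1 only if the secondary neoplasm ended the observation.
dat$a_emr_crecur <- as.integer(!is.na(dat$age_2nd) & dat$age_2nd == age_end)

# Ethnicity-only Cox model, using the formula given in the text.
fit <- coxph(Surv(a_emr_tdiag, a_emr_crecur == 1) ~ e_hisp, data = dat)
print(summary(fit)$coefficients)
```

Using `pmin(..., na.rm = TRUE)` lets patients without a secondary neoplasm fall through to their last followup or the cutoff, so they enter the model as censored observations.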
TODO:
- TIME and EVENT placeholders in the above models using a_e_kc and the secondary tumor codes in data.R
- exploration.R in this script as well so that it’s readable

Time: 2-3 days
TODO:
Only the second model can be run – Surv(TIME,EVENT) ~ a_hisp_naaccr.
Time: 1 day
TODO:
One source should be treated as the authoritative one and back-filled from the other source when missing/invalid. What standards should we use for missing or invalid?
Time: 1 day after above two are done and this question is answered.
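One way the back-filling step could look is sketched below. Which source is authoritative and what counts as missing or invalid are still open questions in the text, so both choices here are placeholders: EMR/i2b2 is treated as primary purely for illustration, and the hypothetical `is_usable()` rule only checks for `NA`.

```r
# Hypothetical validity rule; the real standard is still to be decided.
is_usable <- function(x) !is.na(x)

# Take `primary` where it is usable, otherwise fall back to `secondary`.
backfill <- function(primary, secondary, usable = is_usable) {
  ifelse(usable(primary), primary, secondary)
}

# Toy example: EMR/i2b2 treated as authoritative, NAACCR fills the gaps.
e_hisp        <- c(TRUE, NA, FALSE, NA, NA)
a_hisp_naaccr <- c(TRUE, TRUE, TRUE, FALSE, NA)

hisp <- backfill(e_hisp, a_hisp_naaccr)
print(hisp)
```

Passing the validity rule as an argument keeps the merge logic unchanged once a stricter definition of invalid is agreed on. Values missing from both sources stay `NA`.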
TODO:
Time: 1-2 days
The results are saved and available for use by other scripts if you place 'disparity.R' among the values in their .deps variables.
UT Health San Antonio