## # A tibble: 6 × 44
## tx_date age_group muh_number full_name date_of_birth sex
## <dttm> <chr> <chr> <chr> <dttm> <chr>
## 1 2022-01-12 00:00:00 Adult 7756425 Bellamy, J… 1949-05-14 00:00:00 Fema…
## 2 2022-01-14 00:00:00 Adult 7774482 Pitt, Brys… 1998-05-11 00:00:00 Male
## 3 2022-01-21 00:00:00 Adult 953678 Klecha, Mi… 1951-11-02 00:00:00 Fema…
## 4 2022-01-25 00:00:00 Adult 7750707 Brentin, R… 1954-04-24 00:00:00 Male
## 5 2022-01-28 00:00:00 Adult 5842909 Smith, Mat… 1959-12-11 00:00:00 Male
## 6 2022-02-23 00:00:00 Adult 7451324 Gardner, J… 1951-05-08 00:00:00 Male
## # ℹ 38 more variables: diagnosis <chr>, subdiagnosis <chr>, bmt_status <chr>,
## # type_at_bmt <chr>, sub_type_at_bmt <chr>, date_of_dx <dttm>, outpt <lgl>,
## # conditioning_at_bmt <chr>, protocol_at_bmt <chr>, status_at_tx <chr>,
## # current_status_of_disease <chr>, anc_date <dttm>, anc_comment <chr>,
## # platelet_date <dttm>, platelet_comment_50 <chr>, rfi_classification <chr>,
## # hla <chr>, acute_gvhd <lgl>, acute_gvhd_peak <dbl>, chronic_gvhd <lgl>,
## # chronic_gvhd_peak <chr>, cmv_pt <lgl>, cmv_donor <lgl>, …
In this section, I count the number of allogeneic transplants per
year. I excluded 2025 from the mean calculation because the year is
incomplete and would artificially lower the average. I then show
histograms of age and CMI (comorbidity burden) score to give a general
picture of how each is distributed.
## # A tibble: 4 × 2
## Tx_Year n
## <chr> <int>
## 1 2022 54
## 2 2023 48
## 3 2024 60
## 4 2025 38
## [1] 54
## **Median age at transplant:** 62.47632
## **Median CMI score:** 2
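The mean of 54 reported above can be reproduced directly from the yearly counts in the table. The sketch below uses only those four printed counts; the variable names are illustrative and not taken from the original analysis code.

```r
# Yearly allogeneic transplant counts, copied from the table above
tx_per_year <- c("2022" = 54, "2023" = 48, "2024" = 60, "2025" = 38)

# Drop the incomplete year before averaging
complete_years <- tx_per_year[names(tx_per_year) != "2025"]

mean_complete <- mean(complete_years)  # mean over 2022-2024
mean_all      <- mean(tx_per_year)     # mean if 2025 were included

print(c(complete = mean_complete, all = mean_all))
```

Including the partial 2025 count pulls the average from 54 down to 50, which is the distortion the exclusion avoids.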
In this section, I looked at whether patient age at the time of
transplant was associated with survival at one year. I used a boxplot to
compare the age distributions between patients who were alive at one
year and those who were not. A Wilcoxon rank-sum test (wilcox.test) was
then applied to formally test whether age differed significantly between
the two groups.
##
## Wilcoxon rank sum test with continuity correction
##
## data: age_at_tx by alive1year_recoded
## W = 2114, p-value = 0.1801
## alternative hypothesis: true location shift is not equal to 0
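As a minimal sketch of this comparison, the block below runs wilcox.test on simulated data. The variable names match the test output above, but the values and the group labels ("Alive", "Not alive") are made up for illustration and do not come from the study data.

```r
set.seed(1)

# Simulated ages for two outcome groups (invented values)
toy <- data.frame(
  age_at_tx          = c(rnorm(30, mean = 60, sd = 10), rnorm(10, mean = 64, sd = 10)),
  alive1year_recoded = factor(rep(c("Alive", "Not alive"), times = c(30, 10)))
)

# Two-sided Wilcoxon rank-sum test of age by one-year outcome
res <- wilcox.test(age_at_tx ~ alive1year_recoded, data = toy)

print(res$statistic)  # W statistic
print(res$p.value)
```

With the real data, the reported p-value of 0.1801 means the test did not detect a significant difference in age between the two groups at the conventional 0.05 level.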
Next, I examined the relationship between comorbidity burden (CMI)
and survival at one year. The plot below shows the proportion of
patients alive at one year, stratified by CMI category. Each bar
represents the distribution of outcomes within a CMI group, with counts
displayed inside the bars for clarity.
To combine the effects of age at transplant and comorbidity burden (CMI), I created a simple composite “risk score,” defined as patient age plus five times the CMI value, which gives extra weight to comorbidities. I then evaluated how well this combined score discriminated between patients who survived to one year and those who did not.
The ROC curve above shows how well the combined risk score separates patients by 1-year outcome. The area under the curve (AUC) summarizes discrimination: 0.5 corresponds to chance, and values closer to 1.0 indicate stronger predictive ability. I also extracted the best cutoff value, along with its sensitivity and specificity, to identify a threshold that could classify patients into higher- vs. lower-risk groups.
## Area under the curve: 0.6099
## threshold sensitivity specificity
## 1 67.06108 0.4672897 0.8064516
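The score and its ROC summary can be illustrated without extra packages. The following is a base-R sketch on made-up toy data, not the code that produced the output above. It computes the AUC as the proportion of (event, non-event) pairs in which the patient with the event has the higher score, and picks the cutoff that maximizes Youden's J (sensitivity + specificity - 1). The use of Youden's J as the "best" criterion and the outcome coding (1 = not alive at one year) are assumptions.

```r
# Toy data (invented for illustration)
age  <- c(55, 62, 70, 48, 66, 73, 59, 68)
cmi  <- c(1, 3, 4, 0, 2, 5, 2, 1)
died <- c(0, 0, 1, 0, 0, 1, 1, 0)  # assumed coding: 1 = not alive at one year

# Composite score as defined in the text: age plus five times CMI
risk_score <- age + 5 * cmi

cases    <- risk_score[died == 1]
controls <- risk_score[died == 0]

# AUC: share of case/control pairs ranked correctly, ties count one half
auc <- mean(outer(cases, controls, ">") + 0.5 * outer(cases, controls, "=="))

# Try every observed score as a cutoff (score >= cutoff means "high risk")
cutoffs     <- sort(unique(risk_score))
sensitivity <- sapply(cutoffs, function(k) mean(cases >= k))
specificity <- sapply(cutoffs, function(k) mean(controls < k))
best        <- which.max(sensitivity + specificity - 1)

print(auc)
print(data.frame(threshold   = cutoffs[best],
                 sensitivity = sensitivity[best],
                 specificity = specificity[best]))
```

Dedicated ROC packages typically place candidate thresholds midway between observed scores, so their reported cutoff can differ slightly from one chosen among the observed values.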
## Call: survfit(formula = surv_obj ~ 1, data = surv_data)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 12 14 33 0.835 0.0262 0.785 0.888
## Call: survfit(formula = surv_obj ~ age_group3, data = surv_data)
##
## age_group3=<65
## time n.risk n.event survival std.err lower 95% CI
## 12.0000 8.0000 14.0000 0.8772 0.0307 0.8190
## upper 95% CI
## 0.9396
##
## age_group3=65–74
## time n.risk n.event survival std.err lower 95% CI
## 12.0000 4.0000 19.0000 0.7595 0.0481 0.6709
## upper 95% CI
## 0.8598
##
## age_group3=≥75
## time n.risk n.event survival std.err lower 95% CI
## 12 2 0 1 0 1
## upper 95% CI
## 1
## Call: survfit(formula = surv_obj ~ cmi_group3, data = surv_data)
##
## 11 observations deleted due to missingness
## cmi_group3=<3
## time n.risk n.event survival std.err lower 95% CI
## 12.0000 4.0000 15.0000 0.8718 0.0309 0.8133
## upper 95% CI
## 0.9345
##
## cmi_group3=3–6
## time n.risk n.event survival std.err lower 95% CI
## 12.0000 6.0000 15.0000 0.7619 0.0537 0.6637
## upper 95% CI
## 0.8747
##
## cmi_group3=≥6
## time n.risk n.event survival std.err lower 95% CI
## 12.000 3.000 1.000 0.889 0.105 0.706
## upper 95% CI
## 1.000
## Call: survfit(formula = surv_obj ~ risk67, data = surv_data)
##
## 11 observations deleted due to missingness
## risk67=High (≥67)
## time n.risk n.event survival std.err lower 95% CI
## 12.0000 11.0000 25.0000 0.7706 0.0403 0.6956
## upper 95% CI
## 0.8538
##
## risk67=Low (<67)
## time n.risk n.event survival std.err lower 95% CI
## 12.0000 2.0000 6.0000 0.9250 0.0294 0.8690
## upper 95% CI
## 0.9846
The cutoff of 80 used below is derived later in this report: it is the threshold that keeps the number of selected patients to roughly 15 per year, since the cutoff of 67 selects too many patients.
## Call: survfit(formula = surv_obj ~ risk80, data = surv_data)
##
## 11 observations deleted due to missingness
## risk80=High (≥80)
## time n.risk n.event survival std.err lower 95% CI
## 12.0000 8.0000 13.0000 0.7679 0.0564 0.6649
## upper 95% CI
## 0.8868
##
## risk80=Low (<80)
## time n.risk n.event survival std.err lower 95% CI
## 12.0000 5.0000 18.0000 0.8647 0.0297 0.8084
## upper 95% CI
## 0.9248
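The 12-month estimates in the survfit outputs above are Kaplan-Meier survival probabilities. A minimal sketch of how such an estimate can be obtained with the survival package follows. The toy follow-up times (in months) and event indicators are invented, and capping follow-up at 12 months is an assumption about how the analysis was set up.

```r
library(survival)

# Toy data: follow-up in months, event = 1 if the patient died, 0 if censored
toy <- data.frame(
  time  = c(3, 6, 8, 12, 12, 12, 12, 12),
  event = c(1, 1, 1, 0, 0, 0, 0, 0)
)

# Overall Kaplan-Meier fit, then the estimate at 12 months
fit   <- survfit(Surv(time, event) ~ 1, data = toy)
at_12 <- summary(fit, times = 12)

print(at_12$surv)     # estimated probability of surviving to 12 months
print(at_12$n.event)  # number of deaths up to 12 months
```

Replacing ~ 1 with a grouping variable such as age_group3 or risk80 gives one estimate per stratum, as in the outputs above.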
Next, I analyzed whether length of stay (LOS) was associated with age or CMI score.
## 6.1 Create LOS Dataset
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 12.00 18.00 20.00 24.83 26.00 100.00
##
## 0 1
## 154 30
To explore whether age at transplant and comorbidity burden (CMI) were associated with length of stay (LOS), I fit a multiple linear regression model. This model estimates how much LOS changes, on average, for each unit increase in age or CMI, while holding the other factor constant.
##
## Call:
## lm(formula = LOS_days ~ age_at_tx + cmi, data = data_los)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.358 -6.517 -4.110 0.696 73.782
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.661643 3.785091 5.987 1.12e-08 ***
## age_at_tx 0.008531 0.063175 0.135 0.893
## cmi 0.812862 0.516829 1.573 0.118
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.04 on 181 degrees of freedom
## Multiple R-squared: 0.01392, Adjusted R-squared: 0.003026
## F-statistic: 1.278 on 2 and 181 DF, p-value: 0.2812
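To make the "holding the other factor constant" interpretation concrete, the sketch below fits the same model formula to simulated data (all values invented) and checks that raising CMI by one point at a fixed age changes the predicted LOS by exactly the CMI coefficient.

```r
set.seed(42)
n <- 50

# Simulated patients; none of these values come from the study
toy <- data.frame(
  age_at_tx = runif(n, 40, 78),
  cmi       = sample(0:7, n, replace = TRUE)
)
toy$LOS_days <- 20 + 0.8 * toy$cmi + rnorm(n, sd = 5)

fit <- lm(LOS_days ~ age_at_tx + cmi, data = toy)

# Predicted LOS for two patients of the same age who differ by one CMI point
pred <- predict(fit, newdata = data.frame(age_at_tx = c(65, 65), cmi = c(2, 3)))

print(coef(fit))
print(diff(pred))  # equals the fitted cmi coefficient
```

In the fitted model above, neither coefficient is distinguishable from zero (p = 0.893 for age, p = 0.118 for CMI), and the model explains about 1.4% of the variance in LOS.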
## Area under the curve: 0.547
## threshold sensitivity specificity
## 1 67.65443 0.7 0.4675325
##
## 0 1
## High (≥80) 44 9
## Low (<80) 110 21
##
## Fisher's Exact Test for Count Data
##
## data: tbl_los80
## p-value = 1
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.3739103 2.5034470
## sample estimates:
## odds ratio
## 0.9336884
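The test above can be re-run from the printed 2×2 table alone. In the sketch below the counts are copied from that table. The column labels 0 and 1 are kept as printed; they match the binary indicator tabulated in the LOS dataset summary (154 and 30), whose definition is not stated in this section. The risk-group labels use ">=" in place of "≥" to keep the code ASCII-only.

```r
# Counts copied from the table above: rows = risk group, columns = indicator 0/1
tbl_los80 <- matrix(
  c(44, 110,   # column "0": High, Low
     9,  21),  # column "1": High, Low
  nrow = 2,
  dimnames = list(risk80 = c("High (>=80)", "Low (<80)"), indicator = c("0", "1"))
)

res <- fisher.test(tbl_los80)

print(res$p.value)
print(res$estimate)  # conditional maximum-likelihood odds ratio
print(res$conf.int)
```

The odds ratio is close to 1 and its confidence interval (0.37 to 2.50) spans 1, so the table gives no evidence that the indicator differs between the two risk groups.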
We worked backwards from a capacity of ~15 patients per year to determine a cutoff on the combined RiskScore = Age + 5×CMI.
## Cutoff MeanPerYear
## 1 60 33.25
## 2 65 28.75
## 3 70 25.00
## 4 75 18.75
## 5 80 14.00
## 6 85 7.50
## 7 90 3.75
## 8 95 2.25
## 9 100 1.25
## 10 105 1.00
## 11 110 1.00
## 12 115 NaN
## 13 120 NaN
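The cutoff search can be sketched as follows, assuming selection means RiskScore >= cutoff and that MeanPerYear is the selected count averaged across transplant years. The toy data and the helper name mean_per_year are hypothetical.

```r
# Toy data (invented): one row per transplant
toy <- data.frame(
  Tx_Year   = c("2022", "2022", "2022", "2023", "2023", "2024", "2024", "2024"),
  age_at_tx = c(72, 58, 66, 70, 49, 75, 61, 68),
  cmi       = c(2, 5, 1, 0, 3, 4, 2, 6)
)
toy$risk_score <- toy$age_at_tx + 5 * toy$cmi

# Hypothetical helper: average number of patients per year at or above a cutoff
mean_per_year <- function(df, cutoff) {
  years    <- sort(unique(df$Tx_Year))
  selected <- df$Tx_Year[df$risk_score >= cutoff]
  per_year <- table(factor(selected, levels = years))  # empty years count as 0
  mean(as.vector(per_year))
}

cutoffs <- seq(60, 100, by = 10)
result  <- data.frame(
  Cutoff      = cutoffs,
  MeanPerYear = sapply(cutoffs, function(k) mean_per_year(toy, k))
)
print(result)
```

The NaN rows in the table above are what R returns for the mean of an empty set, which would happen if years with no selected patients drop out of the average. Keeping every year as a factor level, as done here, makes such cutoffs report 0 instead; whether empty years should count as zero is a design choice.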
# Identifying Patients for Geriatrics Evaluation Using Age Only
We worked backwards from a capacity of ~15 patients per year to determine an age cutoff. I then tested three methods for identifying the highest-risk patients based on mortality: (1) age alone, (2) the combined age + comorbidity score, and (3) age > 70 OR CMI > 5.
## Cutoff MeanPerYear
## 1 50 35.00
## 2 51 34.50
## 3 52 34.25
## 4 53 33.50
## 5 54 33.00
## 6 55 32.50
## 7 56 31.50
## 8 57 30.75
## 9 58 30.25
## 10 59 29.75
## 11 60 29.25
## 12 61 27.25
## 13 62 25.50
## 14 63 24.50
## 15 64 23.00
## 16 65 21.50
## 17 66 20.00
## 18 67 18.25
## 19 68 16.50
## 20 69 14.75
## 21 70 11.00
## 22 71 7.50
## 23 72 5.75
## 24 73 4.25
## 25 74 3.25
## 26 75 1.75
## 27 76 1.50
## 28 77 1.25
## 29 78 1.00
## 30 79 NaN
## 31 80 NaN
We tested a combined rule: select patients if age_at_tx > 70 OR cmi > 5.
##
## Not Selected Selected
## 141 48
## # A tibble: 4 × 3
## Tx_Year `Not Selected` Selected
## <chr> <int> <int>
## 1 2022 34 15
## 2 2023 37 10
## 3 2024 45 15
## 4 2025 25 8
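A minimal sketch of this rule on invented data is shown below. It also illustrates how R treats a missing CMI: TRUE | NA is TRUE, but FALSE | NA is NA, so a patient aged 70 or younger with no CMI recorded is neither selected nor rejected. The totals above (141 + 48 = 189, against 200 transplants in the yearly counts) are consistent with the 11 patients with missing CMI having been dropped, though that is an inference from the printed numbers.

```r
# Toy data (invented); the fourth and fifth patients have no CMI recorded
toy <- data.frame(
  Tx_Year   = c("2022", "2022", "2023", "2023", "2024", "2024"),
  age_at_tx = c(72, 60, 66, 74, 58, 69),
  cmi       = c(2, 6, 3, NA, NA, 5)
)

# Combined rule: selected if age > 70 OR CMI > 5
flag <- toy$age_at_tx > 70 | toy$cmi > 5
toy$selected <- ifelse(flag, "Selected", "Not Selected")

print(toy)

# table() drops NA by default, so the unclassifiable patient is left out
counts <- table(toy$Tx_Year, toy$selected)
print(counts)
```

If patients with missing CMI should instead be judged on age alone, one option is to replace the NA comparison with FALSE before combining the two conditions.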
We compared two rules for selecting patients for geriatrics
evaluation:
1. Age > 70
2. RiskScore ≥ 80 (Age + 5×CMI)
## # A tibble: 4 × 3
## Tx_Year `Not Selected` Selected
## <chr> <int> <int>
## 1 2022 40 14
## 2 2023 40 8
## 3 2024 47 13
## 4 2025 29 9
## # A tibble: 4 × 3
## Tx_Year `Not Selected` Selected
## <chr> <int> <int>
## 1 2022 33 16
## 2 2023 33 14
## 3 2024 45 15
## 4 2025 22 11
## # A tibble: 12 × 3
## Tx_Year Rule n_selected
## <chr> <chr> <dbl>
## 1 2022 Age >70 14
## 2 2023 Age >70 8
## 3 2024 Age >70 13
## 4 2025 Age >70 9
## 5 2022 RiskScore ≥80 16
## 6 2023 RiskScore ≥80 14
## 7 2024 RiskScore ≥80 15
## 8 2025 RiskScore ≥80 11
## 9 2022 Age >70 OR CMI >5 15
## 10 2023 Age >70 OR CMI >5 10
## 11 2024 Age >70 OR CMI >5 15
## 12 2025 Age >70 OR CMI >5 8
## Average Number of Patients per Year (Summary)
## # A tibble: 3 × 4
## Rule Avg_per_year Min_per_year Max_per_year
## <chr> <dbl> <dbl> <dbl>
## 1 Age >70 11 8 14
## 2 Age >70 OR CMI >5 12 8 15
## 3 RiskScore ≥80 14 11 16
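The summary table can be rebuilt from the per-year counts printed above. The sketch below copies those twelve counts and computes the mean, minimum, and maximum per rule with base R. The object names are illustrative, and the "≥" in the rule label is written as ">=" to keep the code ASCII-only.

```r
# Per-year selected counts, copied from the 12-row table above
counts <- data.frame(
  Tx_Year    = rep(c("2022", "2023", "2024", "2025"), times = 3),
  Rule       = rep(c("Age >70", "RiskScore >=80", "Age >70 OR CMI >5"), each = 4),
  n_selected = c(14,  8, 13,  9,   # Age >70
                 16, 14, 15, 11,   # RiskScore >=80
                 15, 10, 15,  8)   # Age >70 OR CMI >5
)

# One summary value per rule; tapply keeps the same rule order in each call
avg <- tapply(counts$n_selected, counts$Rule, mean)
mn  <- tapply(counts$n_selected, counts$Rule, min)
mx  <- tapply(counts$n_selected, counts$Rule, max)

summary_by_rule <- data.frame(
  Rule         = names(avg),
  Avg_per_year = as.vector(avg),
  Min_per_year = as.vector(mn),
  Max_per_year = as.vector(mx)
)
print(summary_by_rule)
```

These averages include the partial 2025 counts, unlike the transplant-volume mean earlier in the report, so they may understate a full year's workload.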