1 Setup

2 1. Data Preparation

## # A tibble: 6 × 44
##   tx_date             age_group muh_number full_name   date_of_birth       sex  
##   <dttm>              <chr>     <chr>      <chr>       <dttm>              <chr>
## 1 2022-01-12 00:00:00 Adult     7756425    Bellamy, J… 1949-05-14 00:00:00 Fema…
## 2 2022-01-14 00:00:00 Adult     7774482    Pitt, Brys… 1998-05-11 00:00:00 Male 
## 3 2022-01-21 00:00:00 Adult     953678     Klecha, Mi… 1951-11-02 00:00:00 Fema…
## 4 2022-01-25 00:00:00 Adult     7750707    Brentin, R… 1954-04-24 00:00:00 Male 
## 5 2022-01-28 00:00:00 Adult     5842909    Smith, Mat… 1959-12-11 00:00:00 Male 
## 6 2022-02-23 00:00:00 Adult     7451324    Gardner, J… 1951-05-08 00:00:00 Male 
## # ℹ 38 more variables: diagnosis <chr>, subdiagnosis <chr>, bmt_status <chr>,
## #   type_at_bmt <chr>, sub_type_at_bmt <chr>, date_of_dx <dttm>, outpt <lgl>,
## #   conditioning_at_bmt <chr>, protocol_at_bmt <chr>, status_at_tx <chr>,
## #   current_status_of_disease <chr>, anc_date <dttm>, anc_comment <chr>,
## #   platelet_date <dttm>, platelet_comment_50 <chr>, rfi_classification <chr>,
## #   hla <chr>, acute_gvhd <lgl>, acute_gvhd_peak <dbl>, chronic_gvhd <lgl>,
## #   chronic_gvhd_peak <chr>, cmv_pt <lgl>, cmv_donor <lgl>, …

3 Summary Statistics - Allo

In this section, I count the number of allogeneic transplants per year. I excluded 2025 from the mean calculation because the year is incomplete and would artificially lower the average, so the mean shown below covers 2022 to 2024. I then show histograms of age and CMI to give a general picture of how each is distributed.

## # A tibble: 4 × 2
##   Tx_Year     n
##   <chr>   <int>
## 1 2022       54
## 2 2023       48
## 3 2024       60
## 4 2025       38

## [1] 54
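The mean above can be reproduced from the yearly counts in the table. The following is a minimal sketch in base R with the counts copied from that output; the object names (`tx_counts`, `mean_complete`, `mean_all`) are made up for illustration, since the report does not show the original code.

```r
# Yearly allogeneic transplant counts, copied from the table above.
tx_counts <- c("2022" = 54, "2023" = 48, "2024" = 60, "2025" = 38)

# Drop the incomplete year before averaging.
complete_years <- tx_counts[names(tx_counts) != "2025"]
mean_complete <- mean(complete_years)   # (54 + 48 + 60) / 3 = 54

# Including 2025 would pull the average down.
mean_all <- mean(tx_counts)             # 200 / 4 = 50
```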

## **Median age at transplant:** 62.47632

## **Median CMI score:** 2

4 2. Age Analysis

4.1 2.1 Histogram of Age

4.2 2.2 Age vs Outcome at 1 Year

In this section, I looked at whether patient age at the time of transplant was associated with survival at one year. I used a boxplot to compare the age distributions of patients who were alive at one year and those who were not, and applied the Wilcoxon rank-sum test (wilcox.test) to formally test for a difference in age between the two groups.

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  age_at_tx by alive1year_recoded
## W = 2114, p-value = 0.1801
## alternative hypothesis: true location shift is not equal to 0
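For readers less familiar with the test, a call of roughly the following shape produces output like the above. This is only a sketch on made-up toy data, not the study data. The column names `age_at_tx` and `alive1year_recoded` come from the output; the group labels and all values are assumptions.

```r
# Toy data for illustration only (not the study data).
toy <- data.frame(
  age_at_tx = c(48, 55, 59, 61, 63, 64, 66, 67, 70, 72),
  alive1year_recoded = factor(c("Alive", "Alive", "Alive", "Dead", "Alive",
                                "Alive", "Dead", "Alive", "Dead", "Dead"))
)

# Two-sided Wilcoxon rank-sum test of age between the two outcome groups.
# exact = FALSE requests the normal approximation with continuity correction,
# which matches the test named in the output header.
wt <- wilcox.test(age_at_tx ~ alive1year_recoded, data = toy, exact = FALSE)

wt$statistic  # W: rank-sum statistic for the first factor level
wt$p.value    # p-value for a location shift between the groups
```

The test compares ranks instead of means, so it does not assume that age is normally distributed within either group.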

5 3. CMI Analysis

5.1 3.1 Distribution of CMI

5.2 3.2 1-Year Outcome by CMI

Next, I examined the relationship between comorbidity burden (CMI) and survival at one year. The plot below shows the proportion of patients alive at one year, stratified by CMI category. Each bar represents the distribution of outcomes within a CMI group, with counts displayed inside the bars for clarity.

6 4. Combined Risk Score Analysis

To combine the effects of age at transplant and comorbidity burden (CMI), I created a simple composite “risk score,” defined as patient age plus five times the CMI value, which gives extra weight to comorbidities. I then evaluated how well this score discriminated between patients who survived to one year and those who did not. The ROC curve shows the trade-off between sensitivity and specificity across all possible cutoffs, and the area under the curve (AUC) summarizes discrimination: 0.5 corresponds to chance and values closer to 1.0 indicate stronger predictive ability. I also extracted the best cutoff value, along with its sensitivity and specificity, to identify a threshold that could classify patients into higher- and lower-risk groups.

## Area under the curve: 0.6099

##   threshold sensitivity specificity
## 1  67.06108   0.4672897   0.8064516
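The score, the AUC, and a "best" cutoff can be sketched in base R as follows. This is an illustration on made-up toy data, not the original code: the layout of the output above resembles the pROC package, but that is an assumption. The coding of `died`, the rule "score at or above the cutoff means high risk", and the use of Youden's J to define "best" are all assumptions as well, since the report does not state which criterion was used.

```r
# Toy data for illustration only (not the study data).
age  <- c(45, 52, 60, 63, 66, 68, 70, 72)
cmi  <- c(0, 1, 2, 0, 3, 4, 3, 5)
died <- c(0, 0, 0, 0, 1, 0, 1, 1)   # assumed coding: 1 = died within 1 year

# Composite score as defined in the text: age plus five times CMI.
risk_score <- age + 5 * cmi

# AUC via the rank (Mann-Whitney) identity: the probability that a randomly
# chosen patient who died has a higher score than one who survived.
n_pos <- sum(died == 1)
n_neg <- sum(died == 0)
auc <- (sum(rank(risk_score)[died == 1]) - n_pos * (n_pos + 1) / 2) /
  (n_pos * n_neg)

# Sensitivity and specificity when "score >= cutoff" is called high risk.
cutoffs <- sort(unique(risk_score))
sens <- sapply(cutoffs, function(k) mean(risk_score[died == 1] >= k))
spec <- sapply(cutoffs, function(k) mean(risk_score[died == 0] <  k))

# "Best" cutoff by Youden's J = sensitivity + specificity - 1.
best <- which.max(sens + spec - 1)
data.frame(threshold   = cutoffs[best],
           sensitivity = sens[best],
           specificity = spec[best])
```

This sketch evaluates cutoffs only at observed score values. Dedicated ROC software may instead place thresholds between adjacent values, which would explain a non-integer threshold such as the one reported above.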

7 5. Survival Analyses (Kaplan–Meier)

7.1 5.1 Prepare Data

7.2 5.2 Overall Survival

## Call: survfit(formula = surv_obj ~ 1, data = surv_data)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##    12     14      33    0.835  0.0262        0.785        0.888
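The call shown in the output can be illustrated with a small self-contained example. This is a sketch on made-up toy data using the `survival` package; the column names `months` and `died` are hypothetical, and the original definitions of `surv_obj` and `surv_data` are not shown in this report.

```r
library(survival)  # recommended package shipped with standard R installs

# Toy follow-up data for illustration only (not the study data).
toy <- data.frame(
  months = c(2, 5, 8, 12, 14, 15, 18, 20),
  died   = c(1, 0, 1, 0, 0, 1, 0, 0)   # 1 = death observed, 0 = censored
)

# Kaplan-Meier estimate of overall survival (no grouping variable).
fit <- survfit(Surv(months, died) ~ 1, data = toy)

# Survival estimate at 12 months, with its standard error and 95% CI.
s12 <- summary(fit, times = 12)
s12$surv   # estimated probability of surviving past 12 months
```

When `times` is supplied, `summary()` should report `n.event` as the number of events accumulated up to that time, not the events at that exact time. That would explain why `n.event` exceeds `n.risk` in the tables in this section.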

7.3 5.3 By Age

## Call: survfit(formula = surv_obj ~ age_group3, data = surv_data)
## 
##                 age_group3=<65 
##         time       n.risk      n.event     survival      std.err lower 95% CI 
##      12.0000       8.0000      14.0000       0.8772       0.0307       0.8190 
## upper 95% CI 
##       0.9396 
## 
##                 age_group3=65–74 
##         time       n.risk      n.event     survival      std.err lower 95% CI 
##      12.0000       4.0000      19.0000       0.7595       0.0481       0.6709 
## upper 95% CI 
##       0.8598 
## 
##                 age_group3=≥75 
##         time       n.risk      n.event     survival      std.err lower 95% CI 
##           12            2            0            1            0            1 
## upper 95% CI 
##            1

7.4 5.4 By CMI

## Call: survfit(formula = surv_obj ~ cmi_group3, data = surv_data)
## 
## 11 observations deleted due to missingness 
##                 cmi_group3=<3 
##         time       n.risk      n.event     survival      std.err lower 95% CI 
##      12.0000       4.0000      15.0000       0.8718       0.0309       0.8133 
## upper 95% CI 
##       0.9345 
## 
##                 cmi_group3=3–6 
##         time       n.risk      n.event     survival      std.err lower 95% CI 
##      12.0000       6.0000      15.0000       0.7619       0.0537       0.6637 
## upper 95% CI 
##       0.8747 
## 
##                 cmi_group3=≥6 
##         time       n.risk      n.event     survival      std.err lower 95% CI 
##       12.000        3.000        1.000        0.889        0.105        0.706 
## upper 95% CI 
##        1.000

7.5 5.5 By Risk Score ≥67

## Call: survfit(formula = surv_obj ~ risk67, data = surv_data)
## 
## 11 observations deleted due to missingness 
##                 risk67=High (≥67) 
##         time       n.risk      n.event     survival      std.err lower 95% CI 
##      12.0000      11.0000      25.0000       0.7706       0.0403       0.6956 
## upper 95% CI 
##       0.8538 
## 
##                 risk67=Low (<67) 
##         time       n.risk      n.event     survival      std.err lower 95% CI 
##      12.0000       2.0000       6.0000       0.9250       0.0294       0.8690 
## upper 95% CI 
##       0.9846

7.6 5.6 By Risk Score ≥80

This cutoff is derived in a later section. It is the threshold that keeps the number of selected patients to about 15 per year; the cutoff of 67 selects too many patients for that capacity.

## Call: survfit(formula = surv_obj ~ risk80, data = surv_data)
## 
## 11 observations deleted due to missingness 
##                 risk80=High (≥80) 
##         time       n.risk      n.event     survival      std.err lower 95% CI 
##      12.0000       8.0000      13.0000       0.7679       0.0564       0.6649 
## upper 95% CI 
##       0.8868 
## 
##                 risk80=Low (<80) 
##         time       n.risk      n.event     survival      std.err lower 95% CI 
##      12.0000       5.0000      18.0000       0.8647       0.0297       0.8084 
## upper 95% CI 
##       0.9248

8 6. Length of Stay (LOS) Analysis

Next, I analyzed whether LOS was associated with age or CMI score.

6.1 Create LOS Dataset

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.00   18.00   20.00   24.83   26.00  100.00

## 
##   0   1 
## 154  30

8.1 6.2 Distribution of LOS

8.2 6.3 Regression of LOS on Age + CMI

To explore whether age at transplant and comorbidity burden (CMI) were associated with length of stay (LOS), I fit a multiple linear regression model. This model estimates how much LOS changes, on average, for each unit increase in age or CMI, while holding the other factor constant.

## 
## Call:
## lm(formula = LOS_days ~ age_at_tx + cmi, data = data_los)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.358  -6.517  -4.110   0.696  73.782 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 22.661643   3.785091   5.987 1.12e-08 ***
## age_at_tx    0.008531   0.063175   0.135    0.893    
## cmi          0.812862   0.516829   1.573    0.118    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.04 on 181 degrees of freedom
## Multiple R-squared:  0.01392,    Adjusted R-squared:  0.003026 
## F-statistic: 1.278 on 2 and 181 DF,  p-value: 0.2812
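To make the coefficients concrete, the fitted equation can be evaluated by hand. The sketch below copies the estimates from the output above; the example patient (age 65, CMI 3) is hypothetical and chosen only for illustration.

```r
# Coefficients copied from the regression output above.
b0    <- 22.661643   # intercept
b_age <- 0.008531    # days of LOS per additional year of age
b_cmi <- 0.812862    # days of LOS per additional CMI point

# Predicted LOS for a hypothetical 65-year-old patient with CMI = 3.
pred_65_3 <- b0 + b_age * 65 + b_cmi * 3   # about 25.65 days

# Holding CMI fixed, ten more years of age changes the prediction by:
diff_10y <- b_age * 10                     # about 0.085 days
```

Neither coefficient is statistically significant in the output (p = 0.893 for age, p = 0.118 for CMI), and the model explains about 1% of the variance in LOS, so such predictions stay close to the overall mean.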

8.3 6.4 LOS by Age Decade

8.4 6.5 ROC Curve for LOS >30 Days

## Area under the curve: 0.547

##   threshold sensitivity specificity
## 1  67.65443         0.7   0.4675325

8.5 6.6 Outcome by Fixed Cutoff (≥80)

##             
##                0   1
##   High (≥80)  44   9
##   Low (<80)  110  21

## 
##  Fisher's Exact Test for Count Data
## 
## data:  tbl_los80
## p-value = 1
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.3739103 2.5034470
## sample estimates:
## odds ratio 
##  0.9336884
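The test above can be re-run directly from the printed 2×2 table. In this sketch the counts are copied from the output; the reading of the columns as the LOS > 30 days indicator (0 = no, 1 = yes) is an assumption based on the preceding section, and the row labels use `>=` in place of `≥` only to keep the code ASCII.

```r
# 2x2 table copied from the output above: rows are the risk-score group,
# columns are the outcome indicator (assumed: 0 = LOS <= 30 days, 1 = LOS > 30).
tbl_los80 <- matrix(c(44, 110, 9, 21), nrow = 2,
                    dimnames = list(c("High (>=80)", "Low (<80)"),
                                    c("0", "1")))

ft <- fisher.test(tbl_los80)
ft$p.value    # two-sided p-value
ft$estimate   # conditional maximum-likelihood odds ratio

# Proportion with the outcome in each group: 9/53 vs 21/131.
prop_long <- tbl_los80[, "1"] / rowSums(tbl_los80)
```

The two proportions are nearly identical (roughly 17% and 16%), which is consistent with an odds ratio close to 1 and a p-value of 1.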

9 Identifying Patients for Geriatrics Evaluation (Capacity ≈15 per Year)

We worked backwards from a capacity of ~15 patients per year to determine a cutoff on the combined RiskScore = Age + 5×CMI.

9.1 Calculate RiskScore and Test Cutoffs

##    Cutoff MeanPerYear
## 1      60       33.25
## 2      65       28.75
## 3      70       25.00
## 4      75       18.75
## 5      80       14.00
## 6      85        7.50
## 7      90        3.75
## 8      95        2.25
## 9     100        1.25
## 10    105        1.00
## 11    110        1.00
## 12    115         NaN
## 13    120         NaN
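The cutoff search can be sketched as follows. This is an illustration on made-up toy data, not the original code, and the helper `mean_per_year()` and the capacity of 1.5 are hypothetical. One deliberate difference from the table above: this version keeps years with zero selected patients in the average, so a cutoff above every score gives 0. The `NaN` rows above are what R returns for the mean of an empty set of counts.

```r
# Toy data for illustration only (not the study data).
year  <- c(2022, 2022, 2022, 2023, 2023, 2024, 2024, 2024)
score <- c(62, 81, 90, 70, 85, 55, 79, 101)   # RiskScore = Age + 5 * CMI

# Mean number of patients per year with score >= cutoff.
# Converting year to a factor keeps years with zero selected patients in the
# average, so an out-of-range cutoff yields 0 rather than NaN.
mean_per_year <- function(score, year, cutoff) {
  year <- factor(year)
  counts <- table(year[which(score >= cutoff)])
  mean(as.numeric(counts))
}

cutoffs <- c(60, 75, 90, 120)
capacity_tbl <- data.frame(
  Cutoff = cutoffs,
  MeanPerYear = sapply(cutoffs, function(k) mean_per_year(score, year, k))
)

# Lowest cutoff whose yearly average fits a hypothetical capacity of 1.5.
capacity <- 1.5
chosen <- min(capacity_tbl$Cutoff[capacity_tbl$MeanPerYear <= capacity])
```

Applied to the table above with a capacity of about 15, the same rule picks 80, the first cutoff with a mean at or below 15 (14.00 per year).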

9.2 Plot: Average Patients per Year vs Cutoff

9.3 Scatterplot: Patients Selected (Cutoff Example ≥80)

Identifying Patients for Geriatrics Evaluation Using Age Only

We worked backwards from a capacity of ~15 patients per year to determine an age cutoff. I then tested three different methods of identifying the patients at highest risk of mortality: (1) age alone, (2) the combined age + comorbidity score, and (3) age > 70 OR CMI > 5.

9.4 Calculate Age Cutoffs

##    Cutoff MeanPerYear
## 1      50       35.00
## 2      51       34.50
## 3      52       34.25
## 4      53       33.50
## 5      54       33.00
## 6      55       32.50
## 7      56       31.50
## 8      57       30.75
## 9      58       30.25
## 10     59       29.75
## 11     60       29.25
## 12     61       27.25
## 13     62       25.50
## 14     63       24.50
## 15     64       23.00
## 16     65       21.50
## 17     66       20.00
## 18     67       18.25
## 19     68       16.50
## 20     69       14.75
## 21     70       11.00
## 22     71        7.50
## 23     72        5.75
## 24     73        4.25
## 25     74        3.25
## 26     75        1.75
## 27     76        1.50
## 28     77        1.25
## 29     78        1.00
## 30     79         NaN
## 31     80         NaN

9.5 Plot: Average Patients per Year vs Age Cutoff

9.6 Scatterplot: Patients Selected (Cutoff Example ≥70)

10 Identifying Patients for Geriatrics Evaluation (Age >70 OR CMI >5)

We tested a combined rule: select patients if age_at_tx > 70 OR cmi > 5.

10.1 Flag Selected Patients

## 
## Not Selected     Selected 
##          141           48
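The selection rule can be sketched in a few lines of base R. The data below are made-up toy values. The column names `age_at_tx` and `cmi` come from the text; the handling of missing CMI values is not described in this report, so the behaviour noted in the comments is only what plain R logic does, not necessarily what the original code did.

```r
# Toy data for illustration only (not the study data).
age_at_tx <- c(58, 72, 66, 71, 69)
cmi       <- c(2, 1, 6, NA, NA)

# Rule from the text: select if age > 70 OR CMI > 5.
# With a missing CMI, TRUE | NA is TRUE but FALSE | NA is NA, so a patient
# under the age threshold with unknown CMI is left unclassified.
selected <- age_at_tx > 70 | cmi > 5
flag <- ifelse(selected, "Selected", "Not Selected")

table(flag, useNA = "ifany")
```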

10.2 Number of Selected Patients per Year

## # A tibble: 4 × 3
##   Tx_Year `Not Selected` Selected
##   <chr>            <int>    <int>
## 1 2022                34       15
## 2 2023                37       10
## 3 2024                45       15
## 4 2025                25        8

10.3 Plot: Number of Selected Patients per Year

10.4 Scatterplot: Age vs CMI (Selection Rule Highlighted)

11 Number of Patients Selected per Year (Different Rules)

We compared two rules for selecting patients for geriatrics evaluation:
1. Age > 70
2. RiskScore ≥ 80 (Age + 5×CMI)

11.1 Rule 1: Age > 70

## # A tibble: 4 × 3
##   Tx_Year `Not Selected` Selected
##   <chr>            <int>    <int>
## 1 2022                40       14
## 2 2023                40        8
## 3 2024                47       13
## 4 2025                29        9

11.2 Rule 2: RiskScore ≥ 80

## # A tibble: 4 × 3
##   Tx_Year `Not Selected` Selected
##   <chr>            <int>    <int>
## 1 2022                33       16
## 2 2023                33       14
## 3 2024                45       15
## 4 2025                22       11

12 Number of Patients Selected per Year (Different Rules)