Bone marrow transplants dataset

Vinicius Stelet

Dataset Overview

This dataset comprises 187 allogeneic unrelated donor hematopoietic stem cell transplants.

Variable to be predicted: survival_status (0 = alive, n = 102, 54.5%; 1 = dead, n = 85, 45.5%).

It is composed of 37 variables (27 factors and 10 numeric), which can be divided into 5 groups:

  • Donor: donor_age, donor_ABO, donor_CMV, donor_age_below_35
  • Recipient: recipient_age, recipient_ABO, recipient_CMV, recipient_body_mass, recipient_rh, recipient_gender, recipient_age_below_10, recipient_age_int
  • Donor × recipient match: ABO_match, CMV_status, HLA_match, HLA_mismatch, gender_match
  • Transplant procedure: risk_group, stem_cell_source, tx_post_relapse, CD34_x1e6_per_kg, CD3_x1e8_per_kg, CD3_to_CD34_ratio
  • Transplant outcomes: ANC_recovery, PLT_recovery, acute_GvHD_II_III_IV, acute_GvHD_III_IV, time_to_acute_GvHD_III_IV, extensive_chronic_GvHD, relapse, survival_time

  • Donors
Characteristic              0 (alive), N = 102¹    1 (dead), N = 85¹
donor_age                   32 (27-40)             35 (26-40)
donor_ABO
    0                       40 (39%)               33 (39%)
    A                       35 (34%)               36 (42%)
    B                       16 (16%)               12 (14%)
    AB                      11 (11%)               4 (4.7%)
donor_CMV
    absent                  59 (58%)               54 (64%)
    present                 42 (42%)               30 (36%)
    Unknown                 1                      1
donor_age_below_35          62 (61%)               42 (49%)
¹ Median (25%-75%); n (%)
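The "Median (25%-75%)" entries summarize each numeric variable separately within each survival_status group. Below is a minimal Python sketch of that computation, assuming each transplant is a dict keyed by the variable names and that None marks a missing value. It uses `statistics.quantiles` with `method="inclusive"` (linear interpolation, equivalent to R's default quantile type 7). The quantile method used to build these tables is not stated, so small differences are possible, and the donor ages in the example are made up.

```python
from statistics import median, quantiles

def median_iqr(values):
    """Format the non-missing values as 'median (Q1-Q3)'; None marks a missing value."""
    present = sorted(v for v in values if v is not None)
    # method="inclusive" interpolates like R's default quantile type 7
    q1, _, q3 = quantiles(present, n=4, method="inclusive")
    return f"{median(present):g} ({q1:g}-{q3:g})"

def summarize_by_group(rows, column, group="survival_status"):
    """Return {group value: 'median (Q1-Q3)'} for one numeric column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group], []).append(row[column])
    return {g: median_iqr(vals) for g, vals in sorted(groups.items())}

# Made-up donor ages, NOT taken from the dataset
rows = [{"survival_status": 0, "donor_age": a} for a in (25, 30, 32, 40, 45)]
rows += [{"survival_status": 1, "donor_age": a} for a in (20, 35, None, 38)]
print(summarize_by_group(rows, "donor_age"))
```

Missing values are dropped before the quartiles are taken, which matches how the tables report them on a separate "Unknown" line.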
  • Recipients
Characteristic              0 (alive), N = 102¹    1 (dead), N = 85¹
recipient_age               8.5 (4.7-13.2)         12.1 (6.1-16.0)
recipient_ABO
    A                       39 (38%)               36 (43%)
    B                       28 (27%)               22 (26%)
    0                       27 (26%)               21 (25%)
    AB                      8 (7.8%)               5 (6.0%)
    Unknown                 0                      1
recipient_CMV
    present                 55 (56%)               45 (60%)
    absent                  43 (44%)               30 (40%)
    Unknown                 4                      10
recipient_body_mass         27 (18-46)             40 (23-55)
    Unknown                 0                      2
recipient_rh
    plus                    83 (81%)               75 (90%)
    minus                   19 (19%)               8 (9.6%)
    Unknown                 0                      2
recipient_gender
    male                    60 (59%)               52 (61%)
    female                  42 (41%)               33 (39%)
recipient_age_below_10      61 (60%)               38 (45%)
recipient_age_int
    10_20                   42 (41%)               47 (55%)
    5_10                    30 (29%)               21 (25%)
    0_5                     30 (29%)               17 (20%)
¹ Median (25%-75%); n (%)
  • Donor × recipient match
Characteristic              0 (alive), N = 102¹    1 (dead), N = 85¹
ABO_match
    mismatched              77 (75%)               57 (68%)
    matched                 25 (25%)               27 (32%)
    Unknown                 0                      1
CMV_status
    2                       31 (32%)               26 (35%)
    0                       26 (27%)               22 (29%)
    3                       21 (22%)               18 (24%)
    1                       18 (19%)               9 (12%)
    Unknown                 6                      10
HLA_match
    10/10                   53 (52%)               41 (48%)
    9/10                    34 (33%)               31 (36%)
    8/10                    13 (13%)               10 (12%)
    7/10                    2 (2.0%)               3 (3.5%)
HLA_mismatch
    matched                 87 (85%)               72 (85%)
    mismatched              15 (15%)               13 (15%)
gender_match
    other                   85 (83%)               70 (82%)
    female_to_male          17 (17%)               15 (18%)
¹ n (%)
  • Transplant
Characteristic              0 (alive), N = 102¹    1 (dead), N = 85¹
risk_group
    low                     71 (70%)               47 (55%)
    high                    31 (30%)               38 (45%)
stem_cell_source
    peripheral_blood        84 (82%)               61 (72%)
    bone_marrow             18 (18%)               24 (28%)
tx_post_relapse             9 (8.8%)               14 (16%)
CD34_x1e6_per_kg            11 (7-17)              8 (5-11)
CD3_x1e8_per_kg             5.1 (2.2-7.4)          3.3 (0.9-5.7)
    Unknown                 1                      4
CD3_to_CD34_ratio           2.7 (1.8-4.2)          2.9 (1.8-7.3)
    Unknown                 1                      4
¹ n (%); Median (25%-75%)
  • Post-transplant outcomes
Characteristic              0 (alive), N = 102¹               1 (dead), N = 85¹
ANC_recovery                15.0 (14.0-17.0)                  15.0 (13.0-18.0)
PLT_recovery                20 (15-31)                        27 (17-48)
acute_GvHD_II_III_IV        58 (57%)                          54 (64%)
acute_GvHD_III_IV           17 (17%)                          23 (27%)
time_to_acute_GvHD_III_IV   1,000,000 (1,000,000-1,000,000)   1,000,000 (62-1,000,000)
extensive_chronic_GvHD      11 (11%)                          17 (31%)
    Unknown                 0                                 31
relapse                     5 (4.9%)                          23 (27%)
survival_time               1,428 (999-2,041)                 149 (58-330)
¹ Median (25%-75%); n (%)
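The "n (%)" entries count each level within a group, and the percentages appear to use only the non-missing values as denominator: in the dead group, extensive_chronic_GvHD has 17 positive cases and 31 unknown, and 17 / (85 - 31) = 17 / 54 ≈ 31%. A hedged Python sketch of that convention follows. The rounding rule (whole percent at 10% and above, one decimal below) is inferred from the tables, the "yes"/"no" level names are placeholders, and the example vector is rebuilt from the counts above (17 yes, 85 - 17 - 31 = 37 no, 31 unknown).

```python
from collections import Counter

def format_percent(p):
    """Whole percent at or above 10%, one decimal below (as the tables appear to do)."""
    return f"{p:.0f}%" if p >= 10 else f"{p:.1f}%"

def count_percent(values):
    """Return {level: 'n (p%)', 'Unknown': k}; None marks a missing value.

    Percentages use only the non-missing values as the denominator.
    """
    known = [v for v in values if v is not None]
    summary = {}
    for level, n in Counter(known).most_common():
        summary[level] = f"{n} ({format_percent(100 * n / len(known))})"
    summary["Unknown"] = len(values) - len(known)
    return summary

# Dead group, extensive_chronic_GvHD, rebuilt from the table's counts
dead_cgvhd = ["yes"] * 17 + ["no"] * 37 + [None] * 31
print(count_percent(dead_cgvhd))
```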

Challenges:

  1. NA values

There are 81 missing values out of 6,919 total values (187 transplants × 37 variables), or 1.17%, distributed as follows:

Variable                    Missing values
extensive_chronic_GvHD      31
CMV_status                  16
recipient_CMV               14
CD3_x1e8_per_kg             5
CD3_to_CD34_ratio           5
donor_CMV                   2
recipient_body_mass         2
recipient_rh                2
recipient_ABO               1
ABO_match                   1
antigen                     1
allel                       1
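The per-variable counts and the overall percentage above can be tallied with a few lines of Python. This sketch assumes rows are dicts keyed by variable name and that a missing cell arrives as None, an empty string, or "?"; the `MISSING_MARKERS` tuple is a hypothetical convention to adjust to however the data were loaded (a float NaN, for instance, would need a separate check).

```python
# Assumption: how a missing cell is encoded after loading (adjust as needed)
MISSING_MARKERS = (None, "", "?")

def missing_summary(rows, columns):
    """Tally missing cells per column (most-missing first) and overall.

    Returns (ranked, missing, total, percent); a column absent from a row counts as missing.
    """
    per_column = {c: sum(row.get(c) in MISSING_MARKERS for row in rows) for c in columns}
    ranked = sorted(((c, n) for c, n in per_column.items() if n), key=lambda item: -item[1])
    missing = sum(per_column.values())
    total = len(rows) * len(columns)
    return ranked, missing, total, round(100 * missing / total, 2)

# The overall figure quoted above: 81 missing cells in a 187 x 37 table
print(round(100 * 81 / (187 * 37), 2))  # → 1.17
```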
  2. Highly correlated variables

3.1) Discrepant values

3.2) Discrepant values

3.3) Discrepant values