A Summary of all associated variables

Column 1: age_at_procedure, weight, height, bsa
Column 2: lvedp, max_d_p_dt, min_d_p_dt, tau, lv_syst_pressure, lv_diast_press, sbp, dbp, mbp
Column 3: hr_echo, ext_lv_diam, int_lv_diam, lv_mass, mass_index, lv_mi, la_4c, la_2c, la_lengh, la_vol, la_vi
Column 4: e, ivrt, s, d, s_d, ar, ar_dur
Column 5: ea_lat, z_e_lat, ea_ivs, z_e_ivs, e_e_mean, e_elat, z_score_e_e_lat, e_e_ivs, z_score_e_e_med, mpi, inflow_duration, vp_cm_s
Column 6: gl_ls, gl_ls_rs, gl_ls_re, gl_ls_ra
Column 7: hr_cath, a, e_a, dt, a_duration, ln_e_a, z_e_a, a_ar_dur, bsa_pow_f, bsa_pow, bsa_pow_1, bsa_pow_2, bsa_pow_3, h2_7

This is a list of all the variables I was interested in. The last column holds the variables I threw out, either because of missing data or because they were not true variables.

List of variables to be used in random forest

  missForest iteration 1 in progress...done!
  missForest iteration 2 in progress...done!
  missForest iteration 3 in progress...done!
  missForest iteration 4 in progress...done!
  missForest iteration 5 in progress...done!
List of selected variables to be used in random forest
age_at_procedure la_vi d z_e_lat e_elat mpi
bsa e s_d ea_ivs z_score_e_e_lat inflow_duration
hr_echo ivrt ar_dur z_e_ivs e_e_ivs vp_cm_s
mass_index s ea_lat e_e_mean z_score_e_e_med gl_ls_re

A small number of variables had a few randomly missing values. Since randomForest does not accept missing values, I imputed them using a package called missForest, which uses random forests for nonparametric imputation.
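
To make the imputation idea concrete, here is a minimal Python sketch of the iterative loop: start from column means, then repeatedly re-predict each missing cell from the other columns. It is not missForest itself. The analysis here was run in R, and this sketch swaps the random forest for a simple k-nearest-neighbour average so that it runs with the standard library alone. It also uses a fixed number of passes rather than missForest's own stopping rule. The function name `impute_iteratively`, the parameters `n_iter` and `k`, and the toy data are all hypothetical.

```python
import math


def impute_iteratively(rows, n_iter=5, k=2):
    """Fill None cells in a list of numeric rows; returns a new list of rows.

    Assumes every column has at least one observed (non-None) value.
    """
    n_cols = len(rows[0])
    missing = [(i, j) for i, row in enumerate(rows)
               for j, value in enumerate(row) if value is None]

    # Step 1: initial guess = mean of the observed values in each column.
    filled = [list(row) for row in rows]
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        column_mean = sum(observed) / len(observed)
        for row in filled:
            if row[j] is None:
                row[j] = column_mean

    def distance(a, b, skip):
        # Euclidean distance over every column except the one being imputed.
        return math.sqrt(sum((a[c] - b[c]) ** 2
                             for c in range(n_cols) if c != skip))

    # Step 2: repeatedly re-predict each missing cell from the other columns.
    # Stand-in predictor: mean of the k nearest rows where column j was observed
    # (missForest would fit a random forest here instead).
    for _ in range(n_iter):
        for i, j in missing:
            donors = [filled[d] for d in range(len(rows))
                      if rows[d][j] is not None]
            nearest = sorted(donors,
                             key=lambda r: distance(r, filled[i], j))[:k]
            filled[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return filled


# Toy data, not study data: the third row is missing its second value.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, None], [4.0, 8.0]]
completed = impute_iteratively(toy, k=2)
print(completed[2])
```

The input is left untouched and a completed copy is returned, so the observed values can always be compared against the imputed ones afterwards.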

This is the list of variables I ultimately chose.

RandomForest LVEDP


I ran the random forest a few times with a few different numbers of trees, but the RMSE was consistently around 35, with % variance explained around 22-25%.
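
For reference, the two metrics quoted here can be sketched in a few lines of Python. This is an illustration under assumptions, not the code used for the analysis: it assumes % variance explained is the usual pseudo R-squared, 100 * (1 - MSE / Var(y)), and that the predictions passed in are out-of-bag predictions. The function names and the example numbers are made up.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pct_variance_explained(observed, predicted):
    """Pseudo R-squared as a percentage: 100 * (1 - MSE / Var(y)).

    Uses the population variance (divide by n) so that MSE and variance
    are on the same footing.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    variance = sum((o - mean_obs) ** 2 for o in observed) / n
    return 100.0 * (1.0 - mse / variance)


# Made-up numbers purely for illustration (not study data).
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
print(rmse(observed, predicted), pct_variance_explained(observed, predicted))
```

Note that this quantity can go negative when the model predicts worse than simply using the mean of the outcome.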

vp_cm_s, ivrt, mpi, and la_vi consistently came out on top.

RandomForest Tau


Here the RMSE is also high, at 131, and the % variance explained is 12%.
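
Because LVEDP and tau are measured on different scales, their RMSE values are not directly comparable. One rough way to put them on the same footing is to back out the outcome spread implied by each pair of numbers. The Python sketch below assumes % variance explained equals 100 * (1 - MSE / Var(y)) and that the reported RMSE is the square root of that same MSE; if either assumption does not hold for these runs, the output is not meaningful. The function name is hypothetical.

```python
import math


def implied_outcome_sd(rmse, pct_variance_explained):
    """Outcome standard deviation implied by an RMSE and a % variance explained.

    Rearranges pct = 100 * (1 - MSE / Var) into SD = RMSE / sqrt(1 - pct / 100).
    """
    unexplained = 1.0 - pct_variance_explained / 100.0
    return rmse / math.sqrt(unexplained)


# RMSE and % variance explained as reported in this document.
reported = [("lvedp", 35.0, 22.0), ("lvedp", 35.0, 25.0), ("tau", 131.0, 12.0)]
for name, model_rmse, pct in reported:
    print(name, round(implied_outcome_sd(model_rmse, pct), 1))
```

If the implied spread looks implausible for the outcome's units, that is a hint to check whether the reported figure is an RMSE or an MSE.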

Again, the variables that explain tau the most are similar to those for LVEDP: vp_cm_s, ivrt, and la_vi.

Comparing Tau and LVEDP


I plotted each feature’s importance for the two outcome variables, LVEDP and tau. The first thing to note is that, going by % variance explained, LVEDP (22-25%) is better explained by this model than tau (12%). The second is that LVEDP and tau share most of the variables that do the most explaining.

Tau and LVEDP’s top 5

lvedp tau
vp_cm_s vp_cm_s
la_vi ivrt
mpi z_e_lat
ivrt la_vi
e s
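
The overlap between the two top-5 lists can be checked mechanically. The short Python sketch below uses the rankings exactly as listed in the table above; the variable names `top5`, `shared`, `only_lvedp`, and `only_tau` are just illustrative.

```python
# Top-5 variables per outcome, in rank order, copied from the table above.
top5 = {
    "lvedp": ["vp_cm_s", "la_vi", "mpi", "ivrt", "e"],
    "tau": ["vp_cm_s", "ivrt", "z_e_lat", "la_vi", "s"],
}

# Variables appearing in both lists, kept in LVEDP rank order.
shared = [v for v in top5["lvedp"] if v in top5["tau"]]
# Variables unique to each outcome's top 5.
only_lvedp = [v for v in top5["lvedp"] if v not in top5["tau"]]
only_tau = [v for v in top5["tau"] if v not in top5["lvedp"]]

print("shared:", shared)
print("lvedp only:", only_lvedp)
print("tau only:", only_tau)
```

Three of the five variables are shared, with vp_cm_s ranked first for both outcomes.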

Workflow with diagnostic group variable


Adding the diagnostic group variable to the models improves their RMSE and % variance explained only modestly (by a few percentage points). It ranks in the top 10 variables for LVEDP but not for tau.

Additional dx group plot