A Summary of all associated variables
Column 1: age_at_procedure, weight, height, bsa
Column 2: lvedp, max_d_p_dt, min_d_p_dt, tau, lv_syst_pressure, lv_diast_press, sbp, dbp, mbp
Column 3: hr_echo, ext_lv_diam, int_lv_diam, lv_mass, mass_index, lv_mi, la_4c, la_2c, la_lengh, la_vol, la_vi
Column 4: e, ivrt, s, d, s_d, ar, ar_dur
Column 5: ea_lat, z_e_lat, ea_ivs, z_e_ivs, e_e_mean, e_elat, z_score_e_e_lat, e_e_ivs, z_score_e_e_med, mpi, inflow_duration, vp_cm_s
Column 6: gl_ls, gl_ls_rs, gl_ls_re, gl_ls_ra
Column 7 (discarded): hr_cath, a, e_a, dt, a_duration, ln_e_a, z_e_a, a_ar_dur, bsa_pow_f, bsa_pow, bsa_pow_1, bsa_pow_2, bsa_pow_3, h2_7
This is a list of all the variables I was interested in. The last column contains the variables I threw out, either because of missing data or because they were not true variables.
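The missing-data part of that screening can be written as a simple rule. The sketch below is a minimal Python illustration. The helper name `columns_to_drop` and the 20% cutoff are hypothetical, because the cutoff actually used is not stated here. The "not a true variable" judgment is left out, since it was a manual decision.

```python
def columns_to_drop(rows, names, max_missing_frac=0.2):
    """Return the names of columns whose share of missing (None) entries
    exceeds max_missing_frac. The 0.2 default is a hypothetical cutoff."""
    n_rows = len(rows)
    dropped = []
    for j, name in enumerate(names):
        # Count missing entries in column j.
        n_missing = sum(1 for row in rows if row[j] is None)
        if n_missing / n_rows > max_missing_frac:
            dropped.append(name)
    return dropped


# Toy data with two hypothetical columns (not the study data).
rows = [[1.0, None], [2.0, None], [3.0, 4.0]]
print(columns_to_drop(rows, ["col_a", "col_b"]))
```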
List of variables to be used in random forest
List of selected variables to be used in random forest
age_at_procedure, bsa, hr_echo, mass_index, la_vi, e, ivrt, s, d, s_d, ar_dur, ea_lat, z_e_lat, ea_ivs, z_e_ivs, e_e_mean, e_elat, z_score_e_e_lat, e_e_ivs, z_score_e_e_med, mpi, inflow_duration, vp_cm_s, gl_ls_re
There were a few scattered missing values in a small number of variables. Because randomForest does not accept missing values by default, I imputed them with missForest, a package that uses random forests for nonparametric imputation.
The list above is the set of variables I ultimately chose.
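missForest's core loop can be sketched as follows. This is a minimal, dependency-free Python illustration of the iteration scheme, not the R package. `iterative_impute`, `knn_predict` and `without` are hypothetical names. A k-nearest-neighbour average stands in where missForest fits a random forest.

The sketch does keep missForest's structure:
- start from column means;
- revisit the variables in order of increasing missingness;
- predict each variable's missing entries from the other variables;
- stop once the normalized change between successive imputations grows, returning the previous imputation.

```python
import math
from statistics import fmean


def knn_predict(train_X, train_y, query, k=3):
    # Stand-in regressor (NOT a random forest): average of the k nearest rows.
    ranked = sorted((math.dist(row, query), y) for row, y in zip(train_X, train_y))
    return fmean(y for _, y in ranked[:k])


def without(row, j):
    # All entries of a row except column j (the predictors for column j).
    return [v for c, v in enumerate(row) if c != j]


def iterative_impute(data, max_iter=10, k=3):
    """missForest-style loop over rows of floats, with None marking a gap."""
    n_rows, n_cols = len(data), len(data[0])
    missing = [[v is None for v in row] for row in data]
    # Initial guess: fill each gap with its column's observed mean.
    col_means = [fmean(row[j] for row in data if row[j] is not None)
                 for j in range(n_cols)]
    ximp = [[col_means[j] if v is None else v for j, v in enumerate(row)]
            for row in data]
    # Visit columns from fewest to most missing values.
    order = sorted(range(n_cols), key=lambda j: sum(m[j] for m in missing))
    prev_delta = math.inf
    for _ in range(max_iter):
        old = [row[:] for row in ximp]
        for j in order:
            obs = [i for i in range(n_rows) if not missing[i][j]]
            mis = [i for i in range(n_rows) if missing[i][j]]
            if not mis:
                continue
            train_X = [without(ximp[i], j) for i in obs]
            train_y = [ximp[i][j] for i in obs]
            for i in mis:
                ximp[i][j] = knn_predict(train_X, train_y, without(ximp[i], j), k)
        # Normalized squared change between successive imputations.
        num = sum((a - b) ** 2 for new, prev in zip(ximp, old)
                  for a, b in zip(new, prev))
        den = sum(a * a for row in ximp for a in row)
        delta = num / den if den else 0.0
        if delta > prev_delta:
            return old  # the change grew: keep the previous imputation
        prev_delta = delta
    return ximp
```

The sketch only handles continuous columns. The R package also imputes factor variables.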
RandomForest LVEDP

I ran the random forest several times with a few different tree configurations, but the RMSE was consistently around 35 and the % variance explained around 22-25%.
vp_cm_s, ivrt, mpi, and la_vi consistently came out as the most important variables.
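For reference, the two summary numbers relate to the out-of-bag (OOB) predictions roughly as shown below. This is a Python sketch, and `rf_regression_summary` is a hypothetical helper.

My understanding of randomForest's regression printout is:
- "% Var explained" is 1 - MSE_OOB / Var(y), with Var(y) taken as the population variance.
- The "Mean of squared residuals" line is the MSE itself, so the RMSE is its square root.

```python
import math
from statistics import fmean, pvariance


def rf_regression_summary(y_true, y_oob):
    """Return (MSE, RMSE, % variance explained) for out-of-bag predictions."""
    residuals = [t - p for t, p in zip(y_true, y_oob)]
    mse = fmean(r * r for r in residuals)
    rmse = math.sqrt(mse)
    # Share of the outcome's (population) variance not left in the residuals.
    pct_var = 100.0 * (1.0 - mse / pvariance(y_true))
    return mse, rmse, pct_var


# Toy numbers, not the study data.
print(rf_regression_summary([10.0, 12.0, 14.0, 16.0], [11.0, 12.0, 13.0, 17.0]))
```

Because MSE and RMSE differ by a square root, it is worth checking which of the two a reported value is before comparing outcomes.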
RandomForest Tau

Here the RMSE is also high, at 131, and the % variance explained is 12%.
Again, the variables that explain tau the most are similar to those for LVEDP: vp_cm_s, ivrt, and la_vi.
Comparing Tau and LVEDP

I plotted each feature's importance for the two outcome variables, LVEDP and tau. The first thing to notice is that tau is better explained by this model than LVEDP is. The second is that the variables doing the most explaining are similar for the two outcomes.
Tau and LVEDP’s top 5
lvedp | tau
vp_cm_s | vp_cm_s
la_vi | ivrt
mpi | z_e_lat
ivrt | la_vi
e | s
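One quick way to put a number on "similar variables" is the overlap between the two top-5 lists in the table above. The lists below are copied from that table. The helper names are hypothetical, and Jaccard similarity is my choice of measure, not something used in the analysis.

```python
lvedp_top5 = ["vp_cm_s", "la_vi", "mpi", "ivrt", "e"]
tau_top5 = ["vp_cm_s", "ivrt", "z_e_lat", "la_vi", "s"]


def shared_features(a, b):
    """Features present in both rankings, kept in the order of the first one."""
    in_b = set(b)
    return [f for f in a if f in in_b]


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


shared = shared_features(lvedp_top5, tau_top5)
overlap = jaccard(lvedp_top5, tau_top5)
print(shared, round(overlap, 3))
```

Three of the five features (vp_cm_s, la_vi, ivrt) appear in both lists, a Jaccard overlap of 3/7, or about 0.43.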
Workflow with diagnostic group variable

Adding the diagnostic group variable to the models improves their RMSE and % variance explained only modestly (by a few percentage points). It ranks in the top 10 for LVEDP but not for tau.
Additional dx group plot
