Data reference: https://www.kaggle.com/andrewmvd/heart-failure-clinical-data
Package reference: https://modelstudio.drwhy.ai/articles/ms-r-python-examples.html
## Warning: package 'readr' was built under R version 4.0.3
##
## -- Column specification --------------------------------------------------------
## cols(
## age = col_double(),
## anaemia = col_double(),
## creatinine_phosphokinase = col_double(),
## diabetes = col_double(),
## ejection_fraction = col_double(),
## high_blood_pressure = col_double(),
## platelets = col_double(),
## serum_creatinine = col_double(),
## serum_sodium = col_double(),
## sex = col_double(),
## smoking = col_double(),
## time = col_double(),
## DEATH_EVENT = col_double()
## )
## tibble [299 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ age : num [1:299] 75 55 65 50 65 90 75 60 65 80 ...
## $ anaemia : num [1:299] 0 0 0 1 1 1 1 1 0 1 ...
## $ creatinine_phosphokinase: num [1:299] 582 7861 146 111 160 ...
## $ diabetes : num [1:299] 0 0 0 0 1 0 0 1 0 0 ...
## $ ejection_fraction : num [1:299] 20 38 20 20 20 40 15 60 65 35 ...
## $ high_blood_pressure : num [1:299] 1 0 0 0 0 1 0 0 0 1 ...
## $ platelets : num [1:299] 265000 263358 162000 210000 327000 ...
## $ serum_creatinine : num [1:299] 1.9 1.1 1.3 1.9 2.7 2.1 1.2 1.1 1.5 9.4 ...
## $ serum_sodium : num [1:299] 130 136 129 137 116 132 137 131 138 133 ...
## $ sex : num [1:299] 1 1 1 1 0 1 1 1 0 1 ...
## $ smoking : num [1:299] 0 0 1 0 0 1 0 1 0 1 ...
## $ time : num [1:299] 4 6 7 7 8 8 10 10 10 10 ...
## $ DEATH_EVENT : num [1:299] 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, "spec")=
## .. cols(
## .. age = col_double(),
## .. anaemia = col_double(),
## .. creatinine_phosphokinase = col_double(),
## .. diabetes = col_double(),
## .. ejection_fraction = col_double(),
## .. high_blood_pressure = col_double(),
## .. platelets = col_double(),
## .. serum_creatinine = col_double(),
## .. serum_sodium = col_double(),
## .. sex = col_double(),
## .. smoking = col_double(),
## .. time = col_double(),
## .. DEATH_EVENT = col_double()
## .. )
##  age             anaemia          creatinine_phosphokinase   diabetes
##  Min.   :40.00   Min.   :0.0000   Min.   :  23.0             Min.   :0.0000
##  1st Qu.:51.00   1st Qu.:0.0000   1st Qu.: 116.5             1st Qu.:0.0000
##  Median :60.00   Median :0.0000   Median : 250.0             Median :0.0000
##  Mean   :60.83   Mean   :0.4314   Mean   : 581.8             Mean   :0.4181
##  3rd Qu.:70.00   3rd Qu.:1.0000   3rd Qu.: 582.0             3rd Qu.:1.0000
##  Max.   :95.00   Max.   :1.0000   Max.   :7861.0             Max.   :1.0000
##  ejection_fraction   high_blood_pressure   platelets        serum_creatinine
##  Min.   :14.00       Min.   :0.0000        Min.   : 25100   Min.   :0.500
##  1st Qu.:30.00       1st Qu.:0.0000        1st Qu.:212500   1st Qu.:0.900
##  Median :38.00       Median :0.0000        Median :262000   Median :1.100
##  Mean   :38.08       Mean   :0.3512        Mean   :263358   Mean   :1.394
##  3rd Qu.:45.00       3rd Qu.:1.0000        3rd Qu.:303500   3rd Qu.:1.400
##  Max.   :80.00       Max.   :1.0000        Max.   :850000   Max.   :9.400
##  serum_sodium    sex              smoking          time
##  Min.   :113.0   Min.   :0.0000   Min.   :0.0000   Min.   :  4.0
##  1st Qu.:134.0   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.: 73.0
##  Median :137.0   Median :1.0000   Median :0.0000   Median :115.0
##  Mean   :136.6   Mean   :0.6488   Mean   :0.3211   Mean   :130.3
##  3rd Qu.:140.0   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:203.0
##  Max.   :148.0   Max.   :1.0000   Max.   :1.0000   Max.   :285.0
##  DEATH_EVENT
##  Min.   :0.0000
##  1st Qu.:0.0000
##  Median :0.0000
##  Mean   :0.3211
##  3rd Qu.:1.0000
##  Max.   :1.0000
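
The means of the 0/1 indicator columns in the summary above are proportions, so multiplying each by the 299 rows recovers a count. The following is a minimal base R sketch of that arithmetic. It uses only the means printed above; the variable names are illustrative and not taken from the original analysis.

```r
# Number of rows, from the tibble header [299 x 13].
n_rows <- 299

# Means of the binary columns, copied from the summary() output.
indicator_means <- c(
  anaemia             = 0.4314,
  diabetes            = 0.4181,
  high_blood_pressure = 0.3512,
  sex                 = 0.6488,
  smoking             = 0.3211,
  DEATH_EVENT         = 0.3211
)

# The mean of a 0/1 column is the share of ones; scale it back to a count.
indicator_counts <- round(indicator_means * n_rows)
print(indicator_counts)
```

Read this way, the DEATH_EVENT mean of 0.3211 corresponds to 96 of the 299 patients, so the outcome classes are imbalanced at roughly one to two.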
## [1] 299 13
## [1] "age" "anaemia"
## [3] "creatinine_phosphokinase" "diabetes"
## [5] "ejection_fraction" "high_blood_pressure"
## [7] "platelets" "serum_creatinine"
## [9] "serum_sodium" "sex"
## [11] "smoking" "time"
## [13] "DEATH_EVENT"
## Warning: package 'tidymodels' was built under R version 4.0.3
## -- Attaching packages -------------------------------------- tidymodels 0.1.2 --
## v broom 0.7.4 v recipes 0.1.15
## v dials 0.0.9 v rsample 0.0.8
## v dplyr 1.0.3 v tibble 3.0.5
## v ggplot2 3.3.3 v tidyr 1.1.2
## v infer 0.5.4 v tune 0.1.2
## v modeldata 0.1.0 v workflows 0.2.1
## v parsnip 0.1.5 v yardstick 0.0.7
## v purrr 0.3.4
## Warning: package 'dials' was built under R version 4.0.3
## Warning: package 'scales' was built under R version 4.0.3
## Warning: package 'dplyr' was built under R version 4.0.3
## Warning: package 'ggplot2' was built under R version 4.0.3
## Warning: package 'infer' was built under R version 4.0.3
## Warning: package 'modeldata' was built under R version 4.0.3
## Warning: package 'parsnip' was built under R version 4.0.3
## Warning: package 'purrr' was built under R version 4.0.3
## Warning: package 'recipes' was built under R version 4.0.3
## Warning: package 'rsample' was built under R version 4.0.3
## Warning: package 'tibble' was built under R version 4.0.3
## Warning: package 'tidyr' was built under R version 4.0.3
## Warning: package 'tune' was built under R version 4.0.3
## Warning: package 'workflows' was built under R version 4.0.3
## Warning: package 'yardstick' was built under R version 4.0.3
## -- Conflicts ----------------------------------------- tidymodels_conflicts() --
## x purrr::discard() masks scales::discard()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## x yardstick::spec() masks readr::spec()
## x recipes::step() masks stats::step()
## Warning: package 'DALEXtra' was built under R version 4.0.4
## Loading required package: DALEX
## Warning: package 'DALEX' was built under R version 4.0.4
## Welcome to DALEX (version: 2.1.1).
## Find examples and detailed introduction at: http://ema.drwhy.ai/
##
## Attaching package: 'DALEX'
## The following object is masked from 'package:dplyr':
##
## explain
## Warning: package 'modelStudio' was built under R version 4.0.4
## Preparation of a new explainer is initiated
## -> model label : tidymodels
## -> data : 90 rows 13 cols
## -> data : tibble converted into a data.frame
## -> target variable : 90 values
## -> predict function : yhat.workflow will be used (default)
## -> predicted values : No value for predict function target column. (default)
## -> model_info : package tidymodels , ver. 0.1.2 , task classification (default)
## -> predicted values : numerical, min = 0.02850621 , mean = 0.3204935 , max = 0.7899482
## -> residual function : difference between y and yhat (default)
## -> residuals : numerical, min = -0.5645715 , mean = 0.05728431 , max = 0.880372
## A new explainer has been created!
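
The explainer above reports 90 rows of data. That number would be consistent with a 70/30 random split of the 299 rows, with the smaller part held out for explanation. The split code itself is not shown, so the following base R sketch is an assumption about how such a test set could be produced. The seed and object names are hypothetical.

```r
n_rows <- 299   # rows in the full data set
set.seed(123)   # hypothetical seed, only to make this sketch reproducible

# Draw 70% of the row indices for training; floor() makes the size explicit.
train_index <- sample(seq_len(n_rows), floor(0.7 * n_rows))

# Everything not drawn for training is held out.
test_index <- setdiff(seq_len(n_rows), train_index)

length(train_index)   # 209
length(test_index)    # 90
```

With indices like these, `data[train_index, ]` and `data[test_index, ]` would give the two subsets, where `data` stands for the full table.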
## Warning: Setting row names on a tibble is deprecated.
## Warning in value[[3L]](cond):
## Error occurred in ingredients::ceteris_paribus (1) function: 'x' must be atomic
## Warning in value[[3L]](cond):
## Error occurred in ingredients::ceteris_paribus (2) function: 'x' must be atomic
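
The deprecation warning and the two `ceteris_paribus` errors above are consistent with the selected observations being passed as a tibble. Setting row names on a tibble is deprecated, and extracting a single column from a tibble returns another tibble rather than an atomic vector. One possible workaround, offered as an assumption rather than a confirmed diagnosis, is to convert the observations to a plain data frame first. In the sketch below, `test_tbl` and `new_observation` are hypothetical names, and the values are the first two rows from the `str()` output above, used only as stand-in data.

```r
library(tibble)

# Stand-in for a slice of the test set (three columns only, for brevity).
test_tbl <- tibble(
  age               = c(75, 55),
  ejection_fraction = c(20, 38),
  serum_creatinine  = c(1.9, 1.1)
)

# Convert before assigning row names, so that no tibble is passed on.
new_observation <- as.data.frame(test_tbl[1:2, ])
rownames(new_observation) <- c("id1", "id2")

# Single-column extraction now yields an atomic vector, not a one-column tibble.
is.atomic(new_observation[, "age"])
is.atomic(test_tbl[, "age"])
```

A data frame prepared like this could then be passed as the second argument of `modelStudio(explainer, new_observation)`.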
## Warning: package 'xgboost' was built under R version 4.0.3
##
## Attaching package: 'xgboost'
## The following object is masked from 'package:dplyr':
##
## slice
## Preparation of a new explainer is initiated
## -> model label : xgboost
## -> data : 90 rows 12 cols
## -> target variable : 90 values
## -> predict function : yhat.xgb.Booster will be used (default)
## -> predicted values : No value for predict function target column. (default)
## -> model_info : package xgboost , ver. 1.3.2.1 , task classification (default)
## -> predicted values : numerical, min = 1.291269e-06 , mean = 0.3168861 , max = 0.999975
## -> residual function : difference between y and yhat (default)
## -> residuals : numerical, min = -0.9774441 , mean = 0.08311394 , max = 0.9966721
## A new explainer has been created!