Plots with funModeling - part 3

Exploratory data analysis

Installing the funModeling package

> install.packages("funModeling")

Loading the package

> library(funModeling)

Missing-value analysis

> df_status(heart_disease)
                 variable q_zeros p_zeros q_na p_na q_inf p_inf    type unique
1                     age       0    0.00    0 0.00     0     0 integer     41
2                  gender       0    0.00    0 0.00     0     0  factor      2
3              chest_pain       0    0.00    0 0.00     0     0  factor      4
4  resting_blood_pressure       0    0.00    0 0.00     0     0 integer     50
5       serum_cholestoral       0    0.00    0 0.00     0     0 integer    152
6     fasting_blood_sugar     258   85.15    0 0.00     0     0  factor      2
7         resting_electro     151   49.83    0 0.00     0     0  factor      3
8          max_heart_rate       0    0.00    0 0.00     0     0 integer     91
9             exer_angina     204   67.33    0 0.00     0     0 integer      2
10                oldpeak      99   32.67    0 0.00     0     0 numeric     40
11                  slope       0    0.00    0 0.00     0     0 integer      3
12      num_vessels_flour     176   58.09    4 1.32     0     0 integer      4
13                   thal       0    0.00    2 0.66     0     0  factor      3
14 heart_disease_severity     164   54.13    0 0.00     0     0 integer      5
15           exter_angina     204   67.33    0 0.00     0     0  factor      2
16      has_heart_disease       0    0.00    0 0.00     0     0  factor      2
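
The columns reported by df_status can be approximated in base R. The sketch below is only an illustration of what the counts and percentages mean, not the package's implementation; the data frame `toy` and the function `status_sketch` are hypothetical names, and it assumes that `unique` excludes NA (which is consistent with thal showing 3 unique values despite its 2 NAs).

```r
# Hypothetical toy data (not the heart_disease data set)
toy <- data.frame(
  a = c(0, 1, NA, 0, 5),
  b = c(2, Inf, 3, 0, NA),
  c = factor(c("x", "y", "x", NA, "y"))
)

# Illustrative sketch of a df_status-like summary, one row per column
status_sketch <- function(df) {
  n <- nrow(df)
  # Number of values equal to zero (NA comparisons are dropped)
  q_zeros <- vapply(df, function(x) sum(x == 0, na.rm = TRUE), integer(1))
  # Number of missing values
  q_na <- vapply(df, function(x) sum(is.na(x)), integer(1))
  # Number of infinite values (only meaningful for numeric columns)
  q_inf <- vapply(df, function(x) {
    if (is.numeric(x)) sum(is.infinite(x)) else 0L
  }, integer(1))
  # Number of distinct non-missing values (assumption: NA is not counted)
  n_unique <- vapply(df, function(x) length(unique(x[!is.na(x)])), integer(1))
  data.frame(
    variable = names(df),
    q_zeros  = q_zeros,
    p_zeros  = round(100 * q_zeros / n, 2),
    q_na     = q_na,
    p_na     = round(100 * q_na / n, 2),
    q_inf    = q_inf,
    p_inf    = round(100 * q_inf / n, 2),
    type     = vapply(df, function(x) class(x)[1], character(1)),
    unique   = n_unique,
    row.names = NULL
  )
}

res <- status_sketch(toy)
print(res)
```

Here p_zeros, p_na and p_inf are simply the corresponding counts divided by the number of rows, expressed as percentages.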

Plotting distributions of numeric variables

> plot_num(heart_disease)

Summary statistics for numeric variables

> profiling_num(heart_disease)
                variable   mean std_dev variation_coef p_01 p_05 p_25
1                    age  54.44    9.04           0.17   35   40   48
2 resting_blood_pressure 131.69   17.60           0.13  100  108  120
3      serum_cholestoral 246.69   51.78           0.21  149  175  211
4         max_heart_rate 149.61   22.88           0.15   95  108  134
5            exer_angina   0.33    0.47           1.44    0    0    0
6                oldpeak   1.04    1.16           1.12    0    0    0
7                  slope   1.60    0.62           0.38    1    1    1
8      num_vessels_flour   0.67    0.94           1.39    0    0    0
9 heart_disease_severity   0.94    1.23           1.31    0    0    0
   p_50  p_75  p_95  p_99 skewness kurtosis  iqr        range_98       range_80
1  56.0  61.0  68.0  71.0    -0.21      2.5 13.0        [35, 71]       [42, 66]
2 130.0 140.0 160.0 180.0     0.70      3.8 20.0      [100, 180]     [110, 152]
3 241.0 275.0 326.9 406.7     1.13      7.4 64.0   [149, 406.74] [188.8, 308.8]
4 153.0 166.0 181.9 192.0    -0.53      2.9 32.5 [95.02, 191.96]   [116, 176.6]
5   0.0   1.0   1.0   1.0     0.74      1.5  1.0          [0, 1]         [0, 1]
6   0.8   1.6   3.4   4.2     1.26      4.5  1.6        [0, 4.2]       [0, 2.8]
7   2.0   2.0   3.0   3.0     0.51      2.4  1.0          [1, 3]         [1, 2]
8   0.0   1.0   3.0   3.0     1.18      3.2  1.0          [0, 3]         [0, 2]
9   0.0   2.0   3.0   4.0     1.05      2.8  2.0          [0, 4]         [0, 3]
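
Several of the profiling_num columns are derived from others. As a rough base-R sketch (the vector `x` is hypothetical, and the package may use a different quantile method or rounding): variation_coef is std_dev divided by mean (for age, 9.04 / 54.44 is about 0.17), iqr is p_75 minus p_25, range_98 spans p_01 to p_99, and range_80 is assumed here to span the 10th to the 90th percentile.

```r
# Hypothetical numeric vector used only for illustration
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15, 22)

# Coefficient of variation: standard deviation relative to the mean
variation_coef <- sd(x) / mean(x)

# Percentiles needed for the derived columns
p <- quantile(x, probs = c(0.01, 0.10, 0.25, 0.75, 0.90, 0.99), names = FALSE)

iqr      <- p[4] - p[3]      # p_75 - p_25, the interquartile range
range_98 <- c(p[1], p[6])    # interval holding the central 98% of values
range_80 <- c(p[2], p[5])    # interval holding the central 80% of values

print(variation_coef)
print(iqr)
print(range_98)
print(range_80)
```

A variation_coef above 1, as for exer_angina and oldpeak in the output, means the standard deviation exceeds the mean.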

Bar charts

> library(dplyr)
> heart_disease2 <- heart_disease %>%
+   select(chest_pain, thal)
> freq(heart_disease2)

  chest_pain frequency percentage cumulative_perc
1          4       144      47.52           47.52
2          3        86      28.38           75.90
3          2        50      16.50           92.40
4          1        23       7.59          100.00

  thal frequency percentage cumulative_perc
1    3       166      54.79           54.79
2    7       117      38.61           93.40
3    6        18       5.94           99.34
4 <NA>         2       0.66          100.00
[1] "Variables processed: chest_pain, thal"
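
The frequency tables above can be reproduced in spirit with base R. This is a minimal sketch on a hypothetical vector `chest` (not the real chest_pain column), and `freq_sketch` is an illustrative name. The cumulative column is computed from the rounded percentages, which matches the first rows of the output above (47.52 + 28.38 = 75.90).

```r
# Hypothetical categorical vector, including one missing value
chest <- factor(c("4", "3", "4", "2", "4", "3", "1", "4", NA))

# Count each category (keeping NA as its own row), most frequent first
counts <- sort(table(chest, useNA = "ifany"), decreasing = TRUE)

freq_sketch <- data.frame(
  value      = names(counts),
  frequency  = as.vector(counts),
  percentage = round(100 * as.vector(counts) / sum(counts), 2)
)

# Running total of the rounded percentages
freq_sketch$cumulative_perc <- cumsum(freq_sketch$percentage)

print(freq_sketch)
```

Because the percentages are rounded before being accumulated, the last cumulative value in this sketch can differ from 100 by a few hundredths.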

Correlation

> correlation_table(heart_disease, "has_heart_disease")
                Variable has_heart_disease
1      has_heart_disease              1.00
2 heart_disease_severity              0.83
3      num_vessels_flour              0.46
4                oldpeak              0.42
5                  slope              0.34
6                    age              0.23
7 resting_blood_pressure              0.15
8      serum_cholestoral              0.08
9         max_heart_rate             -0.42
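
The target has_heart_disease is a factor, yet it appears in the table with a correlation of 1.00 against itself, and only numeric variables are listed. A plausible reading, stated here as an assumption rather than as the package's documented behaviour, is that the target is converted to its numeric codes before a Pearson correlation is computed. The sketch below shows that idea on hypothetical data (`toy`, `target_num` and `r` are illustrative names).

```r
# Hypothetical data: one numeric input and a two-level factor target
toy <- data.frame(
  x      = c(1.2, 2.5, 3.1, 4.8, 5.0, 6.3),
  target = factor(c("no", "no", "no", "yes", "yes", "yes"))
)

# Convert the factor to its integer codes: "no" -> 1, "yes" -> 2
target_num <- as.numeric(toy$target)

# Pearson correlation (the default method of cor)
r <- cor(toy$x, target_num)
print(r)
```

With a two-level target, the sign of the coefficient depends on the order of the factor levels, so a negative value such as the one for max_heart_rate means higher values go with the first level.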

Plotting input variables against the target variable

> cross_plot(data = heart_disease, input = c("age", "oldpeak"),
+            target = "has_heart_disease")
[1] "Plotting transformed variable 'age' with 'equal_freq', (too many values). Disable with 'auto_binning=FALSE'"

[1] "Plotting transformed variable 'oldpeak' with 'equal_freq', (too many values). Disable with 'auto_binning=FALSE'"
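
The messages above say that age and oldpeak were binned with 'equal_freq' before plotting. Equal-frequency binning cuts a numeric variable at its quantiles so that each bin holds roughly the same number of observations. The following base-R sketch illustrates the technique on a hypothetical vector; it is not the package's own routine, and `n_bins` is an arbitrary choice made for the example.

```r
# Hypothetical ages used only for illustration
age <- c(29, 35, 41, 44, 48, 52, 54, 57, 58, 60, 63, 65, 67, 70, 77)

n_bins <- 5

# Cut points at evenly spaced quantiles; unique() guards against repeated breaks
breaks <- unique(quantile(age, probs = seq(0, 1, length.out = n_bins + 1)))

# Assign each value to a bin; include.lowest keeps the minimum in the first bin
age_bin <- cut(age, breaks = breaks, include.lowest = TRUE)

print(table(age_bin))
```

With these 15 values and five bins, each bin ends up with three observations. Heavily tied data (such as the many zeros in oldpeak) can produce fewer or uneven bins, which is why the sketch drops duplicated breaks.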

Box plots

> plotar(data = heart_disease, input = c("age", "oldpeak"),
+        target = "has_heart_disease", plot_type = "boxplot")

Density histograms

> plotar(data = heart_disease, input = c("age", "oldpeak"),
+        target = "has_heart_disease", plot_type = "histdens")

Palomino Morales Edwin

25/01/2018