Decision Tree

A decision tree is a prediction method that can be used when the target variable is categorical; in other words, it can be used for classification. To classify, a decision tree derives a set of rules from the data, and those rules determine the predicted class.

In this tutorial, we will use a decision tree to classify people as Health or Not Health using the diabetes disease dataset.
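
To make the idea of rules concrete, here is a minimal sketch of what a fitted tree amounts to: a chain of if/else tests that ends in a class label. The function name `classify_person` and both thresholds are hypothetical, chosen only for illustration; they are not learned from the diabetes data.

```r
# Hypothetical hand-written rules; a real tree learns its splits from data.
classify_person <- function(glucose, bmi) {
  if (glucose <= 120) {          # made-up threshold
    "Health"
  } else if (bmi <= 30) {        # made-up threshold
    "Health"
  } else {
    "Not Health"
  }
}

classify_person(glucose = 100, bmi = 35)
classify_person(glucose = 150, bmi = 35)
```

Each path from the first test to a returned label corresponds to one rule of the tree.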

Read Data

diabetes <- read.csv("diabetes.csv")
str(diabetes)
## 'data.frame':    296 obs. of  9 variables:
##  $ Pregnancies             : int  6 1 8 1 0 5 3 10 2 8 ...
##  $ Glucose                 : int  148 85 183 89 137 116 78 115 197 125 ...
##  $ BloodPressure           : int  72 66 64 66 40 74 50 0 70 96 ...
##  $ SkinThickness           : int  35 29 0 23 35 0 32 0 45 0 ...
##  $ Insulin                 : int  0 0 0 94 168 0 88 0 543 0 ...
##  $ BMI                     : num  33.6 26.6 23.3 28.1 43.1 25.6 31 35.3 30.5 0 ...
##  $ DiabetesPedigreeFunction: num  0.627 0.351 0.672 0.167 2.288 ...
##  $ Age                     : int  50 31 32 21 33 30 26 29 53 54 ...
##  $ Outcome                 : int  1 0 1 0 1 0 1 0 1 1 ...
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
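# Convert every integer column to a factor, then derive a labelled target from Outcome.
# Note: mutate_if() here also turns measurements such as Glucose and Age into factors.
diabetes <- diabetes %>% 
  mutate_if(is.integer, as.factor) %>% 
  mutate(
    target = factor(Outcome, levels = c(0, 1),
                    labels = c("Health", "Not Health")))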
glimpse(diabetes)
## Rows: 296
## Columns: 10
## $ Pregnancies              <fct> 6, 1, 8, 1, 0, 5, 3, 10, 2, 8, 4, 10, 10, 1, ~
## $ Glucose                  <fct> 148, 85, 183, 89, 137, 116, 78, 115, 197, 125~
## $ BloodPressure            <fct> 72, 66, 64, 66, 40, 74, 50, 0, 70, 96, 92, 74~
## $ SkinThickness            <fct> 35, 29, 0, 23, 35, 0, 32, 0, 45, 0, 0, 0, 0, ~
## $ Insulin                  <fct> 0, 0, 0, 94, 168, 0, 88, 0, 543, 0, 0, 0, 0, ~
## $ BMI                      <dbl> 33.6, 26.6, 23.3, 28.1, 43.1, 25.6, 31.0, 35.~
## $ DiabetesPedigreeFunction <dbl> 0.627, 0.351, 0.672, 0.167, 2.288, 0.201, 0.2~
## $ Age                      <fct> 50, 31, 32, 21, 33, 30, 26, 29, 53, 54, 30, 3~
## $ Outcome                  <fct> 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, ~
## $ target                   <fct> Not Health, Health, Not Health, Health, Not H~
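
Because `mutate_if(is.integer, as.factor)` applies to every integer column, measurements such as Glucose and Age become factors with one level per distinct value, as the `<fct>` tags above show. If the intent is to treat them as numbers, one possible alternative is to convert only the outcome. The sketch below uses base R on a toy data frame (`toy`, built from the first five rows shown in the `str()` output) rather than the full file; it is an assumption about intent, not the tutorial's own pipeline.

```r
# Toy subset: first five rows of three columns from the str() output above
toy <- data.frame(
  Glucose = c(148L, 85L, 183L, 89L, 137L),
  BMI     = c(33.6, 26.6, 23.3, 28.1, 43.1),
  Outcome = c(1L, 0L, 1L, 0L, 1L)
)

# Convert only the outcome to a labelled factor
toy$target <- factor(toy$Outcome, levels = c(0, 1),
                     labels = c("Health", "Not Health"))

# Glucose stays integer, so a tree can split on numeric thresholds
str(toy)
```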

Checking Missing Values

colSums(is.na(diabetes))
##              Pregnancies                  Glucose            BloodPressure 
##                        0                        0                        0 
##            SkinThickness                  Insulin                      BMI 
##                        0                        0                        0 
## DiabetesPedigreeFunction                      Age                  Outcome 
##                        0                        0                        0 
##                   target 
##                        0

The diabetes data has no missing values.

Checking Variance

library(caret)
## Warning: package 'caret' was built under R version 4.1.2
## Loading required package: ggplot2
## Loading required package: lattice
nearZeroVar(diabetes)
## integer(0)

nearZeroVar() checks whether any variable in the data has variance close to zero. The output, integer(0), shows that no variable has near-zero variance.
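
For intuition, the check can be sketched in base R. As far as caret's documentation describes it, a variable is flagged when the ratio of its most common value to its second most common value is large (default cutoff 95/5) and the percentage of distinct values is small (default cutoff 10). The helper `is_near_zero_var` and the example vectors are hypothetical, and this simplified version is not a drop-in replacement for nearZeroVar().

```r
# Simplified re-implementation of the near-zero-variance idea (not caret itself)
is_near_zero_var <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  counts <- sort(table(x), decreasing = TRUE)
  if (length(counts) < 2) return(TRUE)   # a single distinct value: zero variance
  freq_ratio     <- as.numeric(counts[1] / counts[2])
  percent_unique <- 100 * length(counts) / length(x)
  freq_ratio > freq_cut && percent_unique <= unique_cut
}

almost_constant <- c(rep(0, 98), 1, 2)    # one value dominates
varied          <- rep(1:10, times = 10)  # values evenly spread

is_near_zero_var(almost_constant)
is_near_zero_var(varied)
```

A column like `almost_constant` carries almost no information for splitting, which is why such columns are usually screened out before modelling.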

Split Train Test Dataset

set.seed(250)
# Draw 80% of the row indices at random for training; the remaining rows form the test set
intrain <- sample(nrow(diabetes), nrow(diabetes) * 0.8)
diabetes_train <- diabetes[intrain, ]
diabetes_test <- diabetes[-intrain, ]
diabetes_train
##     Pregnancies Glucose BloodPressure SkinThickness Insulin  BMI
## 202           1     138            82             0       0 40.1
## 18            7     107            74             0       0 29.6
## 105           2      85            65             0       0 39.6
## 131           4     173            70            14     168 29.7
## 206           5     111            72            28       0 23.9
## 187           8     181            68            36     495 30.1
## 97            2      92            62            28       0 31.6
## 99            6      93            50            30      64 28.7
## 59            0     146            82             0       0 40.5
## 64            2     141            58            34     128 25.4
## 263           4      95            70            32       0 32.1
## 196           5     158            84            41     210 39.4
## 76            1       0            48            20       0 24.7
## 284           7     161            86             0       0 30.4
## 163           0     114            80            34     285 44.2
## 268           2     128            64            42       0 40.0
## 108           4     144            58            28     140 29.5
## 237           7     181            84            21     192 35.9
## 209           1      96            64            27      87 33.2
## 181           6      87            80             0       0 23.2
## 70            4     146            85            27     100 28.9
## 101           1     163            72             0       0 39.0
## 86            2     110            74            29     125 32.4
## 175           2      75            64            24      55 29.7
## 211           2      81            60            22       0 27.7
## 253           2      90            80            14      55 24.4
## 222           2     158            90             0       0 31.6
## 16            7     100             0             0       0 30.0
## 246           9     184            85            15       0 30.0
## 84            0     101            65            28       0 24.6
## 279           5     114            74             0       0 24.9
## 203           0     108            68            20       0 27.3
## 271          10     101            86            37       0 45.6
## 252           2     129            84             0       0 28.0
## 221           0     177            60            29     478 34.6
## 264           3     142            80            15       0 32.4
## 79            0     131             0             0       0 43.2
## 179           5     143            78             0       0 45.0
## 180           5     130            82             0       0 39.1
## 137           0     100            70            26      50 30.8
## 149           5     147            78             0       0 33.7
## 225           1     100            66            15      56 23.6
## 295           0     161            50             0       0 21.9
## 212           0     147            85            54       0 42.8
## 292           0     107            62            30      74 36.6
## 56            1      73            50            10       0 23.0
## 71            2     100            66            20      90 32.9
## 150           2      90            70            17       0 27.3
## 51            1     103            80            11      82 19.4
## 290           5     108            72            43      75 36.1
## 182           0     119            64            18      92 34.9
## 254           0      86            68            32       0 35.8
## 55            7     150            66            42     342 34.7
## 174           1      79            60            42      48 43.5
## 12           10     168            74             0       0 38.0
## 11            4     110            92             0       0 37.6
## 110           0      95            85            25      36 37.4
## 41            3     180            64            25      70 34.0
## 281           0     146            70             0       0 37.9
## 224           7     142            60            33     190 28.8
## 100           1     122            90            51     220 49.7
## 68            2     109            92             0       0 42.7
## 233           1      79            80            25      37 25.4
## 262           3     141             0             0       0 30.0
## 118           5      78            48             0       0 33.7
## 167           3     148            66            25       0 32.5
## 139           0     129            80             0       0 31.2
## 120           4      99            76            15      51 23.2
## 230           0     117            80            31      53 45.2
## 106           1     126            56            29     152 28.7
## 199           4     109            64            44      99 34.8
## 58            0     100            88            60     110 46.8
## 134           8      84            74            31       0 38.3
## 161           4     151            90            38       0 29.7
## 146           0     102            75            23       0  0.0
## 126           1      88            30            42      99 55.0
## 213           7     179            95            31       0 34.2
## 154           1     153            82            42     485 40.6
## 69            1      95            66            13      38 19.6
## 62            8     133            72             0       0 32.9
## 229           4     197            70            39     744 36.7
## 287           5     155            84            44     545 38.7
## 82            2      74             0             0       0  0.0
## 25           11     143            94            33     146 36.6
## 122           6     111            64            39       0 34.2
## 159           2      88            74            19      53 29.0
## 198           3     107            62            13      48 22.9
## 183           1       0            74            20      23 27.7
## 13           10     139            80             0       0 27.1
## 245           2     146            76            35     194 38.2
## 259           1     193            50            16     375 25.9
## 185           4     141            74             0       0 27.6
## 83            7      83            78            26      71 29.3
## 234           4     122            68             0       0 35.0
## 204           2      99            70            16      44 20.4
## 141           3     128            78             0       0 21.1
## 14            1     189            60            23     846 30.1
## 93            7      81            78            40      48 46.7
## 218           6     125            68            30     120 30.0
## 26           10     125            70            26     115 31.1
## 148           2     106            64            35     119 30.5
## 210           7     184            84            33       0 35.5
## 73           13     126            90             0       0 43.4
## 267           0     138             0             0       0 36.3
## 27            7     147            76             0       0 39.4
## 111           3     171            72            33     135 33.3
## 286           7     136            74            26     135 26.0
## 158           1     109            56            21     135 25.2
## 147           9      57            80            37       0 32.8
## 189           8     109            76            39     114 27.9
## 102           1     151            60             0       0 26.1
## 22            8      99            84             0       0 35.4
## 236           4     171            72             0       0 43.6
## 127           3     120            70            30     135 42.9
## 31            5     109            75            26       0 36.0
## 63            5      44            62             0       0 25.0
## 112           8     155            62            26     495 34.0
## 272           2     108            62            32      56 25.2
## 33            3      88            58            11      54 24.8
## 65            7     114            66             0       0 32.8
## 129           1     117            88            24     145 34.5
## 88            2     100            68            25      71 38.5
## 143           2     108            52            26      63 32.5
## 74            4     129            86            20     270 35.1
## 44            9     171           110            24     240 45.4
## 94            4     134            72             0       0 23.8
## 104           1      81            72            18      40 26.6
## 156           7     152            88            44       0 50.0
## 30            5     117            92             0       0 34.1
## 171           6     102            82             0       0 30.8
## 6             5     116            74             0       0 25.6
## 241           1      91            64            24       0 29.2
## 61            2      84             0             0       0  0.0
## 153           9     156            86            28     155 34.3
## 157           2      99            52            15      94 24.6
## 128           1     118            58            36      94 33.3
## 208           5     162           104             0       0 37.7
## 193           7     159            66             0       0 30.4
## 155           8     188            78             0       0 47.9
## 151           1     136            74            50     204 37.4
## 107           1      96           122             0       0 22.4
## 278           0     104            64            23     116 27.8
## 29           13     145            82            19     110 22.2
## 169           4     110            66             0       0 31.9
## 23            7     196            90             0       0 39.8
## 170           3     111            90            12      78 28.4
## 194          11     135             0             0       0 52.3
## 255          12      92            62             7     258 27.6
## 168           4     120            68             0       0 29.6
## 75            1      79            75            30       0 32.0
## 124           5     132            80             0       0 26.8
## 36            4     103            60            33     192 24.0
## 121           0     162            76            56     100 53.2
## 244           6     119            50            22     176 27.1
## 67            0     109            88            30       0 32.5
## 231           4     142            86             0       0 44.0
## 145           4     154            62            31     284 32.8
## 5             0     137            40            35     168 43.1
## 130           0     105            84             0       0 27.9
## 165           0     131            88             0       0 31.6
## 91            1      80            55             0       0 19.1
## 260          11     155            76            28     150 33.3
## 34            6      92            92             0       0 19.9
## 3             8     183            64             0       0 23.3
## 92            4     123            80            15     176 32.0
## 1             6     148            72            35       0 33.6
## 192           9     123            70            44      94 33.1
## 160          17     163            72            41     114 40.9
## 289           4      96            56            17      49 20.8
## 47            1     146            56             0       0 29.7
## 266           5      96            74            18      67 33.6
## 9             2     197            70            45     543 30.5
## 54            8     176            90            34     300 33.7
## 109           3      83            58            31      18 34.3
## 39            2      90            68            42       0 38.2
## 258           2     114            68            22       0 28.7
## 89           15     136            70            32     110 37.1
## 269           0     102            52             0       0 25.1
## 85            5     137           108             0       0 48.8
## 242           4      91            70            32      88 33.1
## 66            5      99            74            27       0 29.0
## 52            1     101            50            15      36 24.2
## 288           1     119            86            39     220 45.6
## 219           5      85            74            22       0 29.0
## 283           7     133            88            15     155 32.4
## 214           0     140            65            26     130 42.6
## 35           10     122            78            31       0 27.6
## 257           3     111            56            39       0 30.1
## 140           5     105            72            29     325 36.9
## 138           0      93            60            25      92 28.7
## 103           0     125            96             0       0 22.5
## 215           9     112            82            32     175 34.2
## 277           7     106            60            24       0 26.5
## 256           1     113            64            35       0 33.6
## 87           13     106            72            54       0 36.6
## 195           8      85            55            20       0 24.4
## 95            2     142            82            18      64 24.7
## 207           8     196            76            29     280 37.5
## 136           2     125            60            20     140 33.8
## 96            6     144            72            27     228 33.9
## 191           3     111            62             0       0 22.6
## 123           2     107            74            30     100 33.6
## 173           2      87             0            23       0 28.9
## 280           2     108            62            10     278 25.3
## 235           3      74            68            28      45 29.7
## 240           0     104            76             0       0 18.4
## 50            7     105             0             0       0  0.0
## 57            7     187            68            39     304 37.7
## 205           6     103            72            32     190 37.7
## 172           6     134            70            23     130 35.4
## 114           4      76            62             0       0 34.0
## 19            1     103            30            38      83 43.3
## 282          10     129            76            28     122 35.9
## 188           1     128            98            41      58 32.0
## 265           4     123            62             0       0 32.0
## 178           0     129           110            46     130 67.1
## 250           1     111            86            19       0 30.1
## 239           9     164            84            21       0 30.8
## 166           6     104            74            18     156 29.9
## 4             1      89            66            23      94 28.1
## 274           1      71            78            50      45 33.2
## 261           3     191            68            15     130 30.9
## 40            4     111            72            47     207 37.1
## 216          12     151            70            40     271 41.8
## 249           9     124            70            33     402 35.4
## 17            0     118            84            47     230 45.8
## 248           0     165            90            33     680 52.3
## 32            3     158            76            36     245 31.6
## 42            7     133            84             0       0 40.2
## 15            5     166            72            19     175 25.8
## 45            7     159            64             0       0 27.4
## 113           1      89            76            34      37 31.2
## 291           0      78            88            29      40 36.9
## 217           5     109            62            41     129 35.8
## 90            1     107            68            19       0 26.5
## 78            5      95            72            33       0 37.7
##     DiabetesPedigreeFunction Age Outcome     target
## 202                    0.236  28       0     Health
## 18                     0.254  31       1 Not Health
## 105                    0.930  27       0     Health
## 131                    0.361  33       1 Not Health
## 206                    0.407  27       0     Health
## 187                    0.615  60       1 Not Health
## 97                     0.130  24       0     Health
## 99                     0.356  23       0     Health
## 59                     1.781  44       0     Health
## 64                     0.699  24       0     Health
## 263                    0.612  24       0     Health
## 196                    0.395  29       1 Not Health
## 76                     0.140  22       0     Health
## 284                    0.165  47       1 Not Health
## 163                    0.167  27       0     Health
## 268                    1.101  24       0     Health
## 108                    0.287  37       0     Health
## 237                    0.586  51       1 Not Health
## 209                    0.289  21       0     Health
## 181                    0.084  32       0     Health
## 70                     0.189  27       0     Health
## 101                    1.222  33       1 Not Health
## 86                     0.698  27       0     Health
## 175                    0.370  33       0     Health
## 211                    0.290  25       0     Health
## 253                    0.249  24       0     Health
## 222                    0.805  66       1 Not Health
## 16                     0.484  32       1 Not Health
## 246                    1.213  49       1 Not Health
## 84                     0.237  22       0     Health
## 279                    0.744  57       0     Health
## 203                    0.787  32       0     Health
## 271                    1.136  38       1 Not Health
## 252                    0.284  27       0     Health
## 221                    1.072  21       1 Not Health
## 264                    0.200  63       0     Health
## 79                     0.270  26       1 Not Health
## 179                    0.190  47       0     Health
## 180                    0.956  37       1 Not Health
## 137                    0.597  21       0     Health
## 149                    0.218  65       0     Health
## 225                    0.666  26       0     Health
## 295                    0.254  65       0     Health
## 212                    0.375  24       0     Health
## 292                    0.757  25       1 Not Health
## 56                     0.248  21       0     Health
## 71                     0.867  28       1 Not Health
## 150                    0.085  22       0     Health
## 51                     0.491  22       0     Health
## 290                    0.263  33       0     Health
## 182                    0.725  23       0     Health
## 254                    0.238  25       0     Health
## 55                     0.718  42       0     Health
## 174                    0.678  23       0     Health
## 12                     0.537  34       1 Not Health
## 11                     0.191  30       0     Health
## 110                    0.247  24       1 Not Health
## 41                     0.271  26       0     Health
## 281                    0.334  28       1 Not Health
## 224                    0.687  61       0     Health
## 100                    0.325  31       1 Not Health
## 68                     0.845  54       0     Health
## 233                    0.583  22       0     Health
## 262                    0.761  27       1 Not Health
## 118                    0.654  25       0     Health
## 167                    0.256  22       0     Health
## 139                    0.703  29       0     Health
## 120                    0.223  21       0     Health
## 230                    0.089  24       0     Health
## 106                    0.801  21       0     Health
## 199                    0.905  26       1 Not Health
## 58                     0.962  31       0     Health
## 134                    0.457  39       0     Health
## 161                    0.294  36       0     Health
## 146                    0.572  21       0     Health
## 126                    0.496  26       1 Not Health
## 213                    0.164  60       0     Health
## 154                    0.687  23       0     Health
## 69                     0.334  25       0     Health
## 62                     0.270  39       1 Not Health
## 229                    2.329  31       0     Health
## 287                    0.619  34       0     Health
## 82                     0.102  22       0     Health
## 25                     0.254  51       1 Not Health
## 122                    0.260  24       0     Health
## 159                    0.229  22       0     Health
## 198                    0.678  23       1 Not Health
## 183                    0.299  21       0     Health
## 13                     1.441  57       0     Health
## 245                    0.329  29       0     Health
## 259                    0.655  24       0     Health
## 185                    0.244  40       0     Health
## 83                     0.767  36       0     Health
## 234                    0.394  29       0     Health
## 204                    0.235  27       0     Health
## 141                    0.268  55       0     Health
## 14                     0.398  59       1 Not Health
## 93                     0.261  42       0     Health
## 218                    0.464  32       0     Health
## 26                     0.205  41       1 Not Health
## 148                    1.400  34       0     Health
## 210                    0.355  41       1 Not Health
## 73                     0.583  42       1 Not Health
## 267                    0.933  25       1 Not Health
## 27                     0.257  43       1 Not Health
## 111                    0.199  24       1 Not Health
## 286                    0.647  51       0     Health
## 158                    0.833  23       0     Health
## 147                    0.096  41       0     Health
## 189                    0.640  31       1 Not Health
## 102                    0.179  22       0     Health
## 22                     0.388  50       0     Health
## 236                    0.479  26       1 Not Health
## 127                    0.452  30       0     Health
## 31                     0.546  60       0     Health
## 63                     0.587  36       0     Health
## 112                    0.543  46       1 Not Health
## 272                    0.128  21       0     Health
## 33                     0.267  22       0     Health
## 65                     0.258  42       1 Not Health
## 129                    0.403  40       1 Not Health
## 88                     0.324  26       0     Health
## 143                    0.318  22       0     Health
## 74                     0.231  23       0     Health
## 44                     0.721  54       1 Not Health
## 94                     0.277  60       1 Not Health
## 104                    0.283  24       0     Health
## 156                    0.337  36       1 Not Health
## 30                     0.337  38       0     Health
## 171                    0.180  36       1 Not Health
## 6                      0.201  30       0     Health
## 241                    0.192  21       0     Health
## 61                     0.304  21       0     Health
## 153                    1.189  42       1 Not Health
## 157                    0.637  21       0     Health
## 128                    0.261  23       0     Health
## 208                    0.151  52       1 Not Health
## 193                    0.383  36       1 Not Health
## 155                    0.137  43       1 Not Health
## 151                    0.399  24       0     Health
## 107                    0.207  27       0     Health
## 278                    0.454  23       0     Health
## 29                     0.245  57       0     Health
## 169                    0.471  29       0     Health
## 23                     0.451  41       1 Not Health
## 170                    0.495  29       0     Health
## 194                    0.578  40       1 Not Health
## 255                    0.926  44       1 Not Health
## 168                    0.709  34       0     Health
## 75                     0.396  22       0     Health
## 124                    0.186  69       0     Health
## 36                     0.966  33       0     Health
## 121                    0.759  25       1 Not Health
## 244                    1.318  33       1 Not Health
## 67                     0.855  38       1 Not Health
## 231                    0.645  22       1 Not Health
## 145                    0.237  23       0     Health
## 5                      2.288  33       1 Not Health
## 130                    0.741  62       1 Not Health
## 165                    0.743  32       1 Not Health
## 91                     0.258  21       0     Health
## 260                    1.353  51       1 Not Health
## 34                     0.188  28       0     Health
## 3                      0.672  32       1 Not Health
## 92                     0.443  34       0     Health
## 1                      0.627  50       1 Not Health
## 192                    0.374  40       0     Health
## 160                    0.817  47       1 Not Health
## 289                    0.340  26       0     Health
## 47                     0.564  29       0     Health
## 266                    0.997  43       0     Health
## 9                      0.158  53       1 Not Health
## 54                     0.467  58       1 Not Health
## 109                    0.336  25       0     Health
## 39                     0.503  27       1 Not Health
## 258                    0.092  25       0     Health
## 89                     0.153  43       1 Not Health
## 269                    0.078  21       0     Health
## 85                     0.227  37       1 Not Health
## 242                    0.446  22       0     Health
## 66                     0.203  32       0     Health
## 52                     0.526  26       0     Health
## 288                    0.808  29       1 Not Health
## 219                    1.224  32       1 Not Health
## 283                    0.262  37       0     Health
## 214                    0.431  24       1 Not Health
## 35                     0.512  45       0     Health
## 257                    0.557  30       0     Health
## 140                    0.159  28       0     Health
## 138                    0.532  22       0     Health
## 103                    0.262  21       0     Health
## 215                    0.260  36       1 Not Health
## 277                    0.296  29       1 Not Health
## 256                    0.543  21       1 Not Health
## 87                     0.178  45       0     Health
## 195                    0.136  42       0     Health
## 95                     0.761  21       0     Health
## 207                    0.605  57       1 Not Health
## 136                    0.088  31       0     Health
## 96                     0.255  40       0     Health
## 191                    0.142  21       0     Health
## 123                    0.404  23       0     Health
## 173                    0.773  25       0     Health
## 280                    0.881  22       0     Health
## 235                    0.293  23       0     Health
## 240                    0.582  27       0     Health
## 50                     0.305  24       0     Health
## 57                     0.254  41       1 Not Health
## 205                    0.324  55       0     Health
## 172                    0.542  29       1 Not Health
## 114                    0.391  25       0     Health
## 19                     0.183  33       0     Health
## 282                    0.280  39       0     Health
## 188                    1.321  33       1 Not Health
## 265                    0.226  35       1 Not Health
## 178                    0.319  26       1 Not Health
## 250                    0.143  23       0     Health
## 239                    0.831  32       1 Not Health
## 166                    0.722  41       1 Not Health
## 4                      0.167  21       0     Health
## 274                    0.422  21       0     Health
## 261                    0.299  34       0     Health
## 40                     1.390  56       1 Not Health
## 216                    0.742  38       1 Not Health
## 249                    0.282  34       0     Health
## 17                     0.551  31       1 Not Health
## 248                    0.427  23       0     Health
## 32                     0.851  28       1 Not Health
## 42                     0.696  37       0     Health
## 15                     0.587  51       1 Not Health
## 45                     0.294  40       0     Health
## 113                    0.192  23       0     Health
## 291                    0.434  21       0     Health
## 217                    0.514  25       1 Not Health
## 90                     0.165  24       0     Health
## 78                     0.370  27       0     Health

Check whether each data split has a balanced class proportion.

prop.table(table(diabetes$target))
## 
##     Health Not Health 
##  0.6216216  0.3783784
prop.table(table(diabetes_train$target))
## 
##     Health Not Health 
##  0.6398305  0.3601695
prop.table(table(diabetes_test$target))
## 
##     Health Not Health 
##       0.55       0.45

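From the proportions above, the test data (55% Health, 45% Not Health) is somewhat more balanced than the full data (62% / 38%) and the training data (64% / 36%), but Health remains the majority class in all three.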
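If the class proportions should match more closely across splits, one option is a stratified split, which samples the same fraction of rows within each class. The following is a minimal base R sketch under stated assumptions: `toy` is synthetic data standing in for `diabetes`, and `stratified_index` is a hypothetical helper, not a function used in this tutorial.

```r
# Synthetic stand-in for the tutorial's data: 296 rows, about 62% Health (illustrative only)
set.seed(100)
toy <- data.frame(
  Glucose = round(rnorm(296, mean = 120, sd = 30)),
  target  = factor(c(rep("Health", 184), rep("Not Health", 112)))
)

# Hypothetical helper: sample the same fraction of row indices inside each class
stratified_index <- function(y, p = 0.8) {
  idx_by_class <- split(seq_along(y), y)
  picked <- lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(length(idx) * p))]
  })
  sort(unlist(picked, use.names = FALSE))
}

train_idx <- stratified_index(toy$target, p = 0.8)
toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

# Class proportions in the full data and in each split
prop.table(table(toy$target))
prop.table(table(toy_train$target))
prop.table(table(toy_test$target))
```

Because the sampling happens per class, the train and test proportions can differ from the full data only by rounding.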

Build Model

library(partykit)
## Warning: package 'partykit' was built under R version 4.1.2
## Loading required package: grid
## Loading required package: libcoin
## Warning: package 'libcoin' was built under R version 4.1.2
## Loading required package: mvtnorm
diabetes_tree <- ctree(target~., data = diabetes_train)
diabetes_tree
## 
## Model formula:
## target ~ Pregnancies + Glucose + BloodPressure + SkinThickness + 
##     Insulin + BMI + DiabetesPedigreeFunction + Age + Outcome
## 
## Fitted party:
## [1] root
## |   [2] Outcome in 0: Health (n = 151, err = 0.0%)
## |   [3] Outcome in 1: Not Health (n = 85, err = 0.0%)
## 
## Number of inner nodes:    1
## Number of terminal nodes: 2
plot(diabetes_tree, type = "simple")

Based on the fitted decision tree, the root node splits on the variable Outcome. The tree has 1 inner node and 2 terminal nodes. When Outcome = 0, 151 observations are classified as Health with an error of 0%. When Outcome = 1, 85 observations are classified as Not Health with an error of 0%. Note that target was created directly from Outcome, so this split only reproduces the label and none of the other predictors are used.
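Since target is only a relabelled copy of Outcome, keeping Outcome among the predictors leaks the answer into the model. The sketch below shows one way to detect and remove such a column before fitting. It uses synthetic data (`toy`, with an assumed toy rule linking Glucose to the label), and the `ctree` call is guarded so the block still runs when partykit is not installed.

```r
# Synthetic stand-in: target is a relabelled copy of Outcome, as in the tutorial
set.seed(100)
n <- 200
toy <- data.frame(
  Glucose = rnorm(n, mean = 120, sd = 30),
  BMI     = rnorm(n, mean = 32, sd = 6)
)
# Assumed toy rule plus noise, so the predictors carry some imperfect signal
score <- (toy$Glucose - 120) / 30 + rnorm(n)
toy$Outcome <- factor(as.integer(score > 0.3))
toy$target  <- factor(toy$Outcome, levels = c(0, 1),
                      labels = c("Health", "Not Health"))

# Leakage check: every Outcome level maps to exactly one target level
leak_table <- table(toy$Outcome, toy$target)
is_leak <- all(rowSums(leak_table > 0) == 1)
is_leak

# Drop the leaking column before modelling
toy_clean <- subset(toy, select = -Outcome)

# Fit the tree on the remaining predictors only (skipped if partykit is missing)
if (requireNamespace("partykit", quietly = TRUE)) {
  tree_clean <- partykit::ctree(target ~ ., data = toy_clean)
  print(tree_clean)
}
```

A tree fitted this way has to split on real predictors such as Glucose or BMI, so its error rates are more likely to reflect how it would behave on new patients.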

Model Evaluation

Before predicting on the test data, we first evaluate the model on the training data. This helps check whether the model is overfit or underfit.

pred_diabetes_train <- predict(diabetes_tree, diabetes_train)
confusionMatrix(pred_diabetes_train, diabetes_train$target, positive = "Not Health")
## Confusion Matrix and Statistics
## 
##             Reference
## Prediction   Health Not Health
##   Health        151          0
##   Not Health      0         85
##                                      
##                Accuracy : 1          
##                  95% CI : (0.9845, 1)
##     No Information Rate : 0.6398     
##     P-Value [Acc > NIR] : < 2.2e-16  
##                                      
##                   Kappa : 1          
##                                      
##  Mcnemar's Test P-Value : NA         
##                                      
##             Sensitivity : 1.0000     
##             Specificity : 1.0000     
##          Pos Pred Value : 1.0000     
##          Neg Pred Value : 1.0000     
##              Prevalence : 0.3602     
##          Detection Rate : 0.3602     
##    Detection Prevalence : 0.3602     
##       Balanced Accuracy : 1.0000     
##                                      
##        'Positive' Class : Not Health 
## 

From the evaluation results, we focus on recall because we do not want to predict Health for someone who is actually Not Health. The recall (Sensitivity) obtained is 100%, which matches the confusion matrix: no Not Health observation is predicted as Health. Next, we try the test data.
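To make the recall figure concrete, it can be recomputed by hand from the confusion matrix. The sketch below copies the counts from the training confusion matrix above into a plain matrix; the variable names are illustrative only.

```r
# Counts copied from the training confusion matrix above
# (rows = prediction, columns = reference)
cm <- matrix(c(151, 0, 0, 85), nrow = 2,
             dimnames = list(Prediction = c("Health", "Not Health"),
                             Reference  = c("Health", "Not Health")))

positive <- "Not Health"
tp <- cm[positive, positive]       # Not Health correctly predicted as Not Health
fn <- sum(cm[, positive]) - tp     # Not Health wrongly predicted as Health
fp <- sum(cm[positive, ]) - tp     # Health wrongly predicted as Not Health

recall    <- tp / (tp + fn)        # reported as Sensitivity by confusionMatrix
precision <- tp / (tp + fp)        # reported as Pos Pred Value

recall     # 1
precision  # 1
```

Recall only counts the Reference column of the positive class, which is why it is the metric to watch when a missed Not Health case is the costly error.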

pred_diabetes_test <- predict(diabetes_tree, diabetes_test)
confusionMatrix(pred_diabetes_test, diabetes_test$target, positive = "Not Health")
## Confusion Matrix and Statistics
## 
##             Reference
## Prediction   Health Not Health
##   Health         33          0
##   Not Health      0         27
##                                      
##                Accuracy : 1          
##                  95% CI : (0.9404, 1)
##     No Information Rate : 0.55       
##     P-Value [Acc > NIR] : 2.641e-16  
##                                      
##                   Kappa : 1          
##                                      
##  Mcnemar's Test P-Value : NA         
##                                      
##             Sensitivity : 1.00       
##             Specificity : 1.00       
##          Pos Pred Value : 1.00       
##          Neg Pred Value : 1.00       
##              Prevalence : 0.45       
##          Detection Rate : 0.45       
##    Detection Prevalence : 0.45       
##       Balanced Accuracy : 1.00       
##                                      
##        'Positive' Class : Not Health 
## 

When the model is used to predict the test data, the sensitivity is also 100%. To improve a model, we can resample the training data so that the classes of the target variable are more balanced.

set.seed(250)
intrain <- sample(nrow(diabetes), nrow(diabetes)*0.8)
re_train <- diabetes[intrain,]
re_train <- upSample(re_train[,-9], re_train[,9], yname = "Outcome" )
re_test <- diabetes[-intrain,]
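Upsampling repeats rows of the minority class until every class has as many rows as the majority class. The following base R sketch illustrates the idea on synthetic data with the same class counts as the training set (151 Health, 85 Not Health). `upsample_base` is a hypothetical helper written for illustration and is not claimed to reproduce the internals of `upSample`.

```r
set.seed(250)
# Synthetic training set with the same class counts as diabetes_train
toy_train <- data.frame(
  Glucose = round(rnorm(236, mean = 120, sd = 30)),
  target  = factor(c(rep("Health", 151), rep("Not Health", 85)))
)

# Hypothetical helper: keep all rows, then resample each smaller class
# with replacement until it reaches the majority class count
upsample_base <- function(df, class_col) {
  idx_by_class <- split(seq_len(nrow(df)), df[[class_col]])
  n_max <- max(lengths(idx_by_class))
  picked <- lapply(idx_by_class, function(idx) {
    extra <- idx[sample.int(length(idx), n_max - length(idx), replace = TRUE)]
    c(idx, extra)
  })
  df[unlist(picked, use.names = FALSE), , drop = FALSE]
}

toy_up <- upsample_base(toy_train, "target")
table(toy_up$target)
```

As in the tutorial's code, only the training rows are resampled; the test set keeps its original class proportions so the evaluation stays representative.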

Next, we check the class proportions of the training and test data.

prop.table(table(re_train$target))
## 
##     Health Not Health 
##        0.5        0.5
prop.table(table(re_test$target))
## 
##     Health Not Health 
##       0.55       0.45

After that, we build the model again with the new training data.

diabetes_tree_new <- ctree(target~., re_train)
diabetes_tree_new
## 
## Model formula:
## target ~ Pregnancies + Glucose + BloodPressure + SkinThickness + 
##     Insulin + BMI + DiabetesPedigreeFunction + Age + Outcome
## 
## Fitted party:
## [1] root
## |   [2] Outcome in 0: Health (n = 151, err = 0.0%)
## |   [3] Outcome in 1: Not Health (n = 151, err = 0.0%)
## 
## Number of inner nodes:    1
## Number of terminal nodes: 2
plot(diabetes_tree_new, type = "simple")

Based on the plot of the new decision tree, the root node again splits on the variable Outcome. The tree has 1 inner node and 2 terminal nodes.

pred_train <- predict(diabetes_tree_new, re_train)
confusionMatrix(pred_train, re_train$target)
## Confusion Matrix and Statistics
## 
##             Reference
## Prediction   Health Not Health
##   Health        151          0
##   Not Health      0        151
##                                      
##                Accuracy : 1          
##                  95% CI : (0.9879, 1)
##     No Information Rate : 0.5        
##     P-Value [Acc > NIR] : < 2.2e-16  
##                                      
##                   Kappa : 1          
##                                      
##  Mcnemar's Test P-Value : NA         
##                                      
##             Sensitivity : 1.0        
##             Specificity : 1.0        
##          Pos Pred Value : 1.0        
##          Neg Pred Value : 1.0        
##              Prevalence : 0.5        
##          Detection Rate : 0.5        
##    Detection Prevalence : 0.5        
##       Balanced Accuracy : 1.0        
##                                      
##        'Positive' Class : Health     
## 

The recall obtained is 100%, and the accuracy is also 100%. Note that no positive argument was passed to confusionMatrix here, so Health is reported as the positive class.

pred_test <- predict(diabetes_tree_new, re_test)
confusionMatrix(pred_test, re_test$target)
## Confusion Matrix and Statistics
## 
##             Reference
## Prediction   Health Not Health
##   Health         33          0
##   Not Health      0         27
##                                      
##                Accuracy : 1          
##                  95% CI : (0.9404, 1)
##     No Information Rate : 0.55       
##     P-Value [Acc > NIR] : 2.641e-16  
##                                      
##                   Kappa : 1          
##                                      
##  Mcnemar's Test P-Value : NA         
##                                      
##             Sensitivity : 1.00       
##             Specificity : 1.00       
##          Pos Pred Value : 1.00       
##          Neg Pred Value : 1.00       
##              Prevalence : 0.55       
##          Detection Rate : 0.55       
##    Detection Prevalence : 0.55       
##       Balanced Accuracy : 1.00       
##                                      
##        'Positive' Class : Health     
## 

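After resampling, the model still reaches a recall of 100% on the test data, the same value as before resampling, so the outputs shown here do not demonstrate an improvement. Both trees split only on Outcome, the variable that target was derived from, so the perfect scores reflect that relationship and not what the model learned from the other predictors. Resampling is more likely to show a measurable effect on recall when Outcome is excluded from the predictors.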

Conclusion

The conclusions that can be drawn are:

References:

  1. Sa'diyah, N. K. (2017). Aplikasi Data Mining untuk Deteksi Dini Resiko Penyakit Diabetes Menggunakan Metode Decision Tree C4.5 (Doctoral dissertation, Universitas Muhammadiyah Gresik).
  2. https://rpubs.com/inayatus/decision-tree
  3. https://www.kaggle.com/fedesoriano/diabetes-prediction-dataset