Anil Akyildirim
9/17/2019
Imagine we have historical medical data on patients who suffered from the same illness. During treatment, the patients were given different medications, and each responded to the treatment differently. The business problem in question is: “Without wasting any medication or time, can we predict the right medication for this same illness for a future patient?”
What kind of a predictive model should we choose?
What attributes should we choose from the dataset?
How should we segment the data set for our predictive model?
What is entropy, and how do we calculate it?
What is information gain, and how do we calculate it?
What are the steps for choosing the right attributes?
## Age Sex BP Cholesterol Na_to_K Drug
## 1 23 F HIGH HIGH 25.355 drugY
## 2 47 M LOW HIGH 13.093 drugC
## 3 47 M LOW HIGH 10.114 drugC
## 4 28 F NORMAL HIGH 7.798 drugX
## 5 61 F LOW HIGH 18.043 drugY
## 6 22 F NORMAL HIGH 8.607 drugX
## Age Sex BP Cholesterol Na_to_K
## Min. :15.00 F: 96 HIGH :77 HIGH :103 Min. : 6.269
## 1st Qu.:31.00 M:104 LOW :64 NORMAL: 97 1st Qu.:10.445
## Median :45.00 NORMAL:59 Median :13.937
## Mean :44.31 Mean :16.084
## 3rd Qu.:58.00 3rd Qu.:19.380
## Max. :74.00 Max. :38.247
## Drug
## drugA:23
## drugB:16
## drugC:16
## drugX:54
## drugY:91
##
## 'data.frame': 200 obs. of 6 variables:
## $ Age : int 23 47 47 28 61 22 49 41 60 43 ...
## $ Sex : Factor w/ 2 levels "F","M": 1 2 2 1 1 1 1 2 2 2 ...
## $ BP : Factor w/ 3 levels "HIGH","LOW","NORMAL": 1 2 2 3 2 3 3 2 3 2 ...
## $ Cholesterol: Factor w/ 2 levels "HIGH","NORMAL": 1 1 1 1 1 1 1 1 1 2 ...
## $ Na_to_K : num 25.4 13.1 10.1 7.8 18 ...
## $ Drug : Factor w/ 5 levels "drugA","drugB",..: 5 3 3 4 5 4 5 3 5 5 ...
## [1] 0
## [1] 0
## [1] 0
## [1] 0
## [1] 0
## [1] 0
\[entropy = -p_{1}\log(p_{1}) - p_{2}\log(p_{2}) - \cdots\]
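The formula can be sketched in R as follows. The helper name `entropy_from_probs` and its `base` argument are assumptions made for illustration, not part of the original analysis. Zero probabilities are dropped because p log(p) tends to 0 as p tends to 0.

```r
# Shannon entropy of a probability vector (hypothetical helper name).
# base = exp(1) gives nats; base = 2 gives bits.
entropy_from_probs <- function(p, base = exp(1)) {
  stopifnot(all(p >= 0), abs(sum(p) - 1) < 1e-8)
  p <- p[p > 0]                      # treat 0 * log(0) as 0
  -sum(p * log(p, base = base))
}

entropy_from_probs(c(0.5, 0.5))            # log(2): maximum for two classes
entropy_from_probs(c(0.5, 0.5), base = 2)  # the same quantity in bits
entropy_from_probs(c(1, 0))                # a pure group has zero entropy
```

The choice of logarithm base only rescales the result, so it does not change which attribute ranks best, as long as the same base is used throughout.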
## $`x:`
## x label Freq Percent Valid Percent Cumulative Percent
## 2 Valid F 96 48.0 48.0 48.0
## 3 M 104 52.0 52.0 100.0
## 31 Total 200 100.0 100.0
## 1 Missing <blank> 0 0.0
## 4 <NA> 0 0.0
## 6 Total 200 100.0
## $`x:`
## x label Freq Percent Valid Percent Cumulative Percent
## 2 Valid HIGH 77 38.5 38.5 38.5
## 3 LOW 64 32.0 32.0 70.5
## 4 NORMAL 59 29.5 29.5 100.0
## 41 Total 200 100.0 100.0
## 1 Missing <blank> 0 0.0
## 5 <NA> 0 0.0
## 7 Total 200 100.0
## $`x:`
## x label Freq Percent Valid Percent Cumulative Percent
## 2 Valid HIGH 103 51.5 51.5 51.5
## 3 NORMAL 97 48.5 48.5 100.0
## 31 Total 200 100.0 100.0
## 1 Missing <blank> 0 0.0
## 4 <NA> 0 0.0
## 6 Total 200 100.0
## $`x:`
## x label Freq Percent Valid Percent Cumulative Percent
## 2 Valid drugA 23 11.5 11.5 11.5
## 3 drugB 16 8.0 8.0 19.5
## 4 drugC 16 8.0 8.0 27.5
## 5 drugX 54 27.0 27.0 54.5
## 6 drugY 91 45.5 45.5 100.0
## 61 Total 200 100.0 100.0
## 1 Missing <blank> 0 0.0
## 7 <NA> 0 0.0
## 9 Total 200 100.0
96 out of 200 of the patients are Female. P(Female)=0.48
104 out of 200 of the patients are Male. P(Male)=0.52
77 out of 200 of the patients have high Blood Pressure. P(High-BP)=0.385
64 out of 200 of the patients have low Blood Pressure. P(Low-BP)=0.32
59 out of 200 of the patients have Normal Blood Pressure. P(Normal-BP)=0.295
103 out of 200 patients have High Cholesterol. P(High-Chol)=0.515
97 out of 200 patients have Normal Cholesterol. P(Normal-Chol)=0.485
23 out of 200 patients used Drug A for Treatment. P(Drug-A)=0.115
16 out of 200 patients used Drug B for treatment. P(Drug-B)=0.08
16 out of 200 patients used Drug C for treatment. P(Drug-C)=0.08
54 out of 200 patients used Drug X for treatment. P(Drug-X)=0.27
91 out of 200 patients used Drug Y for treatment. P(Drug-Y)=0.455
## [1] 1.364655
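The value above is consistent with applying the entropy formula, with the natural logarithm, to the drug proportions. A minimal sketch that reproduces it from the counts in the frequency table (the variable names are illustrative):

```r
# Drug counts taken from the frequency table above.
drug_counts <- c(drugA = 23, drugB = 16, drugC = 16, drugX = 54, drugY = 91)

# Convert counts to proportions, then apply the entropy formula (natural log).
p_drug <- drug_counts / sum(drug_counts)
parent_entropy <- -sum(p_drug * log(p_drug))

round(p_drug, 3)   # 0.115, 0.080, 0.080, 0.270, 0.455
parent_entropy     # entropy of the whole (parent) data set, in nats
```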
## Age Sex BP Cholesterol Na_to_K Drug
## 1 23 F HIGH HIGH 25.355 drugY
## 4 28 F NORMAL HIGH 7.798 drugX
## 5 61 F LOW HIGH 18.043 drugY
## 6 22 F NORMAL HIGH 8.607 drugX
## 7 49 F NORMAL HIGH 16.275 drugY
## 11 47 F LOW HIGH 11.767 drugC
## 12 34 F HIGH NORMAL 19.199 drugY
## 14 74 F LOW HIGH 20.942 drugY
## 15 50 F NORMAL HIGH 12.703 drugX
## 16 16 F HIGH NORMAL 15.516 drugY
## 20 32 F HIGH NORMAL 25.974 drugY
## 24 48 F LOW HIGH 15.036 drugY
## 25 33 F LOW HIGH 33.486 drugY
## 26 28 F HIGH NORMAL 18.809 drugY
## 28 49 F NORMAL NORMAL 9.381 drugX
## 29 39 F LOW NORMAL 22.697 drugY
## 31 18 F NORMAL NORMAL 8.750 drugX
## 34 65 F HIGH NORMAL 31.876 drugY
## 39 39 F NORMAL NORMAL 9.709 drugX
## 41 73 F NORMAL HIGH 19.221 drugY
## 42 58 F HIGH NORMAL 14.239 drugB
## 45 50 F NORMAL NORMAL 12.295 drugX
## 46 66 F NORMAL NORMAL 8.107 drugX
## 47 37 F HIGH HIGH 13.091 drugA
## 50 28 F LOW HIGH 19.796 drugY
## 51 58 F HIGH HIGH 19.416 drugY
## 54 24 F HIGH NORMAL 18.457 drugY
## 55 68 F HIGH NORMAL 10.189 drugB
## 56 26 F LOW HIGH 14.160 drugC
## 61 38 F LOW NORMAL 29.875 drugY
## 65 60 F HIGH HIGH 13.303 drugB
## 66 68 F NORMAL NORMAL 27.050 drugY
## 70 18 F HIGH NORMAL 24.276 drugY
## 72 28 F NORMAL HIGH 19.675 drugY
## 73 24 F NORMAL HIGH 10.605 drugX
## 74 41 F NORMAL NORMAL 22.905 drugY
## 77 36 F HIGH HIGH 11.198 drugA
## 78 26 F HIGH NORMAL 19.161 drugY
## 79 19 F HIGH HIGH 13.313 drugA
## 80 32 F LOW NORMAL 10.840 drugX
## 83 32 F LOW HIGH 9.712 drugC
## 84 38 F HIGH NORMAL 11.326 drugA
## 85 47 F LOW HIGH 10.067 drugC
## 87 51 F NORMAL HIGH 13.597 drugX
## 89 37 F HIGH NORMAL 23.091 drugY
## 90 50 F NORMAL NORMAL 17.211 drugY
## 93 29 F HIGH HIGH 29.450 drugY
## 94 42 F LOW NORMAL 29.271 drugY
## 97 58 F LOW HIGH 38.247 drugY
## 98 56 F HIGH HIGH 25.395 drugY
## 100 15 F HIGH NORMAL 16.725 drugY
## 102 45 F HIGH HIGH 12.854 drugA
## 103 28 F LOW HIGH 13.127 drugC
## 112 47 F NORMAL NORMAL 6.683 drugX
## 114 65 F LOW NORMAL 13.769 drugX
## 115 20 F NORMAL NORMAL 9.281 drugX
## 118 40 F NORMAL HIGH 10.103 drugX
## 119 32 F HIGH NORMAL 10.292 drugA
## 120 61 F HIGH HIGH 25.475 drugY
## 124 36 F NORMAL HIGH 16.753 drugY
## 125 53 F HIGH NORMAL 12.495 drugB
## 126 19 F HIGH NORMAL 25.969 drugY
## 130 32 F NORMAL HIGH 7.477 drugX
## 131 70 F NORMAL HIGH 20.489 drugY
## 135 42 F HIGH HIGH 21.036 drugY
## 137 55 F HIGH HIGH 10.977 drugB
## 138 35 F HIGH HIGH 12.894 drugA
## 140 69 F NORMAL HIGH 10.065 drugX
## 142 64 F LOW NORMAL 25.741 drugY
## 147 37 F LOW NORMAL 12.006 drugX
## 148 26 F HIGH NORMAL 12.307 drugA
## 149 61 F LOW NORMAL 7.340 drugX
## 154 72 F LOW NORMAL 14.642 drugX
## 159 59 F LOW HIGH 10.444 drugC
## 160 34 F LOW NORMAL 12.923 drugX
## 161 30 F NORMAL HIGH 10.443 drugX
## 162 57 F HIGH NORMAL 9.945 drugB
## 164 21 F HIGH NORMAL 28.632 drugY
## 167 58 F LOW HIGH 26.645 drugY
## 168 57 F NORMAL HIGH 14.216 drugX
## 169 51 F LOW NORMAL 23.003 drugY
## 170 20 F HIGH HIGH 11.262 drugA
## 171 28 F NORMAL HIGH 12.879 drugX
## 173 39 F NORMAL NORMAL 17.225 drugY
## 174 41 F LOW NORMAL 18.739 drugY
## 176 73 F HIGH HIGH 18.348 drugY
## 180 67 F NORMAL HIGH 15.891 drugY
## 181 22 F HIGH NORMAL 22.818 drugY
## 182 59 F NORMAL HIGH 13.884 drugX
## 183 20 F LOW NORMAL 11.686 drugX
## 184 36 F HIGH NORMAL 15.490 drugY
## 185 18 F HIGH HIGH 37.188 drugY
## 186 57 F NORMAL NORMAL 25.893 drugY
## 195 46 F HIGH HIGH 34.686 drugY
## 196 56 F LOW HIGH 11.567 drugC
## 200 40 F LOW NORMAL 11.349 drugX
## female.Sex female.Drug
## 1 F drugY
## 2 F drugX
## 3 F drugY
## 4 F drugX
## 5 F drugY
## 6 F drugC
## 7 F drugY
## 8 F drugY
## 9 F drugX
## 10 F drugY
## 11 F drugY
## 12 F drugY
## 13 F drugY
## 14 F drugY
## 15 F drugX
## 16 F drugY
## 17 F drugX
## 18 F drugY
## 19 F drugX
## 20 F drugY
## 21 F drugB
## 22 F drugX
## 23 F drugX
## 24 F drugA
## 25 F drugY
## 26 F drugY
## 27 F drugY
## 28 F drugB
## 29 F drugC
## 30 F drugY
## 31 F drugB
## 32 F drugY
## 33 F drugY
## 34 F drugY
## 35 F drugX
## 36 F drugY
## 37 F drugA
## 38 F drugY
## 39 F drugA
## 40 F drugX
## 41 F drugC
## 42 F drugA
## 43 F drugC
## 44 F drugX
## 45 F drugY
## 46 F drugY
## 47 F drugY
## 48 F drugY
## 49 F drugY
## 50 F drugY
## 51 F drugY
## 52 F drugA
## 53 F drugC
## 54 F drugX
## 55 F drugX
## 56 F drugX
## 57 F drugX
## 58 F drugA
## 59 F drugY
## 60 F drugY
## 61 F drugB
## 62 F drugY
## 63 F drugX
## 64 F drugY
## 65 F drugY
## 66 F drugB
## 67 F drugA
## 68 F drugX
## 69 F drugY
## 70 F drugX
## 71 F drugA
## 72 F drugX
## 73 F drugX
## 74 F drugC
## 75 F drugX
## 76 F drugX
## 77 F drugB
## 78 F drugY
## 79 F drugY
## 80 F drugX
## 81 F drugY
## 82 F drugA
## 83 F drugX
## 84 F drugY
## 85 F drugY
## 86 F drugY
## 87 F drugY
## 88 F drugY
## 89 F drugX
## 90 F drugX
## 91 F drugY
## 92 F drugY
## 93 F drugY
## 94 F drugY
## 95 F drugC
## 96 F drugX
## female.Drug freq
## 1 drugA 9
## 2 drugB 6
## 3 drugC 7
## 4 drugX 27
## 5 drugY 47
Drug A worked for 9 of the 96 female patients. P(Drug_A_Female)=9/96
Drug B worked for 6 of the 96 female patients. P(Drug_B_Female)=6/96
Drug C worked for 7 of the 96 female patients. P(Drug_C_Female)=7/96
Drug X worked for 27 of the 96 female patients. P(Drug_X_Female)=27/96
Drug Y worked for 47 of the 96 female patients. P(Drug_Y_Female)=47/96
## [1] -1.280404
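Entropy is never negative, because each term -p log(p) is non-negative for 0 < p ≤ 1; a negative result usually means the leading minus sign was dropped. A minimal sketch that recomputes the entropy of the female subset from the counts in the frequency table above (variable names are illustrative, and the natural logarithm is assumed):

```r
# Drug counts among the 96 female patients, from the table above.
female_counts <- c(drugA = 9, drugB = 6, drugC = 7, drugX = 27, drugY = 47)

p_female <- female_counts / sum(female_counts)

# Note the leading minus sign: without it the result comes out negative.
female_entropy <- -sum(p_female * log(p_female))
female_entropy   # positive, roughly 1.29 nats
```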
## Age Sex BP Cholesterol Na_to_K Drug
## 2 47 M LOW HIGH 13.093 drugC
## 3 47 M LOW HIGH 10.114 drugC
## 8 41 M LOW HIGH 11.037 drugC
## 9 60 M NORMAL HIGH 15.171 drugY
## 10 43 M LOW NORMAL 19.368 drugY
## 13 43 M LOW HIGH 15.376 drugY
## 17 69 M LOW NORMAL 11.455 drugX
## 18 43 M HIGH HIGH 13.972 drugA
## 19 23 M LOW HIGH 7.298 drugC
## 21 57 M LOW NORMAL 19.128 drugY
## 22 63 M NORMAL HIGH 25.917 drugY
## 23 47 M LOW NORMAL 30.568 drugY
## 27 31 M HIGH HIGH 30.366 drugY
## 30 45 M LOW HIGH 17.951 drugY
## 32 74 M HIGH HIGH 9.567 drugB
## 33 49 M LOW NORMAL 11.014 drugX
## 35 53 M NORMAL HIGH 14.133 drugX
## 36 46 M NORMAL NORMAL 7.285 drugX
## 37 32 M HIGH NORMAL 9.445 drugA
## 38 39 M LOW NORMAL 13.938 drugX
## 40 15 M NORMAL HIGH 9.084 drugX
## 43 50 M NORMAL NORMAL 15.790 drugY
## 44 23 M NORMAL HIGH 12.260 drugX
## 48 68 M LOW HIGH 10.291 drugC
## 49 23 M NORMAL HIGH 31.686 drugY
## 52 67 M NORMAL NORMAL 10.898 drugX
## 53 62 M LOW NORMAL 27.183 drugY
## 57 65 M HIGH NORMAL 11.340 drugB
## 58 40 M HIGH HIGH 27.826 drugY
## 59 60 M NORMAL NORMAL 10.091 drugX
## 60 34 M HIGH HIGH 18.703 drugY
## 62 24 M HIGH NORMAL 9.475 drugA
## 63 67 M LOW NORMAL 20.693 drugY
## 64 45 M LOW NORMAL 8.370 drugX
## 67 29 M HIGH HIGH 12.856 drugA
## 68 17 M NORMAL NORMAL 10.832 drugX
## 69 54 M NORMAL HIGH 24.658 drugY
## 71 70 M HIGH HIGH 13.967 drugB
## 75 31 M HIGH NORMAL 17.069 drugY
## 76 26 M LOW NORMAL 20.909 drugY
## 81 60 M HIGH HIGH 13.934 drugB
## 82 64 M NORMAL HIGH 7.761 drugX
## 86 59 M HIGH HIGH 13.935 drugB
## 88 69 M LOW HIGH 15.478 drugY
## 91 62 M NORMAL HIGH 16.594 drugY
## 92 41 M HIGH NORMAL 15.156 drugY
## 95 56 M LOW HIGH 15.015 drugY
## 96 36 M LOW NORMAL 11.424 drugX
## 99 20 M HIGH NORMAL 35.639 drugY
## 101 31 M HIGH NORMAL 11.871 drugA
## 104 56 M NORMAL HIGH 8.966 drugX
## 105 22 M HIGH NORMAL 28.294 drugY
## 106 37 M LOW NORMAL 8.968 drugX
## 107 22 M NORMAL HIGH 11.953 drugX
## 108 42 M LOW HIGH 20.013 drugY
## 109 72 M HIGH NORMAL 9.677 drugB
## 110 23 M NORMAL HIGH 16.850 drugY
## 111 50 M HIGH HIGH 7.490 drugA
## 113 35 M LOW NORMAL 9.170 drugX
## 116 51 M HIGH HIGH 18.295 drugY
## 117 67 M NORMAL NORMAL 9.514 drugX
## 121 28 M NORMAL HIGH 27.064 drugY
## 122 15 M HIGH NORMAL 17.206 drugY
## 123 34 M NORMAL HIGH 22.456 drugY
## 127 66 M HIGH HIGH 16.347 drugY
## 128 35 M NORMAL NORMAL 7.845 drugX
## 129 47 M LOW NORMAL 33.542 drugY
## 132 52 M LOW NORMAL 32.922 drugY
## 133 49 M LOW NORMAL 13.598 drugX
## 134 24 M NORMAL HIGH 25.786 drugY
## 136 74 M LOW NORMAL 11.939 drugX
## 139 51 M HIGH NORMAL 11.343 drugB
## 141 49 M HIGH NORMAL 6.269 drugA
## 143 60 M HIGH NORMAL 8.621 drugB
## 144 74 M HIGH NORMAL 15.436 drugY
## 145 39 M HIGH HIGH 9.664 drugA
## 146 61 M NORMAL HIGH 9.443 drugX
## 150 22 M LOW HIGH 8.151 drugC
## 151 49 M HIGH NORMAL 8.700 drugA
## 152 68 M HIGH HIGH 11.009 drugB
## 153 55 M NORMAL NORMAL 7.261 drugX
## 155 37 M LOW NORMAL 16.724 drugY
## 156 49 M LOW HIGH 10.537 drugC
## 157 31 M HIGH NORMAL 11.227 drugA
## 158 53 M LOW HIGH 22.963 drugY
## 163 43 M NORMAL NORMAL 12.859 drugX
## 165 16 M HIGH NORMAL 19.007 drugY
## 166 38 M LOW HIGH 18.295 drugY
## 172 45 M LOW NORMAL 10.017 drugX
## 175 42 M HIGH NORMAL 12.766 drugA
## 177 48 M HIGH NORMAL 10.446 drugA
## 178 25 M NORMAL HIGH 19.011 drugY
## 179 39 M NORMAL HIGH 15.969 drugY
## 187 70 M HIGH HIGH 9.849 drugB
## 188 47 M HIGH HIGH 10.403 drugA
## 189 65 M HIGH NORMAL 34.997 drugY
## 190 64 M HIGH NORMAL 20.932 drugY
## 191 58 M HIGH HIGH 18.991 drugY
## 192 23 M HIGH HIGH 8.011 drugA
## 193 72 M LOW HIGH 16.310 drugY
## 194 72 M LOW HIGH 6.769 drugC
## 197 16 M LOW HIGH 12.006 drugC
## 198 52 M NORMAL HIGH 9.894 drugX
## 199 23 M NORMAL NORMAL 14.020 drugX
## male.Sex male.Drug
## 1 M drugC
## 2 M drugC
## 3 M drugC
## 4 M drugY
## 5 M drugY
## 6 M drugY
## 7 M drugX
## 8 M drugA
## 9 M drugC
## 10 M drugY
## 11 M drugY
## 12 M drugY
## 13 M drugY
## 14 M drugY
## 15 M drugB
## 16 M drugX
## 17 M drugX
## 18 M drugX
## 19 M drugA
## 20 M drugX
## 21 M drugX
## 22 M drugY
## 23 M drugX
## 24 M drugC
## 25 M drugY
## 26 M drugX
## 27 M drugY
## 28 M drugB
## 29 M drugY
## 30 M drugX
## 31 M drugY
## 32 M drugA
## 33 M drugY
## 34 M drugX
## 35 M drugA
## 36 M drugX
## 37 M drugY
## 38 M drugB
## 39 M drugY
## 40 M drugY
## 41 M drugB
## 42 M drugX
## 43 M drugB
## 44 M drugY
## 45 M drugY
## 46 M drugY
## 47 M drugY
## 48 M drugX
## 49 M drugY
## 50 M drugA
## 51 M drugX
## 52 M drugY
## 53 M drugX
## 54 M drugX
## 55 M drugY
## 56 M drugB
## 57 M drugY
## 58 M drugA
## 59 M drugX
## 60 M drugY
## 61 M drugX
## 62 M drugY
## 63 M drugY
## 64 M drugY
## 65 M drugY
## 66 M drugX
## 67 M drugY
## 68 M drugY
## 69 M drugX
## 70 M drugY
## 71 M drugX
## 72 M drugB
## 73 M drugA
## 74 M drugB
## 75 M drugY
## 76 M drugA
## 77 M drugX
## 78 M drugC
## 79 M drugA
## 80 M drugB
## 81 M drugX
## 82 M drugY
## 83 M drugC
## 84 M drugA
## 85 M drugY
## 86 M drugX
## 87 M drugY
## 88 M drugY
## 89 M drugX
## 90 M drugA
## 91 M drugA
## 92 M drugY
## 93 M drugY
## 94 M drugB
## 95 M drugA
## 96 M drugY
## 97 M drugY
## 98 M drugY
## 99 M drugA
## 100 M drugY
## 101 M drugC
## 102 M drugC
## 103 M drugX
## 104 M drugX
## male.Drug freq
## 1 drugA 14
## 2 drugB 10
## 3 drugC 9
## 4 drugX 27
## 5 drugY 44
Drug A worked for 14 of the 104 male patients. P(Drug_A_Male)=14/104
Drug B worked for 10 of the 104 male patients. P(Drug_B_Male)=10/104
Drug C worked for 9 of the 104 male patients. P(Drug_C_Male)=9/104
Drug X worked for 27 of the 104 male patients. P(Drug_X_Male)=27/104
Drug Y worked for 44 of the 104 male patients. P(Drug_Y_Male)=44/104
## [1] -1.398591
\[\begin{aligned} IG(parent,children) = entropy(parent) - [(p(female) * entropy(female)) + (p(male) * entropy(male))]\end{aligned}\]
## [1] 2.706516
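As a sanity check, information gain can be neither negative nor larger than the parent entropy (about 1.36 here), so a larger value points to a sign error in the child entropies. The sketch below recomputes the gain for the Sex split from the counts reported above, with every entropy taken as a positive quantity. The helper name `entropy_from_counts` is an assumption for illustration.

```r
# Entropy (natural log) from a vector of class counts; hypothetical helper.
entropy_from_counts <- function(n) {
  p <- n[n > 0] / sum(n)
  -sum(p * log(p))
}

# Counts from the frequency tables above.
parent <- c(drugA = 23, drugB = 16, drugC = 16, drugX = 54, drugY = 91)
female <- c(drugA = 9,  drugB = 6,  drugC = 7,  drugX = 27, drugY = 47)
male   <- c(drugA = 14, drugB = 10, drugC = 9,  drugX = 27, drugY = 44)

w_female <- sum(female) / sum(parent)   # 0.48
w_male   <- sum(male)   / sum(parent)   # 0.52

# Parent entropy minus the weighted average of the child entropies.
ig_sex <- entropy_from_counts(parent) -
  (w_female * entropy_from_counts(female) + w_male * entropy_from_counts(male))
ig_sex   # small and positive
```

Computed this way, the gain for Sex is well under 0.01, which suggests that Sex on its own says little about which drug works.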
Select multiple attributes. Take each data subset and apply attribute selection to it again to find the best attribute for the next split.
Example: female patients older than 35 whose BP is high and who responded positively to Drug X.
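Attribute selection can be sketched as computing the information gain of every candidate attribute and keeping the largest. The example below uses only the six rows printed at the top of this report, so its ranking is illustrative and says nothing reliable about the full 200-patient data set. The helper names `entropy_of` and `info_gain` are assumptions.

```r
# The first six patients shown in the data preview (categorical columns only).
patients <- data.frame(
  Sex         = c("F", "M", "M", "F", "F", "F"),
  BP          = c("HIGH", "LOW", "LOW", "NORMAL", "LOW", "NORMAL"),
  Cholesterol = c("HIGH", "HIGH", "HIGH", "HIGH", "HIGH", "HIGH"),
  Drug        = c("drugY", "drugC", "drugC", "drugX", "drugY", "drugX"),
  stringsAsFactors = TRUE
)

# Entropy (natural log) of a vector of class labels.
entropy_of <- function(x) {
  p <- prop.table(table(x))
  p <- p[p > 0]
  -sum(p * log(p))
}

# Parent entropy minus the size-weighted entropy of the groups
# produced by splitting on one attribute.
info_gain <- function(data, attribute, target) {
  groups  <- split(data[[target]], data[[attribute]], drop = TRUE)
  weights <- lengths(groups) / nrow(data)
  child_entropies <- vapply(groups, entropy_of, numeric(1))
  entropy_of(data[[target]]) - sum(weights * child_entropies)
}

attrs <- c("Sex", "BP", "Cholesterol")
gains <- vapply(attrs, function(a) info_gain(patients, a, "Drug"), numeric(1))
sort(gains, decreasing = TRUE)
names(which.max(gains))   # attribute with the largest gain on this sample
```

To go one level deeper, the same call can be repeated on each subset, for example `info_gain(subset(patients, Sex == "F"), "BP", "Drug")`.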
A tree is a supervised segmentation, because each leaf contains a value of the target variable.
Start from the root node and descend through the interior nodes.
Choose branches based on specific attribute values.
Non-leaf nodes are decision nodes.
Use the values of each attribute to make a decision on which branch to follow.
Following each branch ultimately leads to a final decision.
Segment data into groups based on the attributes.
Each time we split into a group, we consider another attribute.
Classification trees can be interpreted as logical statements.
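To show how a tree reads as a chain of logical statements, here is a hand-written sketch. The function name, the choice of split attributes and the threshold of 15 are all hypothetical; they are not learned from the data and only illustrate the root, interior and leaf roles described above.

```r
# A purely illustrative tree written as nested rules (hypothetical splits).
predict_drug <- function(na_to_k, bp) {
  if (na_to_k > 15) {           # root node: first decision
    "drugY"                     # leaf: a value of the target variable
  } else if (bp == "HIGH") {    # interior decision node
    "drugA"                     # leaf
  } else {
    "drugX"                     # leaf
  }
}

# Each call follows one path from the root to a leaf.
predict_drug(na_to_k = 25.355, bp = "HIGH")
predict_drug(na_to_k = 7.798,  bp = "NORMAL")
```

Read aloud, the second path is the statement "if Na_to_K is at most 15 and BP is not HIGH, then predict drugX".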
A probability estimate is a more informative prediction than a classification alone.
Instead of predicting which drug will work for a particular patient, estimate the probability that each drug will work.
Example: the probability that Drug X works for Patient B is 64%.
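One simple way to obtain such probabilities is to use the class frequencies inside a segment as estimates. The sketch below treats the female patients as a single segment, using the counts reported earlier; treating that group as a leaf is an assumption made for illustration.

```r
# Drug counts within one segment (here: the 96 female patients).
leaf_counts <- c(drugA = 9, drugB = 6, drugC = 7, drugX = 27, drugY = 47)

# Frequency-based probability estimates for a new patient in this segment.
leaf_probs <- leaf_counts / sum(leaf_counts)
round(leaf_probs, 3)

# A plain classifier would report only the most likely drug.
names(which.max(leaf_probs))
```

Reporting the whole vector keeps the information that drugX is still a plausible choice (27 of 96) even though drugY is the single most likely one.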