The dataset contains 2,201 records and 4 variables. For each passenger it records the class in which they travelled (1st, 2nd, 3rd or Crew), age (Adult or Child), sex (Male or Female) and whether they survived. Once the data are loaded, a model is built to analyse what kind of person was likely to survive. A decision tree is used to predict whether a given passenger survived or not.
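The report does not show how the data were loaded. As a minimal sketch, a data frame with this shape can be built from R's built-in `Titanic` contingency table; that this table is the actual source is an assumption, and `tic` is simply the data frame name that appears in this report's output.

```r
# Assumption: the data are derived from the built-in `Titanic` table
# (datasets package); the original loading code is not shown.
counts <- as.data.frame(Titanic)   # columns: Class, Sex, Age, Survived, Freq

# Repeat each combination `Freq` times to get one row per person.
tic <- counts[rep(seq_len(nrow(counts)), counts$Freq),
              c("Class", "Sex", "Age", "Survived")]
rownames(tic) <- NULL

# Re-create the factors so that their levels are in alphabetical order,
# matching the structure printed in this report.
tic[] <- lapply(tic, function(x) factor(as.character(x)))

dim(tic)     # number of rows and columns
str(tic)     # type and levels of each variable
names(tic)   # variable names
```

Expanding the table row by row keeps the analysis at the level of individual people, which is what the tree-fitting functions used later expect.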
## Class Sex Age Survived
## 1 3rd Male Child No
## 2 3rd Male Child No
## 3 3rd Male Child No
## 4 3rd Male Child No
## 5 3rd Male Child No
## 6 3rd Male Child No
## 7 3rd Male Child No
## 8 3rd Male Child No
## 9 3rd Male Child No
## 10 3rd Male Child No
## 11 3rd Male Child No
## 12 3rd Male Child No
## 13 3rd Male Child No
## 14 3rd Male Child No
## 15 3rd Male Child No
## 16 3rd Male Child No
## 17 3rd Male Child No
## 18 3rd Male Child No
## 19 3rd Male Child No
## 20 3rd Male Child No
## 21 3rd Male Child No
## 22 3rd Male Child No
## 23 3rd Male Child No
## 24 3rd Male Child No
## 25 3rd Male Child No
## 26 3rd Male Child No
## 27 3rd Male Child No
## 28 3rd Male Child No
## 29 3rd Male Child No
## 30 3rd Male Child No
## 31 3rd Male Child No
## 32 3rd Male Child No
## 33 3rd Male Child No
## 34 3rd Male Child No
## 35 3rd Male Child No
## 36 3rd Female Child No
## 37 3rd Female Child No
## 38 3rd Female Child No
## 39 3rd Female Child No
## 40 3rd Female Child No
## 41 3rd Female Child No
## 42 3rd Female Child No
## 43 3rd Female Child No
## 44 3rd Female Child No
## 45 3rd Female Child No
## 46 3rd Female Child No
## 47 3rd Female Child No
## 48 3rd Female Child No
## 49 3rd Female Child No
## 50 3rd Female Child No
## 51 3rd Female Child No
## 52 3rd Female Child No
## 53 1st Male Adult No
## 54 1st Male Adult No
## 55 1st Male Adult No
## 56 1st Male Adult No
## 57 1st Male Adult No
## 58 1st Male Adult No
## 59 1st Male Adult No
## 60 1st Male Adult No
## 61 1st Male Adult No
## 62 1st Male Adult No
## 63 1st Male Adult No
## 64 1st Male Adult No
## 65 1st Male Adult No
## 66 1st Male Adult No
## 67 1st Male Adult No
## 68 1st Male Adult No
## 69 1st Male Adult No
## 70 1st Male Adult No
## 71 1st Male Adult No
## 72 1st Male Adult No
## 73 1st Male Adult No
## 74 1st Male Adult No
## 75 1st Male Adult No
## 76 1st Male Adult No
## 77 1st Male Adult No
## 78 1st Male Adult No
## 79 1st Male Adult No
## 80 1st Male Adult No
## 81 1st Male Adult No
## 82 1st Male Adult No
## 83 1st Male Adult No
## 84 1st Male Adult No
## 85 1st Male Adult No
## 86 1st Male Adult No
## 87 1st Male Adult No
## 88 1st Male Adult No
## 89 1st Male Adult No
## 90 1st Male Adult No
## 91 1st Male Adult No
## 92 1st Male Adult No
## 93 1st Male Adult No
## 94 1st Male Adult No
## 95 1st Male Adult No
## 96 1st Male Adult No
## 97 1st Male Adult No
## 98 1st Male Adult No
## 99 1st Male Adult No
## 100 1st Male Adult No
## 101 1st Male Adult No
## 102 1st Male Adult No
## 103 1st Male Adult No
## 104 1st Male Adult No
## 105 1st Male Adult No
## 106 1st Male Adult No
## 107 1st Male Adult No
## 108 1st Male Adult No
## 109 1st Male Adult No
## 110 1st Male Adult No
## 111 1st Male Adult No
## 112 1st Male Adult No
## 113 1st Male Adult No
## 114 1st Male Adult No
## 115 1st Male Adult No
## 116 1st Male Adult No
## 117 1st Male Adult No
## 118 1st Male Adult No
## 119 1st Male Adult No
## 120 1st Male Adult No
## 121 1st Male Adult No
## 122 1st Male Adult No
## 123 1st Male Adult No
## 124 1st Male Adult No
## 125 1st Male Adult No
## 126 1st Male Adult No
## 127 1st Male Adult No
## 128 1st Male Adult No
## 129 1st Male Adult No
## 130 1st Male Adult No
## 131 1st Male Adult No
## 132 1st Male Adult No
## 133 1st Male Adult No
## 134 1st Male Adult No
## 135 1st Male Adult No
## 136 1st Male Adult No
## 137 1st Male Adult No
## 138 1st Male Adult No
## 139 1st Male Adult No
## 140 1st Male Adult No
## 141 1st Male Adult No
## 142 1st Male Adult No
## 143 1st Male Adult No
## 144 1st Male Adult No
## 145 1st Male Adult No
## 146 1st Male Adult No
## 147 1st Male Adult No
## 148 1st Male Adult No
## 149 1st Male Adult No
## 150 1st Male Adult No
## 151 1st Male Adult No
## 152 1st Male Adult No
## 153 1st Male Adult No
## 154 1st Male Adult No
## 155 1st Male Adult No
## 156 1st Male Adult No
## 157 1st Male Adult No
## 158 1st Male Adult No
## 159 1st Male Adult No
## 160 1st Male Adult No
## 161 1st Male Adult No
## 162 1st Male Adult No
## 163 1st Male Adult No
## 164 1st Male Adult No
## 165 1st Male Adult No
## 166 1st Male Adult No
## 167 1st Male Adult No
## 168 1st Male Adult No
## 169 1st Male Adult No
## 170 1st Male Adult No
## 171 2nd Male Adult No
## 172 2nd Male Adult No
## 173 2nd Male Adult No
## 174 2nd Male Adult No
## 175 2nd Male Adult No
## 176 2nd Male Adult No
## 177 2nd Male Adult No
## 178 2nd Male Adult No
## 179 2nd Male Adult No
## 180 2nd Male Adult No
## 181 2nd Male Adult No
## 182 2nd Male Adult No
## 183 2nd Male Adult No
## 184 2nd Male Adult No
## 185 2nd Male Adult No
## 186 2nd Male Adult No
## 187 2nd Male Adult No
## 188 2nd Male Adult No
## 189 2nd Male Adult No
## 190 2nd Male Adult No
## 191 2nd Male Adult No
## 192 2nd Male Adult No
## 193 2nd Male Adult No
## 194 2nd Male Adult No
## 195 2nd Male Adult No
## 196 2nd Male Adult No
## 197 2nd Male Adult No
## 198 2nd Male Adult No
## 199 2nd Male Adult No
## 200 2nd Male Adult No
## 201 2nd Male Adult No
## 202 2nd Male Adult No
## 203 2nd Male Adult No
## 204 2nd Male Adult No
## 205 2nd Male Adult No
## 206 2nd Male Adult No
## 207 2nd Male Adult No
## 208 2nd Male Adult No
## 209 2nd Male Adult No
## 210 2nd Male Adult No
## 211 2nd Male Adult No
## 212 2nd Male Adult No
## 213 2nd Male Adult No
## 214 2nd Male Adult No
## 215 2nd Male Adult No
## 216 2nd Male Adult No
## 217 2nd Male Adult No
## 218 2nd Male Adult No
## 219 2nd Male Adult No
## 220 2nd Male Adult No
## 221 2nd Male Adult No
## 222 2nd Male Adult No
## 223 2nd Male Adult No
## 224 2nd Male Adult No
## 225 2nd Male Adult No
## 226 2nd Male Adult No
## 227 2nd Male Adult No
## 228 2nd Male Adult No
## 229 2nd Male Adult No
## 230 2nd Male Adult No
## 231 2nd Male Adult No
## 232 2nd Male Adult No
## 233 2nd Male Adult No
## 234 2nd Male Adult No
## 235 2nd Male Adult No
## 236 2nd Male Adult No
## 237 2nd Male Adult No
## 238 2nd Male Adult No
## 239 2nd Male Adult No
## 240 2nd Male Adult No
## 241 2nd Male Adult No
## 242 2nd Male Adult No
## 243 2nd Male Adult No
## 244 2nd Male Adult No
## 245 2nd Male Adult No
## 246 2nd Male Adult No
## 247 2nd Male Adult No
## 248 2nd Male Adult No
## 249 2nd Male Adult No
## 250 2nd Male Adult No
## [1] 2201 4
## 'data.frame': 2201 obs. of 4 variables:
## $ Class : Factor w/ 4 levels "1st","2nd","3rd",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ Sex : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
## $ Age : Factor w/ 2 levels "Adult","Child": 2 2 2 2 2 2 2 2 2 2 ...
## $ Survived: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## [1] "Class" "Sex" "Age" "Survived"
##
## No Yes
## 0.676965 0.323035
## Warning: Use of `tic$Class` is discouraged. Use `Class` instead.
## Warning: Use of `tic$Survived` is discouraged. Use `Survived` instead.
## Warning: Use of `tic$Age` is discouraged. Use `Age` instead.
## Warning: Use of `tic$Survived` is discouraged. Use `Survived` instead.
## Warning: Use of `tic$Sex` is discouraged. Use `Sex` instead.
## Warning: Use of `tic$Survived` is discouraged. Use `Survived` instead.
For a predictive model to be useful, its accuracy must exceed what would be expected by chance, or a given baseline. In classification problems, the baseline is the accuracy obtained by assigning every observation to the majority class (the mode). In the sinking of the Titanic, about 68 percent of those on board died (67.7 percent in this dataset), so always predicting Survived = No yields an accuracy of approximately 68 percent. This is the minimum that the predictive models must try to beat. (Strictly speaking, this percentage should be recomputed using only the training set.)
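The baseline described here can be computed in a few lines. The sketch below rebuilds the data from R's built-in `Titanic` table (an assumption about the data source) and uses a small helper, `baseline_accuracy`, which is a hypothetical name introduced only for illustration.

```r
# Assumption: the data are derived from the built-in `Titanic` table.
counts <- as.data.frame(Titanic)
tic <- counts[rep(seq_len(nrow(counts)), counts$Freq),
              c("Class", "Sex", "Age", "Survived")]
rownames(tic) <- NULL

# Hypothetical helper: accuracy obtained by always predicting the
# most frequent class of the outcome vector `y`.
baseline_accuracy <- function(y) max(prop.table(table(y)))

# Majority class and its share in the full data set.
props <- prop.table(table(tic$Survived))
baseline_class <- names(which.max(props))
baseline_acc   <- baseline_accuracy(tic$Survived)

baseline_class          # "No"
round(baseline_acc, 3)  # 0.677
```

The same helper can be applied to the outcome column of the training set alone, for example `baseline_accuracy(entrena$Survived)`, which is the stricter version of the baseline.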
## Class Sex Age Survived
## 1559 1st Male Adult Yes
## 1920 1st Female Adult Yes
## 687 3rd Male Adult No
## 353 3rd Male Adult No
## 1724 Crew Male Adult Yes
## 1413 3rd Female Adult No
## 554 3rd Male Adult No
## 844 Crew Male Adult No
## 2193 Crew Female Adult Yes
## 1015 Crew Male Adult No
## 2143 3rd Female Adult Yes
## 1443 3rd Female Adult No
## 984 Crew Male Adult No
## 1876 Crew Male Adult Yes
## 21 3rd Male Child No
## 1477 3rd Female Adult No
## 323 2nd Male Adult No
## 578 3rd Male Adult No
## 1922 1st Female Adult Yes
## 1557 1st Male Adult Yes
## [1] 2201 4
## [1] 1453 4
## Class Sex Age Survived
## 1559 1st Male Adult Yes
## 1920 1st Female Adult Yes
## 687 3rd Male Adult No
## 353 3rd Male Adult No
## 1724 Crew Male Adult Yes
## 1413 3rd Female Adult No
## 554 3rd Male Adult No
## 844 Crew Male Adult No
## 2193 Crew Female Adult Yes
## 1015 Crew Male Adult No
## 2143 3rd Female Adult Yes
## 1443 3rd Female Adult No
## 984 Crew Male Adult No
## 1876 Crew Male Adult Yes
## 21 3rd Male Child No
## 1477 3rd Female Adult No
## 323 2nd Male Adult No
## 578 3rd Male Adult No
## 1922 1st Female Adult Yes
## 1557 1st Male Adult Yes
## [1] 748 4
## Class Sex Age Survived
## 940 Crew Male Adult No
## 1816 Crew Male Adult Yes
## 997 Crew Male Adult No
## 1313 Crew Male Adult No
## 831 Crew Male Adult No
## 1935 1st Female Adult Yes
## 356 3rd Male Adult No
## 620 3rd Male Adult No
## 700 3rd Male Adult No
## 707 3rd Male Adult No
## 1722 Crew Male Adult Yes
## 1317 Crew Male Adult No
## 1169 Crew Male Adult No
## 2201 Crew Female Adult Yes
## 1905 1st Female Adult Yes
## 2139 3rd Female Adult Yes
## 1416 3rd Female Adult No
## 62 1st Male Adult No
## 1906 1st Female Adult Yes
## 1157 Crew Male Adult No
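The dimensions printed above (2201, 1453 and 748 rows) are consistent with shuffling the rows and holding out roughly one third of them for testing. The following is a sketch of such a split, not the original code: the seed and the name `prueba` are hypothetical, the data source is assumed to be the built-in `Titanic` table, and `entrena` is the training-set name that appears in the model calls of this report.

```r
# Assumption: the data are derived from the built-in `Titanic` table.
counts <- as.data.frame(Titanic)
tic <- counts[rep(seq_len(nrow(counts)), counts$Freq),
              c("Class", "Sex", "Age", "Survived")]
rownames(tic) <- NULL

set.seed(123)                               # hypothetical seed; the original is not shown
tic_shuffled <- tic[sample(nrow(tic)), ]    # shuffle the rows, keeping row names

n_train <- 1453                             # training size reported above
entrena <- tic_shuffled[seq_len(n_train), ]
prueba  <- tic_shuffled[(n_train + 1):nrow(tic_shuffled), ]   # hypothetical name

dim(entrena)   # training set
dim(prueba)    # test set
```

Shuffling before splitting matters here because the rows are ordered by class, sex and age; taking the first 1453 rows of the unshuffled data would leave entire groups out of one of the two sets.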
## Class Sex Age Survived
## 1559 1st Male Adult Yes
## 1920 1st Female Adult Yes
## 687 3rd Male Adult No
## 353 3rd Male Adult No
## 1724 Crew Male Adult Yes
## Class Sex Age
## 1559 1st Male Adult
## 1920 1st Female Adult
## 687 3rd Male Adult
## 353 3rd Male Adult
## 1724 Crew Male Adult
## [1] Yes Yes No No Yes
## Levels: No Yes
##
## Call:
## C5.0.default(x = entrena[, -4], y = entrena[, 4])
##
##
## C5.0 [Release 2.07 GPL Edition] Tue Jan 05 16:05:40 2021
## -------------------------------
##
## Class specified by attribute `outcome'
##
## Read 1453 cases (4 attributes) from undefined.data
##
## Decision tree:
##
## Sex = Male: No (1130/236)
## Sex = Female:
## :...Class in {1st,2nd,Crew}: Yes (183/15)
## Class = 3rd: No (140/61)
##
##
## Evaluation on training data (1453 cases):
##
## Decision Tree
## ----------------
## Size Errors
##
## 3 312(21.5%) <<
##
##
## (a) (b) <-classified as
## ---- ----
## 973 15 (a): class No
## 297 168 (b): class Yes
##
##
## Attribute usage:
##
## 100.00% Sex
## 22.23% Class
##
##
## Time: 0.0 secs
## Call:
## rpart(formula = Survived ~ ., data = entrena)
## n= 1453
##
## CP nsplit rel error xerror xstd
## 1 0.29032258 0 1.0000000 1.0000000 0.03824011
## 2 0.03870968 1 0.7096774 0.7096774 0.03434481
## 3 0.01290323 2 0.6709677 0.6709677 0.03366157
## 4 0.01000000 4 0.6451613 0.6580645 0.03342386
##
## Variable importance
## Sex Class Age
## 69 25 5
##
## Node number 1: 1453 observations, complexity param=0.2903226
## predicted class=No expected loss=0.3200275 P(node) =1
## class counts: 988 465
## probabilities: 0.680 0.320
## left son=2 (1130 obs) right son=3 (323 obs)
## Primary splits:
## Sex splits as RL, improve=125.663500, (0 missing)
## Class splits as RRLL, improve= 42.427350, (0 missing)
## Age splits as LR, improve= 7.339061, (0 missing)
##
## Node number 2: 1130 observations, complexity param=0.01290323
## predicted class=No expected loss=0.2088496 P(node) =0.7777013
## class counts: 894 236
## probabilities: 0.791 0.209
## left son=4 (1087 obs) right son=5 (43 obs)
## Primary splits:
## Age splits as LR, improve=6.985235, (0 missing)
## Class splits as RLLR, improve=5.865266, (0 missing)
##
## Node number 3: 323 observations, complexity param=0.03870968
## predicted class=Yes expected loss=0.2910217 P(node) =0.2222987
## class counts: 94 229
## probabilities: 0.291 0.709
## left son=6 (140 obs) right son=7 (183 obs)
## Primary splits:
## Class splits as RRLR, improve=36.904080, (0 missing)
## Age splits as RL, improve= 1.107995, (0 missing)
## Surrogate splits:
## Age splits as RL, agree=0.598, adj=0.071, (0 split)
##
## Node number 4: 1087 observations
## predicted class=No expected loss=0.1977921 P(node) =0.7481074
## class counts: 872 215
## probabilities: 0.802 0.198
##
## Node number 5: 43 observations, complexity param=0.01290323
## predicted class=No expected loss=0.4883721 P(node) =0.02959394
## class counts: 22 21
## probabilities: 0.512 0.488
## left son=10 (31 obs) right son=11 (12 obs)
## Primary splits:
## Class splits as RRL-, improve=8.714179, (0 missing)
##
## Node number 6: 140 observations
## predicted class=No expected loss=0.4357143 P(node) =0.09635237
## class counts: 79 61
## probabilities: 0.564 0.436
##
## Node number 7: 183 observations
## predicted class=Yes expected loss=0.08196721 P(node) =0.1259463
## class counts: 15 168
## probabilities: 0.082 0.918
##
## Node number 10: 31 observations
## predicted class=No expected loss=0.2903226 P(node) =0.02133517
## class counts: 22 9
## probabilities: 0.710 0.290
##
## Node number 11: 12 observations
## predicted class=Yes expected loss=0 P(node) =0.008258775
## class counts: 0 12
## probabilities: 0.000 1.000
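The `Call:` line of the summary above shows the tree being fitted with `rpart(formula = Survived ~ ., data = entrena)`. A sketch of that step follows. It assumes the `rpart` package is installed, rebuilds the data from the built-in `Titanic` table, and uses a hypothetical seed and test-set name (`prueba`), so the fitted tree and its numbers will not match the output above exactly.

```r
library(rpart)

# Assumption: the data are derived from the built-in `Titanic` table.
counts <- as.data.frame(Titanic)
tic <- counts[rep(seq_len(nrow(counts)), counts$Freq),
              c("Class", "Sex", "Age", "Survived")]
rownames(tic) <- NULL

set.seed(123)                               # hypothetical seed
idx     <- sample(nrow(tic))                # shuffled row indices
entrena <- tic[idx[1:1453], ]               # training set (size from the report)
prueba  <- tic[idx[1454:nrow(tic)], ]       # hypothetical name for the test set

# Classification tree with all three predictors, as in the Call line.
fit <- rpart(Survived ~ ., data = entrena)
summary(fit)

# Predicted classes for the held-out cases and their accuracy.
pred <- predict(fit, newdata = prueba, type = "class")
mean(pred == prueba$Survived)
```

Because `Survived` is a factor, `rpart` fits a classification tree by default, and `type = "class"` makes `predict` return the predicted label rather than class probabilities.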
## [1] No No No No No Yes No No No No No No No Yes Yes No No No
## [19] Yes No No No No No No No No No No No No No No No No No
## [37] No No No No No Yes No No No No No No No No No No No No
## [55] No Yes No No No No No No No No No No Yes No Yes No No No
## [73] No Yes No Yes No No No No No No No Yes No Yes No No No No
## [91] No No Yes No No No No No No No No Yes No No No No No No
## [109] No No No No No No No No No No No No No No No No No No
## [127] No No No No No No No No No No No No No No No No Yes No
## [145] No No No Yes No No No No No No No No No No Yes No No No
## [163] No No No No No Yes No No No No No No No No No No No No
## [181] No No Yes No No No Yes Yes No No No No Yes No Yes Yes No No
## [199] No No No No No No No No No No No No No No No No No No
## [217] No No No No Yes No No No Yes No No No No No No No No No
## [235] No No Yes No No No Yes No No No No No No No No No No No
## [253] No No Yes No Yes No No No No No No No No No No Yes No No
## [271] No No No No No No No No No No No No No Yes No No No No
## [289] No No No No No No Yes Yes Yes Yes No Yes No No No No No No
## [307] No No No No No No No No No No Yes Yes No No No No Yes No
## [325] Yes Yes No No No No No No No No Yes No No No No No No Yes
## [343] No No No No No No No Yes Yes Yes No No Yes Yes Yes No No No
## [361] No Yes Yes No No No No No No No Yes No No No Yes No No No
## [379] No No No No Yes No No No No No No No No No No No No No
## [397] No Yes No Yes No No No No No No No No No No No No No No
## [415] No No No Yes No No No No No No Yes No No No No No No No
## [433] Yes No No No No Yes No No No No No No Yes No No No No No
## [451] No Yes No No Yes No No No No No No No No No No No Yes No
## [469] No No No No No Yes No No No No No No No No No No No Yes
## [487] No No No No No No No No No No No No No No No Yes No No
## [505] No No No No No No No Yes No Yes No No No No Yes No No No
## [523] No No No No No No No No No Yes No No No No No No No No
## [541] No Yes No No No No No No No No No No Yes No No No No No
## [559] No No No No No No No No No No No No No No Yes No No No
## [577] No No No No No No No No No No No No Yes Yes No No No No
## [595] No No No No No No Yes No Yes No No No No No No Yes No No
## [613] No No Yes No No No No No No No No No No No No No No No
## [631] No No Yes No No No No No No No No Yes No No No No No No
## [649] No No No Yes No No No No No No No No No No No No No No
## [667] No No No No No No No No No No No No No No No No No No
## [685] No No No No No No No No No Yes No No No No No No No No
## [703] No No No No No No No No No No No No No No No No Yes Yes
## [721] No No No No No No Yes Yes No No No No No No No No No No
## [739] No No No No No No No No Yes Yes
## Levels: No Yes
##
##
## Cell Contents
## |-------------------------|
## | N |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 748
##
##
## | Predicted
## Actual | No | Yes | Row Total |
## ---------------|-----------|-----------|-----------|
## No | 497 | 5 | 502 |
## | 0.664 | 0.007 | |
## ---------------|-----------|-----------|-----------|
## Yes | 160 | 86 | 246 |
## | 0.214 | 0.115 | |
## ---------------|-----------|-----------|-----------|
## Column Total | 657 | 91 | 748 |
## ---------------|-----------|-----------|-----------|
##
##
## [1] 0.7794118
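The accuracy printed above can be checked directly from the counts in the cross table. The sketch below re-enters those four counts by hand (the object names are illustrative, not taken from the original code) and also computes the majority-class baseline on the same 748 test cases for comparison.

```r
# Counts copied from the cross table above (rows = actual, columns = predicted).
cm <- matrix(c(497,   5,
               160,  86),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual    = c("No", "Yes"),
                             predicted = c("No", "Yes")))

# Accuracy: correctly classified cases (the diagonal) over all cases.
accuracy <- sum(diag(cm)) / sum(cm)

# Baseline on the same test set: always predict the majority class.
baseline <- max(rowSums(cm)) / sum(cm)

accuracy   # 583 / 748
baseline   # 502 / 748
```

With these counts the model beats the baseline mainly because it rarely mislabels non-survivors, while most actual survivors (160 of 246) are still predicted as No.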
## Warning: package 'party' was built under R version 4.0.3
## Warning: package 'modeltools' was built under R version 4.0.3
## Warning: package 'strucchange' was built under R version 4.0.3
## Warning: package 'sandwich' was built under R version 4.0.3
##
## No Yes
## No 997 330
## Yes 11 175
##
## testPred No Yes
## No 473 127
## Yes 9 79
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##
## margin
##
## No Yes
## No 922 256
## Yes 75 240
## $names
## [1] "call" "type" "predicted" "err.rate"
## [5] "confusion" "votes" "oob.times" "classes"
## [9] "importance" "importanceSD" "localImportance" "proximity"
## [13] "ntree" "mtry" "forest" "y"
## [17] "test" "inbag" "terms"
##
## $class
## [1] "randomForest.formula" "randomForest"
## MeanDecreaseGini
## Class 32.275564
## Sex 118.927334
## Age 4.522967