Developing a model to predict permeability (see Sect. 1.4) could save significant resources for a pharmaceutical company, while at the same time more rapidly identifying molecules that have a sufficient permeability to become a drug:
Start R and use these commands to load the data: library(AppliedPredictiveModeling) and data(permeability). The matrix fingerprints contains the 1,107 binary molecular predictors for the 165 compounds, while permeability contains the permeability response.
library(AppliedPredictiveModeling)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
data(permeability)
head(fingerprints)
## X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21
## 1 0 0 0 0 0 1 1 1 0 0 0 1 0 1 0 0 1 1 1 0 0
## 2 0 0 0 0 0 0 1 1 0 0 0 1 1 1 0 0 1 1 1 0 0
## 3 0 0 0 0 0 1 1 1 0 0 0 0 1 1 0 0 1 1 1 0 0
## 4 0 0 0 0 0 0 1 1 0 0 0 1 1 1 0 0 1 1 1 0 0
## 5 0 0 0 0 0 0 1 1 0 0 0 1 1 1 0 0 1 1 1 0 0
## 6 0 0 0 0 0 0 1 1 0 0 0 1 1 1 0 0 1 1 1 0 0
## X22 X23 X24 X25 X26 X27 X28 X29 X30 X31 X32 X33 X34 X35 X36 X37 X38 X39 X40
## 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
## 2 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
## 3 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
## 4 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
## 5 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
## 6 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
## X41 X42 X43 X44 X45 X46 X47 X48 X49 X50 X51 X52 X53 X54 X55 X56 X57 X58 X59
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X60 X61 X62 X63 X64 X65 X66 X67 X68 X69 X70 X71 X72 X73 X74 X75 X76 X77 X78
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X79 X80 X81 X82 X83 X84 X85 X86 X87 X88 X89 X90 X91 X92 X93 X94 X95 X96 X97
## 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1
## 2 0 0 1 1 1 1 1 1 0 0 1 1 1 1 0 0 0 1 0
## 3 0 0 1 1 1 1 1 1 0 0 1 1 1 1 0 0 0 1 0
## 4 0 0 1 1 1 1 1 1 0 0 1 1 1 1 0 0 0 1 0
## 5 0 0 1 1 1 1 1 1 0 0 1 1 1 1 0 0 0 1 0
## 6 0 0 1 1 1 1 1 1 0 0 1 1 1 1 0 0 0 1 0
## X98 X99 X100 X101 X102 X103 X104 X105 X106 X107 X108 X109 X110 X111 X112 X113
## 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0
## 4 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
## X114 X115 X116 X117 X118 X119 X120 X121 X122 X123 X124 X125 X126 X127 X128
## 1 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0
## 2 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0
## 3 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0
## 4 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0
## 5 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0
## 6 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0
## X129 X130 X131 X132 X133 X134 X135 X136 X137 X138 X139 X140 X141 X142 X143
## 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
## 2 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1
## 3 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
## 4 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
## 5 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
## 6 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
## X144 X145 X146 X147 X148 X149 X150 X151 X152 X153 X154 X155 X156 X157 X158
## 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1
## 2 1 1 0 1 0 0 1 1 1 1 1 0 1 1 1
## 3 1 1 0 1 0 0 1 1 1 1 1 0 0 1 1
## 4 1 1 0 1 0 0 1 1 1 1 1 0 1 1 1
## 5 1 1 0 1 0 0 1 1 1 1 1 0 1 1 1
## 6 1 1 0 1 0 0 1 1 1 1 1 0 1 1 1
## X159 X160 X161 X162 X163 X164 X165 X166 X167 X168 X169 X170 X171 X172 X173
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1
## 3 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1
## 4 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1
## 5 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1
## 6 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1
## X174 X175 X176 X177 X178 X179 X180 X181 X182 X183 X184 X185 X186 X187 X188
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## X189 X190 X191 X192 X193 X194 X195 X196 X197 X198 X199 X200 X201 X202 X203
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## X204 X205 X206 X207 X208 X209 X210 X211 X212 X213 X214 X215 X216 X217 X218
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
## 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
## 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
## 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
## 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
## X219 X220 X221 X222 X223 X224 X225 X226 X227 X228 X229 X230 X231 X232 X233
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1
## 3 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0
## 4 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1
## 5 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1
## 6 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1
## X234 X235 X236 X237 X238 X239 X240 X241 X242 X243 X244 X245 X246 X247 X248
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1
## 3 0 0 0 0 1 1 1 1 0 0 1 1 1 1 1
## 4 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1
## 5 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1
## 6 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1
## X249 X250 X251 X252 X253 X254 X255 X256 X257 X258 X259 X260 X261 X262 X263
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1
## 3 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1
## 4 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1
## 5 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1
## 6 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1
## X264 X265 X266 X267 X268 X269 X270 X271 X272 X273 X274 X275 X276 X277 X278
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
## 2 0 1 1 0 1 1 1 0 0 0 1 0 0 1 1
## 3 0 1 1 0 1 1 1 0 0 0 1 0 0 1 0
## 4 0 1 1 0 1 1 1 0 0 0 1 0 0 1 0
## 5 0 1 1 0 1 1 1 0 0 0 1 0 0 1 1
## 6 0 1 1 0 1 1 1 0 0 0 1 0 0 1 0
## X279 X280 X281 X282 X283 X284 X285 X286 X287 X288 X289 X290 X291 X292 X293
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1
## 5 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1
## 6 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1
## X294 X295 X296 X297 X298 X299 X300 X301 X302 X303 X304 X305 X306 X307 X308
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 3 0 0 1 1 0 0 0 1 1 1 0 0 0 0 0
## 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## X309 X310 X311 X312 X313 X314 X315 X316 X317 X318 X319 X320 X321 X322 X323
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 4 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0
## 5 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0
## X324 X325 X326 X327 X328 X329 X330 X331 X332 X333 X334 X335 X336 X337 X338
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 4 0 1 1 1 1 1 1 1 1 1 0 1 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 1 1 1 1 1 1 1 1 1 0 1 0 0 0
## X339 X340 X341 X342 X343 X344 X345 X346 X347 X348 X349 X350 X351 X352 X353
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
## 6 0 0 0 0 0 0 1 0 1 1 0 1 1 1 1
## X354 X355 X356 X357 X358 X359 X360 X361 X362 X363 X364 X365 X366 X367 X368
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X369 X370 X371 X372 X373 X374 X375 X376 X377 X378 X379 X380 X381 X382 X383
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X384 X385 X386 X387 X388 X389 X390 X391 X392 X393 X394 X395 X396 X397 X398
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X399 X400 X401 X402 X403 X404 X405 X406 X407 X408 X409 X410 X411 X412 X413
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X414 X415 X416 X417 X418 X419 X420 X421 X422 X423 X424 X425 X426 X427 X428
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X429 X430 X431 X432 X433 X434 X435 X436 X437 X438 X439 X440 X441 X442 X443
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X444 X445 X446 X447 X448 X449 X450 X451 X452 X453 X454 X455 X456 X457 X458
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X459 X460 X461 X462 X463 X464 X465 X466 X467 X468 X469 X470 X471 X472 X473
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X474 X475 X476 X477 X478 X479 X480 X481 X482 X483 X484 X485 X486 X487 X488
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X489 X490 X491 X492 X493 X494 X495 X496 X497 X498 X499 X500 X501 X502 X503
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X504 X505 X506 X507 X508 X509 X510 X511 X512 X513 X514 X515 X516 X517 X518
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X519 X520 X521 X522 X523 X524 X525 X526 X527 X528 X529 X530 X531 X532 X533
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X534 X535 X536 X537 X538 X539 X540 X541 X542 X543 X544 X545 X546 X547 X548
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X549 X550 X551 X552 X553 X554 X555 X556 X557 X558 X559 X560 X561 X562 X563
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X564 X565 X566 X567 X568 X569 X570 X571 X572 X573 X574 X575 X576 X577 X578
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X579 X580 X581 X582 X583 X584 X585 X586 X587 X588 X589 X590 X591 X592 X593
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X594 X595 X596 X597 X598 X599 X600 X601 X602 X603 X604 X605 X606 X607 X608
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X609 X610 X611 X612 X613 X614 X615 X616 X617 X618 X619 X620 X621 X622 X623
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X624 X625 X626 X627 X628 X629 X630 X631 X632 X633 X634 X635 X636 X637 X638
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X639 X640 X641 X642 X643 X644 X645 X646 X647 X648 X649 X650 X651 X652 X653
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X654 X655 X656 X657 X658 X659 X660 X661 X662 X663 X664 X665 X666 X667 X668
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X669 X670 X671 X672 X673 X674 X675 X676 X677 X678 X679 X680 X681 X682 X683
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X684 X685 X686 X687 X688 X689 X690 X691 X692 X693 X694 X695 X696 X697 X698
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X699 X700 X701 X702 X703 X704 X705 X706 X707 X708 X709 X710 X711 X712 X713
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X714 X715 X716 X717 X718 X719 X720 X721 X722 X723 X724 X725 X726 X727 X728
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X729 X730 X731 X732 X733 X734 X735 X736 X737 X738 X739 X740 X741 X742 X743
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X744 X745 X746 X747 X748 X749 X750 X751 X752 X753 X754 X755 X756 X757 X758
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X759 X760 X761 X762 X763 X764 X765 X766 X767 X768 X769 X770 X771 X772 X773
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X774 X775 X776 X777 X778 X779 X780 X781 X782 X783 X784 X785 X786 X787 X788
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X789 X790 X791 X792 X793 X794 X795 X796 X797 X798 X799 X800 X801 X802 X803
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X804 X805 X806 X807 X808 X809 X810 X811 X812 X813 X814 X815 X816 X817 X818
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X819 X820 X821 X822 X823 X824 X825 X826 X827 X828 X829 X830 X831 X832 X833
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X834 X835 X836 X837 X838 X839 X840 X841 X842 X843 X844 X845 X846 X847 X848
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X849 X850 X851 X852 X853 X854 X855 X856 X857 X858 X859 X860 X861 X862 X863
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X864 X865 X866 X867 X868 X869 X870 X871 X872 X873 X874 X875 X876 X877 X878
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X879 X880 X881 X882 X883 X884 X885 X886 X887 X888 X889 X890 X891 X892 X893
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X894 X895 X896 X897 X898 X899 X900 X901 X902 X903 X904 X905 X906 X907 X908
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X909 X910 X911 X912 X913 X914 X915 X916 X917 X918 X919 X920 X921 X922 X923
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X924 X925 X926 X927 X928 X929 X930 X931 X932 X933 X934 X935 X936 X937 X938
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X939 X940 X941 X942 X943 X944 X945 X946 X947 X948 X949 X950 X951 X952 X953
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X954 X955 X956 X957 X958 X959 X960 X961 X962 X963 X964 X965 X966 X967 X968
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X969 X970 X971 X972 X973 X974 X975 X976 X977 X978 X979 X980 X981 X982 X983
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X984 X985 X986 X987 X988 X989 X990 X991 X992 X993 X994 X995 X996 X997 X998
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X999 X1000 X1001 X1002 X1003 X1004 X1005 X1006 X1007 X1008 X1009 X1010 X1011
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0
## X1012 X1013 X1014 X1015 X1016 X1017 X1018 X1019 X1020 X1021 X1022 X1023 X1024
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0
## X1025 X1026 X1027 X1028 X1029 X1030 X1031 X1032 X1033 X1034 X1035 X1036 X1037
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0
## X1038 X1039 X1040 X1041 X1042 X1043 X1044 X1045 X1046 X1047 X1048 X1049 X1050
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0
## X1051 X1052 X1053 X1054 X1055 X1056 X1057 X1058 X1059 X1060 X1061 X1062 X1063
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0
## X1064 X1065 X1066 X1067 X1068 X1069 X1070 X1071 X1072 X1073 X1074 X1075 X1076
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0
## X1077 X1078 X1079 X1080 X1081 X1082 X1083 X1084 X1085 X1086 X1087 X1088 X1089
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0
## X1090 X1091 X1092 X1093 X1094 X1095 X1096 X1097 X1098 X1099 X1100 X1101 X1102
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0
## X1103 X1104 X1105 X1106 X1107
## 1 0 0 0 0 0
## 2 0 0 0 0 0
## 3 0 0 0 0 0
## 4 0 0 0 0 0
## 5 0 0 0 0 0
## 6 0 0 0 0 0
The fingerprint predictors indicate the presence or absence of substructures of a molecule and are often sparse, meaning that relatively few of the molecules contain each substructure. Filter out the predictors that have low frequencies using the nearZeroVar function from the caret package. How many predictors are left for modeling?
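To make "low frequency" concrete, here is a minimal base-R sketch of the rule that nearZeroVar is documented to apply with its default cutoffs (freqCut = 95/5, uniqueCut = 10): a column is flagged when it is constant, or when its most common value outnumbers the runner-up by more than the frequency cutoff while the share of distinct values stays at or below the uniqueness cutoff. The helper flag_sparse_columns and the toy matrix are hypothetical names for illustration, not part of caret, and the sketch ignores missing values.

```r
# Hypothetical helper: a base-R approximation of the near-zero-variance rule.
flag_sparse_columns <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  flagged <- apply(x, 2, function(col) {
    counts <- sort(table(col), decreasing = TRUE)
    if (length(counts) == 1) return(TRUE)        # constant column
    freq_ratio <- counts[1] / counts[2]          # most common vs. runner-up
    pct_unique <- 100 * length(counts) / length(col)
    freq_ratio > freq_cut && pct_unique <= unique_cut
  })
  unname(which(flagged))
}

# Toy fingerprint-like matrix with 100 rows (made-up data).
toy <- cbind(
  balanced = rep(c(0, 1), each = 50),  # 50 zeros, 50 ones
  rare     = c(rep(0, 99), 1),         # a single 1 in 100 rows
  constant = rep(0, 100)               # substructure never present
)

sparse_idx <- flag_sparse_columns(toy)
sparse_idx  # columns 2 and 3 are flagged
```

The balanced column survives because its frequency ratio is 1, while the rare column has a ratio of 99 and only two distinct values in 100 rows.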
library(caret)
## Warning: package 'caret' was built under R version 4.3.3
## Loading required package: ggplot2
## Loading required package: lattice
nearZeroPreds <- nearZeroVar(fingerprints)
nearZeroPreds
## [1] 7 8 9 10 13 14 17 18 19 22 23 24 30 31 32
## [16] 33 34 45 77 81 82 83 84 85 89 90 91 92 95 100
## [31] 104 105 106 107 109 110 112 113 114 115 116 117 119 120 122
## [46] 123 124 128 131 132 134 135 136 137 139 140 144 145 147 148
## [61] 149 151 155 160 161 164 165 166 216 217 218 219 220 222 243
## [76] 252 259 273 275 277 282 283 287 288 289 292 346 347 348 349
## [91] 350 351 352 353 354 363 364 365 369 375 379 384 391 393 397
## [106] 399 402 404 405 407 408 409 410 411 412 413 414 415 416 417
## [121] 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432
## [136] 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447
## [151] 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462
## [166] 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477
## [181] 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492
## [196] 493 494 495 498 500 501 502 513 523 525 526 527 528 530 531
## [211] 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546
## [226] 547 548 550 552 555 562 563 564 566 567 569 570 572 575 578
## [241] 579 580 581 582 583 584 585 586 587 588 589 596 605 606 607
## [256] 608 609 610 611 612 614 615 616 617 618 619 620 622 623 624
## [271] 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639
## [286] 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654
## [301] 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669
## [316] 670 671 672 673 674 675 676 677 678 680 681 682 683 684 685
## [331] 686 687 688 689 690 691 692 693 694 695 696 697 706 707 708
## [346] 709 710 711 712 713 714 715 716 717 718 720 721 722 723 724
## [361] 725 726 727 728 729 730 731 734 735 736 737 738 739 740 741
## [376] 742 743 744 745 746 747 748 749 756 757 758 759 760 761 762
## [391] 763 764 765 766 767 768 769 770 771 772 777 778 779 781 783
## [406] 784 785 786 787 788 789 790 791 794 796 797 799 802 803 804
## [421] 807 808 809 810 811 814 815 816 817 818 819 820 821 822 823
## [436] 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838
## [451] 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853
## [466] 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868
## [481] 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883
## [496] 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898
## [511] 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913
## [526] 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928
## [541] 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943
## [556] 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958
## [571] 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973
## [586] 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988
## [601] 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003
## [616] 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018
## [631] 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033
## [646] 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048
## [661] 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063
## [676] 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078
## [691] 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093
## [706] 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107
length(nearZeroPreds)
## [1] 719
Of the 1,107 predictors, 719 have near-zero variance.
Now let's filter out these predictors and see how many we are left with.
filtered_fingerprints <- fingerprints[, -nearZeroPreds]
head(filtered_fingerprints)
## X1 X2 X3 X4 X5 X6 X11 X12 X15 X16 X20 X21 X25 X26 X27 X28 X29 X35 X36 X37 X38
## 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## X39 X40 X41 X42 X43 X44 X46 X47 X48 X49 X50 X51 X52 X53 X54 X55 X56 X57 X58
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X59 X60 X61 X62 X63 X64 X65 X66 X67 X68 X69 X70 X71 X72 X73 X74 X75 X76 X78
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X79 X80 X86 X87 X88 X93 X94 X96 X97 X98 X99 X101 X102 X103 X108 X111 X118
## 1 0 0 1 1 1 0 0 1 1 1 0 1 1 0 0 0 0
## 2 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0
## 3 0 0 1 0 0 0 0 1 0 0 0 1 0 1 1 1 0
## 4 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0
## 5 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0
## 6 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0
## X121 X125 X126 X127 X129 X130 X133 X138 X141 X142 X143 X146 X150 X152 X153
## 1 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1
## 2 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1
## 3 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1
## 4 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1
## 5 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1
## 6 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1
## X154 X156 X157 X158 X159 X162 X163 X167 X168 X169 X170 X171 X172 X173 X174
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 3 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1
## 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## X175 X176 X177 X178 X179 X180 X181 X182 X183 X184 X185 X186 X187 X188 X189
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## X190 X191 X192 X193 X194 X195 X196 X197 X198 X199 X200 X201 X202 X203 X204
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## X205 X206 X207 X208 X209 X210 X211 X212 X213 X214 X215 X221 X223 X224 X225
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
## 3 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
## 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
## 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
## 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
## X226 X227 X228 X229 X230 X231 X232 X233 X234 X235 X236 X237 X238 X239 X240
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1
## 3 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1
## 4 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1
## 5 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1
## 6 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1
## X241 X242 X244 X245 X246 X247 X248 X249 X250 X251 X253 X254 X255 X256 X257
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1
## 3 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1
## 4 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1
## 5 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1
## 6 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1
## X258 X260 X261 X262 X263 X264 X265 X266 X267 X268 X269 X270 X271 X272 X274
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2 0 1 1 1 1 0 1 1 0 1 1 1 0 0 1
## 3 0 1 1 1 1 0 1 1 0 1 1 1 0 0 1
## 4 0 1 1 1 1 0 1 1 0 1 1 1 0 0 1
## 5 0 1 1 1 1 0 1 1 0 1 1 1 0 0 1
## 6 0 1 1 1 1 0 1 1 0 1 1 1 0 0 1
## X276 X278 X279 X280 X281 X284 X285 X286 X290 X291 X293 X294 X295 X296 X297
## 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
## 4 0 0 0 1 1 0 0 0 0 0 1 1 1 1 1
## 5 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 6 0 0 0 1 1 0 0 0 0 0 1 1 1 1 1
## X298 X299 X300 X301 X302 X303 X304 X305 X306 X307 X308 X309 X310 X311 X312
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
## 3 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1
## 4 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
## 5 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
## 6 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
## X313 X314 X315 X316 X317 X318 X319 X320 X321 X322 X323 X324 X325 X326 X327
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 4 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1
## X328 X329 X330 X331 X332 X333 X334 X335 X336 X337 X338 X339 X340 X341 X342
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 4 1 1 1 1 1 1 0 1 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 1 1 1 1 1 1 0 1 0 0 0 0 0 0 0
## X343 X344 X345 X355 X356 X357 X358 X359 X360 X361 X362 X366 X367 X368 X370
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
## X371 X372 X373 X374 X376 X377 X378 X380 X381 X382 X383 X385 X386 X387 X388
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X389 X390 X392 X394 X395 X396 X398 X400 X401 X403 X406 X496 X497 X499 X503
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X504 X505 X506 X507 X508 X509 X510 X511 X512 X514 X515 X516 X517 X518 X519
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X520 X521 X522 X524 X529 X549 X551 X553 X554 X556 X557 X558 X559 X560 X561
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X565 X568 X571 X573 X574 X576 X577 X590 X591 X592 X593 X594 X595 X597 X598
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X599 X600 X601 X602 X603 X604 X613 X621 X679 X698 X699 X700 X701 X702 X703
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X704 X705 X719 X732 X733 X750 X751 X752 X753 X754 X755 X773 X774 X775 X776
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X780 X782 X792 X793 X795 X798 X800 X801 X805 X806 X812 X813
## 1 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0
ncol(filtered_fingerprints)
## [1] 388
We can see that 388 predictors with non-zero variance remain after filtering.
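As a rough illustration of how a low-variance filter of this kind can work, here is a minimal base R sketch on a made-up binary matrix. It assumes the two default cutoffs used by caret's `nearZeroVar` (a frequency ratio above 95/5 and fewer than 10 percent unique values). The matrix `toy` and the helpers `freq_ratio` and `percent_unique` are hypothetical names for this example only and are not part of the analysis above.

```r
# Toy binary matrix with 40 rows and 4 hypothetical fingerprint columns
toy <- cbind(
  A = rep(c(0, 1), each = 20),   # balanced column
  B = c(rep(0, 39), 1),          # a single 1 among 40 values (very sparse)
  C = rep(0, 40),                # constant column (zero variance)
  D = rep(c(0, 1, 1, 0), 10)     # balanced column
)

# Count of the most common value divided by the count of the second most common
freq_ratio <- function(x) {
  counts <- sort(table(x), decreasing = TRUE)
  if (length(counts) < 2) return(Inf)  # a constant column has no second value
  as.numeric(counts[1] / counts[2])
}

# Number of distinct values as a percentage of the number of rows
percent_unique <- function(x) 100 * length(unique(x)) / length(x)

# Flag a column only when both cutoffs are exceeded (assumed defaults)
is_sparse <- apply(toy, 2, function(x) {
  freq_ratio(x) > 95 / 5 && percent_unique(x) < 10
})

kept <- toy[, !is_sparse, drop = FALSE]
colnames(kept)
```

In this toy case the constant column and the column with a single 1 are dropped, while the two balanced columns are kept.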
First, split the data into training and test sets. I will use a 70/30 train/test split.
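set.seed(123) # for reproducibility
# createDataPartition() comes from the caret package; p = 0.7 requests about 70% of rows for training
train_index <- createDataPartition(permeability, p = 0.7, list = FALSE)
# Rows listed in train_index form the training set; all remaining rows form the test set
X_train <- filtered_fingerprints[train_index, ]
y_train <- permeability[train_index]
X_test <- filtered_fingerprints[-train_index, ]
y_test <- permeability[-train_index]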
dim(X_train)
## [1] 117 388
length(y_train)
## [1] 117
dim(X_test)
## [1] 48 388
length(y_test)
## [1] 48
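To make the idea behind the split more concrete, here is a minimal base R sketch of a stratified 70/30 split on a simulated response. It assumes that, for a numeric outcome, `createDataPartition` samples within percentile-based groups. The sketch only approximates that idea and will not reproduce the exact row counts above. The names `y_sim`, `bins`, `train_idx`, and `test_idx` are hypothetical.

```r
set.seed(123)

# Simulated skewed response standing in for the 165 permeability values
y_sim <- rexp(165)

# Cut the response into five groups at its quintiles
bins <- cut(y_sim,
            breaks = quantile(y_sim, probs = seq(0, 1, by = 0.2)),
            include.lowest = TRUE)

# Sample about 70% of the row indices within each group
train_idx <- unlist(
  lapply(split(seq_along(y_sim), bins), function(idx) {
    sample(idx, size = ceiling(0.7 * length(idx)))
  }),
  use.names = FALSE
)

# Every row not selected for training goes to the test set
test_idx <- setdiff(seq_along(y_sim), train_idx)

# Sanity checks: no overlap, and every row is used exactly once
length(intersect(train_idx, test_idx))
length(train_idx) + length(test_idx)
```

Sampling within groups keeps the distribution of the response similar in both sets, which matters when the response is skewed and the data set is small.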
Now that we have training and test sets, let's check whether the data need any preprocessing.
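Since the predictors are binary fingerprints, a few compact checks can answer the preprocessing question without reading a long summary. The sketch below runs on a simulated 0/1 matrix, and `toy_train` is a hypothetical stand-in for the training predictors. For a 0/1 column, the mean is simply the proportion of ones.

```r
set.seed(42)

# Simulated binary predictors: 117 rows, 6 hypothetical fingerprint columns
toy_train <- matrix(rbinom(117 * 6, size = 1, prob = 0.3),
                    nrow = 117,
                    dimnames = list(NULL, paste0("X", 1:6)))

# Are all entries 0 or 1, and are there any missing values?
all_binary  <- all(toy_train %in% c(0, 1))
any_missing <- anyNA(toy_train)

# Proportion of ones in each column
col_props <- colMeans(toy_train)

all_binary
any_missing
range(col_props)
```

If every predictor is 0/1 with no missing values, all columns already share the same scale, so imputation is unnecessary and centering and scaling are less critical than they would be for predictors on very different scales.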
summary(X_train)
## X1 X2 X3 X4
## Min. :0.000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.265 Mean :0.2564 Mean :0.2393 Mean :0.2393
## 3rd Qu.:1.000 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X5 X6 X11 X12
## Min. :0.0000 Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.000 Median :0.0000 Median :1.0000
## Mean :0.2393 Mean :0.453 Mean :0.1538 Mean :0.7179
## 3rd Qu.:0.0000 3rd Qu.:1.000 3rd Qu.:0.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.000 Max. :1.0000 Max. :1.0000
## X15 X16 X20 X21
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.1111 Mean :0.2821 Mean :0.2393 Mean :0.2393
## 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X25 X26 X27 X28
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.2991 Mean :0.2393 Mean :0.2393 Mean :0.2393
## 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X29 X35 X36 X37
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.2393 Mean :0.2479 Mean :0.1966 Mean :0.2991
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X38 X39 X40 X41
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.000
## Mean :0.2393 Mean :0.2393 Mean :0.2393 Mean :0.359
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:1.000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.000
## X42 X43 X44 X46
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.3504 Mean :0.3504 Mean :0.3504 Mean :0.2393
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X47 X48 X49 X50
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.2393 Mean :0.2222 Mean :0.2991 Mean :0.2393
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X51 X52 X53 X54
## Min. :0.000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.359 Mean :0.3504 Mean :0.3504 Mean :0.2393
## 3rd Qu.:1.000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X55 X56 X57 X58
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.2393 Mean :0.2393 Mean :0.2222 Mean :0.2222
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X59 X60 X61 X62
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.2393 Mean :0.2222 Mean :0.2222 Mean :0.2393
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X63 X64 X65 X66
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.2393 Mean :0.2393 Mean :0.2393 Mean :0.2222
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X67 X68 X69 X70
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.2222 Mean :0.2222 Mean :0.2222 Mean :0.2393
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X71 X72 X73 X74
## Min. :0.0000 Min. :0.0000 Min. :0.000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.0000 Median :0.0000 Median :0.000 Median :0.000
## Mean :0.2991 Mean :0.2991 Mean :0.265 Mean :0.265
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.000 3rd Qu.:1.000
## Max. :1.0000 Max. :1.0000 Max. :1.000 Max. :1.000
## X75 X76 X78 X79
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.0000
## Median :0.000 Median :0.000 Median :0.000 Median :0.0000
## Mean :0.265 Mean :0.265 Mean :0.359 Mean :0.3504
## 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.0000
## Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.0000
## X80 X86 X87 X88
## Min. :0.000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.000 Median :1.0000 Median :0.0000 Median :0.0000
## Mean :0.359 Mean :0.7009 Mean :0.4359 Mean :0.1624
## 3rd Qu.:1.000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X93 X94 X96 X97
## Min. :0.0000 Min. :0.00000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:1.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.00000 Median :1.0000 Median :1.0000
## Mean :0.2564 Mean :0.06838 Mean :0.7949 Mean :0.5299
## 3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.00000 Max. :1.0000 Max. :1.0000
## X98 X99 X101 X102
## Min. :0.0000 Min. :0.00000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.00000 Median :1.0000 Median :0.0000
## Mean :0.2564 Mean :0.09402 Mean :0.7009 Mean :0.4188
## 3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.00000 Max. :1.0000 Max. :1.0000
## X103 X108 X111 X118
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.00000
## Mean :0.1709 Mean :0.1709 Mean :0.4444 Mean :0.06838
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.00000
## X121 X125 X126 X127
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.0000 Median :0.0000 Median :0.00000 Median :0.00000
## Mean :0.3419 Mean :0.1538 Mean :0.07692 Mean :0.07692
## 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.00000
## X129 X130 X133 X138
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.1709 Mean :0.1538 Mean :0.2821 Mean :0.2821
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X141 X142 X143 X146
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.:0.00000
## Median :0.0000 Median :0.0000 Median :1.0000 Median :0.00000
## Mean :0.2991 Mean :0.2393 Mean :0.8803 Mean :0.05983
## 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.00000
## X150 X152 X153 X154
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.5983 Mean :0.6239 Mean :0.6154 Mean :0.6154
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X156 X157 X158 X159
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.0000 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.7863 Mean :0.8034 Mean :0.5641 Mean :0.5214
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X162 X163 X167 X168
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.6239 Mean :0.6154 Mean :0.6154 Mean :0.6154
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X169 X170 X171 X172
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.6154 Mean :0.6154 Mean :0.9402 Mean :0.6154
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X173 X174 X175 X176
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.6154 Mean :0.6154 Mean :0.6068 Mean :0.6068
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X177 X178 X179 X180
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.6068 Mean :0.6068 Mean :0.6239 Mean :0.6154
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X181 X182 X183 X184
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.5983 Mean :0.6325 Mean :0.6154 Mean :0.6068
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X185 X186 X187 X188
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.6068 Mean :0.6068 Mean :0.6239 Mean :0.6154
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X189 X190 X191 X192
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.6154 Mean :0.5983 Mean :0.5983 Mean :0.6154
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X193 X194 X195 X196
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.6068 Mean :0.6068 Mean :0.6239 Mean :0.6239
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X197 X198 X199 X200
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.6239 Mean :0.6154 Mean :0.6154 Mean :0.5983
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X201 X202 X203 X204
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.5983 Mean :0.5983 Mean :0.5983 Mean :0.6154
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X205 X206 X207 X208
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.6154 Mean :0.6325 Mean :0.6325 Mean :0.5983
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X209 X210 X211 X212
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.5983 Mean :0.5983 Mean :0.5983 Mean :0.6068
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X213 X214 X215 X221
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.6068 Mean :0.6068 Mean :0.9402 Mean :0.7436
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X223 X224 X225 X226
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:1.0000
## Median :1.0000 Median :1.0000 Median :0.00000 Median :1.0000
## Mean :0.7436 Mean :0.7436 Mean :0.04274 Mean :0.7949
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.0000
## X227 X228 X229 X230
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.0000 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :0.0000
## Mean :0.7949 Mean :0.7949 Mean :0.6325 Mean :0.2393
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X231 X232 X233 X234
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
## Median :1.000 Median :1.000 Median :1.000 Median :1.000
## Mean :0.812 Mean :0.812 Mean :0.812 Mean :0.812
## 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000
## Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.000
## X235 X236 X237 X238
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :0.0000
## Mean :0.5726 Mean :0.6325 Mean :0.5556 Mean :0.2821
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X239 X240 X241 X242
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.0000 1st Qu.:1.0000 1st Qu.:1.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :0.0000
## Mean :0.8034 Mean :0.8034 Mean :0.7607 Mean :0.4701
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X244 X245 X246 X247
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.0000 1st Qu.:1.0000 1st Qu.:1.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.8034 Mean :0.8034 Mean :0.8034 Mean :0.7265
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X248 X249 X250 X251
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.000
## Median :1.0000 Median :1.0000 Median :0.0000 Median :0.000
## Mean :0.5812 Mean :0.7607 Mean :0.4701 Mean :0.188
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.000
## X253 X254 X255 X256
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.0000 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.8034 Mean :0.8034 Mean :0.7265 Mean :0.5812
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X257 X258 X260 X261
## Min. :0.000 Min. :0.00000 Min. :0.000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.00000 1st Qu.:0.000 1st Qu.:1.0000
## Median :1.000 Median :0.00000 Median :1.000 Median :1.0000
## Mean :0.547 Mean :0.09402 Mean :0.735 Mean :0.7607
## 3rd Qu.:1.000 3rd Qu.:0.00000 3rd Qu.:1.000 3rd Qu.:1.0000
## Max. :1.000 Max. :1.00000 Max. :1.000 Max. :1.0000
## X262 X263 X264 X265
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.000
## 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.000
## Median :1.0000 Median :1.0000 Median :0.0000 Median :1.000
## Mean :0.7521 Mean :0.6667 Mean :0.3846 Mean :0.735
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.000
## X266 X267 X268 X269
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.7521 Mean :0.5128 Mean :0.6752 Mean :0.5214
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X270 X271 X272 X274
## Min. :0.0000 Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.000 Median :0.0000 Median :1.0000
## Mean :0.4786 Mean :0.359 Mean :0.4103 Mean :0.6154
## 3rd Qu.:1.0000 3rd Qu.:1.000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.000 Max. :1.0000 Max. :1.0000
## X276 X278 X279 X280
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.1709 Mean :0.1795 Mean :0.1795 Mean :0.1795
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X281 X284 X285 X286
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.1795 Mean :0.1795 Mean :0.1795 Mean :0.1795
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X290 X291 X293 X294
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.0000 Median :0.0000 Median :0.00000 Median :0.00000
## Mean :0.1795 Mean :0.1795 Mean :0.07692 Mean :0.07692
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.00000
## X295 X296 X297 X298
## Min. :0.00000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.00000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.08547 Mean :0.2991 Mean :0.1282 Mean :0.1795
## 3rd Qu.:0.00000 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.00000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X299 X300 X301 X302
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.1795 Mean :0.1795 Mean :0.2991 Mean :0.2991
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X303 X304 X305 X306
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.00000
## Mean :0.1282 Mean :0.1795 Mean :0.1795 Mean :0.05983
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.00000
## X307 X308 X309 X310
## Min. :0.00000 Min. :0.00000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.00000 Median :0.00000 Median :0.0000 Median :0.0000
## Mean :0.07692 Mean :0.07692 Mean :0.1197 Mean :0.1709
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.00000 Max. :1.00000 Max. :1.0000 Max. :1.0000
## X311 X312 X313 X314
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.2222 Mean :0.2735 Mean :0.2479 Mean :0.2821
## 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X315 X316 X317 X318
## Min. :0.000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :1.000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.641 Mean :0.08547 Mean :0.08547 Mean :0.08547
## 3rd Qu.:1.000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## X319 X320 X321 X322
## Min. :0.000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.188 Mean :0.1795 Mean :0.1795 Mean :0.2308
## 3rd Qu.:0.000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X323 X324 X325 X326
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.2051 Mean :0.2051 Mean :0.2735 Mean :0.2479
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X327 X328 X329 X330
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.2479 Mean :0.2479 Mean :0.1368 Mean :0.2479
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X331 X332 X333 X334
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.00000
## Mean :0.2479 Mean :0.2479 Mean :0.2479 Mean :0.09402
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.00000
## X335 X336 X337 X338
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.000
## Median :0.0000 Median :0.0000 Median :0.00000 Median :0.000
## Mean :0.1368 Mean :0.1026 Mean :0.07692 Mean :0.359
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:1.000
## Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.000
## X339 X340 X341 X342
## Min. :0.000 Min. :0.0000 Min. :0.000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000
## Median :0.000 Median :0.0000 Median :0.000 Median :0.0000
## Mean :0.359 Mean :0.1197 Mean :0.359 Mean :0.3419
## 3rd Qu.:1.000 3rd Qu.:0.0000 3rd Qu.:1.000 3rd Qu.:1.0000
## Max. :1.000 Max. :1.0000 Max. :1.000 Max. :1.0000
## X343 X344 X345 X355
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.000
## Median :0.0000 Median :0.0000 Median :0.00000 Median :0.000
## Mean :0.3419 Mean :0.1197 Mean :0.05983 Mean :0.265
## 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:1.000
## Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.000
## X356 X357 X358 X359
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.000
## Mean :0.2821 Mean :0.2308 Mean :0.2735 Mean :0.188
## 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.000
## X360 X361 X362 X366
## Min. :0.000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.188 Mean :0.1282 Mean :0.1709 Mean :0.1966
## 3rd Qu.:0.000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X367 X368 X370 X371
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.2308 Mean :0.2821 Mean :0.2308 Mean :0.2308
## 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X372 X373 X374 X376
## Min. :0.00000 Min. :0.00000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.0000 Median :0.00000
## Mean :0.06838 Mean :0.06838 Mean :0.1282 Mean :0.07692
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.0000 Max. :1.00000
## X377 X378 X380 X381
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.08547 Mean :0.08547 Mean :0.08547 Mean :0.08547
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## X382 X383 X385 X386
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.08547 Mean :0.08547 Mean :0.08547 Mean :0.08547
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## X387 X388 X389 X390
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.08547 Mean :0.08547 Mean :0.08547 Mean :0.08547
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## X392 X394 X395 X396
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.08547 Mean :0.08547 Mean :0.08547 Mean :0.08547
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## X398 X400 X401 X403
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.08547 Mean :0.08547 Mean :0.08547 Mean :0.08547
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## X406 X496 X497 X499
## Min. :0.00000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.00000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.08547 Mean :0.1197 Mean :0.1282 Mean :0.1368
## 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.00000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X503 X504 X505 X506
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.1197 Mean :0.2479 Mean :0.2479 Mean :0.2479
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X507 X508 X509 X510
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.1795 Mean :0.1709 Mean :0.2222 Mean :0.1026
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X511 X512 X514 X515
## Min. :0.00000 Min. :0.0000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.0000 Median :0.00000 Median :0.00000
## Mean :0.08547 Mean :0.1111 Mean :0.08547 Mean :0.08547
## 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.0000 Max. :1.00000 Max. :1.00000
## X516 X517 X518 X519
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.0000 Median :0.0000 Median :0.00000 Median :0.00000
## Mean :0.1026 Mean :0.1026 Mean :0.08547 Mean :0.09402
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.00000
## X520 X521 X522 X524
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.00000 Median :0.0000
## Mean :0.1111 Mean :0.1111 Mean :0.08547 Mean :0.1026
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.0000
## X529 X549 X551 X553
## Min. :0.00000 Min. :0.0000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.0000 Median :0.00000 Median :0.00000
## Mean :0.08547 Mean :0.2222 Mean :0.05128 Mean :0.05128
## 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.0000 Max. :1.00000 Max. :1.00000
## X554 X556 X557 X558
## Min. :0.00000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.00000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.05128 Mean :0.2222 Mean :0.2222 Mean :0.2222
## 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.00000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X559 X560 X561 X565
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.00000 Median :0.0000
## Mean :0.2222 Mean :0.2222 Mean :0.05983 Mean :0.2222
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.0000
## X568 X571 X573 X574
## Min. :0.00000 Min. :0.000 Min. :0.0000 Min. :0.000
## 1st Qu.:0.00000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.000
## Median :0.00000 Median :0.000 Median :0.0000 Median :0.000
## Mean :0.05128 Mean :0.188 Mean :0.2137 Mean :0.188
## 3rd Qu.:0.00000 3rd Qu.:0.000 3rd Qu.:0.0000 3rd Qu.:0.000
## Max. :1.00000 Max. :1.000 Max. :1.0000 Max. :1.000
## X576 X577 X590 X591
## Min. :0.000 Min. :0.0000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.000 Median :0.0000 Median :0.00000 Median :0.00000
## Mean :0.188 Mean :0.1111 Mean :0.07692 Mean :0.07692
## 3rd Qu.:0.000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.000 Max. :1.0000 Max. :1.00000 Max. :1.00000
## X592 X593 X594 X595
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.09402 Mean :0.09402 Mean :0.09402 Mean :0.07692
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## X597 X598 X599 X600
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.1197 Mean :0.1282 Mean :0.1709 Mean :0.1197
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X601 X602 X603 X604
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.1709 Mean :0.1795 Mean :0.1709 Mean :0.1197
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X613 X621 X679 X698
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.05128 Mean :0.05128 Mean :0.09402 Mean :0.04274
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## X699 X700 X701 X702
## Min. :0.00000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.00000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.08547 Mean :0.1111 Mean :0.1111 Mean :0.1111
## 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.00000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## X703 X704 X705 X719
## Min. :0.00000 Min. :0.00000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.0000 Median :0.00000
## Mean :0.08547 Mean :0.07692 Mean :0.1111 Mean :0.04274
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.0000 Max. :1.00000
## X732 X733 X750 X751
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.05128 Mean :0.05128 Mean :0.05983 Mean :0.08547
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## X752 X753 X754 X755
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.05983 Mean :0.05983 Mean :0.08547 Mean :0.08547
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## X773 X774 X775 X776
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.08547 Mean :0.08547 Mean :0.08547 Mean :0.08547
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## X780 X782 X792 X793
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.05128 Mean :0.05128 Mean :0.05128 Mean :0.05128
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## X795 X798 X800 X801
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.06838 Mean :0.06838 Mean :0.05128 Mean :0.05128
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## X805 X806 X812 X813
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.06838 Mean :0.05128 Mean :0.05983 Mean :0.05128
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
All 388 of our remaining predictors are binary, so we do not need to do any further pre-processing or normalizing.
Now let's tune a PLS model.
ctrl <- trainControl(method = "cv", number = 10)
pls_model <- train(
x = X_train,
y = y_train,
method = "pls",
tuneLength = 30, # try up to 30 latent variables
trControl = ctrl)
plot(pls_model)
pls_model$results
## ncomp RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 1 13.99057 0.2632922 10.990276 2.666266 0.2042171 2.329243
## 2 2 12.36473 0.3927908 8.739689 3.094271 0.2891991 2.228934
## 3 3 12.28337 0.4203126 8.988762 3.219348 0.2831496 2.259523
## 4 4 12.61118 0.4200091 9.744753 3.367148 0.2821755 2.064688
## 5 5 12.47332 0.4398682 9.363258 3.937844 0.2764332 2.876394
## 6 6 12.27722 0.4454306 9.154514 4.328274 0.2746557 3.228768
## 7 7 12.42897 0.4293957 9.409811 4.435112 0.2695746 3.368292
## 8 8 12.54118 0.4303471 9.644543 4.060634 0.2725991 3.160871
## 9 9 13.15973 0.4013557 10.178219 4.148851 0.2794787 3.160699
## 10 10 13.46107 0.3842828 10.324400 4.188993 0.2740585 3.153917
## 11 11 13.92600 0.3705277 10.589860 4.267311 0.2761292 3.272115
## 12 12 14.22097 0.3555753 10.756449 4.276586 0.2833077 3.225461
## 13 13 14.59459 0.3399373 11.034311 4.142828 0.2891632 3.002470
## 14 14 15.14949 0.3132850 11.297229 3.985753 0.2880336 2.920489
## 15 15 15.52003 0.2982553 11.538648 3.962501 0.2860026 2.716527
## 16 16 15.91983 0.2824934 11.888595 3.916828 0.2843008 2.538151
## 17 17 16.35493 0.2673083 12.270187 3.725130 0.2769962 2.536216
## 18 18 16.67870 0.2619140 12.485529 3.867268 0.2722359 2.626051
## 19 19 16.78528 0.2610212 12.582949 4.032575 0.2756357 2.652309
## 20 20 17.02744 0.2493200 12.804788 4.134513 0.2804832 2.713170
## 21 21 16.95746 0.2565461 12.823946 3.816953 0.2750488 2.398808
## 22 22 17.01795 0.2571057 12.857521 3.827173 0.2725787 2.486868
## 23 23 17.06730 0.2542541 12.951830 3.577713 0.2646589 2.465520
## 24 24 17.39181 0.2481465 13.124719 3.647713 0.2574169 2.548013
## 25 25 17.65716 0.2380617 13.262988 3.727740 0.2541769 2.661649
## 26 26 17.89782 0.2361610 13.449007 3.805809 0.2475813 2.783008
## 27 27 18.30201 0.2275197 13.704734 3.662568 0.2400155 2.673213
## 28 28 18.56365 0.2213178 13.978130 3.678705 0.2344072 2.710218
## 29 29 18.75258 0.2173062 14.054935 3.687866 0.2331955 2.792637
## 30 30 18.97718 0.2139653 14.242906 3.567992 0.2329017 2.761886
Now let's find the optimal number of components and the corresponding R squared.
pls_model$bestTune$ncomp
## [1] 6
pls_model$results %>%
filter(ncomp == 8)
## ncomp RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 8 12.54118 0.4303471 9.644543 4.060634 0.2725991 3.160871
caret selects 6 components (bestTune above), which has the lowest cross-validated RMSE in the table, 12.28, with an R squared of about 0.45. Three components give almost the same RMSE (12.28) with a simpler model, and the RMSE rises steadily from 9 components onward. The 8-component row filtered above has an RMSE of 12.54 and an R squared of 0.43, so the two extra components do not improve the resampled fit.
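One way to make the choice of components less ad hoc is the one-standard-error rule: take the simplest model whose cross-validated RMSE is within one standard error of the minimum. The sketch below applies it in base R to the first eight rows of the resampling table printed above. It assumes the 10 folds set in trainControl, and the object name cv_results is only illustrative. caret also ships a oneSE() selection function, which is not used here so that the example runs without any packages.

```r
# First eight rows of pls_model$results, copied from the output above
cv_results <- data.frame(
  ncomp  = 1:8,
  RMSE   = c(13.99057, 12.36473, 12.28337, 12.61118,
             12.47332, 12.27722, 12.42897, 12.54118),
  RMSESD = c(2.666266, 3.094271, 3.219348, 3.367148,
             3.937844, 4.328274, 4.435112, 4.060634)
)

n_folds <- 10  # assumption: matches trainControl(method = "cv", number = 10)

# Row with the lowest cross-validated RMSE
best <- cv_results[which.min(cv_results$RMSE), ]

# One-standard-error threshold around that minimum
threshold <- best$RMSE + best$RMSESD / sqrt(n_folds)

# Simplest model whose RMSE does not exceed the threshold
one_se_ncomp <- min(cv_results$ncomp[cv_results$RMSE <= threshold])

print(best$ncomp)
print(one_se_ncomp)
```

On these numbers the lowest RMSE is at 6 components, while the one-standard-error rule settles on 2, which reflects how flat the RMSE curve is between 2 and 8 components.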
Predict the response for the test set. What is the test set estimate of R2?
pls_predictions <- predict(pls_model, newdata = X_test)
pls_test_values <- data.frame(obs = y_test,pred = pls_predictions)
defaultSummary(pls_test_values)
## RMSE Rsquared MAE
## 9.604105 0.578010 7.116708
We get a test set R squared of about 0.58.
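For reference, the three numbers reported by defaultSummary() can be reproduced by hand. The sketch below uses made-up vectors (obs and pred are toy values, not the permeability data) and computes R squared as the squared correlation between observed and predicted values, which is how caret's postResample() defines it.

```r
# Toy observed and predicted values (illustrative only)
obs  <- c(1, 2, 3, 4)
pred <- c(1.5, 1.5, 3.5, 3.5)

rmse <- sqrt(mean((obs - pred)^2))   # root mean squared error
rsq  <- cor(obs, pred)^2             # squared Pearson correlation
mae  <- mean(abs(obs - pred))        # mean absolute error

metrics <- c(RMSE = rmse, Rsquared = rsq, MAE = mae)
print(metrics)
```

Because this R squared is a squared correlation, it can look reasonable even when the predictions are biased, so it is worth reading it together with the RMSE.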
Try building other models discussed in this chapter. Do any have better predictive performance?
Let's try a linear model (lm) first.
X_train_lm <- as.data.frame(X_train)
X_test_lm <- as.data.frame(X_test)
lm_model <- lm(y_train ~.,data=X_train_lm)
summary(lm_model)
##
## Call:
## lm(formula = y_train ~ ., data = X_train_lm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.992 -1.436 0.000 1.496 16.992
##
## Coefficients: (301 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.75781 16.68495 1.304 0.20248
## X1 91.05271 39.12365 2.327 0.02714 *
## X2 -26.32381 37.39691 -0.704 0.48711
## X3 107.66697 57.87925 1.860 0.07303 .
## X4 NA NA NA NA
## X5 NA NA NA NA
## X6 5.31207 3.24128 1.639 0.11204
## X11 -24.72696 15.54502 -1.591 0.12253
## X12 17.23917 18.80956 0.917 0.36696
## X15 -26.73304 21.52307 -1.242 0.22416
## X16 10.38152 8.10295 1.281 0.21027
## X20 NA NA NA NA
## X21 NA NA NA NA
## X25 -16.34576 12.39314 -1.319 0.19751
## X26 NA NA NA NA
## X27 NA NA NA NA
## X28 NA NA NA NA
## X29 NA NA NA NA
## X35 -16.44797 22.65900 -0.726 0.47372
## X36 10.04017 15.52915 0.647 0.52302
## X37 NA NA NA NA
## X38 NA NA NA NA
## X39 NA NA NA NA
## X40 NA NA NA NA
## X41 -41.04600 26.66559 -1.539 0.13458
## X42 -1.79470 33.73087 -0.053 0.95793
## X43 NA NA NA NA
## X44 NA NA NA NA
## X46 NA NA NA NA
## X47 NA NA NA NA
## X48 -117.99150 58.67730 -2.011 0.05372 .
## X49 NA NA NA NA
## X50 NA NA NA NA
## X51 NA NA NA NA
## X52 NA NA NA NA
## X53 NA NA NA NA
## X54 NA NA NA NA
## X55 NA NA NA NA
## X56 NA NA NA NA
## X57 NA NA NA NA
## X58 NA NA NA NA
## X59 NA NA NA NA
## X60 NA NA NA NA
## X61 NA NA NA NA
## X62 NA NA NA NA
## X63 NA NA NA NA
## X64 NA NA NA NA
## X65 NA NA NA NA
## X66 NA NA NA NA
## X67 NA NA NA NA
## X68 NA NA NA NA
## X69 NA NA NA NA
## X70 NA NA NA NA
## X71 NA NA NA NA
## X72 NA NA NA NA
## X73 NA NA NA NA
## X74 NA NA NA NA
## X75 NA NA NA NA
## X76 NA NA NA NA
## X78 NA NA NA NA
## X79 NA NA NA NA
## X80 NA NA NA NA
## X86 -40.02772 38.08069 -1.051 0.30188
## X87 91.44258 53.80166 1.700 0.09991 .
## X88 25.08852 58.29007 0.430 0.67008
## X93 -1.54417 3.75482 -0.411 0.68391
## X94 -5.92987 59.59247 -0.100 0.92142
## X96 -14.63353 25.89929 -0.565 0.57641
## X97 NA NA NA NA
## X98 -35.39208 33.81212 -1.047 0.30387
## X99 10.46745 21.92733 0.477 0.63668
## X101 NA NA NA NA
## X102 -34.27179 39.62013 -0.865 0.39413
## X103 12.61535 22.22723 0.568 0.57470
## X108 NA NA NA NA
## X111 32.96925 39.04938 0.844 0.40542
## X118 -24.45919 12.75555 -1.918 0.06507 .
## X121 -24.05350 32.22390 -0.746 0.46141
## X125 84.16069 43.36335 1.941 0.06206 .
## X126 -20.28304 21.52307 -0.942 0.35378
## X127 NA NA NA NA
## X129 NA NA NA NA
## X130 NA NA NA NA
## X133 NA NA NA NA
## X138 NA NA NA NA
## X141 8.24932 3.79875 2.172 0.03821 *
## X142 NA NA NA NA
## X143 -23.38986 15.16888 -1.542 0.13393
## X146 -8.73395 79.12507 -0.110 0.91287
## X150 -53.51175 35.72791 -1.498 0.14500
## X152 NA NA NA NA
## X153 60.12744 36.70351 1.638 0.11219
## X154 NA NA NA NA
## X156 16.54078 14.89454 1.111 0.27590
## X157 62.41889 50.10856 1.246 0.22285
## X158 -10.61579 24.02420 -0.442 0.66185
## X159 4.44864 18.08771 0.246 0.80745
## X162 NA NA NA NA
## X163 NA NA NA NA
## X167 NA NA NA NA
## X168 NA NA NA NA
## X169 NA NA NA NA
## X170 NA NA NA NA
## X171 NA NA NA NA
## X172 NA NA NA NA
## X173 NA NA NA NA
## X174 NA NA NA NA
## X175 NA NA NA NA
## X176 NA NA NA NA
## X177 NA NA NA NA
## X178 NA NA NA NA
## X179 NA NA NA NA
## X180 NA NA NA NA
## X181 NA NA NA NA
## X182 -11.12379 14.68256 -0.758 0.45479
## X183 NA NA NA NA
## X184 NA NA NA NA
## X185 NA NA NA NA
## X186 NA NA NA NA
## X187 NA NA NA NA
## X188 NA NA NA NA
## X189 NA NA NA NA
## X190 NA NA NA NA
## X191 NA NA NA NA
## X192 NA NA NA NA
## X193 NA NA NA NA
## X194 NA NA NA NA
## X195 NA NA NA NA
## X196 NA NA NA NA
## X197 NA NA NA NA
## X198 NA NA NA NA
## X199 NA NA NA NA
## X200 NA NA NA NA
## X201 NA NA NA NA
## X202 NA NA NA NA
## X203 NA NA NA NA
## X204 NA NA NA NA
## X205 NA NA NA NA
## X206 NA NA NA NA
## X207 NA NA NA NA
## X208 NA NA NA NA
## X209 NA NA NA NA
## X210 NA NA NA NA
## X211 NA NA NA NA
## X212 NA NA NA NA
## X213 NA NA NA NA
## X214 NA NA NA NA
## X215 NA NA NA NA
## X221 32.59700 16.93457 1.925 0.06410 .
## X223 NA NA NA NA
## X224 NA NA NA NA
## X225 34.43026 41.76048 0.824 0.41640
## X226 -61.63237 28.66969 -2.150 0.04005 *
## X227 NA NA NA NA
## X228 NA NA NA NA
## X229 32.27830 50.46355 0.640 0.52743
## X230 -79.16118 47.50128 -1.667 0.10638
## X231 NA NA NA NA
## X232 NA NA NA NA
## X233 NA NA NA NA
## X234 NA NA NA NA
## X235 -22.95537 22.78222 -1.008 0.32198
## X236 NA NA NA NA
## X237 5.11373 25.14573 0.203 0.84027
## X238 94.28135 38.30746 2.461 0.02004 *
## X239 NA NA NA NA
## X240 NA NA NA NA
## X241 28.51292 38.95339 0.732 0.47006
## X242 -52.54391 34.64402 -1.517 0.14017
## X244 NA NA NA NA
## X245 NA NA NA NA
## X246 NA NA NA NA
## X247 -0.80546 29.28971 -0.027 0.97825
## X248 -21.23364 36.21432 -0.586 0.56219
## X249 NA NA NA NA
## X250 NA NA NA NA
## X251 14.76330 77.22922 0.191 0.84973
## X253 NA NA NA NA
## X254 NA NA NA NA
## X255 NA NA NA NA
## X256 NA NA NA NA
## X257 26.49963 30.70300 0.863 0.39517
## X258 -35.05909 20.15904 -1.739 0.09262 .
## X260 NA NA NA NA
## X261 NA NA NA NA
## X262 -33.80935 50.61650 -0.668 0.50945
## X263 NA NA NA NA
## X264 NA NA NA NA
## X265 NA NA NA NA
## X266 NA NA NA NA
## X267 NA NA NA NA
## X268 NA NA NA NA
## X269 NA NA NA NA
## X270 NA NA NA NA
## X271 NA NA NA NA
## X272 12.12996 25.34775 0.479 0.63585
## X274 NA NA NA NA
## X276 NA NA NA NA
## X278 -2.85268 4.74243 -0.602 0.55217
## X279 NA NA NA NA
## X280 -0.62424 7.51866 -0.083 0.93440
## X281 NA NA NA NA
## X284 NA NA NA NA
## X285 NA NA NA NA
## X286 NA NA NA NA
## X290 NA NA NA NA
## X291 NA NA NA NA
## X293 28.78680 18.63848 1.544 0.13332
## X294 NA NA NA NA
## X295 -30.06120 15.49440 -1.940 0.06214 .
## X296 NA NA NA NA
## X297 NA NA NA NA
## X298 NA NA NA NA
## X299 NA NA NA NA
## X300 NA NA NA NA
## X301 NA NA NA NA
## X302 NA NA NA NA
## X303 NA NA NA NA
## X304 NA NA NA NA
## X305 NA NA NA NA
## X306 -8.73133 12.46353 -0.701 0.48917
## X307 NA NA NA NA
## X308 NA NA NA NA
## X309 NA NA NA NA
## X310 NA NA NA NA
## X311 15.35786 16.64430 0.923 0.36377
## X312 -63.00108 34.06227 -1.850 0.07459 .
## X313 NA NA NA NA
## X314 NA NA NA NA
## X315 -7.76439 7.79402 -0.996 0.32739
## X316 9.89220 14.59697 0.678 0.50334
## X317 NA NA NA NA
## X318 NA NA NA NA
## X319 37.27258 33.85179 1.101 0.27993
## X320 NA NA NA NA
## X321 NA NA NA NA
## X322 NA NA NA NA
## X323 NA NA NA NA
## X324 NA NA NA NA
## X325 NA NA NA NA
## X326 NA NA NA NA
## X327 NA NA NA NA
## X328 NA NA NA NA
## X329 -27.63501 21.83687 -1.266 0.21576
## X330 NA NA NA NA
## X331 NA NA NA NA
## X332 NA NA NA NA
## X333 NA NA NA NA
## X334 18.27825 16.70182 1.094 0.28279
## X335 NA NA NA NA
## X336 NA NA NA NA
## X337 -36.88771 21.11743 -1.747 0.09126 .
## X338 -5.21076 18.19264 -0.286 0.77659
## X339 NA NA NA NA
## X340 -29.35301 16.98895 -1.728 0.09467 .
## X341 NA NA NA NA
## X342 39.84663 17.37555 2.293 0.02927 *
## X343 NA NA NA NA
## X344 NA NA NA NA
## X345 -4.38150 10.09999 -0.434 0.66763
## X355 NA NA NA NA
## X356 NA NA NA NA
## X357 -20.79701 17.79084 -1.169 0.25193
## X358 -33.67432 11.49169 -2.930 0.00654 **
## X359 -26.47488 26.01400 -1.018 0.31723
## X360 NA NA NA NA
## X361 -8.78529 10.88965 -0.807 0.42637
## X362 NA NA NA NA
## X366 NA NA NA NA
## X367 NA NA NA NA
## X368 NA NA NA NA
## X370 26.99739 20.70275 1.304 0.20247
## X371 NA NA NA NA
## X372 NA NA NA NA
## X373 NA NA NA NA
## X374 -0.59805 7.52041 -0.080 0.93716
## X376 -12.63455 24.63548 -0.513 0.61193
## X377 NA NA NA NA
## X378 NA NA NA NA
## X380 NA NA NA NA
## X381 NA NA NA NA
## X382 NA NA NA NA
## X383 NA NA NA NA
## X385 NA NA NA NA
## X386 NA NA NA NA
## X387 NA NA NA NA
## X388 NA NA NA NA
## X389 NA NA NA NA
## X390 NA NA NA NA
## X392 NA NA NA NA
## X394 NA NA NA NA
## X395 NA NA NA NA
## X396 NA NA NA NA
## X398 NA NA NA NA
## X400 NA NA NA NA
## X401 NA NA NA NA
## X403 NA NA NA NA
## X406 NA NA NA NA
## X496 6.79975 9.83342 0.691 0.49475
## X497 NA NA NA NA
## X499 NA NA NA NA
## X503 -20.91595 19.74615 -1.059 0.29823
## X504 NA NA NA NA
## X505 NA NA NA NA
## X506 NA NA NA NA
## X507 14.12446 16.88804 0.836 0.40979
## X508 NA NA NA NA
## X509 11.71199 20.24931 0.578 0.56747
## X510 NA NA NA NA
## X511 NA NA NA NA
## X512 NA NA NA NA
## X514 NA NA NA NA
## X515 NA NA NA NA
## X516 NA NA NA NA
## X517 NA NA NA NA
## X518 NA NA NA NA
## X519 NA NA NA NA
## X520 NA NA NA NA
## X521 NA NA NA NA
## X522 NA NA NA NA
## X524 NA NA NA NA
## X529 NA NA NA NA
## X549 NA NA NA NA
## X551 NA NA NA NA
## X553 NA NA NA NA
## X554 NA NA NA NA
## X556 NA NA NA NA
## X557 NA NA NA NA
## X558 NA NA NA NA
## X559 NA NA NA NA
## X560 NA NA NA NA
## X561 NA NA NA NA
## X565 NA NA NA NA
## X568 NA NA NA NA
## X571 -19.80546 14.51603 -1.364 0.18294
## X573 NA NA NA NA
## X574 NA NA NA NA
## X576 NA NA NA NA
## X577 NA NA NA NA
## X590 NA NA NA NA
## X591 NA NA NA NA
## X592 NA NA NA NA
## X593 NA NA NA NA
## X594 NA NA NA NA
## X595 NA NA NA NA
## X597 NA NA NA NA
## X598 NA NA NA NA
## X599 NA NA NA NA
## X600 NA NA NA NA
## X601 NA NA NA NA
## X602 NA NA NA NA
## X603 NA NA NA NA
## X604 NA NA NA NA
## X613 NA NA NA NA
## X621 NA NA NA NA
## X679 NA NA NA NA
## X698 NA NA NA NA
## X699 NA NA NA NA
## X700 NA NA NA NA
## X701 NA NA NA NA
## X702 NA NA NA NA
## X703 NA NA NA NA
## X704 NA NA NA NA
## X705 NA NA NA NA
## X719 NA NA NA NA
## X732 -5.60363 5.92410 -0.946 0.35201
## X733 NA NA NA NA
## X750 -0.02374 12.62240 -0.002 0.99851
## X751 NA NA NA NA
## X752 NA NA NA NA
## X753 NA NA NA NA
## X754 NA NA NA NA
## X755 NA NA NA NA
## X773 NA NA NA NA
## X774 NA NA NA NA
## X775 NA NA NA NA
## X776 NA NA NA NA
## X780 NA NA NA NA
## X782 NA NA NA NA
## X792 NA NA NA NA
## X793 NA NA NA NA
## X795 NA NA NA NA
## X798 NA NA NA NA
## X800 NA NA NA NA
## X801 NA NA NA NA
## X805 NA NA NA NA
## X806 NA NA NA NA
## X812 NA NA NA NA
## X813 NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.669 on 29 degrees of freedom
## Multiple R-squared: 0.9437, Adjusted R-squared: 0.7747
## F-statistic: 5.584 on 87 and 29 DF, p-value: 1.027e-06
lm_predictions <- predict(lm_model, newdata = X_test_lm)
## Warning in predict.lm(lm_model, newdata = X_test_lm): prediction from
## rank-deficient fit; attr(*, "non-estim") has doubtful cases
lm_test_values <- data.frame(obs = y_test,pred = lm_predictions)
defaultSummary(lm_test_values)
## RMSE Rsquared MAE
## 30.3087993 0.2130654 20.4964880
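The linear model gives a test set R squared of 0.21 and an RMSE of 30.31, which is worse than the PLS model. With 388 predictors and only 117 training samples, 301 coefficients could not be estimated, so the fit is rank-deficient and its predictions are unreliable.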
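The NA coefficients and the rank-deficiency warning are what ordinary least squares produces when there are more predictors than the data can support, or when some predictors are redundant. A minimal simulated sketch (random data, not the fingerprints; n, p, and sim are illustrative names) shows the same behaviour on a small scale:

```r
set.seed(1)
n <- 10   # observations
p <- 15   # predictors, deliberately more than n

sim <- as.data.frame(matrix(rnorm(n * p), nrow = n))
sim$y <- rnorm(n)

fit <- lm(y ~ ., data = sim)

# Only n coefficients (including the intercept) can be estimated;
# lm() reports the rest as NA
n_undefined <- sum(is.na(coef(fit)))
print(n_undefined)

# In this toy case the training fit is perfect, so it says nothing
# about how the model would do on new data
print(max(abs(residuals(fit))))
```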
Let's try ridge regression.
library(elasticnet)
## Warning: package 'elasticnet' was built under R version 4.3.3
## Loading required package: lars
## Warning: package 'lars' was built under R version 4.3.3
## Loaded lars 1.3
ridge_grid <- data.frame(.lambda = seq(0, .1, length = 15))
set.seed(100)
ridgeRegFit <- train(X_train_lm, y_train, method = "ridge",tuneGrid = ridge_grid,trControl = ctrl)
ridgeRegFit
## Ridge Regression
##
## 117 samples
## 388 predictors
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 105, 106, 105, 105, 105, 106, ...
## Resampling results across tuning parameters:
##
## lambda RMSE Rsquared MAE
## 0.000000000 3.420508e+20 0.1511077 1.852177e+20
## 0.007142857 5.488365e+03 0.2478913 3.583052e+03
## 0.014285714 1.668384e+01 0.3015268 1.243951e+01
## 0.021428571 1.603268e+01 0.3181464 1.204384e+01
## 0.028571429 1.565874e+01 0.3292670 1.181993e+01
## 0.035714286 1.541853e+01 0.3378542 1.167494e+01
## 0.042857143 1.519109e+01 0.3455063 1.147924e+01
## 0.050000000 1.504670e+01 0.3514188 1.136776e+01
## 0.057142857 1.498083e+01 0.3530273 1.128466e+01
## 0.064285714 1.481946e+01 0.3620584 1.118037e+01
## 0.071428571 1.473146e+01 0.3664619 1.110786e+01
## 0.078571429 1.466120e+01 0.3704889 1.104677e+01
## 0.085714286 1.459697e+01 0.3743362 1.099071e+01
## 0.092857143 1.454235e+01 0.3778743 1.094360e+01
## 0.100000000 1.451263e+01 0.3807330 1.090971e+01
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was lambda = 0.1.
ridge_preds <- predict(ridgeRegFit, newdata = X_test)
ridge_results_test <- data.frame(obs = y_test,pred = ridge_preds)
defaultSummary(ridge_results_test)
## RMSE Rsquared MAE
## 11.2148255 0.5311375 8.7431850
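The tuned ridge model gives a test set R squared of 0.53 and an RMSE of 11.21; both are worse than the PLS model (0.58 and 9.60). The selected lambda of 0.1 is the largest value in the grid and the cross-validated RMSE was still falling there, so a wider grid might improve the fit further.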
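To see why a ridge penalty helps here, recall that with more predictors than samples the least-squares normal equations are singular, and adding lambda to the diagonal makes them solvable. The sketch below uses the textbook closed form on simulated data. ridge_coef is a hypothetical helper, and its lambda may not be on the same scale as the lambda used by the elasticnet package, so the values should not be compared with the grid above.

```r
set.seed(1)
n <- 20
p <- 30   # more predictors than samples
X <- matrix(rnorm(n * p), nrow = n)
y <- rnorm(n)

# Hypothetical helper: closed-form ridge solution without an intercept,
# (X'X + lambda * I)^{-1} X'y
ridge_coef <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

# X has rank n < p, so X'X alone cannot be inverted
print(qr(X)$rank)

b_small <- ridge_coef(X, y, lambda = 0.1)
b_large <- ridge_coef(X, y, lambda = 10)

# A larger penalty shrinks the coefficient vector
print(c(small = sqrt(sum(b_small^2)), large = sqrt(sum(b_large^2))))
```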
Let's try an elastic net model.
enetGrid <- expand.grid(.lambda = c(0, 0.01, .1),.fraction = seq(.05, 1, length = 20))
set.seed(100)
enetTune <- train(X_train_lm, y_train,method = "enet",tuneGrid = enetGrid,trControl = ctrl)
enetTune
## Elasticnet
##
## 117 samples
## 388 predictors
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 105, 106, 105, 105, 105, 106, ...
## Resampling results across tuning parameters:
##
## lambda fraction RMSE Rsquared MAE
## 0.00 0.05 1.711668e+19 0.2298755 9.263289e+18
## 0.00 0.10 3.423267e+19 0.1865586 1.852815e+19
## 0.00 0.15 5.133658e+19 0.1635670 2.778953e+19
## 0.00 0.20 6.843741e+19 0.1532548 3.705001e+19
## 0.00 0.25 8.553824e+19 0.1455426 4.631049e+19
## 0.00 0.30 1.026391e+20 0.1400844 5.557097e+19
## 0.00 0.35 1.197399e+20 0.1358755 6.483145e+19
## 0.00 0.40 1.368407e+20 0.1336335 7.409193e+19
## 0.00 0.45 1.539416e+20 0.1324509 8.335241e+19
## 0.00 0.50 1.710424e+20 0.1322443 9.261289e+19
## 0.00 0.55 1.881432e+20 0.1326111 1.018734e+20
## 0.00 0.60 2.052441e+20 0.1330571 1.111339e+20
## 0.00 0.65 2.223449e+20 0.1333027 1.203943e+20
## 0.00 0.70 2.394458e+20 0.1322226 1.296548e+20
## 0.00 0.75 2.565466e+20 0.1312588 1.389153e+20
## 0.00 0.80 2.736475e+20 0.1312852 1.481758e+20
## 0.00 0.85 2.907483e+20 0.1368224 1.574363e+20
## 0.00 0.90 3.078491e+20 0.1425114 1.666967e+20
## 0.00 0.95 3.249500e+20 0.1470285 1.759572e+20
## 0.00 1.00 3.420508e+20 0.1511077 1.852177e+20
## 0.01 0.05 1.172782e+01 0.5088716 8.659022e+00
## 0.01 0.10 1.195538e+01 0.4768486 8.923971e+00
## 0.01 0.15 1.232927e+01 0.4433672 9.321255e+00
## 0.01 0.20 1.258429e+01 0.4269542 9.477163e+00
## 0.01 0.25 1.300489e+01 0.4092402 9.756145e+00
## 0.01 0.30 1.351828e+01 0.3893960 1.014309e+01
## 0.01 0.35 1.402245e+01 0.3697915 1.050417e+01
## 0.01 0.40 1.437275e+01 0.3573559 1.078588e+01
## 0.01 0.45 1.468341e+01 0.3491428 1.102210e+01
## 0.01 0.50 1.497927e+01 0.3425462 1.123914e+01
## 0.01 0.55 1.529098e+01 0.3336770 1.144284e+01
## 0.01 0.60 1.562572e+01 0.3224807 1.167871e+01
## 0.01 0.65 1.596847e+01 0.3119691 1.190944e+01
## 0.01 0.70 1.625201e+01 0.3050226 1.212393e+01
## 0.01 0.75 1.650293e+01 0.3002499 1.228912e+01
## 0.01 0.80 1.671373e+01 0.2971098 1.241612e+01
## 0.01 0.85 1.692242e+01 0.2930528 1.253676e+01
## 0.01 0.90 1.711738e+01 0.2894612 1.267407e+01
## 0.01 0.95 1.720204e+01 0.2886183 1.273618e+01
## 0.01 1.00 1.725984e+01 0.2883368 1.277181e+01
## 0.10 0.05 1.241968e+01 0.5109244 9.698252e+00
## 0.10 0.10 1.157462e+01 0.5189631 8.349230e+00
## 0.10 0.15 1.174377e+01 0.4996312 8.602587e+00
## 0.10 0.20 1.201987e+01 0.4802363 8.842522e+00
## 0.10 0.25 1.229078e+01 0.4606681 9.143619e+00
## 0.10 0.30 1.251363e+01 0.4442865 9.357213e+00
## 0.10 0.35 1.267395e+01 0.4356267 9.505066e+00
## 0.10 0.40 1.288603e+01 0.4261813 9.692488e+00
## 0.10 0.45 1.310680e+01 0.4184598 9.872842e+00
## 0.10 0.50 1.335035e+01 0.4096525 1.004887e+01
## 0.10 0.55 1.356841e+01 0.4019001 1.021269e+01
## 0.10 0.60 1.376548e+01 0.3953202 1.036546e+01
## 0.10 0.65 1.392382e+01 0.3906756 1.048748e+01
## 0.10 0.70 1.403235e+01 0.3885828 1.057177e+01
## 0.10 0.75 1.412779e+01 0.3872153 1.064735e+01
## 0.10 0.80 1.421758e+01 0.3858977 1.070786e+01
## 0.10 0.85 1.430631e+01 0.3845511 1.077067e+01
## 0.10 0.90 1.437890e+01 0.3835340 1.081909e+01
## 0.10 0.95 1.444966e+01 0.3821148 1.086451e+01
## 0.10 1.00 1.451263e+01 0.3807330 1.090971e+01
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were fraction = 0.1 and lambda = 0.1.
enet_preds <- predict(enetTune, newdata = X_test_lm)
enet_results_test <- data.frame(obs = y_test,pred = enet_preds)
defaultSummary(enet_results_test)
## RMSE Rsquared MAE
## 11.4641223 0.3528985 7.4471473
We get a test set R squared of 0.35, which is not better than PLS or ridge.
Yes, I would recommend the PLS model for the experiment, because it gave the highest test set R squared (0.58) and the lowest test set RMSE (9.60) of the models tried. The ridge model with lambda 0.1 is the closest alternative, with an R squared of 0.53 and an RMSE of 11.21.
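To compare the models side by side, the test-set metrics printed by defaultSummary() above can be collected into one table and sorted. The data frame name test_results is illustrative; the numbers are copied from the outputs above and rounded to three decimals.

```r
# Test-set metrics copied from the defaultSummary() outputs above
test_results <- data.frame(
  model    = c("PLS", "Linear model", "Ridge", "Elastic net"),
  RMSE     = c(9.604, 30.309, 11.215, 11.464),
  Rsquared = c(0.578, 0.213, 0.531, 0.353),
  MAE      = c(7.117, 20.496, 8.743, 7.447)
)

# Sort from lowest to highest test RMSE
ranked <- test_results[order(test_results$RMSE), ]
print(ranked, row.names = FALSE)
```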
A chemical manufacturing process for a pharmaceutical product was discussed in Sect. 1.4. In this problem, the objective is to understand the relationship between biological measurements of the raw materials (predictors), measurements of the manufacturing process (predictors), and the response of product yield. Biological predictors cannot be changed but can be used to assess the quality of the raw material before processing. On the other hand, manufacturing process predictors can be changed in the manufacturing process. Improving product yield by 1% will boost revenue by approximately one hundred thousand dollars per batch:
Start R and use these commands to load the data:
library(AppliedPredictiveModeling)
data(chemicalManufacturing)
The matrix processPredictors contains the 57 predictors (12 describing the input biological material and 45 describing the process predictors) for the 176 manufacturing runs. yield contains the percent yield for each run.
data(ChemicalManufacturingProcess)
A small percentage of cells in the predictor set contain missing values. Use an imputation function to fill in these missing values (e.g., see Sect. 3.8).
sum(is.na(ChemicalManufacturingProcess))
## [1] 106
colSums(is.na(ChemicalManufacturingProcess))
## Yield BiologicalMaterial01 BiologicalMaterial02
## 0 0 0
## BiologicalMaterial03 BiologicalMaterial04 BiologicalMaterial05
## 0 0 0
## BiologicalMaterial06 BiologicalMaterial07 BiologicalMaterial08
## 0 0 0
## BiologicalMaterial09 BiologicalMaterial10 BiologicalMaterial11
## 0 0 0
## BiologicalMaterial12 ManufacturingProcess01 ManufacturingProcess02
## 0 1 3
## ManufacturingProcess03 ManufacturingProcess04 ManufacturingProcess05
## 15 1 1
## ManufacturingProcess06 ManufacturingProcess07 ManufacturingProcess08
## 2 1 1
## ManufacturingProcess09 ManufacturingProcess10 ManufacturingProcess11
## 0 9 10
## ManufacturingProcess12 ManufacturingProcess13 ManufacturingProcess14
## 1 0 1
## ManufacturingProcess15 ManufacturingProcess16 ManufacturingProcess17
## 0 0 0
## ManufacturingProcess18 ManufacturingProcess19 ManufacturingProcess20
## 0 0 0
## ManufacturingProcess21 ManufacturingProcess22 ManufacturingProcess23
## 0 1 1
## ManufacturingProcess24 ManufacturingProcess25 ManufacturingProcess26
## 1 5 5
## ManufacturingProcess27 ManufacturingProcess28 ManufacturingProcess29
## 5 5 5
## ManufacturingProcess30 ManufacturingProcess31 ManufacturingProcess32
## 5 5 0
## ManufacturingProcess33 ManufacturingProcess34 ManufacturingProcess35
## 5 5 5
## ManufacturingProcess36 ManufacturingProcess37 ManufacturingProcess38
## 5 0 0
## ManufacturingProcess39 ManufacturingProcess40 ManufacturingProcess41
## 0 1 1
## ManufacturingProcess42 ManufacturingProcess43 ManufacturingProcess44
## 0 0 0
## ManufacturingProcess45
## 0
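We have 106 missing values spread across many of the manufacturing process predictors. Let's use KNN imputation via the preProcess function in the caret package. Note that method = "knnImpute" also centers and scales the columns, and because the whole data frame is passed in, Yield is standardized too, as the head() output below shows.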
# Create a pre-processing object to apply KNN imputation
preProc <- preProcess(ChemicalManufacturingProcess, method = "knnImpute")
# Apply the imputation to the data
imputed_data <- predict(preProc, ChemicalManufacturingProcess)
sum(is.na(imputed_data))
## [1] 0
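Because method = "knnImpute" centers and scales whatever columns it is given, one option is to pass only the predictors to preProcess() and reattach the untouched Yield column afterwards. This is a sketch under that assumption, not the approach used above; the names predictors, pp, and imputed_raw_yield are illustrative, and caret's KNN imputation may need the RANN package installed.

```r
library(caret)
library(AppliedPredictiveModeling)
data(ChemicalManufacturingProcess)

# Separate the response so it is not centered or scaled
predictors <- ChemicalManufacturingProcess[
  , names(ChemicalManufacturingProcess) != "Yield"
]

# Estimate the imputation (plus centering and scaling) on predictors only
pp <- preProcess(predictors, method = "knnImpute")
imputed_predictors <- predict(pp, predictors)

# Reattach Yield on its original percent scale
imputed_raw_yield <- cbind(
  Yield = ChemicalManufacturingProcess$Yield,
  imputed_predictors
)

print(sum(is.na(imputed_raw_yield)))
print(summary(imputed_raw_yield$Yield))
```

Keeping Yield in percent units makes later RMSE values directly interpretable as percentage points of yield.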
head(imputed_data)
## Yield BiologicalMaterial01 BiologicalMaterial02 BiologicalMaterial03
## 1 -1.1792673 -0.2261036 -1.5140979 -2.68303622
## 2 1.2263678 2.2391498 1.3089960 -0.05623504
## 3 1.0042258 2.2391498 1.3089960 -0.05623504
## 4 0.6737219 2.2391498 1.3089960 -0.05623504
## 5 1.2534583 1.4827653 1.8939391 1.13594780
## 6 1.8386128 -0.4081962 0.6620886 -0.59859075
## BiologicalMaterial04 BiologicalMaterial05 BiologicalMaterial06
## 1 0.2201765 0.4941942 -1.3828880
## 2 1.2964386 0.4128555 1.1290767
## 3 1.2964386 0.4128555 1.1290767
## 4 1.2964386 0.4128555 1.1290767
## 5 0.9414412 -0.3734185 1.5348350
## 6 1.5894524 1.7305423 0.6192092
## BiologicalMaterial07 BiologicalMaterial08 BiologicalMaterial09
## 1 -0.1313107 -1.233131 -3.3962895
## 2 -0.1313107 2.282619 -0.7227225
## 3 -0.1313107 2.282619 -0.7227225
## 4 -0.1313107 2.282619 -0.7227225
## 5 -0.1313107 1.071310 -0.1205678
## 6 -0.1313107 1.189487 -1.7343424
## BiologicalMaterial10 BiologicalMaterial11 BiologicalMaterial12
## 1 1.1005296 -1.838655 -1.7709224
## 2 1.1005296 1.393395 1.0989855
## 3 1.1005296 1.393395 1.0989855
## 4 1.1005296 1.393395 1.0989855
## 5 0.4162193 0.136256 1.0989855
## 6 1.6346255 1.022062 0.7240877
## ManufacturingProcess01 ManufacturingProcess02 ManufacturingProcess03
## 1 0.2154105 0.5662872 0.3765810
## 2 -6.1497028 -1.9692525 0.1979962
## 3 -6.1497028 -1.9692525 0.1087038
## 4 -6.1497028 -1.9692525 0.4658734
## 5 -0.2784345 -1.9692525 0.1087038
## 6 0.4348971 -1.9692525 0.5551658
## ManufacturingProcess04 ManufacturingProcess05 ManufacturingProcess06
## 1 0.5655598 -0.44593467 -0.5414997
## 2 -2.3669726 0.99933318 0.9625383
## 3 -3.1638563 0.06246417 -0.1117745
## 4 -3.3232331 0.42279841 2.1850322
## 5 -2.2075958 0.84537219 -0.6304083
## 6 -1.2513352 0.49486525 0.5550403
## ManufacturingProcess07 ManufacturingProcess08 ManufacturingProcess09
## 1 -0.1596700 -0.3095182 -1.7201524
## 2 -0.9580199 0.8941637 0.5883746
## 3 1.0378549 0.8941637 -0.3815947
## 4 -0.9580199 -1.1119728 -0.4785917
## 5 1.0378549 0.8941637 -0.4527258
## 6 1.0378549 0.8941637 -0.2199332
## ManufacturingProcess10 ManufacturingProcess11 ManufacturingProcess12
## 1 -0.07700901 -0.09157342 -0.4806937
## 2 0.52297397 1.08204765 -0.4806937
## 3 0.31428424 0.55112383 -0.4806937
## 4 -0.02483658 0.80261406 -0.4806937
## 5 -0.39004361 0.10403009 -0.4806937
## 6 0.28819802 1.41736795 -0.4806937
## ManufacturingProcess13 ManufacturingProcess14 ManufacturingProcess15
## 1 0.97711512 0.8093999 1.1846438
## 2 -0.50030980 0.2775205 0.9617071
## 3 0.28765016 0.4425865 0.8245152
## 4 0.28765016 0.7910592 1.0817499
## 5 0.09066017 2.5334227 3.3282665
## 6 -0.50030980 2.4050380 3.1396277
## ManufacturingProcess16 ManufacturingProcess17 ManufacturingProcess18
## 1 0.3303945 0.9263296 0.1505348
## 2 0.1455765 -0.2753953 0.1559773
## 3 0.1455765 0.3655246 0.1831898
## 4 0.1967569 0.3655246 0.1695836
## 5 0.4754056 -0.3555103 0.2076811
## 6 0.6261033 -0.7560852 0.1423710
## ManufacturingProcess19 ManufacturingProcess20 ManufacturingProcess21
## 1 0.4563798 0.3109942 0.2109804
## 2 1.5095063 0.1849230 0.2109804
## 3 1.0926437 0.1849230 0.2109804
## 4 0.9829430 0.1562704 0.2109804
## 5 1.6192070 0.2938027 -0.6884239
## 6 1.9044287 0.3998171 -0.5599376
## ManufacturingProcess22 ManufacturingProcess23 ManufacturingProcess24
## 1 0.05833309 0.8317688 0.8907291
## 2 -0.72230090 -1.8147683 -1.0060115
## 3 -0.42205706 -1.2132826 -0.8335805
## 4 -0.12181322 -0.6117969 -0.6611496
## 5 0.77891831 0.5911745 1.5804530
## 6 1.07916216 -1.2132826 -1.3508734
## ManufacturingProcess25 ManufacturingProcess26 ManufacturingProcess27
## 1 0.1200183 0.1256347 0.3460352
## 2 0.1093082 0.1966227 0.1906613
## 3 0.1842786 0.2159831 0.2104362
## 4 0.1708910 0.2052273 0.1906613
## 5 0.2726365 0.2912733 0.3432102
## 6 0.1146633 0.2417969 0.3516852
## ManufacturingProcess28 ManufacturingProcess29 ManufacturingProcess30
## 1 0.7826636 0.5943242 0.7566948
## 2 0.8779201 0.8347250 0.7566948
## 3 0.8588688 0.7746248 0.2444430
## 4 0.8588688 0.7746248 0.2444430
## 5 0.8969714 0.9549255 -0.1653585
## 6 0.9160227 1.0150257 0.9615956
## ManufacturingProcess31 ManufacturingProcess32 ManufacturingProcess33
## 1 -0.1952552 -0.4568829 0.9890307
## 2 -0.2672523 1.9517531 0.9890307
## 3 -0.1592567 2.6928719 0.9890307
## 4 -0.1592567 2.3223125 1.7943843
## 5 -0.1412574 2.3223125 2.5997378
## 6 -0.3572486 2.6928719 2.5997378
## ManufacturingProcess34 ManufacturingProcess35 ManufacturingProcess36
## 1 -1.7202722 -0.88694718 -0.6557774
## 2 1.9568096 1.14638329 -0.6557774
## 3 1.9568096 1.23880740 -1.8000420
## 4 0.1182687 0.03729394 -1.8000420
## 5 0.1182687 -2.55058120 -2.9443066
## 6 0.1182687 -0.51725073 -1.8000420
## ManufacturingProcess37 ManufacturingProcess38 ManufacturingProcess39
## 1 -1.1540243 0.7174727 0.2317270
## 2 2.2161351 -0.8224687 0.2317270
## 3 -0.7046697 -0.8224687 0.2317270
## 4 0.4187168 -0.8224687 0.2317270
## 5 -1.8280562 -0.8224687 0.2981503
## 6 -1.3787016 -0.8224687 0.2317270
## ManufacturingProcess40 ManufacturingProcess41 ManufacturingProcess42
## 1 0.05969714 -0.06900773 0.20279570
## 2 2.14909691 2.34626280 -0.05472265
## 3 -0.46265281 -0.44058781 0.40881037
## 4 -0.46265281 -0.44058781 -0.31224099
## 5 -0.46265281 -0.44058781 -0.10622632
## 6 -0.46265281 -0.44058781 0.15129203
## ManufacturingProcess43 ManufacturingProcess44 ManufacturingProcess45
## 1 2.40564734 -0.01588055 0.64371849
## 2 -0.01374656 0.29467248 0.15220242
## 3 0.10146268 -0.01588055 0.39796046
## 4 0.21667191 -0.01588055 -0.09355562
## 5 0.21667191 -0.32643359 -0.09355562
## 6 1.48397347 -0.01588055 -0.33931365
Now we have no missing values in our dataset.
Split the data into a training and a test set, pre-process the data, and tune a model of your choice from this chapter. What is the optimal value of the performance metric?
First, split the data into a training set and a test set with a 70/30 split.
set.seed(123)
train_index <- createDataPartition(imputed_data$Yield, p = 0.7, list = FALSE)
X_train <- imputed_data[train_index, ]
X_train <- X_train %>%
select(-Yield)
y_train <- imputed_data$Yield[train_index]
X_test <- imputed_data[-train_index, ] %>%
select(-Yield)
y_test <- imputed_data$Yield[-train_index]
head(X_test)
## BiologicalMaterial01 BiologicalMaterial02 BiologicalMaterial03
## 1 -0.2261036 -1.514098 -2.68303622
## 2 2.2391498 1.308996 -0.05623504
## 3 2.2391498 1.308996 -0.05623504
## 4 2.2391498 1.308996 -0.05623504
## 5 1.4827653 1.893939 1.13594780
## 10 0.7403878 1.960861 1.08846043
## BiologicalMaterial04 BiologicalMaterial05 BiologicalMaterial06
## 1 0.2201765 0.4941942 -1.382888
## 2 1.2964386 0.4128555 1.129077
## 3 1.2964386 0.4128555 1.129077
## 4 1.2964386 0.4128555 1.129077
## 5 0.9414412 -0.3734185 1.534835
## 10 1.8881010 0.4453910 1.550852
## BiologicalMaterial07 BiologicalMaterial08 BiologicalMaterial09
## 1 -0.1313107 -1.233131 -3.3962895
## 2 -0.1313107 2.282619 -0.7227225
## 3 -0.1313107 2.282619 -0.7227225
## 4 -0.1313107 2.282619 -0.7227225
## 5 -0.1313107 1.071310 -0.1205678
## 10 -0.1313107 2.001950 0.6742764
## BiologicalMaterial10 BiologicalMaterial11 BiologicalMaterial12
## 1 1.1005296 -1.838655 -1.770922
## 2 1.1005296 1.393395 1.098986
## 3 1.1005296 1.393395 1.098986
## 4 1.1005296 1.393395 1.098986
## 5 0.4162193 0.136256 1.098986
## 10 1.7514590 1.503343 1.616086
## ManufacturingProcess01 ManufacturingProcess02 ManufacturingProcess03
## 1 0.2154105 0.5662872 0.3765810
## 2 -6.1497028 -1.9692525 0.1979962
## 3 -6.1497028 -1.9692525 0.1087038
## 4 -6.1497028 -1.9692525 0.4658734
## 5 -0.2784345 -1.9692525 0.1087038
## 10 0.4348971 -1.9692525 0.4658734
## ManufacturingProcess04 ManufacturingProcess05 ManufacturingProcess06
## 1 0.5655598 -0.44593467 -0.5414997
## 2 -2.3669726 0.99933318 0.9625383
## 3 -3.1638563 0.06246417 -0.1117745
## 4 -3.3232331 0.42279841 2.1850322
## 5 -2.2075958 0.84537219 -0.6304083
## 10 0.9799394 0.06901570 0.8884478
## ManufacturingProcess07 ManufacturingProcess08 ManufacturingProcess09
## 1 -0.1596700 -0.3095182 -1.7201524
## 2 -0.9580199 0.8941637 0.5883746
## 3 1.0378549 0.8941637 -0.3815947
## 4 -0.9580199 -1.1119728 -0.4785917
## 5 1.0378549 0.8941637 -0.4527258
## 10 -0.9580199 -1.1119728 0.9375635
## ManufacturingProcess10 ManufacturingProcess11 ManufacturingProcess12
## 1 -0.07700901 -0.09157342 -0.4806937
## 2 0.52297397 1.08204765 -0.4806937
## 3 0.31428424 0.55112383 -0.4806937
## 4 -0.02483658 0.80261406 -0.4806937
## 5 -0.39004361 0.10403009 -0.4806937
## 10 1.20121560 1.13793436 -0.4806937
## ManufacturingProcess13 ManufacturingProcess14 ManufacturingProcess15
## 1 0.97711512 0.8093999 1.1846438
## 2 -0.50030980 0.2775205 0.9617071
## 3 0.28765016 0.4425865 0.8245152
## 4 0.28765016 0.7910592 1.0817499
## 5 0.09066017 2.5334227 3.3282665
## 10 -0.20482482 -0.1443149 0.6530254
## ManufacturingProcess16 ManufacturingProcess17 ManufacturingProcess18
## 1 0.3303945 0.9263296 0.15053478
## 2 0.1455765 -0.2753953 0.15597729
## 3 0.1455765 0.3655246 0.18318982
## 4 0.1967569 0.3655246 0.16958356
## 5 0.4754056 -0.3555103 0.20768110
## 10 0.1370464 0.7660996 0.08250345
## ManufacturingProcess19 ManufacturingProcess20 ManufacturingProcess21
## 1 0.4563798 0.3109942 0.2109804
## 2 1.5095063 0.1849230 0.2109804
## 3 1.0926437 0.1849230 0.2109804
## 4 0.9829430 0.1562704 0.2109804
## 5 1.6192070 0.2938027 -0.6884239
## 10 1.3778655 0.1648662 1.4958436
## ManufacturingProcess22 ManufacturingProcess23 ManufacturingProcess24
## 1 0.05833309 0.8317688 0.8907291
## 2 -0.72230090 -1.8147683 -1.0060115
## 3 -0.42205706 -1.2132826 -0.8335805
## 4 -0.12181322 -0.6117969 -0.6611496
## 5 0.77891831 0.5911745 1.5804530
## 10 -0.42205706 -1.2132826 -0.8335805
## ManufacturingProcess25 ManufacturingProcess26 ManufacturingProcess27
## 1 0.1200183 0.1256347 0.3460352
## 2 0.1093082 0.1966227 0.1906613
## 3 0.1842786 0.2159831 0.2104362
## 4 0.1708910 0.2052273 0.1906613
## 5 0.2726365 0.2912733 0.3432102
## 10 0.1735685 0.2568549 0.2471609
## ManufacturingProcess28 ManufacturingProcess29 ManufacturingProcess30
## 1 0.7826636 0.5943242 0.7566948
## 2 0.8779201 0.8347250 0.7566948
## 3 0.8588688 0.7746248 0.2444430
## 4 0.8588688 0.7746248 0.2444430
## 5 0.8969714 0.9549255 -0.1653585
## 10 0.9160227 1.0150257 0.6542445
## ManufacturingProcess31 ManufacturingProcess32 ManufacturingProcess33
## 1 -0.1952552 -0.4568829 0.9890307
## 2 -0.2672523 1.9517531 0.9890307
## 3 -0.1592567 2.6928719 0.9890307
## 4 -0.1592567 2.3223125 1.7943843
## 5 -0.1412574 2.3223125 2.5997378
## 10 -0.3032508 1.0253547 0.9890307
## ManufacturingProcess34 ManufacturingProcess35 ManufacturingProcess36
## 1 -1.7202722 -0.88694718 -0.6557774
## 2 1.9568096 1.14638329 -0.6557774
## 3 1.9568096 1.23880740 -1.8000420
## 4 0.1182687 0.03729394 -1.8000420
## 5 0.1182687 -2.55058120 -2.9443066
## 10 0.1182687 -0.70209896 -0.6557774
## ManufacturingProcess37 ManufacturingProcess38 ManufacturingProcess39
## 1 -1.1540243 0.7174727 0.2317270
## 2 2.2161351 -0.8224687 0.2317270
## 3 -0.7046697 -0.8224687 0.2317270
## 4 0.4187168 -0.8224687 0.2317270
## 5 -1.8280562 -0.8224687 0.2981503
## 10 1.7667805 0.7174727 0.1653036
## ManufacturingProcess40 ManufacturingProcess41 ManufacturingProcess42
## 1 0.05969714 -0.06900773 0.20279570
## 2 2.14909691 2.34626280 -0.05472265
## 3 -0.46265281 -0.44058781 0.40881037
## 4 -0.46265281 -0.44058781 -0.31224099
## 5 -0.46265281 -0.44058781 -0.10622632
## 10 -0.46265281 -0.44058781 0.04828469
## ManufacturingProcess43 ManufacturingProcess44 ManufacturingProcess45
## 1 2.40564734 -0.01588055 0.64371849
## 2 -0.01374656 0.29467248 0.15220242
## 3 0.10146268 -0.01588055 0.39796046
## 4 0.21667191 -0.01588055 -0.09355562
## 5 0.21667191 -0.32643359 -0.09355562
## 10 -0.12895579 0.29467248 0.64371849
head(y_test)
## [1] -1.1792673 1.2263678 1.0042258 0.6737219 1.2534583 1.2317859
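One caveat about the workflow above: the imputation was estimated on the full data set before splitting, so the test rows influenced the imputed and scaled values. A minimal sketch of the alternative, splitting first and estimating the imputation on the training rows only, is shown below. The toy data frame and its column names (x1, x2, y) are hypothetical and stand in for the real predictors and Yield.

```r
library(caret)

# Hypothetical toy data: two predictors with a few missing values
set.seed(1)
toy <- data.frame(
  x1 = rnorm(40),
  x2 = rnorm(40),
  y  = rnorm(40)
)
toy$x1[c(3, 17)] <- NA
toy$x2[c(8, 30)] <- NA

# Split first, stratifying on the response
idx <- createDataPartition(toy$y, p = 0.7, list = FALSE)
train_x <- toy[idx, c("x1", "x2")]
test_x  <- toy[-idx, c("x1", "x2")]

# Estimate the imputation (and its implied centering/scaling) on training rows only
pp <- preProcess(train_x, method = "knnImpute")

# Apply the same fitted transformation to both sets
train_x_imp <- predict(pp, train_x)
test_x_imp  <- predict(pp, test_x)

# Count of remaining missing values in each set
c(train = sum(is.na(train_x_imp)), test = sum(is.na(test_x_imp)))
```

Passing only the predictor columns to preProcess also leaves the response on its original scale, which keeps the RMSE in the original units.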
Now I will pre-process the data and tune an elastic net model. The predictors are centered and scaled inside train() through the preProcess argument.
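To make the two tuning parameters concrete, here is a small sketch of the elastic net penalty in the parameterization glmnet uses: alpha mixes the ridge and lasso penalties and lambda sets the overall strength. The helper name enet_penalty and the coefficient vector are hypothetical and only serve as illustration.

```r
# Elastic net penalty as parameterized by glmnet:
# lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
enet_penalty <- function(beta, alpha, lambda) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(0.5, -1, 0)  # hypothetical coefficients

ridge_pen <- enet_penalty(beta, alpha = 0, lambda = 1)  # pure ridge: 0.5 * (0.25 + 1) = 0.625
lasso_pen <- enet_penalty(beta, alpha = 1, lambda = 1)  # pure lasso: 0.5 + 1 = 1.5
mixed_pen <- enet_penalty(beta, alpha = 0.5, lambda = 1)  # halfway between the two extremes

c(ridge = ridge_pen, lasso = lasso_pen, mixed = mixed_pen)
```

The grid below therefore sweeps from pure ridge (alpha = 0) to pure lasso (alpha = 1) at 20 penalty strengths each.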
# Set up cross-validation
train_control <- trainControl(method = "cv", number = 10)
# Grid of alpha (0 = ridge, 1 = lasso) and lambda values
elastic_grid <- expand.grid(
alpha = seq(0, 1, length = 11), # alpha from 0 to 1
lambda = 10^seq(-3, 1, length = 20) # lambda from 0.001 to 10
)
# Train Elastic Net model
elastic_model <- train(
x = X_train,
y = y_train,
method = "glmnet",
preProcess = c("center", "scale"),
tuneGrid = elastic_grid,
trControl = train_control,
metric = "RMSE"
)
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
## : There were missing values in resampled performance measures.
print(elastic_model)
## glmnet
##
## 124 samples
## 57 predictor
##
## Pre-processing: centered (57), scaled (57)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 112, 112, 111, 112, 112, 112, ...
## Resampling results across tuning parameters:
##
## alpha lambda RMSE Rsquared MAE
## 0.0 0.001000000 1.3278634 0.57844003 0.7322428
## 0.0 0.001623777 1.3278634 0.57844003 0.7322428
## 0.0 0.002636651 1.3278634 0.57844003 0.7322428
## 0.0 0.004281332 1.3278634 0.57844003 0.7322428
## 0.0 0.006951928 1.3278634 0.57844003 0.7322428
## 0.0 0.011288379 1.3278634 0.57844003 0.7322428
## 0.0 0.018329807 1.3278634 0.57844003 0.7322428
## 0.0 0.029763514 1.3278634 0.57844003 0.7322428
## 0.0 0.048329302 1.3278634 0.57844003 0.7322428
## 0.0 0.078475997 1.2300121 0.58560774 0.6990532
## 0.0 0.127427499 1.0757928 0.58809733 0.6543799
## 0.0 0.206913808 0.9352806 0.60056785 0.6118112
## 0.0 0.335981829 0.8128609 0.62111809 0.5750638
## 0.0 0.545559478 0.7230554 0.63363659 0.5510543
## 0.0 0.885866790 0.6801405 0.62672074 0.5498386
## 0.0 1.438449888 0.6806812 0.61159822 0.5620915
## 0.0 2.335721469 0.7140542 0.58759375 0.5801108
## 0.0 3.792690191 0.7590269 0.54773764 0.6178966
## 0.0 6.158482111 0.8017282 0.51054849 0.6548829
## 0.0 10.000000000 0.8413307 0.48185532 0.6898674
## 0.1 0.001000000 3.0087671 0.48936502 1.2435410
## 0.1 0.001623777 2.8062625 0.48849372 1.1863329
## 0.1 0.002636651 2.5945288 0.48817210 1.1280650
## 0.1 0.004281332 2.4034946 0.48885555 1.0730579
## 0.1 0.006951928 2.1813942 0.49041605 1.0103106
## 0.1 0.011288379 1.9152732 0.49521083 0.9325039
## 0.1 0.018329807 1.5953278 0.51530578 0.8346973
## 0.1 0.029763514 1.3130720 0.55484985 0.7418870
## 0.1 0.048329302 1.1663597 0.51991223 0.6933085
## 0.1 0.078475997 0.9962708 0.55920964 0.6348725
## 0.1 0.127427499 0.8321493 0.59661714 0.5787555
## 0.1 0.206913808 0.7480890 0.64693615 0.5527826
## 0.1 0.335981829 0.7261603 0.63678699 0.5560211
## 0.1 0.545559478 0.6958242 0.63573021 0.5606597
## 0.1 0.885866790 0.6864220 0.62948205 0.5732344
## 0.1 1.438449888 0.7514296 0.59530876 0.6243326
## 0.1 2.335721469 0.8328162 0.58876228 0.6788964
## 0.1 3.792690191 0.9445853 0.57391567 0.7618832
## 0.1 6.158482111 1.0196045 0.25071699 0.8202739
## 0.1 10.000000000 1.0198547 NaN 0.8204133
## 0.2 0.001000000 2.8369084 0.48956677 1.1961205
## 0.2 0.001623777 2.6200354 0.48872833 1.1352437
## 0.2 0.002636651 2.3894620 0.48842518 1.0712227
## 0.2 0.004281332 2.2369144 0.48838790 1.0285503
## 0.2 0.006951928 2.0607848 0.48835277 0.9794875
## 0.2 0.011288379 1.7394811 0.49930887 0.8838380
## 0.2 0.018329807 1.3859265 0.55753389 0.7686266
## 0.2 0.029763514 1.2022810 0.51134306 0.7104849
## 0.2 0.048329302 0.9600991 0.56014169 0.6246727
## 0.2 0.078475997 0.8012848 0.58112656 0.5712338
## 0.2 0.127427499 0.6980526 0.65102489 0.5364270
## 0.2 0.206913808 0.6855033 0.64305293 0.5386345
## 0.2 0.335981829 0.6564882 0.64706234 0.5388780
## 0.2 0.545559478 0.6912773 0.62073411 0.5738587
## 0.2 0.885866790 0.7439466 0.60432458 0.6204757
## 0.2 1.438449888 0.8388899 0.59629036 0.6819404
## 0.2 2.335721469 0.9732801 0.54788151 0.7833792
## 0.2 3.792690191 1.0198547 NaN 0.8204133
## 0.2 6.158482111 1.0198547 NaN 0.8204133
## 0.2 10.000000000 1.0198547 NaN 0.8204133
## 0.3 0.001000000 2.6175980 0.49046849 1.1352789
## 0.3 0.001623777 2.4094439 0.48959488 1.0771305
## 0.3 0.002636651 2.2668367 0.48920268 1.0372521
## 0.3 0.004281332 2.2159958 0.48698178 1.0246847
## 0.3 0.006951928 1.9921816 0.48719973 0.9612176
## 0.3 0.011288379 1.5431340 0.52647945 0.8263773
## 0.3 0.018329807 1.3069131 0.51163221 0.7482595
## 0.3 0.029763514 1.0266852 0.53947924 0.6540259
## 0.3 0.048329302 0.8605860 0.56780265 0.5917000
## 0.3 0.078475997 0.7098592 0.61085249 0.5455314
## 0.3 0.127427499 0.6565133 0.64785418 0.5266847
## 0.3 0.206913808 0.6486920 0.64984376 0.5308735
## 0.3 0.335981829 0.6733742 0.63050094 0.5562003
## 0.3 0.545559478 0.7169880 0.60901626 0.5990750
## 0.3 0.885866790 0.7989384 0.60137281 0.6537125
## 0.3 1.438449888 0.9368342 0.56876327 0.7539952
## 0.3 2.335721469 1.0198547 NaN 0.8204133
## 0.3 3.792690191 1.0198547 NaN 0.8204133
## 0.3 6.158482111 1.0198547 NaN 0.8204133
## 0.3 10.000000000 1.0198547 NaN 0.8204133
## 0.4 0.001000000 2.4216218 0.49126526 1.0805883
## 0.4 0.001623777 2.2506819 0.49029862 1.0329423
## 0.4 0.002636651 2.2488466 0.48882061 1.0330879
## 0.4 0.004281332 2.1745715 0.48475964 1.0141832
## 0.4 0.006951928 1.8307266 0.49020008 0.9157939
## 0.4 0.011288379 1.4143658 0.55623674 0.7785372
## 0.4 0.018329807 1.2032051 0.50379335 0.7163990
## 0.4 0.029763514 0.9467232 0.55482025 0.6221988
## 0.4 0.048329302 0.7724333 0.58089073 0.5642119
## 0.4 0.078475997 0.6327641 0.65837670 0.5152799
## 0.4 0.127427499 0.6442275 0.65030477 0.5252044
## 0.4 0.206913808 0.6591120 0.64257879 0.5389385
## 0.4 0.335981829 0.6929674 0.61175246 0.5779363
## 0.4 0.545559478 0.7459763 0.61047037 0.6154863
## 0.4 0.885866790 0.8631670 0.58751789 0.6963850
## 0.4 1.438449888 1.0100894 0.44491963 0.8123530
## 0.4 2.335721469 1.0198547 NaN 0.8204133
## 0.4 3.792690191 1.0198547 NaN 0.8204133
## 0.4 6.158482111 1.0198547 NaN 0.8204133
## 0.4 10.000000000 1.0198547 NaN 0.8204133
## 0.5 0.001000000 2.2404096 0.49251034 1.0302696
## 0.5 0.001623777 2.2081982 0.49103984 1.0210415
## 0.5 0.002636651 2.2386623 0.48798706 1.0311877
## 0.5 0.004281332 2.1484796 0.48252147 1.0079457
## 0.5 0.006951928 1.6781831 0.50014704 0.8714623
## 0.5 0.011288379 1.3474871 0.52942518 0.7645652
## 0.5 0.018329807 1.0606103 0.52610095 0.6688384
## 0.5 0.029763514 0.8884373 0.56095049 0.6023078
## 0.5 0.048329302 0.7257194 0.59467872 0.5517087
## 0.5 0.078475997 0.6383156 0.65165076 0.5206514
## 0.5 0.127427499 0.6500751 0.64859065 0.5296612
## 0.5 0.206913808 0.6713140 0.62787604 0.5526607
## 0.5 0.335981829 0.7037244 0.61384881 0.5874274
## 0.5 0.545559478 0.7848592 0.59567070 0.6421593
## 0.5 0.885866790 0.9262629 0.55984204 0.7454041
## 0.5 1.438449888 1.0198547 NaN 0.8204133
## 0.5 2.335721469 1.0198547 NaN 0.8204133
## 0.5 3.792690191 1.0198547 NaN 0.8204133
## 0.5 6.158482111 1.0198547 NaN 0.8204133
## 0.5 10.000000000 1.0198547 NaN 0.8204133
## 0.6 0.001000000 2.0969150 0.49392220 0.9900562
## 0.6 0.001623777 2.1654235 0.49183428 1.0093508
## 0.6 0.002636651 2.2085024 0.48686630 1.0233229
## 0.6 0.004281332 2.0506276 0.48180414 0.9809582
## 0.6 0.006951928 1.5615566 0.51413999 0.8358062
## 0.6 0.011288379 1.2789354 0.50059321 0.7449696
## 0.6 0.018329807 0.9966740 0.54783884 0.6407564
## 0.6 0.029763514 0.8282476 0.56812988 0.5821954
## 0.6 0.048329302 0.6710768 0.62110178 0.5349909
## 0.6 0.078475997 0.6416944 0.65012048 0.5228561
## 0.6 0.127427499 0.6564132 0.64396779 0.5341661
## 0.6 0.206913808 0.6819957 0.61514307 0.5662901
## 0.6 0.335981829 0.7222390 0.61017412 0.5962515
## 0.6 0.545559478 0.8226084 0.58914255 0.6674269
## 0.6 0.885866790 0.9846851 0.48368905 0.7915703
## 0.6 1.438449888 1.0198547 NaN 0.8204133
## 0.6 2.335721469 1.0198547 NaN 0.8204133
## 0.6 3.792690191 1.0198547 NaN 0.8204133
## 0.6 6.158482111 1.0198547 NaN 0.8204133
## 0.6 10.000000000 1.0198547 NaN 0.8204133
## 0.7 0.001000000 2.0215001 0.49554078 0.9693061
## 0.7 0.001623777 2.1444386 0.49219271 1.0046150
## 0.7 0.002636651 2.1838946 0.48508052 1.0171857
## 0.7 0.004281332 1.9353842 0.48307894 0.9490804
## 0.7 0.006951928 1.4604545 0.53214083 0.8027846
## 0.7 0.011288379 1.2189734 0.49925398 0.7249038
## 0.7 0.018329807 0.9532214 0.55112392 0.6256745
## 0.7 0.029763514 0.7877433 0.57449408 0.5701790
## 0.7 0.048329302 0.6326735 0.65531217 0.5186896
## 0.7 0.078475997 0.6460885 0.64799454 0.5270335
## 0.7 0.127427499 0.6613950 0.63602842 0.5396400
## 0.7 0.206913808 0.6885419 0.61069259 0.5754734
## 0.7 0.335981829 0.7449782 0.59888766 0.6130356
## 0.7 0.545559478 0.8608030 0.57863352 0.6941457
## 0.7 0.885866790 1.0189670 0.09719598 0.8199775
## 0.7 1.438449888 1.0198547 NaN 0.8204133
## 0.7 2.335721469 1.0198547 NaN 0.8204133
## 0.7 3.792690191 1.0198547 NaN 0.8204133
## 0.7 6.158482111 1.0198547 NaN 0.8204133
## 0.7 10.000000000 1.0198547 NaN 0.8204133
## 0.8 0.001000000 1.9803278 0.49693842 0.9581675
## 0.8 0.001623777 2.1269033 0.49236223 1.0011804
## 0.8 0.002636651 2.1534563 0.48306032 1.0091017
## 0.8 0.004281332 1.8191562 0.48669080 0.9157293
## 0.8 0.006951928 1.3787232 0.55237878 0.7715350
## 0.8 0.011288379 1.1040501 0.51194879 0.6868823
## 0.8 0.018329807 0.9195601 0.55494207 0.6139070
## 0.8 0.029763514 0.7583896 0.58073546 0.5623739
## 0.8 0.048329302 0.6391604 0.65043843 0.5224692
## 0.8 0.078475997 0.6500160 0.64608063 0.5293914
## 0.8 0.127427499 0.6676606 0.62659872 0.5482547
## 0.8 0.206913808 0.6946373 0.61378373 0.5792340
## 0.8 0.335981829 0.7654323 0.59338946 0.6257220
## 0.8 0.545559478 0.9003351 0.56183306 0.7239460
## 0.8 0.885866790 1.0198547 NaN 0.8204133
## 0.8 1.438449888 1.0198547 NaN 0.8204133
## 0.8 2.335721469 1.0198547 NaN 0.8204133
## 0.8 3.792690191 1.0198547 NaN 0.8204133
## 0.8 6.158482111 1.0198547 NaN 0.8204133
## 0.8 10.000000000 1.0198547 NaN 0.8204133
## 0.9 0.001000000 1.9360099 0.49913431 0.9459655
## 0.9 0.001623777 2.0962142 0.49255828 0.9929895
## 0.9 0.002636651 2.1839145 0.48064011 1.0180043
## 0.9 0.004281332 1.7254098 0.49143439 0.8881214
## 0.9 0.006951928 1.3165029 0.53446410 0.7579816
## 0.9 0.011288379 1.0471004 0.53190116 0.6640137
## 0.9 0.018329807 0.8813975 0.55919137 0.6005711
## 0.9 0.029763514 0.7273519 0.59069050 0.5536868
## 0.9 0.048329302 0.6420416 0.64830024 0.5232645
## 0.9 0.078475997 0.6530477 0.64438935 0.5305766
## 0.9 0.127427499 0.6759427 0.61614774 0.5573518
## 0.9 0.206913808 0.7051069 0.61029897 0.5824587
## 0.9 0.335981829 0.7856098 0.59041976 0.6392198
## 0.9 0.545559478 0.9443861 0.52063402 0.7589195
## 0.9 0.885866790 1.0198547 NaN 0.8204133
## 0.9 1.438449888 1.0198547 NaN 0.8204133
## 0.9 2.335721469 1.0198547 NaN 0.8204133
## 0.9 3.792690191 1.0198547 NaN 0.8204133
## 0.9 6.158482111 1.0198547 NaN 0.8204133
## 0.9 10.000000000 1.0198547 NaN 0.8204133
## 1.0 0.001000000 1.8979329 0.50150778 0.9353285
## 1.0 0.001623777 2.0631665 0.49271186 0.9840520
## 1.0 0.002636651 2.1613334 0.47862605 1.0119807
## 1.0 0.004281332 1.6409139 0.49699392 0.8628744
## 1.0 0.006951928 1.2820817 0.50448769 0.7494401
## 1.0 0.011288379 0.9991887 0.54497460 0.6423042
## 1.0 0.018329807 0.8521514 0.56329564 0.5903246
## 1.0 0.029763514 0.6896843 0.60626850 0.5417622
## 1.0 0.048329302 0.6445061 0.64705751 0.5251471
## 1.0 0.078475997 0.6575286 0.64052716 0.5329462
## 1.0 0.127427499 0.6815510 0.61052230 0.5646903
## 1.0 0.206913808 0.7160579 0.60493605 0.5893166
## 1.0 0.335981829 0.8073021 0.58486432 0.6547974
## 1.0 0.545559478 0.9848073 0.46274114 0.7908575
## 1.0 0.885866790 1.0198547 NaN 0.8204133
## 1.0 1.438449888 1.0198547 NaN 0.8204133
## 1.0 2.335721469 1.0198547 NaN 0.8204133
## 1.0 3.792690191 1.0198547 NaN 0.8204133
## 1.0 6.158482111 1.0198547 NaN 0.8204133
## 1.0 10.000000000 1.0198547 NaN 0.8204133
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 0.7 and lambda = 0.0483293.
plot(elastic_model)
I chose RMSE as my performance metric. The output above shows that the elastic net model with the lowest cross-validated RMSE has alpha = 0.7 and lambda = 0.0483293.
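The NaN entries in the Rsquared column, and the warning about missing values in resampled performance measures, are most likely the result of heavy regularization: at large lambda every coefficient is shrunk to zero, the model predicts a constant, and the correlation between predictions and observations is undefined. The identical RMSE of 1.0198547 in all of those rows is consistent with this. A minimal sketch with hypothetical numbers:

```r
obs  <- c(-1.2, 0.3, 0.8, 1.1, -0.4)  # hypothetical held-out responses
pred <- rep(mean(obs), length(obs))    # intercept-only model: constant prediction

rmse <- sqrt(mean((obs - pred)^2))          # still well defined
r2   <- suppressWarnings(cor(pred, obs)^2)  # NA, because pred has zero variance

c(RMSE = rmse, Rsquared = r2)
```

These rows can be ignored when choosing the tuning parameters, since RMSE remains defined for them and they are far from the minimum.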
elastic_model$results %>%
filter(RMSE == min(RMSE))
## alpha lambda RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 0.7 0.0483293 0.6326735 0.6553122 0.5186896 0.2545696 0.1801891 0.1793219
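For this model we get a cross-validated RMSE of 0.63 and an R-squared of 0.66. Because Yield was standardized together with the predictors during imputation, this RMSE is expressed in standard deviations of Yield rather than in the original yield units.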
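Since the knnImpute step standardized Yield, an RMSE on this scale can be converted back to the original units by multiplying by the standard deviation of the original response. The sketch below checks that identity on simulated data; the vectors yield and pred are hypothetical.

```r
set.seed(42)
yield <- rnorm(50, mean = 40, sd = 2)  # hypothetical response on its original scale
pred  <- yield + rnorm(50, sd = 1)     # hypothetical predictions

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Standardize observations and predictions with the response's mean and sd
m <- mean(yield)
s <- sd(yield)
rmse_std  <- rmse((yield - m) / s, (pred - m) / s)
rmse_orig <- rmse(yield, pred)

# The standardized RMSE times the sd recovers the original-scale RMSE
isTRUE(all.equal(rmse_std * s, rmse_orig))
```

For the model here, the analogous calculation would multiply 0.63 by sd(ChemicalManufacturingProcess$Yield).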
Predict the response for the test set. What is the value of the performance metric and how does this compare with the resampled performance metric on the training set?
elastic_preds <- predict(elastic_model, newdata = X_test)
test_metrics <- postResample(pred = elastic_preds, obs = y_test)
test_metrics
## RMSE Rsquared MAE
## 0.6205362 0.6118390 0.5122034
On the test set we get an RMSE of 0.62 and an R-squared of 0.61, very close to the cross-validated RMSE of 0.63 and R-squared of 0.66 from the training set. The gap is far smaller than the resampling standard deviation of the RMSE (about 0.25), so the model shows no sign of overfitting.
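For reference, the three numbers reported by postResample can be reproduced by hand: RMSE is the root mean squared error, Rsquared is the squared correlation between predictions and observations, and MAE is the mean absolute error. The vectors below are hypothetical stand-ins for y_test and elastic_preds.

```r
obs  <- c(1.0, -0.5, 0.3, 1.8, -1.2)  # hypothetical observed values
pred <- c(0.8, -0.1, 0.5, 1.2, -0.9)  # hypothetical predictions

manual <- c(
  RMSE     = sqrt(mean((obs - pred)^2)),
  Rsquared = cor(obs, pred)^2,
  MAE      = mean(abs(obs - pred))
)
manual
```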
Which predictors are most important in the model you have trained? Do either the biological or process predictors dominate the list?
var_imp <- varImp(elastic_model, scale = TRUE)
var_imp
## glmnet variable importance
##
## only 20 most important variables shown (out of 57)
##
## Overall
## ManufacturingProcess32 100.000
## ManufacturingProcess17 59.374
## ManufacturingProcess09 58.765
## ManufacturingProcess06 38.084
## ManufacturingProcess37 34.288
## BiologicalMaterial06 33.910
## ManufacturingProcess34 30.906
## ManufacturingProcess39 27.145
## ManufacturingProcess36 27.115
## ManufacturingProcess13 25.148
## ManufacturingProcess45 22.298
## BiologicalMaterial05 20.844
## ManufacturingProcess07 17.147
## ManufacturingProcess04 17.145
## ManufacturingProcess15 9.090
## ManufacturingProcess18 8.573
## ManufacturingProcess43 8.022
## ManufacturingProcess42 5.534
## ManufacturingProcess23 5.323
## ManufacturingProcess19 5.150
Of the top 20 predictors, 18 are manufacturing process variables and only 2 are biological materials. The two biological predictors have scaled importances of about 34 (BiologicalMaterial06) and 21 (BiologicalMaterial05), well below the leading process predictors, so the manufacturing process variables clearly dominate the list.
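The count can be checked programmatically rather than by eye. The sketch below hard-codes the 20 names from the varImp output above and groups them by prefix; in a live session the names could instead be taken from the row names of var_imp$importance after sorting.

```r
# Top 20 predictors, copied from the varImp output above
top20 <- c(
  "ManufacturingProcess32", "ManufacturingProcess17", "ManufacturingProcess09",
  "ManufacturingProcess06", "ManufacturingProcess37", "BiologicalMaterial06",
  "ManufacturingProcess34", "ManufacturingProcess39", "ManufacturingProcess36",
  "ManufacturingProcess13", "ManufacturingProcess45", "BiologicalMaterial05",
  "ManufacturingProcess07", "ManufacturingProcess04", "ManufacturingProcess15",
  "ManufacturingProcess18", "ManufacturingProcess43", "ManufacturingProcess42",
  "ManufacturingProcess23", "ManufacturingProcess19"
)

# Classify each predictor by its name prefix and tabulate
group <- ifelse(grepl("^Biological", top20), "Biological", "Manufacturing")
table(group)
```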
Explore the relationships between each of the top predictors and the response. How could this information be helpful in improving yield in future runs of the manufacturing process?
top_vars <- rownames(varImp(elastic_model)$importance)[order(-varImp(elastic_model)$importance$Overall)][1:6]
top_vars
## [1] "ManufacturingProcess32" "ManufacturingProcess17" "ManufacturingProcess09"
## [4] "ManufacturingProcess06" "ManufacturingProcess37" "BiologicalMaterial06"
These are the top 6 predictors, so let's use ggplot2 to explore their relationship to the response, Yield.
library(ggplot2)
# Plot each top variable against yield
# aes() with the .data pronoun replaces the deprecated aes_string()
for (var in top_vars) {
  print(
    ggplot(imputed_data, aes(x = .data[[var]], y = Yield)) +
      geom_point() +
      labs(title = paste("Yield vs", var), x = var)
  )
}
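A numeric summary can complement the scatter plots: the correlation of each top predictor with Yield gives the direction and rough strength of each linear relationship. The sketch below uses a simulated data frame with hypothetical columns ProcA and ProcB; with the real data, the same sapply call could be applied to imputed_data and top_vars.

```r
set.seed(7)
toy <- data.frame(Yield = rnorm(60))
toy$ProcA <- 0.8 * toy$Yield + rnorm(60, sd = 0.5)   # hypothetical positive relationship
toy$ProcB <- -0.6 * toy$Yield + rnorm(60, sd = 0.8)  # hypothetical negative relationship

vars <- c("ProcA", "ProcB")

# Pearson correlation of each predictor with the response
cors <- sapply(vars, function(v) cor(toy[[v]], toy$Yield))
sort(cors, decreasing = TRUE)
```

A positive correlation suggests that raising that process setting tends to go with higher yield, and a negative one the reverse, although correlation alone does not establish that changing the setting would change the yield.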
Looking at the scatter plots above, we can see some clear relationships between the predictors and the response, Yield. The relationships are described below: