23/10, 2018

Selección de modelos

Que variables explican la distribución del guanaco?

  • Selecciona el mejor modelo en regresión para distribución del guanaco
presence TempMedia TempMesCalido TempMesFrio TempRangoAnual PPAnual PPMesSeco PPMesHum
1 16.4 26.0 7.6 18.4 37 12 0
1 15.2 25.1 5.8 19.3 31 10 0
1 19.8 28.0 11.8 16.2 17 4 0
1 17.4 26.7 9.1 17.6 38 12 0
1 1.5 10.0 -5.7 15.7 1474 144 101
1 15.0 23.8 7.1 16.7 66 18 0
1 7.7 17.4 -0.8 18.2 696 91 32
1 6.9 16.8 -2.0 18.8 578 73 27
1 7.7 17.4 -0.8 18.2 696 91 32
1 1.5 10.0 -5.7 15.7 1474 144 101
1 7.7 17.4 -0.8 18.2 696 91 32
1 7.1 16.9 -1.7 18.6 604 77 28
1 7.3 17.4 -1.7 19.1 495 64 22
1 5.7 15.8 -2.9 18.7 791 87 49
1 8.1 22.0 -1.8 23.8 783 137 22
1 8.2 22.2 -1.7 23.9 760 139 21
1 5.0 13.0 -1.8 14.8 677 66 42
1 5.7 15.8 -2.9 18.7 791 87 49
1 5.7 15.8 -2.9 18.7 791 87 49
1 5.7 15.8 -2.9 18.7 791 87 49

Preparamos para ver los modelos

library(tidyverse)
library(MuMIn)
library(broom)
sp2 <- read_rds("sp2.rds")
SA <- read_rds("SA.rds")

Generar modelo global y lidiar con multicolinearidad

  • Modelo general
FitGlob <- glm(presence ~ ., family = binomial, 
    data = sp2)
  • Multicolinearidad
smat <- abs(cor(sp2[, -1])) <= 
    0.7
smat[!lower.tri(smat)] <- NA
  • Número de variables máxima
K = floor(nrow(sp2)/10)

Selección de modelos

options(na.action = "na.fail")
Selected <- dredge(FitGlob, subset = smat, 
    m.lim = c(0, K))
(Intercept) PPAnual PPMesHum PPMesSeco TempMedia TempMesCalido TempMesFrio TempRangoAnual df logLik AICc delta weight
76 7.010 -0.004 0.056 NA -0.257 NA NA -0.192 5 -169.152 348.356 0.000 1
87 6.547 NA 0.025 -0.024 NA -0.232 NA -0.053 5 -177.723 365.498 17.142 0
23 5.589 NA 0.027 -0.024 NA -0.243 NA NA 4 -178.758 365.550 17.194 0
84 6.790 -0.005 0.062 NA NA -0.229 NA -0.073 5 -178.775 367.602 19.246 0
20 5.452 -0.005 0.064 NA NA -0.243 NA NA 4 -180.696 369.426 21.070 0
74 8.048 -0.002 NA NA -0.295 NA NA -0.231 4 -183.025 374.084 25.728 0
12 2.854 -0.004 0.064 NA -0.280 NA NA NA 4 -183.432 374.898 26.542 0
85 7.650 NA NA -0.017 NA -0.241 NA -0.092 4 -184.972 377.980 29.624 0
21 5.952 NA NA -0.015 NA -0.257 NA NA 3 -188.641 383.302 34.946 0
75 6.040 NA -0.008 NA -0.333 NA NA -0.157 4 -197.941 403.916 55.560 0
73 5.041 NA NA NA -0.330 NA NA -0.121 3 -199.734 405.489 57.133 0
82 7.846 -0.002 NA NA NA -0.268 NA -0.093 4 -199.018 406.071 57.716 0
18 5.939 -0.001 NA NA NA -0.278 NA NA 3 -202.418 410.857 62.501 0
10 2.817 -0.001 NA NA -0.307 NA NA NA 3 -204.386 414.792 66.436 0
9 2.519 NA NA NA -0.323 NA NA NA 2 -208.132 420.274 71.918 0
11 2.468 NA 0.002 NA -0.323 NA NA NA 3 -208.020 422.061 73.705 0
19 5.884 NA -0.007 NA NA -0.313 NA NA 3 -217.442 440.905 92.549 0
17 5.638 NA NA NA NA -0.311 NA NA 2 -219.195 442.400 94.044 0
81 4.874 NA NA NA NA -0.310 NA 0.038 3 -218.326 442.672 94.316 0
97 4.874 NA NA NA NA NA -0.310 -0.272 3 -218.326 442.672 94.316 0
83 5.556 NA -0.006 NA NA -0.312 NA 0.015 4 -217.348 442.730 94.374 0
99 5.556 NA -0.006 NA NA NA -0.312 -0.298 4 -217.348 442.730 94.374 0
68 4.684 -0.007 0.087 NA NA NA NA -0.171 4 -237.073 482.180 133.824 0
71 4.588 NA 0.031 -0.036 NA NA NA -0.161 4 -238.559 485.152 136.796 0
35 -1.033 NA 0.010 NA NA NA -0.260 NA 3 -258.723 523.467 175.111 0
69 5.250 NA NA -0.027 NA NA NA -0.185 3 -259.255 524.531 176.175 0
7 0.875 NA 0.037 -0.034 NA NA NA NA 3 -260.103 526.227 177.871 0
4 0.751 -0.007 0.086 NA NA NA NA NA 3 -260.504 527.028 178.672 0
33 -0.721 NA NA NA NA NA -0.245 NA 2 -263.905 531.820 183.464 0
5 0.951 NA NA -0.021 NA NA NA NA 2 -292.561 589.133 240.777 0
66 4.059 -0.003 NA NA NA NA NA -0.153 3 -317.702 641.425 293.069 0
2 0.425 -0.002 NA NA NA NA NA NA 2 -343.323 690.657 342.301 0
65 -2.909 NA NA NA NA NA NA 0.059 2 -457.796 919.602 571.246 0
67 -2.749 NA -0.002 NA NA NA NA 0.054 3 -457.499 921.020 572.664 0
3 -1.656 NA -0.005 NA NA NA NA NA 2 -462.680 929.370 581.014 0
1 -1.833 NA NA NA NA NA NA NA 1 -465.380 932.764 584.408 0

El mejor modelo

options(na.action = "na.fail")
Selected <- dredge(FitGlob, subset = smat, 
    m.lim = c(0, K))
bestmodel <- get.models(Selected, 
    delta < 2)[[1]]
term estimate std.error statistic p.value
(Intercept) 7.010 0.939 7.462 0
PPAnual -0.004 0.001 -5.971 0
PPMesHum 0.056 0.012 4.732 0
TempMedia -0.257 0.027 -9.546 0
TempRangoAnual -0.192 0.039 -4.965 0

Hagamos cosas con el modelo

library(raster)
MapLM <- predict(SA, bestmodel, 
    type = "response")
plot(MapLM)

Respuestas?

NewData <- sp2 %>% dplyr::select(PPAnual, 
    PPMesHum, TempMedia, TempRangoAnual)
### Para temperatura media
NewDataTempMedia <- NewData %>% 
    mutate(PPAnual = mean(PPAnual), 
        PPMesHum = mean(PPMesHum), 
        TempRangoAnual = mean(TempRangoAnual))

Sigamos

NewDataTempMedia$TempMedia <- seq(from = min(NewDataTempMedia$TempMedia), 
    to = max(NewDataTempMedia$TempMedia), 
    length.out = nrow(NewDataTempMedia))
NewDataTempMedia$Prob <- predict(bestmodel, 
    NewDataTempMedia, type = "response")
NewDataTempMedia$se <- predict(bestmodel, 
    NewDataTempMedia, type = "response", 
    se.fit = TRUE)$se.fit

Que da esto?

Que variables explican que la gente salga a andar en bicicleta?

X1 Date Day High Temp (°F) Low Temp (°F) Precipitation Brooklyn Bridge Manhattan Bridge Williamsburg Bridge Queensboro Bridge Total
0 2016-04-01 2016-04-01 78.1 66.0 0.01 1704 3126 4115 2552 11497
1 2016-04-02 2016-04-02 55.0 48.9 0.15 827 1646 2565 1884 6922
2 2016-04-03 2016-04-03 39.9 34.0 0.09 526 1232 1695 1306 4759
3 2016-04-04 2016-04-04 44.1 33.1 0.47 (S) 521 1067 1440 1307 4335
4 2016-04-05 2016-04-05 42.1 26.1 0 1416 2617 3081 2357 9471
5 2016-04-06 2016-04-06 45.0 30.0 0 1885 3329 3856 2849 11919
6 2016-04-07 2016-04-07 57.0 53.1 0.09 1276 2581 3282 2457 9596
7 2016-04-08 2016-04-08 46.9 44.1 0.01 1982 3455 4113 3194 12744
8 2016-04-09 2016-04-09 43.0 37.9 0.09 504 997 1507 1502 4510
9 2016-04-10 2016-04-10 48.9 30.9 0 1447 2387 3132 2160 9126
10 2016-04-11 2016-04-11 62.1 46.0 0.01 2005 3791 4334 3182 13312
11 2016-04-12 2016-04-12 57.0 45.0 0.2 1045 2178 2762 2082 8067
12 2016-04-13 2016-04-13 57.0 39.9 0 2840 5395 5995 4192 18422
13 2016-04-14 2016-04-14 62.1 44.6 0 2861 5309 6030 4115 18315
14 2016-04-15 2016-04-15 64.0 44.1 0 2770 5072 5816 3912 17570
15 2016-04-16 2016-04-16 66.0 45.0 0 2384 4316 5624 4051 16375
16 2016-04-17 2016-04-17 73.9 46.0 0 3147 4969 5867 4197 18180
17 2016-04-18 2016-04-18 81.0 52.0 0 3871 6823 7432 4964 23090
18 2016-04-19 2016-04-19 71.1 63.0 0 3501 6951 7834 5032 23318
19 2016-04-20 2016-04-20 68.0 50.0 0 3450 6574 7639 4928 22591

Machine learning

Predecir la calidad del vino

  • Predecir quality type
fixed acidity volatile acidity citric acid residual sugar chlorides free sulfur dioxide total sulfur dioxide density pH sulphates alcohol quality
7.4 0.700 0.00 1.9 0.076 11 34 0.998 3.51 0.56 9.4 5
7.8 0.760 0.04 2.3 0.092 15 54 0.997 3.26 0.65 9.8 5
11.2 0.280 0.56 1.9 0.075 17 60 0.998 3.16 0.58 9.8 6
7.4 0.700 0.00 1.9 0.076 11 34 0.998 3.51 0.56 9.4 5
7.4 0.660 0.00 1.8 0.075 13 40 0.998 3.51 0.56 9.4 5
7.9 0.600 0.06 1.6 0.069 15 59 0.996 3.30 0.46 9.4 5
7.8 0.580 0.02 2.0 0.073 9 18 0.997 3.36 0.57 9.5 7
7.5 0.500 0.36 6.1 0.071 17 102 0.998 3.35 0.80 10.5 5
7.5 0.500 0.36 6.1 0.071 17 102 0.998 3.35 0.80 10.5 5
5.6 0.615 0.00 1.6 0.089 16 59 0.994 3.58 0.52 9.9 5
7.8 0.610 0.29 1.6 0.114 9 29 0.997 3.26 1.56 9.1 5
8.9 0.620 0.19 3.9 0.170 51 148 0.999 3.17 0.93 9.2 5
8.5 0.280 0.56 1.8 0.092 35 103 0.997 3.30 0.75 10.5 7
8.1 0.560 0.28 1.7 0.368 16 56 0.997 3.11 1.28 9.3 5
7.4 0.590 0.08 4.4 0.086 6 29 0.997 3.38 0.50 9.0 4
7.9 0.320 0.51 1.8 0.341 17 56 0.997 3.04 1.08 9.2 6
8.9 0.220 0.48 1.8 0.077 29 60 0.997 3.39 0.53 9.4 6
7.6 0.390 0.31 2.3 0.082 23 71 0.998 3.52 0.65 9.7 5
8.5 0.490 0.11 2.3 0.084 9 67 0.997 3.17 0.53 9.4 5
6.9 0.400 0.14 2.4 0.085 21 40 0.997 3.43 0.63 9.7 6

Predicción del destino de perros en refugio

  • Predecir outcome type
animal_id animal_type breed color datetime name outcome_type sex_upon_outcome Age
A666430 Dog Beagle Mix White/Brown 2013-11-07 11:47:00 Lucy Transfer Spayed Female 1.00
A675708 Dog Pit Bull Blue/White 2014-06-03 14:20:00 *Johnny Adoption Neutered Male 1.18
A680386 Dog Miniature Schnauzer Mix White 2014-06-15 15:50:00 Monday Transfer Neutered Male 9.04
A692618 Dog Chihuahua Shorthair Mix Brown 2014-12-08 15:55:00 *Ella Transfer Spayed Female 3.05
A673652 Dog Papillon/Border Collie Black/White 2014-03-28 14:39:00 Fancy Transfer Neutered Male 2.08
A677679 Dog Chihuahua Shorthair/Pomeranian Black 2014-05-26 19:10:00 Kash Adoption Neutered Male 0.22
A640655 Dog Miniature Schnauzer/Miniature Poodle White 2014-04-25 11:17:00 Sandy Return to Owner Spayed Female 5.00
A690350 Dog Labrador Retriever Mix Black 2014-10-26 18:20:00 Shy Return to Owner Neutered Male 8.03
A674298 Dog Pit Bull Mix Brown Brindle/White 2014-04-16 12:51:00 *Newt Transfer Neutered Male 1.10
A692378 Dog German Shepherd/Labrador Retriever Black/White 2014-12-21 18:20:00 *Bonnie Adoption Spayed Female 0.17
A666660 Dog Beagle Mix Tan/White 2014-12-11 17:52:00 Dolly Adoption Spayed Female 2.60
A689952 Dog Mastiff Mix Brown Brindle 2014-10-19 17:54:00 Olive Adoption Spayed Female 1.02
A669621 Dog Plott Hound Mix Brown Brindle/Black 2013-12-25 09:47:00 NA Transfer Intact Female 0.34
A690699 Dog Chihuahua Shorthair Mix Blue/Tan 2014-11-04 18:03:00 Minnie Transfer Intact Female 0.39
A669918 Dog Labrador Retriever Mix Tan/White 2014-01-03 17:47:00 NA Adoption Spayed Female 0.26
A678098 Dog Labrador Retriever Mix Black/White 2014-07-09 11:54:00 Shades Adoption Neutered Male 12.18
A677038 Dog Chihuahua Shorthair Mix Buff 2014-04-26 13:00:00 *Penny Adoption Spayed Female 2.01
A665344 Dog Chihuahua Shorthair Mix Red/White 2013-11-02 17:47:00 Ginger Adoption Spayed Female 4.05
A675394 Dog Miniature Pinscher Mix Black/White 2014-04-03 13:26:00 *Josh Transfer Intact Male 1.52
A673439 Dog Cardigan Welsh Corgi/English Setter White/Black 2014-02-28 10:47:00 Aria Adoption Spayed Female 0.26

Tareas

Que explica la felicidad de la gente

Country Happiness.Rank Happiness.Score Whisker.high Whisker.low Economy..GDP.per.Capita. Family Health..Life.Expectancy. Freedom Generosity Trust..Government.Corruption. Dystopia.Residual
Norway 1 7.537 7.594 7.480 1.616 1.534 0.797 0.635 0.362 0.316 2.277
Denmark 2 7.522 7.582 7.462 1.482 1.551 0.793 0.626 0.355 0.401 2.314
Iceland 3 7.504 7.622 7.386 1.481 1.611 0.834 0.627 0.476 0.154 2.323
Switzerland 4 7.494 7.562 7.426 1.565 1.517 0.858 0.620 0.291 0.367 2.277
Finland 5 7.469 7.528 7.410 1.444 1.540 0.809 0.618 0.245 0.383 2.430
Netherlands 6 7.377 7.427 7.327 1.504 1.429 0.811 0.585 0.470 0.283 2.295
Canada 7 7.316 7.384 7.248 1.479 1.481 0.835 0.611 0.436 0.287 2.187
New Zealand 8 7.314 7.380 7.248 1.406 1.548 0.817 0.614 0.500 0.383 2.046
Sweden 9 7.284 7.344 7.224 1.494 1.478 0.831 0.613 0.385 0.384 2.098
Australia 10 7.284 7.357 7.211 1.484 1.510 0.844 0.602 0.478 0.301 2.065
Israel 11 7.213 7.280 7.146 1.375 1.376 0.838 0.406 0.330 0.085 2.802
Costa Rica 12 7.079 7.168 6.990 1.110 1.416 0.760 0.580 0.215 0.100 2.899
Austria 13 7.006 7.071 6.941 1.487 1.460 0.815 0.568 0.316 0.221 2.139
United States 14 6.993 7.075 6.911 1.546 1.420 0.774 0.506 0.393 0.136 2.218
Ireland 15 6.977 7.043 6.911 1.536 1.558 0.810 0.573 0.428 0.298 1.774
Germany 16 6.951 7.005 6.897 1.488 1.473 0.799 0.563 0.336 0.277 2.016
Belgium 17 6.891 6.956 6.826 1.464 1.462 0.818 0.540 0.232 0.251 2.124
Luxembourg 18 6.863 6.924 6.802 1.742 1.458 0.845 0.597 0.283 0.319 1.620
United Kingdom 19 6.714 6.784 6.644 1.442 1.496 0.805 0.508 0.493 0.265 1.704
Chile 20 6.652 6.739 6.565 1.253 1.284 0.819 0.377 0.327 0.082 2.510
United Arab Emirates 21 6.648 6.722 6.574 1.626 1.266 0.727 0.608 0.361 0.324 1.735
Brazil 22 6.635 6.725 6.545 1.107 1.431 0.617 0.437 0.162 0.111 2.769
Czech Republic 23 6.609 6.684 6.534 1.353 1.434 0.754 0.491 0.088 0.037 2.452
Argentina 24 6.599 6.690 6.508 1.185 1.440 0.695 0.495 0.109 0.060 2.614
Mexico 25 6.578 6.671 6.485 1.153 1.211 0.710 0.413 0.121 0.133 2.837

Recomendación de cervezas

  • Predecir review_overall
brewery_id brewery_name review_overall review_profilename beer_style beer_name beer_abv beer_beerid
10325 Vecchio Birraio 1.5 stcules Hefeweizen Sausa Weizen 5.0 47986
10325 Vecchio Birraio 3.0 stcules English Strong Ale Red Moon 6.2 48213
10325 Vecchio Birraio 3.0 stcules Foreign / Export Stout Black Horse Black Beer 6.5 48215
1075 Caldera Brewing Company 4.0 johnmichaelsen American Double / Imperial IPA Cauldron DIPA 7.7 64883
1075 Caldera Brewing Company 3.0 oline73 Herbed / Spiced Beer Caldera Ginger Beer 4.7 52159
1075 Caldera Brewing Company 3.0 alpinebryant Herbed / Spiced Beer Caldera Ginger Beer 4.7 52159
1075 Caldera Brewing Company 4.5 augustgarage Herbed / Spiced Beer Caldera Ginger Beer 4.7 52159
1075 Caldera Brewing Company 5.0 MadeInOregon Herbed / Spiced Beer Caldera Ginger Beer 4.7 52159
1075 Caldera Brewing Company 4.0 rawthar Herbed / Spiced Beer Caldera Ginger Beer 4.7 52159
1075 Caldera Brewing Company 4.0 Halcyondays Herbed / Spiced Beer Caldera Ginger Beer 4.7 52159
1075 Caldera Brewing Company 3.0 Beerandraiderfan Oatmeal Stout Caldera Oatmeal Stout 7.2 10789
1075 Caldera Brewing Company 4.5 UCLABrewN84 Rauchbier Rauch Ür Bock 7.4 58046
1075 Caldera Brewing Company 4.0 zaphodchak Rauchbier Rauch Ür Bock 7.4 58046
1075 Caldera Brewing Company 4.0 Tilley4 Rauchbier Rauch Ür Bock 7.4 58046
1075 Caldera Brewing Company 4.5 mikedrinksbeer2 Rauchbier Rauch Ür Bock 7.4 58046
1075 Caldera Brewing Company 4.5 titosupertramp Rauchbier Rauch Ür Bock 7.4 58046
1075 Caldera Brewing Company 5.0 optimator13 Rauchbier Rauch Ür Bock 7.4 58046
1075 Caldera Brewing Company 4.5 Blakaeris Rauchbier Rauch Ür Bock 7.4 58046
1075 Caldera Brewing Company 4.5 Klym Rauchbier Rauch Ür Bock 7.4 58046
1075 Caldera Brewing Company 4.5 flexabull Rauchbier Rauch Ür Bock 7.4 58046