Part 2
Taking all available variables into account, the goal is to estimate the median house value ('Median_House_Value') with machine learning models.
After trying the available options (decision trees and random forests, with and without tuning in each case), the best-fitting model turns out to be a random forest without tuning.
The dataset contains property prices for the state of California and is available on Kaggle, a machine learning competition platform: https://www.kaggle.com/fedesoriano/california-housing-prices-data-extra-features
library(tidymodels)
library(tidyverse)
library(gridExtra)
library(corrplot)
library(magrittr)
library(corrr)
library(skimr)
library(vip)
houses <- read_csv("https://raw.githubusercontent.com/data-datum/datasets/main/california_houses.csv")
With base R's head() function, let's look at the first rows of the dataset.
head(houses)
## # A tibble: 6 × 14
## Median_House_Value Median_Income Median_Age Tot_Rooms Tot_Bedrooms Population
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 452600 8.33 41 880 129 322
## 2 358500 8.30 21 7099 1106 2401
## 3 352100 7.26 52 1467 190 496
## 4 341300 5.64 52 1274 235 558
## 5 342200 3.85 52 1627 280 565
## 6 269700 4.04 52 919 213 413
## # ℹ 8 more variables: Households <dbl>, Latitude <dbl>, Longitude <dbl>,
## # Distance_to_coast <dbl>, Distance_to_LA <dbl>, Distance_to_SanDiego <dbl>,
## # Distance_to_SanJose <dbl>, Distance_to_SanFrancisco <dbl>
With the skim() function we get a statistical summary of the dataset.
options(scipen = 999)
skim(houses)
Name | houses |
Number of rows | 20640 |
Number of columns | 14 |
_______________________ | |
Column type frequency: | |
numeric | 14 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Median_House_Value | 0 | 1 | 206855.82 | 115395.62 | 14999.00 | 119600.00 | 179700.00 | 264725.00 | 500001.00 | ▅▇▅▂▂ |
Median_Income | 0 | 1 | 3.87 | 1.90 | 0.50 | 2.56 | 3.53 | 4.74 | 15.00 | ▇▇▁▁▁ |
Median_Age | 0 | 1 | 28.64 | 12.59 | 1.00 | 18.00 | 29.00 | 37.00 | 52.00 | ▃▇▇▇▅ |
Tot_Rooms | 0 | 1 | 2635.76 | 2181.62 | 2.00 | 1447.75 | 2127.00 | 3148.00 | 39320.00 | ▇▁▁▁▁ |
Tot_Bedrooms | 0 | 1 | 537.90 | 421.25 | 1.00 | 295.00 | 435.00 | 647.00 | 6445.00 | ▇▁▁▁▁ |
Population | 0 | 1 | 1425.48 | 1132.46 | 3.00 | 787.00 | 1166.00 | 1725.00 | 35682.00 | ▇▁▁▁▁ |
Households | 0 | 1 | 499.54 | 382.33 | 1.00 | 280.00 | 409.00 | 605.00 | 6082.00 | ▇▁▁▁▁ |
Latitude | 0 | 1 | 35.63 | 2.14 | 32.54 | 33.93 | 34.26 | 37.71 | 41.95 | ▇▁▅▂▁ |
Longitude | 0 | 1 | -119.57 | 2.00 | -124.35 | -121.80 | -118.49 | -118.01 | -114.31 | ▂▆▃▇▁ |
Distance_to_coast | 0 | 1 | 40509.26 | 49140.04 | 120.68 | 9079.76 | 20522.02 | 49830.41 | 333804.69 | ▇▁▁▁▁ |
Distance_to_LA | 0 | 1 | 269421.98 | 247732.45 | 420.59 | 32111.25 | 173667.46 | 527156.24 | 1018260.12 | ▇▁▅▁▁ |
Distance_to_SanDiego | 0 | 1 | 398164.93 | 289400.56 | 484.92 | 159426.39 | 214739.83 | 705795.40 | 1196919.27 | ▇▁▃▃▁ |
Distance_to_SanJose | 0 | 1 | 349187.55 | 217149.88 | 569.45 | 113119.93 | 459758.88 | 516946.49 | 836762.68 | ▇▃▆▇▁ |
Distance_to_SanFrancisco | 0 | 1 | 386688.42 | 250122.19 | 456.14 | 117395.48 | 526546.66 | 584552.01 | 903627.66 | ▆▂▂▇▁ |
With dplyr's glimpse() function we can see a brief description of the dataset's variables, including the data type of each column.
glimpse(houses)
## Rows: 20,640
## Columns: 14
## $ Median_House_Value <dbl> 452600, 358500, 352100, 341300, 342200, 26970…
## $ Median_Income <dbl> 8.3252, 8.3014, 7.2574, 5.6431, 3.8462, 4.036…
## $ Median_Age <dbl> 41, 21, 52, 52, 52, 52, 52, 52, 42, 52, 52, 5…
## $ Tot_Rooms <dbl> 880, 7099, 1467, 1274, 1627, 919, 2535, 3104,…
## $ Tot_Bedrooms <dbl> 129, 1106, 190, 235, 280, 213, 489, 687, 665,…
## $ Population <dbl> 322, 2401, 496, 558, 565, 413, 1094, 1157, 12…
## $ Households <dbl> 126, 1138, 177, 219, 259, 193, 514, 647, 595,…
## $ Latitude <dbl> 37.88, 37.86, 37.85, 37.85, 37.85, 37.85, 37.…
## $ Longitude <dbl> -122.23, -122.22, -122.24, -122.25, -122.25, …
## $ Distance_to_coast <dbl> 9263.041, 10225.733, 8259.085, 7768.087, 7768…
## $ Distance_to_LA <dbl> 556529.2, 554279.9, 554610.7, 555194.3, 55519…
## $ Distance_to_SanDiego <dbl> 735501.8, 733236.9, 733525.7, 734095.3, 73409…
## $ Distance_to_SanJose <dbl> 67432.52, 65049.91, 64867.29, 65287.14, 65287…
## $ Distance_to_SanFrancisco <dbl> 21250.21, 20880.60, 18811.49, 18031.05, 18031…
This is a dataset with 20,640 records and 14 columns; the variable we want to predict is 'Median_House_Value', and the remaining columns will serve as predictors.
All variables are continuous and quantitative, so no encoding is needed (as it would be if they were categorical).
First, let's visualize the distribution of the target variable to understand how the data are distributed.
ggplot(data=houses,
aes(x=Median_House_Value)) +
geom_histogram(fill="tomato2",
color="tomato4",
bins=50) +
labs(title = "Histograma de la variable 'Median House Value'",
x="Median House Value",
y="",
caption = "Fuente: https://www.kaggle.com/fedesoriano/california-housing-prices-data-extra-features" ) +
theme(plot.margin = margin(0.25, 1, 0.25, 0.1, "cm"),
panel.background = element_rect(fill = "gray100", colour = "gray100", linewidth = 2, linetype = "solid"),
panel.grid.major = element_line(linewidth = 0.5, linetype = "dashed", colour = "gray80"),
panel.grid.minor = element_line(linewidth = 0.25, linetype = "dashed", colour = "gray90"),
title=element_text(size=12, face = "bold"),
plot.caption=element_text(face = "italic", colour = "gray35",size=8),
axis.ticks = element_blank())
The distribution is right-skewed, with a cluster of values at the 500,000-and-above cap. Let's see whether the spatial distribution provides useful information about these valuations.
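Before mapping, it can help to quantify how many records sit at that cap. A minimal sketch (the 500,001 threshold comes from the p100 value in the skim() summary above; the resulting count is not reproduced here):
houses %>%
  filter(Median_House_Value >= 500001) %>%   # census blocks at the capped maximum
  nrow()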
graf_esp_1 <-
ggplot(data= houses %>% filter(Median_House_Value >= 500001),
aes(Latitude, Longitude)) +
geom_point(aes(color = Median_House_Value),
color = "tomato") +
labs(title = "Ubicación de viviendas",
subtitle = "Valor mayor a 500.000 USD",
color = "",
x="",
y="",
caption = "" ) +
theme(plot.margin = margin(0.25, 1, 0.25, 0.1, "cm"),
panel.background = element_rect(fill = "gray100", colour = "gray100", linewidth = 2, linetype = "solid"),
panel.grid.major = element_line(linewidth = 0.5, linetype = "dashed", colour = "gray80"),
panel.grid.minor = element_line(linewidth = 0.25, linetype = "dashed", colour = "gray90"),
title=element_text(size=12, face = "bold"),
plot.caption=element_text(face = "italic", colour = "gray35",size=8),
axis.ticks = element_blank())
graf_esp_2 <-
ggplot(data = houses,
aes(Latitude, Longitude))+
geom_point(aes(color = Median_House_Value),
alpha=0.5)+
labs(title = "",
color = "Valor medio",
x="",
y="",
caption = "Fuente: https://www.kaggle.com/fedesoriano/california-housing-prices-data-extra-features" ) +
scale_color_gradient(low = "cornsilk", high = "tomato",
breaks = c(100000, 500000),
labels = c("100000", "500000")) +
theme(plot.margin = margin(0.25, 1, 0.25, 0.1, "cm"),
panel.background = element_rect(fill = "gray100", colour = "gray100", linewidth = 2, linetype = "solid"),
panel.grid.major = element_line(linewidth = 0.5, linetype = "dashed", colour = "gray80"),
panel.grid.minor = element_line(linewidth = 0.25, linetype = "dashed", colour = "gray90"),
title=element_text(size=12, face = "bold"),
legend.position="top", #ubicacion de leyenda
legend.direction = "horizontal", #dirección de la leyenda
legend.title=element_text(size=7, face = "bold"), #tamaño de titulo de leyenda
legend.text=element_text(size=6), #tamaño de texto de leyenda
plot.caption=element_text(face = "italic", colour = "gray35",size=8),
axis.ticks = element_blank())
grid.arrange(graf_esp_1, graf_esp_2, ncol = 2)
Now, let's look at a correlation matrix.
corrplot(cor(houses),
method = "color",
title = "Matriz de correlación",
type = "upper",
addgrid.col = "white",
tl.cex = 0.8,
tl.col = "black",
tl.srt = 50,
cl.length = 5)
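The corrr package loaded at the beginning can also rank these correlations numerically, which complements the visual matrix. A minimal sketch (output omitted):
houses %>%
  correlate() %>%                          # correlation matrix as a tibble (corrr)
  focus(Median_House_Value) %>%            # keep only the correlations with the target
  arrange(desc(abs(Median_House_Value)))   # strongest relationships first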
The three most relevant variables appear to be 'Median_Income', 'Distance_to_coast' and, to a lesser extent, 'Tot_Rooms'. Let's look at the relationship of each one with 'Median_House_Value'.
ggplot(houses,
aes(x = Median_House_Value,
y = Median_Income)) +
geom_point(color= "tomato") +
labs(title = "Relación entre 'Median_House_Value' y 'Median_Income'",
x="'Median House Value'",
y="'Median Income'",
caption = "Fuente: https://www.kaggle.com/fedesoriano/california-housing-prices-data-extra-features" ) +
theme(plot.margin = margin(0.25, 1, 0.25, 0.1, "cm"),
panel.background = element_rect(fill = "gray100", colour = "gray100", linewidth = 2, linetype = "solid"),
panel.grid.major = element_line(linewidth = 0.5, linetype = "dashed", colour = "gray80"),
panel.grid.minor = element_line(linewidth = 0.25, linetype = "dashed", colour = "gray90"),
title=element_text(size=12, face = "bold"),
plot.caption=element_text(face = "italic", colour = "gray35",size=8),
axis.ticks = element_blank())
There appears to be a positive relationship between 'Median_House_Value' and 'Median_Income'.
ggplot(houses,
aes(x = Median_House_Value, y = Distance_to_coast)) +
geom_point(color= "tomato") +
labs(title = "Relación entre 'Median_House_Value' y 'Distance_to_coast'",
x="'Median House Value'",
y="'Distance to coast'",
caption = "Fuente: https://www.kaggle.com/fedesoriano/california-housing-prices-data-extra-features" ) +
theme(plot.margin = margin(0.25, 1, 0.25, 0.1, "cm"),
panel.background = element_rect(fill = "gray100", colour = "gray100", linewidth = 2, linetype = "solid"),
panel.grid.major = element_line(linewidth = 0.5, linetype = "dashed", colour = "gray80"),
panel.grid.minor = element_line(linewidth = 0.25, linetype = "dashed", colour = "gray90"),
title=element_text(size=12, face = "bold"),
plot.caption=element_text(face = "italic", colour = "gray35",size=8),
axis.ticks = element_blank())
There appears to be a negative relationship between 'Median_House_Value' and 'Distance_to_coast'.
ggplot(houses,
aes(x = Median_House_Value,
y = Tot_Rooms)) +
geom_point(color= "tomato") +
labs(title = "Relación entre 'Median_House_Value' y 'Tot_Rooms'",
x="'Median House Value'",
y="'Total Rooms'",
caption = "Fuente: https://www.kaggle.com/fedesoriano/california-housing-prices-data-extra-features" ) +
theme(plot.margin = margin(0.25, 1, 0.25, 0.1, "cm"),
panel.background = element_rect(fill = "gray100", colour = "gray100", linewidth = 2, linetype = "solid"),
panel.grid.major = element_line(linewidth = 0.5, linetype = "dashed", colour = "gray80"),
panel.grid.minor = element_line(linewidth = 0.25, linetype = "dashed", colour = "gray90"),
title=element_text(size=12, face = "bold"),
plot.caption=element_text(face = "italic", colour = "gray35",size=8),
axis.ticks = element_blank())
In this case, the relationship between 'Tot_Rooms' and 'Median_House_Value' is not clear.
We split the data into training and test sets using a 75/25 ratio.
set.seed(1234)
p_split <- houses %>%
initial_split(prop = 0.75)
p_train <- training(p_split)
p_test <- testing(p_split)
glimpse(p_train)
## Rows: 15,480
## Columns: 14
## $ Median_House_Value <dbl> 165500, 225200, 125500, 206900, 114600, 17500…
## $ Median_Income <dbl> 4.6389, 4.0603, 2.4167, 3.4808, 3.0924, 2.562…
## $ Median_Age <dbl> 17, 37, 31, 45, 14, 52, 16, 15, 23, 43, 36, 3…
## $ Tot_Rooms <dbl> 1145, 2059, 1014, 944, 2391, 1114, 2512, 539,…
## $ Tot_Bedrooms <dbl> 209, 349, 252, 178, 451, 206, 356, 71, 835, 2…
## $ Population <dbl> 499, 825, 1064, 533, 798, 425, 795, 287, 2357…
## $ Households <dbl> 202, 334, 247, 193, 308, 207, 353, 66, 823, 1…
## $ Latitude <dbl> 33.94, 33.83, 34.03, 33.81, 37.29, 37.73, 33.…
## $ Longitude <dbl> -118.17, -118.10, -118.17, -118.20, -119.56, …
## $ Distance_to_coast <dbl> 21064.3345, 10513.2231, 28063.1951, 7476.0654…
## $ Distance_to_LA <dbl> 14208.836, 28041.689, 7225.341, 27235.138, 37…
## $ Distance_to_SanDiego <dbl> 165280.04, 151557.98, 173589.00, 155354.70, 5…
## $ Distance_to_SanJose <dbl> 505621.00, 519121.29, 498068.93, 514914.25, 2…
## $ Distance_to_SanFrancisco <dbl> 573641.78, 587133.23, 566101.74, 582890.50, 2…
p_split
## <Training/Testing/Total>
## <15480/5160/20640>
The TRAINING data will be split into 3 folds for cross-validation.
p_folds <- vfold_cv(p_train, v=3, repeats = 5)
Let's look at the cross-validation splits: 3 folds repeated 5 times, i.e. 15 resamples.
p_folds$splits
## [[1]]
## <Analysis/Assess/Total>
## <10320/5160/15480>
##
## [[2]]
## <Analysis/Assess/Total>
## <10320/5160/15480>
##
## [[3]]
## <Analysis/Assess/Total>
## <10320/5160/15480>
##
## [[4]]
## <Analysis/Assess/Total>
## <10320/5160/15480>
##
## [[5]]
## <Analysis/Assess/Total>
## <10320/5160/15480>
##
## [[6]]
## <Analysis/Assess/Total>
## <10320/5160/15480>
##
## [[7]]
## <Analysis/Assess/Total>
## <10320/5160/15480>
##
## [[8]]
## <Analysis/Assess/Total>
## <10320/5160/15480>
##
## [[9]]
## <Analysis/Assess/Total>
## <10320/5160/15480>
##
## [[10]]
## <Analysis/Assess/Total>
## <10320/5160/15480>
##
## [[11]]
## <Analysis/Assess/Total>
## <10320/5160/15480>
##
## [[12]]
## <Analysis/Assess/Total>
## <10320/5160/15480>
##
## [[13]]
## <Analysis/Assess/Total>
## <10320/5160/15480>
##
## [[14]]
## <Analysis/Assess/Total>
## <10320/5160/15480>
##
## [[15]]
## <Analysis/Assess/Total>
## <10320/5160/15480>
With 'recipe' we group the preprocessing to be applied to the training data used for modeling: we remove highly correlated predictors, then center and scale the data.
recipe_rf <- p_train %>%
recipe(Median_House_Value~.) %>%
step_corr(all_predictors()) %>% # remove highly correlated predictors
step_center(all_predictors(), -all_outcomes()) %>% # centering
step_scale(all_predictors(), -all_outcomes()) %>% # scaling
prep()
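Since the recipe was already prep()-ed, we can check which predictors, if any, step_corr() actually dropped. A quick sketch:
tidy(recipe_rf, number = 1)   # step_corr: lists the predictors removed for high correlation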
At this stage we define the model to implement; in this case, a baseline random forest.
rf_spec <- rand_forest() %>%
set_engine("ranger") %>%
set_mode("regression")
rf_spec
## Random Forest Model Specification (regression)
##
## Computational engine: ranger
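For reference, the tuned alternatives mentioned in the introduction (a decision tree and a tuned random forest) could be specified as follows. This is only a sketch: the tuning parameters and the grid size are illustrative assumptions, not the configuration used in the original comparison.
tree_spec <- decision_tree(cost_complexity = tune(), tree_depth = tune()) %>%
  set_engine("rpart") %>%
  set_mode("regression")
rf_tune_spec <- rand_forest(mtry = tune(), min_n = tune(), trees = 500) %>%
  set_engine("ranger") %>%
  set_mode("regression")
# Either specification could then be evaluated on the same folds, e.g.:
# tune_grid(workflow() %>% add_recipe(recipe_rf) %>% add_model(rf_tune_spec),
#           resamples = p_folds, grid = 10)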
We initialize the workflow to keep the process organized, then display its steps.
rf_wf <- workflow() %>%
add_recipe(recipe_rf) %>%
add_model(rf_spec)
rf_wf
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: rand_forest()
##
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 3 Recipe Steps
##
## • step_corr()
## • step_center()
## • step_scale()
##
## ── Model ───────────────────────────────────────────────────────────────────────
## Random Forest Model Specification (regression)
##
## Computational engine: ranger
set.seed(123)
rf_res <- rf_wf %>%
fit_resamples(
p_folds,
control = control_resamples(save_pred = TRUE)
)
glimpse(rf_res)
## Rows: 15
## Columns: 6
## $ splits <list> [<vfold_split[10320 x 5160 x 15480 x 14]>], [<vfold_spli…
## $ id <chr> "Repeat1", "Repeat1", "Repeat1", "Repeat2", "Repeat2", "R…
## $ id2 <chr> "Fold1", "Fold2", "Fold3", "Fold1", "Fold2", "Fold3", "Fo…
## $ .metrics <list> [<tbl_df[2 x 4]>], [<tbl_df[2 x 4]>], [<tbl_df[2 x 4]>],…
## $ .notes <list> [<tbl_df[0 x 3]>], [<tbl_df[0 x 3]>], [<tbl_df[0 x 3]>],…
## $ .predictions <list> [<tbl_df[5160 x 4]>], [<tbl_df[5160 x 4]>], [<tbl_df[516…
rf_res %>%
collect_metrics()
## # A tibble: 2 × 6
## .metric .estimator mean n std_err .config
## <chr> <chr> <dbl> <int> <dbl> <chr>
## 1 rmse standard 50430. 15 171. Preprocessor1_Model1
## 2 rsq standard 0.810 15 0.00128 Preprocessor1_Model1
The RMSE is 50,430 at the training (cross-validation) stage.
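The per-fold metrics, before averaging, can also be inspected:
collect_metrics(rf_res, summarize = FALSE)   # one rmse/rsq row per resample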
final_model <- finalize_model(rf_spec, select_best(rf_res, "rmse"))
final_model
## Random Forest Model Specification (regression)
##
## Computational engine: ranger
final_rs <- last_fit(final_model, Median_House_Value ~ ., p_split)
final_rs
## # Resampling results
## # Manual resampling
## # A tibble: 1 × 6
## splits id .metrics .notes .predictions .workflow
## <list> <chr> <list> <list> <list> <list>
## 1 <split [15480/5160]> train/test spl… <tibble> <tibble> <tibble> <workflow>
final_rs %>%
collect_metrics()
## # A tibble: 2 × 4
## .metric .estimator .estimate .config
## <chr> <chr> <dbl> <chr>
## 1 rmse standard 45522. Preprocessor1_Model1
## 2 rsq standard 0.848 Preprocessor1_Model1
The RMSE is 45,522 on the test data, which improves on the training result (50,430).
final_rs %>%
collect_predictions()
## # A tibble: 5,160 × 5
## id .pred .row Median_House_Value .config
## <chr> <dbl> <int> <dbl> <chr>
## 1 train/test split 418691. 2 358500 Preprocessor1_Model1
## 2 train/test split 411536. 3 352100 Preprocessor1_Model1
## 3 train/test split 348625. 4 341300 Preprocessor1_Model1
## 4 train/test split 264543. 5 342200 Preprocessor1_Model1
## 5 train/test split 246982. 8 241400 Preprocessor1_Model1
## 6 train/test split 276590. 10 261100 Preprocessor1_Model1
## 7 train/test split 229962. 11 281500 Preprocessor1_Model1
## 8 train/test split 276526. 12 241800 Preprocessor1_Model1
## 9 train/test split 155607. 19 158700 Preprocessor1_Model1
## 10 train/test split 142630. 20 162900 Preprocessor1_Model1
## # ℹ 5,150 more rows
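From these collected predictions, additional metrics beyond RMSE and R² can be computed with yardstick; a minimal sketch that also reports MAE:
collect_predictions(final_rs) %>%
  metrics(truth = Median_House_Value, estimate = .pred)   # rmse, rsq and mae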
We compare the predicted values against the actual values:
collect_predictions(final_rs) %>%
ggplot(aes(Median_House_Value, .pred)) +
geom_abline(lty = 2, color = "tomato4") +
geom_point(alpha = 0.3, color = "tomato") +
coord_fixed() +
labs(title="Propiedades del estado de California",
subtitle= "Estimación del Valor Medio de la Vivienda (Random Forest regressor)",
x = "Median House Value",
y = "Predicción",
caption = "Fuente: https://www.kaggle.com/fedesoriano/california-housing-prices-data-extra-features")+
theme(plot.margin = margin(0.25, 1, 0.25, 0.1, "cm"),
panel.background = element_rect(fill = "gray100", colour = "gray100", linewidth = 2, linetype = "solid"),
panel.grid.major = element_line(linewidth = 0.5, linetype = "dashed", colour = "gray80"),
panel.grid.minor = element_line(linewidth = 0.25, linetype = "dashed", colour = "gray90"),
title=element_text(size=12, face = "bold"),
plot.caption=element_text(face = "italic", colour = "gray35",size=8),
axis.ticks = element_blank())
Let's plot the most important variables of the MODEL, using the vip library.
final_model %>%
set_engine("ranger", importance = "permutation") %>%
fit(Median_House_Value ~ .,
data = juice(recipe_rf)) %>%
vip(geom = "col",
aesthetics = list(fill= "tomato", color = "tomato4", size = 0.5)) +
labs(title="Variables mas importantes",
subtitle= "Estimación del Valor Medio de la Vivienda (Random Forest baseline)",
y = "Importacia",
x = "",
caption = "Fuente: https://www.kaggle.com/fedesoriano/california-housing-prices-data-extra-features") +
theme(plot.margin = margin(0.25, 1, 0.25, 0.1, "cm"),
panel.background = element_rect(fill = "gray100", colour = "gray100", linewidth = 2, linetype = "solid"),
panel.grid.major = element_line(linewidth = 0.5, linetype = "dashed", colour = "gray80"),
panel.grid.minor = element_line(linewidth = 0.25, linetype = "dashed", colour = "gray90"),
title=element_text(size=12, face = "bold"),
plot.caption=element_text(face = "italic", colour = "gray35",size=8),
axis.ticks = element_blank())
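The numeric importance scores behind the plot can also be extracted as a table with vip::vi(); a sketch that first keeps the fitted model in an object:
rf_fit_imp <- final_model %>%
  set_engine("ranger", importance = "permutation") %>%
  fit(Median_House_Value ~ ., data = juice(recipe_rf))   # same fit as above, stored for inspection
vi(rf_fit_imp)   # tibble of Variable / Importance, in decreasing order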
This plot confirms the hypotheses raised earlier: the variables that most influence the median house price are, first, the residents' median income and, second, the property's distance to the coast. The third variable examined above, 'Tot_Rooms', does not appear to be very important; instead, distance to San Jose first, and to L.A. second, seem to matter more for estimating the median house value.
Virginia Recagno - virginia.recagno@gmail.com Final project for Machine Learning with Spatial Applications, for the Big Data e Inteligencia Territorial graduate program at FLACSO Argentina