About the Dataset

Context: The dataset comes from one of the MachineHack hackathons.

Content: Training set (train): 17,996 rows with 17 columns. Column details: artist name, track name, popularity, danceability, energy, key, loudness, mode, speechiness (amount of spoken words), acousticness, instrumentalness, liveness, valence (positive emotion), tempo (speed), duration in milliseconds, and time signature (time_signature).

Target variable: 'Class', which can be one of the following music genres: Rock, Indie, Alt, Pop, Metal, HipHop, Alt_Music, Blues, Acoustic/Folk, Instrumental, Country, Bollywood.

Test set (test): 7,713 rows with 16 columns (the same columns as the training set, but without the target variable).

Libraries

library(tidyr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(skimr)
library(naniar)

## 
## Attaching package: 'naniar'

## The following object is masked from 'package:skimr':
## 
##     n_complete

library(mice)

## Warning: package 'mice' was built under R version 4.4.3

## 
## Attaching package: 'mice'

## The following object is masked from 'package:stats':
## 
##     filter

## The following objects are masked from 'package:base':
## 
##     cbind, rbind

library(ggplot2)
library(viridis)

## Loading required package: viridisLite

library(caret)

## Loading required package: lattice

library(MASS)

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select

library(nnet)

## Warning: package 'nnet' was built under R version 4.4.3

library(corrplot)

## corrplot 0.94 loaded

library(xgboost)

## Warning: package 'xgboost' was built under R version 4.4.3

## 
## Attaching package: 'xgboost'

## The following object is masked from 'package:dplyr':
## 
##     slice

library(tidymodels)

## ── Attaching packages ────────────────────────────────────── tidymodels 1.2.0 ──

## ✔ broom        1.0.6     ✔ rsample      1.2.1
## ✔ dials        1.3.0     ✔ tibble       3.2.1
## ✔ infer        1.0.7     ✔ tune         1.2.1
## ✔ modeldata    1.4.0     ✔ workflows    1.1.4
## ✔ parsnip      1.2.1     ✔ workflowsets 1.1.0
## ✔ purrr        1.0.2     ✔ yardstick    1.3.1
## ✔ recipes      1.3.1

## Warning: package 'recipes' was built under R version 4.4.3

## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
## ✖ purrr::discard()         masks scales::discard()
## ✖ mice::filter()           masks dplyr::filter(), stats::filter()
## ✖ dplyr::lag()             masks stats::lag()
## ✖ purrr::lift()            masks caret::lift()
## ✖ yardstick::precision()   masks caret::precision()
## ✖ yardstick::recall()      masks caret::recall()
## ✖ MASS::select()           masks dplyr::select()
## ✖ yardstick::sensitivity() masks caret::sensitivity()
## ✖ xgboost::slice()         masks dplyr::slice()
## ✖ yardstick::specificity() masks caret::specificity()
## ✖ recipes::step()          masks stats::step()
## • Use tidymodels_prefer() to resolve common conflicts.

library(themis)

## Warning: package 'themis' was built under R version 4.4.3

library(catboost)

Data Cleaning

We begin by loading the provided datasets:

train = read.csv('train.csv')
submission = read.csv('submission.csv')
test = read.csv('test.csv')

head(train)

head(submission)

head(test)

In R, I can give each class its name and convert the column to a factor, since a factor internally stores an integer code for each category (starting from 1). I will replace the numeric class codes (0 to 10) with the genre names listed below.

nombres_clases = c('Acoustic Folk', 'Alternative', 'Blues', 'Bollywood', 'Country', 'Hip Hop', 'Indie', 'Instrumental', 'Metal', 'Pop', 'Rock' )

train = train %>%
  mutate(Class = factor(nombres_clases[Class + 1]))

head(train)
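The indexing approach above works because the names vector is in the same order as the numeric codes and happens to be alphabetical, which is how factor() orders levels by default. A minimal sketch of a more explicit mapping, using hypothetical codes and only three labels for illustration:

```r
# Hypothetical numeric codes 0..2 standing in for the Class column
codes <- c(2, 0, 1, 2, 2)
labels_demo <- c("Acoustic Folk", "Alternative", "Blues")  # illustrative subset

# Explicit code-to-label mapping: level order follows `levels`,
# not the alphabetical order of the labels
class_factor <- factor(codes, levels = 0:2, labels = labels_demo)

print(class_factor)
print(table(class_factor))
```

With this form, the mapping stays correct even if the labels are not sorted alphabetically.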

We verify that all 11 classes are present and check the number of records in each class:

train %>% count(Class)

The dataset contains the most samples for Rock, followed by Indie and Pop.

Likewise, we check that the class and the predictor variables are correctly typed for later use:

str(train)

## 'data.frame':    17996 obs. of  17 variables:
##  $ Artist.Name       : chr  "Bruno Mars" "Boston" "The Raincoats" "Deno" ...
##  $ Track.Name        : chr  "That's What I Like (feat. Gucci Mane)" "Hitch a Ride" "No Side to Fall In" "Lingo (feat. J.I & Chunkz)" ...
##  $ Popularity        : num  60 54 35 66 53 53 48 55 29 14 ...
##  $ danceability      : num  0.854 0.382 0.434 0.853 0.167 0.235 0.674 0.657 0.431 0.716 ...
##  $ energy            : num  0.564 0.814 0.614 0.597 0.975 0.977 0.658 0.415 0.776 0.885 ...
##  $ key               : num  1 3 6 10 2 6 5 5 10 1 ...
##  $ loudness          : num  -4.96 -7.23 -8.33 -6.53 -4.28 ...
##  $ mode              : int  1 1 1 0 1 1 0 1 1 0 ...
##  $ speechiness       : num  0.0485 0.0406 0.0525 0.0555 0.216 0.107 0.104 0.025 0.0527 0.0333 ...
##  $ acousticness      : num  1.71e-02 1.10e-03 4.86e-01 2.12e-02 1.69e-04 3.53e-03 4.04e-01 1.75e-01 2.21e-05 6.14e-02 ...
##  $ instrumentalness  : num  NA 4.01e-03 1.96e-04 NA 1.61e-02 6.04e-03 1.34e-06 5.65e-06 1.30e-03 NA ...
##  $ liveness          : num  0.0849 0.101 0.394 0.122 0.172 0.172 0.0981 0.132 0.179 0.253 ...
##  $ valence           : num  0.899 0.569 0.787 0.569 0.0918 0.241 0.677 0.347 0.318 0.833 ...
##  $ tempo             : num  134 116 148 107 199 ...
##  $ duration_in.min.ms: num  234596 251733 109667 173968 229960 ...
##  $ time_signature    : int  4 4 4 4 4 4 4 4 4 4 ...
##  $ Class             : Factor w/ 11 levels "Acoustic Folk",..: 6 11 7 6 11 7 3 5 9 10 ...

train$mode = as.factor(train$mode)
train$time_signature = as.factor(train$time_signature)

After converting these two variables, the data appears to be correctly typed. Since the str output above already shows NA values, the next cleaning step is to check for missing values:

colSums(is.na(train))[colSums(is.na(train)) > 0]

##       Popularity              key instrumentalness 
##              428             2014             4377

sum(!complete.cases(train))

## [1] 6183

sum(is.na(train))

## [1] 6819

There are 6,819 missing values in total, spread across 6,183 records, which is about 34% of the rows (6,183 of 17,996). instrumentalness is the variable with the most NAs. Dropping these rows would discard a lot of information that is valuable for separating the classes, so I will explore imputation methods instead.
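As a quick check of these proportions, here is a minimal sketch that computes the share of missing values per column and per row. The small data frame is hypothetical; the last two lines simply redo the arithmetic with the counts reported above.

```r
# Small hypothetical data frame with missing values (not the real train set)
df <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, NA, 1, 2),
  c = c(1, 2, 3, 4)
)

# Share of missing values per column, in percent
pct_missing_col <- round(100 * colMeans(is.na(df)), 1)
print(pct_missing_col[pct_missing_col > 0])

# Share of rows with at least one missing value, in percent
pct_incomplete_rows <- 100 * mean(!complete.cases(df))
print(pct_incomplete_rows)

# The same arithmetic with the counts reported above
print(round(100 * 6183 / 17996, 1))  # incomplete rows
print(round(100 * 4377 / 17996, 1))  # instrumentalness
```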

Before imputing a variable I need to know its missingness mechanism: missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). Knowing the reason lets me choose the most suitable imputation method, either a single imputation (mean, median, or KNN) or a more sophisticated multiple-imputation method such as MICE.

The naniar library in R helps identify which type of missingness is present. I will use a visualization to look for patterns in the missing values of the dataset, and Little's test to check whether they are MCAR (Missing Completely At Random).

vis_miss(train)

We run the test:

mcar_test(train)

This test evaluates the null hypothesis that the missing data are missing completely at random (MCAR). That is:

H₀ (null): The missing values are MCAR.

H₁ (alternative): The missing values are not MCAR.

Since the p-value (reported as 0) is below 0.05, the null hypothesis is rejected.

Knowing that the data are not MCAR, the missingness may be either MAR or MNAR. As far as I understand, MNAR is harder to handle, because one would need to find the underlying reason why those values are not being recorded; applying imputation techniques in that case could introduce a bias that alters the original distribution of the data.
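One way to probe whether missingness depends on observed variables is to regress a missingness indicator on them. The sketch below uses simulated data (the variable names only mimic the dataset) and is not a test for MNAR, which cannot be distinguished from MAR using the observed data alone.

```r
set.seed(1)
n <- 2000
energy <- runif(n)                   # fully observed predictor (simulated)
instrumentalness <- rbeta(n, 1, 5)   # variable that will receive NAs

# Simulate MAR-style missingness: the chance of an NA grows with `energy`
p_miss <- plogis(-2 + 3 * energy)
instrumentalness[runif(n) < p_miss] <- NA

# Regress the missingness indicator on the observed variable
is_missing <- as.integer(is.na(instrumentalness))
fit <- glm(is_missing ~ energy, family = binomial)
print(summary(fit)$coefficients)
```

A clearly non-zero coefficient for an observed variable is evidence against MCAR and is consistent with MAR.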

Before going that far, let us try an imputation and see whether the distributions change. I will use MICE, a multiple-imputation method that generates several plausible values for each missing entry (here m = 5 imputed datasets, using predictive mean matching). Note that the code below does not pool the five datasets: complete(mice_train, 1) keeps only the first one.

mice_train <- mice(train, method = "pmm", m = 5, maxit = 5, seed = 500)

## 
##  iter imp variable
##   1   1  Popularity  key  instrumentalness
##   1   2  Popularity  key  instrumentalness
##   1   3  Popularity  key  instrumentalness
##   1   4  Popularity  key  instrumentalness
##   1   5  Popularity  key  instrumentalness
##   2   1  Popularity  key  instrumentalness
##   2   2  Popularity  key  instrumentalness
##   2   3  Popularity  key  instrumentalness
##   2   4  Popularity  key  instrumentalness
##   2   5  Popularity  key  instrumentalness
##   3   1  Popularity  key  instrumentalness
##   3   2  Popularity  key  instrumentalness
##   3   3  Popularity  key  instrumentalness
##   3   4  Popularity  key  instrumentalness
##   3   5  Popularity  key  instrumentalness
##   4   1  Popularity  key  instrumentalness
##   4   2  Popularity  key  instrumentalness
##   4   3  Popularity  key  instrumentalness
##   4   4  Popularity  key  instrumentalness
##   4   5  Popularity  key  instrumentalness
##   5   1  Popularity  key  instrumentalness
##   5   2  Popularity  key  instrumentalness
##   5   3  Popularity  key  instrumentalness
##   5   4  Popularity  key  instrumentalness
##   5   5  Popularity  key  instrumentalness

## Warning: Number of logged events: 2

train_imputado <- complete(mice_train, 1)

Now that I have the imputed dataset, the next step is to check whether the distributions of the variables imputed with MICE have changed.

hist(train$instrumentalness)

hist(train_imputado$instrumentalness)

hist(train$Popularity)

hist(train_imputado$Popularity)

hist(train$key)

hist(train_imputado$key)

As the histograms show, the distributions of these features do not change noticeably, which suggests that the imputation is not introducing a bias that distorts the shape of the data.
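Besides comparing histograms by eye, the quantiles before and after imputation can be compared numerically. The following is a minimal sketch on simulated data; the "imputed" column here is produced by resampling observed values, which is only a stand-in and not what mice's pmm does internally.

```r
set.seed(42)
# Hypothetical variable with about 25% of its values missing
x <- rbeta(1000, 1, 5)
x_obs <- x
x_obs[sample(1000, 250)] <- NA

# Stand-in for an imputed column: NAs replaced by draws from the observed values
# (an assumption for illustration only)
x_imp <- x_obs
x_imp[is.na(x_obs)] <- sample(x_obs[!is.na(x_obs)], 250, replace = TRUE)

# Quantiles before and after imputation, side by side
probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
print(rbind(
  observed = quantile(x_obs, probs, na.rm = TRUE),
  imputed  = quantile(x_imp, probs)
))
```

In the notebook, the same comparison could be applied to train$instrumentalness and train_imputado$instrumentalness.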

NAs can sometimes carry patterns that improve the accuracy of a machine learning model, so I will create extra columns known as missing-value indicators: binary variables that flag whether a value was imputed. These new features can tell us whether imputed values are associated with a particular music genre.

# Missing-value indicator (MVI) for each variable with missing data
train_imputado$Popularity_Imputado = ifelse(is.na(train$Popularity), 1, 0)
train_imputado$key_Imputado = ifelse(is.na(train$key), 1, 0)
train_imputado$instrumentalness_Imputado = ifelse(is.na(train$instrumentalness), 1, 0)

We convert them to factors, since they are binary categorical variables:

train_imputado$Popularity_Imputado = as.factor(train_imputado$Popularity_Imputado)
train_imputado$key_Imputado = as.factor(train_imputado$key_Imputado)
train_imputado$instrumentalness_Imputado = as.factor(train_imputado$instrumentalness_Imputado)
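The same indicators can be built in one pass for every column that has NAs. This is a minimal sketch on a hypothetical data frame; the _Imputado suffix follows the naming used above.

```r
# Hypothetical data frame before imputation
raw <- data.frame(
  Popularity = c(60, NA, 35),
  key        = c(1, 3, NA),
  energy     = c(0.5, 0.8, 0.6)
)

# Columns that contain at least one NA
cols_with_na <- names(raw)[colSums(is.na(raw)) > 0]

# One binary factor per such column, named <column>_Imputado
indicators <- lapply(raw[cols_with_na], function(col) {
  factor(as.integer(is.na(col)), levels = c(0, 1))
})
names(indicators) <- paste0(cols_with_na, "_Imputado")
flagged <- cbind(raw, as.data.frame(indicators))

str(flagged)
```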

head(train_imputado)

Note that key is a categorical variable, so I convert it to the corresponding type:

train_imputado$key = as.factor(train_imputado$key)

We check again for missing values:

sum(is.na(train_imputado))

## [1] 0

With this we can continue the data cleaning. The next step is to check for duplicated records:

sum(duplicated(train_imputado))

## [1] 0

We check for outliers in the numeric variables:

variables <- c("Popularity", "danceability", "energy", "loudness", "speechiness", "acousticness", "instrumentalness", "liveness",
               "valence", "tempo", "duration_in.min.ms")

for (var in variables) {
  print(
    # .data[[var]] replaces the deprecated aes_string()
    ggplot(train_imputado, aes(y = .data[[var]])) +
      geom_boxplot(fill = "#ededaf") +
      theme_bw() +
      labs(y = var) +
      ggtitle(paste("Boxplot of", var))
  )
}

I filter on the variable with values considered atypical to inspect their characteristics:

train_imputado[train_imputado$Popularity > 90, ]

For popularity, the outliers are entirely plausible, since not every song becomes a hit. I will therefore keep them, because they may even reveal whether these hits are tied to a specific genre.

var_popular = train_imputado[train_imputado$Popularity > 90, ]
var_popular %>% count(Class)

In this case, Pop is the genre that reaches the highest popularity levels.

For the other musical features, the outliers stem from the different genres present in the dataset. These atypical values can likewise carry information related to a particular genre, for example in the loudness variable.

var_loudness = data.frame(train_imputado[train_imputado$loudness < -15, ])
var_loudness %>% count(Class)

As this filter on the loudness variable shows, the genre most represented among these atypical values is Instrumental.

Given this information about the variables and their outliers, I will not remove the atypical values, since so far they contribute information that may prove useful.
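Instead of choosing thresholds by hand, the 1.5 × IQR whisker rule can flag the outliers and count them per class. The sketch below runs on simulated loudness values for two hypothetical genres, not on the real data.

```r
set.seed(7)
# Hypothetical loudness values for two genres (simulated)
demo <- data.frame(
  Class    = rep(c("Instrumental", "Rock"), times = c(100, 400)),
  loudness = c(rnorm(100, mean = -20, sd = 6), rnorm(400, mean = -6, sd = 2))
)

# Values beyond the 1.5 * IQR whiskers
out_vals <- boxplot.stats(demo$loudness)$out
is_outlier <- demo$loudness %in% out_vals

# How many flagged values fall in each class
print(table(demo$Class[is_outlier]))
```

If one class dominates the flagged values, as Instrumental does in the loudness filter above, the outliers are more likely genre signal than noise.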

Finally, I check for typos in the categorical variables:

levels(train_imputado$Class)

##  [1] "Acoustic Folk" "Alternative"   "Blues"         "Bollywood"    
##  [5] "Country"       "Hip Hop"       "Indie"         "Instrumental" 
##  [9] "Metal"         "Pop"           "Rock"

There are no typos.

Exploratory Analysis

We look at the descriptive statistics of the dataset to understand the data a little better.

skim(train_imputado)

Data summary
Name	train_imputado
Number of rows	17996
Number of columns	20
_______________________
Column type frequency:
character	2
factor	7
numeric	11
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
Artist.Name	0	1	1	153	0	9149	0
Track.Name	0	1	1	132	0	15129	0

Variable type: factor

skim_variable	complete_rate	ordered	n_unique	top_counts
key	1	FALSE	11	7: 2384, 2: 2289, 9: 2176, 1: 1879
mode	1	FALSE	2	1: 11459, 0: 6537
time_signature	1	FALSE	4	4: 16451, 3: 1228, 5: 203, 1: 114
Class	1	FALSE	11	Roc: 4949, Ind: 2587, Pop: 2524, Met: 1854
Popularity_Imputado	1	FALSE	2	0: 17568, 1: 428
key_Imputado	1	FALSE	2	0: 15982, 1: 2014
instrumentalness_Imputado	1	FALSE	2	0: 13619, 1: 4377

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Popularity	1	44.51	17.43	1.00	33.00	44.00	56.00	100.00	▂▇▇▃▁
danceability	1	0.54	0.17	0.06	0.43	0.54	0.66	0.99	▁▃▇▅▁
energy	1	0.66	0.24	0.00	0.51	0.70	0.86	1.00	▁▂▅▇▇
loudness	1	-7.91	4.05	-39.95	-9.54	-7.02	-5.19	1.35	▁▁▁▇▇
speechiness	1	0.08	0.08	0.02	0.03	0.05	0.08	0.96	▇▁▁▁▁
acousticness	1	0.25	0.31	0.00	0.00	0.08	0.43	1.00	▇▂▁▁▂
instrumentalness	1	0.15	0.28	0.00	0.00	0.00	0.11	1.00	▇▁▁▁▁
liveness	1	0.20	0.16	0.01	0.10	0.13	0.26	1.00	▇▂▁▁▁
valence	1	0.49	0.24	0.02	0.30	0.48	0.67	0.99	▅▇▇▆▃
tempo	1	122.62	29.57	30.56	99.62	120.07	141.97	217.42	▁▆▇▃▁
duration_in.min.ms	1	200744.46	111989.13	0.50	166337.00	209160.00	252490.00	1477187.00	▇▁▁▁▁

Most of the numeric variables lie in a range between 0 and 1, but others fall outside this scale, such as Popularity, tempo and especially duration_in.min.ms. Given these differences in magnitude, the numeric data should be normalized so that all variables are on the same scale.
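Two common ways to put columns on a common scale are z-score standardization and min-max normalization. A minimal sketch on hypothetical values:

```r
# Hypothetical columns on very different scales
num <- data.frame(
  danceability = c(0.85, 0.38, 0.43, 0.17),
  tempo        = c(134, 116, 148, 199)
)

# Z-score standardization: each column ends up with mean 0 and sd 1
z <- as.data.frame(scale(num))

# Min-max normalization: each column is mapped onto [0, 1]
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
mm <- as.data.frame(lapply(num, min_max))

print(round(z, 2))
print(round(mm, 2))
```

When a test set is involved, the centering and scaling parameters would normally be estimated on the training set and then reused on the test set.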

The categorical (factor) variables, especially the non-binary ones, are represented by numeric codes, so they need to be converted to one-hot encoding to be used properly in ML models. Since I will train the models with the caret package, which creates the dummy variables automatically when the formula interface is used, this is handled for me; otherwise it would have to be done manually.
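For reference, this is roughly what the manual encoding looks like in base R with model.matrix(). The factor below is a hypothetical stand-in for time_signature.

```r
# Hypothetical factor with three levels, like time_signature
d <- data.frame(time_signature = factor(c(3, 4, 4, 5)))

# Full one-hot encoding: one column per level (the "- 1" drops the intercept)
one_hot <- model.matrix(~ time_signature - 1, data = d)
print(one_hot)

# Default treatment coding: the first level becomes the reference and is dropped
dummies <- model.matrix(~ time_signature, data = d)
print(colnames(dummies))
```

Dropping a reference level avoids perfectly collinear columns in linear models, while tree-based models are usually fine with either form.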

I will keep the two character variables, artist name and track name, in a separate dataset, since this type of data is not useful for training the model.

Now let us analyze the distribution of each variable.

Univariate Analysis

As mentioned earlier, the variable duration_in.min.ms can cause problems when visualizing the data because of the range of its values. Since they are in milliseconds, I will convert the variable to minutes so that the histogram of its distribution is easier to read:

train_imputado$duration_in.min.ms <- train_imputado$duration_in.min.ms/ 60000
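One caveat about this conversion: the skim summary reports a minimum of 0.50 for duration_in.min.ms, and the column name mentions both min and ms, which suggests that some rows may already be in minutes. If that is the case, dividing every row by 60000 would shrink those rows to nearly zero. The sketch below converts only values that look like milliseconds; the cutoff of 30 is an assumption that should be verified against the data.

```r
# Hypothetical durations: most in milliseconds, a few already in minutes
duration <- c(234596, 251733, 3.5, 109667, 0.5)

# Assumed cutoff: no track is shorter than 30 ms or longer than 30 minutes,
# so values below 30 are treated as minutes and the rest as milliseconds
cutoff <- 30
duration_min <- ifelse(duration < cutoff, duration, duration / 60000)

print(round(duration_min, 2))
```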

Next we plot the distribution of each numeric variable:

train_imputado %>%
  pivot_longer(cols = where(is.numeric)) %>%
  ggplot(aes(x = value)) +
  geom_histogram(bins = 15, fill = "skyblue", color = "white") +
  facet_wrap(~name, scales = "free") +
  theme_minimal()

We can note a few observations. Variables such as acousticness, instrumentalness, liveness and speechiness have most of their values at or near 0. Most songs last less than 5 minutes. energy and loudness are left-skewed, meaning that most of the data lies at the higher values. For tempo, most values appear to fall between 80 and 140. danceability and Popularity look approximately normally distributed.
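These visual impressions can be backed by a number. The sketch below defines a simple sample skewness (the third standardized moment) and applies it to simulated variables whose shapes resemble those described above; the helper name skewness is my own and not from a package.

```r
# Sample skewness: negative = longer left tail, positive = longer right tail
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

set.seed(3)
right_skewed <- rbeta(5000, 1, 5)   # mass near 0, like speechiness
left_skewed  <- rbeta(5000, 5, 1)   # mass near 1, like energy
symmetric    <- rnorm(5000)         # bell-shaped, like danceability

print(round(c(
  right     = skewness(right_skewed),
  left      = skewness(left_skewed),
  symmetric = skewness(symmetric)
), 2))
```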

For the categorical variables:

train_imputado %>%
  pivot_longer(cols = where(is.factor)) %>%
  filter(name != "Class") %>% # Exclude the target variable (the 4th factor column in the dataset)
  ggplot(aes(x = value, fill = name)) +
  geom_bar(color = "white") +
  facet_wrap(~name, scales = "free") +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal() +
  theme(legend.position = "none") # Hide the legend that maps colors to variables

We can note the following observations:

Of the variables we imputed, instrumentalness had the most imputed cases. Almost all songs have a time_signature of 4. As for mode, after some research I found that it indicates the scale of the song, where 1 is major and 0 is minor; it also hints at the emotional feel of a song, with 1 being brighter and more cheerful and 0 more melancholic. In our dataset, songs in a major scale predominate. key is usually used in combination with mode; here the least frequent key category is 3.

To get a better view of the target variable Class, I plot it separately:

ggplot(train_imputado, aes(x = Class, fill = Class)) +
  geom_bar(color = 'white') +
  theme_minimal() +
  labs(title = "Distribution of the Class Variable", y = 'Count', x = 'Genres') +
  scale_fill_viridis(discrete = TRUE, option = "H") + # viridis supports palettes with 11 or more colors
  theme(legend.position = 'none')

Regarding the target variable, we can say the following:

Most songs are Rock and the fewest are Country. There is a substantial imbalance across most classes, so to get better classification results we will probably need to address it.
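The degree of imbalance can be quantified with class proportions and, if needed later, turned into inverse-frequency weights. The counts below are the four largest classes from the skim summary above, so the proportions are relative to this subset only.

```r
# Counts of the four largest classes, taken from the summary output above
counts <- c(Rock = 4949, Indie = 2587, Pop = 2524, Metal = 1854)

# Share of each class and ratio between the largest and smallest class
props <- counts / sum(counts)
print(round(props, 3))
print(max(counts) / min(counts))

# Inverse-frequency weights, scaled so they average to 1 across classes
weights <- (1 / props) / mean(1 / props)
print(round(weights, 2))
```

Weights like these are one option; resampling the minority classes is another.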

At this point I would like to analyze the relationships between the predictor variables and the target variable. However, with 11 genres and many variables, it is more practical to focus on the variables with the greatest predictive value. I therefore perform a feature selection, keeping the variables that give the best results under a multinomial logistic regression with forward selection.

This requires a few adjustments to the dataset.

We store the artist name and track name in a separate data frame:

train_nombres = train_imputado[, 1:2]
head(train_nombres)

We remove those columns from the training set:

train_imputado = train_imputado[, -c(1,2)]

head(train_imputado)

We fit the regression models, an intercept-only model as the starting point and the full model that defines the scope:

set.seed(123)
simple_modelo = multinom(Class ~ 1, data = train_imputado)

## # weights:  22 (10 variable)
## initial  value 43152.523329 
## iter  10 value 38312.533880
## final  value 38225.740629 
## converged

modelo_seleccion = multinom(Class ~ ., data = train_imputado)

## # weights:  330 (290 variable)
## initial  value 43152.523329 
## iter  10 value 37930.566438
## iter  20 value 35168.732062
## iter  30 value 31761.685884
## iter  40 value 29696.293958
## iter  50 value 28180.569414
## iter  60 value 25894.919073
## iter  70 value 25016.622066
## iter  80 value 24499.917627
## iter  90 value 24190.260483
## iter 100 value 23991.502310
## final  value 23991.502310 
## stopped after 100 iterations

modelo_sel_adelante = stats::step(simple_modelo,
                                  scope = formula(modelo_seleccion),
                                  direction = "forward",
                                  trace = FALSE)

## trying + Popularity 
## trying + danceability 
## trying + energy 
## trying + key 
## trying + loudness 
## trying + mode 
## trying + speechiness 
## trying + acousticness 
## trying + instrumentalness 
## trying + liveness 
## trying + valence 
## trying + tempo 
## trying + duration_in.min.ms 
## trying + time_signature 
## trying + Popularity_Imputado 
## trying + key_Imputado 
## trying + instrumentalness_Imputado 
## # weights:  33 (20 variable)
## initial  value 43152.523329 
## iter  10 value 37931.769917
## iter  20 value 34988.953809
## iter  30 value 32558.076621
## iter  40 value 32514.564101
## final  value 32514.354991 
## converged
## trying + Popularity 
## trying + danceability 
## trying + energy 
## trying + key 
## trying + loudness 
## trying + mode 
## trying + speechiness 
## trying + acousticness 
## trying + instrumentalness 
## trying + liveness 
## trying + valence 
## trying + tempo 
## trying + time_signature 
## trying + Popularity_Imputado 
## trying + key_Imputado 
## trying + instrumentalness_Imputado 
## # weights:  44 (30 variable)
## initial  value 43152.523329 
## iter  10 value 37177.611972
## iter  20 value 35092.406638
## iter  30 value 31561.120558
## iter  40 value 30068.783886
## iter  50 value 29990.963832
## iter  60 value 29983.513807
## final  value 29983.371724 
## converged
## trying + Popularity 
## trying + danceability 
## trying + key 
## trying + loudness 
## trying + mode 
## trying + speechiness 
## trying + acousticness 
## trying + instrumentalness 
## trying + liveness 
## trying + valence 
## trying + tempo 
## trying + time_signature 
## trying + Popularity_Imputado 
## trying + key_Imputado 
## trying + instrumentalness_Imputado 
## # weights:  55 (40 variable)
## initial  value 43152.523329 
## iter  10 value 36381.254841
## iter  20 value 33967.960513
## iter  30 value 30697.273472
## iter  40 value 28954.647866
## iter  50 value 28342.551057
## iter  60 value 28312.341811
## iter  70 value 28308.895130
## iter  80 value 28308.185768
## final  value 28308.164200 
## converged
## trying + Popularity 
## trying + danceability 
## trying + key 
## trying + loudness 
## trying + mode 
## trying + acousticness 
## trying + instrumentalness 
## trying + liveness 
## trying + valence 
## trying + tempo 
## trying + time_signature 
## trying + Popularity_Imputado 
## trying + key_Imputado 
## trying + instrumentalness_Imputado 
## # weights:  66 (50 variable)
## initial  value 43152.523329 
## iter  10 value 36092.917250
## iter  20 value 32794.053142
## iter  30 value 29706.609590
## iter  40 value 28198.579529
## iter  50 value 27363.742635
## iter  60 value 27182.760088
## iter  70 value 27160.744209
## iter  80 value 27157.834140
## iter  90 value 27156.786732
## final  value 27156.780573 
## converged
## trying + Popularity 
## trying + key 
## trying + loudness 
## trying + mode 
## trying + acousticness 
## trying + instrumentalness 
## trying + liveness 
## trying + valence 
## trying + tempo 
## trying + time_signature 
## trying + Popularity_Imputado 
## trying + key_Imputado 
## trying + instrumentalness_Imputado 
## # weights:  77 (60 variable)
## initial  value 43152.523329 
## iter  10 value 35214.884747
## iter  20 value 32502.785045
## iter  30 value 28787.991356
## iter  40 value 27549.788265
## iter  50 value 26810.294547
## iter  60 value 26351.491607
## iter  70 value 26278.421085
## iter  80 value 26265.813456
## iter  90 value 26262.514972
## iter 100 value 26260.526741
## final  value 26260.526741 
## stopped after 100 iterations
## trying + Popularity 
## trying + key 
## trying + loudness 
## trying + mode 
## trying + acousticness 
## trying + instrumentalness 
## trying + liveness 
## trying + tempo 
## trying + time_signature 
## trying + Popularity_Imputado 
## trying + key_Imputado 
## trying + instrumentalness_Imputado 
## # weights:  88 (70 variable)
## initial  value 43152.523329 
## iter  10 value 34258.589575
## iter  20 value 31091.588704
## iter  30 value 28784.946823
## iter  40 value 26823.805834
## iter  50 value 26283.960091
## iter  60 value 25796.421017
## iter  70 value 25535.931714
## iter  80 value 25464.898898
## iter  90 value 25451.600688
## iter 100 value 25448.117059
## final  value 25448.117059 
## stopped after 100 iterations
## trying + Popularity 
## trying + key 
## trying + loudness 
## trying + mode 
## trying + acousticness 
## trying + liveness 
## trying + tempo 
## trying + time_signature 
## trying + Popularity_Imputado 
## trying + key_Imputado 
## trying + instrumentalness_Imputado 
## # weights:  99 (80 variable)
## initial  value 43152.523329 
## iter  10 value 34541.785033
## iter  20 value 30984.474838
## iter  30 value 28534.261251
## iter  40 value 27002.777551
## iter  50 value 25934.651883
## iter  60 value 25495.453522
## iter  70 value 25016.995396
## iter  80 value 24835.436095
## iter  90 value 24774.012901
## iter 100 value 24761.834939
## final  value 24761.834939 
## stopped after 100 iterations
## trying + Popularity 
## trying + key 
## trying + loudness 
## trying + mode 
## trying + acousticness 
## trying + liveness 
## trying + tempo 
## trying + time_signature 
## trying + Popularity_Imputado 
## trying + key_Imputado 
## # weights:  110 (90 variable)
## initial  value 43152.523329 
## iter  10 value 37182.624555
## iter  20 value 31972.154951
## iter  30 value 29687.694710
## iter  40 value 27147.748797
## iter  50 value 26058.670367
## iter  60 value 25145.958907
## iter  70 value 24844.758415
## iter  80 value 24483.467851
## iter  90 value 24294.513743
## iter 100 value 24217.564795
## final  value 24217.564795 
## stopped after 100 iterations
## trying + key 
## trying + loudness 
## trying + mode 
## trying + acousticness 
## trying + liveness 
## trying + tempo 
## trying + time_signature 
## trying + Popularity_Imputado 
## trying + key_Imputado 
## # weights:  121 (100 variable)
## initial  value 43152.523329 
## iter  10 value 37117.233883
## iter  20 value 31403.214060
## iter  30 value 29396.588132
## iter  40 value 27045.575410
## iter  50 value 25688.641590
## iter  60 value 25062.778284
## iter  70 value 24604.433313
## iter  80 value 24347.046188
## iter  90 value 24138.423447
## iter 100 value 23999.693152
## final  value 23999.693152 
## stopped after 100 iterations
## trying + key 
## trying + loudness 
## trying + mode 
## trying + liveness 
## trying + tempo 
## trying + time_signature 
## trying + Popularity_Imputado 
## trying + key_Imputado 
## # weights:  132 (110 variable)
## initial  value 43152.523329 
## iter  10 value 37607.874677
## iter  20 value 33201.766099
## iter  30 value 31188.012248
## iter  40 value 29254.717732
## iter  50 value 26615.923217
## iter  60 value 25598.400367
## iter  70 value 24701.288000
## iter  80 value 24338.503828
## iter  90 value 24096.284216
## iter 100 value 23882.996405
## final  value 23882.996405 
## stopped after 100 iterations
## trying + key 
## trying + mode 
## trying + liveness 
## trying + tempo 
## trying + time_signature 
## trying + Popularity_Imputado 
## trying + key_Imputado
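In the trace above, several fits end with "stopped after 100 iterations", meaning the optimizer hit nnet's default limit of maxit = 100 before converging, so the AIC comparisons for those steps may be based on fits that are not fully optimized. A minimal sketch of the same forward selection with a higher limit, using the built-in iris data as a stand-in for the training set. Because step() refits through update(), the maxit and trace arguments need to be set on the starting model to carry over to each candidate fit.

```r
library(nnet)

# Starting model: intercept only; maxit and trace here are reused on every refit
null_model <- multinom(Species ~ 1, data = iris, maxit = 500, trace = FALSE)

# Full model: defines the upper scope of the search
full_model <- multinom(Species ~ ., data = iris, maxit = 500, trace = FALSE)

selected <- stats::step(
  null_model,
  scope = formula(full_model),
  direction = "forward",
  trace = 0
)

print(attr(terms(selected), "term.labels"))
print(full_model$convergence)  # 0 = converged, 1 = iteration limit reached
```

The value 500 is an arbitrary choice for illustration; the convergence component shows whether a given limit was enough.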

formula(modelo_sel_adelante)

## Class ~ duration_in.min.ms + energy + speechiness + danceability + 
##     valence + instrumentalness + instrumentalness_Imputado + 
##     Popularity + acousticness + loudness
## attr(,"variables")
## list(Class, duration_in.min.ms, energy, speechiness, danceability, 
##     valence, instrumentalness, instrumentalness_Imputado, Popularity, 
##     acousticness, loudness)
## attr(,"factors")
##                           duration_in.min.ms energy speechiness danceability
## Class                                      0      0           0            0
## duration_in.min.ms                         1      0           0            0
## energy                                     0      1           0            0
## speechiness                                0      0           1            0
## danceability                               0      0           0            1
## valence                                    0      0           0            0
## instrumentalness                           0      0           0            0
## instrumentalness_Imputado                  0      0           0            0
## Popularity                                 0      0           0            0
## acousticness                               0      0           0            0
## loudness                                   0      0           0            0
##                           valence instrumentalness instrumentalness_Imputado
## Class                           0                0                         0
## duration_in.min.ms              0                0                         0
## energy                          0                0                         0
## speechiness                     0                0                         0
## danceability                    0                0                         0
## valence                         1                0                         0
## instrumentalness                0                1                         0
## instrumentalness_Imputado       0                0                         1
## Popularity                      0                0                         0
## acousticness                    0                0                         0
## loudness                        0                0                         0
##                           Popularity acousticness loudness
## Class                              0            0        0
## duration_in.min.ms                 0            0        0
## energy                             0            0        0
## speechiness                        0            0        0
## danceability                       0            0        0
## valence                            0            0        0
## instrumentalness                   0            0        0
## instrumentalness_Imputado          0            0        0
## Popularity                         1            0        0
## acousticness                       0            1        0
## loudness                           0            0        1
## attr(,"term.labels")
##  [1] "duration_in.min.ms"        "energy"                   
##  [3] "speechiness"               "danceability"             
##  [5] "valence"                   "instrumentalness"         
##  [7] "instrumentalness_Imputado" "Popularity"               
##  [9] "acousticness"              "loudness"                 
## attr(,"order")
##  [1] 1 1 1 1 1 1 1 1 1 1
## attr(,"intercept")
## [1] 1
## attr(,"response")
## [1] 1
## attr(,"predvars")
## list(Class, duration_in.min.ms, energy, speechiness, danceability, 
##     valence, instrumentalness, instrumentalness_Imputado, Popularity, 
##     acousticness, loudness)
## attr(,"dataClasses")
##                     Class        duration_in.min.ms                    energy 
##                  "factor"                 "numeric"                 "numeric" 
##               speechiness              danceability                   valence 
##                 "numeric"                 "numeric"                 "numeric" 
##          instrumentalness instrumentalness_Imputado                Popularity 
##                 "numeric"                  "factor"                 "numeric" 
##              acousticness                  loudness 
##                 "numeric"                 "numeric"

As the results show, the selected formula is:

Class ~ duration_in.min.ms + energy + speechiness + danceability + valence + instrumentalness + instrumentalness_Imputado + Popularity + acousticness + loudness

These are the variables that contribute the most information to the model: of the 20 variables we originally had, we were able to reduce them to 10. Another thing that can be done during this phase of the exploratory analysis is to train a predictive model and immediately display its most important variables.

In this case I applied a random forest to the data.

modelo_var_caret = train(
  Class ~.,
  data = train_imputado, 
  method = "rf",
  trControl = trainControl(method = "none")
)

# Importancia de variables
varImp(modelo_var_caret)

## rf variable importance
## 
##   only 20 most important variables shown (out of 28)
## 
##                            Overall
## duration_in.min.ms         100.000
## acousticness                83.448
## speechiness                 81.035
## danceability                70.875
## energy                      69.951
## instrumentalness            68.575
## Popularity                  63.470
## valence                     63.268
## loudness                    61.347
## tempo                       49.343
## liveness                    48.609
## instrumentalness_Imputado1  16.427
## mode1                        9.006
## key7                         5.972
## key9                         5.938
## key2                         5.691
## key_Imputado1                4.901
## key4                         4.640
## key5                         4.532
## key11                        4.274

The results of this model show that its most important variables coincide with the ones we obtained from forward selection with logistic regression.
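The overlap between the two rankings can be checked directly. The sketch below is a minimal base R check that copies the variable names by hand from the two outputs above (the forward-selection formula and the numeric part of the printed rf importance table); the object names forward_vars and rf_ranked are illustrative and are not objects from the original analysis.

```r
# Variables kept by forward selection (copied from the formula above)
forward_vars <- c("duration_in.min.ms", "energy", "speechiness", "danceability",
                  "valence", "instrumentalness", "instrumentalness_Imputado",
                  "Popularity", "acousticness", "loudness")

# First 12 rows of the rf importance table, in printed order
rf_ranked <- c("duration_in.min.ms", "acousticness", "speechiness", "danceability",
               "energy", "instrumentalness", "Popularity", "valence", "loudness",
               "tempo", "liveness", "instrumentalness_Imputado1")

# caret appends the factor level to dummy-coded names; strip it for the flag
rf_clean <- sub("_Imputado1$", "_Imputado", rf_ranked)

shared  <- intersect(rf_clean, forward_vars)  # in both lists
only_rf <- setdiff(rf_clean, forward_vars)    # ranked by rf, not selected

# Position of each forward-selected variable in the rf ranking
rank_in_rf <- match(forward_vars, rf_clean)
names(rank_in_rf) <- forward_vars

print(shared)
print(only_rf)
print(rank_in_rf)
```

With the names as printed above, the nine top-ranked rf variables all appear in the forward-selection formula, tempo and liveness come next in rf but were not selected, and instrumentalness_Imputado is shared but ranks 12th in rf.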

With this, later on I can train two models, each with a different number of predictors, and compare the results. For now, what interests me is how these variables relate to my class and whether I can really identify patterns that help me classify each genre.

Bivariate Analysis

Before starting this part, I noticed a problem in the duration variable: some genres showed a duration of 0. Because I had earlier converted every value to minutes, some songs ended up with extremely small values after the conversion, so I went back to the original column and noticed some interesting things.

As the variable name itself suggests, some records were in minutes and others in milliseconds, so to avoid this problem I applied a filter.

train[train$duration_in.min.ms > 20 & train$duration_in.min.ms < 100 , ]

This lets me find the maximum duration, in minutes, of a song. After checking on the Spotify platform, these values are consistent, so I keep them and convert to minutes only the records that are in the other format:

train_imputado$duration_in.min.ms = train$duration_in.min.ms
train_imputado <- train_imputado %>%
  mutate(
    ms_a_minutos = if_else(duration_in.min.ms > 50, 1, 0),
    duration_in.min.ms = if_else(duration_in.min.ms > 50, duration_in.min.ms / 60000, duration_in.min.ms))

I also added a new column that flags which songs were converted to minutes.
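To make the unit fix concrete, here is a small base R sketch of the same rule on made-up values. The toy data frame and its four durations are hypothetical; only the column names and the cut-off of 50 come from the code above, and ifelse() stands in for dplyr's if_else() so the example has no dependencies.

```r
# Hypothetical mixed-unit column: two values in minutes, two in milliseconds
toy <- data.frame(duration_in.min.ms = c(3.5, 4.2, 210000, 180000))

threshold <- 50                                   # same cut-off as above
is_ms <- toy$duration_in.min.ms > threshold       # TRUE when the value looks like ms

toy$ms_a_minutos <- as.numeric(is_ms)             # flag: 1 = converted, 0 = kept
toy$duration_in.min.ms <- ifelse(is_ms,
                                 toy$duration_in.min.ms / 60000,  # ms to minutes
                                 toy$duration_in.min.ms)

print(toy)
```

The rule assumes no genuine song is longer than 50 minutes and no millisecond value is below 50, which is what the filter on the raw column was used to check.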

Now that the values in that column share the same unit, I will start by plotting each of my numeric variables following the finding discussed earlier, that is, showing them in order of importance:

vars_bivariado <- c("acousticness", "speechiness", "energy", "danceability", "instrumentalness", "Popularity", "valence", "loudness", "duration_in.min.ms", "tempo", "liveness")

# Loop to create the plots
for (var in vars_bivariado) {
  print(
    ggplot(train_imputado, aes(x = Class, y = .data[[var]], fill = Class)) +
      geom_boxplot(color = "black") +
      theme_minimal() +
      labs(title = paste("Distribución de", var, "por Género"),
           y = var) +
      scale_fill_viridis(discrete = TRUE, option = "H") +
      theme(legend.position = 'none')
    
  )
}

Observations:

For acousticness, the plot helps us identify the ranges in which the data fall. The median for Rock is very close to 0, which is understandable given the type of music; the opposite holds for the Instrumental genre, whose tracks are mostly not electronic sounds.

Speechiness clearly sets the Hip Hop genre apart, since the range containing most of its data differs noticeably from that of the other genres.

In energy, the values closest to 1 belong to the Metal genre.

For danceability, the values closest to 1 belong to the Hip Hop genre.

Unsurprisingly, in instrumentalness the Instrumental genre is the one closest to 1, since most of these songs are fully instrumental. It is also to be expected that genres such as Country or Pop have values close to 0, since they combine music and vocals; there are clearly exceptions, which show up as outliers.

In Popularity, the least popular genres are Acoustic/Folk, Alternative, Blues and Bollywood.

In valence, the Instrumental genre is the one that conveys the most melancholic feeling; it also has the lowest loudness values (dB) compared with the other genres.

There is not much to say about the last two variables; their values are very similar across genres.
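The boxplot readings above can be backed with numbers by tabulating the median and interquartile range per genre. This is a minimal base R sketch on a hypothetical six-row data frame (the values are invented for illustration); on the real data the same two tapply() calls would be applied to train_imputado.

```r
# Hypothetical toy data: three Rock tracks and three Instrumental tracks
toy <- data.frame(
  Class = factor(c("Rock", "Rock", "Rock",
                   "Instrumental", "Instrumental", "Instrumental")),
  acousticness = c(0.01, 0.03, 0.20, 0.85, 0.90, 0.60)
)

# Median and IQR of acousticness within each genre
med <- tapply(toy$acousticness, toy$Class, median)
iqr <- tapply(toy$acousticness, toy$Class, IQR)

summary_tbl <- data.frame(Class  = names(med),
                          median = as.numeric(med),
                          IQR    = as.numeric(iqr))

# Sort so the genres with the lowest medians come first
print(summary_tbl[order(summary_tbl$median), ])
```

A table like this makes statements such as "the median for Rock is very close to 0" checkable without reading values off the plot.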

For our categorical variables we have:

vars_categoricas = c("mode", "time_signature", "key", "key_Imputado", "Popularity_Imputado", "instrumentalness_Imputado") 

for (var in vars_categoricas) {
  print(
    ggplot(train_imputado, aes(x = .data[[var]], fill = Class)) +
      geom_bar(position = "dodge", color = "black") +
      labs(title = paste("Distribución proporcional de", var, "por Género"),
           x = var,
           y = "Proporción") +
      scale_fill_viridis(discrete = TRUE, option = "H") +
      theme_minimal()
  )
}

I used bar charts, with the classes drawn side by side (position = "dodge"), to see how the records in each categorical variable relate to my target class. According to the forward selection from my regression and the variable importance from my rf model, the categorical variable that contributed the most was instrumentalness_Imputado, and now that I see the plot I can understand why.

From what I can see, among the cases where instrumentalness was imputed there are many records in Pop, Rock and HipHop, and none that belong to the Instrumental class.

As for the other variables, because of the class imbalance every category contains many more Rock cases, probably because Rock accounts for a larger share of the records. I therefore stress again that my predictive model may improve if I address this class imbalance.
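One way to look past the imbalance is to compare proportions within each class instead of raw counts. The sketch below is a base R illustration on a hypothetical ten-row data frame (eight Rock, two Instrumental); prop.table() with margin = 2 rescales each class column to sum to 1.

```r
# Hypothetical toy data: Rock is four times as frequent as Instrumental
toy <- data.frame(
  Class = factor(c(rep("Rock", 8), rep("Instrumental", 2))),
  instrumentalness_Imputado = factor(c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0))
)

# Raw counts: rows are the flag, columns are the class
counts <- table(toy$instrumentalness_Imputado, toy$Class)

# Share of each flag value within each class (every column sums to 1)
within_class <- prop.table(counts, margin = 2)

print(counts)
print(within_class)
```

In this toy example Rock dominates the raw counts simply because it has more rows, while the within-class view shows that half of the Rock tracks and none of the Instrumental tracks carry the imputation flag.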

Multivariate Analysis

In this part I would like to know whether there is any relationship among my numeric variables:

matriz = train_imputado %>%
  select_if(is.numeric)
matriz_corr = cor(matriz)
ggcorrplot::ggcorrplot(matriz_corr, lab = TRUE, type = "upper", lab_size = 3)

My correlation matrix indicates that energy is related to two variables. It has a negative relationship with acousticness, which tells me that as acousticness increases energy tends to decrease, and vice versa.

The opposite holds for loudness: since that correlation is positive, energy and loudness tend to rise or fall together.

Likewise, acousticness and loudness tend to be negatively related.

Multicollinearity may be an issue given these relationships with energy. However, the models I plan to use are robust to it, and if it did cause trouble in a model I would address it then. As the plot shows, most pairs of variables have a very low correlation.
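If multicollinearity ever needs to be addressed, a first step is to list the pairs whose absolute correlation exceeds some cut-off. The following base R sketch does that on a hypothetical five-row data frame; the values and the 0.7 cut-off are assumptions chosen for illustration, not results from the dataset.

```r
# Hypothetical toy data in which energy, loudness and acousticness move together
toy <- data.frame(
  energy       = c(0.2, 0.4, 0.6, 0.8, 0.9),
  loudness     = c(-20, -14, -9, -6, -4),
  acousticness = c(0.9, 0.7, 0.4, 0.2, 0.1),
  tempo        = c(120, 90, 140, 100, 110)
)

corr <- cor(toy)
cutoff <- 0.7  # assumed threshold for a "strong" correlation

# Look only at the upper triangle so each pair is reported once
idx <- which(upper.tri(corr) & abs(corr) > cutoff, arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(corr)[idx[, "row"]],
  var2 = colnames(corr)[idx[, "col"]],
  r    = round(corr[idx], 3),
  row.names = NULL
)

print(high_pairs)
```

On the real data the same lines would be run on matriz_corr; any pair listed is a candidate for dropping one of the two variables if a model turns out to be sensitive to it.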

Conclusions from my EDA:

Based on the variable selection with logistic regression, the rf model used to obtain feature importance, and the visualizations in my bivariate and multivariate analysis, I can say that the numeric features are the ones that contribute the most predictive information to my model, since predictors such as acousticness let me identify clear ranges of values that differentiate the genres. For my categorical variables, I think class balancing is needed to get more reliable information, and it may also help my model.

Train and Test Split

At this point I will hold out 20% of my train set to evaluate with a confusion matrix, using createDataPartition so that the class distribution keeps the same proportions in both parts. However, the models I tried with only the original features of the dataset gave very poor results, with about 30% accuracy and a kappa of only about 20%.

Given these results, and after looking into how to improve my model's performance, I added the "Artist_Name" and "Track_Name" columns back in, in order to work with statistics related to each artist.

First, I converted all the strings to lowercase:

train_imputado$Artist_Name = tolower(train$Artist.Name)
train_imputado$Track_Name = tolower(train$Track.Name)

After that, I applied a light text cleanup: everything that is not a letter or a digit is replaced with a space, runs of more than one space are reduced to a single one, and leading and trailing spaces are removed from each artist and track name.

# Text-cleaning function
clean_text = function(x) {
  x = gsub("[^a-zA-Z0-9]", " ", x)         # Replace everything that is NOT a letter or digit with a space
  x = gsub("\\s+", " ", x)                 # Collapse multiple spaces into one
  x = trimws(x)                            # Remove leading and trailing spaces
  return(x)
}

# Apply to the dataframe columns
train_imputado$Artist_Name = sapply(train_imputado$Artist_Name, clean_text)
train_imputado$Track_Name  = sapply(train_imputado$Track_Name, clean_text)
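A quick way to see what the cleanup does is to run it on a few strings. The sketch below repeats the function so it is self-contained; the example names are only illustrative inputs, not rows quoted from the dataset.

```r
clean_text <- function(x) {
  x <- gsub("[^a-zA-Z0-9]", " ", x)  # anything that is not a letter or digit becomes a space
  x <- gsub("\\s+", " ", x)          # collapse runs of whitespace
  trimws(x)                          # drop leading and trailing spaces
}

# Illustrative inputs with punctuation and stray spaces
examples <- c("  AC/DC ", "Guns N' Roses", "blink-182", "Panic! At The Disco")

# gsub() and trimws() are vectorized, so the function can take the whole vector
cleaned <- clean_text(tolower(examples))
print(cleaned)
# [1] "ac dc"              "guns n roses"       "blink 182"          "panic at the disco"
```

Note that the character class only keeps unaccented Latin letters and digits, so accented or non-Latin characters in artist names are also replaced by spaces; whether that matters depends on how many such names the data contain.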

Feature Engineering

To improve my model, I created variables that add new information about the artists, since an artist's songs generally share the same genre, and grouping these features may give me information that helps with classification:

# stats_all is a new column with the sum of all the stats
train_3 = train_imputado %>%
  mutate(stats_all = Popularity + danceability + energy + loudness + speechiness +
               acousticness + instrumentalness + liveness + valence + tempo)

# Create new per-artist variables from each stat
stats_artistas = train_3 %>%
  group_by(Artist_Name) %>%
  summarise(
    Artist_Popularity = max(Popularity),
    Artist_danceability = median(danceability),
    Artist_energy = median(energy),
    Artist_loudness = median(loudness),
    Artist_speechiness = median(speechiness),
    Artist_acousticness = median(acousticness),
    Artist_instrumentalness = median(instrumentalness),
    Artist_liveness = median(liveness),
    Artist_valence = median(valence),
    Artist_tempo = median(tempo),
    Artist_All = mean(stats_all),
    Artist_duration = mean(duration_in.min.ms),
    Artist_Track = n()
  )

# Join the aggregated statistics back onto the dataset
train_3 <- left_join(train_3, stats_artistas, by = "Artist_Name")

I used each artist's median for most of these columns (the maximum for Popularity, and the mean for stats_all and duration). As for the original columns, when I tried them in the model alongside my new variables they did not really improve it, so to avoid the curse of dimensionality I removed them from my dataset and kept the transformed ones.
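The per-artist aggregation can be illustrated on a handful of rows. This base R sketch uses ave(), which computes a statistic within each group and returns it aligned with the original rows, so it plays the role of the group_by() + summarise() + left_join() sequence above in a single step. The toy artists and energy values are hypothetical.

```r
# Hypothetical toy data: three tracks by one artist, two by another
toy <- data.frame(
  Artist_Name = c("artist a", "artist a", "artist a", "artist b", "artist b"),
  energy      = c(0.9, 0.7, 0.8, 0.2, 0.4)
)

# Median energy of each artist, repeated on every row of that artist
toy$Artist_energy <- ave(toy$energy, toy$Artist_Name, FUN = median)

# Number of tracks per artist, as in Artist_Track = n()
toy$Artist_Track <- ave(toy$energy, toy$Artist_Name, FUN = length)

print(toy)
```

Every track by the same artist receives the same aggregated value, which is why these columns carry artist-level rather than track-level information.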

train_3 = train_3[, -c(1:14)]

I added the categorical variables back:

train_3[, c("mode", "time_signature", "key") ] = train_imputado [,c("mode", "time_signature", "key")]

I applied one-hot encoding to them, since they were not binary categorical variables, and removed the unencoded variables:

dummies = dummyVars(~ key + time_signature, data = train_3)

# Apply the transformation (here to train_3)
key_time_dummy = predict(dummies, newdata = train_3)

key_time_dummy = as.data.frame(key_time_dummy)

train_3$key = NULL
train_3$time_signature = NULL

I then added the correctly encoded variables back to the dataframe:

train_3 = cbind(train_3, key_time_dummy)
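For readers who want to see what the encoding produces, here is a minimal base R sketch with model.matrix(), used as a stand-in for caret's dummyVars(). The four key values are hypothetical. The column names differ slightly between the two: model.matrix() produces names like key5, whereas the dummyVars() output in this report uses a dot, as in key.8.

```r
# Hypothetical toy column with three distinct keys
toy <- data.frame(key = factor(c(1, 5, 11, 5)))

# "- 1" drops the intercept so that every level gets its own 0/1 column
onehot <- model.matrix(~ key - 1, data = toy)

print(colnames(onehot))
print(onehot)
```

Each row has exactly one 1, in the column of its key, and 0 everywhere else.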

At this point, after treating my data in more detail, I redid the split:

set.seed(1966)
trainIndex = caret::createDataPartition(train_3$Class, p = 0.8, list = FALSE)
train_model = train_3[trainIndex, ]
validation = train_3[-trainIndex, ]

We check that the class proportions are preserved:

train_model %>% 
  count(Class) %>% 
  mutate(Prop = n / sum(n))

validation %>% 
  count(Class) %>% 
  mutate(Prop = n / sum(n))
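The idea behind a stratified split can be sketched in base R: sample 80% of the rows within each class separately, so both parts keep roughly the same class shares. This is an illustration of the principle on hypothetical data, not a reimplementation of createDataPartition.

```r
set.seed(1966)

# Hypothetical imbalanced outcome: 50 A, 30 B, 20 C
toy <- data.frame(Class = factor(rep(c("A", "B", "C"), times = c(50, 30, 20))))

# Row numbers grouped by class
rows_by_class <- split(seq_len(nrow(toy)), toy$Class)

# Draw 80% of the rows inside every class
train_rows <- unlist(lapply(rows_by_class, function(rows) {
  rows[sample.int(length(rows), size = floor(0.8 * length(rows)))]
}))

toy_train <- toy[train_rows, , drop = FALSE]
toy_valid <- toy[-train_rows, , drop = FALSE]

# Class shares should match in both parts
print(round(prop.table(table(toy_train$Class)), 2))
print(round(prop.table(table(toy_valid$Class)), 2))
```

Because the sampling happens class by class, the rarest class cannot disappear from the validation part by chance, which matters for a target with this much imbalance.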

Likewise, to try to improve the model's classification, I address the existing imbalance by generating new synthetic data with SMOTE. For this, all my features except the target variable must be numeric; I applied this conversion to both the training set and the validation set:

train_model <- train_model %>%
  mutate(across(where(is.factor), ~ as.numeric(.)))

validation = validation %>%
  mutate(across(where(is.factor), ~ as.numeric(.)))

This converted every factor variable to numeric, so I convert the target variable back to a factor.

train_model$Class = as.factor(train_model$Class)

validation$Class = as.factor(validation$Class)
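One detail worth keeping in mind with this conversion: as.numeric() on a factor returns the internal level codes, not the original labels. The short sketch below shows the difference on a hypothetical factor.

```r
# Hypothetical factor whose labels are numbers stored as text
key <- factor(c("10", "5", "10", "0"))

levels(key)                                # "0" "10" "5" (sorted as text)

codes  <- as.numeric(key)                  # level codes: 2 3 2 1
labels <- as.numeric(as.character(key))    # original values: 10 5 10 0

print(codes)
print(labels)
```

This is why, after the round trip through as.numeric() and as.factor(), the genres appear as '1', '2', ... in the model output further down instead of their names. For a 0/1 flag the codes become 1/2, which keeps the two groups distinct but changes the values themselves.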

Now, with the help of the themis library, I apply SMOTE with an over_ratio of 0.15, removing the Artist_Name and Track_Name variables because, being character columns, they cause conflicts when generating the new data.

library(recipes)  # recipe(), step_rm(), step_normalize(), prep(), bake()
library(themis)   # step_smote()

rec <- recipe(Class ~ ., data = train_model) %>%
  step_rm(Artist_Name, Track_Name) %>%         # Remove unwanted variables
  step_normalize(all_numeric_predictors()) %>% # Normalize numeric predictors only
  step_smote(Class, over_ratio = 0.15)         # Apply SMOTE

# Prepare the recipe
rec_prep <- prep(rec)

# Get the balanced data
train_bal <- bake(rec_prep, new_data = NULL)
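To get a feel for what over_ratio = 0.15 implies, the arithmetic can be sketched in base R. themis describes over_ratio as the ratio of minority-to-majority frequencies, so classes with fewer rows than about 0.15 times the largest class are sampled up to roughly that size and the others are left alone. The class counts below are hypothetical, and the exact rounding themis applies is an assumption here.

```r
# Hypothetical class counts before SMOTE
counts <- c(Rock = 4000, Pop = 2000, Blues = 900, Bollywood = 300)

over_ratio <- 0.15
target <- round(over_ratio * max(counts))  # approximate minimum size per class

after     <- pmax(counts, target)          # classes below the target are raised to it
synthetic <- after - counts                # rows SMOTE would need to generate

print(target)
print(after)
print(synthetic)
```

With a ratio this low only the very smallest classes receive synthetic rows, so the data stay far from fully balanced; a larger over_ratio would balance more aggressively at the cost of more synthetic data.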

We are ready to train our Machine Learning model.

Training the ML Model

To get started I will use a simple decision tree which previously, without any variable transformations, gave me 33% accuracy. Let's see whether it improves with these new features and with the class imbalance treated.

control = trainControl(method = "cv", number = 5)

modelo_rpart = train(
  Class ~ ., 
  data = train_bal, 
  method = "rpart",        
  trControl = control
)

modelo_rpart

## CART 
## 
## 15185 samples
##    34 predictor
##    11 classes: '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 12148, 12148, 12147, 12148, 12149 
## Resampling results across tuning parameters:
## 
##   cp          Accuracy   Kappa     
##   0.03492205  0.3833397  0.22737262
##   0.04650334  0.3489625  0.18431569
##   0.05060134  0.2896326  0.06560006
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was cp = 0.03492205.
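Since both Accuracy and Kappa are reported, it may help to see how Cohen's kappa is computed: it compares the observed agreement with the agreement expected by chance from the row and column totals. The sketch below works this out in base R on a hypothetical two-class confusion matrix.

```r
# Hypothetical confusion matrix: rows = predicted, columns = actual
conf_mat <- matrix(c(40, 10, 5, 45), nrow = 2,
                   dimnames = list(predicted = c("A", "B"),
                                   actual    = c("A", "B")))

n <- sum(conf_mat)

# Observed agreement: share of cases on the diagonal (this is the accuracy)
p_observed <- sum(diag(conf_mat)) / n

# Agreement expected by chance, from the marginal totals
p_expected <- sum(rowSums(conf_mat) * colSums(conf_mat)) / n^2

kappa <- (p_observed - p_expected) / (1 - p_expected)

print(p_observed)  # 0.85
print(kappa)       # 0.7
```

With a dominant class, always predicting that class can give a non-trivial accuracy, but its kappa would be 0, which is why kappa is a useful companion metric here.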

varImp(modelo_rpart)

## rpart variable importance
## 
##   only 20 most important variables shown (out of 34)
## 
##                           Overall
## Artist_acousticness        100.00
## Artist_energy               92.03
## Artist_speechiness          74.21
## Artist_danceability         71.25
## ms_a_minutos                60.58
## Artist_instrumentalness     33.90
## Artist_loudness             32.94
## Artist_valence              21.02
## instrumentalness_Imputado   19.89
## key.8                        0.00
## Artist_tempo                 0.00
## key.10                       0.00
## Artist_Popularity            0.00
## mode                         0.00
## key.6                        0.00
## key.11                       0.00
## Artist_duration              0.00
## time_signature.4             0.00
## key.3                        0.00
## time_signature.3             0.00

There was an improvement in the model's accuracy: the cross-validated accuracy reported above is about 38%, roughly 5 percentage points over the earlier 33%. However, that is still not a good metric for a music classifier, so I decided to train a much more sophisticated model. XGBoost is an ensemble model that improves at each iteration by using weak learners which, combined, give better results, such as the ones I obtained:

modelo_rpartbal <- train(
  Class ~ ., 
  data = train_bal, 
  method = "xgbTree",   
  trControl = control
)

## [18:16:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:16:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:00] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:00] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:02] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:02] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:05] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:05] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:08] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:08] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:15] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:15] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:19] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:19] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:27] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:27] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:32] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:32] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:36] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:36] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:42] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:42] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:47] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:47] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:59] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:17:59] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:05] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:05] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:20] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:20] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:23] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:23] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:33] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:33] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:37] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:37] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:41] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:41] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:51] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:51] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:56] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:18:56] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:02] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:02] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:08] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:08] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:29] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:29] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:36] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:36] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:43] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:43] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:54] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:54] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:19:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:20:02] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:20:02] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:20:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:20:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:20:10] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:20:10] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:20:15] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:20:15] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:20:21] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:20:21] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:20:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:20:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:20:32] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:20:32] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:20:38] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:20:38] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:20:44] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:20:44] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [18:20:51] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.

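The resampling table printed next is long, so it can help to rank the grid rows rather than scan them by eye. The following is a minimal sketch, not part of the original analysis: `top_configs` is a hypothetical helper, and `demo` is a small stand-in data frame built from three rows of the table in this section so the snippet runs without refitting the model. It assumes `modelo_rpartbal` is a `caret::train` object, whose `results` element holds one row per tuning combination and whose `bestTune` element holds the selected one.

```r
# Hypothetical helper (an illustration, not the author's code): return the
# n rows of a caret results table with the highest cross-validated Accuracy.
top_configs <- function(results, n = 5) {
  ordered <- results[order(results$Accuracy, decreasing = TRUE), , drop = FALSE]
  head(ordered, n)
}

# Stand-in data: three rows copied from the table printed in this section.
demo <- data.frame(
  eta              = c(0.3, 0.3, 0.4),
  max_depth        = c(1, 3, 3),
  colsample_bytree = c(0.6, 0.6, 0.6),
  subsample        = c(0.50, 1.00, 1.00),
  nrounds          = c(50, 150, 100),
  Accuracy         = c(0.5710880, 0.6354265, 0.6336498),
  Kappa            = c(0.4918785, 0.5721021, 0.5697952)
)

print(top_configs(demo, n = 2))

# With the fitted object in the session, the same idea would be:
# top_configs(modelo_rpartbal$results)
# modelo_rpartbal$bestTune   # combination selected by caret
```

Base R indexing is used on purpose here: `MASS` is loaded after `dplyr` in this report and masks `select`, so a `dplyr` pipeline would need `dplyr::select` spelled out.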
modelo_rpartbal

## eXtreme Gradient Boosting 
## 
## 15185 samples
##    34 predictor
##    11 classes: '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 12146, 12150, 12149, 12148, 12147 
## Resampling results across tuning parameters:
## 
##   eta  max_depth  colsample_bytree  subsample  nrounds  Accuracy   Kappa    
##   0.3  1          0.6               0.50        50      0.5710880  0.4918785
##   0.3  1          0.6               0.50       100      0.5982846  0.5258033
##   0.3  1          0.6               0.50       150      0.6065831  0.5363377
##   0.3  1          0.6               0.75        50      0.5732613  0.4941774
##   0.3  1          0.6               0.75       100      0.5981529  0.5255423
##   0.3  1          0.6               0.75       150      0.6059249  0.5352657
##   0.3  1          0.6               1.00        50      0.5714181  0.4916213
##   0.3  1          0.6               1.00       100      0.5976922  0.5243359
##   0.3  1          0.6               1.00       150      0.6040150  0.5327019
##   0.3  1          0.8               0.50        50      0.5775411  0.4995215
##   0.3  1          0.8               0.50       100      0.5999973  0.5282136
##   0.3  1          0.8               0.50       150      0.6082303  0.5385046
##   0.3  1          0.8               0.75        50      0.5731292  0.4944462
##   0.3  1          0.8               0.75       100      0.5972314  0.5245617
##   0.3  1          0.8               0.75       150      0.6071758  0.5370898
##   0.3  1          0.8               1.00        50      0.5718791  0.4920704
##   0.3  1          0.8               1.00       100      0.5967049  0.5232454
##   0.3  1          0.8               1.00       150      0.6038178  0.5324466
##   0.3  2          0.6               0.50        50      0.6109960  0.5412750
##   0.3  2          0.6               0.50       100      0.6200843  0.5528667
##   0.3  2          0.6               0.50       150      0.6271966  0.5615788
##   0.3  2          0.6               0.75        50      0.6078345  0.5372286
##   0.3  2          0.6               0.75       100      0.6202816  0.5527755
##   0.3  2          0.6               0.75       150      0.6244966  0.5583437
##   0.3  2          0.6               1.00        50      0.6078998  0.5369951
##   0.3  2          0.6               1.00       100      0.6236407  0.5565602
##   0.3  2          0.6               1.00       150      0.6274599  0.5615034
##   0.3  2          0.8               0.50        50      0.6113913  0.5416281
##   0.3  2          0.8               0.50       100      0.6200844  0.5530957
##   0.3  2          0.8               0.50       150      0.6261423  0.5607426
##   0.3  2          0.8               0.75        50      0.6108624  0.5409579
##   0.3  2          0.8               0.75       100      0.6229810  0.5561722
##   0.3  2          0.8               0.75       150      0.6302912  0.5653044
##   0.3  2          0.8               1.00        50      0.6086244  0.5379499
##   0.3  2          0.8               1.00       100      0.6212035  0.5537871
##   0.3  2          0.8               1.00       150      0.6271956  0.5613211
##   0.3  3          0.6               0.50        50      0.6209385  0.5538369
##   0.3  3          0.6               0.50       100      0.6308177  0.5662357
##   0.3  3          0.6               0.50       150      0.6298300  0.5659651
##   0.3  3          0.6               0.75        50      0.6215319  0.5540152
##   0.3  3          0.6               0.75       100      0.6299615  0.5649355
##   0.3  3          0.6               0.75       150      0.6340448  0.5705851
##   0.3  3          0.6               1.00        50      0.6227183  0.5552537
##   0.3  3          0.6               1.00       100      0.6337144  0.5691840
##   0.3  3          0.6               1.00       150      0.6354265  0.5721021
##   0.3  3          0.8               0.50        50      0.6241006  0.5576946
##   0.3  3          0.8               0.50       100      0.6316734  0.5676970
##   0.3  3          0.8               0.50       150      0.6317403  0.5685439
##   0.3  3          0.8               0.75        50      0.6242970  0.5577376
##   0.3  3          0.8               0.75       100      0.6328592  0.5688448
##   0.3  3          0.8               0.75       150      0.6344387  0.5713775
##   0.3  3          0.8               1.00        50      0.6230485  0.5556557
##   0.3  3          0.8               1.00       100      0.6323332  0.5677076
##   0.3  3          0.8               1.00       150      0.6339800  0.5700496
##   0.4  1          0.6               0.50        50      0.5849171  0.5090199
##   0.4  1          0.6               0.50       100      0.6035542  0.5326857
##   0.4  1          0.6               0.50       150      0.6074405  0.5377803
##   0.4  1          0.6               0.75        50      0.5843903  0.5082207
##   0.4  1          0.6               0.75       100      0.6038825  0.5328409
##   0.4  1          0.6               0.75       150      0.6088225  0.5395479
##   0.4  1          0.6               1.00        50      0.5836002  0.5070939
##   0.4  1          0.6               1.00       100      0.6009188  0.5289143
##   0.4  1          0.6               1.00       150      0.6087556  0.5388583
##   0.4  1          0.8               0.50        50      0.5828753  0.5069491
##   0.4  1          0.8               0.50       100      0.6049374  0.5346175
##   0.4  1          0.8               0.50       150      0.6096139  0.5406860
##   0.4  1          0.8               0.75        50      0.5832043  0.5069773
##   0.4  1          0.8               0.75       100      0.6034223  0.5325228
##   0.4  1          0.8               0.75       150      0.6105349  0.5415360
##   0.4  1          0.8               1.00        50      0.5827435  0.5061459
##   0.4  1          0.8               1.00       100      0.6004592  0.5285045
##   0.4  1          0.8               1.00       150      0.6069787  0.5367016
##   0.4  2          0.6               0.50        50      0.6119172  0.5425821
##   0.4  2          0.6               0.50       100      0.6205458  0.5539362
##   0.4  2          0.6               0.50       150      0.6260110  0.5608870
##   0.4  2          0.6               0.75        50      0.6096779  0.5401212
##   0.4  2          0.6               0.75       100      0.6224544  0.5559712
##   0.4  2          0.6               0.75       150      0.6271962  0.5622410
##   0.4  2          0.6               1.00        50      0.6115229  0.5419697
##   0.4  2          0.6               1.00       100      0.6203472  0.5530650
##   0.4  2          0.6               1.00       150      0.6248257  0.5590956
##   0.4  2          0.8               0.50        50      0.6148160  0.5459660
##   0.4  2          0.8               0.50       100      0.6238387  0.5578211
##   0.4  2          0.8               0.50       150      0.6233763  0.5577827
##   0.4  2          0.8               0.75        50      0.6146833  0.5459793
##   0.4  2          0.8               0.75       100      0.6250208  0.5591788
##   0.4  2          0.8               0.75       150      0.6300929  0.5656615
##   0.4  2          0.8               1.00        50      0.6146839  0.5457757
##   0.4  2          0.8               1.00       100      0.6236407  0.5572481
##   0.4  2          0.8               1.00       150      0.6274607  0.5621221
##   0.4  3          0.6               0.50        50      0.6231134  0.5568324
##   0.4  3          0.6               0.50       100      0.6278543  0.5638194
##   0.4  3          0.6               0.50       150      0.6254836  0.5616601
##   0.4  3          0.6               0.75        50      0.6253523  0.5592914
##   0.4  3          0.6               0.75       100      0.6305541  0.5664279
##   0.4  3          0.6               0.75       150      0.6310148  0.5680112
##   0.4  3          0.6               1.00        50      0.6264722  0.5603348
##   0.4  3          0.6               1.00       100      0.6336498  0.5697952
##   0.4  3          0.6               1.00       150      0.6327931  0.5694720
##   0.4  3          0.8               0.50        50      0.6221904  0.5557721
##   0.4  3          0.8               0.50       100      0.6247585  0.5600265
##   0.4  3          0.8               0.50       150      0.6208728  0.5565306
##   0.4  3          0.8               0.75        50      0.6264709  0.5607313
##   0.4  3          0.8               0.75       100      0.6298289  0.5658978
##   0.4  3          0.8               0.75       150      0.6285115  0.5650916
##   0.4  3          0.8               1.00        50      0.6252860  0.5590636
##   0.4  3          0.8               1.00       100      0.6316756  0.5673161
##   0.4  3          0.8               1.00       150      0.6321360  0.5687191
## 
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning
##  parameter 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, max_depth = 3, eta
##  = 0.3, gamma = 0, colsample_bytree = 0.6, min_child_weight = 1 and subsample
##  = 1.

varImp(modelo_rpartbal)

## xgbTree variable importance
## 
##   only 20 most important variables shown (out of 34)
## 
##                            Overall
## ms_a_minutos              100.0000
## Artist_acousticness        70.9444
## Artist_speechiness         54.4921
## Artist_danceability        48.6729
## Artist_energy              44.7824
## Artist_valence             42.8530
## Artist_Popularity          39.9109
## Artist_duration            33.1109
## Artist_loudness            29.8282
## Artist_instrumentalness    29.3303
## Artist_Track               28.4430
## Artist_liveness            15.9407
## instrumentalness_Imputado  13.9839
## Artist_All                 10.4182
## Artist_tempo                9.9607
## stats_all                   5.0719
## mode                        2.5354
## Popularity_Imputado         0.9243
## time_signature.3            0.9049
## key.1                       0.3740

This is already a very notable improvement, about 20% better than a simple decision tree: the best model, with the best hyperparameter configuration found in the grid, reaches roughly 63% accuracy.

Is this enough for a multiclass classifier? While looking into whether the accuracy could be pushed further, I concluded that the two columns I had dropped (Artist_Name and Track_Name) may carry important information. They are tied directly to the artist and, as mentioned earlier, many artists tend to make songs aimed at a specific genre. Because they are text columns, however, they need special treatment before a model can work with them.

I tried using them as categorical variables, which gave about 9,000 artist names and about 14,000 track names as factor levels. When I tried to train an ML model such as XGBoost on them, it crashed because it could not handle that many levels, which is understandable given how many there are. My solution was to use a model that can handle a large number of levels efficiently.
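To make the size of the problem concrete, here is a small base-R sketch of how one-hot encoding grows with the number of factor levels. The data frame `toy` and its columns are hypothetical, not the project's data; the point is only that each distinct level becomes its own column.

```r
# Toy illustration (hypothetical data, not the project's dataset):
# one-hot encoding creates one column per distinct factor level.
set.seed(42)
n <- 200
toy <- data.frame(
  artist = factor(sample(paste0("artist_", 1:50), n, replace = TRUE)),
  track  = factor(paste0("track_", seq_len(n)))  # every track name is unique
)

# Full one-hot encoding: "- 1" drops the intercept so every level gets a column
onehot_artist <- model.matrix(~ artist - 1, data = toy)
onehot_track  <- model.matrix(~ track - 1, data = toy)

n_cols <- ncol(onehot_artist) + ncol(onehot_track)
cat("Distinct artists:", nlevels(toy$artist), "\n")
cat("Distinct tracks: ", nlevels(toy$track), "\n")
cat("One-hot columns: ", n_cols, "for", n, "rows\n")
```

With roughly 9,000 artists and 14,000 track names, the same expansion would produce on the order of 23,000 mostly empty columns.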

CatBoost is a gradient boosting model similar to XGBoost or LightGBM, but it encodes categorical variables with "permutation encoding" (Ordered Target Encoding).

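Ordered target encoding is CatBoost's technique for turning categorical variables into numbers without causing target leakage. Instead of encoding a category with the target average over the whole dataset, CatBoost shuffles the rows with a random permutation and encodes each row using only the target values of the rows that come before it in that permutation. Since a row's own label never enters its encoding, the encoded feature is less prone to overfitting, which matters most for categories with very few rows.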
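The idea can be sketched in a few lines of base R. This is a simplified illustration, not CatBoost's actual implementation: it assumes a binary target, a single permutation, and a fixed prior with weight `a`; the function name `ordered_target_stat` and the toy vectors are hypothetical.

```r
# Simplified ordered target statistic for one categorical column and a
# binary target (assumptions: one permutation, fixed prior, prior weight a).
ordered_target_stat <- function(category, target, prior = mean(target), a = 1,
                                perm = sample(length(category))) {
  encoded <- numeric(length(category))
  sums   <- list()  # running sum of targets seen so far, per category
  counts <- list()  # running number of rows seen so far, per category
  for (i in perm) {
    key <- as.character(category[i])
    s <- if (is.null(sums[[key]])) 0 else sums[[key]]
    k <- if (is.null(counts[[key]])) 0 else counts[[key]]
    # Encode using only rows that came earlier in the permutation
    encoded[i] <- (s + a * prior) / (k + a)
    # Only afterwards add the current row's own target to the running totals
    sums[[key]]   <- s + target[i]
    counts[[key]] <- k + 1
  }
  encoded
}

artist  <- c("A", "B", "A", "A", "B")
is_rock <- c(1, 0, 1, 0, 0)
# Fixed permutation 1:5 so the result is reproducible
encoded_demo <- ordered_target_stat(artist, is_rock, prior = 0.5, perm = 1:5)
print(encoded_demo)
```

The first row of each category receives only the prior (0.5 here), and later rows of the same category move toward the average of the labels seen before them, so no row is ever encoded with its own label.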

Since the variables mentioned above have so many categories, we can use this model to try to get better results.

So I converted Artist_Name and Track_Name to factors in both my training and validation sets:

train_model$Artist_Name = as.factor(train_model$Artist_Name)
train_model$Track_Name = as.factor(train_model$Track_Name)

validation$Artist_Name = as.factor(validation$Artist_Name)
validation$Track_Name = as.factor(validation$Track_Name)

To use CatBoost in R I had to install RTools and then install the package from GitHub. The algorithm takes a special data structure optimized for the model, called a pool, in which you specify the predictor variables and the target variable.

The hyperparameters let us choose the type of classification (the loss function), the number of trees, their depth, the regularization used to avoid overfitting, the learning rate, and class balancing.
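As an illustration of the class-balancing option: to my understanding, CatBoost's "Balanced" setting weights each class by the size of the largest class divided by the size of that class, so rare genres count more in the loss. A minimal base-R sketch of that formula, using a hypothetical label vector rather than the real `Class` column:

```r
# Hypothetical, deliberately imbalanced labels (not the project's data)
labels <- factor(c(rep("Rock", 6), rep("Pop", 3), rep("Blues", 1)))

# Number of rows per class
class_counts <- table(labels)

# Assumed "Balanced" rule: weight = largest class count / this class's count
balanced_weights <- max(class_counts) / class_counts
print(balanced_weights)
```

In this toy case the largest class (Rock) gets weight 1, Pop gets 2, and the single Blues row gets 6.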

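# Positions of the categorical columns (note the exact name "Artist_Name").
# For a data.frame, CatBoost takes column types from the factors themselves.
cat_features <- which(names(train_model) %in% c("Artist_Name", "Track_Name"))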

train_pool <- catboost.load_pool(
  data = train_model[, -which(names(train_model) == "Class")],
  label = train_model$Class,
  cat_features = cat_features
)

## Parameter 'cat_features' is meaningless because column types are taken from data.frame.
## Please, convert categorical columns to factors manually.

# Train the model
model <- catboost.train(train_pool, params = list(
  loss_function = "MultiClass",
  iterations = 1000,
  learning_rate = 0.08,
  depth = 6,
  l2_leaf_reg = 3,
  auto_class_weights = "Balanced"
))

## 0:   learn: 2.2099696    total: 387ms    remaining: 6m 26s
## 1:   learn: 2.0492546    total: 554ms    remaining: 4m 36s
## 2:   learn: 1.9415593    total: 670ms    remaining: 3m 42s
## 3:   learn: 1.8500276    total: 786ms    remaining: 3m 15s
## 4:   learn: 1.7664982    total: 901ms    remaining: 2m 59s
## 5:   learn: 1.6968322    total: 1.02s    remaining: 2m 48s
## 6:   learn: 1.6386215    total: 1.14s    remaining: 2m 41s
## 7:   learn: 1.5844088    total: 1.25s    remaining: 2m 35s
## 8:   learn: 1.5351047    total: 1.37s    remaining: 2m 30s
## 9:   learn: 1.4907118    total: 1.48s    remaining: 2m 26s
## 10:  learn: 1.4500558    total: 1.6s remaining: 2m 23s
## 11:  learn: 1.4115988    total: 1.71s    remaining: 2m 21s
## 12:  learn: 1.3809111    total: 1.83s    remaining: 2m 18s
## 13:  learn: 1.3513899    total: 1.95s    remaining: 2m 17s
## 14:  learn: 1.3260779    total: 2.06s    remaining: 2m 15s
## 15:  learn: 1.2998488    total: 2.19s    remaining: 2m 14s
## 16:  learn: 1.2783000    total: 2.3s remaining: 2m 13s
## 17:  learn: 1.2534995    total: 2.43s    remaining: 2m 12s
## 18:  learn: 1.2312824    total: 2.56s    remaining: 2m 12s
## 19:  learn: 1.2055693    total: 2.68s    remaining: 2m 11s
## 20:  learn: 1.1922091    total: 2.79s    remaining: 2m 10s
## 21:  learn: 1.1749521    total: 2.9s remaining: 2m 8s
## 22:  learn: 1.1577931    total: 3.02s    remaining: 2m 8s
## 23:  learn: 1.1384598    total: 3.14s    remaining: 2m 7s
## 24:  learn: 1.1208938    total: 3.26s    remaining: 2m 7s
## 25:  learn: 1.1115187    total: 3.36s    remaining: 2m 5s
## 26:  learn: 1.0991765    total: 3.49s    remaining: 2m 5s
## 27:  learn: 1.0917041    total: 3.59s    remaining: 2m 4s
## 28:  learn: 1.0815798    total: 3.72s    remaining: 2m 4s
## 29:  learn: 1.0705883    total: 3.85s    remaining: 2m 4s
## 30:  learn: 1.0573619    total: 3.98s    remaining: 2m 4s
## 31:  learn: 1.0483515    total: 4.11s    remaining: 2m 4s
## 32:  learn: 1.0391649    total: 4.24s    remaining: 2m 4s
## 33:  learn: 1.0273764    total: 4.39s    remaining: 2m 4s
## 34:  learn: 1.0203720    total: 4.53s    remaining: 2m 5s
## 35:  learn: 1.0140255    total: 4.68s    remaining: 2m 5s
## 36:  learn: 1.0071318    total: 4.78s    remaining: 2m 4s
## 37:  learn: 0.9984860    total: 4.92s    remaining: 2m 4s
## 38:  learn: 0.9896761    total: 5.03s    remaining: 2m 3s
## 39:  learn: 0.9834020    total: 5.17s    remaining: 2m 3s
## 40:  learn: 0.9778709    total: 5.3s remaining: 2m 4s
## 41:  learn: 0.9720606    total: 5.42s    remaining: 2m 3s
## 42:  learn: 0.9680123    total: 5.53s    remaining: 2m 3s
## 43:  learn: 0.9619897    total: 5.65s    remaining: 2m 2s
## 44:  learn: 0.9570987    total: 5.77s    remaining: 2m 2s
## 45:  learn: 0.9527211    total: 5.87s    remaining: 2m 1s
## 46:  learn: 0.9492757    total: 6s   remaining: 2m 1s
## 47:  learn: 0.9448396    total: 6.14s    remaining: 2m 1s
## 48:  learn: 0.9417461    total: 6.25s    remaining: 2m 1s
## 49:  learn: 0.9379078    total: 6.38s    remaining: 2m 1s
## 50:  learn: 0.9327344    total: 6.51s    remaining: 2m 1s
## 51:  learn: 0.9274615    total: 6.65s    remaining: 2m 1s
## 52:  learn: 0.9238794    total: 6.79s    remaining: 2m 1s
## 53:  learn: 0.9207652    total: 6.93s    remaining: 2m 1s
## 54:  learn: 0.9164749    total: 7.04s    remaining: 2m 1s
## 55:  learn: 0.9127811    total: 7.18s    remaining: 2m 1s
## 56:  learn: 0.9092819    total: 7.32s    remaining: 2m 1s
## 57:  learn: 0.9065722    total: 7.42s    remaining: 2m
## 58:  learn: 0.9007538    total: 7.55s    remaining: 2m
## 59:  learn: 0.8948957    total: 7.69s    remaining: 2m
## 60:  learn: 0.8922066    total: 7.83s    remaining: 2m
## 61:  learn: 0.8878606    total: 7.97s    remaining: 2m
## 62:  learn: 0.8849701    total: 8.12s    remaining: 2m
## 63:  learn: 0.8843041    total: 8.15s    remaining: 1m 59s
## 64:  learn: 0.8823797    total: 8.29s    remaining: 1m 59s
## 65:  learn: 0.8803427    total: 8.41s    remaining: 1m 59s
## 66:  learn: 0.8789356    total: 8.55s    remaining: 1m 59s
## 67:  learn: 0.8756081    total: 8.69s    remaining: 1m 59s
## 68:  learn: 0.8745843    total: 8.8s remaining: 1m 58s
## 69:  learn: 0.8721304    total: 8.94s    remaining: 1m 58s
## 70:  learn: 0.8685838    total: 9.07s    remaining: 1m 58s
## 71:  learn: 0.8661296    total: 9.21s    remaining: 1m 58s
## 72:  learn: 0.8639121    total: 9.35s    remaining: 1m 58s
## 73:  learn: 0.8595276    total: 9.51s    remaining: 1m 59s
## 74:  learn: 0.8584018    total: 9.64s    remaining: 1m 58s
## 75:  learn: 0.8562095    total: 9.78s    remaining: 1m 58s
## 76:  learn: 0.8545074    total: 9.91s    remaining: 1m 58s
## 77:  learn: 0.8514025    total: 10.1s    remaining: 1m 58s
## 78:  learn: 0.8480649    total: 10.2s    remaining: 1m 58s
## 79:  learn: 0.8471918    total: 10.3s    remaining: 1m 58s
## 80:  learn: 0.8466298    total: 10.4s    remaining: 1m 58s
## 81:  learn: 0.8446325    total: 10.5s    remaining: 1m 58s
## 82:  learn: 0.8430023    total: 10.7s    remaining: 1m 58s
## 83:  learn: 0.8415609    total: 10.9s    remaining: 1m 58s
## 84:  learn: 0.8376482    total: 11s  remaining: 1m 58s
## 85:  learn: 0.8347887    total: 11.2s    remaining: 1m 58s
## 86:  learn: 0.8332347    total: 11.4s    remaining: 1m 59s
## 87:  learn: 0.8319242    total: 11.5s    remaining: 1m 59s
## 88:  learn: 0.8280916    total: 11.7s    remaining: 1m 59s
## 89:  learn: 0.8266831    total: 11.9s    remaining: 2m
## 90:  learn: 0.8243845    total: 12s  remaining: 2m
## 91:  learn: 0.8229909    total: 12.2s    remaining: 2m
## 92:  learn: 0.8205740    total: 12.3s    remaining: 2m
## 93:  learn: 0.8194618    total: 12.5s    remaining: 2m
## 94:  learn: 0.8176139    total: 12.6s    remaining: 2m
## 95:  learn: 0.8151647    total: 12.8s    remaining: 2m
## 96:  learn: 0.8130132    total: 12.9s    remaining: 2m
## 97:  learn: 0.8122278    total: 13.1s    remaining: 2m
## 98:  learn: 0.8110537    total: 13.2s    remaining: 2m
## 99:  learn: 0.8095275    total: 13.4s    remaining: 2m
## 100: learn: 0.8072697    total: 13.5s    remaining: 2m
## 101: learn: 0.8053780    total: 13.7s    remaining: 2m
## 102: learn: 0.8044298    total: 13.8s    remaining: 1m 59s
## 103: learn: 0.8027488    total: 13.9s    remaining: 1m 59s
## 104: learn: 0.8016519    total: 14s  remaining: 1m 59s
## 105: learn: 0.8010504    total: 14.1s    remaining: 1m 59s
## 106: learn: 0.8000304    total: 14.2s    remaining: 1m 58s
## 107: learn: 0.7985388    total: 14.4s    remaining: 1m 58s
## 108: learn: 0.7960797    total: 14.5s    remaining: 1m 58s
## 109: learn: 0.7944077    total: 14.7s    remaining: 1m 58s
## 110: learn: 0.7920326    total: 14.8s    remaining: 1m 58s
## 111: learn: 0.7911748    total: 14.9s    remaining: 1m 58s
## 112: learn: 0.7894253    total: 15s  remaining: 1m 58s
## 113: learn: 0.7879818    total: 15.2s    remaining: 1m 57s
## 114: learn: 0.7866462    total: 15.3s    remaining: 1m 57s
## 115: learn: 0.7850257    total: 15.4s    remaining: 1m 57s
## 116: learn: 0.7830126    total: 15.6s    remaining: 1m 57s
## 117: learn: 0.7819708    total: 15.7s    remaining: 1m 57s
## 118: learn: 0.7803928    total: 15.9s    remaining: 1m 57s
## 119: learn: 0.7786627    total: 16s  remaining: 1m 57s
## 120: learn: 0.7778302    total: 16.1s    remaining: 1m 57s
## 121: learn: 0.7742091    total: 16.3s    remaining: 1m 57s
## 122: learn: 0.7721518    total: 16.4s    remaining: 1m 57s
## 123: learn: 0.7701657    total: 16.5s    remaining: 1m 56s
## 124: learn: 0.7678475    total: 16.7s    remaining: 1m 56s
## 125: learn: 0.7668304    total: 16.8s    remaining: 1m 56s
## 126: learn: 0.7655176    total: 16.9s    remaining: 1m 56s
## 127: learn: 0.7636300    total: 17.1s    remaining: 1m 56s
## 128: learn: 0.7627023    total: 17.2s    remaining: 1m 56s
## 129: learn: 0.7605393    total: 17.3s    remaining: 1m 55s
## 130: learn: 0.7584395    total: 17.5s    remaining: 1m 55s
## 131: learn: 0.7566891    total: 17.6s    remaining: 1m 55s
## 132: learn: 0.7552141    total: 17.7s    remaining: 1m 55s
## 133: learn: 0.7536880    total: 17.9s    remaining: 1m 55s
## 134: learn: 0.7525446    total: 18s  remaining: 1m 55s
## 135: learn: 0.7507440    total: 18.1s    remaining: 1m 55s
## 136: learn: 0.7492426    total: 18.3s    remaining: 1m 54s
## 137: learn: 0.7472265    total: 18.4s    remaining: 1m 54s
## 138: learn: 0.7454883    total: 18.5s    remaining: 1m 54s
## 139: learn: 0.7436938    total: 18.6s    remaining: 1m 54s
## 140: learn: 0.7430739    total: 18.8s    remaining: 1m 54s
## 141: learn: 0.7408445    total: 18.9s    remaining: 1m 54s
## 142: learn: 0.7391420    total: 19s  remaining: 1m 54s
## 143: learn: 0.7383575    total: 19.2s    remaining: 1m 54s
## 144: learn: 0.7365381    total: 19.3s    remaining: 1m 53s
## 145: learn: 0.7351408    total: 19.5s    remaining: 1m 53s
## 146: learn: 0.7337345    total: 19.6s    remaining: 1m 53s
## 147: learn: 0.7322278    total: 19.7s    remaining: 1m 53s
## 148: learn: 0.7307067    total: 19.9s    remaining: 1m 53s
## 149: learn: 0.7295771    total: 20s  remaining: 1m 53s
## 150: learn: 0.7280283    total: 20.1s    remaining: 1m 53s
## 151: learn: 0.7267943    total: 20.3s    remaining: 1m 53s
## 152: learn: 0.7251788    total: 20.4s    remaining: 1m 53s
## 153: learn: 0.7239727    total: 20.6s    remaining: 1m 52s
## 154: learn: 0.7225532    total: 20.7s    remaining: 1m 52s
## 155: learn: 0.7211589    total: 20.8s    remaining: 1m 52s
## 156: learn: 0.7193918    total: 20.9s    remaining: 1m 52s
## 157: learn: 0.7183219    total: 21.1s    remaining: 1m 52s
## 158: learn: 0.7166267    total: 21.2s    remaining: 1m 52s
## 159: learn: 0.7151528    total: 21.3s    remaining: 1m 52s
## 160: learn: 0.7143245    total: 21.5s    remaining: 1m 51s
## 161: learn: 0.7129610    total: 21.6s    remaining: 1m 51s
## 162: learn: 0.7117593    total: 21.7s    remaining: 1m 51s
## 163: learn: 0.7103824    total: 21.9s    remaining: 1m 51s
## 164: learn: 0.7093310    total: 22s  remaining: 1m 51s
## 165: learn: 0.7078785    total: 22.1s    remaining: 1m 51s
## 166: learn: 0.7067810    total: 22.2s    remaining: 1m 50s
## 167: learn: 0.7055823    total: 22.4s    remaining: 1m 50s
## 168: learn: 0.7043274    total: 22.5s    remaining: 1m 50s
## 169: learn: 0.7032077    total: 22.6s    remaining: 1m 50s
## 170: learn: 0.7021343    total: 22.8s    remaining: 1m 50s
## 171: learn: 0.7010119    total: 22.9s    remaining: 1m 50s
## 172: learn: 0.6993644    total: 23s  remaining: 1m 50s
## 173: learn: 0.6982559    total: 23.1s    remaining: 1m 49s
## 174: learn: 0.6971406    total: 23.3s    remaining: 1m 49s
## 175: learn: 0.6959348    total: 23.4s    remaining: 1m 49s
## 176: learn: 0.6951895    total: 23.5s    remaining: 1m 49s
## 177: learn: 0.6942737    total: 23.7s    remaining: 1m 49s
## 178: learn: 0.6933520    total: 23.8s    remaining: 1m 49s
## 179: learn: 0.6922192    total: 23.9s    remaining: 1m 49s
## 180: learn: 0.6916724    total: 24.1s    remaining: 1m 49s
## 181: learn: 0.6903581    total: 24.2s    remaining: 1m 48s
## 182: learn: 0.6898049    total: 24.4s    remaining: 1m 48s
## 183: learn: 0.6888061    total: 24.5s    remaining: 1m 48s
## 184: learn: 0.6880732    total: 24.6s    remaining: 1m 48s
## 185: learn: 0.6868790    total: 24.8s    remaining: 1m 48s
## 186: learn: 0.6858791    total: 24.9s    remaining: 1m 48s
## 187: learn: 0.6854047    total: 25s  remaining: 1m 48s
## 188: learn: 0.6844520    total: 25.1s    remaining: 1m 47s
## 189: learn: 0.6831692    total: 25.3s    remaining: 1m 47s
## 190: learn: 0.6822121    total: 25.4s    remaining: 1m 47s
## 191: learn: 0.6814090    total: 25.5s    remaining: 1m 47s
## 192: learn: 0.6803993    total: 25.7s    remaining: 1m 47s
## 193: learn: 0.6796758    total: 25.8s    remaining: 1m 47s
## 194: learn: 0.6786494    total: 26s  remaining: 1m 47s
## 195: learn: 0.6779248    total: 26.1s    remaining: 1m 46s
## 196: learn: 0.6770297    total: 26.2s    remaining: 1m 46s
## 197: learn: 0.6759061    total: 26.4s    remaining: 1m 46s
## 198: learn: 0.6748084    total: 26.5s    remaining: 1m 46s
## 199: learn: 0.6738307    total: 26.6s    remaining: 1m 46s
## 200: learn: 0.6731131    total: 26.8s    remaining: 1m 46s
## 201: learn: 0.6718386    total: 26.9s    remaining: 1m 46s
## 202: learn: 0.6709151    total: 27s  remaining: 1m 46s
## 203: learn: 0.6698260    total: 27.2s    remaining: 1m 46s
## 204: learn: 0.6692046    total: 27.3s    remaining: 1m 45s
## 205: learn: 0.6678181    total: 27.5s    remaining: 1m 45s
## 206: learn: 0.6665797    total: 27.6s    remaining: 1m 45s
## 207: learn: 0.6657187    total: 27.7s    remaining: 1m 45s
## 208: learn: 0.6649139    total: 27.8s    remaining: 1m 45s
## 209: learn: 0.6639386    total: 28s  remaining: 1m 45s
## 210: learn: 0.6633118    total: 28.1s    remaining: 1m 45s
## 211: learn: 0.6625372    total: 28.2s    remaining: 1m 44s
## 212: learn: 0.6614434    total: 28.4s    remaining: 1m 44s
## 213: learn: 0.6611100    total: 28.5s    remaining: 1m 44s
## 214: learn: 0.6604929    total: 28.6s    remaining: 1m 44s
## 215: learn: 0.6594653    total: 28.8s    remaining: 1m 44s
## 216: learn: 0.6589571    total: 28.9s    remaining: 1m 44s
## 217: learn: 0.6577169    total: 29s  remaining: 1m 44s
## 218: learn: 0.6567654    total: 29.1s    remaining: 1m 43s
## 219: learn: 0.6562455    total: 29.3s    remaining: 1m 43s
## 220: learn: 0.6557932    total: 29.4s    remaining: 1m 43s
## 221: learn: 0.6550931    total: 29.6s    remaining: 1m 43s
## 222: learn: 0.6545229    total: 29.7s    remaining: 1m 43s
## 223: learn: 0.6538130    total: 29.8s    remaining: 1m 43s
## 224: learn: 0.6526965    total: 30s  remaining: 1m 43s
## 225: learn: 0.6519921    total: 30.1s    remaining: 1m 43s
## 226: learn: 0.6512383    total: 30.2s    remaining: 1m 42s
## 227: learn: 0.6505887    total: 30.4s    remaining: 1m 42s
## 228: learn: 0.6496971    total: 30.5s    remaining: 1m 42s
## 229: learn: 0.6492416    total: 30.6s    remaining: 1m 42s
## 230: learn: 0.6484014    total: 30.8s    remaining: 1m 42s
## 231: learn: 0.6478853    total: 30.9s    remaining: 1m 42s
## 232: learn: 0.6475325    total: 31s  remaining: 1m 42s
## 233: learn: 0.6470155    total: 31.2s    remaining: 1m 42s
## 234: learn: 0.6462376    total: 31.3s    remaining: 1m 41s
## 235: learn: 0.6455662    total: 31.4s    remaining: 1m 41s
## 236: learn: 0.6446417    total: 31.6s    remaining: 1m 41s
## 237: learn: 0.6433135    total: 31.7s    remaining: 1m 41s
## 238: learn: 0.6425186    total: 31.8s    remaining: 1m 41s
## 239: learn: 0.6417459    total: 32s  remaining: 1m 41s
## 240: learn: 0.6408522    total: 32.1s    remaining: 1m 41s
## 241: learn: 0.6404158    total: 32.2s    remaining: 1m 40s
## 242: learn: 0.6397358    total: 32.4s    remaining: 1m 40s
## 243: learn: 0.6389863    total: 32.5s    remaining: 1m 40s
## 244: learn: 0.6382546    total: 32.7s    remaining: 1m 40s
## 245: learn: 0.6371548    total: 32.8s    remaining: 1m 40s
## 246: learn: 0.6365835    total: 32.9s    remaining: 1m 40s
## 247: learn: 0.6356961    total: 33s  remaining: 1m 40s
## 248: learn: 0.6349740    total: 33.2s    remaining: 1m 40s
## 249: learn: 0.6346616    total: 33.3s    remaining: 1m 39s
## 250: learn: 0.6340268    total: 33.4s    remaining: 1m 39s
## 251: learn: 0.6328681    total: 33.6s    remaining: 1m 39s
## 252: learn: 0.6319586    total: 33.7s    remaining: 1m 39s
## 253: learn: 0.6310099    total: 33.8s    remaining: 1m 39s
## 254: learn: 0.6302735    total: 34s  remaining: 1m 39s
## 255: learn: 0.6291633    total: 34.1s    remaining: 1m 39s
## 256: learn: 0.6285312    total: 34.3s    remaining: 1m 39s
## 257: learn: 0.6279374    total: 34.4s    remaining: 1m 38s
## 258: learn: 0.6271964    total: 34.5s    remaining: 1m 38s
## 259: learn: 0.6263840    total: 34.7s    remaining: 1m 38s
## 260: learn: 0.6259974    total: 34.8s    remaining: 1m 38s
## 261: learn: 0.6254706    total: 35s  remaining: 1m 38s
## 262: learn: 0.6252719    total: 35.1s    remaining: 1m 38s
## 263: learn: 0.6247066    total: 35.2s    remaining: 1m 38s
## 264: learn: 0.6241990    total: 35.4s    remaining: 1m 38s
## 265: learn: 0.6238385    total: 35.5s    remaining: 1m 37s
## 266: learn: 0.6231834    total: 35.6s    remaining: 1m 37s
## 267: learn: 0.6227287    total: 35.8s    remaining: 1m 37s
## 268: learn: 0.6219995    total: 35.9s    remaining: 1m 37s
## 269: learn: 0.6212950    total: 36.1s    remaining: 1m 37s
## 270: learn: 0.6206040    total: 36.2s    remaining: 1m 37s
## 271: learn: 0.6199401    total: 36.4s    remaining: 1m 37s
## 272: learn: 0.6193797    total: 36.5s    remaining: 1m 37s
## 273: learn: 0.6183705    total: 36.7s    remaining: 1m 37s
## 274: learn: 0.6178201    total: 36.8s    remaining: 1m 37s
## 275: learn: 0.6172340    total: 37s  remaining: 1m 36s
## 276: learn: 0.6163998    total: 37.1s    remaining: 1m 36s
## 277: learn: 0.6157967    total: 37.2s    remaining: 1m 36s
## 278: learn: 0.6151002    total: 37.4s    remaining: 1m 36s
## 279: learn: 0.6142591    total: 37.5s    remaining: 1m 36s
## 280: learn: 0.6137550    total: 37.6s    remaining: 1m 36s
## 281: learn: 0.6130546    total: 37.8s    remaining: 1m 36s
## 282: learn: 0.6126271    total: 37.9s    remaining: 1m 36s
## 283: learn: 0.6121424    total: 38.1s    remaining: 1m 36s
## 284: learn: 0.6117461    total: 38.2s    remaining: 1m 35s
## 285: learn: 0.6110983    total: 38.4s    remaining: 1m 35s
## 286: learn: 0.6107255    total: 38.5s    remaining: 1m 35s
## 287: learn: 0.6104858    total: 38.7s    remaining: 1m 35s
## 288: learn: 0.6101687    total: 38.8s    remaining: 1m 35s
## 289: learn: 0.6098536    total: 38.9s    remaining: 1m 35s
## 290: learn: 0.6094002    total: 39.1s    remaining: 1m 35s
## 291: learn: 0.6089946    total: 39.2s    remaining: 1m 35s
## 292: learn: 0.6082037    total: 39.3s    remaining: 1m 34s
## 293: learn: 0.6076164    total: 39.5s    remaining: 1m 34s
## 294: learn: 0.6071068    total: 39.6s    remaining: 1m 34s
## 295: learn: 0.6067015    total: 39.8s    remaining: 1m 34s
## 296: learn: 0.6063630    total: 39.9s    remaining: 1m 34s
## 297: learn: 0.6058219    total: 40s  remaining: 1m 34s
## 298: learn: 0.6054050    total: 40.1s    remaining: 1m 34s
## 299: learn: 0.6045904    total: 40.3s    remaining: 1m 33s
## 300: learn: 0.6038807    total: 40.4s    remaining: 1m 33s
## 301: learn: 0.6034487    total: 40.5s    remaining: 1m 33s
## 302: learn: 0.6029783    total: 40.7s    remaining: 1m 33s
## 303: learn: 0.6022288    total: 40.8s    remaining: 1m 33s
## 304: learn: 0.6017385    total: 41s  remaining: 1m 33s
## 305: learn: 0.6013139    total: 41.1s    remaining: 1m 33s
## 306: learn: 0.6010074    total: 41.3s    remaining: 1m 33s
## 307: learn: 0.6003660    total: 41.4s    remaining: 1m 32s
## 308: learn: 0.5996797    total: 41.5s    remaining: 1m 32s
## 309: learn: 0.5991938    total: 41.7s    remaining: 1m 32s
## 310: learn: 0.5985086    total: 41.8s    remaining: 1m 32s
## 311: learn: 0.5976674    total: 41.9s    remaining: 1m 32s
## 312: learn: 0.5972729    total: 42s  remaining: 1m 32s
## 313: learn: 0.5968390    total: 42.2s    remaining: 1m 32s
## 314: learn: 0.5964201    total: 42.3s    remaining: 1m 32s
## 315: learn: 0.5960713    total: 42.4s    remaining: 1m 31s
## 316: learn: 0.5955064    total: 42.6s    remaining: 1m 31s
## 317: learn: 0.5950200    total: 42.7s    remaining: 1m 31s
## 318: learn: 0.5944774    total: 42.8s    remaining: 1m 31s
## 319: learn: 0.5939950    total: 43s  remaining: 1m 31s
## 320: learn: 0.5934235    total: 43.1s    remaining: 1m 31s
## 321: learn: 0.5928321    total: 43.2s    remaining: 1m 31s
## 322: learn: 0.5919034    total: 43.4s    remaining: 1m 30s
## 323: learn: 0.5914570    total: 43.5s    remaining: 1m 30s
## 324: learn: 0.5904164    total: 43.6s    remaining: 1m 30s
## 325: learn: 0.5899080    total: 43.7s    remaining: 1m 30s
## 326: learn: 0.5894501    total: 43.9s    remaining: 1m 30s
## 327: learn: 0.5892061    total: 44s  remaining: 1m 30s
## 328: learn: 0.5889591    total: 44.1s    remaining: 1m 30s
## 329: learn: 0.5883484    total: 44.3s    remaining: 1m 29s
## 330: learn: 0.5880968    total: 44.4s    remaining: 1m 29s
## 331: learn: 0.5875713    total: 44.6s    remaining: 1m 29s
## 332: learn: 0.5868890    total: 44.7s    remaining: 1m 29s
## 333: learn: 0.5864555    total: 44.8s    remaining: 1m 29s
## 334: learn: 0.5860139    total: 45s  remaining: 1m 29s
## 335: learn: 0.5857917    total: 45.1s    remaining: 1m 29s
## 336: learn: 0.5854097    total: 45.2s    remaining: 1m 28s
## 337: learn: 0.5849691    total: 45.4s    remaining: 1m 28s
## 338: learn: 0.5844450    total: 45.5s    remaining: 1m 28s
## 339: learn: 0.5840555    total: 45.6s    remaining: 1m 28s
## 340: learn: 0.5836537    total: 45.8s    remaining: 1m 28s
## 341: learn: 0.5833860    total: 45.9s    remaining: 1m 28s
## 342: learn: 0.5829571    total: 46.1s    remaining: 1m 28s
## 343: learn: 0.5821984    total: 46.2s    remaining: 1m 28s
## 344: learn: 0.5816755    total: 46.3s    remaining: 1m 27s
## 345: learn: 0.5814539    total: 46.5s    remaining: 1m 27s
## 346: learn: 0.5812507    total: 46.6s    remaining: 1m 27s
## 347: learn: 0.5807887    total: 46.7s    remaining: 1m 27s
## 348: learn: 0.5803213    total: 46.9s    remaining: 1m 27s
## 349: learn: 0.5794318    total: 47s  remaining: 1m 27s
## 350: learn: 0.5785763    total: 47.1s    remaining: 1m 27s
## 351: learn: 0.5783406    total: 47.3s    remaining: 1m 27s
## 352: learn: 0.5777286    total: 47.4s    remaining: 1m 26s
## 353: learn: 0.5772946    total: 47.5s    remaining: 1m 26s
## 354: learn: 0.5768882    total: 47.7s    remaining: 1m 26s
## 355: learn: 0.5765013    total: 47.8s    remaining: 1m 26s
## 356: learn: 0.5762155    total: 48s  remaining: 1m 26s
## 357: learn: 0.5756948    total: 48.1s    remaining: 1m 26s
## 358: learn: 0.5753616    total: 48.2s    remaining: 1m 26s
## 359: learn: 0.5745522    total: 48.4s    remaining: 1m 25s
## 360: learn: 0.5739670    total: 48.5s    remaining: 1m 25s
## 361: learn: 0.5734936    total: 48.6s    remaining: 1m 25s
## 362: learn: 0.5730852    total: 48.8s    remaining: 1m 25s
## 363: learn: 0.5726220    total: 48.9s    remaining: 1m 25s
## 364: learn: 0.5720858    total: 49s  remaining: 1m 25s
## 365: learn: 0.5714775    total: 49.1s    remaining: 1m 25s
## 366: learn: 0.5713058    total: 49.3s    remaining: 1m 24s
## 367: learn: 0.5709660    total: 49.4s    remaining: 1m 24s
## 368: learn: 0.5703985    total: 49.5s    remaining: 1m 24s
## 369: learn: 0.5700104    total: 49.7s    remaining: 1m 24s
## 370: learn: 0.5696277    total: 49.8s    remaining: 1m 24s
## 371: learn: 0.5692338    total: 49.9s    remaining: 1m 24s
## 372: learn: 0.5684522    total: 50s  remaining: 1m 24s
## 373: learn: 0.5679478    total: 50.2s    remaining: 1m 24s
## 374: learn: 0.5674777    total: 50.4s    remaining: 1m 23s
## 375: learn: 0.5669374    total: 50.5s    remaining: 1m 23s
## 376: learn: 0.5662619    total: 50.6s    remaining: 1m 23s
## 377: learn: 0.5661343    total: 50.8s    remaining: 1m 23s
## 378: learn: 0.5658130    total: 50.9s    remaining: 1m 23s
## 379: learn: 0.5655482    total: 51s  remaining: 1m 23s
## 380: learn: 0.5654348    total: 51.1s    remaining: 1m 23s
## 381: learn: 0.5650381    total: 51.3s    remaining: 1m 22s
## 382: learn: 0.5647027    total: 51.4s    remaining: 1m 22s
## 383: learn: 0.5640032    total: 51.5s    remaining: 1m 22s
## 384: learn: 0.5637169    total: 51.7s    remaining: 1m 22s
## 385: learn: 0.5633288    total: 51.8s    remaining: 1m 22s
## 386: learn: 0.5628466    total: 52s  remaining: 1m 22s
## 387: learn: 0.5625952    total: 52.1s    remaining: 1m 22s
## 388: learn: 0.5617467    total: 52.2s    remaining: 1m 22s
## 389: learn: 0.5615332    total: 52.4s    remaining: 1m 21s
## 390: learn: 0.5612377    total: 52.5s    remaining: 1m 21s
## 391: learn: 0.5607096    total: 52.6s    remaining: 1m 21s
## 392: learn: 0.5604473    total: 52.7s    remaining: 1m 21s
## 393: learn: 0.5600110    total: 52.9s    remaining: 1m 21s
## 394: learn: 0.5596888    total: 53.1s    remaining: 1m 21s
## 395: learn: 0.5591175    total: 53.2s    remaining: 1m 21s
## 396: learn: 0.5588141    total: 53.3s    remaining: 1m 20s
## 397: learn: 0.5583733    total: 53.4s    remaining: 1m 20s
## 398: learn: 0.5580274    total: 53.6s    remaining: 1m 20s
## 399: learn: 0.5574220    total: 53.7s    remaining: 1m 20s
## 400: learn: 0.5572056    total: 53.8s    remaining: 1m 20s
## 401: learn: 0.5567373    total: 54s  remaining: 1m 20s
## 402: learn: 0.5564565    total: 54.1s    remaining: 1m 20s
## 403: learn: 0.5561035    total: 54.2s    remaining: 1m 19s
## 404: learn: 0.5558434    total: 54.4s    remaining: 1m 19s
## 405: learn: 0.5556489    total: 54.5s    remaining: 1m 19s
## 406: learn: 0.5551385    total: 54.6s    remaining: 1m 19s
## 407: learn: 0.5549455    total: 54.8s    remaining: 1m 19s
## 408: learn: 0.5546724    total: 54.9s    remaining: 1m 19s
## 409: learn: 0.5541493    total: 55s  remaining: 1m 19s
## 410: learn: 0.5538254    total: 55.2s    remaining: 1m 19s
## 411: learn: 0.5531480    total: 55.3s    remaining: 1m 18s
## 412: learn: 0.5528027    total: 55.4s    remaining: 1m 18s
## 413: learn: 0.5525614    total: 55.6s    remaining: 1m 18s
## 414: learn: 0.5520786    total: 55.7s    remaining: 1m 18s
## 415: learn: 0.5515314    total: 55.8s    remaining: 1m 18s
## 416: learn: 0.5506623    total: 56s  remaining: 1m 18s
## 417: learn: 0.5501136    total: 56.1s    remaining: 1m 18s
## 418: learn: 0.5498506    total: 56.2s    remaining: 1m 17s
## 419: learn: 0.5496304    total: 56.4s    remaining: 1m 17s
## 420: learn: 0.5494074    total: 56.5s    remaining: 1m 17s
## 421: learn: 0.5488475    total: 56.6s    remaining: 1m 17s
## 422: learn: 0.5484538    total: 56.7s    remaining: 1m 17s
## 423: learn: 0.5481514    total: 56.9s    remaining: 1m 17s
## 424: learn: 0.5477335    total: 57s  remaining: 1m 17s
## 425: learn: 0.5472259    total: 57.2s    remaining: 1m 17s
## 426: learn: 0.5467987    total: 57.3s    remaining: 1m 16s
## 427: learn: 0.5464184    total: 57.4s    remaining: 1m 16s
## 428: learn: 0.5461138    total: 57.5s    remaining: 1m 16s
## 429: learn: 0.5457254    total: 57.7s    remaining: 1m 16s
## 430: learn: 0.5452165    total: 57.8s    remaining: 1m 16s
## 431: learn: 0.5447208    total: 57.9s    remaining: 1m 16s
## 432: learn: 0.5439782    total: 58.1s    remaining: 1m 16s
## 433: learn: 0.5436952    total: 58.2s    remaining: 1m 15s
## 434: learn: 0.5432462    total: 58.3s    remaining: 1m 15s
## 435: learn: 0.5429350    total: 58.5s    remaining: 1m 15s
## 436: learn: 0.5426196    total: 58.6s    remaining: 1m 15s
## 437: learn: 0.5421838    total: 58.7s    remaining: 1m 15s
## 438: learn: 0.5418970    total: 58.9s    remaining: 1m 15s
## 439: learn: 0.5412124    total: 59s  remaining: 1m 15s
## 440: learn: 0.5409231    total: 59.1s    remaining: 1m 14s
## 441: learn: 0.5407714    total: 59.3s    remaining: 1m 14s
## 442: learn: 0.5405829    total: 59.4s    remaining: 1m 14s
## 443: learn: 0.5402827    total: 59.5s    remaining: 1m 14s
## 444: learn: 0.5398933    total: 59.7s    remaining: 1m 14s
## 445: learn: 0.5395959    total: 59.8s    remaining: 1m 14s
## 446: learn: 0.5392777    total: 59.9s    remaining: 1m 14s
## 447: learn: 0.5386587    total: 1m   remaining: 1m 14s
## 448: learn: 0.5382389    total: 1m   remaining: 1m 13s
## 449: learn: 0.5378905    total: 1m   remaining: 1m 13s
## 450: learn: 0.5375029    total: 1m   remaining: 1m 13s
## 451: learn: 0.5372708    total: 1m   remaining: 1m 13s
## 452: learn: 0.5369488    total: 1m   remaining: 1m 13s
## 453: learn: 0.5367161    total: 1m   remaining: 1m 13s
## 454: learn: 0.5364818    total: 1m 1s    remaining: 1m 13s
## 455: learn: 0.5357302    total: 1m 1s    remaining: 1m 12s
## 456: learn: 0.5352326    total: 1m 1s    remaining: 1m 12s
## 457: learn: 0.5345093    total: 1m 1s    remaining: 1m 12s
## 458: learn: 0.5339846    total: 1m 1s    remaining: 1m 12s
## 459: learn: 0.5336594    total: 1m 1s    remaining: 1m 12s
## 460: learn: 0.5331905    total: 1m 1s    remaining: 1m 12s
## 461: learn: 0.5325822    total: 1m 1s    remaining: 1m 12s
## 462: learn: 0.5321945    total: 1m 2s    remaining: 1m 12s
## 463: learn: 0.5317963    total: 1m 2s    remaining: 1m 11s
## 464: learn: 0.5314557    total: 1m 2s    remaining: 1m 11s
## 465: learn: 0.5312042    total: 1m 2s    remaining: 1m 11s
## 466: learn: 0.5308528    total: 1m 2s    remaining: 1m 11s
## 467: learn: 0.5304885    total: 1m 2s    remaining: 1m 11s
## 468: learn: 0.5301268    total: 1m 2s    remaining: 1m 11s
## 469: learn: 0.5296529    total: 1m 3s    remaining: 1m 11s
## 470: learn: 0.5295076    total: 1m 3s    remaining: 1m 10s
## 471: learn: 0.5292562    total: 1m 3s    remaining: 1m 10s
## 472: learn: 0.5290974    total: 1m 3s    remaining: 1m 10s
## 473: learn: 0.5286938    total: 1m 3s    remaining: 1m 10s
## 474: learn: 0.5284554    total: 1m 3s    remaining: 1m 10s
## 475: learn: 0.5281376    total: 1m 3s    remaining: 1m 10s
## 476: learn: 0.5277222    total: 1m 3s    remaining: 1m 10s
## 477: learn: 0.5272307    total: 1m 4s    remaining: 1m 10s
## 478: learn: 0.5270056    total: 1m 4s    remaining: 1m 9s
## 479: learn: 0.5267504    total: 1m 4s    remaining: 1m 9s
## 480: learn: 0.5264530    total: 1m 4s    remaining: 1m 9s
## 481: learn: 0.5260954    total: 1m 4s    remaining: 1m 9s
## 482: learn: 0.5255724    total: 1m 4s    remaining: 1m 9s
## 483: learn: 0.5251183    total: 1m 4s    remaining: 1m 9s
## 484: learn: 0.5248494    total: 1m 5s    remaining: 1m 9s
## 485: learn: 0.5246658    total: 1m 5s    remaining: 1m 8s
## 486: learn: 0.5241870    total: 1m 5s    remaining: 1m 8s
## 487: learn: 0.5238800    total: 1m 5s    remaining: 1m 8s
## 488: learn: 0.5235873    total: 1m 5s    remaining: 1m 8s
## 489: learn: 0.5229475    total: 1m 5s    remaining: 1m 8s
## 490: learn: 0.5225992    total: 1m 5s    remaining: 1m 8s
## 491: learn: 0.5223855    total: 1m 6s    remaining: 1m 8s
## 492: learn: 0.5222256    total: 1m 6s    remaining: 1m 8s
## 493: learn: 0.5218458    total: 1m 6s    remaining: 1m 7s
## 494: learn: 0.5216248    total: 1m 6s    remaining: 1m 7s
## 495: learn: 0.5213706    total: 1m 6s    remaining: 1m 7s
## 496: learn: 0.5212891    total: 1m 6s    remaining: 1m 7s
## 497: learn: 0.5211310    total: 1m 6s    remaining: 1m 7s
## 498: learn: 0.5209707    total: 1m 7s    remaining: 1m 7s
## 499: learn: 0.5205783    total: 1m 7s    remaining: 1m 7s
## 500: learn: 0.5202815    total: 1m 7s    remaining: 1m 7s
## 501: learn: 0.5200064    total: 1m 7s    remaining: 1m 7s
## 502: learn: 0.5194791    total: 1m 7s    remaining: 1m 6s
## 503: learn: 0.5191173    total: 1m 7s    remaining: 1m 6s
## 504: learn: 0.5187513    total: 1m 8s    remaining: 1m 6s
## 505: learn: 0.5183659    total: 1m 8s    remaining: 1m 6s
## 506: learn: 0.5181283    total: 1m 8s    remaining: 1m 6s
## 507: learn: 0.5175780    total: 1m 8s    remaining: 1m 6s
## 508: learn: 0.5173977    total: 1m 8s    remaining: 1m 6s
## 509: learn: 0.5169480    total: 1m 8s    remaining: 1m 6s
## 510: learn: 0.5165276    total: 1m 8s    remaining: 1m 5s
## 511: learn: 0.5163565    total: 1m 9s    remaining: 1m 5s
## 512: learn: 0.5159058    total: 1m 9s    remaining: 1m 5s
## 513: learn: 0.5156079    total: 1m 9s    remaining: 1m 5s
## 514: learn: 0.5153899    total: 1m 9s    remaining: 1m 5s
## 515: learn: 0.5150461    total: 1m 9s    remaining: 1m 5s
## 516: learn: 0.5147488    total: 1m 9s    remaining: 1m 5s
## 517: learn: 0.5146126    total: 1m 10s   remaining: 1m 5s
## 518: learn: 0.5139443    total: 1m 10s   remaining: 1m 5s
## 519: learn: 0.5136106    total: 1m 10s   remaining: 1m 4s
## 520: learn: 0.5135021    total: 1m 10s   remaining: 1m 4s
## 521: learn: 0.5131123    total: 1m 10s   remaining: 1m 4s
## 522: learn: 0.5127552    total: 1m 10s   remaining: 1m 4s
## 523: learn: 0.5122876    total: 1m 10s   remaining: 1m 4s
## 524: learn: 0.5120728    total: 1m 11s   remaining: 1m 4s
## 525: learn: 0.5116943    total: 1m 11s   remaining: 1m 4s
## 526: learn: 0.5114536    total: 1m 11s   remaining: 1m 3s
## 527: learn: 0.5110605    total: 1m 11s   remaining: 1m 3s
## 528: learn: 0.5106929    total: 1m 11s   remaining: 1m 3s
## 529: learn: 0.5104261    total: 1m 11s   remaining: 1m 3s
## 530: learn: 0.5100440    total: 1m 11s   remaining: 1m 3s
## 531: learn: 0.5097488    total: 1m 12s   remaining: 1m 3s
## 532: learn: 0.5092921    total: 1m 12s   remaining: 1m 3s
## 533: learn: 0.5089849    total: 1m 12s   remaining: 1m 3s
## 534: learn: 0.5087647    total: 1m 12s   remaining: 1m 2s
## 535: learn: 0.5085051    total: 1m 12s   remaining: 1m 2s
## 536: learn: 0.5082540    total: 1m 12s   remaining: 1m 2s
## 537: learn: 0.5078950    total: 1m 12s   remaining: 1m 2s
## 538: learn: 0.5075149    total: 1m 13s   remaining: 1m 2s
## 539: learn: 0.5071701    total: 1m 13s   remaining: 1m 2s
## 540: learn: 0.5068688    total: 1m 13s   remaining: 1m 2s
## 541: learn: 0.5064263    total: 1m 13s   remaining: 1m 2s
## 542: learn: 0.5061918    total: 1m 13s   remaining: 1m 1s
## 543: learn: 0.5058619    total: 1m 13s   remaining: 1m 1s
## 544: learn: 0.5055509    total: 1m 13s   remaining: 1m 1s
## 545: learn: 0.5053238    total: 1m 13s   remaining: 1m 1s
## 546: learn: 0.5048025    total: 1m 14s   remaining: 1m 1s
## 547: learn: 0.5043411    total: 1m 14s   remaining: 1m 1s
## 548: learn: 0.5036727    total: 1m 14s   remaining: 1m 1s
## 549: learn: 0.5035301    total: 1m 14s   remaining: 1m
## 550: learn: 0.5032971    total: 1m 14s   remaining: 1m
## 551: learn: 0.5030260    total: 1m 14s   remaining: 1m
## 552: learn: 0.5029388    total: 1m 14s   remaining: 1m
## 553: learn: 0.5025154    total: 1m 14s   remaining: 1m
## 554: learn: 0.5022370    total: 1m 15s   remaining: 1m
## 555: learn: 0.5019233    total: 1m 15s   remaining: 1m
## 556: learn: 0.5016337    total: 1m 15s   remaining: 60s
## 557: learn: 0.5014303    total: 1m 15s   remaining: 59.8s
## 558: learn: 0.5012315    total: 1m 15s   remaining: 59.7s
## 559: learn: 0.5010365    total: 1m 15s   remaining: 59.6s
## 560: learn: 0.5007769    total: 1m 15s   remaining: 59.5s
## 561: learn: 0.5004137    total: 1m 16s   remaining: 59.3s
## 562: learn: 0.5001643    total: 1m 16s   remaining: 59.2s
## 563: learn: 0.5000073    total: 1m 16s   remaining: 59.1s
## 564: learn: 0.4997760    total: 1m 16s   remaining: 59s
## 565: learn: 0.4994673    total: 1m 16s   remaining: 58.8s
## 566: learn: 0.4991862    total: 1m 16s   remaining: 58.7s
## 567: learn: 0.4989597    total: 1m 16s   remaining: 58.6s
## 568: learn: 0.4987984    total: 1m 17s   remaining: 58.4s
## 569: learn: 0.4986388    total: 1m 17s   remaining: 58.3s
## 570: learn: 0.4983779    total: 1m 17s   remaining: 58.1s
## 571: learn: 0.4980642    total: 1m 17s   remaining: 58s
## 572: learn: 0.4978384    total: 1m 17s   remaining: 57.9s
## 573: learn: 0.4975409    total: 1m 17s   remaining: 57.7s
## 574: learn: 0.4971951    total: 1m 17s   remaining: 57.6s
## 575: learn: 0.4970020    total: 1m 18s   remaining: 57.5s
## 576: learn: 0.4969539    total: 1m 18s   remaining: 57.3s
## 577: learn: 0.4966149    total: 1m 18s   remaining: 57.2s
## 578: learn: 0.4961865    total: 1m 18s   remaining: 57s
## 579: learn: 0.4959537    total: 1m 18s   remaining: 56.9s
## 580: learn: 0.4957194    total: 1m 18s   remaining: 56.7s
## 581: learn: 0.4954598    total: 1m 18s   remaining: 56.6s
## 582: learn: 0.4950896    total: 1m 18s   remaining: 56.5s
## 583: learn: 0.4947932    total: 1m 19s   remaining: 56.4s
## 584: learn: 0.4945521    total: 1m 19s   remaining: 56.2s
## 585: learn: 0.4943824    total: 1m 19s   remaining: 56.1s
## 586: learn: 0.4937547    total: 1m 19s   remaining: 55.9s
## 587: learn: 0.4935583    total: 1m 19s   remaining: 55.8s
## 588: learn: 0.4933027    total: 1m 19s   remaining: 55.7s
## 589: learn: 0.4929230    total: 1m 19s   remaining: 55.5s
## 590: learn: 0.4926489    total: 1m 20s   remaining: 55.4s
## 591: learn: 0.4921911    total: 1m 20s   remaining: 55.2s
## 592: learn: 0.4917816    total: 1m 20s   remaining: 55.1s
## 593: learn: 0.4916768    total: 1m 20s   remaining: 55s
## 594: learn: 0.4914004    total: 1m 20s   remaining: 54.8s
## 595: learn: 0.4911675    total: 1m 20s   remaining: 54.7s
## 596: learn: 0.4908903    total: 1m 20s   remaining: 54.6s
## 597: learn: 0.4903662    total: 1m 20s   remaining: 54.4s
## 598: learn: 0.4901562    total: 1m 21s   remaining: 54.3s
## 599: learn: 0.4895149    total: 1m 21s   remaining: 54.1s
## 600: learn: 0.4893008    total: 1m 21s   remaining: 54s
## 601: learn: 0.4889480    total: 1m 21s   remaining: 53.9s
## 602: learn: 0.4887294    total: 1m 21s   remaining: 53.7s
## 603: learn: 0.4884232    total: 1m 21s   remaining: 53.6s
## 604: learn: 0.4881991    total: 1m 21s   remaining: 53.5s
## 605: learn: 0.4878865    total: 1m 22s   remaining: 53.3s
## 606: learn: 0.4876290    total: 1m 22s   remaining: 53.2s
## 607: learn: 0.4872697    total: 1m 22s   remaining: 53.1s
## 608: learn: 0.4870680    total: 1m 22s   remaining: 52.9s
## 609: learn: 0.4867250    total: 1m 22s   remaining: 52.8s
## 610: learn: 0.4860875    total: 1m 22s   remaining: 52.7s
## 611: learn: 0.4857741    total: 1m 22s   remaining: 52.5s
## 612: learn: 0.4854105    total: 1m 22s   remaining: 52.4s
## 613: learn: 0.4849924    total: 1m 23s   remaining: 52.3s
## 614: learn: 0.4846734    total: 1m 23s   remaining: 52.1s
## 615: learn: 0.4842376    total: 1m 23s   remaining: 52s
## 616: learn: 0.4837793    total: 1m 23s   remaining: 51.8s
## 617: learn: 0.4834918    total: 1m 23s   remaining: 51.7s
## 618: learn: 0.4833094    total: 1m 23s   remaining: 51.6s
## 619: learn: 0.4831542    total: 1m 23s   remaining: 51.4s
## 620: learn: 0.4829724    total: 1m 24s   remaining: 51.3s
## 621: learn: 0.4828112    total: 1m 24s   remaining: 51.2s
## 622: learn: 0.4826168    total: 1m 24s   remaining: 51s
## 623: learn: 0.4822952    total: 1m 24s   remaining: 50.9s
## 624: learn: 0.4820334    total: 1m 24s   remaining: 50.8s
## 625: learn: 0.4818348    total: 1m 24s   remaining: 50.6s
## 626: learn: 0.4816080    total: 1m 24s   remaining: 50.5s
## 627: learn: 0.4814751    total: 1m 25s   remaining: 50.4s
## 628: learn: 0.4813040    total: 1m 25s   remaining: 50.2s
## 629: learn: 0.4810484    total: 1m 25s   remaining: 50.1s
## 630: learn: 0.4807365    total: 1m 25s   remaining: 50s
## 631: learn: 0.4804478    total: 1m 25s   remaining: 49.8s
## 632: learn: 0.4801554    total: 1m 25s   remaining: 49.7s
## 633: learn: 0.4798073    total: 1m 25s   remaining: 49.5s
## 634: learn: 0.4796656    total: 1m 25s   remaining: 49.4s
## 635: learn: 0.4795091    total: 1m 26s   remaining: 49.3s
## 636: learn: 0.4791463    total: 1m 26s   remaining: 49.1s
## 637: learn: 0.4789624    total: 1m 26s   remaining: 49s
## 638: learn: 0.4785625    total: 1m 26s   remaining: 48.9s
## 639: learn: 0.4780949    total: 1m 26s   remaining: 48.7s
## 640: learn: 0.4777543    total: 1m 26s   remaining: 48.6s
## 641: learn: 0.4773769    total: 1m 26s   remaining: 48.4s
## 642: learn: 0.4771207    total: 1m 27s   remaining: 48.3s
## 643: learn: 0.4768413    total: 1m 27s   remaining: 48.2s
## 644: learn: 0.4765030    total: 1m 27s   remaining: 48s
## 645: learn: 0.4762179    total: 1m 27s   remaining: 47.9s
## 646: learn: 0.4758629    total: 1m 27s   remaining: 47.8s
## 647: learn: 0.4755879    total: 1m 27s   remaining: 47.6s
## 648: learn: 0.4751951    total: 1m 27s   remaining: 47.5s
## 649: learn: 0.4748813    total: 1m 27s   remaining: 47.4s
## 650: learn: 0.4746279    total: 1m 28s   remaining: 47.2s
## 651: learn: 0.4744668    total: 1m 28s   remaining: 47.1s
## 652: learn: 0.4742964    total: 1m 28s   remaining: 46.9s
## 653: learn: 0.4739069    total: 1m 28s   remaining: 46.8s
## 654: learn: 0.4734326    total: 1m 28s   remaining: 46.7s
## 655: learn: 0.4730969    total: 1m 28s   remaining: 46.5s
## 656: learn: 0.4728362    total: 1m 28s   remaining: 46.4s
## 657: learn: 0.4727561    total: 1m 29s   remaining: 46.3s
## 658: learn: 0.4722993    total: 1m 29s   remaining: 46.1s
## 659: learn: 0.4719041    total: 1m 29s   remaining: 46s
## 660: learn: 0.4714491    total: 1m 29s   remaining: 45.9s
## 661: learn: 0.4713339    total: 1m 29s   remaining: 45.7s
## 662: learn: 0.4712212    total: 1m 29s   remaining: 45.6s
## 663: learn: 0.4709530    total: 1m 29s   remaining: 45.5s
## 664: learn: 0.4707305    total: 1m 29s   remaining: 45.3s
## 665: learn: 0.4704166    total: 1m 30s   remaining: 45.2s
## 666: learn: 0.4701812    total: 1m 30s   remaining: 45s
## 667: learn: 0.4699514    total: 1m 30s   remaining: 44.9s
## 668: learn: 0.4695558    total: 1m 30s   remaining: 44.8s
## 669: learn: 0.4692096    total: 1m 30s   remaining: 44.6s
## 670: learn: 0.4687245    total: 1m 30s   remaining: 44.5s
## 671: learn: 0.4682643    total: 1m 30s   remaining: 44.4s
## 672: learn: 0.4681917    total: 1m 31s   remaining: 44.2s
## 673: learn: 0.4679006    total: 1m 31s   remaining: 44.1s
## 674: learn: 0.4676306    total: 1m 31s   remaining: 44s
## 675: learn: 0.4674223    total: 1m 31s   remaining: 43.8s
## 676: learn: 0.4671128    total: 1m 31s   remaining: 43.7s
## 677: learn: 0.4668963    total: 1m 31s   remaining: 43.5s
## 678: learn: 0.4667559    total: 1m 31s   remaining: 43.4s
## 679: learn: 0.4665719    total: 1m 31s   remaining: 43.3s
## 680: learn: 0.4662918    total: 1m 32s   remaining: 43.1s
## 681: learn: 0.4661949    total: 1m 32s   remaining: 43s
## 682: learn: 0.4658960    total: 1m 32s   remaining: 42.9s
## 683: learn: 0.4657800    total: 1m 32s   remaining: 42.7s
## 684: learn: 0.4653511    total: 1m 32s   remaining: 42.6s
## 685: learn: 0.4650412    total: 1m 32s   remaining: 42.5s
## 686: learn: 0.4649036    total: 1m 32s   remaining: 42.3s
## 687: learn: 0.4647355    total: 1m 33s   remaining: 42.2s
## 688: learn: 0.4644801    total: 1m 33s   remaining: 42.1s
## 689: learn: 0.4640678    total: 1m 33s   remaining: 41.9s
## 690: learn: 0.4637996    total: 1m 33s   remaining: 41.8s
## 691: learn: 0.4632168    total: 1m 33s   remaining: 41.6s
## 692: learn: 0.4629929    total: 1m 33s   remaining: 41.5s
## 693: learn: 0.4626218    total: 1m 33s   remaining: 41.4s
## 694: learn: 0.4624687    total: 1m 33s   remaining: 41.3s
## 695: learn: 0.4623405    total: 1m 34s   remaining: 41.1s
## 696: learn: 0.4620736    total: 1m 34s   remaining: 41s
## 697: learn: 0.4617323    total: 1m 34s   remaining: 40.8s
## 698: learn: 0.4615405    total: 1m 34s   remaining: 40.7s
## 699: learn: 0.4612509    total: 1m 34s   remaining: 40.6s
## 700: learn: 0.4610038    total: 1m 34s   remaining: 40.4s
## 701: learn: 0.4608262    total: 1m 34s   remaining: 40.3s
## 702: learn: 0.4606273    total: 1m 35s   remaining: 40.1s
## 703: learn: 0.4605091    total: 1m 35s   remaining: 40s
## 704: learn: 0.4603669    total: 1m 35s   remaining: 39.9s
## 705: learn: 0.4601886    total: 1m 35s   remaining: 39.7s
## 706: learn: 0.4599927    total: 1m 35s   remaining: 39.6s
## 707: learn: 0.4597459    total: 1m 35s   remaining: 39.5s
## 708: learn: 0.4596297    total: 1m 35s   remaining: 39.3s
## 709: learn: 0.4593519    total: 1m 35s   remaining: 39.2s
## 710: learn: 0.4590954    total: 1m 36s   remaining: 39.1s
## 711: learn: 0.4588986    total: 1m 36s   remaining: 38.9s
## 712: learn: 0.4585643    total: 1m 36s   remaining: 38.8s
## 713: learn: 0.4583564    total: 1m 36s   remaining: 38.6s
## 714: learn: 0.4580780    total: 1m 36s   remaining: 38.5s
## 715: learn: 0.4577291    total: 1m 36s   remaining: 38.4s
## 716: learn: 0.4574973    total: 1m 36s   remaining: 38.2s
## 717: learn: 0.4572782    total: 1m 37s   remaining: 38.1s
## 718: learn: 0.4571399    total: 1m 37s   remaining: 38s
## 719: learn: 0.4568813    total: 1m 37s   remaining: 37.8s
## 720: learn: 0.4567469    total: 1m 37s   remaining: 37.7s
## 721: learn: 0.4565791    total: 1m 37s   remaining: 37.6s
## 722: learn: 0.4563786    total: 1m 37s   remaining: 37.4s
## 723: learn: 0.4561547    total: 1m 37s   remaining: 37.3s
## 724: learn: 0.4558275    total: 1m 37s   remaining: 37.2s
## 725: learn: 0.4557093    total: 1m 38s   remaining: 37s
## 726: learn: 0.4554550    total: 1m 38s   remaining: 36.9s
## 727: learn: 0.4552886    total: 1m 38s   remaining: 36.8s
## 728: learn: 0.4551504    total: 1m 38s   remaining: 36.6s
## 729: learn: 0.4548544    total: 1m 38s   remaining: 36.5s
## 730: learn: 0.4546108    total: 1m 38s   remaining: 36.3s
## 731: learn: 0.4544206    total: 1m 38s   remaining: 36.2s
## 732: learn: 0.4541932    total: 1m 39s   remaining: 36.1s
## 733: learn: 0.4539858    total: 1m 39s   remaining: 35.9s
## 734: learn: 0.4538615    total: 1m 39s   remaining: 35.8s
## 735: learn: 0.4534343    total: 1m 39s   remaining: 35.7s
## 736: learn: 0.4532246    total: 1m 39s   remaining: 35.5s
## 737: learn: 0.4530593    total: 1m 39s   remaining: 35.4s
## 738: learn: 0.4527979    total: 1m 39s   remaining: 35.3s
## 739: learn: 0.4527198    total: 1m 39s   remaining: 35.1s
## 740: learn: 0.4522726    total: 1m 40s   remaining: 35s
## 741: learn: 0.4521469    total: 1m 40s   remaining: 34.9s
## 742: learn: 0.4519276    total: 1m 40s   remaining: 34.7s
## 743: learn: 0.4515689    total: 1m 40s   remaining: 34.6s
## 744: learn: 0.4512805    total: 1m 40s   remaining: 34.4s
## 745: learn: 0.4508630    total: 1m 40s   remaining: 34.3s
## 746: learn: 0.4504819    total: 1m 40s   remaining: 34.2s
## 747: learn: 0.4503433    total: 1m 41s   remaining: 34.1s
## 748: learn: 0.4499262    total: 1m 41s   remaining: 33.9s
## 749: learn: 0.4496296    total: 1m 41s   remaining: 33.8s
## 750: learn: 0.4492969    total: 1m 41s   remaining: 33.6s
## 751: learn: 0.4489484    total: 1m 41s   remaining: 33.5s
## 752: learn: 0.4486045    total: 1m 41s   remaining: 33.4s
## 753: learn: 0.4484055    total: 1m 41s   remaining: 33.2s
## 754: learn: 0.4481279    total: 1m 41s   remaining: 33.1s
## 755: learn: 0.4478810    total: 1m 42s   remaining: 33s
## 756: learn: 0.4475189    total: 1m 42s   remaining: 32.8s
## 757: learn: 0.4472787    total: 1m 42s   remaining: 32.7s
## 758: learn: 0.4470501    total: 1m 42s   remaining: 32.6s
## 759: learn: 0.4467253    total: 1m 42s   remaining: 32.4s
## 760: learn: 0.4465154    total: 1m 42s   remaining: 32.3s
## 761: learn: 0.4459363    total: 1m 42s   remaining: 32.2s
## 762: learn: 0.4457930    total: 1m 43s   remaining: 32s
## 763: learn: 0.4455538    total: 1m 43s   remaining: 31.9s
## 764: learn: 0.4454806    total: 1m 43s   remaining: 31.8s
## 765: learn: 0.4452388    total: 1m 43s   remaining: 31.6s
## 766: learn: 0.4450987    total: 1m 43s   remaining: 31.5s
## 767: learn: 0.4449764    total: 1m 43s   remaining: 31.4s
## 768: learn: 0.4447786    total: 1m 43s   remaining: 31.2s
## 769: learn: 0.4445823    total: 1m 44s   remaining: 31.1s
## 770: learn: 0.4443301    total: 1m 44s   remaining: 30.9s
## 771: learn: 0.4440509    total: 1m 44s   remaining: 30.8s
## 772: learn: 0.4437786    total: 1m 44s   remaining: 30.7s
## 773: learn: 0.4435695    total: 1m 44s   remaining: 30.5s
## 774: learn: 0.4432536    total: 1m 44s   remaining: 30.4s
## 775: learn: 0.4428427    total: 1m 44s   remaining: 30.3s
## 776: learn: 0.4427183    total: 1m 45s   remaining: 30.1s
## 777: learn: 0.4424238    total: 1m 45s   remaining: 30s
## 778: learn: 0.4420550    total: 1m 45s   remaining: 29.9s
## 779: learn: 0.4418546    total: 1m 45s   remaining: 29.7s
## 780: learn: 0.4416332    total: 1m 45s   remaining: 29.6s
## 781: learn: 0.4413334    total: 1m 45s   remaining: 29.5s
## 782: learn: 0.4411937    total: 1m 45s   remaining: 29.3s
## 783: learn: 0.4411337    total: 1m 45s   remaining: 29.2s
## 784: learn: 0.4409929    total: 1m 46s   remaining: 29s
## 785: learn: 0.4404662    total: 1m 46s   remaining: 28.9s
## 786: learn: 0.4401690    total: 1m 46s   remaining: 28.8s
## 787: learn: 0.4400185    total: 1m 46s   remaining: 28.6s
## 788: learn: 0.4398315    total: 1m 46s   remaining: 28.5s
## 789: learn: 0.4395800    total: 1m 46s   remaining: 28.4s
## 790: learn: 0.4394619    total: 1m 46s   remaining: 28.2s
## 791: learn: 0.4393081    total: 1m 46s   remaining: 28.1s
## 792: learn: 0.4390932    total: 1m 47s   remaining: 28s
## 793: learn: 0.4389359    total: 1m 47s   remaining: 27.8s
## 794: learn: 0.4386008    total: 1m 47s   remaining: 27.7s
## 795: learn: 0.4384008    total: 1m 47s   remaining: 27.6s
## 796: learn: 0.4381035    total: 1m 47s   remaining: 27.4s
## 797: learn: 0.4377513    total: 1m 47s   remaining: 27.3s
## 798: learn: 0.4376159    total: 1m 47s   remaining: 27.1s
## 799: learn: 0.4374740    total: 1m 48s   remaining: 27s
## 800: learn: 0.4372909    total: 1m 48s   remaining: 26.9s
## 801: learn: 0.4370134    total: 1m 48s   remaining: 26.7s
## 802: learn: 0.4369011    total: 1m 48s   remaining: 26.6s
## 803: learn: 0.4364706    total: 1m 48s   remaining: 26.5s
## 804: learn: 0.4363113    total: 1m 48s   remaining: 26.3s
## 805: learn: 0.4361012    total: 1m 48s   remaining: 26.2s
## 806: learn: 0.4358634    total: 1m 48s   remaining: 26.1s
## 807: learn: 0.4356859    total: 1m 49s   remaining: 25.9s
## 808: learn: 0.4354238    total: 1m 49s   remaining: 25.8s
## 809: learn: 0.4352075    total: 1m 49s   remaining: 25.7s
## 810: learn: 0.4350546    total: 1m 49s   remaining: 25.5s
## 811: learn: 0.4349072    total: 1m 49s   remaining: 25.4s
## 812: learn: 0.4347423    total: 1m 49s   remaining: 25.3s
## 813: learn: 0.4346280    total: 1m 49s   remaining: 25.1s
## 814: learn: 0.4344079    total: 1m 50s   remaining: 25s
## 815: learn: 0.4339380    total: 1m 50s   remaining: 24.9s
## 816: learn: 0.4337494    total: 1m 50s   remaining: 24.7s
## 817: learn: 0.4336079    total: 1m 50s   remaining: 24.6s
## 818: learn: 0.4333066    total: 1m 50s   remaining: 24.4s
## 819: learn: 0.4331145    total: 1m 50s   remaining: 24.3s
## 820: learn: 0.4328532    total: 1m 50s   remaining: 24.2s
## 821: learn: 0.4326650    total: 1m 51s   remaining: 24s
## 822: learn: 0.4322644    total: 1m 51s   remaining: 23.9s
## 823: learn: 0.4319837    total: 1m 51s   remaining: 23.8s
## 824: learn: 0.4318535    total: 1m 51s   remaining: 23.6s
## 825: learn: 0.4316018    total: 1m 51s   remaining: 23.5s
## 826: learn: 0.4313358    total: 1m 51s   remaining: 23.4s
## 827: learn: 0.4311582    total: 1m 51s   remaining: 23.2s
## 828: learn: 0.4308480    total: 1m 51s   remaining: 23.1s
## 829: learn: 0.4306524    total: 1m 52s   remaining: 23s
## 830: learn: 0.4304940    total: 1m 52s   remaining: 22.8s
## 831: learn: 0.4303196    total: 1m 52s   remaining: 22.7s
## 832: learn: 0.4301594    total: 1m 52s   remaining: 22.6s
## 833: learn: 0.4297996    total: 1m 52s   remaining: 22.4s
## 834: learn: 0.4295761    total: 1m 52s   remaining: 22.3s
## 835: learn: 0.4294074    total: 1m 52s   remaining: 22.1s
## 836: learn: 0.4291194    total: 1m 53s   remaining: 22s
## 837: learn: 0.4289049    total: 1m 53s   remaining: 21.9s
## 838: learn: 0.4285990    total: 1m 53s   remaining: 21.7s
## 839: learn: 0.4283765    total: 1m 53s   remaining: 21.6s
## 840: learn: 0.4282125    total: 1m 53s   remaining: 21.5s
## 841: learn: 0.4281184    total: 1m 53s   remaining: 21.3s
## 842: learn: 0.4280359    total: 1m 53s   remaining: 21.2s
## 843: learn: 0.4278442    total: 1m 53s   remaining: 21.1s
## 844: learn: 0.4273271    total: 1m 54s   remaining: 20.9s
## 845: learn: 0.4271645    total: 1m 54s   remaining: 20.8s
## 846: learn: 0.4270122    total: 1m 54s   remaining: 20.7s
## 847: learn: 0.4266819    total: 1m 54s   remaining: 20.5s
## 848: learn: 0.4261339    total: 1m 54s   remaining: 20.4s
## 849: learn: 0.4259120    total: 1m 54s   remaining: 20.3s
## 850: learn: 0.4257505    total: 1m 54s   remaining: 20.1s
## 851: learn: 0.4255122    total: 1m 55s   remaining: 20s
## 852: learn: 0.4253372    total: 1m 55s   remaining: 19.9s
## 853: learn: 0.4249384    total: 1m 55s   remaining: 19.7s
## 854: learn: 0.4245818    total: 1m 55s   remaining: 19.6s
## 855: learn: 0.4242113    total: 1m 55s   remaining: 19.4s
## 856: learn: 0.4240525    total: 1m 55s   remaining: 19.3s
## 857: learn: 0.4237997    total: 1m 55s   remaining: 19.2s
## 858: learn: 0.4233943    total: 1m 55s   remaining: 19s
## 859: learn: 0.4231377    total: 1m 56s   remaining: 18.9s
## 860: learn: 0.4229678    total: 1m 56s   remaining: 18.8s
## 861: learn: 0.4227914    total: 1m 56s   remaining: 18.6s
## 862: learn: 0.4225948    total: 1m 56s   remaining: 18.5s
## 863: learn: 0.4224165    total: 1m 56s   remaining: 18.4s
## 864: learn: 0.4221721    total: 1m 56s   remaining: 18.2s
## 865: learn: 0.4218557    total: 1m 56s   remaining: 18.1s
## 866: learn: 0.4215853    total: 1m 57s   remaining: 18s
## 867: learn: 0.4214892    total: 1m 57s   remaining: 17.8s
## 868: learn: 0.4213007    total: 1m 57s   remaining: 17.7s
## 869: learn: 0.4211204    total: 1m 57s   remaining: 17.5s
## 870: learn: 0.4209412    total: 1m 57s   remaining: 17.4s
## 871: learn: 0.4207995    total: 1m 57s   remaining: 17.3s
## 872: learn: 0.4206020    total: 1m 57s   remaining: 17.1s
## 873: learn: 0.4204819    total: 1m 57s   remaining: 17s
## 874: learn: 0.4202565    total: 1m 58s   remaining: 16.9s
## 875: learn: 0.4200175    total: 1m 58s   remaining: 16.7s
## 876: learn: 0.4198539    total: 1m 58s   remaining: 16.6s
## 877: learn: 0.4194544    total: 1m 58s   remaining: 16.5s
## 878: learn: 0.4193346    total: 1m 58s   remaining: 16.3s
## 879: learn: 0.4190376    total: 1m 58s   remaining: 16.2s
## 880: learn: 0.4188814    total: 1m 58s   remaining: 16.1s
## 881: learn: 0.4186139    total: 1m 59s   remaining: 15.9s
## 882: learn: 0.4183767    total: 1m 59s   remaining: 15.8s
## 883: learn: 0.4182179    total: 1m 59s   remaining: 15.7s
## 884: learn: 0.4181095    total: 1m 59s   remaining: 15.5s
## 885: learn: 0.4179734    total: 1m 59s   remaining: 15.4s
## 886: learn: 0.4178799    total: 1m 59s   remaining: 15.3s
## 887: learn: 0.4175908    total: 1m 59s   remaining: 15.1s
## 888: learn: 0.4174293    total: 1m 59s   remaining: 15s
## 889: learn: 0.4171842    total: 2m   remaining: 14.8s
## 890: learn: 0.4169076    total: 2m   remaining: 14.7s
## 891: learn: 0.4166303    total: 2m   remaining: 14.6s
## 892: learn: 0.4163780    total: 2m   remaining: 14.4s
## 893: learn: 0.4161793    total: 2m   remaining: 14.3s
## 894: learn: 0.4160109    total: 2m   remaining: 14.2s
## 895: learn: 0.4158289    total: 2m   remaining: 14s
## 896: learn: 0.4157835    total: 2m 1s    remaining: 13.9s
## 897: learn: 0.4155583    total: 2m 1s    remaining: 13.8s
## 898: learn: 0.4153074    total: 2m 1s    remaining: 13.6s
## 899: learn: 0.4150478    total: 2m 1s    remaining: 13.5s
## 900: learn: 0.4148968    total: 2m 1s    remaining: 13.4s
## 901: learn: 0.4147007    total: 2m 1s    remaining: 13.2s
## 902: learn: 0.4145587    total: 2m 1s    remaining: 13.1s
## 903: learn: 0.4144205    total: 2m 2s    remaining: 13s
## 904: learn: 0.4141196    total: 2m 2s    remaining: 12.8s
## 905: learn: 0.4138493    total: 2m 2s    remaining: 12.7s
## 906: learn: 0.4136233    total: 2m 2s    remaining: 12.6s
## 907: learn: 0.4134958    total: 2m 2s    remaining: 12.4s
## 908: learn: 0.4133380    total: 2m 2s    remaining: 12.3s
## 909: learn: 0.4129477    total: 2m 2s    remaining: 12.1s
## 910: learn: 0.4126954    total: 2m 2s    remaining: 12s
## 911: learn: 0.4124982    total: 2m 3s    remaining: 11.9s
## 912: learn: 0.4123810    total: 2m 3s    remaining: 11.7s
## 913: learn: 0.4122249    total: 2m 3s    remaining: 11.6s
## 914: learn: 0.4118898    total: 2m 3s    remaining: 11.5s
## 915: learn: 0.4117430    total: 2m 3s    remaining: 11.3s
## 916: learn: 0.4115619    total: 2m 3s    remaining: 11.2s
## 917: learn: 0.4113164    total: 2m 3s    remaining: 11.1s
## 918: learn: 0.4109950    total: 2m 4s    remaining: 10.9s
## 919: learn: 0.4107749    total: 2m 4s    remaining: 10.8s
## 920: learn: 0.4105524    total: 2m 4s    remaining: 10.7s
## 921: learn: 0.4100485    total: 2m 4s    remaining: 10.5s
## 922: learn: 0.4098346    total: 2m 4s    remaining: 10.4s
## 923: learn: 0.4097109    total: 2m 4s    remaining: 10.3s
## 924: learn: 0.4096164    total: 2m 4s    remaining: 10.1s
## 925: learn: 0.4093584    total: 2m 5s    remaining: 9.99s
## 926: learn: 0.4092212    total: 2m 5s    remaining: 9.86s
## 927: learn: 0.4089790    total: 2m 5s    remaining: 9.72s
## 928: learn: 0.4087264    total: 2m 5s    remaining: 9.59s
## 929: learn: 0.4083344    total: 2m 5s    remaining: 9.45s
## 930: learn: 0.4081238    total: 2m 5s    remaining: 9.31s
## 931: learn: 0.4079445    total: 2m 5s    remaining: 9.18s
## 932: learn: 0.4076977    total: 2m 5s    remaining: 9.04s
## 933: learn: 0.4074918    total: 2m 6s    remaining: 8.91s
## 934: learn: 0.4072841    total: 2m 6s    remaining: 8.77s
## 935: learn: 0.4070106    total: 2m 6s    remaining: 8.64s
## 936: learn: 0.4068875    total: 2m 6s    remaining: 8.5s
## 937: learn: 0.4067493    total: 2m 6s    remaining: 8.37s
## 938: learn: 0.4064355    total: 2m 6s    remaining: 8.23s
## 939: learn: 0.4062691    total: 2m 6s    remaining: 8.1s
## 940: learn: 0.4061022    total: 2m 7s    remaining: 7.96s
## 941: learn: 0.4059879    total: 2m 7s    remaining: 7.83s
## 942: learn: 0.4058141    total: 2m 7s    remaining: 7.69s
## 943: learn: 0.4056572    total: 2m 7s    remaining: 7.56s
## 944: learn: 0.4054623    total: 2m 7s    remaining: 7.42s
## 945: learn: 0.4051809    total: 2m 7s    remaining: 7.29s
## 946: learn: 0.4048559    total: 2m 7s    remaining: 7.15s
## 947: learn: 0.4047428    total: 2m 7s    remaining: 7.02s
## 948: learn: 0.4044958    total: 2m 8s    remaining: 6.88s
## 949: learn: 0.4043833    total: 2m 8s    remaining: 6.75s
## 950: learn: 0.4042993    total: 2m 8s    remaining: 6.61s
## 951: learn: 0.4040880    total: 2m 8s    remaining: 6.48s
## 952: learn: 0.4038538    total: 2m 8s    remaining: 6.34s
## 953: learn: 0.4036576    total: 2m 8s    remaining: 6.21s
## 954: learn: 0.4033820    total: 2m 8s    remaining: 6.07s
## 955: learn: 0.4031938    total: 2m 9s    remaining: 5.94s
## 956: learn: 0.4030110    total: 2m 9s    remaining: 5.8s
## 957: learn: 0.4028270    total: 2m 9s    remaining: 5.67s
## 958: learn: 0.4027539    total: 2m 9s    remaining: 5.53s
## 959: learn: 0.4025269    total: 2m 9s    remaining: 5.4s
## 960: learn: 0.4022310    total: 2m 9s    remaining: 5.26s
## 961: learn: 0.4020234    total: 2m 9s    remaining: 5.13s
## 962: learn: 0.4018603    total: 2m 9s    remaining: 4.99s
## 963: learn: 0.4016239    total: 2m 10s   remaining: 4.86s
## 964: learn: 0.4014629    total: 2m 10s   remaining: 4.72s
## 965: learn: 0.4011126    total: 2m 10s   remaining: 4.59s
## 966: learn: 0.4009613    total: 2m 10s   remaining: 4.45s
## 967: learn: 0.4008136    total: 2m 10s   remaining: 4.32s
## 968: learn: 0.4005764    total: 2m 10s   remaining: 4.18s
## 969: learn: 0.4004623    total: 2m 10s   remaining: 4.05s
## 970: learn: 0.4003245    total: 2m 11s   remaining: 3.91s
## 971: learn: 0.4001915    total: 2m 11s   remaining: 3.78s
## 972: learn: 0.3999830    total: 2m 11s   remaining: 3.64s
## 973: learn: 0.3997619    total: 2m 11s   remaining: 3.51s
## 974: learn: 0.3993998    total: 2m 11s   remaining: 3.37s
## 975: learn: 0.3991994    total: 2m 11s   remaining: 3.24s
## 976: learn: 0.3990413    total: 2m 11s   remaining: 3.1s
## 977: learn: 0.3988642    total: 2m 12s   remaining: 2.97s
## 978: learn: 0.3985925    total: 2m 12s   remaining: 2.83s
## 979: learn: 0.3984158    total: 2m 12s   remaining: 2.7s
## 980: learn: 0.3982080    total: 2m 12s   remaining: 2.56s
## 981: learn: 0.3979609    total: 2m 12s   remaining: 2.43s
## 982: learn: 0.3978562    total: 2m 12s   remaining: 2.29s
## 983: learn: 0.3976559    total: 2m 12s   remaining: 2.16s
## 984: learn: 0.3975595    total: 2m 12s   remaining: 2.02s
## 985: learn: 0.3974083    total: 2m 13s   remaining: 1.89s
## 986: learn: 0.3972091    total: 2m 13s   remaining: 1.75s
## 987: learn: 0.3970769    total: 2m 13s   remaining: 1.62s
## 988: learn: 0.3968143    total: 2m 13s   remaining: 1.48s
## 989: learn: 0.3966581    total: 2m 13s   remaining: 1.35s
## 990: learn: 0.3962819    total: 2m 13s   remaining: 1.21s
## 991: learn: 0.3960981    total: 2m 13s   remaining: 1.08s
## 992: learn: 0.3957691    total: 2m 14s   remaining: 945ms
## 993: learn: 0.3954892    total: 2m 14s   remaining: 810ms
## 994: learn: 0.3952756    total: 2m 14s   remaining: 675ms
## 995: learn: 0.3949695    total: 2m 14s   remaining: 540ms
## 996: learn: 0.3947663    total: 2m 14s   remaining: 405ms
## 997: learn: 0.3945420    total: 2m 14s   remaining: 270ms
## 998: learn: 0.3942773    total: 2m 14s   remaining: 135ms
## 999: learn: 0.3941061    total: 2m 14s   remaining: 0us

summary(model)

## CatBoost model (1000 trees)
## Loss function: MultiClass
## Fit to 36 feature(s)

To inspect the results, we likewise wrap the validation set in a pool object and predict the Class variable:

validation_pool = catboost.load_pool(data = validation[, -which(names(validation) == "Class")])
preds = catboost.predict(model, validation_pool, prediction_type = "Class")

Before building the confusion matrix, we need to check that predictions and reference labels use the same set of class labels. As a reminder, the factor levels in R start at 1 here, whereas the predicted labels start at 0:

levels(validation$Class)

##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11"

levels(factor(preds))

##  [1] "0"  "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
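To make the mismatch concrete, here is a minimal sketch on toy vectors (the values are hypothetical, not taken from the dataset). It shows what would happen if 0-based predictions were turned into a factor with 1-based levels without shifting them first.

```r
# Toy stand-ins (hypothetical values) for the real objects
reference <- factor(c(1, 2, 3, 3), levels = 1:3)  # 1-based reference labels
raw_preds <- c(0, 1, 2, 1)                         # 0-based predicted labels

# Without the shift, "0" is not a valid level and becomes NA,
# and every other label is compared against the wrong class
unshifted <- factor(as.character(raw_preds), levels = levels(reference))

# With the shift, each prediction maps onto the intended level
shifted <- factor(as.character(raw_preds + 1), levels = levels(reference))

print(unshifted)
print(shifted)
print(mean(shifted == reference))
```

A quick `sum(is.na(...))` on the converted factor is a cheap way to catch this kind of off-by-one labelling before computing any metric.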

We shift the predicted labels so that they match the levels of the validation set and then build the confusion matrix:

preds <- as.character(as.numeric(preds) + 1)  # Shift the 0-based predicted labels to 1-based
preds <- factor(preds, levels = levels(validation$Class))  # Store them as a factor with the same levels as validation$Class
confusionMatrix(preds, validation$Class)

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   1   2   3   4   5   6   7   8   9  10  11
##         1  108   0   0   7   8   0   0   5   0   6   7
##         2    0 139  13   0   0   1  52   0   6  28  74
##         3    0  12 200   0   0   5  19   0   5  17  45
##         4    6   0   0  69   0   0   0   4   0   0   1
##         5    5   0   0   0  65   0   0   0   0   1   3
##         6    0   7   1   0   0 250  14   0   0  46  14
##         7    0  62  18   0   0   8 320   0  13  28  77
##         8    3   0   0   3   0   0   0 106   0   0   0
##         9    0   7   1   0   0   0  18   0 318   4  96
##         10   2  13   4   1   3  22  32   0   2 332  60
##         11   1  34  17   0   1   3  62   0  26  42 612
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7009          
##                  95% CI : (0.6856, 0.7158)
##     No Information Rate : 0.2752          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.6536          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 1 Class: 2 Class: 3 Class: 4 Class: 5 Class: 6
## Sensitivity           0.86400  0.50730  0.78740  0.86250  0.84416  0.86505
## Specificity           0.99049  0.94759  0.96916  0.99687  0.99744  0.97519
## Pos Pred Value        0.76596  0.44409  0.66007  0.86250  0.87838  0.75301
## Neg Pred Value        0.99508  0.95885  0.98359  0.99687  0.99659  0.98804
## Prevalence            0.03478  0.07624  0.07067  0.02226  0.02142  0.08041
## Detection Rate        0.03005  0.03868  0.05565  0.01920  0.01809  0.06956
## Detection Prevalence  0.03923  0.08709  0.08431  0.02226  0.02059  0.09238
## Balanced Accuracy     0.92724  0.72744  0.87828  0.92968  0.92080  0.92012
##                      Class: 7 Class: 8 Class: 9 Class: 10 Class: 11
## Sensitivity           0.61896  0.92174  0.85946   0.65873    0.6188
## Specificity           0.93305  0.99828  0.96092   0.95502    0.9286
## Pos Pred Value        0.60837  0.94643  0.71622   0.70488    0.7669
## Neg Pred Value        0.93579  0.99742  0.98349   0.94492    0.8652
## Prevalence            0.14385  0.03200  0.10295   0.14023    0.2752
## Detection Rate        0.08904  0.02949  0.08848   0.09238    0.1703
## Detection Prevalence  0.14636  0.03116  0.12354   0.13105    0.2220
## Balanced Accuracy     0.77600  0.96001  0.91019   0.80687    0.7737
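As a worked check, the headline figures can be recomputed by hand from the confusion matrix printed above. The sketch below re-types that matrix (rows are predictions, columns are reference classes) and derives accuracy, per-class sensitivity, and positive predictive value using base R only.

```r
# Confusion matrix copied from the output above: rows = prediction, columns = reference
cm <- matrix(c(
  108,   0,   0,  7,  8,   0,   0,   5,   0,   6,   7,
    0, 139,  13,  0,  0,   1,  52,   0,   6,  28,  74,
    0,  12, 200,  0,  0,   5,  19,   0,   5,  17,  45,
    6,   0,   0, 69,  0,   0,   0,   4,   0,   0,   1,
    5,   0,   0,  0, 65,   0,   0,   0,   0,   1,   3,
    0,   7,   1,  0,  0, 250,  14,   0,   0,  46,  14,
    0,  62,  18,  0,  0,   8, 320,   0,  13,  28,  77,
    3,   0,   0,  3,  0,   0,   0, 106,   0,   0,   0,
    0,   7,   1,  0,  0,   0,  18,   0, 318,   4,  96,
    2,  13,   4,  1,  3,  22,  32,   0,   2, 332,  60,
    1,  34,  17,  0,  1,   3,  62,   0,  26,  42, 612
), nrow = 11, byrow = TRUE)

# Accuracy: correctly classified cases (the diagonal) over all cases
accuracy <- sum(diag(cm)) / sum(cm)

# Sensitivity per class: correct predictions over the reference total (column sums)
sensitivity <- diag(cm) / colSums(cm)

# Positive predictive value per class: correct predictions over the predicted total (row sums)
ppv <- diag(cm) / rowSums(cm)

print(round(accuracy, 4))
print(round(sensitivity, 4))
print(round(ppv, 4))
```

The diagonal sums to 2519 out of 3594 validation rows, which reproduces the reported accuracy of 0.7009.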

Recall the first result: a simple decision tree with only 33% accuracy. After feature engineering, class balancing, and a switch to models better suited to this kind of problem, accuracy is now about 70%, an improvement of roughly 37 percentage points over the original model. For a multiclass classifier, I consider this a good result.

Feature Importance

For any machine learning model we can inspect which variables carried the most predictive value. This also lets me check whether the artist-related column transformations, and the inclusion of the two text variables converted to categorical ones, actually matter to the model.

importancia = catboost.get_feature_importance(model)

col_names = setdiff(colnames(train_model), "Class") # Class is dropped because the model was trained without it

df_importancia = data.frame(
  Feature = col_names,
  Importance = importancia
)

df_importancia = df_importancia[order(df_importancia$Importance, decreasing = TRUE), ]

ggplot(df_importancia, aes(x = reorder(Feature, Importance), y = Importance)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(title = "Feature importance - CatBoost",
       x = "Feature",
       y = "Importance") +
  theme_minimal()

As the plot shows, the variable with the highest predictive value is the flag we added to mark when the duration was converted from milliseconds to minutes. It is followed by the artist name, which we added as a factor so that CatBoost could handle it. Next come the numeric statistical features related to the artist, and also Track Name, another categorical variable handled by CatBoost.

Using these variables allows better classification in a multiclass problem like this one, where text variables play an important role and should be included in the model.
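A numeric view can complement the bar chart. The following is a minimal sketch with hypothetical feature names and scores (not the real output of the model): it guards against misaligned names and scores, then reports each feature's share of the total importance and the cumulative share.

```r
# Hypothetical feature names and importance scores, for illustration only
col_names_toy  <- c("flag_a", "artist", "tempo", "energy")
importance_toy <- c(31.5, 40.2, 8.3, 20.0)

# Guard against silently misaligned names and scores
stopifnot(length(col_names_toy) == length(importance_toy))

imp_df <- data.frame(Feature = col_names_toy, Importance = importance_toy)
imp_df <- imp_df[order(imp_df$Importance, decreasing = TRUE), ]

# Share of total importance and cumulative share, to see how concentrated it is
imp_df$Share    <- imp_df$Importance / sum(imp_df$Importance)
imp_df$CumShare <- cumsum(imp_df$Share)

print(imp_df)
```

The length check matters because the names come from the training columns while the scores come from the fitted model, so a dropped or reordered column would otherwise mislabel the bars without raising an error.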

I would also like to know whether the model tends to concentrate its probabilities around intermediate values, or whether most of them sit close to 0 or 1.

prob_catboost = catboost.predict(model, validation_pool, prediction_type = "Probability")

catboost_densidad = data.frame(prob_catboost, Clase_Real = validation$Class) %>%
  pivot_longer(cols = starts_with("X"), names_to = "Clase_Predicha", values_to = "Probabilidad") %>%
  mutate(Probabilidad = round(Probabilidad, 2))

ggplot(catboost_densidad, aes(x = Probabilidad, fill = Clase_Predicha)) +
  geom_density(alpha = 0.6, adjust = 0.5) +
  labs(title = "Distribution of Probabilities with CatBoost (Multiclass)",
       x = "Estimated probability",
       y = "Density",
       fill = "Predicted class") +
  theme_minimal()

ggplot(catboost_densidad, aes(x = Probabilidad, fill = Clase_Predicha)) +
  geom_histogram(position = "identity", alpha = 0.4, bins = 10) +  
  labs(title = "Histogram of Probabilities with CatBoost (Multiclass)",
       x = "Estimated probability",
       y = "Frequency",
       fill = "Predicted class") +
  theme_minimal()

This tells me that the model separates the classes clearly: most probabilities are close to 0 or close to 1, both when it classifies correctly and when it does not.
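The same question can be answered numerically. This is a minimal sketch on a toy probability matrix (hypothetical values); the 0.1 and 0.9 cut-offs are arbitrary assumptions chosen only to illustrate the idea.

```r
# Toy probability matrix (hypothetical values): one row per observation, one column per class
probs <- rbind(
  c(0.96, 0.02, 0.02),
  c(0.03, 0.94, 0.03),
  c(0.40, 0.35, 0.25),
  c(0.01, 0.01, 0.98)
)

# Fraction of all probabilities that fall near the extremes (cut-offs are assumptions)
near_extremes <- mean(probs <= 0.1 | probs >= 0.9)

# Confidence of each prediction: the largest probability in its row
confidence <- apply(probs, 1, max)

print(near_extremes)
print(summary(confidence))
```

Applied to a real probability matrix, a high `near_extremes` value together with a high median `confidence` would support the reading of the plots, while a low median confidence would point the other way.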

However, it also suggests that combining the probabilities of two models might help classify this problem better, so I will try a technique called stacking with my two best-performing models, CatBoost and XGBoost.

Model Stacking

For this I need the predictions of both models as probabilities. For CatBoost, I have to convert them into a matrix with one column per class.

pred_xgb = predict(modelo_rpartbal, newdata = validation, type = "prob")  

pred_cat = catboost.predict(model, catboost.load_pool(validation), prediction_type = "Probability")

n_clases = length(unique(validation$Class))
pred_cat_matrix = matrix(pred_cat, ncol = n_clases, byrow = TRUE)
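Before stacking, it is worth confirming that the reshaped probabilities still have one row per observation. The sketch below uses a toy matrix (hypothetical values) and assumes the prediction already arrives as an n-by-k matrix, which can be confirmed with `dim(pred_cat)`; this is an assumption about the installed package version. Under that assumption, wrapping it again with `byrow = TRUE` moves entries across rows, and two cheap checks reveal it.

```r
# Toy probability matrix (hypothetical values): rows = observations, columns = classes
P <- rbind(
  c(0.70, 0.20, 0.10),
  c(0.10, 0.80, 0.10),
  c(0.20, 0.20, 0.60),
  c(0.05, 0.05, 0.90)
)
k <- ncol(P)

# Re-wrapping an existing matrix row-wise reads its values in column-major order,
# so entries end up in different rows
rewrapped <- matrix(P, ncol = k, byrow = TRUE)

# Check 1: every row of a probability matrix should sum to 1
rows_ok_original  <- all(abs(rowSums(P) - 1) < 1e-8)
rows_ok_rewrapped <- all(abs(rowSums(rewrapped) - 1) < 1e-8)

# Check 2: predicted class per row (column holding the largest probability)
argmax_original  <- max.col(P, ties.method = "first")
argmax_rewrapped <- max.col(rewrapped, ties.method = "first")

print(c(original = rows_ok_original, rewrapped = rows_ok_rewrapped))
print(rbind(argmax_original, argmax_rewrapped))
```

If the row sums are off or the arg-max classes no longer agree with the class predictions, the reshape step is the first place to look. If the prediction is a flat vector instead, the `byrow = TRUE` reshape may be exactly what is needed, so the check works in both directions.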

I check that both sets of column names cover the same number of classes:

xgb_names = paste0("xgb_", 1:n_clases)
cat_names = paste0("cat_", 1:n_clases)
xgb_names

##  [1] "xgb_1"  "xgb_2"  "xgb_3"  "xgb_4"  "xgb_5"  "xgb_6"  "xgb_7"  "xgb_8" 
##  [9] "xgb_9"  "xgb_10" "xgb_11"

cat_names

##  [1] "cat_1"  "cat_2"  "cat_3"  "cat_4"  "cat_5"  "cat_6"  "cat_7"  "cat_8" 
##  [9] "cat_9"  "cat_10" "cat_11"

Now I build the data frame with the XGBoost and CatBoost probability columns plus the correct target variable:

stack_df <- data.frame(
  setNames(pred_xgb, xgb_names),
  setNames(pred_cat_matrix, cat_names),
  target = validation$Class
)

levels(stack_df$target) = paste0("class", levels(stack_df$target)) # Rename the levels of the target variable (class1, ..., class11)
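One detail to keep in mind when naming the stacked columns: `setNames()` renames the columns of a data frame, but on a plain matrix it only sets the element-wise `names` attribute and leaves the column names untouched. The sketch below uses toy data (all objects here are hypothetical stand-ins) and assigns `colnames()` explicitly for the matrix input.

```r
set.seed(42)
n <- 6
k <- 3

# Helper (hypothetical): random rows that sum to 1
make_probs <- function(n, k) {
  m <- matrix(runif(n * k), nrow = n)
  m / rowSums(m)
}

p_xgb <- as.data.frame(make_probs(n, k))  # data frame input
p_cat <- make_probs(n, k)                 # plain matrix input

# On a matrix, setNames() does not create column names
names_only <- setNames(p_cat, paste0("cat_", 1:k))
print(is.null(colnames(names_only)))

# Name each input with the tool that fits its type
p_xgb <- setNames(p_xgb, paste0("xgb_", 1:k))
colnames(p_cat) <- paste0("cat_", 1:k)

stack_toy <- data.frame(
  p_xgb,
  p_cat,
  target = factor(sample(1:k, n, replace = TRUE), levels = 1:k)
)
levels(stack_toy$target) <- paste0("class", levels(stack_toy$target))

print(names(stack_toy))
```

Printing `names(stack_df)` on the real data frame shows at a glance whether the CatBoost columns carry the intended `cat_` prefix.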

Now I can evaluate the meta-model using cross-validation and a method that supports multiclass classification:

ctrl = trainControl(method = "cv", number = 5, classProbs = TRUE)

meta_model = train(
  target ~ .,
  data = stack_df,
  method = "gbm",      
  trControl = ctrl
)

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        2.3979             nan     0.1000    0.1769
##      2        2.2799             nan     0.1000    0.1223
##      3        2.1899             nan     0.1000    0.0985
##      4        2.1264             nan     0.1000    0.0765
##      5        2.0728             nan     0.1000    0.0591
##      6        2.0307             nan     0.1000    0.0496
##      7        1.9976             nan     0.1000    0.0416
##      8        1.9689             nan     0.1000    0.0304
##      9        1.9447             nan     0.1000    0.0259
##     10        1.9245             nan     0.1000    0.0214
##     20        1.8106             nan     0.1000    0.0066
##     40        1.6969             nan     0.1000   -0.0016
##     60        1.6283             nan     0.1000   -0.0021
##     80        1.5739             nan     0.1000   -0.0041
##    100        1.5338             nan     0.1000   -0.0039
##    120        1.4988             nan     0.1000   -0.0040
##    140        1.4655             nan     0.1000   -0.0070
##    150        1.4503             nan     0.1000   -0.0061
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        2.3979             nan     0.1000    0.1965
##      2        2.2542             nan     0.1000    0.1323
##      3        2.1588             nan     0.1000    0.1051
##      4        2.0801             nan     0.1000    0.0841
##      5        2.0182             nan     0.1000    0.0596
##      6        1.9721             nan     0.1000    0.0478
##      7        1.9329             nan     0.1000    0.0416
##      8        1.8954             nan     0.1000    0.0373
##      9        1.8647             nan     0.1000    0.0319
##     10        1.8358             nan     0.1000    0.0255
##     20        1.6747             nan     0.1000   -0.0021
##     40        1.5145             nan     0.1000   -0.0040
##     60        1.4092             nan     0.1000   -0.0038
##     80        1.3265             nan     0.1000   -0.0028
##    100        1.2571             nan     0.1000   -0.0076
##    120        1.1971             nan     0.1000   -0.0051
##    140        1.1415             nan     0.1000   -0.0057
##    150        1.1168             nan     0.1000   -0.0045
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        2.3979             nan     0.1000    0.2124
##      2        2.2392             nan     0.1000    0.1521
##      3        2.1237             nan     0.1000    0.1044
##      4        2.0373             nan     0.1000    0.0849
##      5        1.9689             nan     0.1000    0.0648
##      6        1.9147             nan     0.1000    0.0515
##      7        1.8705             nan     0.1000    0.0449
##      8        1.8306             nan     0.1000    0.0283
##      9        1.7980             nan     0.1000    0.0281
##     10        1.7706             nan     0.1000    0.0280
##     20        1.5777             nan     0.1000   -0.0033
##     40        1.3878             nan     0.1000   -0.0070
##     60        1.2571             nan     0.1000   -0.0055
##     80        1.1536             nan     0.1000   -0.0072
##    100        1.0618             nan     0.1000   -0.0058
##    120        0.9799             nan     0.1000   -0.0057
##    140        0.9093             nan     0.1000   -0.0043
##    150        0.8779             nan     0.1000   -0.0062
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        2.3979             nan     0.1000    0.1780
##      2        2.2780             nan     0.1000    0.1263
##      3        2.1922             nan     0.1000    0.0907
##      4        2.1276             nan     0.1000    0.0743
##      5        2.0756             nan     0.1000    0.0535
##      6        2.0387             nan     0.1000    0.0460
##      7        2.0065             nan     0.1000    0.0339
##      8        1.9805             nan     0.1000    0.0354
##      9        1.9551             nan     0.1000    0.0320
##     10        1.9316             nan     0.1000    0.0220
##     20        1.8159             nan     0.1000    0.0029
##     40        1.7100             nan     0.1000   -0.0015
##     60        1.6389             nan     0.1000   -0.0034
##     80        1.5886             nan     0.1000   -0.0039
##    100        1.5490             nan     0.1000   -0.0067
##    120        1.5139             nan     0.1000   -0.0043
##    140        1.4827             nan     0.1000   -0.0054
##    150        1.4681             nan     0.1000   -0.0044
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        2.3979             nan     0.1000    0.1885
##      2        2.2563             nan     0.1000    0.1211
##      3        2.1624             nan     0.1000    0.1002
##      4        2.0874             nan     0.1000    0.0838
##      5        2.0289             nan     0.1000    0.0617
##      6        1.9804             nan     0.1000    0.0513
##      7        1.9397             nan     0.1000    0.0366
##      8        1.9077             nan     0.1000    0.0303
##      9        1.8790             nan     0.1000    0.0332
##     10        1.8527             nan     0.1000    0.0279
##     20        1.6854             nan     0.1000    0.0087
##     40        1.5241             nan     0.1000   -0.0063
##     60        1.4224             nan     0.1000   -0.0038
##     80        1.3367             nan     0.1000   -0.0046
##    100        1.2660             nan     0.1000   -0.0060
##    120        1.2059             nan     0.1000   -0.0040
##    140        1.1482             nan     0.1000   -0.0061
##    150        1.1217             nan     0.1000   -0.0050
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        2.3979             nan     0.1000    0.2109
##      2        2.2453             nan     0.1000    0.1318
##      3        2.1329             nan     0.1000    0.1013
##      4        2.0511             nan     0.1000    0.0692
##      5        1.9899             nan     0.1000    0.0587
##      6        1.9345             nan     0.1000    0.0483
##      7        1.8897             nan     0.1000    0.0429
##      8        1.8498             nan     0.1000    0.0355
##      9        1.8178             nan     0.1000    0.0294
##     10        1.7845             nan     0.1000    0.0267
##     20        1.5961             nan     0.1000    0.0001
##     40        1.3977             nan     0.1000   -0.0060
##     60        1.2690             nan     0.1000   -0.0095
##     80        1.1608             nan     0.1000   -0.0089
##    100        1.0664             nan     0.1000   -0.0066
##    120        0.9844             nan     0.1000   -0.0038
##    140        0.9136             nan     0.1000   -0.0055
##    150        0.8847             nan     0.1000   -0.0056
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        2.3979             nan     0.1000    0.1733
##      2        2.2771             nan     0.1000    0.1220
##      3        2.1923             nan     0.1000    0.0810
##      4        2.1306             nan     0.1000    0.0733
##      5        2.0797             nan     0.1000    0.0533
##      6        2.0405             nan     0.1000    0.0417
##      7        2.0104             nan     0.1000    0.0376
##      8        1.9814             nan     0.1000    0.0385
##      9        1.9541             nan     0.1000    0.0241
##     10        1.9338             nan     0.1000    0.0231
##     20        1.8123             nan     0.1000    0.0095
##     40        1.7015             nan     0.1000   -0.0041
##     60        1.6351             nan     0.1000   -0.0023
##     80        1.5838             nan     0.1000   -0.0054
##    100        1.5421             nan     0.1000   -0.0040
##    120        1.5069             nan     0.1000   -0.0030
##    140        1.4726             nan     0.1000   -0.0041
##    150        1.4584             nan     0.1000   -0.0047
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        2.3979             nan     0.1000    0.2033
##      2        2.2491             nan     0.1000    0.1326
##      3        2.1549             nan     0.1000    0.0964
##      4        2.0856             nan     0.1000    0.0748
##      5        2.0264             nan     0.1000    0.0620
##      6        1.9796             nan     0.1000    0.0527
##      7        1.9375             nan     0.1000    0.0501
##      8        1.8992             nan     0.1000    0.0312
##      9        1.8689             nan     0.1000    0.0305
##     10        1.8427             nan     0.1000    0.0290
##     20        1.6770             nan     0.1000    0.0027
##     40        1.5246             nan     0.1000   -0.0035
##     60        1.4210             nan     0.1000   -0.0081
##     80        1.3357             nan     0.1000   -0.0051
##    100        1.2673             nan     0.1000   -0.0037
##    120        1.2017             nan     0.1000   -0.0059
##    140        1.1508             nan     0.1000   -0.0065
##    150        1.1226             nan     0.1000   -0.0057
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        2.3979             nan     0.1000    0.2072
##      2        2.2364             nan     0.1000    0.1506
##      3        2.1274             nan     0.1000    0.1047
##      4        2.0450             nan     0.1000    0.0778
##      5        1.9800             nan     0.1000    0.0595
##      6        1.9286             nan     0.1000    0.0624
##      7        1.8783             nan     0.1000    0.0392
##      8        1.8375             nan     0.1000    0.0362
##      9        1.8049             nan     0.1000    0.0305
##     10        1.7755             nan     0.1000    0.0213
##     20        1.5915             nan     0.1000    0.0065
##     40        1.3920             nan     0.1000   -0.0058
##     60        1.2619             nan     0.1000   -0.0069
##     80        1.1565             nan     0.1000   -0.0063
##    100        1.0615             nan     0.1000   -0.0068
##    120        0.9786             nan     0.1000   -0.0053
##    140        0.9119             nan     0.1000   -0.0050
##    150        0.8818             nan     0.1000   -0.0064
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        2.3979             nan     0.1000    0.1733
##      2        2.2753             nan     0.1000    0.1242
##      3        2.1920             nan     0.1000    0.0927
##      4        2.1291             nan     0.1000    0.0758
##      5        2.0805             nan     0.1000    0.0502
##      6        2.0439             nan     0.1000    0.0478
##      7        2.0084             nan     0.1000    0.0340
##      8        1.9838             nan     0.1000    0.0358
##      9        1.9578             nan     0.1000    0.0301
##     10        1.9346             nan     0.1000    0.0209
##     20        1.8081             nan     0.1000    0.0056
##     40        1.7004             nan     0.1000   -0.0037
##     60        1.6332             nan     0.1000   -0.0032
##     80        1.5783             nan     0.1000   -0.0039
##    100        1.5375             nan     0.1000   -0.0042
##    120        1.5023             nan     0.1000   -0.0070
##    140        1.4743             nan     0.1000   -0.0067
##    150        1.4616             nan     0.1000   -0.0044
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        2.3979             nan     0.1000    0.1922
##      2        2.2496             nan     0.1000    0.1278
##      3        2.1568             nan     0.1000    0.1098
##      4        2.0825             nan     0.1000    0.0748
##      5        2.0244             nan     0.1000    0.0598
##      6        1.9784             nan     0.1000    0.0534
##      7        1.9363             nan     0.1000    0.0380
##      8        1.9003             nan     0.1000    0.0368
##      9        1.8686             nan     0.1000    0.0280
##     10        1.8420             nan     0.1000    0.0228
##     20        1.6771             nan     0.1000    0.0003
##     40        1.5234             nan     0.1000   -0.0039
##     60        1.4259             nan     0.1000   -0.0056
##     80        1.3426             nan     0.1000   -0.0080
##    100        1.2716             nan     0.1000   -0.0045
##    120        1.2120             nan     0.1000   -0.0078
##    140        1.1571             nan     0.1000   -0.0063
##    150        1.1289             nan     0.1000   -0.0056
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        2.3979             nan     0.1000    0.2043
##      2        2.2306             nan     0.1000    0.1249
##      3        2.1263             nan     0.1000    0.1083
##      4        2.0431             nan     0.1000    0.0797
##      5        1.9811             nan     0.1000    0.0665
##      6        1.9268             nan     0.1000    0.0515
##      7        1.8801             nan     0.1000    0.0510
##      8        1.8385             nan     0.1000    0.0304
##      9        1.8075             nan     0.1000    0.0225
##     10        1.7784             nan     0.1000    0.0261
##     20        1.5914             nan     0.1000   -0.0001
##     40        1.4073             nan     0.1000   -0.0069
##     60        1.2725             nan     0.1000   -0.0080
##     80        1.1675             nan     0.1000   -0.0081
##    100        1.0748             nan     0.1000   -0.0074
##    120        0.9971             nan     0.1000   -0.0068
##    140        0.9300             nan     0.1000   -0.0069
##    150        0.8963             nan     0.1000   -0.0069
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        2.3979             nan     0.1000    0.1741
##      2        2.2753             nan     0.1000    0.1245
##      3        2.1913             nan     0.1000    0.0870
##      4        2.1319             nan     0.1000    0.0743
##      5        2.0824             nan     0.1000    0.0619
##      6        2.0365             nan     0.1000    0.0505
##      7        1.9995             nan     0.1000    0.0317
##      8        1.9738             nan     0.1000    0.0338
##      9        1.9500             nan     0.1000    0.0274
##     10        1.9295             nan     0.1000    0.0264
##     20        1.8067             nan     0.1000    0.0056
##     40        1.6953             nan     0.1000   -0.0031
##     60        1.6276             nan     0.1000   -0.0040
##     80        1.5745             nan     0.1000   -0.0040
##    100        1.5338             nan     0.1000   -0.0020
##    120        1.4987             nan     0.1000   -0.0029
##    140        1.4684             nan     0.1000   -0.0055
##    150        1.4535             nan     0.1000   -0.0042
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        2.3979             nan     0.1000    0.2153
##      2        2.2482             nan     0.1000    0.1289
##      3        2.1529             nan     0.1000    0.0956
##      4        2.0823             nan     0.1000    0.0724
##      5        2.0257             nan     0.1000    0.0624
##      6        1.9765             nan     0.1000    0.0516
##      7        1.9355             nan     0.1000    0.0422
##      8        1.8991             nan     0.1000    0.0413
##      9        1.8677             nan     0.1000    0.0268
##     10        1.8432             nan     0.1000    0.0194
##     20        1.6760             nan     0.1000   -0.0002
##     40        1.5136             nan     0.1000   -0.0033
##     60        1.4098             nan     0.1000   -0.0066
##     80        1.3297             nan     0.1000   -0.0081
##    100        1.2595             nan     0.1000   -0.0057
##    120        1.1961             nan     0.1000   -0.0047
##    140        1.1375             nan     0.1000   -0.0066
##    150        1.1113             nan     0.1000   -0.0067
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        2.3979             nan     0.1000    0.2018
##      2        2.2359             nan     0.1000    0.1348
##      3        2.1302             nan     0.1000    0.1042
##      4        2.0478             nan     0.1000    0.0777
##      5        1.9818             nan     0.1000    0.0682
##      6        1.9251             nan     0.1000    0.0543
##      7        1.8801             nan     0.1000    0.0427
##      8        1.8411             nan     0.1000    0.0308
##      9        1.8084             nan     0.1000    0.0270
##     10        1.7780             nan     0.1000    0.0184
##     20        1.5778             nan     0.1000    0.0061
##     40        1.3844             nan     0.1000   -0.0091
##     60        1.2543             nan     0.1000   -0.0090
##     80        1.1474             nan     0.1000   -0.0054
##    100        1.0568             nan     0.1000   -0.0049
##    120        0.9797             nan     0.1000   -0.0070
##    140        0.9125             nan     0.1000   -0.0053
##    150        0.8818             nan     0.1000   -0.0042
## 
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        2.3979             nan     0.1000    0.2189
##      2        2.2347             nan     0.1000    0.1416
##      3        2.1291             nan     0.1000    0.0997
##      4        2.0543             nan     0.1000    0.0837
##      5        1.9922             nan     0.1000    0.0649
##      6        1.9424             nan     0.1000    0.0543
##      7        1.8991             nan     0.1000    0.0434
##      8        1.8612             nan     0.1000    0.0369
##      9        1.8293             nan     0.1000    0.0317
##     10        1.7998             nan     0.1000    0.0255
##     20        1.6126             nan     0.1000    0.0054
##     40        1.4295             nan     0.1000   -0.0050
##     50        1.3679             nan     0.1000   -0.0060

I print the result:

meta_model

## Stochastic Gradient Boosting 
## 
## 3594 samples
##   22 predictor
##   11 classes: 'class1', 'class2', 'class3', 'class4', 'class5', 'class6', 'class7', 'class8', 'class9', 'class10', 'class11' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 2878, 2875, 2876, 2874, 2873 
## Resampling results across tuning parameters:
## 
##   interaction.depth  n.trees  Accuracy   Kappa    
##   1                   50      0.3748204  0.2188873
##   1                  100      0.3881631  0.2429028
##   1                  150      0.3912058  0.2519222
##   2                   50      0.3953876  0.2555536
##   2                  100      0.3998371  0.2667736
##   2                  150      0.3964980  0.2647965
##   3                   50      0.4051269  0.2701325
##   3                  100      0.4020570  0.2703151
##   3                  150      0.3973274  0.2662116
## 
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
## 
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were n.trees = 50, interaction.depth =
##  3, shrinkage = 0.1 and n.minobsinnode = 10.
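The selection rule stated in the output ("Accuracy was used to select the optimal model using the largest value") can be checked by hand. The sketch below copies the resampling table into a plain data frame (the name `results` is illustrative) and picks the best row in base R. On the fitted object itself, the same information should be available through `meta_model$results` and `meta_model$bestTune`.

```r
# Resampling results copied from the caret output above.
results <- data.frame(
  interaction.depth = rep(1:3, each = 3),
  n.trees = rep(c(50, 100, 150), times = 3),
  Accuracy = c(0.3748204, 0.3881631, 0.3912058,
               0.3953876, 0.3998371, 0.3964980,
               0.4051269, 0.4020570, 0.3973274),
  Kappa = c(0.2188873, 0.2429028, 0.2519222,
            0.2555536, 0.2667736, 0.2647965,
            0.2701325, 0.2703151, 0.2662116)
)

# Row with the highest cross-validated accuracy (the criterion caret used here).
best <- results[which.max(results$Accuracy), ]
print(best)

# Row with the highest Kappa, for comparison only.
best_kappa <- results[which.max(results$Kappa), ]
print(best_kappa)
```

The two criteria nearly tie: accuracy favours 50 trees at depth 3, while Kappa is marginally higher with 100 trees at the same depth, so the choice between them is within resampling noise.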

I tried XGBoost, linear regression, an SVM with a radial kernel, and GBM; the last of these gave the best results. However, the results were no better than those I obtained using CatBoost alone.

Conclusions

In this specific case, the improvement in music genre classification was driven largely by proper treatment of the data. Initially, the model performed very poorly, with an accuracy of roughly 30%. After applying a series of transformations and strategies to the data, accuracy reached 70%, an improvement of 40 percentage points that more than doubles the initial figure.

The main factors that contributed to this improvement were:

Creating new variables related to the artist, which made it possible to capture relevant patterns associated with musical style.

Processing the text in the artist and song names, which turned these text fields into cleaner representations that the model could handle as categorical variables.

Choosing a model that handles categorical variables effectively, which was key to exploiting the semantic richness of the text data.
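As a rough illustration of the first two factors, the sketch below cleans artist names and derives one artist-level feature in base R. It is not the code used in this analysis: the toy data, the column names (`artist`, `track`), the helper `clean_text`, and the feature `artist_n_tracks` are all assumptions made for the example.

```r
# Hypothetical toy data; column names are assumptions, not the dataset's real headers.
songs <- data.frame(
  artist = c("The Beatles", "the beatles ", "Metallica", "A.R. Rahman", "METALLICA"),
  track  = c("Help!", "Yesterday", "One", "Jai Ho", "Battery"),
  stringsAsFactors = FALSE
)

# Normalise text: lower-case, drop everything except a-z, 0-9 and spaces,
# collapse repeated spaces, and trim. Note this also strips accented letters.
clean_text <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9 ]", "", x)
  x <- gsub(" +", " ", x)
  trimws(x)
}

songs$artist_clean <- clean_text(songs$artist)

# Artist-level feature: number of tracks per cleaned artist name.
songs$artist_n_tracks <- as.integer(
  ave(seq_along(songs$artist_clean), songs$artist_clean, FUN = length)
)

# Encode the cleaned name as a factor so a categorical-aware model can use it.
songs$artist_clean <- factor(songs$artist_clean)

print(songs)
```

Cleaning before counting matters: without it, "The Beatles" and "the beatles " would be treated as two different artists with one track each.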

Although the predicted per-class probabilities were relatively low (mostly between 0 and 0.13), the model identified the most probable class correctly in a high percentage of cases. This indicates that, while the model is still uncertain in its decisions, it captures enough information to make correct predictions frequently.
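To see why low probabilities and correct predictions are compatible, note that with 11 classes a uniform guess assigns 1/11 ≈ 0.091 to each class, so a top probability of 0.13 is only slightly above uniform yet still determines the predicted class. The sketch below uses made-up probabilities (not model output) and assumes 11 classes, as in the model summary above.

```r
# Hypothetical probabilities: the top class gets 0.13, the other ten get 0.087 each,
# so every row sums to 1.
n_classes <- 11
make_row <- function(top) {
  p <- rep(0.087, n_classes)
  p[top] <- 0.13
  p
}

probs <- rbind(make_row(3), make_row(7), make_row(1))
true_class <- c(3, 7, 2)  # made-up labels; the third prediction is wrong on purpose

# Predicted class = index of the largest probability in each row.
pred_class <- apply(probs, 1, which.max)

# Accuracy depends only on which class ranks first, not on how large its probability is.
accuracy <- mean(pred_class == true_class)
print(pred_class)
print(accuracy)
```

Accuracy therefore says nothing about calibration; a metric such as log loss would be needed to judge whether the probabilities themselves are reliable.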

Therefore, success in classification depends not only on the chosen algorithm but especially on the prior work of feature engineering and data processing. This experience underlines the importance of devoting time and effort to understanding, cleaning, transforming, and enriching the data before applying any machine learning model.

Music Classifier