Using the movie and review data extracted from MovieLens + IMDb/Rotten Tomatoes, we will try to predict the variable rtAudienceScore (the share of the audience that liked the movie according to Rotten Tomatoes, on a scale from 0 to 100).
Because the data come from several sources, we have a good deal of information about each movie, for example title, director, genre, actors and filming location, as well as its ratings on the two largest social media sites about movies.
Inspecting the data in the movies.dat file, we have:
library(plyr)
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:plyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
movies <- read.delim("~/Projetos/DataAnalysis/Assignment7/hetrec2011-movielens-2k-v2/movies_m.dat")
# Data set
head(movies)
## id title imdbID
## 1 1 Toy story 114709
## 2 2 Jumanji 113497
## 3 3 Grumpy Old Men 107050
## 4 4 Waiting to Exhale 114885
## 5 5 Father of the Bride Part II 113041
## 6 6 Heat 113277
## spanishTitle
## 1 Toy story (juguetes)
## 2 Jumanji
## 3 Dos viejos grunones
## 4 Esperando un respiro
## 5 Vuelve el padre de la novia (Ahora tambien abuelo)
## 6 Heat
## imdbPictureURL
## 1 http://ia.media-imdb.com/images/M/MV5BMTMwNDU0NTY2Nl5BMl5BanBnXkFtZTcwOTUxOTM5Mw@@._V1._SX214_CR0,0,214,314_.jpg
## 2 http://ia.media-imdb.com/images/M/MV5BMzM5NjE1OTMxNV5BMl5BanBnXkFtZTcwNDY2MzEzMQ@@._V1._SY314_CR3,0,214,314_.jpg
## 3 http://ia.media-imdb.com/images/M/MV5BMTI5MTgyMzE0OF5BMl5BanBnXkFtZTYwNzAyNjg5._V1._SX214_CR0,0,214,314_.jpg
## 4 http://ia.media-imdb.com/images/M/MV5BMTczMTMyMTgyM15BMl5BanBnXkFtZTcwOTc4OTQyMQ@@._V1._SY314_CR4,0,214,314_.jpg
## 5 http://ia.media-imdb.com/images/M/MV5BMTg1NDc2MjExOF5BMl5BanBnXkFtZTcwNjU1NDAzMQ@@._V1._SY314_CR5,0,214,314_.jpg
## 6 http://ia.media-imdb.com/images/M/MV5BMTM1NDc4ODkxNV5BMl5BanBnXkFtZTcwNTI4ODE3MQ@@._V1._SY314_CR1,0,214,314_.jpg
## year rtID rtAllCriticsRating
## 1 1995 toy_story 9
## 2 1995 1068044-jumanji 5.6
## 3 1993 grumpy_old_men 5.9
## 4 1995 waiting_to_exhale 5.6
## 5 1995 father_of_the_bride_part_ii 5.3
## 6 1995 1068182-heat 7.7
## rtAllCriticsNumReviews rtAllCriticsNumFresh rtAllCriticsNumRotten
## 1 73 73 0
## 2 28 13 15
## 3 36 24 12
## 4 25 14 11
## 5 19 9 10
## 6 58 50 8
## rtAllCriticsScore rtTopCriticsRating rtTopCriticsNumReviews
## 1 100 8.5 17
## 2 46 5.8 5
## 3 66 7 6
## 4 56 5.5 11
## 5 47 5.4 5
## 6 86 7.2 17
## rtTopCriticsNumFresh rtTopCriticsNumRotten rtTopCriticsScore
## 1 17 0 100
## 2 2 3 40
## 3 5 1 83
## 4 5 6 45
## 5 1 4 20
## 6 14 3 82
## rtAudienceRating rtAudienceNumRatings rtAudienceScore
## 1 3.7 102338 81
## 2 3.2 44587 61
## 3 3.2 10489 66
## 4 3.3 5666 79
## 5 3 13761 64
## 6 3.9 42785 92
## rtPictureURL
## 1 http://content7.flixster.com/movie/10/93/63/10936393_det.jpg
## 2 http://content8.flixster.com/movie/56/79/73/5679734_det.jpg
## 3 http://content6.flixster.com/movie/25/60/256020_det.jpg
## 4 http://content9.flixster.com/movie/10/94/17/10941715_det.jpg
## 5 http://content8.flixster.com/movie/25/54/255426_det.jpg
## 6 http://content9.flixster.com/movie/26/80/268099_det.jpg
# Number of distinct movies
length(unique(movies$id))
## [1] 10197
# Movie release year
summary(movies$year)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1903 1981 1995 1988 2002 2011
Some movies have no rtAudienceScore value (NA). We decided to remove those movies.
movies$rtAudienceScore <- as.numeric(as.character(movies$rtAudienceScore))
movies <- movies[complete.cases(movies),]
# Histogram of the audience score
hist(movies$rtAudienceScore, main = "Histogram of Audience Score", xlab = "Audience Score")
Movies with rtAudienceScore equal to 0 turned out to have no ratings at all. For that reason we removed them from our sample.
movies <- filter(movies, rtAudienceScore > 0)
hist(movies$rtAudienceScore, main = "Histogram of Audience Score", xlab = "Audience Score")
Because the data are spread across several tables, the next step is to join them into a single table. Only then do we build the model that predicts each movie's Audience Score.
movie_countries <- read.delim("~/Projetos/DataAnalysis/Assignment7/hetrec2011-movielens-2k-v2/movie_countries.dat")
movie_directors <- read.delim("~/Projetos/DataAnalysis/Assignment7/hetrec2011-movielens-2k-v2/movie_directors.dat")
movie_directors$directorID <- as.numeric(movie_directors$directorID)
movies_data <- movies %>%
select(id, title, year, rtAudienceScore) %>%
left_join(movie_countries, c("id" = "movieID")) %>%
left_join(movie_directors, c("id" = "movieID"))
head(movies_data)
## id title year rtAudienceScore country directorID
## 1 1 Toy story 1995 81 USA 2209
## 2 2 Jumanji 1995 61 USA 2130
## 3 3 Grumpy Old Men 1993 66 USA 1267
## 4 4 Waiting to Exhale 1995 79 USA 1477
## 5 5 Father of the Bride Part II 1995 64 USA 887
## 6 6 Heat 1995 92 USA 2834
## directorName
## 1 John Lasseter
## 2 Joe Johnston
## 3 Donald Petrie
## 4 Forest Whitaker
## 5 Charles Shyer
## 6 Michael Mann
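The join above keeps every movie and attaches country and director where a match exists. As a minimal sketch of that left-join behaviour, the base-R `merge()` call below does the same thing on two invented toy tables (the table names `films` and `countries` and their values are illustrative assumptions, not part of the dataset).

```r
# Toy tables; the values are invented for illustration only.
films <- data.frame(id = c(1, 2, 3),
                    title = c("A", "B", "C"),
                    stringsAsFactors = FALSE)
countries <- data.frame(movieID = c(1, 2),
                        country = c("USA", "France"),
                        stringsAsFactors = FALSE)

# Base-R equivalent of a left join: keep every row of `films`,
# filling `country` with NA where no match exists.
joined <- merge(films, countries, by.x = "id", by.y = "movieID", all.x = TRUE)
print(joined)

# Rows that found no partner show up as NA and would be dropped
# by a later complete.cases() filter.
print(sum(is.na(joined$country)))
```

Unmatched rows survive the join with NA in the new columns, which is why a check for missing values after joining is useful.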
Before building the model, it is important to check whether any movie has an NA in any column. If so, we drop that movie from the dataset.
# Remove rows with NA values
movies_data <- movies_data[complete.cases(movies_data),]
We now split the data into training and test sets.
require(caret)
## Loading required package: caret
## Loading required package: lattice
## Loading required package: ggplot2
##
## Attaching package: 'ggplot2'
##
## The following object is masked _by_ '.GlobalEnv':
##
## movies
require(lattice)
require(ggplot2)
# Converting the variables to numeric.
# Note: these columns are factors, so as.numeric() returns the internal
# level codes (position in the sorted levels), not a meaningful quantity.
movies_data$title <- as.numeric(movies_data$title)
movies_data$country <- as.numeric(movies_data$country)
movies_data$directorID <- as.numeric(movies_data$directorID)
movies_data$directorName <- as.numeric(movies_data$directorName)
movies_data_new <- movies_data
set.seed(12345)
split<-createDataPartition(y = movies_data$rtAudienceScore,
p = 0.7,
list = FALSE)
# Split into training and test sets
movies_data.treino <- movies_data[split,]
movies_data.teste <- movies_data[-split,]
# 10-fold cross-validation on the training set
ctrl <- trainControl(method = "cv", number = 10)
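The numeric conversion above relies on how R stores factors. The sketch below uses an invented four-value vector to show what `as.numeric()` returns for a factor. The one-hot alternative at the end is only a suggestion for low-cardinality columns such as country, not something the original analysis does.

```r
# Invented values; only the mechanism matters.
country <- factor(c("USA", "France", "USA", "Brazil"))

# as.numeric() on a factor returns the internal level codes,
# which follow the sorted order of the levels.
codes <- as.numeric(country)
print(levels(country))  # "Brazil" "France" "USA"
print(codes)            # 3 2 3 1

# One-hot (dummy) encoding is a common alternative that does not
# impose an artificial ordering on the categories.
dummies <- model.matrix(~ country - 1)
print(colnames(dummies))
```

With codes like these, a linear model treats "USA" as three times "Brazil", which may help explain why the title and directorName coefficients carry so little signal.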
Let us now build our first model from the table we created.
# Training
lm <- train(rtAudienceScore ~. ,
data = movies_data.treino,
method = "lm",
trControl = ctrl,
metric = "Rsquared")
lm
## Linear Regression
##
## 5139 samples
## 6 predictors
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
##
## Summary of sample sizes: 4625, 4624, 4625, 4625, 4624, 4626, ...
##
## Resampling results
##
## RMSE Rsquared RMSE SD Rsquared SD
## 16.97969 0.09961777 0.4196769 0.02768347
##
##
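The RMSE and Rsquared columns above are averages over the 10 cross-validation folds, and the SD columns are their spread. The following is a rough base-R sketch of that procedure on simulated data. The data, the seed and the fold assignment are assumptions made for illustration, and caret may build its folds differently (for example, by stratifying on the outcome).

```r
set.seed(1)  # arbitrary seed for this illustration

# Simulated data: y depends on x, plus a fair amount of noise.
n <- 200
toy <- data.frame(x = rnorm(n))
toy$y <- 2 * toy$x + rnorm(n, sd = 3)

k <- 10
# Randomly assign each row to one of k folds of equal size.
fold <- sample(rep(seq_len(k), length.out = n))

rmse <- numeric(k)
for (i in seq_len(k)) {
  train_rows <- toy[fold != i, ]
  held_out   <- toy[fold == i, ]
  fit  <- stats::lm(y ~ x, data = train_rows)   # fit on k - 1 folds
  pred <- predict(fit, newdata = held_out)      # predict the held-out fold
  rmse[i] <- sqrt(mean((held_out$y - pred)^2))
}

# The mean and standard deviation across folds play the role of the
# "RMSE" and "RMSE SD" columns in the caret printout.
print(c(mean = mean(rmse), sd = sd(rmse)))
```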
The cross-validated Rsquared is about 0.10 (0.0996), which means the model explains roughly 10% of the variance. We cannot yet say whether this value is good or bad; to do so we need to compare it with other models. We can also see which variables matter most to the model.
plot(varImp(lm))
summary(lm)
##
## Call:
## lm(formula = .outcome ~ ., data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -55.888 -11.996 2.148 13.169 38.201
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.864e+02 3.046e+01 22.530 < 2e-16 ***
## id 7.009e-05 1.359e-05 5.156 2.61e-07 ***
## title 7.638e-05 8.643e-05 0.884 0.377
## year -3.069e-01 1.529e-02 -20.072 < 2e-16 ***
## country -1.917e-01 1.429e-02 -13.415 < 2e-16 ***
## directorID 3.472e-04 4.767e-04 0.728 0.466
## directorName 1.254e-05 4.627e-04 0.027 0.978
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.97 on 5132 degrees of freedom
## Multiple R-squared: 0.09967, Adjusted R-squared: 0.09862
## F-statistic: 94.69 on 6 and 5132 DF, p-value: < 2.2e-16
Before comparing with other models, let us check the Rsquared obtained on the test data.
predictedVal <- predict(lm, movies_data.teste)
modelvalueslm <-data.frame(obs = movies_data.teste$rtAudienceScore, pred = predictedVal)
summary(movies_data.teste$rtAudienceScore)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.00 52.00 68.00 65.67 80.00 100.00
defaultSummary(modelvalueslm)
## RMSE Rsquared
## 16.7035884 0.1053235
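The two test metrics above can be reproduced by hand. The sketch below uses invented observed and predicted values. To my understanding, caret's default Rsquared is the squared correlation between predictions and observations, so the traditional 1 - SSE/SST form is shown alongside it for comparison.

```r
# Invented observed and predicted scores.
obs  <- c(81, 61, 66, 79, 64, 92)
pred <- c(75, 65, 60, 70, 68, 80)

# Root mean squared error, in the same units as the score.
rmse <- sqrt(mean((obs - pred)^2))

# Squared Pearson correlation between predictions and observations.
r2_cor <- cor(obs, pred)^2

# Traditional coefficient of determination, 1 - SSE/SST; it can
# differ from the squared correlation when predictions are biased.
r2_trad <- 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

print(c(RMSE = rmse, R2_cor = r2_cor, R2_trad = r2_trad))
```

An RMSE near 17 on a 0 to 100 scale means typical errors of about 17 points, which is a more direct reading of model quality than Rsquared alone.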
The test Rsquared (0.105) differs slightly from the cross-validated value obtained on the training set (0.100), which was expected. We now repeat the procedure for other models, and only then choose one model and improve it.
Building another model, Boosted LM.
require(bst)
## Loading required package: bst
# Training
bstLs <- train(rtAudienceScore ~. ,
data = movies_data.treino,
method = "bstLs",
trControl = ctrl,
metric = "Rsquared")
bstLs
## Boosted Linear Model
##
## 5139 samples
## 6 predictors
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
##
## Summary of sample sizes: 4626, 4626, 4624, 4625, 4625, 4626, ...
##
## Resampling results across tuning parameters:
##
## mstop RMSE Rsquared RMSE SD Rsquared SD
## 50 17.82715 0.01518627 0.2631804 0.006972554
## 100 17.79750 0.01839019 0.2639842 0.008070009
## 150 17.77121 0.02148814 0.2631823 0.009112617
##
## Tuning parameter 'nu' was held constant at a value of 0.1
## Rsquared was used to select the optimal model using the largest value.
## The final values used for the model were mstop = 150 and nu = 0.1.
plot(varImp(bstLs))
predictedVal <- predict(bstLs, movies_data.teste)
modelvaluesbstLs <-data.frame(obs = movies_data.teste$rtAudienceScore, pred = predictedVal)
defaultSummary(modelvaluesbstLs)
## RMSE Rsquared
## 17.54204844 0.02409413
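The tuning parameters mstop (number of boosting iterations) and nu (shrinkage) are easier to interpret with a concrete loop. Below is a minimal sketch of componentwise least-squares boosting on simulated data. It is one common form of boosting with linear base learners and is not claimed to be the exact algorithm used by the bst package; the function `boost_ls` and the data are hypothetical.

```r
set.seed(2)  # arbitrary seed

# Simulated predictors; only the first one drives the response.
n <- 300
X <- scale(matrix(rnorm(n * 3), ncol = 3))  # centred, unit variance
y <- 4 * X[, 1] + rnorm(n)

boost_ls <- function(X, y, mstop, nu = 0.1) {
  fit <- rep(mean(y), length(y))        # start from the mean
  for (m in seq_len(mstop)) {
    r <- y - fit                        # current residuals
    # Least-squares slope of the residuals on each (centred) column.
    beta <- colSums(X * r) / colSums(X^2)
    # Residual sum of squares if only column j were used.
    rss <- sapply(seq_len(ncol(X)),
                  function(j) sum((r - beta[j] * X[, j])^2))
    j <- which.min(rss)                 # best single predictor
    fit <- fit + nu * beta[j] * X[, j]  # shrunken update
  }
  fit
}

rmse <- function(fit) sqrt(mean((y - fit)^2))
print(c(mstop_10  = rmse(boost_ls(X, y, 10)),
        mstop_150 = rmse(boost_ls(X, y, 150))))
```

With nu = 0.1 each step moves only a tenth of the way towards the least-squares fit, so a small mstop leaves the model underfitted. The steady rise of Rsquared from mstop = 50 to 150 in the table above is consistent with that, and suggests larger mstop values might be worth trying.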
For kNN
# Training
knn <- train(rtAudienceScore ~. ,
data = movies_data.treino,
method = "knn",
trControl = ctrl,
preProcess = c("center","scale"),
tuneGrid = expand.grid(.k = 3:6),
metric = "Rsquared")
knn
## k-Nearest Neighbors
##
## 5139 samples
## 6 predictors
##
## Pre-processing: centered, scaled
## Resampling: Cross-Validated (10 fold)
##
## Summary of sample sizes: 4625, 4625, 4624, 4625, 4625, 4626, ...
##
## Resampling results across tuning parameters:
##
## k RMSE Rsquared RMSE SD Rsquared SD
## 3 18.86263 0.05842262 0.4335260 0.01448581
## 4 18.33816 0.06415954 0.4456524 0.01504604
## 5 18.04841 0.06698007 0.3559604 0.01441641
## 6 17.83106 0.07116678 0.3311527 0.01561322
##
## Rsquared was used to select the optimal model using the largest value.
## The final value used for the model was k = 6.
predictedVal <- predict(knn, movies_data.teste)
modelvaluesknn <-data.frame(obs = movies_data.teste$rtAudienceScore, pred = predictedVal)
plot(varImp(knn))
defaultSummary(modelvaluesknn)
## RMSE Rsquared
## 17.40348300 0.08785419
For Boosted Tree
# Training
bstTree <- train(rtAudienceScore ~. ,
data = movies_data.treino,
method = "bstTree",
trControl = ctrl,
metric = "Rsquared")
bstTree
## Boosted Tree
##
## 5139 samples
## 6 predictors
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
##
## Summary of sample sizes: 4624, 4624, 4625, 4625, 4624, 4627, ...
##
## Resampling results across tuning parameters:
##
## maxdepth mstop RMSE Rsquared RMSE SD Rsquared SD
## 1 50 16.67926 0.1344349 0.3900484 0.02643764
## 1 100 16.59078 0.1423434 0.3930347 0.02623926
## 1 150 16.54617 0.1463535 0.3912763 0.02635218
## 2 50 16.46734 0.1552640 0.4346779 0.03320780
## 2 100 16.35321 0.1658153 0.4437705 0.03352897
## 2 150 16.30016 0.1707879 0.4794858 0.03669232
## 3 50 16.33290 0.1685160 0.4552249 0.03414061
## 3 100 16.20553 0.1803845 0.5033434 0.03751579
## 3 150 16.16259 0.1843452 0.5369949 0.04029391
##
## Tuning parameter 'nu' was held constant at a value of 0.1
## Rsquared was used to select the optimal model using the largest value.
## The final values used for the model were mstop = 150, maxdepth = 3 and
## nu = 0.1.
predictedVal <- predict(bstTree, movies_data.teste)
modelvaluesbstTree <-data.frame(obs = movies_data.teste$rtAudienceScore, pred = predictedVal)
plot(varImp(bstTree))
defaultSummary(modelvaluesbstTree)
## RMSE Rsquared
## 15.795556 0.200198
Having built several models, we now choose the one with the highest Rsquared.
results <- resamples(list(lm = lm, bstLs = bstLs, Knn = knn, bstTree = bstTree))
bwplot(results)
bwplot(results, xlim=0:1)
Removing objects that are no longer needed, to free up memory
rm(modelvaluesbstLs)
rm(modelvaluesknn)
rm(movie_countries)
rm(movies)
rm(movies_data)
rm(modelvalueslm)
rm(movies_data.teste)
rm(movies_data.treino)
rm(split)
rm(bstLs)
rm(knn)
rm(lm)
bstTree achieved the highest Rsquared (0.184 in cross-validation and 0.200 on the test set), so among the four models tried it is the best one for our problem. To try to raise the Rsquared further, we create new variables.
# Number of movies per director
movies_director <- as.data.frame(table(movie_directors$directorID))
colnames(movies_director) <- c("directorID", "nFilmes")
# table() returns the IDs as a factor; go through character so that we
# recover the ID values themselves rather than the level positions
movies_director$directorID <- as.numeric(as.character(movies_director$directorID))
summary(movies_director$nFilmes)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 1.000 2.501 3.000 48.000
movie_actors <- read.delim("~/Projetos/DataAnalysis/Assignment7/hetrec2011-movielens-2k-v2/movie_actors.dat")
# Sum of the actor rankings in each movie
groupbyActors <- group_by(movie_actors, movieID)
sumActores <- summarise(groupbyActors, sum(ranking))
colnames(sumActores) <- c("movieID", "nRanking")
# Highest actor ranking in each movie
maxActores <- summarise(groupbyActors, max(ranking))
colnames(maxActores) <- c("movieID", "maxRanking")
# Mean actor ranking
meanActores <- summarise(groupbyActors, mean(ranking))
colnames(meanActores) <- c("movieID", "meanRanking")
# Median actor ranking
medianActores <- summarise(groupbyActors, median(ranking))
colnames(medianActores) <- c("movieID", "medianRanking")
# Genres of each movie
movie_genres <- read.delim("~/Projetos/DataAnalysis/Assignment7/hetrec2011-movielens-2k-v2/movie_genres.dat")
# genre becomes its factor level code, so the summaries below operate on those codes
movie_genres$genre <- as.numeric(movie_genres$genre)
groupbyGenres <- group_by(movie_genres, movieID)
firstGenres <- summarise(groupbyGenres, max(genre))
colnames(firstGenres) <- c("movieID", "firstGenres")
lastGenres <- summarise(groupbyGenres, min(genre))
colnames(lastGenres) <- c("movieID", "lastGenres")
medianGenres <- summarise(groupbyGenres, median(genre))
colnames(medianGenres) <- c("movieID", "medianGenres")
meanGenres <- summarise(groupbyGenres, mean(genre))
colnames(meanGenres) <- c("movieID", "meanGenres")
# Tags of each movie
movie_tags <- read.delim("~/Projetos/DataAnalysis/Assignment7/hetrec2011-movielens-2k-v2/movie_tags.dat")
movie_tags$tagID <- as.numeric(movie_tags$tagID)
groupbyTags <- group_by(movie_tags, movieID)
firstTags <- summarise(groupbyTags, max(tagID))
colnames(firstTags) <- c("movieID", "firstTags")
lastTags <- summarise(groupbyTags, min(tagID))
colnames(lastTags) <- c("movieID", "lastTags")
medianTags <- summarise(groupbyTags, median(tagID))
colnames(medianTags) <- c("movieID", "medianTags")
meanTags <- summarise(groupbyTags, mean(tagID))
colnames(meanTags) <- c("movieID", "meanTags")
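Each group of features above follows the same pattern: group by movieID, then reduce one column to a single number. As an illustrative sketch, the base-R code below computes the four actor summaries in one pass over invented rows (the `actors` table and its values are assumptions; base R is used so the example runs without dplyr).

```r
# Invented rows in the shape of movie_actors (movieID, ranking).
actors <- data.frame(movieID = c(1, 1, 1, 2, 2),
                     ranking = c(1, 2, 3, 1, 2))

# One pass per movie, returning all four statistics together.
per_movie <- lapply(split(actors, actors$movieID), function(g) {
  data.frame(movieID       = g$movieID[1],
             nRanking      = sum(g$ranking),
             maxRanking    = max(g$ranking),
             meanRanking   = mean(g$ranking),
             medianRanking = median(g$ranking))
})
stats_by_movie <- do.call(rbind, per_movie)
print(stats_by_movie)
```

In the head() output of the joined table, nRanking equals maxRanking × (maxRanking + 1) / 2 in all six rows (for example 300 = 24 × 25 / 2). This suggests the rankings simply run from 1 to the cast size, in which case the four actor summaries largely carry the same information.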
After creating the new variables, we join them to our table.
movies_data_new <- movies_data_new %>%
left_join(movies_director, c("directorID" = "directorID")) %>%
left_join(sumActores, c("id" = "movieID")) %>%
left_join(maxActores, c("id" = "movieID")) %>%
left_join(meanActores, c("id" = "movieID")) %>%
left_join(medianActores, c("id" = "movieID")) %>%
left_join(firstGenres, c("id" = "movieID")) %>%
left_join(lastGenres, c("id" = "movieID")) %>%
left_join(medianGenres, c("id" = "movieID")) %>%
left_join(meanGenres, c("id" = "movieID")) %>%
left_join(firstTags, c("id" = "movieID")) %>%
left_join(lastTags, c("id" = "movieID")) %>%
left_join(medianTags, c("id" = "movieID")) %>%
left_join(meanTags, c("id" = "movieID"))
head(movies_data_new)
## id title year rtAudienceScore country directorID directorName nFilmes
## 1 1 8709 1995 81 68 2209 2025 5
## 2 2 3743 1995 61 68 2130 1936 6
## 3 3 2933 1993 66 68 1267 1020 12
## 4 4 9031 1995 79 68 1477 1239 3
## 5 5 2437 1995 64 68 887 609 8
## 6 6 3073 1995 92 68 2834 2718 10
## nRanking maxRanking meanRanking medianRanking firstGenres lastGenres
## 1 300 24 12.5 12.5 9 2
## 2 171 18 9.5 9.5 9 2
## 3 136 16 8.5 8.5 15 5
## 4 210 20 10.5 10.5 15 5
## 5 351 26 13.5 13.5 5 5
## 6 1128 47 24.0 24.0 18 1
## medianGenres meanGenres firstTags lastTags medianTags meanTags
## 1 4 4.600000 15170 7 1925.0 4198.884
## 2 4 5.000000 14371 13 1893.5 3589.500
## 3 10 10.000000 13668 380 3219.0 5109.000
## 4 8 9.333333 NA NA NA NA
## 5 5 5.000000 4953 125 2185.0 2523.375
## 6 6 8.333333 15248 351 2773.0 4287.542
# Replace NAs with 0
movies_data_new[is.na(movies_data_new)] <- 0
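Replacing NA with 0 makes a movie without tags indistinguishable from one whose tag statistic is genuinely 0. The sketch below, on invented rows, shows the same replacement together with an optional indicator column that preserves the distinction (the `hasTags` column is a hypothetical addition, not part of the original analysis).

```r
# Invented rows; NA marks a movie with no tags.
feat <- data.frame(id = 1:3, meanTags = c(4198.9, NA, 2523.4))

# Record which rows were missing before overwriting them.
feat$hasTags <- as.integer(!is.na(feat$meanTags))

# Same replacement as in the analysis: every NA becomes 0.
feat[is.na(feat)] <- 0
print(feat)
```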
Now we build the new model.
set.seed(12345)
split<-createDataPartition(y = movies_data_new$rtAudienceScore,
p = 0.7,
list = FALSE)
# Split into training and test sets
movies_data_new.treino <- movies_data_new[split,]
movies_data_new.teste <- movies_data_new[-split,]
# 10-fold cross-validation on the training set
ctrl <- trainControl(method = "cv", number = 10)
# Training
bstTree_new <- train(rtAudienceScore ~. ,
data = movies_data_new.treino,
method = "bstTree",
trControl = ctrl,
metric = "Rsquared")
bstTree_new
## Boosted Tree
##
## 5139 samples
## 19 predictors
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
##
## Summary of sample sizes: 4625, 4624, 4625, 4625, 4624, 4626, ...
##
## Resampling results across tuning parameters:
##
## maxdepth mstop RMSE Rsquared RMSE SD Rsquared SD
## 1 50 15.77575 0.2579118 0.3740444 0.02872593
## 1 100 15.26956 0.2905114 0.4177861 0.03112831
## 1 150 15.03854 0.3045682 0.4544440 0.03423687
## 2 50 14.97435 0.3131390 0.4398069 0.03315997
## 2 100 14.67772 0.3308739 0.5040734 0.03735540
## 2 150 14.59160 0.3362230 0.5156930 0.03704260
## 3 50 14.68480 0.3324162 0.4519338 0.03315860
## 3 100 14.50532 0.3439243 0.5037983 0.03754901
## 3 150 14.41783 0.3510464 0.5329146 0.04010407
##
## Tuning parameter 'nu' was held constant at a value of 0.1
## Rsquared was used to select the optimal model using the largest value.
## The final values used for the model were mstop = 150, maxdepth = 3 and
## nu = 0.1.
predictedVal <- predict(bstTree_new, movies_data_new.teste)
modelvaluesbstTree_new <-data.frame(obs = movies_data_new.teste$rtAudienceScore, pred = predictedVal)
plot(varImp(bstTree_new))
defaultSummary(modelvaluesbstTree_new)
## RMSE Rsquared
## 13.9993938 0.3726231
The new model shows a clear improvement: the test Rsquared rose from 0.200 to 0.373 and the test RMSE fell from 15.80 to 14.00, which makes it the best model we obtained for our problem.
# Old model
bstTree
## Boosted Tree
##
## 5139 samples
## 6 predictors
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
##
## Summary of sample sizes: 4624, 4624, 4625, 4625, 4624, 4627, ...
##
## Resampling results across tuning parameters:
##
## maxdepth mstop RMSE Rsquared RMSE SD Rsquared SD
## 1 50 16.67926 0.1344349 0.3900484 0.02643764
## 1 100 16.59078 0.1423434 0.3930347 0.02623926
## 1 150 16.54617 0.1463535 0.3912763 0.02635218
## 2 50 16.46734 0.1552640 0.4346779 0.03320780
## 2 100 16.35321 0.1658153 0.4437705 0.03352897
## 2 150 16.30016 0.1707879 0.4794858 0.03669232
## 3 50 16.33290 0.1685160 0.4552249 0.03414061
## 3 100 16.20553 0.1803845 0.5033434 0.03751579
## 3 150 16.16259 0.1843452 0.5369949 0.04029391
##
## Tuning parameter 'nu' was held constant at a value of 0.1
## Rsquared was used to select the optimal model using the largest value.
## The final values used for the model were mstop = 150, maxdepth = 3 and
## nu = 0.1.
# New model
bstTree_new
## Boosted Tree
##
## 5139 samples
## 19 predictors
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
##
## Summary of sample sizes: 4625, 4624, 4625, 4625, 4624, 4626, ...
##
## Resampling results across tuning parameters:
##
## maxdepth mstop RMSE Rsquared RMSE SD Rsquared SD
## 1 50 15.77575 0.2579118 0.3740444 0.02872593
## 1 100 15.26956 0.2905114 0.4177861 0.03112831
## 1 150 15.03854 0.3045682 0.4544440 0.03423687
## 2 50 14.97435 0.3131390 0.4398069 0.03315997
## 2 100 14.67772 0.3308739 0.5040734 0.03735540
## 2 150 14.59160 0.3362230 0.5156930 0.03704260
## 3 50 14.68480 0.3324162 0.4519338 0.03315860
## 3 100 14.50532 0.3439243 0.5037983 0.03754901
## 3 150 14.41783 0.3510464 0.5329146 0.04010407
##
## Tuning parameter 'nu' was held constant at a value of 0.1
## Rsquared was used to select the optimal model using the largest value.
## The final values used for the model were mstop = 150, maxdepth = 3 and
## nu = 0.1.