We decided to use the FIFA World Cup Matches dataset in order to predict the winner of the 2018 World Cup.
Reading the Dataset
Iaquinta = read.csv("C:\\Users\\student\\Desktop\\MATH 421\\Math 421 Final Project\\WorldCupMatches.csv")
library(ggplot2)
library(caret)
## Loading required package: lattice
library(rpart)
library(rattle)
## Rattle: A free graphical interface for data science with R.
## Version 5.2.0 Copyright (c) 2006-2018 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.
library(lattice)
summary(Iaquinta)
## Year Datetime Stage
## Min. :1930 :3720 :3720
## 1st Qu.:1970 27 May 1934 - 16:30 : 8 Round of 16 : 72
## Median :1990 08 Jun 1958 - 19:00 : 7 Quarter-finals: 66
## Mean :1985 11 Jun 1958 - 19:00 : 7 Group 1 : 62
## 3rd Qu.:2002 15 Jun 1958 - 19:00 : 7 Group A : 60
## Max. :2014 02 Jul 1950 - 15:00 : 4 Group B : 60
## NA's :3720 (Other) : 819 (Other) : 532
## Stadium City Home.Team.Name
## :3720 :3720 :3720
## Estadio Azteca : 19 Mexico City : 23 Brazil : 82
## Jalisco : 14 Montevideo : 18 Italy : 57
## Olympiastadion : 14 Rio De Janeiro : 18 Argentina : 54
## Nou Camp - Estadio León: 11 Guadalajara : 17 Germany FR: 43
## Estadio Centenario : 10 Johannesburg : 15 England : 35
## (Other) : 784 (Other) : 761 (Other) : 581
## Home.Team.Goals Away.Team.Goals Away.Team.Name
## Min. : 0.000 Min. :0.000 :3720
## 1st Qu.: 1.000 1st Qu.:0.000 Mexico : 38
## Median : 2.000 Median :1.000 France : 30
## Mean : 1.811 Mean :1.022 Spain : 29
## 3rd Qu.: 3.000 3rd Qu.:2.000 Argentina: 27
## Max. :10.000 Max. :7.000 England : 27
## NA's :3720 NA's :3720 (Other) : 701
## Win.conditions Attendance
## :3720 Min. : 2000
## : 787 1st Qu.: 30000
## Italy win after extra time : 5 Median : 41580
## Argentina win after extra time : 4 Mean : 45165
## Germany win after extra time : 4 3rd Qu.: 61375
## Belgium win after extra time : 3 Max. :173850
## (Other) : 49 NA's :3722
## Half.time.Home.Goals Half.time.Away.Goals Referee
## Min. :0.000 Min. :0.000 :3720
## 1st Qu.:0.000 1st Qu.:0.000 Ravshan IRMATOV (UZB) : 10
## Median :0.000 Median :0.000 ARCHUNDIA Benito (MEX): 8
## Mean :0.709 Mean :0.428 LARRIONDA Jorge (URU) : 8
## 3rd Qu.:1.000 3rd Qu.:1.000 QUINIOU Joel (FRA) : 8
## Max. :6.000 Max. :5.000 RODRIGUEZ Marco (MEX) : 8
## NA's :3720 NA's :3720 (Other) : 810
## Assistant.1 Assistant.2
## :3720 :3720
## ACHIK Redouane (MAR) : 7 KOCHKAROV Bakhadyr (KGZ): 10
## BERANEK Alois (AUT) : 7 LISTKIEWICZ Michal (POL): 7
## GONZALEZ ARCHUNDIA Alfonso (MEX): 7 VERGARA Hector (CAN) : 7
## HERMANS Peter (BEL) : 7 VROMANS Walter (BEL) : 7
## VERGARA Hector (CAN) : 7 YUSTE Juan (ESP) : 7
## (Other) : 817 (Other) : 814
## RoundID MatchID Home.Team.Initials
## Min. : 201 Min. : 25 :3720
## 1st Qu.: 262 1st Qu.: 1189 BRA : 82
## Median : 337 Median : 2191 ITA : 57
## Mean :10661773 Mean : 61346868 ARG : 54
## 3rd Qu.: 249722 3rd Qu.: 43950059 FRG : 43
## Max. :97410600 Max. :300186515 ENG : 35
## NA's :3720 NA's :3720 (Other): 581
## Away.Team.Initials
## :3720
## MEX : 38
## FRA : 30
## ESP : 29
## ARG : 27
## ENG : 27
## (Other): 701
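Removing columns that are not used for modeling (identifiers, match officials, the date/time string, and team initials that duplicate the team names)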
Iaquinta[,"RoundID"] = NULL
Iaquinta[,"MatchID"] = NULL
Iaquinta[,"Referee"] = NULL
Iaquinta[,"Assistant.1"] = NULL
Iaquinta[,"Assistant.2"] = NULL
Iaquinta[,"Datetime"] = NULL
Iaquinta[,"Home.Team.Initials"] = NULL
Iaquinta[,"Away.Team.Initials"] = NULL
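The same removal can also be written in one step. The following is a minimal sketch on a toy data frame; `toy`, `drop_cols`, and `toy_small` are illustrative names and the values are made up, so it only shows the pattern, not the original analysis.

```r
# Toy data frame standing in for the World Cup data (values are made up)
toy <- data.frame(
  Year = c(1930, 1934),
  Stage = c("Final", "Final"),
  RoundID = c(1, 2),
  MatchID = c(10, 20),
  stringsAsFactors = TRUE
)

# Names of the columns to drop
drop_cols <- c("RoundID", "MatchID")

# Keep every column whose name is not listed in drop_cols
toy_small <- toy[, !(names(toy) %in% drop_cols), drop = FALSE]
names(toy_small)
```

Listing the names in one vector keeps the set of dropped columns visible in a single place.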
Checking for missing values
sum(is.na(Iaquinta))
## [1] 22322
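The summary above reports 3720 NA's in each numeric column and 3722 in Attendance, and 3720 blank entries in each text column. With six numeric columns left after the removal step, 6 × 3720 + 2 = 22322, which matches this count and suggests that the file contains 3720 fully blank rows. The sketch below, on made-up toy data, shows one way to count missing values per column and to drop such rows; `toy`, `is_blank_row`, and `toy_clean` are illustrative names, and whether dropping is appropriate for the real data is an assumption.

```r
# Toy data: two real matches followed by two fully blank rows,
# mimicking a CSV with trailing empty lines (values are made up)
toy <- data.frame(
  Year = c(1930, 1934, NA, NA),
  Stage = c("Final", "Group 1", "", ""),
  Attendance = c(NA, 30000, NA, NA),
  stringsAsFactors = TRUE
)

# Missing values per column; blank strings in Stage are not counted as NA
na_per_column <- colSums(is.na(toy))

# A row is treated as blank when every field is NA or an empty string
is_blank_row <- apply(toy, 1, function(r) all(is.na(r) | trimws(r) == ""))

# Keep only rows that carry at least one value
toy_clean <- toy[!is_blank_row, , drop = FALSE]
toy_clean$Stage <- droplevels(toy_clean$Stage)
```

Note that the first row is kept even though its attendance is missing, because only rows with no information at all are removed.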
DISCUSSION ON MISSING VALUES
Handling missing values (1) - A9Q4
Missing values of categorical variables are replaced by the most frequent category in each variable; missing values of numeric variables are replaced by the mean of the non-missing values.
AL=function(x){
for (i in 1:ncol(x)){
if (is.numeric(x[,i])){
x[,i][is.na(x[,i])]=mean(x[,i], na.rm=TRUE)
}else{
levels=unique(x[,i])
x[,i][is.na(x[,i])]=levels[which.max(tabulate(match(x[,i], levels)))]
}
}
return (x)
}
Iaquinta <- AL(Iaquinta)
Commenting on the result
sum(is.na(Iaquinta))
## [1] 0
#We started with 22322 missing values
#This method brings the number of missing values to 0
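One detail worth noting: in the summary above the text columns show blank entries rather than NA's, so the categorical branch of the function has nothing to fill unless blanks are first converted to NA. The sketch below is a hypothetical variant (`impute_mean_mode` is not part of the original code) that treats blank strings as missing before imputing; it runs on made-up toy data.

```r
# Hypothetical helper: mean for numeric columns, most frequent
# non-missing value for every other column
impute_mean_mode <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (is.numeric(x)) {
      x[is.na(x)] <- mean(x, na.rm = TRUE)
    } else {
      x <- as.character(x)
      x[!is.na(x) & trimws(x) == ""] <- NA   # treat blank strings as missing
      counts <- table(x)                     # table() ignores NA by default
      x[is.na(x)] <- names(counts)[which.max(counts)]
      x <- factor(x)
    }
    df[[col]] <- x
  }
  df
}

# Toy data with one missing number and one blank category
toy <- data.frame(
  Goals = c(1, 3, NA, 2),
  Stage = c("Final", "", "Group 1", "Final"),
  stringsAsFactors = TRUE
)
toy_imputed <- impute_mean_mode(toy)
```

Here the missing goal count becomes the mean of the observed values and the blank stage becomes the most frequent observed stage.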
Handling missing values (3) - A17
Missing values of numeric variables are replaced by the mean of the non-missing values in each variable
Iaquinta22=function(x){
for (i in 1:ncol(x)){
if (is.numeric(x[,i])){
x[,i][is.na(x[,i])]=mean(x[,i], na.rm=TRUE)
}else{
levels=unique(x[,i])
x[,i][is.na(x[,i])]=levels[which.max(tabulate(match(x[,i], levels)))]
}
}
return (x)
}
Iaquinta <- Iaquinta22(Iaquinta)
sum(is.na(Iaquinta))
## [1] 0
#Iaquinta22 is identical to AL, and the data were already imputed above, so the number of missing values stays at 0
Taking Care of the levels
levels(Iaquinta$Stage)
## [1] "" "Final"
## [3] "First round" "Group 1"
## [5] "Group 2" "Group 3"
## [7] "Group 4" "Group 5"
## [9] "Group 6" "Group A"
## [11] "Group B" "Group C"
## [13] "Group D" "Group E"
## [15] "Group F" "Group G"
## [17] "Group H" "Match for third place"
## [19] "Play-off for third place" "Preliminary round"
## [21] "Quarter-finals" "Round of 16"
## [23] "Semi-finals" "Third place"
levels(Iaquinta$Stage)=c("Prelim", "Final", "Prelim", "Prelim","Prelim","Prelim","Prelim","Prelim","Prelim","Prelim","Prelim","Prelim","Prelim","Prelim","Prelim","Prelim","Prelim","Semi_Final","Semi_Final","Prelim","Quarter_Final","Round_16","Semi_Final","Semi_Final")
levels(Iaquinta$Stage)
## [1] "Prelim" "Final" "Semi_Final" "Quarter_Final"
## [5] "Round_16"
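Assigning a positional vector of 24 labels works, but it depends on the exact order of the original levels. As an alternative sketch, base R also accepts a named list in `levels<-`, where each name is a new level and each value lists the old labels it absorbs. The toy factor below uses only a few of the original labels, and the grouping of "Third place" with the semi-finals simply follows the recoding above.

```r
# Toy factor with a few of the original stage labels
stage <- factor(c("Group A", "Final", "Third place", "Round of 16",
                  "Quarter-finals", "Group A", "Semi-finals"))

# New level on the left, original labels it absorbs on the right
levels(stage) <- list(
  Prelim        = "Group A",
  Round_16      = "Round of 16",
  Quarter_Final = "Quarter-finals",
  Semi_Final    = c("Semi-finals", "Third place"),
  Final         = "Final"
)
table(stage)
```

Any original label that is not listed becomes NA, which makes an incomplete mapping easy to spot.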
Encoding/Recoding Categorical Variables
Recoding categorical variable using one hot encoding (dummy encoding)- Q5A11
dummies_model <- dummyVars(Year ~., data=Iaquinta)
trainData_mat <- predict(dummies_model, newdata =Iaquinta)
trainData <- data.frame(trainData_mat)
trainData$Year <- Iaquinta$Year
This helps the model assign each year to its corresponding World Cup.
dummies_model <- dummyVars(Away.Team.Goals ~., data=Iaquinta)
trainData_mat <- predict(dummies_model, newdata =Iaquinta)
trainData <- data.frame(trainData_mat)
trainData$Away.Team.Goals <- Iaquinta$Away.Team.Goals
Based on the number, it helps the model detect whether the goals scored belong to the Home Team or the Away Team.
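To make the effect of one-hot encoding concrete, here is a small sketch that uses base R's `model.matrix` in place of caret's `dummyVars`, so that it runs without extra packages. This is a swapped-in technique: the column names it produces differ from those of `dummyVars`, and the toy data are made up. Note also that in the code above the second call overwrites the `trainData` object created by the first, and the models further down are trained on `Iaquinta` rather than on `trainData`.

```r
# Toy data: one categorical predictor, one numeric predictor, one outcome
toy <- data.frame(
  Stage = factor(c("Prelim", "Final", "Prelim")),
  Home.Team.Goals = c(2, 1, 0),
  Away.Team.Goals = c(1, 0, 3)
)

# Encode every predictor; "- 1" drops the intercept so that the first
# factor in the formula keeps one indicator column per level
encoded <- model.matrix(Away.Team.Goals ~ . - 1, data = toy)

# Re-attach the outcome, as done with trainData above
train_toy <- data.frame(encoded)
train_toy$Away.Team.Goals <- toy$Away.Team.Goals
colnames(encoded)
```

Each row has exactly one indicator set to 1 among the Stage columns, while numeric predictors pass through unchanged.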
VISUALIZATION AND GRAPHS
library(ggplot2)
ggplot(data = Iaquinta) + geom_density(mapping = aes(x = Attendance, fill = Stage)) + facet_wrap(~Stage)

This graph shows that attendance for the preliminary rounds is mostly below 50,000 people.
For the round of 16, attendance is more spread out, but it is concentrated between 25,000 and 75,000.
For the quarter-finals, attendance is also widely spread, with no single value standing out.
For the semi-finals, attendance is mostly around 70,000.
For the final, attendance peaks at about 75,000.
Overall, attendance varies with the teams that are playing and the capacity of the stadium.
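The ranges read off the density plot can be cross-checked with a per-stage numeric summary. The sketch below uses made-up toy values and base R's `tapply`; `toy` and `medians` are illustrative names. Because the rows with missing attendance were filled with the overall mean and the blank stage label was recoded to Prelim, it may be cleaner to compute such summaries before imputation, which is why the toy data keep an NA.

```r
# Toy attendance figures per stage (values are made up)
toy <- data.frame(
  Stage = factor(c("Prelim", "Prelim", "Final", "Final", "Prelim")),
  Attendance = c(20000, 40000, 70000, 80000, NA)
)

# Median attendance within each stage, ignoring missing values
medians <- tapply(toy$Attendance, toy$Stage, median, na.rm = TRUE)

# Number of missing attendance values within each stage
missing_by_stage <- tapply(is.na(toy$Attendance), toy$Stage, sum)
medians
```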
library(ggplot2)
ggplot(data = Iaquinta) + geom_density(mapping = aes(x = Home.Team.Goals, fill = Stage)) + facet_wrap(~Stage)

This graph shows the number of goals the Home Team scores during the match.
library(ggplot2)
ggplot(data = Iaquinta) + geom_density(mapping = aes(x = Away.Team.Goals, fill = Stage)) + facet_wrap(~Stage)

This graph shows the number of goals the Away team has scored during the match.
library(ggplot2)
ggplot(data = Iaquinta) + geom_density(mapping = aes(x = Attendance, fill = Year)) + facet_wrap(~Year)

This graph shows the attendance at each edition of the World Cup; we can see that from 1930 to 1938 the World Cup became more popular. Because of World War II, there was no World Cup in 1942 or 1946; the tournament restarted slowly in 1950 and regained popularity afterward.
library(ggplot2)
ggplot(data = Iaquinta) + geom_density(mapping = aes(x = Half.time.Home.Goals, fill = Stage)) + facet_wrap(~Stage)

This graph shows the number of goals the Home teams have scored after 45 minutes.
library(ggplot2)
ggplot(data = Iaquinta) + geom_density(mapping = aes(x = Half.time.Away.Goals, fill = Stage)) + facet_wrap(~Stage)

This graph shows the number of goals the away teams have scored after 45 minutes.
Al22 <- function(Iaquinta,var1,var2) {
rt = ggplot(data=Iaquinta) + geom_bar(mapping = aes(x = Iaquinta[,var1], fill = Iaquinta[,var2]), position = "dodge")
return(rt)
}
Al22(Iaquinta, 2, 2)

This graph shows that, in terms of the number of observations, the preliminary rounds far outweigh every other category.
Al24 <- function(Iaquinta,var1,var2) {
rt = ggplot(data=Iaquinta) + geom_histogram(mapping = aes(x = Iaquinta[,var1], fill = Iaquinta[,var2]), position = "dodge")
return(rt)
}
Al24(Iaquinta, 11, 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

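This graph shows the distribution of the variable in column 11 (Half.time.Home.Goals, given the columns kept above) for each Stage; the same function can be used for other numeric variables such as attendance, Home Team goals, and Away Team goals. It is interesting to see that during a match, the Home Team frequently scores 2 goals while the Away Team scores 1.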
Al26= function(x){
for (i in 1:ncol(x)){
if (is.numeric(x[,i])){
print(ggplot(data=x)+geom_histogram(mapping=aes(x=x[,i]),fill="red")+xlab(names(x)[i]))
}
}
}
Al26(Iaquinta)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

These graphs show the number of observations for each numeric variable, such as the goals scored by the home teams after 45 and 90 minutes and the goals scored by the away teams after 45 and 90 minutes.
Model Training and Model Tuning
Random Forest
AL5 <- expand.grid(mtry = 3, splitrule = c("gini"),
min.node.size = 5)
AL6 <- train(Stage ~ ., data = Iaquinta, method = "ranger",
trControl = trainControl(method ="cv",
number = 3, verboseIter = TRUE),
tuneGrid = AL5)
## + Fold1: mtry=3, splitrule=gini, min.node.size=5
## - Fold1: mtry=3, splitrule=gini, min.node.size=5
## + Fold2: mtry=3, splitrule=gini, min.node.size=5
## - Fold2: mtry=3, splitrule=gini, min.node.size=5
## + Fold3: mtry=3, splitrule=gini, min.node.size=5
## - Fold3: mtry=3, splitrule=gini, min.node.size=5
## Aggregating results
## Fitting final model on full training set
confusionMatrix(AL6)
## Cross-Validated (3 fold) Confusion Matrix
##
## (entries are percentual average cell counts across resamples)
##
## Reference
## Prediction Prelim Final Semi_Final Quarter_Final Round_16
## Prelim 95.3 0.4 1.2 1.4 1.6
## Final 0.0 0.0 0.0 0.0 0.0
## Semi_Final 0.0 0.0 0.0 0.0 0.0
## Quarter_Final 0.0 0.0 0.0 0.0 0.0
## Round_16 0.0 0.0 0.0 0.0 0.0
##
## Accuracy (average) : 0.9534
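In this confusion matrix every prediction falls in the Prelim row, so the reported accuracy equals the share of the Prelim class rather than reflecting any ability to separate the stages. The sketch below illustrates the point with base R only. The counts are the percentages from the matrix above rescaled to "per 1000" purely for illustration, and `truth`, `pred`, and `cohen_kappa` are illustrative names.

```r
# Class shares from the Prelim row of the confusion matrix above, used as
# counts per 1000 for illustration (they sum to 999 because of rounding)
truth_counts <- c(Prelim = 953, Final = 4, Semi_Final = 12,
                  Quarter_Final = 14, Round_16 = 16)
classes <- names(truth_counts)
truth <- factor(rep(classes, times = truth_counts), levels = classes)

# A "model" that always predicts the majority class
pred <- factor(rep("Prelim", length(truth)), levels = classes)

cm <- table(Prediction = pred, Reference = truth)
n <- sum(cm)

accuracy <- sum(diag(cm)) / n                      # observed agreement
no_info_rate <- max(colSums(cm)) / n               # share of the largest class
expected <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
cohen_kappa <- (accuracy - expected) / (1 - expected)
```

For a majority-class predictor the accuracy equals the no-information rate and Cohen's kappa is zero, so accuracy alone is a weak measure when one class dominates.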
GLMNET
#myGrid = expand.grid(alpha = 0.1,
# lambda = 0.1)
#myControl = trainControl(method = "cv", number = 5)
#model2 = train(target ~ ., train, method = "glmnet",
# trControl = myControl,
# tuneGrid = myGrid)
#confusionMatrix(model2)
#Cross-Validated (5 fold) Confusion Matrix
#(entries are percentual average cell counts across resamples)
# Reference
#Prediction 0 1
# 0 91.2 8.0
# 1 0.0 0.8
# Accuracy (average) : 0.92
Random Forest with 10-fold cross-validation
myGrid = expand.grid(mtry = c(1:2), splitrule = c("gini"),
min.node.size = c(1:2))
rf_Iaquinta10 <- train(Stage~.,data = Iaquinta, method = "ranger",
trControl = trainControl(method ="cv", number = 10, verboseIter = TRUE),
tuneGrid = myGrid)
## + Fold01: mtry=1, splitrule=gini, min.node.size=1
## - Fold01: mtry=1, splitrule=gini, min.node.size=1
## + Fold01: mtry=2, splitrule=gini, min.node.size=1
## - Fold01: mtry=2, splitrule=gini, min.node.size=1
## + Fold01: mtry=1, splitrule=gini, min.node.size=2
## - Fold01: mtry=1, splitrule=gini, min.node.size=2
## + Fold01: mtry=2, splitrule=gini, min.node.size=2
## - Fold01: mtry=2, splitrule=gini, min.node.size=2
## + Fold02: mtry=1, splitrule=gini, min.node.size=1
## - Fold02: mtry=1, splitrule=gini, min.node.size=1
## + Fold02: mtry=2, splitrule=gini, min.node.size=1
## - Fold02: mtry=2, splitrule=gini, min.node.size=1
## + Fold02: mtry=1, splitrule=gini, min.node.size=2
## - Fold02: mtry=1, splitrule=gini, min.node.size=2
## + Fold02: mtry=2, splitrule=gini, min.node.size=2
## - Fold02: mtry=2, splitrule=gini, min.node.size=2
## + Fold03: mtry=1, splitrule=gini, min.node.size=1
## - Fold03: mtry=1, splitrule=gini, min.node.size=1
## + Fold03: mtry=2, splitrule=gini, min.node.size=1
## - Fold03: mtry=2, splitrule=gini, min.node.size=1
## + Fold03: mtry=1, splitrule=gini, min.node.size=2
## - Fold03: mtry=1, splitrule=gini, min.node.size=2
## + Fold03: mtry=2, splitrule=gini, min.node.size=2
## - Fold03: mtry=2, splitrule=gini, min.node.size=2
## + Fold04: mtry=1, splitrule=gini, min.node.size=1
## - Fold04: mtry=1, splitrule=gini, min.node.size=1
## + Fold04: mtry=2, splitrule=gini, min.node.size=1
## - Fold04: mtry=2, splitrule=gini, min.node.size=1
## + Fold04: mtry=1, splitrule=gini, min.node.size=2
## - Fold04: mtry=1, splitrule=gini, min.node.size=2
## + Fold04: mtry=2, splitrule=gini, min.node.size=2
## - Fold04: mtry=2, splitrule=gini, min.node.size=2
## + Fold05: mtry=1, splitrule=gini, min.node.size=1
## - Fold05: mtry=1, splitrule=gini, min.node.size=1
## + Fold05: mtry=2, splitrule=gini, min.node.size=1
## - Fold05: mtry=2, splitrule=gini, min.node.size=1
## + Fold05: mtry=1, splitrule=gini, min.node.size=2
## - Fold05: mtry=1, splitrule=gini, min.node.size=2
## + Fold05: mtry=2, splitrule=gini, min.node.size=2
## - Fold05: mtry=2, splitrule=gini, min.node.size=2
## + Fold06: mtry=1, splitrule=gini, min.node.size=1
## - Fold06: mtry=1, splitrule=gini, min.node.size=1
## + Fold06: mtry=2, splitrule=gini, min.node.size=1
## - Fold06: mtry=2, splitrule=gini, min.node.size=1
## + Fold06: mtry=1, splitrule=gini, min.node.size=2
## - Fold06: mtry=1, splitrule=gini, min.node.size=2
## + Fold06: mtry=2, splitrule=gini, min.node.size=2
## - Fold06: mtry=2, splitrule=gini, min.node.size=2
## + Fold07: mtry=1, splitrule=gini, min.node.size=1
## - Fold07: mtry=1, splitrule=gini, min.node.size=1
## + Fold07: mtry=2, splitrule=gini, min.node.size=1
## - Fold07: mtry=2, splitrule=gini, min.node.size=1
## + Fold07: mtry=1, splitrule=gini, min.node.size=2
## - Fold07: mtry=1, splitrule=gini, min.node.size=2
## + Fold07: mtry=2, splitrule=gini, min.node.size=2
## - Fold07: mtry=2, splitrule=gini, min.node.size=2
## + Fold08: mtry=1, splitrule=gini, min.node.size=1
## - Fold08: mtry=1, splitrule=gini, min.node.size=1
## + Fold08: mtry=2, splitrule=gini, min.node.size=1
## - Fold08: mtry=2, splitrule=gini, min.node.size=1
## + Fold08: mtry=1, splitrule=gini, min.node.size=2
## - Fold08: mtry=1, splitrule=gini, min.node.size=2
## + Fold08: mtry=2, splitrule=gini, min.node.size=2
## - Fold08: mtry=2, splitrule=gini, min.node.size=2
## + Fold09: mtry=1, splitrule=gini, min.node.size=1
## - Fold09: mtry=1, splitrule=gini, min.node.size=1
## + Fold09: mtry=2, splitrule=gini, min.node.size=1
## - Fold09: mtry=2, splitrule=gini, min.node.size=1
## + Fold09: mtry=1, splitrule=gini, min.node.size=2
## - Fold09: mtry=1, splitrule=gini, min.node.size=2
## + Fold09: mtry=2, splitrule=gini, min.node.size=2
## - Fold09: mtry=2, splitrule=gini, min.node.size=2
## + Fold10: mtry=1, splitrule=gini, min.node.size=1
## - Fold10: mtry=1, splitrule=gini, min.node.size=1
## + Fold10: mtry=2, splitrule=gini, min.node.size=1
## - Fold10: mtry=2, splitrule=gini, min.node.size=1
## + Fold10: mtry=1, splitrule=gini, min.node.size=2
## - Fold10: mtry=1, splitrule=gini, min.node.size=2
## + Fold10: mtry=2, splitrule=gini, min.node.size=2
## - Fold10: mtry=2, splitrule=gini, min.node.size=2
## Aggregating results
## Selecting tuning parameters
## Fitting mtry = 1, splitrule = gini, min.node.size = 1 on full training set
rf_Iaquinta10
## Random Forest
##
## 4572 samples
## 11 predictor
## 5 classes: 'Prelim', 'Final', 'Semi_Final', 'Quarter_Final', 'Round_16'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 4113, 4115, 4114, 4115, 4115, 4114, ...
## Resampling results across tuning parameters:
##
## mtry min.node.size Accuracy Kappa
## 1 1 0.953415 0
## 1 2 0.953415 0
## 2 1 0.953415 0
## 2 2 0.953415 0
##
## Tuning parameter 'splitrule' was held constant at a value of gini
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were mtry = 1, splitrule = gini
## and min.node.size = 1.
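The "Summary of sample sizes" line follows from how k-fold cross-validation splits the 4572 rows: each resample holds out roughly one tenth of the data and fits on the rest. The sketch below reproduces the arithmetic with a plain random split in base R. It is a simplification: caret stratifies its folds by the outcome, so its sizes can differ by a row or two from the ones computed here. The seed is arbitrary.

```r
set.seed(421)                      # arbitrary seed, for reproducibility only
n <- 4572                          # number of samples reported above
k <- 10                            # number of folds

# Randomly assign each row to one of k folds of near-equal size
fold_id <- sample(rep(seq_len(k), length.out = n))

fold_sizes <- as.vector(table(fold_id))
train_sizes <- n - fold_sizes      # rows used for fitting in each resample
range(train_sizes)
```

The Kappa of 0 in every row of the table above indicates that none of the four parameter combinations does better than always predicting the largest class.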
Random Forest with 7-fold cross-validation
myGrid = expand.grid(mtry = c(1:2), splitrule = c("gini"),
min.node.size = c(1:2))
rf_Iaquinta7 <- train(Stage~.,data = Iaquinta, method = "ranger",
trControl = trainControl(method ="cv", number = 7, verboseIter = TRUE),
tuneGrid = myGrid)
## + Fold1: mtry=1, splitrule=gini, min.node.size=1
## - Fold1: mtry=1, splitrule=gini, min.node.size=1
## + Fold1: mtry=2, splitrule=gini, min.node.size=1
## - Fold1: mtry=2, splitrule=gini, min.node.size=1
## + Fold1: mtry=1, splitrule=gini, min.node.size=2
## - Fold1: mtry=1, splitrule=gini, min.node.size=2
## + Fold1: mtry=2, splitrule=gini, min.node.size=2
## - Fold1: mtry=2, splitrule=gini, min.node.size=2
## + Fold2: mtry=1, splitrule=gini, min.node.size=1
## - Fold2: mtry=1, splitrule=gini, min.node.size=1
## + Fold2: mtry=2, splitrule=gini, min.node.size=1
## - Fold2: mtry=2, splitrule=gini, min.node.size=1
## + Fold2: mtry=1, splitrule=gini, min.node.size=2
## - Fold2: mtry=1, splitrule=gini, min.node.size=2
## + Fold2: mtry=2, splitrule=gini, min.node.size=2
## - Fold2: mtry=2, splitrule=gini, min.node.size=2
## + Fold3: mtry=1, splitrule=gini, min.node.size=1
## - Fold3: mtry=1, splitrule=gini, min.node.size=1
## + Fold3: mtry=2, splitrule=gini, min.node.size=1
## - Fold3: mtry=2, splitrule=gini, min.node.size=1
## + Fold3: mtry=1, splitrule=gini, min.node.size=2
## - Fold3: mtry=1, splitrule=gini, min.node.size=2
## + Fold3: mtry=2, splitrule=gini, min.node.size=2
## - Fold3: mtry=2, splitrule=gini, min.node.size=2
## + Fold4: mtry=1, splitrule=gini, min.node.size=1
## - Fold4: mtry=1, splitrule=gini, min.node.size=1
## + Fold4: mtry=2, splitrule=gini, min.node.size=1
## - Fold4: mtry=2, splitrule=gini, min.node.size=1
## + Fold4: mtry=1, splitrule=gini, min.node.size=2
## - Fold4: mtry=1, splitrule=gini, min.node.size=2
## + Fold4: mtry=2, splitrule=gini, min.node.size=2
## - Fold4: mtry=2, splitrule=gini, min.node.size=2
## + Fold5: mtry=1, splitrule=gini, min.node.size=1
## - Fold5: mtry=1, splitrule=gini, min.node.size=1
## + Fold5: mtry=2, splitrule=gini, min.node.size=1
## - Fold5: mtry=2, splitrule=gini, min.node.size=1
## + Fold5: mtry=1, splitrule=gini, min.node.size=2
## - Fold5: mtry=1, splitrule=gini, min.node.size=2
## + Fold5: mtry=2, splitrule=gini, min.node.size=2
## - Fold5: mtry=2, splitrule=gini, min.node.size=2
## + Fold6: mtry=1, splitrule=gini, min.node.size=1
## - Fold6: mtry=1, splitrule=gini, min.node.size=1
## + Fold6: mtry=2, splitrule=gini, min.node.size=1
## - Fold6: mtry=2, splitrule=gini, min.node.size=1
## + Fold6: mtry=1, splitrule=gini, min.node.size=2
## - Fold6: mtry=1, splitrule=gini, min.node.size=2
## + Fold6: mtry=2, splitrule=gini, min.node.size=2
## - Fold6: mtry=2, splitrule=gini, min.node.size=2
## + Fold7: mtry=1, splitrule=gini, min.node.size=1
## - Fold7: mtry=1, splitrule=gini, min.node.size=1
## + Fold7: mtry=2, splitrule=gini, min.node.size=1
## - Fold7: mtry=2, splitrule=gini, min.node.size=1
## + Fold7: mtry=1, splitrule=gini, min.node.size=2
## - Fold7: mtry=1, splitrule=gini, min.node.size=2
## + Fold7: mtry=2, splitrule=gini, min.node.size=2
## - Fold7: mtry=2, splitrule=gini, min.node.size=2
## Aggregating results
## Selecting tuning parameters
## Fitting mtry = 1, splitrule = gini, min.node.size = 1 on full training set
rf_Iaquinta7
## Random Forest
##
## 4572 samples
## 11 predictor
## 5 classes: 'Prelim', 'Final', 'Semi_Final', 'Quarter_Final', 'Round_16'
##
## No pre-processing
## Resampling: Cross-Validated (7 fold)
## Summary of sample sizes: 3918, 3919, 3919, 3918, 3918, 3921, ...
## Resampling results across tuning parameters:
##
## mtry min.node.size Accuracy Kappa
## 1 1 0.9534142 0
## 1 2 0.9534142 0
## 2 1 0.9534142 0
## 2 2 0.9534142 0
##
## Tuning parameter 'splitrule' was held constant at a value of gini
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were mtry = 1, splitrule = gini
## and min.node.size = 1.