Fifa 19 merupakan video game yang dikembangkan oleh EA Vancouver, sebagai bagian dari seri video game FIFA. FIFA 19 merupakan seri ke 26 dan dirilis pada 28 September 2018 (Wikipedia).
Fifa 19 memiliki memiliki beberapa mode permainan, salah satunya adalah career mode dimana pemain game (gamer) dapat membentuk tim impian mereka sendiri. Pada project ini, akan dilakukan analisa terhadap statistik pemain untuk membantu gamer menyusun tim terbaik sesuai karakteristik dan skill pemain yang mereka inginkan.
Implementasi hasil analisa dapat dilihat pada aplikasi Shiny di link berikut ini : https://windaru.shinyapps.io/fifaultimateteam/
options(scipen = 999)
library(tidyverse)
library(ggthemes)
library(scales)
library(plotly)
library(ggradar)
library(fmsb)
library(caret)
library(lpSolve)
library(FactoMineR)
library(factoextra)
library(knitr)
library(xgboost)
## Warning: package 'xgboost' was built under R version 3.6.2
fifa <- read_csv("data-input/data.csv")
league <- read_delim("data-input/league.csv", delim = ";")
league <- league[, -c(3,4)]
SkillCat <- read_delim("data-input/SkillCat.csv", delim = ";")
## Observations: 18,207
## Variables: 89
## $ X1 <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,…
## $ ID <dbl> 158023, 20801, 190871, 193080, 192985…
## $ Name <chr> "L. Messi", "Cristiano Ronaldo", "Ney…
## $ Age <dbl> 31, 33, 26, 27, 27, 27, 32, 31, 32, 2…
## $ Photo <chr> "https://cdn.sofifa.org/players/4/19/…
## $ Nationality <chr> "Argentina", "Portugal", "Brazil", "S…
## $ Flag <chr> "https://cdn.sofifa.org/flags/52.png"…
## $ Overall <dbl> 94, 94, 92, 91, 91, 91, 91, 91, 91, 9…
## $ Potential <dbl> 94, 94, 93, 93, 92, 91, 91, 91, 91, 9…
## $ Club <chr> "FC Barcelona", "Juventus", "Paris Sa…
## $ `Club Logo` <chr> "https://cdn.sofifa.org/teams/2/light…
## $ Value <chr> "€110.5M", "€77M", "€118.5M", "€72M",…
## $ Wage <chr> "€565K", "€405K", "€290K", "€260K", "…
## $ Special <dbl> 2202, 2228, 2143, 1471, 2281, 2142, 2…
## $ `Preferred Foot` <chr> "Left", "Right", "Right", "Right", "R…
## $ `International Reputation` <dbl> 5, 5, 5, 4, 4, 4, 4, 5, 4, 3, 4, 4, 3…
## $ `Weak Foot` <dbl> 4, 4, 5, 3, 5, 4, 4, 4, 3, 3, 4, 5, 3…
## $ `Skill Moves` <dbl> 4, 5, 5, 1, 4, 4, 4, 3, 3, 1, 4, 3, 2…
## $ `Work Rate` <chr> "Medium/ Medium", "High/ Low", "High/…
## $ `Body Type` <chr> "Messi", "C. Ronaldo", "Neymar", "Lea…
## $ `Real Face` <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Y…
## $ Position <chr> "RF", "ST", "LW", "GK", "RCM", "LF", …
## $ `Jersey Number` <dbl> 10, 7, 10, 1, 7, 10, 10, 9, 15, 1, 9,…
## $ Joined <chr> "Jul 1, 2004", "Jul 10, 2018", "Aug 3…
## $ `Loaned From` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Contract Valid Until` <chr> "2021", "2022", "2022", "2020", "2023…
## $ Height <chr> "5'7", "6'2", "5'9", "6'4", "5'11", "…
## $ Weight <chr> "159lbs", "183lbs", "150lbs", "168lbs…
## $ LS <chr> "88+2", "91+3", "84+3", NA, "82+3", "…
## $ ST <chr> "88+2", "91+3", "84+3", NA, "82+3", "…
## $ RS <chr> "88+2", "91+3", "84+3", NA, "82+3", "…
## $ LW <chr> "92+2", "89+3", "89+3", NA, "87+3", "…
## $ LF <chr> "93+2", "90+3", "89+3", NA, "87+3", "…
## $ CF <chr> "93+2", "90+3", "89+3", NA, "87+3", "…
## $ RF <chr> "93+2", "90+3", "89+3", NA, "87+3", "…
## $ RW <chr> "92+2", "89+3", "89+3", NA, "87+3", "…
## $ LAM <chr> "93+2", "88+3", "89+3", NA, "88+3", "…
## $ CAM <chr> "93+2", "88+3", "89+3", NA, "88+3", "…
## $ RAM <chr> "93+2", "88+3", "89+3", NA, "88+3", "…
## $ LM <chr> "91+2", "88+3", "88+3", NA, "88+3", "…
## $ LCM <chr> "84+2", "81+3", "81+3", NA, "87+3", "…
## $ CM <chr> "84+2", "81+3", "81+3", NA, "87+3", "…
## $ RCM <chr> "84+2", "81+3", "81+3", NA, "87+3", "…
## $ RM <chr> "91+2", "88+3", "88+3", NA, "88+3", "…
## $ LWB <chr> "64+2", "65+3", "65+3", NA, "77+3", "…
## $ LDM <chr> "61+2", "61+3", "60+3", NA, "77+3", "…
## $ CDM <chr> "61+2", "61+3", "60+3", NA, "77+3", "…
## $ RDM <chr> "61+2", "61+3", "60+3", NA, "77+3", "…
## $ RWB <chr> "64+2", "65+3", "65+3", NA, "77+3", "…
## $ LB <chr> "59+2", "61+3", "60+3", NA, "73+3", "…
## $ LCB <chr> "47+2", "53+3", "47+3", NA, "66+3", "…
## $ CB <chr> "47+2", "53+3", "47+3", NA, "66+3", "…
## $ RCB <chr> "47+2", "53+3", "47+3", NA, "66+3", "…
## $ RB <chr> "59+2", "61+3", "60+3", NA, "73+3", "…
## $ Crossing <dbl> 84, 84, 79, 17, 93, 81, 86, 77, 66, 1…
## $ Finishing <dbl> 95, 94, 87, 13, 82, 84, 72, 93, 60, 1…
## $ HeadingAccuracy <dbl> 70, 89, 62, 21, 55, 61, 55, 77, 91, 1…
## $ ShortPassing <dbl> 90, 81, 84, 50, 92, 89, 93, 82, 78, 2…
## $ Volleys <dbl> 86, 87, 84, 13, 82, 80, 76, 88, 66, 1…
## $ Dribbling <dbl> 97, 88, 96, 18, 86, 95, 90, 87, 63, 1…
## $ Curve <dbl> 93, 81, 88, 21, 85, 83, 85, 86, 74, 1…
## $ FKAccuracy <dbl> 94, 76, 87, 19, 83, 79, 78, 84, 72, 1…
## $ LongPassing <dbl> 87, 77, 78, 51, 91, 83, 88, 64, 77, 2…
## $ BallControl <dbl> 96, 94, 95, 42, 91, 94, 93, 90, 84, 1…
## $ Acceleration <dbl> 91, 89, 94, 57, 78, 94, 80, 86, 76, 4…
## $ SprintSpeed <dbl> 86, 91, 90, 58, 76, 88, 72, 75, 75, 6…
## $ Agility <dbl> 91, 87, 96, 60, 79, 95, 93, 82, 78, 6…
## $ Reactions <dbl> 95, 96, 94, 90, 91, 90, 90, 92, 85, 8…
## $ Balance <dbl> 95, 70, 84, 43, 77, 94, 94, 83, 66, 4…
## $ ShotPower <dbl> 85, 95, 80, 31, 91, 82, 79, 86, 79, 2…
## $ Jumping <dbl> 68, 95, 61, 67, 63, 56, 68, 69, 93, 7…
## $ Stamina <dbl> 72, 88, 81, 43, 90, 83, 89, 90, 84, 4…
## $ Strength <dbl> 59, 79, 49, 64, 75, 66, 58, 83, 83, 7…
## $ LongShots <dbl> 94, 93, 82, 12, 91, 80, 82, 85, 59, 1…
## $ Aggression <dbl> 48, 63, 56, 38, 76, 54, 62, 87, 88, 3…
## $ Interceptions <dbl> 22, 29, 36, 30, 61, 41, 83, 41, 90, 1…
## $ Positioning <dbl> 94, 95, 89, 12, 87, 87, 79, 92, 60, 1…
## $ Vision <dbl> 94, 82, 87, 68, 94, 89, 92, 84, 63, 7…
## $ Penalties <dbl> 75, 85, 81, 40, 79, 86, 82, 85, 75, 1…
## $ Composure <dbl> 96, 95, 94, 68, 88, 91, 84, 85, 82, 7…
## $ Marking <dbl> 33, 28, 27, 15, 68, 34, 60, 62, 87, 2…
## $ StandingTackle <dbl> 28, 31, 24, 21, 58, 27, 76, 45, 92, 1…
## $ SlidingTackle <dbl> 26, 23, 33, 13, 51, 22, 73, 38, 91, 1…
## $ GKDiving <dbl> 6, 7, 9, 90, 15, 11, 13, 27, 11, 86, …
## $ GKHandling <dbl> 11, 11, 9, 85, 13, 12, 9, 25, 8, 92, …
## $ GKKicking <dbl> 15, 15, 15, 87, 5, 6, 7, 31, 9, 78, 1…
## $ GKPositioning <dbl> 14, 14, 15, 88, 10, 8, 14, 33, 7, 88,…
## $ GKReflexes <dbl> 8, 11, 11, 94, 13, 8, 9, 37, 11, 89, …
## $ `Release Clause` <chr> "€226.5M", "€127.1M", "€228.1M", "€13…
Data Value dan Wage masih memiliki tipe data Char, dengan karakter terakhir menunjukan value dalam M (Milions) atau K (Thousands). Untuk memperbaikinya akan dibuat function untuk melakukan perubahan dan penyesuaian tipe data.
change_value <- function(x) {
if(str_detect(x, "M")){
kali = 1000000
} else if (str_detect(x, "K") ) {
kali = 1000
} else {
kali = 1
}
hasil = as.numeric(str_extract(x, "[[:digit:]]+\\.*[[:digit:]]*"))*kali
return(hasil)
}
fifa$ValueP <- sapply(fifa$Value, change_value)
fifa$WageP <- sapply(fifa$Wage, change_value)
print("Player Value")
## [1] "Player Value"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 300000 675000 2410696 2000000 118500000
## [1] "Player Wage"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 1000 3000 9731 9000 565000
Data Usia Pemain perlu dikelompokan kembali. Pengelompokan dilakukan dengan membagi menjadi 5 kelompok
age_grouping <- function(age){
if(age <= 20){
kel_usia = "Kurang dari 20"
} else if(age > 20 & age <= 25) {
kel_usia = "21 - 25"
} else if(age > 25 & age <= 30) {
kel_usia = "26 - 30"
} else if(age > 30 & age <= 35) {
kel_usia = "31 - 35"
} else{
kel_usia = "diatas 35"
}
return(kel_usia)
}
fifa$Age_group <- sapply(fifa$Age, age_grouping)
fifa$Age_group <- as.factor(fifa$Age_group)
fifa$Age_group <- ordered(fifa$Age_group, levels =c( "Kurang dari 20", "21 - 25", "26 - 30", "31 - 35", "diatas 35"))
fifa %>%
group_by(Age_group) %>%
summarise(Jumlah = n())
## # A tibble: 5 x 2
## Age_group Jumlah
## <ord> <int>
## 1 Kurang dari 20 3327
## 2 21 - 25 6772
## 3 26 - 30 5526
## 4 31 - 35 2289
## 5 diatas 35 293
Mengelompokan Posisi pemain menjadi 4 Posisi utama, Forward(FWD), Middle(MID), Deffense(DEF), dan Goalkeeper(GK)
fifa$Position[which(is.na(fifa$Position))] <- "-"
positions <- unique(fifa$Position)
gk <- "GK"
defs <- positions[str_detect(positions, "B$")]
mids <- positions[str_detect(positions, "M$")]
f1 <- positions[str_detect(positions, "F$")]
f2 <- positions[str_detect(positions, "S$")]
f3 <- positions[str_detect(positions, "T$")]
f4 <- positions[str_detect(positions, "W$")]
fwds <- c(f1, f2, f3, f4)
fifa <- fifa %>%
mutate(PositionGroup = as.factor(ifelse(Position %in% gk, "GK", ifelse(Position %in% defs, "DEF", ifelse(Position %in% mids, "MID", ifelse(Position %in% fwds, "FWD", "Unknown"))))))
fifa %>%
group_by(PositionGroup) %>%
summarise(Jumlah = n())
## # A tibble: 5 x 2
## PositionGroup Jumlah
## <fct> <int>
## 1 DEF 5866
## 2 FWD 3418
## 3 GK 2025
## 4 MID 6838
## 5 Unknown 60
Mengubah data Club yang NA menjadi -, untuk menunjukan bahwa pemain tersebut belum memiliki Club (free agent)
Mengelompokkan Attribute Skill kedalam beberapa kategori
SkillCat<- SkillCat %>%
filter(Skill != "TotalSkill" & Skill != "TotalMovement")
attacking <- SkillCat %>%
filter(SkillCategory == "Attacking") %>%
.$Skill
skill <- SkillCat %>%
filter(SkillCategory == "Skill") %>%
.$Skill
movement <- SkillCat %>%
filter(SkillCategory == "Movement") %>%
.$Skill
power <- SkillCat %>%
filter(SkillCategory == "Power") %>%
.$Skill
mentality <- SkillCat %>%
filter(SkillCategory == "Mentality") %>%
.$Skill
defending <- SkillCat %>%
filter(SkillCategory == "Defending") %>%
.$Skill
goalkeeping <- SkillCat %>%
filter(SkillCategory == "GoalKeeping") %>%
.$Skill
SkillCategory<- SkillCat$SkillCategory %>%
unique()
fifa <- fifa %>%
mutate(Attacking = round(rowMeans(fifa[,attacking])),
Skill = round(rowMeans(fifa[,skill])),
Power = round(rowMeans(fifa[,power])),
Mentality = round(rowMeans(fifa[,mentality])),
Defending = round(rowMeans(fifa[,defending])),
GoalKeeping = round(rowMeans(fifa[,goalkeeping])))
ggplot(fifa, aes(x = Age_group, y = ValueP)) +
geom_boxplot(fill = "#7ad2f6") +
scale_y_log10(labels = dollar_format(prefix = "€")) +
labs(title = "Boxplot Value Pemain berdasarkan usia",
x = NULL,
y = NULL) +
theme_economist()
Boxplot diatas menunjukan bahwa Value pemain memiliki kecenderungan naik mengikuti usia sampai dengan usia 30 tahun. kemudian turun di usia 31 - 35, dan kemudian turun lagi setelah melewati usia 35 Tahun.
ggplot(fifa, aes(x = Overall, y = ValueP, col =Overall )) +
geom_jitter() +
scale_y_continuous(labels = dollar_format(prefix = "€")) +
labs(title = "Sebaran Rating Pemain Berdasarkan Value",
x = "Overall Rating",
y = NULL) +
guides(col = FALSE) +
theme_economist()
scatter plot diatas menunjukkan kecenderungan sebaran, semakin tinggi overall rating pemain, maka semakin tinggi value nya. Tetapi, juga ada beberapa pemain yang memiliki overall rating tinggi, namun Valuenya rendah.
Hal yang sama juga ditunjukkan oleh sebaran Wage berdasarkan overall rating
ggplot(fifa, aes(x = Overall, y = WageP, col =Overall )) +
geom_jitter() +
scale_y_continuous(labels = dollar_format(prefix = "€")) +
labs(title = "Sebaran Rating Pemain Berdasarkan Wage",
x = "Overall Rating",
y = NULL) +
guides(col = FALSE) +
theme_economist()
Selanjutnya dilihat sebaran value pemain berdasarkan posisinya :
fifa %>%
filter(PositionGroup != "Unknown") %>%
ggplot(aes(x = PositionGroup, y = ValueP)) +
geom_boxplot(fill = "#7ad2f6") +
scale_y_log10(labels = dollar_format(prefix = "€")) +
labs(title = "Boxplot Value Pemain Berdasarkan Posisi",
x = NULL,
y = NULL) +
theme_economist()
Pemain posisi depan memiliki rata - rata value yang lebih tinggi dibandingkan posisi lainnya.
Melihat sebaran Skill pemain pada masing masing group posisi :
fifa %>%
filter(Overall > 75 & PositionGroup != "GK" & PositionGroup != "Unknown") %>%
group_by(PositionGroup) %>%
summarise_at(55 : 83, median) %>%
ungroup() %>%
gather(key = "Skill", value = "Rating", 2 : 30 ) %>%
arrange(PositionGroup) %>%
ggplot2::ggplot(aes(x = Skill, y = PositionGroup)) +
labs(title = "Heatmap skill pemain berdasarkan posisi",
x = NULL,
y = NULL) +
geom_tile(aes(fill = Rating)) +
geom_text(aes(label = Rating)) +
theme_economist()+
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1), strip.text = element_text(face = "bold", size = 12), legend.position = "none")
Melihat kelompok skill yang paling menonjol pada masing - masing posisi
# fifad <- fifa %>%
# filter(Overall > 75 & PositionGroup != "GK" ) %>%
# group_by(PositionGroup) %>%
# summarise_at(SkillCategory, mean) %>%
# select(-GoalKeeping) %>%
# data.frame()
#
# row.names(fifad) <- c("DEF", "FWD","MID")
#
# fifaf <- fifad %>%
# select(-PositionGroup)
#
# data <- rbind(rep(100,6) , rep(0,6) , fifaf)
#
# # Color vector
# colors_border=c( rgb(0.2,0.5,0.5,0.9), rgb(0.8,0.2,0.5,0.9) , rgb(0.7,0.5,0.1,0.9) )
# colors_in=c( rgb(0.2,0.5,0.5,0.4), rgb(0.8,0.2,0.5,0.4) , rgb(0.7,0.5,0.1,0.4) )
#
# # plot with default options:
# radarchart( data , axistype=1 ,
# #custom polygon
# pcol=colors_border , pfcol=colors_in , plwd=4 , plty=1,
# #custom the grid
# cglcol="grey", cglty=1, axislabcol="grey", caxislabels=seq(20,100,20), cglwd=0.8,
# #custom labels
# vlcex=0.8
# )
#
# # Add a legend
# legend(x=0.7, y=1, legend = rownames(data[-c(1,2),]), bty = "n", pch=20 , col=colors_in , text.col = "grey", cex=1.2, pt.cex=3)
Melihat Sebaran kelompok Skill yang paling menonjol pada posisi “FWD” yang ada didalam data :
fifa %>%
filter(PositionGroup == "FWD") %>%
gather(key = "SkillGroup", value = "Rating", Attacking : Mentality ) %>%
ggplot(aes(Rating, fill = SkillGroup, col = SkillGroup)) +
geom_density(alpha = 0.3) +
labs(x = NULL,
y = NULL) +
theme_economist()
Cluster pemain untuk mengelompokan pemain - pemain berdasarkan skill - skill yang paling menonjol yang mereka miliki
Memisahkan data menjadi 4 posisi utama :
## # A tibble: 6 x 99
## X1 ID Name Age Photo Nationality Flag Overall Potential Club
## <dbl> <dbl> <chr> <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr>
## 1 0 158023 L. M… 31 http… Argentina http… 94 94 FC B…
## 2 1 20801 Cris… 33 http… Portugal http… 94 94 Juve…
## 3 2 190871 Neym… 26 http… Brazil http… 92 93 Pari…
## 4 3 193080 De G… 27 http… Spain http… 91 93 Manc…
## 5 4 192985 K. D… 27 http… Belgium http… 91 92 Manc…
## 6 5 183277 E. H… 27 http… Belgium http… 91 91 Chel…
## # … with 89 more variables: `Club Logo` <chr>, Value <chr>, Wage <chr>,
## # Special <dbl>, `Preferred Foot` <chr>, `International
## # Reputation` <dbl>, `Weak Foot` <dbl>, `Skill Moves` <dbl>, `Work
## # Rate` <chr>, `Body Type` <chr>, `Real Face` <chr>, Position <chr>,
## # `Jersey Number` <dbl>, Joined <chr>, `Loaned From` <chr>, `Contract
## # Valid Until` <chr>, Height <chr>, Weight <chr>, LS <chr>, ST <chr>,
## # RS <chr>, LW <chr>, LF <chr>, CF <chr>, RF <chr>, RW <chr>, LAM <chr>,
## # CAM <chr>, RAM <chr>, LM <chr>, LCM <chr>, CM <chr>, RCM <chr>,
## # RM <chr>, LWB <chr>, LDM <chr>, CDM <chr>, RDM <chr>, RWB <chr>,
## # LB <chr>, LCB <chr>, CB <chr>, RCB <chr>, RB <chr>, Crossing <dbl>,
## # Finishing <dbl>, HeadingAccuracy <dbl>, ShortPassing <dbl>,
## # Volleys <dbl>, Dribbling <dbl>, Curve <dbl>, FKAccuracy <dbl>,
## # LongPassing <dbl>, BallControl <dbl>, Acceleration <dbl>,
## # SprintSpeed <dbl>, Agility <dbl>, Reactions <dbl>, Balance <dbl>,
## # ShotPower <dbl>, Jumping <dbl>, Stamina <dbl>, Strength <dbl>,
## # LongShots <dbl>, Aggression <dbl>, Interceptions <dbl>,
## # Positioning <dbl>, Vision <dbl>, Penalties <dbl>, Composure <dbl>,
## # Marking <dbl>, StandingTackle <dbl>, SlidingTackle <dbl>,
## # GKDiving <dbl>, GKHandling <dbl>, GKKicking <dbl>,
## # GKPositioning <dbl>, GKReflexes <dbl>, `Release Clause` <chr>,
## # ValueP <dbl>, WageP <dbl>, Age_group <ord>, PositionGroup <fct>,
## # Attacking <dbl>, Skill <dbl>, Power <dbl>, Mentality <dbl>,
## # Defending <dbl>, GoalKeeping <dbl>
## .
## DEF FWD GK MID Unknown
## 5866 3418 2025 6838 0
Mencari K optimum dengan elbow method
wss <- function(data, maxCluster = 9) {
# Initialize within sum of squares
SSw <- (nrow(data) - 1) * sum(apply(data, 2, var))
SSw <- vector()
for (i in 2:maxCluster) {
SSw[i] <- sum(kmeans(data, centers = i)$withinss)
}
plot(1:maxCluster, SSw, type = "o", xlab = "Number of Clusters", ylab = "Within groups sum of squares", pch=19)
}
Pada plot diatas ditunjukkan bahwa K optimum berada di angka 4
fifa_FWD %>%
select(Crossing : SlidingTackle) %>%
prcomp(scale = T) %>%
fviz_pca_biplot(habillage = fifa_FWD %>%
.$Cluster, geom = "point")
Pada Biplot diatas terlihat, banyak sekali skill yang mengarah ke arah kiri, sehingga kelompok pemain yang berada di sebelah kiri, memiliki skill diatas rata - rata
Skill strength dan heading accuracy mengarah ke arah bawah, sehingga kelompok pemain yang berada di sisi bawah memiliki skill yang menonjol di heading accuracy dan strenght
Pada Kelompok pemain yang berada diatas skill yang menonjol adalah sprint speed, agility, dan balance
Cluster 4 tidak memiliki skill - skill yang menonjol, atau bisa dikatakan skill yang mereka miliki dibawah rata - rata
fifa_FWD <- fifa_FWD %>%
mutate(Class = ifelse(Cluster == 1, "SuperStar", ifelse(Cluster == 2, "Tanker", ifelse(Cluster == 3, "Speedster", "BelowAverage"))))
fifa_FWD$Class %>%
table()
## .
## BelowAverage Speedster SuperStar Tanker
## 871 901 696 950
fifa_FWD %>%
ggplot(aes(x = Class)) +
geom_bar(fill = "#7ad2f6") +
labs(title = "Jumlah Pemain Pada Tiap Cluster di Posisi FWD",
x = NULL,
y = NULL) +
theme_economist()
fifa_MID %>%
select(Crossing : SlidingTackle) %>%
prcomp(scale = T) %>%
fviz_pca_biplot(habillage = fifa_MID %>%
.$Cluster, geom = "point")
Cluster 1 tidak memiliki skill - skill yang menonjol, atau bisa dikatakan skill yang mereka miliki dibawah rata - rata
Kelompok pemain yang berada diatas skill yang menonjol adalah acceleration, agility, dan balance
Kelompok pemain yang berada di sisi atas memiliki skill yang menonjol di strength, standing tackle, marking, dan interception
Pada Biplot diatas terlihat, banyak sekali skill yang mengarah ke arah kiri, sehingga kelompok pemain yang berada di sebelah kiri, memiliki skill diatas rata - rata
fifa_MID <- fifa_MID %>%
mutate(Class = ifelse(Cluster == 1, "BelowAverage", ifelse(Cluster == 2, "Speedster", ifelse(Cluster == 3, "Tanker", "SuperStar"))))
fifa_MID$Class %>%
table()
## .
## BelowAverage Speedster SuperStar Tanker
## 1478 1867 1559 1934
fifa_MID %>%
ggplot(aes(x = Class)) +
geom_bar(fill = "#7ad2f6") +
labs(title = "Jumlah Pemain Pada Tiap Cluster di Posisi MID",
x = NULL,
y = NULL) +
theme_economist()
fifa_DEF %>%
select(Crossing : SlidingTackle) %>%
prcomp(scale = T) %>%
fviz_pca_biplot(habillage = fifa_DEF %>%
.$Cluster, geom = "point")
Kelompok pemain yang berada di sisi atas memiliki skill yang menonjol di heading accuracy dan strenght
Pada Biplot diatas terlihat, banyak sekali skill yang mengarah ke arah kiri, sehingga kelompok pemain yang berada di sebelah kiri, memiliki skill diatas rata - rata
Cluster 3 tidak memiliki skill - skill yang menonjol, atau bisa dikatakan skill yang mereka miliki dibawah rata - rata
Kelompok pemain yang berada diatas skill yang menonjol adalah SPrint speed, acceleration, agility, dan balance
fifa_DEF <- fifa_DEF %>%
mutate(Class = ifelse(Cluster == 1, "Tanker", ifelse(Cluster == 2, "SuperStar", ifelse(Cluster == 3, "BelowAverage", "Speedster"))))
fifa_DEF$Class %>%
table()
## .
## BelowAverage Speedster SuperStar Tanker
## 1388 1679 1284 1515
fifa_DEF %>%
ggplot(aes(x = Class)) +
geom_bar(fill = "#7ad2f6") +
labs(title = "Jumlah Pemain Pada Tiap Cluster di Posisi DEF",
x = NULL,
y = NULL) +
theme_economist()
fifa_GK %>%
select(GKHandling : GKReflexes) %>%
prcomp(scale = T) %>%
fviz_pca_biplot(habillage = fifa_GK %>%
.$Cluster, geom = "point")
Pada pemain posisi GK, karena skill - skill yang dimiliki seorang kiper berbeda dengan posisi posisi lainnya. maka untuk posisi GK tidak dilakukan clustering dan hanya menjadi 1 kelompok saja, yaitu Goal Keeper
fifa_clustered <- do.call("rbind", list(fifa_FWD, fifa_MID, fifa_DEF, fifa_GK))
fifa_optmzd <- fifa_clustered %>%
select(ID, Name, Age, Overall, Potential, PositionGroup, ValueP, WageP, Cluster, Class )
tmp <- dummyVars("~ PositionGroup", data = fifa_optmzd)
test1 <- data.frame(predict(tmp, newdata = fifa_optmzd))
fifa_optmzd2 <- fifa_optmzd %>%
cbind(test1) %>%
select(-PositionGroup.Unknown)
Mencari Kombinasi pemain dengan jumlah overall rating tertinggi dengan komposisi sebagai berikut :
num_fwd <- 2
num_def <- 4
num_mid <- 4
num_gk <- 1
max_value <- 291000000
obj <- fifa_optmzd2$Overall
const_dir = c("=", "=", "=", "=", "<=")
#matrix
const_mat <- matrix(c(fifa_optmzd2$PositionGroup.FWD,
fifa_optmzd2$PositionGroup.MID,
fifa_optmzd2$PositionGroup.DEF,
fifa_optmzd2$PositionGroup.GK,
fifa_optmzd2$ValueP),
nrow = 5, byrow = T)
const_rhs <- c(num_fwd, num_mid,
num_def, num_gk,
max_value)
Kombinasi pemain dengan total overall tertinggi :
x = lp("max", obj, const_mat, const_dir, const_rhs, all.bin=TRUE, all.int=TRUE)
fifa_optmzd2[which(x$solution==1), ] %>%
select (Name, Age, Overall, ValueP, PositionGroup, Class) %>%
kable()
Name | Age | Overall | ValueP | PositionGroup | Class | |
---|---|---|---|---|---|---|
2 | Cristiano Ronaldo | 33 | 94 | 77000000 | FWD | SuperStar |
23 | Iniesta | 34 | 86 | 21500000 | FWD | SuperStar |
3444 | Fernandinho | 33 | 86 | 18000000 | MID | SuperStar |
3475 | Quaresma | 34 | 84 | 15500000 | MID | Speedster |
3476 | A. Robben | 34 | 84 | 15500000 | MID | SuperStar |
3507 | D. De Rossi | 34 | 83 | 8000000 | MID | SuperStar |
10257 | Sergio Ramos | 32 | 91 | 51000000 | DEF | SuperStar |
10258 | D. Godín | 32 | 90 | 44000000 | DEF | SuperStar |
10259 | G. Chiellini | 33 | 89 | 27000000 | DEF | Tanker |
10280 | Naldo | 35 | 85 | 9000000 | DEF | SuperStar |
16130 | G. Buffon | 40 | 88 | 4000000 | GK | GK |
Total Overall :
## [1] 960
Total Budget :
## [1] "290,500,000"
Prediksi class digunakan untuk memprediksi class dari pemain yang baru datang, algoritma yang digunakan adalah Extreme Gradient Boost (XGBoost)
#set data train
settrain <- fifa_clustered %>%
select(Class, Attacking : Defending) %>%
filter(Class != "GK") %>%
mutate(Class = as.factor(Class))
train_index <- sample(1:nrow(settrain), nrow(settrain)*0.75)
Ubah label menjadi angka :
settrain<- settrain %>%
mutate(ClassNum = as.numeric(Class)) %>%
mutate(ClassNum = ClassNum -1) %>%
select(ClassNum, Attacking : Defending)
Memasukkan data dan label ke dalam xgb.matrix :
data_x <- as.matrix(settrain[,-1])
data_y <- settrain$ClassNum
data_matrix <- xgb.DMatrix(data = as.matrix(settrain), label = data_y)
# split train data and make xgb.DMatrix
train_data <- data_x[train_index,]
train_label <- data_y[train_index]
train_matrix <- xgb.DMatrix(data = train_data, label = train_label)
# split test data and make xgb.DMatrix
test_data <- data_x[-train_index,]
test_label <- data_y[-train_index]
test_matrix <- xgb.DMatrix(data = test_data, label = test_label)
Set parameter :
numberOfClasses <- length(unique(settrain$ClassNum))
xgb_params <- list("objective" = "multi:softprob",
"eval_metric" = "mlogloss",
"num_class" = numberOfClasses)
nround <- 50 # number of XGBoost rounds
Training Data :
Test prediksi :
test_pred <- predict(class_model, newdata = test_matrix)
test_prediction <- matrix(test_pred, nrow = numberOfClasses,
ncol=length(test_pred)/numberOfClasses) %>%
t() %>%
data.frame() %>%
mutate(label = test_label + 1,
max_prob = max.col(., "last"))
# confusion matrix of test set
confusionMatrix(factor(test_prediction$max_prob),
factor(test_prediction$label),
mode = "everything")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 2 3 4
## 1 835 103 0 26
## 2 61 803 45 177
## 3 0 43 754 78
## 4 42 242 85 737
##
## Overall Statistics
##
## Accuracy : 0.7762
## 95% CI : (0.763, 0.789)
## No Information Rate : 0.2955
## P-Value [Acc > NIR] : < 0.00000000000000022
##
## Kappa : 0.7008
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 1 Class: 2 Class: 3 Class: 4
## Sensitivity 0.8902 0.6742 0.8529 0.7240
## Specificity 0.9583 0.9004 0.9616 0.8775
## Pos Pred Value 0.8662 0.7394 0.8617 0.6664
## Neg Pred Value 0.9664 0.8683 0.9588 0.9039
## Precision 0.8662 0.7394 0.8617 0.6664
## Recall 0.8902 0.6742 0.8529 0.7240
## F1 0.8780 0.7053 0.8573 0.6940
## Prevalence 0.2327 0.2955 0.2193 0.2525
## Detection Rate 0.2071 0.1992 0.1871 0.1828
## Detection Prevalence 0.2391 0.2694 0.2171 0.2744
## Balanced Accuracy 0.9242 0.7873 0.9072 0.8007
Variable yang akan digunakan : * Age * Nationality * Overall * Club * Value * Preferred Foot * International Reputation * Weak Foot * Skill Moves * Work Rate * Body Type * Position * Height * Weight * Skill dari Crossing : GKReflexes
fifamaster <- readRDS("www/fifa_master.rds")
fifamasterclean <- fifamaster %>%
select(Age, Nationality, Overall, Club, Position, Value = ValueP, Body.Type = `Body Type`, Work.Rate = `Work Rate`,Preferred.Foot = `Preferred Foot`, International.Reputation = `International Reputation`, Attacking : GoalKeeping) %>%
mutate(Body.Type = ifelse((Body.Type != "Normal" & Body.Type != "Lean" & Body.Type != "Stocky"),
"Normal",Body.Type)) %>%
mutate(Nationality = as.factor(Nationality),
Club = as.factor(Club),
Position = as.factor(Position),
Body.Type = as.factor(Body.Type),
Work.Rate = as.factor(Work.Rate),
Preferred.Foot = as.factor(Preferred.Foot),
International.Reputation = as.factor(International.Reputation)) %>%
filter(Value > 0) %>%
na.omit()
head(fifamasterclean) %>%
kable()
Age | Nationality | Overall | Club | Position | Value | Body.Type | Work.Rate | Preferred.Foot | International.Reputation | Attacking | Skill | Movement | Power | Mentality | Defending | GoalKeeping |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
31 | Argentina | 94 | FC Barcelona | RF | 110500000 | Normal | Medium/ Medium | Left | 5 | 85 | 93 | 92 | 76 | 72 | 29 | 11 |
33 | Portugal | 94 | Juventus | ST | 77000000 | Normal | High/ Low | Right | 5 | 87 | 83 | 87 | 90 | 75 | 27 | 12 |
26 | Brazil | 92 | Paris Saint-Germain | LW | 118500000 | Normal | High/ Medium | Right | 5 | 79 | 89 | 92 | 71 | 74 | 28 | 12 |
27 | Spain | 91 | Manchester United | GK | 72000000 | Lean | Medium/ Medium | Right | 4 | 23 | 30 | 62 | 43 | 43 | 16 | 89 |
27 | Belgium | 91 | Manchester City | RCM | 102000000 | Normal | High/ High | Right | 4 | 81 | 87 | 80 | 82 | 81 | 59 | 11 |
27 | Belgium | 91 | Chelsea | LF | 93000000 | Normal | High/ Medium | Right | 4 | 79 | 87 | 92 | 73 | 75 | 28 | 9 |
Split data train dan data test, dan pisahkan label dan datanya
set.seed(417)
intrain <- sample(nrow(fifamasterclean), nrow(fifamasterclean)*0.75)
fifa.train.sm <- fifamasterclean[intrain, ]
fifa.test.sm <- fifamasterclean[-intrain, ]
fifa.train.sm.x <- fifa.train.sm[,-6]
fifa.train.sm.y <- fifa.train.sm$Value
fifa.test.sm.x <- fifa.test.sm[,-6]
fifa.test.sm.y <- fifa.test.sm$Value
#Ubah ke matrix
fifa.train.sm.x.sparse <- sparse.model.matrix(~.,fifa.train.sm.x)[,-1]
fifa.test.sm.x.sparse <- sparse.model.matrix(~., fifa.test.sm.x)[,-1]
set.seed(123)
test2 <- xgboost(data = fifa.train.sm.x.sparse, label = fifa.train.sm.y, max_depth = 6,
eta = 0.1, nthread = 2, nrounds = 250, objective = "reg:squarederror")
Variable importance :
## Feature Gain Cover Frequency
## 1: Overall 0.9257671695 0.22983035 0.16985758
## 2: Age 0.0534295485 0.06709789 0.20908242
## 3: Attacking 0.0104288131 0.02881099 0.06163904
## 4: Skill 0.0047540249 0.02324050 0.03560588
## 5: Defending 0.0022556015 0.02475764 0.06000467
## 6: GoalKeeping 0.0003720146 0.01806931 0.02603315
Variable yang paling mempengaruhi value pemain adalah Overall, kemudian age juga sedikit mempengaruhi.
Predict dan error :
## [1] 470333.3
Range Value Pemain :
## [1] "Minimum : 10000"
## [1] "Maximum : 118500000"
Error senilai 470.333 memang cukup besar, tetapi dengan range data dari 1.000 - 118.500.000, eror yang dihasilkan tidak terlalu signifikan