The Fifa.csv dataset is available on the Kaggle platform (https://www.kaggle.com/artimous/complete-fifa-2017-player-dataset-global). It contains the playing-style attributes from the console video game FIFA 2017, together with real-world statistics for the football players. The dataset has more than 17,500 records and 53 variables. These data offer many opportunities to consolidate knowledge and skills in data manipulation, preprocessing, and descriptive analysis.
Read the data file "Fifa.csv" and store the data in an object named fifa.
library(readr)
fifa <- read_csv("Fifa.csv")
## Parsed with column specification:
## cols(
## .default = col_double(),
## Name = col_character(),
## Nationality = col_character(),
## National_Position = col_character(),
## Club = col_character(),
## Club_Position = col_character(),
## Club_Joining = col_character(),
## Height = col_character(),
## Weight = col_character(),
## Preffered_Foot = col_character(),
## Birth_Date = col_character(),
## Preffered_Position = col_character(),
## Work_Rate = col_character()
## )
## See spec(...) for full column specifications.
View(fifa)
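# Re-read with base R; on R >= 4.0 stringsAsFactors = TRUE is required to obtain the factor columns shown below
fifa <- read.csv("Fifa.csv", header = TRUE, sep = ",", dec = ".", stringsAsFactors = TRUE)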
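Player names with accented characters can come out garbled when a UTF-8 file is read under a different encoding. The following is a minimal sketch, assuming the file is UTF-8 encoded (the source does not state this); the temporary file and its two rows are hypothetical stand-ins for Fifa.csv.

```r
# Toy file standing in for Fifa.csv (hypothetical contents, two rows only).
tmp <- tempfile(fileext = ".csv")
writeLines(enc2utf8(c("Name,Club,Rating",
                      "\u00c1lvaro,Club A,80",
                      "Felipe,Club B,75")),
           tmp, useBytes = TRUE)

# stringsAsFactors = TRUE reproduces the pre-4.0 default of read.csv();
# encoding = "UTF-8" marks the strings as UTF-8 (assumed encoding of the file).
toy <- read.csv(tmp, header = TRUE, sep = ",", dec = ".",
                stringsAsFactors = TRUE, encoding = "UTF-8")

str(toy)
```

If the real file turns out to use another encoding, the `encoding` (or `fileEncoding`) value would need to change accordingly.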
Briefly describe the structure of the data: what types of data it contains, the number of rows and columns, and the column names. Produce a descriptive summary of the data.
summary (fifa)
## Name Nationality National_Position
## Felipe : 6 England : 1618 :16513
## Danilo : 5 Argentina: 1097 Sub : 556
## Gabriel : 5 Spain : 1008 LCB : 48
## Carlos Rodríguez: 4 France : 974 GK : 47
## Roberto : 4 Brazil : 921 RCB : 46
## Álvaro : 3 Italy : 751 LB : 39
## (Other) :17561 (Other) :11219 (Other): 339
## National_Kit Club Club_Position Club_Kit
## Min. : 1.00 Free Agents : 232 Sub :7492 Min. : 1.00
## 1st Qu.: 6.00 Angers SCO : 33 Res :3146 1st Qu.: 9.00
## Median :12.00 Arsenal : 33 RCB : 633 Median :18.00
## Mean :12.22 AS Monaco : 33 GK : 632 Mean :21.29
## 3rd Qu.:18.00 Bor. M'gladbach: 33 LCB : 631 3rd Qu.:27.00
## Max. :36.00 Bournemouth : 33 LB : 549 Max. :99.00
## NA's :16513 (Other) :17191 (Other):4505 NA's :1
## Club_Joining Contract_Expiry Rating Height
## 07/01/2016: 1193 Min. :2017 Min. :45.00 180 cm : 1529
## 07/01/2015: 907 1st Qu.:2017 1st Qu.:62.00 178 cm : 1195
## 07/01/2014: 558 Median :2019 Median :66.00 185 cm : 1193
## 01/01/2016: 412 Mean :2019 Mean :66.17 183 cm : 1120
## 07/01/2013: 404 3rd Qu.:2020 3rd Qu.:71.00 175 cm : 1039
## 01/01/2015: 391 Max. :2023 Max. :94.00 182 cm : 863
## (Other) :13723 NA's :1 (Other):10649
## Weight Preffered_Foot Birth_Date Age
## 75 kg : 1404 Left : 4094 02/29/1988: 160 Min. :17.00
## 70 kg : 1387 Right:13494 02/29/1984: 157 1st Qu.:22.00
## 72 kg : 1021 02/29/1992: 155 Median :25.00
## 78 kg : 967 01/01/1996: 13 Mean :25.46
## 80 kg : 960 11/11/1996: 13 3rd Qu.:29.00
## 73 kg : 915 01/08/1991: 12 Max. :47.00
## (Other):10934 (Other) :17078
## Preffered_Position Work_Rate Weak_foot Skill_Moves
## CB :2181 Medium / Medium:9897 Min. :1.000 Min. :1.000
## GK :2003 High / Medium :2918 1st Qu.:3.000 1st Qu.:2.000
## ST :1825 Medium / High :1534 Median :3.000 Median :2.000
## CM : 831 Medium / Low : 845 Mean :2.934 Mean :2.303
## LB : 808 High / High : 747 3rd Qu.:3.000 3rd Qu.:3.000
## RB : 689 High / Low : 730 Max. :5.000 Max. :5.000
## (Other):9251 (Other) : 917
## Ball_Control Dribbling Marking Sliding_Tackle
## Min. : 5.00 Min. : 4.0 Min. : 3.00 Min. : 5.00
## 1st Qu.:53.00 1st Qu.:47.0 1st Qu.:22.00 1st Qu.:23.00
## Median :63.00 Median :60.0 Median :48.00 Median :51.00
## Mean :57.97 Mean :54.8 Mean :44.23 Mean :45.57
## 3rd Qu.:69.00 3rd Qu.:68.0 3rd Qu.:64.00 3rd Qu.:64.00
## Max. :95.00 Max. :97.0 Max. :92.00 Max. :95.00
##
## Standing_Tackle Aggression Reactions Attacking_Position
## Min. : 3.00 Min. : 2.00 Min. :29.00 Min. : 2.00
## 1st Qu.:26.00 1st Qu.:44.00 1st Qu.:55.00 1st Qu.:37.00
## Median :54.00 Median :59.00 Median :62.00 Median :54.00
## Mean :47.44 Mean :55.92 Mean :61.77 Mean :49.59
## 3rd Qu.:66.00 3rd Qu.:70.00 3rd Qu.:68.00 3rd Qu.:64.00
## Max. :92.00 Max. :96.00 Max. :96.00 Max. :94.00
##
## Interceptions Vision Composure Crossing
## Min. : 3.00 Min. :10.00 Min. : 5.00 Min. : 6.00
## 1st Qu.:26.00 1st Qu.:43.00 1st Qu.:47.00 1st Qu.:38.00
## Median :52.00 Median :54.00 Median :57.00 Median :54.00
## Mean :46.79 Mean :52.71 Mean :55.85 Mean :49.74
## 3rd Qu.:64.00 3rd Qu.:64.00 3rd Qu.:66.00 3rd Qu.:64.00
## Max. :93.00 Max. :94.00 Max. :94.00 Max. :91.00
##
## Short_Pass Long_Pass Acceleration Speed
## Min. :10.00 Min. : 7.0 Min. :11.00 Min. :11.00
## 1st Qu.:52.00 1st Qu.:42.0 1st Qu.:57.00 1st Qu.:58.00
## Median :62.00 Median :56.0 Median :68.00 Median :68.00
## Mean :58.12 Mean :52.4 Mean :65.29 Mean :65.48
## 3rd Qu.:68.00 3rd Qu.:64.0 3rd Qu.:75.00 3rd Qu.:75.00
## Max. :92.00 Max. :93.0 Max. :96.00 Max. :96.00
##
## Stamina Strength Balance Agility
## Min. :10.00 Min. :20.00 Min. :10.00 Min. :11.00
## 1st Qu.:57.00 1st Qu.:57.00 1st Qu.:56.00 1st Qu.:55.00
## Median :66.00 Median :66.00 Median :65.00 Median :65.00
## Mean :63.48 Mean :65.09 Mean :64.01 Mean :63.21
## 3rd Qu.:74.00 3rd Qu.:74.00 3rd Qu.:74.00 3rd Qu.:74.00
## Max. :95.00 Max. :98.00 Max. :97.00 Max. :96.00
##
## Jumping Heading Shot_Power Finishing
## Min. :15.00 Min. : 4.00 Min. : 3.00 Min. : 2.00
## 1st Qu.:58.00 1st Qu.:45.00 1st Qu.:45.00 1st Qu.:29.00
## Median :65.00 Median :56.00 Median :59.00 Median :48.00
## Mean :64.92 Mean :52.39 Mean :55.58 Mean :45.16
## 3rd Qu.:73.00 3rd Qu.:65.00 3rd Qu.:69.00 3rd Qu.:61.00
## Max. :95.00 Max. :94.00 Max. :93.00 Max. :95.00
##
## Long_Shots Curve Freekick_Accuracy Penalties
## Min. : 4.0 Min. : 6.00 Min. : 4.00 Min. : 7.00
## 1st Qu.:32.0 1st Qu.:34.00 1st Qu.:31.00 1st Qu.:39.00
## Median :52.0 Median :48.00 Median :42.00 Median :50.00
## Mean :47.4 Mean :47.18 Mean :43.38 Mean :49.17
## 3rd Qu.:63.0 3rd Qu.:62.00 3rd Qu.:57.00 3rd Qu.:61.00
## Max. :91.0 Max. :92.00 Max. :93.00 Max. :96.00
##
## Volleys GK_Positioning GK_Diving GK_Kicking
## Min. : 3.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
## 1st Qu.:30.00 1st Qu.: 8.00 1st Qu.: 8.00 1st Qu.: 8.00
## Median :44.00 Median :11.00 Median :11.00 Median :11.00
## Mean :43.28 Mean :16.61 Mean :16.82 Mean :16.46
## 3rd Qu.:57.00 3rd Qu.:14.00 3rd Qu.:14.00 3rd Qu.:14.00
## Max. :93.00 Max. :91.00 Max. :89.00 Max. :95.00
##
## GK_Handling GK_Reflexes
## Min. : 1.00 Min. : 1.0
## 1st Qu.: 8.00 1st Qu.: 8.0
## Median :11.00 Median :11.0
## Mean :16.56 Mean :16.9
## 3rd Qu.:14.00 3rd Qu.:14.0
## Max. :91.00 Max. :90.0
##
ncol(fifa)
## [1] 53
nrow(fifa)
## [1] 17588
colnames (fifa)
## [1] "Name" "Nationality" "National_Position"
## [4] "National_Kit" "Club" "Club_Position"
## [7] "Club_Kit" "Club_Joining" "Contract_Expiry"
## [10] "Rating" "Height" "Weight"
## [13] "Preffered_Foot" "Birth_Date" "Age"
## [16] "Preffered_Position" "Work_Rate" "Weak_foot"
## [19] "Skill_Moves" "Ball_Control" "Dribbling"
## [22] "Marking" "Sliding_Tackle" "Standing_Tackle"
## [25] "Aggression" "Reactions" "Attacking_Position"
## [28] "Interceptions" "Vision" "Composure"
## [31] "Crossing" "Short_Pass" "Long_Pass"
## [34] "Acceleration" "Speed" "Stamina"
## [37] "Strength" "Balance" "Agility"
## [40] "Jumping" "Heading" "Shot_Power"
## [43] "Finishing" "Long_Shots" "Curve"
## [46] "Freekick_Accuracy" "Penalties" "Volleys"
## [49] "GK_Positioning" "GK_Diving" "GK_Kicking"
## [52] "GK_Handling" "GK_Reflexes"
str(fifa)
## 'data.frame': 17588 obs. of 53 variables:
## $ Name : Factor w/ 17341 levels "Ögmundur Kristinsson",..: 3365 9997 12509 10338 10615 3999 14216 5931 17327 16006 ...
## $ Nationality : Factor w/ 160 levels "Afghanistan",..: 122 6 20 155 59 139 121 158 143 14 ...
## $ National_Position : Factor w/ 28 levels "","CAM","CB",..: 14 25 15 14 6 6 14 24 1 6 ...
## $ National_Kit : num 7 10 10 9 1 1 9 11 NA 1 ...
## $ Club : Factor w/ 634 levels "1. FC Heidenheim",..: 461 207 207 207 209 364 209 461 364 149 ...
## $ Club_Position : Factor w/ 30 levels "","CAM","CB",..: 16 27 16 29 7 7 29 27 29 7 ...
## $ Club_Kit : num 7 10 11 9 1 1 9 11 9 13 ...
## $ Club_Joining : Factor w/ 1678 levels "","01/01/1993",..: 848 843 852 927 850 850 853 1247 855 1012 ...
## $ Contract_Expiry : num 2021 2018 2021 2021 2021 ...
## $ Rating : int 94 93 92 92 92 90 90 90 90 89 ...
## $ Height : Factor w/ 50 levels "155 cm","157 cm",..: 30 15 19 27 38 38 30 28 40 44 ...
## $ Weight : Factor w/ 56 levels "100 kg","101 kg",..: 37 29 25 42 49 39 36 31 52 48 ...
## $ Preffered_Foot : Factor w/ 2 levels "Left","Right": 2 1 2 2 2 2 2 1 2 1 ...
## $ Birth_Date : Factor w/ 6063 levels "01/01/1982","01/01/1983",..: 623 2991 630 412 1490 5212 3952 3362 4669 2265 ...
## $ Age : int 32 29 25 30 31 26 28 27 35 24 ...
## $ Preffered_Position: Factor w/ 292 levels "CAM","CAM/CDM",..: 172 237 157 266 113 113 266 237 266 113 ...
## $ Work_Rate : Factor w/ 9 levels "High / High",..: 2 9 3 3 9 9 3 3 8 9 ...
## $ Weak_foot : int 4 4 5 4 4 3 4 3 4 3 ...
## $ Skill_Moves : int 5 4 5 4 1 1 3 4 4 1 ...
## $ Ball_Control : int 93 95 95 91 48 31 87 88 90 23 ...
## $ Dribbling : int 92 97 96 86 30 13 85 89 87 13 ...
## $ Marking : int 22 13 21 30 10 13 25 51 15 11 ...
## $ Sliding_Tackle : int 23 26 33 38 11 13 19 52 27 16 ...
## $ Standing_Tackle : int 31 28 24 45 10 21 42 55 41 18 ...
## $ Aggression : int 63 48 56 78 29 38 80 65 84 23 ...
## $ Reactions : int 96 95 88 93 85 88 88 87 85 81 ...
## $ Attacking_Position: int 94 93 90 92 12 12 89 86 86 13 ...
## $ Interceptions : int 29 22 36 41 30 30 39 59 20 15 ...
## $ Vision : int 85 90 80 84 70 68 78 79 83 44 ...
## $ Composure : int 86 94 80 83 70 60 87 85 91 52 ...
## $ Crossing : int 84 77 75 77 15 17 62 87 76 14 ...
## $ Short_Pass : int 83 88 81 83 55 31 83 86 84 32 ...
## $ Long_Pass : int 77 87 75 64 59 32 65 80 76 31 ...
## $ Acceleration : int 91 92 93 88 58 56 79 93 69 46 ...
## $ Speed : int 92 87 90 77 61 56 82 95 74 52 ...
## $ Stamina : int 92 74 79 89 44 25 79 78 75 38 ...
## $ Strength : int 80 59 49 76 83 64 84 80 93 70 ...
## $ Balance : int 63 95 82 60 35 43 79 65 41 45 ...
## $ Agility : int 90 90 96 86 52 57 78 77 86 61 ...
## $ Jumping : int 95 68 61 69 78 67 84 85 72 68 ...
## $ Heading : int 85 71 62 77 25 21 85 86 80 13 ...
## $ Shot_Power : int 92 85 78 87 25 31 86 91 93 36 ...
## $ Finishing : int 93 95 89 94 13 13 91 87 90 14 ...
## $ Long_Shots : int 90 88 77 86 16 12 82 90 88 17 ...
## $ Curve : int 81 89 79 86 14 21 77 86 82 19 ...
## $ Freekick_Accuracy : int 76 90 84 84 11 19 76 85 82 11 ...
## $ Penalties : int 85 74 81 85 47 40 81 76 91 27 ...
## $ Volleys : int 88 85 83 88 11 13 86 76 93 12 ...
## $ GK_Positioning : int 14 14 15 33 91 86 8 5 9 86 ...
## $ GK_Diving : int 7 6 9 27 89 88 15 15 13 84 ...
## $ GK_Kicking : int 15 15 15 31 95 87 12 11 10 69 ...
## $ GK_Handling : int 11 11 9 25 90 85 6 15 15 91 ...
## $ GK_Reflexes : int 11 8 11 37 89 90 10 6 12 89 ...
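To answer the "what types of data" question compactly, the column classes can be tallied instead of read off the `str()` listing one by one. This is a small sketch on a hypothetical four-column data frame that borrows some column names from the dataset; the same two calls would apply to `fifa`.

```r
# Toy data frame (hypothetical values) with a mix of column types.
toy <- data.frame(
  Name            = factor(c("A", "B", "C")),
  Club            = factor(c("X", "Y", "X")),
  Rating          = c(94L, 93L, 92L),
  Contract_Expiry = c(2021, 2018, 2021)
)

# One class per column, then a frequency table of those classes.
type_counts <- table(sapply(toy, class))
print(type_counts)

# Rows and columns in a single call.
dim(toy)
```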
names (fifa) [names(fifa)=="Nationality"] <- "Nacionalidad"
is.factor(fifa$Club)
## [1] TRUE
class (fifa$Club)
## [1] "factor"
is.factor(fifa$Nacionalidad)
## [1] TRUE
class (fifa$Nacionalidad)
## [1] "factor"
length(unique (fifa$Club))
## [1] 634
length(unique (fifa$Nacionalidad))
## [1] 160
fifa$Weight <- as.numeric(gsub (" kg","",fifa$Weight))
mean(fifa$Weight)
## [1] 75.25335
sd (fifa$Weight)
## [1] 6.897948
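The same unit-stripping idea carries over to other text columns. The sketch below uses hypothetical values in the "180 cm" style that Height shows in the summary, and assumes the date columns follow a month/day/year layout (suggested by values such as 02/29/1988, but not confirmed by the source). It also shows why the factor has to pass through character first.

```r
# Hypothetical sample in the same "NNN cm" format as the Height column.
height_raw <- factor(c("180 cm", "178 cm", "185 cm"))

# Pitfall: as.numeric() on a factor returns the internal level codes.
level_codes <- as.numeric(height_raw)

# Convert to character, strip the unit suffix, then convert to numeric.
height_cm <- as.numeric(sub(" cm$", "", as.character(height_raw)))

# Assumed month/day/year format for date strings such as Club_Joining.
joined <- as.Date("07/01/2016", format = "%m/%d/%Y")

print(level_codes)
print(height_cm)
print(joined)
```

`gsub()` coerces its input to character, which is why the Weight conversion above gives real weights rather than level codes.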
boxplot (fifa$Weight)
hist(fifa$Contract_Expiry, freq = FALSE)
# List the names of the 5 heaviest and the 5 lightest players in the league.
# order() returns row positions; sort() returns the weights themselves, which are not valid row indices.
as.character(fifa$Name[head(order(fifa$Weight, decreasing = TRUE), 5)])
as.character(fifa$Name[head(order(fifa$Weight), 5)])
# Remove the outliers of the weight variable and store the result in a data frame fifa.avg
outlier_weight <- boxplot(fifa$Weight, plot = FALSE)$out
fifa.avg <- data.frame(Weight = fifa$Weight[!(fifa$Weight %in% outlier_weight)])
# Compare the boxplots of the original data in fifa and of the weight data in fifa.avg
boxplot(fifa$Weight)
boxplot(fifa.avg)
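The values that `boxplot()` reports in `$out` are those lying more than 1.5 times the hinge spread beyond the hinges. The sketch below reproduces that rule on a hypothetical weight vector so the fences can be inspected directly; `boxplot.stats()` is the base R function that `boxplot()` relies on for these statistics.

```r
# Hypothetical weights in kg; 110 is deliberately extreme.
w <- c(60, 70, 72, 75, 75, 78, 80, 82, 110)

bs <- boxplot.stats(w)        # default coef = 1.5
hinges <- bs$stats[c(2, 4)]   # lower and upper hinge
spread <- diff(hinges)

# Values strictly outside these fences are flagged as outliers.
fences <- c(hinges[1] - 1.5 * spread, hinges[2] + 1.5 * spread)
out <- bs$out

# Keep only the non-outlying observations.
kept <- w[!(w %in% out)]

print(fences)
print(out)
print(length(kept))
```

Note that 60 sits exactly on the lower fence and is therefore kept, because the comparison is strict.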
The underlying process is a discretization of Rating into intervals of equal width. (Note: the number of observations in each interval will differ.)
fifa$Clasification <- cut(fifa$Rating, 3)
levels (fifa$Clasification) [1] <- "Regular"
levels (fifa$Clasification) [2] <- "Bueno"
levels (fifa$Clasification) [3] <- "Excelente"
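To see why equal-width intervals do not yield equal counts, here is a small sketch on a hypothetical rating vector. It also passes the three names through the `labels` argument of `cut()`, which is an alternative to renaming the levels one at a time.

```r
# Hypothetical ratings spanning the same 45..94 range as the Rating column.
rating <- c(45, 58, 62, 64, 66, 66, 71, 75, 80, 94)

# Three intervals of equal width, labelled in a single call.
rating_class <- cut(rating, breaks = 3,
                    labels = c("Regular", "Bueno", "Excelente"))

# Default interval labels, to inspect where the cut points fall.
print(levels(cut(rating, breaks = 3)))

# Counts per class: the middle interval holds most observations.
class_counts <- table(rating_class)
print(class_counts)
```

With ratings concentrated around the middle of the range, as the summary of Rating suggests, the central class ends up much larger than the other two.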
Show a boxplot of the rating value as a function of the classification.
sum (fifa$Club == "FC Barcelona" & fifa$Clasification == "Regular")
## [1] 0
sum (fifa$Club == "FC Barcelona" & fifa$Clasification == "Bueno")
## [1] 13
sum (fifa$Club == "FC Barcelona" & fifa$Clasification == "Excelente")
## [1] 20
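The three separate `sum()` calls can be replaced by one cross-tabulation that gives the count for every club and class at once. This is a sketch on a hypothetical five-row data frame; only the column names are taken from the text.

```r
# Toy data (hypothetical rows) using the same column names as fifa.
toy <- data.frame(
  Club = c("FC Barcelona", "FC Barcelona", "FC Barcelona",
           "Real Madrid", "Real Madrid"),
  Clasification = factor(c("Bueno", "Excelente", "Excelente",
                           "Bueno", "Regular"),
                         levels = c("Regular", "Bueno", "Excelente"))
)

# Rows are clubs, columns are classes.
counts <- table(toy$Club, toy$Clasification)

# All three counts for one club in a single lookup.
print(counts["FC Barcelona", ])
```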
# Names of the Real Madrid players classified as "Bueno"; both conditions must index the full data frame
RM_bueno <- as.character(fifa$Name[fifa$Club == "Real Madrid" & fifa$Clasification == "Bueno"])
RM_bueno
boxplot (fifa$Rating ~ fifa$Clasification)
which (colnames(fifa) == "Weak_foot")
## [1] 18
which (colnames(fifa) == "GK_Reflexes")
## [1] 53
rendimiento <- fifa[, 18:53]
# Append the class as a properly named last column
rendimiento$Clasification <- fifa$Clasification
colnames(rendimiento)[ncol(rendimiento)] == "Clasification"
## [1] TRUE
which (is.na(rendimiento))
## integer(0)
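# Draw 70% of the row indices for training; the remaining rows form the test set, so the two never overlap
train <- sample(nrow(rendimiento), floor(0.7 * nrow(rendimiento)))
test <- setdiff(seq_len(nrow(rendimiento)), train)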
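A train/test split is only useful for evaluation if the two index sets share no rows. The sketch below checks that on a hypothetical 100-row data frame and also compares the class proportions of both parts; the seed, the toy columns, and the 70/30 ratio are assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible

# Hypothetical data: one numeric feature and a three-level class.
n <- 100
toy <- data.frame(
  x = rnorm(n),
  Clasification = factor(sample(c("Regular", "Bueno", "Excelente"),
                                n, replace = TRUE))
)

# 70% of the row indices for training, the complement for testing.
train_idx <- sample(n, size = round(0.7 * n))
test_idx  <- setdiff(seq_len(n), train_idx)

train_set <- toy[train_idx, ]
test_set  <- toy[test_idx, ]

# Number of rows present in both sets (should be zero).
overlap <- length(intersect(train_idx, test_idx))
print(overlap)

# Class proportions in each part, to spot a badly unbalanced split.
print(prop.table(table(train_set$Clasification)))
print(prop.table(table(test_set$Clasification)))
```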