The following dataset contains player statistics for the 2020-2021 Premier League season, obtained from https://www.kaggle.com/datasets/rajatrc1705/english-premier-league202021/. The goal is to analyze whether a player's position influenced the number of goals scored during the season.
library(ISLR)
library(factoextra)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.3.2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(FactoMineR)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.1 ✔ tidyr 1.3.0
## ✔ readr 2.1.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
PREMIER <- read.csv("C:\\Users\\danbr\\OneDrive\\Desktop\\Mineria de Datos\\Actividad Constante - El Diario\\EPL_20_21.csv")
head(PREMIER, n=5)
## Name Club Nationality Position Age Matches Starts Mins Goals
## 1 Mason Mount Chelsea ENG MF,FW 21 36 32 2890 6
## 2 Edouard Mendy Chelsea SEN GK 28 31 31 2745 0
## 3 Timo Werner Chelsea GER FW 24 35 29 2602 6
## 4 Ben Chilwell Chelsea ENG DF 23 27 27 2286 3
## 5 Reece James Chelsea ENG DF 20 32 25 2373 1
## Assists Passes_Attempted Perc_Passes_Completed Penalty_Goals
## 1 5 1881 82.3 1
## 2 0 1007 84.6 0
## 3 8 826 77.2 0
## 4 5 1806 78.6 0
## 5 2 1987 85.0 0
## Penalty_Attempted xG xA Yellow_Cards Red_Cards
## 1 1 0.21 0.24 2 0
## 2 0 0.00 0.00 2 0
## 3 0 0.41 0.21 2 0
## 4 0 0.10 0.11 3 0
## 5 0 0.06 0.12 3 0
View(PREMIER)
sum(is.na(PREMIER))
## [1] 0
aggregate(Goals~Position, data= PREMIER, FUN = mean)
## Position Goals
## 1 DF 0.72471910
## 2 DF,FW 0.50000000
## 3 DF,MF 0.73333333
## 4 FW 5.45679012
## 5 FW,DF 1.33333333
## 6 FW,MF 2.63829787
## 7 GK 0.02380952
## 8 MF 1.58333333
## 9 MF,DF 1.07692308
## 10 MF,FW 2.30555556
ggplot(data = PREMIER, aes(x=Position, y=Goals, color=Position)) + geom_boxplot() + theme_light()
This type of plot gives a preliminary view of asymmetries, outliers, and differences in variance. Here, the groups (player positions) do not appear to follow a symmetric distribution. The boxplot shows that FW (forwards) and MF (midfielders) tend to score more goals than DF (defenders) and GK (goalkeepers). The boxes also differ noticeably in size across levels, which suggests a lack of homoscedasticity.
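The visual impression of unequal spread can be checked with a formal test. The sketch below uses the Fligner-Killeen test from base R, which tolerates non-normal data. To stay self-contained it runs on a small simulated data frame (`toy`, hypothetical values, not the Kaggle data), and the helper name `check_variances` is illustrative; with the real data, `PREMIER` would be passed instead.

```r
# Simulated stand-in for PREMIER (hypothetical values, not from the dataset)
set.seed(1)
toy <- data.frame(
  Position = factor(rep(c("DF", "FW", "GK"), each = 30)),
  Goals    = c(rpois(30, 1), rpois(30, 5), rpois(30, 0.3))
)

# Fligner-Killeen test of equal variances across positions
check_variances <- function(df) {
  fligner.test(Goals ~ Position, data = df)
}

res <- check_variances(toy)
print(res)  # a small p-value would point to unequal variances
```

If the test rejects equal variances, the classical ANOVA p-value should be read with some caution.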
To determine whether player position has a statistically significant effect on mean goals, a one-way ANOVA is performed.
ANOVA_PREMIER = aov(PREMIER$Goals ~ PREMIER$Position)
# Model comparing goals across positions
summary(ANOVA_PREMIER)
## Df Sum Sq Mean Sq F value Pr(>F)
## PREMIER$Position 9 1503 166.95 19.74 <2e-16 ***
## Residuals 522 4414 8.46
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# p-value < 2e-16
# H0: mean goals are equal across all positions
# H1: at least one position has a different mean
# Since the p-value is below 5%, H0 is rejected in favor of H1
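The F value and p-value in the ANOVA table can be reproduced by hand from the sums of squares and degrees of freedom, which also yields a rough effect size (eta squared). The numbers below are copied from the rounded `summary()` output, so the results agree only up to rounding. The variable names are illustrative.

```r
# Values copied from the rounded summary(ANOVA_PREMIER) table
ss_between <- 1503; df_between <- 9    # PREMIER$Position row
ss_within  <- 4414; df_within  <- 522  # Residuals row

# Mean squares and F statistic
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_value    <- ms_between / ms_within

# Upper-tail probability of the F distribution
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

# Share of total variance attributed to position
eta_sq <- ss_between / (ss_between + ss_within)

print(c(F = f_value, p = p_value, eta_sq = eta_sq))
```

An eta squared of about 0.25 would mean that position accounts for roughly a quarter of the variance in goals.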
TukeyHSD(ANOVA_PREMIER)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = PREMIER$Goals ~ PREMIER$Position)
##
## $`PREMIER$Position`
## diff lwr upr p adj
## DF,FW-DF -0.224719101 -4.0599209 3.6104827 1.0000000
## DF,MF-DF 0.008614232 -2.4755938 2.4928222 1.0000000
## FW-DF 4.732071022 3.4936671 5.9704749 0.0000000
## FW,DF-DF 0.608614232 -3.2265875 4.4438160 0.9999684
## FW,MF-DF 1.913578771 0.3982835 3.4288740 0.0027488
## GK-DF -0.700909577 -2.2859547 0.8841355 0.9254601
## MF-DF 0.858614232 -0.2683914 1.9856198 0.3155168
## MF,DF-DF 0.352203976 -2.3024012 3.0068092 0.9999932
## MF,FW-DF 1.580836454 -0.1077005 3.2693734 0.0887426
## DF,MF-DF,FW 0.233333333 -4.2299384 4.6966050 1.0000000
## FW-DF,FW 4.956790123 1.0474233 8.8661570 0.0025786
## FW,DF-DF,FW 0.833333333 -4.5012967 6.1679634 0.9999724
## FW,MF-DF,FW 2.138297872 -1.8674011 6.1439969 0.7974350
## GK-DF,FW -0.476190476 -4.5087917 3.5564108 0.9999976
## MF-DF,FW 1.083333333 -2.7921855 4.9588522 0.9967841
## MF,DF-DF,FW 0.576923077 -3.9833876 5.1372338 0.9999955
## MF,FW-DF,FW 1.805555556 -2.2688354 5.8799465 0.9245222
## FW-DF,MF 4.723456790 2.1262148 7.3206988 0.0000006
## FW,DF-DF,MF 0.600000000 -3.8632717 5.0632717 0.9999924
## FW,MF-DF,MF 1.904964539 -0.8351342 4.6450633 0.4518226
## GK-DF,MF -0.709523810 -3.4888024 2.0697548 0.9984071
## MF-DF,MF 0.850000000 -1.6960093 3.3960093 0.9880777
## MF,DF-DF,MF 0.343589744 -3.1576886 3.8448681 0.9999995
## MF,FW-DF,MF 1.572222222 -1.2673515 4.4117959 0.7604040
## FW,DF-FW -4.123456790 -8.0328236 -0.2140899 0.0292671
## FW,MF-FW -2.818492251 -4.5127461 -1.1242384 0.0000081
## GK-FW -5.432980600 -7.1898939 -3.6760673 0.0000000
## MF-FW -3.873456790 -5.2315871 -2.5153265 0.0000000
## MF,DF-FW -4.379867047 -7.1405382 -1.6191959 0.0000280
## MF,FW-FW -3.151234568 -5.0020542 -1.3004149 0.0000043
## FW,MF-FW,DF 1.304964539 -2.7007345 5.3106636 0.9899904
## GK-FW,DF -1.309523810 -5.3421251 2.7230775 0.9902184
## MF-FW,DF 0.250000000 -3.6255189 4.1255189 1.0000000
## MF,DF-FW,DF -0.256410256 -4.8167210 4.3039004 1.0000000
## MF,FW-FW,DF 0.972222222 -3.1021688 5.0466132 0.9990670
## GK-FW,MF -2.614488349 -4.5764312 -0.6525455 0.0011170
## MF-FW,MF -1.054964539 -2.6695832 0.5596542 0.5453026
## MF,DF-FW,MF -1.561374795 -4.4568507 1.3341011 0.7875959
## MF,FW-FW,MF -0.332742317 -2.3792049 1.7137202 0.9999611
## MF-GK 1.559523810 -0.1207267 3.2397743 0.0950419
## MF,DF-GK 1.053113553 -1.8794670 3.9856941 0.9801910
## MF,FW-GK 2.281746032 0.1831138 4.3803783 0.0209214
## MF,DF-MF -0.506410256 -3.2189372 2.2061167 0.9998751
## MF,FW-MF 0.722222222 -1.0559878 2.5004322 0.9555482
## MF,FW-MF,DF 1.228632479 -1.7611531 4.2184181 0.9521604
plot(TukeyHSD(ANOVA_PREMIER))
# Plots the confidence interval for each pairwise difference; an interval that contains 0 means that pair of positions does not differ significantly
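With 45 pairwise comparisons, it can be easier to list only the significant pairs than to read them off the plot. The sketch below filters a `TukeyHSD()` result by adjusted p-value. It uses a simulated data frame (`toy`, hypothetical values) and the `data =` form of `aov()`, so the result element is named `Position`; for the model fitted in this report the element is named `PREMIER$Position` instead.

```r
# Simulated stand-in for PREMIER (hypothetical values, not from the dataset)
set.seed(42)
toy <- data.frame(
  Position = factor(rep(c("DF", "FW", "GK"), each = 40)),
  Goals    = c(rpois(40, 1), rpois(40, 5), rpois(40, 0.5))
)

fit <- aov(Goals ~ Position, data = toy)
tuk <- TukeyHSD(fit)

# Matrix with columns diff, lwr, upr and "p adj"; one row per pair
tab <- tuk$Position

# Keep only the pairs whose adjusted p-value is below 5%
sig <- tab[tab[, "p adj"] < 0.05, , drop = FALSE]
print(sig)
```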
The ANOVA provides statistically significant evidence to reject the hypothesis that mean goals are equal across all positions. In the Tukey comparisons, FW differs significantly from every other position at the 5% level.
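Because the boxplot hinted at unequal variances and skewed goal counts, the conclusion could be cross-checked with methods that relax those assumptions. The sketch below runs Welch's ANOVA (`oneway.test` with `var.equal = FALSE`) and the Kruskal-Wallis rank test, both from base R, on a simulated data frame (`toy`, hypothetical values, not the Kaggle data). This is a suggested robustness check, not part of the original analysis.

```r
# Simulated stand-in for PREMIER (hypothetical values, not from the dataset)
set.seed(7)
toy <- data.frame(
  Position = factor(rep(c("DF", "FW", "GK"), each = 40)),
  Goals    = c(rpois(40, 1), rpois(40, 5), rpois(40, 0.5))
)

# Welch's ANOVA: does not assume equal variances across groups
welch <- oneway.test(Goals ~ Position, data = toy, var.equal = FALSE)

# Kruskal-Wallis: rank-based, does not assume normality
kw <- kruskal.test(Goals ~ Position, data = toy)

print(welch)
print(kw)
```

If both tests agree with the classical ANOVA on the real data, the finding is less likely to be an artifact of the violated assumptions.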