Principal Component Analysis by Category

Context

We have the following data:

The players are distributed across the categories as follows:


    Chi-squared test for given probabilities

data:  tablechi
X-squared = 1.7941, df = 2, p-value = 0.4078
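
The output above is what `chisq.test` prints for a goodness-of-fit test against equal category probabilities. A minimal sketch of that call follows. The counts are illustrative: they are not taken from the study data and were chosen only so that the statistic matches the one reported.

```r
# Illustrative category counts (assumption: NOT the study data).
counts <- c(U13 = 27, U15 = 23, U17 = 18)

# Goodness-of-fit test: H0 says the three categories are equally frequent.
chi <- chisq.test(counts)

# The same statistic by hand: each category is expected to hold n / 3 players.
expected <- sum(counts) / length(counts)
stat_by_hand <- sum((counts - expected)^2 / expected)

print(chi)
print(stat_by_hand)
```

With two degrees of freedom, a statistic this small is well within what equal frequencies would produce by chance.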

A chi-squared goodness-of-fit test was performed to check whether the category frequencies differ significantly. The p-value (0.4078) is greater than 0.05, so there is no evidence that the frequencies differ between categories.

Data treatment

To avoid overlapping variables in the final plots, Kendall correlations and Shapiro-Wilk normality tests were computed to detect highly correlated variables and to check normality, respectively. This was done group by group across the three main categories (children, pre-youth and youth). The refined variables from the attack variable category are presented below.
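
The screening step described above can be sketched as follows. This is a minimal illustration on simulated data, not the study's variables; the 0.8 threshold and the choice to drop every variable involved in a highly correlated pair are assumptions about how the filter is applied.

```r
set.seed(1)
n <- 40
base <- rnorm(n)

# Simulated stand-in data: `b` is almost a copy of `a`; `c` and `d` are independent.
toy <- data.frame(
  a = base,
  b = base + rnorm(n, sd = 0.05),
  c = rnorm(n),
  d = rexp(n)
)

# Shapiro-Wilk p-value for each variable (p > 0.05: normality is not rejected).
shapiro_p <- sapply(toy, function(x) shapiro.test(x)$p.value)

# Kendall correlation matrix and the pairs whose |tau| exceeds the threshold.
threshold <- 0.8
tau <- cor(toy, method = "kendall")
high <- which(abs(tau) > threshold & upper.tri(tau), arr.ind = TRUE)

# Drop every variable that takes part in a highly correlated pair.
flagged <- unique(c(rownames(tau)[high[, "row"]], colnames(tau)[high[, "col"]]))
kept <- setdiff(names(toy), flagged)

print(round(shapiro_p, 3))
print(flagged)
print(kept)
```

Dropping both members of a pair is the stricter option; keeping one representative per pair would retain more information at the cost of a less automatic rule.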

Principal component analysis

We then carried out an analysis of the variables by category:

We performed a principal component analysis to establish the relationship between the category and the different variables.

The rotation matrix below indicates how much each original variable contributes to each principal component. The values (loadings) range from -1 to 1, and larger absolute values indicate a greater contribution of the variable to the component.

                                 PC1         PC2
Waist/hip ratio          -0.01830947 -0.03614273
Cormic index             -0.16737100 -0.16376338
Angle (°)                -0.03101572 -0.06125952
Total                     0.25190101  0.08144984
Contact time              0.40154975 -0.19937303
Average speed (km/h)_    -0.39758641  0.16465594
Leg length (cm)           0.20563338  0.42331643
Total time (s)_           0.45788518 -0.23784661
Contact time (ms)         0.21835523  0.21874129
Residual mass %           0.02312555  0.39892004
Muscle percentage        -0.01414816 -0.11115199
Bi-styloid diameter (mm)  0.11144740  0.21223402
Split 1 - 5 meters        0.20204190  0.13830400
COD deficit               0.41952789 -0.22399329
Residual mass kg          0.06792116  0.49051635
Arm span (cm)             0.23173731  0.29080781

Importance of components:
                         PC1   PC2       PC3
Standard deviation     3.175 2.433 2.398e-16
Proportion of Variance 0.630 0.370 0.000e+00
Cumulative Proportion  0.630 1.000 1.000e+00
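
The two tables above have the shape of `prcomp` output: `rotation` holds the loadings and `sdev` the standard deviations from which the proportions of variance are derived. The sketch below reproduces that shape on the built-in `iris` data, used here purely as a stand-in. It runs the PCA on three group means, which is one setting that yields a third standard deviation near zero and 100% cumulative variance in two components; whether the report's PCA was run on category means is an assumption, not something the output confirms.

```r
# Built-in iris data stand in for the study data (assumption for illustration).
group_means <- aggregate(. ~ Species, data = iris, FUN = mean)
rownames(group_means) <- group_means$Species

# 3 rows (groups) x 4 variables, each standardized to mean 0 and sd 1.
X <- scale(group_means[, -1])

pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Loadings: one unit-length column per component.
loadings <- pca$rotation

# Proportion of variance explained by each component.
prop_var <- pca$sdev^2 / sum(pca$sdev^2)

print(round(loadings[, 1:2], 3))
print(round(rbind(sdev = pca$sdev,
                  proportion = prop_var,
                  cumulative = cumsum(prop_var)), 3))
```

Three centered points span at most two dimensions, so the third component carries no variance regardless of how many variables are measured.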

Interpretation of PC1 and PC2

The first dimension, PC1, explains 63% of the total variance in the data, making it the most important component. PC2 explains the remaining 37%.

Together, PC1 and PC2 capture 100% of the variance (see the cumulative proportion); the third component has a standard deviation of about 2.4e-16, which is numerically zero. In this case the two components therefore summarize all the information in the variables analyzed.

PC1: Movement Performance or Efficiency Component

This component is dominated by variables related to physical performance, speed, and movement time. The highest loadings (in absolute value) on PC1 are:

Total time (s): 0.458 (high positive loading)

COD deficit: 0.420 (high positive loading)

Contact time: 0.402 (high positive loading)

Average speed (km/h): -0.398 (high negative loading)

Interpretation: High values of PC1 go with high values of the positively loaded variables (total time, change-of-direction deficit, contact time), that is, with longer durations and less efficient movement (more time to complete tasks, a larger deficit in changes of direction). Average speed has a negative (inverse) loading, so higher PC1 values correspond to lower average speed.

PC2: Body Size or Anthropometric Dimensions Component

This component is dominated by variables related to body morphology, such as limb length and residual mass. The highest loadings on PC2 are:

Residual mass kg: 0.491 (high positive loading)

Leg length (cm): 0.423 (high positive loading)

Residual mass %: 0.399 (high positive loading)

Arm span (cm): 0.291 (moderate positive loading)

Interpretation: The positive loadings on leg length, residual mass (in kg and as a percentage) and arm span suggest that PC2 represents body size or anthropometric dimensions. High PC2 values are associated with players who have longer legs, greater residual mass and a larger arm span.
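
The ranking of loadings used in this interpretation can be checked directly from the rotation matrix. The sketch below copies the PC1 column of the table into a named vector and sorts it by absolute value; the variable names are the ones printed in the table, and nothing else is assumed.

```r
# PC1 loadings copied from the rotation matrix shown in the report.
pc1 <- c(
  "Waist/hip ratio"          = -0.01830947,
  "Cormic index"             = -0.16737100,
  "Angle"                    = -0.03101572,
  "Total"                    =  0.25190101,
  "Contact time"             =  0.40154975,
  "Average speed (km/h)_"    = -0.39758641,
  "Leg length (cm)"          =  0.20563338,
  "Total time (s)_"          =  0.45788518,
  "Contact time (ms)"        =  0.21835523,
  "Residual mass %"          =  0.02312555,
  "Muscle percentage"        = -0.01414816,
  "Bi-styloid diameter (mm)" =  0.11144740,
  "Split 1 - 5 meters"       =  0.20204190,
  "COD deficit"              =  0.41952789,
  "Residual mass kg"         =  0.06792116,
  "Arm span (cm)"            =  0.23173731
)

# Four largest loadings in absolute value, keeping their sign.
top4 <- pc1[order(abs(pc1), decreasing = TRUE)][1:4]
print(round(top4, 3))

# A loading vector has unit length, so the squared loadings sum to about 1.
print(sum(pc1^2))
```

Because the squared loadings sum to one, each squared loading can be read as the share of the component's direction attributable to that variable.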

The scree plot shows that the first component accounts for 63% of the variance and the second for 37%.

The PCA by category shows that the U13 category has lower scores on the anthropometric axis (Y) and shorter times on the performance axis (X). In the U17 category, the scores are generally higher on the anthropometric axis and the times are longer (indicating lower speed). Finally, the U15 category records the highest anthropometric levels, accompanied by longer times and, consequently, lower speed on these variables.
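
A comparison of categories on the two axes amounts to comparing the average component scores of each group. A minimal sketch follows, with the built-in `iris` data and its `Species` factor standing in for the players and their category (an assumption for illustration only).

```r
# Built-in iris data stand in for the study data (assumption for illustration).
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Scores of each observation on the first two components, plus its group label.
scores <- as.data.frame(pca$x[, 1:2])
scores$group <- iris$Species

# Group centroids: mean PC1 and PC2 score per group.
centroids <- aggregate(cbind(PC1, PC2) ~ group, data = scores, FUN = mean)
print(centroids)
```

Reading a centroid requires the sign of the loadings: on an axis where time variables load positively, a higher mean score means slower performance.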

The biplot indicates that higher values of the Cormic index are associated with longer times, and the U17 category shows the same pattern. The best-performing category here is U13, whose times are short despite a high anthropometric level. The U15 category has the lowest anthropometric levels, yet better times than the U17 category.

Axes and explained variance

Dim1 (X-axis): explains 22.5% of the total variance

Dim2 (Y-axis): explains 13.2% of the total variance

Total: both axes together explain 35.7% of the total variance

Arrows/vectors (variables)

Each arrow represents a variable in the analysis. Its position and direction indicate:

Similar directions: variables pointing in similar directions are positively correlated

Opposite directions: variables with arrows pointing in opposite directions are negatively correlated

90° angle: variables are approximately uncorrelated

Arrow length

Long arrows: variables well represented in this 2D plane

Short arrows: variables poorly represented in these two dimensions

Colors (cos², quality of representation)

The color scale indicates how well each variable is represented on this plane.

Practical interpretation

Relationships between variables: variables that form compact groups of nearby arrows are highly correlated

Variables in opposite quadrants are negatively correlated

Perpendicular variables are approximately unrelated

These readings are reliable only for variables with long arrows, since short arrows are poorly represented on the plane.

Interpretation example

If “weight” and “height” variables appear together in the upper right quadrant, they are positively correlated.

If “speed” variables appear in the opposite quadrant, they are negatively correlated with weight and height.

Limitations

The plot represents only 35.7% of the total variance, so important information may lie in other dimensions that are not shown.
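
The quantities read off a variable plot can be computed by hand from a standardized PCA. The sketch below uses the built-in `USArrests` data as a stand-in (an assumption for illustration) and base R only: the arrow coordinates are the correlations between each variable and each component, and cos² is their square.

```r
# Built-in USArrests data stand in for the study data (assumption for illustration).
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Percentage of variance explained by each axis.
explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Arrow coordinates: loading times component standard deviation,
# which equals the correlation between the variable and the component.
var_coord <- sweep(pca$rotation, 2, pca$sdev, `*`)

# cos2: share of each variable's variance captured by each component.
cos2 <- var_coord^2

# Quality of representation on the Dim1-Dim2 plane (arrow length squared).
quality_2d <- rowSums(cos2[, 1:2])

print(round(explained, 1))
print(round(var_coord[, 1:2], 3))
print(round(quality_2d, 3))
```

A variable with `quality_2d` close to 1 has an arrow that nearly reaches the unit circle, and only for such variables do the angles between arrows approximate their correlations.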


Descriptive analysis

An ANOVA was performed for each variable to compare the three categories, and the effect size was measured with eta squared ($\eta^{2}$).
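
For a one-way ANOVA, eta squared is the between-group sum of squares divided by the total sum of squares. A minimal base-R sketch follows, with the built-in `iris` data and its three species standing in for one variable measured across the three categories (an assumption for illustration only).

```r
# Built-in iris data stand in for the study data (assumption for illustration).
fit <- aov(Sepal.Length ~ Species, data = iris)

# Sums of squares from the ANOVA table: between groups first, then residual.
ss <- summary(fit)[[1]][["Sum Sq"]]

# Eta squared: share of total variability explained by group membership.
eta2 <- ss[1] / sum(ss)

print(summary(fit))
print(eta2)
```

In the one-way case this value coincides with the R² of the equivalent linear model, which is a convenient cross-check.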

The results of the tests performed on the female soccer players, by category, are presented as mean and standard deviation (SD). Normality was assessed with the Shapiro-Wilk test, which indicated that the data did not follow a normal distribution. Differences between the categories were analyzed using ANOVA, with statistical significance set at p < 0.05 (marked *). Effect sizes were calculated with the eta squared coefficient. Interpretation of $\eta^{2}$: a value close to 0 indicates that the category explains almost none of the variability in the variable (negligible differences between groups); a value close to 1 indicates that the categories differ strongly on that variable.

To identify the profile of each variable across the three categories, principal component analysis (PCA) was then used. Variables were centered and scaled (Z-scores). The determinant of the Kendall correlation matrix was used to assess the suitability of the data for PCA; it was close to 0, indicating high multicollinearity, that is, strong linear relationships among the variables, with most of the variability concentrated in the first two dimensions. Components with eigenvalues > 1 were considered for extraction. A Varimax orthogonal rotation was applied to identify the variables most strongly correlated with each component and to ensure that each principal component provided distinct information. A threshold of 0.5 was used for interpreting the loadings of each PC. The values assigned to each observation, for all players and the 90 quantitative variables, are attached. All analyses were conducted in RStudio with R version 4.5.0 (2025-04-11 ucrt).
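
The two diagnostics mentioned above, the determinant of the Kendall correlation matrix and the eigenvalue > 1 rule, can be sketched as follows. The data are simulated, with one nearly duplicated variable to create multicollinearity; none of the names or values come from the study.

```r
set.seed(42)
n <- 60
z <- rnorm(n)

# Simulated stand-in data: v2 is almost a copy of v1; v3 and v4 are independent.
X <- data.frame(
  v1 = z,
  v2 = z + rnorm(n, sd = 0.05),
  v3 = rnorm(n),
  v4 = rnorm(n)
)

# Kendall correlation matrix and its determinant.
# A determinant near 0 signals strong dependence among the variables;
# a determinant near 1 signals variables that are close to uncorrelated.
tau <- cor(X, method = "kendall")
det_all <- det(tau)
det_indep <- det(cor(X[, c("v3", "v4")], method = "kendall"))

# Eigenvalues of the correlation matrix and the eigenvalue > 1 rule.
ev <- eigen(tau, symmetric = TRUE)$values
n_keep <- sum(ev > 1)

print(c(det_all = det_all, det_indep = det_indep))
print(round(ev, 3))
print(n_keep)
```

The eigenvalues of a correlation matrix sum to the number of variables, so the rule keeps the components that explain more than a single standardized variable would.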

--- title: "Análisis de Componentes Principales por categoria" format: html: self-contained: true code-tools: true --- ## Contexto ```{r,include=FALSE} ###################### PRIMERO CARGA LOS PAQUETES #install.packages("GGally") library(pheatmap) # Calcular tamaño del efecto (eta cuadrado) library(effectsize) library(readxl) library(DT) ###################### PRIMERO CARGA LOS PAQUETES if (!require("pacman")) install.packages("pacman") library(FactoMineR) library(gplots) library(RColorBrewer) library(corrplot) library(dplyr) library(forcats) # Para manejar factores # cargar paquetes necesarios pacman::p_load( FactoMineR, tidyverse, factoextra, haven, naniar, corrplot, readr, gridExtra, incidence, easypackages, readxl, dplyr, gplots, kableExtra, apyramid, janitor, flextable, lubridate, stringr, rio, bench, sf, cleaner, DT, leaflet, leaflet.extras, esquisse, tseries, forecast, skimr, tsibble, epicontacts, distcrete, epitrix, EpiEstim, projections, magrittr, binom, ape, outbreaker2, knitr, broom, ggridges, scales ) library(writexl) library(GGally) library(ggforce) # Para agregar elipses y círculos library(pheatmap) library(qgraph) #install.packages("kableExtra") #Instalar los paquetes #if (!require("pacman")) install.packages("pacman"); pacman::p_load(effsize, gplots, naniar, corrplot, readr, gridExtra, rcompanion, flextable, incidence, easypackages, FactoMineR, tidyverse, factoextra, haven, janitor, readxl, dplyr, kableExtra, apyramid, ggplot2, tidyr, lubridate, stringr, rio, bench, sf, cleaner, DT, leaflet, leaflet.extras, esquisse, tseries, forecast, skimr, tsibble, epicontacts, distcrete, epitrix, EpiEstim, projections, magrittr, binom, ape, outbreaker2, knitr, broom, ggridges, scales, psych, plotly, DescTools, effectsize) #if (!require("pacman")) install.packages("pacman"); pacman::p_load(caret,qgraph,webshot) library(effsize) #install.packages("effsize") #install.packages("gplots") #install.packages(c( "naniar", "corrplot")) #install.packages("readr") 
library(readr) library(gridExtra) # Instalar el paquete rcompanion si no está instalado if (!requireNamespace("rcompanion", quietly = TRUE)) { install.packages("rcompanion") } # Cargar el paquete library(rcompanion) # Mostrar la tabla con flextable library(flextable) library(incidence) #core functions library(easypackages)#recargar varias librerias en una sola linea paquetes <- c("FactoMineR", "tidyverse", "factoextra", "haven", "naniar", "corrplot") libraries(paquetes) #Sys.setlocale("LC_ALL", "en_US.UTF-8") #Sys.setenv(LANG = "spa") library(janitor)#para los comandos tably library(readxl) library(dplyr) library(gplots) library(FactoMineR)#GRAFICAR ACM library(factoextra) library(kableExtra)#PARA DECORAR LAS TABLAS library(apyramid)#PARA REALIZAR PIRAMIDE library(ggplot2)#PARA DIBUJAR library(readxl)#PARA LEER EXCEL library(dplyr) # manipulacion de dato library(janitor)#para los comandos tably library(factoextra) library(gplots)#graficos #if (!require("pacman")) install.packages("pacman") # cargar paquetes necesarios pacman::p_load( tidyr, dplyr, lubridate, stringr, rio, bench, janitor, # reclin2, sf, readr, cleaner, #plotly, DT, leaflet, leaflet.extras, esquisse., tseries, tidyverse, janitor, forecast, skimr, tsibble) #install.packages("epicontacts") library(epicontacts) #install.packages("distcrete") library(distcrete) #install.packages("epitrix") library(epitrix) #install.packages("EpiEstim") library(EpiEstim) library(projections) library(magrittr) library(binom) library(ape) library(outbreaker2) library(knitr) library(broom) library(ggridges) library(scales) #install.packages("ggridges") library(dplyr) library(tidyr) library(psych) library(PerformanceAnalytics) library(corrplot) library(plotly) library(flextable) library(dplyr) #install.packages(c("dplyr", "tidyverse", "readxl", "DescTools", "effectsize")) library(dplyr) library(readxl) library(DescTools) library(effectsize) library(readxl) library(dplyr) ``` Tenemos los siguientes datos: ```{r, include=FALSE, 
message=FALSE, warning=FALSE} Player_futbol <- read_excel("D:/nocturno/temporales/estadistica del deporte/futbol femenino/RESULTADOS_FINALES_SELECCION_BOGOTA_FEMENINA_DE_FUTBOL_(2)_l_Consolid_en.xlsx") #View(Player_performance_5_ligas) #colnames(Player_performance_5_ligas) #head(Player_performance_5_ligas[1]) #Factorizamos cada variable Player_futbol_ <- Player_futbol %>% mutate( # Conversión de tipos de datos across(1:4, as.factor),across(5, as.Date.numeric),across(6:96, as.numeric)) Player_futbol_$Category<-factor(Player_futbol_$Category,levels=levels(Player_futbol_$Category),labels = c("U13","U15","U17")) ``` ```{r, echo=FALSE, message=FALSE, warning=FALSE} datatable( Player_futbol_, rownames = FALSE, extensions = c("Buttons", "Scroller"), options = list( scrollX = TRUE, scrollY = "500px", scroller = TRUE, dom = "Bfrtip", buttons = c("copy", "csv", "excel") ) ) ``` para las posiciones tenemos la siguiente distribucion: ```{r, echo=FALSE} calc_ci <- function(count, total, conf.level = 0.95) { prop <- count / total z <- qnorm(1 - (1 - conf.level) / 2) se <- sqrt(prop * (1 - prop) / total) lower <- prop - z * se upper <- prop + z * se return(c(percent(lower, accuracy = 0.01), percent(upper, accuracy = 0.01))) } # Crear la tabla de frecuencias y formatear los porcentajes table <- Player_futbol_ %>% tabyl(Category)%>%adorn_pct_formatting() # Asignar nombres uniformes a las columnas colnames(table) <- c("Category", "Frec", "Percent") # Calcular intervalos de confianza para las proporciones total <- sum(table$Frec) ci <- t(apply(table[, 2, drop = FALSE], 1, function(x) calc_ci(x, total))) colnames(ci) <- c("CI_Lower", "CI_Upper") table <- cbind(table, ci) datatable( table, rownames = FALSE, extensions = c("Buttons", "Scroller"), options = list( scrollX = TRUE, scrollY = "500px", scroller = TRUE, dom = "Bfrtip", buttons = c("copy", "csv", "excel") ) ) ggplot(table, aes(x = Category, y = Frec, fill = Category)) + geom_bar(stat = "identity") + # Mostrar eje y como 
porcentajes labs(title = "Distribution of Categorys", x = "Category", y = "Frecuency") + theme_minimal() + theme(legend.position = "none", axis.text.x = element_text(angle = 45, hjust = 1)) tablechi <- table(Player_futbol_[, 2]) # Prueba chi cuadrado chi <- chisq.test(tablechi) chi ``` Se realizo prueba chi cuadrado para verificar diferencias significativas en las frecuencias de categorias, con un valor p mayorr que 0.05, lo que indica no que existen diferencias. A chi-square test was performed to verify significant differences in category frequencies, resulting in a p-value greater than 0.05, which indicates that no significant differences were found. ## Tratamiento de datos (A fin de evitar superposiciones en los graficos finales, se realizaron los procesos de correlacion de kendall y pruebas de normalidad Shapiro wilk para determinar alta correlacion y normalidad respectivamente, realizandolo grupo a gupo con tres categorias principales, categoria infantil, prejuvenil y juvenil, de la categoria de variables de ataque, se presentan a continuacion las variables depuradas In order to avoid overlapping in the final graphs, Kendall correlation and Shapiro-Wilk normality tests were performed to determine high correlation and normality respectively, conducting them group by group across three main categories: children, pre-youth, and youth categories. The refined variables from the attack variable category are presented below. 
```{r, echo=FALSE, message=FALSE} resultados_variables_list<-list() tabla_final_lista <-list() variables <- c("name", "Category", "Sex", "Somatotype", "Date of birth", "Body mass (kg)...6", "Height (cm)", "Sitting height (cm)", "Cormic index", "Arm span (cm)", "Triceps skinfold (mm)", "Subscapular skinfold (mm)", "Biceps skinfold (mm)", "Iliac crest skinfold (mm)", "Supraspinal skinfold (mm)", "Abdominal skinfold (mm)", "Thigh skinfold (mm)", "Calf skinfold (mm)", "Biceps circumference (cm)", "Contracted biceps circumference (cm)", "Waist circumference (cm)", "Hip circumference (cm)", "Thigh circumference (cm)", "Calf circumference (cm)", "Humerus diameter (mm)", "Bi-styloid diameter (mm)", "Femur diameter (mm)", "Body mass (kg)", "BMI", "Fat percentage", "Muscle percentage", "Basal kcal", "Biological age", "Visceral fat", "Lever (cm)", "90Â° squat (cm)", "Leg length (cm)", "Muscle mass %", "Muscle mass kg", "Bone mass %", "Bone mass kg", "Residual mass %", "Residual mass kg", "Adipose mass %", "Adipose mass kg", "Faulkner fat mass %", "Faulkner fat mass kg", "Carter fat mass %", "Carter fat mass kg", "Waist/hip ratio", "Waist/height ratio", "Jump height (cm)", "Flight time (ms)", "Force (N)", "Velocity (m/s)", "Power (W)", "Jump height (cm)", "Flight time (ms)", "Force (N)", "Velocity (m/s)", "Power (W)", "Jump height (cm)", "Flight time (ms)", "Force (N)", "Velocity (m/s)", "Power (W)", "Split 1 - 5 meters", "Split 2 - 10 meters", "Split 3 - 15 meters", "Total", "Total time (s)", "Average speed (km/h)", "Contact time", "10 m", "COD deficit", "Total time (s)", "Average speed (km/h)", "Contact time (ms)", "Torque (Nm)", "Angle (Â°)", "Time 1 (s)", "Power 1 (W)", "Time 2 (s)", "Power 2 (W)", "Time 3 (s)", "Power 3 (W)", "Time 4 (s)", "Power 4 (W)", "Time 5 (s)", "Power 5 (W)", "Time 6 (s)", "Power 6 (W)", "Maximum power (W)", "Minimum power (W)", "Average power (W)", "Fatigue index") basico <- c("name", "Category", "Sex", "Date of birth") antropometry <- 
c(variables[4],variables[6:51]) antropometry[2]<-c("Body mass (kg)_antr") #antropometry[47] #variables[52] Force_jum_Sj<- variables[52:56] Force_jum_Sj<-paste(Force_jum_Sj,rep("sJ",length(Force_jum_Sj))) #Force_jum_Sj<-paste(Force_jum_Sj,rep("js",length(Force_jum_Sj))) #<-c("Flight time (ms) j_s") Force_jum_CMJB<- variables[57:66] Force_jum_CMJB<- paste(Force_jum_CMJB,rep("CMJB",length(Force_jum_CMJB))) Force_jum_CMJB[1]<-c("Jump height (cm) CMJB_") Force_jum_CMJB[2]<-c("Flight time (ms) CMJB_") Force_jum_CMJB[3]<-c("Force (N) CMJB_") #Force_jum_CMJB[4]<-c("Velocity (m/s) CMJB_") Force_jum_CMJB[9]<-c("Velocity (m/s) CMJB_") Force_jum_CMJB[10]<-c("Power (W) CMJB_") velocity<- variables[67:70] test_5_0_5<-variables[71:75] test_5_0_5[1]<-c("Total time (s)_") test_5_0_5[2]<-c("Average speed (km/h)_") test_change<-variables[76:78] HAMSTRING_STRENGTH<-variables[79:80] Potencia<-variables[81:96] variablesnames<-c(basico[1:3],antropometry[1],basico[4],antropometry[2:47],Force_jum_Sj,Force_jum_CMJB,velocity,test_5_0_5,test_change,HAMSTRING_STRENGTH,Potencia) colnames(Player_futbol_)<-variablesnames #colnames(Player_futbol_) #Force_jum_CMJB # Crear una lista con las categorías lista_categorias <- list( basico,antropometry[2:47],Force_jum_Sj,Force_jum_CMJB,velocity,test_5_0_5,test_change,HAMSTRING_STRENGTH,Potencia) guardar_datos_baja_cor_diferenciados<-list() num_categoria<-2 for (num_categoria in 2:9) { sublista_ <- lista_categorias[[num_categoria]] # Usamos [[1]] para extraer el elemento, no [1] que devuelve una lista #sublista_ # Seleccionar elementos según los números #columnas_deseadas <- c(lista_categorias[1][num_categoria:length(Estadisticas_partidos)],lista_categorias$disparo[num_categoria+1:(length(Estadisticas_disparo)-1)]) columnas_deseadas <- c(sublista_) #which(colnames(Player_futbol_) %in% columnas_deseadas) #lista_categorias[1] #columnas_deseadas #colnames(Player_performance_5_ligas) #indices_columnas # Obtener los índices de las columnas indices_columnas <- 
which(colnames(Player_futbol_) %in% columnas_deseadas) #indices_columnas # Extraer las columnas usando los índices #analisis_pca1<-Player_performance_5_ligas_[, indices_columnas] #anexo_mas################################################################ analisis_pca1_2<-cbind.data.frame(Player_futbol_[2],Player_futbol_[6:96]) analisis_pca1_2<-cbind.data.frame(Player_futbol_[2],Player_futbol_[indices_columnas]) # Cargar librerías necesarias library(dplyr) # Asumimos que analisis_pca es tu base de datos # Seleccionamos solo las columnas numéricas (excluimos Pos si es categórica) numeric_vars <- analisis_pca1_2 %>% select_if(is.numeric) #colnames(numeric_vars) # Realizamos la prueba de Shapiro-Wilk para cada variable numérica normality_tests <- sapply(numeric_vars, function(x) { if (sum(!is.na(x)) > 3) { # Shapiro-Wilk requiere al menos 4 observaciones no NA shapiro.test(x)$p.value } else { NA # Si hay menos de 4 observaciones, devolvemos NA } }) #normality_tests # Creamos un dataframe con los resultados de normalidad normality_results <- data.frame( Variable = names(numeric_vars), P_Valor = normality_tests, Cumple_Normalidad = ifelse(normality_tests > 0.05, "Cumple", "No Cumple"), stringsAsFactors = FALSE ) # Mostramos los resultados de normalidad con la interpretación #print(normality_results) # Separamos las variables en dos subbases # Variables que cumplen el supuesto de normalidad vars_cumplen <- normality_results %>% filter(Cumple_Normalidad == "Cumple") %>% pull(Variable) #vars_cumplen # Variables que NO cumplen el supuesto de normalidad vars_no_cumplen <- normality_results %>% filter(Cumple_Normalidad == "No Cumple") %>% pull(Variable) #vars_no_cumplen # Creamos las subbases # Subbase con variables que cumplen el supuesto (incluye la columna Pos si existe) if (length(vars_cumplen) > 0) { analisis_pca_cumple <- analisis_pca1_2 %>% select(any_of(c("Category", vars_cumplen))) } else { analisis_pca_cumple <- data.frame(Category = analisis_pca1_2$Category) } # 
Subbase con variables que NO cumplen el supuesto (incluye la columna Pos si existe) if (length(vars_no_cumplen) > 0) { analisis_pca_no_cumple <- analisis_pca1_2 %>% select(any_of(c("Cat", vars_no_cumplen))) } else { analisis_pca_no_cumple <- data.frame(Cat = analisis_pca1_2$Category) # Si no hay variables, solo mantenemos Pos } # Mostramos las subbases #print("Subbase con variables que cumplen el supuesto de normalidad:") #print(head(analisis_pca_cumple)) #print("Subbase con variables que NO cumplen el supuesto de normalidad:") #print(head(analisis_pca_no_cumple)) #analisis_pca1_2[,2:length(colnames(analisis_pca1_2))] data_corr<-analisis_pca1_2[,2:length(colnames(analisis_pca1_2))] # O eliminar filas con NA antes de calcular data_complete <- na.omit(data_corr) #cor(data_complete, method = "kendall") matriz_cor <- cor(data_complete, method = "kendall") #matriz_cor[1:2,1:3] # Crear datos simulados set.seed(123) # Aplicar el código anterior umbral <- 0.8 library(tidyverse) cor_df <- as.data.frame(as.table(matriz_cor)) %>% filter(Var1 != Var2) %>% mutate(abs_cor = abs(Freq)) %>% filter(abs_cor > umbral) %>% select(Var1, Var2, abs_cor) vars_alta_cor <- unique(c(as.character(cor_df$Var1), as.character(cor_df$Var2))) vars_alta_cor datos_baja_cor1 <- analisis_pca1_2[, !colnames(analisis_pca1_2) %in% vars_alta_cor] # Grupo de baja guardar_datos_baja_cor_diferenciados[[num_categoria]]<-datos_baja_cor1[2:length(colnames(datos_baja_cor1))] tabla_final_lista <- guardar_datos_baja_cor_diferenciados[[num_categoria]] } combined_table <- do.call(cbind, guardar_datos_baja_cor_diferenciados[2:9]) data_corr_f<-combined_table[,2:length(colnames(combined_table))] data_corr_f <- na.omit(data_corr_f) matriz_cor_f <- cor(data_corr_f, method = "kendall") # Crear datos simulados set.seed(123) umbral <- 0.8 library(tidyverse) cor_df_f <- as.data.frame(as.table(matriz_cor_f)) %>% filter(Var1 != Var2) %>% mutate(abs_cor = abs(Freq)) %>% filter(abs_cor > umbral) %>% select(Var1, Var2, abs_cor) 
vars_alta_cor_f <- unique(c(as.character(cor_df_f$Var1), as.character(cor_df_f$Var2)))
#vars_alta_cor_f
# Low-correlation group.
# NOTE: this filters on vars_alta_cor (left over from the last loop iteration),
# not on vars_alta_cor_f computed just above; confirm which one is intended.
datos_baja_cor1 <- combined_table[, !colnames(combined_table) %in% vars_alta_cor]
#length(colnames(datos_baja_cor1))

# Map each variable that remains in datos_baja_cor1 to its test category
#colnames(datos_baja_cor1)
#var_numero<-63
#colnames(datos_baja_cor1)[var_numero]
#lista_categorias#[[9]][1]
#var<-colnames(datos_baja_cor1)[var_numero]
#categoria<-1
variables_categorias_df <- data.frame(Variable = character(), Categoria = character(), stringsAsFactors = FALSE)
for (var in colnames(datos_baja_cor1)) {
  for (categoria_index in 1:length(lista_categorias)) {
    if (var %in% lista_categorias[[categoria_index]]) {
      variables_categorias_df <- rbind(
        variables_categorias_df,
        data.frame(Variable = var, Categoria = as.character(categoria_index), stringsAsFactors = FALSE)
      )
      break
    }
  }
}

# Sum of absolute correlations of each variable in matriz_cor_f
cor_sums <- rowSums(abs(matriz_cor_f), na.rm = TRUE)
# Data frame with each variable and its correlation score
cor_importance <- data.frame(
  Variable = names(cor_sums),
  Cor_Sum = cor_sums,
  stringsAsFactors = FALSE
)
variables_importance <- cor_importance %>%
  filter(Variable %in% variables_categorias_df$Variable) %>%
  arrange(Cor_Sum) %>%   # ascending: lowest total correlation first
  slice_head(n = 20)     # keep the first 20
variables_importance <- variables_importance %>%
  filter(Variable != "Split 2 - 10 meters" & Variable != "10 m" &
           Variable != "Split 3 - 15 meters" & Variable != "90Â° squat (cm)")
analisis_pca1_gen <- cbind.data.frame(Player_futbol_[2], Player_futbol_[6:length(colnames(Player_futbol_))])

pheatmap(
  matriz_cor_f,
  cluster_rows = FALSE,     # no clustering/dendrogram on rows
  cluster_cols = FALSE,
  fontsize_row = 4,         # row label size
  fontsize_col = 4,         # column label size
  display_numbers = FALSE,  # do not print the correlation values
  fontsize_number = 2,
  number_format = "%.2f",   # two decimals
  number_color = "black",
  na_col = "white",         # color for NA cells
  main = "General correlation matrix"
)
datatable(variables_importance, caption = "Importance Variables")
```

## Principal Component Analysis

We then run an analysis of the variables by category:

```{r, echo=FALSE}
analisis_pca0 <- cbind.data.frame(Player_futbol_[2], Player_futbol_[6:length(colnames(Player_futbol_))])
#colnames(analisis_pca0)
columnas_deseadas <- c(unique(variables_importance$Variable))
# Indices of the selected columns
indices_columnas <- which(colnames(Player_futbol_) %in% columnas_deseadas)
#indices_columnas
# Category column plus the selected low-correlation variables
analisis_pca <- cbind(analisis_pca0[1], datos_baja_cor1[, c(variables_importance$Variable)])
#analisis_pca[2:11]
analisis_pca <- na.omit(analisis_pca)
result <- PCA(analisis_pca[2:11], graph = FALSE, scale.unit = FALSE)
#result$var
#plot(result,choix="var")
analisis_pca_M <- as.matrix(analisis_pca[2:length(colnames(analisis_pca))])

# Standardize every numeric column (z-scores)
brand.sc <- analisis_pca
brand.sc[, 2:length(colnames(analisis_pca))] <- data.frame(scale(analisis_pca[, 2:length(colnames(analisis_pca))]))
#summary(brand.sc)
corrplot::corrplot(cor(brand.sc[, 2:length(colnames(analisis_pca))], method = "kendall"), order = "hclust")

# Mean z-score of each variable by category
brand.mean <- aggregate(.
  ~ Category, data = brand.sc, mean)
#brand.mean
rownames(brand.mean) <- brand.mean[, 1]  # category as row name
brand.mean <- brand.mean[, -1]           # drop the category column
#brand.mean
par(mar = c(1, 1, 1, 1))
pheatmap(
  as.matrix(brand.mean),
  cluster_rows = FALSE,    # no clustering/dendrogram on rows
  cluster_cols = FALSE,
  fontsize_row = 4,        # row label size
  fontsize_col = 4,        # column label size
  display_numbers = TRUE,  # print the mean z-score in each cell
  fontsize_number = 2,
  number_format = "%.2f",  # two decimals
  number_color = "black",
  na_col = "white",        # color for NA cells
  main = "Category means (z-scores)"
)
```

We performed a principal component analysis to establish the relationship between the category and the different variables.

The rotation matrix indicates how much each original variable contributes to each principal component. The values (loadings) range from -1 to 1, and larger absolute values indicate a greater contribution of the variable to the component.

```{r, echo=FALSE, include=FALSE}
# Standardize the numeric variables
data_pca_scaled <- scale(select(analisis_pca, -Category))
# Run the PCA (the data are already standardized, so scaling again has no effect)
pca_result <- prcomp(data_pca_scaled, center = TRUE, scale.
  = TRUE)
# Rotation (loadings) table
pca_result$rotation[, 1:2]
datatable(
  pca_result$rotation[, 1:2],
  rownames = FALSE,
  extensions = c("Buttons", "Scroller"),
  options = list(
    scrollX = TRUE,
    scrollY = "500px",
    scroller = TRUE,
    dom = "Bfrtip",
    buttons = c("copy", "csv", "excel")
  )
)

# Player-level PCA on the standardized data, with its scree plot
brand.pc <- prcomp(brand.sc[, 2:length(colnames(analisis_pca))])
#summary(brand.pc)
par(mar = c(1, 1, 1, 1))
plot(brand.pc, type = "l")

# Category-level PCA on the mean z-scores
#brand.mean<-brand.mean[1:10,1:10]
brand.mu.pc <- prcomp(brand.mean, scale. = TRUE)
#summary(brand.mu.pc)
#brand.mu.pc$center
par(mar = c(1, 1, 1, 1))
biplot(brand.mu.pc, main = "Player category", cex = c(0.45, 0.45))

# PCA scores as a data frame for plotting
pca_df <- as.data.frame(pca_result$x)
#pca_df
pca_df$cat <- analisis_pca$Category  # add each player's category

# Scatter plot of the first two components with ggplot2
library(ggplot2)
pca_pos <- ggplot(pca_df, aes(x = PC1, y = PC2, color = cat)) +
  geom_point(alpha = 0.7) +
  theme_minimal() +
  labs(title = "PCA for category",
       x = "Principal Component 1 (Times)",
       y = "Principal Component 2 (Anthropometric)")
pca_pos

# Notes carried over from the analysis by playing position:
# Goalkeepers tend to be more spread out towards positive PC1 (values between 5 and 10),
# which indicates characteristics that set them clearly apart from other positions
# on the variables contributing to PC1.
# Forwards (FW, green) are also spread towards positive PC1, but less so than goalkeepers.
# This suggests they share some characteristics with goalkeepers on PC1 but are less extreme.

# Variable plot to see the contribution of each variable
#par(mar=c(1,1,1,1))
#biplot(pca_result, scale = 0)
fviz_pca_var(pca_result, col.var = "cos2", geom.var = "arrow", labelsize = 2, repel = FALSE)
par(mar = c(1, 1, 1, 1))
#biplot(pca_result, scale = 0, cex = 0.5, col = c("dodgerblue3", "deeppink3"))
```

```{r, echo=FALSE}
pca_result$rotation[, 1:2]
summary(brand.mu.pc)
```

Interpretation of PC1 and PC2

According to `summary(brand.mu.pc)`, the first dimension, PC1, explains 63% of the total variance, making it the most important component, and PC2 explains 37%. Together, PC1 and PC2 capture 100% of the variance (cumulative proportion). This summary comes from the PCA fitted on the three category means, where at most two components can have non-zero variance, so the 100% figure describes the category means rather than the player-level data. The loadings discussed next are the ones printed from `pca_result$rotation`, the player-level PCA.

PC1: Movement Performance or Efficiency Component. This component is dominated by variables related to physical performance, speed, and movement time. The highest loadings (in absolute value) on PC1 are:

- Total time (s): 0.458 (high positive loading)
- COD deficit: 0.420 (high positive loading)
- Contact time: 0.402 (high positive loading)
- Average speed (km/h): -0.398 (high negative loading)

High PC1 scores go with high values of the positively loaded variables (total time, change-of-direction deficit, contact time), that is, with less efficient movement: more time to complete tasks and a larger deficit in direction changes. Average speed has a negative loading, so higher PC1 scores correspond to lower average speed.

PC2: Body Size or Anthropometric Dimensions Component. This component is dominated by variables related to body morphology, such as limb length and residual mass. The highest loadings on PC2 are:

- Residual mass kg: 0.491 (high positive loading)
- Leg length (cm): 0.423 (high positive loading)
- Residual mass %: 0.399 (high positive loading)
- Arm span (cm): 0.291 (moderate positive loading)

The positive loadings on leg length, residual mass (in kg and as a percentage), and arm span suggest that PC2 represents body size or anthropometric dimensions: high PC2 scores are associated with players who have longer legs, greater residual mass, and a larger arm span.

```{r, echo=FALSE}
par(mar = c(1, 1, 1, 1))
plot(brand.pc, type = "l")
```

The scree plot above is drawn from `brand.pc`, the player-level PCA. The shares of 63% for the first component and 37% for the second are the ones reported by `summary(brand.mu.pc)` for the category means.

```{r, echo=FALSE}
pca_pos
```

The PCA by category shows that the U13 category has lower scores on the anthropometric axis (Y) and shorter times on the performance axis (X). In the U17 category, scores are generally higher on the anthropometric axis and times are longer (indicating lower speed). Finally, the U15 category records the highest anthropometric levels, accompanied by longer times and consequently lower speed on these variables.
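The 100% cumulative variance quoted above for PC1 and PC2 can be reproduced on any data set with three groups. The sketch below uses simulated data (the variable names `var1` to `var5` and the group shifts are assumptions, not the study data) to illustrate that a PCA fitted on three group means leaves no variance for a third component.

```r
# Simulated example: 3 categories, 5 numeric variables (NOT the study data)
set.seed(1)
n_per_group <- 20
category <- factor(rep(c("U13", "U15", "U17"), each = n_per_group))
x <- matrix(rnorm(3 * n_per_group * 5), ncol = 5)
colnames(x) <- paste0("var", 1:5)
# Shift every variable by category so that the group means differ
x <- x + as.integer(category) * 0.5

# One row of means per category, as in aggregate(. ~ Category, ...)
group_means <- aggregate(x, by = list(Category = category), FUN = mean)
rownames(group_means) <- group_means$Category
group_means$Category <- NULL

# PCA on the 3 x 5 matrix of means
pca_means <- prcomp(group_means, scale. = TRUE)
var_share <- pca_means$sdev^2 / sum(pca_means$sdev^2)

# Three centered rows span at most two dimensions,
# so the first two components carry all the variance
round(cumsum(var_share), 3)
```

Because the result holds for any three groups, the 63%/37% split says how the category means are arranged relative to each other, not how much of the between-player variability two components capture.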
```{r, echo=FALSE}
biplot(brand.mu.pc, main = "Player category", cex = c(0.5, 0.3))
```

The biplot indicates that higher values of the Cormic index are associated with longer times. The U17 category also shows this behavior. The category with the best performance in this case is U13, as its times are short despite its high anthropometric level. The U15 category has the lowest anthropometric levels, yet shows better times than the U17 category.

```{r, echo=FALSE}
fviz_pca_var(pca_result, col.var = "cos2", geom.var = "arrow", labelsize = 2, repel = FALSE)
```

1. Axes and explained variance
   - Dim1 (X-axis): explains 22.5% of the total variance
   - Dim2 (Y-axis): explains 13.2% of the total variance
   - Total: both axes together explain 35.7% of the total variance
2. Arrows/vectors (variables). Each arrow represents a variable in the analysis. Its position and direction indicate:
   - Similar directions: variables pointing in similar directions are positively correlated
   - Opposite directions: variables with arrows pointing in opposite directions are negatively correlated
   - 90° angle: variables are not correlated
3. Arrow length
   - Long arrows: variables well represented in this 2D plane
   - Short arrows: variables poorly represented in these two dimensions
4. Colors (cos2, quality of representation). The color scale indicates how well each variable is represented in the plane.

Practical interpretation, relationships between variables:

- Variables forming compact groups with nearby arrows are highly correlated
- Variables in opposite quadrants are negatively correlated
- Perpendicular variables are not related

Interpretation example: if "weight" and "height" variables are grouped together in the upper right quadrant, they are positively correlated; if "speed" variables lie in the opposite quadrant, they are negatively correlated with weight and height.

Limitations: the plot represents only 35.7% of the total variance, so there may be important information in other dimensions not shown.

## Descriptive analysis

An ANOVA was performed for each variable, comparing the three categories, with the effect size measured using the Eta squared parameter.
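As a reference for how eta squared relates to the ANOVA table, the following sketch computes it by hand in base R as the between-group sum of squares divided by the total sum of squares. The data are simulated and the variable `sprint_time` is hypothetical; the report itself calls `eta_squared()` from an add-on package, which this sketch does not use. The labels follow the same cut-offs (0.01, 0.06, 0.14) applied in the report.

```r
# Simulated example: one hypothetical variable measured in three categories
set.seed(42)
category <- factor(rep(c("U13", "U15", "U17"), each = 15))
sprint_time <- rnorm(45, mean = c(2.1, 2.0, 1.9)[as.integer(category)], sd = 0.1)

# One-way ANOVA
fit <- aov(sprint_time ~ category)
ss <- summary(fit)[[1]][["Sum Sq"]]  # c(SS between categories, SS residual)

# Eta squared = SS_between / SS_total
eta_sq <- ss[1] / sum(ss)

# Same magnitude labels as the report: <0.01, <0.06, <0.14, otherwise large
magnitude <- cut(
  eta_sq,
  breaks = c(-Inf, 0.01, 0.06, 0.14, Inf),
  labels = c("negligible", "small", "moderate", "large"),
  right = FALSE
)
eta_sq
as.character(magnitude)
```

Eta squared is a proportion of variance, so it describes the size of the category effect and is read alongside the p-value rather than in place of it.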
```{r, include=FALSE}
# 95% confidence interval of the mean, ignoring NA values
calcular_ic <- function(x) {
  n <- sum(!is.na(x))                         # sample size without NA
  error_est <- sd(x, na.rm = TRUE) / sqrt(n)  # standard error of the mean
  media <- mean(x, na.rm = TRUE)
  margen_error <- qnorm(0.975) * error_est    # margin of error (normal approximation)
  media <- round(media, 2)
  margen_error <- round(margen_error, 2)
  # Return the result as "mean ± margin"
  return(paste(media, "±", margen_error))
}
#calcular_ic(Player_futbol_$`Body mass (kg)_antr`)

tabla_resultados <- data.frame(Variable = colnames(Player_futbol_)[7:96])
#45-31
tests <- c(rep("antropometry", length(antropometry) - 2),
           rep("Force_jum_Sj", length(Force_jum_Sj)),
           rep("Force_jum_CMJB", length(Force_jum_CMJB)),
           rep("velocity", length(velocity)),
           rep("test_5_0_5", length(test_5_0_5)),
           rep("test_change", length(test_change)),
           rep("HAMSTRING_STRENGTH", length(HAMSTRING_STRENGTH)),
           rep("Potency", length(Potencia)))
tabla_resultados <- cbind.data.frame(tests, tabla_resultados)
results_list <- list()

# Magnitude label for eta squared
interpret_eta_sq <- function(eta_sq) {
  if (eta_sq < 0.01) {
    magnitude <- "negligible"
  } else if (eta_sq < 0.06) {
    magnitude <- "small"
  } else if (eta_sq < 0.14) {
    magnitude <- "moderate"
  } else {
    magnitude <- "large"
  }
  return(magnitude)
}

# Helper introduced in this cleanup: it wraps the steps that the original repeated
# for columns 7:95 and again for column 96.
resumen_anova <- function(nombre_columna) {
  # Mean ± margin by category. The column is selected by name: cur_data() drops the
  # grouping column, so the original positional pull(cur_data(), i) would read the
  # column after i whenever Category sits before column i.
  media_por_categoria <- Player_futbol_ %>%
    group_by(Category) %>%
    summarise(media = calcular_ic(.data[[nombre_columna]]))

  # One-way ANOVA of the variable against Category
  anova_result <- aov(Player_futbol_[[nombre_columna]] ~ Category, data = Player_futbol_)
  anova_resum <- summary(anova_result)
  eta_res <- eta_squared(anova_result)

  # Extract the ANOVA statistics
  f_value <- anova_resum[[1]]$`F value`[1]
  df_between <- anova_resum[[1]]$Df[1]
  df_residual <- anova_resum[[1]]$Df[2]
  p_value <- anova_resum[[1]]$`Pr(>F)`[1]
  eta_sq <- eta_res$Eta2

  # Significance sentence (built as in the original; it is not added to the table)
  if (p_value < 0.05) {
    interpretation <- sprintf(
      "The ANOVA revealed statistically significant differences in %s between categories (F(%d,%d) = %.3f, p = %.3f).",
      nombre_columna, df_between, df_residual, f_value, p_value
    )
  } else {
    interpretation <- sprintf(
      "The ANOVA did not reveal statistically significant differences in %s between categories (F(%d,%d) = %.3f, p = %.3f).",
      nombre_columna, df_between, df_residual, f_value, p_value
    )
  }

  # Effect-size sentence
  magnitude <- interpret_eta_sq(eta_sq)
  effect_interpretation <- sprintf(
    "The effect size was %s (η² = %.3f), indicating that %.1f%% of the variability in %s is explained by category.",
    magnitude, eta_sq, eta_sq * 100, nombre_columna
  )

  c(media_por_categoria$media[1], media_por_categoria$media[2], media_por_categoria$media[3],
    round(f_value, 3), round(p_value, 3), round(eta_sq, 3), effect_interpretation)
}

# Columns 7 to 95
for (i in 7:95) {
  nombre_columna <- names(Player_futbol_)[i]
  results_list[[tabla_resultados[i - 6, 2]]] <- resumen_anova(nombre_columna)
}

# Column 96 (`Fatigue index`), handled separately as in the original
i <- 96
nombre_columna <- names(Player_futbol_)[i]
resultadosf <- c("Potency", nombre_columna, resumen_anova(nombre_columna))
resultadosf

# Bind everything into one data frame for export
combined_table <- do.call(rbind, results_list)
combined_table_processed <- data.frame(combined_table)
Variablesfull <- row.names(combined_table_processed)
length(Variablesfull)
combined_table_processed <- cbind.data.frame(tests[1:89], Variablesfull, combined_table_processed)
combined_table_processed <- rbind.data.frame(combined_table_processed, resultadosf)
colnames(combined_table_processed) <- c("Test", "Variable", "U13", "U15", "U17",
                                        "F statistic", "p", "Eta squared", "Interpretation")
```

```{r, echo=FALSE}
datatable(
  combined_table_processed,
  rownames = FALSE,
  extensions = c("Buttons", "Scroller"),
  options = list(
    scrollX = TRUE,
    scrollY = "500px",
    scroller = TRUE,
    dom = "Bfrtip",
    buttons = c("copy", "csv", "excel")
  )
)
```
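The U13, U15 and U17 columns of the table come from `calcular_ic`, which reports the mean plus or minus a margin based on the normal quantile `qnorm(0.975)`. The sketch below is a hypothetical variant (`mean_ci` and its arguments are not part of the report) that returns the numbers unrounded and can switch to the Student t quantile, which gives a somewhat wider margin when a category has few players.

```r
# Mean and confidence-interval margin, ignoring NA values.
# use_t = TRUE uses the t quantile with n - 1 degrees of freedom;
# use_t = FALSE reproduces the normal approximation used by calcular_ic.
mean_ci <- function(x, level = 0.95, use_t = TRUE) {
  x <- x[!is.na(x)]
  n <- length(x)
  se <- sd(x) / sqrt(n)  # standard error of the mean
  alpha <- 1 - level
  q <- if (use_t) qt(1 - alpha / 2, df = n - 1) else qnorm(1 - alpha / 2)
  c(mean = mean(x), margin = q * se)
}

# Small made-up sample with one missing value
x <- c(10, 12, NA, 11, 13, 9)
ci_normal <- mean_ci(x, use_t = FALSE)
ci_t <- mean_ci(x, use_t = TRUE)
ci_normal
ci_t
```

With large groups the two margins are nearly identical, so the choice mainly matters for small categories.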
The results of the tests performed on the female soccer players in each category are presented as the mean ± the margin of the 95% confidence interval, as computed by `calcular_ic`. Normality was assessed using the Shapiro-Wilk test, which indicated that the data did not follow a normal distribution. Differences between the three categories (U13, U15 and U17) were analyzed for each test using one-way ANOVA, with statistical significance set at p < 0.05 (*). Effect sizes were obtained with the Eta squared coefficient. Interpretation of $\eta^{2}$: it is the proportion of the variability of a variable explained by category, so $\eta^{2}$ close to 0 indicates that category explains almost none of the variability, and $\eta^{2}$ close to 1 indicates that it explains nearly all of it. In the table, $\eta^{2}$ below 0.01 is labelled negligible, below 0.06 small, below 0.14 moderate, and large otherwise.

To identify the profile of each variable across the three categories, principal component analysis (PCA) was employed. Variables were centered and scaled (Z-score). The determinant of the Kendall correlation matrix was used as the statistical parameter for the PCA. It yielded a value close to 0, indicating high multicollinearity and suggesting strong associations among the variables, with most of the data variability concentrated in the first two dimensions. Eigenvalues > 1 were considered for principal component extraction. A Varimax orthogonal rotation was applied so that each principal component provided distinct information. A threshold of 0.5 was maintained for each PC loading for interpretation. The values assigned to each observation for all the players and the 90 quantitative variables are attached. All analyses were conducted in RStudio with R version 4.5.0 (2025-04-11 ucrt).
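The criteria described above (determinant of the Kendall correlation matrix, eigenvalues greater than 1, Varimax rotation) can be sketched in base R as follows. The data are simulated with two latent factors, and the names `dat`, `f1` and `f2` are assumptions for illustration only; note also that `prcomp` works from the Pearson correlation of the scaled data, so the Kendall determinant here serves only as a multicollinearity check.

```r
# Simulated data: two latent factors, each driving two observed variables
set.seed(7)
n <- 100
f1 <- rnorm(n)
f2 <- rnorm(n)
dat <- data.frame(
  a = f1 + rnorm(n, sd = 0.3),
  b = f1 + rnorm(n, sd = 0.3),
  c = f2 + rnorm(n, sd = 0.3),
  d = f2 + rnorm(n, sd = 0.3)
)

# Determinant of the Kendall correlation matrix:
# values near 0 signal strongly associated variables
kendall_cor <- cor(dat, method = "kendall")
kendall_det <- det(kendall_cor)

# PCA on centered and scaled variables; keep components with eigenvalue > 1
pca <- prcomp(dat, center = TRUE, scale. = TRUE)
eigenvalues <- pca$sdev^2
keep <- which(eigenvalues > 1)

# Loadings scaled by the component standard deviations, then Varimax rotation
loadings_kept <- pca$rotation[, keep, drop = FALSE] %*%
  diag(pca$sdev[keep], nrow = length(keep))
rotated <- varimax(loadings_kept)

kendall_det
eigenvalues
unclass(rotated$loadings)
```

Varimax is an orthogonal rotation, so it redistributes the loadings across the retained components without changing how much of each variable they explain jointly.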