Component Analysis: Mountain Biking (Ciclomontañismo)

Context

We have the following data:

The data contain several NA values, which can distort both the group comparisons and the component analysis. For this reason the missing values are imputed using the three-sigma (empirical) rule, which states that for approximately normal data nearly all values lie within three standard deviations of the mean. Each missing value is replaced by a random draw from the interval given by the column mean plus or minus three standard deviations. For example, the sitting height column (Sitting Height (cm)_Basic, column 6) has four NA values; the imputed values are shown below.
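The claim that most values lie within three standard deviations of the mean can be checked numerically. The sketch below is not part of the original analysis; it computes the coverage for a normal distribution and, as a more conservative reference, the Chebyshev lower bound that holds for any distribution with finite variance. The variable names are illustrative.

```r
# Share of a normal distribution within mean +/- 3 standard deviations
cobertura_normal <- pnorm(3) - pnorm(-3)

# Chebyshev lower bound for any distribution with finite variance: 1 - 1/k^2
cobertura_chebyshev <- 1 - 1 / 3^2

round(c(normal = cobertura_normal, chebyshev = cobertura_chebyshev), 4)
```

Under normality the interval covers about 99.7% of the values; without any distributional assumption the guarantee drops to roughly 88.9%.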

i <- 6
valores_na <- which(is.na(CICLOMOUNTAIN_[[i]]))
media <- mean(CICLOMOUNTAIN_[[i]], na.rm = TRUE)
desv_est <- sd(CICLOMOUNTAIN_[[i]], na.rm = TRUE)  # named desv_est so it does not mask sd()
# Random draws
set.seed(123)  # for reproducible results
aleatorios <- runif(length(valores_na), -3, 3)
signo_aleatorio <- sample(c(-1, 1), size = length(valores_na), replace = TRUE)
valores_imputados <- round(media + signo_aleatorio * aleatorios * desv_est, 2)

        colnames(CICLOMOUNTAIN_[i])

[1] "Sitting Height (cm)_Basic"

       valores_imputados

[1] 92.36 93.97 85.90 95.99

We repeat the same process for the remaining variables with missing values, which yields the following dataset:
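Applying the same imputation to every numeric column can be sketched with a small helper. This is a minimal illustration on a toy data frame, not the report's own loop: `imputar_3sd` and `datos` are hypothetical names, and the helper draws directly from `runif(-3, 3)` because that interval is already symmetric, so no separate random sign is needed.

```r
# Hypothetical helper: replaces NA values with random draws from
# mean +/- 3 standard deviations of the observed values.
imputar_3sd <- function(x, digits = 2) {
  faltantes <- which(is.na(x))
  if (length(faltantes) == 0) return(x)
  media <- mean(x, na.rm = TRUE)
  desv_est <- sd(x, na.rm = TRUE)
  x[faltantes] <- round(media + runif(length(faltantes), -3, 3) * desv_est, digits)
  x
}

# Toy data (not the cyclists' data)
set.seed(123)
datos <- data.frame(
  a = c(1, 2, NA, 4, 5),
  b = c(10, NA, 30, NA, 50),
  grupo = c("x", "y", "x", "y", "x")
)

# Impute only the numeric columns; factors and text are left untouched
es_numerica <- vapply(datos, is.numeric, logical(1))
datos[es_numerica] <- lapply(datos[es_numerica], imputar_3sd)
datos
```

Setting the seed once before the loop, rather than inside it, avoids reusing the same random draws for every column.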

Descriptive and comparative analysis

For this analysis, the age variable is grouped into three categories: 15 to 16 years, 17 to 19 years, and 20 to 25 years.


    Chi-squared test for given probabilities

data:  tablechi
X-squared = 0.28571, df = 2, p-value = 0.8669
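The test statistic above can be reproduced by hand from the category counts reported in this section (8, 7 and 6 athletes). The sketch below assumes equal expected frequencies, which is the default of `chisq.test` for a one-way table; the object names are illustrative.

```r
# Observed counts per age category, as reported in the frequency table
frecuencias <- c("15-16" = 8, "17-19" = 7, "20-25" = 6)

# Built-in goodness-of-fit test against equal probabilities
prueba <- chisq.test(frecuencias)

# Same statistic computed by hand: sum((O - E)^2 / E)
esperadas <- sum(frecuencias) / length(frecuencias)
estadistico_manual <- sum((frecuencias - esperadas)^2 / esperadas)
p_manual <- pchisq(estadistico_manual, df = length(frecuencias) - 1,
                   lower.tail = FALSE)

round(c(X2 = estadistico_manual, p = p_manual), 4)
```

With a p-value of about 0.87 there is no evidence that the three age categories are unequally represented.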


To compare each variable across the age categories, normality (Shapiro-Wilk) and homogeneity of variance (Levene) were tested first. Variables that met the assumptions were compared with a one-way ANOVA, reporting the group means with their 95% confidence intervals and $\eta^{2}$ as the effect size. Variables that did not meet the assumptions were compared with the non-parametric Kruskal-Wallis test, which compares medians; for these, the median and interquartile range are reported and the effect size is measured with $\varepsilon^{2}$.
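The decision rule described above can be sketched in base R. This is an illustration on simulated data, not the report's code: `comparar_grupos` is a hypothetical helper, `bartlett.test` stands in for the Levene test so that no extra package is needed, and the effect sizes use common textbook definitions (the ratio of between-group to total sum of squares for the ANOVA, and H / (n - 1) for Kruskal-Wallis), which may differ from the variant used in the report.

```r
# Simulated data with the same group sizes as the report (8, 7, 6)
set.seed(1)
datos <- data.frame(
  grupo = factor(rep(c("A", "B", "C"), times = c(8, 7, 6))),
  y = c(rnorm(8, 50, 5), rnorm(7, 55, 5), rnorm(6, 60, 5))
)

# Hypothetical helper: choose ANOVA or Kruskal-Wallis from the assumption checks
comparar_grupos <- function(y, grupo, alpha = 0.05) {
  modelo <- aov(y ~ grupo)
  p_normalidad <- shapiro.test(residuals(modelo))$p.value
  p_homogeneidad <- bartlett.test(y, grupo)$p.value  # stand-in for Levene
  if (p_normalidad > alpha && p_homogeneidad > alpha) {
    tabla <- summary(modelo)[[1]]
    sc <- tabla[["Sum Sq"]]
    list(prueba = "ANOVA", p = tabla[["Pr(>F)"]][1],
         efecto = sc[1] / sum(sc))                       # eta squared
  } else {
    kw <- kruskal.test(y, grupo)
    list(prueba = "Kruskal-Wallis", p = kw$p.value,
         efecto = unname(kw$statistic) / (length(y) - 1)) # epsilon squared
  }
}

resultado <- comparar_grupos(datos$y, datos$grupo)
resultado
```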


   15 a 16 años De 17 a 19 años De 20 a 25 años 
              8               7               6

Data processing

To avoid overlapping in the final plots, Kendall correlations and Shapiro-Wilk normality tests were computed to identify highly correlated variables and to assess normality, respectively. This was done group by group across three main categories: children, pre-youth, and youth. The refined variables from the attack variable category are presented below.
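A filtering step of this kind can be sketched as follows. The example uses simulated data and a correlation threshold of 0.8, both of which are assumptions for illustration; the threshold used in the report is not stated.

```r
# Simulated data: x2 is almost a copy of x1, x3 and x4 are independent
set.seed(42)
n <- 21
x1 <- rnorm(n)
datos <- data.frame(x1 = x1, x2 = x1 + rnorm(n, sd = 0.01),
                    x3 = rnorm(n), x4 = rnorm(n))

# Shapiro-Wilk p-value for each variable
p_normalidad <- sapply(datos, function(v) shapiro.test(v)$p.value)

# Kendall correlation matrix
tau <- cor(datos, method = "kendall")
umbral <- 0.8  # assumed cut-off for "high" correlation

# Walk through the columns and drop any variable that is highly
# correlated with a variable that has already been kept
descartar <- character(0)
for (j in seq_len(ncol(tau))[-1]) {
  previas <- setdiff(colnames(tau)[seq_len(j - 1)], descartar)
  if (any(abs(tau[previas, j]) > umbral)) {
    descartar <- c(descartar, colnames(tau)[j])
  }
}
depuradas <- setdiff(colnames(datos), descartar)
depuradas
```

Keeping the first variable of each highly correlated pair is one simple convention; which member of a pair to keep is ultimately a subject-matter decision.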

Principal component analysis

We then run an analysis of the variables by category:

We performed a principal component analysis to establish the relationship between the category and the different variables.

The rotation matrix below indicates how much each original variable contributes to each principal component. The values (loadings) range from -1 to 1; larger absolute values indicate a greater contribution of the variable to the component.

                                   PC1        PC2
RR (/min)_AT % Pred       -0.536315950  0.1046131
RR (/min)_Warm-up         -0.464262157 -0.3218510
RR (/min)_VT1             -0.511063284  0.1344272
Sitting Height (cm)_Basic  0.003099856 -0.1229076
RR (/min)_AT % max        -0.409221397  0.2641430
Wingspan (cm)_Basic       -0.016745741 -0.4037265
Height (cm)_Basic          0.118912756  0.2802232
VE/VCO2_Warm-up           -0.099616067  0.4842628
HR (/min)_Warm-up         -0.206886387 -0.3829729
AGE (years)_Basic         -0.031975116 -0.3997757
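A rotation matrix like the one above is what `prcomp` returns in its `rotation` element. The sketch below uses the built-in `USArrests` dataset as a stand-in for the cyclists' data and assumes the variables are standardized before the analysis.

```r
# Stand-in data: USArrests ships with base R (datasets package)
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Loadings of the first two components (one row per original variable)
cargas <- pca$rotation[, 1:2]
round(cargas, 3)

# Each column of the rotation matrix is a unit-length vector, which is
# why every loading lies between -1 and 1
colSums(pca$rotation^2)
```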

The first two dimensions can be read as follows. In PC1, the variables with the largest absolute loadings are all negative and describe respiratory capacity/ventilatory efficiency:

RR (/min)_AT % Pred: -0.536 (highest contribution)

RR (/min)_VT1: -0.511

RR (/min)_Warm-up: -0.464

RR (/min)_AT % max: -0.409

Because these loadings are negative, the relationship is inverse: cyclists with high PC1 scores tend to have lower respiratory rates at all thresholds. In other words, the better-adapted athletes, with higher ventilatory efficiency, require fewer breaths per minute.

PC2 has large loadings of both signs and contrasts biometric and anthropometric characteristics (height vs. wingspan) with the response to exercise.

Variables with the highest positive weight:

VE/VCO2_Warm-up: 0.484

Height (cm)_Basic: 0.280

RR (/min)_AT % max: 0.264

Variables with the highest negative weight:

Wingspan (cm)_Basic: -0.404

AGE (years)_Basic: -0.400

HR (/min)_Warm-up: -0.383

This dimension sets up a contrast. The positive side combines greater height with a higher VE/VCO2 ratio (efficiency of gas exchange); the negative side combines greater wingspan, older age and higher heart rate during warm-up. That is, high values on this axis indicate good ventilatory efficiency but greater respiratory demand at maximum intensities, while negative values indicate greater experience (age), a broader physical build (wingspan) and greater cardiovascular stress during warm-up.

Variables with low contributions to both components (such as Sitting Height) could be considered less relevant for differentiating between cyclists.

                            PC1      PC2
Standard deviation     1.765075 1.434247
Proportion of Variance 0.311550 0.205710
Cumulative Proportion  0.311550 0.517260

The Brand graph shows that the first two dimensions account for about 52% of the cumulative variance (51.73%): dimension one explains 31.15% and dimension two 20.57%.
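The proportions in the summary table follow directly from the standard deviations of the components. The sketch below assumes the PCA was run on 10 standardized variables (the number of rows in the rotation matrix), so that the total variance equals 10.

```r
# Standard deviations of the first two components, from the summary table
desviaciones <- c(PC1 = 1.765075, PC2 = 1.434247)

# With standardized variables, total variance = number of variables
n_variables <- 10

# Variance of each component is its squared standard deviation
proporcion <- desviaciones^2 / n_variables
acumulada <- cumsum(proporcion)

round(rbind(proporcion, acumulada), 5)
```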

Graphical analysis

Based on these two dimensions, the four quadrants are interpreted as follows. Because the respiratory rate (RR) loadings on PC1 are negative, positive PC1 scores correspond to lower respiratory rates, that is, to higher respiratory efficiency:

Upper-left corner (PC1-, PC2+): Low respiratory efficiency + good gas exchange + tall height

Upper-right corner (PC1+, PC2+): High respiratory efficiency + good gas exchange + tall height

Lower-left corner (PC1-, PC2-): Low respiratory efficiency + older age/wingspan + higher HR during warm-up

Lower-right corner (PC1+, PC2-): High respiratory efficiency + older age/wingspan + higher HR during warm-up
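Assigning each athlete to one of the four corners only requires the signs of the two scores. The sketch below uses made-up scores; `cuadrante` and `puntajes` are hypothetical names, and in practice the scores would come from the `x` element of the fitted `prcomp` object.

```r
# Hypothetical helper: label the corner of the PC1-PC2 plane for each point
cuadrante <- function(pc1, pc2) {
  vertical <- ifelse(pc2 >= 0, "upper", "lower")
  horizontal <- ifelse(pc1 >= 0, "right", "left")
  paste(vertical, horizontal, sep = "-")
}

# Made-up scores for four athletes (illustration only)
puntajes <- data.frame(PC1 = c(-1.2, 0.8, -0.3, 2.1),
                       PC2 = c(0.5, 1.4, -0.9, -0.2))
puntajes$cuadrante <- cuadrante(puntajes$PC1, puntajes$PC2)
puntajes
```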

The variables Wingspan (cm)_Basic, RR and Age fall in the third quadrant, which associates them with low respiratory efficiency and higher HR during warm-up. The variables Height and VE/VCO2 appear together in the fourth quadrant, which associates them with high respiratory efficiency, higher HR and older age. Beyond that, no group of athletes is clearly associated with a set of variables, which suggests that the measurements vary from athlete to athlete and tend to be independent.

In this plot, the variables with longer vectors are the ones best represented in the first two dimensions: the respiratory variables (RR), which dominate PC1, and the biometric variables, which dominate PC2.
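The vector lengths can be approximated from the numbers already reported. The sketch below assumes the PCA was run on standardized variables, in which case each variable's coordinate on a component is its loading multiplied by that component's standard deviation, and the vector length is the norm of the two coordinates. The loadings and standard deviations are copied from the tables in this report.

```r
# Loadings (PC1, PC2) copied from the rotation matrix
cargas <- matrix(
  c(-0.536315950,  0.1046131,
    -0.464262157, -0.3218510,
    -0.511063284,  0.1344272,
     0.003099856, -0.1229076,
    -0.409221397,  0.2641430,
    -0.016745741, -0.4037265,
     0.118912756,  0.2802232,
    -0.099616067,  0.4842628,
    -0.206886387, -0.3829729,
    -0.031975116, -0.3997757),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("RR (/min)_AT % Pred", "RR (/min)_Warm-up", "RR (/min)_VT1",
      "Sitting Height (cm)_Basic", "RR (/min)_AT % max",
      "Wingspan (cm)_Basic", "Height (cm)_Basic", "VE/VCO2_Warm-up",
      "HR (/min)_Warm-up", "AGE (years)_Basic"),
    c("PC1", "PC2"))
)

# Standard deviations of PC1 and PC2 from the summary table
desviaciones <- c(1.765075, 1.434247)

# Coordinates of each variable in the PC1-PC2 plane and vector length
coordenadas <- sweep(cargas, 2, desviaciones, `*`)
longitud <- sqrt(rowSums(coordenadas^2))

round(sort(longitud, decreasing = TRUE), 3)
```

Under these assumptions the longest vector belongs to RR (/min)_AT % Pred and the shortest to Sitting Height (cm)_Basic, in line with the reading given above.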

--- title: "Analisis de componentes Ciclomontañismo" format: html: self-contained: true code-tools: true runtime: shiny --- ## Contexto ```{r,include=FALSE} ###################### PRIMERO CARGA LOS PAQUETES #install.packages("GGally") library(tidyverse) library(ggpubr) library(shiny) library(pheatmap) # Calcular tamaño del efecto (eta cuadrado) library(effectsize) library(readxl) library(DT) ###################### PRIMERO CARGA LOS PAQUETES if (!require("pacman")) install.packages("pacman") library(FactoMineR) library(gplots) library(RColorBrewer) library(corrplot) library(dplyr) library(forcats) # Para manejar factores # cargar paquetes necesarios pacman::p_load( FactoMineR, tidyverse, factoextra, haven, naniar, corrplot, readr, gridExtra, incidence, easypackages, readxl, dplyr, gplots, kableExtra, apyramid, janitor, flextable, lubridate, stringr, rio, bench, sf, cleaner, DT, leaflet, leaflet.extras, esquisse, tseries, forecast, skimr, tsibble, epicontacts, distcrete, epitrix, EpiEstim, projections, magrittr, binom, ape, outbreaker2, knitr, broom, ggridges, scales ) library(writexl) library(GGally) library(ggforce) # Para agregar elipses y círculos library(pheatmap) library(qgraph) #install.packages("kableExtra") #Instalar los paquetes #if (!require("pacman")) install.packages("pacman"); pacman::p_load(effsize, gplots, naniar, corrplot, readr, gridExtra, rcompanion, flextable, incidence, easypackages, FactoMineR, tidyverse, factoextra, haven, janitor, readxl, dplyr, kableExtra, apyramid, ggplot2, tidyr, lubridate, stringr, rio, bench, sf, cleaner, DT, leaflet, leaflet.extras, esquisse, tseries, forecast, skimr, tsibble, epicontacts, distcrete, epitrix, EpiEstim, projections, magrittr, binom, ape, outbreaker2, knitr, broom, ggridges, scales, psych, plotly, DescTools, effectsize) #if (!require("pacman")) install.packages("pacman"); pacman::p_load(caret,qgraph,webshot) library(effsize) #install.packages("effsize") #install.packages("gplots") #install.packages(c( 
"naniar", "corrplot")) #install.packages("readr") library(readr) library(gridExtra) # Instalar el paquete rcompanion si no está instalado if (!requireNamespace("rcompanion", quietly = TRUE)) { install.packages("rcompanion") } # Cargar el paquete library(rcompanion) # Mostrar la tabla con flextable library(flextable) library(incidence) #core functions library(easypackages)#recargar varias librerias en una sola linea library(car) library(ggpubr) paquetes <- c("FactoMineR", "tidyverse", "factoextra", "haven", "naniar", "corrplot") libraries(paquetes) #Sys.setlocale("LC_ALL", "en_US.UTF-8") #Sys.setenv(LANG = "spa") library(janitor)#para los comandos tably library(readxl) library(dplyr) library(gplots) library(FactoMineR)#GRAFICAR ACM library(factoextra) library(kableExtra)#PARA DECORAR LAS TABLAS library(apyramid)#PARA REALIZAR PIRAMIDE library(ggplot2)#PARA DIBUJAR library(readxl)#PARA LEER EXCEL library(dplyr) # manipulacion de dato library(janitor)#para los comandos tably library(factoextra) library(gplots)#graficos #if (!require("pacman")) install.packages("pacman") # cargar paquetes necesarios pacman::p_load( tidyr, dplyr, lubridate, stringr, rio, bench, janitor, # reclin2, sf, readr, cleaner, #plotly, DT, leaflet, leaflet.extras, esquisse., tseries, tidyverse, janitor, forecast, skimr, tsibble) #install.packages("epicontacts") library(epicontacts) #install.packages("distcrete") library(distcrete) #install.packages("epitrix") library(epitrix) #install.packages("EpiEstim") library(EpiEstim) library(projections) library(magrittr) library(binom) library(ape) library(outbreaker2) library(knitr) library(broom) library(ggridges) library(scales) #install.packages("ggridges") library(dplyr) library(tidyr) library(psych) library(PerformanceAnalytics) library(corrplot) library(plotly) library(flextable) library(dplyr) #install.packages(c("dplyr", "tidyverse", "readxl", "DescTools", "effectsize")) library(dplyr) library(readxl) library(DescTools) library(effectsize) 
library(readxl) library(dplyr) #############################################################Funciones # Función para calcular el intervalo de confianza del 95%, ignorando valores NA calcular_ic <- function(x) { # Calcular la longitud de x sin contar los NA n <- sum(!is.na(x)) # Calcular la desviación estándar y la media sin NA error_est <- sd(x, na.rm = TRUE) / sqrt(n) media <- mean(x, na.rm = TRUE) # Calcular el margen de error margen_error <- qnorm(0.975) * error_est # Redondear la media y el margen de error a dos decimales media <- round(media, 2) margen_error <- round(margen_error, 2) # Devolver el resultado en formato "media ± margen_error" return(paste(media, "±", margen_error)) } calcular_mediana_iqr <- function(x) { mediana <- round(median(x, na.rm = TRUE), 2) q1 <- round(quantile(x, 0.25, na.rm = TRUE), 2) q3 <- round(quantile(x, 0.75, na.rm = TRUE), 2) return(paste0(mediana, " (", q1, " - ", q3, ")")) } ``` Tenemos los siguientes datos: ```{r, echo=FALSE, warning=FALSE,message=FALSE} CICLOMOUNTAIN <- read_excel("D:/nocturno/temporales/estadistica del deporte/ciclomontain/CARACTERIZACION DE DATOS SELECCIÓN BOGOTÁ CICLOMONTAÑISMO AJUSTADO MA_D.xlsx") #Factorizamos cada variable CICLOMOUNTAIN_ <- CICLOMOUNTAIN %>% mutate( # Conversión de tipos de datos across(1:2, as.factor),across(3:18, as.numeric),across(19, as.factor),across(20:132, as.numeric)) CICLOMOUNTAIN_ <-CICLOMOUNTAIN_ %>% # Convertir NA a categoría "Sin data" después de cortar mutate( EDAD_C = cut( `EDAD (años)`, breaks = c(0, 16, 19, 25), labels = c( "15 a 16 años", "De 17 a 19 años", "De 20 a 25 años"), include.lowest = TRUE )) variables<-c("Rider", "FULL NAME_Basic", "AGE (years)_Basic", "Body Mass (kg)_Basic", "Height (cm)_Basic", "Sitting Height (cm)_Basic", "Wingspan (cm)_Basic", "BMI_Basic", "Muscle mass (%)_Body comp", "Muscle mass (kg)_Body comp", "Bone mass- Rocha (%)_Body comp", "Bone mass- Rocha (kg)_Body comp", "Adipose mass - Kerr (%)_Body comp", "Adipose mass - Kerr (kg)_Body 
comp", "Fat mass - Faulkner (%)_Body comp", "Fat mass - Faulkner (kg)_Body comp", "Fat mass - Carter (%)_Body comp", "Fat mass - Carter (kg)_Body comp", "Somatotype_Body comp", "Triceps Skinfold (mm)_Anthrop", "Subscapular Skinfold (mm)_Anthrop", "Biceps Skinfold (mm)_Anthrop", "Iliac Crest Skinfold (mm)_Anthrop", "Supraespinal Skinfold (mm)_Anthrop", "Abdominal Skinfold (mm)_Anthrop", "Thigh Skinfold (mm)_Anthrop", "Calf Skinfold (mm)_Anthrop", "Relaxed Arm Circumference (cm)_Anthrop", "Flexed Arm Circumference (cm)_Anthrop", "Waist Circumference (cm)_Anthrop", "Hip Circumference (cm)_Anthrop", "Mid-thigh Circumference (cm)_Anthrop", "Calf Circumference (cm)_Anthrop", "Humerus Diameter (mm)_Anthrop", "Biestyloid Diameter (mm)_Anthrop", "Femur Diameter (mm)_Anthrop", "VO2 (L/min)_Warm-up", "VO2/KG (ml/min/kg)_Warm-up", "VO2/HR (ml)_Warm-up", "WR (w)_Warm-up", "HR (/min)_Warm-up", "VE/VO2_Warm-up", "VE/VCO2_Warm-up", "RER_Warm-up", "VE (L/min)_Warm-up", "RR (/min)_Warm-up", "VO2 (L/min)_VT1", "VO2/KG (ml/min/kg)_VT1", "VO2/HR (ml)_VT1", "WR (w)_VT1", "HR (/min)_VT1", "VE/VO2_VT1", "VE/VCO2_VT1", "RER_VT1", "VE (L/min)_VT1", "RR (/min)_VT1", "VO2 (L/min)_AT % Pred", "VO2/KG (ml/min/kg)_AT % Pred", "VO2/HR (ml)_AT % Pred", "WR (w)_AT % Pred", "HR (/min)_AT % Pred", "VE (L/min)_AT % Pred", "RR (/min)_AT % Pred", "VO2 (L/min)_AT % max", "VO2/KG (ml/min/kg)_AT % max", "VO2/HR (ml)_AT % max", "WR (w)_AT % max", "HR (/min)_AT % max", "VE/VO2_AT % max", "VE/VCO2_AT % max", "VE (L/min)_AT % max", "RR (/min)_AT % max", "VO2 (L/min)_VT2", "VO2/KG (ml/min/kg)_VT2", "VO2/HR (ml)_VT2", "WR (w)_VT2", "HR (/min)_VT2", "VE/VO2_VT2", "VE/VCO2_VT2", "RER_VT2", "VE (L/min)_VT2", "RR (/min)_VT2", "VO2 (L/min)_RCP % Pred", "VO2/KG (ml/min/kg)_RCP % Pred", "VO2/HR (ml)_RCP % Pred", "WR (w)_RCP % Pred", "HR (/min)_RCP % Pred", "VE (L/min)_RCP % Pred", "RR (/min)_RCP % Pred", "VO2 (L/min)_RCP % Max", "VO2/KG (ml/min/kg)_RCP % Max", "VO2/HR (ml)_RCP % Max", "WR (w)_RCP % Max", "HR (/min)_RCP 
% Max", "VE/VO2_RCP % Max", "VE/VCO2_RCP % Max", "VE (L/min)_RCP % Max", "RR (/min)_RCP % Max", "VO2 (L/min)_VO2 Max or Peak", "VO2/KG (ml/min/kg)_VO2 Max or Peak", "VO2/HR (ml)_VO2 Max or Peak", "WR (w)_VO2 Max or Peak", "HR (/min)_VO2 Max or Peak", "VE/VO2_VO2 Max or Peak", "VE/VCO2_VO2 Max or Peak", "RER_VO2 Max or Peak", "VE (L/min)_VO2 Max or Peak", "RR (/min)_VO2 Max or Peak", "VO2 (L/min)_VO2 Peak % Prev", "VO2/KG (ml/min/kg)_VO2 Peak % Prev", "VO2/HR (ml)_VO2 Peak % Prev", "WR (w)_VO2 Peak % Prev", "HR (/min)_VO2 Peak % Prev", "VE (L/min)_VO2 Peak % Prev", "RR (/min)_VO2 Peak % Prev", "VO2 (L/min)_Predicted", "VO2/KG (ml/min/kg)_Predicted", "VO2/HR (ml)_Predicted", "WR (w)_Predicted", "HR (/min)_Predicted", "VE (L/min)_Predicted", "RR (/min)_Predicted", "VO2 (L/min)_Abs Max Val", "VO2/KG (ml/min/kg)_Abs Max Val", "VO2/HR (ml)_Abs Max Val", "WR (w)_Abs Max Val", "HR (/min)_Abs Max Val", "VE/VO2_Abs Max Val", "VE/VCO2_Abs Max Val", "RER_Abs Max Val", "VE (L/min)_Abs Max Val", "RR (/min)_Abs Max Val", "Age Category") basic<-variables[2:8] Body_composition<-variables[9:19] antropometrics<-variables[20:36] calentamiento<-variables[37:46] vt1<-variables[47:56] atprev<-variables[57:63] atpmax<-variables[64:72] vt2<-variables[73:82] rcp_prev<-variables[83:89] rcp_max<-variables[90:98] vo2_max_o_peak<-variables[99:108] vo2peak_prev<-variables[109:115] previsto<-variables[116:122] valores_absolutos_maximos<-variables[123:132] colnames(CICLOMOUNTAIN_)<-variables datatable( CICLOMOUNTAIN_, rownames = FALSE, extensions = c("Buttons", "Scroller"), options = list( scrollX = TRUE, scrollY = "500px", scroller = TRUE, dom = "Bfrtip", buttons = c("copy", "csv", "excel") ) ) ``` Los datos cuentan con varios NA, lo cual puede ser perjudicial para realizar comparaciones y el análisis de componentes, es por eso que se realiza una imputación con base en el teorema del límite central, el cual nos dice que la mayoría de los datos de una distribución se encuentra en el intervalo 
centrado en la media y más o menos tres veces la desviación estándar, por ejemplo para la columna talla tenemos 5 valores NA, realizando la imputación mostramos los valores obtenidos. The data contain several NAs, which can be detrimental for making comparisons and component analysis. This is why imputation is performed based on the central limit theorem, which tells us that most data in a distribution falls within the interval centered on the mean plus or minus three times the standard deviation. For example, for the height column we have 5 NA values; by performing the imputation we show the obtained values. ```{r} i<-6 valores_na<-c(which(is.na(CICLOMOUNTAIN_[[i]]))) media<-mean(CICLOMOUNTAIN_[[i]],na.rm = TRUE) sd<-sd(CICLOMOUNTAIN_[[i]],na.rm = TRUE) #aleatroios set.seed(123) # Para resultados consistentes aleatorios<-runif(length(valores_na), -3, 3) signo_aleatorio <- sample(c(-1, 1), size = length(valores_na), replace = TRUE) valores_imputados<-c(round(media+signo_aleatorio*aleatorios*sd,2)) colnames(CICLOMOUNTAIN_[i]) valores_imputados ``` Realizamos el mismo proceso para las demas variables con faltantes, obtenemos entonces la siguiente base ```{r, echo=FALSE, warning=FALSE} for (i in 3:18) { columna <- CICLOMOUNTAIN_[[i]] if (any(is.na(columna))) { valores_na<-c(which(is.na(CICLOMOUNTAIN_[[i]]))) media<-mean(CICLOMOUNTAIN_[[i]],na.rm = TRUE) sd<-sd(CICLOMOUNTAIN_[[i]],na.rm = TRUE) #aleatroios set.seed(123) # Para resultados consistentes aleatorios<-runif(length(valores_na), -3, 3) signo_aleatorio <- sample(c(-1, 1), size = length(valores_na), replace = TRUE) valores_imputados<-c(round(media+signo_aleatorio*aleatorios*sd),3) # Reemplazar los NA por los valores imputados en la columna original columna[valores_na] <- valores_imputados # Asignar la columna imputada de vuelta al data frame (si es necesario) CICLOMOUNTAIN_[[i]] <- columna } } for (i in 20:132) { columna <- CICLOMOUNTAIN_[[i]] if (any(is.na(columna))) { 
valores_na<-c(which(is.na(CICLOMOUNTAIN_[[i]]))) media<-mean(CICLOMOUNTAIN_[[i]],na.rm = TRUE) sd<-sd(CICLOMOUNTAIN_[[i]],na.rm = TRUE) #aleatroios set.seed(123) # Para resultados consistentes aleatorios<-runif(length(valores_na), -3, 3) signo_aleatorio <- sample(c(-1, 1), size = length(valores_na), replace = TRUE) valores_imputados<-c(round(media+signo_aleatorio*aleatorios*sd),3) # Reemplazar los NA por los valores imputados en la columna original columna[valores_na] <- valores_imputados # Asignar la columna imputada de vuelta al data frame (si es necesario) CICLOMOUNTAIN_[[i]] <- columna } } datatable( CICLOMOUNTAIN_, rownames = FALSE, extensions = c("Buttons", "Scroller"), options = list( scrollX = TRUE, scrollY = "500px", scroller = TRUE, dom = "Bfrtip", buttons = c("copy", "csv", "excel") ) ) ``` ## Analisis descriptivo y comparativo Para este analisis tenemos tres categorias en la variable edad, de 15 a 16 años, de 17 a 19 y de 20 a 25: ```{r, echo=FALSE} calc_ci <- function(count, total, conf.level = 0.95) { prop <- count / total z <- qnorm(1 - (1 - conf.level) / 2) se <- sqrt(prop * (1 - prop) / total) lower <- prop - z * se upper <- prop + z * se return(c(percent(lower, accuracy = 0.01), percent(upper, accuracy = 0.01))) } # Crear la tabla de frecuencias y formatear los porcentajes table <- CICLOMOUNTAIN_ %>% tabyl(`Age Category`)%>%adorn_pct_formatting() # Asignar nombres uniformes a las columnas colnames(table) <- c("Category", "Frec", "Percent") # Calcular intervalos de confianza para las proporciones total <- sum(table$Frec) ci <- t(apply(table[, 2, drop = FALSE], 1, function(x) calc_ci(x, total))) colnames(ci) <- c("CI_Lower", "CI_Upper") table <- cbind(table, ci) ggplot(table, aes(x = Category, y = Frec, fill = Category)) + geom_bar(stat = "identity") + # Mostrar eje y como porcentajes labs(title = "Distribution of Categorys", x = "Category", y = "Frecuency") + theme_minimal() + theme(legend.position = "none", axis.text.x = element_text(angle = 45, 
hjust = 1)) tablechi <- table(CICLOMOUNTAIN_[, 133]) # Prueba chi cuadrado chi <- chisq.test(tablechi) chi ``` ```{r, echo=FALSE} CICLOMOUNTAIN_N<-cbind.data.frame(CICLOMOUNTAIN_[3:18],CICLOMOUNTAIN_[20:39],CICLOMOUNTAIN_[41:131],CICLOMOUNTAIN_[133]) vars_numericas <- names(CICLOMOUNTAIN_N)[ sapply(CICLOMOUNTAIN_N, is.numeric) & names(CICLOMOUNTAIN_N) != "Age Category"] #| label: selector selectInput( "var_seleccionada", "Selecciona una variable:", choices = vars_numericas, selected = vars_numericas[1]) # Segundo selector: Tipo de gráfico selectInput( "tipo_grafico", "Selecciona el tipo de gráfico:", choices = c("Gráfico Q-Q" = "qq", "Boxplot" = "boxplot"), selected = "qq") renderPlot({ req(input$var_seleccionada) # SOLUCIÓN 2: Usar reformulate que maneja mejor nombres complejos formula_modelo <- reformulate("`Age Category`", response = input$var_seleccionada) modelo <- lm(formula_modelo, data = CICLOMOUNTAIN_N) # Generar gráfico según selección if (input$tipo_grafico == "qq") { # Gráfico Q-Q de residuos ggqqplot( residuals(modelo), title = paste("Gráfico Q-Q para:", input$var_seleccionada) ) } else { # Gráfico de cajas y bigotes ggplot(CICLOMOUNTAIN_N, aes(x = `Age Category`, y = .data[[input$var_seleccionada]])) + geom_boxplot(fill = "lightblue", alpha = 0.7) + labs( title = paste("Boxplot de", input$var_seleccionada, "por Categoría de Edad"), x = "Categoría de Edad", y = input$var_seleccionada ) + theme_minimal() + theme(axis.text.x = element_text(angle = 45, hjust = 1)) } }) ``` Para el comparativo de cada variable se usó prueba de normalidad y de homogeneidad en los datos; aquellas que la cumplían se sometían a prueba ANOVA, con medida de efecto $\eta^{2}$ para las medias y la desviación estándar; aquellas que no cumplían la prueba se compararon a partir de la prueba no paramétrica Kruskal-Wallis, la cual compara medianas. En este caso se presenta entonces la mediana y sus rangos intercuartílicos; el tamaño del efecto se midió por medio del parámetro $e$. 
For the comparison of each variable, normality and homogeneity tests were used on the data; those that met the assumptions were subjected to ANOVA test, with effect size $\eta^{2}$ for means and standard deviation; those that did not meet the test assumptions were compared using the non-parametric Kruskal-Wallis test, which compares medians. In this case, the median and its interquartile ranges are presented; the effect size was measured using the parameter $e$. ```{r, echo=FALSE, message=FALSE,warning=FALSE} tabla_resultados <- data.frame(Variable = c(colnames(CICLOMOUNTAIN_)[3:18],colnames(CICLOMOUNTAIN_)[20:39],colnames(CICLOMOUNTAIN_)[41:131])) #colnames(CICLOMOUNTAIN_)[20] #tabla_resultados[1:8,] tests<-c(rep("Basic",length(basic)-1),rep("Body composition",length(Body_composition)-1),rep("Antropometrycs",length(antropometrics)),rep("Warm up",length(calentamiento)-1),rep("Vt1",length(vt1)),rep("AT % Pred",length(atprev)),rep("AT % max",length(atpmax)),rep("VT2",length(vt2)),rep("RCP % Pred",length(rcp_prev)),rep("RCP % Max",length(rcp_max)),rep("VO2 Max or peak",length(vo2_max_o_peak)),rep("VO2 peak % prev",length(vo2peak_prev)),rep("Predicted",length(previsto)),rep("Abs Max val",length(valores_absolutos_maximos)-1)) tabla_resultados<-cbind.data.frame(tests,tabla_resultados) CICLOMOUNTAIN_N<-cbind.data.frame(CICLOMOUNTAIN_[3:18],CICLOMOUNTAIN_[20:39],CICLOMOUNTAIN_[41:131],CICLOMOUNTAIN_[133]) results_list<-list() #colnames(CICLOMOUNTAIN_)[20] # Crear secuencia que excluye el 19, 20:39, 41:129 #secuencia <- c(3:18,20:39) #secuencia <- c(20:39) #length(secuencia) for (i in 1:127) { #i<-20 # Definir el índice de la columna nombre_columna <- names(CICLOMOUNTAIN_N)[i] # nombre_columna modelo <- lm(CICLOMOUNTAIN_N[[i]] ~ `Age Category`, data = CICLOMOUNTAIN_N) # Prueba de normalidad Shapiro-Wilk normalidad_shapiro<-shapiro.test(residuals(modelo)) # Prueba de Levene Homogeneidad<-leveneTest(CICLOMOUNTAIN_N[[i]]~ `Age Category` , data = CICLOMOUNTAIN_N) # 
  normalidad_shapiro$p.value>0.05 #normalidad_shapiro$p.value>0.05

  # Choose the test according to the Shapiro-Wilk result
  if (normalidad_shapiro$p.value > 0.05) {
    # Normality holds: mean and confidence interval per age category.
    # Note: cur_data() is deprecated since dplyr 1.1.0 in favour of pick().
    media_por_categoria <- CICLOMOUNTAIN_N %>%
      group_by(`Age Category`) %>%
      summarise(media = calcular_ic(pull(cur_data(), i)))

    anova_result <- aov(CICLOMOUNTAIN_N[[nombre_columna]] ~ `Age Category`,
                        data = CICLOMOUNTAIN_N)

    # ANOVA summary, computed once and reused below
    anova_resum <- summary(anova_result)
    # Stored under a new name so the eta_squared() function is not shadowed
    eta_result <- eta_squared(anova_result)

    # Extract the ANOVA statistics
    f_value     <- anova_resum[[1]]$`F value`[1]
    df_between  <- anova_resum[[1]]$Df[1]
    df_residual <- anova_resum[[1]]$Df[2]
    p_value     <- anova_resum[[1]]$`Pr(>F)`[1]

    # Eta squared
    eta_sq <- eta_result$Eta2

    # Interpretation sentence (in English)
    if (p_value < 0.05) {
      interpretation <- sprintf(
        "The ANOVA revealed statistically significant differences in %s between categories (F(%d,%d) = %.3f, p = %.3f).",
        nombre_columna, df_between, df_residual, f_value, p_value
      )
    } else {
      interpretation <- sprintf(
        "The ANOVA did not reveal statistically significant differences in %s between categories (F(%d,%d) = %.3f, p = %.3f).",
        nombre_columna, df_between, df_residual, f_value, p_value
      )
    }

    # Label the magnitude of eta squared
    interpret_eta_sq <- function(eta_sq) {
      if (eta_sq < 0.01) {
        magnitude <- "negligible"
      } else if (eta_sq < 0.06) {
        magnitude <- "small"
      } else if (eta_sq < 0.14) {
        magnitude <- "moderate"
      } else {
        magnitude <- "large"
      }
      return(magnitude)
    }

    magnitude <- interpret_eta_sq(eta_sq)
    effect_interpretation <- sprintf(
      "The effect size was %s (η² = %.3f), indicating that %.1f%% of the variability in %s is explained by category.",
      magnitude, eta_sq, eta_sq * 100, nombre_columna
    )

    resultados <- c(
      media_por_categoria$media[1], media_por_categoria$media[2],
      media_por_categoria$media[3], round(f_value, 3), round(p_value, 3),
      "Eta^2", round(eta_sq, 3), effect_interpretation
    )
  } else {
    # No normality: Kruskal-Wallis test
    kruskal_result <- kruskal.test(CICLOMOUNTAIN_N[[nombre_columna]] ~ `Age Category`,
                                   data = CICLOMOUNTAIN_N)

    # Kruskal-Wallis statistics
    H_value    <- kruskal_result$statistic
    df_kruskal <- length(unique(CICLOMOUNTAIN_N$`Age Category`)) - 1
    p_value    <- kruskal_result$p.value

    # Effect size for Kruskal-Wallis. Note that the usual epsilon squared is
    # H / (n - 1); this version subtracts the degrees of freedom from H first.
    n_total    <- nrow(CICLOMOUNTAIN_N)
    epsilon_sq <- (H_value - df_kruskal) / (n_total - 1)
    # Make sure the estimate is not negative
    epsilon_sq <- max(0, epsilon_sq)

    medianas_por_categoria <- CICLOMOUNTAIN_N %>%
      group_by(`Age Category`) %>%
      summarise(mediana = calcular_mediana_iqr(pull(cur_data(), i)))

    # Interpretation sentence (in English)
    if (p_value < 0.05) {
      interpretation <- sprintf(
        "The Kruskal-Wallis test revealed statistically significant differences in %s between categories (H(%d) = %.3f, p = %.3f).",
        nombre_columna, df_kruskal, H_value, p_value
      )
    } else {
      interpretation <- sprintf(
        "The Kruskal-Wallis test did not reveal statistically significant differences in %s between categories (H(%d) = %.3f, p = %.3f).",
        nombre_columna, df_kruskal, H_value, p_value
      )
    }

    # Label the magnitude of epsilon squared (same cut-offs as eta squared)
    interpret_epsilon_sq <- function(epsilon_sq) {
      if (epsilon_sq < 0.01) {
        magnitude <- "negligible"
      } else if (epsilon_sq < 0.06) {
        magnitude <- "small"
      } else if (epsilon_sq < 0.14) {
        magnitude <- "moderate"
      } else {
        magnitude <- "large"
      }
      return(magnitude)
    }

    magnitude <- interpret_epsilon_sq(epsilon_sq)
    effect_interpretation <- sprintf(
      "The effect size was %s (ε² = %.3f), indicating that %.1f%% of the variability in %s is explained by category.",
      magnitude, epsilon_sq, epsilon_sq * 100, nombre_columna
    )

    # Results row with medians instead of means
    resultados <- c(
      medianas_por_categoria$mediana[1], medianas_por_categoria$mediana[2],
      medianas_por_categoria$mediana[3], round(H_value, 3), round(p_value, 3),
      "e^2", round(epsilon_sq, 3), effect_interpretation
    )
  }

  #length(resultados)
  #tabla_resultados[i-5, 2]
  results_list[[tabla_resultados[i, 2]]] <- resultados
}

# Bind the per-variable results into one table
combined_table <- do.call(rbind, results_list)

# Data frame for export
combined_table_processed <- data.frame(combined_table)
Variablesfull <- row.names(combined_table_processed)
combined_table_processed <- cbind.data.frame(tests, Variablesfull, combined_table_processed)
table(CICLOMOUNTAIN_$`Age Category`)
colnames(combined_table_processed) <- c(
  "Test", "Variable", "15-16(IC ó Med-RIC)", "17-19(IC ó Med-RIC)",
  "20-25(IC ó Med-RIC)", "Estadistico", "p", "Tipo de Efecto",
  "valor efecto", "Interpretation"
)

datatable(
  combined_table_processed,
  rownames = FALSE,
  extensions = c("Buttons", "Scroller"),
  options = list(
    scrollX = TRUE,
    scrollY = "500px",
    scroller = TRUE,
    dom = "Bfrtip",
    buttons = c("copy", "csv", "excel")
  )
)
```

## **Data treatment**

To avoid overlapping in the final plots, Kendall correlations and Shapiro-Wilk normality tests were run to flag highly correlated variables and to check normality, respectively. The procedure was applied group by group across three main categories (children, pre-youth and youth). The variables retained from the attack-variable category are presented below.
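The filtering idea described above can be illustrated on a small, self-contained example. This is only a sketch under stated assumptions: it uses the built-in `iris` data instead of `CICLOMOUNTAIN_`, and the names `datos`, `umbral_demo`, `vars_correlacionadas` and `vars_retenidas` are hypothetical and do not appear in the report.

```r
# Illustrative data (assumption): numeric columns of the built-in iris data set
datos <- iris[, 1:4]

# Shapiro-Wilk p-value per variable; p > 0.05 means normality is not rejected
p_normalidad <- sapply(datos, function(x) shapiro.test(x)$p.value)
cumple_normalidad <- p_normalidad > 0.05

# Kendall correlation matrix and a hypothetical threshold
umbral_demo <- 0.5
tau <- cor(datos, method = "kendall")

# Off-diagonal cells whose absolute correlation exceeds the threshold
alta <- which(abs(tau) > umbral_demo & row(tau) != col(tau), arr.ind = TRUE)
vars_correlacionadas <- unique(rownames(tau)[alta[, "row"]])

# Variables kept after the filter: those not involved in any high correlation
vars_retenidas <- setdiff(colnames(datos), vars_correlacionadas)
print(vars_retenidas)
```

Like the chunk that follows, this sketch drops both members of every highly correlated pair. Keeping one representative per pair would be a different design choice and is not what the report does.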
```{r, echo=FALSE, include=FALSE}
library(tidyverse)  # attaches dplyr, which is used throughout this chunk

resultados_variables_list <- list()
tabla_final_lista <- list()
Body_composition_1 <- Body_composition[1:10]
#variables[37:46]
calentamiento_1 <- c(variables[37:39], variables[41:46])

# Build the list of variable groups
lista_categorias <- list(
  basic[2:7], Body_composition_1, antropometrics, calentamiento_1, vt1,
  atprev, atpmax, vt2, rcp_prev, rcp_max, vo2_max_o_peak, vo2peak_prev,
  previsto, valores_absolutos_maximos
)
guardar_datos_baja_cor_diferenciados <- list()

# Only the first nine groups are processed
for (num_categoria in 1:9) {
  # [[ ]] extracts the element itself; [ ] would return a list
  sublista_ <- lista_categorias[[num_categoria]]
  columnas_deseadas <- c(sublista_)

  # Column indices of the selected variables
  indices_columnas <- which(colnames(CICLOMOUNTAIN_) %in% columnas_deseadas)

  # Extract the columns and prepend the age category
  analisis_pca1_2 <- CICLOMOUNTAIN_[indices_columnas]
  analisis_pca1_2 <- cbind.data.frame(
    "Age Category" = CICLOMOUNTAIN_N$`Age Category`,
    analisis_pca1_2
  )

  # Numeric columns only (the categorical age column is excluded)
  numeric_vars <- analisis_pca1_2 %>% select_if(is.numeric)

  # Shapiro-Wilk test for each numeric variable
  normality_tests <- sapply(numeric_vars, function(x) {
    if (sum(!is.na(x)) > 3) {
      # Only run the test with at least 4 non-NA observations
      shapiro.test(x)$p.value
    } else {
      NA
    }
  })

  # Data frame with the normality results
  normality_results <- data.frame(
    Variable = names(numeric_vars),
    P_Valor = normality_tests,
    Cumple_Normalidad = ifelse(normality_tests > 0.05, "Cumple", "No Cumple"),
    stringsAsFactors = FALSE
  )
  print(normality_results)

  # Variables that meet the normality assumption
  vars_cumplen <- normality_results %>%
    filter(Cumple_Normalidad == "Cumple") %>%
    pull(Variable)

  # Variables that do NOT meet the normality assumption
  vars_no_cumplen <- normality_results %>%
    filter(Cumple_Normalidad == "No Cumple") %>%
    pull(Variable)

  # Subset with the variables that meet the assumption (keeps Age Category)
  if (length(vars_cumplen) > 0) {
    analisis_pca_cumple <- analisis_pca1_2 %>%
      select(any_of(c("Age Category", vars_cumplen)))
  } else {
    # check.names = FALSE keeps the column name "Age Category" unchanged
    analisis_pca_cumple <- data.frame(
      "Age Category" = analisis_pca1_2$`Age Category`,
      check.names = FALSE
    )
  }

  # Subset with the variables that do NOT meet the assumption (keeps Age Category)
  if (length(vars_no_cumplen) > 0) {
    analisis_pca_no_cumple <- analisis_pca1_2 %>%
      select(any_of(c("Age Category", vars_no_cumplen)))
  } else {
    # No such variables: keep only the age category
    analisis_pca_no_cumple <- data.frame(
      "Age Category" = analisis_pca1_2$`Age Category`,
      check.names = FALSE
    )
  }

  print("Subset with variables that meet the normality assumption:")
  print(head(analisis_pca_cumple))
  print("Subset with variables that do NOT meet the normality assumption:")
  print(head(analisis_pca_no_cumple))

  # Kendall correlation matrix within the group
  data_corr <- analisis_pca1_2[, 2:length(colnames(analisis_pca1_2))]
  matriz_cor <- cor(data_corr, method = "kendall")

  set.seed(123)
  umbral <- 0.5

  # Pairs whose absolute correlation exceeds the threshold
  cor_df <- as.data.frame(as.table(matriz_cor)) %>%
    filter(Var1 != Var2) %>%
    mutate(abs_cor = abs(Freq)) %>%
    filter(abs_cor > umbral) %>%
    select(Var1, Var2, abs_cor)

  vars_alta_cor <- unique(c(as.character(cor_df$Var1), as.character(cor_df$Var2)))

  # Low-correlation group: drop every variable involved in a high correlation
  datos_baja_cor1 <- analisis_pca1_2[, !colnames(analisis_pca1_2) %in% vars_alta_cor]
  guardar_datos_baja_cor_diferenciados[[num_categoria]] <- datos_baja_cor1
  tabla_final_lista <- guardar_datos_baja_cor_diferenciados[[num_categoria]]
}

combined_table_corr <- do.call(cbind, guardar_datos_baja_cor_diferenciados)
combined_table_corr_A <- cbind.data.frame(
  combined_table_corr[1:5], combined_table_corr[9:11],
  combined_table_corr[13:14], combined_table_corr[16], combined_table_corr[18]
)

data_corr_f <- combined_table_corr_A[, 2:length(colnames(combined_table_corr_A))]
matriz_cor_f <- cor(data_corr_f, method = "kendall")

set.seed(123)
umbral <- 0.1

cor_df_fin <- as.data.frame(as.table(matriz_cor_f)) %>%
  filter(Var1 != Var2) %>%
  mutate(abs_cor = abs(Freq)) %>%
  filter(abs_cor > umbral) %>%
  select(Var1, Var2, abs_cor)
cor_df_fin

vars_alta_cor_f <- unique(c(as.character(cor_df_fin$Var1), as.character(cor_df_fin$Var2)))
vars_alta_cor_f

# Note: this filter uses vars_alta_cor (left over from the last loop iteration),
# not vars_alta_cor_f computed just above
datos_baja_cor1 <- combined_table_corr_A[, !colnames(combined_table_corr_A) %in% vars_alta_cor]
datos_baja_cor1

variables_categorias_df <- data.frame(
  Variable = character(),
  Categoria = character(),
  stringsAsFactors = FALSE
)

lista_categorias
# Label each group (labels are taken from the list itself, as in the original script)
names(lista_categorias) <- lista_categorias

# For every retained variable, find the first group that contains it
for (var in colnames(datos_baja_cor1)) {
  for (categoria in names(lista_categorias)) {
    if (var %in% lista_categorias[[categoria]]) {
      variables_categorias_df <- rbind(
        variables_categorias_df,
        data.frame(Variable = var, Categoria = categoria, stringsAsFactors = FALSE)
      )
      break  # stop once the group is found
    }
  }
}
variables_categorias_df

# Sum of absolute correlations for each variable in matriz_cor_f
cor_sums <- rowSums(abs(matriz_cor_f), na.rm = TRUE)

# Data frame with the variables and their correlation score
cor_importance <- data.frame(
  Variable = names(cor_sums),
  Cor_Sum = cor_sums,
  stringsAsFactors = FALSE
)

# Keep the retained variables, ranked by correlation score
variables_importance <- cor_importance %>%
  filter(Variable %in% variables_categorias_df$Variable) %>%
  arrange(desc(Cor_Sum)) %>%  # descending correlation score
  slice_head(n = 10)          # top 10 variables
```

```{r, echo=FALSE}
analisis_pca1_gen <- analisis_pca1_2
data_corr_gen <- analisis_pca1_gen[, 2:length(colnames(analisis_pca1_gen))]
matriz_cor_gen <- cor(data_corr_gen, method = "kendall")

pheatmap(
  matriz_cor_gen,
  cluster_rows = FALSE,     # no clustering/dendrogram on rows
  cluster_cols = FALSE,
  fontsize_row = 4,         # row label size
  fontsize_col = 4,         # column label size
  display_numbers = FALSE,  # correlation values are not printed
  fontsize_number = 2,
  number_format = "%.2f",   # two decimals
  number_color = "black",
  na_col = "white",         # colour for NA cells
  main = "General correlation matrix"
)

datatable(variables_importance, caption = "Variables importance")
```

## Principal component analysis

We then run the analysis on the variables by category:

```{r, echo=FALSE}
analisis_pca0 <- cbind.data.frame(CICLOMOUNTAIN_[133], CICLOMOUNTAIN_N)
#colnames(analisis_pca0)
columnas_deseadas <- c(unique(variables_importance$Variable))

# Column indices of the selected variables
indices_columnas <- which(colnames(CICLOMOUNTAIN_) %in% columnas_deseadas)
#indices_columnas

# Age category plus the ten selected variables
analisis_pca <- cbind(analisis_pca0[1], datos_baja_cor1[, c(variables_importance$Variable)])
analisis_pca <- na.omit(analisis_pca)

result <- PCA(analisis_pca[2:11], graph = FALSE, scale.unit = FALSE)
#result$var
#plot(result,choix="var")
analisis_pca_M <- as.matrix(analisis_pca[2:length(colnames(analisis_pca))])

# Standardize the numeric variables
brand.sc <- analisis_pca
brand.sc[, 2:length(colnames(analisis_pca))] <-
  data.frame(scale(analisis_pca[, 2:length(colnames(analisis_pca))]))
#summary(brand.sc)

corrplot::corrplot(
  cor(brand.sc[, 2:length(colnames(analisis_pca))], method = "kendall"),
  order = "hclust"
)

# Mean of each standardized variable per age category
brand.mean <- aggregate(. ~ `Age Category`, data = brand.sc, mean)
rownames(brand.mean) <- brand.mean[, 1]  # age category as row names
brand.mean <- brand.mean[, -1]           # drop the age category column

par(mar = c(1, 1, 1, 1))
pheatmap(
  as.matrix(brand.mean),
  cluster_rows = FALSE,    # no clustering/dendrogram on rows
  cluster_cols = FALSE,
  fontsize_row = 4,        # row label size
  fontsize_col = 4,        # column label size
  display_numbers = TRUE,  # print the cell values
  fontsize_number = 2,
  number_format = "%.2f",  # two decimals
  number_color = "black",
  na_col = "white",        # colour for NA cells
  # The cells are group means of standardized variables, not correlations
  main = "Mean standardized values by age category"
)
```

We performed a principal component analysis to establish the relationship between the age category and the different variables. The rotation (loadings) matrix below indicates how much each original variable contributes to each principal component. The loadings range from -1 to 1, and larger absolute values indicate a greater contribution of the variable to the component.

```{r, echo=FALSE}
data_pca_scaled <- scale(select(analisis_pca, -`Age Category`))

# Run the PCA
pca_result <- prcomp(data_pca_scaled, center = TRUE, scale. = TRUE)

# Loadings table for the first two components
pca_result$rotation[, 1:2]
datatable(
  pca_result$rotation[, 1:2],
  rownames = TRUE,  # keep the variable names next to their loadings
  extensions = c("Buttons", "Scroller"),
  options = list(
    scrollX = TRUE,
    scrollY = "500px",
    scroller = TRUE,
    dom = "Bfrtip",
    buttons = c("copy", "csv", "excel")
  )
)
```

In the first two dimensions, the variables with the largest absolute loadings on PC1 are all negative and relate to respiratory capacity/ventilatory efficiency:

- RR (/min)_AT % Pred: -0.536 (highest contribution)
- RR (/min)_VT1: -0.511
- RR (/min)_Warm-up: -0.464
- RR (/min)_AT % max: -0.409

Because these loadings are negative, the relationship is inverse: cyclists with high PC1 scores have lower respiratory rates at all thresholds. In other words, the better-adapted athletes, with higher ventilatory efficiency, need fewer breaths per minute.

PC2 has large loadings of both signs and contrasts biometric and anthropometric characteristics (height vs. wingspan) with the response to exercise.

Variables with the highest positive weight:

- VE/VCO2_Warm-up: 0.484
- Height (cm)_Basic: 0.280
- RR (/min)_AT % max: 0.264

Variables with the highest negative weight:

- Wingspan (cm)_Basic: -0.404
- AGE (years)_Basic: -0.400
- HR (/min)_Warm-up: -0.383

On this dimension, the positive side combines greater height with a higher VE/VCO2 ratio (efficiency of gas exchange), and the negative side combines greater wingspan, older age and a higher heart rate during warm-up. Higher values on this axis therefore indicate good ventilatory efficiency but greater respiratory demand at maximum intensities, while negative values indicate greater experience (age), a broader physical build (wingspan) and greater cardiovascular stress during warm-up.

Variables with a low contribution to both components (such as Sitting Height) could be considered less relevant for differentiating between cyclists.
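The way loading signs are read above can be checked on a small example. The sketch below is illustrative only: it assumes the built-in `USArrests` data rather than the cycling data, and the names `pca_demo`, `cargas`, `puntajes` and `var_top` are hypothetical. It shows that the scores are the standardized data projected onto the loadings, that each loading column has unit length (so every loading lies between -1 and 1), and that a variable with a negative loading is negatively correlated with the component scores.

```r
# Illustrative data (assumption): the built-in USArrests data set
pca_demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)

cargas <- pca_demo$rotation  # loadings: one column per component
z <- scale(USArrests)        # standardized variables

# Scores are the standardized data projected onto the loadings
puntajes <- z %*% cargas

# Each loading column has unit length, so every loading lies in [-1, 1]
longitudes <- colSums(cargas^2)

# Variable with the largest absolute loading on PC1, the sign of that loading,
# and the sign of its correlation with the PC1 scores
var_top <- names(which.max(abs(cargas[, "PC1"])))
signo_carga <- sign(cargas[var_top, "PC1"])
signo_cor <- sign(cor(z[, var_top], puntajes[, "PC1"]))

print(round(cargas[, 1:2], 3))
```

Since the sign of a loading always matches the sign of the correlation between the variable and the component scores, a negative loading on a respiratory-rate variable means that athletes with high scores on that component tend to have lower values of that variable.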
```{r, echo=FALSE}
brand.pc <- prcomp(brand.sc[, 2:length(colnames(analisis_pca))])
#summary(brand.pc)

# Variance explained by the first two components
importance_matrix <- summary(brand.pc)$importance
pc12_only <- importance_matrix[, 1:2]
print(pc12_only)

# Scree plot
par(mar = c(1, 1, 1, 1))
plot(brand.pc, type = "l")
```

The scree plot shows that the first two dimensions together account for about 52% of the variance (51.72%): dimension one explains 31.15% and dimension two 20.57%.

## Graphical analysis

Based on these dimensions, and given that high PC1 scores correspond to lower respiratory rates, the four quadrants are interpreted as follows:

- Upper-left corner (PC1-, PC2+): low respiratory efficiency + good gas exchange + tall height
- Upper-right corner (PC1+, PC2+): high respiratory efficiency + good gas exchange + tall height
- Lower-left corner (PC1-, PC2-): low respiratory efficiency + older age/wingspan + higher HR during warm-up
- Lower-right corner (PC1+, PC2-): high respiratory efficiency + older age/wingspan + higher HR during warm-up

```{r, echo=FALSE}
#summary(brand.mu.pc)
# Convert the PCA scores to a data frame for plotting
pca_df <- as.data.frame(pca_result$x)
pca_df$cat <- analisis_pca$`Age Category`  # attach the age category

# Plot the first two components with ggplot2
library(ggplot2)
pca_pos <- ggplot(pca_df, aes(x = PC1, y = PC2, color = cat)) +
  geom_point(alpha = 0.7) +
  theme_minimal() +
  labs(
    title = "PCA for category",
    x = "Principal Component 1",
    y = "Principal Component 2"
  )
pca_pos
```

Wingspan (cm)_Basic, RR and Age fall in the third quadrant, which associates them with low respiratory efficiency and higher HR during warm-up. Height and VE/VCO2 fall together in the fourth quadrant, which associates them with high respiratory efficiency, higher HR and older age. Beyond that, no group is clearly associated with a particular set of variables, which suggests that the measurements vary from athlete to athlete and tend to be independent.

```{r, echo=FALSE}
#brand.mean<-brand.mean[1:10,1:10]
# PCA on the per-category means of the standardized variables
brand.mu.pc <- prcomp(brand.mean, scale = TRUE)
#summary(brand.mu.pc)
#brand.mu.pc$center
par(mar = c(1, 1, 1, 1))
biplot(brand.mu.pc, main = "Ciclomountain ", cex = c(0.45, 0.45))
```

In this plot, the variables with the longest vectors are the best represented: the respiratory variables (RR), which dominate PC1, and the biometric variables, which dominate PC2.
```{r, echo=FALSE}
fviz_pca_var(
  pca_result,
  col.var = "cos2",
  geom.var = "arrow",
  labelsize = 2,
  repel = FALSE
)
```
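The `cos2` colouring in the variable plot measures how well each variable is represented on the plotted components. As a rough sketch, assuming the usual definition for a PCA on standardized variables (variable coordinates equal loadings times the component standard deviations, and cos2 is the square of those coordinates), the quantity can be computed by hand. The example uses the built-in `USArrests` data, and the names `pca_demo`, `coord_var`, `cos2_plano` and `longitud_flecha` are hypothetical.

```r
# Illustrative data (assumption): the built-in USArrests data set
pca_demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Variable coordinates: loadings scaled by the component standard deviations.
# For standardized data these equal the variable-component correlations.
coord_var <- sweep(pca_demo$rotation, 2, pca_demo$sdev, `*`)

# cos2: squared coordinates; over all components they sum to 1 per variable
cos2 <- coord_var^2

# Quality of representation on the PC1-PC2 plane, and the matching arrow length
cos2_plano <- rowSums(cos2[, 1:2])
longitud_flecha <- sqrt(cos2_plano)

print(round(cbind(cos2_plano, longitud_flecha), 3))
```

Under this definition, a variable whose arrow nearly reaches the unit circle is almost fully described by the first two components, while a short arrow means that most of its variance lies on later components.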