#Cálculo de V(X) y R(X)
A efectos prácticos, sólo se requiere que todas las variables sean métricas (variables numéricas), pero para los propósitos de elaboración de indicadores, se verificará también el grado de multicolinealidad de la batería de indicadores (variables dentro del dataframe), se denotará en todo momento como X, o matriz de información, y X¯ el vector de medias
Se puede trabajar sobre la matriz de varianza-covarianza de X: V(x)=(X−X¯)t⋅(X−X¯)n−1 o sobre la matriz de correlación de X: R(X)=ZtX⋅ZXn−1 donde ZX es la matriz de datos estandarizada (distribución normal: zxj=xj−xj¯σxj)
##Ejemplo de Cálculo de V(X) y R(X) en R
#Para ilustrar el cálculo se usará el siguiente ejemplo:Adaptado de Moreno, A. B., & Chávez, A. G. (2007). 100 problemas resueltos de estadística multivariante. Delta Publicaciones. ver texto
#En el archivo datos_prob_7_3.txt, se puede ver una lista de variables que caracterizan el grado de desarrollo de algunos países del mundo.
#Las variables son:
#X1 = Tasa de mortalidad infantil por cada 1000 nacidos vivos,
#X2 = Porcentaje de mujeres en la población activa,
#X3 = Producto Nacional Bruto (PNB) per capita en 1995 (en $),
#X4 = Producción de electricidad (en millones de kw/h),
#X5 = Promedio de líneas telefónicas por cada 1000 habitantes,
#X6 = Consumo de agua per capita en m3 (de 1980 a 1995),
#X7 = Consumo de energía per capita en 1994, X8 = Emisión de CO2 per capita en 1992 (en Tm).
#*Nota por el momento no se normalizarán las variables en el esquema 0-1, que se vio en el apartado introductorio.
####Matriz de Información: X
library(readr)
library(kableExtra)
url_link<-"http://halweb.uc3m.es/esp/Personal/personas/agrane/libro/ficheros_datos/capitulo_7/datos_prob_7_3.txt"
mat_X<-read_table2(url_link,col_names = FALSE)
## Warning: `read_table2()` was deprecated in readr 2.0.0.
## Please use `read_table()` instead.
##
## -- Column specification --------------------------------------------------------
## cols(
## X1 = col_double(),
## X2 = col_double(),
## X3 = col_double(),
## X4 = col_double(),
## X5 = col_double(),
## X6 = col_double(),
## X7 = col_double(),
## X8 = col_double()
## )
mat_X %>% head() %>%
kable(caption ="Matriz de información:" ,align = "c",digits = 6) %>%
kable_material(html_font = "sans-serif")
| X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 |
|---|---|---|---|---|---|---|---|
| 30 | 41 | 670 | 3903 | 12 | 94 | 341 | 1.2 |
| 124 | 46 | 410 | 955 | 6 | 57 | 89 | 0.5 |
| 95 | 48 | 370 | 6 | 5 | 26 | 20 | 0.1 |
| 90 | 43 | 680 | 435 | 8 | 20 | 331 | 1.6 |
| 112 | 41 | 100 | 1293 | 2 | 51 | 22 | 0.1 |
| 73 | 51 | 390 | 6115 | 4 | 35 | 93 | 0.2 |
##Cálculo de V(X): ###Cálculo “Manual”
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:kableExtra':
##
## group_rows
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(kableExtra)
centrado<-function(x){
x-mean(x)
}
Xcentrada<-apply(X = mat_X,MARGIN = 2,centrado)
Xcentrada %>% head() %>%
kable(caption ="Matriz de Variables centradas:",
align = "c",
digits = 2) %>%
kable_material(html_font = "sans-serif")
| X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 |
|---|---|---|---|---|---|---|---|
| -49.67 | -0.67 | 303.89 | 1463.5 | 2.94 | -60.17 | 196.72 | 0.71 |
| 44.33 | 4.33 | 43.89 | -1484.5 | -3.06 | -97.17 | -55.28 | 0.01 |
| 15.33 | 6.33 | 3.89 | -2433.5 | -4.06 | -128.17 | -124.28 | -0.39 |
| 10.33 | 1.33 | 313.89 | -2004.5 | -1.06 | -134.17 | 186.72 | 1.11 |
| 32.33 | -0.67 | -266.11 | -1146.5 | -7.06 | -103.17 | -122.28 | -0.39 |
| -6.67 | 9.33 | 23.89 | 3675.5 | -5.06 | -119.17 | -51.28 | -0.29 |
#*-------------------------------------------------------------------------------
n_obs<-nrow(mat_X)
mat_V<-t(Xcentrada)%*%Xcentrada/(n_obs-1)
mat_V %>% kable(caption ="Cálculo de V(X) forma manual:" ,
align = "c",
digits = 2) %>%
kable_material(html_font = "sans-serif") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | |
|---|---|---|---|---|---|---|---|---|
| X1 | 716.12 | 45.06 | -2689.61 | -16082.06 | -121.63 | -1019.06 | -1844.37 | -5.15 |
| X2 | 45.06 | 46.94 | -144.31 | 2756.71 | -24.63 | -938.41 | -205.25 | -0.42 |
| X3 | -2689.61 | -144.31 | 36389.87 | 123889.71 | 740.82 | 838.33 | 17499.38 | 73.48 |
| X4 | -16082.06 | 2756.71 | 123889.71 | 5736372.38 | 3078.97 | 6672.44 | 140343.50 | 412.79 |
| X5 | -121.63 | -24.63 | 740.82 | 3078.97 | 51.47 | 405.58 | 565.22 | 1.59 |
| X6 | -1019.06 | -938.41 | 838.33 | 6672.44 | 405.58 | 26579.56 | 3149.77 | -2.96 |
| X7 | -1844.37 | -205.25 | 17499.38 | 140343.50 | 565.22 | 3149.77 | 16879.39 | 64.51 |
| X8 | -5.15 | -0.42 | 73.48 | 412.79 | 1.59 | -2.96 | 64.51 | 0.28 |
###Cálculo con R base
library(dplyr)
library(kableExtra)
cov(mat_X) %>%
kable(caption="Cálculo de V(X) a través de R base",
align = "c",
digits = 2) %>%
kable_material(html_font = "sans-serif") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | |
|---|---|---|---|---|---|---|---|---|
| X1 | 716.12 | 45.06 | -2689.61 | -16082.06 | -121.63 | -1019.06 | -1844.37 | -5.15 |
| X2 | 45.06 | 46.94 | -144.31 | 2756.71 | -24.63 | -938.41 | -205.25 | -0.42 |
| X3 | -2689.61 | -144.31 | 36389.87 | 123889.71 | 740.82 | 838.33 | 17499.38 | 73.48 |
| X4 | -16082.06 | 2756.71 | 123889.71 | 5736372.38 | 3078.97 | 6672.44 | 140343.50 | 412.79 |
| X5 | -121.63 | -24.63 | 740.82 | 3078.97 | 51.47 | 405.58 | 565.22 | 1.59 |
| X6 | -1019.06 | -938.41 | 838.33 | 6672.44 | 405.58 | 26579.56 | 3149.77 | -2.96 |
| X7 | -1844.37 | -205.25 | 17499.38 | 140343.50 | 565.22 | 3149.77 | 16879.39 | 64.51 |
| X8 | -5.15 | -0.42 | 73.48 | 412.79 | 1.59 | -2.96 | 64.51 | 0.28 |
##Cálculo de R(X) ###Cálculo “Manual”
Zx<-scale(x = mat_X,center =TRUE)
Zx %>% head() %>%
kable(caption ="Matriz de Variables Estandarizadas:",
align = "c",
digits = 2) %>%
kable_material(html_font = "sans-serif")
| X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 |
|---|---|---|---|---|---|---|---|
| -1.86 | -0.10 | 1.59 | 0.61 | 0.41 | -0.37 | 1.51 | 1.34 |
| 1.66 | 0.63 | 0.23 | -0.62 | -0.43 | -0.60 | -0.43 | 0.02 |
| 0.57 | 0.92 | 0.02 | -1.02 | -0.57 | -0.79 | -0.96 | -0.73 |
| 0.39 | 0.19 | 1.65 | -0.84 | -0.15 | -0.82 | 1.44 | 2.09 |
| 1.21 | -0.10 | -1.39 | -0.48 | -0.98 | -0.63 | -0.94 | -0.73 |
| -0.25 | 1.36 | 0.13 | 1.53 | -0.70 | -0.73 | -0.39 | -0.54 |
#*-------------------------------------------------------------------------------
n_obs<-nrow(mat_X)
mat_R<-t(Zx)%*%Zx/(n_obs-1)
mat_R %>% kable(caption ="Cálculo de R(X) forma manual:" ,
align = "c",
digits = 2) %>%
kable_material(html_font = "sans-serif") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | |
|---|---|---|---|---|---|---|---|---|
| X1 | 1.00 | 0.25 | -0.53 | -0.25 | -0.63 | -0.23 | -0.53 | -0.36 |
| X2 | 0.25 | 1.00 | -0.11 | 0.17 | -0.50 | -0.84 | -0.23 | -0.12 |
| X3 | -0.53 | -0.11 | 1.00 | 0.27 | 0.54 | 0.03 | 0.71 | 0.73 |
| X4 | -0.25 | 0.17 | 0.27 | 1.00 | 0.18 | 0.02 | 0.45 | 0.32 |
| X5 | -0.63 | -0.50 | 0.54 | 0.18 | 1.00 | 0.35 | 0.61 | 0.42 |
| X6 | -0.23 | -0.84 | 0.03 | 0.02 | 0.35 | 1.00 | 0.15 | -0.03 |
| X7 | -0.53 | -0.23 | 0.71 | 0.45 | 0.61 | 0.15 | 1.00 | 0.93 |
| X8 | -0.36 | -0.12 | 0.73 | 0.32 | 0.42 | -0.03 | 0.93 | 1.00 |
##Cálculo usando R base
library(dplyr)
library(kableExtra)
cor(mat_X) %>%
kable(caption="Cálculo de R(X) a través de R base",
align = "c",
digits = 2) %>%
kable_material(html_font = "sans-serif") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | |
|---|---|---|---|---|---|---|---|---|
| X1 | 1.00 | 0.25 | -0.53 | -0.25 | -0.63 | -0.23 | -0.53 | -0.36 |
| X2 | 0.25 | 1.00 | -0.11 | 0.17 | -0.50 | -0.84 | -0.23 | -0.12 |
| X3 | -0.53 | -0.11 | 1.00 | 0.27 | 0.54 | 0.03 | 0.71 | 0.73 |
| X4 | -0.25 | 0.17 | 0.27 | 1.00 | 0.18 | 0.02 | 0.45 | 0.32 |
| X5 | -0.63 | -0.50 | 0.54 | 0.18 | 1.00 | 0.35 | 0.61 | 0.42 |
| X6 | -0.23 | -0.84 | 0.03 | 0.02 | 0.35 | 1.00 | 0.15 | -0.03 |
| X7 | -0.53 | -0.23 | 0.71 | 0.45 | 0.61 | 0.15 | 1.00 | 0.93 |
| X8 | -0.36 | -0.12 | 0.73 | 0.32 | 0.42 | -0.03 | 0.93 | 1.00 |
###Versiones gráficas de R(X)
Pueden ser especialmente útiles en la presentación de reportes, además de proveer una vista rápida de algunas características propias de la correlación entre las variables.
####Usando el paquete PerformanceAnalytics
library(PerformanceAnalytics)
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
##
## first, last
##
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
##
## legend
chart.Correlation(as.matrix(mat_X),histogram = TRUE,pch=12)
####Usando el paquete corrplot más sobre el paquete corrplot
library(corrplot)
## corrplot 0.92 loaded
library(grDevices)
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
##
## src, summarize
## The following objects are masked from 'package:base':
##
## format.pval, units
Mat_R<-rcorr(as.matrix(mat_X))
corrplot(Mat_R$r,
p.mat = Mat_R$r,
type="upper",
tl.col="black",
tl.srt = 20,
pch.col = "blue",
insig = "p-value",
sig.level = -1,
col = terrain.colors(100))
##Extracción de los Componentes La Primera componente Principal, se expreza como la combinación lineal de las variables originales C1j=a11⋅x11+a12⋅x12+a13⋅x13+…+a1n⋅x1n Matricialmente se tiene que:
C1=X⋅a Sí la matriz de información se encuentra normalizada, entonces: C1=Z⋅a Entonces la varianza de C1 se puede escribir como: V(C1)=∑nj=1C21n−1 Matricialmente se tiene que:
V(C1)=C1t⋅C1n−1=(Z⋅a)t⋅(Z⋅a)n−1=at⋅Zt⋅Z⋅an−1=at⋅R(X)⋅a Deben elegirse los pesos a, de manera tal que V(C1) sea máxima, y a debe ser de longitud unitaria, es decir de norma 1. Por lo tanto el problema se reduce a: Max at⋅R(X)⋅a s.a at⋅a=1 Aplicando la técnica de los multiplicadores de Lagrange:
L=at⋅R(X)⋅a−λ⋅(at⋅a−1) Aplicando la condición de primer orden: ∂L∂a=2⋅R(x)⋅a−λ⋅2⋅a=0 ∂L∂λ=at⋅a−1=0
Lo que implica que:
[R(x)−λ⋅I]⋅a=0 at⋅a=1 Es decir encontrando el máximo autovalor de R(X) y el respectivo autovector (valores de a), queda resuelto el problema de maximización.
De la misma manera se encuentran el resto de componentes.
En general para encontrar la totalidad de los componentes principales, se deberán calcular los autovalores de R(X) y los autovalores para cada uno de ellos.
###Ejemplo de extracción.
Cálculo “manual” de los componentes Siguiendo el ejemplo de los datos de “Desarrollo”, se encontraran todos los componentes de la batería de indicadores (dataframe):
library(kableExtra)
library(dplyr)
library(Hmisc)
Rx<-mat_X %>% as.matrix() %>% rcorr()
Rx$r %>% kable(caption="Matriz R(X)",
align = "c",
digits = 2) %>%
kable_material(html_font = "sans-serif") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | |
|---|---|---|---|---|---|---|---|---|
| X1 | 1.00 | 0.25 | -0.53 | -0.25 | -0.63 | -0.23 | -0.53 | -0.36 |
| X2 | 0.25 | 1.00 | -0.11 | 0.17 | -0.50 | -0.84 | -0.23 | -0.12 |
| X3 | -0.53 | -0.11 | 1.00 | 0.27 | 0.54 | 0.03 | 0.71 | 0.73 |
| X4 | -0.25 | 0.17 | 0.27 | 1.00 | 0.18 | 0.02 | 0.45 | 0.32 |
| X5 | -0.63 | -0.50 | 0.54 | 0.18 | 1.00 | 0.35 | 0.61 | 0.42 |
| X6 | -0.23 | -0.84 | 0.03 | 0.02 | 0.35 | 1.00 | 0.15 | -0.03 |
| X7 | -0.53 | -0.23 | 0.71 | 0.45 | 0.61 | 0.15 | 1.00 | 0.93 |
| X8 | -0.36 | -0.12 | 0.73 | 0.32 | 0.42 | -0.03 | 0.93 | 1.00 |
Rx$P %>% kable(caption="p-values de R(X)",
align = "c",
digits = 2) %>%
kable_classic_2(html_font = "sans-serif") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | |
|---|---|---|---|---|---|---|---|---|
| X1 | NA | 0.33 | 0.02 | 0.32 | 0.00 | 0.35 | 0.02 | 0.14 |
| X2 | 0.33 | NA | 0.66 | 0.51 | 0.03 | 0.00 | 0.36 | 0.65 |
| X3 | 0.02 | 0.66 | NA | 0.28 | 0.02 | 0.92 | 0.00 | 0.00 |
| X4 | 0.32 | 0.51 | 0.28 | NA | 0.48 | 0.95 | 0.06 | 0.19 |
| X5 | 0.00 | 0.03 | 0.02 | 0.48 | NA | 0.16 | 0.01 | 0.08 |
| X6 | 0.35 | 0.00 | 0.92 | 0.95 | 0.16 | NA | 0.56 | 0.89 |
| X7 | 0.02 | 0.36 | 0.00 | 0.06 | 0.01 | 0.56 | NA | 0.00 |
| X8 | 0.14 | 0.65 | 0.00 | 0.19 | 0.08 | 0.89 | 0.00 | NA |
####Descomposición de autovalores y autovectores
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
descomposicion<-eigen(Rx$r)
t(descomposicion$values) %>% kable(caption="Autovalores de R(X)",
align = "c",
digits = 2) %>%
kable_classic_2(html_font = "sans-serif") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| 3.75 | 1.93 | 0.84 | 0.72 | 0.34 | 0.31 | 0.1 | 0.01 |
descomposicion$vectors %>% kable(caption="Autovectores de R(X)",
align = "c",
digits = 2) %>%
kable_classic_2(html_font = "sans-serif") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| -0.37 | -0.05 | -0.03 | 0.71 | -0.42 | 0.41 | 0.12 | 0.06 |
| -0.22 | -0.61 | 0.05 | -0.24 | 0.05 | 0.01 | 0.70 | -0.15 |
| 0.41 | -0.20 | -0.27 | -0.02 | 0.36 | 0.75 | -0.05 | 0.11 |
| 0.22 | -0.29 | 0.88 | 0.04 | -0.08 | 0.16 | -0.23 | -0.06 |
| 0.41 | 0.18 | -0.09 | -0.35 | -0.75 | 0.18 | 0.19 | -0.18 |
| 0.18 | 0.61 | 0.30 | 0.21 | 0.32 | 0.09 | 0.57 | -0.18 |
| 0.47 | -0.17 | -0.01 | 0.28 | -0.09 | -0.34 | 0.26 | 0.69 |
| 0.41 | -0.27 | -0.21 | 0.45 | 0.04 | -0.29 | -0.07 | -0.65 |
####Cálculo usando R:
library(dplyr)
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(kableExtra)
library(stargazer)
library(ggplot2)
options(scipen = 99999)
PC<-princomp(x = mat_X,cor = TRUE,fix_sign = FALSE)
factoextra::get_eig(PC) %>% kable(caption="Resumen de PCA",
align = "c",
digits = 2) %>%
kable_material(html_font = "sans-serif") %>%
kable_styling(bootstrap_options = c("hover"))
| eigenvalue | variance.percent | cumulative.variance.percent | |
|---|---|---|---|
| Dim.1 | 3.75 | 46.92 | 46.92 |
| Dim.2 | 1.93 | 24.11 | 71.03 |
| Dim.3 | 0.84 | 10.45 | 81.48 |
| Dim.4 | 0.72 | 9.04 | 90.52 |
| Dim.5 | 0.34 | 4.26 | 94.77 |
| Dim.6 | 0.31 | 3.81 | 98.59 |
| Dim.7 | 0.10 | 1.24 | 99.83 |
| Dim.8 | 0.01 | 0.17 | 100.00 |
fviz_eig(PC,
choice = "eigenvalue",
barcolor = "red",
barfill = "red",
addlabels = TRUE,
)+labs(title = "Gráfico de Sedimentación",subtitle = "Usando princomp, con Autovalores")+
xlab(label = "Componentes")+
ylab(label = "Autovalores")+geom_hline(yintercept = 1)
fviz_eig(PC,
choice = "variance",
barcolor = "green",
barfill = "green",
addlabels = TRUE,
)+labs(title = "Gráfico de Sedimentación",
subtitle = "Usando princomp, con %Varianza Explicada")+
xlab(label = "Componentes")+
ylab(label = "%Varianza")
####Correlación de los componentes con las variables: rij=aj⋅λ−−√j
library(dplyr)
library(kableExtra)
raiz_lambda<-as.matrix(sqrt(descomposicion$values))
autovectores<-descomposicion$vectors
corr_componentes_coordenadas<-vector(mode = "list")
for(j in 1:8){raiz_lambda[j]*autovectores[,j]->corr_componentes_coordenadas[[j]]}
corr_componentes_coordenadas %>% bind_cols()->corr_componentes_coordenadas
## New names:
## * `` -> `...1`
## * `` -> `...2`
## * `` -> `...3`
## * `` -> `...4`
## * `` -> `...5`
## * `` -> `...6`
## * `` -> `...7`
## * `` -> `...8`
names(corr_componentes_coordenadas)<-paste0("Comp",1:8)
corr_componentes_coordenadas %>% as.data.frame() %>%
kable(caption="Correlación de X con las componentes",
align = "c",
digits = 2) %>%
kable_material(html_font = "sans-serif") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| Comp1 | Comp2 | Comp3 | Comp4 | Comp5 | Comp6 | Comp7 | Comp8 |
|---|---|---|---|---|---|---|---|
| -0.72 | -0.06 | -0.03 | 0.60 | -0.25 | 0.22 | 0.04 | 0.01 |
| -0.43 | -0.85 | 0.04 | -0.20 | 0.03 | 0.01 | 0.22 | -0.02 |
| 0.80 | -0.28 | -0.25 | -0.02 | 0.21 | 0.42 | -0.02 | 0.01 |
| 0.42 | -0.40 | 0.81 | 0.03 | -0.05 | 0.09 | -0.07 | -0.01 |
| 0.80 | 0.25 | -0.08 | -0.29 | -0.44 | 0.10 | 0.06 | -0.02 |
| 0.34 | 0.84 | 0.27 | 0.18 | 0.19 | 0.05 | 0.18 | -0.02 |
| 0.91 | -0.23 | -0.01 | 0.24 | -0.05 | -0.19 | 0.08 | 0.08 |
| 0.80 | -0.38 | -0.20 | 0.38 | 0.02 | -0.16 | -0.02 | -0.08 |
library(dplyr)
library(factoextra)
library(kableExtra)
variables_pca<-get_pca_var(PC)
variables_pca$coord%>%
kable(caption="Correlación de X con las componentes, usando factoextra",
align = "c",
digits = 2) %>%
kable_material(html_font = "sans-serif") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| Dim.1 | Dim.2 | Dim.3 | Dim.4 | Dim.5 | Dim.6 | Dim.7 | Dim.8 | |
|---|---|---|---|---|---|---|---|---|
| X1 | -0.72 | -0.06 | -0.03 | 0.60 | 0.25 | 0.22 | 0.04 | 0.01 |
| X2 | -0.43 | -0.85 | 0.04 | -0.20 | -0.03 | 0.01 | 0.22 | -0.02 |
| X3 | 0.80 | -0.28 | -0.25 | -0.02 | -0.21 | 0.42 | -0.02 | 0.01 |
| X4 | 0.42 | -0.40 | 0.81 | 0.03 | 0.05 | 0.09 | -0.07 | -0.01 |
| X5 | 0.80 | 0.25 | -0.08 | -0.29 | 0.44 | 0.10 | 0.06 | -0.02 |
| X6 | 0.34 | 0.84 | 0.27 | 0.18 | -0.19 | 0.05 | 0.18 | -0.02 |
| X7 | 0.91 | -0.23 | -0.01 | 0.24 | 0.05 | -0.19 | 0.08 | 0.08 |
| X8 | 0.80 | -0.38 | -0.20 | 0.38 | -0.02 | -0.16 | -0.02 | -0.08 |
# Representación Gráfica de las correlaciones en los ejes de los componentes
fviz_pca_var(PC,repel = TRUE,axes = c(1,2))
fviz_pca_var(PC,repel = TRUE,axes = c(3,4))
fviz_pca_var(PC,repel = TRUE,axes = c(5,6))
fviz_pca_var(PC,repel = TRUE,axes = c(7,8))
#Representación alternativa:
library(corrplot)
corrplot(variables_pca$coord,is.corr = FALSE,method = "square",addCoef.col="black",number.cex = 0.75)
##Análisis Factorial. En el caso anterior se encontraron unas variables “sintéticas” que pueden sustituir a las variables originales, pero aún no se ha reducido la dimensión de la información. en este apartado se explicarán las características de la técnica de Componentes Principales, en cuanto a su uso dentro del Análisis Factorial.
Modelo Factorial. El modelo factorial permite realizar la agrupación de las variables, garantizando la reducción de la dimensión del dataframe a nivel de variables. Hay “p” variables y “n” casos (la matriz X es de dimensión nxk: Xn×p) Hay “k” Factores comunes, en menor cantidad que las variables originales: (k<p) los Fj se denominan factores comunes al conjunto de variables del modelo y los ϵj representan los factores específicos de cada variable. los lij se denominan cargas factoriales X1=l11⋅F1+l12⋅F2+…+l1k⋅Fk+ϵ1X2=l21⋅F1+l22⋅F2+…+l2k⋅Fk+ϵ2⋮Xp=lp1⋅F1+lp2⋅F2+…+lpk⋅Fk+ϵp matricialmente se pueden escribir como: ⎡⎣⎢⎢⎢⎢X1X2⋮Xp⎤⎦⎥⎥⎥⎥=⎡⎣⎢⎢⎢⎢l11l21⋮lp1l12l22⋮lp2⋯⋯⋮⋯l1kl2k⋮lpk⎤⎦⎥⎥⎥⎥⋅⎡⎣⎢⎢⎢⎢F1F2⋮Fk⎤⎦⎥⎥⎥⎥+⎡⎣⎢⎢⎢⎢ϵ1ϵ2⋮ϵp⎤⎦⎥⎥⎥⎥
X=L⋅F+ϵ Los supuestos del modelo son los siguientes:
El valor esperado de las varianzas especificas son nulas: E[ϵ]=0 El valor esperado de los factores es nulo: E[F]=0 Los factores son independientes entre si (factores ortogonales): E[F⋅F′]=I Las varianzas especificas son independientes entre si: E[ϵ⋅ϵ′]=Ω
con Ω como matriz diagonal. La varianza común no esta correlacionada con la varianza especifica: E[F⋅ϵ′]=0 Cálculo de las comunalidades y especificidades: Tipificando la matriz de información, la expresión matricial queda como: Z=L⋅F+ϵ De la definición de la matriz de correlación se tiene que: Rp=E[Z⋅Z′] Sustituyendo Z=L⋅F+ϵ se tiene que: Rp=E[(L⋅F+ϵ)⋅(L⋅F+ϵ)′] Expandiendo dentro del operador de expectativas: Rp=E[L⋅F⋅F′L′+L⋅F⋅ϵ′+(L⋅F⋅ϵ′)′+ϵ⋅ϵ′] Aplicando el operador de expectativas: Rp=L⋅E[F⋅F′]⋅L′+E[L⋅F⋅ϵ′]+E[(L⋅F⋅ϵ′)′+E[ϵ⋅ϵ′] Aplicando los supuestos se tiene que: Rp=L⋅I⋅L′+0+0+Ω Por lo que se tiene que: Rp=L⋅L′+Ω Es decir en forma extensiva: ⎡⎣⎢⎢⎢⎢⎢1ρ21⋮ρp1ρ121⋮ρp2ρ13ρ23⋮ρp3⋯⋯⋱⋯ρ1pρ2p⋮1⎤⎦⎥⎥⎥⎥⎥=⎡⎣⎢⎢⎢⎢l11l21⋮lp1l12l22⋮lp2⋯⋯⋮⋯l1kl2k⋮lpk⎤⎦⎥⎥⎥⎥⋅⎡⎣⎢⎢⎢⎢l11l21⋮lp1l12l22⋮lp2⋯⋯⋮⋯l1kl2k⋮lpk⎤⎦⎥⎥⎥⎥′+⎡⎣⎢⎢⎢⎢⎢ω210⋮00ω22⋮0⋯⋯⋱⋯00⋮ω2p⎤⎦⎥⎥⎥⎥⎥
Al multiplicar L⋅L′, sumar Ω e igualar los elementos de la diagonal principal, se tiene:
1=l2j1+l2j2+⋯+l2jp+ω2j resumiendo h2j=l2j1+l2j2+⋯+l2jp Se tiene que: 1=h2j+ω2j Donde:
h2j se denomina comunalidad, o varianza común de Xj derivada de los factores comunes (los Fj)
ω2j se denomina especificidad, o varianza especifíca, ocasionada por el factor especifico de la variable el ϵj
###Extracción de los factores a través de Componentes Principales.
Si se aplica la descomposición SVD Rp=V⋅Λ⋅V′
Con Λ como la matriz diagonal de autovalores y V como matriz de autovectores Puede escribirse también así: Rp=V⋅Λ1/2⋅Λ1/2⋅V′ Definiendo ϕ=V⋅Λ1/2, entonces puede reescribirse así: Rp=ϕ⋅ϕ′ Con ϕ como la correlación de las X con sus Componentes Principales igualando a la expresión del modelo factorial se tiene que:
ϕ⋅ϕ′=L⋅L′+Ω Reteniendo las primeras “k” componentes, la varianza normalizada se puede obtener que: h2j=l2j1+l2j2+⋯+l2jk ω2j=l2j(k+1)+⋯+l2jp
y de be cumplirse que:
1=l2j1+l2j2+⋯+l2jk+l2j(k+1)+⋯+l2jp
###Análisis Factorial en R
library(psych)
##
## Attaching package: 'psych'
## The following object is masked from 'package:Hmisc':
##
## describe
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(corrplot)
library(dplyr)
#Modelo de 2 Factores (sin rotar)
numero_de_factores<-2
modelo_2_factores<-principal(r = Rx$r,
nfactors = numero_de_factores,
covar = FALSE,
rotate = "none")
modelo_2_factores
## Principal Components Analysis
## Call: principal(r = Rx$r, nfactors = numero_de_factores, rotate = "none",
## covar = FALSE)
## Standardized loadings (pattern matrix) based upon correlation matrix
## PC1 PC2 h2 u2 com
## X1 -0.72 0.06 0.53 0.472 1.0
## X2 -0.43 0.85 0.91 0.093 1.5
## X3 0.80 0.28 0.72 0.280 1.2
## X4 0.42 0.40 0.33 0.668 2.0
## X5 0.80 -0.25 0.70 0.302 1.2
## X6 0.34 -0.84 0.82 0.176 1.3
## X7 0.91 0.23 0.89 0.108 1.1
## X8 0.80 0.38 0.78 0.217 1.4
##
## PC1 PC2
## SS loadings 3.75 1.93
## Proportion Var 0.47 0.24
## Cumulative Var 0.47 0.71
## Proportion Explained 0.66 0.34
## Cumulative Proportion 0.66 1.00
##
## Mean item complexity = 1.4
## Test of the hypothesis that 2 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.09
##
## Fit based upon off diagonal values = 0.96
correlaciones_modelo<-variables_pca$coord
corrplot(correlaciones_modelo[,1:numero_de_factores],
is.corr = FALSE,
method = "square",addCoef.col="black",number.cex = 0.75)
library(psych)
library(corrplot)
library(dplyr)
#Modelo de 3 Factores (sin rotar)
numero_de_factores<-3
modelo_3_factores<-principal(r = Rx$r,
nfactors = numero_de_factores,
covar = FALSE,
rotate = "none")
modelo_3_factores
## Principal Components Analysis
## Call: principal(r = Rx$r, nfactors = numero_de_factores, rotate = "none",
## covar = FALSE)
## Standardized loadings (pattern matrix) based upon correlation matrix
## PC1 PC2 PC3 h2 u2 com
## X1 -0.72 0.06 -0.03 0.53 0.472 1.0
## X2 -0.43 0.85 0.04 0.91 0.092 1.5
## X3 0.80 0.28 -0.25 0.78 0.219 1.4
## X4 0.42 0.40 0.81 0.98 0.017 2.0
## X5 0.80 -0.25 -0.08 0.71 0.295 1.2
## X6 0.34 -0.84 0.27 0.90 0.101 1.6
## X7 0.91 0.23 -0.01 0.89 0.108 1.1
## X8 0.80 0.38 -0.20 0.82 0.179 1.6
##
## PC1 PC2 PC3
## SS loadings 3.75 1.93 0.84
## Proportion Var 0.47 0.24 0.10
## Cumulative Var 0.47 0.71 0.81
## Proportion Explained 0.58 0.30 0.13
## Cumulative Proportion 0.58 0.87 1.00
##
## Mean item complexity = 1.4
## Test of the hypothesis that 3 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.08
##
## Fit based upon off diagonal values = 0.97
correlaciones_modelo<-variables_pca$coord
corrplot(correlaciones_modelo[,1:numero_de_factores],
is.corr = FALSE,
method = "square",addCoef.col="black",number.cex = 0.75)
library(psych)
library(corrplot)
library(dplyr)
#Modelo de 4 Factores (sin rotar)
numero_de_factores<-4
modelo_4_factores<-principal(r = Rx$r,
nfactors = numero_de_factores,
covar = FALSE,
rotate = "none")
modelo_4_factores
## Principal Components Analysis
## Call: principal(r = Rx$r, nfactors = numero_de_factores, rotate = "none",
## covar = FALSE)
## Standardized loadings (pattern matrix) based upon correlation matrix
## PC1 PC2 PC3 PC4 h2 u2 com
## X1 -0.72 0.06 -0.03 0.60 0.89 0.112 2.0
## X2 -0.43 0.85 0.04 -0.20 0.95 0.050 1.6
## X3 0.80 0.28 -0.25 -0.02 0.78 0.219 1.4
## X4 0.42 0.40 0.81 0.03 0.98 0.016 2.0
## X5 0.80 -0.25 -0.08 -0.29 0.79 0.208 1.5
## X6 0.34 -0.84 0.27 0.18 0.93 0.070 1.7
## X7 0.91 0.23 -0.01 0.24 0.95 0.052 1.3
## X8 0.80 0.38 -0.20 0.38 0.97 0.032 2.1
##
## PC1 PC2 PC3 PC4
## SS loadings 3.75 1.93 0.84 0.72
## Proportion Var 0.47 0.24 0.10 0.09
## Cumulative Var 0.47 0.71 0.81 0.91
## Proportion Explained 0.52 0.27 0.12 0.10
## Cumulative Proportion 0.52 0.78 0.90 1.00
##
## Mean item complexity = 1.7
## Test of the hypothesis that 4 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.04
##
## Fit based upon off diagonal values = 0.99
correlaciones_modelo<-variables_pca$coord
corrplot(correlaciones_modelo[,1:numero_de_factores],
is.corr = FALSE,
method = "square",addCoef.col="black",number.cex = 0.75)
###Rotación de la Solución Esta operación se implementa para mejorar la interpretación de la extracción (solución) propuesta. Hay dos tipos de rotación: Ortogonal y Oblicua, la primera mantiene la independencia entre los factores, en cambio la segunda no.
Para efectos de la agrupación de variables, se implementará la rotación ortogonal VARIMAX, propuesta por Kaiser, ya que permite obtener factores independientes y mantener la estructura de varianza de la solución original.
library(psych)
library(corrplot)
library(dplyr)
#Modelo de 2 Factores (Rotado)
numero_de_factores<-2
modelo_2_factores<-principal(r = Rx$r,
nfactors = numero_de_factores,
covar = FALSE,
rotate = "varimax")
modelo_2_factores
## Principal Components Analysis
## Call: principal(r = Rx$r, nfactors = numero_de_factores, rotate = "varimax",
## covar = FALSE)
## Standardized loadings (pattern matrix) based upon correlation matrix
## RC1 RC2 h2 u2 com
## X1 -0.63 -0.36 0.53 0.472 1.6
## X2 -0.04 -0.95 0.91 0.093 1.0
## X3 0.84 0.08 0.72 0.280 1.0
## X4 0.54 -0.19 0.33 0.668 1.2
## X5 0.62 0.56 0.70 0.302 2.0
## X6 -0.03 0.91 0.82 0.176 1.0
## X7 0.93 0.16 0.89 0.108 1.1
## X8 0.88 -0.01 0.78 0.217 1.0
##
## RC1 RC2
## SS loadings 3.45 2.24
## Proportion Var 0.43 0.28
## Cumulative Var 0.43 0.71
## Proportion Explained 0.61 0.39
## Cumulative Proportion 0.61 1.00
##
## Mean item complexity = 1.2
## Test of the hypothesis that 2 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.09
##
## Fit based upon off diagonal values = 0.96
correlaciones_modelo<-variables_pca$coord
correlaciones_modelo_rotada<-varimax(correlaciones_modelo[,1:numero_de_factores])$loadings
corrplot(correlaciones_modelo_rotada[,1:numero_de_factores],
is.corr = FALSE,
method = "square",
addCoef.col="black",
number.cex = 0.75)
###############################
#Modelo de 3 Factores (Rotado)
numero_de_factores<-3
modelo_3_factores<-principal(r = Rx$r,
nfactors = numero_de_factores,
covar = FALSE,
rotate = "varimax")
modelo_3_factores
## Principal Components Analysis
## Call: principal(r = Rx$r, nfactors = numero_de_factores, rotate = "varimax",
## covar = FALSE)
## Standardized loadings (pattern matrix) based upon correlation matrix
## RC1 RC2 RC3 h2 u2 com
## X1 -0.62 -0.33 -0.18 0.53 0.472 1.7
## X2 -0.15 -0.92 0.18 0.91 0.092 1.1
## X3 0.88 -0.01 0.04 0.78 0.219 1.0
## X4 0.23 -0.06 0.96 0.98 0.017 1.1
## X5 0.67 0.51 0.04 0.71 0.295 1.9
## X6 -0.03 0.94 0.09 0.90 0.101 1.0
## X7 0.90 0.12 0.27 0.89 0.108 1.2
## X8 0.89 -0.09 0.12 0.82 0.179 1.1
##
## RC1 RC2 RC3
## SS loadings 3.30 2.13 1.09
## Proportion Var 0.41 0.27 0.14
## Cumulative Var 0.41 0.68 0.81
## Proportion Explained 0.51 0.33 0.17
## Cumulative Proportion 0.51 0.83 1.00
##
## Mean item complexity = 1.3
## Test of the hypothesis that 3 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.08
##
## Fit based upon off diagonal values = 0.97
correlaciones_modelo<-variables_pca$coord
correlaciones_modelo_rotada<-varimax(correlaciones_modelo[,1:numero_de_factores],
normalize = TRUE)$loadings
corrplot(correlaciones_modelo_rotada[,1:numero_de_factores],
is.corr = FALSE,
method = "square",
addCoef.col="black",
number.cex = 0.75)
###############################
#Modelo de 4 Factores (Rotado)
numero_de_factores<-4
modelo_4_factores<-principal(r = Rx$r,
nfactors = numero_de_factores,
covar = FALSE,
rotate = "varimax")
modelo_4_factores
## Principal Components Analysis
## Call: principal(r = Rx$r, nfactors = numero_de_factores, rotate = "varimax",
## covar = FALSE)
## Standardized loadings (pattern matrix) based upon correlation matrix
## RC1 RC2 RC4 RC3 h2 u2 com
## X1 -0.22 -0.10 -0.90 -0.14 0.89 0.112 1.2
## X2 -0.12 -0.94 -0.16 0.18 0.95 0.050 1.2
## X3 0.76 -0.04 0.45 0.03 0.78 0.219 1.6
## X4 0.22 -0.06 0.11 0.96 0.98 0.016 1.1
## X5 0.38 0.37 0.72 0.01 0.79 0.208 2.1
## X6 -0.06 0.95 0.12 0.10 0.93 0.070 1.1
## X7 0.88 0.15 0.29 0.27 0.95 0.052 1.5
## X8 0.97 0.00 0.10 0.12 0.97 0.032 1.1
##
## RC1 RC2 RC4 RC3
## SS loadings 2.55 1.95 1.67 1.07
## Proportion Var 0.32 0.24 0.21 0.13
## Cumulative Var 0.32 0.56 0.77 0.91
## Proportion Explained 0.35 0.27 0.23 0.15
## Cumulative Proportion 0.35 0.62 0.85 1.00
##
## Mean item complexity = 1.4
## Test of the hypothesis that 4 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.04
##
## Fit based upon off diagonal values = 0.99
correlaciones_modelo<-variables_pca$coord
correlaciones_modelo_rotada<-varimax(correlaciones_modelo[,1:numero_de_factores],
normalize = TRUE)$loadings
corrplot(correlaciones_modelo_rotada[,1:numero_de_factores],
is.corr = FALSE,
method = "square",
addCoef.col="black",
number.cex = 0.75)
###Verificación de supuestos: Prueba de Barlett y KMO. Prueba de Esfericidad de Barlett Esta prueba identifica si a nivel poblacional, las variables presentan independencia estadistica (son ortogonales), a través de la matriz de correlación muestral R, y se verifica si a nivel poblacional dicha matriz de correlación corresponde a una matriz identidad, las hipótesis de la prueba son las siguientes:
{H0:H1:R∼I R≁I
Si no se rechaza H0, no hay evidencia de multicolinealidad, caso contrario Si se rechaza H0 hay evidencia de multicolinealidad El contraste se realiza a través de:
χ2B=−(n−1−2⋅p+56)ln(|R|)
con gl=p(p−1)/2
Rechazar H0 si χ2B≥V.C., o si p≤α
library(psych)
options(scipen = 99999)
Barlett<-cortest.bartlett(mat_X)
## R was not square, finding R from data
print(Barlett)
## $chisq
## [1] 99.52973
##
## $p.value
## [1] 0.0000000006035519
##
## $df
## [1] 28
library(psych)
KMO<-KMO(mat_X)
print(KMO)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = mat_X)
## Overall MSA = 0.5
## MSA for each item =
## X1 X2 X3 X4 X5 X6 X7 X8
## 0.75 0.42 0.62 0.53 0.50 0.35 0.51 0.43
library(rela)
KMO<-paf(as.matrix(mat_X))$KMO
print(KMO)
## [1] 0.49718