Guía de trabajo sobre Componentes Principales
<>
Desarrollo de la tarea 7
Ejercicio 1
Enunciado
Una empresa especializada en el diseño de automóviles de turismo desea estudiar cuáles son los deseos del público que compra automóviles.
Para ello diseña una encuesta con 10 preguntas donde se le pide a cada uno de los 20 encuestados que valore de 1 a 5 si una característica es a no muy importante. Los encuestados deberán contestar con un 5 si la característica es muy importante, un 4 si es importante, un 3 si tiene regular importancia, un 2 si es poco importante y 1 si no es nada importante. Las 10 características (V1 a V10) a valorar son: precio, financiación, consumo, combustible, seguridad, confort, capacidad, prestaciones, modernidad y aerodinámica.
matriz de informacion
library(kableExtra)
mat_X<-X6_2
mat_X1<-mat_X[,c(-1,-2)]
mat_X1 %>% head() %>%
kable(caption ="Matriz de información:" ,align = "c",digits = 6) %>%
kable_material(html_font = "sans-serif")| V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 |
|---|---|---|---|---|---|---|---|
| 4 | 3 | 3 | 2 | 4 | 4 | 4 | 4 |
| 4 | 4 | 3 | 3 | 4 | 1 | 1 | 3 |
| 3 | 1 | 4 | 2 | 1 | 5 | 4 | 5 |
| 1 | 1 | 4 | 4 | 2 | 5 | 5 | 4 |
| 2 | 1 | 5 | 5 | 4 | 3 | 3 | 2 |
| 5 | 5 | 3 | 3 | 4 | 2 | 2 | 1 |
Literal a)
Calcula la matriz de varianza covarianza para la batería de indicadores
1. De forma “manual
library(dplyr)
library(kableExtra)
centrado<-function(x){
x-mean(x)
}
Xcentrada<-apply(X = mat_X,MARGIN = 2,centrado)
Xcentrada %>% head() %>%
kable(caption ="Matriz de Variables centradas:",
align = "c",
digits = 2) %>%
kable_material(html_font = "sans-serif")| V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 |
|---|---|---|---|---|---|---|---|---|---|
| 0.3 | -2.4 | 0.5 | 0.2 | -0.7 | -1.7 | 0.35 | 1.15 | 1.2 | 1.35 |
| 1.3 | 1.6 | 0.5 | 1.2 | -0.7 | -0.7 | 0.35 | -1.85 | -1.8 | 0.35 |
| -1.7 | -2.4 | -0.5 | -1.8 | 0.3 | -1.7 | -2.65 | 2.15 | 1.2 | 2.35 |
| -2.7 | -2.4 | -2.5 | -1.8 | 0.3 | 0.3 | -1.65 | 2.15 | 2.2 | 1.35 |
| -2.7 | -2.4 | -1.5 | -1.8 | 1.3 | 1.3 | 0.35 | 0.15 | 0.2 | -0.65 |
| 1.3 | 1.6 | 1.5 | 2.2 | -0.7 | -0.7 | 0.35 | -0.85 | -0.8 | -1.65 |
n_obs<-nrow(mat_X)
mat_V<-t(Xcentrada)%*%Xcentrada/(n_obs-1)
mat_V %>% kable(caption ="Cálculo de V(X) forma manual:" ,
align = "c",
digits = 2) %>%
kable_material(html_font = "sans-serif") %>%
kable_styling(bootstrap_options = c("striped", "hover"))| V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | |
|---|---|---|---|---|---|---|---|---|---|---|
| V1 | 1.80 | 1.92 | 1.32 | 1.73 | -0.62 | -0.31 | 0.36 | -1.21 | -1.27 | -0.90 |
| V2 | 1.92 | 2.67 | 1.42 | 2.14 | -0.66 | -0.14 | 0.52 | -1.78 | -1.81 | -1.54 |
| V3 | 1.32 | 1.42 | 1.42 | 1.53 | -0.53 | -0.32 | 0.29 | -0.92 | -1.11 | -0.87 |
| V4 | 1.73 | 2.14 | 1.53 | 2.48 | -0.80 | -0.48 | 0.35 | -1.61 | -1.83 | -1.39 |
| V5 | -0.62 | -0.66 | -0.53 | -0.80 | 0.85 | 0.80 | 0.21 | 0.37 | 0.46 | 0.15 |
| V6 | -0.31 | -0.14 | -0.32 | -0.48 | 0.80 | 1.38 | 0.63 | 0.22 | 0.09 | -0.37 |
| V7 | 0.36 | 0.52 | 0.29 | 0.35 | 0.21 | 0.63 | 1.61 | -0.53 | -0.34 | -0.71 |
| V8 | -1.21 | -1.78 | -0.92 | -1.61 | 0.37 | 0.22 | -0.53 | 1.92 | 1.81 | 1.37 |
| V9 | -1.27 | -1.81 | -1.11 | -1.83 | 0.46 | 0.09 | -0.34 | 1.81 | 2.17 | 1.56 |
| V10 | -0.90 | -1.54 | -0.87 | -1.39 | 0.15 | -0.37 | -0.71 | 1.37 | 1.56 | 1.82 |
2. Usando el comando cov de R base
library(dplyr)
library(kableExtra)
cov(mat_X1) %>%
kable(caption="Cálculo de V(X) para el diseño de automoviles de turismo",
align = "c",
digits = 3) %>%
kable_material(html_font = "sans-serif") %>%
kable_styling(bootstrap_options = c("striped", "hover"))| V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | |
|---|---|---|---|---|---|---|---|---|
| V3 | 1.421 | 1.526 | -0.526 | -0.316 | 0.289 | -0.921 | -1.105 | -0.868 |
| V4 | 1.526 | 2.484 | -0.800 | -0.484 | 0.347 | -1.611 | -1.832 | -1.389 |
| V5 | -0.526 | -0.800 | 0.853 | 0.800 | 0.205 | 0.374 | 0.463 | 0.153 |
| V6 | -0.316 | -0.484 | 0.800 | 1.379 | 0.626 | 0.216 | 0.095 | -0.374 |
| V7 | 0.289 | 0.347 | 0.205 | 0.626 | 1.608 | -0.529 | -0.337 | -0.708 |
| V8 | -0.921 | -1.611 | 0.374 | 0.216 | -0.529 | 1.924 | 1.811 | 1.366 |
| V9 | -1.105 | -1.832 | 0.463 | 0.095 | -0.337 | 1.811 | 2.168 | 1.558 |
| V10 | -0.868 | -1.389 | 0.153 | -0.374 | -0.708 | 1.366 | 1.558 | 1.818 |
Literal b)
Calcula la matriz de correlación para la batería de indicadores:
1. De forma “manual”
Zx<-scale(x=mat_X,center = TRUE)
Zx %>% head() %>%
kable(caption ="matriz de variables estandarizadas:" ,
align = "c",
digits = 2) %>%
kable_material(html_font = "sans-serif")| V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 |
|---|---|---|---|---|---|---|---|---|---|
| 0.22 | -1.47 | 0.42 | 0.13 | -0.76 | -1.45 | 0.28 | 0.83 | 0.81 | 1.00 |
| 0.97 | 0.98 | 0.42 | 0.76 | -0.76 | -0.60 | 0.28 | -1.33 | -1.22 | 0.26 |
| -1.27 | -1.47 | -0.42 | -1.14 | 0.32 | -1.45 | -2.09 | 1.55 | 0.81 | 1.74 |
| -2.01 | -1.47 | -2.10 | -1.14 | 0.32 | 0.26 | -1.30 | 1.55 | 1.49 | 1.00 |
| -2.01 | -1.47 | -1.26 | -1.14 | 1.41 | 1.11 | 0.28 | 0.11 | 0.14 | -0.48 |
| 0.97 | 0.98 | 1.26 | 1.40 | -0.76 | -0.60 | 0.28 | -0.61 | -0.54 | -1.22 |
n_obs<-nrow(mat_X)
mat_R<-t(Zx)%*%Zx/(n_obs-1)
mat_R %>% kable(caption ="Cálculo de R(X) forma manual:" ,
align = "c",
digits = 2) %>%
kable_material(html_font = "sans-serif") %>%
kable_styling(bootstrap_options = c("striped", "hover"))| V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | |
|---|---|---|---|---|---|---|---|---|---|---|
| V1 | 1.00 | 0.87 | 0.82 | 0.82 | -0.50 | -0.19 | 0.21 | -0.65 | -0.64 | -0.50 |
| V2 | 0.87 | 1.00 | 0.73 | 0.83 | -0.44 | -0.07 | 0.25 | -0.78 | -0.75 | -0.70 |
| V3 | 0.82 | 0.73 | 1.00 | 0.81 | -0.48 | -0.23 | 0.19 | -0.56 | -0.63 | -0.54 |
| V4 | 0.82 | 0.83 | 0.81 | 1.00 | -0.55 | -0.26 | 0.17 | -0.74 | -0.79 | -0.65 |
| V5 | -0.50 | -0.44 | -0.48 | -0.55 | 1.00 | 0.74 | 0.18 | 0.29 | 0.34 | 0.12 |
| V6 | -0.19 | -0.07 | -0.23 | -0.26 | 0.74 | 1.00 | 0.42 | 0.13 | 0.05 | -0.24 |
| V7 | 0.21 | 0.25 | 0.19 | 0.17 | 0.18 | 0.42 | 1.00 | -0.30 | -0.18 | -0.41 |
| V8 | -0.65 | -0.78 | -0.56 | -0.74 | 0.29 | 0.13 | -0.30 | 1.00 | 0.89 | 0.73 |
| V9 | -0.64 | -0.75 | -0.63 | -0.79 | 0.34 | 0.05 | -0.18 | 0.89 | 1.00 | 0.78 |
| V10 | -0.50 | -0.70 | -0.54 | -0.65 | 0.12 | -0.24 | -0.41 | 0.73 | 0.78 | 1.00 |
2. Usando el comando cor de R base
library(dplyr)
library(kableExtra)
cor(mat_X1) %>%
kable(caption="Cálculo de R(X) para el diseño de automoviles de turismo",
align = "c",
digits = 3) %>%
kable_material(html_font = "sans-serif") %>%
kable_styling(bootstrap_options = c("striped", "hover"))| V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | |
|---|---|---|---|---|---|---|---|---|
| V3 | 1.000 | 0.812 | -0.478 | -0.226 | 0.192 | -0.557 | -0.630 | -0.540 |
| V4 | 0.812 | 1.000 | -0.550 | -0.262 | 0.174 | -0.737 | -0.789 | -0.654 |
| V5 | -0.478 | -0.550 | 1.000 | 0.738 | 0.175 | 0.292 | 0.341 | 0.123 |
| V6 | -0.226 | -0.262 | 0.738 | 1.000 | 0.421 | 0.132 | 0.055 | -0.236 |
| V7 | 0.192 | 0.174 | 0.175 | 0.421 | 1.000 | -0.301 | -0.180 | -0.414 |
| V8 | -0.557 | -0.737 | 0.292 | 0.132 | -0.301 | 1.000 | 0.886 | 0.730 |
| V9 | -0.630 | -0.789 | 0.341 | 0.055 | -0.180 | 0.886 | 1.000 | 0.785 |
| V10 | -0.540 | -0.654 | 0.123 | -0.236 | -0.414 | 0.730 | 0.785 | 1.000 |
3.Presenta la matriz de correlación de forma gráfica (las dos versiones propuestas en clase)
Existe una clara correlación entre las variables propuestas en la batería de indicadores, esto es gracias a los asteriscos representativos en el histograma. Las diversas correlaciones son significativas a más del 1% pero en este caso se mantienen correlaciones que no son significativas representando menos del 1%
library(corrplot)
library(grDevices)
library(Hmisc)
Mat_R<-rcorr(as.matrix(mat_X))
corrplot(Mat_R$r,
p.mat = Mat_R$r,
type="upper",
tl.col="red",
tl.srt = 20,
pch.col = "blue",
insig = "p-value",
sig.level = -1,
col = terrain.colors(100))Literal c)
Realiza un análisis de componentes principales, y con base en los criterios vistos en clase:
a. ¿Cuántas Componentes habría que retener?
calculo manual de los componentes
library(kableExtra)
library(dplyr)
library(Hmisc)
Rx<-mat_X %>% as.matrix() %>% rcorr()
Rx$r %>% kable(caption="Matriz R(X)",
align = "c",
digits = 2) %>%
kable_material(html_font = "sans-serif") %>%
kable_styling(bootstrap_options = c("striped", "hover"))| V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | |
|---|---|---|---|---|---|---|---|---|---|---|
| V1 | 1.00 | 0.87 | 0.82 | 0.82 | -0.50 | -0.19 | 0.21 | -0.65 | -0.64 | -0.50 |
| V2 | 0.87 | 1.00 | 0.73 | 0.83 | -0.44 | -0.07 | 0.25 | -0.78 | -0.75 | -0.70 |
| V3 | 0.82 | 0.73 | 1.00 | 0.81 | -0.48 | -0.23 | 0.19 | -0.56 | -0.63 | -0.54 |
| V4 | 0.82 | 0.83 | 0.81 | 1.00 | -0.55 | -0.26 | 0.17 | -0.74 | -0.79 | -0.65 |
| V5 | -0.50 | -0.44 | -0.48 | -0.55 | 1.00 | 0.74 | 0.18 | 0.29 | 0.34 | 0.12 |
| V6 | -0.19 | -0.07 | -0.23 | -0.26 | 0.74 | 1.00 | 0.42 | 0.13 | 0.05 | -0.24 |
| V7 | 0.21 | 0.25 | 0.19 | 0.17 | 0.18 | 0.42 | 1.00 | -0.30 | -0.18 | -0.41 |
| V8 | -0.65 | -0.78 | -0.56 | -0.74 | 0.29 | 0.13 | -0.30 | 1.00 | 0.89 | 0.73 |
| V9 | -0.64 | -0.75 | -0.63 | -0.79 | 0.34 | 0.05 | -0.18 | 0.89 | 1.00 | 0.78 |
| V10 | -0.50 | -0.70 | -0.54 | -0.65 | 0.12 | -0.24 | -0.41 | 0.73 | 0.78 | 1.00 |
library(kableExtra)
library(dplyr)
Rx$P %>% kable(caption="p-values de R(X)",
align = "c",
digits = 2) %>%
kable_classic_2(html_font = "sans-serif") %>%
kable_styling(bootstrap_options = c("striped", "hover"))| V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | |
|---|---|---|---|---|---|---|---|---|---|---|
| V1 | NA | 0.00 | 0.00 | 0.00 | 0.02 | 0.41 | 0.37 | 0.00 | 0.00 | 0.03 |
| V2 | 0.00 | NA | 0.00 | 0.00 | 0.05 | 0.77 | 0.29 | 0.00 | 0.00 | 0.00 |
| V3 | 0.00 | 0.00 | NA | 0.00 | 0.03 | 0.34 | 0.42 | 0.01 | 0.00 | 0.01 |
| V4 | 0.00 | 0.00 | 0.00 | NA | 0.01 | 0.27 | 0.46 | 0.00 | 0.00 | 0.00 |
| V5 | 0.02 | 0.05 | 0.03 | 0.01 | NA | 0.00 | 0.46 | 0.21 | 0.14 | 0.61 |
| V6 | 0.41 | 0.77 | 0.34 | 0.27 | 0.00 | NA | 0.06 | 0.58 | 0.82 | 0.32 |
| V7 | 0.37 | 0.29 | 0.42 | 0.46 | 0.46 | 0.06 | NA | 0.20 | 0.45 | 0.07 |
| V8 | 0.00 | 0.00 | 0.01 | 0.00 | 0.21 | 0.58 | 0.20 | NA | 0.00 | 0.00 |
| V9 | 0.00 | 0.00 | 0.00 | 0.00 | 0.14 | 0.82 | 0.45 | 0.00 | NA | 0.00 |
| V10 | 0.03 | 0.00 | 0.01 | 0.00 | 0.61 | 0.32 | 0.07 | 0.00 | 0.00 | NA |
calculo usando r
library(dplyr)
library(factoextra)
library(kableExtra)
library(stargazer)
library(ggplot2)
options(scipen = 99999)
PC<-princomp(x = mat_X,cor = TRUE,fix_sign = FALSE)
factoextra::get_eig(PC) %>% kable(caption="Resumen de PCA",
align = "c",
digits = 2) %>%
kable_material(html_font = "sans-serif") %>%
kable_styling(bootstrap_options = c("hover"))| eigenvalue | variance.percent | cumulative.variance.percent | |
|---|---|---|---|
| Dim.1 | 5.70 | 57.01 | 57.01 |
| Dim.2 | 2.07 | 20.69 | 77.70 |
| Dim.3 | 0.72 | 7.20 | 84.91 |
| Dim.4 | 0.55 | 5.48 | 90.39 |
| Dim.5 | 0.32 | 3.16 | 93.54 |
| Dim.6 | 0.27 | 2.71 | 96.25 |
| Dim.7 | 0.15 | 1.46 | 97.72 |
| Dim.8 | 0.13 | 1.28 | 99.00 |
| Dim.9 | 0.07 | 0.68 | 99.68 |
| Dim.10 | 0.03 | 0.32 | 100.00 |
b. Incluye las tablas y gráficos vistos en clase
fviz_eig(PC,
choice = "eigenvalue",
barcolor = "yellow",
barfill = "yellow",
addlabels = TRUE,
)+labs(title = "Gráfico de Sedimentación",subtitle = "Usando princomp, con Autovalores")+
xlab(label = "Componentes")+
ylab(label = "Autovalores")+geom_hline(yintercept = 1)fviz_eig(PC,
choice = "variance",
barcolor = "orange",
barfill = "orange",
addlabels = TRUE,
)+labs(title = "Gráfico de Sedimentación",
subtitle = "Usando princomp, con %Varianza Explicada")+
xlab(label = "Componentes")+
ylab(label = "%Varianza")Al observar la tabla se puede determinar la cantidad de factores a retener:
Por el criterio de el bow se deben de retener 3 componentes
Por el criterio de porcentaje de varianzas explicadas (corresponde al 75% de varianza): Se tendrían tres componentes, superiores a 3/4 partes de la varianza total.