Ejercicio: Una empresa especializada en el diseño de automóviles de turismo desea estudiar

cuáles son los deseos del público que compra automóviles. Para ello diseña una encuesta con 10 preguntas donde se le pide a cada uno de los 20 encuestados que valore de 1 a 5 si una característica es o no muy importante. Los encuestados deberán contestar con un 5 si la característica es muy importante, un 4 si es importante, un 3 si tiene regular importancia, un 2 si es poco importante y un 1 si no es nada importante. Las 10 características (V1 a V10) a valorar son: precio, financiación, consumo, combustible, seguridad, confort, capacidad, prestaciones, modernidad y aerodinámica.

Realizar un análisis de Componentes Principales, una solución adecuada de la cantidad de

Componentes a retener y justifique su respuesta.

library(kableExtra)
library(knitr)

load("C:/Users/MINEDUCYT/Desktop/METODOS PARA EL ANALISIS ECONOMICO/6-2.RData")
Datos<- X6_2

tabla <- kable(Datos, "html") %>%
  kable_styling(bootstrap_options = "striped", full_width = F)

   tabla
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
4 1 4 3 3 2 4 4 4 4
5 5 4 4 3 3 4 1 1 3
2 1 3 1 4 2 1 5 4 5
1 1 1 1 4 4 2 5 5 4
1 1 2 1 5 5 4 3 3 2
5 5 5 5 3 3 4 2 2 1
4 5 4 4 2 2 5 1 1 1
3 2 3 1 4 4 2 5 5 5
4 4 4 3 4 4 3 1 1 1
5 5 5 5 2 2 3 2 2 2
2 2 2 1 5 4 4 3 4 3
4 4 5 5 4 5 5 2 1 2
3 2 2 1 4 5 4 4 3 3
5 5 4 4 5 4 4 1 2 2
4 3 3 1 4 4 5 3 4 4
5 5 4 4 4 5 4 2 1 1
4 4 5 2 4 5 5 4 4 2
5 5 4 4 2 2 1 2 2 3
3 3 2 2 4 4 5 4 5 4
5 5 4 4 4 5 4 3 2 1

a) Calcula la matriz de varianza covarianza para la batería de indicadores:

1. De forma “manual”
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.3.1
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:kableExtra':
## 
##     group_rows
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(kableExtra)
centrado<-function(x){
  x-mean(x)
}
Xcentrada<-apply(X = Datos,MARGIN = 2,centrado)
Xcentrada %>% head() %>% 
  kable(caption ="Matriz de Variables centradas:",
        align = "c",
        digits = 2) %>% 
  kable_material(html_font = "sans-serif")
Matriz de Variables centradas:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
0.3 -2.4 0.5 0.2 -0.7 -1.7 0.35 1.15 1.2 1.35
1.3 1.6 0.5 1.2 -0.7 -0.7 0.35 -1.85 -1.8 0.35
-1.7 -2.4 -0.5 -1.8 0.3 -1.7 -2.65 2.15 1.2 2.35
-2.7 -2.4 -2.5 -1.8 0.3 0.3 -1.65 2.15 2.2 1.35
-2.7 -2.4 -1.5 -1.8 1.3 1.3 0.35 0.15 0.2 -0.65
1.3 1.6 1.5 2.2 -0.7 -0.7 0.35 -0.85 -0.8 -1.65
n_obs<-nrow(Datos)
mat_V<-t(Xcentrada)%*%Xcentrada/(n_obs-1) 
mat_V %>% kable(caption ="Cálculo de V(X) forma manual:" ,
                align = "c",
                digits = 2) %>% 
  kable_material(html_font = "sans-serif") %>% 
  kable_styling(bootstrap_options = c("striped", "hover"))
Cálculo de V(X) forma manual:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
V1 1.80 1.92 1.32 1.73 -0.62 -0.31 0.36 -1.21 -1.27 -0.90
V2 1.92 2.67 1.42 2.14 -0.66 -0.14 0.52 -1.78 -1.81 -1.54
V3 1.32 1.42 1.42 1.53 -0.53 -0.32 0.29 -0.92 -1.11 -0.87
V4 1.73 2.14 1.53 2.48 -0.80 -0.48 0.35 -1.61 -1.83 -1.39
V5 -0.62 -0.66 -0.53 -0.80 0.85 0.80 0.21 0.37 0.46 0.15
V6 -0.31 -0.14 -0.32 -0.48 0.80 1.38 0.63 0.22 0.09 -0.37
V7 0.36 0.52 0.29 0.35 0.21 0.63 1.61 -0.53 -0.34 -0.71
V8 -1.21 -1.78 -0.92 -1.61 0.37 0.22 -0.53 1.92 1.81 1.37
V9 -1.27 -1.81 -1.11 -1.83 0.46 0.09 -0.34 1.81 2.17 1.56
V10 -0.90 -1.54 -0.87 -1.39 0.15 -0.37 -0.71 1.37 1.56 1.82
2. Usando el comando cov de R base
library(dplyr)
library(kableExtra)
cov(Datos) %>% 
  kable(caption="Cálculo de V(X) a través de R base",
        align = "c",
        digits = 2) %>% 
  kable_material(html_font = "sans-serif") %>% 
  kable_styling(bootstrap_options = c("striped", "hover"))
Cálculo de V(X) a través de R base
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
V1 1.80 1.92 1.32 1.73 -0.62 -0.31 0.36 -1.21 -1.27 -0.90
V2 1.92 2.67 1.42 2.14 -0.66 -0.14 0.52 -1.78 -1.81 -1.54
V3 1.32 1.42 1.42 1.53 -0.53 -0.32 0.29 -0.92 -1.11 -0.87
V4 1.73 2.14 1.53 2.48 -0.80 -0.48 0.35 -1.61 -1.83 -1.39
V5 -0.62 -0.66 -0.53 -0.80 0.85 0.80 0.21 0.37 0.46 0.15
V6 -0.31 -0.14 -0.32 -0.48 0.80 1.38 0.63 0.22 0.09 -0.37
V7 0.36 0.52 0.29 0.35 0.21 0.63 1.61 -0.53 -0.34 -0.71
V8 -1.21 -1.78 -0.92 -1.61 0.37 0.22 -0.53 1.92 1.81 1.37
V9 -1.27 -1.81 -1.11 -1.83 0.46 0.09 -0.34 1.81 2.17 1.56
V10 -0.90 -1.54 -0.87 -1.39 0.15 -0.37 -0.71 1.37 1.56 1.82

b) Calcula la matriz de correlación para la batería de indicadores:

1. De forma “manual”
Zx<-scale(x = Datos,center =TRUE)
Zx %>% head() %>% 
  kable(caption ="Matriz de Variables Estandarizadas:",
        align = "c",
        digits = 2) %>% 
  kable_material(html_font = "sans-serif")
Matriz de Variables Estandarizadas:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
0.22 -1.47 0.42 0.13 -0.76 -1.45 0.28 0.83 0.81 1.00
0.97 0.98 0.42 0.76 -0.76 -0.60 0.28 -1.33 -1.22 0.26
-1.27 -1.47 -0.42 -1.14 0.32 -1.45 -2.09 1.55 0.81 1.74
-2.01 -1.47 -2.10 -1.14 0.32 0.26 -1.30 1.55 1.49 1.00
-2.01 -1.47 -1.26 -1.14 1.41 1.11 0.28 0.11 0.14 -0.48
0.97 0.98 1.26 1.40 -0.76 -0.60 0.28 -0.61 -0.54 -1.22
n_obs<-nrow(Datos)
mat_R<-t(Zx)%*%Zx/(n_obs-1) 
mat_R %>% kable(caption ="Cálculo de R(X) forma manual:" ,
                align = "c",
                digits = 2) %>% 
  kable_material(html_font = "sans-serif") %>% 
  kable_styling(bootstrap_options = c("striped", "hover"))
Cálculo de R(X) forma manual:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
V1 1.00 0.87 0.82 0.82 -0.50 -0.19 0.21 -0.65 -0.64 -0.50
V2 0.87 1.00 0.73 0.83 -0.44 -0.07 0.25 -0.78 -0.75 -0.70
V3 0.82 0.73 1.00 0.81 -0.48 -0.23 0.19 -0.56 -0.63 -0.54
V4 0.82 0.83 0.81 1.00 -0.55 -0.26 0.17 -0.74 -0.79 -0.65
V5 -0.50 -0.44 -0.48 -0.55 1.00 0.74 0.18 0.29 0.34 0.12
V6 -0.19 -0.07 -0.23 -0.26 0.74 1.00 0.42 0.13 0.05 -0.24
V7 0.21 0.25 0.19 0.17 0.18 0.42 1.00 -0.30 -0.18 -0.41
V8 -0.65 -0.78 -0.56 -0.74 0.29 0.13 -0.30 1.00 0.89 0.73
V9 -0.64 -0.75 -0.63 -0.79 0.34 0.05 -0.18 0.89 1.00 0.78
V10 -0.50 -0.70 -0.54 -0.65 0.12 -0.24 -0.41 0.73 0.78 1.00
2. Usando el comando cor de R base
library(dplyr)
library(kableExtra)
cor(Datos) %>% 
  kable(caption="Cálculo de R(X) a través de R base",
        align = "c",
        digits = 2) %>% 
  kable_material(html_font = "sans-serif") %>% 
  kable_styling(bootstrap_options = c("striped", "hover"))
Cálculo de R(X) a través de R base
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
V1 1.00 0.87 0.82 0.82 -0.50 -0.19 0.21 -0.65 -0.64 -0.50
V2 0.87 1.00 0.73 0.83 -0.44 -0.07 0.25 -0.78 -0.75 -0.70
V3 0.82 0.73 1.00 0.81 -0.48 -0.23 0.19 -0.56 -0.63 -0.54
V4 0.82 0.83 0.81 1.00 -0.55 -0.26 0.17 -0.74 -0.79 -0.65
V5 -0.50 -0.44 -0.48 -0.55 1.00 0.74 0.18 0.29 0.34 0.12
V6 -0.19 -0.07 -0.23 -0.26 0.74 1.00 0.42 0.13 0.05 -0.24
V7 0.21 0.25 0.19 0.17 0.18 0.42 1.00 -0.30 -0.18 -0.41
V8 -0.65 -0.78 -0.56 -0.74 0.29 0.13 -0.30 1.00 0.89 0.73
V9 -0.64 -0.75 -0.63 -0.79 0.34 0.05 -0.18 0.89 1.00 0.78
V10 -0.50 -0.70 -0.54 -0.65 0.12 -0.24 -0.41 0.73 0.78 1.00
3. Presenta la matriz de correlación de forma gráfica (las dos versiones

propuestas en clase)

PerformanceAnalytics
library(PerformanceAnalytics)
chart.Correlation(as.matrix(Datos),histogram = TRUE,pch=12)

corrplot
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.3.1
## corrplot 0.92 loaded
library(grDevices)
library(Hmisc)
## Warning: package 'Hmisc' was built under R version 4.3.1
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## The following objects are masked from 'package:base':
## 
##     format.pval, units
Mat_R<-rcorr(as.matrix(Datos))
corrplot(Mat_R$r,
         p.mat = Mat_R$r,
         type="upper",
         tl.col="black",
         tl.srt = 20,
         pch.col = "blue",
         insig = "p-value",
         sig.level = -1,
         col = terrain.colors(100))

c) Realiza un análisis de componentes principales, y con base en los criterios vistos

en clase:

a. ¿Cuántas Componentes habría que retener?

library(dplyr)
library(factoextra)
## Warning: package 'factoextra' was built under R version 4.3.1
## Loading required package: ggplot2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(kableExtra)
library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(ggplot2)
options(scipen = 99999)
PC<-princomp(x = Datos,cor = TRUE,fix_sign = FALSE)
factoextra::get_eig(PC) %>% kable(caption="Resumen de PCA",
        align = "c",
        digits = 2) %>% 
  kable_material(html_font = "sans-serif") %>% 
  kable_styling(bootstrap_options = c("hover"))
Resumen de PCA
eigenvalue variance.percent cumulative.variance.percent
Dim.1 5.70 57.01 57.01
Dim.2 2.07 20.69 77.70
Dim.3 0.72 7.20 84.91
Dim.4 0.55 5.48 90.39
Dim.5 0.32 3.16 93.54
Dim.6 0.27 2.71 96.25
Dim.7 0.15 1.46 97.72
Dim.8 0.13 1.28 99.00
Dim.9 0.07 0.68 99.68
Dim.10 0.03 0.32 100.00
Bajo el criterio de porcentaje acumulado de la varrianza se tienen que retener los primeros dos componentes ya que explica el 77.70% de la varianza de los datos originales
Bajo el criterio de raiz latente se tendrian que retener 2 componentes es decir Dim.1 y Dim.2 ya que dichos componentes poseen autovalores iguales o superiores a uno

b. Incluye las tablas y gráficos vistos en clase.

fviz_eig(PC,
         choice = "eigenvalue",
         barcolor = "red",
         barfill = "red",
         addlabels = TRUE, 
       )+labs(title = "Gráfico de Sedimentación",subtitle = "Usando princomp, con Autovalores")+
  xlab(label = "Componentes")+
  ylab(label = "Autovalores")+geom_hline(yintercept = 1)

###### Se confirma que solo dos componentes tienen que retenerse ya que solo dos componentes sobrepasan la linea del uno dentro den grafico.

fviz_eig(PC,
         choice = "variance",
         barcolor = "green",
         barfill = "green",
         addlabels = TRUE,
       )+labs(title = "Gráfico de Sedimentación",
              subtitle = "Usando princomp, con %Varianza Explicada")+
  xlab(label = "Componentes")+
  ylab(label = "%Varianza")