Exploratory Factor Analysis

Exploratory Factor Analysis (EFA)

According to Kabacoff (2011: 332), EFA is a collection of methods designed to uncover the latent structure underlying a set of variables. It looks for a small set of underlying, or latent, constructs that can explain the relationships among the observed, or manifest, variables. For example, it can be used to identify whether distinct latent attitudes lie behind people's opinions on how caregiving and household work should be divided.

Example: Questions from the 2012 Gender Roles Survey

Load the Data and Prepare the Variables

The first step is to recode the variables so that the response categories are ordered (the "neither agree nor disagree" category must sit in the middle) and to convert nonresponses into missing values:

table(genero12$P6A)

## 
##   1   2   3   4   5   8   9 
## 107 529 500  39  17   9   2

table(roles.genero$p6a.r, exclude = NULL)

## 
##   -2   -1    0    1    2 <NA> 
##   39  500   17  529  107   11
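The recoding step can be illustrated without the survey file. The following is a minimal base R sketch on a made-up vector of answer codes (`raw` is hypothetical, not the survey data). It assumes the same mapping as the `recode()` rule in the syntax section of this lesson: 1 and 2 go to the positive side, 3 and 4 to the negative side, 5 becomes the midpoint, and any other code (such as 8 or 9) becomes missing.

```r
# Hypothetical raw answer codes (not the survey data)
raw <- c(1, 2, 3, 4, 5, 8, 9, 2, 3)

# Lookup vector: names are the original codes, values are the new scores
lookup <- c("1" = 2, "2" = 1, "3" = -1, "4" = -2, "5" = 0)

# Codes that are absent from the lookup (8 and 9 here) come back as NA
recoded <- unname(lookup[as.character(raw)])

# Frequency table that also counts the missing values
table(recoded, useNA = "ifany")
```

A lookup vector is used here only to keep the sketch free of package dependencies; the lesson itself relies on `recode()` from the car package.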

Number of Factors to Extract

As in principal component analysis, one criterion for choosing the number of factors to extract is based on the eigenvalues of the common factors. Note, however, that the threshold differs:

In principal component analysis, we extract the components with eigenvalues greater than 1.
In exploratory factor analysis, we extract the factors with eigenvalues greater than 0.

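The following plot compares the eigenvalues of the data obtained with the principal components method and with the exploratory factor analysis method. Under the eigenvalue criteria above, both results suggest extracting 3 factors. The parallel-analysis message printed with the plot uses a different criterion (a comparison with simulated data) and reports 0 factors and 3 components.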

Eigenvalue Plot

## Parallel analysis suggests that the number of factors =  0  and the number of components =  3
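To see why the two thresholds differ, it can help to compute both sets of eigenvalues by hand. The sketch below uses simulated data (`toy`, with an arbitrary two-factor structure, is an assumption made for illustration). Component eigenvalues come from the full correlation matrix, whose diagonal is 1. For the factor eigenvalues, the diagonal is replaced by squared multiple correlations as initial communality estimates; this is one common choice and not necessarily the exact computation performed by `fa.parallel()`.

```r
set.seed(2019)  # arbitrary seed, for reproducibility only
n  <- 500
f1 <- rnorm(n)  # two hypothetical latent factors
f2 <- rnorm(n)
toy <- data.frame(
  v1 = 0.7 * f1 + rnorm(n, sd = 0.7),
  v2 = 0.6 * f1 + rnorm(n, sd = 0.8),
  v3 = 0.5 * f1 + rnorm(n, sd = 0.9),
  v4 = 0.7 * f2 + rnorm(n, sd = 0.7),
  v5 = 0.6 * f2 + rnorm(n, sd = 0.8),
  v6 = 0.5 * f2 + rnorm(n, sd = 0.9)
)
R <- cor(toy)

# Principal components: eigenvalues of the full correlation matrix
pc_eigen <- eigen(R, symmetric = TRUE)$values

# Common factors: put squared multiple correlations (SMC) on the diagonal,
# so only the variance shared with the other variables is analyzed
smc <- 1 - 1 / diag(solve(R))
R_reduced <- R
diag(R_reduced) <- smc
fa_eigen <- eigen(R_reduced, symmetric = TRUE)$values

# The two rules of thumb stated above
n_components <- sum(pc_eigen > 1)
n_factors    <- sum(fa_eigen > 0)

round(rbind(pc_eigen, fa_eigen), 2)
```

The component eigenvalues add up to the number of variables, so 1 is the average and serves as the cutoff. The factor eigenvalues add up only to the total shared variance, which is why the cutoff drops to 0.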

Unrotated Factor Matrix

The unrotated factor matrix, estimated by maximum likelihood, is shown below:

## Factor Analysis using method =  ml
## Call: fa(r = roles.genero[, -1], nfactors = 3, rotate = "none", fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##         ML1   ML2   ML3    h2   u2 com
## p6a.r -0.23  0.05  0.13 0.069 0.93 1.7
## p6b.r  0.42  0.33 -0.42 0.471 0.53 2.9
## p6c.r  0.38  0.30 -0.31 0.333 0.67 2.8
## p6d.r  0.64  0.12  0.41 0.590 0.41 1.8
## p6e.r  0.15  0.09  0.17 0.058 0.94 2.6
## p7a.r -0.39  0.64  0.17 0.586 0.41 1.8
## p7b.r  0.42 -0.08  0.08 0.186 0.81 1.2
## 
##                        ML1  ML2  ML3
## SS loadings           1.12 0.64 0.52
## Proportion Var        0.16 0.09 0.07
## Cumulative Var        0.16 0.25 0.33
## Proportion Explained  0.49 0.28 0.23
## Cumulative Proportion 0.49 0.77 1.00
## 
## Mean item complexity =  2.1
## Test of the hypothesis that 3 factors are sufficient.
## 
## The degrees of freedom for the null model are  21  and the objective function was  0.42 with Chi Square of  499.73
## The degrees of freedom for the model are 3  and the objective function was  0 
## 
## The root mean square of the residuals (RMSR) is  0.01 
## The df corrected root mean square of the residuals is  0.02 
## 
## The harmonic number of observations is  1174 with the empirical chi square  3.55  with prob <  0.31 
## The total number of observations was  1203  with Likelihood Chi Square =  2.78  with prob <  0.43 
## 
## Tucker Lewis Index of factoring reliability =  1.003
## RMSEA index =  0  and the 90 % confidence intervals are  0 0.047
## BIC =  -18.49
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML1  ML2  ML3
## Correlation of (regression) scores with factors   0.83 0.76 0.71
## Multiple R square of scores with factors          0.69 0.58 0.51
## Minimum correlation of possible factor scores     0.37 0.16 0.01
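The `h2`, `u2`, and `SS loadings` figures in this output can be recovered from the loadings themselves. As a rough check, the sketch below retypes the rounded unrotated loadings printed above and recomputes those quantities. Because the inputs are rounded to two decimals, the results agree with the printed values only approximately.

```r
# Unrotated loadings, copied (rounded) from the output above
loadings <- matrix(
  c(-0.23,  0.05,  0.13,
     0.42,  0.33, -0.42,
     0.38,  0.30, -0.31,
     0.64,  0.12,  0.41,
     0.15,  0.09,  0.17,
    -0.39,  0.64,  0.17,
     0.42, -0.08,  0.08),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("p6a.r", "p6b.r", "p6c.r", "p6d.r", "p6e.r", "p7a.r", "p7b.r"),
    c("ML1", "ML2", "ML3")
  )
)

# Communality: share of each item's variance explained by the three factors
h2 <- rowSums(loadings^2)

# Uniqueness: the remaining share
u2 <- 1 - h2

# Sum of squared loadings per factor, and proportion of total variance
ss_loadings <- colSums(loadings^2)
prop_var    <- ss_loadings / nrow(loadings)

round(cbind(h2, u2), 2)
round(rbind(ss_loadings, prop_var), 2)
```

For example, the communality of p6d.r is roughly 0.64² + 0.12² + 0.41² ≈ 0.59, which means the three factors account for a little more than half of that item's variance.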

Rotated Factor Matrix, Varimax Method

To make the extracted factors easier to name, we can rotate them with an orthogonal method (one that keeps the factors uncorrelated), such as varimax:

## Factor Analysis using method =  ml
## Call: fa(r = roles.genero[, -1], nfactors = 3, rotate = "varimax", 
##     fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##         ML3   ML1   ML2    h2   u2 com
## p6a.r -0.17 -0.05  0.19 0.069 0.93 2.1
## p6b.r  0.68  0.06 -0.05 0.471 0.53 1.0
## p6c.r  0.57  0.10 -0.03 0.333 0.67 1.1
## p6d.r  0.13  0.74 -0.14 0.590 0.41 1.1
## p6e.r  0.02  0.24  0.04 0.058 0.94 1.1
## p7a.r  0.02  0.02  0.76 0.586 0.41 1.0
## p7b.r  0.12  0.31 -0.27 0.186 0.81 2.3
## 
##                        ML3  ML1  ML2
## SS loadings           0.85 0.72 0.72
## Proportion Var        0.12 0.10 0.10
## Cumulative Var        0.12 0.22 0.33
## Proportion Explained  0.37 0.32 0.31
## Cumulative Proportion 0.37 0.69 1.00
## 
## Mean item complexity =  1.4
## Test of the hypothesis that 3 factors are sufficient.
## 
## The degrees of freedom for the null model are  21  and the objective function was  0.42 with Chi Square of  499.73
## The degrees of freedom for the model are 3  and the objective function was  0 
## 
## The root mean square of the residuals (RMSR) is  0.01 
## The df corrected root mean square of the residuals is  0.02 
## 
## The harmonic number of observations is  1174 with the empirical chi square  3.55  with prob <  0.31 
## The total number of observations was  1203  with Likelihood Chi Square =  2.78  with prob <  0.43 
## 
## Tucker Lewis Index of factoring reliability =  1.003
## RMSEA index =  0  and the 90 % confidence intervals are  0 0.047
## BIC =  -18.49
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML3  ML1  ML2
## Correlation of (regression) scores with factors   0.76 0.77 0.78
## Multiple R square of scores with factors          0.58 0.59 0.61
## Minimum correlation of possible factor scores     0.15 0.18 0.22
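Comparing this output with the unrotated one shows that `h2` and `u2` are identical: an orthogonal rotation redistributes variance among the factors without changing how much of each item is explained. The sketch below illustrates this with base R's `stats::varimax()` applied to the rounded unrotated loadings. It is an approximation under those assumptions, so the rotated values, column order, and signs need not match the psych output exactly.

```r
# Unrotated loadings, copied (rounded) from the earlier output
unrotated <- matrix(
  c(-0.23,  0.05,  0.13,
     0.42,  0.33, -0.42,
     0.38,  0.30, -0.31,
     0.64,  0.12,  0.41,
     0.15,  0.09,  0.17,
    -0.39,  0.64,  0.17,
     0.42, -0.08,  0.08),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("p6a.r", "p6b.r", "p6c.r", "p6d.r", "p6e.r", "p7a.r", "p7b.r"),
    c("ML1", "ML2", "ML3")
  )
)

# Varimax rotation from the stats package (loaded by default)
rot     <- varimax(unrotated)
rotated <- unclass(rot$loadings)

# The rotation matrix is orthogonal: its cross-product is the identity
round(crossprod(rot$rotmat), 6)

# Communalities are the same before and after rotation
h2_before <- rowSums(unrotated^2)
h2_after  <- rowSums(rotated^2)
round(cbind(h2_before, h2_after), 3)

round(rotated, 2)
```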

Diagram of the rotated factor solution, varimax method

Oblique Rotation: the “Promax” Method

An alternative is an oblique rotation, which allows some correlation among the extracted factors. A common oblique method is the “promax” rotation. An oblique solution is more complex to interpret, but it is often a more realistic representation of the data:

## Factor Analysis using method =  ml
## Call: fa(r = roles.genero[, -1], nfactors = 3, rotate = "promax", fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##         ML3   ML2   ML1    h2   u2 com
## p6a.r -0.14  0.17  0.00 0.069 0.93 1.9
## p6b.r  0.73  0.07 -0.06 0.471 0.53 1.0
## p6c.r  0.60  0.08  0.02 0.333 0.67 1.0
## p6d.r -0.01 -0.04  0.76 0.590 0.41 1.0
## p6e.r -0.01  0.07  0.26 0.058 0.94 1.2
## p7a.r  0.19  0.85  0.13 0.586 0.41 1.1
## p7b.r  0.02 -0.23  0.28 0.186 0.81 2.0
## 
##                        ML3  ML2  ML1
## SS loadings           0.85 0.72 0.72
## Proportion Var        0.12 0.10 0.10
## Cumulative Var        0.12 0.22 0.33
## Proportion Explained  0.37 0.31 0.32
## Cumulative Proportion 0.37 0.68 1.00
## 
##  With factor correlations of 
##       ML3   ML2   ML1
## ML3  1.00 -0.41  0.35
## ML2 -0.41  1.00 -0.33
## ML1  0.35 -0.33  1.00
## 
## Mean item complexity =  1.3
## Test of the hypothesis that 3 factors are sufficient.
## 
## The degrees of freedom for the null model are  21  and the objective function was  0.42 with Chi Square of  499.73
## The degrees of freedom for the model are 3  and the objective function was  0 
## 
## The root mean square of the residuals (RMSR) is  0.01 
## The df corrected root mean square of the residuals is  0.02 
## 
## The harmonic number of observations is  1174 with the empirical chi square  3.55  with prob <  0.31 
## The total number of observations was  1203  with Likelihood Chi Square =  2.78  with prob <  0.43 
## 
## Tucker Lewis Index of factoring reliability =  1.003
## RMSEA index =  0  and the 90 % confidence intervals are  0 0.047
## BIC =  -18.49
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML3  ML2  ML1
## Correlation of (regression) scores with factors   0.78 0.80 0.79
## Multiple R square of scores with factors          0.61 0.63 0.63
## Minimum correlation of possible factor scores     0.22 0.27 0.26
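The "With factor correlations of" block is what distinguishes this output from the varimax one. As a sketch of where such a matrix comes from, the code below applies base R's `stats::promax()` to the rounded unrotated loadings and derives the factor correlations from the rotation matrix. The power `m = 4` is simply the function's default, and the numbers are not expected to reproduce the psych output, which works from the unrounded solution.

```r
# Unrotated loadings, copied (rounded) from the earlier output
unrotated <- matrix(
  c(-0.23,  0.05,  0.13,
     0.42,  0.33, -0.42,
     0.38,  0.30, -0.31,
     0.64,  0.12,  0.41,
     0.15,  0.09,  0.17,
    -0.39,  0.64,  0.17,
     0.42, -0.08,  0.08),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("p6a.r", "p6b.r", "p6c.r", "p6d.r", "p6e.r", "p7a.r", "p7b.r"),
    c("ML1", "ML2", "ML3")
  )
)

# Promax rotation from the stats package (m = 4 is its default power)
pm      <- promax(unrotated, m = 4)
pattern <- unclass(pm$loadings)

# Factor correlation matrix implied by the (non-orthogonal) rotation matrix
phi <- solve(crossprod(pm$rotmat))
dimnames(phi) <- list(colnames(pattern), colnames(pattern))
round(phi, 2)

# With correlated factors, communalities need the factor correlations too
h2_oblique <- diag(pattern %*% phi %*% t(pattern))
round(cbind(h2_oblique, h2_unrotated = rowSums(unrotated^2)), 3)
```

With correlated factors the pattern loadings are no longer simple item-factor correlations, so a row's squared loadings do not add up to `h2` on their own; the factor correlation matrix has to enter the calculation.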

Diagram of the rotated factor solution, promax method

EXERCISES: 2012 vs 2016

In March 2016 the IOP repeated some of the questions on gender roles, attitudes toward gender-based violence, and attitudes toward homosexuality that were asked in the 2012 survey we have used as the example in our lessons.

Replicate the principal component analysis and/or the exploratory factor analysis with the 2016 data and compare the results with those from 2012 to determine whether the indicators still show the same structure of relationships. You can download the 2016 database and the questionnaire from the following link: Genero PUCP 2016
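One possible way to put a number on "the same structure" is Tucker's congruence coefficient between matching columns of two loading matrices. This is a suggestion rather than part of the original exercise, and `congruence()` is a hypothetical helper written for this sketch. Since no 2016 results appear in this lesson, the two vectors below are stand-ins: the ML3 columns of the varimax and promax solutions printed earlier.

```r
# Tucker's congruence coefficient between two loading vectors
# (hypothetical helper, defined here for illustration)
congruence <- function(x, y) {
  sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

# Stand-in inputs: ML3 loadings from the varimax and promax outputs above.
# In the exercise these would be one 2012 factor and its 2016 counterpart.
ml3_varimax <- c(-0.17, 0.68, 0.57,  0.13,  0.02, 0.02, 0.12)
ml3_promax  <- c(-0.14, 0.73, 0.60, -0.01, -0.01, 0.19, 0.02)

phi <- congruence(ml3_varimax, ml3_promax)
round(phi, 2)
```

Values close to 1 indicate that the two factors load on the items in nearly the same pattern. The psych package also provides `factor.congruence()` for whole loading matrices.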

Syntax (1)

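library(psych)    # fa(), fa.parallel(), fa.diagram()
library(foreign)  # read.spss()
library(car)      # recode()
library(GPArotation)

# Read the 2012 survey (SPSS file), keeping numeric codes instead of value labels
genero12 <- read.spss("https://www.dropbox.com/s/kjva39yh3dzb07r/IOP_1212_01_B.sav?dl=1", 
                      to.data.frame = TRUE, use.value.labels = FALSE)
names(genero12)
# Keep the first column and the seven gender-role items (columns 15 to 21)
roles.genero <- genero12[, c(1, 15:21)]

# Recode each item to a centered scale: 1 -> 2, 2 -> 1, 5 -> 0 (midpoint),
# 3 -> -1, 4 -> -2; any other code (for example 8 or 9) becomes NA
roles.genero$p6a.r <- recode(roles.genero$P6A, "1=2; 2=1; 3=-1; 4=-2; 5=0; else = NA")
roles.genero$p6b.r <- recode(roles.genero$P6B, "1=2; 2=1; 3=-1; 4=-2; 5=0; else = NA")
roles.genero$p6c.r <- recode(roles.genero$P6C, "1=2; 2=1; 3=-1; 4=-2; 5=0; else = NA")
roles.genero$p6d.r <- recode(roles.genero$P6D, "1=2; 2=1; 3=-1; 4=-2; 5=0; else = NA")
roles.genero$p6e.r <- recode(roles.genero$P6E, "1=2; 2=1; 3=-1; 4=-2; 5=0; else = NA")
roles.genero$p7a.r <- recode(roles.genero$P7A, "1=2; 2=1; 3=-1; 4=-2; 5=0; else = NA")
roles.genero$p7b.r <- recode(roles.genero$P7B, "1=2; 2=1; 3=-1; 4=-2; 5=0; else = NA")

# Keep only the first column and the seven recoded items
roles.genero <- roles.genero[, c(1, 9:15)]
# Check the recoding: original codes vs. recoded values (NA shown explicitly)
table(genero12$P6A)
table(roles.genero$p6a.r, exclude = NULL)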

Syntax (2)

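# Eigenvalue plot and parallel analysis for both components and factors
fa.parallel(roles.genero[, -1], fa="both", fm = "ml")

# Unrotated 3-factor solution, maximum likelihood
genero.efa1 <- fa(roles.genero[, -1], nfactors = 3, rotate = "none", fm="ml")
genero.efa1

# Varimax (orthogonal) rotation and its diagram
genero.efa1b <- fa(roles.genero[, -1], nfactors = 3, rotate = "varimax", fm="ml")
genero.efa1b
fa.diagram(genero.efa1b)

# Promax (oblique) rotation and its diagram
genero.efa1c <- fa(roles.genero[, -1], nfactors = 3, rotate = "promax", fm="ml")
genero.efa1c
fa.diagram(genero.efa1c)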

David Sulmont - PUCP

May 27, 2019