Factor analysis is a multivariate method that expresses a set of p observed variables as linear combinations of m unobserved, or latent, variables called factors.
Exploratory factor analysis is used to describe the internal structure of a relatively large number of variables.
Confirmatory factor analysis tests whether the number of factors obtained corresponds to what would be expected in light of a prior theory about the data.
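In matrix notation this model is commonly written as x = Λf + u, where x is the p × 1 vector of observed variables, Λ is the p × m matrix of factor loadings, f is the m × 1 vector of common factors, and u is the vector of unique (specific) factors.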
The data come from Holzinger and Swineford (1939): mental-ability test scores of seventh- and eighth-grade children from two different schools, available in the lavaan package as HolzingerSwineford1939.
Only the following variables are used:
x1: Visual perception
x2: Cubes
x3: Lozenges
x4: Paragraph comprehension
x5: Sentence completion
x6: Word meaning
x7: Speeded addition
x8: Speeded counting of dots
x9: Speeded discrimination straight and curved capitals
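The analysis assumes the packages listed at the end of the document are installed; a minimal setup covering the functions used below:
library(lavaan)       # HolzingerSwineford1939 data set
library(psych)        # KMO, cortest.bartlett, fa, scree, fa.parallel, fa.diagram, alpha
library(polycor)      # hetcor
library(corrplot)     # corrplot
library(corrr)        # correlate, rplot
library(ggcorrplot)   # ggcorrplot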
dato <- HolzingerSwineford1939[, paste0("x", 1:9)]  # keep only the nine test scores x1-x9
head(dato)
## x1 x2 x3 x4 x5 x6 x7 x8 x9
## 1 3.333333 7.75 0.375 2.333333 5.75 1.2857143 3.391304 5.75 6.361111
## 2 5.333333 5.25 2.125 1.666667 3.00 1.2857143 3.782609 6.25 7.916667
## 3 4.500000 5.25 1.875 1.000000 1.75 0.4285714 3.260870 3.90 4.416667
## 4 5.333333 7.75 3.000 2.666667 4.50 2.4285714 3.000000 5.30 4.861111
## 5 4.833333 4.75 0.875 2.666667 4.00 2.5714286 3.695652 6.30 5.916667
## 6 5.333333 5.00 2.250 1.000000 3.00 0.8571429 4.347826 6.65 7.500000
cova <- cov(dato); cova  # sample covariance matrix
## x1 x2 x3 x4 x5 x6 x7
## x1 1.36289774 0.40872923 0.58183232 0.5065178 0.4420843 0.4563242 0.08504799
## x2 0.40872923 1.38638981 0.45256748 0.2096198 0.2117947 0.2483697 -0.09707352
## x3 0.58183232 0.45256748 1.27911441 0.2088635 0.1126706 0.2449227 0.08863631
## x4 0.50651778 0.20961978 0.20886351 1.3551667 1.1014120 0.8985008 0.22047507
## x5 0.44208426 0.21179471 0.11267061 1.1014120 1.6653184 1.0179058 0.14347621
## x6 0.45632416 0.24836972 0.24492268 0.8985008 1.0179058 1.2003462 0.14455865
## x7 0.08504799 -0.09707352 0.08863631 0.2204751 0.1434762 0.1445587 1.18708326
## x8 0.26471714 0.11002492 0.21303038 0.1260120 0.1812072 0.1659824 0.53702937
## x9 0.45986634 0.24482282 0.37509875 0.2441739 0.2962255 0.2367836 0.37454154
## x8 x9
## x1 0.2647171 0.4598663
## x2 0.1100249 0.2448228
## x3 0.2130304 0.3750987
## x4 0.1260120 0.2441739
## x5 0.1812072 0.2962255
## x6 0.1659824 0.2367836
## x7 0.5370294 0.3745415
## x8 1.0253894 0.4588409
## x9 0.4588409 1.0183872
One of the requirements of factor analysis is that the variables be highly intercorrelated. Variables that correlate strongly with one another are also expected to load on the same factor or factors.
corre <- cor(dato); corre  # Pearson correlation matrix
## x1 x2 x3 x4 x5 x6 x7
## x1 1.00000000 0.29734551 0.44066800 0.3727063 0.29344369 0.3567702 0.06686392
## x2 0.29734551 1.00000000 0.33984898 0.1529302 0.13938749 0.1925319 -0.07566892
## x3 0.44066800 0.33984898 1.00000000 0.1586396 0.07719823 0.1976610 0.07193105
## x4 0.37270627 0.15293019 0.15863957 1.0000000 0.73317017 0.7044802 0.17382912
## x5 0.29344369 0.13938749 0.07719823 0.7331702 1.00000000 0.7199555 0.10204475
## x6 0.35677019 0.19253190 0.19766102 0.7044802 0.71995554 1.0000000 0.12110170
## x7 0.06686392 -0.07566892 0.07193105 0.1738291 0.10204475 0.1211017 1.00000000
## x8 0.22392677 0.09227923 0.18601263 0.1068984 0.13866998 0.1496113 0.48675793
## x9 0.39034041 0.20604057 0.32865061 0.2078483 0.22746642 0.2141617 0.34064572
## x8 x9
## x1 0.22392677 0.3903404
## x2 0.09227923 0.2060406
## x3 0.18601263 0.3286506
## x4 0.10689838 0.2078483
## x5 0.13866998 0.2274664
## x6 0.14961132 0.2141617
## x7 0.48675793 0.3406457
## x8 1.00000000 0.4490154
## x9 0.44901545 1.0000000
corrplot(cor(dato), order = "hclust", tl.col = "black", tl.cex = 1)  # correlation plot, variables ordered by hierarchical clustering
corre2 <- correlate(dato)
## Correlation computed with
## • Method: 'pearson'
## • Missing treated using: 'pairwise.complete.obs'
rplot(corre2, legend = TRUE, colours = c("firebrick1", "black",
"darkcyan"), print_cor = TRUE)
pairs(dato)
matpoly <- hetcor(dato)$correlations; matpoly  # hetcor (polycor); with all-numeric variables this equals the Pearson matrix
## x1 x2 x3 x4 x5 x6 x7
## x1 1.00000000 0.29734551 0.44066800 0.3727063 0.29344369 0.3567702 0.06686392
## x2 0.29734551 1.00000000 0.33984898 0.1529302 0.13938749 0.1925319 -0.07566892
## x3 0.44066800 0.33984898 1.00000000 0.1586396 0.07719823 0.1976610 0.07193105
## x4 0.37270627 0.15293019 0.15863957 1.0000000 0.73317017 0.7044802 0.17382912
## x5 0.29344369 0.13938749 0.07719823 0.7331702 1.00000000 0.7199555 0.10204475
## x6 0.35677019 0.19253190 0.19766102 0.7044802 0.71995554 1.0000000 0.12110170
## x7 0.06686392 -0.07566892 0.07193105 0.1738291 0.10204475 0.1211017 1.00000000
## x8 0.22392677 0.09227923 0.18601263 0.1068984 0.13866998 0.1496113 0.48675793
## x9 0.39034041 0.20604057 0.32865061 0.2078483 0.22746642 0.2141617 0.34064572
## x8 x9
## x1 0.22392677 0.3903404
## x2 0.09227923 0.2060406
## x3 0.18601263 0.3286506
## x4 0.10689838 0.2078483
## x5 0.13866998 0.2274664
## x6 0.14961132 0.2141617
## x7 0.48675793 0.3406457
## x8 1.00000000 0.4490154
## x9 0.44901545 1.0000000
ggcorrplot(matpoly,type="lower",hc.order = T)
The Kaiser-Meyer-Olkin (KMO) test is a statistic that indicates the proportion of variance in the variables that may be due to underlying factors. High values (close to 1.0) generally indicate that a factor analysis may be useful with the data; if the value is below 0.50, the results of a factor analysis are unlikely to be very useful.
KMO(dato)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = dato)
## Overall MSA = 0.75
## MSA for each item =
## x1 x2 x3 x4 x5 x6 x7 x8 x9
## 0.81 0.78 0.73 0.76 0.74 0.81 0.59 0.68 0.79
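A minimal sketch of how the overall MSA is computed, using the partial correlations obtained from the inverse of the correlation matrix:
R <- cor(dato)
Ri <- solve(R)                               # inverse of the correlation matrix
Q <- -Ri / sqrt(outer(diag(Ri), diag(Ri)))   # partial (anti-image) correlations
diag(R) <- 0; diag(Q) <- 0                   # keep only off-diagonal entries
sum(R^2) / (sum(R^2) + sum(Q^2))             # should be close to the overall MSA of 0.75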
Bartlett's test of sphericity tests the hypothesis that the correlation matrix is an identity matrix, which would mean the variables are unrelated and therefore unsuitable for structure detection. Small significance values (less than 0.05) indicate that a factor analysis may be useful with the data.
Under the null hypothesis, the statistic is asymptotically chi-squared distributed.
If the null hypothesis were true, all eigenvalues would equal one, their logarithms would be zero, and the test statistic would therefore be zero.
cortest.bartlett(corre, n = NULL, diag = TRUE)  # n is not supplied, so 100 is assumed (see warning); the actual sample size is nrow(dato) = 301
## Warning in cortest.bartlett(corre, n = NULL, diag = TRUE): n not specified, 100
## used
## $chisq
## [1] 290.5118
##
## $p.value
## [1] 1.496205e-41
##
## $df
## [1] 36
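A minimal sketch of what the test computes, using the n = 100 assumed above: the statistic is χ² = -(n - 1 - (2p + 5)/6)·ln|R| with p(p - 1)/2 degrees of freedom.
n <- 100; p <- ncol(corre)
-(n - 1 - (2 * p + 5) / 6) * log(det(corre))  # ≈ 290.5, the chi-squared value above
p * (p - 1) / 2                               # 36 degrees of freedom
Supplying the real sample size, cortest.bartlett(corre, n = nrow(dato)), would give a larger statistic and the same conclusion.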
ev <- eigen(cor(dato))  # eigen decomposition of the correlation matrix
ev$values               # eigenvalues, used to decide how many factors to retain
## [1] 3.2163442 1.6387132 1.3651593 0.6989185 0.5843475 0.4996872 0.4731021
## [8] 0.2860024 0.2377257
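Under the Kaiser criterion (retain factors whose eigenvalue exceeds 1), these eigenvalues point to three factors; a minimal check:
sum(ev$values > 1)  # 3 eigenvalues exceed 1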
facto1 <- fa(matpoly, nfactors = 3, rotate = "none", fm = "mle")  # maximum-likelihood extraction, no rotation
facto1
## Factor Analysis using method = ml
## Call: fa(r = matpoly, nfactors = 3, rotate = "none", fm = "mle")
## Standardized loadings (pattern matrix) based upon correlation matrix
## ML1 ML2 ML3 h2 u2 com
## x1 0.49 0.31 0.39 0.49 0.51 2.7
## x2 0.24 0.17 0.40 0.25 0.75 2.1
## x3 0.27 0.41 0.47 0.46 0.54 2.6
## x4 0.83 -0.15 -0.03 0.72 0.28 1.1
## x5 0.84 -0.21 -0.10 0.76 0.24 1.2
## x6 0.82 -0.13 0.02 0.69 0.31 1.0
## x7 0.23 0.48 -0.46 0.50 0.50 2.4
## x8 0.27 0.62 -0.27 0.53 0.47 1.8
## x9 0.38 0.56 0.02 0.46 0.54 1.8
##
## ML1 ML2 ML3
## SS loadings 2.72 1.31 0.82
## Proportion Var 0.30 0.15 0.09
## Cumulative Var 0.30 0.45 0.54
## Proportion Explained 0.56 0.27 0.17
## Cumulative Proportion 0.56 0.83 1.00
##
## Mean item complexity = 1.8
## Test of the hypothesis that 3 factors are sufficient.
##
## df null model = 36 with the objective function = 3.05
## df of the model are 12 and the objective function was 0.08
##
## The root mean square of the residuals (RMSR) is 0.02
## The df corrected root mean square of the residuals is 0.03
##
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## ML1 ML2 ML3
## Correlation of (regression) scores with factors 0.95 0.86 0.78
## Multiple R square of scores with factors 0.90 0.73 0.60
## Minimum correlation of possible factor scores 0.80 0.46 0.21
f1 <- sort(facto1$communality, decreasing = TRUE); f1  # communalities: proportion of each variable's variance explained by the three factors
## x5 x4 x6 x8 x7 x1 x3 x9
## 0.7571227 0.7208069 0.6947841 0.5314495 0.4977928 0.4874705 0.4572267 0.4567523
## x2
## 0.2512648
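For an orthogonal solution the communality of each variable is the sum of its squared loadings, so these values can be recovered directly from the loading matrix (a minimal check):
rowSums(facto1$loadings^2)  # reproduces the h2 column of the loadings table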
scree(matpoly)  # scree plot of the eigenvalues
fa.parallel(dato, fa = "fa", fm = "mle")  # parallel analysis
## Parallel analysis suggests that the number of factors = 3 and the number of components = NA
modelo2 <- fa(matpoly, nfactors = 3, rotate = "varimax", fm = "minres")  # minimum-residual extraction with varimax rotation
print(modelo2$loadings,cut=0)
##
## Loadings:
## MR1 MR3 MR2
## x1 0.279 0.613 0.152
## x2 0.102 0.494 -0.030
## x3 0.038 0.660 0.129
## x4 0.832 0.161 0.099
## x5 0.859 0.088 0.089
## x6 0.799 0.214 0.085
## x7 0.093 -0.082 0.709
## x8 0.051 0.171 0.699
## x9 0.130 0.415 0.521
##
## MR1 MR3 MR2
## SS loadings 2.187 1.342 1.329
## Proportion Var 0.243 0.149 0.148
## Cumulative Var 0.243 0.392 0.540
fa.diagram(modelo2)
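The 'SS loadings' and 'Proportion Var' rows can likewise be recovered from the rotated loadings (a minimal check):
colSums(modelo2$loadings^2)               # SS loadings per rotated factor
colSums(modelo2$loadings^2) / ncol(dato)  # proportion of total variance explained by each factor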
The internal consistency of the factors established by the factor model can be assessed with Cronbach's alpha coefficient.
f1 <- dato[, c("x4", "x5", "x6")]  # items loading on MR1 (verbal tests)
f2 <- dato[, c("x1", "x2", "x3")]  # items loading on MR3 (visual tests)
f3 <- dato[, c("x7", "x8", "x9")]  # items loading on MR2 (speed tests)
alpha(f1)
## Number of categories should be increased in order to count frequencies.
##
## Reliability analysis
## Call: alpha(x = f1)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.88 0.88 0.84 0.72 7.7 0.011 3.2 1.1 0.72
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.86 0.88 0.90
## Duhachek 0.86 0.88 0.91
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## x4 0.83 0.84 0.72 0.72 5.1 0.019 NA 0.72
## x5 0.83 0.83 0.70 0.70 4.8 0.020 NA 0.70
## x6 0.84 0.85 0.73 0.73 5.5 0.018 NA 0.73
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## x4 301 0.90 0.90 0.82 0.78 3.1 1.2
## x5 301 0.92 0.91 0.84 0.79 4.3 1.3
## x6 301 0.89 0.90 0.81 0.77 2.2 1.1
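The raw alpha reported above can be reproduced with the classical formula α = k/(k - 1) · (1 - Σ item variances / variance of the total score); a minimal manual check for the first factor:
k <- ncol(f1)
(k / (k - 1)) * (1 - sum(apply(f1, 2, var)) / var(rowSums(f1)))  # ≈ 0.88, the raw_alpha above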
alpha(f2)
## Number of categories should be increased in order to count frequencies.
##
## Reliability analysis
## Call: alpha(x = f2)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.63 0.63 0.54 0.36 1.7 0.037 4.4 0.88 0.34
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.55 0.63 0.69
## Duhachek 0.55 0.63 0.70
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## x1 0.51 0.51 0.34 0.34 1.03 0.057 NA 0.34
## x2 0.61 0.61 0.44 0.44 1.58 0.045 NA 0.44
## x3 0.46 0.46 0.30 0.30 0.85 0.062 NA 0.30
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## x1 301 0.77 0.77 0.58 0.45 4.9 1.2
## x2 301 0.73 0.72 0.47 0.37 6.1 1.2
## x3 301 0.78 0.78 0.62 0.48 2.3 1.1
alpha(f3)
## Number of categories should be increased in order to count frequencies.
##
## Reliability analysis
## Call: alpha(x = f3)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.69 0.69 0.6 0.43 2.2 0.031 5 0.81 0.45
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.62 0.69 0.74
## Duhachek 0.63 0.69 0.75
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## x7 0.62 0.62 0.45 0.45 1.6 0.044 NA 0.45
## x8 0.51 0.51 0.34 0.34 1.0 0.057 NA 0.34
## x9 0.65 0.65 0.49 0.49 1.9 0.040 NA 0.49
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## x7 301 0.79 0.78 0.59 0.49 4.2 1.1
## x8 301 0.82 0.82 0.69 0.57 5.5 1.0
## x9 301 0.75 0.76 0.55 0.46 5.4 1.0
O.M.F.
Packages used: polycor, ggcorrplot, GPArotation, lavaan, tidyverse, semTools, semPlot, psych, MVN, diagram, lavaanPlot, piecewiseSEM, corrr, corrplot.