
Definition.

Factor analysis is a multivariate method that seeks to express a set of p observable variables as a linear combination of m unobserved or latent variables, called factors.
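
In matrix notation the model can be written as x = Λf + e, where x is the vector of the p (centered) observed variables, Λ is the p × m matrix of factor loadings, f is the vector of the m latent factors, and e is the vector of specific (unique) errors.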

Exploratory factor analysis is used to describe the internal structure of a relatively large number of variables.

Confirmatory factor analysis tries to establish whether the factors obtained correspond to those that would be expected in light of a prior theory about the data.

Data used.

The data of Holzinger and Swineford (1939) are used; they consist of mental-ability test scores of seventh- and eighth-grade children from two different schools, taken from the lavaan package.

Only the following variables are used:

x1: Visual perception

x2: Cubes

x3: Lozenges

x4: Paragraph comprehension

x5: Sentence completion

x6: Word meaning

x7: Speeded addition

x8: Speeded counting of dots

x9: Speeded discrimination straight and curved capitals
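
The code below assumes that x1, ..., x9 are already available in the workspace. A minimal sketch of how they can be obtained (the dataset is called HolzingerSwineford1939 in the lavaan package):

# Load the Holzinger & Swineford (1939) data shipped with lavaan and expose
# the columns x1, ..., x9 as vectors, so that data.frame(x1, ..., x9) works.
library(lavaan)
data("HolzingerSwineford1939")
attach(HolzingerSwineford1939)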

dato<-data.frame(x1,x2,x3,x4,x5,x6,x7,x8,x9)
head(dato)
##         x1   x2    x3       x4   x5        x6       x7   x8       x9
## 1 3.333333 7.75 0.375 2.333333 5.75 1.2857143 3.391304 5.75 6.361111
## 2 5.333333 5.25 2.125 1.666667 3.00 1.2857143 3.782609 6.25 7.916667
## 3 4.500000 5.25 1.875 1.000000 1.75 0.4285714 3.260870 3.90 4.416667
## 4 5.333333 7.75 3.000 2.666667 4.50 2.4285714 3.000000 5.30 4.861111
## 5 4.833333 4.75 0.875 2.666667 4.00 2.5714286 3.695652 6.30 5.916667
## 6 5.333333 5.00 2.250 1.000000 3.00 0.8571429 4.347826 6.65 7.500000

Covariance matrix.

cova<-cov(dato);cova
##            x1          x2         x3        x4        x5        x6          x7
## x1 1.36289774  0.40872923 0.58183232 0.5065178 0.4420843 0.4563242  0.08504799
## x2 0.40872923  1.38638981 0.45256748 0.2096198 0.2117947 0.2483697 -0.09707352
## x3 0.58183232  0.45256748 1.27911441 0.2088635 0.1126706 0.2449227  0.08863631
## x4 0.50651778  0.20961978 0.20886351 1.3551667 1.1014120 0.8985008  0.22047507
## x5 0.44208426  0.21179471 0.11267061 1.1014120 1.6653184 1.0179058  0.14347621
## x6 0.45632416  0.24836972 0.24492268 0.8985008 1.0179058 1.2003462  0.14455865
## x7 0.08504799 -0.09707352 0.08863631 0.2204751 0.1434762 0.1445587  1.18708326
## x8 0.26471714  0.11002492 0.21303038 0.1260120 0.1812072 0.1659824  0.53702937
## x9 0.45986634  0.24482282 0.37509875 0.2441739 0.2962255 0.2367836  0.37454154
##           x8        x9
## x1 0.2647171 0.4598663
## x2 0.1100249 0.2448228
## x3 0.2130304 0.3750987
## x4 0.1260120 0.2441739
## x5 0.1812072 0.2962255
## x6 0.1659824 0.2367836
## x7 0.5370294 0.3745415
## x8 1.0253894 0.4588409
## x9 0.4588409 1.0183872

Correlation matrix.

One of the requirements of factor analysis is that the variables be highly intercorrelated. Variables that correlate strongly with one another are also expected to load on the same factor or factors.

corre<-cor(dato);corre
##            x1          x2         x3        x4         x5        x6          x7
## x1 1.00000000  0.29734551 0.44066800 0.3727063 0.29344369 0.3567702  0.06686392
## x2 0.29734551  1.00000000 0.33984898 0.1529302 0.13938749 0.1925319 -0.07566892
## x3 0.44066800  0.33984898 1.00000000 0.1586396 0.07719823 0.1976610  0.07193105
## x4 0.37270627  0.15293019 0.15863957 1.0000000 0.73317017 0.7044802  0.17382912
## x5 0.29344369  0.13938749 0.07719823 0.7331702 1.00000000 0.7199555  0.10204475
## x6 0.35677019  0.19253190 0.19766102 0.7044802 0.71995554 1.0000000  0.12110170
## x7 0.06686392 -0.07566892 0.07193105 0.1738291 0.10204475 0.1211017  1.00000000
## x8 0.22392677  0.09227923 0.18601263 0.1068984 0.13866998 0.1496113  0.48675793
## x9 0.39034041  0.20604057 0.32865061 0.2078483 0.22746642 0.2141617  0.34064572
##            x8        x9
## x1 0.22392677 0.3903404
## x2 0.09227923 0.2060406
## x3 0.18601263 0.3286506
## x4 0.10689838 0.2078483
## x5 0.13866998 0.2274664
## x6 0.14961132 0.2141617
## x7 0.48675793 0.3406457
## x8 1.00000000 0.4490154
## x9 0.44901545 1.0000000
corrplot(cor(dato),order = "hclust", tl.col="black", tl.cex=1) # plot of the correlations

corre2 <- correlate(dato)
## Correlation computed with
## • Method: 'pearson'
## • Missing treated using: 'pairwise.complete.obs'
rplot(corre2, legend = TRUE, colours = c("firebrick1", "black", 
    "darkcyan"), print_cor = TRUE)

pairs(dato)

Polychoric correlation matrix.

hetcor() from the polycor package computes polychoric and polyserial correlations only for ordinal variables; since all nine variables here are numeric, the resulting matrix coincides with the Pearson correlation matrix above.

matpoly <- hetcor(dato)$correlations;matpoly
##            x1          x2         x3        x4         x5        x6          x7
## x1 1.00000000  0.29734551 0.44066800 0.3727063 0.29344369 0.3567702  0.06686392
## x2 0.29734551  1.00000000 0.33984898 0.1529302 0.13938749 0.1925319 -0.07566892
## x3 0.44066800  0.33984898 1.00000000 0.1586396 0.07719823 0.1976610  0.07193105
## x4 0.37270627  0.15293019 0.15863957 1.0000000 0.73317017 0.7044802  0.17382912
## x5 0.29344369  0.13938749 0.07719823 0.7331702 1.00000000 0.7199555  0.10204475
## x6 0.35677019  0.19253190 0.19766102 0.7044802 0.71995554 1.0000000  0.12110170
## x7 0.06686392 -0.07566892 0.07193105 0.1738291 0.10204475 0.1211017  1.00000000
## x8 0.22392677  0.09227923 0.18601263 0.1068984 0.13866998 0.1496113  0.48675793
## x9 0.39034041  0.20604057 0.32865061 0.2078483 0.22746642 0.2141617  0.34064572
##            x8        x9
## x1 0.22392677 0.3903404
## x2 0.09227923 0.2060406
## x3 0.18601263 0.3286506
## x4 0.10689838 0.2078483
## x5 0.13866998 0.2274664
## x6 0.14961132 0.2141617
## x7 0.48675793 0.3406457
## x8 1.00000000 0.4490154
## x9 0.44901545 1.0000000
ggcorrplot(matpoly,type="lower",hc.order = T)

Kaiser-Meyer-Olkin (KMO) test.

The test is based on a statistic that indicates the proportion of variance in the variables that may be caused by underlying factors. High values (close to 1.0) generally indicate that a factor analysis may be useful with the data. If the value is less than 0.50, the results of the factor analysis will probably not be very useful.

KMO(dato)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = dato)
## Overall MSA =  0.75
## MSA for each item = 
##   x1   x2   x3   x4   x5   x6   x7   x8   x9 
## 0.81 0.78 0.73 0.76 0.74 0.81 0.59 0.68 0.79
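
As a rough check of what the statistic measures, the overall MSA can be reproduced by hand from the correlations and the partial correlations obtained from the inverse of the correlation matrix (a sketch, not part of the original analysis):

R <- cor(dato)
P <- -cov2cor(solve(R))            # partial correlations from the inverse of R
diag(R) <- 0; diag(P) <- 0         # keep only the off-diagonal entries
sum(R^2) / (sum(R^2) + sum(P^2))   # overall MSA; should be close to the 0.75 reported above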

Bartlett's test.

It tests the hypothesis that the correlation matrix is an identity matrix, which would indicate that the variables are unrelated and therefore not suitable for structure detection. Small values of the significance level (less than 0.05) indicate that a factor analysis may be useful with the data.

Under the null hypothesis, the statistic asymptotically follows a chi-squared distribution.

If the null hypothesis were true, the eigenvalues would all equal one, their logarithms would be zero, and the test statistic would therefore equal zero.
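
As a sketch (assuming n = 100, the value psych falls back to in the call below), the statistic can be reproduced directly from the determinant of the correlation matrix:

n <- 100                                              # sample size assumed by psych when n is not given
p <- ncol(corre)
chi2 <- -(n - 1 - (2 * p + 5) / 6) * log(det(corre))  # Bartlett statistic
df <- p * (p - 1) / 2                                 # degrees of freedom
c(chisq = chi2, df = df, p.value = pchisq(chi2, df, lower.tail = FALSE))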

# n is not specified, so psych falls back to n = 100 (see the warning below);
# the actual sample size here is nrow(dato) = 301.
cortest.bartlett(corre, n = NULL,diag=TRUE)
## Warning in cortest.bartlett(corre, n = NULL, diag = TRUE): n not specified, 100
## used
## $chisq
## [1] 290.5118
## 
## $p.value
## [1] 1.496205e-41
## 
## $df
## [1] 36

Factor analysis.

The eigenvalues of the correlation matrix are examined first; three of them exceed 1, which supports extracting three factors (Kaiser criterion).

ev <- eigen(cor(dato)) 
ev$values
## [1] 3.2163442 1.6387132 1.3651593 0.6989185 0.5843475 0.4996872 0.4731021
## [8] 0.2860024 0.2377257
facto1<-fa(matpoly,nfactors = 3,rotate="none",fm="mle")
facto1
## Factor Analysis using method =  ml
## Call: fa(r = matpoly, nfactors = 3, rotate = "none", fm = "mle")
## Standardized loadings (pattern matrix) based upon correlation matrix
##     ML1   ML2   ML3   h2   u2 com
## x1 0.49  0.31  0.39 0.49 0.51 2.7
## x2 0.24  0.17  0.40 0.25 0.75 2.1
## x3 0.27  0.41  0.47 0.46 0.54 2.6
## x4 0.83 -0.15 -0.03 0.72 0.28 1.1
## x5 0.84 -0.21 -0.10 0.76 0.24 1.2
## x6 0.82 -0.13  0.02 0.69 0.31 1.0
## x7 0.23  0.48 -0.46 0.50 0.50 2.4
## x8 0.27  0.62 -0.27 0.53 0.47 1.8
## x9 0.38  0.56  0.02 0.46 0.54 1.8
## 
##                        ML1  ML2  ML3
## SS loadings           2.72 1.31 0.82
## Proportion Var        0.30 0.15 0.09
## Cumulative Var        0.30 0.45 0.54
## Proportion Explained  0.56 0.27 0.17
## Cumulative Proportion 0.56 0.83 1.00
## 
## Mean item complexity =  1.8
## Test of the hypothesis that 3 factors are sufficient.
## 
## df null model =  36  with the objective function =  3.05
## df of  the model are 12  and the objective function was  0.08 
## 
## The root mean square of the residuals (RMSR) is  0.02 
## The df corrected root mean square of the residuals is  0.03 
## 
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML1  ML2  ML3
## Correlation of (regression) scores with factors   0.95 0.86 0.78
## Multiple R square of scores with factors          0.90 0.73 0.60
## Minimum correlation of possible factor scores     0.80 0.46 0.21
f1<-sort(facto1$communality,decreasing=T);f1
##        x5        x4        x6        x8        x7        x1        x3        x9 
## 0.7571227 0.7208069 0.6947841 0.5314495 0.4977928 0.4874705 0.4572267 0.4567523 
##        x2 
## 0.2512648
scree(matpoly)

fa.parallel(dato,fa="fa",fm="mle")

## Parallel analysis suggests that the number of factors =  3  and the number of components =  NA
modelo2 <- fa(matpoly, nfactors = 3, rotate = "varimax", fm = "minres")
print(modelo2$loadings,cut=0)
## 
## Loadings:
##    MR1    MR3    MR2   
## x1  0.279  0.613  0.152
## x2  0.102  0.494 -0.030
## x3  0.038  0.660  0.129
## x4  0.832  0.161  0.099
## x5  0.859  0.088  0.089
## x6  0.799  0.214  0.085
## x7  0.093 -0.082  0.709
## x8  0.051  0.171  0.699
## x9  0.130  0.415  0.521
## 
##                  MR1   MR3   MR2
## SS loadings    2.187 1.342 1.329
## Proportion Var 0.243 0.149 0.148
## Cumulative Var 0.243 0.392 0.540
fa.diagram(modelo2)
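
The confirmatory variant mentioned in the definition is not carried out here, but as a sketch (with hypothetical model and object names) the three-factor structure suggested above could be tested with lavaan, which was loaded together with the data:

# Confirmatory factor analysis of the three-factor structure (sketch).
modelo_cfa <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
ajuste_cfa <- cfa(modelo_cfa, data = HolzingerSwineford1939)
summary(ajuste_cfa, fit.measures = TRUE, standardized = TRUE)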

Internal consistency of the factors.

The internal consistency of the factors established by the factor model can be assessed by computing Cronbach's alpha coefficient.
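
The raw coefficient is alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). As a sketch (not in the original), it can be computed by hand for the first factor before calling alpha():

k <- 3
items <- dato[, c("x4", "x5", "x6")]   # items loading on the first factor
# Should be close to the raw_alpha of 0.88 reported below.
(k / (k - 1)) * (1 - sum(apply(items, 2, var)) / var(rowSums(items)))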

f1 <- dato[, c("x4", "x5", "x6")]
f2 <- dato[, c("x1", "x2", "x3")]
f3 <- dato[, c("x7", "x8", "x9")]
alpha(f1)
## Number of categories should be increased  in order to count frequencies.
## 
## Reliability analysis   
## Call: alpha(x = f1)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean  sd median_r
##       0.88      0.88    0.84      0.72 7.7 0.011  3.2 1.1     0.72
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.86  0.88  0.90
## Duhachek  0.86  0.88  0.91
## 
##  Reliability if an item is dropped:
##    raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## x4      0.83      0.84    0.72      0.72 5.1    0.019    NA  0.72
## x5      0.83      0.83    0.70      0.70 4.8    0.020    NA  0.70
## x6      0.84      0.85    0.73      0.73 5.5    0.018    NA  0.73
## 
##  Item statistics 
##      n raw.r std.r r.cor r.drop mean  sd
## x4 301  0.90  0.90  0.82   0.78  3.1 1.2
## x5 301  0.92  0.91  0.84   0.79  4.3 1.3
## x6 301  0.89  0.90  0.81   0.77  2.2 1.1
alpha(f2)
## Number of categories should be increased  in order to count frequencies.
## 
## Reliability analysis   
## Call: alpha(x = f2)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
##       0.63      0.63    0.54      0.36 1.7 0.037  4.4 0.88     0.34
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.55  0.63  0.69
## Duhachek  0.55  0.63  0.70
## 
##  Reliability if an item is dropped:
##    raw_alpha std.alpha G6(smc) average_r  S/N alpha se var.r med.r
## x1      0.51      0.51    0.34      0.34 1.03    0.057    NA  0.34
## x2      0.61      0.61    0.44      0.44 1.58    0.045    NA  0.44
## x3      0.46      0.46    0.30      0.30 0.85    0.062    NA  0.30
## 
##  Item statistics 
##      n raw.r std.r r.cor r.drop mean  sd
## x1 301  0.77  0.77  0.58   0.45  4.9 1.2
## x2 301  0.73  0.72  0.47   0.37  6.1 1.2
## x3 301  0.78  0.78  0.62   0.48  2.3 1.1
alpha(f3)
## Number of categories should be increased  in order to count frequencies.
## 
## Reliability analysis   
## Call: alpha(x = f3)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
##       0.69      0.69     0.6      0.43 2.2 0.031    5 0.81     0.45
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.62  0.69  0.74
## Duhachek  0.63  0.69  0.75
## 
##  Reliability if an item is dropped:
##    raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## x7      0.62      0.62    0.45      0.45 1.6    0.044    NA  0.45
## x8      0.51      0.51    0.34      0.34 1.0    0.057    NA  0.34
## x9      0.65      0.65    0.49      0.49 1.9    0.040    NA  0.49
## 
##  Item statistics 
##      n raw.r std.r r.cor r.drop mean  sd
## x7 301  0.79  0.78  0.59   0.49  4.2 1.1
## x8 301  0.82  0.82  0.69   0.57  5.5 1.0
## x9 301  0.75  0.76  0.55   0.46  5.4 1.0

O.M.F.

Packages used:

polycor, ggcorrplot, GPArotation, lavaan, tidyverse, semTools, semPlot, psych, MVN, diagram, lavaanPlot, piecewiseSEM, corrr, corrplot