ANALISIS FACTORIAL EN R Y PYTHON EJEMPLO 1

1

El Análisis Factorial es otra técnica de interdependencia que se aplica a variables medidas en escala de intervalo o de razón. A pesar de las semejanzas con el Análisis de Componentes Principales visto en el capítulo anterior, tiene varias diferencias. En primer lugar, el modelo de un Análisis Factorial propone descomponer la variabilidad total de una variable en función a los factores comunes y específicos, en cambio en un ACP el modelo propone descomponer cada componente como una combinación lineal de las variables. En segundo lugar, cuando se trabaja con la matriz de correlación, los componentes principales tienen media 0 y variancia igual a los autovalores, en cambio en un análisis factorial los factores comunes son variables ya estandarizadas (media cero y variancia uno).

Estos datos corresponden a 100 observaciones de 7 variables influyentes en la elección de distribuidor, en un ejemplo de un estudio de segmentación de la situación empresa a empresa, específicamente un informe sobre los clientes actuales de HATCO.

Code
library(haven)
datos <- read_sav("HATCO.SAV")
head(datos)
# A tibble: 6 × 15
     ID    X1    X2    X3    X4    X5    X6    X7 X8           X9   X10 X11     
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl+lbl> <dbl> <dbl> <dbl+lb>
1     1   4.1   0.6   6.9   4.7   2.4   2.3   5.2 0 [SMALL]    32   4.2 1 [Tota…
2     2   1.8   3     6.3   6.6   2.5   4     8.4 1 [LARGE]    43   4.3 0 [Spec…
3     3   3.4   5.2   5.7   6     4.3   2.7   8.2 1 [LARGE]    48   5.2 0 [Spec…
4     4   2.7   1     7.1   5.9   1.8   2.3   7.8 1 [LARGE]    32   3.9 0 [Spec…
5     5   6     0.9   9.6   7.8   3.4   4.6   4.5 0 [SMALL]    58   6.8 1 [Tota…
6     6   1.9   3.3   7.9   4.8   2.6   1.9   9.7 1 [LARGE]    45   4.4 0 [Spec…
# ℹ 3 more variables: X12 <dbl+lbl>, X13 <dbl+lbl>, X14 <dbl+lbl>
Code
# No considerar la primera columna Id
datos$ID<- NULL
str(datos)
tibble [100 × 14] (S3: tbl_df/tbl/data.frame)
 $ X1 : num [1:100] 4.1 1.8 3.4 2.7 6 1.9 4.6 1.3 5.5 4 ...
  ..- attr(*, "label")= chr "Delivery Speed"
  ..- attr(*, "format.spss")= chr "F3.1"
  ..- attr(*, "display_width")= int 3
 $ X2 : num [1:100] 0.6 3 5.2 1 0.9 3.3 2.4 4.2 1.6 3.5 ...
  ..- attr(*, "label")= chr "Price Level"
  ..- attr(*, "format.spss")= chr "F3.1"
  ..- attr(*, "display_width")= int 3
 $ X3 : num [1:100] 6.9 6.3 5.7 7.1 9.6 7.9 9.5 6.2 9.4 6.5 ...
  ..- attr(*, "label")= chr "Price Flexibility"
  ..- attr(*, "format.spss")= chr "F4.1"
  ..- attr(*, "display_width")= int 4
 $ X4 : num [1:100] 4.7 6.6 6 5.9 7.8 4.8 6.6 5.1 4.7 6 ...
  ..- attr(*, "label")= chr "Manufacturer Image"
  ..- attr(*, "format.spss")= chr "F3.1"
  ..- attr(*, "display_width")= int 3
 $ X5 : num [1:100] 2.4 2.5 4.3 1.8 3.4 2.6 3.5 2.8 3.5 3.7 ...
  ..- attr(*, "label")= chr "Service"
  ..- attr(*, "format.spss")= chr "F3.1"
  ..- attr(*, "display_width")= int 3
 $ X6 : num [1:100] 2.3 4 2.7 2.3 4.6 1.9 4.5 2.2 3 3.2 ...
  ..- attr(*, "label")= chr "Salesforce Image"
  ..- attr(*, "format.spss")= chr "F3.1"
  ..- attr(*, "display_width")= int 3
 $ X7 : num [1:100] 5.2 8.4 8.2 7.8 4.5 9.7 7.6 6.9 7.6 8.7 ...
  ..- attr(*, "label")= chr "Product Quality"
  ..- attr(*, "format.spss")= chr "F4.1"
  ..- attr(*, "display_width")= int 4
 $ X8 : dbl+lbl [1:100] 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, ...
   ..@ label        : chr "Firm Size"
   ..@ format.spss  : chr "F1.0"
   ..@ display_width: int 1
   ..@ labels       : Named num [1:2] 0 1
   .. ..- attr(*, "names")= chr [1:2] "SMALL" "LARGE"
 $ X9 : num [1:100] 32 43 48 32 58 45 46 44 63 54 ...
  ..- attr(*, "label")= chr "Usage Level"
  ..- attr(*, "format.spss")= chr "F4.1"
  ..- attr(*, "display_width")= int 4
 $ X10: num [1:100] 4.2 4.3 5.2 3.9 6.8 4.4 5.8 4.3 5.4 5.4 ...
  ..- attr(*, "label")= chr "Satisfaction Level"
  ..- attr(*, "format.spss")= chr "F3.1"
  ..- attr(*, "display_width")= int 3
 $ X11: dbl+lbl [1:100] 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, ...
   ..@ label        : chr "Specification Buying"
   ..@ format.spss  : chr "F1.0"
   ..@ display_width: int 1
   ..@ labels       : Named num [1:2] 0 1
   .. ..- attr(*, "names")= chr [1:2] "Specification Buying" "Total Value Analysis"
 $ X12: dbl+lbl [1:100] 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, ...
   ..@ label        : chr "Structure of Procurement"
   ..@ format.spss  : chr "F1.0"
   ..@ display_width: int 1
   ..@ labels       : Named num [1:2] 0 1
   .. ..- attr(*, "names")= chr [1:2] "DECENTRALIZED" "CENTRALIZED"
 $ X13: dbl+lbl [1:100] 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, ...
   ..@ label        : chr "Type of Industry (SIC)"
   ..@ format.spss  : chr "F1.0"
   ..@ display_width: int 1
   ..@ labels       : Named num [1:2] 0 1
   .. ..- attr(*, "names")= chr [1:2] "FIRM TYPE ONE" "FIRM TYPE TWO"
 $ X14: dbl+lbl [1:100] 1, 1, 2, 1, 3, 2, 1, 2, 3, 2, 1, 2, 1, 1, 3, 3, 2, 2, ...
   ..@ label        : chr "Type of Buying Situation"
   ..@ format.spss  : chr "F1.0"
   ..@ display_width: int 1
   ..@ labels       : Named num [1:3] 1 2 3
   .. ..- attr(*, "names")= chr [1:3] "New Task" "Modified Rebuy" "Straight Rebuy"
Code
library(psych)
datos=datos[1:10,1:6]
print(describe(datos))
   vars  n mean   sd median trimmed  mad min max range  skew kurtosis   se
X1    1 10 3.53 1.60   3.70    3.50 2.08 1.3 6.0   4.7  0.07    -1.55 0.51
X2    2 10 2.57 1.54   2.70    2.49 1.93 0.6 5.2   4.6  0.18    -1.46 0.49
X3    3 10 7.51 1.49   7.00    7.47 1.26 5.7 9.6   3.9  0.38    -1.70 0.47
X4    4 10 5.82 1.02   5.95    5.71 1.11 4.7 7.8   3.1  0.43    -1.06 0.32
X5    5 10 3.05 0.75   3.10    3.05 0.82 1.8 4.3   2.5 -0.01    -1.27 0.24
X6    6 10 3.07 0.98   2.85    3.02 0.89 1.9 4.6   2.7  0.43    -1.55 0.31
Code
print(cor(datos))
           X1          X2         X3          X4        X5         X6
X1  1.0000000 -0.52053250  0.7050021  0.32820433 0.4896457  0.5450101
X2 -0.5205325  1.00000000 -0.5754632 -0.08644631 0.4880877 -0.2004142
X3  0.7050021 -0.57546319  1.0000000  0.24318296 0.1274531  0.5134706
X4  0.3282043 -0.08644631  0.2431830  1.00000000 0.2509333  0.8367596
X5  0.4896457  0.48808772  0.1274531  0.25093333 1.0000000  0.3708891
X6  0.5450101 -0.20041423  0.5134706  0.83675959 0.3708891  1.0000000
Code
corr.test(datos)
Call:corr.test(x = datos)
Correlation matrix 
      X1    X2    X3    X4   X5    X6
X1  1.00 -0.52  0.71  0.33 0.49  0.55
X2 -0.52  1.00 -0.58 -0.09 0.49 -0.20
X3  0.71 -0.58  1.00  0.24 0.13  0.51
X4  0.33 -0.09  0.24  1.00 0.25  0.84
X5  0.49  0.49  0.13  0.25 1.00  0.37
X6  0.55 -0.20  0.51  0.84 0.37  1.00
Sample Size 
[1] 10
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
     X1   X2   X3   X4   X5   X6
X1 0.00 1.00 0.32 1.00 1.00 1.00
X2 0.12 0.00 1.00 1.00 1.00 1.00
X3 0.02 0.08 0.00 1.00 1.00 1.00
X4 0.35 0.81 0.50 0.00 1.00 0.04
X5 0.15 0.15 0.73 0.48 0.00 1.00
X6 0.10 0.58 0.13 0.00 0.29 0.00

 To see confidence intervals of the correlations, print with the short=FALSE option
Code
library(xts)
library(zoo)
library(PerformanceAnalytics)
chart.Correlation(datos, histogram=TRUE, pch=20)

Code
# Otra Forma
library(psych)
cor.plot(cor(datos),
         main="Mapa de Calor", 
         diag=F,
         show.legend = TRUE) 

Code
KMO(datos)
Kaiser-Meyer-Olkin factor adequacy
Call: KMO(r = datos)
Overall MSA =  0.28
MSA for each item = 
  X1   X2   X3   X4   X5   X6 
0.30 0.21 0.41 0.27 0.17 0.33 
Code
cortest.bartlett(datos)
$chisq
[1] 59.29283

$p.value
[1] 3.335439e-07

$df
[1] 15
Code
library(psych)
facto.sin.rota <- principal(r=datos,
                            nfactors=2, #nfactors inicial=6
                            covar=F,#matriz de correlacion
                            rotate="none")
Code
# Autovalores
facto.sin.rota$values
[1] 2.9605196016 1.6514559664 0.9676482461 0.3148071695 0.1051429685
[6] 0.0004260478
Code
# Primera forma
plot(facto.sin.rota$values,type="b",pch=20,col="blue",
     main="Gráfico de sedimentación")
abline(h=1,lty=3,col="red")

Code
# Segunda forma
eig.val <- facto.sin.rota$values
barplot(eig.val, names.arg=1:6, 
        main = "Gráfico de Sedimentación",
        xlab = "Factores",
        ylab = "Autovalores",
        col ="steelblue")
lines(x = 1:6, eig.val, 
      type="b", pch=19, col = "red")

Code
facto.sin.rota$communality
       X1        X2        X3        X4        X5        X6 
0.7537533 0.9500358 0.7501484 0.5830233 0.7639664 0.8110483 
Code
#Cargas Factoriales, Correlaciones Factor, Variable

facto.sin.rota$loadings 

Loadings:
   PC1    PC2   
X1  0.863       
X2 -0.479  0.849
X3  0.787 -0.361
X4  0.682  0.343
X5  0.398  0.778
X6  0.862  0.262

                 PC1   PC2
SS loadings    2.961 1.651
Proportion Var 0.493 0.275
Cumulative Var 0.493 0.769
Code
##Gráfica de círculo de correlaciones

library(ade4)
load.sin.rota <- facto.sin.rota$loadings[,1:2]
s.corcircle(load.sin.rota,grid=FALSE)

Code
##Análisis Factorial con rotacion con función principal

library(psych)
facto.con.rota <- principal(r=datos,
                            nfactors=3,
                            rotate="varimax")
str(facto.con.rota)
List of 31
 $ values      : num [1:6] 2.961 1.651 0.968 0.315 0.105 ...
 $ rotation    : chr "varimax"
 $ n.obs       : int 10
 $ communality : Named num [1:6] 0.929 0.954 0.811 0.955 0.996 ...
  ..- attr(*, "names")= chr [1:6] "X1" "X2" "X3" "X4" ...
 $ loadings    : 'loadings' num [1:6, 1:3] 0.8867 -0.7294 0.8766 0.0724 0.158 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6] "X1" "X2" "X3" "X4" ...
  .. ..$ : chr [1:3] "RC1" "RC3" "RC2"
 $ fit         : num 0.991
 $ fit.off     : num 0.991
 $ fn          : chr "principal"
 $ Call        : language principal(r = datos, nfactors = 3, rotate = "varimax")
 $ uniquenesses: Named num [1:6] 0.0709 0.04597 0.18862 0.04535 0.00354 ...
  ..- attr(*, "names")= chr [1:6] "X1" "X2" "X3" "X4" ...
 $ complexity  : Named num [1:6] 1.37 1.99 1.11 1.02 1.13 ...
  ..- attr(*, "names")= chr [1:6] "X1" "X2" "X3" "X4" ...
 $ valid       : num [1:3] 0.97 0.967 0.968
 $ chi         : num 0.61
 $ EPVAL       : logi NA
 $ R2          : Named num [1:3] 1 1 1
  ..- attr(*, "names")= chr [1:3] "RC1" "RC3" "RC2"
 $ objective   : num 4.51
 $ residual    : num [1:6, 1:6] 0.0709 -0.0557 -0.1113 0.026 0.0153 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6] "X1" "X2" "X3" "X4" ...
  .. ..$ : chr [1:6] "X1" "X2" "X3" "X4" ...
 $ rms         : num 0.0451
 $ factors     : int 3
 $ dof         : num 0
 $ null.dof    : num 15
 $ null.model  : num 9.62
 $ criteria    : Named num [1:3] 4.51 NA NA
  ..- attr(*, "names")= chr [1:3] "objective" "" ""
 $ STATISTIC   : num 18.8
 $ PVAL        : logi NA
 $ weights     : num [1:6, 1:3] 0.445 -0.332 0.421 -0.214 0.111 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6] "X1" "X2" "X3" "X4" ...
  .. ..$ : chr [1:3] "RC1" "RC3" "RC2"
 $ r.scores    : num [1:3, 1:3] 1.00 -4.13e-14 -3.05e-14 -4.12e-14 1.00 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:3] "RC1" "RC3" "RC2"
  .. ..$ : chr [1:3] "RC1" "RC3" "RC2"
 $ rot.mat     : num [1:3, 1:3] 0.754 -0.468 0.461 0.167 0.815 ...
 $ Vaccounted  : num [1:5, 1:3] 2.25 0.375 0.375 0.403 0.403 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:5] "SS loadings" "Proportion Var" "Cumulative Var" "Proportion Explained" ...
  .. ..$ : chr [1:3] "RC1" "RC3" "RC2"
 $ Structure   : 'loadings' num [1:6, 1:3] 0.8867 -0.7294 0.8766 0.0724 0.158 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6] "X1" "X2" "X3" "X4" ...
  .. ..$ : chr [1:3] "RC1" "RC3" "RC2"
 $ scores      : num [1:10, 1:3] 0.571 -1.182 -0.96 -0.191 1.238 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:3] "RC1" "RC3" "RC2"
 - attr(*, "class")= chr [1:2] "psych" "principal"
Code
facto.con.rota$values
[1] 2.9605196016 1.6514559664 0.9676482461 0.3148071695 0.1051429685
[6] 0.0004260478
Code
facto.con.rota$communality
       X1        X2        X3        X4        X5        X6 
0.9290991 0.9540317 0.8113774 0.9546461 0.9964610 0.9340086 
Code
facto.con.rota$loadings 

Loadings:
   RC1    RC3    RC2   
X1  0.887  0.227  0.302
X2 -0.729         0.647
X3  0.877  0.206       
X4         0.973       
X5  0.158  0.184  0.968
X6  0.365  0.880  0.163

                 RC1   RC3   RC2
SS loadings    2.250 1.851 1.478
Proportion Var 0.375 0.309 0.246
Cumulative Var 0.375 0.684 0.930
Code
head(cbind(datos,
           as.data.frame(scale(datos)),
           facto.con.rota$scores ))
   X1  X2  X3  X4  X5  X6          X1         X2         X3          X4
1 4.1 0.6 6.9 4.7 2.4 2.3  0.35655190 -1.2817273 -0.4084964 -1.10321182
2 1.8 3.0 6.3 6.6 2.5 4.0 -1.08216628  0.2797679 -0.8102961  0.76830823
3 3.4 5.2 5.7 6.0 4.3 2.7 -0.08131885  1.7111385 -1.2120958  0.17730190
4 2.7 1.0 7.1 5.9 1.8 2.3 -0.51918960 -1.0214781 -0.2745631  0.07880084
5 6.0 0.9 9.6 7.8 3.4 4.6  1.54505822 -1.0865404  1.3996024  1.95032090
6 1.9 3.3 7.9 4.8 2.6 1.9 -1.01961332  0.4749548  0.2611698 -1.00471077
          X5         X6        RC1         RC3         RC2
1 -0.8664528 -0.7822199  0.5705907 -1.06714507 -0.88338324
2 -0.7331523  0.9447590 -1.1819880  1.25720991 -0.78084121
3  1.6662553 -0.3758719 -0.9604831 -0.07612362  1.82469742
4 -1.6662553 -0.7822199 -0.1909306 -0.10217150 -1.69726848
5  0.4665515  1.5542810  1.2377758  1.63916935 -0.06984696
6 -0.5998519 -1.1885678 -0.3270609 -1.05263987 -0.28372078
Code
#Gráfica de variables sobre el primer plano de componentes

load <- facto.con.rota$loadings[,1:2]

plot(load, pch=20, 
     xlim=c(-1,1), 
     ylim=c(-1,1),
     xlab="Factor 1",
     ylab="Factor 2") 
abline(h=0,lty=3)
abline(v=0,lty=3)
text(load,pos=1,labels=names(datos),cex=0.8)

Code
#Gráfica de círculo de correlaciones

library(ade4)
load <- facto.con.rota$loadings[,1:2]
s.corcircle(load,grid=FALSE)

Code
fa.diagram(facto.con.rota)

Code
##Gráfica de Matriz de Correlación entre Variables y Factores

library(psych)
salida.facto <- cbind(datos,facto.con.rota$scores)
cor.plot(cor(salida.facto),
         main="Mapa de Calor", 
         diag=TRUE,number=T, 
         show.legend = TRUE)

Code
library(GGally)
ggpairs(salida.facto)

2 ANALISIS FACTORIAL EN PYTHON

SE PUEDE VER UN EJEMPLO COMPLETO DE UN ANALISIS FACTORIAL EN PYTHON EN EL SIGUIENTE Rpubs https://rpubs.com/jaimeisaacp/974784

Code
import pandas as pd
from factor_analyzer import FactorAnalyzer
import matplotlib.pyplot as plt

# Leer el archivo CSV
datos = pd.read_spss("HATCO.sav")

# Ver las primeras filas del dataset
print(datos.head())
    ID   X1   X2  ...            X12            X13             X14
0  1.0  4.1  0.6  ...  DECENTRALIZED  FIRM TYPE TWO        New Task
1  2.0  1.8  3.0  ...    CENTRALIZED  FIRM TYPE ONE        New Task
2  3.0  3.4  5.2  ...    CENTRALIZED  FIRM TYPE TWO  Modified Rebuy
3  4.0  2.7  1.0  ...    CENTRALIZED  FIRM TYPE TWO        New Task
4  5.0  6.0  0.9  ...  DECENTRALIZED  FIRM TYPE TWO  Straight Rebuy

[5 rows x 15 columns]
Code
# Seleccionar columnas desde Variable2 (índice 1) hasta Variable7 (índice 6)
datos = datos.iloc[:, 1:7]  # ":" indica "todas las filas", "1:7" indica "columnas desde índice 1 hasta 6"
print(datos)
     X1   X2   X3   X4   X5   X6
0   4.1  0.6  6.9  4.7  2.4  2.3
1   1.8  3.0  6.3  6.6  2.5  4.0
2   3.4  5.2  5.7  6.0  4.3  2.7
3   2.7  1.0  7.1  5.9  1.8  2.3
4   6.0  0.9  9.6  7.8  3.4  4.6
..  ...  ...  ...  ...  ...  ...
95  0.6  1.6  6.4  5.0  0.7  2.1
96  6.1  0.5  9.2  4.8  3.3  2.8
97  2.0  2.8  5.2  5.0  2.4  2.7
98  3.1  2.2  6.7  6.8  2.6  2.9
99  2.5  1.8  9.0  5.0  2.2  3.0

[100 rows x 6 columns]
Code
# Calcular la matriz de correlación
correlaciones = datos.corr()
print(correlaciones)
          X1        X2        X3        X4        X5        X6
X1  1.000000 -0.349225  0.509295  0.050414  0.611901  0.077115
X2 -0.349225  1.000000 -0.487213  0.272187  0.512981  0.186243
X3  0.509295 -0.487213  1.000000 -0.116104  0.066617 -0.034316
X4  0.050414  0.272187 -0.116104  1.000000  0.298677  0.788225
X5  0.611901  0.512981  0.066617  0.298677  1.000000  0.240808
X6  0.077115  0.186243 -0.034316  0.788225  0.240808  1.000000
Code
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# Test de esfericidad de Bartlett
chi_square_value, p_value = calculate_bartlett_sphericity(datos)
print(f"Chi-cuadrado: {chi_square_value}, p-valor: {p_value}")
Chi-cuadrado: 523.9498050173258, p-valor: 4.821355307731794e-102
Code
# Índice KMO
kmo_all, kmo_model = calculate_kmo(datos)
print(f"KMO global: {kmo_model}")
KMO global: 0.3764741116937912

Comentarios:

  • El test de Bartlett prueba si la matriz de correlación es una matriz de identidad (hipótesis nula). Un p-valor bajo (< 0.05) sugiere que el análisis factorial es apropiado.

  • El índice KMO mide la adecuación del análisis factorial. Valores > 0.6 son aceptables.

Code
# Crear el objeto FactorAnalyzer
fa = FactorAnalyzer(n_factors=2, rotation="varimax")

# Ajustar el modelo
fa.fit(datos)
FactorAnalyzer(n_factors=2, rotation='varimax', rotation_kwargs={})
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Code
# Mostrar las cargas factoriales
cargas_factoriales = fa.loadings_
print(cargas_factoriales)
[[ 0.2729935   0.98153529]
 [ 0.48092868 -0.44937817]
 [-0.13383267  0.60516534]
 [ 0.76776671 -0.11555155]
 [ 0.61577209  0.29605204]
 [ 0.67496239 -0.05637946]]
Code
# Mostrar las comunalidades
comunalidades = fa.get_communalities()
print(comunalidades)
[1.03793697 0.43323313 0.38413628 0.60281788 0.46682208 0.45875287]
Code
# Mostrar la varianza explicada
varianza_explicada = fa.get_factor_variance()
print(varianza_explicada)
(array([1.74794423, 1.63575496]), array([0.29132404, 0.27262583]), array([0.29132404, 0.56394987]))

Comentarios:

  • Las comunalidades indican qué proporción de la varianza de cada variable es explicada por los factores.

  • La varianza explicada muestra qué proporción de la varianza total es explicada por cada factor.

Code
# Crear un gráfico de las cargas factoriales
plt.scatter(cargas_factoriales[:, 0], cargas_factoriales[:, 1])
for i, txt in enumerate(datos.columns):
    plt.annotate(txt, (cargas_factoriales[i, 0], cargas_factoriales[i, 1]))
plt.xlabel("Factor 1")
plt.ylabel("Factor 2")
plt.title("Cargas Factoriales")
plt.show()

Code
# !pip install factor-analyzer 
# Crear una instancia del objeto de análisis factorial 
from factor_analyzer.factor_analyzer import FactorAnalyzer 
fa = FactorAnalyzer(rotation='varimax') 
fa.fit(datos) 
FactorAnalyzer(rotation='varimax', rotation_kwargs={})
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Code
# Verificar valores propios 
ev, v = fa.get_eigenvalues() 
ev
array([2.23345791, 1.92234526, 1.18106245, 0.44889245, 0.2050207 ,
       0.00922123])
Code
v
array([ 2.11742645,  1.74324963,  1.07143835,  0.02361827, -0.00540772,
       -0.02523974])

Tras ordenarlos, se observa que tres valores propios eran mayores que 1, por lo que podemos elegir tres factores.

Code
sorted(ev)
[np.float64(0.00922123237016147), np.float64(0.20502069575048437), np.float64(0.4488924515471145), np.float64(1.1810624479368041), np.float64(1.922345261741817), np.float64(2.233457910653617)]
Code

# Create scree plot using matplotlib
#plt.scatter(range(1,df.shape[1]+1),ev)
#plt.plot(range(1,df.shape[1]+1),ev)
#plt.title('Scree Plot')
#plt.xlabel('Factors')
#plt.ylabel('Eigenvalue')
#plt.grid()
#plt.show()

Tras elegir tres factores, se quiere comprobar si podía interpretarlos utilizando los valores de carga. Aquí, observo que, para el Factor 1 , las habitaciones, el número de camas y el número de plazas tienen cargas factoriales altas.

Code
pd.DataFrame(fa.loadings_, columns=['Factor1', 'Factor2', 'Factor3'], index=[datos.columns])
     Factor1   Factor2   Factor3
X1  0.769723  0.051982  0.604878
X2 -0.852539  0.138559  0.500168
X3  0.607274 -0.053159  0.073745
X4 -0.100729  0.941312  0.124519
X5  0.007134  0.185940  0.981563
X6 -0.022387  0.819697  0.092627
Code
# Cargar bibliotecas necesarias
library(psych)  # Para análisis factorial
library(GPArotation)  # Para rotaciones de factores
library(ggplot2)  # Para visualización

# Cargar el conjunto de datos mtcars
data("mtcars")

# Ver la estructura del conjunto de datos
str(mtcars)
'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

El análisis factorial requiere que las variables sean numéricas y estén estandarizadas (media = 0, desviación estándar = 1). Eliminamos variables categóricas (vs, am, gear, carb) y estandarizamos las variables restantes.

Code
# Seleccionar solo variables numéricas
mtcars_numeric <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec")]

# Estandarizar las variables
mtcars_scaled <- scale(mtcars_numeric)

# Verificar la estandarización
colMeans(mtcars_scaled)  # Medias cercanas a 0
          mpg           cyl          disp            hp          drat 
 7.112366e-17 -1.474515e-17 -9.085614e-17  1.040834e-17 -2.918672e-16 
           wt          qsec 
 4.682398e-17  5.299580e-16 
Code
apply(mtcars_scaled, 2, sd)  # Desviaciones estándar cercanas a 1
 mpg  cyl disp   hp drat   wt qsec 
   1    1    1    1    1    1    1 
Code
# Calcular los autovalores y vectores propios
fa.parallel(mtcars_scaled, fm = "pa", fa = "fa")  # Gráfico de sedimentación

Parallel analysis suggests that the number of factors =  1  and the number of components =  NA 
Code
# Realizar el análisis factorial con 2 factores
fa_result <- fa(mtcars_scaled, nfactors = 2, rotate = "varimax", fm = "pa")

# Mostrar los resultados
print(fa_result)
Factor Analysis using method =  pa
Call: fa(r = mtcars_scaled, nfactors = 2, rotate = "varimax", fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
       PA1   PA2   h2    u2 com
mpg  -0.85 -0.37 0.85 0.150 1.4
cyl   0.80  0.53 0.92 0.082 1.7
disp  0.88  0.38 0.91 0.086 1.4
hp    0.59  0.69 0.83 0.175 2.0
drat -0.78 -0.04 0.62 0.384 1.0
wt    0.94  0.12 0.90 0.103 1.0
qsec -0.07 -0.99 0.98 0.025 1.0

                       PA1  PA2
SS loadings           3.97 2.02
Proportion Var        0.57 0.29
Cumulative Var        0.57 0.86
Proportion Explained  0.66 0.34
Cumulative Proportion 0.66 1.00

Mean item complexity =  1.4
Test of the hypothesis that 2 factors are sufficient.

df null model =  21  with the objective function =  8.77 with Chi Square =  244.19
df of  the model are 8  and the objective function was  0.49 

The root mean square of the residuals (RMSR) is  0.02 
The df corrected root mean square of the residuals is  0.04 

The harmonic n.obs is  32 with the empirical chi square  0.67  with prob <  1 
The total n.obs was  32  with Likelihood Chi Square =  12.97  with prob <  0.11 

Tucker Lewis Index of factoring reliability =  0.938
RMSEA index =  0.136  and the 90 % confidence intervals are  0 0.277
BIC =  -14.75
Fit based upon off diagonal values = 1
Measures of factor score adequacy             
                                                   PA1  PA2
Correlation of (regression) scores with factors   0.99 0.99
Multiple R square of scores with factors          0.97 0.98
Minimum correlation of possible factor scores     0.94 0.95

2.0.1 Interpretación:

  • Factor 1 : Representa características relacionadas con el tamaño y potencia del motor (cyl, disp, hp, wt). Las variables tienen cargas altas y positivas en este factor.

  • Factor 2 : Representa características relacionadas con la eficiencia y rendimiento (mpg, drat, qsec). Las variables tienen cargas altas y negativas/positivas en este factor.

Code
# Convertir las cargas factoriales a un DataFrame
loadings_df <- as.data.frame(fa_result$loadings)

# Agregar nombres de variables
loadings_df$Variable <- rownames(loadings_df)

# Verificar las dimensiones del DataFrame
dim(loadings_df)
[1] 14  2

El conjunto de datos mtcars tiene 32 observaciones (filas) y 11 variables (columnas). Las variables incluyen características como:

  • mpg: Millas por galón.

  • cyl: Número de cilindros.

  • disp: Desplazamiento del motor.

  • hp: Caballos de fuerza.

  • drat: Relación de eje trasero.

  • wt: Peso del vehículo.

  • qsec: Tiempo de aceleración.

  • vs: Motor en V o recto.

  • am: Transmisión (automática o manual).

  • gear: Número de marchas.

  • carb: Número de carburadores.