Módulo 1: Introducción al Análisis Multivariado
Teoría: ¿Qué es el Análisis Multivariado?
El análisis multivariado es un conjunto de técnicas estadísticas utilizadas para analizar datos que contienen más de una variable. A diferencia del análisis univariado (una variable) o bivariado (dos variables), el análisis multivariado explora las relaciones conjuntas entre tres o más variables simultáneamente. Esto nos permite obtener una comprensión mucho más rica y realista de los fenómenos que estudiamos, ya que los problemas del mundo real raramente involucran una sola variable.
Objetivos clave:
Reducción de la dimensionalidad: Simplificar la estructura de los datos sin perder información relevante (Ej: Análisis de Componentes Principales).
Clasificación y agrupación: Agrupar observaciones o variables similares en categorías (Ej: Análisis de Clúster, Análisis Discriminante).
Investigación de la dependencia: Analizar la relación entre dos conjuntos de variables (Ej: Correlación Canónica, MANOVA).
Predicción: Construir modelos para predecir los valores de ciertas variables basándose en otras.
Conceptos Fundamentales:
Vector de Medias:
Es el análogo multivariado de la media. Para un conjunto de datos con \(p\) variables, el vector de medias \(\mu\) es:
\[\\ \mathbf{\mu} = \begin{pmatrix} \mu\_1 \\ \mu\_2 \\ \vdots \\ \mu\_p \end{pmatrix}\]
Donde
\[\mu_i\] es la media de la variable \(i\).
Matriz de Covarianzas:
\[\\ \mathbf{\Sigma} = \begin{pmatrix} \sigma\_{11} & \sigma\_{12} & \cdots & \sigma\_{1p} \\ \sigma\_{21} & \sigma\_{22} & \cdots & \sigma\_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma\_{p1} & \sigma\_{p2} & \cdots & \sigma\_{pp} \end{pmatrix}\]
Donde
\[\sigma_{ii}\] (la diagonal) es la varianza de la variable \(i\), y \[\sigma_{ij}\] (fuera de la diagonal) es la covarianza entre las variables \(i\) y \(j\).
Código en R Studio: Preparando el Entorno
Para empezar, vamos a cargar algunos datos clásicos en R y a calcular estas matrices básicas. Usaremos el famoso dataset iris, que contiene mediciones de 3 especies de flores.
# 1. Cargar y explorar los datos
# El dataset 'iris' ya viene cargado en R.
# Tomaremos solo las variables numéricas (las primeras 4 columnas).
data <- iris[, 1:4]
head(data)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1 5.1 3.5 1.4 0.2
## 2 4.9 3.0 1.4 0.2
## 3 4.7 3.2 1.3 0.2
## 4 4.6 3.1 1.5 0.2
## 5 5.0 3.6 1.4 0.2
## 6 5.4 3.9 1.7 0.4
# 2. Calcular el Vector de Medias
# La función colMeans calcula la media de cada columna.
vector_medias <- colMeans(data)
print("Vector de Medias:")
## [1] "Vector de Medias:"
print(vector_medias)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 5.843333 3.057333 3.758000 1.199333
# 3. Calcular la Matriz de Covarianzas
# La función cov() calcula la matriz de covarianzas.
matriz_cov <- cov(data)
print("Matriz de Covarianzas:")
## [1] "Matriz de Covarianzas:"
print(matriz_cov)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length 0.6856935 -0.0424340 1.2743154 0.5162707
## Sepal.Width -0.0424340 0.1899794 -0.3296564 -0.1216394
## Petal.Length 1.2743154 -0.3296564 3.1162779 1.2956094
## Petal.Width 0.5162707 -0.1216394 1.2956094 0.5810063
# 4. Calcular la Matriz de Correlación
# La función cor() es útil para ver las relaciones estandarizadas.
matriz_cor <- cor(data)
print("Matriz de Correlación:")
## [1] "Matriz de Correlación:"
print(matriz_cor)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length 1.0000000 -0.1175698 0.8717538 0.8179411
## Sepal.Width -0.1175698 1.0000000 -0.4284401 -0.3661259
## Petal.Length 0.8717538 -0.4284401 1.0000000 0.9628654
## Petal.Width 0.8179411 -0.3661259 0.9628654 1.0000000
Interpretación:
El vector de medias te da el valor promedio para cada una de las cuatro medidas (largo del sépalo, ancho del sépalo, etc.).
La matriz de covarianzas muestra en su diagonal la varianza de cada variable. Los valores fuera de la diagonal indican cómo dos variables cambian juntas. Por ejemplo, una covarianza positiva entre Sepal.Length y Petal.Length significa que a medida que una crece, la otra tiende a crecer también.
La matriz de correlación es una versión estandarizada de la matriz de covarianzas. Es más fácil de interpretar, ya que los valores van de -1 a 1. Un valor cercano a 1 indica una fuerte relación positiva, y cercano a -1, una fuerte relación negativa.
Módulo 2: Análisis de Componentes Principales (PCA)
Teoría: Reduciendo la Complejidad
El Análisis de Componentes Principales (PCA, por sus siglas en inglés) es una técnica de reducción de dimensionalidad. Su objetivo es transformar un conjunto de variables posiblemente correlacionadas en un nuevo conjunto de variables no correlacionadas llamadas componentes principales.
Estos nuevos componentes son combinaciones lineales de los originales y están ordenados de tal manera que el primer componente principal (PC1) captura la mayor cantidad posible de variabilidad de los datos, el segundo (PC2) captura la segunda mayor cantidad de variabilidad (siendo ortogonal/no correlacionado con el PC1), y así sucesivamente. A menudo, los primeros dos o tres componentes son suficientes para explicar una gran parte de la varianza total, permitiéndonos visualizar y analizar los datos en un espacio de menor dimensión.
Fórmulas Matemáticas
El PCA se basa en la diagonalización de la matriz de covarianzas \(\Sigma\). Los componentes principales se encuentran resolviendo la siguiente ecuación de valores y vectores propios (eigenvalues y eigenvectors):
$$( - ) = 0
$$Donde:
El Componente Principal \(k\) (PCk) se calcula como:
\[PC_k= e_{k1}X_1 + e_{k2}X_2 + \cdots + e_{kp}X_p\]
Ejemplo y Código en RStudio
Usaremos el dataset mtcars para reducir la
dimensionalidad de las características de los coches.
# 1. Cargar los datos
# Usaremos solo las columnas numéricas de mtcars.
data_pca <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec")]
# Es MUY importante estandarizar los datos antes de PCA si las variables
# tienen escalas diferentes (ej. millas vs. caballos de fuerza).
data_pca_scaled <- scale(data_pca)
# 2. Realizar el PCA
# La función prcomp() realiza el análisis.
pca_result <- prcomp(data_pca_scaled, center = TRUE, scale. = TRUE)
# 3. Explorar los resultados
# summary() nos muestra la importancia de cada componente.
summary(pca_result)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.2552 1.0754 0.58724 0.39741 0.3599 0.27542 0.22181
## Proportion of Variance 0.7266 0.1652 0.04926 0.02256 0.0185 0.01084 0.00703
## Cumulative Proportion 0.7266 0.8918 0.94107 0.96364 0.9821 0.99297 1.00000
# Ver los vectores propios (loadings)
print(pca_result$rotation)
## PC1 PC2 PC3 PC4 PC5 PC6
## mpg -0.4127573 0.08296098 -0.2416477 -0.766798834 0.2127946 0.09002238
## cyl 0.4247315 0.07844163 -0.1880252 -0.193926827 -0.2383825 -0.78055217
## disp 0.4225036 -0.08239922 0.1180359 -0.587679498 -0.1488730 0.16169873
## hp 0.3877611 0.33696384 0.2027400 0.006884691 0.8314576 -0.04607937
## drat -0.3311703 0.44858845 0.7552915 -0.117073263 -0.2217502 -0.23319304
## wt 0.3913153 -0.32236122 0.4405532 -0.107346915 -0.1666473 0.36612190
## qsec -0.2399275 -0.74932087 0.2943885 -0.061382684 0.3278152 -0.40735764
## PC7
## mpg 0.35069828
## cyl 0.27276548
## disp -0.63803744
## hp 0.03873894
## drat -0.03702814
## wt 0.61280387
## qsec -0.13083413
# 4. Visualizar los resultados
# Un biplot muestra tanto las observaciones como las variables en el nuevo espacio.
# Nos ayuda a interpretar qué variables contribuyen a cada componente.
biplot(pca_result, scale = 1)
Interpretación:
summary(pca_result): Fíjate en la “Proportion of Variance” y “Cumulative Proportion”. Verás que los primeros dos componentes (PC1 y PC2) explican un gran porcentaje de la variabilidad total (probablemente más del 80%). Esto significa que podemos representar la mayor parte de la información de las 7 variables originales con solo 2.
pca_result rotation: Esta matriz muestra los
“loadings” o coeficientes. Por ejemplo, en PC1, verás que variables como
cyl, disp, hp y wt
tienen coeficientes negativos grandes, mientras que mpg
tiene uno positivo. Esto sugiere que PC1 representa un eje que va desde
coches pequeños y eficientes (alto mpg) a coches grandes y
potentes (altos cyl, hp, wt).
Biplot: Las flechas representan las variables originales. Flechas que apuntan en la misma dirección están positivamente correlacionadas. Las que apuntan en direcciones opuestas, negativamente. Los puntos son las observaciones (los coches). Coches que están cerca en el gráfico son similares en términos de sus características originales.
Ejemplo 2
# Cargar datos
library(factoextra)
## Loading required package: ggplot2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(ggplot2)
library(GGally)
data(USArrests)
# Ver estructura
head(USArrests)
## Murder Assault UrbanPop Rape
## Alabama 13.2 236 58 21.2
## Alaska 10.0 263 48 44.5
## Arizona 8.1 294 80 31.0
## Arkansas 8.8 190 50 19.5
## California 9.0 276 91 40.6
## Colorado 7.9 204 78 38.7
str(USArrests)
## 'data.frame': 50 obs. of 4 variables:
## $ Murder : num 13.2 10 8.1 8.8 9 7.9 3.3 5.9 15.4 17.4 ...
## $ Assault : int 236 263 294 190 276 204 110 238 335 211 ...
## $ UrbanPop: int 58 48 80 50 91 78 77 72 80 60 ...
## $ Rape : num 21.2 44.5 31 19.5 40.6 38.7 11.1 15.8 31.9 25.8 ...
# Estandarizar (importante en PCA si las escalas difieren)
USArrests_scaled <- scale(USArrests)
# Realizar PCA
pca_result <- prcomp(USArrests_scaled, center = TRUE, scale. = TRUE)
# Resumen
summary(pca_result)
## Importance of components:
## PC1 PC2 PC3 PC4
## Standard deviation 1.5749 0.9949 0.59713 0.41645
## Proportion of Variance 0.6201 0.2474 0.08914 0.04336
## Cumulative Proportion 0.6201 0.8675 0.95664 1.00000
# Gráfico de varianza explicada
fviz_eig(pca_result, addlabels = TRUE)
## Warning in geom_bar(stat = "identity", fill = barfill, color = barcolor, :
## Ignoring empty aesthetic: `width`.
# Biplot
fviz_pca_biplot(pca_result,
col.ind = "cos2",
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel = TRUE)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## ℹ The deprecated feature was likely used in the ggpubr package.
## Please report the issue at <https://github.com/kassambara/ggpubr/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Interpretación
PC1 y PC2 explican el 80-90% de la varianza.
Las flechas indican la dirección y magnitud de cada variable original.
Los puntos son las observaciones (estados).
Módulo 3: Análisis de Clúster
Teoría: Encontrando Grupos Naturales
El análisis de clúster es una técnica de aprendizaje no supervisado. Su objetivo es agrupar un conjunto de observaciones de tal manera que las observaciones dentro de un mismo grupo (o clúster) sean más similares entre sí que con las de otros grupos. No hay una variable de “respuesta” o “etiqueta” predefinida; el algoritmo busca la estructura inherente en los datos.
Existen dos tipos principales:
Fórmulas Matemáticas
El concepto central es la distancia. La métrica más común es la Distancia Euclidiana entre dos puntos \(A = (a_1, a_2, ..., a_p)\) y \(B = (b_1, b_2, ..., b_p)\):
\[d(A, B) = \sqrt{\sum_{i=1}^{p} (a_i - b_i)^2}\] En K-Means, el objetivo es minimizar la suma de cuadrados dentro de los clústeres (Within-Cluster Sum of Squares, WCSS):
\[\text{WCSS} = \sum_{k=1}^{K} \sum_{i \in C_k} d(x_i, \mu_k)^2\]
Donde K es el número de clústeres, Ck es el clúster k, y μk es el centroide del clúster k.
Ejemplo y Código en RStudio
Vamos a usar los datos iris para ver si podemos identificar las tres especies de flores sin usar sus etiquetas, solo basándonos en sus medidas.
1. Clustering Jerárquico
# 1. Preparar los datos
# Usaremos las 4 variables numéricas y las estandarizaremos.
data_cluster <- iris[, 1:4]
data_cluster_scaled <- scale(data_cluster)
# 2. Calcular la matriz de distancias
dist_matrix <- dist(data_cluster_scaled, method = "euclidean")
# 3. Realizar el clustering jerárquico aglomerativo
# Usaremos el método de Ward, que minimiza la varianza dentro de los clústeres.
hierarchical_cluster <- hclust(dist_matrix, method = "ward.D2")
# 4. Visualizar el dendrograma
plot(hierarchical_cluster, main = "Dendrograma de Flores Iris")
# Podemos cortar el árbol en 3 clústeres.
rect.hclust(hierarchical_cluster, k = 3, border = "red")
Interpretación del Dendrograma: El eje vertical representa la distancia o “disimilitud”. Las fusiones que ocurren en la parte inferior del gráfico son entre observaciones muy similares. Para decidir el número de clústeres, buscas el “salto” vertical más grande que puedas hacer sin cortar ninguna línea horizontal y cuentas las líneas verticales que cruzas. En este caso, cortar a una altura de ~4 nos daría claramente 3 grupos.
Eejemplo
# Distancia + enlace de Ward
d <- dist(USArrests_scaled, method = "euclidean")
hc <- hclust(d, method = "ward.D2")
# Dendrograma
fviz_dend(hc, k = 4, rect = TRUE, cex = 0.7) +
labs(title = "Dendrograma: Clustering Jerárquico")
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## ℹ The deprecated feature was likely used in the factoextra package.
## Please report the issue at <https://github.com/kassambara/factoextra/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.
## ℹ The deprecated feature was likely used in the factoextra package.
## Please report the issue at <https://github.com/kassambara/factoextra/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
2. K-Means Clustering
# 1. Determinar el número óptimo de clústeres (k)
# Método del codo (Elbow Method)
library(factoextra)
library(ggplot2)
wcss <- numeric(10)
for (i in 1:10) {
wcss[i] <- sum(kmeans(data_cluster_scaled, centers = i)$withinss)
}
ggplot() +
geom_point(aes(x = 1:10, y = wcss), color = 'blue') +
geom_line(aes(x = 1:10, y = wcss), color = 'blue') +
ggtitle("Método del Codo para Encontrar K Óptimo") +
xlab("Número de Clústeres (K)") +
ylab("WCSS") +
theme_minimal()
# El "codo" aparece en k=3, lo que sugiere 3 clústeres.
# 2. Realizar el clustering K-Means con k=3
set.seed(123) # Para reproducibilidad
kmeans_result <- kmeans(data_cluster_scaled, centers = 3, nstart = 25)
# 3. Explorar los resultados
print(kmeans_result$size) # Tamaño de cada clúster
## [1] 50 53 47
print(kmeans_result$centers) # Centroides de cada clúster
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1 -1.01119138 0.85041372 -1.3006301 -1.2507035
## 2 -0.05005221 -0.88042696 0.3465767 0.2805873
## 3 1.13217737 0.08812645 0.9928284 1.0141287
# 4. Visualizar los clústeres
# Usaremos los dos primeros componentes principales para visualizar.
fviz_cluster(kmeans_result, data = data_cluster_scaled,
palette = c("#2E9FDF", "#00AFBB", "#E7B800"),
geom = "point",
ellipse.type = "convex",
ggtheme = theme_bw()
)
Interpretación:
El método del codo muestra que a partir de 3 clústeres, añadir más no reduce significativamente la varianza total dentro de los grupos.
Los resultados de kmeans_result te dirán cuántas observaciones cayeron en cada clúster.
La visualización te mostrará los tres grupos que el algoritmo encontró. Si comparas estos grupos con las especies reales de iris$Species, verás que K-Means hizo un trabajo excelente para separar las flores basándose únicamente en sus medidas.
Ejemplo
# Método del codo
fviz_nbclust(USArrests_scaled, kmeans, method = "wss") +
geom_vline(xintercept = 4, linetype = 2) +
labs(title = "Método del codo para K-means")
# Aplicar K-means
set.seed(123)
km <- kmeans(USArrests_scaled, centers = 4, nstart = 25)
# Visualizar en espacio PCA
fviz_cluster(km, data = USArrests_scaled,
palette = "jco",
ellipse.type = "euclid") +
labs(title = "Clusters K-means proyectados en PCA")
Módulo 4: Análisis Discriminante Lineal (LDA)
Teoría: Clasificando ObservacionesA diferencia del clustering, el Análisis Discriminante Lineal (LDA) es una técnica de aprendizaje supervisado. Su objetivo principal es la clasificación. Dado un conjunto de observaciones que ya pertenecen a grupos conocidos (etiquetas), el LDA busca encontrar una combinación lineal de las variables que mejor separe (o discrimine) esos grupos.Una vez que se encuentran estas “funciones discriminantes”, se pueden usar para predecir a qué grupo pertenece una nueva observación. El LDA asume que las variables predictoras siguen una distribución normal y que las covarianzas entre los grupos son iguales.Fórmulas MatemáticasEl LDA busca maximizar la relación entre la varianza entre los grupos (\(S_B\)) y la varianza dentro de los grupos (\(S_W\)). Esto se logra encontrando un vector de coeficientes \(w\) que maximiza el siguiente criterio:\[J(w) = \frac{w^T S_B w}{w^T S_W w} \]Donde:
La solución a este problema también se encuentra a través de un problema de valores y vectores propios.
El dataset iris es el ejemplo canónico para LDA, ya que
tenemos las medidas (predictores) y la especie (la clase que queremos
predecir).
# 1. Cargar el paquete necesario y preparar los datos
# El paquete MASS contiene la función lda().
#install.packages("MASS")
library(MASS)
# Datos de iris
data_lda <- iris
# 2. Dividir los datos en entrenamiento y prueba
# Esto es crucial para evaluar el modelo de forma honesta.
set.seed(123)
sample_index <- sample(1:nrow(data_lda), 0.7 * nrow(data_lda))
train_data <- data_lda[sample_index, ]
test_data <- data_lda[-sample_index, ]
# 3. Construir el modelo LDA
# La fórmula Species ~ . significa "predecir Especie usando todas las demás variables".
lda_model <- lda(Species ~ ., data = train_data)
# Imprimir el modelo
print(lda_model)
## Call:
## lda(Species ~ ., data = train_data)
##
## Prior probabilities of groups:
## setosa versicolor virginica
## 0.3428571 0.3047619 0.3523810
##
## Group means:
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa 4.966667 3.394444 1.461111 0.2555556
## versicolor 5.971875 2.787500 4.309375 1.3500000
## virginica 6.586486 2.948649 5.529730 1.9945946
##
## Coefficients of linear discriminants:
## LD1 LD2
## Sepal.Length 0.887242 -0.1977697
## Sepal.Width 1.267866 -1.8705232
## Petal.Length -2.101235 1.1717820
## Petal.Width -2.934756 -3.1681626
##
## Proportion of trace:
## LD1 LD2
## 0.9937 0.0063
# 4. Realizar predicciones en el conjunto de prueba
predictions <- predict(lda_model, newdata = test_data)
# Ver las predicciones
predicted_classes <- predictions$class
print(predicted_classes)
## [1] setosa setosa setosa setosa setosa setosa
## [7] setosa setosa setosa setosa setosa setosa
## [13] setosa setosa versicolor versicolor versicolor versicolor
## [19] versicolor versicolor versicolor versicolor versicolor versicolor
## [25] versicolor versicolor versicolor virginica versicolor versicolor
## [31] versicolor versicolor virginica virginica virginica virginica
## [37] virginica virginica virginica virginica virginica virginica
## [43] virginica virginica virginica
## Levels: setosa versicolor virginica
# 5. Evaluar el rendimiento del modelo
# Creamos una tabla de confusión para ver qué tan bien clasificó.
confusion_matrix <- table(Predicted = predicted_classes, Actual = test_data$Species)
print(confusion_matrix)
## Actual
## Predicted setosa versicolor virginica
## setosa 14 0 0
## versicolor 0 17 0
## virginica 0 1 13
# Calcular la precisión (accuracy)
accuracy <- sum(diag(confusion_matrix)) / sum(confusion_matrix)
print(paste("Precisión del modelo:", round(accuracy * 100, 2), "%"))
## [1] "Precisión del modelo: 97.78 %"
# 6. Visualizar los resultados
# lda_model crea nuevas dimensiones (LD1, LD2) que separan óptimamente los grupos.
# Podemos graficar las observaciones en este nuevo espacio.
lda_plot_data <- cbind(train_data, predict(lda_model)$x)
ggplot(lda_plot_data, aes(x = LD1, y = LD2, color = Species)) +
geom_point() +
ggtitle("Visualización del Análisis Discriminante Lineal") +
theme_minimal()
Interpretación:
print(lda_model): Te mostrará los “coeficientes de la discriminante lineal”. Estos te indican qué variables son más importantes para separar los grupos.
Matriz de Confusión: Es la herramienta clave para la evaluación. La diagonal muestra las clasificaciones correctas. Los valores fuera de la diagonal son los errores.
Precisión: Un porcentaje alto (en este caso, probablemente cercano al 98-100%) indica que el modelo es muy bueno para clasificar las flores en su especie correcta basándose en sus medidas.
Gráfico LDA: Verás qué tan bien las funciones discriminantes (LD1, LD2) separan los puntos de las tres especies. Idealmente, los clústeres de puntos de cada color estarán muy separados y serán muy compactos.
Modulo 5: Análisis Factorial (FA)
Objetivo Explicar las correlaciones entre variables observadas mediante factores latentes (no observables).
Modelo factorial ortogonal \[X=μ+LF+ε\] Donde:
\(X∈R^p\) :variables observadas (centradas).
\(F∈R^m\) :factores comunes \((m<p)\), \(E[F]=0\) , \(Cov(F)=Im\)
\(L∈ R^{p×m}\) : matriz de cargas factoriales.
\(ε\) : errores únicos, \(E[ε]=0 , Cov(ε)=Ψ\) (diagonal).
Entonces:
\(Σ=LL^⊤ +Ψ\)
library(psych)
# Determinar número de factores
fa.parallel(USArrests_scaled, fa = "fa", n.iter = 100)
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Parallel analysis suggests that the number of factors = 2 and the number of components = NA
# Extraer 2 factores con rotación Varimax
fa_model <- fa(USArrests_scaled, nfactors = 2, rotate = "varimax")
# Cargas factoriales
print(fa_model$loadings, cutoff = 0.3)
##
## Loadings:
## MR1 MR2
## Murder 0.948
## Assault 0.841 0.308
## UrbanPop 0.669
## Rape 0.586 0.560
##
## MR1 MR2
## SS loadings 1.953 0.856
## Proportion Var 0.488 0.214
## Cumulative Var 0.488 0.702
# Diagrama de factores
fa.diagram(fa_model)