CC0293 - Análise
Multivariada
ANÁLISE DE FATORES ORTOGONAIS
Prof. Silvia Maria de Freitas
Jonas Freire Ribeiro
548254
Os dados na Tabela 1 abaixo, mostram as porcentagens da força de trabalho em nove diferentes tipos de indústrias para 30 países europeus. Nesse caso, métodos multivariados podem ser úteis para isolar grupos de países com padrões similares de empregos, e, em geral, ajudar o entendimento dos relacionamentos entre os países. Diferenças entre países que são relacionados a grupos políticos (UE, a União Européia; AELC, a área européia de livre comércio; países do leste europeu e outros países) podem ser de particular interesse. Realize uma Análise Exploratória na base de dados e responda os itens abaixo.
RESPOSTAS:
Inicialmente, os dados da questão foram organizados em um data frame.
############################################################
# Dados
############################################################
paises <- c("Alemanha","Áustria","Bélgica","Dinamarca","Espanha","Finlândia",
"França","Grécia","Holanda","Irlanda","Itália","Luxemburgo",
"Portugal","Reino Unido","Suécia","Islândia","Noruega","Suíça",
"Bulgária","Hungria","Polônia","Rep. Tcheca","Eslováquia",
"Lituânia","Turquia","EUA","Canadá","Japão")
grupo <- c("UE","UE","UE","UE","UE","UE","UE","UE","UE","UE","UE","UE","UE",
"UE","UE","AELC","AELC","AELC","Leste","Leste","Leste","Leste",
"Leste","Leste","Outro","Outro","Outro","Outro")
dados <- data.frame(
AGR = c(1.3, 3.2, 1.3, 3.3, 2.8, 7.3, 2.6, 16.3, 4.0, 7.7, 5.5, 2.0, 6.3, 1.9, 1.6,
12.0, 2.3, 1.9, 14.3, 6.0, 5.1, 3.9, 4.3, 8.8, 15.0, 2.0, 2.0, 1.8),
MIN = c(1.3, 1.3, 0.6, 1.0, 1.0, 1.6, 0.5, 0.8, 1.6, 1.0, 0.5, 0.7, 0.5, 1.0, 0.9,
8.5, 14.2, 0.7, 3.5, 1.2, 1.2, 1.4, 1.3, 0.6, 1.7, 1.0, 6.8, 0.3),
FAB = c(33.1, 29.5, 24.2, 22.6, 27.4, 29.5, 22.8, 16.0, 25.6, 27.3, 28.7, 23.5,
23.6, 21.8, 24.7, 20.4, 18.2, 22.2, 23.9, 23.6, 24.2, 27.3, 26.9, 24.7,
22.0, 22.0, 20.7, 28.0),
FE = c(1.7, 3.4, 2.3, 1.3, 2.0, 2.2, 3.0, 10.9, 3.7, 3.4, 2.3, 1.4, 3.4, 1.3,
1.2, 12.0, 8.8, 2.2, 7.0, 4.4, 4.2, 4.7, 4.3, 3.3, 8.0, 2.0, 3.0, 1.2),
CON = c(8.0, 10.0, 4.7, 5.7, 7.4, 6.8, 6.0, 8.4, 6.0, 6.6, 4.7, 6.5, 7.6, 5.4,
6.5, 7.1, 7.2, 5.8, 5.4, 6.6, 6.4, 7.0, 7.4, 6.5, 5.0, 5.0, 6.0, 9.0),
SER = c(54.6, 52.5, 62.3, 65.0, 59.3, 52.6, 65.1, 48.2, 58.6, 53.9, 57.9, 64.0,
58.6, 59.8, 59.3, 52.0, 49.3, 63.6, 45.9, 58.4, 59.0, 55.7, 56.2, 51.2,
48.0, 64.0, 63.0, 60.0),
FIN = c(3.3, 5.7, 7.3, 6.7, 5.8, 5.5, 6.5, 4.6, 6.1, 5.8, 5.4, 9.1, 6.4, 5.2,
6.7, 4.0, 4.2, 6.5, 3.9, 2.9, 3.4, 2.5, 2.7, 2.1, 3.0, 6.0, 5.0, 6.0),
SSP = c(35.3, 32.6, 27.0, 31.3, 25.4, 31.5, 29.4, 20.3, 28.6, 26.0, 25.2, 30.0,
28.6, 27.0, 32.9, 17.0, 18.0, 26.7, 20.0, 20.2, 21.5, 21.4, 20.2, 20.7,
18.0, 25.0, 22.0, 18.0),
TC = c(11.0, 14.0, 15.1, 15.8, 15.0, 10.0, 15.0, 7.8, 14.8, 12.6, 13.0, 13.0,
14.2, 13.2, 16.4, 10.0, 9.4, 11.5, 8.9, 9.8, 10.3, 10.2, 9.8, 10.4,
8.0, 15.0, 13.0, 16.0)
)
rownames(dados) <- paises
knitr::kable(head(dados, 10), caption = "10 primeiras linhas do banco de dados")| AGR | MIN | FAB | FE | CON | SER | FIN | SSP | TC | |
|---|---|---|---|---|---|---|---|---|---|
| Alemanha | 1.3 | 1.3 | 33.1 | 1.7 | 8.0 | 54.6 | 3.3 | 35.3 | 11.0 |
| Áustria | 3.2 | 1.3 | 29.5 | 3.4 | 10.0 | 52.5 | 5.7 | 32.6 | 14.0 |
| Bélgica | 1.3 | 0.6 | 24.2 | 2.3 | 4.7 | 62.3 | 7.3 | 27.0 | 15.1 |
| Dinamarca | 3.3 | 1.0 | 22.6 | 1.3 | 5.7 | 65.0 | 6.7 | 31.3 | 15.8 |
| Espanha | 2.8 | 1.0 | 27.4 | 2.0 | 7.4 | 59.3 | 5.8 | 25.4 | 15.0 |
| Finlândia | 7.3 | 1.6 | 29.5 | 2.2 | 6.8 | 52.6 | 5.5 | 31.5 | 10.0 |
| França | 2.6 | 0.5 | 22.8 | 3.0 | 6.0 | 65.1 | 6.5 | 29.4 | 15.0 |
| Grécia | 16.3 | 0.8 | 16.0 | 10.9 | 8.4 | 48.2 | 4.6 | 20.3 | 7.8 |
| Holanda | 4.0 | 1.6 | 25.6 | 3.7 | 6.0 | 58.6 | 6.1 | 28.6 | 14.8 |
| Irlanda | 7.7 | 1.0 | 27.3 | 3.4 | 6.6 | 53.9 | 5.8 | 26.0 | 12.6 |
Agora vamos para a análise exploratória dos dados. Para iniciar, foi feito o seguinte gráfico que resume a estrutura de correlação das variáveis quantitativas.
Além disso, o seguinte gráfico ajuda na visualização do nível de correlação entre as variáveis.
R <- cor(dados) # matriz de correlação
corrplot(R, method = "color", addCoef.col = "black", number.cex = 0.7, tl.cex = 0.8)Para visualizar a distribuição das variáveis, foi calculado algumas
medidas de posição com a função summary e foram feitos
boxplots.
## AGR MIN FAB FE
## Min. : 1.300 Min. : 0.300 Min. :16.00 Min. : 1.200
## 1st Qu.: 2.000 1st Qu.: 0.700 1st Qu.:22.15 1st Qu.: 2.000
## Median : 3.600 Median : 1.000 Median :24.05 Median : 3.150
## Mean : 5.232 Mean : 2.025 Mean :24.44 Mean : 3.879
## 3rd Qu.: 6.550 3rd Qu.: 1.450 3rd Qu.:27.30 3rd Qu.: 4.325
## Max. :16.300 Max. :14.200 Max. :33.10 Max. :12.000
## CON SER FIN SSP
## Min. : 4.700 Min. :45.90 Min. :2.100 Min. :17.00
## 1st Qu.: 5.775 1st Qu.:52.58 1st Qu.:3.775 1st Qu.:20.27
## Median : 6.500 Median :58.50 Median :5.450 Median :25.30
## Mean : 6.596 Mean :57.07 Mean :5.082 Mean :24.99
## 3rd Qu.: 7.250 3rd Qu.:60.58 3rd Qu.:6.175 3rd Qu.:28.80
## Max. :10.000 Max. :65.10 Max. :9.100 Max. :35.30
## TC
## Min. : 7.80
## 1st Qu.:10.00
## Median :12.80
## Mean :12.26
## 3rd Qu.:14.85
## Max. :16.40
par(mfrow = c(3,3))
boxplot(dados$AGR, main = "AGR")
boxplot(dados$SSP, main = "SSP")
boxplot(dados$FAB, main = "FAB")
boxplot(dados$MIN, main = "MIN")
boxplot(dados$FE, main = "FE")
boxplot(dados$CON, main = "CON")
boxplot(dados$SER, main = "SER")
boxplot(dados$FIN, main = "FIN")
boxplot(dados$TC, main = "TC")Para verificar se a estrutura de correlação é adequada para a aplicação da técnica de análise fatorial, foi realizado o teste KMO.
## KMO:
## Kaiser-Meyer-Olkin factor adequacy
## Call: psych::KMO(r = R)
## Overall MSA = 0.65
## MSA for each item =
## AGR MIN FAB FE CON SER FIN SSP TC
## 0.54 0.39 0.63 0.73 0.20 0.65 0.69 0.81 0.88
############################################################
# (a) Análise Fatorial com 3 e 4 fatores
############################################################
fa3 <- fa(dados, nfactors = 3, rotate = "none",
fm = "pc", cor = "cor", covar = F,
missing = F, scores = "regression")
fa4 <- fa(dados, nfactors = 4, rotate = "none",
fm = "pc", cor = "cor", covar = F,
missing = F, scores = "regression")
cat("\n===== Solução 3 fatores =====\n")##
## ===== Solução 3 fatores =====
## Factor Analysis using method = minres
## Call: fa(r = dados, nfactors = 3, rotate = "none", scores = "regression",
## covar = F, missing = F, fm = "pc", cor = "cor")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR2 MR3 h2 u2 com
## AGR -0.82 0.14 -0.56 1.01 -0.0107 1.8
## MIN -0.49 -0.34 0.50 0.60 0.4020 2.7
## FAB 0.45 0.88 0.15 1.00 -0.0002 1.6
## FE -0.92 -0.17 0.06 0.87 0.1311 1.1
## CON -0.08 0.29 0.21 0.14 0.8632 2.0
## SER 0.81 -0.39 -0.05 0.81 0.1944 1.4
## FIN 0.65 -0.33 -0.16 0.56 0.4417 1.6
## SSP 0.69 0.17 -0.09 0.52 0.4814 1.2
## TC 0.84 -0.19 -0.01 0.74 0.2634 1.1
##
## MR1 MR2 MR3
## SS loadings 4.22 1.35 0.66
## Proportion Var 0.47 0.15 0.07
## Cumulative Var 0.47 0.62 0.69
## Proportion Explained 0.68 0.22 0.11
## Cumulative Proportion 0.68 0.89 1.00
##
## Mean item complexity = 1.6
## Test of the hypothesis that 3 factors are sufficient.
##
## df null model = 36 with the objective function = 7.18 with Chi Square = 166.35
## df of the model are 12 and the objective function was 1.26
##
## The root mean square of the residuals (RMSR) is 0.06
## The df corrected root mean square of the residuals is 0.1
##
## The harmonic n.obs is 28 with the empirical chi square 6.38 with prob < 0.9
## The total n.obs was 28 with Likelihood Chi Square = 26.74 with prob < 0.0084
##
## Tucker Lewis Index of factoring reliability = 0.619
## RMSEA index = 0.206 and the 90 % confidence intervals are 0.103 0.323
## BIC = -13.24
## Fit based upon off diagonal values = 0.99
##
## ===== Solução 4 fatores =====
## Factor Analysis using method = minres
## Call: fa(r = dados, nfactors = 4, rotate = "none", scores = "regression",
## covar = F, missing = F, fm = "pc", cor = "cor")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR2 MR3 MR4 h2 u2 com
## AGR -0.81 -0.09 0.59 -0.02 1.01 -0.0149 1.8
## MIN -0.49 0.38 -0.46 0.25 0.66 0.3417 3.4
## FAB 0.42 -0.80 -0.10 0.10 0.84 0.1600 1.6
## FE -0.92 0.23 0.02 0.21 0.94 0.0622 1.2
## CON -0.09 -0.30 -0.15 0.35 0.24 0.7591 2.5
## SER 0.83 0.38 -0.06 -0.31 0.93 0.0749 1.7
## FIN 0.72 0.45 0.33 0.42 1.00 -0.0024 2.8
## SSP 0.70 -0.20 0.13 0.23 0.60 0.4010 1.5
## TC 0.83 0.16 0.01 0.11 0.73 0.2729 1.1
##
## MR1 MR2 MR3 MR4
## SS loadings 4.31 1.34 0.72 0.57
## Proportion Var 0.48 0.15 0.08 0.06
## Cumulative Var 0.48 0.63 0.71 0.77
## Proportion Explained 0.62 0.19 0.10 0.08
## Cumulative Proportion 0.62 0.81 0.92 1.00
##
## Mean item complexity = 2
## Test of the hypothesis that 4 factors are sufficient.
##
## df null model = 36 with the objective function = 7.18 with Chi Square = 166.35
## df of the model are 6 and the objective function was 0.48
##
## The root mean square of the residuals (RMSR) is 0.02
## The df corrected root mean square of the residuals is 0.05
##
## The harmonic n.obs is 28 with the empirical chi square 0.69 with prob < 0.99
## The total n.obs was 28 with Likelihood Chi Square = 9.92 with prob < 0.13
##
## Tucker Lewis Index of factoring reliability = 0.788
## RMSEA index = 0.148 and the 90 % confidence intervals are 0 0.321
## BIC = -10.07
## Fit based upon off diagonal values = 1
############################################################
# (b) Rotação Varimax
############################################################
nf <- 3
fa_rot <- fa(dados, nfactors = nf, rotate = "varimax",
fm = "pc", cor = "cor", covar = F,
missing = F, scores = "regression")
cat("\n===== Rotação Varimax =====\n")##
## ===== Rotação Varimax =====
## Factor Analysis using method = minres
## Call: fa(r = dados, nfactors = nf, rotate = "varimax", scores = "regression",
## covar = F, missing = F, fm = "pc", cor = "cor")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR3 MR2 h2 u2 com
## AGR -0.98 0.05 -0.21 1.01 -0.0107 1.1
## MIN -0.12 -0.76 0.03 0.60 0.4020 1.1
## FAB 0.19 0.58 0.79 1.00 -0.0002 2.0
## FE -0.72 -0.58 -0.09 0.87 0.1311 2.0
## CON -0.07 -0.04 0.36 0.14 0.8632 1.1
## SER 0.79 0.23 -0.35 0.81 0.1944 1.6
## FIN 0.60 0.26 -0.37 0.56 0.4417 2.1
## SSP 0.52 0.49 0.08 0.52 0.4814 2.0
## TC 0.78 0.33 -0.17 0.74 0.2634 1.4
##
## MR1 MR3 MR2
## SS loadings 3.40 1.73 1.10
## Proportion Var 0.38 0.19 0.12
## Cumulative Var 0.38 0.57 0.69
## Proportion Explained 0.55 0.28 0.18
## Cumulative Proportion 0.55 0.82 1.00
##
## Mean item complexity = 1.6
## Test of the hypothesis that 3 factors are sufficient.
##
## df null model = 36 with the objective function = 7.18 with Chi Square = 166.35
## df of the model are 12 and the objective function was 1.26
##
## The root mean square of the residuals (RMSR) is 0.06
## The df corrected root mean square of the residuals is 0.1
##
## The harmonic n.obs is 28 with the empirical chi square 6.38 with prob < 0.9
## The total n.obs was 28 with Likelihood Chi Square = 26.74 with prob < 0.0084
##
## Tucker Lewis Index of factoring reliability = 0.619
## RMSEA index = 0.206 and the 90 % confidence intervals are 0.103 0.323
## BIC = -13.24
## Fit based upon off diagonal values = 0.99
############################################################
# (c) Matriz de cargas fatoriais L
############################################################
L <- round(fa_rot$loadings[,], 3)
cat("\n===== Cargas fatoriais (Varimax) =====\n")##
## ===== Cargas fatoriais (Varimax) =====
## MR1 MR3 MR2
## AGR -0.981 0.050 -0.214
## MIN -0.124 -0.763 0.028
## FAB 0.194 0.575 0.795
## FE -0.723 -0.581 -0.094
## CON -0.073 -0.039 0.360
## SER 0.795 0.233 -0.346
## FIN 0.597 0.261 -0.366
## SSP 0.517 0.494 0.082
## TC 0.776 0.327 -0.166
############################################################
# (d) Comunalidades e especificidades
############################################################
comunalidades <- round(fa_rot$communality, 3)
especificidades <- round(1 - fa_rot$communality, 3)
cat("\n===== Comunalidades =====\n")##
## ===== Comunalidades =====
## AGR MIN FAB FE CON SER FIN SSP TC
## 1.011 0.598 1.000 0.869 0.137 0.806 0.558 0.519 0.737
##
## ===== Especificidades =====
## AGR MIN FAB FE CON SER FIN SSP TC
## -0.011 0.402 0.000 0.131 0.863 0.194 0.442 0.481 0.263
############################################################
# (e) Escores fatoriais
############################################################
scores <- as.data.frame(fa_rot$scores)
rownames(scores) <- paises
cat("\n===== Escores Fatoriais =====\n")##
## ===== Escores Fatoriais =====
## MR1 MR3 MR2
## Alemanha 0.313695395 0.57668075 2.40003968
## Áustria 0.458818985 0.13307778 1.53171860
## Bélgica 1.582214709 -0.75727872 0.39098027
## Dinamarca 0.378420910 1.25320764 -1.97359415
## Espanha 0.527159777 0.56409773 0.43081820
## Finlândia -0.925728295 1.72494637 0.69665024
## França 0.813915233 0.17910360 -0.99683606
## Grécia -2.052804687 -0.77860848 -1.74031370
## Holanda 0.609740930 -0.09845223 0.32634348
## Irlanda -0.540299343 1.00693038 0.40580905
## Itália -0.005122713 0.88870933 0.89630936
## Luxemburgo 0.948697201 0.54165372 -0.66484916
## Portugal -0.196481356 0.94184903 -1.09355033
## Reino Unido 0.970338780 -0.55130147 -0.67434709
## Suécia 1.164868402 0.09861854 -0.33053769
## Islândia -1.601694899 -1.20594630 -0.32056816
## Noruega 0.652636269 -3.99770494 0.78493767
## Suíça 0.879573791 -0.28698498 -0.55064080
## Bulgária -2.054094338 0.43452805 0.01094726
## Hungria -0.513621735 -0.03030902 -0.21184135
## Polônia -0.201777365 -0.16183361 0.07414851
## Rep. Tcheca 0.152766802 -0.86494293 1.68730031
## Eslováquia -0.085573228 -0.47565636 1.29995241
## Lituânia -1.125540733 0.55646724 -0.20441644
## Turquia -2.082229498 0.02512828 -0.30274818
## EUA 1.024784804 -0.23972576 -0.92309642
## Canadá 0.211781770 -0.10832588 -1.57727484
## Japão 0.695554429 0.63207226 0.62865933
############################################################
# (f) Clusterização K-means com 4 clusters
############################################################
set.seed(123)
km <- kmeans(scores, centers = 4, nstart = 50) # agrupamento kmeans
comparacao <- data.frame(
Pais = paises,
Grupo_original = grupo,
Cluster = km$cluster
)##
## ===== Comparação individual entre Grupo Original × Cluster =====
##
## 1 2 3 4
## 11 6 10 1
##
## AELC Leste Outro UE
## 3 6 4 15
##
## ===== Comparação Grupo Original × Cluster =====
##
## AELC Leste Outro UE
## 1 0 3 1 7
## 2 1 3 1 1
## 3 1 0 2 7
## 4 1 0 0 0
############################################################
# (g) Contagem por cluster
############################################################
cat("\n===== Quantidade de países por cluster =====\n")##
## ===== Quantidade de países por cluster =====
##
## 1 2 3 4
## 11 6 10 1