library(readr)
library(dplyr)
library(corrplot)
library(factoextra)
library(plotly)
trust <- read.csv("~/Desktop/global_trust-rate.csv")
summary(trust)
## Country Neighbourhood Government Scientist
## Length:113 Min. :22.99 Min. :17.75 Min. :38.97
## Class :character 1st Qu.:65.15 1st Qu.:43.55 1st Qu.:68.42
## Mode :character Median :76.03 Median :55.19 Median :84.42
## Mean :73.63 Mean :56.93 Mean :79.63
## 3rd Qu.:86.37 3rd Qu.:72.34 3rd Qu.:91.78
## Max. :96.13 Max. :97.45 Max. :98.17
## NA's :6
## Journalist Doctor.and.Nurses Philantropist Traditional.Healers
## Min. :12.45 Min. :45.99 Min. :29.04 Min. : 8.54
## 1st Qu.:49.77 1st Qu.:78.89 1st Qu.:61.01 1st Qu.:30.34
## Median :60.43 Median :87.08 Median :70.51 Median :44.47
## Mean :59.56 Mean :84.66 Mean :69.60 Mean :44.98
## 3rd Qu.:68.09 3rd Qu.:94.31 3rd Qu.:79.69 3rd Qu.:59.28
## Max. :90.83 Max. :99.40 Max. :94.02 Max. :92.47
## NA's :1
psych::describe(trust[, 2:8])[11:12] # columns 11 and 12 are skew and kurtosis
There are 7 numeric variables suitable for PCA.
Judging by the skewness and kurtosis values reported by psych::describe(), the distributions of the numeric variables are normal or close to normal.
boxplot.stats(trust$Neighbourhood)$out # Neighbourhood
## [1] 22.99 28.19 31.17
boxplot.stats(trust$Government)$out # Government
## numeric(0)
boxplot.stats(trust$Scientist)$out # Scientist
## numeric(0)
boxplot.stats(trust$Journalist)$out # Journalist
## [1] 12.45
boxplot.stats(trust$Doctor.and.Nurses)$out # Doctor.and.Nurses
## [1] 45.99 53.54 53.78 54.34
boxplot.stats(trust$Philantropist)$out # Philantropist
## [1] 29.04
boxplot.stats(trust$Traditional.Healers)$out # Traditional.Healers
## numeric(0)
There are not many outliers, so they should not affect the PCA drastically.
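The values listed above are the ones `boxplot.stats()` places beyond the whiskers, that is, more than 1.5 times the hinge spread away from the hinges (its default `coef = 1.5`). A minimal sketch on made-up numbers (the vector `x` and the data frame `toy` are illustrative, not taken from the dataset) spells out the rule and shows how the per-column calls could be collapsed into a single `lapply()`:

```r
# Toy data: hypothetical values, not from global_trust-rate.csv
x <- c(10, 50, 55, 60, 62, 65, 70, 72, 75, 80)

# Built-in rule: points farther than 1.5 * (upper hinge - lower hinge) from the hinges
out_builtin <- boxplot.stats(x)$out

# The same rule written out by hand with Tukey's five-number summary
h <- fivenum(x)                 # min, lower hinge, median, upper hinge, max
spread <- h[4] - h[2]
out_manual <- x[x < h[2] - 1.5 * spread | x > h[4] + 1.5 * spread]

# Checking every numeric column of a data frame in one call
toy <- data.frame(a = x, b = rev(x) / 2)
outliers_by_col <- lapply(toy, function(col) boxplot.stats(col)$out)
```

For the trust data the equivalent call would be `lapply(trust[, 2:8], function(col) boxplot.stats(col)$out)`.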
table(is.na(trust))
##
## FALSE TRUE
## 897 7
trust <- trust %>% na.omit()
Missing values would break the next steps (cor() would return NA and prcomp() would fail), so the affected rows are dropped; only 7 values are missing.
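`table(is.na(trust))` reports only the overall count of missing cells. A short sketch on a made-up data frame (`toy`, `v1`, and `v2` are hypothetical names and values) shows how the missing values could be broken down per column and how many rows `na.omit()` actually removes:

```r
# Toy data frame with one missing value in each numeric column (hypothetical values)
toy <- data.frame(
  id = c("A", "B", "C", "D"),
  v1 = c(40.1, NA, 55.3, 61.0),
  v2 = c(52.4, 60.2, NA, 48.9)
)

na_per_column <- colSums(is.na(toy))    # missing cells per column
n_complete <- sum(complete.cases(toy))  # rows without any NA
toy_clean <- na.omit(toy)               # keeps only the complete rows
```

Because `na.omit()` works row-wise, the number of dropped rows can be smaller than the number of missing cells when several NAs fall in the same row.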
trust[,2:8] %>%
cor() %>%
corrplot()
The correlations are generally high, except for the Traditional.Healers variable, which shows only small-to-medium correlations with the other variables. The linear relationships that PCA relies on are present.
pca_trust <- prcomp(trust[, 2:8], center = TRUE, scale. = TRUE)
summary(pca_trust)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.1029 0.9663 0.9020 0.61961 0.45810 0.39502 0.28398
## Proportion of Variance 0.6317 0.1334 0.1162 0.05484 0.02998 0.02229 0.01152
## Cumulative Proportion 0.6317 0.7651 0.8814 0.93621 0.96619 0.98848 1.00000
PC1 alone explains 63% of the total variance, and PC1 and PC2 together explain 77%. Depending on the goal, even a single principal component could be retained.
Two of the seven standardized variables would by themselves account for 2/7 ≈ 0.29 of the total variance; the first two PCs explain 0.77, so this minimum is satisfied.
Eigenvalues:
pca_trust$sdev ^ 2
## [1] 4.42218658 0.93376323 0.81359378 0.38391176 0.20985775 0.15604132 0.08064558
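These eigenvalues tie back to the `summary(pca_trust)` table: with standardized variables they sum to the number of variables (7), so each proportion of variance is simply the eigenvalue divided by 7. The sketch below redoes that arithmetic on the values printed above. The `eig > 1` check is the Kaiser rule of thumb, included here as one common retention heuristic rather than something the analysis itself relies on.

```r
# Eigenvalues copied from the output of pca_trust$sdev ^ 2
eig <- c(4.42218658, 0.93376323, 0.81359378, 0.38391176,
         0.20985775, 0.15604132, 0.08064558)

total <- sum(eig)        # equals the number of standardized variables
prop <- eig / total      # proportion of variance per component
cum_prop <- cumsum(prop) # cumulative proportion

round(prop[1:2], 4)      # matches the "Proportion of Variance" row for PC1 and PC2
round(cum_prop[2], 4)    # matches the "Cumulative Proportion" entry for PC2

n_kaiser <- sum(eig > 1) # components with eigenvalue above 1
```

Under that heuristic only PC1 qualifies, which is consistent with the remark that a single component could suffice depending on the goal.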
Scree plot:
fviz_screeplot(pca_trust, addlabels = TRUE,
main = "Explained variances of 7 obtained principal components",
barfill = "lightblue",
xlab = "Principal Components")
Loadings:
pca_trust$rotation
## PC1 PC2 PC3 PC4 PC5
## Neighbourhood 0.4227475 -0.30633480 0.15366145 -0.15071745 0.14269113
## Government 0.3430336 0.44070925 -0.34911938 0.68436735 0.27370439
## Scientist 0.4158478 -0.37970339 0.19329855 0.20427495 -0.19935184
## Journalist 0.3537762 0.42087150 -0.40484873 -0.46542163 -0.54124293
## Doctor.and.Nurses 0.4319574 -0.22204919 0.09664218 0.24727667 -0.38639948
## Philantropist 0.4184969 -0.08977587 -0.20641745 -0.42370592 0.64499958
## Traditional.Healers 0.2092811 0.57728790 0.77542835 -0.09953684 0.08192111
## PC6 PC7
## Neighbourhood -0.69645274 0.419176275
## Government -0.15138099 0.005475873
## Scientist -0.07286810 -0.747497430
## Journalist -0.12541641 -0.092285669
## Doctor.and.Nurses 0.54669251 0.495423271
## Philantropist 0.40998526 -0.102729659
## Traditional.Healers 0.06354305 -0.031537463
PC1: Neighbourhood, Scientist, Doctor.and.Nurses, Philantropist
PC2: Government, Journalist, Traditional.Healers
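This grouping can be read off the loadings by comparing absolute values in the PC1 and PC2 columns. A minimal sketch, retyping the first two columns of `pca_trust$rotation` (rounded to four decimals) into a matrix; `load12`, `dominant`, and `groups` are illustrative names:

```r
# First two loading columns from pca_trust$rotation, rounded to 4 decimals
load12 <- matrix(
  c(0.4227, -0.3063,
    0.3430,  0.4407,
    0.4158, -0.3797,
    0.3538,  0.4209,
    0.4320, -0.2220,
    0.4185, -0.0898,
    0.2093,  0.5773),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Neighbourhood", "Government", "Scientist", "Journalist",
      "Doctor.and.Nurses", "Philantropist", "Traditional.Healers"),
    c("PC1", "PC2")
  )
)

# For each variable, pick the component with the larger absolute loading
dominant <- colnames(load12)[apply(abs(load12), 1, which.max)]
groups <- split(rownames(load12), dominant)
groups
```

The assignment is a simplification: Government and Journalist also load at about 0.34 to 0.35 on PC1, so they contribute to both components.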
fviz_pca_biplot(pca_trust,
repel = TRUE, # Avoid text overlapping
label = "var",
col.ind = "grey",
alpha.ind = 0.5,
col.var = "brown") +
xlim(-6,6) +
xlab("PC1 (63.2% explained variance)") +
ylab("PC2 (13.3% explained variance)") +
annotate("text", x = 3.2, y = 2.5, label = "Laos", col = "lightblue") +
annotate("text", x = -2.5, y = -2.3, label = "Greece", col = "lightblue") +
annotate("text", x = -1, y = -2.2, label = "Chile", col = "lightblue") +
annotate("text", x = -4.5, y = 2.1, label = "Benin", col = "lightblue") +
annotate("text", x = -2.4, y = 2, label = "Congo \nBrazzaville", col = "lightblue") +
annotate("text", x = -1.7, y = -1.6, label = "Russia", col = "lightblue") +
ggtitle("Biplot for PC1 and PC2") +
theme(plot.title = element_text(hjust = 0.5))
Some findings:
Countries with high trust across all groups: Laos, Sri Lanka, Cambodia, Malta, Switzerland.
Countries with low trust across all groups: Greece, Chile, Russia, Ukraine.
Countries with high trust in government, journalists, or traditional healers but low trust in scientists, doctors and nurses, the neighbourhood, and philanthropists: Benin, Congo Brazzaville, Cameroon.
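Country positions on the biplot are the rows of the score matrix (`pca_trust$x`), so candidates for labelling can be pulled from the scores instead of being read off the plot. The sketch below uses the built-in `USArrests` data purely as a stand-in, since the trust CSV is not available here; `pca_demo`, `scores`, `top_pc1`, and `bottom_pc1` are illustrative names.

```r
# Stand-in data: USArrests ships with R; it is NOT the trust dataset
pca_demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Scores on the first two components, with the observation name attached
scores <- data.frame(name = rownames(USArrests), pca_demo$x[, 1:2])

# Observations at the two ends of PC1
top_pc1 <- head(scores[order(scores$PC1, decreasing = TRUE), ], 5)
bottom_pc1 <- head(scores[order(scores$PC1), ], 5)
```

For the trust data the analogous table would be `data.frame(Country = trust$Country, pca_trust$x[, 1:2])`; the rows line up because `trust` was already reduced with `na.omit()` before `prcomp()` was called. The sign of each component is arbitrary, so the loadings should be checked before reading either end of an axis as "high" or "low" trust.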