Packages and Data for Analysis

library(readr)
library(dplyr)
library(corrplot)
library(factoextra)
library(plotly)
trust <- read.csv("~/Desktop/global_trust-rate.csv")

Descriptive statistics

summary(trust)
##    Country          Neighbourhood     Government      Scientist    
##  Length:113         Min.   :22.99   Min.   :17.75   Min.   :38.97  
##  Class :character   1st Qu.:65.15   1st Qu.:43.55   1st Qu.:68.42  
##  Mode  :character   Median :76.03   Median :55.19   Median :84.42  
##                     Mean   :73.63   Mean   :56.93   Mean   :79.63  
##                     3rd Qu.:86.37   3rd Qu.:72.34   3rd Qu.:91.78  
##                     Max.   :96.13   Max.   :97.45   Max.   :98.17  
##                                     NA's   :6                      
##    Journalist    Doctor.and.Nurses Philantropist   Traditional.Healers
##  Min.   :12.45   Min.   :45.99     Min.   :29.04   Min.   : 8.54      
##  1st Qu.:49.77   1st Qu.:78.89     1st Qu.:61.01   1st Qu.:30.34      
##  Median :60.43   Median :87.08     Median :70.51   Median :44.47      
##  Mean   :59.56   Mean   :84.66     Mean   :69.60   Mean   :44.98      
##  3rd Qu.:68.09   3rd Qu.:94.31     3rd Qu.:79.69   3rd Qu.:59.28      
##  Max.   :90.83   Max.   :99.40     Max.   :94.02   Max.   :92.47      
##  NA's   :1
psych::describe(trust[,2:8])[11:12]

There are 7 numeric variables suitable for PCA.

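Judging by the skewness and kurtosis reported by psych::describe (columns 11 and 12 of its output), the distributions of the numeric variables are normal or close to normal.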
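A minimal sketch of how such a shape check could be done in base R. It uses the built-in mtcars data as a stand-in for the trust data, and the helpers skewness() and excess_kurtosis() are hypothetical names; their moment-based formulas may differ slightly from what psych::describe reports, depending on its type argument.

```r
# Stand-in data: built-in mtcars, NOT the trust data set.
x <- mtcars[, c("mpg", "hp", "wt")]

# Hypothetical helpers: moment-based skewness and excess kurtosis.
skewness <- function(v) {
  v <- v[!is.na(v)]
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}
excess_kurtosis <- function(v) {
  v <- v[!is.na(v)]
  m <- mean(v)
  mean((v - m)^4) / mean((v - m)^2)^2 - 3
}

# One row per variable: shape statistics plus the Shapiro-Wilk p-value.
shape <- data.frame(
  skew      = sapply(x, skewness),
  kurtosis  = sapply(x, excess_kurtosis),
  shapiro_p = sapply(x, function(v) shapiro.test(v)$p.value)
)
print(round(shape, 3))
```

For the trust data the same calls would be applied to trust[, 2:8]. Values of skewness and excess kurtosis near zero are consistent with an approximately normal shape.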

boxplot.stats(trust$Neighbourhood)$out # Neighbourhood
## [1] 22.99 28.19 31.17
boxplot.stats(trust$Government)$out # Government
## numeric(0)
boxplot.stats(trust$Scientist)$out # Scientist
## numeric(0)
boxplot.stats(trust$Journalist)$out # Journalist
## [1] 12.45
boxplot.stats(trust$Doctor.and.Nurses)$out # Doctor.and.Nurses
## [1] 45.99 53.54 53.78 54.34
boxplot.stats(trust$Philantropist)$out # Philantropist
## [1] 29.04
boxplot.stats(trust$Traditional.Healers)$out # Traditional.Healers
## numeric(0)

There are not many outliers, so they should not affect the PCA drastically.
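The per-variable calls above can be collapsed into one loop. This is a sketch on the built-in mtcars data, used here only as a stand-in; boxplot.stats flags points lying more than 1.5 times the box length beyond the hinges (its default coef = 1.5).

```r
# Stand-in data: built-in mtcars, NOT the trust data set.
x <- mtcars[, c("mpg", "hp", "wt", "qsec")]

# One list element per variable, holding the flagged values.
outliers <- lapply(x, function(v) boxplot.stats(v)$out)

# Number of flagged values per variable.
n_out <- lengths(outliers)
print(n_out)
```

For the trust data, the equivalent would be lapply(trust[, 2:8], function(v) boxplot.stats(v)$out).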

table(is.na(trust))
## 
## FALSE  TRUE 
##   897     7
trust <- trust %>% na.omit()

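prcomp() cannot handle missing values, so we drop the rows that contain them. Only 7 values are missing (6 in Government and 1 in Journalist), so at most 7 of the 113 countries are lost.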
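It can help to separate the number of missing cells from the number of rows that na.omit() will drop. A small sketch on the built-in airquality data, which also contains NAs and serves only as a stand-in:

```r
# Stand-in data: built-in airquality, NOT the trust data set.
aq <- airquality

# Missing cells per column.
na_per_column <- colSums(is.na(aq))
print(na_per_column)

# Rows with at least one missing value (these are what na.omit removes).
rows_with_na <- sum(!complete.cases(aq))

# Listwise deletion, as done for the trust data.
aq_complete <- na.omit(aq)
cat("rows before:", nrow(aq), " rows after:", nrow(aq_complete), "\n")
```

Running colSums(is.na(trust)) and sum(!complete.cases(trust)) before the na.omit() step would give the corresponding figures for the trust data.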

Correlation of variables

trust[,2:8] %>% 
  cor() %>% 
  corrplot()

The correlations are generally high, except for the Traditional.Healers variable, which has only small to medium correlations with the other variables. The variables are therefore linearly related, as PCA requires.
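A numeric companion to the plot, as a sketch: the mean absolute correlation of each variable with all the others makes a weakly related variable easy to spot. The built-in mtcars data is used as a stand-in, and avg_abs_r is a hypothetical name.

```r
# Stand-in data: built-in mtcars, NOT the trust data set.
x <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]

r <- cor(x)

# Ignore the diagonal (each variable correlates perfectly with itself).
diag(r) <- NA

# Mean absolute correlation with the other variables, lowest first.
avg_abs_r <- sort(rowMeans(abs(r), na.rm = TRUE))
print(round(avg_abs_r, 2))
```

Applied to trust[, 2:8], the variable listed first would be the one least related to the rest.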

Performing PCA

pca_trust <- prcomp(trust[,2:8], center = TRUE, scale. = TRUE)

Model fit

summary(pca_trust)
## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5     PC6     PC7
## Standard deviation     2.1029 0.9663 0.9020 0.61961 0.45810 0.39502 0.28398
## Proportion of Variance 0.6317 0.1334 0.1162 0.05484 0.02998 0.02229 0.01152
## Cumulative Proportion  0.6317 0.7651 0.8814 0.93621 0.96619 0.98848 1.00000

PC1 alone already explains 63% of the total variance, and PC1 and PC2 together explain 77%. Depending on the goal, even a single principal component could be retained.

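The first 2 PCs should explain at least as much variance as 2 of the 7 standardized variables, that is 2/7 ≈ 0.29 of the total. They explain 0.77, so the condition is satisfied.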
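The arithmetic behind these figures can be reproduced from the standard deviations printed by summary(pca_trust). The sketch below types those values in by hand; k and threshold are hypothetical names for the number of retained components and the k/p benchmark.

```r
# Standard deviations copied from the summary(pca_trust) output above.
sdev <- c(2.1029, 0.9663, 0.9020, 0.61961, 0.45810, 0.39502, 0.28398)

# Variance of each component and its share of the total.
eigenvalues <- sdev^2
prop_var <- eigenvalues / sum(eigenvalues)
cum_var <- cumsum(prop_var)
print(round(rbind(prop_var, cum_var), 4))

# Benchmark: k components should explain more than k out of p variables' worth.
k <- 2
threshold <- k / length(sdev)
cat("cumulative:", round(cum_var[k], 3),
    " threshold:", round(threshold, 3),
    " satisfied:", cum_var[k] > threshold, "\n")
```

With the fitted object available, pca_trust$sdev can replace the hand-typed vector.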

Eigenvalues:

pca_trust$sdev ^ 2
## [1] 4.42218658 0.93376323 0.81359378 0.38391176 0.20985775 0.15604132 0.08064558
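Because the variables were standardized, these squared standard deviations are the eigenvalues of the correlation matrix, and they sum to the number of variables. A short sketch that checks this on the built-in USArrests data (a stand-in, not the trust data):

```r
# Stand-in data: built-in USArrests, NOT the trust data set.
x <- USArrests

pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Eigenvalues obtained two ways.
eig_from_pca <- pca$sdev^2
eig_from_cor <- eigen(cor(x))$values

print(all.equal(eig_from_pca, eig_from_cor))

# With scaling, total variance equals the number of variables.
print(sum(eig_from_pca))
```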

Scree plot:

fviz_screeplot(pca_trust, addlabels = TRUE, 
               main = "Explained variances of 7 obtained principal components", 
               barfill = "lightblue",
               xlab = "Principal Components")

Loadings:

pca_trust$rotation
##                           PC1         PC2         PC3         PC4         PC5
## Neighbourhood       0.4227475 -0.30633480  0.15366145 -0.15071745  0.14269113
## Government          0.3430336  0.44070925 -0.34911938  0.68436735  0.27370439
## Scientist           0.4158478 -0.37970339  0.19329855  0.20427495 -0.19935184
## Journalist          0.3537762  0.42087150 -0.40484873 -0.46542163 -0.54124293
## Doctor.and.Nurses   0.4319574 -0.22204919  0.09664218  0.24727667 -0.38639948
## Philantropist       0.4184969 -0.08977587 -0.20641745 -0.42370592  0.64499958
## Traditional.Healers 0.2092811  0.57728790  0.77542835 -0.09953684  0.08192111
##                             PC6          PC7
## Neighbourhood       -0.69645274  0.419176275
## Government          -0.15138099  0.005475873
## Scientist           -0.07286810 -0.747497430
## Journalist          -0.12541641 -0.092285669
## Doctor.and.Nurses    0.54669251  0.495423271
## Philantropist        0.40998526 -0.102729659
## Traditional.Healers  0.06354305 -0.031537463

Variables with the largest loadings on each of the first two components:

PC1: Neighbourhood, Scientist, Doctor.and.Nurses, Philantropist

PC2: Government, Journalist, Traditional.Healers
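One way to arrive at such a grouping is to assign each variable to the component on which its absolute loading is largest. This rule is an assumption rather than a documented step of the analysis, but on the PC1 and PC2 loadings printed above it yields the same two groups. The loadings are typed in by hand below.

```r
# PC1 and PC2 loadings copied from the pca_trust$rotation output above.
loadings <- matrix(
  c(0.4227475, -0.30633480,
    0.3430336,  0.44070925,
    0.4158478, -0.37970339,
    0.3537762,  0.42087150,
    0.4319574, -0.22204919,
    0.4184969, -0.08977587,
    0.2092811,  0.57728790),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Neighbourhood", "Government", "Scientist", "Journalist",
      "Doctor.and.Nurses", "Philantropist", "Traditional.Healers"),
    c("PC1", "PC2")
  )
)

# Assumed rule: pick the component with the largest absolute loading.
dominant <- colnames(loadings)[apply(abs(loadings), 1, which.max)]
names(dominant) <- rownames(loadings)

groups <- split(names(dominant), dominant)
print(groups)
```

Note that every variable also loads positively on PC1, so PC1 can be read as overall trust, while PC2 contrasts Government, Journalist and Traditional.Healers with the remaining variables.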

Biplot

fviz_pca_biplot(pca_trust,
                repel = TRUE,     # Avoid text overlapping
                label = "var",
                col.ind = "grey",
                alpha.ind = 0.5,
                col.var = "brown") +
  xlim(-6, 6) +
  xlab("PC1 (63.2% explained variance)") +
  ylab("PC2 (13.3% explained variance)") +
  annotate("text", x = 3.2, y = 2.5, label = "Laos", col = "lightblue") +
  annotate("text", x = -2.5, y = -2.3, label = "Greece", col = "lightblue") +
  annotate("text", x = -1, y = -2.2, label = "Chile", col = "lightblue") +
  annotate("text", x = -4.5, y = 2.1, label = "Benin", col = "lightblue") +
  annotate("text", x = -2.4, y = 2, label = "Congo \nBrazzaville", col = "lightblue") +
  annotate("text", x = -1.7, y = -1.6, label = "Russia", col = "lightblue") +
  ggtitle("Biplot for PC1 and PC2") +
  theme(plot.title = element_text(hjust = 0.5))

Some findings:

Countries with high trust in all groups: Laos, Sri Lanka, Cambodia, Malta, Switzerland.

Countries with low trust in all groups: Greece, Chile, Russia, Ukraine.

Countries with relatively high trust in government, journalists or traditional healers but low trust in scientists, doctors, neighbours and philanthropists: Benin, Congo Brazzaville, Cameroon.
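Groupings like these can also be read from the component scores rather than from the plot alone. The sketch below uses the built-in USArrests data as a stand-in; the column names unit and quadrant are hypothetical. The sign of a principal component is arbitrary, so the direction of each axis should be checked against the loadings before interpreting high or low scores.

```r
# Stand-in data: built-in USArrests, NOT the trust data set.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Scores on the first two components, one row per observation.
scores <- data.frame(unit = rownames(USArrests), pca$x[, 1:2])

# Label the quadrant of the PC1/PC2 plane each observation falls in.
scores$quadrant <- paste0(
  ifelse(scores$PC1 >= 0, "PC1+", "PC1-"), "/",
  ifelse(scores$PC2 >= 0, "PC2+", "PC2-")
)

# Order by PC1 and show both extremes.
scores <- scores[order(scores$PC1), ]
print(head(scores, 3))
print(tail(scores, 3))
```

For the trust data, the scores would come from pca_trust$x and the labels from trust$Country after the na.omit() step, so that rows stay aligned.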