Trust is important for how society works. When people trust their government and each other, things run more smoothly. In this project I look at different types of trust across European countries using data from the European Social Survey (ESS Round 11).
I have 10 variables about trust and want to see if they can be reduced to fewer dimensions. This is what PCA (Principal Component Analysis) does: it finds patterns of correlation among many variables and combines them into fewer components.
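The following is a minimal sketch of what PCA computes, written in base R rather than FactoMineR. After the variables are standardized, the components are the eigenvectors of the correlation matrix, and the eigenvalues are the variances those components carry. The data are eight countries (Austria to Finland) and three variables copied from the ESS table below. `prcomp()` is used only as a cross-check.

```r
# Eight countries (Austria to Finland) and three trust variables,
# copied from the ESS data frame in this report
X <- cbind(
  trstplt = c(4.2, 4.5, 2.1, 2.8, 3.5, 3.8, 4.1, 5.2),
  trstep  = c(4.5, 5.0, 4.2, 4.0, 4.5, 4.2, 4.8, 5.0),
  ppltrst = c(5.2, 5.0, 3.5, 4.0, 4.2, 4.5, 5.8, 6.8)
)

# PCA "by hand": eigen-decomposition of the correlation matrix
ev <- eigen(cor(X), symmetric = TRUE)

# Eigenvalues = variance carried by each component;
# they add up to the number of standardized variables (3)
print(round(ev$values, 3))
print(sum(ev$values))

# Cross-check: prcomp() on standardized data gives the same variances
pc <- prcomp(X, center = TRUE, scale. = TRUE)
print(all.equal(ev$values, pc$sdev^2))
```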
Questions I want to answer:
Data source: European Social Survey Round 11 (https://ess.sikt.no/en/)
# load packages
if (!require("FactoMineR")) install.packages("FactoMineR")
if (!require("factoextra")) install.packages("factoextra")
if (!require("corrplot")) install.packages("corrplot")
if (!require("psych")) install.packages("psych")
library(FactoMineR)
library(factoextra)
library(corrplot)
library(psych)
ESS asks respondents to rate trust on a 0-10 scale (0 = no trust, 10 = complete trust). I use country averages for 26 European countries.
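The report works with country averages rather than individual answers. Below is a minimal sketch of how such averages could be built from respondent-level data. It assumes two ESS conventions: the codes 77, 88 and 99 mark refusal, don't know and no answer, and `anweight` is the analysis weight. Check both against the Round 11 codebook. The six respondents are made up for illustration.

```r
# Hypothetical respondent-level data (NOT real ESS records).
# Assumption: 77/88/99 are non-substantive codes, anweight is the weight.
resp <- data.frame(
  cntry    = c("AT", "AT", "AT", "BE", "BE", "BE"),
  trstplc  = c(7, 88, 5, 6, 8, 99),
  anweight = c(1.0, 1.2, 0.8, 0.5, 1.5, 1.0)
)

# Recode non-substantive answers to missing
resp$trstplc[resp$trstplc %in% c(77, 88, 99)] <- NA

# Weighted mean per country; na.rm drops missing answers and their weights
country_means <- sapply(split(resp, resp$cntry), function(d) {
  weighted.mean(d$trstplc, d$anweight, na.rm = TRUE)
})
print(round(country_means, 2))
```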
# ESS Round 11 - country level means for trust variables
ess <- data.frame(
Country = c("Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus",
"Czechia", "Estonia", "Finland", "France", "Germany",
"Greece", "Hungary", "Iceland", "Ireland", "Italy",
"Lithuania", "Netherlands", "Norway", "Poland", "Portugal",
"Slovakia", "Slovenia", "Spain", "Sweden", "Switzerland",
"UK"),
trstplt = c(4.2, 4.5, 2.1, 2.8, 3.5, 3.8, 4.1, 5.2, 3.9, 4.3,
2.4, 3.6, 5.1, 4.0, 3.2, 3.4, 5.3, 5.5, 2.9, 3.1,
2.6, 3.3, 3.0, 5.1, 5.8, 3.7),
trstplc = c(6.5, 6.2, 4.8, 5.1, 5.8, 5.4, 6.8, 8.0, 6.1, 7.0,
5.5, 5.2, 7.2, 6.5, 5.9, 5.3, 6.4, 7.5, 5.6, 5.4,
4.6, 5.5, 5.8, 6.9, 7.3, 6.3),
trstprl = c(4.5, 4.8, 2.3, 2.9, 3.8, 4.0, 4.5, 5.8, 4.2, 4.8,
2.8, 3.9, 5.4, 4.5, 3.5, 3.6, 5.5, 6.0, 3.2, 3.4,
2.9, 3.5, 3.3, 5.6, 6.2, 4.0),
trstprt = c(3.8, 4.1, 1.9, 2.4, 3.2, 3.4, 3.7, 4.6, 3.5, 3.9,
2.2, 3.2, 4.5, 3.6, 2.8, 3.0, 4.8, 5.0, 2.5, 2.7,
2.3, 2.9, 2.6, 4.5, 5.2, 3.3),
trstlgl = c(5.8, 5.2, 3.1, 3.5, 4.8, 4.5, 5.5, 6.8, 5.0, 5.8,
4.0, 4.2, 6.0, 5.2, 4.5, 4.1, 5.8, 6.5, 4.0, 4.2,
3.4, 4.0, 4.0, 6.2, 6.5, 5.0),
trstep = c(4.5, 5.0, 4.2, 4.0, 4.5, 4.2, 4.8, 5.0, 4.3, 4.8,
4.0, 4.5, 4.2, 5.2, 4.8, 4.5, 4.8, 4.5, 4.0, 4.5,
4.0, 4.2, 4.5, 4.2, 4.5, 3.8),
trstun = c(5.2, 5.5, 4.8, 4.5, 5.0, 4.8, 5.2, 5.8, 5.0, 5.2,
4.5, 4.8, 5.5, 5.5, 5.2, 4.8, 5.5, 5.8, 4.5, 5.0,
4.5, 4.8, 5.0, 5.5, 5.8, 4.8),
ppltrst = c(5.2, 5.0, 3.5, 4.0, 4.2, 4.5, 5.8, 6.8, 4.8, 5.0,
3.8, 4.2, 6.5, 5.5, 4.5, 4.8, 5.8, 6.8, 4.2, 4.0,
3.8, 4.2, 4.8, 6.2, 6.0, 5.2),
pphlp = c(5.0, 4.8, 3.8, 4.2, 4.5, 4.2, 5.5, 6.2, 4.5, 5.2,
4.0, 4.0, 6.0, 5.2, 4.2, 4.5, 5.5, 6.5, 4.0, 4.2,
3.8, 4.0, 4.5, 5.8, 5.8, 5.0),
pplfair = c(5.5, 5.2, 3.5, 4.0, 4.5, 4.8, 5.8, 6.5, 5.0, 5.5,
3.8, 4.2, 6.2, 5.5, 4.5, 4.8, 6.0, 6.8, 4.2, 4.2,
3.8, 4.2, 4.8, 6.2, 6.2, 5.2)
)
rownames(ess) <- ess$Country
| Code | Question |
|---|---|
| trstplt | Trust in politicians |
| trstplc | Trust in the police |
| trstprl | Trust in parliament |
| trstprt | Trust in political parties |
| trstlgl | Trust in legal system |
| trstep | Trust in European Parliament |
| trstun | Trust in United Nations |
| ppltrst | Most people can be trusted |
| pphlp | Most people try to be helpful |
| pplfair | Most people try to be fair |
# quick look
cat("We have", nrow(ess), "countries and", ncol(ess)-1, "trust variables\n\n")
## We have 26 countries and 10 trust variables
summary(ess[, 2:11])
## trstplt trstplc trstprl trstprt
## Min. :2.100 Min. :4.600 Min. :2.300 Min. :1.900
## 1st Qu.:3.125 1st Qu.:5.425 1st Qu.:3.425 1st Qu.:2.725
## Median :3.750 Median :6.000 Median :4.000 Median :3.350
## Mean :3.862 Mean :6.100 Mean :4.188 Mean :3.446
## 3rd Qu.:4.450 3rd Qu.:6.725 3rd Qu.:4.800 3rd Qu.:4.050
## Max. :5.800 Max. :8.000 Max. :6.200 Max. :5.200
## trstlgl trstep trstun ppltrst
## Min. :3.100 Min. :3.800 Min. :4.500 Min. :3.500
## 1st Qu.:4.025 1st Qu.:4.200 1st Qu.:4.800 1st Qu.:4.200
## Median :4.900 Median :4.500 Median :5.000 Median :4.800
## Mean :4.908 Mean :4.442 Mean :5.096 Mean :4.965
## 3rd Qu.:5.800 3rd Qu.:4.725 3rd Qu.:5.500 3rd Qu.:5.725
## Max. :6.800 Max. :5.200 Max. :5.800 Max. :6.800
## pphlp pplfair
## Min. :3.800 Min. :3.500
## 1st Qu.:4.200 1st Qu.:4.200
## Median :4.500 Median :4.900
## Mean :4.804 Mean :5.035
## 3rd Qu.:5.425 3rd Qu.:5.725
## Max. :6.500 Max. :6.800
Looking at the summary:
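This already shows that people trust the police more than politicians. trstplc has the highest mean (6.10), while trstprt (political parties, 3.45) and trstplt (politicians, 3.86) have the lowest.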
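The police-versus-politicians contrast can also be checked country by country, not only on the averages. This short sketch uses the two columns copied from the data frame above. It computes each country's gap and counts how many countries trust the police more.

```r
# Country means copied from the `ess` data frame above (same country order)
trstplc <- c(6.5, 6.2, 4.8, 5.1, 5.8, 5.4, 6.8, 8.0, 6.1, 7.0,
             5.5, 5.2, 7.2, 6.5, 5.9, 5.3, 6.4, 7.5, 5.6, 5.4,
             4.6, 5.5, 5.8, 6.9, 7.3, 6.3)
trstplt <- c(4.2, 4.5, 2.1, 2.8, 3.5, 3.8, 4.1, 5.2, 3.9, 4.3,
             2.4, 3.6, 5.1, 4.0, 3.2, 3.4, 5.3, 5.5, 2.9, 3.1,
             2.6, 3.3, 3.0, 5.1, 5.8, 3.7)

# Per-country gap: trust in police minus trust in politicians
gap <- trstplc - trstplt

cat("Mean gap:", round(mean(gap), 2), "points\n")
cat("Countries trusting police more:", sum(gap > 0), "of", length(gap), "\n")
cat("Smallest gap:", round(min(gap), 2), "\n")
```

In this data the gap is positive in all 26 countries, so the pattern is not an artifact of averaging.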
# boxplots
par(mar = c(8, 4, 3, 1))
boxplot(ess[, 2:11], las = 2, col = "lightblue",
main = "Distribution of Trust Variables Across Countries",
ylab = "Trust Score (0-10)")
abline(h = 5, col = "red", lty = 2)
The red line is at 5 (middle of the scale). We can see:
Correlations matter for PCA because it can only combine variables that move together. Strongly correlated variables are what make a reduction to fewer components possible.
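The following sketch shows why correlation matters. It uses an idealized equicorrelation matrix rather than the real data, as an assumption for illustration only. If all p standardized variables correlate at the same value r, the first eigenvalue is 1 + (p - 1)r and the other p - 1 eigenvalues equal 1 - r. The hypothetical helper `equicor()` builds such a matrix.

```r
# Hypothetical helper: p x p correlation matrix with every pair at r
equicor <- function(p, r) {
  R <- matrix(r, nrow = p, ncol = p)
  diag(R) <- 1
  R
}

p <- 10
for (r in c(0, 0.5, 0.9)) {
  lambda <- eigen(equicor(p, r), symmetric = TRUE)$values
  cat(sprintf("r = %.1f: first eigenvalue %.2f (%.0f%% of variance), 1 + (p - 1) * r = %.2f\n",
              r, lambda[1], 100 * lambda[1] / p, 1 + (p - 1) * r))
}
```

With p = 10 and r = 0.9, one component would carry 91% of the variance. That is the same order of magnitude as the first eigenvalue of the real data reported further down.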
cor_mat <- cor(ess[, 2:11])
corrplot(cor_mat, method = "color", type = "upper",
addCoef.col = "black", number.cex = 0.7,
tl.col = "black", tl.srt = 45)
What I see here:
Very high correlations (r > 0.90):
This tells us that, at the country level, the political trust variables largely measure the same underlying attitude. The same is true of the social trust variables. PCA should therefore be able to reduce them to fewer dimensions.
Moderate correlations (r ~ 0.70-0.85):
The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy tells us whether the data are suitable for PCA. We want KMO > 0.7.
kmo <- KMO(ess[, 2:11])
cat("Overall KMO:", round(kmo$MSA, 3), "\n")
## Overall KMO: 0.876
At 0.876, KMO is well above 0.7, so the data are suitable for PCA.
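The following base R sketch shows what the overall KMO measures. It uses a hypothetical helper, `kmo_overall()`, which is not part of psych. The helper compares the squared correlations between variables with their squared partial correlations. When variables share a common factor, the partial correlations shrink once the other variables are controlled for, so KMO approaches 1.

```r
# Hypothetical helper (not the psych API): overall KMO from a correlation matrix
kmo_overall <- function(R) {
  Rinv <- solve(R)
  # Partial correlation of i and j, controlling for all other variables
  P <- -Rinv / sqrt(outer(diag(Rinv), diag(Rinv)))
  off <- upper.tri(R)          # each off-diagonal pair once
  r2 <- sum(R[off]^2)          # squared correlations
  p2 <- sum(P[off]^2)          # squared partial correlations
  r2 / (r2 + p2)
}

# Three variables, every pair correlated at 0.5:
# each partial correlation is 1/3, so KMO = 0.25 / (0.25 + 1/9) = 9/13
R3 <- matrix(0.5, 3, 3)
diag(R3) <- 1
print(kmo_overall(R3))
```

If this helper follows the same textbook formula as psych, applying it to `cor(ess[, 2:11])` should reproduce the overall MSA of 0.876 reported above.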
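Bartlett's test of sphericity checks whether the correlation matrix differs from the identity matrix, i.e. whether there are real correlations to summarize (if not, PCA is pointless).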
bart <- cortest.bartlett(cor_mat, n = nrow(ess))
cat("Chi-square:", round(bart$chisq, 2), "\n")
## Chi-square: 625.92
cat("P-value:", bart$p.value, "\n")
## P-value: 2.450809e-103
The p-value is far below 0.05, so we reject the hypothesis that the variables are uncorrelated. PCA makes sense.
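Below is a sketch of the statistic, using the standard textbook formula χ² = -(n - 1 - (2p + 5)/6) · ln|R| with p(p - 1)/2 degrees of freedom. As far as I know, this is what `psych::cortest.bartlett()` implements, but the helper name `bartlett_sphericity()` is hypothetical. When correlations are as high as in this data, det(R) is close to zero, so -ln|R| and the chi-square become very large. That is how 26 countries can produce a statistic of 625.92.

```r
# Hypothetical helper: Bartlett's test that R is an identity matrix
bartlett_sphericity <- function(R, n) {
  p     <- ncol(R)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df    <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# No correlation at all: statistic 0, p-value 1
str(bartlett_sphericity(diag(3), n = 26))

# Two variables correlated at 0.9, observed in 26 countries
str(bartlett_sphericity(matrix(c(1, 0.9, 0.9, 1), 2, 2), n = 26))
```

For the ten trust variables, df = 10 · 9 / 2 = 45.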
# run PCA on the numeric columns only
pca <- PCA(ess[, 2:11], scale.unit = TRUE, graph = FALSE)
eig <- get_eigenvalue(pca)
print(round(eig, 2))
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 8.69 86.90 86.90
## Dim.2 0.83 8.28 95.18
## Dim.3 0.25 2.46 97.64
## Dim.4 0.11 1.08 98.72
## Dim.5 0.08 0.80 99.52
## Dim.6 0.02 0.22 99.74
## Dim.7 0.02 0.15 99.89
## Dim.8 0.01 0.07 99.96
## Dim.9 0.00 0.03 99.99
## Dim.10 0.00 0.01 100.00
Looking at this table:
Rule of thumb (Kaiser criterion): keep components with eigenvalue > 1. Here that is only PC1 (8.69); PC2 (0.83) falls below the threshold.
fviz_eig(pca, addlabels = TRUE) +
geom_hline(yintercept = 10, linetype = "dashed", color = "red") +
ggtitle("Scree Plot - How Much Each Component Explains")
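Clear elbow after PC1. PC2 explains 8.3% of the variance, just below the 10% line, so it is a borderline component by both rules.
Decision: Keep 2 components (explaining 95% of variance). Strictly, only PC1 passes the eigenvalue > 1 rule. PC2 is kept because it captures the distinct pattern of trstep (see the loadings below) and gives a two-dimensional map of the countries.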
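With `scale.unit = TRUE`, all ten variables are standardized, so their total variance is 10. Each eigenvalue divided by 10 therefore gives its share of the variance. The following base R check applies both retention rules to the rounded eigenvalues from the table above.

```r
# Eigenvalues as printed in the table above (rounded to 2 decimals)
eig <- c(8.69, 0.83, 0.25, 0.11, 0.08, 0.02, 0.02, 0.01, 0.00, 0.00)
p <- 10  # number of standardized variables = total variance

pct <- 100 * eig / p   # % of variance per component
cum <- cumsum(pct)     # cumulative %

cat("Kaiser rule (eigenvalue > 1) keeps:", which(eig > 1), "\n")
cat("10% rule keeps:", which(pct > 10), "\n")
cat("Variance kept with 2 components:", round(cum[2], 1), "%\n")
```

Both rules select only PC1. Retaining PC2 is a judgment call made for interpretation.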
Loadings tell us how each variable relates to each component.
# get loadings
loads <- pca$var$coord[, 1:2]
colnames(loads) <- c("PC1", "PC2")
print(round(loads, 3))
## PC1 PC2
## trstplt 0.970 -0.087
## trstplc 0.951 -0.048
## trstprl 0.981 -0.069
## trstprt 0.969 -0.072
## trstlgl 0.978 -0.060
## trstep 0.528 0.844
## trstun 0.939 0.227
## ppltrst 0.962 -0.124
## pphlp 0.965 -0.129
## pplfair 0.984 -0.090
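In a standardized PCA, the values in `pca$var$coord` are the correlations between each variable and each component. That is why they lie between -1 and 1. It is also why their squares, summed down a column, give back the eigenvalue: the squared PC1 loadings above sum to about 8.69. The base R sketch below uses eight countries and three of the variables copied from the data above, with `prcomp()` standing in for FactoMineR.

```r
X <- cbind(
  trstplt = c(4.2, 4.5, 2.1, 2.8, 3.5, 3.8, 4.1, 5.2),
  trstep  = c(4.5, 5.0, 4.2, 4.0, 4.5, 4.2, 4.8, 5.0),
  ppltrst = c(5.2, 5.0, 3.5, 4.0, 4.2, 4.5, 5.8, 6.8)
)
pc <- prcomp(X, center = TRUE, scale. = TRUE)

# Loadings: eigenvector entries scaled by each component's standard deviation
loadings <- pc$rotation %*% diag(pc$sdev)
colnames(loadings) <- colnames(pc$rotation)

# Correlation of each original variable with each component score
cors <- cor(X, pc$x)

print(round(loadings, 3))
print(all.equal(unname(loadings), unname(cors)))

# Column sums of squared loadings give back the eigenvalues
print(round(colSums(loadings^2), 3))
print(round(pc$sdev^2, 3))
```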
PC1 (86.9% of variance) - “General Trust”:
All variables load positively. Nine of the ten have loadings between 0.94 and 0.98; the exception is trstep (European Parliament) at 0.53. This means:
PC2 (8.3% of variance) - “Type of Trust”:
fviz_pca_var(pca, col.var = "contrib",
gradient.cols = c("blue", "orange", "red"),
repel = TRUE) +
ggtitle("Variable Plot - Which Variables Contribute Most")
Reading this plot:
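# PC1 and PC2 contributions side by side.
# fviz_contrib() returns ggplot objects, so par(mfrow) has no effect on them;
# ggpubr (installed as a dependency of factoextra) arranges them instead.
p1 <- fviz_contrib(pca, choice = "var", axes = 1) +
ggtitle("What Contributes to PC1 (General Trust)")
p2 <- fviz_contrib(pca, choice = "var", axes = 2) +
ggtitle("What Contributes to PC2")
ggpubr::ggarrange(p1, p2, ncol = 2)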
For PC1: nine of the ten variables contribute roughly equally (about 10-11% each). Only trstep contributes much less (about 3%). PC1 is a general trust factor.
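For PC2: trstep (European Parliament) dominates, accounting for roughly 86% of the component. trstun (UN) is a distant second at about 6%. PC2 therefore contrasts trust in the European Parliament (and, to a lesser extent, the UN) with trust in national institutions and other people, which is why I label it international vs national trust.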
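`fviz_contrib()` reports each variable's contribution as its squared loading divided by the component's eigenvalue, in percent. Its dashed reference line marks the average contribution, which is 100/10 = 10% here. The sketch below recomputes the contributions from the rounded loadings printed above. It uses the column sum of squared loadings in place of the rounded eigenvalue, so each column adds up to exactly 100.

```r
# PC1 and PC2 loadings as printed above (rounded to 3 decimals)
loads <- cbind(
  PC1 = c(0.970, 0.951, 0.981, 0.969, 0.978, 0.528, 0.939, 0.962, 0.965, 0.984),
  PC2 = c(-0.087, -0.048, -0.069, -0.072, -0.060, 0.844, 0.227, -0.124, -0.129, -0.090)
)
rownames(loads) <- c("trstplt", "trstplc", "trstprl", "trstprt", "trstlgl",
                     "trstep", "trstun", "ppltrst", "pphlp", "pplfair")

# Contribution (%) = squared loading / eigenvalue * 100,
# with the eigenvalue taken as the column sum of squared loadings
contrib <- sweep(loads^2, 2, colSums(loads^2), "/") * 100
print(round(contrib, 1))

# Average contribution if all 10 variables contributed equally
cat("Reference line:", 100 / nrow(loads), "%\n")
```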
Now let's see how countries score on these dimensions.
# get country scores
scores <- pca$ind$coord[, 1:2]
scores_df <- data.frame(
Country = rownames(scores),
PC1 = scores[, 1],
PC2 = scores[, 2]
)
# sort by PC1 (general trust)
scores_df <- scores_df[order(-scores_df$PC1), ]
scores_df$Rank <- 1:nrow(scores_df)
cat("Countries Ranked by General Trust (PC1):\n\n")
## Countries Ranked by General Trust (PC1):
print(scores_df[, c("Rank", "Country", "PC1", "PC2")], row.names = FALSE)
## Rank Country PC1 PC2
## 1 Norway 5.2902167 -0.79564385
## 2 Finland 5.2859490 0.63264160
## 3 Switzerland 4.6612609 -0.53703467
## 4 Iceland 3.5930543 -1.39201537
## 5 Sweden 3.4209166 -1.32123767
## 6 Netherlands 3.2147554 0.40710330
## 7 Estonia 1.8470759 0.52454475
## 8 Germany 1.7406472 0.63702242
## 9 Ireland 1.6659917 1.94509838
## 10 Belgium 1.4438795 1.50154464
## 11 Austria 1.2224236 -0.07335398
## 12 France -0.2781832 -0.36534232
## 13 UK -0.3946429 -1.93200065
## 14 Cyprus -1.1132003 0.42642103
## 15 Italy -1.2249696 1.44613650
## 16 Czechia -1.3449457 -0.52263290
## 17 Lithuania -1.5349077 0.30245526
## 18 Spain -1.5959852 0.49555643
## 19 Hungary -1.9428070 0.48584965
## 20 Portugal -2.2026445 0.71673366
## 21 Slovenia -2.3818852 -0.22382591
## 22 Poland -3.0527432 -0.84337930
## 23 Croatia -3.6258353 -0.72557359
## 24 Greece -3.7765699 -0.63251965
## 25 Slovakia -4.2580789 -0.53740369
## 26 Bulgaria -4.6587721 0.38085593
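FactoMineR standardizes with the population standard deviation (dividing by n rather than n - 1). As a result, the average squared score on each component equals that component's eigenvalue. This is easy to check against the table above: the squared PC1 scores of the 26 countries average to about 8.69, the first eigenvalue. Below is a base R sketch on eight countries and three variables copied from the data. `prcomp()` divides by n - 1, so its scores are smaller by a factor of sqrt((n - 1)/n).

```r
X <- cbind(
  trstplt = c(4.2, 4.5, 2.1, 2.8, 3.5, 3.8, 4.1, 5.2),
  trstep  = c(4.5, 5.0, 4.2, 4.0, 4.5, 4.2, 4.8, 5.0),
  ppltrst = c(5.2, 5.0, 3.5, 4.0, 4.2, 4.5, 5.8, 6.8)
)
n <- nrow(X)

ctr    <- sweep(X, 2, colMeans(X))     # centre each column
pop_sd <- sqrt(colSums(ctr^2) / n)     # SD with divisor n
Z      <- sweep(ctr, 2, pop_sd, "/")   # standardized data

ev     <- eigen(cor(X), symmetric = TRUE)
scores <- Z %*% ev$vectors             # country coordinates on each component

# Mean squared score per component equals the eigenvalue
print(round(colMeans(scores^2), 4))
print(round(ev$values, 4))

# Same coordinates as prcomp() up to sign and the sqrt(n / (n - 1)) factor
pc <- prcomp(X, center = TRUE, scale. = TRUE)
print(all.equal(unname(abs(scores)), unname(abs(pc$x)) * sqrt(n / (n - 1))))
```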
High trust countries (top of PC1):
Low trust countries (bottom of PC1):
PC2 interpretation:
fviz_pca_ind(pca, col.ind = "cos2",
gradient.cols = c("blue", "yellow", "red"),
repel = TRUE) +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.5) +
ggtitle("Country Positions on Trust Dimensions") +
xlab("PC1: General Trust Level (Low ← → High)") +
ylab("PC2: International vs National Trust")
The color shows how well each country is represented on the two axes (cos2: blue = low, red = high).
What we see:
fviz_pca_biplot(pca, repel = TRUE,
col.var = "red", col.ind = "steelblue") +
ggtitle("Biplot: Countries and Trust Variables") +
xlab("PC1: General Trust (86.9%)") +
ylab("PC2: International vs National (8.3%)")
This shows everything together:
Let's group countries with similar trust profiles, using k-means clustering on their PC1 and PC2 scores.
# simple clustering based on PCA scores
set.seed(123)
km <- kmeans(scores, centers = 4, nstart = 25)
# add cluster to scores
scores_df$Cluster <- km$cluster[match(scores_df$Country, rownames(scores))]
# plot
plot(scores[,1], scores[,2],
col = km$cluster, pch = 19, cex = 1.5,
xlab = "PC1: General Trust", ylab = "PC2: Intl vs National",
main = "Country Clusters Based on Trust")
text(scores[,1], scores[,2], labels = rownames(scores),
pos = 3, cex = 0.7)
abline(h = 0, v = 0, lty = 2, col = "gray")
legend("bottomright", legend = paste("Cluster", 1:4),
col = 1:4, pch = 19)
cat("Trust Clusters:\n")
## Trust Clusters:
cat("==============\n\n")
## ==============
for (i in 1:4) {
countries <- scores_df$Country[scores_df$Cluster == i]
avg_pc1 <- mean(scores_df$PC1[scores_df$Cluster == i])
cat("Cluster", i, "(Avg Trust:", round(avg_pc1, 2), "):\n")
cat(" ", paste(countries, collapse = ", "), "\n\n")
}
## Cluster 1 (Avg Trust: -1.4 ):
## France, UK, Cyprus, Italy, Czechia, Lithuania, Spain, Hungary, Portugal, Slovenia
##
## Cluster 2 (Avg Trust: 1.58 ):
## Estonia, Germany, Ireland, Belgium, Austria
##
## Cluster 3 (Avg Trust: 4.24 ):
## Norway, Finland, Switzerland, Iceland, Sweden, Netherlands
##
## Cluster 4 (Avg Trust: -3.87 ):
## Poland, Croatia, Greece, Slovakia, Bulgaria
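The code above fixes k = 4 without showing why. One common heuristic, swapped in here and not part of the original analysis, is to compare the total within-cluster sum of squares for several values of k and look for an elbow. The sketch below uses the PC1/PC2 scores from the ranking table, rounded to two decimals. Because the scores are not rescaled, PC1 holds roughly nine tenths of their variance and therefore dominates the Euclidean distances that k-means uses.

```r
# PC1/PC2 scores from the ranking table above, rounded to 2 decimals
# (rows in rank order: Norway, Finland, Switzerland, ..., Slovakia, Bulgaria)
scores <- matrix(c(
   5.29, -0.80,   5.29,  0.63,   4.66, -0.54,   3.59, -1.39,
   3.42, -1.32,   3.21,  0.41,   1.85,  0.52,   1.74,  0.64,
   1.67,  1.95,   1.44,  1.50,   1.22, -0.07,  -0.28, -0.37,
  -0.39, -1.93,  -1.11,  0.43,  -1.22,  1.45,  -1.34, -0.52,
  -1.53,  0.30,  -1.60,  0.50,  -1.94,  0.49,  -2.20,  0.72,
  -2.38, -0.22,  -3.05, -0.84,  -3.63, -0.73,  -3.78, -0.63,
  -4.26, -0.54,  -4.66,  0.38
), ncol = 2, byrow = TRUE, dimnames = list(NULL, c("PC1", "PC2")))

set.seed(123)
# Total within-cluster sum of squares for k = 1..6
wss <- sapply(1:6, function(k) kmeans(scores, centers = k, nstart = 25)$tot.withinss)
print(round(wss, 1))

# Share (%) of the scores' total variance lying on PC1
print(round(100 * var(scores[, "PC1"]) / sum(apply(scores, 2, var)), 1))
```

Plotting `wss` against k gives the usual elbow chart. Deciding where the bend lies remains a judgment call.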
How well does the 2D representation capture the original data?
fviz_cos2(pca, choice = "ind", axes = 1:2) +
ggtitle("Quality of Representation (cos2)") +
ylab("cos2 (higher = better)")
Most countries have cos2 > 0.7, which is good. This means the 2D picture captures their trust profile well.
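cos2 measures how close a country lies to the plane of the first two components. For each axis, it is the country's squared coordinate divided by its squared distance from the centre of the cloud. The two values are summed to give the 2D quality. Below is a base R sketch on eight countries and three variables copied from the data above. The n versus n - 1 scaling cancels in this ratio, so `prcomp()` and FactoMineR give the same cos2. As I understand the factoextra docs, `fviz_cos2(..., axes = 1:2)` plots this two-axis sum.

```r
X <- cbind(
  trstplt = c(4.2, 4.5, 2.1, 2.8, 3.5, 3.8, 4.1, 5.2),
  trstep  = c(4.5, 5.0, 4.2, 4.0, 4.5, 4.2, 4.8, 5.0),
  ppltrst = c(5.2, 5.0, 3.5, 4.0, 4.2, 4.5, 5.8, 6.8)
)
rownames(X) <- c("Austria", "Belgium", "Bulgaria", "Croatia",
                 "Cyprus", "Czechia", "Estonia", "Finland")

pc <- prcomp(X, center = TRUE, scale. = TRUE)
S  <- pc$x   # coordinates on all components, so nothing is lost

# cos2: share of each country's squared distance from the centre on each axis
cos2 <- S^2 / rowSums(S^2)

# Quality of representation on the first two axes
quality_2d <- rowSums(cos2[, 1:2])
print(round(quality_2d, 3))
```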
1. Two main dimensions of trust exist:
2. Variables group together:
3. Country patterns:
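| Trust level | k-means cluster | Mean PC1 | Countries |
|---|---|---|---|
| High | 3 | 4.24 | Norway, Finland, Switzerland, Iceland, Sweden, Netherlands |
| Upper-middle | 2 | 1.58 | Estonia, Germany, Ireland, Belgium, Austria |
| Lower-middle | 1 | -1.40 | France, UK, Cyprus, Italy, Czechia, Lithuania, Spain, Hungary, Portugal, Slovenia |
| Low | 4 | -3.87 | Poland, Croatia, Greece, Slovakia, Bulgaria |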
4. Nordic countries stand out:
Finland, Norway, Sweden and Iceland all rank in the top five on PC1, together with Switzerland. This is consistent with what is known about these countries: low corruption, good governance and strong welfare states.
sessionInfo()
## R version 4.5.1 (2025-06-13)
## Platform: aarch64-apple-darwin20
## Running under: macOS Tahoe 26.2
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Europe/Warsaw
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] psych_2.5.6 corrplot_0.95 factoextra_1.0.7 ggplot2_4.0.0
## [5] FactoMineR_2.13
##
## loaded via a namespace (and not attached):
## [1] tidyr_1.3.1 sass_0.4.10 generics_0.1.4
## [4] rstatix_0.7.3 lattice_0.22-7 digest_0.6.37
## [7] magrittr_2.0.4 evaluate_1.0.5 grid_4.5.1
## [10] estimability_1.5.1 RColorBrewer_1.1-3 mvtnorm_1.3-3
## [13] fastmap_1.2.0 jsonlite_2.0.0 ggrepel_0.9.6
## [16] backports_1.5.0 Formula_1.2-5 purrr_1.1.0
## [19] scales_1.4.0 jquerylib_0.1.4 abind_1.4-8
## [22] mnormt_2.1.2 cli_3.6.5 rlang_1.1.6
## [25] scatterplot3d_0.3-44 leaps_3.2 withr_3.0.2
## [28] cachem_1.1.0 yaml_2.3.10 tools_4.5.1
## [31] multcompView_0.1-10 parallel_4.5.1 ggsignif_0.6.4
## [34] dplyr_1.1.4 ggpubr_0.6.2 DT_0.34.0
## [37] flashClust_1.01-2 broom_1.0.10 vctrs_0.6.5
## [40] R6_2.6.1 lifecycle_1.0.4 emmeans_2.0.1
## [43] car_3.1-3 htmlwidgets_1.6.4 MASS_7.3-65
## [46] cluster_2.1.8.1 pkgconfig_2.0.3 pillar_1.11.1
## [49] bslib_0.9.0 gtable_0.3.6 glue_1.8.0
## [52] Rcpp_1.1.0 xfun_0.53 tibble_3.3.0
## [55] tidyselect_1.2.1 rstudioapi_0.17.1 knitr_1.50
## [58] farver_2.1.2 xtable_1.8-4 nlme_3.1-168
## [61] htmltools_0.5.8.1 labeling_0.4.3 carData_3.0-6
## [64] rmarkdown_2.30 compiler_4.5.1 S7_0.2.0