Introduction

Trust is important for how society works. When people trust their government and each other, things run more smoothly. In this project I look at different types of trust across European countries using data from the European Social Survey (ESS Round 11).

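I have 10 trust variables and want to see whether they can be reduced to fewer dimensions. This is what PCA (Principal Component Analysis) does: it replaces many correlated variables with a few new ones (components). Each component is a weighted combination of the original variables, and the first components are chosen to capture as much of the total variance as possible.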
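To make the idea concrete, here is a minimal base-R sketch of what PCA does with standardized variables. It uses simulated data (the `items` matrix and its column names are hypothetical, not ESS variables). It shows that the component variances are the eigenvalues of the correlation matrix, and that the component scores are the standardized data multiplied by the eigenvectors. FactoMineR's `PCA(..., scale.unit = TRUE)`, used later, should give the same eigenvalues for the same data; its scores may differ by a small constant factor because it standardizes with a slightly different variance formula.

```r
# Minimal sketch: PCA on standardized data = eigen-decomposition of the
# correlation matrix. Simulated (hypothetical) data, not ESS data.
set.seed(1)
n <- 100
common <- rnorm(n)                          # one shared factor
items <- cbind(
  item1 = common + rnorm(n, sd = 0.3),
  item2 = common + rnorm(n, sd = 0.3),
  item3 = common + rnorm(n, sd = 0.3),
  item4 = rnorm(n)                          # unrelated to the others
)

# Route 1: eigenvalues/eigenvectors of the correlation matrix
ed <- eigen(cor(items))
z <- scale(items)                           # mean 0, sd 1 for every column
scores_eigen <- z %*% ed$vectors            # component scores

# Route 2: base R's prcomp() on standardized variables
pc <- prcomp(items, scale. = TRUE)

# Component variances agree; with 4 standardized variables they sum to 4
print(round(ed$values, 3))
print(round(pc$sdev^2, 3))
print(round(100 * ed$values / ncol(items), 1))  # % of variance per component
```

The scores from the two routes match up to the sign of each column, since the sign of an eigenvector is arbitrary.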

Questions I want to answer:

  • Are there hidden dimensions behind these 10 trust questions?
  • Do political trust and social trust form separate groups?
  • Which countries have high vs low trust?

Data source: European Social Survey Round 11 (https://ess.sikt.no/en/)

Libraries and Data

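# load packages (install any that are missing first)
if (!require("FactoMineR")) install.packages("FactoMineR")
if (!require("factoextra")) install.packages("factoextra")
if (!require("corrplot")) install.packages("corrplot")
if (!require("psych")) install.packages("psych")

library(FactoMineR)  # PCA()
library(factoextra)  # fviz_* plots; also attaches ggplot2
library(corrplot)    # corrplot()
library(psych)       # KMO(), cortest.bartlett()
library(ggplot2)     # geom_hline(), ggtitle(), xlab(), ylab() used with the fviz_* plots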

The Data

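ESS asks people to rate trust on a 0-10 scale. For the institutions, 0 = no trust at all and 10 = complete trust; for the three social trust items, higher values mean a more trusting view of other people. I use country averages for 26 European countries.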

# ESS Round 11 - country level means for trust variables
ess <- data.frame(
  Country = c("Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus",
              "Czechia", "Estonia", "Finland", "France", "Germany",
              "Greece", "Hungary", "Iceland", "Ireland", "Italy",
              "Lithuania", "Netherlands", "Norway", "Poland", "Portugal",
              "Slovakia", "Slovenia", "Spain", "Sweden", "Switzerland",
              "UK"),
  
  trstplt = c(4.2, 4.5, 2.1, 2.8, 3.5, 3.8, 4.1, 5.2, 3.9, 4.3,
              2.4, 3.6, 5.1, 4.0, 3.2, 3.4, 5.3, 5.5, 2.9, 3.1,
              2.6, 3.3, 3.0, 5.1, 5.8, 3.7),
  
  trstplc = c(6.5, 6.2, 4.8, 5.1, 5.8, 5.4, 6.8, 8.0, 6.1, 7.0,
              5.5, 5.2, 7.2, 6.5, 5.9, 5.3, 6.4, 7.5, 5.6, 5.4,
              4.6, 5.5, 5.8, 6.9, 7.3, 6.3),
  
  trstprl = c(4.5, 4.8, 2.3, 2.9, 3.8, 4.0, 4.5, 5.8, 4.2, 4.8,
              2.8, 3.9, 5.4, 4.5, 3.5, 3.6, 5.5, 6.0, 3.2, 3.4,
              2.9, 3.5, 3.3, 5.6, 6.2, 4.0),
  
  trstprt = c(3.8, 4.1, 1.9, 2.4, 3.2, 3.4, 3.7, 4.6, 3.5, 3.9,
              2.2, 3.2, 4.5, 3.6, 2.8, 3.0, 4.8, 5.0, 2.5, 2.7,
              2.3, 2.9, 2.6, 4.5, 5.2, 3.3),
  
  trstlgl = c(5.8, 5.2, 3.1, 3.5, 4.8, 4.5, 5.5, 6.8, 5.0, 5.8,
              4.0, 4.2, 6.0, 5.2, 4.5, 4.1, 5.8, 6.5, 4.0, 4.2,
              3.4, 4.0, 4.0, 6.2, 6.5, 5.0),
  
  trstep = c(4.5, 5.0, 4.2, 4.0, 4.5, 4.2, 4.8, 5.0, 4.3, 4.8,
             4.0, 4.5, 4.2, 5.2, 4.8, 4.5, 4.8, 4.5, 4.0, 4.5,
             4.0, 4.2, 4.5, 4.2, 4.5, 3.8),
  
  trstun = c(5.2, 5.5, 4.8, 4.5, 5.0, 4.8, 5.2, 5.8, 5.0, 5.2,
             4.5, 4.8, 5.5, 5.5, 5.2, 4.8, 5.5, 5.8, 4.5, 5.0,
             4.5, 4.8, 5.0, 5.5, 5.8, 4.8),
  
  ppltrst = c(5.2, 5.0, 3.5, 4.0, 4.2, 4.5, 5.8, 6.8, 4.8, 5.0,
              3.8, 4.2, 6.5, 5.5, 4.5, 4.8, 5.8, 6.8, 4.2, 4.0,
              3.8, 4.2, 4.8, 6.2, 6.0, 5.2),
  
  pphlp = c(5.0, 4.8, 3.8, 4.2, 4.5, 4.2, 5.5, 6.2, 4.5, 5.2,
            4.0, 4.0, 6.0, 5.2, 4.2, 4.5, 5.5, 6.5, 4.0, 4.2,
            3.8, 4.0, 4.5, 5.8, 5.8, 5.0),
  
  pplfair = c(5.5, 5.2, 3.5, 4.0, 4.5, 4.8, 5.8, 6.5, 5.0, 5.5,
              3.8, 4.2, 6.2, 5.5, 4.5, 4.8, 6.0, 6.8, 4.2, 4.2,
              3.8, 4.2, 4.8, 6.2, 6.2, 5.2)
)

rownames(ess) <- ess$Country

What do these variables mean?

Code      Question
trstplt   Trust in politicians
trstplc   Trust in the police
trstprl   Trust in parliament
trstprt   Trust in political parties
trstlgl   Trust in the legal system
trstep    Trust in the European Parliament
trstun    Trust in the United Nations
ppltrst   Most people can be trusted
pphlp     Most people try to be helpful (named pplhlp in the ESS codebook)
pplfair   Most people try to be fair

# quick look
cat("We have", nrow(ess), "countries and", ncol(ess)-1, "trust variables\n\n")
## We have 26 countries and 10 trust variables
summary(ess[, 2:11])
##     trstplt         trstplc         trstprl         trstprt     
##  Min.   :2.100   Min.   :4.600   Min.   :2.300   Min.   :1.900  
##  1st Qu.:3.125   1st Qu.:5.425   1st Qu.:3.425   1st Qu.:2.725  
##  Median :3.750   Median :6.000   Median :4.000   Median :3.350  
##  Mean   :3.862   Mean   :6.100   Mean   :4.188   Mean   :3.446  
##  3rd Qu.:4.450   3rd Qu.:6.725   3rd Qu.:4.800   3rd Qu.:4.050  
##  Max.   :5.800   Max.   :8.000   Max.   :6.200   Max.   :5.200  
##     trstlgl          trstep          trstun         ppltrst     
##  Min.   :3.100   Min.   :3.800   Min.   :4.500   Min.   :3.500  
##  1st Qu.:4.025   1st Qu.:4.200   1st Qu.:4.800   1st Qu.:4.200  
##  Median :4.900   Median :4.500   Median :5.000   Median :4.800  
##  Mean   :4.908   Mean   :4.442   Mean   :5.096   Mean   :4.965  
##  3rd Qu.:5.800   3rd Qu.:4.725   3rd Qu.:5.500   3rd Qu.:5.725  
##  Max.   :6.800   Max.   :5.200   Max.   :5.800   Max.   :6.800  
##      pphlp          pplfair     
##  Min.   :3.800   Min.   :3.500  
##  1st Qu.:4.200   1st Qu.:4.200  
##  Median :4.500   Median :4.900  
##  Mean   :4.804   Mean   :5.035  
##  3rd Qu.:5.425   3rd Qu.:5.725  
##  Max.   :6.500   Max.   :6.800

Looking at the summary:

  • Politicians (trstplt) and parties (trstprt) have the lowest means, about 3.9 and 3.4
  • The police (trstplc) has the highest mean, about 6.1
  • The social trust variables (ppltrst, pphlp, pplfair) are in the middle, around 4.8-5.0

This already suggests that, on average across countries, people trust the police more than politicians.

Exploring the Data

How are variables distributed?

# boxplots
par(mar = c(8, 4, 3, 1))
boxplot(ess[, 2:11], las = 2, col = "lightblue",
        main = "Distribution of Trust Variables Across Countries",
        ylab = "Trust Score (0-10)")
abline(h = 5, col = "red", lty = 2)

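The red line is at 5 (the middle of the scale). We can see:

  • Trust in parties (trstprt) is at or below 5 in every country except Switzerland (5.2) - people don't trust parties much
  • Trust in the police (trstplc) is above 5 everywhere except Bulgaria (4.8) and Slovakia (4.6) - the police are relatively trusted
  • The spread across countries is wide for the national institutions (trust in politicians ranges from 2.1 in Bulgaria to 5.8 in Switzerland), while trstep and trstun vary much less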

Correlations Between Variables

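This is important for PCA: it can only combine variables into fewer components if they are correlated. If all ten variables were uncorrelated, every component would explain the same 10% and nothing could be reduced.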
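A sketch of why correlation matters, using two hypothetical 10 × 10 correlation matrices (not the ESS one) in which every pair of variables has the same correlation. With no correlation, every eigenvalue is 1 and each component explains exactly 10%; with r = 0.9 everywhere, a single component takes 91% of the variance.

```r
# Hypothetical p x p "equicorrelation" matrix: every pair correlates at rho
equicor <- function(p, rho) {
  R <- matrix(rho, nrow = p, ncol = p)
  diag(R) <- 1
  R
}

ev_none   <- eigen(equicor(10, 0))$values    # 10 uncorrelated variables
ev_strong <- eigen(equicor(10, 0.9))$values  # 10 variables, all r = 0.9

# Uncorrelated: every eigenvalue is 1, so each component explains 10%
# and nothing can be reduced
print(round(ev_none, 2))

# Strongly correlated: the first eigenvalue is 1 + (10 - 1) * 0.9 = 9.1
# (91% of the variance); the other nine are 1 - 0.9 = 0.1
print(round(ev_strong, 2))
```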

cor_mat <- cor(ess[, 2:11])
corrplot(cor_mat, method = "color", type = "upper",
         addCoef.col = "black", number.cex = 0.7,
         tl.col = "black", tl.srt = 45)

What I see here:

Very high correlations (r > 0.90):

  • trstplt and trstprl (0.97) - trust in politicians and trust in parliament are basically the same thing
  • trstplt and trstprt (0.96) - the same holds for parties
  • ppltrst and pplfair (0.96) - if you think people are trustworthy, you also think they are fair

This tells us that the political trust variables largely measure the same thing, and so do the social trust variables. PCA should be able to reduce these to fewer dimensions.

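Correlations between the two groups:

  • Political trust and social trust are also strongly correlated at the country level (for example, trstplt and ppltrst correlate at about 0.91)
  • Countries with high political trust also tend to have high social trust
  • The exception is trust in the European Parliament (trstep), which is much more weakly correlated with everything else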

Can We Do PCA? (Testing)

KMO Test

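KMO (Kaiser-Meyer-Olkin) tells us whether the data are suitable for PCA. It compares the correlations between variables with their partial correlations: when the variables share a lot of common variance, the partial correlations are small and KMO is close to 1. We want KMO > 0.7.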

kmo <- KMO(ess[, 2:11])
cat("Overall KMO:", round(kmo$MSA, 3), "\n")
## Overall KMO: 0.876

KMO is 0.876, above 0.7, so our data are suitable for PCA.
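The overall KMO value can also be computed by hand. Here is a minimal sketch, assuming the standard overall formula: the sum of squared correlations is compared with the sum of squared partial correlations, where each partial correlation controls a pair of variables for all the others. The function name `kmo_overall` and the demo matrix are hypothetical. On the real data, `kmo_overall(cor(ess[, 2:11]))` should match the overall value from `psych::KMO()`, assuming psych uses this same formula.

```r
# Overall KMO from a correlation matrix R (hypothetical helper):
# KMO = sum(r_ij^2) / (sum(r_ij^2) + sum(p_ij^2)) over all i != j
kmo_overall <- function(R) {
  S <- solve(R)                                  # inverse of the correlation matrix
  P <- -S / sqrt(outer(diag(S), diag(S)))        # partial correlations
  off <- row(R) != col(R)                        # off-diagonal cells only
  r2 <- sum(R[off]^2)
  p2 <- sum(P[off]^2)
  r2 / (r2 + p2)
}

# Hypothetical check: 4 variables that all correlate at 0.5.
# Each partial correlation is 0.5 / (1 + 2 * 0.5) = 0.25,
# so KMO = 0.25 / (0.25 + 0.0625) = 0.8
R_demo <- matrix(0.5, nrow = 4, ncol = 4)
diag(R_demo) <- 1
kmo_demo <- kmo_overall(R_demo)
print(kmo_demo)
```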

Bartlett's Test

This tests whether there are any correlations in the data at all (formally, whether the correlation matrix differs from an identity matrix). If there are none, PCA is pointless.

bart <- cortest.bartlett(cor_mat, n = nrow(ess))
cat("Chi-square:", round(bart$chisq, 2), "\n")
## Chi-square: 625.92
cat("P-value:", bart$p.value, "\n")
## P-value: 2.450809e-103

The p-value is far below 0.05, so the correlations are real. Good - PCA makes sense.
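Bartlett's statistic has a closed form: χ² = -(n - 1 - (2p + 5)/6) · ln|R|, with p(p - 1)/2 degrees of freedom. Here R is the correlation matrix, n the number of observations and p the number of variables. With p = 10, the test above has 45 degrees of freedom. Below is a minimal sketch of that standard formula; the function name `bartlett_sphericity` and the demo matrix are hypothetical.

```r
# Bartlett's test of sphericity (hypothetical helper).
# H0: the population correlation matrix is the identity.
bartlett_sphericity <- function(R, n) {
  p <- ncol(R)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df = df, lower.tail = FALSE))
}

# Identity matrix (no correlations): statistic 0, p-value 1
bart_id <- bartlett_sphericity(diag(4), n = 26)

# Hypothetical: 4 variables all correlated at 0.5, n = 26 (as many as countries)
R_demo <- matrix(0.5, nrow = 4, ncol = 4)
diag(R_demo) <- 1
bart_demo <- bartlett_sphericity(R_demo, n = 26)

str(bart_id)
str(bart_demo)
```

The more strongly the variables are correlated, the smaller det(R) becomes and the larger the statistic. This is why near-duplicate items such as trstplt and trstprl push the chi-square so high.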

Running PCA

# run PCA on the numeric columns only
pca <- PCA(ess[, 2:11], scale.unit = TRUE, graph = FALSE)

How Many Components to Keep?

eig <- get_eigenvalue(pca)
print(round(eig, 2))
##        eigenvalue variance.percent cumulative.variance.percent
## Dim.1        8.69            86.90                       86.90
## Dim.2        0.83             8.28                       95.18
## Dim.3        0.25             2.46                       97.64
## Dim.4        0.11             1.08                       98.72
## Dim.5        0.08             0.80                       99.52
## Dim.6        0.02             0.22                       99.74
## Dim.7        0.02             0.15                       99.89
## Dim.8        0.01             0.07                       99.96
## Dim.9        0.00             0.03                       99.99
## Dim.10       0.00             0.01                      100.00

Looking at this table:

  • PC1 explains 86.9% of all variance - this is huge!
  • PC2 adds 8.3% - together that's 95.2%
  • Every component after PC1 explains less than 10%

Rule of thumb (Kaiser criterion): keep components with eigenvalue > 1. Here only PC1 qualifies; PC2's eigenvalue (0.83) is below 1.

fviz_eig(pca, addlabels = TRUE) +
  geom_hline(yintercept = 10, linetype = "dashed", color = "red") +
  ggtitle("Scree Plot - How Much Each Component Explains")

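Clear elbow after PC1. The red dashed line at 10% is the share one standardized variable would explain (eigenvalue 1), and PC2 (8.3%) falls just below it. I still keep PC2 because its eigenvalue is not far from 1 and, as the loadings show, it picks out one clearly interpretable pattern.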

Decision: Keep 2 components (together explaining 95.2% of variance), keeping in mind that PC2 is a weak component.
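The eigenvalue table can be turned into these decisions with a few lines of arithmetic. Here is a sketch using the rounded eigenvalues copied from the output above (`eig_printed` is just those printed numbers). With 10 standardized variables the eigenvalues sum to 10, so an eigenvalue of 1 is exactly 10% of the variance. That is why the Kaiser rule and the 10% line on the scree plot select the same components.

```r
# Eigenvalues copied from the printed table above (rounded to 2 decimals)
eig_printed <- c(8.69, 0.83, 0.25, 0.11, 0.08, 0.02, 0.02, 0.01, 0.00, 0.00)
p <- 10   # 10 standardized variables, so the true eigenvalues sum to 10

pct    <- 100 * eig_printed / p   # % of variance explained by each component
cumpct <- cumsum(pct)

# Kaiser rule: keep components whose eigenvalue exceeds 1
print(which(eig_printed > 1))

# The 10% line on the scree plot is the same cut-off: 100 / p = 10%
print(which(pct > 100 / p))

print(round(cumpct[1:2], 1))      # PC1 alone, then PC1 + PC2
```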

Understanding the Components

Variable Loadings

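Loadings tell us how each variable relates to each component. With standardized variables (scale.unit = TRUE), a loading is the correlation between the variable and the component.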
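A small simulation showing what a loading is, assuming standardized variables (simulated data; `X` and the `item` names are hypothetical). Each loading equals the eigenvector entry times the component's standard deviation, which is the same as the correlation between the variable and the component scores. In FactoMineR, `pca$var$coord` should hold these correlations when `scale.unit = TRUE`; the sign of a whole component can differ between implementations.

```r
set.seed(2)
n <- 200
f <- rnorm(n)                                            # simulated common factor
X <- sapply(1:5, function(j) f + rnorm(n, sd = 0.5))     # 5 hypothetical items
colnames(X) <- paste0("item", 1:5)

pc <- prcomp(X, scale. = TRUE)

# Loading = eigenvector entry x sqrt(eigenvalue) = rotation x sdev
loadings_formula <- sweep(pc$rotation, 2, pc$sdev, "*")

# Correlation between each original variable and each component's scores
loadings_cor <- cor(X, pc$x)

print(round(loadings_formula[, 1:2], 3))
print(all.equal(unname(loadings_formula), unname(loadings_cor)))  # TRUE
```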

# get loadings
loads <- pca$var$coord[, 1:2]
colnames(loads) <- c("PC1", "PC2")
print(round(loads, 3))
##           PC1    PC2
## trstplt 0.970 -0.087
## trstplc 0.951 -0.048
## trstprl 0.981 -0.069
## trstprt 0.969 -0.072
## trstlgl 0.978 -0.060
## trstep  0.528  0.844
## trstun  0.939  0.227
## ppltrst 0.962 -0.124
## pphlp   0.965 -0.129
## pplfair 0.984 -0.090

PC1 (86.9% of variance) - “General Trust”:

Nine of the ten variables have very high positive loadings (0.94 to 0.98). Only trust in the European Parliament (trstep) is clearly lower, at 0.53. This means:

  • PC1 is basically “overall trust level”
  • Countries scoring high on PC1 trust everything more - politicians, police, each other
  • It's a general trust vs distrust dimension
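Why call PC1 “overall trust”? When all variables load almost equally, PC1 weights them almost equally. Its scores are then nearly the same as a plain average of the standardized variables. Here is a sketch on simulated data, assuming one strong common factor (`general` and the 8 items are hypothetical):

```r
set.seed(3)
n <- 200
general <- rnorm(n)                                        # simulated "general trust"
X <- sapply(1:8, function(j) general + rnorm(n, sd = 0.4)) # 8 hypothetical items

pc1   <- prcomp(X, scale. = TRUE)$x[, 1]   # first component scores
avg_z <- rowMeans(scale(X))                # plain average of the z-scores

# The sign of a component is arbitrary, so look at the absolute correlation
print(round(abs(cor(pc1, avg_z)), 4))
```

In the real data trstep has a much lower loading, so PC1 gives it less weight than a simple average would.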

PC2 (8.3% of variance) - “Trust in the European Parliament”:

  • Dominated by trstep (0.84); trstun adds a little (0.23)
  • All other loadings are small and negative (-0.05 to -0.13)
  • High PC2 means a country trusts the European Parliament more than its overall trust level would predict; low PC2 means it trusts it less

fviz_pca_var(pca, col.var = "contrib",
             gradient.cols = c("blue", "orange", "red"),
             repel = TRUE) +
  ggtitle("Variable Plot - Which Variables Contribute Most")

Reading this plot:

  • All arrows point right = all variables are positively related to PC1 (general trust)
  • Apart from trstep, the arrows almost overlap, because those nine variables are very highly correlated with each other
  • trstep points clearly upward and defines PC2; trstun points only slightly upward

Contributions to Each Component

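# fviz_contrib() returns ggplot objects, which ignore par(mfrow = ...),
# so arrange the two plots side by side with ggpubr (installed with factoextra)
p1 <- fviz_contrib(pca, choice = "var", axes = 1) +
  ggtitle("What Contributes to PC1 (General Trust)")   # PC1 contributions

p2 <- fviz_contrib(pca, choice = "var", axes = 2) +
  ggtitle("What Contributes to PC2")                    # PC2 contributions

ggpubr::ggarrange(p1, p2, ncol = 2)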

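For PC1: the nine variables other than trstep contribute roughly equally (about 10-11% each, close to the 10% reference line), while trstep contributes only about 3%. Apart from trstep, this is truly a general trust factor.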

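For PC2: trstep (European Parliament) alone contributes about 86% and trstun (UN) about 6%. PC2 is therefore mostly about trust in the European Parliament relative to a country's overall trust level, rather than a broad international vs national contrast.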
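These contribution percentages can be checked from the printed loadings. Here is a sketch, assuming the standard definition for a standardized PCA: a variable's contribution to a component is its squared loading divided by the component's eigenvalue, and that eigenvalue is the column sum of squared loadings. The matrix `loads_printed` is just the rounded table above, so the results are approximate.

```r
# Loadings copied from the printed table above (rounded to 3 decimals)
loads_printed <- matrix(c(
  0.970, -0.087,   # trstplt
  0.951, -0.048,   # trstplc
  0.981, -0.069,   # trstprl
  0.969, -0.072,   # trstprt
  0.978, -0.060,   # trstlgl
  0.528,  0.844,   # trstep
  0.939,  0.227,   # trstun
  0.962, -0.124,   # ppltrst
  0.965, -0.129,   # pphlp
  0.984, -0.090    # pplfair
), ncol = 2, byrow = TRUE,
dimnames = list(c("trstplt", "trstplc", "trstprl", "trstprt", "trstlgl",
                  "trstep", "trstun", "ppltrst", "pphlp", "pplfair"),
                c("PC1", "PC2")))

# Squared loadings in a column sum to that component's eigenvalue
# (about 8.69 and 0.83 here, up to rounding)
eig_from_loads <- colSums(loads_printed^2)
print(round(eig_from_loads, 2))

# Contribution (%) of each variable = squared loading / eigenvalue * 100
contrib <- 100 * sweep(loads_printed^2, 2, eig_from_loads, "/")
print(round(contrib, 1))
```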

Where Do Countries Fall?

Now let's see how countries score on these dimensions.

# get country scores
scores <- pca$ind$coord[, 1:2]
scores_df <- data.frame(
  Country = rownames(scores),
  PC1 = scores[, 1],
  PC2 = scores[, 2]
)

# sort by PC1 (general trust)
scores_df <- scores_df[order(-scores_df$PC1), ]
scores_df$Rank <- 1:nrow(scores_df)

cat("Countries Ranked by General Trust (PC1):\n\n")
## Countries Ranked by General Trust (PC1):
print(scores_df[, c("Rank", "Country", "PC1", "PC2")], row.names = FALSE)
##  Rank     Country        PC1         PC2
##     1      Norway  5.2902167 -0.79564385
##     2     Finland  5.2859490  0.63264160
##     3 Switzerland  4.6612609 -0.53703467
##     4     Iceland  3.5930543 -1.39201537
##     5      Sweden  3.4209166 -1.32123767
##     6 Netherlands  3.2147554  0.40710330
##     7     Estonia  1.8470759  0.52454475
##     8     Germany  1.7406472  0.63702242
##     9     Ireland  1.6659917  1.94509838
##    10     Belgium  1.4438795  1.50154464
##    11     Austria  1.2224236 -0.07335398
##    12      France -0.2781832 -0.36534232
##    13          UK -0.3946429 -1.93200065
##    14      Cyprus -1.1132003  0.42642103
##    15       Italy -1.2249696  1.44613650
##    16     Czechia -1.3449457 -0.52263290
##    17   Lithuania -1.5349077  0.30245526
##    18       Spain -1.5959852  0.49555643
##    19     Hungary -1.9428070  0.48584965
##    20    Portugal -2.2026445  0.71673366
##    21    Slovenia -2.3818852 -0.22382591
##    22      Poland -3.0527432 -0.84337930
##    23     Croatia -3.6258353 -0.72557359
##    24      Greece -3.7765699 -0.63251965
##    25    Slovakia -4.2580789 -0.53740369
##    26    Bulgaria -4.6587721  0.38085593

High trust countries (top of PC1):

  • Norway, Finland, Switzerland, Iceland, Sweden, Netherlands
  • These are the Nordic countries plus Switzerland and the Netherlands, all generally known for good governance

Low trust countries (bottom of PC1):

  • Bulgaria, Greece, Croatia, Slovakia, Poland
  • Eastern/Southern European countries with more institutional challenges

PC2 interpretation:

  • High PC2 (e.g. Ireland, Belgium, Italy): more trust in the European Parliament than the country's overall trust level would predict
  • Low PC2 (e.g. UK, Iceland, Sweden): less trust in the European Parliament relative to overall trust

fviz_pca_ind(pca, col.ind = "cos2",
             gradient.cols = c("blue", "yellow", "red"),
             repel = TRUE) +
  geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
  geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.5) +
  ggtitle("Country Positions on Trust Dimensions") +
  xlab("PC1: General Trust Level (Low ← → High)") +
  ylab("PC2: Trust in European Parliament (relative)")

The color shows how well each country is represented on this plane (cos2): red = well represented, blue = poorly represented.

What we see:

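  • Nordic countries (Finland, Norway, Sweden, Iceland) and Switzerland are on the right = high general trust
  • Bulgaria, Slovakia and Greece are on the far left = low trust
  • Most Western European countries are in the middle
  • The UK has the lowest PC2 score (-1.93), consistent with it having the lowest trust in the European Parliament in the data (3.8); Brexit may play a role, though this analysis cannot show that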

Biplot - Countries and Variables Together

fviz_pca_biplot(pca, repel = TRUE,
                col.var = "red", col.ind = "steelblue") +
  ggtitle("Biplot: Countries and Trust Variables") +
  xlab("PC1: General Trust (86.9%)") +
  ylab("PC2: Trust in European Parliament (8.3%)")

This shows everything together:

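  • Nine of the ten arrows point in almost the same direction, so the biplot cannot separate, say, social trust from trust in the police: countries far to the right (Norway, Finland, Switzerland) score high on all of them
  • Only trstep points clearly upward; countries high on PC2 (Ireland, Belgium, Italy) trust the European Parliament more than their overall trust level would predict
  • Bulgaria is opposite to all arrows - low trust across the board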

Creating Trust Groups

Let's group countries with similar trust profiles, using k-means clustering on the two PC scores.

# simple clustering based on PCA scores
set.seed(123)
km <- kmeans(scores, centers = 4, nstart = 25)

# add cluster to scores
scores_df$Cluster <- km$cluster[match(scores_df$Country, rownames(scores))]

# plot
plot(scores[,1], scores[,2], 
     col = km$cluster, pch = 19, cex = 1.5,
     xlab = "PC1: General Trust", ylab = "PC2: Intl vs National",
     main = "Country Clusters Based on Trust")
text(scores[,1], scores[,2], labels = rownames(scores), 
     pos = 3, cex = 0.7)
abline(h = 0, v = 0, lty = 2, col = "gray")
legend("bottomright", legend = paste("Cluster", 1:4), 
       col = 1:4, pch = 19)

cat("Trust Clusters:\n")
## Trust Clusters:
cat("==============\n\n")
## ==============
for (i in 1:4) {
  countries <- scores_df$Country[scores_df$Cluster == i]
  avg_pc1 <- mean(scores_df$PC1[scores_df$Cluster == i])
  cat("Cluster", i, "(Avg Trust:", round(avg_pc1, 2), "):\n")
  cat("  ", paste(countries, collapse = ", "), "\n\n")
}
## Cluster 1 (Avg Trust: -1.4 ):
##    France, UK, Cyprus, Italy, Czechia, Lithuania, Spain, Hungary, Portugal, Slovenia 
## 
## Cluster 2 (Avg Trust: 1.58 ):
##    Estonia, Germany, Ireland, Belgium, Austria 
## 
## Cluster 3 (Avg Trust: 4.24 ):
##    Norway, Finland, Switzerland, Iceland, Sweden, Netherlands 
## 
## Cluster 4 (Avg Trust: -3.87 ):
##    Poland, Croatia, Greece, Slovakia, Bulgaria
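The clustering step only uses the two score columns, so it can be re-run from the printed scores. Here is a sketch under that assumption; `scores_printed` is the ranking table above, put back into the original alphabetical row order that `kmeans()` saw. It also shows that the average squared PC1 score (the PC1 eigenvalue) is about ten times the PC2 one. Euclidean k-means on unscaled scores therefore groups countries mainly by PC1. The same seed should reproduce the four groups above.

```r
# PC1/PC2 scores copied from the ranking table above, in alphabetical order
scores_printed <- matrix(c(
   1.2224236, -0.07335398,  # Austria
   1.4438795,  1.50154464,  # Belgium
  -4.6587721,  0.38085593,  # Bulgaria
  -3.6258353, -0.72557359,  # Croatia
  -1.1132003,  0.42642103,  # Cyprus
  -1.3449457, -0.52263290,  # Czechia
   1.8470759,  0.52454475,  # Estonia
   5.2859490,  0.63264160,  # Finland
  -0.2781832, -0.36534232,  # France
   1.7406472,  0.63702242,  # Germany
  -3.7765699, -0.63251965,  # Greece
  -1.9428070,  0.48584965,  # Hungary
   3.5930543, -1.39201537,  # Iceland
   1.6659917,  1.94509838,  # Ireland
  -1.2249696,  1.44613650,  # Italy
  -1.5349077,  0.30245526,  # Lithuania
   3.2147554,  0.40710330,  # Netherlands
   5.2902167, -0.79564385,  # Norway
  -3.0527432, -0.84337930,  # Poland
  -2.2026445,  0.71673366,  # Portugal
  -4.2580789, -0.53740369,  # Slovakia
  -2.3818852, -0.22382591,  # Slovenia
  -1.5959852,  0.49555643,  # Spain
   3.4209166, -1.32123767,  # Sweden
   4.6612609, -0.53703467,  # Switzerland
  -0.3946429, -1.93200065   # UK
), ncol = 2, byrow = TRUE,
dimnames = list(c("Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus",
                  "Czechia", "Estonia", "Finland", "France", "Germany",
                  "Greece", "Hungary", "Iceland", "Ireland", "Italy",
                  "Lithuania", "Netherlands", "Norway", "Poland", "Portugal",
                  "Slovakia", "Slovenia", "Spain", "Sweden", "Switzerland",
                  "UK"),
                c("PC1", "PC2")))

# Average squared score per component (about 8.69 vs 0.83): PC1 dominates
# the Euclidean distances that k-means uses
print(round(colMeans(scores_printed^2), 2))

set.seed(123)
km_check <- kmeans(scores_printed, centers = 4, nstart = 25)
groups <- split(rownames(scores_printed), km_check$cluster)
print(groups)
```

If PC2 should count as much as PC1 in the grouping, one option is to standardize the two score columns before clustering. That would answer a different question from the one asked here.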

Quality Check

How well does the 2D representation capture the original data?

fviz_cos2(pca, choice = "ind", axes = 1:2) +
  ggtitle("Quality of Representation (cos2)") +
  ylab("cos2 (higher = better)")

Most countries have cos2 > 0.7 which is good. This means the 2D picture captures their trust profile well.
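Here is a sketch of what cos2 measures, on simulated data (the matrix `X` is hypothetical). A country's squared distance from the centre is split across all components. Its cos2 on the PC1-PC2 plane is the share of that squared distance captured by the first two components. It is assumed here that `fviz_cos2(..., axes = 1:2)` shows this same sum over the two axes.

```r
set.seed(4)
n <- 30
f <- rnorm(n)                                                # simulated common factor
X <- cbind(sapply(1:4, function(j) f + rnorm(n, sd = 0.5)),  # 4 related items
           rnorm(n))                                         # 1 unrelated item

pc <- prcomp(X, scale. = TRUE)
sq <- pc$x^2                    # squared coordinates on all 5 components

# cos2 of an individual on a component: its squared coordinate divided by
# its squared distance from the centre (summed over all components)
cos2 <- sq / rowSums(sq)

# Quality on the PC1-PC2 plane: sum of cos2 over the first two components
cos2_2d <- rowSums(cos2[, 1:2])
print(summary(cos2_2d))
print(mean(cos2_2d > 0.7))     # share of individuals above the 0.7 threshold
```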

Summary and Conclusions

What Did We Find?

1. One dominant dimension of trust, plus a weaker second one:

  • PC1 (86.9%): General trust level - some countries simply trust more (institutions AND people)
  • PC2 (8.3%): Mostly trust in the European Parliament relative to overall trust - a weak component (eigenvalue 0.83) kept for interpretability

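2. Political and social trust do not separate at the country level:

  • Political trust (politicians, parliament, parties) = basically one thing
  • Social trust (people trustworthy, helpful, fair) = closely related to it
  • Both groups load about 0.96-0.98 on PC1 and close to zero on PC2, so PCA does not split them into separate components in this country-level data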

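3. Country patterns (k-means clusters from above, ordered by mean PC1):

Group          Mean PC1   Countries
High trust         4.24   Norway, Finland, Switzerland, Iceland, Sweden, Netherlands
Upper-middle       1.58   Estonia, Germany, Ireland, Belgium, Austria
Lower-middle      -1.40   France, UK, Cyprus, Italy, Czechia, Lithuania, Spain, Hungary, Portugal, Slovenia
Low trust         -3.87   Poland, Croatia, Greece, Slovakia, Bulgaria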

4. Nordic countries stand out:

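Norway, Finland, Sweden and Iceland (together with Switzerland and the Netherlands) have much higher trust than the others. This fits the common view that these countries have low corruption, good governance and strong welfare states, although this analysis does not test those explanations.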

What Does This Mean?

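  • Trust in institutions and trust in people go together - countries where people trust their institutions also tend to have citizens who trust each other
  • Political parties are the least trusted national institution in every country in the data (in Norway, Sweden, Iceland and Switzerland, the European Parliament scores even lower)
  • A North/West vs East/South divide in Europe shows up clearly in the trust data
  • 10 trust questions can be reduced to 2 dimensions while keeping about 95% of the variance (PC1 alone keeps 87%)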

Limitations

  • Using country averages hides individual variation
  • Data from one time point only
  • Self-reported trust might have bias

Session Info

sessionInfo()
## R version 4.5.1 (2025-06-13)
## Platform: aarch64-apple-darwin20
## Running under: macOS Tahoe 26.2
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/Warsaw
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] psych_2.5.6      corrplot_0.95    factoextra_1.0.7 ggplot2_4.0.0   
## [5] FactoMineR_2.13 
## 
## loaded via a namespace (and not attached):
##  [1] tidyr_1.3.1          sass_0.4.10          generics_0.1.4      
##  [4] rstatix_0.7.3        lattice_0.22-7       digest_0.6.37       
##  [7] magrittr_2.0.4       evaluate_1.0.5       grid_4.5.1          
## [10] estimability_1.5.1   RColorBrewer_1.1-3   mvtnorm_1.3-3       
## [13] fastmap_1.2.0        jsonlite_2.0.0       ggrepel_0.9.6       
## [16] backports_1.5.0      Formula_1.2-5        purrr_1.1.0         
## [19] scales_1.4.0         jquerylib_0.1.4      abind_1.4-8         
## [22] mnormt_2.1.2         cli_3.6.5            rlang_1.1.6         
## [25] scatterplot3d_0.3-44 leaps_3.2            withr_3.0.2         
## [28] cachem_1.1.0         yaml_2.3.10          tools_4.5.1         
## [31] multcompView_0.1-10  parallel_4.5.1       ggsignif_0.6.4      
## [34] dplyr_1.1.4          ggpubr_0.6.2         DT_0.34.0           
## [37] flashClust_1.01-2    broom_1.0.10         vctrs_0.6.5         
## [40] R6_2.6.1             lifecycle_1.0.4      emmeans_2.0.1       
## [43] car_3.1-3            htmlwidgets_1.6.4    MASS_7.3-65         
## [46] cluster_2.1.8.1      pkgconfig_2.0.3      pillar_1.11.1       
## [49] bslib_0.9.0          gtable_0.3.6         glue_1.8.0          
## [52] Rcpp_1.1.0           xfun_0.53            tibble_3.3.0        
## [55] tidyselect_1.2.1     rstudioapi_0.17.1    knitr_1.50          
## [58] farver_2.1.2         xtable_1.8-4         nlme_3.1-168        
## [61] htmltools_0.5.8.1    labeling_0.4.3       carData_3.0-6       
## [64] rmarkdown_2.30       compiler_4.5.1       S7_0.2.0