Data preparation

We start by inspecting the ESS Round 11 data (50,116 observations of 23 variables) and preparing it for dimension reduction.

str(data)

## 'data.frame':    50116 obs. of  23 variables:
##  $ name    : chr  "ESS11e04_1" "ESS11e04_1" "ESS11e04_1" "ESS11e04_1" ...
##  $ essround: int  11 11 11 11 11 11 11 11 11 11 ...
##  $ edition : num  4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 ...
##  $ proddate: chr  "12.01.2026" "12.01.2026" "12.01.2026" "12.01.2026" ...
##  $ idno    : int  50014 50030 50057 50106 50145 50158 50211 50212 50213 50235 ...
##  $ cntry   : chr  "AT" "AT" "AT" "AT" ...
##  $ dweight : num  1.185 0.61 1.392 0.556 0.723 ...
##  $ pspwght : num  0.393 0.325 4 0.176 1.061 ...
##  $ pweight : num  0.331 0.331 0.331 0.331 0.331 ...
##  $ anweight: num  0.13 0.1076 1.3237 0.0583 0.3511 ...
##  $ ppltrst : int  5 10 6 6 6 8 7 8 7 3 ...
##  $ pplfair : int  5 0 9 6 3 8 7 8 8 4 ...
##  $ pplhlp  : int  5 1 8 6 8 4 8 8 7 3 ...
##  $ trstprl : int  6 6 7 5 6 3 6 9 6 8 ...
##  $ trstlgl : int  9 6 5 6 8 5 5 9 7 8 ...
##  $ trstplc : int  10 4 8 9 8 7 8 9 8 10 ...
##  $ trstplt : int  5 1 4 3 5 5 4 7 3 8 ...
##  $ trstprt : int  5 0 4 3 5 5 5 6 2 8 ...
##  $ trstep  : int  5 5 7 4 6 4 4 7 5 4 ...
##  $ trstun  : int  5 5 5 4 8 6 6 7 6 8 ...
##  $ prob    : num  0.000579 0.001124 0.000493 0.001233 0.000949 ...
##  $ stratum : int  107 69 18 101 115 7 58 38 62 105 ...
##  $ psu     : int  317 128 418 295 344 373 86 3 108 314 ...

Values above 10 are missing-value codes, so they are recoded to NA and incomplete rows are dropped. The variables are then renamed and standardized. Although all ten items are measured on the same 0 to 10 scale, their variances can differ, and without scaling the items with larger variance could dominate the components.

library(dplyr)  # provides %>% and rename()

# The ten trust items used in the analysis
variables <- c("ppltrst", "pplfair", "pplhlp",
               "trstprl", "trstlgl", "trstplc",
               "trstplt", "trstprt", "trstep", "trstun")

data_subset <- data[, variables]
data_subset[data_subset > 10] <- NA   # values above 10 are missing-value codes
data_subset <- na.omit(data_subset)   # keep complete cases only

# Give the items descriptive names
data_subset <- data_subset %>% rename(
  social_trust = ppltrst,
  people_helpful = pplhlp,
  people_fair = pplfair,
  trust_politicians = trstplt,
  trust_police = trstplc,
  trust_parliament = trstprl,
  trust_parties = trstprt,
  trust_legalsystem = trstlgl,
  trust_ep = trstep,
  trust_un = trstun)

# Standardize each item to mean 0 and standard deviation 1
data_scaled <- scale(data_subset)
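The effect of the recoding, listwise deletion, and scaling steps can be checked on a tiny example. The sketch below uses base R only; the data frame `toy` and its values are invented for illustration, with 77 and 88 standing in for missing-value codes above 10.

```r
# Toy data (hypothetical values); 77 and 88 stand in for missing-value codes
toy <- data.frame(
  ppltrst = c(5, 10, 6, 77, 3, 8),
  trstplc = c(10, 4, 88, 9, 7, 6)
)

toy[toy > 10] <- NA                 # recode out-of-range codes as missing
toy_complete <- na.omit(toy)        # listwise deletion: drop rows with any NA
toy_scaled <- scale(toy_complete)   # centre each column and divide by its sd

nrow(toy_complete)                  # number of rows left after deletion
round(colMeans(toy_scaled), 10)     # column means after scaling
apply(toy_scaled, 2, sd)            # column standard deviations after scaling
```

After scaling, every column has mean 0 and standard deviation 1, so no item can dominate the components merely because it has a larger spread.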

All the variables used in the analysis are described below.

Description of variables used in Dimension Reduction (ESS Round 11)
Variable	Label	Scale
Social Trust Variables
social_trust	Most people can be trusted or you can’t be too careful	0 (Low/Distrust) to 10 (High/Trust)
people_fair	Most people try to take advantage of you, or try to be fair	0 (Low/Distrust) to 10 (High/Trust)
people_helpful	Most of the time people helpful or mostly looking out for themselves	0 (Low/Distrust) to 10 (High/Trust)
Institutional Trust Variables
trust_parliament	Trust in country’s parliament	0 (Low/Distrust) to 10 (High/Trust)
trust_legalsystem	Trust in the legal system	0 (Low/Distrust) to 10 (High/Trust)
trust_police	Trust in the police	0 (Low/Distrust) to 10 (High/Trust)
trust_politicians	Trust in politicians	0 (Low/Distrust) to 10 (High/Trust)
trust_parties	Trust in political parties	0 (Low/Distrust) to 10 (High/Trust)
trust_ep	Trust in the European Parliament	0 (Low/Distrust) to 10 (High/Trust)
trust_un	Trust in the United Nations	0 (Low/Distrust) to 10 (High/Trust)

Exploratory Analysis

The correlation matrix plot gives a first indication of whether dimension reduction is appropriate. Two clusters are visible: the social variables (social_trust, people_fair, people_helpful) correlate strongly with one another, while the national political and international institutions form a separate group. Such sizeable correlations suggest that the correlation matrix is far from an identity matrix, which supports the use of Principal Component Analysis.

cor_matrix <- cor(data_scaled)
library(corrplot)

## corrplot 0.95 loaded

corrplot(cor_matrix, method = "circle")
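The visual impression of two clusters can also be expressed as numbers by comparing the average correlation within each block to the average correlation between blocks. The sketch below is only an illustration on simulated data: the item names, the helper `mean_within`, and the assumption of two independent latent scores are all hypothetical and not taken from the ESS data.

```r
set.seed(42)
n <- 1000
social <- rnorm(n)          # hypothetical latent "social trust" score
institutional <- rnorm(n)   # hypothetical latent "institutional trust" score

# Three noisy items per latent score (simulated, not ESS data)
sim <- data.frame(
  s1 = social + rnorm(n, sd = 0.6),
  s2 = social + rnorm(n, sd = 0.6),
  s3 = social + rnorm(n, sd = 0.6),
  i1 = institutional + rnorm(n, sd = 0.6),
  i2 = institutional + rnorm(n, sd = 0.6),
  i3 = institutional + rnorm(n, sd = 0.6)
)
R <- cor(sim)

social_items <- c("s1", "s2", "s3")
inst_items   <- c("i1", "i2", "i3")

# Mean correlation inside one block, ignoring the diagonal of ones
mean_within <- function(R, items) {
  block <- R[items, items]
  mean(block[upper.tri(block)])
}

within_social <- mean_within(R, social_items)
within_inst   <- mean_within(R, inst_items)
between       <- mean(R[social_items, inst_items])

round(c(within_social = within_social,
        within_inst = within_inst,
        between = between), 2)
```

High within-block averages combined with a much lower between-block average is the numeric counterpart of the clusters seen in the plot. In real survey data the two blocks are usually correlated with each other as well, so the contrast is typically less extreme than in this simulation.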

Before performing the Principal Component Analysis (PCA), two statistical tests were run to check that the data is suitable for dimension reduction. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy compares the observed correlations with the partial correlations and indicates the proportion of variance that might be caused by underlying factors. The overall value of 0.86 falls in the range Kaiser labelled “meritorious” (0.80 to 0.90), and every individual item scores at least 0.83, so the data is well suited to PCA.

library(psych)  # provides KMO(), cortest.bartlett() and principal()
KMO(data_scaled)

## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = data_scaled)
## Overall MSA =  0.86
## MSA for each item = 
##      social_trust       people_fair    people_helpful  trust_parliament 
##              0.85              0.83              0.88              0.94 
## trust_legalsystem      trust_police trust_politicians     trust_parties 
##              0.88              0.88              0.84              0.85 
##          trust_ep          trust_un 
##              0.84              0.83
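To make the overall MSA less of a black box, it can be computed by hand as the sum of squared correlations divided by the sum of squared correlations plus squared partial correlations. The helper `kmo_overall` below is a hypothetical base R sketch of that formula, not the psych implementation, and it is run on simulated data rather than the ESS items.

```r
# Hand-rolled overall KMO (illustrative helper, not the psych implementation)
kmo_overall <- function(R) {
  P <- solve(R)                              # inverse of the correlation matrix
  Q <- -P / sqrt(outer(diag(P), diag(P)))    # partial correlations given all other items
  off <- row(R) != col(R)                    # off-diagonal cells only
  sum(R[off]^2) / (sum(R[off]^2) + sum(Q[off]^2))
}

set.seed(1)
n <- 1000
f <- rnorm(n)                                # one hypothetical common factor
common <- sapply(1:5, function(i) f + rnorm(n, sd = 0.6))  # five items sharing f
noise  <- matrix(rnorm(n * 5), ncol = 5)     # five unrelated items

kmo_common <- kmo_overall(cor(common))
kmo_noise  <- kmo_overall(cor(noise))
round(c(common = kmo_common, noise = kmo_noise), 2)
```

Items that share a common factor have large correlations but small partial correlations, which pushes the measure towards 1. For unrelated items the two are of similar size and the measure is much lower.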

Bartlett’s test of sphericity checks the null hypothesis that the correlation matrix is an identity matrix, that is, that the variables are uncorrelated. The test statistic (chi-square = 268502.6, df = 45) gives a p-value so small that R prints it as 0, so the null hypothesis is rejected. This confirms that the variables are correlated strongly enough to be reduced.

cortest.bartlett(data_scaled)

## R was not square, finding R from data

## $chisq
## [1] 268502.6
## 
## $p.value
## [1] 0
## 
## $df
## [1] 45
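The pieces of this output can be reproduced with a few lines of base R. The sketch below assumes the usual form of Bartlett's statistic, chi-square = -(n - 1 - (2p + 5) / 6) * log(det(R)) with p(p - 1) / 2 degrees of freedom. The helper name `bartlett_sphericity` and the 2 x 2 example matrix are made up for illustration.

```r
# Bartlett's test of sphericity from a correlation matrix R and sample size n
# (illustrative helper, written from the textbook formula)
bartlett_sphericity <- function(R, n) {
  p <- ncol(R)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Small made-up example: two variables correlated at 0.5, n = 100
R_small <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
res_small <- bartlett_sphericity(R_small, n = 100)

# Degrees of freedom for the ten trust items
p <- 10
df_items <- p * (p - 1) / 2

# With the reported statistic the upper-tail probability underflows to 0,
# but it is still representable on the log scale
chisq_reported <- 268502.6
p_plain <- pchisq(chisq_reported, df_items, lower.tail = FALSE)
log_p   <- pchisq(chisq_reported, df_items, lower.tail = FALSE, log.p = TRUE)
```

The degrees of freedom come out as 45, matching the output above, and the printed p-value of 0 is a floating-point underflow rather than an exact zero.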

The scree plot is used to decide how many components to retain. The eigenvalue drops sharply between the first and second components, and the curve flattens into an “elbow” at the third component. The first two components have eigenvalues above the Kaiser threshold of 1 (about 5.13 and 1.50, the squares of the standard deviations reported below), while the third falls to about 0.82. Based on these criteria, a two-component solution was selected; together the two dimensions capture 66.3% of the variance in the original 10 variables.

pca_base <- prcomp(data_scaled)
plot(pca_base, type = "l")

summary(pca_base)

## Importance of components:
##                          PC1    PC2     PC3    PC4     PC5     PC6     PC7
## Standard deviation     2.265 1.2253 0.90600 0.8503 0.69831 0.63023 0.56294
## Proportion of Variance 0.513 0.1501 0.08208 0.0723 0.04876 0.03972 0.03169
## Cumulative Proportion  0.513 0.6631 0.74519 0.8175 0.86625 0.90597 0.93766
##                            PC8     PC9    PC10
## Standard deviation     0.50627 0.49421 0.35051
## Proportion of Variance 0.02563 0.02442 0.01229
## Cumulative Proportion  0.96329 0.98771 1.00000

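library(factoextra)  # provides fviz_eig(), fviz_pca_var() and fviz_contrib()
fviz_eig(pca_base, choice = "eigenvalue", ncp = 10,  # the solution has 10 components
         barfill = "lightblue", barcolor = "skyblue", linecolor = "purple",
         addlabels = TRUE, main = "Eigenvalues")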
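The eigenvalues behind the scree plot can be recovered from the summary table, because each eigenvalue is the square of the corresponding standard deviation. The sketch below simply retypes the standard deviations printed by `summary(pca_base)` above; the variable names are chosen for illustration.

```r
# Standard deviations of the ten components, copied from summary(pca_base)
sdev <- c(2.265, 1.2253, 0.90600, 0.8503, 0.69831,
          0.63023, 0.56294, 0.50627, 0.49421, 0.35051)

eigenvalues <- sdev^2                          # variance of each component
prop_var <- eigenvalues / sum(eigenvalues)     # proportion of variance explained
kaiser_keep <- sum(eigenvalues > 1)            # components passing Kaiser's criterion

round(eigenvalues[1:3], 2)                     # first three eigenvalues
round(cumsum(prop_var)[1:2], 3)                # cumulative share of PC1 and PC2
kaiser_keep
```

Because the variables are standardized, the eigenvalues sum to the number of variables (10, up to rounding), so an eigenvalue above 1 means a component explains more than a single original variable would.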

Rotated PCA

PCA with Varimax rotation reveals a clear two-dimensional structure. Component 1 (RC1) represents Institutional Trust, covering trust in both national political bodies and international organizations, and accounts for 43.3% of the variance. Component 2 (RC2) represents Social Trust, covering interpersonal relations and the perceived integrity of others, and accounts for 23.0%. Together the components explain 66.3% of the total variance, the same share as the first two unrotated components, because rotation only redistributes the variance between them.

pca_rotated <- principal(data_scaled, nfactors = 2, rotate = "varimax")
print(pca_rotated$loadings, cutoff = 0.3)

## 
## Loadings:
##                   RC1   RC2  
## social_trust            0.816
## people_fair             0.837
## people_helpful          0.794
## trust_parliament  0.813      
## trust_legalsystem 0.749      
## trust_police      0.650      
## trust_politicians 0.843      
## trust_parties     0.836      
## trust_ep          0.777      
## trust_un          0.753      
## 
##                  RC1   RC2
## SS loadings    4.327 2.304
## Proportion Var 0.433 0.230
## Cumulative Var 0.433 0.663
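The SS loadings above sum to 4.327 + 2.304 = 6.631, which is the same total as the first two unrotated eigenvalues (about 5.13 + 1.50). This is a general property of orthogonal rotations. The sketch below demonstrates it on simulated data, using base R's `stats::varimax` as a stand-in for the rotation step inside `principal()`; the item names and data are invented.

```r
set.seed(7)
n <- 800
f1 <- rnorm(n); f2 <- rnorm(n)   # two hypothetical latent dimensions

# Six simulated items, three per latent dimension, standardized
X <- scale(cbind(
  a1 = f1 + rnorm(n), a2 = f1 + rnorm(n), a3 = f1 + rnorm(n),
  b1 = f2 + rnorm(n), b2 = f2 + rnorm(n), b3 = f2 + rnorm(n)
))

pca <- prcomp(X)

# Unrotated loadings: eigenvectors scaled by the component standard deviations
L_unrot <- pca$rotation[, 1:2] %*% diag(pca$sdev[1:2])

# Orthogonal Varimax rotation of the two retained components
L_rot <- unclass(varimax(L_unrot)$loadings)

ss_unrot <- colSums(L_unrot^2)     # variance per component before rotation
ss_rot   <- colSums(L_rot^2)       # variance per component after rotation
comm_unrot <- rowSums(L_unrot^2)   # communality of each item before rotation
comm_rot   <- rowSums(L_rot^2)     # communality of each item after rotation

round(rbind(unrotated = ss_unrot, rotated = ss_rot), 3)
```

The split between the two components changes, but their total and each item's communality stay the same, which is why rotation improves interpretability without changing how much variance the solution explains.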

Uniqueness and complexity provide a further check on the two-component solution. Uniqueness is the share of a variable’s variance that the two components do not reproduce. Most variables have low uniqueness (0.24 to 0.42), so the components capture the majority of their variance. The highest value belongs to trust_police (0.51), which suggests this variable carries more item-specific information than the others.

pca_rotated$uniqueness

##      social_trust       people_fair    people_helpful  trust_parliament 
##         0.2886051         0.2735228         0.3410544         0.2785612 
## trust_legalsystem      trust_police trust_politicians     trust_parties 
##         0.3627700         0.5086262         0.2432133         0.2602280 
##          trust_ep          trust_un 
##         0.3904570         0.4219063
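Uniqueness is the complement of communality, so these values tie back directly to the variance explained. The sketch below retypes the uniqueness values printed above and checks that the communalities add up to the total SS loadings; the variable names on the left-hand side are chosen for illustration.

```r
# Uniqueness values copied from the output above
uniqueness <- c(
  social_trust = 0.2886051, people_fair = 0.2735228, people_helpful = 0.3410544,
  trust_parliament = 0.2785612, trust_legalsystem = 0.3627700,
  trust_police = 0.5086262, trust_politicians = 0.2432133,
  trust_parties = 0.2602280, trust_ep = 0.3904570, trust_un = 0.4219063
)

communality <- 1 - uniqueness            # variance reproduced by the two components
total_explained <- sum(communality)      # equals the summed SS loadings
share_explained <- mean(communality)     # equals the cumulative proportion of variance
worst_fit <- names(which.max(uniqueness))

round(c(total = total_explained, share = share_explained), 3)
worst_fit
```

The communalities sum to about 6.631 and average about 0.663, matching the SS loadings (4.327 + 2.304) and the 66.3% of variance reported for the rotated solution.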

Complexity measures how many components a variable needs to be described; a value of 1 means it loads on a single component. All items score between 1.02 and 1.32, which indicates a simple structure in which each variable is associated mainly with one dimension and supports a clear distinction between Social and Institutional trust.

pca_rotated$complexity

##      social_trust       people_fair    people_helpful  trust_parliament 
##          1.138616          1.071988          1.092538          1.178897 
## trust_legalsystem      trust_police trust_politicians     trust_parties 
##          1.266817          1.316650          1.129647          1.116215 
##          trust_ep          trust_un 
##          1.018921          1.036667
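These scores can be approximately reconstructed from numbers already shown, assuming the complexity measure is Hofmann's index, (sum of squared loadings)^2 / (sum of fourth powers of loadings). The loadings table hides values below the 0.3 cutoff, but with two components the hidden squared loading is the communality minus the squared primary loading. The sketch below retypes the printed values in table order, so small differences from rounding are expected.

```r
# Primary loadings (from the loadings table) and uniqueness, in table order
primary <- c(0.816, 0.837, 0.794, 0.813, 0.749,
             0.650, 0.843, 0.836, 0.777, 0.753)
uniqueness <- c(0.2886051, 0.2735228, 0.3410544, 0.2785612, 0.3627700,
                0.5086262, 0.2432133, 0.2602280, 0.3904570, 0.4219063)
reported <- c(1.138616, 1.071988, 1.092538, 1.178897, 1.266817,
              1.316650, 1.129647, 1.116215, 1.018921, 1.036667)

communality  <- 1 - uniqueness
secondary_sq <- communality - primary^2   # squared loading on the other component
secondary    <- sqrt(secondary_sq)        # magnitude of the suppressed loading

# Hofmann's index for two components
complexity <- communality^2 / (primary^4 + secondary_sq^2)

round(cbind(secondary, complexity, reported), 3)
```

All reconstructed secondary loadings are below the 0.3 print cutoff, which is why they do not appear in the table, and the recomputed complexity values agree with the reported ones to within rounding error.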

This diagnostic plot gives a final check on how well each variable fits the two-dimensional solution. The vertical axis shows uniqueness, the variance not explained by the two components: every item except trust_police (0.51) lies below the 0.5 reference line, so the model represents them adequately. On the horizontal axis, all complexity scores lie between 1.02 and 1.32, well short of the 1.5 reference line, confirming that each variable loads predominantly on a single component rather than being split between the two. Variables such as trust_politicians, trust_parties, and people_fair sit in the lower-left corner of the plot, combining low uniqueness with low complexity.

library(ggplot2)  # plotting
library(ggrepel)  # geom_text_repel() for non-overlapping labels

# One row per variable with its complexity and uniqueness
diag_data <- data.frame(
  Variable = names(pca_rotated$uniqueness),
  Complexity = as.numeric(pca_rotated$complexity),
  Uniqueness = as.numeric(pca_rotated$uniqueness)
)

ggplot(diag_data, aes(x = Complexity, y = Uniqueness, label = Variable)) +
  geom_point(color = "darkblue", size = 3) +
  geom_text_repel() +
  # Reference lines: uniqueness 0.5 and complexity 1.5
  geom_hline(yintercept = 0.5, linetype = "dashed", color = "red") +
  geom_vline(xintercept = 1.5, linetype = "dashed", color = "red") +
  theme_minimal() +
  labs(
    title = "Diagnostic Plot: Complexity vs Uniqueness",
    x = "Complexity",
    y = "Uniqueness"
  )

Visualization

The PCA variable factor map shows two distinct clusters of variables. The vectors for trust in parliament, politicians, parties, and international organizations align closely with the horizontal axis (PC1), which explains 51.3% of the variance. The interpersonal trust variables (people_fair, social_trust, people_helpful) point more towards the vertical axis (PC2, 15.0% of the variance), indicating that they represent a separate construct.

fviz_pca_var(pca_base, col.var = "contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE)
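The arrows in a variable factor map are usually drawn from the variable coordinates, which for standardized data equal the correlations between each variable and each component. The sketch below checks that relationship in base R on simulated data; it assumes the plot uses eigenvectors scaled by the component standard deviations, and the item names are invented.

```r
set.seed(3)
n <- 600
f <- rnorm(n)   # hypothetical common factor shared by x1 and x2

# Four simulated, standardized items: two related, two unrelated
X <- scale(cbind(
  x1 = f + rnorm(n), x2 = f + rnorm(n),
  x3 = rnorm(n),     x4 = rnorm(n)
))

pca <- prcomp(X)

# Variable coordinates: each eigenvector column scaled by its component's sd
coords <- sweep(pca$rotation, 2, pca$sdev, "*")

# Correlations between the original variables and the component scores
cors <- cor(X, pca$x)

# Squared arrow length in the PC1-PC2 plane for each variable
arrow_len_sq <- rowSums(coords[, 1:2]^2)

round(coords[, 1:2], 2)
```

Each variable's squared coordinates sum to 1 across all components, so an arrow that nearly reaches the unit circle in the PC1-PC2 plane is almost fully represented by the first two components.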

This figure shows the percentage each variable contributes to the first two principal components. The contributions come from the unrotated prcomp solution, so they do not correspond exactly to the Varimax-rotated components used for interpretation, but they show that the same division between institutional and social trust is already present in the unrotated data. The top chart shows that the political variables (trust in politicians, parliament, and parties) are the main drivers of the first dimension, most of them exceeding the expected average contribution (red dashed line). The bottom chart shows that the interpersonal variables (people being fair, people being helpful, and social trust) contribute most to the second dimension.

library(gridExtra)  # provides grid.arrange()

a <- fviz_contrib(pca_base, "var", axes = 1, xtickslab.rt = 45)
b <- fviz_contrib(pca_base, "var", axes = 2, xtickslab.rt = 45)
grid.arrange(a, b, top = 'Contribution to the first two Principal Components')
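The contribution of a variable to a component is its squared eigenvector entry expressed as a percentage, and the reference line marks the value each variable would have if all contributed equally (100 / number of variables, so 10% for the ten trust items). The sketch below illustrates this in base R on simulated data with five invented items; it assumes the red dashed line is drawn at this uniform value.

```r
set.seed(11)
n <- 600
f1 <- rnorm(n); f2 <- rnorm(n)   # two hypothetical latent dimensions

# Five simulated, standardized items
X <- scale(cbind(
  inst1 = f1 + rnorm(n), inst2 = f1 + rnorm(n), inst3 = f1 + rnorm(n),
  soc1  = f2 + rnorm(n), soc2  = f2 + rnorm(n)
))

pca <- prcomp(X)

# Contribution (%) of each variable to each component:
# eigenvectors have unit length, so squared entries sum to 1 per column
contrib <- 100 * pca$rotation^2

# Expected contribution if every variable contributed equally
threshold <- 100 / nrow(pca$rotation)

# Variables contributing more than average to the first component
above_pc1 <- rownames(contrib)[contrib[, "PC1"] > threshold]

round(contrib[, 1:2], 1)
above_pc1
```

With five items the threshold is 20%. The same calculation on the ten trust items gives the 10% line shown in the charts.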

Conclusion

The objective of this analysis was to reduce 10 trust-related variables to a smaller set of meaningful dimensions. Based on the statistical evidence, a two-component solution was selected. The scree plot and Kaiser’s criterion (eigenvalues > 1) both point to two components, which together explain 66.3% of the total variance. A high KMO value (0.86) and a significant Bartlett’s test (p < 0.001) confirmed that the variables are sufficiently correlated for reduction.

To enhance interpretability, a Varimax rotation was applied, resulting in a simple structure where each variable aligns clearly with one dimension. The RC1 dimension captures trust in national political institutions (parliament, parties, politicians), in the legal system and the police, and in international bodies such as the UN and the European Parliament. The RC2 dimension represents interpersonal trust, grouping perceptions of people’s trustworthiness, fairness, and helpfulness into a single social construct.

The reduction makes sense both theoretically and statistically. It shows that, in the European Social Survey data, a person’s trust is not a single quantity but splits into how they view formal institutions and how they view their fellow citizens.

Homework: Dimension Reduction

Natalia Szymanko
