First, we have to inspect the data and prepare it for dimension reduction.
## 'data.frame': 50116 obs. of 23 variables:
## $ name : chr "ESS11e04_1" "ESS11e04_1" "ESS11e04_1" "ESS11e04_1" ...
## $ essround: int 11 11 11 11 11 11 11 11 11 11 ...
## $ edition : num 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 ...
## $ proddate: chr "12.01.2026" "12.01.2026" "12.01.2026" "12.01.2026" ...
## $ idno : int 50014 50030 50057 50106 50145 50158 50211 50212 50213 50235 ...
## $ cntry : chr "AT" "AT" "AT" "AT" ...
## $ dweight : num 1.185 0.61 1.392 0.556 0.723 ...
## $ pspwght : num 0.393 0.325 4 0.176 1.061 ...
## $ pweight : num 0.331 0.331 0.331 0.331 0.331 ...
## $ anweight: num 0.13 0.1076 1.3237 0.0583 0.3511 ...
## $ ppltrst : int 5 10 6 6 6 8 7 8 7 3 ...
## $ pplfair : int 5 0 9 6 3 8 7 8 8 4 ...
## $ pplhlp : int 5 1 8 6 8 4 8 8 7 3 ...
## $ trstprl : int 6 6 7 5 6 3 6 9 6 8 ...
## $ trstlgl : int 9 6 5 6 8 5 5 9 7 8 ...
## $ trstplc : int 10 4 8 9 8 7 8 9 8 10 ...
## $ trstplt : int 5 1 4 3 5 5 4 7 3 8 ...
## $ trstprt : int 5 0 4 3 5 5 5 6 2 8 ...
## $ trstep : int 5 5 7 4 6 4 4 7 5 4 ...
## $ trstun : int 5 5 5 4 8 6 6 7 6 8 ...
## $ prob : num 0.000579 0.001124 0.000493 0.001233 0.000949 ...
## $ stratum : int 107 69 18 101 115 7 58 38 62 105 ...
## $ psu : int 317 128 418 295 344 373 86 3 108 314 ...
Values above 10, which encode missing responses, are set to NA and incomplete rows are dropped. The variables are then renamed and standardized. Although all variables are measured on the same 0 to 10 scale, their variances can differ, so unscaled high-variance items could dominate the components.
library(dplyr)  # provides %>% and rename()

variables <- c("ppltrst", "pplfair", "pplhlp",
               "trstprl", "trstlgl", "trstplc",
               "trstplt", "trstprt", "trstep", "trstun")
data_subset <- data[, variables]
data_subset[data_subset > 10] <- NA   # codes above 10 mark missing responses
data_subset <- na.omit(data_subset)   # keep complete cases only
data_subset <- data_subset %>% rename(
  social_trust = ppltrst,
  people_helpful = pplhlp,
  people_fair = pplfair,
  trust_politicians = trstplt,
  trust_police = trstplc,
  trust_parliament = trstprl,
  trust_parties = trstprt,
  trust_legalsystem = trstlgl,
  trust_ep = trstep,
  trust_un = trstun)
data_scaled <- scale(data_subset)     # centre to mean 0, scale to SD 1

All the variables used in the analysis are described below.
| Variable | Label | Scale |
|---|---|---|
| Social Trust Variables | ||
| social_trust | Most people can be trusted or you can’t be too careful | 0 (Low/Distrust) to 10 (High/Trust) |
| people_fair | Most people try to take advantage of you, or try to be fair | 0 (Low/Distrust) to 10 (High/Trust) |
| people_helpful | Most of the time people helpful or mostly looking out for themselves | 0 (Low/Distrust) to 10 (High/Trust) |
| Institutional Trust Variables | ||
| trust_parliament | Trust in country’s parliament | 0 (Low/Distrust) to 10 (High/Trust) |
| trust_legalsystem | Trust in the legal system | 0 (Low/Distrust) to 10 (High/Trust) |
| trust_police | Trust in the police | 0 (Low/Distrust) to 10 (High/Trust) |
| trust_politicians | Trust in politicians | 0 (Low/Distrust) to 10 (High/Trust) |
| trust_parties | Trust in political parties | 0 (Low/Distrust) to 10 (High/Trust) |
| trust_ep | Trust in the European Parliament | 0 (Low/Distrust) to 10 (High/Trust) |
| trust_un | Trust in the United Nations | 0 (Low/Distrust) to 10 (High/Trust) |
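The cleaning and scaling steps can be sketched on a toy data frame. This is a minimal illustration, not the ESS data: the column names `item_a` and `item_b` are hypothetical, and 77 and 88 are used only as stand-ins for out-of-range missing-value codes.

```r
# Toy data: two items on a 0-10 scale; 77 and 88 are assumed
# stand-ins for missing-value codes (illustration only).
toy <- data.frame(
  item_a = c(0, 5, 10, 77, 4, 6),
  item_b = c(4, 5, 6, 5, 88, 5)
)

toy[toy > 10] <- NA   # recode out-of-range codes as missing
toy <- na.omit(toy)   # listwise deletion: keep complete rows only

# Before scaling, item_a has a much larger variance than item_b
raw_var <- sapply(toy, var)

# scale() centres each column and divides it by its standard deviation
toy_scaled <- scale(toy)
scaled_var <- apply(toy_scaled, 2, var)

# Covariance of standardized data equals correlation of the raw data
same <- isTRUE(all.equal(cov(toy_scaled), cor(toy)))

print(raw_var)
print(scaled_var)
print(same)
```

Because standardized columns have unit variance, their covariance matrix equals the correlation matrix of the raw data, which is why PCA on scaled data weights every item equally.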
The correlation matrix visualization serves as an initial check on suitability for dimension reduction. Two clusters are visible: the social variables (social_trust, people_fair, people_helpful) correlate strongly with one another, while the political and international institutions form a separate group. These sizeable correlations suggest that the correlation matrix is far from an identity matrix, which supports the use of Principal Component Analysis.
## corrplot 0.95 loaded
Before performing the Principal Component Analysis (PCA), two statistical tests were conducted to check that the data are suitable for dimension reduction. The Kaiser-Meyer-Olkin (KMO) measure compares the observed correlations with the partial correlations: values close to 1 indicate that the variables share enough common variance to be summarized by a few components. The overall value of 0.86 falls in the range Kaiser labelled “meritorious”, and every item-level value is at least 0.83, so the data are well suited to PCA.
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = data_scaled)
## Overall MSA = 0.86
## MSA for each item =
## social_trust people_fair people_helpful trust_parliament
## 0.85 0.83 0.88 0.94
## trust_legalsystem trust_police trust_politicians trust_parties
## 0.88 0.88 0.84 0.85
## trust_ep trust_un
## 0.84 0.83
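The KMO statistic can be reproduced by hand from a correlation matrix. The sketch below uses base R only and a small made-up correlation matrix (three variables, all pairwise correlations 0.5); the helper name `kmo_by_hand` is hypothetical and is not part of the psych package.

```r
# Overall and per-item KMO from a correlation matrix R.
# KMO = sum(r^2) / (sum(r^2) + sum(partial_r^2)), off-diagonal entries only.
kmo_by_hand <- function(R) {
  Rinv <- solve(R)
  # Partial correlations given all other variables
  P <- -Rinv / sqrt(outer(diag(Rinv), diag(Rinv)))
  diag(R) <- 0
  diag(P) <- 0
  list(
    overall = sum(R^2) / (sum(R^2) + sum(P^2)),
    per_item = colSums(R^2) / (colSums(R^2) + colSums(P^2))
  )
}

# Made-up example: three variables with all correlations equal to 0.5
R_toy <- matrix(0.5, nrow = 3, ncol = 3)
diag(R_toy) <- 1

kmo <- kmo_by_hand(R_toy)
print(kmo$overall)   # 9/13, about 0.69
print(kmo$per_item)
```

The statistic rises toward 1 when the partial correlations are small relative to the raw correlations, that is, when the shared variance is spread across many variables rather than confined to isolated pairs.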
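Bartlett’s test of sphericity checks the null hypothesis that the correlation matrix is an identity matrix, i.e. that the variables are uncorrelated. The test statistic of 268502.6 on 45 degrees of freedom gives a p-value that is printed as 0 because it is smaller than the smallest number R can represent, so the null hypothesis is rejected. This confirms that the variables are correlated strongly enough to be reduced.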
## R was not square, finding R from data
## $chisq
## [1] 268502.6
##
## $p.value
## [1] 0
##
## $df
## [1] 45
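As a rough sketch of what the test computes, the statistic is a scaled log-determinant of the correlation matrix with p(p - 1)/2 degrees of freedom (45 for 10 variables). The helper `bartlett_sphericity` and the toy matrix are assumptions for illustration; only the statistic 268502.6 and df = 45 come from the output above.

```r
# Bartlett's test of sphericity from a correlation matrix R and sample size n
bartlett_sphericity <- function(R, n) {
  p <- ncol(R)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Made-up example: three variables, all correlations 0.5, n = 100
R_toy <- matrix(0.5, nrow = 3, ncol = 3)
diag(R_toy) <- 1
res <- bartlett_sphericity(R_toy, n = 100)
print(res)

# With the statistic reported above, the p-value underflows to 0 ...
p_reported <- pchisq(268502.6, df = 45, lower.tail = FALSE)
# ... but its natural logarithm is still representable
log_p_reported <- pchisq(268502.6, df = 45, lower.tail = FALSE, log.p = TRUE)
print(p_reported)
print(log_p_reported)
```

Working on the log scale shows that the printed 0 is a floating-point underflow rather than an exact zero.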
The scree plot is used to decide how many components to retain. The eigenvalues drop sharply between the first and second components, and an “elbow” appears at the third. The first two components have eigenvalues well above the threshold of 1 (about 5.13 and 1.50, the squares of their standard deviations), while the third (about 0.82) and all later components fall below it. Based on these criteria, a two-component solution was selected; together the two components capture 66.3% of the variance in the original 10 variables.
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.265 1.2253 0.90600 0.8503 0.69831 0.63023 0.56294
## Proportion of Variance 0.513 0.1501 0.08208 0.0723 0.04876 0.03972 0.03169
## Cumulative Proportion 0.513 0.6631 0.74519 0.8175 0.86625 0.90597 0.93766
## PC8 PC9 PC10
## Standard deviation 0.50627 0.49421 0.35051
## Proportion of Variance 0.02563 0.02442 0.01229
## Cumulative Proportion 0.96329 0.98771 1.00000
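The eigenvalues used for Kaiser’s criterion are the squared standard deviations from this summary. A short check, using only the standard deviations printed above (rounded as shown, so the totals are approximate):

```r
# Standard deviations copied from the summary output above
sdev <- c(2.265, 1.2253, 0.90600, 0.8503, 0.69831,
          0.63023, 0.56294, 0.50627, 0.49421, 0.35051)

eigenvalues <- sdev^2                       # variance of each component
prop_var <- eigenvalues / sum(eigenvalues)  # share of total variance
n_keep <- sum(eigenvalues > 1)              # Kaiser criterion

print(round(eigenvalues, 3))
print(n_keep)                               # 2 components exceed 1
print(round(cumsum(prop_var)[n_keep], 3))   # about 0.663
```

With standardized variables the eigenvalues sum to the number of variables (10 here), so an eigenvalue above 1 means the component explains more variance than a single original item.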
# pca_base: the unrotated prcomp() fit summarized above
fviz_eig(pca_base, choice = "eigenvalue", ncp = 10,
         barfill = "lightblue", barcolor = "skyblue",
         linecolor = "purple", addlabels = TRUE, main = "Eigenvalues")

The PCA with Varimax rotation revealed a clear two-dimensional structure. Component 1 (RC1) represents Institutional Trust, covering confidence in both national political bodies and international organizations. Component 2 (RC2) represents Social Trust: interpersonal relations and the perceived integrity of others. Together, these components account for 66.3% of the total variance, a substantial reduction of the original 10 variables.
pca_rotated <- principal(data_scaled, nfactors = 2, rotate = "varimax")
print(pca_rotated$loadings, cutoff = 0.3)
##
## Loadings:
## RC1 RC2
## social_trust 0.816
## people_fair 0.837
## people_helpful 0.794
## trust_parliament 0.813
## trust_legalsystem 0.749
## trust_police 0.650
## trust_politicians 0.843
## trust_parties 0.836
## trust_ep 0.777
## trust_un 0.753
##
## RC1 RC2
## SS loadings 4.327 2.304
## Proportion Var 0.433 0.230
## Cumulative Var 0.433 0.663
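Rotation redistributes variance between the components without changing the total: the SS loadings above sum to about 6.63, which matches the sum of the first two unrotated eigenvalues (2.265² + 1.2253²). The sketch below illustrates this with base R’s `prcomp()` and `varimax()` on simulated data; the two latent factors and the item names are assumptions made for the example, and `psych::principal()` may differ in details such as sign and ordering.

```r
set.seed(42)
n <- 1000
f1 <- rnorm(n)  # simulated latent factor 1
f2 <- rnorm(n)  # simulated latent factor 2

# Six hypothetical items: three driven by each factor
X <- scale(cbind(
  a1 = f1 + rnorm(n, sd = 0.6), a2 = f1 + rnorm(n, sd = 0.6),
  a3 = f1 + rnorm(n, sd = 0.6),
  b1 = f2 + rnorm(n, sd = 0.6), b2 = f2 + rnorm(n, sd = 0.6),
  b3 = f2 + rnorm(n, sd = 0.6)
))

pca <- prcomp(X)
# Component loadings: eigenvectors scaled by the component standard deviations
L <- pca$rotation[, 1:2] %*% diag(pca$sdev[1:2])

rot <- varimax(L)
L_rot <- unclass(rot$loadings)

ss_unrot <- colSums(L^2)     # equals the first two eigenvalues
ss_rot <- colSums(L_rot^2)   # "SS loadings" after rotation

print(round(rbind(unrotated = ss_unrot, rotated = ss_rot), 3))
print(round(L_rot, 2))
```

Each item’s communality (its row sum of squared loadings) is also unchanged by the rotation, so uniqueness values are the same before and after it.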
Uniqueness and complexity further support the two-component solution. Uniqueness is the share of a variable’s variance that the two components do not reproduce; most variables show low values, so the components capture most of their variance. The highest uniqueness belongs to trust_police (0.51), meaning about half of its variance is specific to that item.
## social_trust people_fair people_helpful trust_parliament
## 0.2886051 0.2735228 0.3410544 0.2785612
## trust_legalsystem trust_police trust_politicians trust_parties
## 0.3627700 0.5086262 0.2432133 0.2602280
## trust_ep trust_un
## 0.3904570 0.4219063
All items have complexity scores between 1.02 and 1.32, close to the ideal of 1.0, confirming a simple structure in which each variable loads mainly on one dimension. This keeps the distinction between Social and Institutional trust clear.
## social_trust people_fair people_helpful trust_parliament
## 1.138616 1.071988 1.092538 1.178897
## trust_legalsystem trust_police trust_politicians trust_parties
## 1.266817 1.316650 1.129647 1.116215
## trust_ep trust_un
## 1.018921 1.036667
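Both diagnostics follow directly from the loadings: uniqueness is 1 minus the sum of squared loadings, and the complexity index is (Σa²)² / Σa⁴. As a worked check for trust_police, the sketch below uses the printed RC1 loading and uniqueness; the RC2 loading is hidden by `cutoff = 0.3`, so it is inferred here from the communality, and the result is approximate because the printed loading is rounded.

```r
# Reported values for trust_police (from the output above)
loading_rc1 <- 0.650
uniqueness <- 0.5086262

communality <- 1 - uniqueness   # variance reproduced by the two components

# The RC2 loading is not printed; recover its square from the communality
loading_rc2_sq <- communality - loading_rc1^2
loadings_sq <- c(loading_rc1^2, loading_rc2_sq)

# Complexity index: (sum of a^2)^2 / (sum of a^4)
complexity <- sum(loadings_sq)^2 / sum(loadings_sq^2)

print(sqrt(loading_rc2_sq))   # implied |RC2 loading|, about 0.26
print(complexity)             # about 1.32
```

The implied RC2 loading of roughly 0.26 is below the 0.3 print cutoff, which is consistent with the blank cell in the loadings table, and the complexity agrees with the reported 1.3167 up to rounding.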
This diagnostic plot provides a final check on how well each variable fits within the reduced two-dimensional space. The vertical axis shows uniqueness, the variance not explained by the two components: every item except trust_police (0.51) lies below the 0.5 threshold, so the model represents them adequately. All complexity scores fall between about 1.0 and 1.3, well under the 1.5 reference line, confirming that each variable loads predominantly on a single component rather than being split between two. Variables such as trust_politicians, trust_parties, and people_fair sit in the lower-left quadrant, signifying low uniqueness and low complexity.
diag_data <- data.frame(
  Variable = names(pca_rotated$uniqueness),
  Complexity = as.numeric(pca_rotated$complexity),
  Uniqueness = as.numeric(pca_rotated$uniqueness)
)
ggplot(diag_data, aes(x = Complexity, y = Uniqueness, label = Variable)) +
  geom_point(color = "darkblue", size = 3) +
  geom_text_repel() +
  geom_hline(yintercept = 0.5, linetype = "dashed", color = "red") +
  geom_vline(xintercept = 1.5, linetype = "dashed", color = "red") +
  theme_minimal() +
  labs(
    title = "Diagnostic Plot: Complexity vs Uniqueness",
    x = "Complexity",
    y = "Uniqueness"
  )

The PCA variables factor map shows two distinct clusters of variables. The vectors for trust in parliament, politicians, parties, and international organizations align closely with the horizontal axis (the first component), which explains 51.3% of the variance. The interpersonal trust variables (people_fair, social_trust, people_helpful) point towards the vertical axis (the second component, 15.0% of the variance), confirming that they represent a separate construct.
fviz_pca_var(pca_base, col.var = "contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE)

This visualization displays the percentage each variable contributes to the first two dimensions. These contribution scores come from the unrotated prcomp solution, but they show the same structure as the final model: although a Varimax rotation was ultimately used to obtain a simpler and more interpretable solution, the charts indicate that the variance in the data already supports the division between institutional and social trust. The top chart shows that the political variables (trust in politicians, parliament, and parties) are the main drivers of the first dimension, most of them exceeding the expected average contribution (red dashed line). The bottom chart shows that the interpersonal variables (people being fair, people being helpful, and social trust) contribute most to the second dimension.
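For an unrotated `prcomp()` fit, a variable’s contribution to a component is 100 times its squared eigenvector entry, and the reference line for a single axis sits at 100 divided by the number of variables (10% for the 10 items here). A minimal sketch on simulated data, with hypothetical item names and base R only:

```r
set.seed(7)
n <- 500
inst <- rnorm(n)  # simulated "institutional" factor
soc <- rnorm(n)   # simulated "social" factor

# Five hypothetical items: three institutional, two social
X <- scale(cbind(
  inst_1 = inst + rnorm(n, sd = 0.5),
  inst_2 = inst + rnorm(n, sd = 0.5),
  inst_3 = inst + rnorm(n, sd = 0.5),
  soc_1 = soc + rnorm(n, sd = 0.5),
  soc_2 = soc + rnorm(n, sd = 0.5)
))

pca <- prcomp(X)

# Squared eigenvector entries sum to 1 per component, so each column sums to 100
contrib <- 100 * pca$rotation^2
expected <- 100 / ncol(X)   # uniform-contribution reference line

print(round(contrib[, 1:2], 1))
print(expected)
```

Items whose contribution exceeds the reference value carry more than an equal share of that component, which is how the bars above the dashed line are read.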
a <- fviz_contrib(pca_base, "var", axes = 1, xtickslab.rt = 45)
b <- fviz_contrib(pca_base, "var", axes = 2, xtickslab.rt = 45)
grid.arrange(a, b, top = 'Contribution to the first two Principal Components')

The objective of this analysis was to reduce 10 original trust-related variables to a smaller set of meaningful dimensions. Based on the statistical evidence, a two-component solution was selected. The scree plot and Kaiser’s criterion (eigenvalues > 1) both pointed to two components, which together explain 66.3% of the total variance in the dataset. A high KMO (0.86) and a significant Bartlett’s test (p-value effectively 0) confirmed that the variables are sufficiently correlated for reduction.
To enhance interpretability, a Varimax rotation was applied, resulting in a simple structure where each variable aligns clearly with one specific dimension. The RC1 dimension captures confidence in national political systems (parliament, parties, politicians) and legal frameworks (police, legal system), as well as international bodies like the UN and EP. The RC2 dimension represents interpersonal faith, grouping perceptions of people’s trustworthiness, fairness, and helpfulness into a single social construct.
The reduction is consistent with both theory and the statistical evidence. It shows that, in the European Social Survey, a person’s trust is not a single quantity: trust in formal institutions and trust in fellow citizens form separate dimensions.