AS2 POLS0013

Question 1.1

An alternative concept here might be “Trust in a country’s public sector.” Trusting a country’s public sector means trusting all the institutions and agents that have an obligation to serve the public at the national level or below.

Ideal indicators for measuring “Trust in a country’s public sector” would be: “trust_legal_system,” “trust_police,” “trust_parliament,” “trust_politicians,” and “trust_parties.” The reason for excluding “trust_eu_parliament” and “trust_united_nations” is to avoid over-capturing the alternative concept, as these measure trust in international bodies rather than in a country’s own public sector.

Question 1.2

Looking at Figure 1.0, both “trust_politicians” and “trust_parties” appear somewhat more right-skewed than the other indicators, consistent with their lower average values of 3.551 and 3.534 (see Figure 1.1). Higher variability is also observed for these two, with standard deviations of 2.475 and 2.417.

“trust_police” and “trust_legal_system” tend to have higher values on average, at 6.447 and 5.389 (see Figure 1.1). Their histograms show somewhat more observations at the higher end of the scale.

Question 1.3

The histogram of the normalized trust index (see Figure 1.2) shows no strong skew to the left or right. Trust is moderate overall, as the highest frequency of observations falls between 4 and 6.

The unit of measure here is the level of trust in political institutions, with a range from 0 to 10. The first assumption made in constructing the measure is that each trust indicator has equal weight. The second assumption is that each indicator captures a different dimension of the target concept. Aggregating the indicators then yields an additive linear index, which aims to measure the target concept better than any single indicator.
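
The construction described above can be sketched on toy data. This is a minimal illustration only: the data frame and its column names (trust_a, trust_b, trust_c) are hypothetical and are not the ESS variables used in the appendix.

```r
# Hypothetical toy data: three respondents, three 0-10 trust items.
# Column names are illustrative, not the ESS variable list.
toy <- data.frame(
  trust_a = c(2, 5, 9),
  trust_b = c(4, 5, 7),
  trust_c = c(3, 8, 8)
)

# Equal weighting: sum the items, then divide by the number of items
# so that the index stays on the original 0-10 scale.
toy$index <- rowSums(toy) / ncol(toy)

toy$index
```

Dividing by the number of items is what keeps the index on the same 0 to 10 range as the individual indicators.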

Question 1.4

The country used as the reference category is “Austria.” The intercept of 4.613 represents the average trust index score for the reference category, i.e., “Austria.” From Figure 1.3 and Figure 1.4, we can see that Bulgaria, located in Eastern Europe, has the lowest trust, with an average score of 3.111. By contrast, Norway (6.549), Switzerland (6.000), and Iceland (5.909) report among the highest levels of trust on average; two of these three are Nordic countries. Countries with moderate scores include Germany and France, in western Europe, with scores of 4.780 and 4.641.
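
The way country averages are recovered from the intercept and the country coefficients can be sketched as follows. The toy scores are invented for illustration, and the sketch is unweighted, whereas the appendix model uses survey weights.

```r
# Hypothetical toy data: index scores for three countries.
toy <- data.frame(
  country = factor(c("Austria", "Austria", "Bulgaria", "Bulgaria",
                     "Norway", "Norway")),
  index   = c(4, 5, 3, 3.5, 6, 7)
)

fit <- lm(index ~ country, data = toy)
b <- coef(fit)

# The intercept is the mean of the reference level (the first factor level);
# each other coefficient is that country's difference from the reference.
country_means <- c(b[1], b[1] + b[-1])
names(country_means) <- levels(toy$country)

country_means
```

With a single categorical predictor, these reconstructed values coincide with the plain group means.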

Question 1.5

Higher household income is associated with more trust in institutions. Compared to household_incomeQ1, household_incomeQ2 is associated with 0.141 units more trust in institutions on average, and household_incomeQ3 with 0.324 units more on average.

Individuals with a degree have 0.693 units higher trust in institutions on average compared to individuals without a degree.

Compared to “other” major activities, individuals with “paid work” as their main activity and those who are “retired” report 0.350 units and 0.301 units higher trust, respectively, on average.

Respondents living in urban areas also report 0.167 units more trust than those in non-urban areas, although this association is smaller than those reported above. Being female is associated with 0.058 units more trust in institutions compared to being non-female, which is also a small association.

All coefficients, except “household_incomemissing,” are statistically significant with a p-value smaller than 0.01, meaning that associations of this size between these variables and the equal weight index would be unlikely to arise by chance alone.

Question 1.6

The coefficient of 0.024 from the Question 1.6 regression suggests that a one-unit increase in trustdata_normalized (the equal weight trust index) is associated with an increase in the probability of voting of 2.4 percentage points (see Figure 1.6). Because the model includes no other predictors, this is a bivariate association. The relationship is statistically significant at the 0.01 level.

The intercept is 0.704, meaning that the predicted probability of voting when the equal weight trust index (trustdata_normalized) equals 0 is still 70.4%. This can be interpreted as most respondents still voting even with very low trust in institutions.
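
As a worked example of this linear probability model, the reported intercept and slope can be turned into predicted probabilities at a few index values. The index values chosen here (0, 5, 10) are for illustration only.

```r
# Coefficients as reported in the text for the Question 1.6 model.
intercept <- 0.704
slope <- 0.024

# Predicted probability of voting at a few index values on the 0-10 scale.
index_values <- c(0, 5, 10)
predicted <- intercept + slope * index_values

data.frame(index = index_values, predicted_prob = predicted)
```

Moving from the bottom to the top of the index scale therefore changes the predicted probability by 10 times the slope, i.e. 24 percentage points.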

Question 1.7

Because of measurement error, we could draw incorrect measurement inferences, which would reduce the precision of our estimates. Lower precision could in turn lead to wrong conclusions about the relationships we are interested in.

If the mismeasured index we constructed is used as a dependent variable, we could misattribute the causal effect. Instead of affecting the target concept “mu,” the treatment variable “T” might affect other factors “O” that are not indicators of the concept but still affect the measurement “m.” If there is a genuine causal relationship between “T” and “mu,” it should also result in a causal effect of “T” on “m” through “mu” and its indicators.

If the index we constructed is used as an independent variable, measurement error in it would introduce attenuation bias and flatten the regression coefficient, thus underestimating the effect of the explanatory variable on the outcome variable. Misattribution of the causal effect can happen here as well: we still need to worry about whether some other factor “O,” rather than the target concept “mu,” affects the measurement “m” and thereby the outcome “Y.”
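
Attenuation bias can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are simulated, the true slope of 0.5 and the error variance of 1 are arbitrary choices, and the names mu, m, and y simply follow the notation used above.

```r
set.seed(1)
n <- 10000

mu <- rnorm(n)               # true (unobserved) concept
y  <- 0.5 * mu + rnorm(n)    # outcome depends on the true concept
m  <- mu + rnorm(n)          # observed measure = concept + random error

slope_true  <- coef(lm(y ~ mu))[["mu"]]
slope_noisy <- coef(lm(y ~ m))[["m"]]

# Classical measurement error shrinks the slope towards zero by the
# reliability ratio var(mu) / (var(mu) + var(error)), here about one half.
c(true = slope_true, noisy = slope_noisy)
```

The slope estimated from the noisy measure is pulled towards zero even though the underlying relationship is unchanged.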

Overall, when the measures are used in analysis, measurement error can lead to mistaken conclusions.

Question 1.8

As we can see from the plot in Figure 1.7, the PC1 scores and the equal weight index follow almost exactly the same trend, meaning that these two measurement strategies generate very similar outcomes when capturing the target concept.

Since the equal weight index captures the shared variance of the trust indicators about as well as PC1 does, it is the more efficient way of measuring people’s institutional trust: it is more direct and easier to implement.
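
The close agreement between the two strategies can be reproduced on simulated data. This is only a sketch: the seven indicators are simulated as one shared component plus independent noise, which is an assumption and not a description of the ESS data.

```r
set.seed(1)
n <- 1000

common <- rnorm(n)   # simulated shared trust component
# Seven hypothetical indicators, each the shared component plus noise.
items <- sapply(1:7, function(j) common + rnorm(n))

equal_weight <- rowMeans(items)
pc1 <- prcomp(items, scale. = FALSE)$x[, 1]

# The sign of a principal component is arbitrary, so compare the
# absolute correlation between the two measures.
abs(cor(equal_weight, pc1))
```

When all indicators load similarly on a single dimension, PC1 weights them almost equally, so the two measures are nearly interchangeable.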

Question 2.1

From Figure 1.9, we can separate these pairwise correlations into two groups: stronger pairwise correlations and weaker pairwise correlations.

One of the strongest pairwise correlations is between courts_treatsame_imp and fair_elections_imp, with a coefficient of 0.557. This is not surprising since both questions concern the fair and equal treatment of citizens by institutions, namely the courts and elections.

Another strong correlation is between peopleviews_prevail_imp and willpeople_unstoppable_imp, with a correlation of 0.535. This is also not surprising since the phrase used in one survey question, “the will of the people,” is somewhat similar to “the views of ordinary people” in the other.

The weakest pairwise correlation observed is between peopleviews_prevail_imp and fair_elections_imp, with a correlation of 0.175. This is not surprising because one question asks about fair elections, while the other asks about people’s views prevailing over the views of political elites.

Another weak pairwise correlation is between peopleviews_prevail_imp and critical_media_imp, at 0.204. That it is slightly higher than 0.175 could be because “media” can act as a medium for “the views of ordinary people.”

Question 2.2

Exploratory factor analysis (EFA) hypothesizes latent dimensions, with the probability of observing different values of the observed indicators depending on where units are on those dimensions. It belongs to the generative measurement family (strategy).

Unlike EFA, principal component analysis (PCA) derives scales as linear combinations of observed indicators. It belongs to the discriminative measurement family (strategy): the measure discriminates between different levels of the concept, which implies that the concept is defined by the measurement strategy itself.

Principal components are linear combinations of the observed indicators, and each seeks to explain the variation unexplained by the previous principal component.

Two similarities:

Both PCA and EFA aim to reduce dimensions and provide a relatively simple summary of the correlations across variables in the data.

Practically speaking, both PCA and EFA are based on linear calculations. As a consequence, when they are applied to the same data, they tend to yield similar conclusions.

Two dissimilarities:

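The order of the components in a PCA is not arbitrary: each successive PC explains the most remaining variance, each PC is uncorrelated with the others, and there will always be as many PCs as there are indicators. In EFA, by contrast, the factors are hypothesized latent variables; their number is chosen by the analyst and their ordering carries no such meaning.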

In PCA, each component is a linear function of the indicator variables, and the coefficients describe the association between each indicator and the respective PC.

In EFA, each indicator variable is a linear function of the factors, and the coefficients describe the association between each factor and the respective indicator.
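
The difference in direction between the two models can be written out explicitly. The notation below is an assumption introduced for illustration: x_j are the observed indicators, w_{jk} are PCA coefficients, f_k are latent factors, \lambda_{jk} are factor loadings, and \varepsilon_j is an indicator-specific error term.

```latex
% PCA: each component is a linear function of the indicators
\mathrm{PC}_k = \sum_{j=1}^{J} w_{jk}\, x_j

% EFA: each indicator is a linear function of the latent factors
x_j = \sum_{k=1}^{K} \lambda_{jk}\, f_k + \varepsilon_j
```

In the first equation the indicators sit on the right-hand side; in the second they sit on the left, which is why the loadings describe how factors generate indicators rather than how indicators build a scale.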

Question 2.3

Looking at the EFA loadings for Question 2.3 (see Figure 2.0), the high-loading variables for Factor 1 are courts_treatsame_imp (0.676), fair_elections_imp (0.753), and critical_media_imp (0.605). These indicate that Factor 1 is likely related to the fairness of courts, the fairness and freedom of elections, and the freedom and independence of the media. Factor 1 might therefore be measuring institutional fairness and accountability.

For Factor 2, high-loading variables include finalsay_referendum_imp (0.543), peopleviews_prevail_imp (0.728), and willpeople_unstoppable_imp (0.733). These indicate that Factor 2 might be measuring public influence and sentiment.

Question 3.1

Simply turning all “_sq” indicators into an additive, equal-weighted index, without first considering whether non-additivity or unequal weights would be better justified, is not sensible. Equal weighting commits us to the idea that the measure m increases with any increase in any of the indicators, and by the same amount for each.

In the previous case, an EFA on the “_imp” indicators showed that some “_imp” indicators have higher loadings on, and thus a stronger association with, Factor 1, whereas others have higher loadings on, and thus a stronger association with, Factor 2. Assigning equal weights to all these indicators obscures these differences. Analyst-specified weighting or expert-specified weighting may be more suitable in this context.

Drawing on the previous case, before simply using equal weighting we must have sufficient reasons to reject analyst-specified or expert-specified weighting. To decide whether these weighting methods can be rejected, an EFA or PCA should be conducted to determine the relative importance of each indicator.

Question 3.2

Factor loadings are standardized regression weights that represent the relationship between the latent variable and each indicator. They are not supposed to be used directly as weights when constructing an index. Multiplying indicator scores by factor loadings does not give values that represent each indicator’s contribution to the constructed index; it reflects instead how strongly each indicator is related to the latent factor. As a result, the multiplication has limited meaning for the constructed index unless we assume the latent factor(s) can represent our indicators.
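
The distinction between loadings and scoring weights can be sketched with simulated data. This is an illustration under assumptions: the six indicators (a1 to b3) and the two latent factors are simulated, and the weights shown are the regression-method scoring weights, computed in the same way as in the appendix, applied here to standardized indicators.

```r
set.seed(1)
n <- 2000
f1 <- rnorm(n)   # simulated latent factor 1
f2 <- rnorm(n)   # simulated latent factor 2

# Six hypothetical indicators: three driven by each factor.
x <- cbind(
  a1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  a2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  a3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  b1 = 0.8 * f2 + rnorm(n, sd = 0.6),
  b2 = 0.7 * f2 + rnorm(n, sd = 0.7),
  b3 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

fit <- factanal(x, factors = 2, scores = "regression", rotation = "varimax")

loadings_mat <- unclass(fit$loadings)
# Regression-method scoring weights: inverse correlation matrix times loadings.
weights_mat <- solve(fit$correlation) %*% loadings_mat

# Applying the weights to the standardized indicators gives factor scores.
scores_manual <- scale(x) %*% weights_mat

# Loadings (first two columns) next to scoring weights (last two columns).
round(cbind(loadings_mat, weights_mat), 3)
```

The two sets of numbers differ because the scoring weights also account for the correlations among the indicators, which the loadings alone do not.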

Question 3.3

According to the ranked table (Figure 2.4) of average scores for sq_factor1 and sq_factor2, Spain, Germany, and Portugal rank much higher in Country_sq_factor2 than in Country_sq_factor1. This means that the weights for factor 2 favor these countries more than the weights for factor 1.

In the weights tables for sq_factor1 and sq_factor2, the indicators fair_elections_sq and critical_media_sq were given the lowest weights on factor 1 (-0.189 and -0.031) but the highest weights on factor 2 (0.651 and 0.263). These countries therefore most likely score higher on courts_treatsame_sq (0.152) and, most importantly, on fair_elections_sq (0.651) and critical_media_sq (0.263).

By contrast, Slovakia, Czechia, and Hungary rank much higher in Country_sq_factor1 than in Country_sq_factor2. Given the weights table for factor 1, this ranking difference can be interpreted as these countries scoring higher on indicators such as finalsay_referendum_sq, peopleviews_prevail_sq, and willpeople_unstoppable_sq.

Appendix:


# 0013 coursework jl
setwd("~/Downloads/Year 3 UCL PIR/POLS0013/coursework")
load("ess.rda")
library(texreg)
library(sjPlot)
library(GGally)
library(kableExtra)
library(tidyverse)
library(stargazer)
library(htmltools)

# Q 1.2
par(mfrow = c(3,3))

# One histogram per trust_ variable, with bins centred on the integers 0-10
for (i in 10:16) {
  hist(
    ess[[i]],
    main = names(ess)[i],
    xlab = "",
    breaks = seq(-0.5, 10.5, by = 1)
  )
}

title(main = "Figure 1.0: Question 1.2 Histograms of trust_ variables",
      outer = TRUE ,
      line = -1)

trust_patterns <- data.frame(Mean = sapply(ess[, 10:16], mean, na.rm = TRUE),
                             SD = sapply(ess[, 10:16], sd, na.rm = TRUE))

kable(round(trust_patterns, 3), 
      caption = "Figure 1.1: Question 1.2 Means and SD of trust_ Variables") %>% 
  kable_styling(full_width = F, position = "center")

# Q 1.3
par(mfrow = c(1,1))

ess1 <- ess[complete.cases(ess[, grepl("trust", names(ess))]), ]

ess1$trustdata <- rowSums(ess1[,10:16])

ess1$trustdata_normalized <- (ess1$trustdata)/7

hist(
  ess1$trustdata_normalized,
  main = "Figure 1.2: Question 1.3 Histogram of Normalized Trust Index",
  xlab = "Trust Index (0-10)",
  col = "grey",
  border = "black", 
  breaks = seq(0, 10, by = 1)
)

# Q 1.4
model1 <- lm(trustdata_normalized ~ country ,
             data = ess1,
             weights = wgt)

levels(ess1$country)

plot_model(model1, type = "pred", sort.est = TRUE) + labs(
  y = "Average score",
  x = "Country",
  title = "Figure 1.3: 
  Question 1.4 Predicted Average Score of Trust Index by Country",
  subtitle = "Reference Category: Austria"
)

coefficients <- model1$coefficients

average_scores <- coefficients[1] + coefficients[-1]

average_scores <- c(coefficients[1], average_scores)

names(average_scores)[1] <- "countryAustria"

average_scores_ranked <- average_scores %>%
  round(3) %>%
  sort(decreasing = TRUE)

average_scores_ranked

kable(average_scores_ranked, caption =
        "Figure 1.4: Question 1.4 Predicted Average score
      of Trust Index by Country ranked") %>% 
  kable_styling(full_width = F, position = "center")

# Q 1.5
model2 <- lm(
  trustdata_normalized ~
  female + age + degree + urban + household_income + activity,
  data = ess1,
  weights = wgt
)

screenreg(model2, digits = 3)

html_table_1.5 <- stargazer(
  model2,
  type = "html",        
  digits = 3,          
  title = "Figure 1.5: Question 1.5 Regression Results for Model 2",
  covariate.labels = c("Intercept"),
  intercept.bottom = FALSE
  )

browsable(HTML(html_table_1.5))

# Q 1.6
model3 <- lm(voted ~ trustdata_normalized ,
             data = ess1 ,
             weights = wgt)

screenreg(model3, digits = 3)

html_table_1.6 <- stargazer(
  model3,
  type = "html",
  digits = 3,                         
  title = "Figure 1.6: Question 1.6 Regression Results for Model 3",
  dep.var.labels = "voted",
  covariate.labels = c("Intercept"),
  intercept.bottom = FALSE)

browsable(HTML(html_table_1.6))

# Q 1.8
pcafit <- prcomp(ess1[,10:16], scale.=FALSE)

pcafit1 <- pcafit$x[,1]

# plot() is called for its side effect; it returns NULL, so nothing is assigned
plot(ess1$trustdata_normalized, pcafit1, 
     xlab = "Equal Weight Index", ylab = "PC1 Scores", 
     main = "Figure 1.7: Question 1.8 
     Plot of First Principal Component and Equal Weight Index")

cor(pcafit1, ess1$trustdata_normalized)

pca_table1 <- data.frame(
  Component = rownames(pcafit$rotation),
  PC1 = round(pcafit$rotation[, "PC1"], 3),
  PC2 = round(pcafit$rotation[, "PC2"], 3),
  PC3 = round(pcafit$rotation[, "PC3"], 3),
  PC4 = round(pcafit$rotation[, "PC4"], 3),
  PC5 = round(pcafit$rotation[, "PC5"], 3),
  PC6 = round(pcafit$rotation[, "PC6"], 3),
  PC7 = round(pcafit$rotation[, "PC7"], 3)
)

pca_table1

kable(pca_table1 , booktabs = F, caption = "Figure 1.8: 
      Question 1.8 PCA Loadings") %>%
  kable_styling(full_width = F , position = "center")

# Q 2.1
sub_indices <- ess[, 17:22]

ggpairs(sub_indices,
        lower = list(continuous = wrap(
        "points",
        position = position_jitter(height = .2, width = .2),
        alpha = 0.2
        )),
        diag = list(continuous = "barDiag")) +
        labs(title = "Figure 1.9: Question 2.1
        Pairwise correlations between indicators ending in _imp")

# Q 2.3
# Select the indicators whose names end in "_imp"
fafit <- factanal(ess[, grepl("_imp$", names(ess))],
                  factors = 2, scores = "regression", rotation = "varimax")

fafit

fafit_data_frame <- data.frame("Factor1" = fafit$loadings[, 1], 
                         "Factor2" = fafit$loadings[, 2])

kable(round(fafit_data_frame, 3) ,
      caption = "Figure 2.0: Question 2.3 EFA loadings") %>%
  kable_styling(full_width = F , position = "center")

# Q 3.3
# Select the indicators whose names end in "_sq"
fafit3 <- factanal(ess[, grepl("_sq$", names(ess))],
                   factors = 2, scores = "regression", rotation = "varimax")

# Regression-method scoring weights: inverse correlation matrix times loadings
fa_weights <- solve(fafit3$correlation) %*% fafit3$loadings

ess$sq_factor1 <- as.matrix(ess[, grepl("_sq$", names(ess))]) %*% fa_weights[, 1]

ess$sq_factor2 <- as.matrix(ess[, grepl("_sq$", names(ess))]) %*% fa_weights[, 2]

sq_factor1_table <- data.frame(Weight_SQ_Factor1 = round(fa_weights[, 1], 3))

sq_factor2_table <- data.frame(Weight_SQ_Factor2 = round(fa_weights[, 2], 3))

kable(sq_factor1_table, booktabs = TRUE, 
      caption = "Figure 2.1: Question 3.2 Weights for SQ Factor 1") %>%
  kable_styling(full_width = FALSE, position = "center")

kable(sq_factor2_table, booktabs = TRUE, 
      caption = "Figure 2.2: Question 3.2 Weights for SQ Factor 2") %>%
  kable_styling(full_width = FALSE, position = "center")

model4 <- lm(sq_factor1 ~ country , data = ess , weights = wgt)

model5 <- lm(sq_factor2 ~ country , data = ess , weights = wgt)

screenreg(list(model4, model5))

html_table_3.3 <- stargazer(
  model4,
  model5,
  type = "html",
  digits = 3,                         
  title = "Figure 2.3: Question 3.3 Regression Results for Model 4 and Model 5",
  covariate.labels = c("Intercept"),
  intercept.bottom = FALSE)

browsable(HTML(html_table_3.3))

model4_coefficients <- model4$coefficients

average_scores_model4 <- model4_coefficients[1] + model4_coefficients[-1]

average_scores_model4 <- c(model4_coefficients[1], average_scores_model4)

names(average_scores_model4)[1] <- "countryAustria"

average_scores_ranked_model4 <- average_scores_model4 %>%
  round(3) %>%
  sort(decreasing = TRUE)

average_scores_ranked_model4

model5_coefficients <- model5$coefficients

average_scores_model5 <- model5_coefficients[1] + model5_coefficients[-1]

average_scores_model5 <- c(model5_coefficients[1], average_scores_model5)

names(average_scores_model5)[1] <- "countryAustria"

average_scores_ranked_model5 <- average_scores_model5 %>%
  round(3) %>%
  sort(decreasing = TRUE)

average_scores_ranked_model5

ranked_data_Q3.3 <- data.frame(
  Country_sq_factor1 = names(average_scores_ranked_model4),
  Scores_sq_factor1 = as.numeric(average_scores_ranked_model4),
  Country_sq_factor2 = names(average_scores_ranked_model5),
  Scores_sq_factor2 = as.numeric(average_scores_ranked_model5)
)

kable(ranked_data_Q3.3, booktabs = TRUE, caption = 
        "Figure 2.4: 
      Question 3.3 Ranked Average Scores for sq_factor1 and sq_factor2") %>%
  kable_styling(full_width = FALSE , position = "center")